Re: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))

2002-09-27 Thread William Overington

Shawn Steele wrote to the [EMAIL PROTECTED] list, not directly to me, yet
began by writing.

Mr. Overington,

There is then a long document of very helpful information, for which I am
grateful.

Mr Steele then concludes with the following.

I hope that this example improves your understanding of XML and how it may
be applied to your inventions.  As others have mentioned, this topic is
digressing from the purpose of this message board and would be best
discussed off line or in a different forum.

Well, a letter addressed to me could have been sent by private email.

- Shawn

Shawn Steele
Software Developer Engineer
Microsoft

Unfortunately, this is then followed by the following.

My comments in no way endorse the original

Well, that is fine, the letter has been posted to the Unicode list from a
Microsoft address, so a clarification makes the situation clear just in case
anyone had thought that in some way it might.

and are not intended to confer legitimacy,

Ah!  That is not fine.  The original is entirely legitimate and there is no
need for legitimacy to be conferred at all, also the conferring of
legitimacy is not something which is within the powers of Microsoft to
confer, as Microsoft is a corporation and does not vote in public elections,
let alone have jurisdiction in such matters.  Mentioning legitimacy in that
way in a document from Microsoft, a member of the Unicode Consortium, is
very unfair.

rather they are merely intended to be educational.

Well, they are merely intended to be educational.  No rather about it.

This posting is provided AS IS with no warranties,

Well, that is fine, the letter has been posted to the Unicode list from a
Microsoft address, so a clarification makes the situation clear just in case
anyone had thought that in some way it might.

and confers no rights.

What rights are being referred to here?

William Overington

27 September 2002











Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread William Overington

Peter Constable commented as follows.


On 09/26/2002 06:05:45 AM William Overington wrote:

Dallas is 6 hours behind England on the clock.


I'm going to refrain from commenting on anything beyond the markup issues

As you wish.  Though did you stick to that even in the same sentence?

-- and I'm continuing with that only because it's an easy follow-on to what
I already wrote,

As you wish.

even though there is every indication that the sensibility
of it will be ignored.

This did not appear to have meaning.

I checked on the meaning of the word sensibility just to make sure.

Did you intend to convey the meaning the good sense of what I write rather
than the sensibility of it?

Yet what indication whatsoever do you have that I ignore what you write?

I do not always agree with you, yet where specific references to documents
on the web are made I always attempt to obtain them and study the points you
make.

Certainly, I may not agree with you.  Sometimes I agree, sometimes I do not
agree and sometimes I am undecided in a matter.  That surely is the nature
of critical scholarship and research.



A document would contain a sequence such as follows.

U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2


You could just as easily have used

<S C="12001">London</S>

or

<S C="12001" P1="London"/>

which are only slightly more verbose, but which follow a widely-implemented
standard that can be parsed by lots of existing software, for which there
are a large number of tools available, and which a vast number of
individuals, businesses and other agencies have an interest in. Your markup
convention is completely proprietary,

Thank you.  That is excellent.  I designed the comet circumflex key with the
specific intention that it was creatively original whilst being expressible
using a standard all-Unicode font.

it has no existing software support,
and nobody but you has any interest in it.
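To illustrate the point about existing software support, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element name S and the attribute names C and P1 come from the message above; the angle brackets and attribute quoting are an assumed reconstruction, since the archive appears to have stripped them. The helper name read_sentence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the two forms quoted in the message.
element_form = '<S C="12001">London</S>'
attribute_form = '<S C="12001" P1="London"/>'


def read_sentence(markup):
    """Return (code, parameter) from either form of the S element."""
    node = ET.fromstring(markup)
    code = int(node.attrib["C"])
    # The parameter is carried either by the P1 attribute or by the element text.
    parameter = node.attrib.get("P1", node.text)
    return code, parameter


print(read_sentence(element_form))    # (12001, 'London')
print(read_sentence(attribute_form))  # (12001, 'London')
```

Both forms yield the same code number and parameter, and no custom parser has to be written.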

You have no basis whatsoever for claiming that nobody other than me has any
interest in it.  Maybe you are not interested, maybe some people you know
are not interested, yet I feel that it is unfair for you to make such a
statement without evidence when writing from an established organization as
that remark may prejudice people from taking an interest in helping to
develop the idea because of a political dimension of going against the tide.
You have your position and I feel that you should allow someone who does not
have such a position an even-handed chance to put forward an idea and have
it considered on its merits.

You tell me which one is more
likely to result in productive work and adoption by others.

Likelihood of success and what actually happens are not the same thing.  I
do not know which is more likely as I do not know of what has happened
already.  Some people may have deleted the email, some may have read it and
disregarded it, yet it is possible that some people might have tried to
produce a comet circumflex button on the screen using an all-Unicode font
and might be considering the possibilities of how the system could be
applied or might even be writing an experimental software program which can
take comet circumflex sequences and process them through a database.

Look, for example, at The Respectfully Experiment in the Unicode mailing
list archives.  There a result was assumed and something different was
observed in practice.


that it is
because I am an inventor, interested in pushing the envelope as to what is
possible scientifically and technologically.

Marco asked me a specific question, so I answered what he had asked.


Perhaps there is an [EMAIL PROTECTED] list somewhere where you might
find greater interest in your ideas than here.

That is unfair of you.  You have chosen to respond to my posts and I have
answered the questions which you asked.

You even stated in the same post.

quote

I'm going to refrain from commenting on anything beyond the markup issues

end quote

The topic of keys generally which I have introduced is potentially a
far-reaching development in the application of markup in Unicode based
systems.  My own comet circumflex system may be highly useful in business
communications and distance education.  I am happy to respond to questions
and to consider documents which people suggest.

None of us here mind
invention, but I think most would believe that inventiveness is most
productive when building off the advancement of others rather than
reinventing wheels or widgets. XML exists, and it works.

XML exists and it uses U+003C in a way that makes using U+003C with the
meaning LESS-THAN SIGN in body text intermixed with markup sections awkward.
That feature of XML may not matter for situations involving encoding simply
literary works, yet for a comprehensive system which can include the U+003C
character with the meaning LESS-THAN SIGN in body text and in markup
parameters, it does not suit my need.


Beside the fact that your proposed markup convention is not a good idea, it
has nothing 

Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread William Overington

Peter Constable wrote as follows.


On 09/26/2002 03:42:16 AM William Overington wrote:


Well, it might have been 03:42:16 AM where you are, indeed it probably was,
as Dallas is six hours behind England on the clock, but I would not want
people to think that I write my posts in the middle of the night!


On the one hand, you say

XML does not suit my specific need as far as I can tell.


But you also said

Documents with the code sequence are intended to be sent over the internet
as email, used as web pages and broadcast in multimedia broadcasts over a
direct broadcast satellite system, so the codes which you suggest would be
unsuitable.

In that quote, "the codes which you suggest" referred to your list of
specific Unicode code points, as follows.

quote

Sorry to be blunt, but that's silly. If you need a special-purpose
character (a code-sequence, to be more precise) for use within your
specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF,
2FFFE...  10FFFE, 10FFFF. They are non-characters available for exactly
this use.


end quote
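For reference, the non-characters named in that quote can be enumerated mechanically. This is a minimal sketch; the function name is_noncharacter is hypothetical. It covers U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the Unicode noncharacters."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for every plane n from 0 to 16.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 66
```

The count is 32 code points in the FDD0..FDEF range plus 2 per plane over 17 planes.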

I maintain that they are unsuitable for use in documents which are to be
sent from one end user to another.

Yet the first part of my sentence which you have quoted could by going to
the final comma and converting it to a full stop form a sentence on its own
as follows.

Documents with the code sequence are intended to be sent over the internet
as email, used as web pages and broadcast in multimedia broadcasts over a
direct broadcast satellite system.

So, I will reason from that.

You also quote me as stating the following sentence.

XML does not suit my specific need as far as I can tell.

I am happy with that.

The two sentences are entirely consistent.

Are you perhaps trying to make a deduction by the fallacy of the
undistributed middle, along the following lines.

William's need is a markup system.
XML is a markup system.

William's need is XML.

It may well be that XML could be used to carry the comet circumflex code
numbers which I am devising.  I am not saying that it could not be so used.

I am simply saying that XML, as I understand it, does not suit my specific
need.

For example, if I understand it correctly, XML uses U+003C in a document in
such a manner that its use for the meaning LESS-THAN SIGN in the body of the
text cannot be used directly.  For me, that is a major limitation of XML.
Now, I am not trying to make some big issue out of this, and I am not
trying to criticise XML, yet to my mind that is a very big legacy issue
and I do not want to have that problem in my research in language
translation and distance education.  Maybe one day Unicode will
encode special XML opening and closing angle brackets so that XML can
operate without that problem.  However, as XML uses the U+003C character in
that manner at the moment, for me it is a problem and it has led me to use
the key method using a comet circumflex key.
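The behaviour described here can be seen with a short sketch using Python's standard library. The sample sentence and the p element are made up for illustration. A bare U+003C in character data is rejected by the parser, while the escaped form &lt; is accepted and read back as the original LESS-THAN SIGN.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

body = "if a < b then swap"  # illustrative body text containing U+003C

# A bare U+003C inside character data is not well-formed XML.
try:
    ET.fromstring(f"<p>{body}</p>")
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False

# The writer escapes it; the parser restores it when reading.
escaped = escape(body)
restored = ET.fromstring(f"<p>{escaped}</p>").text

print(parsed_raw)        # False
print(escaped)           # if a &lt; b then swap
print(restored == body)  # True
```

So the character remains available in XML body text, though only through the escape (or a CDATA section) rather than directly.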

Also, I do not need to have all those < and > characters and = characters
and / characters within messages.

One of the things that is especially useful about XML and related
technologies is the facility with which data can be repurposed. You have
one schema for marking up data, and stylesheets that transform it as needed
for different publishing / usage contexts.

Also, I don't see how it can be that a character sequence such as U+003C
U+0061 U+003E can't be useful to you when some ridiculous character
sequence like U+2604 U+0302 U+20E3 is.


Well, U+2604 U+0302 U+20E3 is not ridiculous.  It is entirely permissible
within the Unicode specification.  I have used combining characters
productively, in accordance with the rules set out in the specification.
Please see section 7.9.  The button displays using an all-Unicode font.  If
you think it ridiculous then maybe that is good evidence of its originality
as a piece of creativity.  A comet circumflex key could be viewed as a piece
of original art.  I specifically designed it so as to be a design which
involves an inventive leap so as to produce something new and unexpected,
which someone skilled in the art would not produce as the application of
skill in the existing art without invention, yet which would display
properly using an all-Unicode font.
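The make-up of the sequence under discussion can be inspected with Python's unicodedata module. This is only a sketch that lists the name and general category of each code point; it says nothing about how a given font renders the combination.

```python
import unicodedata

key_open = "\u2604\u0302\u20E3"  # the sequence quoted in the message

for ch in key_open:
    # Print the code point, its general category and its character name.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The first code point is a symbol (COMET) and the other two are combining marks, the last being an enclosing mark (COMBINING ENCLOSING KEYCAP).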

The sequence U+003C U+0061 U+003E is unsuitable because it begins with a
U+003C character and  I do not want the use of U+003C to mean LESS-THAN SIGN
to be unavailable in a simple direct manner.  I want to be able to use the
comet circumflex translation system in documents which contain mathematics
and software listings as well as literary text.  So, I have decided to use a
straightforward system which allows me to do that without problems.

An added bonus of using the comet circumflex key is that documents
containing comet circumflex codes do not necessarily need to contain any
characters from the Latin alphabet.

William Overington

27 September 2002










Re: glyph selection for Unicode in browsers

2002-09-27 Thread John Cowan

Tex Texin scripsit:

 I do need to point out that user preference is problematic if it means
 that for a user to display a multilingual document, the user has to go
 thru and specify font preferences for languages they know nothing about.

How can this be avoided?  If I print a document containing a small amount
of text in Georgian (in a bibliography entry, say), I am not going to
know if the Georgian font is the most beautiful thing ever made or one
that is utterly illegible.  I have to pass it to someone who can read
Georgian and wait for the Aah! or Arrgh! as the case may be.

Or I can take the default and hope for the best.

 Just because I don't read CJK, doesn't mean I don't have legitimate
 needs to display or print CJK in a typographically correct way.
 Librarians, Commerce exchanges, mailing lists, localizers, etc.

Since the issue is not really a matter of language, but of typographic
tradition (see John Jenkins's excellent discussion of this question at
http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing
as a typographically correct way.  In particular (as noted in the FAQ),
it is commonplace for a Japanese document that quotes Chinese text to
use Japanese-style glyphs for both languages, as this is apparently less
jarring to the average Japanese reader.

 But although you didn't quite say this, a user could provide a
 preference not for font, but language, i.e. if the script is CJK,
 display it as C or J or K (or T). And given the language the font
 mechanisms would do a reasonable thing.

That is reasonable provided you grasp what is meant by language
preference here: namely, typographical tradition preference.  It would
be like choosing between Fraktur and Antiqua when reading German text:
this too is rather broader than a mere font difference.

-- 
A mosquito cried out in his pain,   John Cowan
A chemist has poisoned my brain!  http://www.ccil.org/~cowan
The cause of his sorrow http://www.reutershealth.com
Was para-dichloro-  [EMAIL PROTECTED]
Diphenyltrichloroethane.(aka DDT)




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 10:46:42 PM Andrew Cunningham wrote:

For me, this is the crux: that browsers have not implemented the CSS
:lang selector.

Again, the problem is knowing just *how* they should go about doing this.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 08:55:18 PM Tex Texin wrote:

Yes code page was not a good indicator of language, but it was used that
way by some applications.

And yes, Language should not dominate font selection, it should
influence it. Other typographic preferences also must be accommodated.

I agree.


In the case of HTML, XML, CSS, ways to specify typographic preferences
exist, and language can be expressed via lang. We just need browsers
and other user agents to make use of the lang information as part of
font selection.

The difficult question is How? Do we want some means (not codepage) to know
that certain fonts are suited to particular languages? Or do we want to
make use of smart-font capabilities to allow culturally-preferred glyphs to
be selected from a font? If the latter, then some more infrastructure still
needs to be developed within APIs and layout engines.



- Peter









RE: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/26/2002 07:24:08 PM Murray Sargent wrote:

I don't think the idea is that codepage equals language. Rather codepage
equals a writing system, which consists of one or more scripts (e.g., 6
scripts for ShiftJIS). As such the codepage is a useful cue in choosing
an appropriate font for rendering text.

(Murray and I talked about this some at dinner a couple of weeks ago, so
there's some history here.)

I don't think things are quite that simple. A codepage *can* be a useful
cue in choosing an appropriate font (or in choosing typographic preferences
by whatever means). This certainly may be the case in some instances, such
as Shift JIS. But it's not always the case. For instance, cp1251 doesn't
tell you what language is involved, and isn't sufficient to determine which
italic variants of certain Cyrillic characters are needed. Similarly,
cp1250 doesn't tell you what cultural preferences should apply in relation
to design and alignment of the ogonek diacritic (e.g. Polish and Lithuanian
differ in this regard), or other diacritics (e.g. caron should have a
distinct form for Czech); and cp1252 doesn't tell you about cultural
preferences regarding cedilla (three different forms can be used for
French, but only one is acceptable for Portuguese or Catalan).

That's why I maintain that a codepage is a character set, but not a writing
system. In general, a codepage does not determine a set of rules for
writing; it just provides a vocabulary with which to work.
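The cp1251 case can be sketched in a few lines of Python. The sample words and the language labels are illustrative choices, not taken from the thread. All three encode to cp1251 without loss, and nothing in the resulting bytes records which language, and hence which italic forms, was intended.

```python
samples = {
    "ru": "привет",   # Russian (illustrative)
    "sr": "љубав",    # Serbian (illustrative)
    "bg": "здравей",  # Bulgarian (illustrative)
}


def round_trips(text, codepage):
    """True if text survives encoding to the codepage and back."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeEncodeError:
        return False


for lang, text in samples.items():
    # Same codepage for every language; the bytes carry no language label.
    print(lang, round_trips(text, "cp1251"), text.encode("cp1251").hex())
```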




The bottom line is that if text was generated using a particular
codepage it's likely that the creator of that text intended the text to
be rendered with a font that supports that codepage.

Of course, fonts can support multiple codepages. Given e.g. Arial, Tahoma
and Verdana, they all support codepages 1250, 1251, 1252, 1253, 1254, 1257
and 1258. That doesn't tell you whether they're appropriate for Polish or
Lithuanian or Czech or whatever. Even the fact that they support cp1258
doesn't imply that they are appropriate for Vietnamese: e.g. the default
glyphs in Arial for U+1EA5 and U+1EA7 do not have the diacritics stacked in
the way needed for Vietnamese.
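The two code points mentioned each carry two diacritics on the same base letter, which is why the stacking matters. A small sketch with Python's unicodedata module shows the canonical decomposition:

```python
import unicodedata

for cp in (0x1EA5, 0x1EA7):
    ch = chr(cp)
    # NFD splits the precomposed letter into base letter plus combining marks.
    parts = [f"U+{ord(p):04X}" for p in unicodedata.normalize("NFD", ch)]
    print(f"U+{cp:04X}", unicodedata.name(ch), parts)
```

Each decomposes to a base letter a, a combining circumflex (U+0302) and a second accent (U+0301 acute or U+0300 grave). How the two marks are positioned relative to each other is left to the font.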

I'm not saying that codepage information isn't ever useful. Obviously, you
have found it very useful. But the usefulness has limits.



- Peter









Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/27/2002 12:27:22 AM jameskass wrote:

Don't despair.  As Peter Constable has pointed out, the infrastructure
for having browsers support language tags is already present.

Actually, my point was specifically that *part* of the infrastructure is
already present, at least in OpenType, but not *all*, either in OpenType
(meaning of language in the OT spec needs to be clarified, and
relationships between these tags and the language tags used for data e.g.
RFC 3066, need to be resolved), or in APIs (there's no way for apps to
indicate which OT language tag to apply to a run unless the app wishes to
do *all* of the OT support -- replacing e.g. Uniscribe -- itself).
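To make the unresolved relationship concrete, here is a rough sketch of what a lookup from RFC 3066 language tags to OpenType language system tags might look like. The table is hand-made for illustration and is an assumption, not a published mapping; the four-character tags are given as I understand the OpenType tag registry and should be checked against it. The function name ot_language_tag is hypothetical.

```python
# Illustrative, hand-made table: RFC 3066 tag (lower case) -> OpenType
# language system tag (four characters, space padded). An assumption.
OT_LANGSYS = {
    "ja": "JAN ",
    "ko": "KOR ",
    "zh-cn": "ZHS ",
    "zh-tw": "ZHT ",
    "pl": "PLK ",
    "lt": "LTH ",
    "cs": "CSY ",
    "vi": "VIT ",
}


def ot_language_tag(rfc3066_tag):
    """Look up the tag, dropping trailing subtags until something matches."""
    parts = rfc3066_tag.lower().split("-")
    while parts:
        hit = OT_LANGSYS.get("-".join(parts))
        if hit is not None:
            return hit
        parts.pop()
    return None  # no mapping known, e.g. a bare "zh" is ambiguous here


print(ot_language_tag("ja-JP"))  # JAN
print(ot_language_tag("zh-TW"))  # ZHT
print(ot_language_tag("zh"))     # None
```

The bare "zh" case shows one reason the relationship needs resolving: the data tag does not always say which typographic tradition is wanted.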


Once the font specs for all this are set and fonts are released with
the necessary coverage and the shaping engines can access all of this,
the browsers are sure to quickly add support, too.

I'm not quite as optimistic in terms of how close we are to having all this
ready to go. I think there's some hard work still ahead.



- Peter










Re: script or block detection needed for Unicode fonts

2002-09-27 Thread Peter_Constable


On 09/26/2002 11:34:42 PM jameskass wrote:

Some apps won't display a glyph from a specified font if its corresponding
Unicode Ranges Supported bit in the OS/2 table isn't set.  So, font
developers producing fonts intended to be used with such apps set the
corresponding bit even if only one glyph from the entire range is
present in the font.

Unfortunately true. E.g., for our Yi font, we included Katakana middle dot
as well as a handful of other CJK punctuation characters. But in order to
make this font work in Word 2000, we had to do things like indicate that we
support the Shift JIS codepage!
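The range-bit workaround can be sketched as plain bit arithmetic. The four ulUnicodeRange fields of the OS/2 table are treated here as a tuple of 32-bit integers. The bit numbers used (50 for Katakana, 83 for Yi Syllables) are assumptions from my reading of the OpenType OS/2 table documentation and should be verified there; the function names are hypothetical.

```python
KATAKANA_BIT = 50      # assumed bit number for the Katakana block
YI_SYLLABLES_BIT = 83  # assumed bit number for Yi Syllables


def has_unicode_range(ul_ranges, bit):
    """ul_ranges is (ulUnicodeRange1..4); bit is 0..127."""
    field, offset = divmod(bit, 32)
    return bool((ul_ranges[field] >> offset) & 1)


def with_unicode_range(ul_ranges, bit):
    """Return a copy of ul_ranges with the given bit set."""
    field, offset = divmod(bit, 32)
    updated = list(ul_ranges)
    updated[field] |= 1 << offset
    return tuple(updated)


# A font that declares only Yi coverage ...
yi_font = with_unicode_range((0, 0, 0, 0), YI_SYLLABLES_BIT)
print(has_unicode_range(yi_font, KATAKANA_BIT))  # False

# ... and the same font after declaring Katakana for a single glyph.
patched = with_unicode_range(yi_font, KATAKANA_BIT)
print(has_unicode_range(patched, KATAKANA_BIT))  # True
```

Once the bit is set, an application testing it cannot tell one glyph from full coverage of the range.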



- Peter









Please stop feeding the troll

2002-09-27 Thread gpw


I appreciate that the list likes to maintain a polite tone, but the attempts
to discuss William's ideas just encourage him to keep spewing his
pedantic drivel.  Either because he is a troll seeking attention or
because he is a genuine crackpot who thinks his pet ideas will find
fertile ground.   

I'll resume biting my tongue, but rest assured my silence does not
mean I accept anything he scribbles.

Geoffrey




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

John,
Thanks for commenting. Responses embedded.

John Cowan wrote:
 
 Tex Texin scripsit:
 
  I do need to point out that user preference is problematic if it means
  that for a user to display a multilingual document, the user has to go
  thru and specify font preferences for languages they know nothing about.
 
 How can this be avoided?  If I print a document containing a small amount
 of text in Georgian (in a bibliography entry, say), I am not going to
 know if the Georgian font is the most beautiful thing ever made or one
 that is utterly illegible.  I have to pass it to someone who can read
 Georgian and wait for the Aah! or Arrgh! as the case may be.
 
 Or I can take the default and hope for the best.

All I ask is the defaults be adequate. I wouldn't disallow software from
providing for users to express preferences. I am trying to avoid it
being required of users to provide preferences. Yes, most users don't
know which fonts are the best choices.

 
  Just because I don't read CJK, doesn't mean I don't have legitimate
  needs to display or print CJK in a typographically correct way.
  Librarians, Commerce exchanges, mailing lists, localizers, etc.
 
 Since the issue is not really a matter of language, but of typographic
 tradition (see John Jenkins's excellent discussion of this question at
 http://www.unicode.org/unicode/faq/han_cjk.html#3), there is no such thing
 as a typographically correct way.  In particular (as noted in the FAQ),
 it is commonplace for a Japanese document that quotes Chinese text to
 use Japanese-style glyphs for both languages, as this is apparently less
 jarring to the average Japanese reader.

Typographically correct was too strong. I am just looking for the font
to reflect the language, so CJK is displayed as either C or J or K as
indicated by HTML or XML lang tags.

With respect to the comment from John's FAQ, it is reasonable but only
for a user who is primarily or strongly a C, or J or K reader.
For many applications, such as printing labels for card catalogs or
mailing lists, the user's preference does not matter (because the
printout targets someone other than the person operating the software).
Also, for someone like myself who is not a reader, I would like text
displayed the same way each time so I stand a better chance of
recognizing it.
As more people work with multilingual data, I think more users will be
like myself.

An author of a primarily Japanese document could choose not to tag
Chinese text as Chinese, and so get a Japanese rendering of the text,
but that could hurt search engines or other applications that use
language tags for purposes other than rendering... So I stick with the
idea that text should be tagged with language appropriately, and a user
that reads Japanese and prefers to see Chinese text with Japanese glyphs
have the ability to override the language tags to affect rendering.

 
  But although you didn't quite say this, a user could provide a
  preference not for font, but language, i.e. if the script is CJK,
  display it as C or J or K (or T). And given the language the font
  mechanisms would do a reasonable thing.
 
 That is reasonable provided you grasp what is meant by language
 preference here: namely, typographical tradition preference.  It would
 be like choosing between Fraktur and Antiqua when reading German text:
 this too is rather broader than a mere font difference.

I am not a typographer, and I am just trying to point out requirements
for font selection for a typical user or at least a user that is not a
linguist, not a typographer, not a font specialist, and who wants to
display/print pan-Unicode or pan-script unicode-based text.
I am not trying to address high end publishing requirements.

I can't say if typographical tradition preference (TTP) is the correct
term for language preference. (I figure I got into enough trouble
using typographically correct.) I hope the discussion above was clear
enough. I'll let others comment on TTP, and if there is general
agreement that it is a better and more precise and accurate term, I am
fine with it. I am not familiar enough with Fraktur and Antiqua to
knowledgeably comment. From what little I do know this seems to require
more than language information to decide between them. 
(I did find an interesting article on Fraktur though in trying to
understand your meaning, http://www.waldenfont.com/public/gbpmanual.pdf)


hth
tex
p.s. I am about to travel and may not have email for a few days. (A
cheer goes up from the list...)





 

-- 
-
Tex Texin   cell: +1 781 789 1898   

Re: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))

2002-09-27 Thread Peter_Constable


On 09/27/2002 06:24:35 AM William Overington wrote:

Shawn Steele wrote to the [EMAIL PROTECTED] list, not directly to me,
yet
began by writing...


Unfortunately, this is then followed by the following...

He was only trying to be helpful, and added the kind of disclaimer one
sometimes sees in these public fora to make sure nobody assumes that his
comments reflect *anything*, including endorsement of any proposals or
ideas, on the part of the company or organisation he is associated with. He
happened to choose the word legitimacy where endorsement would probably
have been better. I think it was obvious (at least it was to me) what his
intent was. As for rights, I think he's simply saying nobody can assume
the right to expect anything in particular of Microsoft as a result of his
comments. Again, I think it was clear what his intent was. I don't think
his disclaimer needed a careful exegesis.



- Peter









Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin



[EMAIL PROTECTED] wrote:
 
 On 09/26/2002 08:55:18 PM Tex Texin wrote:
 In the case of HTML, XML, CSS, ways to specify typographic preferences
 exist, and language can be expressed via lang. We just need browsers
 and other user agents to make use of the lang information as part of
 font selection.
 
 The difficult question is How? Do we want some means (not codepage) to know
 that certain fonts are suited to particular languages? 

Fonts already span multiple languages, so that by itself would not work.

Or do we want to
 make use of smart-font capabilities to allow culturally-preferred glyphs to
 be selected from a font? If the latter, then some more infrastructure still
 needs to be developed within APIs and layout engines.

I think yes. Whereas APIs relied on codepages either implicitly or
explicitly, this needs to be reexamined and language should be allowed
to play a suitable role in glyph selection and font selection.




-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Jungshik Shin

On Thu, 26 Sep 2002, Tex Texin wrote:

 Yes, underlying fonts can be a Unicode architecture. That's a good
 thing, but invisible to end-users.
 I would like to keep the sense of Unicode font as meaning a font which
 supports a large number of scripts, rather than meaning one that uses
 Unicode for its mapping architecture.

 Yes, OS and browsers are getting better. My concerns center around:
 Is the mechanism for selecting fallback fonts language-sensitive, so
 that it would favor a Japanese font for Unicode Han characters that were
 tagged as lang:ja


  I'm a little at a loss as to why you have the impression
that  'lang' tag has little effect on rendering of html (in
UTF-8. e.g. your page or IUC10 announcement page which used to be at
http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS
IE has been making use of 'lang' attribute(html) for a long time and
Mozilla solved the problem (although 'xml:lang' is not yet supported)
last December. In case of Mozilla(and Netscape 7), see

  http://bugzilla.mozilla.org/show_bug.cgi?id=105199  (fixed.
   where you'll find a pair of screenshots with dramatically
   different rendering results)
  http://bugzilla.mozilla.org/show_bug.cgi?id=115121
  (xml:lang : not yet fixed)
  http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header
 and  UTF-8 document)

 And are the fonts labeled so that the supported language is known?

  Judging from the discussion about the issue in Xfree86-font
list, most of modern OTFs are. Otherwise, applications (or  a library
for text rendering/font selection) can resort to a kind of mapping the
character repertoire of a font to language(s) covered as is done by
fontconfig for XFree86. For instance, if the characters in JIS X 0208 are all
covered, but characters from GB2312, Big5 and KS X 1001 are missing,
the font is likely to be Japanese.
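The coverage heuristic can be sketched as follows. This is not fontconfig's actual algorithm, only an illustration of the idea. Python's shift_jis, gb2312, big5 and euc_kr codecs stand in for the JIS X 0208, GB2312, Big5 and KS X 1001 repertoires (an approximation), and only the main CJK Unified Ideographs block is examined. The names REPERTOIRES and likely_languages are hypothetical.

```python
HAN = range(0x4E00, 0xA000)  # main CJK Unified Ideographs block only

# Codecs used as stand-ins for the national character sets (an assumption).
CHARSETS = {"ja": "shift_jis", "zh-CN": "gb2312", "zh-TW": "big5", "ko": "euc_kr"}


def encodable(ch, codec):
    """True if the character exists in the legacy character set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Han repertoire of each character set, as a set of code points.
REPERTOIRES = {
    lang: frozenset(cp for cp in HAN if encodable(chr(cp), codec))
    for lang, codec in CHARSETS.items()
}


def likely_languages(font_coverage):
    """Languages whose whole Han repertoire is covered by the font."""
    return [lang for lang, rep in REPERTOIRES.items() if rep <= font_coverage]


# A hypothetical font whose Han coverage is exactly the Japanese repertoire.
japanese_font = REPERTOIRES["ja"]
print(likely_languages(japanese_font))
```

A font covering several repertoires in full would return several languages, which is the pan-script case where this heuristic alone cannot decide.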

 Even so, I'd still need to have a large collection of fonts then.

  Indeed that's the case. If OT lang-tag is made use of and
multiple alternative glyphs are available in a single(or
a few) pan-script Unicode font(s), you'd not have to.

  Jungshik





Re: glyph selection for Unicode in browsers

2002-09-27 Thread John Cowan

Tex Texin scripsit:

 An author of a primarily Japanese document could choose not to tag
 Chinese text as Chinese, and so get a Japanese rendering of the text,
 but that could hurt search engines or other applications that use
 language tags for purposes other than rendering...

Indeed, indeed.  Tagging (even implicit tagging) with a false language is
a very bad idea.

 So I stick with the
 idea that text should be tagged with language appropriately, and a user
 that reads Japanese and prefers to see Chinese text with Japanese glyphs
 have the ability to override the language tags to affect rendering.

The trouble is that that's the default for a Japanese reader reading
mixed-language text.  No override should be required.

 I can't say if typographical tradition preference (TTP) is the correct
 term for language preference. (I figure I got into enough trouble
 using typographically correct.) I hope the discussion above was clear
 enough. I'll let others comment on TTP, and if there is general
 agreement that it is a better and more precise and accurate term, I am
 fine with it.

My point was that it's one thing to want Chinese text displayed with
Japanese glyphs, based on a typographical-tradition preference, and it's
another thing to want the text in a Japanese-language version,
which is what setting a language preference would suggest.

 I am not familiar enough with Fraktur and Antiqua to
 knowledgably comment. From what little I do know this seems to require
 more than language information to decide between them. 

Absolutely.  The analogy is that Fraktur is quite, or nearly, illegible if
all you know how to read is Antiqua (which looks like what you are seeing
now, ordinary Latin-script type).  This makes the difference greater than a
mere font difference.

-- 
Business before pleasure, if not too bloomering long before.
--Nicholas van Rijn
John Cowan [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

Hi,
I am glad to see the issue has been given some attention.
I concluded there was a problem after experimenting with some CJK
characters that I repeated with different lang tags and could not get
any display differences unless I used non-Unicode fonts assigned to each
language. I did this with IE 6 and NS 7 and Opera (I don't recall if it was
6 or 7).

tex


Jungshik Shin wrote:
 
 On Thu, 26 Sep 2002, Tex Texin wrote:
 
  Yes, underlying fonts can be a Unicode architecture. That's a good
  thing, but invisible to end-users.
  I would like to keep the sense of Unicode font as meaning a font which
  supports a large number of scripts, rather than meaning one that uses
  Unicode for its mapping architecture.
 
  Yes, OS and browsers are getting better. My concerns center around:
  Is the mechanism for selecting fallback fonts language-sensitive, so
  that it would favor a Japanese font for Unicode Han characters that were
  tagged as lang:ja
 
   I'm a little at a loss as to why you have the impression
 that  'lang' tag has little effect on rendering of html (in
 UTF-8. e.g. your page or IUC10 announcement page which used to be at
 http://www.unicode.org/iuc/iuc10/x-utf8.html) by major browsers. MS
 IE has been making use of 'lang' attribute(html) for a long time and
 Mozilla solved the problem (although 'xml:lang' is not yet supported)
 last December. In case of Mozilla(and Netscape 7), see
 
   http://bugzilla.mozilla.org/show_bug.cgi?id=105199  (fixed.
where you'll find a pair of screenshots with dramatically
different rendering results)
   http://bugzilla.mozilla.org/show_bug.cgi?id=115121
   (xml:lang : not yet fixed)
   http://bugzilla.mozilla.org/show_bug.cgi?id=122779 (C-L http header
  and  UTF-8 document)
 
  And are the fonts labeled so that the supported language is known?
 
   Judging from the discussion about the issue in Xfree86-font
 list, most of modern OTFs are. Otherwise, applications (or  a library
 for text rendering/font selection) can resort to a kind of mapping the
 character repertoire of a font to language(s) covered as is done by
 fontconfig for XFree86. For instance, if characters in JIS X 0208 are all
 covered, but characters from GB2312, Big5 and KS X 1001 are missing,
 a font is likely to be Japanese.
 
  Even so, I'd still need to have a large collection of fonts then.
 
   Indeed that's the case. If OT lang-tag is made use of and
 multiple alternative glyphs are available in a single(or
 a few) pan-script Unicode font(s), you'd not have to.
 
   Jungshik

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin



John Cowan wrote:
 
 Tex Texin scripsit:
 
  An author of a primarily Japanese document could choose not to tag
  Chinese text as Chinese, and so get a Japanese rendering of the text,
  but that could hurt search engines or other applications that use
  language tags for purposes other than rendering...
 
 Indeed, indeed.  Tagging (even implicit tagging) with a false language is
 a very bad idea.
 
  So I stick with the
  idea that text should be tagged with language appropriately, and a user
  that reads Japanese and prefers to see Chinese text with Japanese glyphs
  have the ability to override the language tags to affect rendering.
 
 The trouble is that that's the default for a Japanese reader reading
 mixed-language text.  No override should be required.

It's not a big problem. Browsers already have options for this, such as the
radio button under fonts in Netscape's preferences: use the fonts specified
in the document vs. override them and use user-defined fonts.

 
  I can't say if typographical tradition preference (TTP) is the correct
  term for language preference. (I figure I got into enough trouble
  using typographically correct.) I hope the discussion above was clear
  enough. I'll let others comment on TTP, and if there is general
  agreement that it is a better and more precise and accurate term, I am
  fine with it.
 
 My point was that it's one thing to want Chinese text displayed with
 Japanese glyphs, based on a typographical-tradition preference, and it's
 another thing to want the text in a Japanese-language version,
 which is what setting a language preference would suggest.
 
  I am not familiar enough with Fraktur and Antiqua to
  knowledgably comment. From what little I do know this seems to require
  more than language information to decide between them.
 
 Absolutely.  The analogy is that Fraktur is quite, or nearly, illegible if
 all you know how to read is Antiqua (which looks like what you are seeing
 now, ordinary Latin-script type).  This makes the difference greater than a
 mere font difference.

OK, but it is not clear to me that we should try to fix this problem in
the same way we fix the CJK rendering problem.
Language is being tagged and provided for a number of reasons, and it
should be utilized.
Fraktur/Antiqua and other distinctions might need a different mechanism.

tex

 
 --
 Business before pleasure, if not too bloomering long before.
 --Nicholas van Rijn
 John Cowan [EMAIL PROTECTED]
 http://www.ccil.org/~cowan  http://www.reutershealth.com

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread John Cowan

William Overington scripsit:

 Well, it depends what one is trying to do.  If one wishes to establish a
 system whereby proprietary intellectual property rights exist, then a
 proprietary coding can be a good idea.

That is the function of encryption.

 XML is the way to go.
 
 Maybe, maybe not.  The issue of U+003C being used to mean LESS-THAN SIGN in
 documents which mix ordinary text and markup may or may not, depending upon
 the application, be a problem.

Since there are several standard ways to represent the semantic LESS-THAN
SIGN in XML (&lt; is most typical, but &#x3C; also works), there is
no problem, only a little extra work as tradeoff.  After all, why not
invent your own character code as well as your own markup language?
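
As a small illustration of the two representations, here is a sketch using
Python's standard library; the element name m is made up for the example.
Both the named entity and the numeric character reference parse back to the
same LESS-THAN SIGN, and escape() produces the named form when writing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: both spellings yield U+003C in the parsed text.
named = ET.fromstring("<m>a &lt; b</m>").text
numeric = ET.fromstring("<m>a &#x3C; b</m>").text

# Writing: escape() turns a literal "<" in body text into "&lt;".
escaped = escape("a < b")

print(named)
print(numeric)
print(escaped)
```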

 The keys idea is pushing the envelope.  As spin off from this discussion,
 maybe the XML people, and the Unicode Technical Committee, will do something
 about having special characters for the XML tags rather than using U+003C
 and thereby help people wanting to place mathematics and software listings
 in the same file as markup.  

MathML is a markup standard for mathematical text that is an application of
XML, so people wanting to place etc. need no further help.

Don't hold your breath, and don't *mutcheh* us about it.

 What is wrong with private encodings?

Interchanging them does not scale.

 People may ignore them if they wish.  

They will, they will.

 High level application semantics assigned to particular code points are
 potentially very useful.  I have published various documents on the web
 about them with Private Use Area allocations for various items such as
 colour and point size for text.

Of course you can use the Private Use Area for whatever you like.  A character
standard, however, is intended for encoding *characters*.  It is not intended
as a source of useful integers -- for that, apply to Dedekind.

-- 
John Cowan   [EMAIL PROTECTED]
You need a change: try Canada  You need a change: try China
--fortune cookies opened by a couple that I know




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Jungshik Shin




On Fri, 27 Sep 2002 [EMAIL PROTECTED] wrote:


 On 09/26/2002 10:46:42 PM Andrew Cunningham wrote:

 For me, this is the crux: that browsers have not implemented the CSS
 :lang selector.

 As I wrote in my response to Tex, css 'lang' pseudo-class is
honored by MS IE and Mozilla 1.x/Netscape 7.

 Again, the problem is knowing just *how* they should go about doing this.

  As for 'how', what MS IE and Mozilla do may not be as user-friendly
as Tex wants them to be, but I think it's pretty reasonable at
least for CJK. If they're configured to use different Unicode-cmapped
(non-Pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode
fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K
are rendered with fonts configured for TC,SC,J and K, respectively.
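
The per-language configuration described above can be sketched roughly as
follows. This is only an illustration of the lookup, not what MS IE or Mozilla
actually do internally; the font names and the pick_font helper are
hypothetical.

```python
# Hypothetical font names; a real configuration would name installed fonts.
FONTS_BY_LANG = {
    "zh-tw": "ExampleMing TC",
    "zh-cn": "ExampleSong SC",
    "ja": "ExampleMincho J",
    "ko": "ExampleBatang K",
}
DEFAULT_FONT = "Example Pan-Script Unicode"


def pick_font(lang_tag):
    """Pick the font configured for a language tag.

    Tries the full tag first, then drops trailing subtags ("ja-JP" -> "ja"),
    and falls back to the pan-script default when nothing matches.
    """
    tag = lang_tag.lower()
    while tag:
        if tag in FONTS_BY_LANG:
            return FONTS_BY_LANG[tag]
        tag = tag.rpartition("-")[0]  # empty string once no "-" is left
    return DEFAULT_FONT


# Runs of text tagged with different languages get different fonts.
for lang in ("zh-TW", "zh-CN", "ja-JP", "ko", "fr"):
    print(lang, "->", pick_font(lang))
```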

I guess you already know this much and what you're alluding
to is a problem of another dimension: developing Unicode fonts (pan-script
if necessary/possible) with multiple lang-dependent glyphs, and getting
apps and rendering/font selection libraries to make use of them. That
assumes it's possible at all to solve the various subtle issues involved;
selecting lang-dependent glyphs for Latin/Cyrillic letters seems to be
more difficult than the CJK case. The font selection part of these problems
is addressed by the fontconfig package by Keith Packard
(http://fontconfig.org). Of course, there should be other implementations
of/attempts at this problem.

  Jungshik Shin





Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread Peter_Constable


[This is entirely off-topic.]

On 09/27/2002 06:24:27 AM William Overington wrote:

Yet what indication whatsoever do you have that I ignore what you write?

The fact that you have been given recommendations from several people on
this list not to invent new markup conventions but to take advantage of the
existing, state-of-the-art technologies for this purpose, yet you have
consistently rejected those recommendations.



I do not always agree with you,

I doubt there's anyone on this list that always agrees with me (I certainly
hope not; after the passage of time, I often don't agree with myself :-).



it has no existing software support,
and nobody but you has any interest in it.

You have no basis whatsoever for claiming that nobody other than me has
any interest in it.

It's only a claim, a hypothesis that I happen to consider to have enough
probability of validity to make me feel confident in stating in a public
forum. Of course, I may be wrong.



Maybe you are not interested, maybe some people you know
are not interested, yet I feel that it is unfair for you to make such a
statement without evidence when writing from an established organization
as that remark may prejudice people from taking an interest in helping to
develop the idea because of a political dimension of going against the
tide.

I feel there is evidence: take a look at any serial publication related to
the software industry from the past three years and look for references to
XML. It comes up again and again and again. The evidence very strongly
points in favour of XML if one is needing a markup convention for some
protocol. There may well be some situation in which XML isn't appropriate;
e.g. one might have valid reasons for wanting to maintain a binary file
format as the native storage representation for a word-processing or
spreadsheet app. But if one is going to use a *character*-based markup
convention, I think you'd be hard pressed to come up with good reasons at
this point for using something other than XML.



Perhaps there is an [EMAIL PROTECTED] list somewhere where you
might find greater interest in your ideas than here.

That is unfair of you.

If I offended, then I apologize. I merely wished to suggest that your ideas
regarding markup are what I think the vast majority on this list would
consider eccentric, and to also suggest that it's all off-topic for this
list and really should be taken up elsewhere.




You even stated in the same post.

quote

I'm going to refrain from commenting on anything beyond the markup issues

end quote

And I believe I did so.



The topic of keys generally which I have introduced is potentially a
far-reaching development in the application of markup in Unicode based
systems.  My own comet circumflex system may be highly useful in business
communications and distance education.  I am happy to respond to questions
and to consider documents which people suggest.

But please, not on this list. This is not the comet circumflex list.



XML exists and it uses U+003C in a way that makes using U+003C with the
meaning LESS-THAN SIGN in body text intermixed with markup sections
awkward.

Not significantly so, as evidenced by the fact that many have needed to
represent the character < within content yet this has not impeded the
widespread -- near ubiquitous -- adoption of XML.


That feature of XML may not matter for situations involving encoding
simply literary works, yet for a comprehensive system which can include the
U+003C character with the meaning LESS-THAN SIGN in body text and in markup
parameters, it does not suit my need.

Then I think you're making decisions about design of a protocol using the
wrong criteria.



Actually, I was rather hoping that, with your specific interest in
languages that you would have wished to have a try at using the comet
circumflex system as one of the features of the comet circumflex system is
that it could be used with minority languages as easily as with the major
languages of the world.

Actually, one of the things that I chose *not* to comment on in the
previous message was the very significant issues the comet circumflex
system raises in relation to internationalisation and localisation. As
someone else pointed out, your system has a problem in that a parameter
such as London needs to be localised. There are a range of
internationalisation issues that your system doesn't address. It isn't
always safe to assume that one can define a matrix statement that can be
translated into multiple languages and into which parameter strings can be
inserted; issues such as grammatical concord may be a problem. I don't want
to get into such a discussion (especially on this list). My point is, I see
many potential problems in terms of multilingual application of the system.
Also, the users I support are not dealing with text involving a set of
short, pre-defined messages, so this system isn't all that relevant for my
work.



- Peter



Re: glyph selection for Unicode in browsers

2002-09-27 Thread Tex Texin

Jungshik,

I used characters that should display differently. Some punctuation and
some like (I think it is) the bone character.

However, feel free to suggest a list of characters that should be
distinctive and I'll post a page with them that we can all review
whether there are differences or not on various platforms, browsers,
etc.

I agree for many characters there should be no differences.

tex

Jungshik Shin wrote:
 
   Actually, you might have had a hard time telling the display difference
 depending on what characters you used for your testing EVEN IF you
 configured browsers to use different (but with __very similar__ design
 principles and look/feels) Unicode-cmapped (but NON-pan-script) fonts
 for TC,SC, J and K *under MS Windows*.  This difficulty demonstrates
 that CJK Unification in Unicode/10646 is not such a big problem as some
 people tried to make it.
 
   Jungshik

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread John H. Jenkins


On Friday, September 27, 2002, at 09:52 AM, [EMAIL PROTECTED] 
wrote:

 I doubt there's anyone on this list that always agrees with me


I think you're wrong, there, Peter.  I *never* disagree with you.  :-)

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/





Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread Tex Texin



William Overington wrote:
 Message catalogs are not new.
 
 I had not heard the description Message catalog previously, so I can
 search for that too.
 
 I have previously searched under telegraphic code and language and
 translation.

look for: software localization, message, catalog, resource files,
perhaps localisation ;-)

 
 An email correspondent drew my attention to the following list of numbered
 radiograms.
 
 http://www.arrl.org/FandES/field/forms/fsd3.html
 
 That is an interesting document.
 
 I have not yet found any example oriented to language translation.  I have
 not yet found any example oriented to carrying on a complete conversation.

A new prisoner sits down for his first lunch. Someone shouts out 53.
Everyone laughs.
Another shouts 26. More laughter. He asks his neighbor what's going
on... The neighbor explains they have all been there so long they have
heard all the jokes told very many times. Finally they just gave them
numbers. So when someone shouts out a number they remember the joke and
laugh.
After a bit the new guy shouts out: 42! Dead silence. He asks his
neighbor what went wrong. He turns to him and says That one is not
funny..

This is a very old joke. It is an indication of how old the idea of
numbered messages might be. ;-)

The ARRL list was missing quite a few. 73 and 88 were common for Best
regards, and love and kisses.
I was rather surprised therefore when the Target products with 88 were
recently pulled from the market because they signaled the neo-nazi
movement. I thought it meant Love and kisses.

 
 A proprietary coding system is a bad idea.
 
 Well, it depends what one is trying to do. 


Yes. For the problem you described, an open system with lots of tool
support is already available. A proprietary system, for which you could not
create nearly as many tools as exist for the open systems, would not be
competitive. You would really have to build in some significant market
advantage. Given your lack of familiarity with what exists in the market,
and a presumption of a one-man shop (limited resources), we speculated it
was a mistake.

 XML is the way to go.
 
 Maybe, maybe not.  The issue of U+003C being used to mean LESS-THAN SIGN in
 documents which mix ordinary text and markup may or may not, depending upon
 the application, be a problem.

You can use the character with some minor escaping. It is a smaller
issue than trying to create all the various tools and benefits you would
get from XML.

 
 but as Peter and others have already defined
 several times where the envelope needs pushing (e.g. XML), and in
 particular where they should not (private encodings, and hi level
 application semantics assigned to particular code points), continued
 attempts to do so are not welcome.
 
 What is wrong with private encodings?  The Private Use Area is there to be
 used. 

Sure, but use them privately and discuss them privately with people who
have an interest in those particular purposes.
This is not the place. I know this has been stated before.

I think Suzanne or Barry even created a list for purposes of PUA
discussion:
http://groups.yahoo.com/group/CharMan/
Or start a list of your own.

You are welcome (as are others) to send announcements here saying- Hey
I have these PUA ideas, and would like to discuss them here and here.

It is really quite unfair to the members of the list to cause it to go
over the same ground.

hth
tex
-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread John Cowan

John H. Jenkins scripsit:

 I think you're wrong, there, Peter.  I *never* disagree with you.  :-)

Hmm.  Has anyone ever seen Peter and John together?  :-)

-- 
John Cowan  [EMAIL PROTECTED]  www.ccil.org/~cowan  www.reutershealth.com
In the sciences, we are now uniquely privileged to sit side by side
with the giants on whose shoulders we stand.
--Gerald Holton




Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread John Cowan

Tex Texin scripsit:

 After a bit the new guy shouts out: 42! Dead silence. He asks his
 neighbor what went wrong. He turns to him and says That one is not
 funny..

Other punchlines I have heard:

(about a third party):  Steve should know he can't handle Swedish dialect.

(after uproarious laughter):  Hey, we've never heard that one before!

(after silence): I guess you just don't know how to tell a joke.

 This is a very old joke. It is an indication of how old the idea of
 numbered messages might be. ;-)

As William mentions, commercial telegraph codes are almost as old as the
telegraph itself; when the five-letter-code principle was eventually
accepted internationally, it became possible to use a single group to
represent things as complex as We are shipping to you, care of your
agent in X, our product Y where all possible combinations of X and Y
were given individual codes.  This of course was a code commissioned by
a private company; public codes necessarily had to be more inclusive and
thus more verbose.  Several of them were indeed published in multilingual
editions, so that the same code sequence could be read as English,
French, German, 

In the case of public codes, company code clerks became quite adept
at reading the more frequent codes without reference to the code book.
On one occasion, a code clerk got a cable from an agent located halfway
around the planet reading simply AHXNO, a code entirely unfamiliar to him.
Unfortunately, when he looked it up, he found the reading to be:

Met with a fatal accident.

-- 
John Cowan    http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values|   Check your assumptions.  In fact,
   at the front desk.   |  check your assumptions at the door.
 --sign in Paris hotel  |--Miles Vorkosigan




Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread Barry Caplan

At 04:26 PM 9/27/2002 +0100, William Overington wrote:
I had not heard the description Message catalog previously, so I can
search for that too.

I have previously searched under telegraphic code and language and
translation.

An email correspondent drew my attention to the following list of numbered
I have not yet found any example oriented to language translation.  

Key Unix libraries have used message catalogs as part of the API since time 
immemorial. Hence any Unix application with even a whiff of a chance of being 
internationalized is likely to have used those functions.
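
For readers who have not met the idea, here is a minimal sketch of a numbered
message catalog with parameter substitution. It is only loosely in the spirit
of the Unix catalog functions, not their API; the CATALOG table, the render
helper and the sample sentences are all made up for illustration.

```python
# Hypothetical catalog: message number -> template, one table per language.
CATALOG = {
    "en": {1: "The weather in {place} is {condition}."},
    "de": {1: "Das Wetter in {place} ist {condition}."},
}
FALLBACK_LANG = "en"


def render(msg_id, lang, **params):
    """Look up message msg_id for lang and fill in its parameters.

    Falls back to the English table when the language or the message
    number is missing from the requested table.
    """
    messages = CATALOG.get(lang, CATALOG[FALLBACK_LANG])
    template = messages.get(msg_id, CATALOG[FALLBACK_LANG][msg_id])
    return template.format(**params)


print(render(1, "en", place="London", condition="sunny"))
print(render(1, "de", place="London", condition="sonnig"))
print(render(1, "fr", place="London", condition="sunny"))  # falls back to English
```

Only the message number travels; the text is chosen at the receiving end. Note
that the parameter strings themselves still need translating, and that
inserting them into a fixed sentence does not work grammatically in every
language.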


I have
not yet found any example oriented to carrying on a complete conversation.

I would look for the earliest references to machine translation in the 1940s and 50s, 
up to the work with Eliza at MIT in the 60s. I think there is an enormous project 
whose name I don't recall right now going on in Texas, perhaps Austin, which is 
spiritually derived from Eliza and focused on sending whole, previously composed 
sentences back conversational style.

If you want to find the whole of the literature in this area, I suggest searching 
Turing Test.


A proprietary coding system is a bad idea.

Well, it depends what one is trying to do.  If one wishes to establish a
system whereby proprietary intellectual property rights exist, then a
proprietary coding can be a good idea.  Various large companies use
proprietary coding systems for files used with their software packages.  If,
however, one is trying to establish an open system, then you might well be
right.

Or if you want to minimize the amount of reinventing the wheel you do internally. You 
can easily use a proprietary format outside and XML inside, just as you can use SJIS 
outside and Unicode for internal processing.


Failure to investigate the state of the art, (especially where google is
so effortless), means this idea is not pushing any envelope.

Well, if you have any specific suggestions of what keywords to use in a
search, that would be very helpful.


I have given you some. Rather than focusing on pseudo-scientific terms like 
radiogram, I suggest starting with a familiarity with the history of computer 
science, both pure and applied research.


The keys idea is pushing the envelope.  


No it is not. 

As spin off from this discussion,
maybe the XML people, and the Unicode Technical Committee, will do something
about having special characters for the XML tags rather than using U+003C
and thereby help people wanting to place mathematics and software listings
in the same file as markup.  Is using U+003C a legacy from ASCII days?

Why is it not possible to use < signs in XML? 


Most of my postings in this thread are in response to people asking me
specific questions and raising interesting points.  That is surely why a
discussion group exists.

But most of the answers you get are based on a shared technical and educational 
background which you don't have and/or don't seem to value. It is difficult to describe, but 
a lot of early computer science research was about how to effectively decompose 
functionality and data. Sadly, I think a lot of this is being lost. For a more 
technical starting point, look for the works of Edsger Dijkstra starting in the 1960s. 
For a less technical point of view, look for The Mythical Man-Month from the mid 70s 
(recently updated), and its spiritual followups by Ed Yourdon and Tom DeMarco. 

When I read the responses you get, I have the feeling that the authors have 
internalized the lessons of these important texts (even if they may not have studied 
them explicitly). Once you internalize the lessons also, then you will have a better 
understanding of the points of view you are consistently receiving with friction.


I am hoping that I can publish some web pages with some comet circumflex
codes and sentences about asking about the weather conditions and
temperatures at the message recipient's location together with codes and
sentences for making replies so that hopefully people who might be
interested in some concept proving experiments can hopefully have a go at
some fascinating experiments with this technology.  Unicode can be used to
encode many languages and it will be interesting to find out what can be
achieved using the comet circumflex system.

That might be an interesting web site in its own right, but the technology is nothing 
special and has been done a million times under a million names and ten million times 
with no name at all.

Barry Caplan
Publisher, www.i18n.com





The WO Message catalog

2002-09-27 Thread Tex Texin

(Apologies to William, I hope you can find some humor in this, as we
have been a bit repetitive, haven't we. I am sure the magnificent
Sarasvati will slap me for this. I beg for mercy. At the same time I
can't resist this.)

Dear list,

I have a new invention, a message catalog to be assigned code points in
the PUA area. I think this will make some of the Unicode list discussion
more efficient. This can't be done in XML. I researched the subject and
codes for the Unicode list have not been invented before. I will write a
limerick about this shortly and post some web pages using the codes.
Peter, PUA is for anyone to use, so why can't I discuss this here?

Everyone have a good weekend!
tex


Message  Message
#   Text

100 I have a new invention.
110 This will be a great advance for markets 1, 2, and 3.
120 I will be using codes in the PUA area. It is most efficient to use a
proprietary approach.
130 I will be assigning these to the range 1 - 2.
140 I have researched the subject and discovered interesting papers on
1, 2 and 3. The papers were written in 181 - 192.
150 Peter, PUA is for anyone to use, so why can't I discuss them here?
160 I can't use XML for this purpose.
161 XML uses too many characters. It is inefficient.
162 XML uses the character 1 and I need to express that for purpose 2.
165 Besides, code points are there to be used.
170 I hope to publish some web pages using these codes.
180 I have created a 1 (select from poem, painting, sculpture, other
art form) as a creative, inventive and interesting aside to the project.



--
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-




Re: glyph selection for Unicode in browsers

2002-09-27 Thread Peter_Constable


On 09/27/2002 10:56:00 AM Jungshik Shin wrote:

 Again, the problem is knowing just *how* they should go about doing
this.

  As for 'how', what MS IE and Mozilla do may not be as user-friendly
as Tex wants them to be, but I think it's pretty reasonable at
least for CJK. If they're configured to use different Unicode-cmapped
(non-Pan-script) fonts for TC/SC/J/K (as opposed to pan-script Unicode
fonts like MS Arial Unicode, Cyberbit), runs of text tagged with TC/SC/J/K
are rendered with fonts configured for TC,SC,J and K, respectively.

A couple of notes:

Speaking in generalities, a font that isn't a pan-script Unicode font
potentially can support TC/SC/J/K equally well with glyphs suited to users
in each culture -- but not using default character-to-glyph mappings. The
mechanisms available to IE or Mozilla today would not provide any means to
determine which typographic preferences are supported by default in a given
font. Nor does the infrastructure exist that will allow these apps to
request the culturally-preferred glyphs that would exist in such fonts. Of
course, in practice, many currently-existing CJK fonts may have been
developed to support a single group of users, and don't include alternate
glyphs that might be preferred by users in other cultures.

Also, what IE and Mozilla currently do helps with the CJK issues, but these
apps don't do anything, that I know of, in relation to comparable issues
for other scripts, e.g. language-related preferences for Latin diacritics
or Cyrillic italic forms. Which you anticipate:


I guess you already know this much and what you're alluding
to is a problem of another dimension:  developing ( Pan-script
if necessary/possible) Unicode fonts with multiple lang-dependent
glyphs

Yes (with the added note that the pan-script element is orthogonal to what
I'm referring to).


it seems like selecting lang-dependent glyphs for
Latin/Cyrillic letters are more difficult than CJK case

I'm not sure; I haven't thought about that, in part because I have
only limited knowledge of what glyph variation issues there are for most
scripts.


The font
selection part of these problems is addressed by fontconfig package by
Keith Packard (http://fontconfig.org). Of course, there should be
other implementations of/attempts at this problem.

The fontconfig library is entirely new to me. Thanks for the link.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]







RE: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread Marco Cimarosti

Tex Texin wrote:
 What's funny to me about this message, is a product message catalog I
 was responsible for localizing had messages created by software
 developers, such as (paraphrasing from memory):
 
 The client is dead.
 The client has been killed.
 You killed the client.
 
 Some of the translators were horrified. We had to explain that the
 client was software used by the user, and that to kill it
 meant the software was no longer operating, not that the 
 product caused
 the death of the user. And then we had to get the developers to change
 the message, since even in english they were not the most effective
 messages.
 
 Lucky too, that support couldn't cause someone on the phone to give a
 command that could kill the client...

Years ago, I was in charge of supporting a software system composed of a main
module, called the parent (task), and of a number of secondary modules,
called child (tasks). Each child was identified with a name and a (task)
address.

One day, the IT manager reported that the system started having problems
after a child had turned off the computer.

I explained that, according to my knowledge, that was impossible: children
ran in a protected area, so the parent would have stopped them before they
had any chance of turning off the computer. But he replied that he saw the
child turning off the system with his own eyes, and the parent could not
stop it.

This guy was such an idiot, and I was quite surprised to discover that he
could use the utility called Children Monitor. So, I asked him to let me
know the child's name and address.

He said that he didn't understand how this detail could help us but, anyway,
he obtained the child's name and address from the parent:

Daniel Zubeispiel
Hauptkirchestrasse, 26
Zürich, Switzerland

(Seven-year-old Daniel, the son of a system engineer, was in the laboratory
that day because his school was closed for maintenance.)

Ciao.
Marco