Re: Controls, gliphs, flies, lemonade

2011-09-20 Thread QSJN 4 UKR
Yes, I wrote 'Egyptian hieroglyphs', but what about plain old CJK? We
still have no way to insert a nonstandard ideograph into text. Isn't that
a simple task? There are only 20 basic strokes :) OK, 500 basic
components. Or 20? Either way, we can't combine them :( !
Unicode is too complex a standard. I don't even know how many properties
a single character has (did you know about 'Unicode-coloured' characters?
I started a thread about that on this list somewhere), so how can I know how
my application should render 'plain' text with bidi, diacritics that are not
in canonical order, and Korean script? Right, I don't know. So my
application renders it my way and someone else's renders it another way
(a_a vs. aa_ with a double combining character; I'm sure you've seen that),
which means we have no standard at all.
Of course, I could learn this complex standard, but what for? Most of
it I would never use.
There must be a simpler system, one that needs less a priori data to work.
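The point about diacritics that are not in canonical order is easier to see with a concrete case. The sketch below uses only Python's standard `unicodedata` module; the sample strings are made up for illustration. It shows that normalization reorders adjacent combining marks by their canonical combining class, so two typings of the same marks can be canonically equivalent, while marks sharing a class keep their order because that order is visually significant.

```python
import unicodedata

# Two combining marks with different canonical combining classes (ccc):
#   U+0323 COMBINING DOT BELOW     ccc 220 (attaches below)
#   U+0301 COMBINING ACUTE ACCENT  ccc 230 (attaches above)
typed_one_way = "a\u0301\u0323"    # acute first, then dot below
typed_other_way = "a\u0323\u0301"  # dot below first, then acute

for mark in "\u0323\u0301":
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.combining(mark))

# The raw code point sequences differ...
print(typed_one_way == typed_other_way)  # False

# ...but NFD sorts adjacent marks by ccc, so both become: a, dot below, acute.
nfd_one = unicodedata.normalize("NFD", typed_one_way)
nfd_other = unicodedata.normalize("NFD", typed_other_way)
print(nfd_one == nfd_other)  # True
print([f"U+{ord(c):04X}" for c in nfd_one])  # ['U+0061', 'U+0323', 'U+0301']

# Marks with the SAME class (acute and grave, both ccc 230) are not reordered:
# their order decides which one stacks closer to the base letter.
same_class_a = unicodedata.normalize("NFD", "a\u0301\u0300")
same_class_b = unicodedata.normalize("NFD", "a\u0300\u0301")
print(same_class_a == same_class_b)  # False
```

In other words, an application is not free to render the two "a + dot below + acute" spellings differently, since the standard treats them as the same text.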

2011/9/13, John H. Jenkins jenk...@apple.com:

 QSJN 4 UKR wrote on 12 September 2011 at 9:06 PM:

 I know it's a sacred cow, but let me just ask what you people think.
 Is it good or bad that a code point says everything about a character: what,
 where, how... (see the subject line)? What if we had separate graphic control
 codes? Then we wouldn't have so many problems, from the banal bidi glitch
 where 'ltr (rtl)' comes out as 'ltr (( rtl', to placing one tilde above 3, 4,
 or more letters, or laying out Egyptian hieroglyphs in rows and columns.
 Conceptually, I mean! Each letter in the file would then be at least two code
 points (what and where). Is that stupid? To render the text we have to
 generate this data anyway.



 It's not really a sacred cow per se, but it is a fundamental architectural
 decision which would be pretty much impossible to revisit now.

 Almost all writing is done using a small set of script-specific rules which
 are pretty straightforward.  English, for example, is laid out in horizontal
 lines running left-to-right and arranged top-to-bottom of the writing
 surface.  East Asian languages were traditionally laid out in vertical lines
 running from top-to-bottom and arranged right-to-left on the writing
 surface.

 Because some scripts are written right-to-left, and LTR and RTL text can be
 freely intermingled on a single line, Unicode provides plain-text
 directionality controls.  The preference, however, is to use higher-level
 protocols where possible.
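To make "plain-text directionality controls" concrete, here is a small Python sketch using the standard `unicodedata` module. It lists a few of the explicit formatting characters (the marks LRM/RLM and the embedding pair RLE/PDF; Unicode 6.3 later added the isolates LRI/RLI/FSI/PDI, which are now generally preferred) and reports each character's Bidi_Class and mirrored property. The helper names `embed_rtl` and `bidi_report` and the sample string are hypothetical, for illustration only. Note that parentheses are neutral (ON) and mirrored, which is why they are the characters that tend to end up looking wrong in mixed-direction text.

```python
import unicodedata

# A few explicit directional formatting characters usable in plain text.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK: zero width, strong L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: zero width, strong R
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING: opens an RTL embedding level
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: closes that level

def embed_rtl(run: str) -> str:
    """Hypothetical helper: wrap a run of text in an explicit RTL embedding."""
    return RLE + run + PDF

def bidi_report(text: str) -> list[tuple[str, str, bool]]:
    """Hypothetical helper: (code point, Bidi_Class, is_mirrored) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
        for ch in text
    ]

# Made-up sample: a Latin word, then a parenthesized Hebrew word (shalom),
# with the Hebrew part explicitly embedded at an RTL level.
sample = "ltr " + embed_rtl("(\u05e9\u05dc\u05d5\u05dd)")
for row in bidi_report(sample):
    print(row)
```

The script only inspects properties; the actual visual reordering is done by an implementation of the Unicode Bidirectional Algorithm in the platform's text-layout library.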

 As for the scripts which are inherently two-dimensional (such as
 hieroglyphics, mathematics, and music), it's almost impossible to provide
 plain text support for them.  There is too much dependence on additional
 information such as the specifics of font and point size.  Because of this,
 the UTC decided long ago that layout for such scripts absolutely must be
 done using a higher-level protocol to handle all the details.

 There are occasionally suggestions that positioning controls be added to
 plain text in Unicode, but so far the UTC has felt that the benefits are too
 marginal to overcome its reasons for having left them out in the first
 place.

 =
 Hoani H. Tinikini
 John H. Jenkins
 jenk...@apple.com


Re: Controls, gliphs, flies, lemonade

2011-09-20 Thread John H. Jenkins
In re CJK, that's already a FAQ: http://www.unicode.org/faq/han_cjk.html#16.  
The short version is: if all you want to do is to draw something, then yes, 
making up new hanzi on the fly is a solvable problem.  If you want to do 
anything that deals with the *content* (lexical analysis, sorting, 
text-to-speech), it's an incredibly difficult problem.  

And, actually, there's already a way to insert nonstandard hanzi into text 
(well, two, if you count the Ideographic Variation Indicator), namely 
Ideographic Description Sequences.  They're clumsy and awkward, but they do 
make it possible to exchange text with unencoded hanzi in a vaguely standard 
fashion.  
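An Ideographic Description Sequence is written in prefix (Polish) notation: an Ideographic Description Character such as U+2FF0 ⿰ (left to right) is followed by the components it arranges, and components may themselves be nested sequences. The Python sketch below checks only that structure. It assumes the original twelve description characters U+2FF0..U+2FFB (two of which, ⿲ and ⿳, take three components; later Unicode versions added more, which this sketch ignores) and treats every other character as a leaf component. The function names are hypothetical.

```python
# Ideographic Description Characters U+2FF0..U+2FFB and how many components each takes.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # ⿲ left to middle and right
IDC_ARITY["\u2ff3"] = 3  # ⿳ above to middle and below

def parse_ids(seq: str, pos: int = 0) -> int:
    """Consume one description starting at pos; return the index just after it.

    Raises ValueError if the string ends before all components are supplied.
    """
    if pos >= len(seq):
        raise ValueError("incomplete IDS")
    arity = IDC_ARITY.get(seq[pos])
    if arity is None:
        return pos + 1  # leaf: any non-IDC character, in this simplified sketch
    pos += 1
    for _ in range(arity):
        pos = parse_ids(seq, pos)  # each component may itself be a nested IDS
    return pos

def is_well_formed_ids(seq: str) -> bool:
    """True if seq is exactly one complete description, with nothing left over."""
    try:
        return parse_ids(seq) == len(seq)
    except ValueError:
        return False

print(is_well_formed_ids("\u2ff0\u6c35\u6728"))          # ⿰氵木: water beside tree
print(is_well_formed_ids("\u2ff1\u8279\u2ff0\u6c35\u6728"))  # ⿱艹⿰氵木: nested
print(is_well_formed_ids("\u2ff0\u6c35"))                # ⿰氵: missing a component
```

This also illustrates why such sequences are "clumsy": the description is structural only, so software can validate and display it, but it carries no reading, meaning, or sort key for the described character.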

And yes, Unicode is very complicated, but that's because of the problem it's 
intended to solve.  If all you're interested in is drawing text in a couple of 
common scripts, such as Latin and Japanese, then you really don't need Unicode 
with all of its complexity.  Unicode is trying to provide a basis for handling 
all aspects of plain text processing for all the languages of the world in a 
single application.  

Just go to Wikipedia and look down the long list of different languages that a 
popular subject has articles in.  *That* is what Unicode is trying to provide.  
It's very tough to implement, but fortunately on all the major platforms, there 
are libraries that make it unnecessary for you to do all the work yourself.


=
John H. Jenkins
jenk...@apple.com