Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
I hasten to add:

> UTF-8 and UTF-32, at least, already have the architecture
> to represent 2^31 and 2^32 code points, respectively. The definitions would
> simply have to be changed to make the additional code points legal.
>
> Only UTF-16 would truly need to be redesigned, and that has already been
> proposed.

None of this is actually going to happen, of course. Unicode and 10646 are committed to staying with 17 planes. I was just pointing out that certain individuals had made informal proposals to extend the code space.

-Doug Ewell
Fullerton, California
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
In a message dated 2002-01-02 5:05:23 Pacific Standard Time, [EMAIL PROTECTED] writes:

> There are worse things than this: what if someone discovers a script with
> more than 1,114,111 characters? Back to the drawing board to redesign all
> the UTF's!

Not all of them. UTF-8 and UTF-32, at least, already have the architecture to represent 2^31 and 2^32 code points, respectively. The definitions would simply have to be changed to make the additional code points legal.

Only UTF-16 would truly need to be redesigned, and that has already been proposed. For example, Masahiko Maedera once proposed a "UTF-16x" in which code points in the U+EExxx block were designated as "super surrogates." Three of these "super surrogates," or six 16-bit words, would be combined to represent code points beyond the 17 planes. (This was back in the days when some people felt that a great and crippling schism existed between Unicode and ISO 10646 because the former disallowed such code points and the latter allowed them.)

-Doug Ewell
Fullerton, California
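To make the UTF-8 point concrete, here is a minimal sketch of the older, longer form of UTF-8 (the up-to-six-byte scheme described in RFC 2279), which reaches 2^31 - 1. The function name `encode_utf8_rfc2279` is made up for illustration; this is not a conforming modern UTF-8 encoder, since current definitions stop at U+10FFFF and also exclude surrogates.

```python
# Upper limit of the code point range for each encoded length, 1 to 6 bytes.
_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def encode_utf8_rfc2279(cp: int) -> bytes:
    """Encode cp using the original six-byte-capable UTF-8 bit layout."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    # Pick the shortest length whose limit covers cp.
    length = next(n for n, limit in enumerate(_LIMITS, start=1) if cp <= limit)
    if length == 1:
        return bytes([cp])
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    lead_prefix = (0xFF << (8 - length)) & 0xFF  # 110xxxxx, 1110xxxx, ...
    return bytes([lead_prefix | cp] + tail[::-1])


if __name__ == "__main__":
    print(encode_utf8_rfc2279(0x10FFFF).hex())    # same bytes as today's UTF-8
    print(encode_utf8_rfc2279(0x7FFFFFFF).hex())  # six bytes, not legal today
```

Up to U+10FFFF the output matches what a present-day encoder produces for non-surrogate code points; only the five- and six-byte forms would need to be made legal again.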
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 09:38 -0500 2002-01-03, John Cowan wrote: >This leads to an interesting, if so far theoretical, Unicode question: >how to encode abjads and abugidas that have vowel signs which are >pronounced *before* the base consonant. Two Unicode principles, >logical order and base-before-combining, are thus put into conflict. > >In (Feanorian) Tengwar itself, the reading order is actually >language-dependent: thus "Quenya" (a Quenya word) is written >QU-e-N-y-a (where caps are base, smalls are combining), but >"Sindarin" (a Sindarin word) would be "S-N-i-D-R-a-N-i", if written with >base-before-combining, or "S-i-N-D-a-R-i-N" if written with logical order, >in which case the default grapheme clusters have to be broken up using >complex rendering code in order to get i over N and a over R. Did you not read my draft paper proposing the solution for this feature of this script? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
I wrote: > Vowel marks appearing to the left of the > consonants are pronounced before them; those to the right, after them. This leads to an interesting, if so far theoretical, Unicode question: how to encode abjads and abugidas that have vowel signs which are pronounced *before* the base consonant. Two Unicode principles, logical order and base-before-combining, are thus put into conflict. In (Feanorian) Tengwar itself, the reading order is actually language-dependent: thus "Quenya" (a Quenya word) is written QU-e-N-y-a (where caps are base, smalls are combining), but "Sindarin" (a Sindarin word) would be "S-N-i-D-R-a-N-i", if written with base-before-combining, or "S-i-N-D-a-R-i-N" if written with logical order, in which case the default grapheme clusters have to be broken up using complex rendering code in order to get i over N and a over R. The problem could be sidestepped with a grapheme-cluster encoding such as is used for Ethiopic, but the feel is very different: Ethiopic vowel signs are normally treated as part of the letter, whereas Tengwar vowel signs are more like typical abjad signs: partly optional indications of "colorings" to the fundamental consonant structure. Unicode tribal elders are invited to mention which of the two conflicting principles they reckon to be the more important. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] Please leave your values| Check your assumptions. In fact, at the front desk. | check your assumptions at the door. --sign in Paris hotel |--Miles Vorkosigan
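The two candidate orders for the "Sindarin" example above can be sketched as a simple reordering. This is only a toy model using the notation from the message (capitals are base letters, lowercase letters are combining vowel signs); the function names are hypothetical, and nothing here reflects an actual Tengwar encoding or any proposal.

```python
def logical_to_base_first(word: str) -> str:
    """Move each vowel sign after the consonant that follows it.

    Models a mode where the vowel is pronounced before the consonant
    it sits on, so logical order puts the vowel first.
    """
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch.islower() and nxt.isupper():
            out.extend((nxt, ch))  # base first, then its combining sign
            i += 2
        else:
            out.append(ch)  # consonant, or a vowel with no following base
            i += 1
    return "".join(out)


def base_first_to_logical(word: str) -> str:
    """Inverse mapping: put each vowel sign before its base consonant."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch.isupper() and nxt.islower():
            out.extend((nxt, ch))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(logical_to_base_first("SiNDaRiN"))
    print(base_first_to_logical("SNiDRaNi"))
```

Either direction is a small local swap, but somebody has to do it: with base-before-combining it falls to input and collation, with logical order it falls to the renderer.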
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Patrick Andries scripsit: > This is the time for an aspiring J. R. R. Tolkien to leave his mark in > the Unicode saga by adopting a new strictly vertical script...à la Tengwar. JRRT actually did create such a vertical script, which was used in the Blessed Realm before Feanor got around to creating the Tengwar as we know them today: the Sarati of Ruumil. This is a TTB LTR abjad, like Mongolian. Vowel marks appearing to the left of the consonants are pronounced before them; those to the right, after them. http://user.tninet.se/~xof995c/sarati.htm -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] Please leave your values| Check your assumptions. In fact, at the front desk. | check your assumptions at the door. --sign in Paris hotel |--Miles Vorkosigan
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 16:51 -0800 2002-01-02, Kenneth Whistler wrote: >John Wilcock wrote: > >> All *known* vertical scripts! What happens if someone discovers a > > hitherto-unknown vertical script that is never written horizontally? It would be unthinkable that merchants using such a script wouldn't have horizontal and vertical variants for shop signs and neon. And crossword puzzles. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 20:14 -0500 2002-01-02, Patrick Andries wrote: >This is the time for an aspiring J. R. R. Tolkien to leave his mark >in the Unicode saga by adopting a new strictly vertical script...à >la Tengwar. That would be Sarati. Which I have already proposed for addition to the SMP, though for now it is waiting in the wings. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Kenneth Whistler wrote: > >Also, you'd have to go pretty far out to find a "hitherto-unknown >vertical script" that has escaped the eagle eyes of the Unicode >Roadmap committee. See, for example: > >http://www.unicode.org/roadmaps/smp-3-1.html > This is the time for an aspiring J. R. R. Tolkien to leave his mark in the Unicode saga by adopting a new strictly vertical script...à la Tengwar. He will, of course, first have to convince his editor... Best wishes for 2002 Unicode en français http://hapax.iquebec.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
John Wilcock wrote: > All *known* vertical scripts! What happens if someone discovers a > hitherto-unknown vertical script that is never written horizontally? I predict that the people who want to write about it will quickly render it LTR horizontally, to match the metadirectionality of the script they use to write about it. Scholars already regularly turn RTL epigraphy into LTR when they want to cite it in text (other than in facsimiles), to avoid the bidi problem. Also, you'd have to go pretty far out to find a "hitherto-unknown vertical script" that has escaped the eagle eyes of the Unicode Roadmap committee. See, for example: http://www.unicode.org/roadmaps/smp-3-1.html --Ken
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Sampo Syreeni wrote a fine FAQ answer about rendering directionality and then asked: > BTW, something akin to the above should really go in a FAQ. Is there > anything resembling a Unicode FAQ in existence, anywhere? Well, you could start on the Unicode home page http://www.unicode.org/ and click on the "FAQ" link. ;-) There's even a section in the FAQ on "Writing Directions", to which a distilled-down version of some of this discussion might be a fine addition. --Ken
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Forget about ancient and dead scripts. What about the future, when we start communicating with extra-terrestrials and have to start encoding all the other scripts in the galaxy! utf-googolplex! ;-)

(And don't nobody bring up Klingon...)

And so the new year begins on the Unicode list

tex

Marco Cimarosti wrote:
>
> John Wilcock wrote:
> > All *known* vertical scripts! What happens if someone discovers a
> > hitherto-unknown vertical script that is never written horizontally?
>
> There are worse things than this: what if someone discovers a script with
> more than 1,114,111 characters? Back to the drawing board to redesign all
> the UTF's!
>
> :-)
> _ Marco

--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED]  Tel: +1-781-280-4271
the Progress Company  Fax: +1-781-280-4655
For a compelling demonstration for Unicode: http://www.geocities.com/i18nguy/unicode-example.html
RE: Vertical scripts (was: Tategaki (was: Re: Updated...))
John Wilcock wrote:

> All *known* vertical scripts! What happens if someone discovers a
> hitherto-unknown vertical script that is never written horizontally?

There are worse things than this: what if someone discovers a script with more than 1,114,111 characters? Back to the drawing board to redesign all the UTF's! :-)

_ Marco
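For readers wondering where 1,114,111 comes from: it is the highest code point a UTF-16 surrogate pair can reach (U+10FFFF). A small sketch of the arithmetic follows; the helper name `decode_surrogate_pair` is only illustrative.

```python
HIGH_FIRST, HIGH_LAST = 0xD800, 0xDBFF  # high (lead) surrogates
LOW_FIRST, LOW_LAST = 0xDC00, 0xDFFF    # low (trail) surrogates


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine one high and one low surrogate into a code point."""
    if not HIGH_FIRST <= high <= HIGH_LAST:
        raise ValueError("not a high surrogate")
    if not LOW_FIRST <= low <= LOW_LAST:
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits; the pair is offset past the BMP.
    return 0x10000 + ((high - HIGH_FIRST) << 10) + (low - LOW_FIRST)


if __name__ == "__main__":
    top = decode_surrogate_pair(HIGH_LAST, LOW_LAST)
    print(f"U+{top:X} = {top}")
```

Since code points start at zero, the code space holds 1,114,112 values in all, which is exactly 17 planes of 65,536.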
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
On Wed, 2 Jan 2002 11:27:02 +0100, Marco Cimarosti wrote:

> Because all vertical scripts (CJK and Mongolian) can also be written
> horizontally, whereas modern right-to-left scripts cannot be written
> left-to-right.

All *known* vertical scripts! What happens if someone discovers a hitherto-unknown vertical script that is never written horizontally?

John.
RE: Vertical scripts (was: Tategaki (was: Re: Updated...))
Doug Ewell wrote:

> TUS 3.0 states (p. 24): "In contrast to the bidirectional
> case, the choice to lay out text either vertically or
> horizontally is treated as a formatting style.
> [...] why should overrides of default horizontal
> directionality be a plain-text issue but overrides of
> default vertical directionality be a higher-level
> "formatting style" issue?

Because all vertical scripts (CJK and Mongolian) can also be written horizontally, whereas modern right-to-left scripts cannot be written left-to-right. Also, all horizontal scripts, when embedded in Far East text, may be written vertically by rotating them 90° (clockwise for LTR scripts, counterclockwise for RTL scripts).

So you can happily define a system-level vertical/horizontal preference, and use it blindly for plain text in any kind of script.

_ Marco
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 12:22 PM 12/31/01 -0500, Tex Texin wrote: >I was fooled by that earlier in the year as well. The links to the other >pages should be at the top of the web page to highlight that the page is >a partial list and to make it easy to reference the other pages. Most >people will not scroll to the bottom of the page to find the other >links. That's in the plan for the 3.2 upgrade I'm working on. A./
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
I was fooled by that earlier in the year as well. The links to the other pages should be at the top of the web page to highlight that the page is a partial list and to make it easy to reference the other pages. Most people will not scroll to the bottom of the page to find the other links. Michael Everson wrote: stuff deleted... > >Did you miss these? > > Bloody hell. Yes, I missed them, because I assumed that the > charindex.html indexed all the characters. It does NOT! It indexes > A-D. Now I assumed that when it loaded I could just command-F and > find the text. So I did not scroll down the list. Therefore: > > I suggest that the Title of this document be changed to: > > Unicode 3.0.0 Character Name Index A-D > > and the other two (charindex2.html and charindex3.html) to > > Unicode 3.0.0 Character Name Index E-N > and > Unicode 3.0.0 Character Name Index O-Z > > -- > Michael Everson *** Everson Typography *** http://www.evertype.com -- - Tex TexinDirector, International Business mailto:[EMAIL PROTECTED]Tel: +1-781-280-4271 the Progress Company Fax: +1-781-280-4655 - For a compelling demonstration for Unicode: http://www.geocities.com/i18nguy/unicode-example.html
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 20:31 -0800 2001-12-30, Asmus Freytag wrote: >At 12:50 PM 12/30/01 +, Michael Everson wrote: >>At 18:31 -0800 2001-12-29, Asmus Freytag wrote: >>> >>>Please see >>> >>>http://www.unicode.org/charts/charindex.html >> >>That's not very helpful, Asmus. I went there and tried searching >>"override", "left-to-right", and "left to right" and nothing was >>found. > >Quoting right from the file: > >these entries should have been what you were looking for: > >LEFT-TO-RIGHT OVERRIDE 202D >OVERRIDE, LEFT-TO-RIGHT 202D > >and even: >OVERRIDE, RIGHT-TO-LEFT 202E > >Did you miss these? Bloody hell. Yes, I missed them, because I assumed that the charindex.html indexed all the characters. It does NOT! It indexes A-D. Now I assumed that when it loaded I could just command-F and find the text. So I did not scroll down the list. Therefore: I suggest that the Title of this document be changed to: Unicode 3.0.0 Character Name Index A-D and the other two (charindex2.html and charindex3.html) to Unicode 3.0.0 Character Name Index E-N and Unicode 3.0.0 Character Name Index O-Z -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Thanks for the explanation, Asmus.

tex

Asmus Freytag wrote:
>
> At 02:33 PM 12/30/01 -0500, Tex Texin wrote:
> >It is a bit inconsistent and therefore confusing.
> >
> >I searched for "bidirectional" which immediately pointed me at the
> >general punctuation pages in a pdf file.
> >Searching for "bidirectional" in that file turns up empty.
>
> This is one of the few cases of an index entry that has no corresponding
> line in the nameslist file. Usually the index entry is derived directly
> from the character names and aliases, or the text of the block names and
> sub headers. That's the reason you couldn't find "bidirectional" in the pdf
> file. The subheader in this case is just "Formatting characters" and that's
> not very specific.
>
> >If you search
> >for left-to-right, right-to-left, override, or embed, there you do get
> >to the characters. However a saving grace is that when you are first
> >pointed at the general punctuation file, the character code 202A is
> >mentioned, so if you notice that you can go right to the character
> >range.
>
> I'll make sure that is clearly worded in the instructions.
>
> >Maybe the initial index needs to be more comprehensive. It is usually a
> >difficult task for any large book to get right. However, tracking the
> >web queries might help improve it over time...
>
> The problem you encountered was one where the index is already more
> comprehensive and detailed than the nameslist. ;-)
>
> One could monkey with the nameslist, adding the subheader for the
> bidirectional controls, but then we would pick up a number of one-character
> ranges with subheaders, which becomes awkward in itself.
>
> A./
>
> PS: I'm in the process of updating the HTML files for the index to match
> the contents of the Index-3.2.0dnn.txt file in the BETA directory. That
> file covers the new 3.2 character names etc. but does not pick up new or
> revised aliases and subheaders in the existing repertoire...
--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED]  Tel: +1-781-280-4271
the Progress Company  Fax: +1-781-280-4655
For a compelling demonstration for Unicode: http://www.geocities.com/i18nguy/unicode-example.html
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 02:33 PM 12/30/01 -0500, Tex Texin wrote:
>It is a bit inconsistent and therefore confusing.
>
>I searched for "bidirectional" which immediately pointed me at the
>general punctuation pages in a pdf file.
>Searching for "bidirectional" in that file turns up empty.

This is one of the few cases of an index entry that has no corresponding line in the nameslist file. Usually the index entry is derived directly from the character names and aliases, or the text of the block names and sub headers. That's the reason you couldn't find "bidirectional" in the pdf file. The subheader in this case is just "Formatting characters" and that's not very specific.

>If you search
>for left-to-right, right-to-left, override, or embed, there you do get
>to the characters. However a saving grace is that when you are first
>pointed at the general punctuation file, the character code 202A is
>mentioned, so if you notice that you can go right to the character
>range.

I'll make sure that is clearly worded in the instructions.

>Maybe the initial index needs to be more comprehensive. It is usually a
>difficult task for any large book to get right. However, tracking the
>web queries might help improve it over time...

The problem you encountered was one where the index is already more comprehensive and detailed than the nameslist. ;-)

One could monkey with the nameslist, adding the subheader for the bidirectional controls, but then we would pick up a number of one-character ranges with subheaders, which becomes awkward in itself.

A./

PS: I'm in the process of updating the HTML files for the index to match the contents of the Index-3.2.0dnn.txt file in the BETA directory. That file covers the new 3.2 character names etc. but does not pick up new or revised aliases and subheaders in the existing repertoire...
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 12:50 PM 12/30/01 +, Michael Everson wrote: >At 18:31 -0800 2001-12-29, Asmus Freytag wrote: >> >>Please see >> >>http://www.unicode.org/charts/charindex.html > >That's not very helpful, Asmus. I went there and tried searching >"override", "left-to-right", and "left to right" and nothing was found. Quoting right from the file: these entries should have been what you were looking for: LEFT-TO-RIGHT OVERRIDE 202D OVERRIDE, LEFT-TO-RIGHT 202D and even: OVERRIDE, RIGHT-TO-LEFT 202E Did you miss these? A./
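For anyone who would rather not search the index pages at all, the same names and code points can be read from the character database that ships with Python. This is just a convenience sketch using the standard `unicodedata` module; the helper `describe` is an illustrative name.

```python
import unicodedata


def describe(cp: int) -> tuple:
    """Return (character name, bidi class) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.bidirectional(ch)


if __name__ == "__main__":
    # The directional formatting characters at U+202A..U+202E.
    for cp in range(0x202A, 0x202F):
        name, bidi_class = describe(cp)
        print(f"U+{cp:04X}  {name}  ({bidi_class})")

    # Lookup also works the other way round, from name to character.
    lro = unicodedata.lookup("LEFT-TO-RIGHT OVERRIDE")
    print(f"U+{ord(lro):04X}")
```

The lookup is by full character name, so the inverted index form "OVERRIDE, LEFT-TO-RIGHT" will not work here.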
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
It is a bit inconsistent and therefore confusing.

I searched for "bidirectional" which immediately pointed me at the general punctuation pages in a pdf file. Searching for "bidirectional" in that file turns up empty. If you search for left-to-right, right-to-left, override, or embed, there you do get to the characters. However a saving grace is that when you are first pointed at the general punctuation file, the character code 202A is mentioned, so if you notice that you can go right to the character range.

Maybe the initial index needs to be more comprehensive. It is usually a difficult task for any large book to get right. However, tracking the web queries might help improve it over time...

tex

Michael Everson wrote:
>
> At 18:31 -0800 2001-12-29, Asmus Freytag wrote:
> >At 12:07 PM 12/29/01 +0100, Stefan Persson wrote:
> >> > Seeing that Unicode already has left-to-right and right-to-left override
> >> > characters, I wonder if a top-to-bottom override character might also be
> >> > reasonable.
> >>
> >>Which are the code points for these characters?
> >
> >Please see
> >
> >http://www.unicode.org/charts/charindex.html
>
> That's not very helpful, Asmus. I went there and tried searching
> "override", "left-to-right", and "left to right" and nothing was
> found.
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com

--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED]  Tel: +1-781-280-4271
the Progress Company  Fax: +1-781-280-4655
For a compelling demonstration for Unicode: http://www.geocities.com/i18nguy/unicode-example.html
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 18:31 -0800 2001-12-29, Asmus Freytag wrote: >At 12:07 PM 12/29/01 +0100, Stefan Persson wrote: >> > Seeing that Unicode already has left-to-right and right-to-left override >>> characters, I wonder if a top-to-bottom override character might also be >>> reasonable. >> >>Which are the code points for these characters? > >Please see > >http://www.unicode.org/charts/charindex.html That's not very helpful, Asmus. I went there and tried searching "override", "left-to-right", and "left to right" and nothing was found. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 12:07 PM 12/29/01 +0100, Stefan Persson wrote: > > Seeing that Unicode already has left-to-right and right-to-left override > > characters, I wonder if a top-to-bottom override character might also be > > reasonable. > >Which are the code points for these characters? Please see http://www.unicode.org/charts/charindex.html A./
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 04:00 AM 12/29/01, Michael Everson wrote: >When written in manuscripts and on computers, Ogham is written as Latin >is. When inscribed on stone, it is written bottom-to-top, along the top of >the stone, and then down to the bottom on the other side. I don't believe >that there are any examples of multiple-line Ogham lapidary text. One could well argue, too, that when computer-controlled devices for cutting ogham stones become common, higher-level protocols will be necessary for proper placement of the glyphs. -- Curtis Clark http://www.csupomona.edu/~jcclark/ Mockingbird Font Works http://www.mockfont.com/
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
On Sat, 29 Dec 2001 [EMAIL PROTECTED] wrote: >Tex's example may or may not be realistic -- I have no way of knowing -- >but in suggesting a top-to-bottom directional override, I had hoped it >would be possible to represent a run of text such as Tex describes >without resorting to the infamous "higher protocol." But it is. Unicode just does not take a stand on how it should be formatted. See below. >This may seem arbitrary to some; why should overrides of default >horizontal directionality be a plain-text issue but overrides of default >vertical directionality be a higher-level "formatting style" issue? I >hope this discussion can shed some light on this question, and possibly >help me see what I may be missing. I think this has to do with the way people conceive the term "plaintext" -- anything beyond a simple line (or column) based flow layout will likely be thought of as "rich" instead. The reason is both historical and practical. Text is laid out like this in most cultures, and early printing/computer/typewriter technology followed suit. The matter of mixed writing directions is a relatively new one, and so isn't really covered by the concept of "plaintext". The practical reason is that comprehensive layout of fully free direction text is really difficult, if not impossible, whereas writing systems with identical line progression directions are more or less compatible, using a simplish algorithm (Unicode BiDi). If you look at the way text is normally displayed on 2D media, it's printed in a unidirectional stream and then chopped into lines at sheet edge. As long as the lines progress in the same direction, you can always manipulate the order of the symbols within the stream to get more or less correct display of mixed script directionalities. (Yes, line breaking and deeply nested BiDi levels are still troublesome.) This way, lr-tb is sorta compatible with rl-tb. 
There are of course three more pairs, not counting boustrophedon and the likes, but AFAIK this is the most common combination. It's also where the ease stops. If you try to mix opposite line progression directions, you will end up with something like the Unicode BiDi algo, only applied at the paragraph level. That soon becomes unreadable, and makes for really lousy APIs. (Even BiDi is difficult, as one usually needs to render entire paragraphs at a time.) Mixing vertical and horizontal writing modes is even more complicated since you cannot think of the text as a directional, chopped-into-lines stream, anymore. You *can* use all sorts of funky heuristics, but keeping the text both readable and "plain" is pretty much impossible. (If you don't believe that, think about how you would format a string of 1000 lr-tb, 100 tb-lr, 100 rl-bt and 1000 bt-lr characters. This is not a realistic example, of course, but illustrates the general point.) Now, there are many ways to cope with simplified variations of the theme. One is to rotate nested characters of foreign directionality so that the character progression direction for all the scripts present remains the same, no matter what the script. E.g. XSL-FO documentation gives a number of examples of this approach. Another is to force the character progression direction to agree between scripts, without rotation. This only works when characters are graphically separate, like they are in the Latin script or scripts based on Han ideographs. Top-to-bottom Latin within Japanese is a good example. (It also illustrates the effects on readability of messing with the natural directionality of text.) You can also print short spans of foreign text in its natural direction, within a line of text of differing native directionality. Metric units, printed in Latin within tb-rl traditional Japanese, are probably the most common case. I'm sure that people on this list could cite countless weirder examples. 
The point is, all such solutions are for special cases. They do not solve the problem of how to fit longer, nested spans with arbitrary directionality on a page without in some cases making the text as a whole illegible and/or unaesthetic. Hence, it's better to handle the special cases as what they are, instead of bringing them all into Unicode and forcing every Unicode compatible application to incorporate a full page layout engine. I think this is the ultimate reason why TUS 3.0 leaves this stuff to those "higher level protocols". We might in fact say that the Unicode Standard has two completely separate parts. The first is the logical encoding of any character based script as a stream of character codes, the second is an actual 2D, line based rendering of the encoding for the very special case where two scripts of identical line progression direction are mixed. Anything beyond this could well be said to be beyond the scope of TUS. We might indeed go as far as to say that certain combinations of scripts which *can* be encoded in Unicode, *cannot* actually be consistently rendered on 2D graphical media. (Afte
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 02:47 -0500 2001-12-29, [EMAIL PROTECTED] wrote:

>Actually, there is a more serious problem involved with vertical directional
>overrides: They would force the Unicode plain-text mechanism to become aware
>of both vertical directionality and directional priority. This sounds
>obvious, but in fact there are not two, but THREE issues involved with text
>directionality:
>
>1. Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
>2. Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
>3. Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).

There are more complex aspects of layout that might apply to Egyptian and Mayan.

> Ogham is either (LTR, TTB) or (BTT, ???).

When written in manuscripts and on computers, Ogham is written as Latin is. When inscribed on stone, it is written bottom-to-top, along the top of the stone, and then down to the bottom on the other side. I don't believe that there are any examples of multiple-line Ogham lapidary text. By analogy with the manuscript tradition, I would recommend (BTT, LTR) for Ogham vertical columnar display.

>Unicode characters have a default directionality, but both this and the
>override mechanism cover only the horizontal aspect, not the vertical aspect
>or the priority of one over the other. Thus, Mongolian characters are
>assigned the same directionality code as Latin ("L") even though the TTB
>directionality takes precedence over the LTR, the opposite of Latin.

Not in mixed Latin/Mongolian text. Mongolians do interesting things too with Latin words in predominantly Mongolian text. But it seems that the whole thing is done by rotating the whole text field.

>And there is no plain-text way to indicate the alternative directionality of
>Ogham or Han.

I think it is a question of DTP layout for Ogham, at least.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
At 12:07 +0100 2001-12-29, Stefan Persson wrote: >Someone said that Unicode contains switches for LTR & RTL. By adding >switches for TTB and BTT this problem could be solved. It would also be >necessary to define a priority order (i.e. which of them that should come >first). > >As an alternative solution, the current switches could be considered LTR, >TTB and RTL, TTB. Then 6 other code points would be necessary for the other >directions. I can't imagine this working for Egyptian or Mayan, or indeed Mongolian or Ogham. Mongolian and Ogham, when mixed with Latin text, are traditionally written LTR (sometimes Mongolian is RTL). It isn't normal or natural to do otherwise. Egyptian LTR or RTL is not problematic. But for columnar display, in current applications, markup is used. Mayan writes LTR or RTL in repeated columns of two, as I recall. I strongly suspect markup is required for this behaviour. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: 29 December 2001 08:47
Subject: Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

> 1. Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
> 2. Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
> 3. Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).
> [...]
> An elaboration of the directional override mechanism to handle vertical
> directionality would have to take priority into account as well. Instead of
> two directionalities, LTR and RTL, the Unicode Standard would have to
> consider eight. The Bidirectional Algorithm might have to become
> Octodirectional, with a commensurate increase in complexity. Perhaps this is
> the problem that is avoided by declaring vertical directionality to be a
> higher-level "formatting style" issue. But it still seems arbitrary.

Someone said that Unicode contains switches for LTR and RTL. By adding switches for TTB and BTT this problem could be solved. It would also be necessary to define a priority order (i.e. which of them should come first).

As an alternative solution, the current switches could be considered to mean (LTR, TTB) and (RTL, TTB). Then 6 other code points would be necessary for the other directions.
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: 26 December 2001 06:48
Subject: Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

> Seeing that Unicode already has left-to-right and right-to-left override
> characters, I wonder if a top-to-bottom override character might also be
> reasonable.

Which are the code points for these characters?
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Tex Texin replied to Marco Cimarosti:

>> Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text,
>> which can easily be mixed within the same paragraph.
>>
>> On the other hand, horizontal vs. vertical are attributes that can
>> only be applied to a whole paragraph or section.
>
> Marco, is that true? I thought that sometimes numbers for example "123."
> might be written horizontally in the middle of a vertical run.

Marco responded:

> But that would be a limited case for horizontal text embedded in vertical text:
> I cannot imagine a real-world situation for a vertical text embedded in
> horizontal text.

And Sampo Syreeni weighed in:

> I think this is something better handled by special-casing in rendering
> software -- the numbers (and whatnot) could be rendered as rotated or
> straight top-to-bottom as well. Considering this, it seems like a
> stylistic variation better controlled by an upper level protocol, if at
> all.

Tex's example may or may not be realistic -- I have no way of knowing -- but in suggesting a top-to-bottom directional override, I had hoped it would be possible to represent a run of text such as Tex describes without resorting to the infamous "higher protocol."

TUS 3.0 states (p. 24): "In contrast to the bidirectional case, the choice to lay out text either vertically or horizontally is treated as a formatting style. Therefore, the Unicode Standard does not provide directionality controls to specify that choice."

This may seem arbitrary to some; why should overrides of default horizontal directionality be a plain-text issue but overrides of default vertical directionality be a higher-level "formatting style" issue? I hope this discussion can shed some light on this question, and possibly help me see what I may be missing.
Actually, there is a more serious problem involved with vertical directional overrides: They would force the Unicode plain-text mechanism to become aware of both vertical directionality and directional priority.

This sounds obvious, but in fact there are not two, but THREE issues involved with text directionality:

1. Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
2. Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
3. Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).

If you think about it, all text of non-trivial length has both horizontal and vertical directionality, and also a priority to the directionality. Horizontal and vertical directionalities are not opposites, they are complements.

The Latin script is written (LTR, TTB), which means not only that there is a horizontal directionality of left-to-right and a vertical directionality of top-to-bottom, but also that the horizontal directionality takes precedence over the vertical. That is, we complete a horizontal (LTR) line before moving down the page (TTB) to start another line.

According to TUS 3.0, Latin and most other European scripts are (LTR, TTB). Arabic and most other Middle Eastern scripts are (RTL, TTB). Ogham is either (LTR, TTB) or (BTT, ???). Han is traditionally written (TTB, RTL) and more recently (LTR, TTB). Mongolian is written (TTB, LTR).

Unicode characters have a default directionality, but both this and the override mechanism cover only the horizontal aspect, not the vertical aspect or the priority of one over the other. Thus, Mongolian characters are assigned the same directionality code as Latin ("L") even though the TTB directionality takes precedence over the LTR, the opposite of Latin. And there is no plain-text way to indicate the alternative directionality of Ogham or Han.

An elaboration of the directional override mechanism to handle vertical directionality would have to take priority into account as well.
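To make the three issues above concrete, here is a rough Python sketch that enumerates every (primary, secondary) combination and then checks, via the standard unicodedata module, that the bidirectional class alone cannot tell Mongolian from Latin. The names HORIZONTAL, VERTICAL, writing_modes and SAMPLES are invented for this illustration; nothing here is taken from TUS 3.0 itself.

```python
import unicodedata
from itertools import product

HORIZONTAL = ("LTR", "RTL")
VERTICAL = ("TTB", "BTT")

def writing_modes():
    """List every (primary, secondary) pair of directions.

    The first element is the direction in which a line is filled;
    the second is the direction in which successive lines are stacked.
    """
    modes = []
    for h, v in product(HORIZONTAL, VERTICAL):
        modes.append((h, v))  # horizontal takes priority, e.g. Latin (LTR, TTB)
        modes.append((v, h))  # vertical takes priority, e.g. Mongolian (TTB, LTR)
    return modes

# One sample letter per script mentioned in the message above.
SAMPLES = {
    "Latin": "A",
    "Arabic": "\u0627",     # ARABIC LETTER ALEF
    "Han": "\u4e00",        # CJK UNIFIED IDEOGRAPH-4E00
    "Mongolian": "\u1820",  # MONGOLIAN LETTER A
}

if __name__ == "__main__":
    for mode in writing_modes():
        print(mode)
    for script, ch in SAMPLES.items():
        # Only the horizontal aspect is recorded in the character data.
        print(script, unicodedata.bidirectional(ch))
```

Two choices of horizontal direction, two of vertical direction, and two of priority multiply out to 2 x 2 x 2 combinations.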
Instead of two directionalities, LTR and RTL, the Unicode Standard would have to consider eight. The Bidirectional Algorithm might have to become Octodirectional, with a commensurate increase in complexity.

Perhaps this is the problem that is avoided by declaring vertical directionality to be a higher-level "formatting style" issue. But it still seems arbitrary.

-Doug Ewell
 Fullerton, California
RE: Vertical scripts (was: Tategaki (was: Re: Updated...))
On Fri, 28 Dec 2001, Marco Cimarosti wrote:

>>I thought that sometimes numbers for example "123." might be written
>>horizontally in the middle of a vertical run.
>>y
>>a
>>d
>>d
>>a
>> 123.
>
>That's true: an extra complication! However, I have only seen that for
>one- or two-digit numbers.

I think this is something better handled by special-casing in rendering software -- the numbers (and whatnot) could be rendered as rotated or straight top-to-bottom as well. Considering this, it seems like a stylistic variation better controlled by an upper level protocol, if at all.

>But that would be a limited case for horizontal text embedded in vertical
>text: I cannot imagine a real-world situation for a vertical text
>embedded in horizontal text.

If you think about the history of this particular rendering, it's about the Western/Arabic numbers intruding into the East Asian writing system. If there's anything to believe in cyberpunk, the tide might well turn one day. I'm not quite sure we couldn't one day have residual English embedded with native Japanese terms. ;)

Sampo Syreeni, aka decoy - mailto:[EMAIL PROTECTED], tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
RE: Vertical scripts (was: Tategaki (was: Re: Updated...))
Tex Texin wrote:

> > On the other hand, horizontal vs. vertical are attributes that can only
> > be applied to a whole paragraph or section.
>
> Marco, is that true? I thought that sometimes numbers for example "123."
> might be written horizontally in the middle of a vertical run.
> y
> a
> d
> d
> a
> 123.

That's true: an extra complication! However, I have only seen that for one- or two-digit numbers.

This is also used for single letters or two-letter acronyms (such as "Km"). Probably this is the reason for the "squared letters" in range U+3380..U+33DD (some of which are three letters long, BTW).

But that would be a limited case for horizontal text embedded in vertical text: I cannot imagine a real-world situation for a vertical text embedded in horizontal text.

_ Marco
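The "squared letters" mentioned above can be inspected with Python's standard unicodedata module: each one carries a compatibility decomposition into ordinary letters, which NFKC normalization applies. This is only an illustrative sketch; the function name squared_abbreviations is made up for the example, and the range is the one quoted in the message.

```python
import unicodedata

def squared_abbreviations(first=0x3380, last=0x33DD):
    """Return (code point, name, NFKC expansion) for each character in the range."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "<unnamed>")
        expansion = unicodedata.normalize("NFKC", ch)
        rows.append((cp, name, expansion))
    return rows

if __name__ == "__main__":
    for cp, name, expansion in squared_abbreviations():
        print(f"U+{cp:04X} {name}: {expansion}")
```

Each of these occupies a single character cell, so a renderer can keep a short unit such as "km" upright inside a vertical column without any change of direction.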
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
Marco Cimarosti wrote:

> I see a big difference between the two cases.
>
> Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text,
> which can easily be mixed within the same paragraph.
>
> On the other hand, horizontal vs. vertical are attributes that can only
> be applied to a whole paragraph or section.

Marco, is that true? I thought that sometimes numbers for example "123." might be written horizontally in the middle of a vertical run.

y
a
d
d
a
123.
y
a
...

tex

--
Tex Texin                   Director, International Business
mailto:[EMAIL PROTECTED]    Tel: +1-781-280-4271
the Progress Company        Fax: +1-781-280-4655

For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html
RE: Vertical scripts (was: Tategaki (was: Re: Updated...))
Doug Ewell wrote:

> > Unicode doesn't have some way to indicate vertical writing. I think the
> > only consideration for it is vertical presentation forms of some
> > characters. Anything more is left for other software layers to deal
> > with.
>
> Seeing that Unicode already has left-to-right and right-to-left override
> characters, I wonder if a top-to-bottom override character
> might also be reasonable.

I see a big difference between the two cases.

Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text, which can easily be mixed within the same paragraph.

On the other hand, horizontal vs. vertical are attributes that can only be applied to a whole paragraph or section.

So, a hypothetical pair (start/end) of top-to-bottom override characters should probably also act as paragraph separators.

I wish a decent 2002 to everybody (as wishing more than "decent" would be quite unrealistic).

_ Marco
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
In a message dated 2001-12-25 16:57:39 Pacific Standard Time, [EMAIL PROTECTED] writes:

> Unicode doesn't have some way to indicate vertical writing. I think the
> only consideration for it is vertical presentation forms of some
> characters. Anything more is left for other software layers to deal
> with.

Seeing that Unicode already has left-to-right and right-to-left override characters, I wonder if a top-to-bottom override character might also be reasonable.

-Doug Ewell
 Fullerton, California
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
From: "Αλέξανδρος Διαμαντίδης" <[EMAIL PROTECTED]>

> By the way, does any browser in common use
> support the Ruby extensions to HTML?

Well, looking at links like:

http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/rt.asp

(all on one line) and just doing a random search on http://msdn.microsoft.com/ for keywords like "HTML Ruby" make me think it's supported in IE5 and later?

MichKa

Michael Kaplan
Trigeminal Software, Inc. -- http://www.trigeminal.com/
Re: Vertical scripts (was: Tategaki (was: Re: Updated...))
* Stefan Persson <[EMAIL PROTECTED]> [2001-12-26 00:02]:

> Is there some way to indicate vertical writing (in columns from right to
> left) for Japanese and Chinese? Is there a Unicode code point assigned for
> this, a HTML command, or just a special option in some word processors?

Well, some word processors and typesetting systems do support vertical writing. It's probably more common in software oriented towards Chinese and Japanese, but I can't help you there. I do know that the Omega typesetting system supports vertical writing. Omega is based on TeX but with many extensions and some changes, and uses Unicode as its internal text encoding.

Unicode doesn't have some way to indicate vertical writing. I think the only consideration for it is vertical presentation forms of some characters. Anything more is left for other software layers to deal with.

As for HTML, I don't know (I'm sure someone will fill us in) but even if some mechanism for vertical writing is defined, I don't think any current browser supports it. By the way, does any browser in common use support the Ruby extensions to HTML?

While doing a web search for the word "tategaki", looking for its meaning, I found a Java program that formats Japanese text for vertical display using HTML tables with a cell for each character. It's here:

http://homepage.mac.com/kkonaka/TategakiProg.html

This is kind of a kludge, but it may be useful in some circumstances. The author warns though:

> (this generates far many cells in a table commonly observed in normal
> web pages). - many browser cannot display text layout this way of more
> than a few pages... (they'd run out of memory).

--
Αλέξανδρος Διαμαντίδης * [EMAIL PROTECTED]
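The table-per-character approach described above can be sketched in a few lines of Python. This is not the Java program linked in the message, only an illustration of the same idea under simple assumptions: one code point per cell, a fixed column height, and columns running from right to left. The function names tategaki_grid and tategaki_html are hypothetical.

```python
from html import escape

def tategaki_grid(text, column_height):
    """Arrange text into rows so that it reads top-to-bottom, columns right-to-left."""
    if column_height < 1:
        raise ValueError("column_height must be at least 1")
    # Cut the text into vertical columns of at most column_height characters.
    columns = [text[i:i + column_height] for i in range(0, len(text), column_height)]
    rows = []
    for r in range(column_height):
        # The first column of text ends up in the rightmost cell of each row.
        rows.append([col[r] if r < len(col) else "" for col in reversed(columns)])
    return rows

def tategaki_html(text, column_height):
    """Render the grid as an HTML table with one cell per character."""
    lines = ["<table>"]
    for row in tategaki_grid(text, column_height):
        cells = "".join(f"<td>{escape(ch)}</td>" for ch in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(tategaki_html("縦書きのテスト", 4))
```

Every character costs a full table cell, so the markup grows quickly with the length of the text, which fits the author's warning about browsers running out of memory.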