Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-05 Thread DougEwell2

I hasten to add:

> UTF-8 and UTF-32, at least, already have the architecture 
> to represent 2^31 and 2^32 code points, respectively.  The definitions 
> would simply have to be changed to make the additional code points legal.
>
> Only UTF-16 would truly need to be redesigned, and that has already been 
> proposed.

None of this is actually going to happen, of course.  Unicode and 10646 are 
committed to staying with 17 planes.  I was just pointing out that certain 
individuals had made informal proposals to extend the code space.

-Doug Ewell
 Fullerton, California




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-05 Thread DougEwell2

In a message dated 2002-01-02 5:05:23 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

> There are worse things than this: what if someone discovers a script with
> more than 1,114,111 characters? Back to the drawing board to redesign all
> the UTF's!

Not all of them.  UTF-8 and UTF-32, at least, already have the architecture 
to represent 2^31 and 2^32 code points, respectively.  The definitions would 
simply have to be changed to make the additional code points legal.
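
The arithmetic behind the UTF-8 figure can be sketched in a few lines of Python. This is only an illustration, assuming the original one-to-six-byte UTF-8 layout; the function name utf8_payload_bits is made up for this example.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Payload bits of an n-byte sequence in the original 1..6-byte UTF-8 layout."""
    if n_bytes == 1:
        return 7                      # 0xxxxxxx
    if not 2 <= n_bytes <= 6:
        raise ValueError("the original UTF-8 design used 1 to 6 bytes")
    # The lead byte carries (7 - n) bits; each continuation byte carries 6.
    return (7 - n_bytes) + 6 * (n_bytes - 1)

UNICODE_MAX = 0x10FFFF                # 1,114,111, the last code point of the 17 planes

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "bytes:", utf8_payload_bits(n), "bits")
    # Four bytes (21 bits) already cover all 17 planes; six bytes give 31 bits.
    print(UNICODE_MAX < 2 ** utf8_payload_bits(4))
```

A UTF-32 code unit is simply a 32-bit integer, which is where the 2^32 figure comes from.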

Only UTF-16 would truly need to be redesigned, and that has already been 
proposed.  For example, Masahiko Maedera once proposed a "UTF-16x" in which 
code points in the U+EExxx block were designated as "super surrogates."  
Three of these "super surrogates," or six 16-bit words, would be combined to 
represent code points beyond the 17 planes.  (This was back in the days when some 
people felt that a great and crippling schism existed between Unicode and ISO 
10646 because the former disallowed such code points and the latter allowed 
them.)
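
For comparison, here is a minimal sketch of why ordinary UTF-16 stops where it does. It shows only the standard surrogate-pair arithmetic and says nothing about how the "UTF-16x" proposal worked internally; the function name to_surrogates is made up for this example.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a standard UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not representable as a surrogate pair")
    v = cp - 0x10000                       # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

# 1024 high surrogates x 1024 low surrogates, on top of the BMP:
MAX_VIA_SURROGATES = 0x10000 + 1024 * 1024 - 1

if __name__ == "__main__":
    print(hex(MAX_VIA_SURROGATES))                       # the ceiling of UTF-16
    print([hex(u) for u in to_surrogates(0x10FFFF)])     # last pair available
```

Both halves of the last pair are already at the top of their ranges, so there is no room left without designating new code units for the purpose.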

-Doug Ewell
 Fullerton, California




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-03 Thread Michael Everson

At 09:38 -0500 2002-01-03, John Cowan wrote:

>This leads to an interesting, if so far theoretical, Unicode question:
>how to encode abjads and abugidas that have vowel signs which are
>pronounced *before* the base consonant.  Two Unicode principles,
>logical order and base-before-combining, are thus put into conflict.
>
>In (Feanorian) Tengwar itself, the reading order is actually
>language-dependent: thus "Quenya" (a Quenya word) is written
>QU-e-N-y-a (where caps are base, smalls are combining), but
>"Sindarin" (a Sindarin word) would be "S-N-i-D-R-a-N-i", if written with
>base-before-combining, or "S-i-N-D-a-R-i-N" if written with logical order,
>in which case the default grapheme clusters have to be broken up using
>complex rendering code in order to get i over N and a over R.

Did you not read my draft paper proposing the solution for this 
feature of this script?
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-03 Thread John Cowan

I wrote:

> Vowel marks appearing to the left of the
> consonants are pronounced before them; those to the right, after them.

This leads to an interesting, if so far theoretical, Unicode question:
how to encode abjads and abugidas that have vowel signs which are
pronounced *before* the base consonant.  Two Unicode principles,
logical order and base-before-combining, are thus put into conflict.

In (Feanorian) Tengwar itself, the reading order is actually
language-dependent: thus "Quenya" (a Quenya word) is written
QU-e-N-y-a (where caps are base, smalls are combining), but
"Sindarin" (a Sindarin word) would be "S-N-i-D-R-a-N-i", if written with
base-before-combining, or "S-i-N-D-a-R-i-N" if written with logical order,
in which case the default grapheme clusters have to be broken up using
complex rendering code in order to get i over N and a over R.
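
The two orders for the Sindarin example can be related mechanically. The sketch below uses the toy convention of this message (uppercase units are bases, lowercase units are combining signs) and assumes the Sindarin-style mode, where each sign belongs to the base that follows it; the function name is made up for illustration.

```python
def logical_to_base_first(units: list[str]) -> list[str]:
    """Move each sign that is read before its base to just after that base."""
    out: list[str] = []
    pending: list[str] = []            # signs waiting for the next base
    for u in units:
        if u.islower():                # toy convention: lowercase = combining sign
            pending.append(u)
        else:
            out.append(u)
            out.extend(pending)
            pending.clear()
    out.extend(pending)                # a trailing sign with no base stays last
    return out

if __name__ == "__main__":
    logical = "S-i-N-D-a-R-i-N".split("-")
    print("-".join(logical_to_base_first(logical)))
```

The Quenya string QU-e-N-y-a needs no such step, since there each sign already follows its base in reading order.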

The problem could be sidestepped with a grapheme-cluster encoding such as
is used for Ethiopic, but the feel is very different: Ethiopic vowel
signs are normally treated as part of the letter, whereas Tengwar
vowel signs are more like typical abjad signs: partly optional
indications of "colorings" to the fundamental consonant structure.

Unicode tribal elders are invited to mention which of the two conflicting
principles they reckon to be the more important.

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values|   Check your assumptions.  In fact,
   at the front desk.   |  check your assumptions at the door.
 --sign in Paris hotel  |--Miles Vorkosigan




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-03 Thread John Cowan

Patrick Andries scripsit:

> This is the time for an aspiring J. R. R. Tolkien to leave his mark in 
> the Unicode saga by adopting a new strictly vertical script...à la Tengwar.

JRRT actually did create such a vertical script, which was used in
the Blessed Realm before Feanor got around to creating the Tengwar
as we know them today: the Sarati of Ruumil.  This is a TTB LTR
abjad, like Mongolian.  Vowel marks appearing to the left of the
consonants are pronounced before them; those to the right, after them.

http://user.tninet.se/~xof995c/sarati.htm

-- 
John Cowan   http://www.ccil.org/~cowan  [EMAIL PROTECTED]
Please leave your values|   Check your assumptions.  In fact,
   at the front desk.   |  check your assumptions at the door.
 --sign in Paris hotel  |--Miles Vorkosigan




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-03 Thread Michael Everson

At 16:51 -0800 2002-01-02, Kenneth Whistler wrote:
>John Wilcock wrote:
>
>> All *known* vertical scripts! What happens if someone discovers a
>> hitherto-unknown vertical script that is never written horizontally?

It would be unthinkable that merchants using such a script wouldn't 
have horizontal and vertical variants for shop signs and neon.

And crossword puzzles.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-03 Thread Michael Everson

At 20:14 -0500 2002-01-02, Patrick Andries wrote:

>This is the time for an aspiring J. R. R. Tolkien to leave his mark 
>in the Unicode saga by adopting a new strictly vertical script...à 
>la Tengwar.

That would be Sarati. Which I have already proposed for addition to 
the SMP, though for now it is waiting in the wings.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Patrick Andries


Kenneth Whistler wrote:

>
>Also, you'd have to go pretty far out to find a "hitherto-unknown
>vertical script" that has escaped the eagle eyes of the Unicode
>Roadmap committee. See, for example:
>
>http://www.unicode.org/roadmaps/smp-3-1.html
>

This is the time for an aspiring J. R. R. Tolkien to leave his mark in 
the Unicode saga by adopting a new strictly vertical script...à la Tengwar.

He will, of course, first have to convince his editor...

Best wishes for 2002

Unicode en français
http://hapax.iquebec.com











Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Kenneth Whistler

John Wilcock wrote:

> All *known* vertical scripts! What happens if someone discovers a
> hitherto-unknown vertical script that is never written horizontally?

I predict that the people who want to write about it will quickly
render it LTR horizontally, to match the metadirectionality of
the script they use to write about it.

Scholars already regularly turn RTL epigraphy into LTR when they
want to cite it in text (other than in facsimiles), to avoid the
bidi problem.

Also, you'd have to go pretty far out to find a "hitherto-unknown
vertical script" that has escaped the eagle eyes of the Unicode
Roadmap committee. See, for example:

http://www.unicode.org/roadmaps/smp-3-1.html

--Ken




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Kenneth Whistler

Sampo Syreeni wrote a fine FAQ answer about rendering directionality
and then asked:

> BTW, something akin to the above should really go in a FAQ. Is there
> anything resembling a Unicode FAQ in existence, anywhere?

Well, you could start on the Unicode home page http://www.unicode.org/
and click on the "FAQ" link. ;-) There's even a section in the FAQ
on "Writing Directions", to which a distilled-down version of
some of this discussion might be a fine addition.

--Ken




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Tex Texin

Forget about ancient and dead scripts. What about the future, when we
start communicating with extra-terrestrials and have to start encoding
all the other scripts in the galaxy! utf-googoolplex!

;-)

(And don't nobody bring up klingon...)

And so the new year begins on the Unicode list
tex


Marco Cimarosti wrote:
> 
> John Wilcock wrote:
> > All *known* vertical scripts! What happens if someone discovers a
> > hitherto-unknown vertical script that is never written horizontally?
> 
> There are worse things than this: what if someone discovers a script with
> more than 1,114,111 characters? Back to the drawing board to redesign all
> the UTF's!
> 
> :-)
> _ Marco

-- 
-
Tex Texin                   Director, International Business
mailto:[EMAIL PROTECTED]    Tel: +1-781-280-4271
the Progress Company        Fax: +1-781-280-4655
-
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html




RE: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Marco Cimarosti

John Wilcock wrote:
> All *known* vertical scripts! What happens if someone discovers a
> hitherto-unknown vertical script that is never written horizontally?

There are worse things than this: what if someone discovers a script with
more than 1,114,111 characters? Back to the drawing board to redesign all
the UTF's!

:-)
_ Marco




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread John Wilcock

On Wed, 2 Jan 2002 11:27:02 +0100 , Marco Cimarosti wrote:
> Because all vertical scripts (CJK and Mongolian) can also be written
> horizontally, whereas modern right-to-left scripts cannot be written
> left-to-right.

All *known* vertical scripts! What happens if someone discovers a
hitherto-unknown vertical script that is never written horizontally?

John.

-- 
-- Over 1600 webcams from ski resorts around the world - http://www.snoweye.com/
-- Translate your technical documents and web pages- http://www.tradoc.fr/





RE: Vertical scripts (was: Tategaki (was: Re: Updated...))

2002-01-02 Thread Marco Cimarosti

Doug Ewell wrote:
> TUS 3.0 states (p. 24): "In contrast to the bidirectional 
> case, the choice to lay out text either vertically or
> horizontally is treated as a formatting style.
> [...] why should overrides of default horizontal
> directionality be a plain-text issue but overrides of
> default vertical directionality be a higher-level 
> "formatting style" issue?

Because all vertical scripts (CJK and Mongolian) can also be written
horizontally, whereas modern right-to-left scripts cannot be written
left-to-right.

Also, all horizontal scripts, when embedded in Far East text, may be written
vertically by rotating them 90 degrees (clockwise for LTR scripts,
counterclockwise for RTL scripts).

So you can happily define a system-level vertical/horizontal preference, and
use it blindly for plain text in any kind of script.
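
The rotation rule above is simple enough to state as code. This is a rough sketch only: it picks the rotation from the first strong bidirectional character of a run, using Python's standard unicodedata module, and the function name embed_rotation is made up. It is meant for runs of horizontal scripts; CJK characters also have bidi class L and would need to be kept upright by the caller.

```python
import unicodedata

def embed_rotation(run: str):
    """Rotation in degrees (positive = clockwise) for a horizontal run in vertical text."""
    for ch in run:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":                 # strong left-to-right
            return 90
        if bidi in ("R", "AL"):         # strong right-to-left (Hebrew, Arabic, ...)
            return -90
    return None                         # no strong character: nothing to decide

if __name__ == "__main__":
    print(embed_rotation("Unicode"))
    print(embed_rotation("\u05e9\u05dc\u05d5\u05dd"))
```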

_ Marco




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-31 Thread Asmus Freytag

At 12:22 PM 12/31/01 -0500, Tex Texin wrote:
>I was fooled by that earlier in the year as well. The links to the other
>pages should be at the top of the web page to highlight that the page is
>a partial list and to make it easy to reference the other pages. Most
>people will not scroll to the bottom of the page to find the other
>links.

That's in the plan for the 3.2 upgrade I'm working on.

A./




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-31 Thread Tex Texin

I was fooled by that earlier in the year as well. The links to the other
pages should be at the top of the web page to highlight that the page is
a partial list and to make it easy to reference the other pages. Most
people will not scroll to the bottom of the page to find the other
links.


Michael Everson wrote:
stuff deleted...
> >Did you miss these?
> 
> Bloody hell. Yes, I missed them, because I assumed that the
> charindex.html indexed all the characters. It does NOT! It indexes
> A-D. Now I assumed that when it loaded I could just command-F and
> find the text. So I did not scroll down the list. Therefore:
> 
> I suggest that the Title of this document be changed to:
> 
> Unicode 3.0.0 Character Name Index A-D
> 
> and the other two (charindex2.html and charindex3.html) to
> 
> Unicode 3.0.0 Character Name Index E-N
> and
> Unicode 3.0.0 Character Name Index O-Z
> 
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com

-- 
-
Tex Texin                   Director, International Business
mailto:[EMAIL PROTECTED]    Tel: +1-781-280-4271
the Progress Company        Fax: +1-781-280-4655
-
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-31 Thread Michael Everson

At 20:31 -0800 2001-12-30, Asmus Freytag wrote:
>At 12:50 PM 12/30/01 +, Michael Everson wrote:
>>At 18:31 -0800 2001-12-29, Asmus Freytag wrote:
>>>
>>>Please see
>>>
>>>http://www.unicode.org/charts/charindex.html
>>
>>That's not very helpful, Asmus. I went there and tried searching 
>>"override", "left-to-right", and "left to right" and nothing was 
>>found.
>
>Quoting right from the file:
>
>these entries should have been what you were looking for:
>
>LEFT-TO-RIGHT OVERRIDE 202D
>OVERRIDE, LEFT-TO-RIGHT 202D
>
>and even:
>OVERRIDE, RIGHT-TO-LEFT 202E
>
>Did you miss these?

Bloody hell. Yes, I missed them, because I assumed that the 
charindex.html indexed all the characters. It does NOT! It indexes 
A-D. Now I assumed that when it loaded I could just command-F and 
find the text. So I did not scroll down the list. Therefore:

I suggest that the Title of this document be changed to:

Unicode 3.0.0 Character Name Index A-D

and the other two (charindex2.html and charindex3.html) to

Unicode 3.0.0 Character Name Index E-N
and
Unicode 3.0.0 Character Name Index O-Z


-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-30 Thread Tex Texin

Thanks for the explanation Asmus.
tex

Asmus Freytag wrote:
> 
> At 02:33 PM 12/30/01 -0500, Tex Texin wrote:
> >It is a bit inconsistent and therefore confusing.
> >
> >I searched for "bidirectional" which immediately pointed me at the
> >general punctuation pages in a pdf file.
> >Searching for "bidirectional" in that file turns up empty.
> 
> This is one of the few cases of an index entry that has no corresponding
> line in the nameslist file. Usually the index entry is derived directly
> from the character names and aliases, or the text of the block names and
> sub headers. That's the reason you couldn't find "bidirectional" in the pdf
> file. The subheader in this case is just "Formatting characters" and that's
> not very specific.
> 
> >If you search
> >for left-to-right, right-to-left, override, or embed, there you do get
> >to the characters. However a saving grace is that when you are first
> >pointed at the general punctuation file, the character code 202A is
> >mentioned, so if you notice that you can go right to the character
> >range.
> 
> I'll make sure that is clearly worded in the instructions.
> 
> >Maybe the initial index needs to be more comprehensive. It is usually a
> >difficult task for any large book to get right. However, tracking the
> >web queries might help improve it over time...
> 
> The problem you encountered was one where the index is already more
> comprehensive and detailed than the nameslist. ;-)
> 
> One could monkey with the nameslist, adding the subheader for the
> bidirectional controls, but then we would pick up a number of one-character
> ranges with subheaders, which becomes awkward in itself.
> 
> A./
> 
> PS: I'm in the process of updating the HTML files for the index to match
> the contents of the Index-3.2.0dnn.txt file in the BETA directory. That
> file covers the new 3.2 character names etc. but does not pick up new or
> revised aliases and subheaders in the existing repertoire...

-- 
-
Tex Texin                   Director, International Business
mailto:[EMAIL PROTECTED]    Tel: +1-781-280-4271
the Progress Company        Fax: +1-781-280-4655
-
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-30 Thread Asmus Freytag

At 02:33 PM 12/30/01 -0500, Tex Texin wrote:
>It is a bit inconsistent and therefore confusing.
>
>I searched for "bidirectional" which immediately pointed me at the
>general punctuation pages in a pdf file.
>Searching for "bidirectional" in that file turns up empty.

This is one of the few cases of an index entry that has no corresponding 
line in the nameslist file. Usually the index entry is derived directly 
from the character names and aliases, or the text of the block names and 
sub headers. That's the reason you couldn't find "bidirectional" in the pdf 
file. The subheader in this case is just "Formatting characters" and that's 
not very specific.
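
To illustrate what "derived directly from the character names" can mean, here is a guess at how permuted entries such as "OVERRIDE, LEFT-TO-RIGHT" could be produced from a name. It is a sketch for illustration, not the tool actually used to build the index, and the function name index_entries is made up.

```python
def index_entries(name: str) -> list[str]:
    """Return the name itself plus one rotated entry per later word."""
    words = name.split()
    entries = [name]
    for i in range(1, len(words)):
        # Put the i-th word first and append the leading words after a comma.
        entries.append(" ".join(words[i:]) + ", " + " ".join(words[:i]))
    return entries

if __name__ == "__main__":
    for entry in index_entries("LEFT-TO-RIGHT OVERRIDE"):
        print(entry, "202D")
```

An entry like "bidirectional" has no word in any character name to rotate from, which is consistent with it having to be added by hand.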

>If you search
>for left-to-right, right-to-left, override, or embed, there you do get
>to the characters. However a saving grace is that when you are first
>pointed at the general punctuation file, the character code 202A is
>mentioned, so if you notice that you can go right to the character
>range.

I'll make sure that is clearly worded in the instructions.

>Maybe the initial index needs to be more comprehensive. It is usually a
>difficult task for any large book to get right. However, tracking the
>web queries might help improve it over time...

The problem you encountered was one where the index is already more 
comprehensive and detailed than the nameslist. ;-)

One could monkey with the nameslist, adding the subheader for the 
bidirectional controls, but then we would pick up a number of one-character 
ranges with subheaders, which becomes awkward in itself.

A./

PS: I'm in the process of updating the HTML files for the index to match 
the contents of the Index-3.2.0dnn.txt file in the BETA directory. That 
file covers the new 3.2 character names etc. but does not pick up new or 
revised aliases and subheaders in the existing repertoire...




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-30 Thread Asmus Freytag

At 12:50 PM 12/30/01 +, Michael Everson wrote:
>At 18:31 -0800 2001-12-29, Asmus Freytag wrote:
>>
>>Please see
>>
>>http://www.unicode.org/charts/charindex.html
>
>That's not very helpful, Asmus. I went there and tried searching 
>"override", "left-to-right", and "left to right" and nothing was found.

Quoting right from the file:

these entries should have been what you were looking for:

LEFT-TO-RIGHT OVERRIDE 202D
OVERRIDE, LEFT-TO-RIGHT 202D

and even:
OVERRIDE, RIGHT-TO-LEFT 202E

Did you miss these?

A./
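
For readers who have a Python interpreter to hand, the same lookup can be done against the Unicode Character Database with the standard unicodedata module. A small sketch (the helper name describe is made up):

```python
import unicodedata

def describe(cp: int) -> tuple[str, str, str]:
    """Return (U+XXXX, bidi class, character name) for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X}", unicodedata.bidirectional(ch), unicodedata.name(ch)

if __name__ == "__main__":
    # The explicit embedding and override controls sit together at 202A..202E.
    for cp in range(0x202A, 0x202F):
        print(*describe(cp))
    # Lookup by name goes the other way:
    print(hex(ord(unicodedata.lookup("LEFT-TO-RIGHT OVERRIDE"))))
```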






Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-30 Thread Tex Texin

It is a bit inconsistent and therefore confusing.

I searched for "bidirectional" which immediately pointed me at the
general punctuation pages in a pdf file.
Searching for "bidirectional" in that file turns up empty. If you search
for left-to-right, right-to-left, override, or embed, there you do get
to the characters. However a saving grace is that when you are first
pointed at the general punctuation file, the character code 202A is
mentioned, so if you notice that you can go right to the character
range.

Maybe the initial index needs to be more comprehensive. It is usually a
difficult task for any large book to get right. However, tracking the
web queries might help improve it over time...

tex

Michael Everson wrote:
> 
> At 18:31 -0800 2001-12-29, Asmus Freytag wrote:
> >At 12:07 PM 12/29/01 +0100, Stefan Persson wrote:
> >>> Seeing that Unicode already has left-to-right and right-to-left override
> >>> characters, I wonder if a top-to-bottom override character might also be
> >>> reasonable.
> >>
> >>Which are the code points for these characters?
> >
> >Please see
> >
> >http://www.unicode.org/charts/charindex.html
> 
> That's not very helpful, Asmus. I went there and tried searching
> "override", "left-to-right", and "left to right" and nothing was
> found.
> --
> Michael Everson *** Everson Typography *** http://www.evertype.com

-- 
-
Tex Texin                   Director, International Business
mailto:[EMAIL PROTECTED]    Tel: +1-781-280-4271
the Progress Company        Fax: +1-781-280-4655
-
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-30 Thread Michael Everson

At 18:31 -0800 2001-12-29, Asmus Freytag wrote:
>At 12:07 PM 12/29/01 +0100, Stefan Persson wrote:
>>> Seeing that Unicode already has left-to-right and right-to-left override
>>> characters, I wonder if a top-to-bottom override character might also be
>>> reasonable.
>>
>>Which are the code points for these characters?
>
>Please see
>
>http://www.unicode.org/charts/charindex.html

That's not very helpful, Asmus. I went there and tried searching 
"override", "left-to-right", and "left to right" and nothing was 
found.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Calendars (was: Re: Vertical scripts)

2001-12-30 Thread juuichiketajin


> Oh, no. We use three calendars in Iran: Jalali, Hijri, and Gregorian. The
> official one is Jalali, for some holidays we use Hijri dates, and we use
> Gregorian just for international occasions like the International Workers'
> Day.

I can't help but wonder how they would show that on a
digital watch!! Just month / day, with both month and
day being shown by digits, and the first day of the
year being 1 / 1 ?

>
> roozbeh
>
>
>





Re: Vertical scripts

2001-12-29 Thread Asmus Freytag

At 02:39 PM 12/29/01 +0100, Philipp Reichmuth wrote:
>Admittedly, some directions are
>rather arcane, but believe me, it will be possible to dig up some
>ancient document or other which is written in any odd combination of
>directions. If Unicode wants to achieve complete representation of any
>kind of presentation of text direction here, it's going to be a pretty
>rough job.


Unicode happily settles for covering the must-have common cases for modern 
text, in particular those that are needed for modern plain text. One of the 
reasons vertical text support has been left out of Unicode is that modern 
*plain* text in the languages in question usually is presented 
left-to-right horizontally, whereas bidirectional plain text cannot be 
presented without explicit support.

Already with bidi, the support for the explicit directionality - required 
for plain text - is in danger of colliding with markup solutions for the 
same functionality.

The 32 directions from the Omega example belong in markup - not plain text.

A./

PS: as far as ancient writing is concerned, we clearly need a common markup 
solution for "capturing the way things look in the original", which can be 
based on the plain-text backbone provided by Unicode. The Text Encoding 
Initiative (at http://www.tei-c.org/ ) produces The TEI Guidelines: 
"detailed recommendations for the encoding of all kinds of textual material 
of all kinds in all languages from all times". 




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Asmus Freytag

At 12:07 PM 12/29/01 +0100, Stefan Persson wrote:
> > Seeing that Unicode already has left-to-right and right-to-left override
> > characters, I wonder if a top-to-bottom override character might also be
> > reasonable.
>
>Which are the code points for these characters?

Please see

http://www.unicode.org/charts/charindex.html

A./




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Curtis Clark

At 04:00 AM 12/29/01, Michael Everson wrote:
>When written in manuscripts and on computers, Ogham is written as Latin 
>is. When inscribed on stone, it is written bottom-to-top, along the top of 
>the stone, and then down to the bottom on the other side. I don't believe 
>that there are any examples of multiple-line Ogham lapidary text.

One could well argue, too, that when computer-controlled devices for 
cutting ogham stones become common, higher-level protocols will be 
necessary for proper placement of the glyphs.


-- 
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/





Re: Vertical scripts

2001-12-29 Thread Roozbeh Pournader

On Sat, 29 Dec 2001, Philipp Reichmuth wrote:

> Kind regards and happy new year to everyone (at least everyone
> following the Gregorian calendar, that is :-)

Oh, no. We use three calendars in Iran: Jalali, Hijri, and Gregorian. The
official one is Jalali, for some holidays we use Hijri dates, and we use
Gregorian just for international occasions like the International Workers'
Day.

roozbeh





Re: Vertical scripts

2001-12-29 Thread Philipp Reichmuth


Hi Stefan and others,

>> 1.  Horizontal, that is, left-to-right (LTR) versus right-to-left
>> (RTL).
>> 2.  Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
>> 3.  Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).

>> [...] An elaboration of the directional override mechanism to
>> handle vertical directionality would have to take priority into
>> account as well.  Instead of two directionalities, LTR and RTL, the
>> Unicode Standard would have to consider eight.

This is not quite sufficient, I'm afraid. An interesting approach is,
for example, taken in the Omega typesetting system, an extension of
TeX, where they deal with a total of 32 writing directions by
describing writing directions by three variables:

1. "Top" side of the page (one out of four)

2. "Left" side of the page (one out of two, since it *does* have to be
orthogonal to 1)

3. "Top" of character (one out of four). This is needed if you want to
embed, say, RTL and LTR text in the middle of a TTB paragraph, such as
embedding other scripts in Mongolian, or rotated plain script in
vertical CJK.

This gives you 32 combinations of what is written in which direction
and how glyphs have to be aligned. It's described in a paper at
http://omega.cse.unsw.edu.au:8080/papers/directions.pdf, but
be kind to the poor server :-) The most "common" direction
combinations given there are the following eight:

TLT - Left-right scripts, horizontal CJK
TRT - Right-left scripts
RTT - Vertical CJK, upright left-right scripts in vertical CJK
RTL - Mongolian in vertical CJK
RTR - Rotated left-right scripts in vertical CJK
LTL - Mongolian [This actually depends on the orientation of Mongolian
  glyphs]
LTR - Rotated left-right scripts in Mongolian
LTT - Vertical CJK in Mongolian
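
The count of 32 follows directly from the three variables above. In the sketch below, each direction is a three-letter code over the sides T, B, L, R, read as the three variables in the order listed; that reading of the letters is an assumption that matches the eight codes shown, and the helper names are made up.

```python
from itertools import product

SIDES = "TBLR"
VERTICAL_SIDES = set("TB")

def orthogonal(a: str, b: str) -> bool:
    """True if one side is top/bottom and the other is left/right."""
    return (a in VERTICAL_SIDES) != (b in VERTICAL_SIDES)

def omega_directions() -> list[str]:
    """All codes: 4 choices, then 2 orthogonal ones, then 4 for the glyph top."""
    return [p + l + c
            for p, l, c in product(SIDES, repeat=3)
            if orthogonal(p, l)]

COMMON = ["TLT", "TRT", "RTT", "RTL", "RTR", "LTL", "LTR", "LTT"]

if __name__ == "__main__":
    codes = omega_directions()
    print(len(codes))                          # 4 * 2 * 4
    print(all(c in codes for c in COMMON))     # the eight listed are among them
```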

Even this 32-direction system does not cover cases such as Mayan where
scripts follow the rebus principle, where text gets embedded inside
other glyphs or where text is written "boustrophedon" ("as the ox
plows", i.e. line by line in both directions). Actually, this is a DTP
issue. It has very little to do with the information content of the
text; rather, it's a presentation matter, which is why IMHO it
does not belong in Unicode, though that's open to discussion.
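
To show how little boustrophedon has to do with the stored text, here is a toy layout routine that leaves the character stream untouched and only changes how alternate lines are presented. It is a sketch with a fixed line width and a made-up function name, not a description of any real layout engine.

```python
def boustrophedon(text: str, width: int) -> list[str]:
    """Chop text into lines and reverse every second line for display."""
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    laid_out = []
    for n, line in enumerate(lines):
        if n % 2 == 0:
            laid_out.append(line)                      # left to right
        else:
            # Right to left: reversed, and starting at the right margin.
            laid_out.append(line[::-1].rjust(width))
    return laid_out

if __name__ == "__main__":
    for row in boustrophedon("ABCDEFGHIJ", 4):
        print(row)
```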

SP> As an alternative solution, the current switches could be considered LTR,
SP> TTB and RTL, TTB. Then 6 other code points would be necessary for the other
SP> directions.

As said, this is not quite enough. Admittedly, some directions are
rather arcane, but believe me, it will be possible to dig up some
ancient document or other which is written in any odd combination of
directions. If Unicode wants to achieve complete representation of any
kind of presentation of text direction here, it's going to be a pretty
rough job.

Kind regards and happy new year to everyone (at least everyone
following the Gregorian calendar, that is :-)

 Philipp    mailto:[EMAIL PROTECTED]
__
Out of memory / We wish to hold the whole sky / But we never will





Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Sampo Syreeni

On Sat, 29 Dec 2001 [EMAIL PROTECTED] wrote:

>Tex's example may or may not be realistic -- I have no way of knowing --
>but in suggesting a top-to-bottom directional override, I had hoped it
>would be possible to represent a run of text such as Tex describes
>without resorting to the infamous "higher protocol."

But it is. Unicode just does not take a stand on how it should be
formatted. See below.

>This may seem arbitrary to some; why should overrides of default
>horizontal directionality be a plain-text issue but overrides of default
>vertical directionality be a higher-level "formatting style" issue? I
>hope this discussion can shed some light on this question, and possibly
>help me see what I may be missing.

I think this has to do with the way people conceive the term "plaintext"
-- anything beyond a simple line (or column) based flow layout will likely
be thought of as "rich" instead. The reason is both historical and
practical. Text is laid out like this in most cultures, and early
printing/computer/typewriter technology followed suit. The matter of mixed
writing directions is a relatively new one, and so isn't really covered by
the concept of "plaintext".

The practical reason is that comprehensive layout of fully free direction
text is really difficult, if not impossible, whereas writing systems with
identical line progression directions are more or less compatible, using a
simplish algorithm (Unicode BiDi). If you look at the way text is normally
displayed on 2D media, it's printed in a unidirectional stream and then
chopped into lines at sheet edge. As long as the lines progress in the
same direction, you can always manipulate the order of the symbols within
the stream to get more or less correct display of mixed script
directionalities. (Yes, line breaking and deeply nested BiDi levels are
still troublesome.) This way, lr-tb is sorta compatible with rl-tb.
There are of course three more pairs, not counting boustrophedon and the
likes, but AFAIK this is the most common combination.
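
The "chop the stream into lines, then reorder within each line" idea can be sketched in a few lines. This is a deliberately crude toy and not the Unicode BiDi algorithm: it assumes uppercase letters stand for right-to-left characters (a convention chosen here for illustration), breaks at a fixed width, and has no notion of nesting levels or neutrals.

```python
from itertools import groupby

def is_rtl(ch: str) -> bool:
    """Toy convention: uppercase letters play the role of RTL characters."""
    return ch.isupper()

def visual_line(logical: str) -> str:
    """Reverse each RTL run so the line can be drawn left to right."""
    out = []
    for rtl, run in groupby(logical, key=is_rtl):
        chars = list(run)
        out.extend(reversed(chars) if rtl else chars)
    return "".join(out)

def layout(logical: str, width: int) -> list[str]:
    """Break the logical stream into lines first, then reorder each line."""
    lines = [logical[i:i + width] for i in range(0, len(logical), width)]
    return [visual_line(line) for line in lines]

if __name__ == "__main__":
    print(visual_line("abDEFgh"))
    print(layout("abcDEFgh", 4))   # the RTL run DEF is split across two lines
```

Because breaking happens before reordering, a right-to-left run that straddles a line edge is cut in logical order, which is part of why line breaking is troublesome even in this simple case.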

It's also where the ease stops. If you try to mix opposite line
progression directions, you will end up with something like the Unicode
BiDi algo, only applied at the paragraph level. That soon becomes
unreadable, and makes for really lousy APIs. (Even BiDi is difficult, as
one usually needs to render entire paragraphs at a time.) Mixing vertical
and horizontal writing modes is even more complicated since you cannot
think of the text as a directional, chopped-into-lines stream, anymore.
You *can* use all sorts of funky heuristics, but keeping the text both
readable and "plain" is pretty much impossible. (If you don't believe
that, think about how you would format a string of 1000 lr-tb, 100 tb-lr,
100 rl-bt and 1000 bt-lr characters. This is not a realistic example, of
course, but illustrates the general point.)

Now, there are many ways to cope with simplified variations of the theme.
One is to rotate nested characters of foreign directionality so that the
character progression direction for all the scripts present remains the
same, no matter what the script. E.g. XSL-FO documentation gives a number
of examples of this approach. Another is to force the character
progression direction to agree between scripts, without rotation. This
only works when characters are graphically separate, like they are in the
Latin script or scripts based on Han ideographs. Top-to-bottom Latin
within Japanese is a good example. (It also illustrates the effects on
readability of messing with the natural directionality of text.) You can
also print short spans of foreign text in its natural direction, within a
line of text of differing native directionality. Metric units, printed in
Latin within tb-rl traditional Japanese, are probably the most common
case. I'm sure that people on this list could cite countless weirder
examples.

The point is, all such solutions are for special cases. They do not solve
the problem of how to fit longer, nested spans with arbitrary
directionality on a page without in some cases making the text as a whole
illegible and/or unaesthetic. Hence, it's better to handle the special
cases as what they are, instead of bringing them all into Unicode and
forcing every Unicode compatible application to incorporate a full page
layout engine. I think this is the ultimate reason why TUS 3.0 leaves this
stuff to those "higher level protocols".

We might in fact say that the Unicode Standard has two completely separate
parts. The first is the logical encoding of any character based script as
a stream of character codes, the second is an actual 2D, line based
rendering of the encoding for the very special case where two scripts of
identical line progression direction are mixed. Anything beyond this could
well be said to be beyond the scope of TUS. We might indeed go as far as
to say that certain combinations of scripts which *can* be encoded in
Unicode, *cannot* actually be consistently rendered on 2D graphical media.

(Afte

Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Michael Everson

At 02:47 -0500 2001-12-29, [EMAIL PROTECTED] wrote:

>Actually, there is a more serious problem involved with vertical directional
>overrides: They would force the Unicode plain-text mechanism to become aware
>of both vertical directionality and directional priority.  This sounds
>obvious, but in fact there are not two, but THREE issues involved with text
>directionality:
>
>1.  Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
>2.  Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
>3.  Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).

There are more complex aspects of layout that might apply to Egyptian 
and Mayan.


> Ogham is either (LTR, TTB) or (BTT, ???).

When written in manuscripts and on computers, Ogham is written as 
Latin is. When inscribed on stone, it is written bottom-to-top, along 
the top of the stone, and then down to the bottom on the other side. 
I don't believe that there are any examples of multiple-line Ogham 
lapidary text. By analogy with the manuscript tradition, I would 
recommend (BTT, LTR) for Ogham vertical columnar display.

>Unicode characters have a default directionality, but both this and the
>override mechanism cover only the horizontal aspect, not the vertical aspect
>or the priority of one over the other.  Thus, Mongolian characters are
>assigned the same directionality code as Latin ("L") even though the TTB
>directionality takes precedence over the LTR, the opposite of Latin.

Not in mixed Latin/Mongolian text. Mongolians do interesting things 
too with Latin words in predominantly Mongolian text. But it seems 
that the whole thing is done by rotating the whole text field.

>And there is no plain-text way to indicate the alternative directionality of
>Ogham or Han.

I think it is a question of DTP layout for Ogham, at least.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Michael Everson

At 12:07 +0100 2001-12-29, Stefan Persson wrote:

>Someone said that Unicode contains switches for LTR & RTL. By adding
>switches for TTB and BTT this problem could be solved. It would also be
>necessary to define a priority order (i.e. which of them should come
>first).
>
>As an alternative solution, the current switches could be considered LTR,
>TTB and RTL, TTB. Then 6 other code points would be necessary for the other
>directions.

I can't imagine this working for Egyptian or Mayan, or indeed 
Mongolian or Ogham.

Mongolian and Ogham, when mixed with Latin text, are traditionally 
written LTR (sometimes Mongolian is RTL). It isn't normal or natural 
to do otherwise.

Egyptian LTR or RTL is not problematic. But for columnar display, in 
current applications, markup is used.

Mayan writes LTR or RTL in repeated columns of two, as I recall. I 
strongly suspect markup is required for this behaviour.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Stefan Persson

- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: den 29 december 2001 08:47
Subject: Re: Vertical scripts (was: Tategaki (was: Re: Updated...))


> 1.  Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
> 2.  Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
> 3.  Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).
> [...]
> An elaboration of the directional override mechanism to handle vertical
> directionality would have to take priority into account as well.  Instead of
> two directionalities, LTR and RTL, the Unicode Standard would have to
> consider eight.  The Bidirectional Algorithm might have to become
> Octodirectional, with a commensurate increase in complexity.  Perhaps this is
> the problem that is avoided by declaring vertical directionality to be a
> higher-level "formatting style" issue.  But it still seems arbitrary.

Someone said that Unicode contains switches for LTR & RTL. By adding
switches for TTB and BTT this problem could be solved. It would also be
necessary to define a priority order (i.e. which of them should come
first).

As an alternative solution, the current switches could be considered LTR,
TTB and RTL, TTB. Then 6 other code points would be necessary for the other
directions.







Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread Stefan Persson

- Original Message - 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: den 26 december 2001 06:48
Subject: Re: Vertical scripts (was: Tategaki (was: Re: Updated...))


> Seeing that Unicode already has left-to-right and right-to-left override 
> characters, I wonder if a top-to-bottom override character might also be 
> reasonable.

Which are the code points for these characters?







Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-29 Thread DougEwell2

Tex Texin replied to Marco Cimarosti:

>> Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text,
>> which can easily be mixed within the same paragraph.
>>
>> On the other hand, horizontal vs. vertical are attributes that can be only
>> be applied to a whole paragraph or section.
>
> Marco, is that true? I thought that sometimes numbers for example "123."
> might be written horizontally in the middle of a vertical run.

Marco responded:

> But that would be a limited case for horizontal text embedded in vertical text:
> I cannot imagine a real-world situation for a vertical text embedded in
> horizontal text.

And Sampo Syreeni weighed in:

> I think this is something better handled by special-casing in rendering
> software -- the numbers (and whatnot) could be rendered as rotated or
> straight top-to-bottom as well. Considering this, it seems like a
> stylistic variation better controlled by an upper level protocol, if at
> all.

Tex's example may or may not be realistic -- I have no way of knowing -- but 
in suggesting a top-to-bottom directional override, I had hoped it would be 
possible to represent a run of text such as Tex describes without resorting 
to the infamous "higher protocol."

TUS 3.0 states (p. 24): "In contrast to the bidirectional case, the choice to 
lay out text either vertically or horizontally is treated as a formatting 
style.  Therefore, the Unicode Standard does not provide directionality 
controls to specify that choice."  This may seem arbitrary to some; why 
should overrides of default horizontal directionality be a plain-text issue 
but overrides of default vertical directionality be a higher-level 
"formatting style" issue?  I hope this discussion can shed some light on this 
question, and possibly help me see what I may be missing.

Actually, there is a more serious problem involved with vertical directional 
overrides: They would force the Unicode plain-text mechanism to become aware 
of both vertical directionality and directional priority.  This sounds 
obvious, but in fact there are not two, but THREE issues involved with text 
directionality:

1.  Horizontal, that is, left-to-right (LTR) versus right-to-left (RTL).
2.  Vertical, that is, top-to-bottom (TTB) versus bottom-to-top (BTT).
3.  Priority of direction (e.g. (LTR, TTB) versus (TTB, LTR)).

If you think about it, all text of non-trivial length has both horizontal and 
vertical directionality, and also a priority to the directionality.  
Horizontal and vertical directionalities are not opposites, they are 
complements.  The Latin script is written (LTR, TTB) which means not only 
that there is a horizontal directionality of left-to-right and a vertical 
directionality of top-to-bottom, but also that the horizontal directionality 
takes precedence over the vertical.  That is, we complete a horizontal (LTR) 
line before moving down the page (TTB) to start another line.
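
To make the "direction plus priority" idea concrete, here is a small Python sketch that places a character stream on a grid given a primary direction (the one that is completed first) and a secondary direction (the one in which lines progress). The function name `layout` and the fixed `line_len` parameter are assumptions made for this example; real layout involves line breaking, glyph metrics and much more.

```python
def layout(text, primary, secondary, line_len):
    """Place text on a character grid.

    primary   -- direction of character progression: LTR, RTL, TTB or BTT
    secondary -- direction of line progression (must be on the other axis)
    line_len  -- number of characters per line (or per column)
    Returns the grid as a list of strings, top row first.
    """
    lines = [text[i:i + line_len] for i in range(0, len(text), line_len)]
    horizontal_first = primary in ("LTR", "RTL")
    if horizontal_first:
        width, height = line_len, len(lines)
    else:
        width, height = len(lines), line_len
    grid = [[" "] * width for _ in range(height)]
    for li, line in enumerate(lines):
        for ci, ch in enumerate(line):
            if horizontal_first:
                x = ci if primary == "LTR" else width - 1 - ci
                y = li if secondary == "TTB" else height - 1 - li
            else:
                y = ci if primary == "TTB" else height - 1 - ci
                x = li if secondary == "LTR" else width - 1 - li
            grid[y][x] = ch
    return ["".join(row) for row in grid]


for mode in (("LTR", "TTB"), ("TTB", "RTL"), ("TTB", "LTR")):
    print(mode)
    print("\n".join(layout("abcdef", *mode, line_len=3)))
```

The same six characters come out as two rows for (LTR, TTB) but as two columns for (TTB, RTL) and (TTB, LTR), with the column order flipped between the latter two.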

According to TUS 3.0,
Latin and most other European scripts are (LTR, TTB).
Arabic and most other Middle Eastern scripts are (RTL, TTB).
Ogham is either (LTR, TTB) or (BTT, ???).
Han is traditionally written (TTB, RTL) and more recently (LTR, TTB).
Mongolian is written (TTB, LTR).

Unicode characters have a default directionality, but both this and the 
override mechanism cover only the horizontal aspect, not the vertical aspect 
or the priority of one over the other.  Thus, Mongolian characters are 
assigned the same directionality code as Latin ("L") even though the TTB 
directionality takes precedence over the LTR, the opposite of Latin.  And 
there is no plain-text way to indicate the alternative directionality of 
Ogham or Han.
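
The point about default directionality can be checked against the Unicode Character Database, for instance with Python's standard `unicodedata` module. The sample characters below are arbitrary picks, one per script, and the values reflect whatever Unicode version the Python build ships, not necessarily Unicode 3.0.

```python
import unicodedata

# One arbitrary representative letter per script.
samples = {
    "Latin": "A",           # U+0041 LATIN CAPITAL LETTER A
    "Arabic": "\u0627",     # U+0627 ARABIC LETTER ALEF
    "Ogham": "\u1681",      # U+1681 OGHAM LETTER BEITH
    "Han": "\u4e00",        # U+4E00, the ideograph for "one"
    "Mongolian": "\u1820",  # U+1820 MONGOLIAN LETTER A
}

# The bidirectional class says nothing about vertical layout or priority.
bidi_class = {
    script: unicodedata.bidirectional(ch) for script, ch in samples.items()
}

for script, cls in bidi_class.items():
    print(f"{script:10} {cls}")
```

Mongolian and Latin both report the strong left-to-right class, which is exactly the loss of information described above.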

An elaboration of the directional override mechanism to handle vertical 
directionality would have to take priority into account as well.  Instead of 
two directionalities, LTR and RTL, the Unicode Standard would have to 
consider eight.  The Bidirectional Algorithm might have to become 
Octodirectional, with a commensurate increase in complexity.  Perhaps this is 
the problem that is avoided by declaring vertical directionality to be a 
higher-level "formatting style" issue.  But it still seems arbitrary.
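
The count of eight follows from two horizontal choices, two vertical choices, and two possible priorities. A short Python enumeration, using the (primary, secondary) notation from this message (the variable names are just for this sketch):

```python
from itertools import product

HORIZONTAL = ("LTR", "RTL")
VERTICAL = ("TTB", "BTT")

# Horizontal direction takes priority: lines are rows.
rows_first = [(h, v) for h, v in product(HORIZONTAL, VERTICAL)]
# Vertical direction takes priority: lines are columns.
columns_first = [(v, h) for h, v in product(HORIZONTAL, VERTICAL)]

directionalities = rows_first + columns_first

for primary, secondary in directionalities:
    print(f"({primary}, {secondary})")
print(len(directionalities), "combinations")
```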

-Doug Ewell
 Fullerton, California




RE: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-28 Thread Sampo Syreeni

On Fri, 28 Dec 2001, Marco Cimarosti wrote:

>>I thought that sometimes numbers for example "123." might be written
>>horizontally in the middle of a vertical run.
>>   y
>>   a
>>   d
>>   d
>>   a
>>  123.
>
>That's true: an extra complication! However, I have only seen that for
>one- or two-digit numbers.

I think this is something better handled by special-casing in rendering
software -- the numbers (and whatnot) could be rendered as rotated or
straight top-to-bottom as well. Considering this, it seems like a
stylistic variation better controlled by an upper level protocol, if at
all.

>But that would be a limited case for horizontal text embedded in vertical
>text: I cannot imagine a real-world situation for a vertical text
>embedded in horizontal text.

If you think about the history of this particular rendering, it's about
the Western/Arabic numbers intruding into the East Asian writing system. If
there's anything to believe in cyberpunk, the tide might well turn one
day. I'm not quite sure we couldn't one day have residual English embedded
with native Japanese terms. ;)

Sampo Syreeni, aka decoy - mailto:[EMAIL PROTECTED], tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2





RE: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-28 Thread Marco Cimarosti

Tex Texin wrote:
> > 
> > On the other hand, horizontal vs. vertical are attributes that can be only
> > be applied to a whole paragraph or section.
> 
> Marco, is that true? I thought that sometimes numbers for example "123."
> might be written horizontally in the middle of a vertical run.
>    y
>    a
>    d
>    d
>    a
>   123.

That's true: an extra complication! However, I have only seen that for one-
or two-digit numbers.

This is also used for single letters or two-letter acronyms (such as "Km").
Probably this is the reason for the "squared letters" in range
U+3380..U+33DD (some of which are 3-letter long, BTW).
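
For reference, the squared forms carry compatibility decompositions back to ordinary letters, which can be inspected with Python's standard `unicodedata` module. This is only a sketch; `expand_square` is a made-up helper name, and the results depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def expand_square(ch):
    # NFKC applies the compatibility decomposition, e.g. SQUARE KM -> "km".
    return unicodedata.normalize("NFKC", ch)


square_km = "\u339e"
print(unicodedata.name(square_km), "->", expand_square(square_km))

# Length of the expansion for every <square> character in the range.
lengths = {}
for cp in range(0x3380, 0x33DE):
    ch = chr(cp)
    if unicodedata.decomposition(ch).startswith("<square>"):
        lengths[ch] = len(expand_square(ch))

print("longest expansion:", max(lengths.values()), "characters")
```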

But that would be a limited case for horizontal text embedded in vertical text:
I cannot imagine a real-world situation for a vertical text embedded in
horizontal text.

_ Marco




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-28 Thread Tex Texin

Marco Cimarosti wrote:
> I see a big difference between the two cases.
> 
> Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text,
> which can easily be mixed within the same paragraph.
> 
> On the other hand, horizontal vs. vertical are attributes that can be only
> be applied to a whole paragraph or section.

Marco, is that true? I thought that sometimes numbers for example "123."
might be written horizontally in the middle of a vertical run.
   y
   a
   d
   d
   a
  123.
   y
   a
   ...

tex





-- 
-
Tex Texin                    Director, International Business
mailto:[EMAIL PROTECTED]     Tel: +1-781-280-4271
the Progress Company         Fax: +1-781-280-4655
-
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html




RE: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-28 Thread Marco Cimarosti

Doug Ewell wrote:
> > Unicode doesn't have some way to indicate vertical writing. I think the
> > only consideration for it is vertical presentation forms of some
> > characters. Anything more is left for other software layers to deal
> > with.
> 
> Seeing that Unicode already has left-to-right and right-to-left override
> characters, I wonder if a top-to-bottom override character might also be
> reasonable.

I see a big difference between the two cases.

Right-to-left vs. left-to-right are attributes of arbitrary *spans* of text,
which can easily be mixed within the same paragraph.

On the other hand, horizontal vs. vertical are attributes that can be only
be applied to a whole paragraph or section.

So, a hypothetical pair (start/end) of top-to-bottom override characters
should probably also act as paragraph separators.


I wish a decent 2002 to everybody (as wishing more than "decent" would be
quite unrealistic).

_ Marco






Ruby (was: Re: Vertical scripts)

2001-12-26 Thread Martin Duerst

At 17:30 01/12/25 -0800, Michael (michka) Kaplan wrote:
>From: "Αλέξανδρος Διαμαντίδης" <[EMAIL PROTECTED]>
>
> > By the way, does any browser in common use
> > support the Ruby extensions to HTML?

The 'ruby extensions for HTML' are defined in
http://www.w3.org/TR/ruby/, a W3C recommendation.


>Well, looking at links like:
>
>http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/rt.asp
>
>(all on one line) and just doing a random search on
>http://msdn.microsoft.com/ for keywords like "HTML Ruby" make me think it's
>supported in IE5 and later?

As far as I know, IE5 and later support the 'simple ruby markup'
(including parentheses) defined in the Recommendation
(see http://www.w3.org/TR/ruby/#simple-ruby1), but do
not support complex ruby markup.
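
For readers who have not seen it, simple ruby markup with fallback parentheses wraps the base text in `rb`, the annotation in `rt`, and the parentheses in `rp`. The Python helper below is a hypothetical generator written for this illustration; only the element names come from the Recommendation.

```python
from html import escape


def simple_ruby(base, reading):
    """Return simple ruby markup with <rp> fallback parentheses.

    Browsers that understand ruby hide the <rp> content; others show
    the reading in parentheses after the base text.
    """
    return (
        f"<ruby><rb>{escape(base)}</rb>"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp></ruby>"
    )


print(simple_ruby("WWW", "World Wide Web"))
```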
[I'd be very glad to be corrected.]

Regards,   Martin.


Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-25 Thread DougEwell2

In a message dated 2001-12-25 16:57:39 Pacific Standard Time, 
[EMAIL PROTECTED] writes:

> Unicode doesn't have some way to indicate vertical writing. I think the
> only consideration for it is vertical presentation forms of some
> characters. Anything more is left for other software layers to deal
> with.

Seeing that Unicode already has left-to-right and right-to-left override 
characters, I wonder if a top-to-bottom override character might also be 
reasonable.
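
For reference, the existing horizontal overrides and their companion embedding controls sit at U+202A..U+202E. A quick way to list them, sketched with Python's standard `unicodedata` module (the `controls` dictionary is just a convenience for this example):

```python
import unicodedata

# Explicit directional formatting characters, U+202A through U+202E.
controls = {}
for cp in range(0x202A, 0x202F):
    ch = chr(cp)
    name = unicodedata.name(ch)
    controls[name] = cp
    print(f"U+{cp:04X}  {name}  (bidi class {unicodedata.bidirectional(ch)})")
```

All of them act on the horizontal axis only; nothing in this block says anything about top-to-bottom or bottom-to-top.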

-Doug Ewell
 Fullerton, California




Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-25 Thread Michael \(michka\) Kaplan

From: "Αλέξανδρος Διαμαντίδης" <[EMAIL PROTECTED]>

> By the way, does any browser in common use
> support the Ruby extensions to HTML?

Well, looking at links like:

http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/rt.asp

(all on one line) and just doing a random search on
http://msdn.microsoft.com/ for keywords like "HTML Ruby" make me think it's
supported in IE5 and later?


MichKa

Michael Kaplan
Trigeminal Software, Inc.  -- http://www.trigeminal.com/





Re: Vertical scripts (was: Tategaki (was: Re: Updated...))

2001-12-25 Thread Αλέξανδρος Διαμαντίδης

* Stefan Persson <[EMAIL PROTECTED]> [2001-12-26 00:02]:
> Is there some way to indicate vertical writing (in columns from right to
> left) for Japanese and Chinese? Is there a Unicode code point assigned for
this, an HTML command, or just a special option in some word processors?

Well, some word processors and typesetting systems do support vertical
writing. It's probably more common in software oriented towards
Chinese and Japanese, but I can't help you there. I do know that
the Omega typesetting system supports vertical writing. Omega is
based on TeX but with many extensions and some changes, and uses
Unicode as its internal text encoding.

Unicode doesn't have some way to indicate vertical writing. I think the
only consideration for it is vertical presentation forms of some
characters. Anything more is left for other software layers to deal
with.

As for HTML, I don't know (I'm sure someone will fill us in) but even if
some mechanism for vertical writing is defined, I don't think any
current browser supports it. By the way, does any browser in common use
support the Ruby extensions to HTML?

While doing a web search for the word "tategaki", looking for its
meaning, I found a Java program that formats Japanese text for vertical
display using HTML tables with a cell for each character. It's here:

http://homepage.mac.com/kkonaka/TategakiProg.html

This is kind of a kludge, but it may be useful in some circumstances.
The author warns though:

> (this generates far many cells in a table commonly observed in normal
> web pages). - many browser cannot display text layout this way of more
> than a few pages... (they'd run out of memory).
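
The table-per-character idea can be sketched in a few lines of Python. This is not the linked Java program's code, just an independent illustration of the same kludge; the function name `tategaki_table` and the fixed `column_height` are assumptions made here.

```python
from html import escape


def tategaki_table(text, column_height):
    """Lay text out top-to-bottom in columns that progress right-to-left,
    using one HTML table cell per character."""
    columns = [
        text[i:i + column_height] for i in range(0, len(text), column_height)
    ]
    columns.reverse()  # the first column of text goes on the far right
    rows = []
    for r in range(column_height):
        cells = []
        for col in columns:
            ch = col[r] if r < len(col) else ""
            cells.append(f"<td>{escape(ch)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(tategaki_table("abcde", 2))
```

The cell count grows with the length of the text, which is consistent with the memory warning quoted above.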


-- 
Αλέξανδρος Διαμαντίδης * [EMAIL PROTECTED]