Re: TR35 (was: Standardize TimeZone ID)

2004-05-10 Thread Peter Kirk
On 07/05/2004 14:53, [EMAIL PROTECTED] wrote:

...

So the database aliases one to the other. Aliases are used for timezones
that are completely equivalent on the whole timeframe considered
(apparently only starting in the early years of last century).
   

The cutoff date is 1970-01-01; if two timezones have been the same ever since
then, they are not separately encoded *unless* they are in separate national
jurisdictions (because after all it is the nation-state which sets up the
rules).  This date is the Posix zero point.
 

It is not always the nation-state which sets the rules. For example, in 
Australia each state sets its own rules; and so there are six different 
schemes with half hour differences, some daylight saving and some 
without. It is not only possible but quite likely that new distinctions 
will be introduced in time zones which have been the same since 1970; 
e.g. very likely New South Wales and Victoria have been in the same time 
zone ever since then, but there is a real chance that NSW will abolish 
daylight saving but Victoria will not. So don't assume too quickly that 
time zones will not be split.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Phoenician

2004-05-10 Thread Peter Kirk
On 09/05/2004 01:05, Peter Constable wrote:

I think one's track record in making judgments on boundary cases is
established only after having successfully dealt with boundary cases --
and enough to establish a level of confidence. Of things already in
Unicode, what have been boundary cases between unification and
de-unification?
The unified Latin-but-not-Cyrillic w & q (if I've recalled the two
letters correctly) and Coptic/Greek characters are the only prior
boundary cases I can think of.
Peter

 

And these two cases are hardly a good advertisement for the expert's 
reputation. The Coptic/Greek unification proved to be ill-advised and is 
being undone. As for the unified W and Q, well, I guess that if the 
Kurds and others who use these letters in Cyrillic knew how this 
decision would mean that their alphabet will never be sorted correctly 
(unless they get round to tailoring their collations), they would make a 
strongly argued case for disunification. Well, perhaps the expert can 
feel how much his fingers have been burned by over-unification and so is 
now pressing for everything to be disunified.

And then there is the matter of CJK unification, which I gather is still 
rather contentious.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: TR35

2004-05-10 Thread Peter Kirk
On 07/05/2004 09:44, Carl W. Brown wrote:

...

If I live in Guam I will probably be using an en_US locale.  However, the US territory does not contain my time zone.  Probably the best solution for this problem is to add a category of possessions to the territory information.  This allows applications to enumerate available time zones for not only the country itself but also its possessions that might be using the locale.  
 

This issue is not limited to a country's possessions. Many expatriates 
and travelling business people etc want to keep their (laptop) 
computer's general locale settings as that of their home country (not 
least because changing this often destabilises data) but need to set it 
to the time zone in which they are temporarily resident. So time zones 
should be kept independent of other locale information, especially 
independent of such things as date and decimal point formats, and 
preferred languages.
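
A minimal sketch of that separation, assuming a hypothetical settings record in which the formatting locale and the time zone are independent fields. The names `UserSettings`, `DATE_FORMATS` and `local_time` are invented for illustration, and fixed UTC offsets stand in for real zone rules (no daylight saving is modelled):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical per-locale date layouts; a real system would take these from locale data.
DATE_FORMATS = {
    "en_US": "%m/%d/%Y %H:%M",
    "en_GB": "%d/%m/%Y %H:%M",
}

@dataclass(frozen=True)
class UserSettings:
    """Formatting locale and time zone are stored separately."""
    locale_id: str   # drives date and number formats only
    zone: timezone   # where the user currently is

# Assumption: fixed offsets, so daylight saving rules are ignored.
GUAM = timezone(timedelta(hours=10), "ChST")
US_EASTERN_STD = timezone(timedelta(hours=-5), "EST")

def local_time(settings: UserSettings, instant: datetime) -> str:
    """Render an instant in the user's zone, laid out per the user's locale."""
    local = instant.astimezone(settings.zone)
    return local.strftime(DATE_FORMATS[settings.locale_id]) + " " + local.tzname()

instant = datetime(2004, 5, 10, 12, 0, tzinfo=timezone.utc)
at_home = UserSettings("en_US", US_EASTERN_STD)
in_guam = UserSettings("en_US", GUAM)   # same locale, different zone

print(local_time(at_home, instant))  # 05/10/2004 07:00 EST
print(local_time(in_guam, instant))  # 05/10/2004 22:00 ChST
```

Changing the zone leaves the date layout untouched, which is the property the traveller with a laptop wants.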

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Phoenician

2004-05-10 Thread Peter Kirk
On 07/05/2004 15:59, Michael Everson wrote:

At 17:10 -0400 2004-05-07, [EMAIL PROTECTED] wrote:

This would only be the *default* rules.  Unicode-savvy sort programs can
accept tailorings that make the rules different, like the Swedish 
tailoring
that makes a-ring, a-umlaut, and o-umlaut sort after z instead of in 
their
default places with a and o.


As I said, they would be the *tailored* rules. Mixing scripts would go 
against the current practice of ISO/IEC 14651.


Well, we are not talking about ISO/IEC 14651 but about Unicode. Is there 
any really good reason not to mix two scripts, which are according to 
many people actually variants of one script but which are (if your 
proposal is accepted) separately encoded for the convenience of some 
scholars? This sounds to me like the kind of rule which is made to be 
broken. If all the 22 CSWA scripts are collated together by default, 
this would significantly reduce the objections to encoding them as 
separate scripts. We can perhaps consider them as a family of congruent 
scripts. Of course we might then think that there are other such 
families, e.g. the different Indic scripts, but how to collate them 
should depend on Indian etc custom.
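
To make the default-versus-tailored distinction concrete, here is a rough sketch of the Swedish case quoted above. It is not the Unicode Collation Algorithm: `default_key` merely approximates default behaviour by sorting accented letters with their base letters, and `swedish_key` is a hand-written tailoring that places a-ring, a-umlaut and o-umlaut after z. Both function names are invented for this example.

```python
import unicodedata

def default_key(word: str) -> tuple:
    """Approximate default ordering: accented letters sort with their base letter."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, word)  # the original word breaks ties

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word: str) -> list:
    """Tailored ordering: å, ä, ö are separate letters placed after z."""
    composed = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet sort after everything else in this sketch.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK)) for ch in composed]

words = ["zebra", "äpple", "apa", "öra", "orm", "år"]
print(sorted(words, key=default_key))  # ['apa', 'äpple', 'år', 'öra', 'orm', 'zebra']
print(sorted(words, key=swedish_key))  # ['apa', 'orm', 'zebra', 'år', 'äpple', 'öra']
```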

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Everson-bashing (was: Phoenician)

2004-05-10 Thread John Cowan
Peter Kirk scripsit:

 But have the others agreed with his judgments because they are convinced 
 of their correctness? Or is it more that the others have trusted the 
 judgments of the one they consider to be an expert, and have either not 
 dared to stand up to him or have simply been unqualified to do so?  

This is laughable.

 It amazes me that all of the existing scripts have apparently been encoded 
 without any properly documented justification apart from one expert's 
 unchallenged judgments.

It would be amazing if it were true, but of course it's absolutely false.

 And these two cases are hardly a good advertisement for the expert's
 reputation. The Coptic/Greek unification proved to be ill-advised and is
 being undone. As for the unified W and Q, well, I guess that if the
 Kurds and others who use these letters in Cyrillic knew how this
 decision would mean that their alphabet will never be sorted correctly
 (unless they get round to tailoring their collations), they would make a
 strongly argued case for disunification. 

Nobody writes Kurdish in Cyrillic any more: it's a historic use of the
script only.

In any event, Michael had *nothing* to do with those unifications.
He has consistently pressed for disunification (rightly, IMHO).

 Well, perhaps the expert can
 feel how much his fingers have been burned by over-unification and so is
 now pressing for everything to be disunified.

Nonsense, and insulting nonsense to boot.  Michael has never pressed
for either total unification or total disunification, because both
positions are absurd, and his position is never absurd.  (I may
disagree with it from time to time, and I am willing to press him for
reasons, but I *always* respect his point of view.)

This verbal sniping on a subject (the history of character encoding)
you know nothing about is beneath you.  Try and do better.

 And then there is the matter of CJK unification, which I gather is still
 rather contentious.

Only among the invincibly ignorant.

-- 
John Cowan   [EMAIL PROTECTED]   http://www.ccil.org/~cowan
One time I called in to the central system and started working on a big
thick 'sed' and 'awk' heavy duty data bashing script.  One of the geologists
came by, looked over my shoulder and said 'Oh, that happens to me too.
Try hanging up and phoning in again.'  --Beverly Erlebacher



Re: Phoenician

2004-05-10 Thread Doug Ewell
Peter Kirk peterkirk at qaya dot org wrote:

 And these two cases are hardly a good advertisement for the expert's
 reputation. The Coptic/Greek unification proved to be ill-advised and
 is being undone. As for the unified W and Q, well, I guess that if the
 Kurds and others who use these letters in Cyrillic knew how this
 decision would mean that their alphabet will never be sorted correctly
 (unless they get round to tailoring their collations), they would make
 a strongly argued case for disunification. Well, perhaps the expert
 can feel how much his fingers have been burned by over-unification and
 so is now pressing for everything to be disunified.

I can't believe I am reading this.  Far more than anyone else, Michael
has *always* supported the disunification of Coptic from Greek and of
Kurdish Cyrillic Q and W from their Latin counterparts.  They have been
two of his signature causes through the years.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/




Re: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Michael Everson
At 11:45 +0200 2004-05-10, Kent Karlsson wrote:
 
 We do actually mix scripts. Hiragana and Katakana are interleaved.

 Mark
 And it might make sense to interleave (say) Thai and Lao in the
 default ordering.

No, it wouldn't.

 Or to interleave, in the default ordering, the Indic scripts covered by ISCII.

No, it wouldn't!

 Any peculiarities could be handled in tailorings.

Such interleaving is the peculiarity. It renders an ordered text 
illegible to interleave Kannada, Sinhala, and Gujarati. Japanese is 
different; the users all use both scripts all the time.
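
For readers unfamiliar with what interleaving Hiragana and Katakana means in practice, a small sketch follows. It relies on the fact that Katakana U+30A1..U+30F6 sit exactly 0x60 above their Hiragana counterparts; the key function itself (`interleaved_key`) is an illustration only, not the actual default collation table.

```python
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6
KANA_OFFSET = 0x60  # distance between a Katakana letter and its Hiragana counterpart

def fold_kana(ch: str) -> str:
    """Map a Katakana letter to the corresponding Hiragana letter."""
    cp = ord(ch)
    if KATAKANA_FIRST <= cp <= KATAKANA_LAST:
        return chr(cp - KANA_OFFSET)
    return ch

def interleaved_key(word: str) -> tuple:
    """Primary key ignores the Hiragana/Katakana distinction; the word itself breaks ties."""
    return ("".join(fold_kana(c) for c in word), word)

words = ["カメラ", "かさ", "サクラ", "さくら", "あめ", "アイス"]

# Plain code point order keeps the two scripts in separate blocks.
print(sorted(words))
# ['あめ', 'かさ', 'さくら', 'アイス', 'カメラ', 'サクラ']

# Interleaved order files each word by its sound, whichever script it uses.
print(sorted(words, key=interleaved_key))
# ['アイス', 'あめ', 'かさ', 'カメラ', 'さくら', 'サクラ']
```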
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Philippe Verdy
From: Michael Everson [EMAIL PROTECTED]
 Japanese is different; the users all use both scripts all the time.

And there are occurrences in Japanese of Katakana suffixes or particles added to
Latin or Han words, notably to people's names and trademarks... I've seen many
texts where Han and Katakana are mixed in the same word (where it would be
inappropriate to insert a word break between runs of Han and Katakana
particles).

My first implementation allowed line breaks after each Han character, but an
exception was made after users requested that it not break after Han and before
Katakana (even though a line break is allowed between two Han characters), or
between Latin and Katakana. So a simple approach that allows line breaks between
distinct scripts is deceptive. Am I wrong, or are my users wrong and want it as
a presentation preference?

Also, what about line breaking in long runs of Hangul grapheme clusters (I mean
here the true L+V*T* syllables with their diacritics, not the simplified LV and
LVT sub-syllables encoded in Hangul)? It seems that line breaking in Korean
obeys semantic constraints more than normative syllables, and I think that
is quite logical when you see that such presentation is sometimes preferred by
Latin readers too...

To make this work appropriately for some long Japanese or Korean sentences, and
to match users' expectations, I had to support explicit marks where
line breaks should be allowed, using zero-width spaces. This makes things
complicated if the text has not been marked up with them. So I had to consider
ideographic (full-width) punctuation too, which is not directly equivalent to
its half-width Latin counterpart, as it already includes the space after
it (for example the full-width period/dot, comma or colon) even if the glyph
looks a bit larger.
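
The zero-width-space approach described above can be sketched as follows. This is an illustration only, not the implementation referred to in the message: `wrap_at_marks` is an invented name, line width is counted in characters rather than in display columns, and a chunk longer than the width is simply allowed to overflow.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, used here as an explicit break opportunity

def wrap_at_marks(text: str, width: int) -> list:
    """Greedy wrap that breaks only where the author placed a ZWSP."""
    lines, current = [], ""
    for chunk in text.split(ZWSP):
        if current and len(current) + len(chunk) > width:
            lines.append(current)   # the chunk does not fit: start a new line
            current = chunk
        else:
            current += chunk        # the chunk fits: keep it on this line
    if current:
        lines.append(current)
    return lines

marked = "東京ガスの" + ZWSP + "新しい" + ZWSP + "サービス"
print(wrap_at_marks(marked, 6))  # ['東京ガスの', '新しい', 'サービス']
print(wrap_at_marks(marked, 8))  # ['東京ガスの新しい', 'サービス']
```

Text without any marks comes back as a single line, which is the complication mentioned above.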




Re: Katakana_Or_Hiragana

2004-05-10 Thread jcowan
Tom Emerson scripsit:

 Perhaps Michael can enlighten us on the rationale for grouping hiragana
 and katakana together as a single script.

They aren't.  They are collated together, that's all.

-- 
How they ever reached any conclusion at all     [EMAIL PROTECTED]
is starkly unknowable to the human mind.        http://www.reutershealth.com
--Backstage Lensman, Randall Garrett            http://www.ccil.org/~cowan



Re: Katakana_Or_Hiragana

2004-05-10 Thread jcowan
Michael Everson scripsit:

 Phoenician and Hebrew should not be interfiled, of course, in the 
 default table, though John Cowan seems to think otherwise.

'Seems', monsieur?  Nay, 'does'; I know not 'seems'.
--Not Quite Hamlet

The point is, of course, that if Phoenician is to be used to represent
palaeo-Hebrew (as I agree is correct), then it will create an artificial
separation to *not* interfile them.  Consider a concordance to your
Phoenician-script-Tetragrammaton Bibles.  Such Tetras should not appear
at the beginning, nor yet at the end, but under yod where they belong.
This will also be of great value in the other application of collation,
viz. searching.

Those who use Phoenician primarily contrastively with Greek will want them
filed separately, and my proposal will sort Phoenician words after Greek
ones.
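
A sketch of what interfiling could look like as a sort key. The assumptions are loud ones: it uses the code points at which Phoenician was eventually encoded (U+10900..U+10915, after this thread was written), it pairs the 22 Phoenician letters one-to-one with the 22 non-final Hebrew letters, and it folds Hebrew final forms onto their base letters. The function name `interfile_key` is invented; none of this is the actual proposal's collation table.

```python
HEBREW_FINALS = {0x05DA: 0x05DB, 0x05DD: 0x05DE, 0x05DF: 0x05E0,
                 0x05E3: 0x05E4, 0x05E5: 0x05E6}   # final form -> base letter
HEBREW_22 = [cp for cp in range(0x05D0, 0x05EB) if cp not in HEBREW_FINALS]
PHOENICIAN_FIRST = 0x10900
PHOENICIAN_TO_HEBREW = {PHOENICIAN_FIRST + i: cp for i, cp in enumerate(HEBREW_22)}

def interfile_key(word: str) -> tuple:
    """File Phoenician letters under the corresponding Hebrew letter."""
    primary = []
    for ch in word:
        cp = PHOENICIAN_TO_HEBREW.get(ord(ch), ord(ch))
        primary.append(HEBREW_FINALS.get(cp, cp))
    return (primary, word)  # the original spelling breaks ties

def from_cps(*cps: int) -> str:
    return "".join(chr(cp) for cp in cps)

alef_bet = from_cps(0x05D0, 0x05D1)
yod_dalet = from_cps(0x05D9, 0x05D3)
tav_vav = from_cps(0x05EA, 0x05D5)
tetragrammaton = from_cps(0x10909, 0x10904, 0x10905, 0x10904)  # yod he wau he

entries = [tav_vav, tetragrammaton, alef_bet, yod_dalet]
plain = sorted(entries)
interfiled = sorted(entries, key=interfile_key)

print(plain.index(tetragrammaton))       # 3: filed at the very end
print(interfiled.index(tetragrammaton))  # 2: filed under yod, before tav
```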

-- 
John Cowan  [EMAIL PROTECTED]
http://www.ccil.org/~cowan  http://www.reutershealth.com
Charles li reis, nostre emperesdre magnes,
Set anz totz pleinz ad ested in Espagnes.



Who's Harry Potter, err..., Potter Stewart? (was Re: Phoenician)

2004-05-10 Thread Kenneth Whistler

  Who's Potter Stewart?  (I don't own a TV).  -- Elaine

 A former Associate Justice of the U.S. Supreme Court, who memorably
 declared in a 1964 concurring opinion that he could not define
 pornography, but he knew it when he saw it (and the movie in
 question wasn't it).

The movie in question was Les Amants.

Jacobellis v. Ohio, 378 U.S. 184 (1964)

Read it here:

http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=us&vol=378&invol=184

I shall not today attempt further to define the kinds of material
to be embraced within that shorthand description [hard-core
pornography]; and perhaps I could never succeed in intelligibly
doing so. But I know it when I see it, and the motion picture
involved in this case is not that.

Eminently sensible of him, by the way.

And that, folks, is about as OT as we get on this list. :-)

--Ken




RE: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Mike Ayers
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of Philippe Verdy
 Sent: Monday, May 10, 2004 9:09 AM


 From: Michael Everson [EMAIL PROTECTED]
  Japanese is different; the users all use both scripts all the time.
 
 And there are occurrences in Japanese of Katakana suffixes or particles
 added to Latin or Han words, notably to people's names and trademarks...
 I've seen many texts where Han and Katakana are mixed in the same word
 (where it would be inappropriate to insert a word break between runs of
 Han and Katakana particles).


 You mean hiragana, not katakana, and kanji, not Han, I believe. Katakana are used for transliteration, and are not typically joined to kanji, whereas hiragana are ubiquitously joined to kanji, as Japanese particles do not ordinarily have kanji representation. I have not seen katakana joined to kanji (or romaji), and suspect that such does not occur.

 My first implementation allowed line breaks after each Han character,
 but an exception was made after users requested that it not break after
 Han and before Katakana (even though a line break is allowed between two
 Han characters), or between Latin and Katakana. So a simple approach that
 allows line breaks between distinct scripts is deceptive. Am I wrong, or
 are my users wrong and want it as a presentation preference?


 I believe, but am not certain, that nonbreaking kanji-to-hiragana is correct, whereas you can break on kanji-to-katakana.

 But all this leads me to finally ask: what does script mean? It seems clear to me that although the term has been used throughout the Phoenician debate, not everyone is using it the same way. I know that there is a definition of script that is used for encoding purposes, but can I find it written anywhere, or is it more of an ephemeral thing?


 Thanks,


/|/|ike





RE: Katakana_Or_Hiragana

2004-05-10 Thread Mike Ayers
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
 Behalf Of [EMAIL PROTECTED]
 Sent: Monday, May 10, 2004 10:22 AM


 Tom Emerson scripsit:
 
  Perhaps Michael can enlighten us on the rationale for grouping hiragana
  and katakana together as a single script.
 
 They aren't. They are collated together, that's all.


 I guess it depends on how you look at it. The Japanese refer to kana script, which encompasses both hiragana and katakana, so it could be said that the single scripts hiragana and katakana are encoded, whereas the single script kana is collated. This would not be evasive, either, as this is how they are used.


/|/|ike





RE: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Tom Emerson
Mike Ayers writes:
   You mean hiragana, not katakana, and kanji, not Han, I believe.
 Katakana are used for transliteration, and are not typically joined to
 kanji, whereas hiragana are ubiquitously joined to kanji, as Japanese
 particles do not ordinarily have kanji representation.  I have not seen
 katakana joined to kanji (or romaji), and suspect that such does not occur.

We have observed that katakana is being used more and more in places
that you traditionally saw hiragana, especially in advertisements and
on the Web. Katakana is also being used as a way of emphasizing words
in a text, even those that would normally be written in hiragana. The
choice of script is becoming a stylistic issue lately and you are
seeing katakana in places you wouldn't expect them.

I also haven't seen katakana attached to kanji, though I have seen it
attached to romaji in constrained circumstances. It is very rare,
however.

I have seen hiragana attached to romaji, usually in the context of
particles attached to English nouns. You see the same thing (only more
so) in Korean, where an eojeol may contain mixed Latin script and
hankul.

This may be all beside the point: people are probably not interested
in contemporary script usage in these contexts.

-tree

-- 
Tom Emerson  Basis Technology Corp.
Software Architect http://www.basistech.com
  Beware the lollipop of mediocrity: lick it once and you suck forever



Re: Phoenician

2004-05-10 Thread E. Keown
  Elaine Keown
  Tucson

Dear Asmus Freytag:

 Becker's law:
 
 For every expert there's an equal and opposite
 expert.

This saying is especially true within Semitics (I'm
sure).  

But for me personally, my only interest is in database
functionality for Semitics.  

As far as I can tell, that interest runs directly
counter to the interests of more font-oriented people.
 - Elaine







RE: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Michael Everson
At 12:12 -0700 2004-05-10, Mike Ayers wrote:
But all this leads me to finally ask:  what does script mean?  It 
seems clear to me that although the term has been used throughout 
the Phoenician debate, not everyone is using it the same way.  I 
know that there is a definition of script that is used for 
encoding purposes, but can I find it written anywhere, or is it more 
of an ephemeral thing?
I am way too jetlagged to go near this one today.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Subject lines that have nothing to do with message content

2004-05-10 Thread Rick McGowan
Personally speaking, I would have expected that a recent message on this  
list with the subject line Katakana_Or_Hiragana might have something to  
do with Japanese, Hiragana, Katakana, or at least Han, or perhaps even  
Asia. But no... It was about Phoenician.

It would be really helpful if people could use subject lines that have  
something to do with the subject of the message.

It just can't be that difficult for people to pick a reasonable subject  
line. And if you're going to go off-topic in a thread, you might consider  
getting a different subject line -- or at least adding a parenthetical  
about how you're going to go off the thread...

(As usual, this is my personal opinion and doesn't reflect an official  
policy, etc.)

Rick



Re: interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Stefan Persson
Mike Ayers wrote:

 I have not seen katakana joined to kanji (or romaji), and suspect that
 such does not occur.

There are a few cases, e.g. ソ連 (So-Ren: Soviet Union), but that could 
also be written as two kanji as 蘇連 (which is however very rare in 
modern Japanese).

 I believe, but am not certain, that nonbreaking kanji-to-hiragana is
 correct, whereas you can break on kanji-to-katakana.

In Japanese you can put a line break between *any* characters, except 
before punctuation or an end quote, or after a start quote.

Stefan



Script vs Writing System

2004-05-10 Thread Patrick Andries
At 12:12 -0700 2004-05-10, Mike Ayers wrote:

But all this leads me to finally ask:  what does script mean?  It 
seems clear to me that although the term has been used throughout the 
Phoenician debate, not everyone is using it the same way.  I know 
that there is a definition of script that is used for encoding 
purposes, but can I find it written anywhere, or is it more of an 
ephemeral thing?

[PA] The glossary has « A collection of symbols used to represent 
textual information in one or more writing systems. »

Chapter 6 also defines Writing Systems summarized by Table 6-1 Typology 
of Scripts (Writing Systems then Scripts) :

A writing system is then defined as « A set of rules for using one or 
more scripts to write a particular language. Examples include the 
American English writing System, the British English writing system, the 
French writing system, and the Japanese writing system. »

«
Writing System Type     Unicode Script(s)
----------------------  ------------------------------------------------------
Alphabets               Latin, Greek, Cyrillic, Armenian, Thaana, Georgian,
                        Ogham, Runic, Mongolian, Old Italic, Gothic, Ugaritic,
                        Deseret, Shavian, Osmanya

Abjads                  Hebrew, Arabic, Syriac

Abugidas                Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil,
                        Telugu, Kannada, Malayalam, Sinhala, Thai, Lao,
                        Tibetan, Myanmar, Tagalog, Hanunóo, Buhid, Tagbanwa,
                        Khmer, Limbu, Tai Le

Logosyllabaries         Han

Simple Syllabaries      Cherokee, Hiragana, Katakana, Bopomofo, Yi, Linear B,
                        Cypriot

Featural Syllabaries    Ethiopic, Canadian Aboriginal Syllabics, Hangul
»
Note : «Table 6-1 lists all of the scripts currently encoded in the Unicode
Standard, showing the writing system type for each. The list is an
approximate guide, rather than a definitive classification, because of the
mix of features seen in many scripts. The writing systems for some languages
may be quite complex, mixing more than one writing system together in a
composite system. Japanese is the best example; it mixes a logosyllabary
(Han), two syllabaries (Hiragana and Katakana), and one alphabet (Latin, for
romaji).»



Re: Japanese line breaks (was: interleaved ordering)

2004-05-10 Thread Philippe Verdy
From: Stefan Persson [EMAIL PROTECTED]
 In Japanese you can put a line break between *any* characters, except
 before punctuation or an end quote, or after a start quote.

Are you SURE of that? I had many negative comments about undesirable line breaks
in the middle of what is perceived as a single word, and where a single Kana
moved to the next line was seen as bad, notably when it is a particle.
I had similar comments from Korean users with Hangul.

OK, the traditional writing rules allow breaks everywhere, so that
characters line up evenly in a grid and fill all the free space on
paper rolls. But today, with mixed use of half-width and full-width forms,
mixed scripts, mixed font sizes or styles, etc., this traditional usage does
not seem tolerable, as the result would be hard to read. Japanese users are
now adept at fast-reading techniques, and pushing part of a word or concept
to the next line does not make the text easier to understand.

Users today want better hyphenation of text (a bad term, because they don't use
hyphens to mark it...), and they want style on it. Most commercial Asian
websites are very colorful, and use many font sizes and styles, much more often
than European/American websites, which look so monotonous to them...

We don't share the same idea of what is ugly, such as patchworked colors.
Asian text is generally better shown with carefully chosen layouts, so that
words are placed according to their meaning and relation. Web design in Japan
is extremely creative. And there's a strong tradition in graphic arts.




Katakana and Kanji (was: Re: interleaved ordering (was RE: Phoenician))

2004-05-10 Thread Kenneth Whistler
Stefan Persson wrote:

 Mike Ayers wrote:
  I have not seen 
  katakana joined to kanji (or romaji), and suspect that such does not occur.
 
 There are a few cases, e.g. ソ連 (So-Ren: Soviet Union), but that could 
 also be written as two kanji as 蘇連 (which is however very rare in 
 modern Japanese).

It's actually quite common, depending on how you choose
to construe joined. Certainly, mixed katakana/kanji
lexical items occur all the time.

Japanese for PGA: puroogorufukyookai
                  ^^^^^^^^^^^=======
                  katakana   kanji

PGA Championship: zenbeipuroo
                  ======^^^^^
                  kanji katakana

It's true that katakana aren't normally used the way okurigana
are, to write out the grammatically changeable suffixal portion
of verb stems written in kanji. But that's rather beside the
point when kanji and katakana are rather freely mixed in
nominal compounds of all sorts.

By the way, the So-Ren example is just an abbreviation of the
same kind of pattern I show above:

Japanese for Soviet Union: sobietorenhoo  ==> soren
                           ^^^^^^^======      ^^===
                           katakana kanji
  
This process is an onrushing, accelerating one. If you look
at early 20th century Japanese materials, it is rather uncommon,
but if you look at contemporary Japanese writing -- particularly
the sort seen in popular culture, which is the leading edge of
this kind of change, it is all over the place. Katakana is
sweeping in as it carries with it all the English (and other)
language material rapidly moving into Japanese, along with all
the other popular functions of katakana.

Other examples from corporate names:

fujizerokkusu (Fuji Xerox)
====^^^^^^^^^

tookyoogasu   (Tokyo Gas)
=======^^^^

nihonai·bii·emu  (IBM Japan)
=====^^ ^^^ ^^^

Then there's always that all-purpose fixer-upper:

nenchakuteepu   (duct tape, adhesive tape)
========^^^^^

--Ken   





RE: Subject lines that have nothing to do with message content

2004-05-10 Thread Peter Constable
Of course, if ever there was a subject line that permitted the topic to
wander howsoever far from where it started, the one on this thread is
it. :-)

Peter




Re: Thai Fongman and Khmer Phnek Moan

2004-05-10 Thread Kenneth Whistler
 What little
 I know about the phnek moan makes it seem peculiar that its Line Break
 class is NS.  Is there truly a distinction between how these two characters
 are used in their respective scripts that makes this difference warranted,

Dunno.

 or is this a possible error in the standard 

Possibly.

 that deserves official scrutiny?

Certainly, if it is wrong.

By the way, this is the kind of thing which *can* be fixed in
the standard, if shown to be problematical.

This deserves some research by people who know something about
how these characters do in fact behave in line-breaking, and
then, if a change is in order, a documented proposal explaining
the problem and the suggested fix could be submitted to the
UTC for consideration.

--Ken




RE: Thai Fongman and Khmer Phnek Moan

2004-05-10 Thread Peter Constable
Insofar as both AL and NS are informative properties, how much does it matter?



I cannot find any discussion of the Thai fongman in NECTEC's book on typography. It is described in the names list as a bullet. The Royal Institute's Thai dictionary defines ฟองมัน as "name of a type of symbol… used in old books to mark the beginning of a section [this word can also mean a paragraph or verse, or blank lines separating them] or the start of a line [either poetry or prose]". So, the description "bullet" seems reasonable.

Other bullets have a breaking class of AL, so that seems appropriate for the Thai fongman.

I have no info regarding the Khmer counterpart.


Peter



Peter Constable

Globalization Infrastructure and Font Technologies

Microsoft Windows Division








RE: Japanese line breaks (was: interleaved ordering)

2004-05-10 Thread Han-Yi Shaw









Microsoft Office (Win and Mac) applications ensure that the line breaking is
correct for East Asian text. For example, in Microsoft Word, under Options |
Asian Typography | First and Last Characters, you will find the following
options for Japanese:

Cannot Start Line with: !%),.:;?]}

Cannot End Line with: $([\{

There are slight variations for Traditional Chinese, Simplified Chinese,
Japanese, and Korean, which Word respects as well.
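
As an illustration of how such first-and-last-character lists constrain breaking, here is a minimal sketch. It uses only the two character lists exactly as they appear in this message (the full-width equivalents Word presumably also lists are not shown here), counts width in characters, and is in no way Word's actual algorithm; `can_break` and `wrap` are invented names.

```python
CANNOT_START = set("!%),.:;?]}")   # a line may not begin with these
CANNOT_END = set("$([\\{")         # a line may not end with these

def can_break(before: str, after: str) -> bool:
    """A break between two characters is allowed unless either list forbids it."""
    return before not in CANNOT_END and after not in CANNOT_START

def wrap(text: str, width: int) -> list:
    """Break anywhere, as in Japanese, but back off from forbidden positions."""
    lines, start = [], 0
    while len(text) - start > width:
        cut = start + width
        # Move the break left until it is permitted (or we run out of room).
        while cut > start + 1 and not can_break(text[cut - 1], text[cut]):
            cut -= 1
        if not can_break(text[cut - 1], text[cut]):
            cut = start + width  # no legal position found: fall back to a hard break
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines

print(wrap("これは(例)です.", 5))  # ['これは', '(例)で', 'す.']
```

In the example, a break after five characters would strand the closing parenthesis at the start of a line, and a break after four would leave the opening one at the end, so the first line is cut after three characters instead.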



Han-yi



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Emerson
Sent: Monday, May 10, 2004 7:39 PM
To: Philippe Verdy
Cc: Unicode Mailing List
Subject: Re: Japanese line breaks (was: interleaved ordering)

Philippe Verdy writes:
 From: Stefan Persson [EMAIL PROTECTED]
  In Japanese you can put a line break between *any* characters, except
  before punctuation or an end quote, or after a start quote.

 Are you SURE of that? I had many negative comments about undesirable line
 breaks in the middle of what is perceived as a single word, and where a
 single Kana moved to the next line was seen as bad, notably when it is a
 particle. I had similar comments from Korean users with Hangul.

We've found an amazing amount of variation in where breaks occur on
text live on the web... breaks show up everywhere and anywhere, to the
point where our Japanese morphological analyzer has to ignore
whitespace (horizontal and vertical) in many situations.(*)

There is a JIS standard for line breaking, though I don't have a copy
of it here at home right now. I can look up the official rules
tomorrow if people are interested.

 -tree

(*) The worst case we've seen was the use of katakana and hiragana in
 ASCII art, Picasso's Guernica to be exact. Gave our analyzer a
 real fit for a while.

-- 
Tom Emerson  Basis Technology Corp.
Software Architect http://www.basistech.com
 Beware the lollipop of mediocrity: lick it once and you suck forever










interleaved ordering (was RE: Phoenician)

2004-05-10 Thread Kent Karlsson
 
 We do actually mix scripts. Hiragana and Katakana are interleaved.
 
 Mark

And it might make sense to interleave (say) Thai and Lao in
the default ordering. Or to interleave, in the default ordering,
the Indic scripts covered by ISCII. Any peculiarities could be
handled in tailorings.

/kent k