date:20040520

James Kass  wrote:

> As a member of the Latin script user community, I'd not be threatened
> by a separate encoding for Fraktur.  I have Fraktur books in my
> library.  Whether I've got their titles stored in my database using
> Latin characters or abusing math variables is best left to
> speculation.


ððð ð ðð ð 
 ðð ððð ð
ðð ðð ð 
  ðð
 ðð ððð.


-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Response to Everson Phoenician and why June 7?


Dean Snyder replied,

> >Many people believe that the Dead Sea Scrolls were written by the Essenes,
> >but there are some who believe otherwise.  Discussion among knowledgeable
> >historians of the alphabet as to its origins may be lively and entertaining,
> >but, its identity as a separate script doesn't depend on whether the
> >Phoenicians created the alphabet or just traded for it.
> 
> You miss my point. I insist that it isn't a separate script, and that
> people are zealous to encode it largely because of the "romance" of its,
> possibly wrong, association with later alphabet developments.

I respect your point of view even if we disagree.  I suspect that people
are zealous to encode Phoenician because they are eager to store and
exchange Phoenician data in a standard fashion.  Whether or not that 
eagerness and zeal are sparked by romantic notions about the script's
origins doesn't dampen the zeal any more than whether or not the
Essenes actually wrote the Dead Sea Scrolls denies the existence or
importance of the scrolls themselves.

> As I've said in other emails, rather than encoding Phoenician, the really
> interesting thing to encode would be Archaic Greek - here you have a
> series of diascripts that are more different from Classical Greek than
> Phoenician is from Jewish Hebrew, in glyph shapes, letter stances,
> directions of writing, numbers of letters, etc. Now THAT would be useful
> for Classicists - not the meager 22 West Semitic letters enshrined as a
> "Phoenician" encoding. Furthermore, this has the advantage of side-
> stepping the whole issue of the origins of the Greek alphabet along with
> its subsequent Mediterranean script descendants, while not mucking up
> Canaanite which is already encoded in Unicode, albeit somewhat
> "prematurely", or "misnamed", as Hebrew.
> 

Float a proposal for Archaic Greek.  Maybe we should actually read your
proposal before rejecting or accepting it.  In other words, we should find 
out what we have to contend with before becoming contentious.

When you mentioned Archaic Greek in other e-mails, I reviewed several
on-line charts.  I couldn't see that Archaic Greek is more different from
Classical Greek than Phoenician is from modern Hebrew; rather,
my observations were just the opposite.  To me, Phoenician itself looks
more like Classical Greek than it does modern Hebrew.  But perhaps
you have additional evidence to include in a proposal which would
change minds on this.

Here's one on-line chart showing some of the Archaic Greek variants along
with Phoenician and Classical Greek.  It's contained in this PDF file authored 
by Michael Everson in 1998:
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n1938.pdf
(The chart is on page five.)

Best regards,

James Kass

Re: ISO 15924

From: "Michael Everson" <[EMAIL PROTECTED]>
> Beta files are now available for testing and verification at
> http://www.unicode.org/iso15294. The Registrar thanks everyone who
> has commented on the ISO 15924 website to date, and looks forward to
> final corrections if any should be required.

Not really a fatal error, but typos are visible in HTML tables 1, 2, 3, 4
http://www.unicode.org/iso15924/iso15924-codes.html
http://www.unicode.org/iso15924/iso15924-num.html
http://www.unicode.org/iso15924/iso15924-en.html
http://www.unicode.org/iso15924/iso15924-fr.html
for the PropertyValueAlias column which "displays" for example:
Old_Italic
instead of just
Old_Italic
This occurs for all values with underscores, and is visible because of the
missing (normally required in HTML) semicolon at end of numeric character
references...

As there's no reason why a "" spacing would be needed for displaying
ASCII underscores, this is just junk.
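
A slip of this kind can be caught mechanically. The following is a minimal sketch, assuming the page source is available as a string; the function name is hypothetical, and the reference used in the usage note below (`&#8203`) is an arbitrary stand-in chosen for illustration, not necessarily the one present in the tables.

```python
import re

# Matches a decimal or hexadecimal numeric character reference that is
# NOT terminated by a semicolon. The lookaheads also reject a following
# digit so that the regex cannot backtrack into a shorter false match
# inside a properly terminated reference.
UNTERMINATED_NCR = re.compile(
    r"&#(?:[xX][0-9A-Fa-f]+(?![0-9A-Fa-f;])|[0-9]+(?![0-9;]))"
)

def find_unterminated_ncrs(html):
    """Return every numeric character reference that lacks its ';'."""
    return UNTERMINATED_NCR.findall(html)
```

Calling `find_unterminated_ncrs("Old&#8203_Italic")` reports the unterminated reference, while the same text with the semicolon in place reports nothing.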

Re: Response to Everson Phoenician and why June 7?


Dean Snyder wrote,

> Doesn't the idea that so many people will embrace a new Fraktur range
> imply that it's the right thing to do?

It might, if it were true.

During the course of this discussion, I've often lamented that there is
no evidence whatsoever that Semitic mathematicians ever used palaeo-
Hebrew glyphs as math variables.  It would sure simplify things, wouldn't
it?

I have no fear that many people would embrace a new Fraktur range, therefore
it's unnecessary to dredge up hypothetical problems that such a range might
foster.

As a member of the Latin script user community, I'd not be threatened by
a separate encoding for Fraktur.  I have Fraktur books in my library.
Whether I've got their titles stored in my database using Latin characters
or abusing math variables is best left to speculation.

Best regards,

James Kass

Re: Response to Everson Phoenician and why June 7?

Dean Snyder  wrote:

>>> Doesn't the idea that so many people will embrace a new Fraktur
>>> range imply that it's the right thing to do?
>>
>> Who has ever asked for that?
>
> I have no one in mind.

There is no one for you to have in mind.

> But, by analogy, so should no one think thusly about Phoenician (which
> is to Jewish Hebrew script what Fraktur is to Roman German script).

James Kass pointed out that fears of large numbers of people adopting a
Phoenician Unicode encoding would demonstrate the usefulness of the
encoding.  You responded that the same was true for Fraktur, even though
there are no large numbers of people and no demand.  You've sidestepped
the question.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Response to Everson Phoenician and why June 7?

Dean Snyder  wrote:

> Doesn't the idea that so many people will embrace a new Fraktur range
> imply that it's the right thing to do?

Who has ever asked for that?

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Response to Everson Phoenician and why June 7?

Doug Ewell wrote at 9:27 PM on Thursday, May 20, 2004:

>Dean Snyder  wrote:
>
>> Doesn't the idea that so many people will embrace a new Fraktur range
>> imply that it's the right thing to do?
>
>Who has ever asked for that?

I have no one in mind.

But, by analogy, so should no one think thusly about Phoenician (which is
to Jewish Hebrew script what Fraktur is to Roman German script).


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi

Re: Response to Everson Phoenician and why June 7?

James Kass wrote at 4:06 AM on Friday, May 21, 2004:

>Dean Snyder wrote,
>
>> I know Phoenician has been sexy, provocative, glamorous, and enthralling
>> to historians of the alphabet for centuries - it was a part of the Greek
>> cultural psyche that they got their letters from the Phoenicians; and
>> many modern books have just repeated such ancient dicta uncritically. But
>> among serious scholars of West Semitic scripts there are standing
>> controversies about just what were, in fact, the exact sources for the
>> Archaic Greek alphabets. No one doubts, to my knowledge, that the sources
>> were Levantine, but there are conflicting signs, for example, in the
>> shapes of individual letters, the letter names, and the multiple
>> directions of writing, that point to sources other than what we call
>> "Phoenician" today. The issue is an open area of discussion among
>> knowledgeable historians of the alphabet.
>
>Many people believe that the Dead Sea Scrolls were written by the Essenes,
>but there are some who believe otherwise.  Discussion among knowledgeable
>historians of the alphabet as to its origins may be lively and entertaining,
>but, its identity as a separate script doesn't depend on whether the
>Phoenicians created the alphabet or just traded for it.

You miss my point. I insist that it isn't a separate script, and that
people are zealous to encode it largely because of the "romance" of its,
possibly wrong, association with later alphabet developments.

As I've said in other emails, rather than encoding Phoenician, the really
interesting thing to encode would be Archaic Greek - here you have a
series of diascripts that are more different from Classical Greek than
Phoenician is from Jewish Hebrew, in glyph shapes, letter stances,
directions of writing, numbers of letters, etc. Now THAT would be useful
for Classicists - not the meager 22 West Semitic letters enshrined as a
"Phoenician" encoding. Furthermore, this has the advantage of side-
stepping the whole issue of the origins of the Greek alphabet along with
its subsequent Mediterranean script descendants, while not mucking up
Canaanite which is already encoded in Unicode, albeit somewhat
"prematurely", or "misnamed", as Hebrew.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi

Re: Response to Everson Phoenician and why June 7?

James Kass wrote at 4:06 AM on Friday, May 21, 2004:

>Doesn't the idea that so many people will embrace a new Phoenician range
>imply that it's the right thing to do?

Doesn't the idea that so many people will embrace a new Fraktur range
imply that it's the right thing to do?


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi

Re: Response to Everson Phoenician and why June 7?


Dean Snyder wrote,

> Your seven-repeated "reasonable" analysis of this engineering issue does
> not even mention once, much less address, the PROBLEMS that will be
> caused by encoding this diascript.

There seems to be a fear among those opposed to the Phoenician proposal
that many people will welcome a separate encoding for the script and
begin to use it.  These people will create new data from old material and
convert existing data to the Phoenician encoding.

Doesn't the idea that so many people will embrace a new Phoenician range
imply that it's the right thing to do?

> I know Phoenician has been sexy, provocative, glamorous, and enthralling
> to historians of the alphabet for centuries - it was a part of the Greek
> cultural psyche that they got their letters from the Phoenicians; and
> many modern books have just repeated such ancient dicta uncritically. But
> among serious scholars of West Semitic scripts there are standing
> controversies about just what were, in fact, the exact sources for the
> Archaic Greek alphabets. No one doubts, to my knowledge, that the sources
> were Levantine, but there are conflicting signs, for example, in the
> shapes of individual letters, the letter names, and the multiple
> directions of writing, that point to sources other than what we call
> "Phoenician" today. The issue is an open area of discussion among
> knowledgeable historians of the alphabet.

Many people believe that the Dead Sea Scrolls were written by the Essenes,
but there are some who believe otherwise.  Discussion among knowledgeable
historians of the alphabet as to its origins may be lively and entertaining,
but, its identity as a separate script doesn't depend on whether the
Phoenicians created the alphabet or just traded for it.

> I think the unspoken issues here are based more on culture, more on
> intellectual chauvinism, and maybe even on religion, than on encoding
> issues per se. 

How about that?  We agree on something...

Best regards,

James Kass

Re: ISO 15924 codes for ConScript

Curtis Clark  wrote:

>>> One person wrote, regarding Qaak for Klingon:
>>>
>>>> It's a shame you didn't pick something that could be pronounced in
>>>> tlhIngan Hol, perhaps Qaap for pIqaD.
>>
>> Identifiers are identifiers, not words.
>
> That's why I sent my message to Doug off-list; it was a joke.

Note that I identified you only as "one person," so as not to breach
netiquette by re-posting to the list.

I took it semi-seriously.  One never knows what issues are going to be
critically important to Klingon aficionados.  All I know is that the
code I did choose for Klingon is something that can be pronounced by the
AFLAC Duck.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: ISO 15924

2004-05-20 Thread Michael (michka) Kaplan

<<>>

MichKa

P.S. Hint: the subject is smarter than the body.

- Original Message - 
From: "Mahesh T. Pai" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, May 20, 2004 7:49 PM
Subject: Re: ISO 15924


> Michael Everson said on Fri, May 21, 2004 at 12:56:44AM +0100,:
> 
>  > http://www.unicode.org/iso15294. The Registrar thanks everyone who 
> 
> I am getting a 404 here? 
>  
> 
> -- 
> 
>"Those willing to give up a little liberty for a little security
>   deserve neither security nor liberty"
> 
>

Re: Response to Everson Phoenician and why June 7?


A kind list member has advised privately that CAL does now use
Unicode for Syriac text.

I must not have been looking in the right places.

From this page,
http://cal1.cn.huc.edu/fonts/fontinfo.html

 "Most of our Aramaic text display is now done in Unicode, 
 for which see our page hebunicodeinfo.html

 "However, many of the bibliographic, dictionary, and 
 newsletter reference pages displayed on this site still 
 use a transliteration font ("CalSemitic") and a Jewish 
 Aramaic font ("WebHebrew AD" or "CalWebHebrew"). ..."

It's nice to see a transition to Unicode and it's comforting to
see that Unicode Syriac is used rather than Unicode Hebrew
to store and display Syriac text.

Best regards,

James Kass

Re: ISO 15924

At 08:19 +0530 2004-05-21, Mahesh T. Pai wrote:
>Michael Everson said on Fri, May 21, 2004 at 12:56:44AM +0100,:
>
> > http://www.unicode.org/iso15294. The Registrar thanks everyone who
>
>I am getting a 404 here?

Sorry, it's http://www.unicode.org/iso15924
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

At 22:31 -0400 2004-05-20, Dean Snyder wrote:
>>That proves nothing at all. In fact, I have a number of Phoenician
>>fonts using Latin clones to represent Phoenician letters. I have yet
>>to find a single font with Hebrew encoding and Phoenician glyphs.
>
>Where have you looked?

On the internet, of course.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924

2004-05-20 Thread Mahesh T. Pai

Michael Everson said on Fri, May 21, 2004 at 12:56:44AM +0100,:

 > http://www.unicode.org/iso15294. The Registrar thanks everyone who 

I am getting a 404 here? 
 

-- 

   "Those willing to give up a little liberty for a little security
deserve neither security nor liberty"

Re: Response to Everson Phoenician and why June 7?

Kenneth Whistler wrote at 4:51 PM on Thursday, May 20, 2004:

>John Hudson asked, again:
>
>> My question, again, is whether there is a need for the plain 
>> text distinction in the first place?
>
>And I claim that there is no final answer for this question. We
>simply have irresolvable differences of opinion, with some
>asserting that it is self-evident that there is such a need,
>and others asserting that it is ridiculous to even consider
>encoding Phoenician as a distinct script, and that there is
>no such need.
>
>My own take on this seemingly irreconcilable clash of opinion is
>that if *some* people assert a need (and if they seem to be
>reasonable people instead of crackpots with no demonstrable
>knowledge of the standard and of plain text) then there *is*
>a need. And that people who assert that there is *no* need
>are really asserting that *they* have no need and are making
>the reasonable (but fallacious) assumption that since they
>are rational and knowledgeable, the fact that *they* have no
>need demonstrates that there *is* no need.
>
>If such is the case, then there *is* a need -- the question
>then just devolves to whether the need is significant enough
>for the UTC and WG2 to bother with it, and whether even if
>the need is met by encoding of characters, anyone will actually
>implement any relevant behavior in software or design fonts
>for it.
>
>In my opinion, Phoenician as a script has passed a
>reasonable need test, and has also passed a significant-enough-
>to-bother test.
>
>Note that these considerations need to be matters of
>reasonableness and appropriateness. There is no absolutely
>correct answer to be sought here. A character encoding standard
>is an engineering construct, not a revelation of truth, and
>we are seeking solutions that will enable software handling
>text content and display to do reasonable things with it at
>reasonable costs.
>
>If you start looking for absolutes here, it is relatively easy
>to apply reductio ad absurdum. In an absolute sense, there is
>no "need" to encode *any* other script, because they can *all*
>be represented by one or another transliteration scheme or
>masquerading scheme and be rendered with some variety or
>other of symbol font encoding. After all, that's exactly what
>people have been doing to date already for them -- or they
>are making use of encodings outside the context of Unicode,
>which they could go on using, or they are making use of graphics
>and facsimiles, and so on. The world wouldn't end if all such
>methods and "hacks" continued in use.
>
>The question is rather, given the fundamental nature of the
>Unicode Standard as enabling text processing for modern
>software, whether it is cost-effective and *reasonable* to provide
>a Unicode encoding for one particular script or another,
>unencoded to date, so as to maximize the chances that it
>will be handled more easily by modern software in the global
>infrastructure and to minimize the costs associated with
>doing so.
>
>*That* is the test which should be applied when trying to
>make decisions about which of the remaining varieties of
>unencoded writing systems rise to the level of distinctness,
>utility, and cost-effectiveness to be encoded as another
>script in the standard.
>
>--Ken

Your seven-repeated "reasonable" analysis of this engineering issue does
not even mention once, much less address, the PROBLEMS that will be
caused by encoding this diascript.

The precedent you are bent on setting here will open a can of worms for
the dozens of diascripts of these 22 West Semitic letters.

Your analysis here is a rationalization for a decision you made years ago
before you ever consulted Semitic scholars on this issue.

And you are ignoring the advice you are being given by Semitic scholars now.

The problem is, you will not have to live with the resultant mess, but WE
WILL. That's why we care more about this issue.

I know Phoenician has been sexy, provocative, glamorous, and enthralling
to historians of the alphabet for centuries - it was a part of the Greek
cultural psyche that they got their letters from the Phoenicians; and
many modern books have just repeated such ancient dicta uncritically. But
among serious scholars of West Semitic scripts there are standing
controversies about just what were, in fact, the exact sources for the
Archaic Greek alphabets. No one doubts, to my knowledge, that the sources
were Levantine, but there are conflicting signs, for example, in the
shapes of individual letters, the letter names, and the multiple
directions of writing, that point to sources other than what we call
"Phoenician" today. The issue is an open area of discussion among
knowledgeable historians of the alphabet.

On the other hand, Hebrew, as a script system, is such a loaded and
complicated series of cultural artifacts, that it seems incongruous
(mostly to non-Semitists, I fear) to associate Hebrew and Phoenician
together in the same encoding.

It's really a shame that the Unicode Techn

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson

Kenneth Whistler wrote:

> A character encoding standard is an engineering construct,
> not a revelation of truth

Amen.

I begin to suspect that part of the problem -- the problem of interminable debate, not any
technical problem -- is due in part to different perceptions of the Unicode Standard. It
must seem pretty obvious to engineers that this is a standard for encoding characters and
that implementing support for the standard does not, per se, imply much of anything about
how users should encode text. This is perhaps less obvious to non-engineers -- i.e. to
users --, and understandably so given the typical representation of Unicode to this
audience: 'Now supports all the world's major living languages!'. It is evident from the
Phoenician discussion that a good number of people -- intelligent people, and experts in
particular fields -- expect UTC decisions on what characters to encode to influence user
decisions on how to encode specific texts. I don't think this expectation is unreasonable,
given their perception of the standard, and perhaps Unicode needs to do a better job in
conveying what the standard is and does and how it can be used.

There remains, in the Phoenician debate, much fuss about Unicode disunifying what a
particular set of people consider to be the same thing. Perhaps the point needs to be made
more strongly that for practical text processing purposes *unification or disunification
of Phoenician and Palaeo-Hebrew happens only at the point of encoding a particular text*.
There is no reason at all why Semiticists cannot simply totally ignore the proposed
Phoenician block. The important question then, it seems to me, is not whether to encode
Phoenician or not, but how to better communicate that the encoding of a particular set of
characters does not mean that they have to be used to encode particular texts or languages.

John Hudson

U+0482

2004-05-20 Thread António Martins-Tuválkin

Why is the General Category of U+0482 CYRILLIC THOUSANDS SIGN So
[Symbol, Other], while the apparently equivalent characters in the
U+2160 - U+2183 range are Nl [Number, Letter]?

Granted that U+0482 is not a letter, but IMHO it should at least be
No [Number, Other] -- "So" puts it in the same category as a dingbat
(no disrespect to dingbats implied).
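
The categories in question can be looked up directly. This is a minimal sketch using Python's standard `unicodedata` module; the helper name is hypothetical, and the values reported depend on the version of the Unicode Character Database bundled with the interpreter, so they may differ from the data under discussion here.

```python
import unicodedata

def general_category(code_point):
    """Return the two-letter General Category for a code point."""
    return unicodedata.category(chr(code_point))

# Compare the Cyrillic thousands sign with the first Roman numeral.
for cp in (0x0482, 0x2160):
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X} {name}: {general_category(cp)}")
```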

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|

Re: Response to Everson Phoenician and why June 7?

Michael Everson wrote at 2:45 AM on Friday, May 21, 2004:

>At 21:28 -0400 2004-05-20, Dean Snyder wrote:
>>Michael Everson wrote at 11:24 PM on Thursday, May 20, 2004:
>>
>>>At 14:59 -0700 2004-05-20, Patrick Andries wrote:
>>>
>>>>You may mean that the Unicode book does not document how Phoenician
>>>>(or Paleo-Hebrew) may be encoded. This is not to say that no one is
>>>>using Unicode to encode Paleo-Hebrew texts.
>>>
>>>The several Phoenician fonts which I have are *all* Latin clones.
>>
>>So are most of the Hebrew fonts that people have. And I'll bet they use
>>the same code points for the same letters - change the font, and you have
>>the same characters displayed in Jewish Hebrew or Phoenician.
>
>That proves nothing at all. In fact, I have a number of Phoenician 
>fonts using Latin clones to represent Phoenician letters. I have yet 
>to find a single font with Hebrew encoding and Phoenician glyphs. 

Where have you looked?


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi

Re: Response to Everson Phoenician and why June 7?


Elaine Keown wrote,

> Hi,
> 
> > Does Dr. Kaufman speak for all professionals in the
> > field, or would it be fair to say that Dr. Kaufman 
> > is speaking for only one such professional?
> 
> Prof. Dr. Stephen Kaufman, of Hebrew Union College,
> Cincinnati, is the leading computational Aramaist in
> the world.  He is certainly among the top 5 Semitics
> scholars.  
> 
> Prof. Kaufman is the head of the CAL, the
> Comprehensive Aramaic Lexicon.  He hopes to completely
> computerize all Aramaic ever written.
> 
> He knows more about manuscripts and about variant
> glyphs than I ever expect to know.---Elaine


Let's review some material from the CAL web site.

From...
http://cal1.cn.huc.edu/hebunicodeinfo.html

 "Under Explorer, if Hebrew displays properly but Syriac 
 does not, try the following trick: click View->Encoding->autoselect, 
 then click back to View->Encoding->Hebrew-(ISO Visual). 
 Now you should see Estrangelo. "

Although I haven't tried this, I'd expect to see Hebrew rather
than Estrangelo if I had a Hebrew-ISO Visual font installed, and
I'd expect to see Latin mojibake if I didn't.  Further, switching
the View-Encoding to ISO-8859-8 would most likely ruin the ability
to display any actual Unicode (UTF-8) material on the same page.
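
The effect of reading UTF-8 text under the wrong encoding can be sketched in a few lines. This is only an illustration under stated assumptions: it models the browser's encoding switch as a byte-level decode, the helper name is hypothetical, and the sample is a single Syriac letter rather than text from the CAL pages.

```python
# One Syriac letter (U+0710 SYRIAC LETTER ALAPH), written as an escape.
original = "\u0710"
utf8_bytes = original.encode("utf-8")

def decode_as(data, encoding):
    """Decode bytes, substituting U+FFFD for bytes the encoding lacks."""
    return data.decode(encoding, errors="replace")

correct = decode_as(utf8_bytes, "utf-8")      # round-trips cleanly
misread = decode_as(utf8_bytes, "iso8859_8")  # wrong encoding: garbage

print(repr(correct), repr(misread))
```

The bytes are unchanged in both cases; only the interpretation differs, which is why forcing a page-wide encoding breaks any genuine UTF-8 content on the same page.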

 "When displaying Hebrew square characters for Jewish Aramaic 
 texts, many of the CAL pages depend on the UNICODE modules 
 built into current browsers. Thus, you do not have to have any 
 Hebrew font loaded into your system to view text in Hebrew."

Actually, unless the source embeds a font, you'll need to have a Hebrew
font installed on your system in order to view Hebrew text.

From the results for a search of the lexicon on the English word
"horse"...  


 byTr a
 1 Syr on horseback
 LS2 68
 LS2 v: bayTAr

 byryd N byryd)
 1 Syr swift horse
 LS2 95
 LS2 v: byryd)


The word "byryd" does not appear to be in the Syriac script.

This page...
http://cal1.cn.huc.edu/searching/targumsearch.html
...allows the user to view the output in "Roman" rather than "Aramaic".

Here's what the HTML source does for the Syriac output if the
"Peshitta" option is selected in the query:

 Peshitta:
 .)(r) tYw )YM$ tY .)hL) )rb tY$rb 
  

That's not Unicode.
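
That observation can be checked mechanically by testing whether any character falls in the Unicode Syriac block (U+0700 to U+074F). A minimal sketch, assuming the string is copied from the HTML source as shown above; the function name is hypothetical.

```python
# The Unicode Syriac block spans U+0700..U+074F inclusive.
SYRIAC_BLOCK = range(0x0700, 0x0750)

def contains_syriac(text):
    """Return True if any character lies in the Syriac block."""
    return any(ord(ch) in SYRIAC_BLOCK for ch in text)

# The Peshitta output quoted above: ASCII transliteration only.
sample = ".)(r) tYw )YM$ tY .)hL) )rb tY$rb"
print(contains_syriac(sample))
```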

But, the original question didn't concern Prof. Kaufman's credentials,
rather it was asked if Prof. Kaufman spoke for himself or if he claimed
to speak for all professionals in the field.  (Not that Prof. Kaufman
appeared to make such a claim, rather this claim might be inferred from
something written by Peter Kirk.)

The Comprehensive Aramaic Lexicon looks like a fine project and, clearly,
a lot of work is involved.  This lexicon appears to aim to cover all of the
ancient Aramaic words and does not appear to be much concerned with
the original script(s) used in the source material.  The CAL offers output
in either Latin or Hebrew characters.  If the source was not originally
written in modern Hebrew script, then the Hebrew script output is a 
transliteration, and there's nothing wrong with that.

(I've seen other lexicons for languages using scripts not yet encoded
in Unicode which, of course, use such transliteration.)

The Syriac material is clearly neither Unicode nor Syriac.  When 
someone is comfortable transliterating Syriac texts into the
characters of an old Hebrew code page, we shouldn't be surprised 
if they like to do the same thing with Phoenician script texts.

Best regards,

James Kass

Re: Unibook and Code2000

2004-05-20 Thread António Martins-Tuválkin

On 2004.05.20, 20:54, I wrote:

> I just noticed that some UCS blocks are displayed in Unibook turned
> 90 deg. counter-clockwise (apparently everything from U+2460 to
> U+27B0); the only thing I changed recently was adding Code2000

OK, fixed by adding the font properly to . However, it
should not happen, when a single font is forced thru Options | Font.

BTW, I'm using Unibook 3.3 (221) and my Code2000 is dated 2003.01.19
(3 155 104 bytes).

(I also added Unicode BMP Fallback SIL at the very bottom of
: It looks great, though not really necessary. :-)

--.
António MARTINS-Tuválkin |  ()|
<[EMAIL PROTECTED]>||
PT-1XXX-XXX LISBOA   Não me invejo de quem tem|
+351 934 821 700 carros, parelhas e montes|
http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe|
http://pagina.de/bandeiras/  a água em todas as fontes|

Re: Response to Everson Phoenician and why June 7?

At 21:28 -0400 2004-05-20, Dean Snyder wrote:
>Michael Everson wrote at 11:24 PM on Thursday, May 20, 2004:
>
>>At 14:59 -0700 2004-05-20, Patrick Andries wrote:
>>
>>>You may mean that the Unicode book does not document how Phoenician
>>>(or Paleo-Hebrew) may be encoded. This is not to say that no one is
>>>using Unicode to encode Paleo-Hebrew texts.
>>
>>The several Phoenician fonts which I have are *all* Latin clones.
>
>So are most of the Hebrew fonts that people have. And I'll bet they use
>the same code points for the same letters - change the font, and you have
>the same characters displayed in Jewish Hebrew or Phoenician.

That proves nothing at all. In fact, I have a number of Phoenician 
fonts using Latin clones to represent Phoenician letters. I have yet 
to find a single font with Hebrew encoding and Phoenician glyphs. I'm 
not "betting" either. I have actual evidence. Hearsay does not 
convince.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Cowan

Kenneth Whistler scripsit:

> The question is rather, given the fundamental nature of the
> Unicode Standard as enabling text processing for modern
> software, whether it is cost-effective and *reasonable* to provide
> a Unicode encoding for one particular script or another,
> unencoded to date, so as to maximize the chances that it
> will be handled more easily by modern software in the global
> infrastructure and to minimize the costs associated with
> doing so.

These words (and indeed your entire posting) deserve to be written
up in letters of gold somewhere.

-- 
LEAR: Dost thou call me fool, boy?  John Cowan
FOOL: All thy other titles  http://www.ccil.org/~cowan
 thou hast given away:  [EMAIL PROTECTED]
  That thou wast born with. http://www.reutershealth.com

Re: Response to Everson Phoenician and why June 7?

Michael Everson wrote at 11:24 PM on Thursday, May 20, 2004:

>At 14:59 -0700 2004-05-20, Patrick Andries wrote:
>
>>You may mean that the Unicode book does not document how Phoenician 
>>(or Paleo-Hebrew) may be encoded. This is not to say that no one is 
>>using Unicode to encode Paleo-Hebrew texts.
>
>The several Phoenician fonts which I have are *all* Latin clones.

So are most of the Hebrew fonts that people have. And I'll bet they use
the same code points for the same letters - change the font, and you have
the same characters displayed in Jewish Hebrew or Phoenician.


Respectfully,

Dean A. Snyder

Assistant Research Scholar
Manager, Digital Hammurabi Project
Computer Science Department
Whiting School of Engineering
218C New Engineering Building
3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

office: 410 516-6850
cell: 717 817-4897
www.jhu.edu/digitalhammurabi

Re: Response to Everson Phoenician and why June 7?

James Kass wrote at 9:04 PM on Thursday, May 20, 2004:

>In this case, I think it's important to be picky because there are
>no current Unicoding practices for Phoenician.  Phoenician is not 
>yet encoded in the standard.

That's like saying Fraktur is not yet encoded in the standard.



Respectfully,

Dean A. Snyder


Re: ISO 15924 codes for ConScript

2004-05-20 Thread Curtis Clark

on 2004-05-20 07:52 Peter Constable wrote:
> One person wrote, regarding Qaak for Klingon:
>
> > It's a shame you didn't pick something that could be pronounced in
> > tlhIngan Hol, perhaps Qaap for pIqaD.
>
> Identifiers are identifiers, not words.

That's why I sent my message to Doug off-list; it was a joke.
--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread E. Keown

Elaine Keown
Tucson

Hi,

> Does Dr. Kaufman speak for all professionals in the
> field, or would it be fair to say that Dr. Kaufman 
> is speaking for only one such professional?

Prof. Dr. Stephen Kaufman, of Hebrew Union College,
Cincinnati, is the leading computational Aramaist in
the world.  He is certainly among the top 5 Semitics
scholars.  

Prof. Kaufman is the head of the CAL, the
Comprehensive Aramaic Lexicon.  He hopes to completely
computerize all Aramaic ever written.

He knows more about manuscripts and about variant
glyphs than I ever expect to know.---Elaine





Re: ISO 15924 draft fixes

From: "Michael Everson" <[EMAIL PROTECTED]>
> At 00:05 +0200 2004-05-21, Philippe Verdy wrote:
>
> >This (below) is my own plain text version (still using the field and row
> >order of table 3 by English name, instead of the order of table 1 by
> >code)... Some entries are commented out with %.
>
> The RA has no intention whatsoever of making use of this file. Absolutely not.

OK. But you also argued incorrectly against one of my questions about the
"Common" script ID, when I asked what "Common" and "Inherited" (defined in
UAX #24) correspond to in ISO 15924. If I look at the standard Property Value
Aliases defined in the UCD files, I see this rule:

sc ; Zyyy  ; Common

Clearly it states that "Common" is an alias of the "Zyyy" script code.
So the ISO-15924 tables should reflect it in their "ID" columns. For example in
Table 1 (list by code):

Zyyy;998;Code for undetermined script;codet pour écriture indéterminée;Common;2004-05-01

If your arguments related to the usage of the ISO-15924 "Zyyy;998" codes are
valid, and differ from the definition of the "Common" script ID in UAX #24, then
there's a problem in the definition of the PropertyValueAliases.txt file in the
UCD 4.0 and the "sc ; Zyyy ; Common" line should be removed... This will require
an amendment to Unicode.
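
To make the consistency argument concrete, here is a minimal Python sketch. It assumes the "sc ; Short ; Long" line layout quoted above and the six-field Table 1 row shown above; the function names parse_script_aliases and id_column_matches are hypothetical and are not part of the UCD or of ISO 15924.

```python
def parse_script_aliases(text):
    """Collect {short_code: long_alias} from 'sc ; Short ; Long' lines."""
    aliases = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # ignore trailing comments
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


def id_column_matches(table1_row, aliases):
    """True if the ID field (5th field) of a Table 1 row equals the UCD alias."""
    fields = table1_row.split(";")
    code, id_field = fields[0], fields[4]
    return aliases.get(code, "") == id_field


# Sample input only: two lines in the layout quoted in this message.
UCD_SAMPLE = """\
sc ; Arab  ; Arabic
sc ; Zyyy  ; Common
"""
ALIASES = parse_script_aliases(UCD_SAMPLE)
ROW = "Zyyy;998;Code for undetermined script;codet pour écriture indéterminée;Common;2004-05-01"
print(id_column_matches(ROW, ALIASES))
```

A row whose ID field is left empty would fail the same check, which is the mismatch being discussed.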

Variation Sequences as Substitute for Fonts or for Encoding a Script (Was... Phoenician ...)

2004-05-20 Thread Kenneth Whistler

Ernest indicated:

> Whether using variation sequences to separate
> Phoenician from Square Hebrew would be daft
> would depend upon a number of factors.
> 
> How often would both glyph repertoires appear in
> the same document?
> 
> How frequently would non-Square Hebrew glyphs
> be used?
> 
> How important is it to any particular body of users
> to emphasize the relationship of the different
> repertoires by using the same base characters?
> 
> How large would that body of users be compared
> to other users who do not need such an emphasis?
> 
> I don't know the answers to the above questions.

Actually, I think the answers to those questions are
irrelevant.

> I see those answers as determining whether 
> non-unification or unification supplemented with
> variation sequences would be the better choice.

The main reason why such a proposal is daft is because
the UTC has never had any intention that variation sequences
be used this way -- and as a result would never acquiesce
in encoding an entire *script* as a set of variation
sequences off another script.

The options are, as John indicated:

a. Assume one script, and render differences via fonts
   (mapped to the same code points).

b. Assume two scripts, encode distinctly, and render
   differences via fonts (mapped to different code points).
   
Variation sequences are used to indicate variant glyphs for
particular characters within a script (or set of symbols) --
not as a hack for avoiding the encoding of a script entire
or for avoiding the need for font tagging to make visual
distinctions in a writing system.

Variation sequences are also a last resort, used only in
instances where a distinct character encoding approach
smells too much of duplication of otherwise identical
"characters" which just happen to have some particular
formal distinction that is needed for rendering, roundtrip
mapping, etc.

Of course you can argue (and have argued) that you can simply apply
that logic to *every* character of a script. But I can
assure you that there is *no* constituency for approaching
decisions about encoding an entire script that way in
either the UTC or in WG2.
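
For illustration only, a small Python sketch of what "applying that logic to every character" would do to plain text. It is purely hypothetical: no variation sequences are defined for Hebrew letters, and the helper name tag_every_letter is made up for this example. U+FE00 is VARIATION SELECTOR-1.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def tag_every_letter(text, selector=VS1):
    """Append a variation selector after every letter (General_Category L*)."""
    out = []
    for ch in text:
        out.append(ch)
        if unicodedata.category(ch).startswith("L"):
            out.append(selector)
    return "".join(out)


word = "\u05D0\u05D1\u05D2"      # HEBREW LETTER ALEF, BET, GIMEL
tagged = tag_every_letter(word)
print(len(word), len(tagged))    # the tagged string has twice as many code points
```

Every letter now costs two code points, and any search or comparison has to ignore or normalize the selectors, which is the overhead of using a per-character mechanism for a whole-script distinction.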

--Ken

ISO 15924

Beta files are now available for testing and verification at 
http://www.unicode.org/iso15924. The Registrar thanks everyone who 
has commented on the ISO 15924 website to date, and looks forward to 
final corrections if any should be required.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Kenneth Whistler

Patrick said:

> >In this case, I think it's important to be picky because there are
> >no current Unicoding practices for Phoenician. 
> >
> You may mean that the Unicode book does not document how Phoenician (or 
> Paleo-Hebrew) may be encoded. This is not to say that no one is using 
> Unicode to encode Paleo-Hebrew texts.
             ^^^^^^
             represent
 
I like to distinguish this, because the whole notion of
what it means to "encode a text" tends to derail the discussion
immediately.

The Unicode Standard *encodes* abstract characters.

There are many potential abstract characters, but one of the
general principles used is that each significant "letter" (grapheme)
from a *script* will be encoded once as a character in the
standard. That, of course, begs the question of identifying
the "script" and its exact repertoire of "letters". The identification
of the "script" is what the Phoenician argument has been about,
since there is no serious question about the repertoire of
"letters" for it.

Once a repertoire of abstract characters has been *encoded*
in the Unicode Standard, those encoded characters can then
be used to *represent* the plain text content of documents.

This is deliberately different from talking about "encoding the
text", because people don't have common understandings about
what that means, and often expect various aspects of format
and appearance to also be "encoded" -- hence the way these
discussions tend to veer off into ditches.

Now returning to Patrick's statement and substituting for a
different unencoded script:

> the Unicode standard does not document how *Avestan*  
> may be encoded. This is not to say that no one is using 
> Unicode to represent *Avestan* texts.

Also true, right? Or...

> the Unicode standard does not document how *Tifinagh*  
> may be encoded. This is not to say that no one is using 
> Unicode to represent *Tifinagh* texts.

O.k., I guess you can see that this particular argument is not
going to go anywhere. Any script which is not currently encoded
in the standard can be (and probably is) represented *somehow*
by Unicode characters, either via PUA or transliteration or
some other arbitrary intermediate encoding of entities. That it
is (or could be) so represented has little or no bearing on the
question of whether the script in question is or is not
distinct enough from some already encoded but historically
related script to warrant a distinct encoding as a "script" in
the Unicode sense of a script.
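
As a rough illustration of the "represented *somehow*" point, this Python sketch detects one of the workarounds mentioned, private-use code points. The helper name private_use_chars is hypothetical; the only library facility assumed is unicodedata.category, which reports "Co" for private-use characters.

```python
import unicodedata


def private_use_chars(text):
    """Return the characters whose General_Category is Co (private use)."""
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


sample = "abc\uE000\uE001"   # two arbitrary PUA code points after plain ASCII
found = private_use_chars(sample)
print([f"U+{ord(ch):04X}" for ch in found])
```

Such text is valid Unicode, but nothing in it says which script the private-use code points stand for; that agreement lives outside the standard.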

John Hudson asked, again:

> My question, again, is whether there is a need for the plain 
> text distinction in the first place?

And I claim that there is no final answer for this question. We
simply have irresolvable differences of opinion, with some
asserting that it is self-evident that there is such a need,
and others asserting that it is ridiculous to even consider
encoding Phoenician as a distinct script, and that there is
no such need.

My own take on this seemingly irreconcilable clash of opinion is
that if *some* people assert a need (and if they seem to be
reasonable people instead of crackpots with no demonstrable
knowledge of the standard and of plain text) then there *is*
a need. And that people who assert that there is *no* need
are really asserting that *they* have no need and are making
the reasonable (but fallacious) assumption that since they
are rational and knowledgeable, the fact that *they* have no
need demonstrates that there *is* no need.

If such is the case, then there *is* a need -- the question
then just devolves to whether the need is significant enough
for the UTC and WG2 to bother with it, and whether even if
the need is met by encoding of characters, anyone will actually
implement any relevant behavior in software or design fonts
for it.

In my opinion, Phoenician as a script has passed a
reasonable need test, and has also passed a significant-enough-
to-bother test.

Note that these considerations need to be matters of
reasonableness and appropriateness. There is no absolutely
correct answer to be sought here. A character encoding standard
is an engineering construct, not a revelation of truth, and
we are seeking solutions that will enable software handling
text content and display to do reasonable things with it at
reasonable costs.

If you start looking for absolutes here, it is relatively easy
to apply reductio ad absurdum. In an absolute sense, there is
no "need" to encode *any* other script, because they can *all*
be represented by one or another transliteration scheme or
masquerading scheme and be rendered with some variety or
other of symbol font encoding. After all, that's exactly what
people have been doing to date already for them -- or they
are making use of encodings outside the context of Unicode,
which they could go on using, or they are making use of graphics
and facsimiles, and so on. The world wouldn't end if all such
methods and "hacks" continued in use.

The question is rather, given

Re: ISO 15924 draft fixes

From: "Peter Constable" <[EMAIL PROTECTED]>
> > From: Philippe Verdy [mailto:[EMAIL PROTECTED]
>
> > No the structure is correct, however the text file was prepared by
> > copy/pasting HTML text inserted in empty cells, namely the "&nbsp;"
> > character reference (that contains a syntactic semicolon conflicting
> > with the CSV separator).
>
> IMO, the structure of data is effectively determined by how processes
> will interpret the data. A process won't see 6 columns one of which
> contains "&nbsp;". It will see seven columns one of which contains
> "&nbsp".
>
> He's said the file has been fixed (though I don't know if he's posted
> the fixed file).

It's not fixed in the zipped archive linked from the ISO 15924/RA web pages (that
download has not changed so far), but it is fixed in the corrected
archive that Michael indicated here:
http://www.unicode.org/iso15924/iso15924-fixes.zip
(this link is not officially published yet, because Michael wanted comments on
it first; just as well, because it was still not perfect)

Michael has started the corrections in HTML tables 1 and 2, but table 3 (and
its "downloadable" alternative plain-text version) and table 4 are still not
corrected.

I said this was "lots of files" to change, but in fact it can all be done with one
spreadsheet saved into 5 files. Michael could also have used a very basic
database application (an Access, FileMaker, dBase or Paradox database with
1 table and 5 query-views, or a similar tool of the kind every programmer or data
maintainer should have at hand to perform such basic tasks easily, without lots
of manual editing and even without programming a script).

Re: ISO 15924 draft fixes

From: "Michael Everson" <[EMAIL PROTECTED]>
> I could use a little help rendering this into French, lest I
> embarrass myself
>
> "The Property Value Alias is defined as part of the Unicode Standard
> and is provided informatively in the tables here to show how entries
> in the ISO 15924 code table relate to script names defined in
> Unicode."

Tip: French translation is:
"Le synonyme de valeur de propriété est défini au sein du Standard Unicode
et est fourni ici de façon informative dans les tables, afin de montrer comment
les entrées des tables de codets ISO 15924 correspondent aux noms de scripts
définis dans Unicode."
(there should be a reference to the PropertyValueAliases.txt file in the
UCD, and the section in the UTS or its annexes that describes this UCD text
file.)

It's true that the PropertyValueAliases.txt file in the UCD already contains
long aliases for the shorter ISO-15924 codes:

(...)
sc ; Arab  ; Arabic
sc ; Armn  ; Armenian
(...)
sc ; Zyyy  ; Common
(...)

It's true that this same file does not list all possible values (the long value
"Inherited" has no other alias defined in that file).
Maybe this file in the UCD could also list the ISO 15924 numeric codes, but
there's no obligation to add them there. The mere existence of the "sc ; ..."
lines is enough to indicate that the preferred alias is the ISO 15924 code when
it exists, so that "Arab" is preferred to "Arabic", or "Linb" is preferred to
"Linear_B".

With regard to semantics, however, there's no difference between "Arab" and
"Arabic", or between "Linb" and "Linear_B", meaning that these values are in the
same value space. That's a good reason not to pollute that value space with new,
unneeded long aliases. The long aliases only exist for legacy reasons, in
Unicode too, and the "ID" column in the ISO 15924 tables is mostly informative and
should not be normative.

This ID column in ISO-15924 already has the semantics of a "Unicode Script
Property Value Alias", but it could be any other alias needed for some other
legacy applications. I just wonder why this column was placed there, before the
Date column that is required, given that there may possibly exist several legacy
aliases to list in ISO-15924, and defined in other standards than Unicode.

If you want to keep a master table for the long term, I would either drop this
ID column or put it at the end of the row, after the Date field (so that more than
one alias could be added to each code; for example, there are some numeric script
IDs defined in OpenType that could be listed as "X_OT_17", if they are bound
directly to standard script codes).

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Ernest Cline




> [Original Message]
> From: John Hudson <[EMAIL PROTECTED]>
>
> Peter, are we talking about the same thing? Ernest is
> suggesting bizarre measures to deal with a problem
> -- in my opinion, a non-existent one -- that he sees in
> *unification*. You are arguing against Michael's
> *dis-unification*. The ridiculousness of Ernest's
> suggestion to use variation selector sequences --
> indeed, perhaps he intends it to be ridiculous to 
> make a point -- is an argument in favour of
> dis-unification, since the alternative for 
> making a plain-text distinction is so daft.

Whether using variation sequences to separate
Phoenician from Square Hebrew would be daft
would depend upon a number of factors.

How often would both glyph repertoires appear in
the same document?

How frequently would non-Square Hebrew glyphs
be used?

How important is it to any particular body of users
to emphasize the relationship of the different
repertoires by using the same base characters?

How large would that body of users be compared
to other users who do not need such an emphasis?

I don't know the answers to the above questions.
I see those answers as determining whether 
non-unification or unification supplemented with
variation sequences would be the better choice.

Re: ISO 15924 draft fixes

At 00:05 +0200 2004-05-21, Philippe Verdy wrote:
> This (below) is my own plain text version (still using the field and row order
> of table 3 by english name, instead of the order of table 1 by code)... Some
> entries are commented out with %.

The RA has no intention whatsoever of making use of this file. Absolutely not.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

At 14:59 -0700 2004-05-20, Patrick Andries wrote:
> You may mean that the Unicode book does not document how Phoenician
> (or Paleo-Hebrew) may be encoded. This is not to say that no one is
> using Unicode to encode Paleo-Hebrew texts.

The several Phoenician fonts which I have are *all* Latin clones.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Unibook and Code2000

2004-05-20 Thread Anto'nio Martins-Tuva'lkin

I just noticed that some UCS blocks are displayed in Unibook rotated 90
degrees counter-clockwise (apparently everything from U+2460 to U+27B0);
the only thing I changed recently was adding Code2000 to the list of
usable fonts (this does not happen with Arial Unicode, for instance). FWIW,
Code2000 displays correctly in BabelMap and other applications.

-- 
António MARTINS-Tuválkin
<[EMAIL PROTECTED]>
PT-1XXX-XXX LISBOA                    Não me invejo de quem tem
+351 934 821 700                      carros, parelhas e montes
http://www.tuvalkin.web.pt/bandeira/  só me invejo de quem bebe
http://pagina.de/bandeiras/           a água em todas as fontes

RE: ISO 15924 draft fixes

> From: Philippe Verdy [mailto:[EMAIL PROTECTED]

> No the structure is correct, however the text file was prepared by
> copy/pasting HTML text inserted in empty cells, namely the "&nbsp;"
> character reference (that contains a syntactic semicolon conflicting
> with the CSV separator).

IMO, the structure of data is effectively determined by how processes
will interpret the data. A process won't see 6 columns one of which
contains "&nbsp;". It will see seven columns one of which contains
"&nbsp".

He's said the file has been fixed (though I don't know if he's posted
the fixed file).
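
A short Python sketch of the point about how a process sees the data. It assumes the stray character reference was "&nbsp;" and uses a made-up faulty row for illustration; the helper names column_counts and scrub_nbsp are hypothetical.

```python
def column_counts(lines, sep=";"):
    """Return the set of distinct field counts found across non-empty lines."""
    return {len(line.split(sep)) for line in lines if line.strip()}


def scrub_nbsp(line):
    """Drop a literal '&nbsp;' left over from an empty HTML table cell."""
    return line.replace("&nbsp;", "")


ROWS = [
    "Arabic;Arab;160;arabe;Arabic;2004-05-01",
    "Balinese;Bali;360;balinais;&nbsp;;2004-05-18",   # hypothetical faulty row
]
before = column_counts(ROWS)
after = column_counts(scrub_nbsp(row) for row in ROWS)
print(before, after)
```

A naive split on ";" cannot tell the separator from the semicolon that ends the reference, so the second row yields seven fields until the reference is removed.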



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes


"Peter Constable" wrote:
> > Michael Everson wrote:
> > >Also, it appears you have not fixed a serious error in the
> > >plain-text file: it is not well-structured. Some rows have 6
> > >columns, and some have 7.
> >
> > That might be fixed in the newest one.
>
> It is not fixed in the file that's on the site now. If this is the
> normative file, I'd suggest you fix it as soon as possible.

This (below) is my own plain text version (still using the field and row order
of table 3 by english name, instead of the order of table 1 by code)... Some
entries are commented out with %.

Philippe.

---
% The format is Name;Code;Nº;Nom;ID;Date

% Codes for the representation of names of scripts
% Codes pour la représentation des noms d’écritures

% Alphabetical list of English script names

English_Name;Code;Nº;Nom_français;ID;Date
(alias for Hiragana + Katakana);Hrkt;412;(alias pour hiragana + katakana);Katakana_Or_Hiragana;2004-05-01
Arabic;Arab;160;arabe;Arabic;2004-05-01
Armenian;Armn;230;arménien;Armenian;2004-05-01
Balinese;Bali;360;balinais;;2004-05-18
Batak;Batk;365;batak;;2004-05-01
Bengali;Beng;325;bengalî;Bengali;2004-05-01
Blissymbols;Blis;550;symboles Bliss;;2004-05-01
Bopomofo;Bopo;285;bopomofo;Bopomofo;2004-05-01
Brahmi;Brah;300;brâhmî;;2004-05-01
Braille;Brai;570;braille;Braille;2004-05-01
Buginese;Bugi;367;bouguis;;2004-05-01
Buhid;Buhd;372;bouhide;Buhid;2004-05-01
Cham;Cham;358;cham (čam, tcham);;2004-05-01
Cherokee;Cher;445;tchérokî;Cherokee;2004-05-01
Cirth;Cirt;291;cirth;;2004-05-01
Code for uncoded script;Zzzz;999;codet pour écriture non codée;;2004-05-01
Code for undetermined script;Zyyy;998;codet pour écriture indéterminée;;2004-05-01
Code for unwritten languages;Zxxx;997;codet pour les langues non écrites;;2004-05-01

% Still missing...
%Coptic;copt;201;copte;;2004-05-20

Cuneiform, Sumero-Akkadian;Xsux;020;cunéiforme suméro-akkadien;;2004-05-01
Cypriot;Cprt;403;syllabaire chypriote;Cypriot;2004-05-01
Cyrillic;Cyrl;220;cyrillique;Cyrillic;2004-05-01
Cyrillic (Old Church Slavonic variant);Cyrs;221;cyrillique (variante slavonne);;2004-05-01
Deseret (Mormon);Dsrt;250;déseret (mormon);Deseret;2004-05-01
Devanagari (Nagari);Deva;315;dévanâgarî;Devanagari;2004-05-01
Egyptian demotic;Egyd;070;démotique égyptien;;2004-05-01
Egyptian hieratic;Egyh;060;hiératique égyptien;;2004-05-01
Egyptian hieroglyphs;Egyp;050;hiéroglyphes égyptiens;;2004-05-01
Ethiopic (Ge‘ez);Ethi;430;éthiopique (éthiopien, ge‘ez);Ethiopic;2004-05-01

% Why was this removed? Wasn't it present for bibliographic references?
%Georgian (Asomtavruli);Geoa;241;géorgien (assomtavrouli);;2004-05-18

Georgian (Mkhedruli);Geor;240;géorgien (mkhédrouli);Georgian;2004-05-18
Glagolitic;Glag;225;glagolitique;;2004-05-01
Gothic;Goth;206;gotique;Gothic;2004-05-01
Greek;Grek;200;grec;Greek;2004-05-01
Gujarati;Gujr;320;goudjarâtî (gujrâtî);Gujarati;2004-05-01
Gurmukhi;Guru;310;gourmoukhî;Gurmukhi;2004-05-01
Han (Hanzi, Kanji, Hanja);Hani;500;idéogrammes han;Han;2004-05-01
Han (Simplified variant);Hans;501;idéogrammes han (variante simplifiée);;2004-05-01
Han (Traditional variant);Hant;502;idéogrammes han (variante traditionelle);;2004-05-01

% This should better be:
%Hangul (Hangŭl, Hangeul);Hang;286;hangul (han’gul);Hangul;2004-05-01
Hangul (Hangŭl, Hangeul);Hang;286;hangul (hangŭl, hangeul);Hangul;2004-05-01

Hanunóo;Hano;371;hanounóo;Hanunoo;2004-05-01
Hebrew;Hebr;125;hébreu;Hebrew;2004-05-01
Hiragana;Hira;410;hiragana;Hiragana;2004-05-01
Indus (Harappan);Inds;610;indus;;2004-05-01
Javanese;Java;361;javanais;;2004-05-18
Kannada;Knda;345;kannara (canara);Kannada;2004-05-18
Katakana;Kana;411;katakana;Katakana;2004-05-01
Kayah Li;Kali;357;kayah li;;2004-05-01
Kharoshthi;Khar;305;kharochthî;;2004-05-18
Khmer;Khmr;355;khmer;Khmer;2004-05-18
Lao;Laoo;356;laotien;Lao;2004-05-01
Latin;Latn;215;latin;Latin;2004-05-01
Latin (Fraktur variant);Latf;217;latin (variante brisée);;2004-05-01
Latin (Gaelic variant);Latg;216;latin (variante gaélique);;2004-05-01
Lepcha (Róng);Lepc;335;lepcha (róng);;2004-05-01
Limbu;Limb;336;limbou;Limbu;2004-05-18
Linear A;Lina;400;linéaire A;;2004-05-01
Linear B;Linb;401;linéaire B;Linear_B;2004-05-18
Malayalam;Mlym;347;malayâlam;Malayalam;2004-05-01
Mandaean;Mnda;140;mandéen;;2004-05-01
Mayan hieroglyphs;Maya;090;hiéroglyphes mayas;;2004-05-01
Meroitic;Mero;100;méroïtique;;2004-05-01
Mongolian;Mong;145;mongol;Mongolian;2004-05-01
Myanmar (Burmese);Mymr;350;birman;Myanmar;2004-05-01
Ogham;Ogam;212;ogam;Ogham;2004-05-01
Old Hungarian;Hung;176;ancien hongrois;;2004-05-01
Old Italic (Etruscan, Oscan, etc.);Ital;210;ancien italique (étrusque, osque, etc.);Old_Italic;2004-05-18
Old Permic;Perm;227;ancien permien;;2004-05-01
Old Persian;Xpeo;030;cunéiforme persépolitain;;2004-05-01
Oriya;Orya;327;oriyâ;Oriya;2004-05-01
Orkhon;Orkh;175;orkhon;;2004-05-01
Osmanya;Osma;260;osmanais;Osmanya;2004-05-01
Pahawh Hmong;Hmng;450;pahawh hmong;;2004-05-01
Phoenician;Phnx;115;phénicien;;2004-05-01
Pollard Phonetic;Plrd;282;phonétique de Pollard;

Re: ISO 15924 draft fixes

From: "Antoine Leca" <[EMAIL PROTECTED]>
> > Antoine Leca wrote:
> >
> >> The French name for Hang looks strange. It happened to be "hangul
> >> (hangul, hangeul)" (after quite a bit of discussion.)
>
> Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8
> this morning. I meant "hangul(hangul, hangeul)".
>
> According to a native  the
> correct form are the ones between parenthesis (with an added apostrophe
> between han'gul).
>
> : From: "Jian YANG" [EMAIL PROTECTED]
> : Subject: Re: Re: (iso15924.275) "Hangul (Hang~ul, Hangeul)"
> :   as script name (~ is a diacritical mark)
> : Date: Mon, 29 May 2000 15:49:25 -0400
> :
> :
> : «Hangeul» = romanization standard of the South Korean Ministry of
> : Education;
> : «Hangul» = McCune-Reischauer romanization (the exact form
> : is «Han'gul»: «u» with breve, not caron; but the diacritic
> : was removed, no doubt to accommodate the ASCII
> : convention);
>
>
> On Thursday, May 20, 2004 3:51 PM, Patrick Andries wrote:
> >
> > The name in ISO/CEI 10646 (F)  is « hangûl  » from a Corean dictionary
> > and a Corean grammar published by the Inalco (Langues O').
>
> Clearly, the Langues'O did adapt it to French typographical possibilities,
> reversing the breve accent into a circumflex.
>
> > Another
> > suggested form in some sources, to approximate the pronunciation,
> > is « hangueul »
>
> This is the other form, with an added, euphonic u after the g, to avoid a
> complete mispronunciation.
>
> Whether all this is right or not, I do not know. But I believe this text
> did go through two ballots against the very people of Langues'O (?), so we
> have no reason to correct now what was accepted in the standard. The only
> choice right now is to type exactly what was printed, since I understand we
> no longer have the master that served for the [F]DIS texts.
>
> Since I am not a member of TC46, and furthermore I was away from the process
> last year, I might very easily be wrong.

I see no real problem if not all the different orthographies are listed, or if
they are not used universally, as long as the name is unambiguous. What will
matter for data interchange is not this name but the Code (or N°,
or even the ID in the UAX #24 properties).

So there's nothing wrong if "Han'gul" is shown to users without the preferred
apostrophe (I mean the apostrophe here, not the single quote!), or with a caron or
circumflex instead of the breve (to adapt to the rendering or encoding context in
which this name is exposed to users), or even without any diacritic (my
opinion is that substituting one diacritic for another is worse than just removing
the diacritic that can't be displayed or encoded).
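
A minimal Python sketch of the "remove rather than substitute" option, assuming the target context cannot show combining marks at all. The function name strip_diacritics is hypothetical; it relies only on the standard unicodedata module (NFD decomposition, then dropping combining marks).

```python
import unicodedata


def strip_diacritics(name):
    """Decompose to NFD and drop combining marks, leaving base letters."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


exact = "Han\u2019g\u016Dl"        # apostrophe U+2019, u with breve U+016D
plain = strip_diacritics(exact)
print(plain)
```

The apostrophe survives because it is not a combining mark; only the breve is removed, so no other diacritic is substituted for it.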

French normally has no caron and no breve, and the circumflex is used to mark a
slight alteration of the vowel caused by an assimilated consonant in the
historical orthography (most often this circumflex in French denotes a lost "s"
after the vowel).

So the circumflex on "Hangul" would be inappropriate for French, as would
"Hangeul" (which breaks the common reading rules). "Hangul" and "Han'gul" are more
acceptable, as are "Hangoeul" with an "oe" ligature, or "Han'goeul" with an
additional apostrophe, which would be even more accurate but has been
seen nowhere so far.

[Comments-OT]
The problem with apostrophes is that French keyboards don't have one, only
a single quote. Reading a quote as an apostrophe is
limited in French to a very few words, as a mark of elision of some characters,
not as a phonetic mark.
In "Han'gul" there's no elision, but its absence causes nasalisation of
the previous letter "a". A solution would be to write "Hanngul". However, there
are now lots of proper names _ending_ in "-an" (such as "Alan") for which
nasalisation is easy for readers to avoid (so "Han", i.e. the ideographic script
of Chinese, is appropriate in French, but not "Hanzi", "Hanja", or "Hangul",
where almost all native readers would not pronounce the "n" but would nasalise
the previous vowel "a"). The simplest way to avoid nasalisation is to place
another n after it in French (nasalisation never occurs with a double n in
French).
This would give in French: "Hanngul" (or "Hanngoeul", or is it "Hanngoul"
?), "Hannzi" (to avoid pronouncing it as in "enzyme"), "Hannja" (to avoid
pronouncing it as in "en japonais"), but still "Han" (in preference to "Hann")...
[/Comments-OT]

Philippe.

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Patrick Andries

James Kass wrote:

> Ernest Cline wrote,
>
> > > In order for Phoenician to be "disunified" from Hebrew, it must
> > > first have been unified with Hebrew.  This is not the case.
> >
> > Well then, nonunification if you wish to be picky about it.
>
> Sorry if I offended.  Many on this list have referred to the current
> proposal as a "disunification" and seem to be arguing that accepting
> this proposal would change and disrupt current Unicoding practices.
> In this case, I think it's important to be picky because there are
> no current Unicoding practices for Phoenician.

You may mean that the Unicode book does not document how Phoenician (or 
Paleo-Hebrew) may be encoded. This is not to say that no one is using 
Unicode to encode Paleo-Hebrew texts.

P. A.

Re: ISO 15924 draft fixes

At 23:21 +0200 2004-05-20, Philippe Verdy wrote:

> Michael has just changed the online version (but with the wrong dates...that's
> irritating).

Patience... Unchanged codes will retain 2004-05-01 as the starting
date. Changed codes have (as of the current BETA draft which is
uploaded for testing purposes now; look at it if you like to do that
sort of thing) the date of 2004-05-20. If there are further changes
before we go to RELEASE then I will adjust that date accordingly. OK?

> There is still a conflict of "Code" for Mandaean, is it "Mand" or "Mnda"?

Mand.

> As Michael indicates that the plain-text should be the reference, then it
> suggests "Mnda" and not "Mand"... But the new plain-text version
> uses "Mand"... I had already signaled it in a past message... So
> it's also irritating.

They all say Mand now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

From: "Michael Everson" <[EMAIL PROTECTED]>
> Is the format order satisfactory? English_Name;Code;Nº;Nom_français;PVA;Date
> Or would it be preferable to have it in the
> format of Table 1
> (Code;Nº;English_Name;Nom_français;PVA;Date)

I vote for the order of table 1; the Code is the most important one, and the
start of line will have a fixed format, easing its parsing, or simply easing its
legibility for readers.
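
To show why a fixed-format start of line eases parsing, here is a hedged Python sketch. It assumes the Table 1 order proposed above (Code;Nº;English_Name;Nom_français;PVA;Date), a four-letter code and a three-digit number; the function name parse_table1_row and the dictionary keys are hypothetical.

```python
import re

# Four-letter code (capital + three lowercase) and three-digit number first.
ROW_START = re.compile(r"^([A-Z][a-z]{3});(\d{3});")


def parse_table1_row(line):
    """Split one Table 1 row; reject rows that do not start with Code;Nº;."""
    match = ROW_START.match(line)
    if match is None:
        raise ValueError(f"row does not start with Code;Nº; -> {line!r}")
    code, number = match.groups()
    english, french, pva, date = line[match.end():].split(";")
    return {"code": code, "number": number, "english": english,
            "french": french, "pva": pva, "date": date}


row = parse_table1_row(
    "Zyyy;998;Code for undetermined script;"
    "codet pour écriture indéterminée;Common;2004-05-01"
)
print(row["code"], row["number"], row["pva"])
```

With the English name first, the same prefix test is not possible, because names vary in length and may contain spaces or punctuation.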

Make the plain-text normative and published online (out of a zip file), and make
the HTML pages only informative...

I have done another scripted page (with PHP; however the PHP generated page may
be stored in a cached static page if you don't want scripts on the Unicode
server itself) using the text file as the reference, to generate all the HTML
pages for browsing.

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson

Peter Kirk wrote:

> > This is not a practical use of variation sequences if, by this, you
> > mean use of variation selectors. What are you going to do, add a
> > variation selector after every single base character in the text? ...
> > ...
> > ... Are you expecting fonts to support the tiny stylistic variations
> > between Phoenician, Moabite, Palaeo-Hebrew, etc. -- variations that
> > are not even cleanly defined by language usage -- with such sequences?
>
> No one has suggested this.

Then what is Ernest suggesting? He wrote that the distinction between stylistic variants
of unified scripts could be done with variation sequences, i.e. a sequence that 'always
consists of a base character followed by the variation selector, may be specified as part
of the Unicode Standard'. He then went further and wrote:

> My point was that I have seen enough evidence to
> absolutely convince me that if both glyph repertoires
> are unified in a single script, variation sequences
> would be *necessary*. [My emphasis.]

So what is he suggesting if not that every single base character in a text would be
followed by a variation selector character in order to make a plain-text distinction
between stylistic variations?

> > Why not change the friggin' font? Why not use something other than plain-text?
>
> The solution may be a catch-all, but the problem is a real one. Dr
> Kaufman's response makes it clear that to professionals in the field
> Everson's proposal is not just questionable but ridiculous. There is
> certainly some PR work to be done in this area, not name-calling.

Peter, are we talking about the same thing? Ernest is suggesting bizarre measures to deal
with a problem -- in my opinion, a non-existent one -- that he sees in *unification*. You
are arguing against Michael's *dis-unification*. The ridiculousness of Ernest's suggestion
to use variation selector sequences -- indeed, perhaps he intends it to be ridiculous to
make a point -- is an argument in favour of dis-unification, since the alternative for
making a plain-text distinction is so daft.

My question, again, is whether there is a need for the plain text distinction in the first
place?

John Hudson

RE: ISO 15924 draft fixes

At 13:49 -0700 2004-05-20, Peter Constable wrote:
> I agree with Addison here: the most important thing is stability, but it
> makes sense that the first and second columns be the symbolic code and
> the numeric code, especially if this is *the* plain-text version and
> normative reference.

That's going to happen.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

From: "Peter Constable" <[EMAIL PROTECTED]>
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> > Of Michael Everson
>
> > I hope this satisfies you.
> > http://www.unicode.org/iso15924/codelists.html
>
> If they are consistent and reliable, I'm satisfied with them. I hope you
> will be preparing a page for corrigenda / errata.
>
> It's not a big issue, but I don't understand why the dates don't match:
> was "Arab" added on January 9 or May 1? So, they're not entirely
> consistent.
>
> Also, it appears you have not fixed a serious error in the plain-text
> file: it is not well-structured. Some rows have 6 columns, and some have
> 7.

No, the structure is correct; however, the text file was prepared by copy/pasting
HTML text inserted in empty cells, namely the "&nbsp;" character reference (which
contains a syntactic semicolon conflicting with the CSV separator). That's the
first thing I had signaled to Michael several days ago, and he has acknowledged
it and corrected it in his new update.
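
To make the 6-versus-7 column effect concrete, here is a small sketch. It assumes the stray text is the HTML reference `&nbsp;` sitting in an otherwise empty field; the `pasted` row is a constructed example, not a line copied from the published file, and both function names are hypothetical.

```python
def field_count(line: str, sep: str = ";") -> int:
    """Number of separator-delimited fields in one row."""
    return len(line.split(sep))


def strip_nbsp_reference(line: str) -> str:
    """Drop literal '&nbsp;' references left over from copy/pasted HTML cells."""
    return line.replace("&nbsp;", "")


clean = "Mand;140;Mandaean;mandéen;;2004-05-01"
# Hypothetical row: the empty PVA cell carries a pasted '&nbsp;'.
pasted = "Mand;140;Mandaean;mandéen;&nbsp;;2004-05-01"

print(field_count(clean))                          # 6
print(field_count(pasted))                         # 7: the reference's own ';' splits the field
print(strip_nbsp_reference(pasted) == clean)       # True
```

The trailing semicolon of the character reference is read as one more separator, so only the rows with an empty cell gain a column.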

I have already signaled almost all bugs and inconsistencies to Michael, and
prepared corrected files.
Michael has just changed the online version (but with the wrong dates... that's
irritating).

There is still a conflict of "Code" for Mandaean, is it "Mand" or "Mnda"?
- Table 1 (HTML by Code):
Mand;140;Mandaean;mandéen;;2004-05-01
- Table 2 (HTML by N°):
140;Mand;Mandaean;mandéen;;2004-05-01
- Table 3 (HTML by Name):
Mandaean;140;Mnda;mandéen;;2004-05-01
- same thing for Table 3 (plain-text by Name)
- Table 4 (HTML by Nom):
mandéen;140;Mnda;Mandaean;;2004-05-01
As Michael indicates that the plain-text should be the reference, then it
suggests "Mnda" and not "Mand"... But the new plain-text version uses "Mand"...
I had already signaled it in a past message... So it's also irritating.
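
A cross-check of this kind can be sketched as follows. The four rows are the ones quoted above; the field order attached to each table is inferred from those rows and is an assumption, as are the names `TABLES` and `codes_by_number`.

```python
# Each table: (assumed field order, row as quoted in this message).
TABLES = {
    "Table 1 (by Code)": (
        ("code", "number", "english", "french", "pva", "date"),
        "Mand;140;Mandaean;mandéen;;2004-05-01",
    ),
    "Table 2 (by N°)": (
        ("number", "code", "english", "french", "pva", "date"),
        "140;Mand;Mandaean;mandéen;;2004-05-01",
    ),
    "Table 3 (by Name)": (
        ("english", "number", "code", "french", "pva", "date"),
        "Mandaean;140;Mnda;mandéen;;2004-05-01",
    ),
    "Table 4 (by Nom)": (
        ("french", "number", "code", "english", "pva", "date"),
        "mandéen;140;Mnda;Mandaean;;2004-05-01",
    ),
}


def codes_by_number(tables):
    """Map each numeric code to the symbolic codes seen for it, per table."""
    seen = {}
    for table_name, (order, row) in tables.items():
        record = dict(zip(order, row.split(";")))
        seen.setdefault(record["number"], {}).setdefault(record["code"], []).append(table_name)
    return seen


conflicts = {num: codes for num, codes in codes_by_number(TABLES).items() if len(codes) > 1}
for num, codes in conflicts.items():
    print(num, codes)
```

Any numeric code that maps to more than one symbolic code is reported together with the tables that disagree.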

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread John Hudson

[EMAIL PROTECTED] wrote:
> In order for Phoenician to be "disunified" from Hebrew, it must
> first have been unified with Hebrew.  This is not the case.

Okay, un-unified, non-unified, kept-separate-from ... pick your term.

At the moment Phoenician is neither unified nor non-unified with Hebrew *because no 
decision has been made by the UTC*. Lack of a decision implies neither unification nor 
non-unification. Phoenician is in the box with Schroedinger's cat.

John Hudson

Re: ISO 15924 "code"s and "ID"s

From: "Peter Constable" <[EMAIL PROTECTED]>
> Could someone please explain why the data tables for ISO 15924 list both
> "codes" and "ID"s? ("ID"s are not discussed in the text of the
> standard.)

My opinion is that "Codes" were defined to be locale-neutral and, with their
fixed format, easily parsable in locale identifiers.
The "IDs" are inherited from Unicode UAX #24 and Unicode Properties... and
needed for compatibility. I view IDs as aliases of "Codes", and I see no interest
in adding new "IDs" for UAX #24 in the future; the absence of an ID would mean
that no further alias will be defined for new scripts in Unicode.
This means that future standardized scripts for Unicode (for example Phoenician,
if it is accepted) will use the now-defined ISO-15924 "Phnx" code in Unicode
character properties, without attempting to add a new "Phoenician" value which
would have to be added in the ISO-15924 code list (I think it is useless to
change something only to add aliases for new codes that Unicode has still not
defined and used).
Am I wrong?
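
The "alias" reading described above could be sketched like this: a lookup that returns the long Unicode name when one is listed and falls back to the four-letter code otherwise. The fallback rule is only the interpretation offered in this message, not something the standard states, and `ROWS` and `script_property_value` are hypothetical names.

```python
# (ISO 15924 code, Property Value Alias) pairs; an empty alias means none is listed.
ROWS = [
    ("Arab", "Arabic"),
    ("Hebr", "Hebrew"),
    ("Mand", ""),  # empty PVA field, as in the Mandaean rows quoted in this thread
]


def script_property_value(code: str, rows=ROWS) -> str:
    """Return the long alias if one is listed, otherwise the code itself."""
    aliases = dict(rows)
    if code not in aliases:
        raise KeyError(code)
    return aliases[code] or code


print(script_property_value("Arab"))
print(script_property_value("Mand"))
```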

Re: Response to Everson Phoenician and why June 7?


Ernest Cline wrote,

> > In order for Phoenician to be "disunified" from Hebrew, it must
> > first have been unified with Hebrew.  This is not the case.
> 
> Well then, nonunification if you wish to be picky about it.

Sorry if I offended.  Many on this list have referred to the current
proposal as a "disunification" and seem to be arguing that accepting
this proposal would change and disrupt current Unicoding practices.

In this case, I think it's important to be picky because there are
no current Unicoding practices for Phoenician.  Phoenician is not 
yet encoded in the standard.

Best regards,

James Kass

RE: ISO 15924 draft fixes

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Addison Phillips [wM]


> I don't care about the order, so long as it is stable over time.
> Personally I find the latter form more logical (with the identifier,
> i.e. the code, first).

I agree with Addison here: the most important thing is stability, but it
makes sense that the first and second columns be the symbolic code and
the numeric code, especially if this is *the* plain-text version and
normative reference.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

I could use a little help rendering this into French, lest I 
embarrass myself

"The Property Value Alias is defined as part of the Unicode Standard 
and is provided informatively in the tables here to show how entries 
in the ISO 15924 code table relate to script names defined in 
Unicode."

--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 "code"s and "ID"s

At 12:02 -0700 2004-05-20, Peter Constable wrote:
> Well, how about as comments in the plain-text file and on the HTML
> pages:
> "The Property Value Alias is defined as part of the Unicode Standard and
> is provided here as informative information

*giggles*

> to show how entries in the
> ISO 15924 code table relate to script names defined in Unicode."

I'd like to put this on codelists.html and in the plain-text file, 
but not on each of the HTML pages.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Ernest Cline


> [Original Message]
> From: <[EMAIL PROTECTED]>
>
> Ernest Cline wrote,
>
> > ... This indicates to me that variation
> > sequences are a potential solution that should be considered,
> > even if it ends up being rejected in favor of disunification.
>
> In order for Phoenician to be "disunified" from Hebrew, it must
> first have been unified with Hebrew.  This is not the case.

Well then, nonunification if you wish to be picky about it.

RE: ISO 15924 draft fixes

2004-05-20 Thread Addison Phillips [wM]

I don't care about the order, so long as it is stable over time. Personally I find the 
latter form more logical (with the identifier, i.e. the code, first). I view the 
English and French names and the "PVA" as merely descriptive or informative 
information. The code and the ID number should go first, IMO.

But if the file is in some other format, that's fine, so long as the format is stable.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf Of Michael Everson
> Sent: 2004年5月20日 10:59
> To: [EMAIL PROTECTED]
> Subject: RE: ISO 15924 draft fixes
> 
> 
> At 10:00 -0700 2004-05-20, Addison Phillips [wM] wrote:
> >I concur with Peter. If there are multiple 
> >documents now, then I'd like to see a single 
> >normative document...
> 
> It will be the plain-text version, and for the 
> purposes of fixing the current regrettable mess 
> I'm taking it as read that the plain text version 
> was always the normative version.
> 
> >and furthermore I would like it to *be* 
> >normative (and I'd like to know which one it 
> >is). The text file is listed on the web site as 
> >the "alternative"...
> 
> It should say normative.
> 
> Is the format order satisfactory? 
> English_Name;Code;Nº;Nom_français;PVA;Date
> Or would it be preferable to have it in the 
> format of Table 1 
> (Code;Nº;English_Name;Nom_français;PVA;Date)
> 
> 
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>

RE: ISO 15924 draft fixes

At 12:07 -0700 2004-05-20, Peter Constable wrote:
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
>> Of Michael Everson
>
>>> Also, it appears you have not fixed a serious error in the
>>> plain-text file: it is not well-structured. Some rows have 6
>>> columns, and some have 7.
>>
>> That might be fixed in the newest one.
>
> It is not fixed in the file that's on the site now. If this is the
> normative file, I'd suggest you fix it as soon as possible.

Which file on the site?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread jameskass

Peter Constable wrote,

>  I'm sure even Youtie would go for this.

Except that she's too busy writing new lyrics for Janis Joplin tunes.

Ernest Cline wrote,

> ... This indicates to me that variation
> sequences are a potential solution that should be considered,
> even if it ends up being rejected in favor of disunification.

In order for Phoenician to be "disunified" from Hebrew, it must
first have been unified with Hebrew.  This is not the case.

(If anyone can cite from TUS any passage recommending that Phoenician
text should be encoded using Hebrew characters, I'll stand corrected.)

Variation sequences could be very helpful to distinguish variants in
plain text.  But, if every character in an entire text needs to have a
corresponding variant selector in order for the text to render as
expected, then that's a strong argument in favor of a separate encoding.
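
To show what "a variant selector after every character" would look like in practice, here is a hedged sketch. It uses U+FE00 VARIATION SELECTOR-1 purely as a stand-in: no Hebrew variation sequences of this kind are defined in the standard, so the result is hypothetical data, and `with_variation_selectors` is an invented helper name.

```python
VS1 = "\uFE00"  # VARIATION SELECTOR-1, used here only as a placeholder


def with_variation_selectors(text: str, selector: str = VS1) -> str:
    """Append the selector after every letter; leave spaces and punctuation alone."""
    return "".join(ch + selector if ch.isalpha() else ch for ch in text)


hebrew = "\u05D0\u05D1\u05D2"  # alef, bet, gimel
marked = with_variation_selectors(hebrew)

print(len(hebrew), len(marked))            # 3 6
print([f"U+{ord(ch):04X}" for ch in marked])
```

Every base letter doubles into a two-code-point sequence, which is the overhead the argument above is pointing at.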

Variation sequences could be used to distinguish glyph variants between
Phoenician and neo-Punic, though, or even between neo-Punic and neo-Punic.
If members of any discipline need such granularity in plain text, say
epigraphers or numismatists, then they'll float a proposal and the
proposal can be judged on its merits.

Somebody>  "You should use graphics for such distinctions."

Graphics aren't part of plain text.

Somebody>  "Well then, you should just use mark-up."

Neither is mark-up.

Best regards,

James Kass

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread jameskass


Peter Kirk wrote,

> The solution may be a catch-all, but the problem is a real one. Dr 
> Kaufman's response makes it clear that to professionals in the field 
> Everson's proposal is not just questionable but ridiculous. There is 
> certainly some PR work to be done in this area, not name-calling.

Does Dr. Kaufman speak for all professionals in the field, or would
it be fair to say that Dr. Kaufman is speaking for only one such
professional?

Best regards,

James Kass

RE: ISO 15924 draft fixes

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson

 
> >Also, it appears you have not fixed a serious error in the
> >plain-text file: it is not well-structured. Some rows have 6
> >columns, and some have 7.
> 
> That might be fixed in the newest one.

It is not fixed in the file that's on the site now. If this is the
normative file, I'd suggest you fix it as soon as possible.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca

> Antoine Leca wrote:
>
>> The French name for Hang looks strange. It happened to be "hangul
>> (hangul, hangeul)" (after quite a bit of discussion.)

Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8
this morning. I meant "hangul(hangul, hangeul)".

According to a native  the
correct form are the ones between parenthesis (with an added apostrophe
between han'gul).

: From: "Jian YANG" [EMAIL PROTECTED]
: Subject: Re: Re: (iso15924.275) "Hangul (Hang~ul, Hangeul)"
:   as script name (~ is a diacritical mark)
: Date: Mon, 29 May 2000 15:49:25 -0400
:
:
: «Hangeul» = romanization standard of the South Korean Ministry
: of Education;
: «Hangul» = McCune-Reischauer romanization (the exact form
: is «Han'gul»: «u» with breve, not caron; but the diacritical
: mark was removed to accommodate the ASCII convention,
: no doubt);


On Thursday, May 20, 2004 3:51 PM, Patrick Andries wrote:
>
> The name in ISO/CEI 10646 (F)  is « hangûl  » from a Corean dictionary
> and a Corean grammar published by the Inalco (Langues O').

Clearly, the Langues'O did adapt it to French typographical possibilities,
reversing the breve accent into a circumflex.

> Another
> suggested form in some sources, to approximate the pronunciation,
> is « hangueul »

This is the other form, with an added, euphonic u after the g, to avoid a
complete mispronunciation.

Whether all this is right or not, I do not know. But I believe this text
did go through two ballots against the very people of Langues'O (?), so we
have no reason to correct now what was accepted in the standard. The only
choice right now is to type exactly what was printed, since I understand we
no longer have the master that was used for the [F]DIS texts.

Since I am not a member of TC46, and furthermore I was away from the process
last year, I might very easily be wrong.


Antoine

RE: ISO 15924 "code"s and "ID"s

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson


> >Calling the other thing "Property Value Alias" would solve the
> >problem, but it really ought to be defined somewhere;
> 
> The heading links to it.

Not good enough: the plain-text file does not contain links. (Besides the
fact that I cannot find the supposed link you refer to.)


> >and since it's not mentioned in the standard, then it's status must
> >be informative, and that should be indicated.
> 
> I invite suggestions as to where and how.

Well, how about as comments in the plain-text file and on the HTML
pages: 

"The Property Value Alias is defined as part of the Unicode Standard and
is provided here as informative information to show how entries in the
ISO 15924 code table relate to script names defined in Unicode."

Or substitute whatever text describes *why* these are being shown here.
You've got to say *something* about them, else it's completely unclear
whether the reader is supposed to care about them or not, and what
they're supposed to be used for.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Jony Rosenne

I think we should be careful not to introduce new features, such as
variation selectors, to new scripts, unless there is a strong reason to do
so.

The fact that VS are now standard in Unicode does not require every Hebrew
software to support them, even by ignoring them.

There is a huge cost involved.

Jony

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Peter Constable
> Sent: Thursday, May 20, 2004 5:35 PM
> To: Unicode List
> Subject: RE: Response to Everson Phoenician and why June 7?
> 
> 
> > Even with a separate Phoenician script, it might be a good idea to 
> > provide variation sequences
> 
> Hmmm, gives me an idea: For those people that want to unify, 
> would it help if all of the Phoenician characters were 
> considered as variation sequences of Hebrew characters, but 
> for convenience we used "pre-composed", atomic characters to 
> represent each of those sequences? Then people wouldn't 
> actually need to use those sequences themselves, 'cause the 
> atomic characters would do the same thing. But someone could 
> convert the atomic characters into the real variation 
> sequences for comparisons with Hebrew-cum-Hebrew, and since 
> the variation mappings are 1:1, the same VS would be used for 
> all sequences, and it could just as well be a null, virtual 
> VS, which would make it way easier to process the data. So 
> the conversion would be between the atomic 
> Phoenician-variation-of-Hebrew-sequence characters to the 
> sequences of virtual-VS + Hebrew characters. And we could 
> tell the splitters that we were encoding a distinct script 
> just to keep them happy, but we'd be the ones who really know 
> what's happening.
> 
> 
> I'm sure even Youtie would go for this.
> 
> 
> Peter
> 
> 
> 
>

RE: ISO 15924 draft fixes

At 10:00 -0700 2004-05-20, Addison Phillips [wM] wrote:
> I concur with Peter. If there are multiple 
> documents now, then I'd like to see a single 
> normative document...

It will be the plain-text version, and for the 
purposes of fixing the current regrettable mess 
I'm taking it as read that the plain text version 
was always the normative version.

> and furthermore I would like it to *be* 
> normative (and I'd like to know which one it 
> is). The text file is listed on the web site as 
> the "alternative"...

It should say normative.

Is the format order satisfactory?
English_Name;Code;Nº;Nom_français;PVA;Date
Or would it be preferable to have it in the 
format of Table 1 
(Code;Nº;English_Name;Nom_français;PVA;Date)

--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 "code"s and "ID"s

At 10:14 -0700 2004-05-20, Peter Constable wrote:
> Calling the other thing "Property Value Alias" would solve the 
> problem, but it really ought to be defined somewhere;

The heading links to it.

> and since it's not mentioned in the standard, then its status must 
> be informative, and that should be indicated.

I invite suggestions as to where and how.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

At 09:49 -0700 2004-05-20, Peter Constable wrote:
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
>> Of Michael Everson
>
>> I hope this satisfies you.
>> http://www.unicode.org/iso15924/codelists.html
>
> If they are consistent and reliable, I'm satisfied with them. I hope you
> will be preparing a page for corrigenda / errata.

That's what http://www.unicode.org/iso15924/codechanges.html is for.

> It's not a big issue, but I don't understand why the dates don't match:
> was "Arab" added on January 9 or May 1? So, they're not entirely
> consistent.

Because long long ago when I thought that ISO was going to publish 
the document on my birthday (sigh) I put 2004-01-09 on the document; 
that didn't happen, and it wasn't published until 2004-05-01.

> Also, it appears you have not fixed a serious error in the 
> plain-text file: it is not well-structured. Some rows have 6 
> columns, and some have 7.

That might be fixed in the newest one.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Addison Phillips [wM]

I concur with Peter. If there are multiple documents now, then I'd like to see a 
single normative document... and furthermore I would like it to *be* normative (and 
I'd like to know which one it is). The text file is listed on the web site as the 
"alternative"...

By all means correct errors. Spelling or nomenclatural (non-substantive) changes in 
the descriptions are errata. But I view changes, additions, and deletions to/from the 
data tables as changes to the standard and they should, in my opinion, be treated as 
such even if they are only to correct errors.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf Of Peter Constable
> Sent: 2004年5月20日 8:10
> To: Unicode List
> Subject: RE: ISO 15924 draft fixes
> 
> 
> > >For now I suggest an immediate warning in the ISO15924 web pages,
> > >explicitly stating that these published tables were in beta, and
> > >contain incoherences, which are being corrected.
> > 
> > No. This is purely cosmetic. Let us move on.
> 
> I find this cavalier attitude a bit disconcerting. Errors in the tables
> are not purely cosmetic. An IT standard is created to support IT
> implementations, and people have been and will be referring to those
> tables to create their implementations. Each view of the data should be
> reliable, and if it is found that it was not, then that needs to be
> communicated in some way. 
> 
> IMO, it is essential that there be a place on the site for errata. I'm
> inclined to agree with Philippe: the errata notes should indicate that
> there were errors in the original tables and what the nature of those
> errors were. If IDs were misspelled or missing, those should be
> enumerated. If English or French names were misspelled, I think a
> general note is sufficient.
> 
> 
> > >A link should list the incoherences and the proposed changes. I have
> > >such a list and all it takes for me is a simple Excel spreadsheet,
> > >used to sort the tables and detecting differences between published
> > >tables and proposed corrections.
> > 
> > The only delta we are going to deal with is the one between the
> > plain-text documents; it is that which is going to be considered
> > authoritative
> 
> Is that document*s* (plural)? I strongly encourage you to maintain *one*
> master source from which all others are derived.
> 
> 
> 
> Peter
>  
> Peter Constable
> Globalization Infrastructure and Font Technologies
> Microsoft Windows Division
>

RE: ISO 15924 "code"s and "ID"s

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson

> There is no solving the problem of the use of "code" in the TC37 and
> TC46 standards. That is an old, old argument.

True, but making a distinction between "ID" and the "code" can be
solved. Calling the other thing "Property Value Alias" would solve the
problem, but it really ought to be defined somewhere; and since it's not
mentioned in the standard, then its status must be informative, and
that should be indicated.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson

> I hope this satisfies you.
> http://www.unicode.org/iso15924/codelists.html

If they are consistent and reliable, I'm satisfied with them. I hope you
will be preparing a page for corrigenda / errata.

It's not a big issue, but I don't understand why the dates don't match:
was "Arab" added on January 9 or May 1? So, they're not entirely
consistent.

Also, it appears you have not fixed a serious error in the plain-text
file: it is not well-structured. Some rows have 6 columns, and some have
7.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 "code"s and "ID"s

I could change the column heading to "Property Value Alias" and link 
it to http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of Michael Everson


> Taking time to put up "an immediate warning" isn't a
> good use of my time.

I didn't ask for an immediate warning. I will note, though, that
incorporating bad data into a product may not be a good use of time for
someone else -- and it may be far more costly for them than it will be
for you.

 
> The changes will be noted at
> http://www.unicode.org/iso15924/codechanges.html Please be a little
> bit patient.

I don't think I'm being at all impatient. I didn't ask you to do
anything yesterday; I just ask that it be done carefully. And not to
think that bad data files can be relegated to "cosmetics", which is what
you seemed to be saying.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 "code"s and "ID"s

At 08:59 -0700 2004-05-20, Peter Constable wrote:
> Could someone please explain why the data tables for ISO 15924 list both
> "codes" and "ID"s? ("ID"s are not discussed in the text of the
> standard.)

The Registration Authority (The Unicode Consortium) requested it be 
added to the tables. Perhaps additional wording is needed somewhere 
to explain it. I would welcome specific suggestions and wording.

> I find the inclusion of both under these labels somewhat less than
> ideal. The term "code" is not consistently used. Most people do use
> "code" to refer to a symbol (such as an alpha-4 string) that denotes
> some entity or category; and some standards also use the term in that
> way. But other standards use "code" to refer to a collection of such
> symbols. For instance, ISO 639-1 clearly treats "code" as the
> collection, and calls the individual entries "code elements"; the
> alpha-2 symbols in ISO 639-1 are "identifiers".

There is no solving the problem of the use of "code" in the TC37 and 
TC46 standards. That is an old, old argument.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

ISO 15924 "code"s and "ID"s

Could someone please explain why the data tables for ISO 15924 list both
"codes" and "ID"s? ("ID"s are not discussed in the text of the
standard.)


I find the inclusion of both under these labels somewhat less than
ideal. The term "code" is not consistently used. Most people do use
"code" to refer to a symbol (such as an alpha-4 string) that denotes
some entity or category; and some standards also use the term in that
way. But other standards use "code" to refer to a collection of such
symbols. For instance, ISO 639-1 clearly treats "code" as the
collection, and calls the individual entries "code elements"; the
alpha-2 symbols in ISO 639-1 are "identifiers".

Nowhere have I seen the symbols called "codes" when they weren't also
called "identifiers". Until now. So, having learned to avoid the
ambiguity of the word "code" and to always use "identifier" (or "ID")
instead, now in the case of ISO 15924 that doesn't work since the "ID"
is something different. 

(The "ID" appears to be a locale-independent reference name that is
structured in a way that allows it to be used in higher-level identifier
protocols, but in the context of ISO 15924, I would not call it the
"ID".)



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

Peter, Philippe,
I hope this satisfies you. http://www.unicode.org/iso15924/codelists.html
It is enough work finding and fixing and figuring out whatever it is 
that a perl script is and how to make it work. It may seem obvious to 
you, but it is not obvious to me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: Response to Everson Phoenician and why June 7?

> Even with a separate Phoenician script, it might be a good idea
> to provide variation sequences

Hmmm, gives me an idea: For those people that want to unify, would it
help if all of the Phoenician characters were considered as variation
sequences of Hebrew characters, but for convenience we used
"pre-composed", atomic characters to represent each of those sequences?
Then people wouldn't actually need to use those sequences themselves,
'cause the atomic characters would do the same thing. But someone could
convert the atomic characters into the real variation sequences for
comparisons with Hebrew-cum-Hebrew, and since the variation mappings are
1:1, the same VS would be used for all sequences, and it could just as
well be a null, virtual VS, which would make it way easier to process
the data. So the conversion would be between the atomic
Phoenician-variation-of-Hebrew-sequence characters to the sequences of
virtual-VS + Hebrew characters. And we could tell the splitters that we
were encoding a distinct script just to keep them happy, but we'd be the
ones who really know what's happening.


I'm sure even Youtie would go for this.


Peter

RE: ISO 15924 draft fixes

At 08:10 -0700 2004-05-20, Peter Constable wrote:
>>> For now I suggest an immediate warning in the ISO15924 web pages,
>>> explicitly stating that these published tables were in beta, and
>>> contain incoherences, which are being corrected.
>>
>> No. This is purely cosmetic. Let us move on.
>
> I find this cavalier attitude a bit disconcerting. Errors in the tables
> are not purely cosmetic.

Look, Peter. I'm glad people found errors and inconsistencies. We are 
working on fixing that, and expect it to be fixed very soon. You're 
ALL listening. Taking time to put up "an immediate warning" isn't a 
good use of my time.

> IMO, it is essential that there be a place on the site for errata. I'm
> inclined to agree with Philippe: the errata notes should indicate that
> there were errors in the original tables and what the nature of those
> errors were. If IDs were misspelled or missing, those should be
> enumerated. If English or French names were misspelled, I think a
> general note is sufficient.

The changes will be noted at 
http://www.unicode.org/iso15924/codechanges.html Please be a little 
bit patient.

>> The only delta we are going to deal with is the one between the
>> plain-text documents; it is that which is going to be considered
>> authoritative
>
> Is that document*s* (plural)? I strongly encourage you to maintain *one*
> master source from which all others are derived.

That would be THE old plain-text document and THE new plain-text 
document which will replace it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

> >For now I suggest an immediate warning in the ISO15924 web pages,
> >explicitly stating that these published tables were in beta, and
> >contain incoherences, which are being corrected.
> 
> No. This is purely cosmetic. Let us move on.

I find this cavalier attitude a bit disconcerting. Errors in the tables
are not purely cosmetic. An IT standard is created to support IT
implementations, and people have been and will be referring to those
tables to create their implementations. Each view of the data should be
reliable, and if it is found that it was not, then that needs to be
communicated in some way. 

IMO, it is essential that there be a place on the site for errata. I'm
inclined to agree with Philippe: the errata notes should indicate that
there were errors in the original tables and what the nature of those
errors were. If IDs were misspelled or missing, those should be
enumerated. If English or French names were misspelled, I think a
general note is sufficient.


> >A link should list the incoherences and the proposed changes. I have
> >such a list and all it takes for me is a simple Excel spreadsheet,
> >used to sort the tables and detecting differences between published
> >tables and proposed corrections.
> 
> The only delta we are going to deal with is the one between the
> plain-text documents; it is that which is going to be considered
> authoritative

Is that document*s* (plural)? I strongly encourage you to maintain *one*
master source from which all others are derived.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division
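
The "delta between the plain-text documents" discussed above could be
computed mechanically rather than by eye. What follows is only a minimal
sketch in Python, not the Registrar's procedure. It assumes a
semicolon-separated layout of code;number;English name;French name, and the
function names load_table and table_delta are hypothetical. The sample rows
are illustrative, built from names mentioned in this thread.

```python
import csv
import io

def load_table(text):
    """Parse semicolon-separated script records, keyed by 4-letter code."""
    records = {}
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        records[row[0]] = tuple(row[1:])
    return records

def table_delta(old_text, new_text):
    """Return (added, removed, changed) codes between two plain-text tables."""
    old, new = load_table(old_text), load_table(new_text)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return added, removed, changed

# Illustrative rows only; the field order is an assumption.
OLD = """\
Hani;500;Han (Hanzi, Kanji, Hanja);idéogrammes han
Hans;501;Han (Simplified variant);idéogrammes han
Hant;502;Han (Traditional variant);idéogrammes han
Sylo;316;Syloti Nagri;sylotî nâgrî
"""
NEW = """\
Hani;500;Han (Hanzi, Kanji, Hanja);idéogrammes han (Hanzi, Kanji, Hanja)
Hans;501;Han (Simplified variant);idéogrammes han (variante simplifiée)
Hant;502;Han (Traditional variant);idéogrammes han (variante traditionnelle)
Sylo;316;Syloti Nagri;sylotî nâgrî
"""

added, removed, changed = table_delta(OLD, NEW)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Keying on the 4-letter code means a corrected name shows up as a change to
an existing record, while a missing or new code shows up as a removal or an
addition, which matches the distinction drawn above between misspelled names
and misspelled or missing IDs.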

RE: ISO 15924 codes for ConScript

> One person wrote, regarding Qaak for Klingon:
> 
> > It's a shame you didn't pick something that could be pronounced in
> > tlhIngan Hol, perhaps Qaap for pIqaD.

Identifiers are identifiers, not words. 



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Peter Kirk

On 19/05/2004 20:54, John Hudson wrote:

> Ernest Cline wrote:
>
> > I would be very surprised if there were such a cybercafe.  One
> > that had both a Hebrew-Phoenician and a Hebrew-Hebrew font
> > with the Hebrew-Phoenician as the default would be much easier
> > to believe as a possibility.  Still, it is a valid point.  I think
> > that if Phoenician were to be unified with Hebrew, it would probably
> > behoove Unicode to establish variation sequences for Phoenician.
> >
> > Even with a separate Phoenician script, it might be a good idea
> > to provide variation sequences that could be used to identify
> > different script styles such as Paleo-Hebrew and Punic
> > in the plain text.
>
> This is not a practical use of variation sequences if, by this, you
> mean use of variation selectors. What are you going to do, add a
> variation selector after every single base character in the text? ...

Well, this won't make the text any longer in UTF-16.

> ... Are you expecting fonts to support the tiny stylistic variations
> between Phoenician, Moabite, Palaeo-Hebrew, etc. -- variations that
> are not even cleanly defined by language usage -- with such sequences?

No one has suggested this. It does make some sense to encode the
difference between Phoenician and Hebrew in this way, and possibly also
other clearly definable distinctions. Presumably a line has to be drawn
somewhere, but the whole concept cannot be rejected just because it
could be taken to ridiculous extremes. Actually exactly the same
argument could be made against Everson's proposal for Phoenician, that
it opens the way to separate proposals for Moabite, Palaeo-Hebrew etc.
But I am not making that argument because Everson has clearly stated
that he does not intend to make these distinctions.

> Some people seem keen on variation selectors in the same way that
> others are keen on PUA: as a catch-all solution to non-existent problems.

The solution may be a catch-all, but the problem is a real one. Dr
Kaufman's response makes it clear that to professionals in the field
Everson's proposal is not just questionable but ridiculous. There is
certainly some PR work to be done in this area, not name-calling.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
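
The remark above that a variation selector "won't make the text any longer
in UTF-16" can be checked with a few lines of Python. This is a sketch under
an assumption: that a separately encoded Phoenician letter would live
outside the BMP. U+10900 is used here purely as an illustrative
supplementary-plane code point, and the helper name utf16_units is
hypothetical.

```python
def utf16_units(text):
    """Number of 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

HEBREW_ALEF = "\u05D0"         # HEBREW LETTER ALEF, in the BMP
VS1 = "\uFE00"                 # VARIATION SELECTOR-1, in the BMP
SUPPLEMENTARY = "\U00010900"   # assumed code point outside the BMP

print(utf16_units(HEBREW_ALEF))          # 1
print(utf16_units(HEBREW_ALEF + VS1))    # 2
print(utf16_units(SUPPLEMENTARY))        # 2
```

A BMP letter followed by a BMP variation selector takes two code units, the
same as a single supplementary-plane character encoded as a surrogate pair.
The comparison says nothing about UTF-8 or UTF-32, where the sizes differ.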

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Ernest Cline

> [Original Message]
> From: John Hudson <[EMAIL PROTECTED]>
>
> Ernest Cline wrote:
>
> > I would be very surprised if there were such a cybercafe.  One
> > that had both a Hebrew-Phoenician and a Hebrew-Hebrew font
> > with the Hebrew-Phoenician as the default would be much easier
> > to believe as a possibility.  Still, it is a valid point.  I think that
> > if Phoenician were to be unified with Hebrew, it would probably
> > behoove Unicode to establish variation sequences for Phoenician.
>
> This is not a practical use of variation sequences if, by this, you
> mean use of variation selectors. What are you going to do, add
> a variation selector after every single base character in the text?
> Are you expecting fonts to support the tiny stylistic variations 
> between Phoenician, Moabite, Palaeo-Hebrew, etc. -- variations
> that are not even cleanly defined by language usage -- with such
> sequences?
>
> Some people seem keen on variation selectors in the same
> way that others are keen on PUA: as a catch-all solution to
> non-existent problems.

To quote from Section 15.6 of TUS 4.0 "A variation sequence, 
which always consists of a base character followed by the
variation selector, may be specified as part of the Unicode
Standard."  Adding a variation _selector_ would mean using
another code point which is not what I am intending here.

Variation sequences are a potential solution to any question of
unification/disunification of scripts.  They may not be the optimal
solution, but they are a solution.  My point was that I have seen
enough evidence to absolutely convince me that if both glyph
repertoires are unified in a single script, variation sequences
would be necessary.  It has not convinced me that either
unification or disunification would be desirable.

Just as there are no hard and fast rules for disunification of
scripts, there are no hard and fast rules for when variation
sequences are appropriate.  The only guidance that the
standard offers for when variation sequences are appropriate
is: "... provide a mechanism for specifying variants ... that have
essentially the same semantic but substantially different ranges
of glyphs."  Those arguing in favor of unification claim that the
repertoires have the same semantics and those arguing in
favor of separation base at least part of their argument upon
the glyph shapes.  This indicates to me that variation
sequences are a potential solution that should be considered,
even if it ends up being rejected in favor of disunification.
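
To make the quoted definition concrete (a base character followed by a
variation selector), here is a small Python sketch that groups each base
character with a selector that follows it. The function names are
hypothetical, and the Hebrew pairing in the sample is invented for
illustration only; it is not a sequence specified in the Unicode Standard.

```python
import unicodedata

def is_variation_selector(ch):
    """True for U+FE00..U+FE0F and U+E0100..U+E01EF."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def split_variation_sequences(text):
    """Group each base character with the variation selector after it, if any."""
    groups = []
    for ch in text:
        if is_variation_selector(ch) and groups and len(groups[-1]) == 1:
            groups[-1] += ch      # attach the selector to the preceding base
        else:
            groups.append(ch)     # a new base character (or a stray selector)
    return groups

# Hypothetical pairing: alef + VS1, then a plain bet.
sample = "\u05D0\uFE00\u05D1"
for group in split_variation_sequences(sample):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in group])
```

The sample holds three code points but only two groups, which is the
practical cost raised earlier in the thread: every marked letter carries
one extra code point in the plain text.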

Re: ISO 15924 draft fixes

At 06:51 -0700 2004-05-20, Patrick Andries wrote:
> Antoine Leca wrote:
> > The French name for Hang looks strange. It happened to be "hangul (hangul,
> > hangeul)" (after quite a bit of discussion.)
>
> The name in ISO/CEI 10646 (F) is « hangûl », from a Korean dictionary
> and a Korean grammar published by the Inalco (Langues O'). Another
> form suggested in some sources, to approximate the pronunciation,
> is « hangueul ».

transliterations of Korean that the Korean NB 
insisted upon. Hangul instead of hangûl we will 
treat as a spelling error (so you don't have to 
file a change form).
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

To terminate with this discussion, I have put online the corrected tables.
http://www.rodage.org/pub/iso15924-sheets.html

(this is a Excel workbook in HTML format with frames but without "Excel
interactivity", that references other URLs in a subfolder; it can be navigated
by the tabs at the bottom)

Also available as a plain Excel file:
http://www.rodage.org/pub/iso15924-sheets.xls

The above collection is also archived in
http://www.rodage.org/pub/iso15924-sheets.zip

Re: ISO 15924 draft fixes

2004-05-20 Thread Patrick Andries

Antoine Leca wrote:
> The French name for Hang looks strange. It happened to be "hangul (hangul,
> hangeul)" (after quite a bit of discussion.)

The name in ISO/CEI 10646 (F) is « hangûl », from a Korean dictionary
and a Korean grammar published by the Inalco (Langues O'). Another
form suggested in some sources, to approximate the pronunciation,
is « hangueul ».

P. A.

Re: ISO 15924 draft fixes

At 14:44 +0200 2004-05-20, Philippe Verdy wrote:
> From: "Michael Everson" <[EMAIL PROTECTED]>
> > >It can't be Unicode's UTC alone, as there are
> > >already codes for bibliographic references that
> > >are not (and will never) be encoded separately
> > >in Unicode, so I suppose that there are librarian
> > >or publishers members with which you have to
> > >discuss, independently of the work of Unicode,
> > >which should only be the registrar for these
> > >codes. May be there's still no formal procedure,
> > >and for now the codes are maintainable without
> > >lots of administration.
> >
> > Read the standard.
>
> Stop this easy argument (that I find offensive here), you could have
> read it too before publishing tables with errors

Errors are errors. The RA-JAC had an opportunity to review all the
tables. Do not blame me alone. People err. People have kindly pointed
out discrepancies.

> (most probably because you forgot to consult the relevant sources to
> check that your document was correct;

Don't presume.

> I note that you are taking some freedom with your own decisions,
> regarding Coptic and the removal of Georgian (Asomtavruli) coded
> "Geoa").

I have (properly) proposed the addition of Coptic (and some other
scripts) to the JAC. Asomtavruli was removed for good reasons. Live
with it. It will be reinstated in due course.

> I have read it and that's why I propose corrections...

And that's why I am communicating with you, to get relevant feedback.
The only delta we are going to deal with is the one between the
plain-text documents; it is that which is going to be considered
authoritative and which will be used (somehow) to generate the other
tables.

> Sorry if you think that these sentences are a bit aggressive but for
> now the RA has made a bad start, and it's mainly because of your
> work...

Nonsense. I am not ashamed. It was a hell of a lot of work getting
that standard together. It is, as you have pointed out, difficult to
maintain different tables by hand.

> If the publication was preliminary (waiting for comments) it should
> have been documented as such on the Unicode web site (like for the
> proposals in Unicode, which pass by a testbed before being listed as
> "standard").

It does NOT matter, Philippe. The corrections are being made.

> For now I suggest an immediate warning in the ISO15924 web pages,
> explicitly stating that these published tables were in beta, and
> contain incoherences, which are being corrected.

No. This is purely cosmetic. Let us move on.

> A link should list the incoherences and the proposed changes. I have
> such a list and all it takes for me is a simple Excel spreadsheet,
> used to sort the tables and detecting differences between published
> tables and proposed corrections.

The only delta we are going to deal with is the one between the
plain-text documents; it is that which is going to be considered
authoritative and which will be used (somehow) to generate the other
tables.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Phoenician (was, Response to Everson Phoenician and why June 7?; was, Archaic-Greek/Palaeo-Hebrew; was, interleaved ordering; was, Phoenician)

2004-05-20 Thread Ted Hopp

On Wednesday, May 19, 2004 9:11 PM, John Jenkins wrote:

> You go down to your local cybercafe to read your email from your
> grandmother telling you all about your nephew's bar-mitzvah.
> Unfortunately, your local cybercafe has no modern Hebrew (or Yiddish)
> installed, but they *do* have a Phoenician one.  You cannot, as a
> result, even tell what language your grandmother is writing you in, let
> alone what it means.

Of course, the same thing would likely happen if there was only a Rashi font
installed.

> Of course, this criterion is difficult to apply to two varieties of
> writing separated by thousands of years -- and it might behoove the UTC
> to discuss the problems involved -- but if we accept minimum legibility
> as a factor in deciding when to unify/separate, I think it's a valid
> one.

Minimum legibility? Among what population?

By this logic, Rashi script should be separately encoded as well. An Israeli
friend of mine says (only half jokingly) that he can read Hebrew words in
Rashi script, but doesn't know any of the letters.

Phoenician (k'tav ivri) may not be quite as widely recognized as Rashi
script, but I've always thought of both of them as using the same set of
characters, just different glyph sets--same letter names, same phonetic
values, even the same word spellings. (I find it remarkable that the
Phoenician proposal states that none of the characters can be considered to
be similar in function to an existing character.) If Rashi and k'tav ivri
aren't legible to many, this is really only due to unfamiliarity with the
glyphs--much the same problem that young children have with cursive writing
(in English as well as in Hebrew).

So I don't get it. Why does Phoenician need its own Unicode encoding? What's
the operational need that can't be met by using Hebrew characters with the
right font? Why is "render this in k'tav ivri" any different than "render
this in Rashi script?" Is it just that when the glyphs are different enough,
it becomes a good idea to encode them as separate characters, or is there
more to it? And if there is no difference, are we to look forward to a Rashi
script proposal? Surely not.

There is a surfeit of proof by forceful assertion on both sides of this
argument. Very little by way of clear rationale. (And saying "it's in the
roadmap" just begs the question.)

Ted

Ted Hopp, Ph.D.
ZigZag, Inc.
[EMAIL PROTECTED]
+1-301-990-7453

newSLATE is your personal learning workspace
   ...on the web at http://www.newSLATE.com/

Re: ISO 15924 draft fixes

From: "Michael Everson" <[EMAIL PROTECTED]>
> >It can't be Unicode's UTC alone, as there are
> >already codes for bibliographic references that
> >are not (and will never) be encoded separately
> >in Unicode,so I suppose that there are librarian
> >or publishers members with which you have to
> >discuss, independently of the work of Unicode,
> >which should only be the registrar for these
> >codes. May be there's still no formal procedure,
> >and for now the codes are maintainable without
> >lots of administration.
>
> Read the standard.

Stop this easy argument (that I find offensive here), you could have read it too
before publishing tables with errors (most probably because you forgot to
consult the relevant sources to check that your document was correct; I note
that you are taking some freedom with your own decisions, regarding Coptic and
the removal of Georgian (Asomtavruli) coded "Geoa"). I have read it and that's
why I propose corrections...

OK there are lots of corrections, but that's no reason to ignore some
elements that were already published (and are still published for now on the
Unicode web site, which is the only reference for the ISO15924 "Registration
Authority"). Unicode has just appointed you to perform administrative updates for
the RA, not to take your own decisions.

Sorry if you think that these sentences are a bit aggressive but for now the RA
has made a bad start, and it's mainly because of your work... If the publication
was preliminary (waiting for comments) it should have been documented as such on
the Unicode web site (like for the proposals in Unicode, which pass by a testbed
before being listed as "standard").

For now I suggest an immediate warning in the ISO15924 web pages, explicitly
stating that these published tables were in beta, and contain incoherences,
which are being corrected. A link should list the incoherences and the proposed
changes.
I have such a list and all it takes for me is a simple Excel spreadsheet, used
to sort the tables and detecting differences between published tables and
proposed corrections.

Re: ISO 15924 draft fixes

At 13:37 +0200 2004-05-20, Philippe Verdy wrote:
> > I added Coptic unilaterally.
>
> I can't see Coptic for now in your source zip file.

It isn't in that file.

> There are other "duplicate" lines for name aliases that should be listed in
> changes:

I'm not going to list those changes. There is no code or name change involved.

> - "Berber (Tifinagh)"
> => "Tifinagh (Berber)"
> [...]
> Note that the French names for Han variants are identical ("idéogrammes han"),
> when the English names correctly indicate the distinction between
> "Traditional" and "Simplified" variants. These French names should be:
> "idéogrammes han (Hanzi, Kanji, Hanja)";"Hani";"500";"Han (Hanzi, Kanji, Hanja)"
> "idéogrammes han (variante simplifiée)";"Hans";"501";""
> "idéogrammes han (variante traditionnelle)";"Hant";"502";""

This has been corrected.

> For the French name of "Hangul", I also found "Hang" quite strange (never seen
> this orthograph before)

"Orthograph" is not the word you want. You want the word "spelling". I
already said, this error has been corrected.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, May 19, 2004 10:40 PM
Subject: ISO 15924 draft fixes

> The Registrar wishes to thank everyone who has taken an interest in
> the ISO 15924 data pages, and regrets the imperfections which are
> contained there. I am not sure how we will manage the generation of
> the pages, but it is clear that the base should be the plain-text
> document.
>
> I have made changes to the plain-text document and placed it, a draft
> Changes page, and the original plain-text document available at
> http://www.unicode.org/iso15924/iso15924-fixes.zip
>
> I would appreciate it if interested persons could look this over and
> inform me if they find any further discrepancies between the two
> which are worth troubling about. Then we will proceed to generate the
> other files.
>
> I deleted some duplicate lines: Ethiopic was on two lines, under
> Ethiopic and under Ge'ez. It seemed inappropriate to burden the
> tables with such duplication.
>
> I added Coptic unilaterally.

I can't see Coptic for now in your source zip file.

There are other "duplicate" lines for name aliases that should be listed in
changes:
- "Berber (Tifinagh)"
=> "Tifinagh (Berber)"
- "(Burmese) Myanmar"
=> "Myanmar (Burmese)"
- "Fraktur (variant of Latin)"
=> "Latin (Fraktur variant)"
- "Gaelic (variant of Latin)"
=> "Latin (Gaelic variant)"
- "Harappan (Indus)"
=> "Indus (Harappan)"
- "Mormon (Deseret)"
=> "Deseret (Mormon)"
- "Nagari (Devanagari)"
=> "Devanagari (Nagari)"
- "Old Church Slavonic (variant of Cyrillic)"
=> "Cyrillic (Old Church Slavonic variant)"

Note that the French names for Han variants are identical ("idéogrammes han"),
when the English names correctly indicate the distinction between "Traditional"
and "Simplified" variants. These French names should be:
"idéogrammes han (Hanzi, Kanji, Hanja)";"Hani";"500";"Han (Hanzi, Kanji, Hanja)"
"idéogrammes han (variante simplifiée)";"Hans";"501";""
"idéogrammes han (variante traditionnelle)";"Hant";"502";""

For the French name of "Hangul", I also found "Hang" quite strange (never seen
this orthograph before).
Documents in French from Korea or from Korean users in French refer to "Hangul",
"Hangoul", or "Hangûl", rarely "Hangeul", whose French reading as "*Ha:n'jeul" or
"*Hãjeul" would cause problems.
Some sources use "Hangueul", which is spelled correctly for French but may be
offensive as it is too close to the popular slang French verb "engueuler",
conjugated as "engueule" (a "correct" synonym for this verb is "gronder",
sometimes "enguirlander" in popular language, because the radical "gueule"
is normally used to speak about animal faces/mouths).
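
The identical French names reported above for the three Han codes are the
kind of slip a short script could flag before publication. The following
Python sketch is one possible check, not anything the Registrar is known to
use. The function name duplicate_names and the dictionary keys are
hypothetical, and the rows reproduce only the values quoted in this thread.

```python
from collections import defaultdict

def duplicate_names(rows, field):
    """Map each name shared by more than one code to the codes that use it."""
    by_name = defaultdict(list)
    for row in rows:
        by_name[row[field]].append(row["code"])
    return {name: codes for name, codes in by_name.items() if len(codes) > 1}

# French names as reported for the published tables (illustrative subset).
rows = [
    {"code": "Hani", "number": "500", "french": "idéogrammes han"},
    {"code": "Hans", "number": "501", "french": "idéogrammes han"},
    {"code": "Hant", "number": "502", "french": "idéogrammes han"},
    {"code": "Sylo", "number": "316", "french": "sylotî nâgrî"},
]

print(duplicate_names(rows, "french"))
```

Some shared names may be intentional aliases, so a report like this is a
prompt for review rather than proof of an error.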

Re: ISO 15924 draft fixes

At 13:00 +0200 2004-05-20, Philippe Verdy wrote:
> (I wonder why this file is zipped, given its small size,

If uncompressed, downloading it opens it in the browser rather than
downloading it.

> and the fact that the text file is coded in Unix-style end-of-line format,

I used Mac OS X TextEdit.

> not in MIME/DOS/Windows format which one could assume as Zip was
> primarily developed on DOS/Windows... If you want a Unix-style
> format, compress it with gzip instead)

Can everyone un-gzip? Everyone can un-zip.

> You did not reply to the change of orthograph for the English name
> of Malayalam (a dot below diacritic removed), which was not shown in
> your proposed list of changes (in HTML format, within your zip
> archive).

I am NOT going to track all the problems in all of those tables. I am
tracking the changes between the two plain-text files ONLY, and
Malayalam was not spelled differently in the first one.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
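
The end-of-line question raised above (Unix-style LF versus DOS/Windows
CRLF) is easy to settle for any given file. A minimal Python sketch follows.
The function names are hypothetical and the sample bytes are illustrative,
not the contents of the actual plain-text file.

```python
def line_ending_style(data: bytes) -> str:
    """Classify the line endings in a byte string: CRLF, LF, CR, mixed or none."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF (Unix, Mac OS X)
    cr = data.count(b"\r") - crlf   # bare CR (classic Mac OS)
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"

def to_crlf(data: bytes) -> bytes:
    """Normalise every line ending to CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n").replace(b"\n", b"\r\n")

sample = b"Sylo;316;Syloti Nagri\nHani;500;Han\n"   # illustrative content only
print(line_ending_style(sample))            # LF
print(line_ending_style(to_crlf(sample)))   # CRLF
```

Working on bytes rather than decoded text avoids any newline translation by
the I/O layer, so the result reflects what is actually stored in the file.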

Re: ISO 15924 draft fixes

From: "Michael Everson" <[EMAIL PROTECTED]>
> At 11:16 +0200 2004-05-20, Philippe Verdy wrote:
> >From: "Michael Everson" <[EMAIL PROTECTED]>
> >At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
> >>  >It was in the previous list (see the online HTML table 2).
> >>  What does that refer to?
> >
> >See
> >http://www.unicode.org/iso15924/iso15924-codes.html
> >(sorry it was Table 1):
> >Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09
> >Can't you get the same page from the Unicode web site?
>
> There are a number of pages, Philippe.

Not so much: 4 pages only (the links for the English left column and the French
right column are the same), plus 1 link to the downloadable zipped plain-text
version (I wonder why this file is zipped, given its small size, and the fact
that the text file is coded in Unix-style end-of-line format, not in
MIME/DOS/Windows format which one could assume as Zip was primarily developed on
DOS/Windows... If you want a Unix-style format, compress it with gzip instead)

Keep this in mind:
- table 1 is sorted alphabetically by 4-letter codes
http://www.unicode.org/iso15924/iso15924-codes.html
- table 2 is sorted numerically by 3-digit codes
http://www.unicode.org/iso15924/iso15924-num.html
- table 3 is sorted alphabetically by English script name
http://www.unicode.org/iso15924/iso15924-en.html
- table 4 is sorted alphabetically by French script name
http://www.unicode.org/iso15924/iso15924-fr.html

Table numbers correspond to the order of fields in the plain text version.

You did not reply to the change of orthograph for the English name of Malayalam
(a dot below diacritic removed), which was not shown in your proposed list of
changes (in HTML format, within your zip archive).
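
Since the four tables differ only in sort key, they could all be derived
from one master list, which would keep them from drifting apart. The Python
sketch below shows the idea under stated assumptions: records are (code,
number, English name, French name) tuples, the names validate and
derive_tables are hypothetical, and the English names for Hans and Hant are
illustrative. Sorting by casefolded string is a simplification and is not
proper French collation.

```python
import re

CODE_RE = re.compile(r"[A-Z][a-z]{3}")   # 4-letter code, e.g. "Sylo"
NUMBER_RE = re.compile(r"[0-9]{3}")      # 3-digit code, e.g. "316"

# Master records: (code, number, English name, French name). Illustrative subset.
MASTER = [
    ("Sylo", "316", "Syloti Nagri", "sylotî nâgrî"),
    ("Hant", "502", "Han (Traditional variant)",
     "idéogrammes han (variante traditionnelle)"),
    ("Hani", "500", "Han (Hanzi, Kanji, Hanja)",
     "idéogrammes han (Hanzi, Kanji, Hanja)"),
    ("Hans", "501", "Han (Simplified variant)",
     "idéogrammes han (variante simplifiée)"),
]

def validate(records):
    """Return a list of problems: malformed identifiers or reused codes/numbers."""
    problems = []
    seen_codes, seen_numbers = set(), set()
    for code, number, _english, _french in records:
        if not CODE_RE.fullmatch(code):
            problems.append(f"bad code: {code!r}")
        if not NUMBER_RE.fullmatch(number):
            problems.append(f"bad number: {number!r}")
        if code in seen_codes:
            problems.append(f"duplicate code: {code}")
        if number in seen_numbers:
            problems.append(f"duplicate number: {number}")
        seen_codes.add(code)
        seen_numbers.add(number)
    return problems

def derive_tables(records):
    """Build the four sorted views from the single master list."""
    return {
        "codes": sorted(records, key=lambda r: r[0]),
        "num": sorted(records, key=lambda r: int(r[1])),
        "en": sorted(records, key=lambda r: r[2].casefold()),
        "fr": sorted(records, key=lambda r: r[3].casefold()),
    }

print(validate(MASTER))
tables = derive_tables(MASTER)
for name, rows in tables.items():
    print(name, [r[0] for r in rows])
```

Running the validation before generating the views means a misspelled ID is
caught once, at the source, instead of being copied into four pages.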

Re: Qamats Qatan (was Response to Everson Phoenician and why June 7?)

At 22:18 -0700 2004-05-19, John Hudson wrote:
> I don't automatically accept the argument, made by Michael earlier
> today, that 'There is a requirement for distinction for X in
> plain-text'.

The Universal Character Set is supposed to contain all the scripts of
the world. For generations students of writing systems have been
naming and distinguishing scripts. I have such books here from the
middle of the 19th century; presses in France, England, and Italy
were cutting type for these scripts two hundred years before that at
least. The plan is to encode the important distinguishable nodes. A
different level from palaeography, as can be seen from the
unification of varieties under the rubric "Phoenician". (I note that
at least one person has argued reasonably for splitting Neo-Punic off
of that unification.)

> On what basis do we decide that X is necessary in plain-text while Y
> should be done with mark-up or some other 'higher level protocol'?

Wit and taste? There isn't an algorithm.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: Response to Everson Phoenician and why June 7?

At 20:54 -0700 2004-05-19, John Hudson wrote:
> Some people seem keen on variation selectors in the same way that
> others are keen on PUA: as a catch-all solution to non-existent
> problems.

I agree. My mantra is "Variation selectors are pseudo-coding."
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

At 11:52 +0200 2004-05-20, Antoine Leca wrote:
> [Mailed _and_ posted to the list; UTF-8]
>
> On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote:
> > I would appreciate it if interested persons could look this over and
> > inform me if they find any further discrepancies between the two
> > which are worth troubling about. Then we will proceed to generate the
> > other files.
>
> The French name for Hang looks strange. It happened to be "hangul (hangul,
> hangeul)" (after quite a bit of discussion.)

That's an error in the file.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

At 11:16 +0200 2004-05-20, Philippe Verdy wrote:
> From: "Michael Everson" <[EMAIL PROTECTED]>
> > At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
> > >It was in the previous list (see the online HTML table 2).
> > What does that refer to?
>
> See
> http://www.unicode.org/iso15924/iso15924-codes.html
> (sorry it was Table 1):
> Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09
> Can't you get the same page from the Unicode web site?

There are a number of pages, Philippe.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca

[Mailed _and_ posted to the list; UTF-8]

On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote:

> I would appreciate it if interested persons could look this over and
> inform me if they find any further discrepancies between the two
> which are worth troubling about. Then we will proceed to generate the
> other files.

The French name for Hang looks strange. It happened to be "hangul (hangul,
hangeul)" (after quite a bit of discussion.)

Antoine

Re: ISO 15924 draft fixes




From: "Michael Everson" <[EMAIL PROTECTED]>
> At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
> > It was in the previous list (see the online HTML table 2).
>
> What does that refer to?

See http://www.unicode.org/iso15924/iso15924-codes.html
(sorry it was Table 1):

Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09

Can't you get the same page from the Unicode web site?

Re: ISO 15924 codes for ConScript