Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 At 23:21 +0200 2004-05-20, Philippe Verdy wrote:
 There is still a conflict of Code for Mandaean, is it Mand or Mnda?

 Mand.

OK This is now corrected on the new HTML pages.
But the new normative plain-text file now contains... Mnda !!!

---

Besides this, an important note to other users trying to get the corrected pages:
I know I need to clear my browser cache to get the newest files, because the
Unicode server does not mark page changes with document dates, so the pages
rarely expire in browser caches. For HTML pages this is not a problem, as you can
use the Refresh button of your browser to load the new pages.
But for downloads (even with right-click and Save as...), the browser cache
may keep a copy of the old archive, which MUST be removed manually by the
visitor to get the new file and not an old copy!

---

As a side note to Michael or the other 6 RA members (Ken, and Rick notably), I
don't think it's even a good idea to ZIP this reference plain-text file, given
its very small size (it is smaller than each of the HTML versions of the code lists).
It could be presented directly under the URL:
http://www.unicode.org/iso15924/iso15924-code.UTF-8.txt

The plain text would appear directly in the browser window where it could be
saved as well, without needing any ZIP tool... This will work provided that the
text is effectively coded with the MIME plain-text conventions for end-of-lines
(as used in DOS, Windows, OS/2, CP/M, VMS, and all RFCs... i.e. with CRLF
end-of-lines, as stated in the text/plain MIME document type RFC), and if the
Unicode web site server correctly identifies *.txt files with a Content-Type:
text/plain header, or even better by identifying *.UTF-8.txt as
Content-Type: text/plain; charset=UTF-8.
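
To make the header mapping described above concrete, here is a minimal Python sketch. The helper name `content_type_for` and its suffix rules are assumptions made for this illustration; they are not the configuration of the Unicode web server.

```python
def content_type_for(filename: str) -> str:
    """Return a Content-Type header value for a static file name.

    Hypothetical helper: the suffix rules below are illustrative only.
    """
    lower = filename.lower()
    if lower.endswith(".utf-8.txt"):
        # The file name itself advertises the encoding, so declare it.
        return "text/plain; charset=UTF-8"
    if lower.endswith(".txt"):
        # Plain text with no declared charset; the client has to guess.
        return "text/plain"
    # Anything else is served as an opaque download.
    return "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("iso15924-code.UTF-8.txt"))
```

With an explicit charset parameter the browser does not have to guess the encoding of the plain-text file.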

---

I hope that the published versions will soon be acceptable for moving from the
current FDIS status (Final Draft International Standard) to the Standard
status in ISO. I looked at the TC46 web site, and for now ISO 15924 is still
not a final standard but a final draft (the last step before the standard and
the technical revisions which ISO may publish yearly):

http://comelec.afnor.fr/servlet/ServletComelec?form_name=cFormIndex&login=invite&password=invite&organisme=iso&comite=tc46
(then click on the Work programme link, which can't be posted here as it
requires a session ID set by the AFNOR web server)

Re: ISO 15924 draft fixes

2004-05-21 Thread Antoine Leca

On Thursday, May 20th, 2004 23:56, Philippe Verdy wrote:

 I see no real problem if not all the different orthographies are
 listed or if they are not used universally. As long as the name is
 non ambiguous. What will be important for interchange of data will
 not be this name but the Code (or N°, or even ID in UAX#24
 properties).

I disagree. When I put content on the web, under my signature, I care about
whether it is written correctly or not. And when there are different
possibilities, I prefer the best one given any other constraints (such as
technical limitations here or there).


 So there's nothing wrong if Han'gul is shown to users

Sorry: this is meaningless to me as a French reader. And it is a mistake
(missing breve) when it comes to the McCune-Reischauer scheme. Half-good
fallback mechanisms are usually better than nothing, but worse than anything
else. And we do have better possibilities here.


 French normally has no caron and no breve, and the circumflex is used
 to mark a slight alteration of the vowel because of an assimilated
 consonant in the historical orthography (most often this circumflex
 in French denotes a lost s after the vowel).

Or it can be for other reasons. Which consonant is involved in dû?


 So the circumflex on Hangul would be inappropriate for French,

Please go to Langues'O for this commentary. As I wrote, you will probably be
answered with the historical context.

Also, there are a number of circumflexes already in the names, which have
nothing to do with a swallowed s (as in dévanâgarî), and which furthermore are
the main entries, unlike the case at hand. Are you proposing to drop them?
Perhaps in favour of macrons (as is done in a number of dictionaries, by
the way)?


 [Comments-OT]
 The problem of apostrophes is that French keyboards don't have
 it, but only have a single-quote.

Huh ???
It has been quite a while since I last used a French keyboard on NT/2000, but
until now they all sent apostrophes, not single quotes.


Antoine

Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

Philippe Verdy wrote:
 Michael Everson wrote:
  Philippe Verdy wrote:
  There is still a conflict of Code for Mandaean, is it Mand or Mnda?
 
  Mand.

 OK This is now corrected on the new HTML pages.
 But the new normative plain-text file now contains... Mnda !!!

I updated my own Excel sheet at:
http://www.rodage.org/pub/iso15924-sheets.xls
with the addition of 'Phags-Pa, and new fixes by Michael.
Also browsable in HTML:
http://www.rodage.org/pub/iso15924-sheets.html
Or downloadable:
http://www.rodage.org/pub/iso15924-sheets.zip

About cell background colors:

- light blue signals the English or French names that have been kept when
removing duplicate rows with alternate names.

- yellow signals the changes that have already been applied with the previous
published version.
(I see that some dates have been changed for a row without any change in the
other fields in any of the 5 previous tables, when compared to their first
published version.)

- light red signals missing changes:
* dates that should be changed but have still not,
* the case of Asomtavruli which has been removed but is not signaled in changes,
* the PropertyValueAlias=Common for Code=Zyyy, as found in the UCD
* the missing change from Mnda to Mand in the current plain-text version
(change already applied in the currently published HTML versions of table 1 and
2).
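
A comparison like the one above can also be automated. The following Python sketch is only an illustration: it assumes one record per line, fields separated by `;` with the 4-letter code first, and `#` starting a comment line. The function names and the two-line samples are made up for the example and are not the real data file.

```python
def parse_codes(text: str) -> dict[str, list[str]]:
    """Map each 4-letter code to its remaining fields.

    Assumed layout: one record per line, ';'-separated, code first,
    '#' starts a comment line.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(";")]
        table[fields[0]] = fields[1:]
    return table


def diff_codes(old: str, new: str) -> dict[str, list[str]]:
    """Report codes removed, added, or changed between two versions."""
    a, b = parse_codes(old), parse_codes(new)
    return {
        "removed": sorted(a.keys() - b.keys()),
        "added": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


# Illustrative samples only, not the published file.
OLD = "# sample\nMnda;140;Mandaean\nGoth;206;Gothic\n"
NEW = "# sample\nMand;140;Mandaean\nGoth;206;Gothic\n"

if __name__ == "__main__":
    print(diff_codes(OLD, NEW))
```

A renamed code shows up as one removal plus one addition, which is how a Mnda/Mand mismatch between two copies of the table would surface.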

Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

From: Antoine Leca [EMAIL PROTECTED]
 On Thursday, May 20th, 2004 23:56, Philippe Verdy wrote:

  I see no real problem if not all the different orthographies are
  listed or if they are not used universally. As long as the name is
  non ambiguous. What will be important for interchange of data will
  not be this name but the Code (or N°, or even ID in UAX#24
  properties).

 I disagree. When I put content on the web, under my signature, I care about
 whether is written correctly or not. And when there are different
 possibilities, I prefer the best one given any other constraints (such as
 technical limitations here or there.)

  So there's nothing wrong if Han'gul is shown to users

 Sorry: this is meaningless to me as French reader.

I don't know whether you realized it, but I am French too and live in
France...

How do you pronounce the very common French words hanche or hangar?
With a nasal vowel, but without the n sound! The obvious orthographic
similarity to these two words leads a French-speaking reader to pronounce it
the usual way, with a nasal vowel, since the term hangul is very little known
among the French, or is recognized as a non-French term...

The apostrophe is clearly more correct, because it marks the elision of at
least one vowel after the consonant; that consonant (the n in our case)
is then read distinctly and merged with the following letter (vowel or
consonant).
In this case the consonant sequence n'g will be read much more
correctly, separately from the a that precedes it: Han'gul is pronounced /*h a: ng
 l/ whereas Hangul would normally be pronounced /*h : g  l/ (here /*h/
denotes the aspirated h, normally not pronounced in French, but which blocks
liaisons and elisions before the word).

 And it is a mistake
 (missing breve) when it comes about the McCune-Reischauer scheme. Half-good
 fallback mechanisms are usually better than nothing, but worse than anything
 else. And we do have better possibilities here.

Are this McCune or this Reischauer native French speakers? They
undoubtedly know French, but the academic choice that led to their
standard of _transliteration_ (not _translation_) has nothing to do with
any consideration of how well this Latin transliteration fits the
French language and orthography...

  French normally has no caron and no breve, and the circumflex is used
  to mark a slight alteration of the vowel because of an assimilated
  consonant in the historical orthography (most often this circumflex
  in French denotes a lost s after the vowel).

 Or it can be for other reasons. Which consonant is involved in dû?

Hard to say, since it is a conjugated form of a VERY irregular verb
(devoir) in which even the stem is modified, or its substantivation. I
suppose the circumflex is justified historically by the wish to
distinguish it from the contracted indefinite article du. Note in passing
that the circumflex disappears in the feminine and (according to some authors)
in the plural... Some readers make the distinction because of this accent, and
pronounce du with a short u, and dû with a long u.

  So the circumflex on Hangul would be inappropriate for French,

Given that the circumflex in French often lengthens the vowel, using a
circumflex in place of a breve is quite incorrect...

 Please go to Langues'O for this commentary. As I wrote, you will be probably
 answered with the historical context.

What is Langues'O? Where is it?

 Also, there are a number of circumflexes already in the names, which have
 nothing to do with swallowed s (like in dévanâgarî), which furthermore are
 the main entries, unlike the case at hand. Are you proposing to drop them?
 Perhaps in favour of macrons (like is done in a number of dictionaries, by
 the way)?

Here
  [Comments-OT]
  The problem of apostrophes is that French keyboards don't have
  it, but only have a single-quote.

 Huh ???
 That is quite a time I did not use a French keyboard on NT/2000, but until
 now, all did send apostrophes, not single-quote.

The standard French keyboard shows an apostrophe on the key but generates
only a single quote, usable on the right as well as on the left (and therefore
generally rendered vertically in many fonts).

In that sentence I was referring to the difference in encoding between the
ASCII single quote and the true apostrophe character (or raised comma). It is
true that some fonts make no distinction between the
two, but the two codes have separate uses. So I insist: the standard French
keyboard does not generate the apostrophe (which is not the ASCII character)
but a single quote (in the ASCII set)...
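
The two code points at issue can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper name `describe` is made up, and taking U+2019 as the "true apostrophe" meant above is an assumption.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # U+0027 is the ASCII character produced by the keyboard key;
    # U+2019 is assumed here to be the typographic apostrophe.
    for ch in ("\u0027", "\u2019"):
        print(describe(ch))
```

Note that the Unicode character name of the ASCII code point U+0027 is itself APOSTROPHE, which is part of why the two are so easily confused.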

Re: ISO 15924 draft fixes

2004-05-21 Thread John Cowan

Philippe Verdy scripsit:

  Please go to Langues'O for this commentary. As I wrote, you will be
  probably answered with the historical context.
 
 What is Langues'O? Where is it?

Please forgive me for intruding into an internal francophone matter, but
whenever I see Langues'O, my mind insists on correcting it into
Langues d'O, as in Histoire d'O.  Not that I read French.

-- 
John Cowan  [EMAIL PROTECTED]  http://www.reutershealth.com
Not to know The Smiths is not to know K.X.U.  --K.X.U.

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 10:28 +0200 2004-05-21, Philippe Verdy wrote:
 From: Michael Everson [EMAIL PROTECTED]
  At 23:21 +0200 2004-05-20, Philippe Verdy wrote:
   There is still a conflict of Code for Mandaean, is it Mand or Mnda?

  Mand.

 OK This is now corrected on the new HTML pages.
 But the new normative plain-text file now contains... Mnda !!!

Whoops. It was late, and that change was made by hand.

 As a side note to Michael or the other 6 RA members (Ken, and Rick notably), I
 don't think it's even a good idea to ZIP this reference plain-text file due to
 its very small size (which is smaller than each of the HTML versions of
 codelists).

Surely it is not harmful.

 It could be presented directly under the URL:
 http://www.unicode.org/iso15924/iso15924-code.UTF-8.txt
 The plain text would appear directly in the browser window where it could be
 saved as well, without needing any ZIP tool...

Everyone has a zip tool.

I am not very happy about loading the plain-text in browsers. Three
of my browsers load it and *all* the French UTF-8 is displayed in
Latin 1.

 I hope that the published versions will soon be acceptable for
 getting from the current FDIS status (Final Draft International
 Standard) to the Standard status in ISO. I looked into the TC46
 web site and for now ISO 15924 is still not a final standard but a
 final draft

It *has* been published by ISO, though the TC46 web site doesn't reflect this.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-21 Thread Curtis Clark

on 2004-05-21 07:10 Michael Everson wrote:
 I am not very happy about loading the plain-text in browsers. Three of
 my browsers load it and *all* the French UTF-8 is displayed in Latin 1.

This *may* be a server issue. IIRC, the server has to be told to mark
the text/plain MIME type as UTF-8, since there are no meta tags (as
there could be in HTML) and since browsers generally lack the heuristics
to decide on the encoding of plain text.
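
The symptom is easy to reproduce. This minimal Python sketch (the function name is made up for the illustration) shows what happens when UTF-8 bytes are decoded as Latin-1, which is roughly what a browser does when no charset is declared and it falls back to Latin-1.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    # Each accented letter takes two bytes in UTF-8, so it turns into
    # two Latin-1 characters when the bytes are misinterpreted.
    print(misread_as_latin1("français"))  # prints "franÃ§ais"
```

Plain ASCII text survives this round trip unchanged, which is why only the French names look wrong.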

--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/

Re: ISO 15924 draft fixes

2004-05-21 Thread Doug Ewell

Michael Everson everson at evertype dot com wrote:

 The plain text would appear directly in the browser window where it
 could be saved as well, without needing any ZIP tool...

 Everyone has a zip tool.

 I am not very happy about loading the plain-text in browsers. Three
 of my browsers load it and *all* the French UTF-8 is displayed in
 Latin 1.

Why not post both copies, zipped and unzipped, and let the user decide
which one she wants to download?

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: ISO 15924 draft fixes

2004-05-21 Thread Mark Davis

 I am not very happy about loading the plain-text in browsers. Three
 of my browsers load it and *all* the French UTF-8 is displayed in
 Latin 1.

Michael, you just need to put a BOM at the start of the file. Direct access to
the plain-text file would be much preferred. The file is small -- there is no
need to zip it (unlike, say, Unihan!).

Mark
__
http://www.macchiato.com
  


RE: ISO 15924 draft fixes

2004-05-21 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy

 I updated my own Excel sheet at:

Philippe, I really appreciate the content you posted for its potential
value in guiding the RA in doing a better job with their data.

I hope, however, that you do not plan to leave it online once Michael
has his content corrected. In the long run, it really is unhelpful to
have alternate sources for data. Inevitably, the mirrors get out of sync
as the owners move on to other interests, and inevitably someone points
to the copy, not the source.


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 12:31 +0200 2004-05-21, Philippe Verdy wrote:
 - light blue signals the English or French names that have been kept when
 removing duplicate rows with alternate names.

Those duplicate rows did not appear in the plain-text data files, so
will not be considered further or tracked on the code changes page.
Khar and Khmr should have been yellow, not blue.

 - yellow signals the changes that have already been applied with
 the previous published version.
 (I see that some dates have been changed for a row without any change in the
 other fields in any of the 5 previous tables, when compared to their first
 published version.)

You didn't highlight Java.
I'm not tracking the change to Latf because it was just a
missing parenthesis, not an actual change.
Malayalam and Oriya did not change between the plain-text versions
(as I have said before).
Hanunoo had a name change.

 - light red signals missing changes:
 * dates that should be changed but have still not

Only changes between the plain-text documents are going to be tracked.

 * the case of Asomtavruli which has been removed but is not signaled
 in changes,

Only changes between the plain-text documents are going to be tracked.

 * the PropertyValueAlias=Common for Code=Zyyy, as found in the UCD

Thanks! It's good that you found this.

 * the missing change from Mnda to Mand in the current plain-text version
 (change already applied in the currently published HTML versions of
 table 1 and 2).

That should be fixed now.

I'm going to ask Rick to regenerate the four tables. Change dates
will be 2004-05-21. When they're posted we'll consider it the final
beta.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 08:31 -0700 2004-05-21, Mark Davis wrote:
  I am not very happy about loading the plain-text in browsers. Three
  of my browsers load it and *all* the French UTF-8 is displayed in
  Latin 1.

 Michael, you just need to put a BOM at the start of the file. Direct access to
 the plain-text file would be much preferred. The file is small -- there is no
 need to zip it (unlike, say, Unihan!).

I asked you yesterday "How?".
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-21 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Everson

  As a side note to Michael or the other 6 RA members (Ken, and Rick notably), I
  don't think it's even a good idea to ZIP this reference plain-text file due to
  its very small size (which is smaller than each of the HTML versions of
  codelists).
 
 Surely it is not harmful.

I agree, it's not harmful. But I agree with Philippe, it's not
particularly helpful or necessary.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 07:57 -0700 2004-05-21, Curtis Clark wrote:
 on 2004-05-21 07:10 Michael Everson wrote:
  I am not very happy about loading the plain-text in browsers. Three
  of my browsers load it and *all* the French UTF-8 is displayed in
  Latin 1.

 This *may* be a server issue. Iirc, the server has to be told to
 mark the text/plain MIME-type as UTF-8, since there are no meta
 tags (as there could be in HTML) and since browsers generally lack
 the heuristics to decide on coding of plain text.

I am going to keep the file zipped.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-21 Thread Doug Ewell

Antoine Leca Antoine10646 at leca dash marti dot org wrote:

 So there's nothing wrong if Han'gul is shown to users

 Sorry: this is meaningless to me as French reader. And it is a mistake
 (missing breve) when it comes about the McCune-Reischauer scheme.
 Half-good fallback mechanisms are usually better than nothing, but
 worse than anything else. And we do have better possibilities here.

This question of how to spell Hangul in French was discussed on the
ISO 15924 discussion list back in 2000.  (Antoine, you may remember that
discussion; you were involved in it.)  Michael wasn't happy about having
to maintain three separate spellings, which, as he correctly pointed
out, adds nothing to the standard but draws attention to the fact that
Korean transliteration is unstandardized.  But apparently somebody
deemed it necessary, because there they are.

In any case, the question of *which* French-based transliteration(s) to
use seems to have been decided already.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

From: Peter Constable [EMAIL PROTECTED]
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philippe Verdy

  I updated my own Excel sheet at:

 Philippe, I really appreciate the content you posted for its potential
 value in guiding the RA in doing a better job with their data.

 I hope, however, that you do not plan to leave it online once Michael
 has his content corrected. In the long run, it really is unhelpful to
 have alternate sources for data. Inevitably, the mirrors get out of sync
 as the owners move on to other interests, and inevitably someone points
 to the copy, not the source.

In fact, what I'll do is replace it with a JavaScript version, whose source
data will be fed and cached (on the server) from the Unicode normative text
file, to generate a JavaScript array. The colored version was there to help
show what I found.

Michael said that he will ignore all differences found in the previous HTML
files, considering only the text file as the source and adding the missing
elements.

Since then, there has been no clear justification for the removal of Georgian
Asomtavruli (I was told that the two scripts were being disunified in Unicode,
that this is already the case for bibliographic references, and that Asomtavruli
is considered distinct by most Georgian readers, who can't read it but can read
perfectly the default Mkhedruli script variant with various combinations of
diacritics for transliteration, which could not work correctly if written with
the Asomtavruli variant).

So the 4-letter code has been published for some time, but only with a
conflicting 3-digit numeric code. As most users of ISO 15924 will ignore the
numeric code in most applications, they may already have started to tag their
Asomtavruli references with Geoa (it was said that it was valid and
standard...) instead of Private Use codes (in Qaaa to Qabx). Will they need to
revert them? What if documents or books have already been printed in Georgia
using the Geoa code in their references? Or if this has already been used to
feed library indexes for interchange?

Maybe there was no prior approval of this code, its publication had been
deferred, and it should not have been published... Oh well...

---

Thanks to Michael for the addition of PropertyValueAlias=Common for
Code=Zyyy, and the correction of the incorrect HTML syntax of NCRs. I would
have much preferred the absence of line wrapping in this code (copy/paste operations
by developers will insert an undesirable additional character that may go
unnoticed in sources).
Conversely, there was no real need to prohibit line wraps in the Date
column.

Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

From: Doug Ewell [EMAIL PROTECTED]
 In any case, the question of *which* French-based transliteration(s) to
 use seems to have been decided already.

Is it true also for N=206, Code=Goth, English_Name=Gothic,
Nom_français=Gotique, Property_Value_Alias=Gothic ?

My French dictionaries (Petit Larousse, Robert de la Langue Française) refer to
Gothique (with an h), including my French-German dictionary:
* [fr] Goth (n.m.)
= [de] Gote (n.m.),  Gotin (n.f.), Gotik (adj. in Archeology).
* [fr] gothique (adj.)
= [de] gotisch (adj.)

The French name of a script is built on the adjective (used to qualify
caractère or écriture), written in the masculine singular form as it can
also be a substantivation of the adjective used alone (some exceptions exist in
the other current French names containing syllabaire, codet, parole and
hiéroglyphes, where a nominal group is used rather than an adjective, with only
hiéroglyphes using the plural in both French and English).

I have no reference in my French dictionaries for Gotique, but LOTS of
references to écriture gothique or caractères gothiques (including on the
web and in calligraphy/typography books). I think it's a typo here... So this
should be Nom_français=gothique.

Re: ISO 15924 draft fixes - UTF-8 BOM

2004-05-21 Thread Markus Scherer

Michael Everson wrote:
 At 08:31 -0700 2004-05-21, Mark Davis wrote:
   I am not very happy about loading the plain-text in browsers. Three
   of my browsers load it and *all* the French UTF-8 is displayed in
   Latin 1.

  Michael, you just need to put a BOM at the start of the file. Direct access to
  the plain-text file would be much preferred. The file is small -- there is no
  need to zip it (unlike, say, Unihan!).

 I asked you yesterday "How?".

In Windows Notepad, if you save as UTF-8, then you get a BOM/signature.
(It has been suggested in the past to define a charset name that implies the use of the signature
with UTF-8, much like the UTF-16 charset/CES works.)
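
For those without Notepad, the same signature can be added with a few lines of Python. This is a minimal sketch; the function name `add_utf8_bom` is made up, and it assumes the input bytes are already valid UTF-8.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 data with the BOM/signature unless it is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data  # already signed, do not add a second BOM
    return codecs.BOM_UTF8 + data


if __name__ == "__main__":
    signed = add_utf8_bom("sylotî nâgrî\n".encode("utf-8"))
    print(signed[:3])                  # the three signature bytes EF BB BF
    print(signed.decode("utf-8-sig"))  # this codec strips the signature again
```

Python's `utf-8-sig` codec writes the signature when encoding and skips it when decoding, so `text.encode("utf-8-sig")` is a one-step alternative.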

markus

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 21:38 +0200 2004-05-21, Philippe Verdy wrote:
 Michael said that he will ignore all differences found in the previous HTML
 files, considering only the text file as the source and adding the missing
 elements.

Yes, I did.

 Since then, there has been no clear justification for the removal of Georgian
 Asomtavruli (I was told that the two scripts were being disunified in Unicode,
 and it is already for bibliographic references, and considered
 distinct by most Georgian readers that can't read it, but can read
 perfectly the default
 Mkhedruli script variant with various combinations of diacritics for
 transliteration, that could not work correctly if written with the Asomtavruli
 variant).

Good gods.

Philippe, an early version of the draft had Asomtavruli and Nuskhuri.
Both were removed before publication. One instance of Asomtavruli
was left in by ACCIDENT. It is likely that Khutsuri will be added,
since it encompasses the casing pair.

 So the 4-letter code has been published for some time, but only with a
 conflicting 3-digits numeric code. As most users of ISO15924 will ignore the
 numeric code in most applications, they may already have started to tag their
 Asomtavruli references with Geoa

Might they? In the last 20 days? Be serious.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-21 Thread Michael Everson

At 22:04 +0200 2004-05-21, Philippe Verdy wrote:
 I have no reference in my French dictionaries for Gotique, but LOTS of
 references to écriture gothique or caractères gothiques (including on the
 web and in calligraphy/typography books). I think it's a typo here... So this
 should be Nom_français=gothique.

1. I don't want to entertain this sort of thing
until the other problems are dealt with.

2. There is an online form to fill out if you want suchlike to be considered.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-21 Thread Doug Ewell

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 In any case, the question of *which* French-based transliteration(s)
 to use seems to have been decided already.

 Is it true also for N=206, Code=Goth, English_Name=Gothic,
 Nom_français=Gotique, Property_Value_Alias=Gothic ?

I have no idea.  I was only referring to the Hangul question.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: ISO 15924 draft fixes

2004-05-21 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 At 22:04 +0200 2004-05-21, Philippe Verdy wrote:

 I have no référence in my French dictionnaries for Gotique, but LOTS of
 references to écriture gothique ou caractères gothiques (including on the
 web and in calligraphy/typography books). I think it's a typo here... So this
 should be Nom_français=gothique.

 1. I don't want to entertain this sort of thing
 until the other problems are dealt with.

 2. There is an online form to fill out if you want suchlike to be considered.

Maybe that's a question that the ISO 15924 member representing TC46 and working
for Encyclopedia Universalis (I can't remember his name) could answer. He's
French too and he has access to large collections of documents. I don't have his
encyclopedia, which is much too expensive for my budget, and like most French
people, my references among French dictionaries are limited to the common Petit
Larousse and Petit Robert, and a much less expensive encyclopedia... These are
now old editions... dated 1982, with much less concern for imports of foreign words.

OK it's not critical for now. The script names are described in Unicode, and in
the first edition, they could be left informative but still descriptive enough
to avoid ambiguities, rather than becoming normative. Only the codes and aliases
will be normative for now.

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy




From: "Michael Everson" [EMAIL PROTECTED]
 At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
  It was in the previous list (see the online HTML table 2).
 What does that refer to?

See http://www.unicode.org/iso15924/iso15924-codes.html
(sorry it was Table 1):

Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09

Can't you get the same page from the Unicode web site?

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca

[Mailed _and_ posted to the list; UTF-8]

On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote:

 I would appreciate it if interested persons could look this over and
 inform me if they find any further discrepancies between the two
 which are worth troubling about. Then we will proceed to generate the
 other files.

The French name for Hang looks strange. It happened to be hangul (hangul,
hangeul) (after quite a bit of discussion.)

Antoine

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 11:16 +0200 2004-05-20, Philippe Verdy wrote:
 From: Michael Everson [EMAIL PROTECTED]
  At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
   It was in the previous list (see the online HTML table 2).
  What does that refer to?

 See
 http://www.unicode.org/iso15924/iso15924-codes.html
 (sorry it was Table 1):
 Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09
 Can't you get the same page from the Unicode web site?

There are a number of pages, Philippe.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 11:52 +0200 2004-05-20, Antoine Leca wrote:
 [Mailed _and_ posted to the list; UTF-8]
 On Wednesday, May 19th, 2004 10:40 PM, Michael Everson wrote:
  I would appreciate it if interested persons could look this over and
  inform me if they find any further discrepancies between the two
  which are worth troubling about. Then we will proceed to generate the
  other files.

 The French name for Hang looks strange. It happened to be hangul (hangul,
 hangeul) (after quite a bit of discussion.)

That's an error in the file.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 At 11:16 +0200 2004-05-20, Philippe Verdy wrote:
 From: Michael Everson [EMAIL PROTECTED]
 At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
   It was in the previous list (see the online HTML table 2).
   What does that refer to?

 See
 http://www.unicode.org/iso15924/iso15924-codes.html
 (sorry it was Table 1):
 Sylo 316 Syloti Nagri sylotî nâgrî   2004-01-09
 Can't you get the same page from the Unicode web site?

 There are a number of pages, Philippe.

Not so much: 4 pages only (the links for the English left column and the French
right column are the same), plus 1 link to the downloadable zipped plain-text
version (I wonder why this file is zipped, given its small size, and the fact
that the text file is coded in Unix-style end-of-line format, not in
MIME/DOS/Windows format which one could assume as Zip was primarily developed on
DOS/Windows... If you want a Unix-style format, compress it with gzip instead)
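
To make the end-of-line point concrete, here is a minimal Python sketch that counts the line-ending styles found in a file's raw bytes and normalises them to CRLF. The function names and the sample rows are assumptions chosen for illustration; nothing here is a tool used by the Registrar.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def to_crlf(data: bytes) -> bytes:
    """Normalise every line ending to CRLF."""
    # Collapse everything to LF first so existing CRLF pairs are not doubled.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


# Illustrative Unix-style sample (rows borrowed from this thread).
sample = (
    "Sylo;316;Syloti Nagri;sylotî nâgrî;;2004-01-09\n"
    "Mand;140;Mandaean;mandéen;;2004-05-01\n"
).encode("utf-8")

print(count_line_endings(sample))
print(count_line_endings(to_crlf(sample)))
```

Working on bytes rather than decoded text avoids any newline translation by the I/O layer, so the counts reflect what is actually stored in the file.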

Keep this in mind:
- table 1 is sorted alphabetically by 4-letter codes
http://www.unicode.org/iso15924/iso15924-codes.html
- table 2 is sorted numerically by 3-digit codes
http://www.unicode.org/iso15924/iso15924-num.html
- table 3 is sorted alphabetically by English script name
http://www.unicode.org/iso15924/iso15924-en.html
- table 4 is sorted alphabetically by French script name
http://www.unicode.org/iso15924/iso15924-fr.html

Table numbers correspond to the order of fields in the plain text version.

You did not reply to the change of orthograph for the English name of Malayalam
(a dot-below diacritic removed), which was not shown in your proposed list of
changes (in HTML format, within your zip archive).

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 13:00 +0200 2004-05-20, Philippe Verdy wrote:
(I wonder why this file is zipped, given its small size,
If uncompressed, downloading it opens it in the browser rather than 
downloading it.

and the fact that the text file is coded in Unix-style end-of-line format,
I used Mac OS X TextEdit.
not in MIME/DOS/Windows format which one could assume as Zip was 
primarily developed on DOS/Windows... If you want a Unix-style 
format, compress it with gzip instead)
Can everyone un-gzip? Everyone can un-zip.
You did not reply to the change of orthograph for the English name 
of Malayalam (a dot-below diacritic removed), which was not shown in 
your proposed list of changes (in HTML format, within your zip 
archive).
I am NOT going to track all the problems in all of those tables. I am 
tracking the changes between the two plain-text files ONLY, and 
Malayalam was not spelled differently in the first one.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 10:40 PM
Subject: ISO 15924 draft fixes

 The Registrar wishes to thank everyone who has taken an interest in
 the ISO 15924 data pages, and regrets the imperfections which are
 contained there. I am not sure how we will manage the generation of
 the pages, but it is clear that the base should be the plain-text
 document.

 I have made changes to the plain-text document and placed it, a draft
 Changes page, and the original plain-text document available at
 http://www.unicode.org/iso15924/iso15924-fixes.zip

 I would appreciate it if interested persons could look this over and
 inform me if they find any further discrepancies between the two
 which are worth troubling about. Then we will proceed to generate the
 other files.

 I deleted some duplicate lines: Ethiopic was on two lines, under
 Ethiopic and under Ge'ez. It seemed inappropriate to burden the
 tables with such duplication.

 I added Coptic unilaterally.

I can't see Coptic for now in your source zip file.

There are other duplicate lines for name aliases that should be listed in
changes:
- Berber (Tifinagh)
= Tifinagh (Berber)
- (Burmese) Myanmar
= Myanmar (Burmese)
- Fraktur (variant of Latin)
= Latin (Fraktur variant)
- Gaelic (variant of Latin)
= Latin (Gaelic variant)
- Harappan (Indus)
= Indus (Harappan)
- Mormon (Deseret)
= Deseret (Mormon)
- Nagari (Devanagari)
= Devanagari (Nagari)
- Old Church Slavonic (variant of Cyrillic)
= Cyrillic (Old Church Slavonic variant)

Note that the French names for Han variants are identical ("idéogrammes han"),
whereas the English names correctly indicate the distinction between Traditional
and Simplified variants. These French names should be:
idéogrammes han (Hanzi, Kanji, Hanja);Hani;500;Han (Hanzi, Kanji, Hanja)
idéogrammes han (variante simplifiée);Hans;501;
idéogrammes han (variante traditionnelle);Hant;502;

For the French name of Hangul, I also found Hang quite strange (never seen
this orthograph before).
Documents in French from Korea or from Korean users in French refer to "Hangul",
"Hangoul", or "Hangûl", rarely "Hangeul", whose French reading as *Ha:n'jeul or
*Hãjeul would cause problems.
Some sources use "Hangueul", which reads correctly in French, but it may be
offensive as it is too close to the popular slang French verb "engueuler",
conjugated as "engueule" (a correct synonym for this verb is "gronder",
sometimes "enguirlander" in popular language, because the radical "gueule"
is normally used to refer to animal faces/mouths).

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 13:37 +0200 2004-05-20, Philippe Verdy wrote:
  I added Coptic unilaterally.
I can't see Coptic for now in your source zip file.
It isn't in that file.
There are other duplicate lines for name aliases that should be listed in
changes:
I'm not going to list those changes. There is no code or name change involved.
- Berber (Tifinagh)
= Tifinagh (Berber)
[...]
Note that the French names for Han variants are identical ("idéogrammes han"),
whereas the English names correctly indicate the distinction between Traditional
and Simplified variants. These French names should be:
idéogrammes han (Hanzi, Kanji, Hanja);Hani;500;Han (Hanzi, Kanji, Hanja)
idéogrammes han (variante simplifiée);Hans;501;
idéogrammes han (variante traditionnelle);Hant;502;
This has been corrected.
For the French name of Hangul, I also found Hang quite strange (never seen
this orthograph before)
Orthograph is not the word you want. You want 
the word spelling. I already said, this error 
has been corrected.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 It can't be Unicode's UTC alone, as there are
 already codes for bibliographic references that
 are not (and never will be) encoded separately
 in Unicode, so I suppose that there are librarian
 or publisher members with whom you have to
 discuss, independently of the work of Unicode,
 which should only be the registrar for these
 codes. Maybe there's still no formal procedure,
 and for now the codes are maintainable without
 lots of administration.

 Read the standard.

Stop this easy argument (that I find offensive here); you could have read it too
before publishing tables with errors (most probably because you forgot to
consult the relevant sources to check that your documents were correct; I note
that you are taking some freedom with your own decisions, regarding Coptic and
the removal of Georgian (Asomtavruli) coded Geoa). I have read it and that's
why I propose corrections...

OK, there are lots of corrections, but that's not a reason for ignoring some
elements that were already published (and are still published for now on the
Unicode web site, which is the only reference for the ISO15924 Registration
Authority). Unicode has just appointed you to perform administrative updates for
the RA, not to take your own decisions.

Sorry if you think that these sentences are a bit aggressive but for now the RA
has made a bad start, and it's mainly because of your work... If the publication
was preliminary (waiting for comments) it should have been documented as such on
the Unicode web site (like for the proposals in Unicode, which pass by a testbed
before being listed as standard).

For now I suggest an immediate warning in the ISO15924 web pages, explicitly
stating that these published tables were in beta, and contain incoherences,
which are being corrected. A link should list the incoherences and the proposed
changes.
I have such a list, and all it takes for me is a simple Excel spreadsheet, used
to sort the tables and detect differences between the published tables and the
proposed corrections.
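
The comparison described above was done in an Excel spreadsheet; as a rough equivalent, the Python sketch below indexes two semicolon-separated listings by their 4-letter code and reports what was added, removed or changed. The function names, the column position of the code and the two sample listings are assumptions for illustration only.

```python
import csv
import io


def load_by_code(text: str, code_column: int) -> dict:
    """Index the rows of a semicolon-separated listing by their 4-letter code."""
    rows = csv.reader(io.StringIO(text), delimiter=";")
    return {row[code_column]: row for row in rows if row}


def diff_listings(old: dict, new: dict) -> dict:
    """Report codes that were added, removed or changed between two listings."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(c for c in set(old) & set(new) if old[c] != new[c]),
    }


# Illustrative listings, code in the first column (rows borrowed from this thread).
OLD = (
    "Mnda;140;Mandaean;mandéen;;2004-05-01\n"
    "Sylo;316;Syloti Nagri;sylotî nâgrî;;2004-01-09\n"
)
NEW = (
    "Mand;140;Mandaean;mandéen;;2004-05-01\n"
    "Sylo;316;Syloti Nagri;sylotî nâgrî;;2004-05-01\n"
)

result = diff_listings(load_by_code(OLD, 0), load_by_code(NEW, 0))
print(result)
```

Note that a renamed code shows up as one removal plus one addition, which is exactly the kind of change that would need to be enumerated explicitly.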

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 14:44 +0200 2004-05-20, Philippe Verdy wrote:
From: Michael Everson [EMAIL PROTECTED]
 It can't be Unicode's UTC alone, as there are
 already codes for bibliographic references that
 are not (and never will be) encoded separately
 in Unicode, so I suppose that there are librarian
 or publisher members with whom you have to
 discuss, independently of the work of Unicode,
 which should only be the registrar for these
 codes. Maybe there's still no formal procedure,
 and for now the codes are maintainable without
 lots of administration.
 Read the standard.
Stop this easy argument (that I find offensive here), you could have 
read it too before publishing tables with errors
Errors are errors. The RA-JAC had an opportunity to review all the 
tables. Do not blame me alone. People err. People have kindly pointed 
out discrepancies.

(most probably because you forgot to consult the relevant sources to 
check that your documents were correct;
Don't presume.
I note that you are taking some freedom with your own decisions, 
regarding Coptic and the removal of Georgian (Asomtavruli) coded 
Geoa).
I have (properly) proposed the addition of Coptic (and some other 
scripts) to the JAC. Asomtavruli was removed for good reasons. Live 
with it. It will be reinstated in due course.

I have read it and that's why I propose corrections...
And that's why I am communicating with you, to get relevant feedback. 
The only delta we are going to deal with is the one between the 
plain-text documents; it is that which is going to be considered 
authoritative and which will be used (somehow) to generate the other 
tables.

Sorry if you think that these sentences are a bit aggressive but for 
now the RA has made a bad start, and it's mainly because of your 
work...
Nonsense. I am not ashamed. It was a hell of a lot of work getting 
that standard together. It is, as you have pointed out, difficult to 
maintain different tables by hand.

If the publication was preliminary (waiting for comments) it should 
have been documented as such on the Unicode web site (like for the 
proposals in Unicode, which pass by a testbed before being listed as 
standard).
It does NOT matter, Philippe. The corrections are being made.
For now I suggest an immediate warning in the ISO15924 web pages, 
explicitly stating that these published tables were in beta, and 
contain incoherences, which are being corrected.
No. This is purely cosmetic. Let us move on.
A link should list the incoherences and the proposed changes. I have 
such a list and all it takes for me is a simple Excel spreadsheet, 
used to sort the tables and detecting differences between published 
tables and proposed corrections.
The only delta we are going to deal with is the one between the 
plain-text documents; it is that which is going to be considered 
authoritative and which will be used (somehow) to generate the other 
tables.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Patrick Andries

Antoine Leca a écrit :
The French name for Hang looks strange. It happened to be hangul (hangul,
hangeul) (after quite a bit of discussion.)

The name in ISO/CEI 10646 (F) is « hangûl », from a Korean dictionary
and a Korean grammar published by the Inalco (Langues O'). Another
form suggested in some sources, to approximate the pronunciation, is
« hangueul ».

P. A.

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

To wrap up this discussion, I have put the corrected tables online.
http://www.rodage.org/pub/iso15924-sheets.html

(this is an Excel workbook in HTML format with frames but without Excel
interactivity, which references other URLs in a subfolder; it can be navigated
with the tabs at the bottom)

Also available as a plain Excel file:
http://www.rodage.org/pub/iso15924-sheets.xls

The above collection is also archived in
http://www.rodage.org/pub/iso15924-sheets.zip

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 06:51 -0700 2004-05-20, Patrick Andries wrote:
Antoine Leca a écrit :
The French name for Hang looks strange. It happened to be hangul (hangul,
hangeul) (after quite a bit of discussion.)
The name in ISO/CEI 10646 (F) is « hangûl », 
from a Korean dictionary and a Korean grammar 
published by the Inalco (Langues O'). Another 
form suggested in some sources, to approximate 
the pronunciation, is « hangueul ».
transliterations of Korean that the Korean NB 
insisted upon. Hangul instead of hangûl we will 
treat as a spelling error (so you don't have to 
file a change form).
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 For now I suggest an immediate warning in the ISO15924 web pages,
 explicitly stating that these published tables were in beta, and
 contain incoherences, which are being corrected.
 
 No. This is purely cosmetic. Let us move on.

I find this cavalier attitude a bit disconcerting. Errors in the tables
are not purely cosmetic. An IT standard is created to support IT
implementations, and people have been and will be referring to those
tables to create their implementations. Each view of the data should be
reliable, and if it is found that it was not, then that needs to be
communicated in some way. 

IMO, it is essential that there be a place on the site for errata. I'm
inclined to agree with Philippe: the errata notes should indicate that
there were errors in the original tables and what the nature of those
errors were. If IDs were misspelled or missing, those should be
enumerated. If English or French names were misspelled, I think a
general note is sufficient.


 A link should list the incoherences and the proposed changes. I have
 such a list and all it takes for me is a simple Excel spreadsheet,
 used to sort the tables and detecting differences between published
 tables and proposed corrections.
 
 The only delta we are going to deal with is the one between the
 plain-text documents; it is that which is going to be considered
 authoritative

Is that document*s* (plural)? I strongly encourage you to maintain *one*
master source from which all others are derived.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 08:10 -0700 2004-05-20, Peter Constable wrote:
  For now I suggest an immediate warning in the ISO15924 web pages,
  explicitly stating that these published tables were in beta, and
  contain incoherences, which are being corrected.
 
  No. This is purely cosmetic. Let us move on.
I find this cavalier attitude a bit disconcerting. Errors in the tables
are not purely cosmetic.
Look, Peter. I'm glad people found errors and inconsistencies. We are 
working on fixing that, and expect it to be fixed very soon. You're 
ALL listening. Taking time to put up an immediate warning isn't a 
good use of my time.

IMO, it is essential that there be a place on the site for errata. I'm
inclined to agree with Philippe: the errata notes should indicate that
there were errors in the original tables and what the nature of those
errors were. If IDs were misspelled or missing, those should be
enumerated. If English or French names were misspelled, I think a
general note is sufficient.
The changes will be noted at 
http://www.unicode.org/iso15924/codechanges.html Please be a little 
bit patient.

  The only delta we are going to deal with is the one between the
  plain-text documents; it is that which is going to be considered
 authoritative
Is that document*s* (plural)? I strongly encourage you to maintain *one*
master source from which all others are derived.
That would be THE old plain-text document and THE new plain-text 
document which will replace it.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

Peter, Philippe,
I hope this satisfies you. http://www.unicode.org/iso15924/codelists.html
It is enough work finding and fixing and figuring out whatever it is 
that a perl script is and how to make it work. It may seem obvious to 
you, but it is not obvious to me.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Michael Everson


 Taking time to put up an immediate warning isn't a
 good use of my time.

I didn't ask for an immediate warning. I will note, though, that
incorporating bad data into a product may not be a good use of time for
someone else -- and it may be far more costly for them than it will be
for you.

 
 The changes will be noted at
 http://www.unicode.org/iso15924/codechanges.html Please be a little
 bit patient.

I don't think I'm being at all impatient. I didn't ask you to do
anything yesterday; I just ask that it be done carefully. And not to
think that bad data files can be relegated to cosmetics, which is what
you seemed to be saying.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Michael Everson

 I hope this satisfies you.
http://www.unicode.org/iso15924/codelists.html

If they are consistent and reliable, I'm satisfied with them. I hope you
will be preparing a page for corrigenda / errata.

It's not a big issue, but I don't understand why the dates don't match:
was Arab added on January 9 or May 1? So, they're not entirely
consistent.

Also, it appears you have not fixed a serious error in the plain-text
file: it is not well-structured. Some rows have 6 columns, and some have
7.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

2004-05-20 Thread Addison Phillips [wM]

I concur with Peter. If there are multiple documents now, then I'd like to see a 
single normative document... and furthermore I would like it to *be* normative (and 
I'd like to know which one it is). The text file is listed on the web site as the 
alternative...

By all means correct errors. Spelling or nomenclatural (non-substantive) changes in 
the descriptions are errata. But I view changes, additions, and deletions to/from the 
data tables as changes to the standard and they should, in my opinion, be treated as 
such even if they are only to correct errors.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] Behalf Of Peter Constable
 Sent: 2004-05-20 8:10
 To: Unicode List
 Subject: RE: ISO 15924 draft fixes
 
 
  For now I suggest an immediate warning in the ISO15924 web pages,
  explicitly stating that these published tables were in beta, and
  contain incoherences, which are being corrected.
  
  No. This is purely cosmetic. Let us move on.
 
 I find this cavalier attitude a bit disconcerting. Errors in the tables
 are not purely cosmetic. An IT standard is created to support IT
 implementations, and people have been and will be referring to those
 tables to create their implementations. Each view of the data should be
 reliable, and if it is found that it was not, then that needs to be
 communicated in some way. 
 
 IMO, it is essential that there be a place on the site for errata. I'm
 inclined to agree with Philippe: the errata notes should indicate that
 there were errors in the original tables and what the nature of those
 errors were. If IDs were misspelled or missing, those should be
 enumerated. If English or French names were misspelled, I think a
 general note is sufficient.
 
 
  A link should list the incoherences and the proposed changes. I have
  such a list and all it takes for me is a simple Excel spreadsheet,
  used to sort the tables and detecting differences between published
  tables and proposed corrections.
  
  The only delta we are going to deal with is the one between the
  plain-text documents; it is that which is going to be considered
  authoritative
 
 Is that document*s* (plural)? I strongly encourage you to maintain *one*
 master source from which all others are derived.
 
 
 
 Peter
  
 Peter Constable
 Globalization Infrastructure and Font Technologies
 Microsoft Windows Division

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 09:49 -0700 2004-05-20, Peter Constable wrote:
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Michael Everson

 I hope this satisfies you.
http://www.unicode.org/iso15924/codelists.html
If they are consistent and reliable, I'm satisfied with them. I hope you
will be preparing a page for corrigenda / errata.
That's what http://www.unicode.org/iso15924/codechanges.html is for.
It's not a big issue, but I don't understand why the dates don't match:
was Arab added on January 9 or May 1? So, they're not entirely
consistent.
Because long long ago when I thought that ISO was going to publish 
the document on my birthday (sigh) I put 2004-01-09 on the document; 
that didn't happen, and it wasn't published until 2004-05-01.

Also, it appears you have not fixed a serious error in the 
plain-text file: it is not well-structured. Some rows have 6 
columns, and some have 7.
That might be fixed in the newest one.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 10:00 -0700 2004-05-20, Addison Phillips [wM] wrote:
I concur with Peter. If there are multiple 
documents now, then I'd like to see a single 
normative document...
It will be the plain-text version, and for the 
purposes of fixing the current regrettable mess 
I'm taking it as read that the plain text version 
was always the normative version.

and furthermore I would like it to *be* 
normative (and I'd like to know which one it 
is). The text file is listed on the web site as 
the alternative...
It should say normative.
Is the format order satisfactory? English_Name;Code;Nº;Nom_français;PVA;Date
Or would it be preferable to have it in the 
format of Table 1 
(Code;Nº;English_Name;Nom_français;PVA;Date)

--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Antoine Leca

 Antoine Leca a écrit :

 The French name for Hang looks strange. It happened to be hangul
 (hangul, hangeul) (after quite a bit of discussion.)

Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8
this morning. I meant hangûl (hangŭl, hangeul).

According to a native (ftp://dkuug.dk/ftp.anonymous/email/iso15924/277), the
correct forms are the ones between parentheses (with an added apostrophe,
as in han'gŭl).

: From: Jian YANG [EMAIL PROTECTED]
: Subject: Re: Re: (iso15924.275) Hangul (Hang~ul, Hangeul)
:   as script name (~ is a diacritical mark)
: Date: Mon, 29 May 2000 15:49:25 -0400
:
:
: «Hangeul» = romanization standard of the South Korean Ministry
: of Education;
: «Hangul» = McCune-Reischauer romanization (the exact form
: is «Han'gŭl»: «u» with breve, not caron; but the diacritical
: mark was removed to accommodate the ASCII convention, no
: doubt);


On Thursday, May 20, 2004 3:51 PM, Patrick Andries va escriure:

 The name in ISO/CEI 10646 (F)  is « hangûl  » from a Corean dictionary
 and a Corean grammar published by the Inalco (Langues O').

Clearly, the Langues'O did adapt it to French typographical possibilities,
reversing the breve accent into a circumflex.

 Another
 suggested form in some sources, to approximate the pronunciation,
 is « hangueul »

This is the other form, with an added euphonic u after the g, to avoid a
complete mispronunciation.

Whether all this is right or not, I do not know. But I believe this text
did go through two ballots against the very people of Langues'O (?), so we
have no reason to correct now what was accepted in the standard. The only
choice right now is to type exactly what was printed, since I understand we
no longer have the master that served for the [F]DIS texts.

Since I am not a member of TC46, and furthermore I was away from the process
last year, I might very easily be wrong.


Antoine

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Michael Everson

 
 Also, it appears you have not fixed a serious error in the
 plain-text file: it is not well-structured. Some rows have 6
 columns, and some have 7.
 
 That might be fixed in the newest one.

It is not fixed in the file that's on the site now. If this is the
normative file, I'd suggest you fix it as soon as possible.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 12:07 -0700 2004-05-20, Peter Constable wrote:
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Michael Everson

 Also, it appears you have not fixed a serious error in the
 plain-text file: it is not well-structured. Some rows have 6
 columns, and some have 7.
 That might be fixed in the newest one.
It is not fixed in the file that's on the site now. If this is the
normative file, I'd suggest you fix it as soon as possible.
Which file on the site?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Addison Phillips [wM]

I don't care about the order, so long as it is stable over time. Personally I find the 
latter form more logical (with the identifier, i.e. the code, first). I view the 
English and French names and the PVA as merely descriptive or informative 
information. The code and the ID number should go first, IMO.

But if the file is in some other format, that's fine, so long as the format is stable.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture. 
It is not a feature.

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] Behalf Of Michael Everson
 Sent: 2004-05-20 10:59
 To: [EMAIL PROTECTED]
 Subject: RE: ISO 15924 draft fixes
 
 
 At 10:00 -0700 2004-05-20, Addison Phillips [wM] wrote:
 I concur with Peter. If there are multiple 
 documents now, then I'd like to see a single 
 normative document...
 
 It will be the plain-text version, and for the 
 purposes of fixing the current regrettable mess 
 I'm taking it as read that the plain text version 
 was always the normative version.
 
 and furthermore I would like it to *be* 
 normative (and I'd like to know which one it 
 is). The text file is listed on the web site as 
 the alternative...
 
 It should say normative.
 
 Is the format order satisfactory? 
 English_Name;Code;Nº;Nom_français;PVA;Date
 Or would it be preferable to have it in the 
 format of Table 1 
 (Code;Nº;English_Name;Nom_français;PVA;Date)
 
 
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

I could use a little help rendering this into French, lest I 
embarrass myself

The Property Value Alias is defined as part of the Unicode Standard 
and is provided informatively in the tables here to show how entries 
in the ISO 15924 code table relate to script names defined in 
Unicode.

--
Michael Everson * * Everson Typography *  * http://www.evertype.com

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf
 Of Addison Phillips [wM]


 I don't care about the order, so long as it is stable over time.
Personally I find the
 latter form more logical (with the identifier, i.e. the code, first).

I agree with Addison here: the most important thing is stability, but it
makes sense that the first and second columns be the symbolic code and
the numeric code, especially if this is *the* plain-text version and
normative reference.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Peter Constable [EMAIL PROTECTED]
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On Behalf
  Of Michael Everson

  I hope this satisfies you.
 http://www.unicode.org/iso15924/codelists.html

 If they are consistent and reliable, I'm satisfied with them. I hope you
 will be preparing a page for corrigenda / errata.

 It's not a big issue, but I don't understand why the dates don't match:
 was Arab added on January 9 or May 1? So, they're not entirely
 consistent.

 Also, it appears you have not fixed a serious error in the plain-text
 file: it is not well-structured. Some rows have 6 columns, and some have
 7.

No, the structure is correct; however, the text file was prepared by copy/pasting
HTML text inserted in empty cells, namely the &nbsp; character reference (which
contains a syntactic semicolon conflicting with the CSV separator). That's the
first thing I signaled to Michael several days ago, and he has acknowledged
it and corrected it in his new update.
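
The effect is easy to reproduce. The Python sketch below flags rows whose field count differs from the expected six and strips pasted &nbsp; references; the function names and the sample text are assumptions for illustration, not a description of how the file was actually repaired.

```python
def find_malformed_rows(text: str, expected_fields: int = 6) -> list:
    """Return (line_number, field_count, line) for rows with the wrong field count."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = len(line.split(";"))
        if count != expected_fields:
            problems.append((number, count, line))
    return problems


def strip_nbsp_references(text: str) -> str:
    """Remove pasted '&nbsp;' references so that empty cells are truly empty."""
    return text.replace("&nbsp;", "")


# Illustrative sample: the second row carries an '&nbsp;' in its empty PVA cell.
sample = (
    "Sylo;316;Syloti Nagri;sylotî nâgrî;;2004-01-09\n"
    "Mand;140;Mandaean;mandéen;&nbsp;;2004-05-01\n"
)

print(find_malformed_rows(sample))
print(find_malformed_rows(strip_nbsp_references(sample)))
```

The semicolon that terminates the character reference is counted as a field separator, which is why the affected row appears to have seven columns instead of six.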

I have already signaled almost all bugs and inconsistencies to Michael, and
prepared corrected files.
Michael has just changed the online version (but with the wrong dates... that's
irritating).

There is still a conflict of Code for Mandaean, is it Mand or Mnda?
- Table 1 (HTML by Code):
Mand;140;Mandaean;mandéen;;2004-05-01
- Table 2 (HTML by N°):
140;Mand;Mandaean;mandéen;;2004-05-01
- Table 3 (HTML by Name):
Mandaean;140;Mnda;mandéen;;2004-05-01
- same thing for Table 3 (plain-text by Name)
- Table 4 (HTML by Nom):
mandéen;140;Mnda;Mandaean;;2004-05-01
As Michael indicates that the plain-text should be the reference, then it
suggests Mnda and not Mand... But the new plain-text version uses Mand...
I had already signaled it in a past message... So it's also irritating.
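
This kind of cross-table disagreement can be detected mechanically. The Python sketch below takes the four rows quoted above, maps each one onto named fields using the column order it appears in, and reports any numeric code for which the tables give different 4-letter codes. The table labels, layout tuples and function names are assumptions introduced for the example.

```python
from collections import defaultdict

# Column layouts, read off the four rows quoted in the message above.
LAYOUTS = {
    "table1": ("code", "number", "english", "french", "pva", "date"),
    "table2": ("number", "code", "english", "french", "pva", "date"),
    "table3": ("english", "number", "code", "french", "pva", "date"),
    "table4": ("french", "number", "code", "english", "pva", "date"),
}

ROWS = {
    "table1": "Mand;140;Mandaean;mandéen;;2004-05-01",
    "table2": "140;Mand;Mandaean;mandéen;;2004-05-01",
    "table3": "Mandaean;140;Mnda;mandéen;;2004-05-01",
    "table4": "mandéen;140;Mnda;Mandaean;;2004-05-01",
}


def codes_by_number(rows: dict, layouts: dict) -> dict:
    """Map each numeric code to the 4-letter code that each table gives for it."""
    seen = defaultdict(dict)
    for table, line in rows.items():
        record = dict(zip(layouts[table], line.split(";")))
        seen[record["number"]][table] = record["code"]
    return dict(seen)


def find_conflicts(seen: dict) -> dict:
    """Keep only the numbers for which the tables disagree on the code."""
    return {
        number: tables
        for number, tables in seen.items()
        if len(set(tables.values())) > 1
    }


conflicts = find_conflicts(codes_by_number(ROWS, LAYOUTS))
for number, tables in conflicts.items():
    print(number, tables)
```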

RE: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 13:49 -0700 2004-05-20, Peter Constable wrote:
I agree with Addison here: the most important thing is stability, but it
makes sense that the first and second columns be the symbolic code and
the numeric code, especially if this is *the* plain-text version and
normative reference.
That's going to happen.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 Is the format order satisfactory? English_Name;Code;Nº;Nom_français;PVA;Date
 Or would it be preferable to have it in the
 format of Table 1
 (Code;Nº;English_Name;Nom_français;PVA;Date)

I vote for the order of table 1; the Code is the most important one, and the
start of line will have a fixed format, easing its parsing, or simply easing its
legibility for readers.

Make the plain-text normative and published online (out of a zip file), and make
the HTML pages only informative...

I have done another scripted page (with PHP; however the PHP generated page may
be stored in a cached static page if you don't want scripts on the Unicode
server itself) using the text file as the reference, to generate all the HTML
pages for browsing.
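
As a rough illustration of the idea (in Python rather than the PHP mentioned above, and not the actual script), the sketch below reads a small semicolon-separated sample and renders one HTML table sorted on a chosen column. The sample data, the % comment convention and the function names are assumptions.

```python
from html import escape

REFERENCE_TEXT = """\
% Comment lines start with a percent sign
Code;Nº;English_Name;Nom_français;PVA;Date
Arab;160;Arabic;arabe;Arabic;2004-05-01
Mand;140;Mandaean;mandéen;;2004-05-01
"""


def parse(text):
    """Return (header, rows), skipping blank lines and % comments."""
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.startswith("%")
    ]
    header, *rows = [ln.split(";") for ln in lines]
    return header, rows


def to_html(header, rows, sort_column=0):
    """Render one HTML table ordered by the chosen column."""
    ordered = sorted(rows, key=lambda row: row[sort_column])
    out = ["<table>"]
    out.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in header) + "</tr>")
    for row in ordered:
        out.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


header, rows = parse(REFERENCE_TEXT)
print(to_html(header, rows, sort_column=1))  # ordered by numeric code
```

Calling `to_html` once per sort column would give one page per table, all derived from the same text file.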

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 23:21 +0200 2004-05-20, Philippe Verdy wrote:
Micheal has just changed the online version (but with the wrong dates...that's
irritating).
Patience... Unchanged codes will retain 2004-05-01 as the starting 
date. Changed codes have (as of the current BETA draft which is 
uploaded for testing purposes now; look at it if you like to do that 
sort of thing) the date of 2004-05-20. If there are further changes 
before we go to RELEASE then I will adjust that date accordingly. OK?

There is still a conflict of Code for Mandaean, is it Mand or Mnda?
Mand.
As Michael indicates that the plain-text should be the reference, then it
suggests Mnda and not Mand... But the new plain-text version 
uses Mand... I had already signaled it in a past message... So 
it's also irritating.
They all say Mand now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Antoine Leca [EMAIL PROTECTED]
  Antoine Leca wrote:
 
  The French name for Hang looks strange. It happened to be hangul
  (hangul, hangeul) (after quite a bit of discussion.)

 Sorry guys. For reasons known to itself, my mailer refused to post in UTF-8
 this morning. I meant hangul(hangul, hangeul).

 According to a native (ftp://dkuug.dk/ftp.anonymous/email/iso15924/277), the
 correct forms are the ones between parentheses (with an added apostrophe,
 as in han'gul).

 : From: Jian YANG [EMAIL PROTECTED]
 : Subject: Re: Re: (iso15924.275) Hangul (Hang~ul, Hangeul)
 :   as script name (~is  adiacritical mark)
 : Date: Mon, 29 May 2000 15:49:25 -0400
 :
 :
 : Hangeul = Romanization standard of the South Korean Ministry of
 : Education;
 : Hangul = McCune-Reischauer romanization (the exact form
 : is Han'gŭl: u with breve, not caron; but the diacritic was
 : removed, no doubt to accommodate the ASCII
 : convention);


 On Thursday, May 20, 2004 3:51 PM, Patrick Andries wrote:
 
  The name in ISO/CEI 10646 (F) is "hangûl", from a Korean dictionary
  and a Korean grammar published by the Inalco (Langues O').

 Clearly, the Langues'O did adapt it to French typographical possibilities,
 reversing the breve accent into a circumflex.

  Another
  suggested form in some sources, to approximate the pronunciation,
  is "hangueul".

 This is the other form, with an added euphonic u after the g, to avoid a
 complete mispronunciation.

 As to whether all this is right or not, I do not know. But I believe this text
 did go through two ballots against the very people of Langues'O (?), so we
 have no reason to correct now what was accepted in the standard. The only
 choice right now is to type exactly what was printed, since I understand we
 no longer have the master that served for the [F]DIS texts.

 Since I am not a member of TC46, and furthermore I was away from the process
 last year, I might very easily be wrong.

I see no real problem if not all the different orthographies are listed or if
they are not used universally, as long as the name is unambiguous. What will
be important for interchange of data will not be this name but the Code (or Nº,
or even the ID in UAX#24 properties).

So there's nothing wrong if Han'gul is shown to users without the preferred
apostrophe (I mean the apostrophe here, not the single quote!), or with a caron or
circumflex instead of the breve (to adapt to the rendering or encoding context in
which this name would be exposed to users), or even without any diacritic (my
opinion is that substituting one diacritic for another is worse than just removing
the diacritic that can't be displayed or encoded).

French normally has no caron and no breve, and the circumflex is used to mark a
slight alteration of the vowel because of an assimilated consonant in the
historical orthography (most often this circumflex in French denotes a lost s
after the vowel).

So the circumflex on Hangûl would be inappropriate for French, as would
Hangeul (which breaks the common reading rules). Hangul and Han'gul are more
acceptable, as are Hangœul with an œ ligature, or Han'gœul with an
additional apostrophe, which would have been even more accurate but have been
seen nowhere for now.

[Comments-OT]
The problem with apostrophes is that French keyboards don't have one, but only
have a single quote. Treating a quote as an apostrophe is
limited in French to very few words, as a mark of elision of some characters,
not as a phonetic mark.
In Han'gul there's no elision, but its absence causes a nasalisation of
the previous letter a. A solution would be to write Hanngul. However there
are now lots of proper names _ending_ in -an (such as Alan) for which the
nasalisation is easy for readers to avoid (so Han, i.e. the ideographic script
of Chinese, is appropriate in French, but not Hanzi, Hanja, or Hangul,
where almost all native readers would not pronounce the n but would nasalise
the previous vowel a). The simplest solution to avoid nasalisation is to place
another n after it in French (nasalisation never occurs with double n in
French).
This would give in French: Hanngul (or Hanngœul, or is it Hanngoul
?), Hannzi (to avoid pronouncing it as in enzyme), Hannja (to avoid
pronouncing it as in en japonais), but still Han (in preference to Hann)...
[/Comments-OT]

Philippe.

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy


Peter Constable wrote:
  Michael Everson wrote:
  Also, it appears you have not fixed a serious error in the
  plain-text file: it is not well-structured. Some rows have 6
  columns, and some have 7.
 
  That might be fixed in the newest one.

 It is not fixed in the file that's on the site now. If this is the
 normative file, I'd suggest you fix it as soon as possible.

This (below) is my own plain text version (still using the field and row order
of table 3 by english name, instead of the order of table 1 by code)... Some
entries are commented out with %.

Philippe.

---
% The format is Name;Code;Nº;Nom;ID;Date

% Codes for the representation of names of scripts
% Codes pour la représentation des noms d’écritures

% Alphabetical list of English script names

English_Name;Code;Nº;Nom_français;ID;Date
(alias for Hiragana + Katakana);Hrkt;412;(alias pour hiragana + katakana);Katakana_Or_Hiragana;2004-05-01
Arabic;Arab;160;arabe;Arabic;2004-05-01
Armenian;Armn;230;arménien;Armenian;2004-05-01
Balinese;Bali;360;balinais;;2004-05-18
Batak;Batk;365;batak;;2004-05-01
Bengali;Beng;325;bengalî;Bengali;2004-05-01
Blissymbols;Blis;550;symboles Bliss;;2004-05-01
Bopomofo;Bopo;285;bopomofo;Bopomofo;2004-05-01
Brahmi;Brah;300;brâhmî;;2004-05-01
Braille;Brai;570;braille;Braille;2004-05-01
Buginese;Bugi;367;bouguis;;2004-05-01
Buhid;Buhd;372;bouhide;Buhid;2004-05-01
Cham;Cham;358;cham (čam, tcham);;2004-05-01
Cherokee;Cher;445;tchérokî;Cherokee;2004-05-01
Cirth;Cirt;291;cirth;;2004-05-01
Code for uncoded script;Zzzz;999;codet pour écriture non codée;;2004-05-01
Code for undetermined script;Zyyy;998;codet pour écriture indéterminée;;2004-05-01
Code for unwritten languages;Zxxx;997;codet pour les langues non écrites;;2004-05-01

% Still missing...
%Coptic;copt;201;copte;;2004-05-20

Cuneiform, Sumero-Akkadian;Xsux;020;cunéiforme suméro-akkadien;;2004-05-01
Cypriot;Cprt;403;syllabaire chypriote;Cypriot;2004-05-01
Cyrillic;Cyrl;220;cyrillique;Cyrillic;2004-05-01
Cyrillic (Old Church Slavonic variant);Cyrs;221;cyrillique (variante slavonne);;2004-05-01
Deseret (Mormon);Dsrt;250;déseret (mormon);Deseret;2004-05-01
Devanagari (Nagari);Deva;315;dévanâgarî;Devanagari;2004-05-01
Egyptian demotic;Egyd;070;démotique égyptien;;2004-05-01
Egyptian hieratic;Egyh;060;hiératique égyptien;;2004-05-01
Egyptian hieroglyphs;Egyp;050;hiéroglyphes égyptiens;;2004-05-01
Ethiopic (Ge'ez);Ethi;430;éthiopique (éthiopien, ge'ez);Ethiopic;2004-05-01

% Why was this removed? Wasn't it present for bibliographic references?
%Georgian (Asomtavruli);Geoa;241;géorgien (assomtavrouli);;2004-05-18

Georgian (Mkhedruli);Geor;240;géorgien (mkhédrouli);Georgian;2004-05-18
Glagolitic;Glag;225;glagolitique;;2004-05-01
Gothic;Goth;206;gotique;Gothic;2004-05-01
Greek;Grek;200;grec;Greek;2004-05-01
Gujarati;Gujr;320;goudjarâtî (gujrâtî);Gujarati;2004-05-01
Gurmukhi;Guru;310;gourmoukhî;Gurmukhi;2004-05-01
Han (Hanzi, Kanji, Hanja);Hani;500;idéogrammes han;Han;2004-05-01
Han (Simplified variant);Hans;501;idéogrammes han (variante simplifiée);;2004-05-01
Han (Traditional variant);Hant;502;idéogrammes han (variante traditionnelle);;2004-05-01

% This should better be:
%Hangul (Hangŭl, Hangeul);Hang;286;hangul (hangul);Hangul;2004-05-01
Hangul (Hangŭl, Hangeul);Hang;286;hangul (hangŭl, hangeul);Hangul;2004-05-01

Hanunóo;Hano;371;hanounóo;Hanunoo;2004-05-01
Hebrew;Hebr;125;hébreu;Hebrew;2004-05-01
Hiragana;Hira;410;hiragana;Hiragana;2004-05-01
Indus (Harappan);Inds;610;indus;;2004-05-01
Javanese;Java;361;javanais;;2004-05-18
Kannada;Knda;345;kannara (canara);Kannada;2004-05-18
Katakana;Kana;411;katakana;Katakana;2004-05-01
Kayah Li;Kali;357;kayah li;;2004-05-01
Kharoshthi;Khar;305;kharochthî;;2004-05-18
Khmer;Khmr;355;khmer;Khmer;2004-05-18
Lao;Laoo;356;laotien;Lao;2004-05-01
Latin;Latn;215;latin;Latin;2004-05-01
Latin (Fraktur variant);Latf;217;latin (variante brisée);;2004-05-01
Latin (Gaelic variant);Latg;216;latin (variante gaélique);;2004-05-01
Lepcha (Róng);Lepc;335;lepcha (róng);;2004-05-01
Limbu;Limb;336;limbou;Limbu;2004-05-18
Linear A;Lina;400;linéaire A;;2004-05-01
Linear B;Linb;401;linéaire B;Linear_B;2004-05-18
Malayalam;Mlym;347;malayâlam;Malayalam;2004-05-01
Mandaean;Mnda;140;mandéen;;2004-05-01
Mayan hieroglyphs;Maya;090;hiéroglyphes mayas;;2004-05-01
Meroitic;Mero;100;méroïtique;;2004-05-01
Mongolian;Mong;145;mongol;Mongolian;2004-05-01
Myanmar (Burmese);Mymr;350;birman;Myanmar;2004-05-01
Ogham;Ogam;212;ogam;Ogham;2004-05-01
Old Hungarian;Hung;176;ancien hongrois;;2004-05-01
Old Italic (Etruscan, Oscan, etc.);Ital;210;ancien italique (étrusque, osque, etc.);Old_Italic;2004-05-18
Old Permic;Perm;227;ancien permien;;2004-05-01
Old Persian;Xpeo;030;cunéiforme persépolitain;;2004-05-01
Oriya;Orya;327;oriyâ;Oriya;2004-05-01
Orkhon;Orkh;175;orkhon;;2004-05-01
Osmanya;Osma;260;osmanais;Osmanya;2004-05-01
Pahawh Hmong;Hmng;450;pahawh hmong;;2004-05-01
Phoenician;Phnx;115;phénicien;;2004-05-01
Pollard Phonetic;Plrd;282;phonétique de Pollard;;2004-05-01
Reserved for private use (start);Qaaa;900;réservé à l’usage privé (début);;2004-05-18

RE: ISO 15924 draft fixes

2004-05-20 Thread Peter Constable

 From: Philippe Verdy [mailto:[EMAIL PROTECTED]

 No the structure is correct, however the text file was prepared by
 copy/pasting HTML text inserted in empty cells, namely the &nbsp; character
 reference (that contains a syntaxic semicolon conflicting with the CSV
 separator).

IMO, the structure of data is effectively determined by how processes
will interpret the data. A process won't see 6 columns one of which
contains &nbsp;. It will see seven columns one of which contains
&nbsp.

He's said the file has been fixed (though I don't know if he's posted
the fixed file).



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Re: ISO 15924 draft fixes

2004-05-20 Thread Michael Everson

At 00:05 +0200 2004-05-21, Philippe Verdy wrote:
This (below) is my own plain text version (still using the field and row order
of table 3 by english name, instead of the order of table 1 by code)... Some
entries are commented out with %.
The RA has no intention whatsoever of making use of this file. Absolutely not.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 I could use a little help rendering this into French, lest I
 embarrass myself

 The Property Value Alias is defined as part of the Unicode Standard
 and is provided informatively in the tables here to show how entries
 in the ISO 15924 code table relate to script names defined in
 Unicode.

Tip: French translation is:
Le synonyme de valeur de propriété est défini au sein du Standard Unicode
et est fourni ici de façon informative dans les tables, afin de montrer comment
les entrées des tables de codets ISO 15924 correspondent aux noms de scripts
définis dans Unicode.
(there should be a reference to the PropertyValueAliases.txt file in the
UCD, and the section in the UTS or its annexes that describes this UCD text
file.)

It's true that the PropertyValueAliases.txt file in the UCD already contains
long aliases for the shorter ISO-15924 codes:

(...)
sc ; Arab  ; Arabic
sc ; Armn  ; Armenian
(...)
sc ; Zyyy  ; Common
(...)

It's true that this same file does not list all possible values (the long value
Inherited has no other alias defined in that file).
Maybe this file in the UCD could also list the ISO-15924 numeric codes, but
there's no obligation to add them there. The mere existence of the "sc ; ..."
lines is enough to indicate that the preferred alias is the ISO-15924 code when
it exists, so that Arab is preferred to Arabic, or Linb is preferred to
Linear_B.
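
For readers who want to see the mapping concretely, here is a small Python sketch that extracts the script aliases from lines in the format shown above. It is only an illustration: the sample is limited to the three lines quoted in this message, and the function name is hypothetical.

```python
SAMPLE = """\
sc ; Arab  ; Arabic
sc ; Armn  ; Armenian
sc ; Zyyy  ; Common
"""


def parse_script_aliases(text):
    """Map each short script code to its long alias ("sc" lines only)."""
    aliases = {}
    for line in text.splitlines():
        # UCD data files use "#" to start a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


print(parse_script_aliases(SAMPLE))
# {'Arab': 'Arabic', 'Armn': 'Armenian', 'Zyyy': 'Common'}
```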

With regard to semantics, however, there's no difference between Arab and
Arabic, or between Linb and Linear_B, meaning that these values are in the
same value space. That's a good reason not to pollute that value space with new
long unneeded aliases. The long aliases only exist for legacy reasons, also in
Unicode, and the ID column in ISO-15924 tables is mostly informative, and
should not be normative.

This ID column in ISO-15924 already has the semantics of a Unicode Script
Property Value Alias, but it could be any other alias needed for some other
legacy applications. I just wonder why this column was placed there, before the
Date column that is required, given that there may possibly exist several legacy
aliases to list in ISO-15924, and defined in other standards than Unicode.

If you want to keep a master table for the long term, I would either drop this
ID column, or put it at end of the row, after the Date field (so that more than
1 alias could be added to each code; For example, there are some numeric script
ids defined in OpenType and that could be listed as X_OT_17, if they are bound
directly to standard script codes)

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Peter Constable [EMAIL PROTECTED]
  From: Philippe Verdy [mailto:[EMAIL PROTECTED]

  No the structure is correct, however the text file was prepared by
  copy/pasting HTML text inserted in empty cells, namely the &nbsp; character
  reference (that contains a syntaxic semicolon conflicting with the CSV
  separator).

 IMO, the structure of data is effectively determined by how processes
 will interpret the data. A process won't see 6 columns one of which
 contains &nbsp;. It will see seven columns one of which contains
 &nbsp.

 He's said the file has been fixed (though I don't know if he's posted
 the fixed file).

It's not fixed in the zipped archive linked from the ISO 15924/RA web pages (no
change has occurred so far for this download), but it is fixed in the corrected
archive that Michael indicated here:
http://www.unicode.org/iso15924/iso15924-fixes.zip
(this link is not officially published yet, because Michael wanted comments
on it first, fortunately, because it was still not perfect)

Michael has started the corrections in HTML tables 1 and 2, but table 3 (and
its downloadable alternative plain-text version) and table 4 are still not
corrected.

I said this was lots of files to change, but in fact it can all be done with one
spreadsheet saved into 5 files. Michael could also have used a very basic
database application (an Access, FileMaker, dBase or Paradox database, with
1 table and 5 query-views, or other similar tools that every programmer or data
maintainer should have in order to perform such a basic task easily, without lots
of manual editing, and even without programming a script).
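
The "one table, several views" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three master records are a reduced sample, the view names are hypothetical, and plain string sorting is used (a real French-name view would need locale-aware collation).

```python
# Master table: (code, number, english_name, french_name).
MASTER = [
    ("Arab", "160", "Arabic", "arabe"),
    ("Grek", "200", "Greek", "grec"),
    ("Mand", "140", "Mandaean", "mandéen"),
]

# Each view lists the master columns in output order; the first one is
# also the sort key, as in the four published tables.
VIEWS = {
    "by_code": (0, 1, 2, 3),
    "by_number": (1, 0, 2, 3),
    "by_english_name": (2, 0, 1, 3),
    "by_french_name": (3, 0, 1, 2),
}


def build_view(records, column_order):
    """Reorder the columns of every record, then sort the resulting rows."""
    reordered = [tuple(rec[i] for i in column_order) for rec in records]
    return [";".join(row) for row in sorted(reordered)]


for name, order in VIEWS.items():
    print(name)
    for line in build_view(MASTER, order):
        print("  " + line)
```

Because every view is derived from the same records, a code such as Mand cannot differ from one table to another.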

Re: ISO 15924 draft fixes

2004-05-20 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 At 00:05 +0200 2004-05-21, Philippe Verdy wrote:

 This (below) is my own plain text version (still using the field and row
order
 of table 3 by english name, instead of the order of table 1 by code)... Some
 entries are commented out with %.

 The RA has no intention whatsoever of making use of this file. Absolutely not.

OK. But you also argued incorrectly against one of my questions, related to
the Common script ID, when I was asking what Common and Inherited (defined
in UAX#24) corresponded to in ISO-15924. If I look at the
standard Property Value Aliases defined in the UCD files, I see this rule:

sc ; Zyyy  ; Common

Clearly it states that Common is an alias of the Zyyy script code.
So the ISO-15924 tables should reflect it in their ID columns. For example in
Table 1 (list by code):

Zyyy;998;Code for undetermined script;codet pour écriture
indéterminée;Common;2004-05-01

If your arguments related to the usage of the ISO-15924 Zyyy;998 codes are
valid, and differ from the definition of the Common script ID in UAX #24, then
there's a problem in the definition of the PropertyValueAliases.txt file in the
UCD 4.0 and the sc ; Zyyy ; Common line should be removed... This will require
an amendment to Unicode.

Re: ISO 15924 draft fixes

2004-05-19 Thread Philippe Verdy

I see some differences

- For Georgian, your new file contains only:
Georgian (Mkhedruli);Geor;240;géorgien (mkhédrouli);Georgian;2004-05-18
But the previous version also contained in one of the online tables:
Georgian (Asomtavruli);Geoa;242;géorgien (assomtavrouli);Georgian;2004-01-05

- Where is this line?:
Syloti Nagri;Sylo;316;sylotî nâgrî;;2004-09-01

Limbu has been adjusted to a more appropriate numeric code within South-Asian
scripts (401 to 336).

I also think that the removal of duplicate rows for English or French name
aliases was a good decision (after all the aliases are already listed between
parentheses). I also think that splitting the line for the start and end codes
of private scripts was a good idea.

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 10:40 PM
Subject: ISO 15924 draft fixes


 The Registrar wishes to thank everyone who has taken an interest in
 the ISO 15924 data pages, and regrets the imperfections which are
 contained there. I am not sure how we will manage the generation of
 the pages, but it is clear that the base should be the plain-text
 document.

 I have made changes to the plain-text document and placed it, a draft
 Changes page, and the original plain-text document available at
 http://www.unicode.org/iso15924/iso15924-fixes.zip

 I would appreciate it if interested persons could look this over and
 inform me if they find any further discrepancies between the two
 which are worth troubling about. Then we will proceed to generate the
 other files.

 I deleted some duplicate lines: Ethiopic was on two lines, under
 Ethiopic and under Ge'ez. It seemed inappropriate to burden the
 tables with such duplication.

 I added Coptic unilaterally.
 -- 
 Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-19 Thread Philippe Verdy

I note also that the list of changes (the HTML file in your archive) does not
include the change of orthography in English names for consonants with dots
below (such as Malayalam). As this ISO-15924 standard should make the English
and French names unambiguous, their orthography is important.

- Original Message - 
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 10:40 PM
Subject: ISO 15924 draft fixes

 The Registrar wishes to thank everyone who has taken an interest in
 the ISO 15924 data pages, and regrets the imperfections which are
 contained there. I am not sure how we will manage the generation of
 the pages, but it is clear that the base should be the plain-text
 document.

 I have made changes to the plain-text document and placed it, a draft
 Changes page, and the original plain-text document available at
 http://www.unicode.org/iso15924/iso15924-fixes.zip

Re: ISO 15924 draft fixes

2004-05-19 Thread Michael Everson

At 01:08 +0200 2004-05-20, Philippe Verdy wrote:
I see some differences
- For Georgian, your new file contains only:
Georgian (Mkhedruli);Geor;240;géorgien (mkhédrouli);Georgian;2004-05-18
But the previous version also contained in one of the online tables:
Georgian (Asomtavruli);Geoa;242;géorgien 
(assomtavrouli);Georgian;2004-01-05
That's correct. Asomtavruli has been deleted for now.
- Where is this line?:
Syloti Nagri;Sylo;316;sylotî nâgrî;;2004-09-01
A new script? Oh, it's in the old file and not in 
the new one? It, Coptic, and Phags-pa need to be 
in the list (they are all under ballot).

Limbu has been adjusted to a more appropriate numeric code within South-Asian
scripts (401 to 336).
Error corrected.
I also think that the removal of duplicate rows for English or French name
aliases was a good decision (after all the aliases are already listed between
parentheses).
No, it would allow a huge number of aliases. 
People can search the online files with command-F 
or control-F.

I also think that slpitting the line for the start end end codes
of private scripts was a good idea.
It wasn't mine. I forget whose it was, but it 
makes the tables print more nicely.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-19 Thread Michael Everson

At 01:26 +0200 2004-05-20, Philippe Verdy wrote:
I note also that the list of change (the HTML file in your archive) does not
include the change of orthograph in English names for consonnants with dots
below (such as malalayam). As this ISO-15924 standard should make the English
and French names unambiguous, their orthograph is important.
I understand that there are many problems with the online files; I 
made a comparison only with the plain-text files, and Malayalam was 
not spelled differently in that file, so I judged it irrelevant to 
the task of correcting the basic database.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: ISO 15924 draft fixes

2004-05-19 Thread Philippe Verdy

From: Michael Everson [EMAIL PROTECTED]
 - Where is this line?:
  Syloti Nagri;Sylo;316;sylotî nâgrî;;2004-09-01

 A new script? Oh, it's in the old file and not in
 the new one? It, Coptic, and Phags-pa need to be
 in the list (they are all under ballot).

It was in the previous list (see the online HTML table 2).
Who decides on the addition of scripts in ISO-15924? I thought there was a
separate technical committee and that you were just the bookkeeper of the
decisions made by this sub-committee. It can't be Unicode's UTC alone, as there
are already codes for bibliographic references that are not (and never will be)
encoded separately in Unicode, so I suppose that there are librarian or
publisher members with whom you have to discuss, independently of the work of
Unicode, which should only be the registrar for these codes. Maybe there's
still no formal procedure, and for now the codes are maintainable without lots
of administration.

Do you want a script that generates HTML tables from the reference text file?
I'm not an expert in Perl, but my knowledge of PHP or awk is enough to create
it.
Or maybe a simple JavaScript could generate the presentation in browsers.
I suggest you use a spreadsheet for now to allow sorting or moving columns.

One final note: there's still a missing closing parenthesis in a French name,
"latin (variante brisée", for the Fraktur script.

Re: ISO 15924 draft fixes

2004-05-19 Thread Michael Everson

At 03:28 +0200 2004-05-20, Philippe Verdy wrote:
It was in the previous list (see the online HTML table 2).
What does that refer to?
Who decides for the addition of scripts in ISO-15924?
The ISO 15924 RA-JAC.
I thought there was a separate technical commity 
and that you were just the bookkeeper of the 
decisions made by this sub-commitee.
With regard to Coptic, and the need to sort out 
the initial difficulties we are having, it seems 
prudent that I do what is necessary to correct 
faults. It is unlikely that the RA-JAC will 
object to this.

It can't be Unicode's UTC alone, as there are 
already codes for bibliographic references that 
are not (and will never) be encoded separately 
in Unicode,so I suppose that there are librarian 
or publishers members with which you have to 
discuss, independantly of the work of Unicode, 
which should only be the registrar for these 
codes. May be there's still no formal procedure, 
and for now the codes are maintainable without 
lots of administration.
Read the standard.
Do you want a script that generate HTML tables from the reference text file?
No. We will handle that in due course.
One final note: there's still a missing closing parenthese in a French name 
latin (variante brisée  for the Fraktur script.
I think that has been corrected by now.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com
