Around 16 o'clock on Jul 6, David Starner wrote:

> These aren't that useful, but
> vo (Volapük): a ä b c d e f g h i j k l m n o ö p r s t u ü v x y z.
>... 
> chr (Cherokee):

I'd like a complete set of 639-1 languages and Volapük is a welcome 
addition.  I'll also start adding the 639-2 languages as I receive them, 
but I don't expect to get a complete set of those any time soon.

> Punctuation (not listed for Dutch?) is the same as German.

The goal is to list only the alphabet, abjad or logography needed to 
represent the complete language; punctuation has too many possible 
encodings and might accidentally mischaracterize some fonts.  One 
outstanding question is whether we should include numerals; I'm willing to 
listen to arguments on both sides of that issue.
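
To make the idea concrete, here is a minimal sketch of how such a letter 
list could be used to decide whether a font covers a language.  This is 
only an illustration, not the actual implementation: the table 
ORTHOGRAPHIES and the helpers missing_chars() and supports() are 
hypothetical names, only lower-case letters are considered, and the two 
"fonts" are made-up character sets.

```python
# Hypothetical table: language tag -> characters required to write it.
# Only lower-case letters are listed here (an assumption for brevity).
ORTHOGRAPHIES = {
    "vo": set("aäbcdefghijklmnoöprstuüvxyz"),  # Volapük
}


def missing_chars(lang, font_chars):
    """Return the required characters for `lang` that the font lacks."""
    return ORTHOGRAPHIES[lang] - set(font_chars)


def supports(lang, font_chars):
    """A font supports `lang` when no required character is missing."""
    return not missing_chars(lang, font_chars)


# Two made-up font coverages for illustration.
ascii_only = set("abcdefghijklmnopqrstuvwxyz")
with_umlauts = ascii_only | set("äöü")

print(supports("vo", ascii_only))             # False
print(sorted(missing_chars("vo", ascii_only)))  # ['ä', 'ö', 'ü']
print(supports("vo", with_umlauts))           # True
```

Leaving punctuation out of the table keeps the test from rejecting a font 
merely because it encodes a quotation mark differently.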

For logographic languages, I'm using standard encodings and stripping out 
the non-language-specific bits.  So far, that's working pretty well, but I 
may want to reduce the sets somewhat to make sure I don't miss any fonts.  
Of course, the key is to include codepoints not generally included in 
fonts for other languages.  That's been less successful -- the Simplified 
Chinese font 'simsun' contains every Han codepoint in Big5.  Again, we're 
fortunate that more and more fonts are pre-tagged with OS/2 language 
tables from which we can deduce intended language targets far more 
accurately.
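
A small sketch of the "distinguishing codepoints" idea and of why a very 
broad font defeats it.  Everything here is illustrative: distinguishing() 
is a hypothetical helper, and the two-character sets are toy stand-ins for 
the real encoding-derived sets, not actual data.

```python
def distinguishing(orthographies):
    """Map each language to the codepoints no other language in the table uses."""
    result = {}
    for lang, chars in orthographies.items():
        # Union of every other language's required characters.
        others = set().union(
            *(c for other, c in orthographies.items() if other != lang)
        )
        result[lang] = chars - others
    return result


# Toy sets: one shared character plus one form specific to each language.
toy = {
    "zh-cn": set("中国"),  # simplified form of "country"
    "zh-tw": set("中國"),  # traditional form of "country"
}

unique = distinguishing(toy)
print(unique["zh-cn"])  # {'国'}
print(unique["zh-tw"])  # {'國'}

# A font that covers both sets passes the coverage test for both languages,
# so coverage alone cannot reveal which language it was designed for.
broad_font = set("中国國")
matches = [lang for lang, chars in toy.items() if chars <= broad_font]
print(matches)  # ['zh-cn', 'zh-tw']
```

That ambiguity is where language information carried by the font itself 
helps more than any coverage test.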

Keith Packard        XFree86 Core Team        HP Cambridge Research Lab


_______________________________________________
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
