RE: sort of OT: politics and scripts

2000-11-22 Thread Erland Sommarskog

Cathy Wissink [EMAIL PROTECTED] writes:
 The Soviet language policies under both Lenin and Stalin were amazing in
 what they managed to change in a very short time, especially considering the
 scripts first shifted from Arabic to Latin, then just a decade or so later
 to Cyrillic.  I too have been wondering when there would be a movement in
 the post-Soviet, Central Asian countries away from Cyrillic; my assumption
 has always been that they would want to return to Arabic (or for others,
 back to their indigenous scripts).

 Surprisingly, however, in our NLS implementation, the movement is away from
 Cyrillic, as you noted, but towards Latin rather than Arabic.

The answer of course lies in the reform imposed by Mustafa Kemal
in Turkey. Turkey is the leading state in the Turkic world, so it's
natural to turn to Turkey for an alphabet.

We've seen this in Azeri and Uzbek.

The most recent to announce a change was Tatarstan.
--
Yours sincerely,
Erland Sommarskog
[EMAIL PROTECTED]



Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Lukas Pietsch

Dear all,

a lot was said in this thread about intelligent rendering mechanisms, such
as fonts implementing automatic glyph substitution and things like that. The
notion appears to be quite commonplace to the experts, whereas I (being an
amateur) must admit it seemed just like a utopian dream to me when I first
heard of the possibility of such a thing, a few months ago. I figure that
people are mostly thinking of the technology called "Open Type", is that
right?

Can anybody enlighten me about how much support for that technology is
already available in standard software, say, in browsers or text processors
under Windows 9x? If I had a TrueType font that implemented the glyph
substitutions, say, for the Greek combining diacritics, could I make my
average standard word processing software actually use these features? Or
would I have to wait for specialized multilingual word processors to appear
on the market?

I found the documentation of the "GetCharacterPlacement" function in the
Windows API. It looks like that was the place where these things should be
implemented system-wide. But I played with it a bit and found it didn't
actually do any glyph replacements. Is that function actually implemented in
Win98, or is it just a stub? Or did I make a mistake in my testing, or is
something wrong with my system? Can Win2000 do more than Win98 in this
respect?

I also noticed that MS Internet Explorer does use glyph replacement features
on my system when it is displaying Arabic. How does it do that? Would there
be a way of making it use other Open-Type features too?




Lukas Pietsch
Ferdinand-Kopf-Str. 11
D-79117 Freiburg
Tel. 0761-696 37 23

Universität Freiburg
Englisches Seminar




Re: Greek Prosgegrammeni

2000-11-22 Thread Lukas Pietsch

Thanks to Asmus and Kenneth for their clarifying comments. Things are
beginning to make sense to me... (:-)

Especially, I'm quite relieved to see now that:
- for any one of the common printing variants of mute iota that a user might
want to see,
- there is already at least one easily available truetype font, so that
- even *without* special glyph shaping or glyph substitution mechanisms in
display,
- there will be at least one way of encoding that will be stable, in the
sense that it will guarantee the desired display and not get corrupted when
undergoing canonical composition/decomposition;
and, most importantly:
- all these encodings will be recognized as equivalent by Unicode
applications when it comes to case-insensitive matching (because all these
character sequences case-fold to the same sequence of vowel + small iota
(03B9)). That's something, isn't it?
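
To make the matching concrete, here is a minimal sketch in Python --
assuming a modern Unicode-aware runtime with str.casefold(); the code
points chosen are illustrative:

    import unicodedata

    def caseless_match(a, b):
        # Canonical caseless matching: compare NFD(casefold(NFD(x))).
        def key(s):
            return unicodedata.normalize(
                "NFD", unicodedata.normalize("NFD", s).casefold())
        return key(a) == key(b)

    precomposed = "\u1F98"              # CAPITAL ETA WITH PSILI AND PROSGEGRAMMENI
    decomposed  = "\u0397\u0313\u0345"  # ETA + psili + combining ypogegrammeni
    adscript    = "\u1F20\u03B9"        # small eta-with-psili + small iota (03B9)

    assert caseless_match(precomposed, decomposed)
    assert caseless_match(precomposed, adscript)

All three spellings reduce to the same eta + psili + small iota sequence.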

What will *not* work, for most users, is automatic case *conversion*. This
will lead to undesired or unexpected results in most cases. But there are
other independent reasons for that anyway: For most users, correct
uppercasing also involves the stripping of accents and breathings, and the
Unicode casing rules don't provide for that either. But then again: who
wants to use automatic case conversion for polytonic Greek anyway? (I can
hardly remember having ever used it even in the Latin script in all the text
processing I've done.) People will simply be typing sequences that Unicode
will see as irregular mixed-case strings, but who cares? I guess all the
computational features that really matter to most of us common mortals (like
sorting, word searches etc.) involve the "case-folding" feature used for
case-insensitive matching, and as I said above, this seems to work out in a
fairly intuitive and sensible way.
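
For instance (a sketch in the same Python setting; the full case
mappings shown follow Unicode's SpecialCasing rules, and the examples
are mine):

    # ALPHA WITH YPOGEGRAMMENI uppercases to ALPHA + full CAPITAL IOTA,
    # and accents are not stripped, although Greek all-caps convention
    # would drop the tonos:
    assert "\u1FB3".upper() == "\u0391\u0399"   # alpha-ypogegrammeni -> "AI"
    assert "\u03AC".upper() == "\u0386"         # alpha-tonos keeps its tonos

Round-tripping the first string through .upper() and .lower() yields
plain alpha + iota, not the original alpha-ypogegrammeni: exactly the
kind of undesired result meant above.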

So, after all, the UTC people do deserve a pat on the back for their good
work? (:-)

I have another ignorant layman's sort of question, but I'll put it into a
second message because it really constitutes a different topic.

Lukas





Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Marco Cimarosti

Lukas Pietsch wrote:
 a lot was said in this thread about intelligent rendering
 mechanisms, [...]
 I figure that people are mostly thinking of the technology
 called "Open  Type", is that right?

Right, but that's only part of the picture. There are several major technologies for rendering
"complex Unicode scripts".

Here are some of the principal ones:

- Open Type itself (see http://www.microsoft.com).
The "font-specific intelligence" is in the font itself; the "generic script
intelligence" is in a software component called UniScribe.

- AAT/ATSUI (see http://www.apple.com).
Most of the "intelligence" is in the font itself, which also includes a
state machine to operate substitution. The behavior of the smart fonts may
be influenced by external user settings.

- Graphite --my favorite, so far-- (see http://www.sil.org).
Takes a "stupid" TrueType font and merges it with the "intelligence" written
in an ad-hoc description language (GDL), to produce an "intelligent" font
quite similar to AAT/ATSUI. The accent is on extensibility and, especially,
on supporting the Private Use Area (which is a precious resource for
linguistic research and for defining new orthographies).

- Omega (http://omega-system.sourceforge.net).
Built on top of the old and glorious TeX typesetting system. It may become
(or already is?) the standard for Unicode in Linux.

- More...
Other projects are ongoing, with a variety of approaches, philosophies,
scopes, applications.

HTH
_ Marco

__
La mia e-mail è ora: My e-mail is now:
   marco.cimarostiªeurope.com   
(Cambiare "ª" in "@")  (Change "ª" to "@")
 




Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread David Starner

On Wed, Nov 22, 2000 at 04:19:42AM -0800, Marco Cimarosti wrote:
 - Omega (http://omega-system.sourceforge.net).
 Built on top of the old and glorious TeX typesetting system. It may become
 (or already is?) the standard for Unicode in Linux.

I've never seen Omega used under Linux, nor have I found any good (English)
documentation for it, although it is shipped with tetex and hence with Debian
and probably other Linux distributions. FreeType seems to support OpenType
fonts.  Pango (http://www.pango.org) apparently is going to use FreeType at
some point, but is currently hacking some complex script support into bdf
(http://www.wholehog.fsnet.co.uk/robert/indic/fonts.html).

-- 
David Starner - [EMAIL PROTECTED]
http://dvdeug.dhis.org
Looking for a Debian developer in the Stillwater, Oklahoma area 
to sign my GPG key



Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Lukas Pietsch

John Hudson wrote:

 At present, polytonic Greek is not supported in Uniscribe,
 I suspect because no one has determined that it needs to be.

So, would you agree that it does need to be? Keeping in mind what Kenneth
Whistler wrote:

 Not if the fonts they use map capital letter + ypogegrammeni character
 combinations into capital letter + full-size iota glyph sequences.

 Of course, if the fonts they use are not designed for correct use with
 polytonic Greek, then the default rendering behavior of the ypogegrammeni
 will not be what they expect or want. Time to upgrade the fonts.
...
 This is not all that sophisticated. It should be a matter that can be
 wholly encapsulated within the fonts:

                         Font I            Font II

 A. 0397 0313 0345  ==  'H iota adscript  'H iota subscript

 B. 1F98            ==  'H iota adscript  'H iota subscript
 ...
 Many of us have felt all along that polytonic Greek should always be
 represented decomposed, and that the ELOT polytonic "character" encoding
 was a dangerous conflation of glyph design and character encoding concerns.
...

 Implementations that use full decomposition for polytonic Greek and fonts
 that correctly map the accentual and diacritic combinations are the
 best bet for consistency *and* good presentation in the long run.


Mind that the case-mapping question we were discussing is just one minor
aspect of the issue; the main task is much more general, and at the same
time more straightforward (if we leave aside the issue of automatic case
conversion and the fancy problems of, let's say, small caps): the decomposed
character sequences simply need to be mapped to the precomposed ones. It
affects not only the iota subscripts/adscripts but also all the other
diacritics. Without some glyph processing most combinations will never
display readably. Since the precomposed glyphs already exist as Unicode
codepoints, I suppose that the implementation would probably not even be
very difficult, and not much of it would even depend on the individual font,
would it?

By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have
the precomposed characters in Unicode in the first place. I'm very glad they
are there, since, as we see, the beautiful smart rendering features we are
talking about are simply not yet available in mainstream text processing
software. Much as I like the idea of the projects such as "Graphite" that
Marco mentioned, I do think there are quite a number of people out here who
would love to be able to handle Greek comfortably in their everyday
all-purpose text-processing and browsing software. The precomposed
characters are at present the only means they have to do so on a Windows
platform. Adding smart rendering support for the decomposed characters would
provide them with a much better means; I'd certainly agree with Kenneth
about that. And I'd also think it would be preferable if that could be done
system-wide and not just by some individual application, wouldn't it? So
Uniscribe seems like the best bet at the moment for Windows users.

What do the Microsoft people think? May we hope?



Lukas




Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread John Hudson

At 08:05 AM 11/22/2000 -0800, Lukas Pietsch wrote:

Mind that the case-mapping question we were discussing is just one minor
aspect of the issue; the main task is much more general, and at the same
time more straightforward (if we leave aside the issue of automatic case
conversion and the fancy problems of, let's say, small caps): the decomposed
character sequences simply need to be mapped to the precomposed ones. It
affects not only the iota subscripts/adscripts but also all the other
diacritics. Without some glyph processing most combinations will never
display readably. Since the precomposed glyphs already exist as Unicode
codepoints, I suppose that the implementation would probably not even be
very difficult, and not much of it would even depend on the individual font,
would it?

Mapping decomposed character sequences to precomposed is not something that
necessarily needs to be done in a font, or even in a script shaping engine
like those in Uniscribe. This could be handled entirely at the IME level
(e.g. as a simple extension of keyboard input). Font-level glyph processing
is particularly adept at handling character-to-glyph and glyph-to-glyph
manipulations; character-to-character manipulations can be handled almost
anywhere in an input process.
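
For illustration: canonical composition (NFC) already performs exactly
this kind of character-to-character replacement. A minimal sketch in
Python, assuming a Unicode-aware runtime:

    import unicodedata

    # What an IME-level composition step could do: replace a keyed-in
    # decomposed sequence with its precomposed equivalent before the
    # text ever reaches the renderer. NFC is that mapping.
    typed  = "\u0397\u0313\u0345"   # ETA + COMBINING COMMA ABOVE + YPOGEGRAMMENI
    stored = unicodedata.normalize("NFC", typed)
    assert stored == "\u1F98"       # one precomposed character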

By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have
the precomposed characters in Unicode in the first place. I'm very glad they
are there, since, as we see, the beautiful smart rendering features we are
talking about are simply not yet available in mainstream text processing
software. 

The counter-argument could be made that, if Unicode had not accepted so
many precomposed diacritic characters, especially in the Latin blocks,
smart rendering software would have become mainstream much sooner. It is
unfortunately true that, if smart rendering were necessary to process
German and French, it would have been a priority many years ago.

John Hudson


Tiro Typeworks | 
Vancouver, BC  | All empty souls tend to extreme opinion.
www.tiro.com   |   W.B. Yeats
[EMAIL PROTECTED]| 



Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Peter_Constable


Let me add a little to what Marco has written:


- Open Type itself (see http://www.microsoft.com).
The "font-specific intelligence" is in the font itself; the "generic script
intelligence" is in a software component called UniScribe.

OpenType provides partial support for complex script rendering. It is
dependent upon software to interpret the font-specific information in the
OT tables in an OT font, and to also take care of some rendering issues
which OT itself does not address (e.g. reordering as needed for Indic).
These things can be handled directly by an application. MS has also
provided the Uniscribe engine for this purpose, however. (There are some
aspects of OT support related to fine typography that Uniscribe does not
address. Uniscribe is intended eventually to provide adequate support for
complex script rendering, however.)

On Win9x/Me and on WinNT4, Uniscribe support must be explicitly written
into an app; i.e. an app must explicitly call the Uniscribe engine to take
advantage of its benefits. Word 2000 does this, for example, to handle
Arabic, but it does not do this for Thai (except in the S. Asia version of
Word 2000). In contrast, on Win2000, all Win32 text drawing interfaces make
use of Uniscribe. Thus, *any* app running on Win2000 benefits from
Uniscribe.

As has been mentioned, current versions of Uniscribe provide support for
some scripts but not others. Work is being done to extend the selection of
scripts that are supported. Currently, polytonic Greek is not supported,
but it will be supported in the future. New updates of the Uniscribe engine
will appear next year with Office 10 and with Whistler (apparently Win2000
consumer version) or with other updates to Windows, Office or Internet
Explorer. I have no idea what new script support will appear when. I just
know that more is coming.

OT implementations are being done for Mac and Unix/Linux. On the Mac side,
Apple reps have made statements that suggest that they would incorporate
system-level support for the aspects of complex rendering that OT itself
doesn't provide (i.e. they'd write something comparable to Uniscribe). On
Unix/Linux, I'm not sure what is being done about providing the support
that OT itself lacks.



- AAT/ATSUI (see http://www.apple.com).
Most of the "intelligence" is in the font itself, which also includes a
state machine to operate substitution. The behavior of the smart fonts may
be influenced by external user settings.

Essentially, all of the intelligence is in the font. (There is an external
engine that runs the state tables in the font, but that's a generic engine
- all the behaviour is embodied in the state tables in the font). Thus,
complex script rendering for polytonic Greek (for example) is available if
a system has an AAT font that implements support for that script. In order
to take advantage of that capability, however, an application must be
written to use the ATSUI text drawing interfaces rather than the older
QuickDraw interfaces. Developers have been slow on the uptake, but Apple
has been working hard to make it easier for developers to support these
interfaces.


- Graphite --my favorite, so far-- (see http://www.sil.org).
Takes a "stupid" TrueType font and merges it with the "intelligence" written
in an ad-hoc description language (GDL), to produce an "intelligent" font
quite similar to AAT/ATSUI. The accent is on extensibility and, especially,
on supporting the Private Use Area (which is a precious resource for
linguistic research and for defining new orthographies).

The font technology itself is indeed very much like AAT, though there are
some differences. The existence of GDL is an important difference, though I
wouldn't have called it an "ad-hoc" language. It is a carefully designed
high-level language intended to deal specifically with the kinds of issues
involved in complex scripts. Graphite also relies on a generic run-time
engine that interprets the state tables that are added to the font, and
also requires applications to be written using special interfaces that call
upon that engine. There is not yet support for this outside of SIL that I
know of, though many have expressed interest. In particular, there has been
a lot of interest in seeing this technology implemented for the Unix/Linux
environment.


- Omega (http://omega-system.sourceforge.net).
Built on top of the old and glorious TeX typesetting system. It may become
(or already is?) the standard for Unicode in Linux.

Whatever Omega does or doesn't do, I wouldn't categorize it as a general
script rendering system like AAT, OT/Uniscribe and Graphite. It is an
end-user application, not a system extension for complex script support. I
suppose you could write an app that only output text by generating TeX
source and processing it via Omega, but I wouldn't expect to find much of a
market for such an app.

The other platform of potential interest is Java. Sun has been working on
providing complex script support in Java 2. 

Re: Open-Type Support (was: Greek Prosgegrammeni)

2000-11-22 Thread Jungshik Shin

On Wed, 22 Nov 2000, John Hudson wrote:

 At 08:05 AM 11/22/2000 -0800, Lukas Pietsch wrote:

 By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have
 the precomposed characters in Unicode in the first place. I'm very glad they
 are there, since, as we see, the beautiful smart rendering features we are
 talking about are simply not yet available in mainstream text processing
 software.

 The counter argument could be made: that if Unicode had not accepted so
 many precomposed diacritic characters, especially in the Latin blocks,
 smart rendering software would have become mainstream much sooner. It is
 unfortunately true that, if smart rendering were necessary to process
 German and French, it would have been a priority many years ago.

I agree with you on this point.  I guess this is a kind of 'chicken
and egg' issue. Let me draw another example from Korean Hangul. If
Unicode/ISO-10646 had included just a subset of precomposed syllables
(perhaps the 2350 from KS X 1001) and left the rest (some 8000 more for
modern Korean) to be composed out of the Jamo (letters) in the U+1100
block, we would be more (though not very much more) likely to have a
rendering infrastructure on major platforms that offers 'beautiful
rendering features' for Hangul (which is essential for the full support
of modern, let alone medieval, Korean).  (I'm well aware that the Korean
delegation adamantly insisted that all 11,172 of them be included, but
in retrospect...) And the same might be true of Greek and other
scripts for which both precomposed characters and 'component'
(decomposed) characters are available.
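
For reference, the 11,172 precomposed syllables are arithmetic
combinations of the U+1100 Jamo, so composing them in software is
trivial -- a sketch in Python, with a function name of my own choosing:

    # S = 0xAC00 + (L*21 + V)*28 + T, where L, V and T index the leading
    # consonants (from U+1100), the vowels (from U+1161) and the optional
    # trailing consonants (T counts from U+11A7, 0 meaning none).
    def compose_hangul(L, V, T=0):
        return chr(0xAC00 + (L * 21 + V) * 28 + T)

    assert compose_hangul(0, 0) == "\uAC00"      # KIYEOK + A        -> GA
    assert compose_hangul(18, 0, 4) == "\uD55C"  # HIEUH + A + NIEUN -> HAN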

Jungshik Shin




Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Thomas Chan

On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:

 If the difference between "A" and "a" is called "case",
 what is the difference between HIRAGANA LETTER YA
 and KATAKANA LETTER YA called? (I think either of
 those letters would do to describe this with the
 new code pages. The description would be enhanced
 by liberal application of HIRAGANA-KATAKANA LONG
 VOWEL MARK.)

Maybe you should also be asking what the difference between U+0041 LATIN
CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC
CAPITAL LETTER A is called.
 
However, although U+3084 HIRAGANA LETTER YA and U+30E4 KATAKANA LETTER YA
are both derived from U+4E54 (the former from a cursive form; the latter
from a simplification of the print form), it doesn't hold for most other
kana, such as U+3042 HIRAGANA LETTER A and U+30A2 KATAKANA LETTER A, which
are derived from a cursive form of U+5B89 and a simplification of the
print form of U+963F, respectively.

I don't get what you mean by "new code pages".  Who's creating those
anymore?
 
Hiragana, unlike katakana, doesn't use U+30FC KATAKANA-HIRAGANA
PROLONGED SOUND MARK for writing long vowels.  (Why does it have this name
in Unicode?)  What's this "HIRAGANA-KATAKANA LONG VOWEL MARK"?--I see no
such thing.

 
 I like "Astral Planes" better.
 Will they include INUKTITUT VIGESIMAL DIGITs?

I don't.  I write in Cantonese and some of the contents of Plane 2 are very
much down-to-earth for me.  Are you a musician?  If so, then Plane 1 would
be important to you, too.

Throwing around terms like "Astral Planes", whether official or not, will
just engender lack of credibility for Unicode, which has already happened
to some extent among people who heard about some "Klingon" (in the Private
Use Area) in Unicode.


Thomas Chan
[EMAIL PROTECTED]





Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Peter_Constable


On 11/22/2000 01:39:53 PM Thomas Chan wrote:

Maybe you should also be asking what the difference between U+0041 LATIN
CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC
CAPITAL LETTER A is called.

You call it the same thing as the difference between U+10A0 GEORGIAN
CAPITAL LETTER AN and U+126D ETHIOPIC SYLLABLE VE: a character difference.



 I like "Astral Planes" better.

I don't.  I write in Cantonese and some of contents of Plane 2 are very
much down-to-earth for me.  Are you a musician?  If so, then Plane 1 would
be important to you, too.

Throwing around terms like "Astral Planes", whether official or not, will
just engender lack of credibility for Unicode, which has already happened
to some extent among people who heard about some "Klingon" (in the Private
Use Area) in Unicode.

I agree that, since there are official terms ("supplementary planes",
"supplementary characters", etc.), we should encourage their use. It is
true that some question credibility of Unicode and that the use of esoteric
terms or occasional allusions to literary classics like The Hitchhiker's
Guide to the Galaxy probably don't contribute to building credibility. On
the other hand, the thing that will most strongly build credibility is
seeing Unicode supported in software implementations, and this is
happening. I won't be surprised, on the day that Unicode 3.1 is published,
if MS makes available from the MS Office web site a font and IME update for
Office 10 that provides support for all of those new Han ideographs you've
all been waiting for. (I believe Office 10 will ship with everything else
that would be needed to support these characters.) Things like that give a
lot of credibility to Unicode. So, if in our discussions on this list John
Cowan refers to an astral character or two, or if I invite someone to the
restaurant at the end of the universe, I don't think that will hurt much.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Michael (michka) Kaplan

 I don't get what you mean by "new code pages".  Who's creating those
 anymore?

Actually, lots of people, unfortunately. From WG3 and the endless parade of
8859 code pages, to WG01 of INFITT, to the now [in]famous GB-18030, there are
lots of code pages being researched, created, modified, and otherwise used.

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/





Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread 11digitboy

Okay. Get out your copy of the lyrics to the Ranma
1/2 Complete Vocal Collection Vol. 1. Now look at
the lyrics to Ranbada Ranma (that's Track 12) and
tell me that the long vowel mark is not used with
hiragana.

| ||\ __/__  |   |  _/_   | ||  
/
| _|_  ,--, /   \  /_|  -+- / --- | /
|V T_)| |   |\   |   ||/
_
 \_/   T /  \   /  __/   |   /---  \_/ L/
\


 Thomas Chan [EMAIL PROTECTED] wrote:
 On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:

  If the difference between "A" and "a" is called "case",
  what is the difference between HIRAGANA LETTER YA
  and KATAKANA LETTER YA called? (I think either of
  those letters would do to describe this with the
  new code pages. The description would be enhanced
  by liberal application of HIRAGANA-KATAKANA LONG
  VOWEL MARK.)

 Maybe you should also be asking what the difference between U+0041 LATIN
 CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC
 CAPITAL LETTER A is called.

 However, although U+3084 HIRAGANA LETTER YA and U+30E4 KATAKANA LETTER YA
 are both derived from U+4E54 (the former from a cursive form; the latter
 from a simplification of the print form), it doesn't hold for most other
 kana, such as U+3042 HIRAGANA LETTER A and U+30A2 KATAKANA LETTER A, which
 are derived from a cursive form of U+5B89 and a simplification of the
 print form of U+963F, respectively.

 I don't get what you mean by "new code pages".  Who's creating those
 anymore?

 Hiragana, unlike katakana, doesn't use U+30FC KATAKANA-HIRAGANA
 PROLONGED SOUND MARK for writing long vowels.  (Why does it have this name
 in Unicode?)  What's this "HIRAGANA-KATAKANA LONG VOWEL MARK"?--I see no
 such thing.

  I like "Astral Planes" better.
  Will they include INUKTITUT VIGESIMAL DIGITs?

 I don't.  I write in Cantonese and some of the contents of Plane 2 are very
 much down-to-earth for me.  Are you a musician?  If so, then Plane 1 would
 be important to you, too.

 Throwing around terms like "Astral Planes", whether official or not, will
 just engender lack of credibility for Unicode, which has already happened
 to some extent among people who heard about some "Klingon" (in the Private
 Use Area) in Unicode.

 Thomas Chan
 [EMAIL PROTECTED]
 
 





Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread David Starner

On Wed, Nov 22, 2000 at 11:39:53AM -0800, Thomas Chan wrote:
 On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:
  I like "Astral Planes" better.
  Will they include INUKTITUT VIGESIMAL DIGITs?
 
 I don't.  I write in Cantonese and some of contents of Plane 2 are very
 much down-to-earth for me.  Are you a musician?  If so, then Plane 1 would
 be important to you, too.

What does importance have to do with it? A lot of societies would regard things
astral as much more important than things earthly. I personally read
supplementary as a 'supplement', i.e. an add-on, not essential. But that's just
me. I think you're reading too much into it.

Personally, I don't dislike supplementary because of any connotations it may or
may not have, but instead because it's one of the clumsy words this field is
littered with: internationalization, localization, supplementary.

 Throwing around terms like "Astral Planes", whether official or not, will
 just engender lack of credibility for Unicode, which has already happened
 to some extent among people who heard about some "Klingon" (in the Private
 Use Area) in Unicode.

Yes, I can see how a bunch of characters created by people to name their horses
getting added to Unicode could cause a loss of credibility. Or am I getting
something confused here? 

How about this - Unicode judges characters by their usefulness and the
principles set forth in Chapter 1 of the Unicode standard, instead of looking
down on some languages and users and considering them inherently less worthy?

-- 
David Starner - [EMAIL PROTECTED]
http://dvdeug.dhis.org
Looking for a Debian developer in the Stillwater, Oklahoma area 
to sign my GPG key



RE: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Ayers, Mike


 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]


 Okay. Get out your copy of the lyrics to the Ranma
 1/2 Complete Vocal Collection Vol. 1. Now look at
 the lyrics to Ranbada Ranma (that's Track 12) and
 tell me that the long vowel mark is not used with
 hiragana.

The long vowel mark is not used with hiragana.  Either there is a
misuse or (most likely) you're interpreting a hyphen as a long vowel mark.

 | ||\ __/__  |   |  _/_   | ||
 /
 | _|_  ,--, /   \  /_|  -+- / --- | /
 |V T_)| |   |\   |   ||/
 _
  \_/   T /  \   /  __/   |   /---  \_/ L/
 \

Whatever you were trying to do here, it didn't work very well.


/|/|ike



Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Rick McGowan

For what it's worth, in this oh-so-important discussion... I have seen this length 
mark used with both Katakana and Hiragana (I suppose that puts me in the good company 
of 'Leven Digit Boy, only he can prove it and I can't).  Call the usage nonce or 
whatever... So what?  It would be fair to say this length mark is not NORMALLY used 
with Hiragana, which NORMALLY uses the vowel "u" to indicate lengthening.  Katakana 
likewise NORMALLY uses the length mark, but is not prevented from using the "u" vowel, 
and in some contexts does so.  For what it's worth trivia-wise, Katakana-as-okurigana 
is a style not normally used in the ordinary writing of Japanese sentences, but they 
can be, and on occasion are (especially in old orthography)...so don't be surprised 
when you see them... the natives are not going nuts, they're merely surprising the 
Conservative Foreign Formalists.

I suppose the bicameral name of this thing, U+30FC KATAKANA-HIRAGANA PROLONGED SOUND 
MARK, is one of those Great Mysteries Buried in Time, the answer to which only Dr. 
Whistler knows.  (I would lay a handful of soft currency on the truth of the 
proposition that there exists an ancient meeting document on yellow lined paper of the 
pre-Consortium Unicode Working Group which could shed light on the question of this 
name, but I digress.)  At least the name indicates that one is not nominally prevented 
from using it for Katakana, thus pre-empting perennial requests from the Completist 
Fringe for the addition of a second length mark for use with Hiragana.

Rick


Begin forwarded message:

 From: "Ayers, Mike" [EMAIL PROTECTED]
 Date: Wed Nov 22, 2000  01:32:58 PM US/Pacific
 To: Unicode List [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: RE: Kana and Case (was [totally OT] Unicode terminology)
 
 
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 
 
  Okay. Get out your copy of the lyrics to the Ranma
  1/2 Complete Vocal Collection Vol. 1. Now look at
  the lyrics to Ranbada Ranma (that's Track 12) and
  tell me that the long vowel mark is not used with
  hiragana.
 
   The long vowel mark is not used with hiragana.  Either there is a
 misuse or (most likely), you're interpreting a hyphen as a long vowel mark.

 


Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Peter_Constable


On 11/22/2000 04:06:59 PM Rick McGowan wrote:

I suppose the bicameral name of this thing, U+30FC KATAKANA-HIRAGANA PROLONGED
SOUND MARK, is one of those Great Mysteries Buried in Time, the answer to which
only Dr. Whistler knows.  (I would lay a handful of soft currency on the truth
of the proposition that there exists an ancient meeting document on yellow
lined paper of the pre-Consortium Unicode Working Group which could shed light
on the question of this name,

And I, on the truth of the proposition that the aforementioned Dr. Whistler
could provide at least a summary of the contents of The Yellow Lined Paper
Manuscript and of the interpretations and reactions of said manuscript by
various parties, if not a facsimile or the original itself.


Peter




Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Tom Emerson

[EMAIL PROTECTED] writes:
 And I, on the truth of the proposition that the aforementioned Dr. Whistler
 could provide at least a summary of the contents of The Yellow Lined Paper
 Manuscript and of the interpretations and reactions of said manuscript by
 various parties, if not a facsimile or the original itself.

Yes, probably the same Yellow Lined Paper containing the rationale for
the misnamed "Hangzhou" numerals...

http://cymru.basistech.com/papers/Hangzhou.pdf

-tree

-- 
Tom Emerson  Basis Technology Corp.
Zenkaku Language Hackerhttp://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"



Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Rick McGowan

The Venerable Dr Whistler wrote:

 I'm sure there is, but I can't lay hands on it right at the moment.
 It's sitting in a box in the basement somewhere.

Uh... He probably meant to write:

  "Yes, it's right here ahem as you can see from Diagram 7,
  it's part of the thin banded layer right above the level of the
  Late Xerox midden but beneath the First Dynasty Unicodic layers."

In any case, I breathe a sigh of relief that my handful of soft currency is safe once 
again. ;-)

Rick

 


Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Tex Texin



Kenneth Whistler wrote:
 ...The place you'll see this usage of the prolonged sound
 mark fairly frequently is in Japanese comics, which are rather
 loose and inventive in their use of spellings and "paraspellings"
 to convey tone of voice and other prosodic information.


Which brings up the question, when do we encode the
comic book (non-spacing) zig-zaggy-balloon-thingie that goes around 
the text for pow!, biff#@!, bam%$#!, and shazam! ?

;-)
Tex




--
Tex Texin  Director, International Business
mailto:[EMAIL PROTECTED]  +1-781-280-4271 Fax:+1-781-280-4655
Progress Software Corp.14 Oak Park, Bedford, MA 01730

http://www.Progress.com#1 Embedded Database
http://www.SonicMQ.com #1 Performing JMS Messaging
http://www.ASPconnections.com  #1 provider in the ASP marketplace
http://www.NuSphere.comOpen Source software and services for
MySQL

Globalization Program   
http://www.Progress.com/partners/globalization.htm
---



Lakota reprise: (Re)birth of a character

2000-11-22 Thread Kenneth Whistler

On a couple occasions the issue of Unicode coverage of
the Lakota orthography has come up on this list. I finally
tracked down enough source material to identify the problem.

The issue for Lakota in Unicode is the representation of
the Lakota nasal vowels in the 1982 Lakota orthography. That
orthography was developed by Lakota educators, was adopted
by the South Dakota Association of Bilingual and Bicultural
Education, and is being used to print books, dictionaries,
and teaching materials for Lakota.

There are a number of encoding issues for the 1982 Lakota
orthography in Unicode, because of the nature of the diacritic
usage that was chosen. That diacritic usage departs from
Americanist conventions to meet a number of criteria, including
familiarity from older usage, aesthetics, and some other
intangible factors.

In particular, to represent the 1982 Lakota orthography in
Unicode, you must make use of Latin letters plus the following
characters as diacritics:

U+0307 COMBINING DOT ABOVE  

   indicates aspiration on surds (p, t, c, k); modified point 
   of articulation on fricatives (s, h); modified manner
   of articulation on g [g-dot-above = voiced velar fricative].

U+0304 COMBINING MACRON

   indicates voicelessness on surds (p, t, c, k).

U+02B9 MODIFIER LETTER PRIME

   indicates ejective release on surds (p, t, c, k); post-glottalic
   release on fricatives.
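
In encoded text, these letters are simply base letters followed by the
marks above -- a hypothetical Python illustration, not drawn from any
Lakota source:

    p_aspirated = "p\u0307"   # p + COMBINING DOT ABOVE   (aspirated p)
    p_voiceless = "p\u0304"   # p + COMBINING MACRON      (voiceless p)
    p_ejective  = "p\u02B9"   # p + MODIFIER LETTER PRIME (ejective p;
                              # U+02B9 is a spacing letter, not combining)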

The latter usage derives from the use in the Buechel 1939
grammar of the (typewriter) apostrophe (i.e. U+0027) for the
same function. And that, in turn, is related to the Americanist
usage of U+02BC MODIFIER LETTER APOSTROPHE to indicate ejective
or glottal release. This means there is probably going to be some
ambiguity in the representation of Lakota, since people are going
to be uncertain as to whether U+02B9, U+02BC, or U+0027 should be
used. The fonts used with the current printed material clearly show
a prime mark, rather than a raised comma or a directionally neutral
apostrophe, but Lakota linguists and educators will presumably need
to decide this one.

The real issue is for the mark used to indicate nasalization of vowels.

Lakota has three nasal vowels, a nasalized form of /i/, /a/, and of /u/.
The 1982 orthography indicates these with digraphs, where the second
element is basically an n with a long right leg. Earlier discussion
of this had pointed to Unicode U+019E LATIN SMALL LETTER N WITH LONG RIGHT LEG
as this character. But that character has no associated uppercase character,
which is needed for the Lakota orthography.

The issue is complex, however. It is clear that this Lakota letter
is a new creation. If you go back to the source of this element of
the orthography, you can find it in Buechel, 1939, A Grammar of Lakota,
which represents the vowels this way, but using what is clearly a
lowercase Greek letter eta (i.e. U+03B7). This, in turn, derived from
a 19th century Dakota alphabet created by Episcopal missionaries and
associated particularly with the name of Stephen R. Riggs. The Greek
letter eta was often a printing substitution for eng (i.e. U+014B),
to indicate nasalization. So we have a complicated confusion here of
three letterforms.

U+019E was proposed in the IPA Principles (1949) for use in digraphic
spellings of nasal vowels -- presumably as a way of regularizing the
eta/eng confusion. But the letter was withdrawn from the IPA in 1976.

However, presumably because of the enormous impact of the missionary
orthography on the history of the written Lakota language, the
digraphic spelling of nasal vowels was preferred by the Lakota
educators when deciding on the 1982 orthography, over the general
Siouan linguistic tradition of writing nasal vowels with ogoneks.
Effectively, this meant a resurrection of the n-with-long-right-leg,
since the orthography was intended to be Latin, not Latin with one
Greek letter eta.

The practical orthographies used in the missionary dictionaries and
grammars, and technical linguistic orthography of Boas and Deloria
never had to decide on the problem of how to uppercase the nasal
vowel, since as a digraphic representation, the nasal indicator never
occurs initially, and those sources don't use all-cap text anywhere.
But the 1982 orthography is intended for general use-- and that means
that the Lakota text can also occur in all-cap environments such
as chapter headers, and so on.

So as in the case of African languages that adopted an IPA-based
orthography, and then created uppercase versions of letters that
had no uppercase in IPA (cf. U+0186, U+018F, U+01A9, for example),
we have another instance here of orthographic usage driving the
need for a new uppercase character: LATIN CAPITAL LETTER N WITH LONG RIGHT
LEG.

--Ken





Re: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Thomas Chan

On Wed, 22 Nov 2000, David Starner wrote:

 On Wed, Nov 22, 2000 at 11:39:53AM -0800, Thomas Chan wrote:
  On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:
   I like "Astral Planes" better.
   Will they include INUKTITUT VIGESIMAL DIGITs?
  
  I don't.  I write in Cantonese and some of contents of Plane 2 are very
  much down-to-earth for me.  Are you a musician?  If so, then Plane 1 would
  be important to you, too.
 
 What does importance have to do with it? A lot of societies would regard things
 astral as much more important than things earthly. I personally read
 supplementary as a 'supplement', i.e. an add-on, not essential. But that's just
 me. I think you're reading too much into it.

"Astral" might be okay, but for many people, "astral plane" conjures up
images of metaphysical or things of science fiction, and suggest they are
to be taken less seriously.


 Personally, I don't dislike supplementary because of any connotations it may or
 may not have, but instead because it's one of the clumsy words this field is
 littered with: internationalization, localization, supplementary.

Whether we like it or not, "supplementary" is the official term now, just
like the use of the term "ideograph" or "letter".

 
  Throwing around terms like "Astral Planes", whether official or not, will
  just engender lack of credibility for Unicode, which has already happened
  to some extent among people who heard about some "Klingon" (in the Private
  Use Area) in Unicode.
 
 Yes, I can see how a bunch of characters created by people to name their horses
 getting added to Unicode could cause a loss of credibility. Or am I getting
 something confused here? 

I think a bit, yes.  Those characters for names of horses (or individuals)
aren't fictional like the Klingon alphabet.  There already are some in
the BMP for names of horses, such as U+9A04, U+9A4A, U+9A2E; or
individuals such as U+66CC, btw--but probably included on the basis of
being in legacy character sets as of the early 90's.  In time, some of
them, such as U+66CC, have become used by more people than the original
bearer.

I personally don't agree with frivolous racehorse names, but the bulk of
the CJK Extension B in Plane 2 isn't stuff like that, but characters that
have withstood at least the test of being included in large dictionaries
and encyclopedias of the last few centuries.

(I'm curious to know the codepoints of those racehorse names, and if any
actually made it into Plane 2.)

 
 How about this - Unicode judges characters by their usefulness and the
 principles set forth in Chapter 1 of the Unicode standard, instead of looking
 down on some languages and users and considering them inherently less worthy?

I don't disagree with those principles, but it is clear that what is in
the BMP occupies a first-class position unless and until support for non-BMP
areas comes--e.g., a few of the things mentioned on this list include
support in Java, capacity of TrueType fonts, UTF-16 encoding, etc.  If
Planes 1 and 2 are not implemented because people think there are only
nonsense personal ideographs there, or stuff only of interest to
"unprofitable" academics, then that in turn harms the users of living
written cultures, such as Cantonese, who do make use of them.  (If
they were in the BMP, then I could be using them today, with even software
written years ago, but alas that is not the case.  I know I can use
the PUA now, but even that is second-class because of lack of
standardization by definition, exclusion in sorting and character
properties, etc.)


Thomas Chan
[EMAIL PROTECTED]





Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)

2000-11-22 Thread Katsuhiko Momoi

As other people commented, there is nothing in principle that prevents 
Japanese from writing Hiragana with the elongation mark U+30FC. The 
Japanese Language Council can recommend all it wants, but the "spirit of
language" has its own will, as it always has in any language. In
fact a couple of Japan's top-10 "Popular Words of the Year" in recent
years use U+30FC. See for example an entry for 1987, "da-ijo-buda-" (No 
problem.) on this page:

http://www.fujifilm.co.jp/salon/utsurun/y87/ry.html

and one of the two most popular words of 1998, "dattyu-no" (it requires
context and a physical gag to explain, so I won't), on this page:

http://www.jiyu.co.jp/gendai/shingo/shingo.html
(Click on the 1998 link on the left.)

The other elongation character, U+FF5E, is also used very widely with
Hiragana in informal, comic-book, and personal-mail writing. It could
even be that U+FF5E represents a kind of contour tone associated with
this jocular use, as opposed to, say, a flatter tone with U+30FC.

The use of these elongation symbols for Hiragana is so established in 
popular writing that Japanese search engines must ignore the differences 
between these elongation symbols in addition to ignoring Hiragana and 
Katakana differences.
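
A sketch, in Python, of the kind of folding this implies (the function
and its mappings are illustrative, not any actual engine's code):

    def search_fold(s):
        # Treat U+FF5E like U+30FC, then fold katakana to hiragana
        # (the two blocks are offset by 0x60 over the shared range).
        out = []
        for ch in s:
            if ch == "\uFF5E":           # FULLWIDTH TILDE as elongation
                ch = "\u30FC"            # PROLONGED SOUND MARK
            cp = ord(ch)
            if 0x30A1 <= cp <= 0x30F6:   # katakana -> hiragana
                ch = chr(cp - 0x60)
            out.append(ch)
        return "".join(out)

    # "dattyu-no" written in katakana with U+FF5E matches the same word
    # written in hiragana with U+30FC:
    assert search_fold("\u30C0\u30C3\u30C1\u30E5\uFF5E\u30CE") == \
           search_fold("\u3060\u3063\u3061\u3085\u30FC\u306E")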

- Kat

Rick McGowan wrote:

 For what it's worth, in this oh-so-important discussion... I have seen this length 
mark used with both Katakana and Hiragana (I suppose that puts me in the good company 
of 'Leven Digit Boy, only he can prove it and I can't).  Call the usage nonce or 
whatever... So what?  It would be fair to say this length mark is not NORMALLY used 
with Hiragana, which NORMALLY uses the vowel "u" to indicate lengthening.  Katakana 
likewise NORMALLY uses the length mark, but is not prevented from using the "u" 
vowel, and in some contexts does so.  For what it's worth trivia-wise, 
Katakana-as-okurigana is a style not normally used in the ordinary writing of 
Japanese sentences, but they can be, and on occasion are (especially in old 
orthography)...so don't be surprised when you see them... the natives are not going 
nuts, they're merely surprising the Conservative Foreign Formalists.
 
 I suppose the bicameral name of this thing, U+30FC KATAKANA-HIRAGANA PROLONGED SOUND 
MARK, is one of those Great Mysteries Buried in Time, the answer to which only Dr. 
Whistler knows.  (I would lay a handful of soft currency on the truth of the 
proposition that there exists an ancient meeting document on yellow lined paper of 
the pre-Consortium Unicode Working Group which could shed light on the question of 
this name, but I digress.)  At least the name indicates that one is not nominally 
prevented from using it for Katakana, thus pre-empting perennial requests from the 
Completist Fringe for the addition of a second length mark for use with Hiragana.
 
   Rick
 
 
 Begin forwarded message:
 
 From: "Ayers, Mike" [EMAIL PROTECTED]
 Date: Wed Nov 22, 2000  01:32:58 PM US/Pacific
 To: Unicode List [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Subject: RE: Kana and Case (was [totally OT] Unicode terminology)
 
 
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 
 
 Okay. Get out your copy of the lyrics to the Ranma
 1/2 Complete Vocal Collection Vol. 1. Now look at
 the lyrics to Ranbada Ranma (that's Track 12) and
 tell me that the long vowel mark is not used with
 hiragana.
 
  The long vowel mark is not used with hiragana.  Either there is a
 misuse or (most likely), you're interpreting a hyphen as a long vowel mark.
 
 
  


-- 
Katsuhiko Momoi
Netscape International Client Products Group
[EMAIL PROTECTED]

What is expressed here is my personal opinion and does not reflect 
official Netscape views. 




Re: Lakota reprise: (Re)birth of a character

2000-11-22 Thread Kenneth Whistler

'leven Digit Boy expostulated:

 Just put in that letter.

  |\|
  | \   |
  |  \  |
  |   \ |
  |\|
\
 \

 THAT is the letter you mean, right?
 And it's NOT IN UNICODE?!

Well, no, not exactly. To borrow the ASCII art technique,
it is:

  |/\
  | |
  | |
  | |
  | |
|
|

That is, a capital form based on the shape of U+019E.

See Albert White Hat, Sr., Reading and Writing the Lakota
Language (Lakota Iyapi un Wowapi nahan Yawapi), 1999,
p. 12. The chapter heading, WOUNSPE TOKAHE (The First
Teaching) is in all-caps, and what I have roughly indicated
here in ASCII with an "N" is actually the letter shown above.

--Ken