RE: sort of OT: politics and scripts
Cathy Wissink [EMAIL PROTECTED] writes:

The Soviet language policies under both Lenin and Stalin were amazing in what they managed to change in a very short time, especially considering the scripts first shifted from Arabic to Latin, then just a decade or so later to Cyrillic. I too have been wondering when there would be a movement in the post-Soviet, Central Asian countries away from Cyrillic; my assumption has always been that they would want to return to Arabic (or for others, back to their indigenous scripts). Surprisingly, however, in our NLS implementation, the movement is away from Cyrillic, as you noted, but towards Latin rather than Arabic.

The answer does of course lie in the reform imposed by Mustafa Kemal in Turkey. Turkey is naturally the leading state in the Turkic world, so it's natural to turn to Turkey to get an alphabet. We've seen this in Azeri and Uzbek. The most recent to announce a change was Tatarstan.

--
Yours sincerely, Erland Sommarskog [EMAIL PROTECTED]
Open-Type Support (was: Greek Prosgegrammeni)
Dear all,

a lot was said in this thread about intelligent rendering mechanisms, such as fonts implementing automatic glyph substitution and things like that. The notion appears to be quite commonplace to the experts, whereas I (being an amateur) must admit it seemed just like a utopian dream to me when I first heard of the possibility of such a thing, a few months ago.

I figure that people are mostly thinking of the technology called "Open Type", is that right? Can anybody enlighten me about how much support for that technology is already available in standard software, say, in browsers or text processors under Windows 9x? If I had a True-Type font that implemented the glyph substitutions, say, for the Greek combining diacritics, could I make my average standard word processing software actually use these features? Or would I have to wait for specialized multilingual word processors to appear on the market?

I found the documentation of the "GetCharacterPlacement" function in the Windows API. It looks like that is the place where these things should be implemented system-wide. But I played with it a bit and found it didn't actually do any glyph replacements. Is that function actually implemented in Win98, or is it just a stub? Or did I make a mistake in my testing, or is something wrong with my system? Can Win2000 do more than Win98 in this respect?

I also noticed that MS Internet Explorer does use glyph replacement features on my system when it is displaying Arabic. How does it do that? Would there be a way of making it use other Open-Type features too?

Lukas Pietsch
Ferdinand-Kopf-Str. 11
D-79117 Freiburg
Tel. 0761-696 37 23
Universität Freiburg, Englisches Seminar
Re: Greek Prosgegrammeni
Thanks to Asmus and Kenneth for their clarifying comments. Things are beginning to seem to make sense to me... (:-)

Especially, I'm quite relieved to see now that:
- for any one of the common printing variants of mute iota that a user might want to see,
- there is already at least one easily available TrueType font, so that
- even *without* special glyph shaping or glyph substitution mechanisms in display,
- there will be at least one way of encoding that will be stable, in the sense that it will guarantee the desired display and not get corrupted when undergoing canonical composition/decomposition; and, most importantly:
- all these encodings will be recognized as equivalent by Unicode applications when it comes to case-insensitive matching (because all these character sequences case-fold to the same sequence of vowel + small iota (03B9)).

That's something, isn't it?

What will *not* work, for most users, is automatic case *conversion*. This will lead to undesired or unexpected results in most cases. But there are other independent reasons for that anyway: for most users, correct uppercasing also involves the stripping of accents and breathings, and the Unicode casing rules don't provide for that either.

But then again: who wants to use automatic case conversion for polytonic Greek anyway? (I can hardly remember having ever used it even in the Latin script in all the text processing I've done.) People will simply be typing sequences that Unicode will see as irregular mixed-case strings, but who cares? I guess all the computational features that really matter to most of us common mortals (like sorting, word searches etc.) involve the "case-folding" feature used for case-insensitive matching, and as I said above, this seems to work out in a fairly intuitive and sensible way.

So, after all, the UTC people do deserve a pat on the back for their good work? (:-)

I have another ignorant layman's sort of question, but I'll put it into a second message because it really constitutes a different topic.

Lukas
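The case-folding claim in the message above can be checked with a small sketch. It assumes Python's standard `unicodedata` module and `str.casefold()`; the helper name `caseless_key` and the choice of eta with smooth breathing as the sample letter are illustrative only and do not come from the thread.

```python
import unicodedata


def caseless_key(text: str) -> str:
    """Key for canonical caseless matching: NFD(casefold(NFD(text)))."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFD", decomposed.casefold())


# Illustrative spellings of eta + smooth breathing + mute iota.
variants = {
    "precomposed capital with prosgegrammeni": "\u1F98",
    "decomposed capital + combining ypogegrammeni": "\u0397\u0313\u0345",
    "capital with psili + full-size small iota": "\u1F28\u03B9",
    "precomposed small with ypogegrammeni": "\u1F90",
}

keys = {label: caseless_key(s) for label, s in variants.items()}
all_equal = len(set(keys.values())) == 1

for label, key in keys.items():
    print(label, [f"U+{ord(c):04X}" for c in key])
print("all variants match caselessly:", all_equal)
```

The first normalization step matters because the combining ypogegrammeni (U+0345) folds to a full small iota, which only lines up across spellings once the input is decomposed.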
Re: Open-Type Support (was: Greek Prosgegrammeni)
Lukas Pietsch wrote:

a lot was said in this thread about intelligent rendering mechanisms, [...] I figure that people are mostly thinking of the technology called "Open Type", is that right?

Right, but that is only part of the picture. There are several major technologies for rendering "complex Unicode scripts". Here are some of the principal ones:

- Open Type itself (see in http:/www.microsoft.com). The "font-specific intelligence" is in the font itself; the "generic script intelligence" is in a software component called UniScribe.

- AAT/ATSUI (see in http:/www.apple.com). Most of the "intelligence" is in the font itself, which also includes a state machine to operate substitution. The behavior of the smart fonts may be influenced by external user settings.

- Graphite --my favorite, so far-- (see in http:/www.sil.org). Takes a "stupid" TrueType font and merges it with the "intelligence" written in an ad-hoc description language (GDL), to produce an "intelligent" font quite similar to AAT/ATSUI. The emphasis is on extensibility and, especially, on supporting the Private User Area (which is a precious resource for linguistic research and defining new orthographies).

- Omega (http://omega-system.sourceforge.net). Built on top of the old and glorious TeX typesetting system. It may become (or already is?) the standard for Unicode in Linux.

- More... Other projects are ongoing, with a variety of approaches, philosophies, scopes, applications.

OTH

_ Marco

My e-mail is now: marco.cimarostiªeurope.com (Change "ª" to "@")
Re: Open-Type Support (was: Greek Prosgegrammeni)
On Wed, Nov 22, 2000 at 04:19:42AM -0800, Marco Cimarosti wrote:

- Omega (http://omega-system.sourceforge.net). Built on top of the old and glorious TeX typesetting system. It may become (or already is?) the standard for Unicode in Linux.

I've never seen Omega used under Linux, nor have I found any good (English) documentation for it, although it is shipped with tetex and hence with Debian and probably other Linux distributions.

FreeType seems to support OpenType fonts. Pango (http://www.pango.org) apparently is going to use FreeType at some point, but is currently hacking some complex script support into bdf (http://www.wholehog.fsnet.co.uk/robert/indic/fonts.html).

--
David Starner - [EMAIL PROTECTED] http://dvdeug.dhis.org
Looking for a Debian developer in the Stillwater, Oklahoma area to sign my GPG key
Re: Open-Type Support (was: Greek Prosgegrammeni)
John Hudson wrote:

At present, polytonic Greek is not supported in Uniscribe, I suspect because no one has determined that it needs to be.

So, would you agree that it does need to be? Keeping in mind what Kenneth Whistler wrote:

Not if the fonts they use map capital letter + ypogegrammeni character combinations into capital letter + full-size iota glyph sequences. Of course, if the fonts they use are not designed for correct use with polytonic Greek, then the default rendering behavior of the ypogegrammeni will not be what they expect or want. Time to upgrade the fonts. ...

This is not all that sophisticated. It should be a matter that can be wholly encapsulated within the fonts:

                            Font I               Font II
    A. 0397 0313 0345  ==   'H iota adscript     'H iota subscript
    B. 1F98            ==   'H iota adscript     'H iota subscript

... Many of us have felt all along that polytonic Greek should always be represented decomposed, and that the ELOT polytonic "character" encoding was a dangerous conflation of glyph design and character encoding concerns. ...

Implementations that use full decomposition for polytonic Greek and fonts that correctly map the accentual and diacritic combinations are the best bet for consistency *and* good presentation in the long run.

Mind that the case-mapping question we were discussing is just one minor aspect of the issue; the main task is much more general, and at the same time more straightforward (if we leave aside the issue of automatic case conversion and the fancy problems of, let's say, small-caps): the decomposed character sequences simply need to be mapped to the precomposed ones. It affects not only the iota subscripts/adscripts but also all the other diacritics. Without some glyph processing most combinations will never display readably. Since the precomposed glyphs already exist as Unicode codepoints, I suppose that the implementation would probably not even be very difficult, and not much of it would even depend on the individual font, would it?
By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have the precomposed characters in Unicode in the first place. I'm very glad they are there, since, as we see, the beautiful smart rendering features we are talking about are simply not yet available in mainstream text processing software. Much as I like the idea of the projects such as "Graphite" that Marco mentioned, I do think there are quite a number of people out here who would love to be able to handle Greek comfortably in their everyday all-purpose text-processing and browsing software. The precomposed characters are at present the only means they have to do so on a Windows platform. Adding smart rendering support for the decomposed characters would provide them with a much better means; I'd certainly agree with Kenneth about that. And I'd also think it would be preferable if that could be done system-wide and not just by some individual application, wouldn't it? So it seems as if Uniscribe looks like the best bet at the moment, for Windows users. What do the Microsoft people think? May we hope? Lukas
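As a rough illustration of the character-level mapping from decomposed sequences to precomposed code points discussed in the message above, here is a minimal sketch. It assumes Python's standard `unicodedata` module and uses canonical composition (NFC); the helper name `to_precomposed` is hypothetical. It says nothing about what a font or Uniscribe does at the glyph level.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Map decomposed sequences to precomposed characters where NFC defines one."""
    return unicodedata.normalize("NFC", text)


# Row A of the table above: capital eta + combining psili + combining ypogegrammeni.
decomposed = "\u0397\u0313\u0345"
composed = to_precomposed(decomposed)
print([f"U+{ord(c):04X}" for c in composed])

# The reverse direction is canonical decomposition (NFD).
round_trip = unicodedata.normalize("NFD", composed)
print(round_trip == decomposed)

# Not every combination has a precomposed form: alpha + macron + acute
# keeps a trailing combining acute after composition.
partial = to_precomposed("\u03B1\u0304\u0301")
print([f"U+{ord(c):04X}" for c in partial])
```

The last case is the reason a purely character-level mapping cannot fully replace glyph positioning in the renderer: some sequences still arrive at the font with combining marks attached.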
Re: Open-Type Support (was: Greek Prosgegrammeni)
At 08:05 AM 11/22/2000 -0800, Lukas Pietsch wrote:

Mind that the case-mapping question we were discussing is just one minor aspect of the issue; the main task is much more general, and at the same time more straightforward (if we leave aside the issue of automatic case conversion and the fancy problems of, let's say, small-caps): the decomposed character sequences simply need to be mapped to the precomposed ones. It affects not only the iota subscripts/adscripts but also all the other diacritics. Without some glyph processing most combinations will never display readably. Since the precomposed glyphs already exist as Unicode codepoints, I suppose that the implementation would probably not even be very difficult, and not much of it would even depend on the individual font, would it?

Mapping decomposed character sequences to precomposed is not something that necessarily needs to be done in a font, or even in a script shaping engine like those in Uniscribe. This could be handled entirely at the IME level (e.g. as a simple extension of keyboard input). Font-level glyph processing is particularly adept at handling character-to-glyph and glyph-to-glyph manipulations; character-to-character manipulations can be handled almost anywhere in an input process.

By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have the precomposed characters in Unicode in the first place. I'm very glad they are there, since, as we see, the beautiful smart rendering features we are talking about are simply not yet available in mainstream text processing software.

The counter-argument could be made: that if Unicode had not accepted so many precomposed diacritic characters, especially in the Latin blocks, smart rendering software would have become mainstream much sooner. It is unfortunately true that, if smart rendering were necessary to process German and French, it would have been a priority many years ago.
John Hudson Tiro Typeworks | Vancouver, BC | All empty souls tend to extreme opinion. www.tiro.com | W.B. Yeats [EMAIL PROTECTED]|
Re: Open-Type Support (was: Greek Prosgegrammeni)
Let me add a little to what Marco has written: - Open Type itself (see in http:/www.microsoft.com). The "font-specific intelligence" is in the font itself; the "generic script intelligence" is in a software component called UniScribe. OpenType provides partial support for complex script rendering. It is dependent upon software to interpret the font-specific information in the OT tables in an OT font, and to also take care of some rendering issues which OT itself does not address (e.g. reordering as needed for Indic). These things can be handled directly by an application. MS has also provided the Uniscribe engine for this purpose, however. (There are some aspects of OT support related to fine typography that Uniscribe does not address. Uniscribe is intended eventually to provide adequate support for complex script rendering, however.) On Win9x/Me and on WinNT4, Uniscribe support must be explicitly written into an app; i.e. an app must explicitly call the Uniscribe engine to take advantage of its benefits. Word 2000 does this, for example, to handle Arabic, but it does not do this for Thai (except in the S. Asia version of Word 2000). In contrast, on Win2000, all Win32 text drawing interfaces make use of Uniscribe. Thus, *any* app running on Win2000 benefits from Uniscribe. As has been mentioned, current versions of Uniscribe provide support for some scripts but not others. Work is being done to extend the selection of scripts that are supported. Currently, polytonic Greek is not supported, but it will be supported in the future. New updates of the Uniscribe engine will appear next year with Office 10 and with Whistler (apparently Win2000 consumer version) or with other updates to Windows, Office or Internet Explorer. I have no idea what new script support will appear when. I just know that more is coming. OT implementations are being done for Mac and Unix/Linux. 
On the Mac side, Apple reps have made statements that suggest that they would incorporate system-level support for the aspects of complex rendering that OT itself doesn't provide (i.e. they'd write something comparable to Uniscribe). On Unix/Linux, I'm not sure what is being done about providing the support that OT itself lacks. - AAT/ATSUI (see in http:/www.apple.com). Most of the "intelligence" is in the font itself, which also includes a state machine to operate substitution. The behavior of the smart fonts may be influenced by external user settings. Essentially, all of the intelligence is in the font. (There is an external engine that runs the state tables in the font, but that's a generic engine - all the behaviour is embodied in the state tables in the font). Thus, complex script rendering for polytonic Greek (for example) is available if a system has an AAT font that implements support for that script. In order to take advantage of that capability, however, an application must be written to use the ATSUI text drawing interfaces rather than the older QuickDraw interfaces. Developers have been slow on the uptake, but Apple has been working hard to make it easier for developers to support these interfaces. - Graphite --my favorite, so far-- (see in http:/www.sil.org). Takes a "stupid" TrueType font and merges it with the "intelligence" written in an ad-hoc description language (GDL), to produce an "intelligent" font quite similar to AAT/ATSUI. The accent is on extendability and, specially, in supporting the Private User Area (which is a precious resource for linguistic research and defining new orthographies). The font technology itself is indeed very much like AAT, though there are some differences. The existence of GDL is an important difference, though I wouldn't have called it an "ad-hoc" language. It is a carefully designed high-level language intended to deal specifically with the kinds of issues involved in complex scripts. 
Graphite also relies on a generic run-time engine that interprets the state tables that are added to the font, and also requires applications to be written using special interfaces that call upon that engine. There is not yet support for this outside of SIL that I know of, though many have expressed interest. In particular, there has been a lot of interest in seeing this technology implemented for the Unix/Linux environment.

- Omega (http://omega-system.sourceforge.net). Built on top of the old and glorious TeX typesetting system. It may become (or already is?) the standard for Unicode in Linux.

Whatever Omega does or doesn't do, I wouldn't categorize it as a general script rendering system like AAT, OT/Uniscribe and Graphite. It is an end-user application, not a system extension for complex script support. I suppose you could write an app that only output text by generating TeX source and processing it via Omega, but I wouldn't expect to find much of a market for such an app.

The other platform of potential interest is Java. Sun has been working on providing complex script support in Java 2.
Re: Open-Type Support (was: Greek Prosgegrammeni)
On Wed, 22 Nov 2000, John Hudson wrote:

At 08:05 AM 11/22/2000 -0800, Lukas Pietsch wrote:

By the way, I wouldn't agree with Kenneth that it wasn't a good idea to have the precomposed characters in Unicode in the first place. I'm very glad they are there, since, as we see, the beautiful smart rendering features we are talking about are simply not yet available in mainstream text processing software.

The counter argument could be made: that if Unicode had not accepted so many precomposed diacritic characters, especially in the Latin blocks, smart rendering software would have become mainstream much sooner. It is unfortunately true that, if smart rendering were necessary to process German and French, it would have been a priority many years ago.

I agree with you on this point. I guess this is a kind of 'chicken and egg' issue. Let me draw another example from Korean Hangul. If Unicode/ISO-10646 had just a subset of precomposed syllables (perhaps 2350 of them from KS X 1001) and left out the rest (some 8000 of them for modern Korean) to be composed out of Jamos (alphabets) in the U1100 block, we would be more (though not very much more) likely to have rendering infrastructure on major platforms that can offer 'beautiful rendering features' for Hangul (which is essential for the full support of modern, let alone medieval, Korean). (I'm well aware that the Korean delegation adamantly insisted that all 11,172 of them be included, but in retrospect...) And the same might be true of Greek and other scripts for which both precomposed characters and 'component' characters (decomposed) are available.

Jungshik Shin
Re: Kana and Case (was [totally OT] Unicode terminology)
On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote: If the difference between "A" and "a" is called "case", what is the difference between HIRAGANA LETTER YA and KATAKANA LETTER YA called? (I think either of those letters would do to describe this with the new code pages. The description would be enhanced by liberal application of HIRAGANA-KATAKANA LONG VOWEL MARK.) Maybe you should also be asking what the difference between U+0041 LATIN CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC CAPITAL LETTER A is called. However, although U+3084 HIRAGANA LETTER YA and U+30E4 KATAKANA LETTER YA are both derived from U+4E54 (the former from a cursive form; the latter from a simplification of the print form), it doesn't hold for most other kana, such as U+3042 HIRAGANA LETTER A and U+30A2 KATAKANA LETTER A, which are derived from a cursive form of U+5B89 and a simplification of the print form of U+963F, respectively. I don't get what you mean by "new code pages". Who's creating those anymore? Hiragana, unlike katakana, doesn't use U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK for writing long vowels. (Why does it have this name in Unicode?) What's this "HIRAGANA-KATAKANA LONG VOWEL MARK"?--I see no such thing. I like "Astral Planes" better. Will they include INUKTITUT VIGESIMAL DIGITs? I don't. I write in Cantonese and some of contents of Plane 2 are very much down-to-earth for me. Are you a musician? If so, then Plane 1 would be important to you, too. Throwing around terms like "Astral Planes", whether official or not, will just engender lack of credibility for Unicode, which has already happened to some extent among people who heard about some "Klingon" (in the Private Use Area) in Unicode. Thomas Chan [EMAIL PROTECTED]
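The character names disputed in the message above can be looked up directly. This is a minimal sketch assuming Python's standard `unicodedata` module (whose data reflects whatever Unicode version ships with the interpreter); the helper `name_exists` is hypothetical.

```python
import unicodedata

# Code points mentioned in the message above.
for cp in (0x3084, 0x30E4, 0x3042, 0x30A2, 0x30FC):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")


def name_exists(name: str) -> bool:
    """Return True if `name` is a character name known to unicodedata."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


print(name_exists("KATAKANA-HIRAGANA PROLONGED SOUND MARK"))
print(name_exists("HIRAGANA-KATAKANA LONG VOWEL MARK"))

# The hiragana and katakana letters sit at a fixed distance in the code
# space, which is a layout fact, not a case relationship.
offset = ord("\u30E4") - ord("\u3084")
print(hex(offset))
```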
Re: Kana and Case (was [totally OT] Unicode terminology)
On 11/22/2000 01:39:53 PM Thomas Chan wrote: Maybe you should also be asking what the difference between U+0041 LATIN CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC CAPITAL LETTER A is called. You call it the same thing as the difference between U+10A0 GEORGIAN CAPITAL LETTER AN and U+126D ETHIOPIC SYLLABLE VE: a character difference. I like "Astral Planes" better. I don't. I write in Cantonese and some of contents of Plane 2 are very much down-to-earth for me. Are you a musician? If so, then Plane 1 would be important to you, too. Throwing around terms like "Astral Planes", whether official or not, will just engender lack of credibility for Unicode, which has already happened to some extent among people who heard about some "Klingon" (in the Private Use Area) in Unicode. I agree that, since there are official terms, "supplementary planes", "supplementary characters" etc. that we should encourage their use. It is true that some question credibility of Unicode and that the use of esoteric terms or occasional allusions to literary classics like The Hitchhiker's Guide to the Galaxy probably don't contribute to building credibility. On the other hand, the thing that will most strongly build credibility is seeing Unicode supported in software implementations, and this is happening. I won't be surprised, on the day that Unicode 3.1 is published, if MS makes available from the MS Office web site a font and IME update for Office 10 that provides support for all of those new Han ideographs you've all been waiting for. (I believe Office 10 will ship with everything else that would be needed to support these characters.) Things like that give a lot of credibility to Unicode. So, if in our discussions on this list John Cowan refers to an astral character or two, or if I invite someone to the restaurant at the end of the universe, I don't think that will hurt much. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. 
Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
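For readers unsure what "Plane 1" and "Plane 2" refer to in the exchange above, here is a minimal sketch in Python. The plane number is the code point divided by 0x10000; the helper names and the sample characters are illustrative choices, not taken from the thread.

```python
def plane(ch: str) -> int:
    """Return the Unicode plane (0 = BMP, 1 and up = supplementary) of one character."""
    return ord(ch) >> 16


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed for one character in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "a musical symbol (Plane 1)": "\U0001D11E",
    "a Han ideograph (Plane 2)": "\U00020000",
}

for label, ch in samples.items():
    print(f"U+{ord(ch):04X} plane={plane(ch)} utf16_units={utf16_units(ch)} ({label})")
```

Characters outside Plane 0 need a surrogate pair in UTF-16, which is one practical reason software support for the supplementary planes arrives as a separate step.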
Re: Kana and Case (was [totally OT] Unicode terminology)
I don't get what you mean by "new code pages". Who's creating those anymore? Actually, lots of people, unfortunately. From WG3 and the endless parades of 8859 codepages, to WG01 of INFITT, to the now [in]famous GB-18030, there are lots of code pages being researched, created, modified, and otherwise used. michka a new book on internationalization in VB at http://www.i18nWithVB.com/
Re: Kana and Case (was [totally OT] Unicode terminology)
Okay. Get out your copy of the lyrics to the Ranma 1/2 Complete Vocal Collection Vol. 1. Now look at the lyrics to Ranbada Ranma (that's Track 12) and tell me that the long vowel mark is not used with hiragana. | ||\ __/__ | | _/_ | || / | _|_ ,--, / \ /_| -+- / --- | / |V T_)| | |\ | ||/ _ \_/ T / \ / __/ | /--- \_/ L/ \ Thomas Chan [EMAIL PROTECTED] wrote: On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote: If the difference between "A" and "a" is called "case", what is the difference between HIRAGANA LETTER YA and KATAKANA LETTER YA called? (I think either of those letters would do to describe this with the new code pages. The description would be enhanced by liberal application of HIRAGANA-KATAKANA LONG VOWEL MARK.) Maybe you should also be asking what the difference between U+0041 LATIN CAPITAL LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, and U+0410 CYRILLIC CAPITAL LETTER A is called. However, although U+3084 HIRAGANA LETTER YA and U+30E4 KATAKANA LETTER YA are both derived from U+4E54 (the former from a cursive form; the latter from a simplification of the print form), it doesn't hold for most other kana, such as U+3042 HIRAGANA LETTER A and U+30A2 KATAKANA LETTER A, which are derived from a cursive form of U+5B89 and a simplification of the print form of U+963F, respectively. I don't get what you mean by "new code pages". Who's creating those anymore? Hiragana, unlike katakana, doesn't use U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK for writing long vowels. (Why does it have this name in Unicode?) What's this "HIRAGANA-KATAKANA LONG VOWEL MARK"?--I see no such thing. I like "Astral Planes" better. Will they include INUKTITUT VIGESIMAL DIGITs? I don't. I write in Cantonese and some of contents of Plane 2 are very much down-to-earth for me. Are you a musician? If so, then Plane 1 would be important to you, too. 
Throwing around terms like "Astral Planes", whether official or not, will just engender lack of credibility for Unicode, which has already happened to some extent among people who heard about some "Klingon" (in the Private Use Area) in Unicode. Thomas Chan [EMAIL PROTECTED]
Re: Kana and Case (was [totally OT] Unicode terminology)
On Wed, Nov 22, 2000 at 11:39:53AM -0800, Thomas Chan wrote:

On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:

I like "Astral Planes" better. Will they include INUKTITUT VIGESIMAL DIGITs?

I don't. I write in Cantonese and some of contents of Plane 2 are very much down-to-earth for me. Are you a musician? If so, then Plane 1 would be important to you, too.

What does importance have to do with it? A lot of societies would regard things astral as much more important than things earthly. I personally read supplementary as a 'supplement', i.e. an add-on, not essential. But that's just me. I think you're reading too much into it. Personally, I don't dislike supplementary because of any connotations it may or may not have, but instead because it's one of the clumsy words this field is littered with: internationalization, localization, supplementary.

Throwing around terms like "Astral Planes", whether official or not, will just engender lack of credibility for Unicode, which has already happened to some extent among people who heard about some "Klingon" (in the Private Use Area) in Unicode.

Yes, I can see how a bunch of characters created by people to name their horses getting added to Unicode could cause a loss of credibility. Or am I getting something confused here? How about this - Unicode judges characters by their usefulness and the principles set forth in Chapter 1 of the Unicode standard, instead of looking down on some languages and users and considering them inherently less worthy?

--
David Starner - [EMAIL PROTECTED] http://dvdeug.dhis.org
Looking for a Debian developer in the Stillwater, Oklahoma area to sign my GPG key
RE: Kana and Case (was [totally OT] Unicode terminology)
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Okay. Get out your copy of the lyrics to the Ranma 1/2 Complete Vocal Collection Vol. 1. Now look at the lyrics to Ranbada Ranma (that's Track 12) and tell me that the long vowel mark is not used with hiragana. The long vowel mark is not used with hiragana. Either there is a misuse or (most likely), you're interpreting a hyphen as a long vowel mark. | ||\ __/__ | | _/_ | || / | _|_ ,--, / \ /_| -+- / --- | / |V T_)| | |\ | ||/ _ \_/ T / \ / __/ | /--- \_/ L/ \ Whatever you were trying to do here, it didn't work very well. /|/|ike
Fwd: Kana and Case (was [totally OT] Unicode terminology)
For what it's worth, in this oh-so-important discussion... I have seen this length mark used with both Katakana and Hiragana (I suppose that puts me in the good company of 'Leven Digit Boy, only he can prove it and I can't). Call the usage nonce or whatever... So what? It would be fair to say this length mark is not NORMALLY used with Hiragana, which NORMALLY uses the vowel "u" to indicate lengthening. Katakana likewise NORMALLY uses the length mark, but is not prevented from using the "u" vowel, and in some contexts does so. For what it's worth trivia-wise, Katakana-as-okurigana is a style not normally used in the ordinary writing of Japanese sentences, but they can be, and on occasion are (especially in old orthography)...so don't be surprised when you see them... the natives are not going nuts, they're merely surprising the Conservative Foreign Formalists. I suppose the bicameral name of this thing, U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, is one of those Great Mysteries Buried in Time, the answer to which only Dr. Whistler knows. (I would lay a handful of soft currency on the truth of the proposition that there exists an ancient meeting document on yellow lined paper of the pre-Consortium Unicode Working Group which could shed light on the question of this name, but I digress.) At least the name indicates that one is not nominally prevented from using it for Katakana, thus pre-empting perennial requests from the Completist Fringe for the addition of a second length mark for use with Hiragana. Rick Begin forwarded message: From: "Ayers, Mike" [EMAIL PROTECTED] Date: Wed Nov 22, 2000 01:32:58 PM US/Pacific To: Unicode List [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: RE: Kana and Case (was [totally OT] Unicode terminology) From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Okay. Get out your copy of the lyrics to the Ranma 1/2 Complete Vocal Collection Vol. 1. 
Now look at the lyrics to Ranbada Ranma (that's Track 12) and tell me that the long vowel mark is not used with hiragana. The long vowel mark is not used with hiragana. Either there is a misuse or (most likely), you're interpreting a hyphen as a long vowel mark.
Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)
On 11/22/2000 04:06:59 PM Rick McGowan wrote: I suppose the bicameral name of this thing, U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, is one of those Great Mysteries Buried in Time, the answer to which only Dr. Whistler knows. (I would lay a handful of soft currency on the truth of the proposition that there exists an ancient meeting document on yellow lined paper of the pre-Consortium Unicode Working Group which could shed light on the question of this name, And I, on the truth of the proposition that the aforementioned Dr. Whistler could provide at least a summary of the contents of The Yellow Lined Paper Manuscript and of the interpretations and reactions of said manuscript by various parties, if not a facsimile or the original itself. Peter
Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)
[EMAIL PROTECTED] writes:

And I, on the truth of the proposition that the aforementioned Dr. Whistler could provide at least a summary of the contents of The Yellow Lined Paper Manuscript and of the interpretations and reactions of said manuscript by various parties, if not a facsimile or the original itself.

Yes, probably the same Yellow Lined Paper containing the rationale for the misnamed "Hangzhou" numerals... http://cymru.basistech.com/papers/Hangzhou.pdf

-tree

--
Tom Emerson, Basis Technology Corp.
Zenkaku Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"
Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)
The Venerable Dr Whistler wrote: I'm sure there is, but I can't lay hands on it right at the moment. It's sitting in a box in the basement somewhere. Uh... He probably meant to write: "Yes, it's right here ahem as you can see from Diagram 7, it's part of the thin banded layer right above the level of the Late Xerox midden but beneath the First Dynasty Unicodic layers." In any case, I breathe a sigh of relief that my handful of soft currency is safe once again. ;-) Rick
Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)
Kenneth Whistler wrote:

...The place you'll see this usage of the prolonged sound mark fairly frequently is in Japanese comics, which are rather loose and inventive in their use of spellings and "paraspellings" to convey tone of voice and other prosodic information.

Which brings up the question, when do we encode the comic book (non-spacing) zig-zaggy-balloon-thingie that goes around the text for pow!, biff#@!, bam%$#!, and shazam! ? ;-)

Tex

--
Tex Texin, Director, International Business
mailto:[EMAIL PROTECTED]
+1-781-280-4271 Fax: +1-781-280-4655
Progress Software Corp., 14 Oak Park, Bedford, MA 01730
http://www.Progress.com  #1 Embedded Database
http://www.SonicMQ.com  #1 Performing JMS Messaging
http://www.ASPconnections.com  #1 provider in the ASP marketplace
http://www.NuSphere.com  Open Source software and services for MySQL
Globalization Program: http://www.Progress.com/partners/globalization.htm
Lakota reprise: (Re)birth of a character
On a couple of occasions the issue of Unicode coverage of the Lakota orthography has come up on this list. I finally tracked down enough source material to identify the problem.

The issue for Lakota in Unicode is the representation of the Lakota nasal vowels in the 1982 Lakota orthography. That orthography was developed by Lakota educators, was adopted by the South Dakota Association of Bilingual and Bicultural Education, and is being used to print books, dictionaries, and teaching materials for Lakota.

There are a number of encoding issues for the 1982 Lakota orthography in Unicode, because of the nature of the diacritic usage that was chosen. That diacritic usage departs from Americanist conventions to meet a number of criteria, including familiarity from older usage, aesthetics, and some other intangible factors. In particular, to represent the 1982 Lakota orthography in Unicode, you must make use of Latin letters plus the following characters as diacritics:

   U+0307 COMBINING DOT ABOVE indicates aspiration on surds (p, t, c, k); modified point of articulation on fricatives (s, h); modified manner of articulation on g [g-dot-above = voiced velar fricative].

   U+0304 COMBINING MACRON indicates voicelessness on surds (p, t, c, k).

   U+02B9 MODIFIER LETTER PRIME indicates ejective release on surds (p, t, c, k); post-glottalic release on fricatives.

The latter usage derives from the use in the Buechel 1939 grammar of the (typewriter) apostrophe (i.e. U+0027) for the same function. And that, in turn, is related to the Americanist usage of U+02BC MODIFIER LETTER APOSTROPHE to indicate ejective or glottal release. This means there is probably going to be some ambiguity in the representation of Lakota, since people are going to be uncertain as to whether U+02B9, U+02BC, or U+0027 should be used. The fonts used with the current printed material clearly show a prime mark, rather than a raised comma or a directionally neutral apostrophe, but Lakota linguists and educators will presumably need to decide this one.

The real issue is the mark used to indicate nasalization of vowels. Lakota has three nasal vowels: nasalized forms of /i/, /a/, and /u/. The 1982 orthography indicates these with digraphs, where the second element is basically an n with a long right leg. Earlier discussion of this had pointed to Unicode U+019E LATIN SMALL LETTER N WITH LONG RIGHT LEG as this character. But that character has no associated uppercase character, which is needed for the Lakota orthography.

The issue is complex, however. It is clear that this Lakota letter is a new creation. If you go back to the source of this element of the orthography, you can find it in Buechel, 1939, A Grammar of Lakota, which represents the vowels this way, but using what is clearly a lowercase Greek letter eta (i.e. U+03B7). This, in turn, derived from a 19th century Dakota alphabet created by Episcopal missionaries and associated particularly with the name of Stephen R. Riggs. The Greek letter eta was often a printing substitution for eng (i.e. U+014B), to indicate nasalization.

So we have a complicated confusion here of three letterforms. U+019E was proposed in the IPA Principles (1949) for use in digraphic spellings of nasal vowels -- presumably as a way of regularizing the eta/eng confusion. But the letter was withdrawn from the IPA in 1976. However, presumably because of the enormous impact of the missionary orthography on the history of the written Lakota language, the digraphic spelling of nasal vowels was preferred by the Lakota educators when deciding on the 1982 orthography, over the general Siouan linguistic tradition of writing nasal vowels with ogoneks. Effectively, this meant a resurrection of the n-with-long-right-leg, since the orthography was intended to be Latin, not Latin with one Greek letter eta.

The practical orthographies used in the missionary dictionaries and grammars, and the technical linguistic orthography of Boas and Deloria, never had to decide on the problem of how to uppercase the nasal vowel, since as a digraphic representation, the nasal indicator never occurs initially, and those sources don't use all-cap text anywhere. But the 1982 orthography is intended for general use -- and that means that the Lakota text can also occur in all-cap environments such as chapter headers, and so on.

So as in the case of African languages that adopted an IPA-based orthography, and then created uppercase versions of letters that had no uppercase in IPA (cf. U+0186, U+018F, U+01A9, for example), we have another instance here of orthographic usage driving the need for a new uppercase character: LATIN CAPITAL LETTER N WITH LONG RIGHT LEG.

--Ken
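For anyone who wants to poke at the characters named above, here is a minimal sketch using Python's standard unicodedata module. The helper names (describe, spell) and the dictionary of glosses are my own for illustration and come from no Lakota tool or standard. It assumes the dot and macron are typed as combining marks after the base letter, and that the prime (a spacing modifier letter, not a combining mark) simply follows the letter.

```python
import unicodedata

# Marks named in the message, with a short gloss of their Lakota function.
# The glosses paraphrase the message; the dict itself is illustrative only.
LAKOTA_MARKS = {
    0x0307: "aspiration on p, t, c, k; modified s, h, g",
    0x0304: "voicelessness on p, t, c, k",
    0x02B9: "ejective / post-glottalic release",
}

def describe(cp):
    """Return 'U+XXXX NAME' for a code point, using the Unicode database."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

def spell(base, mark_cp):
    """Base letter followed by the mark, normalized to NFC.

    NFC picks a precomposed letter where Unicode has one (g + dot above
    becomes U+0121); otherwise the two-character sequence is kept.
    """
    return unicodedata.normalize("NFC", base + chr(mark_cp))

for cp, gloss in LAKOTA_MARKS.items():
    print(describe(cp), "-", gloss)

# The three candidates for the glottal-release mark differ in properties,
# which is one reason the choice matters for sorting and word-breaking.
for cp in (0x02B9, 0x02BC, 0x0027):
    print(describe(cp), "category", unicodedata.category(chr(cp)))
```

The second loop shows why the U+02B9 / U+02BC / U+0027 ambiguity is more than cosmetic: the two modifier letters are category Lm (they behave as letters), while the typewriter apostrophe is Po (punctuation).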
Re: Kana and Case (was [totally OT] Unicode terminology)
On Wed, 22 Nov 2000, David Starner wrote:

On Wed, Nov 22, 2000 at 11:39:53AM -0800, Thomas Chan wrote:

On Wed, 22 Nov 2000 [EMAIL PROTECTED] wrote:

I like "Astral Planes" better. Will they include INUKTITUT VIGESIMAL DIGITs?

I don't. I write in Cantonese and some of the contents of Plane 2 are very much down-to-earth for me. Are you a musician? If so, then Plane 1 would be important to you, too.

What does importance have to do with it? A lot of societies would regard things astral as much more important than things earthly. I personally read supplementary as a 'supplement' i.e. an add-on, not essential. But that's just me.

I think you're reading too much into it. "Astral" might be okay, but for many people, "astral plane" conjures up images of the metaphysical or of science fiction, and suggests they are to be taken less seriously.

Personally, I don't dislike supplementary because of any connotations it may or may not have, but instead because it's one of the clumsy words this field is littered with: internationalization, localization, supplementary.

Whether we like it or not, "supplementary" is the official term now, just like the use of the term "ideograph" or "letter".

Throwing around terms like "Astral Planes", whether official or not, will just engender lack of credibility for Unicode, which has already happened to some extent among people who heard about some "Klingon" (in the Private Use Area) in Unicode.

Yes, I can see how a bunch of characters created by people to name their horses getting added to Unicode could cause a loss of credibility. Or am I getting something confused here?

I think a bit, yes. Those characters for names of horses (or individuals) aren't fictional like the Klingon alphabet. There already are some in the BMP for names of horses, such as U+9A04, U+9A4A, U+9A2E; or individuals such as U+66CC, btw--but probably included on the basis of being in legacy character sets as of the early 90's. In time, some of them, such as U+66CC, have become used by more people than the original bearer. I personally don't agree with frivolous racehorse names, but the bulk of the CJK Extension B in Plane 2 isn't stuff like that, but characters that have withstood at least the test of being included in large dictionaries and encyclopedias of the last few centuries. (I'm curious to know the codepoints of those racehorse names, and if any actually made it into Plane 2.)

How about this - Unicode judges characters by their usefulness and the principles set forth in Chapter 1 of the Unicode standard, instead of looking down on some languages and users and considering them inherently less worthy?

I don't disagree with those principles, but it is clear that what is in the BMP occupies a first-class position until and if support for non-BMP areas comes--e.g., a few of the things mentioned on this list include support in Java, capacity of TrueType fonts, UTF-16 encoding, etc. If Planes 1 and 2 are not implemented because people think there are only nonsense personal ideographs there, or stuff only of interest to "unprofitable" academics, then that in turn harms the users of living written cultures, such as Cantonese, who do make use of them. (If they were in the BMP, then I could be using them today, with even software written years ago, but alas that is not the case. I know I can use the PUA now, but even that is second-class because of lack of standardization by definition, exclusion in sorting and character properties, etc.)

Thomas Chan [EMAIL PROTECTED]
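To make the "first-class BMP" point concrete, here is a rough sketch of what changes for software once a character lives outside Plane 0. The function names are mine. U+20000 is taken as a sample Plane 2 code point and U+1D11E (the G clef in the musical symbols block) as a sample Plane 1 code point; neither is cited in the thread. The surrogate arithmetic is the standard UTF-16 rule.

```python
def plane(cp):
    """Plane number of a code point: 0 is the BMP, 1 and 2 are supplementary."""
    return cp >> 16

def utf16_units(cp):
    """UTF-16 code units for one code point.

    BMP characters take a single 16-bit unit; everything above U+FFFF is
    split into a high/low surrogate pair, which older 16-bit-only code
    may not expect.
    """
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]

for cp in (0x9A04, 0x66CC, 0x1D11E, 0x20000):
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X}  plane {plane(cp)}  UTF-16: {units}")
```

The BMP ideographs mentioned above come out as one unit each, while the Plane 1 and Plane 2 samples need two. Software written years ago on the assumption of one unit per character handles the first group and not the second.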
Re: Fwd: Kana and Case (was [totally OT] Unicode terminology)
As other people commented, there is nothing in principle that prevents Japanese from writing Hiragana with the elongation mark U+30FC. The Japanese Language Council can recommend all they want but the "spirit of language" has its own will as it has always been in any language. In fact a couple of Japanese top 10 "Popular Words of the Year" in recent years use U+30FC. See for example an entry for 1987, "da-ijo-buda-" (No problem.) on this page: http://www.fujifilm.co.jp/salon/utsurun/y87/ry.html and one of the 2 most widely popular words of 1998, "dattyu-no" (requires a context and a physical gag to explain this and so I won't.), on this page: http://www.jiyu.co.jp/gendai/shingo/shingo.html (Click on the 1998 link on the left.) The other elongation character, U+FF5E, is also used very widely in Hiragana writing in informal/comic book/personal mail writing. It could even be that U+FF5E represents a kind of contour tone associated with this jocular use, as opposed to, say, a more flat tone, with U+30FC. The use of these elongation symbols for Hiragana is so established in popular writing that Japanese search engines must ignore the differences between these elongation symbols in addition to ignoring Hiragana and Katakana differences. - Kat
-- Katsuhiko Momoi Netscape International Client Products Group [EMAIL PROTECTED] What is expressed here is my personal opinion and does not reflect official Netscape views.
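The search-engine behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not how any particular Japanese search engine does it; fold_kana is a made-up name. It relies on the fact that the Katakana letters U+30A1..U+30F6 sit exactly 0x60 above their Hiragana counterparts, and it treats U+FF5E as equivalent to U+30FC for matching purposes.

```python
PROLONGED_SOUND_MARK = "\u30fc"   # KATAKANA-HIRAGANA PROLONGED SOUND MARK
FULLWIDTH_TILDE = "\uff5e"        # the "other elongation character"

def fold_kana(text):
    """Fold text to a search key: Katakana to Hiragana, U+FF5E to U+30FC."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x30A1 <= cp <= 0x30F6:
            # Katakana letter: shift down to the matching Hiragana letter.
            ch = chr(cp - 0x60)
        elif ch == FULLWIDTH_TILDE:
            # Treat both elongation symbols as the same key character.
            ch = PROLONGED_SOUND_MARK
        out.append(ch)
    return "".join(out)

# Three spellings that should match one another in a search.
spellings = ["ダイジョーブ", "だいじょーぶ", "だいじょ\uff5eぶ"]
print({fold_kana(s) for s in spellings})
```

All three spellings fold to the same key. Note that U+30FC itself is left alone in either script, which fits its bicameral name.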
Re: Lakota reprise: (Re)birth of a character
'leven Digit Boy expostulated: Just put in that letter. |\| | \ | | \ | | \ | |\| \ \ THAT is the letter you mean, right? And it's NOT IN UNICODE?! Well, no, not exactly. To borrow the ASCII art technique, it is: |/\ | | | | | | | | | | That is, a capital form based on the shape of U+019E. See Albert White Hat, Sr., Reading and Writing the Lakota Language (Lakota Iyapi un Wowapi nahan Yawapi), 1999, p. 12. The chapter heading, WOUNSPE TOKAHE (The First Teaching) is in all-caps, and what I have roughly indicated here in ASCII with an "N" is actually the letter shown above. --Ken
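A note for later readers: Unicode versions published after this thread (3.2 onward) do encode a capital partner for U+019E, at U+0220. A small check with a current Python shows the case pair. The helper name case_pair is mine, and the result depends on the Unicode data bundled with the interpreter, so treat it as a sketch rather than a statement about any system of the time.

```python
import unicodedata

def case_pair(ch):
    """Return (uppercase, lowercase) forms of a character per Python's Unicode data."""
    return ch.upper(), ch.lower()

small_n_long_leg = "\u019e"
upper, lower = case_pair(small_n_long_leg)

for c in (lower, upper):
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

With the capital available, ordinary uppercasing of a Lakota heading carries the nasal letter along, with no need to substitute a plain "N".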