RE: Indic scripts, visual-order vs phonetic-order
When we looked into this at the Cambodian Ministry of Education, Youth and Sport, it was decided that Khmer handwriting order should {largely} follow phonetic order. Typewriters, of course, had to follow visual order. Most earlier computer implementations could not handle phonetic order, so they also used visual order.

The 'largely' above reflects a subsequent discovery: ROBAT (an analog of the Indic REPHA) is written in visual order in handwriting (it is a superscript which phonetically is an initial RO). Possibly this relatively recent habit of using visual order has begun to affect handwriting order, so that many Khmers now write in visual order as well.

In Khmer, one of the problems visual order brings up for computer implementations is the large variety of character orders it could involve. There are two-glyph vowels with pre- and post-consonant placement, one-glyph vowels which precede, and one-glyph vowels which follow (super, sub, or post). Failure to lock those into a standard order would result in quite a bit of preprocessing for sorting, not to mention the problems of searching and spell checking.

Maurice

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Samphan Raruenrom
Sent: 05 June 2002 15:16
To: Unicode Public List
Subject: Indic scripts, visual-order vs phonetic-order

Hello,

I'm wondering about the practice of using visual-order vs phonetic-order in Indic writing on typewriter vs computer vs handwritten. Are they all the same?

I also heard that there are two input-method styles for Indic, visual-order and phonetic-order. Is it true? And what is more popular?

--
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html
RE: Indic scripts, visual-order vs phonetic-order
On 06/06/2002 12:45:15 AM Maurice Bauhahn wrote:

> In Khmer one of the problems visual order brings up for computer implementations is the large variety of character orders this could involve. There are two-glyph vowels with pre and post consonant placement, one-glyph vowels which precede, and one-glyph vowels which follow (super or sub or post). Failure to lock those into a standard order would result in quite a bit of preprocessing for sorting, not to mention the problems of searching/spell checking.

It seems to me that this is a non-issue in relation to searching and spell checking, since both of those processes are sensitive only to sequences of encoded characters and do not need to know what any given character is used to represent (unless you're doing something akin to sound-based searching).

As for sorting, the preprocessing is not necessarily a big deal -- at least, Thai and Lao have visually-ordered encoding that requires a bit of reordering before creating sort keys (or as part of the process of creating sort keys), but the preprocessing is pretty trivial: Vp C -> C Vp (a preposed vowel followed by a consonant is swapped so that the consonant comes first).

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
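The reordering Peter describes for Thai and Lao can be sketched roughly as follows. This is only an illustration, not any particular library's implementation: the function names are hypothetical, the consonant test is a coarse code point range check, and the result is meant to feed a real collation step rather than replace one.

```python
# Sketch: swap each Thai/Lao preposed vowel (Vp) with the consonant (C)
# that follows it, i.e. "Vp C -> C Vp", before building sort keys.

# Preposed vowels: Thai U+0E40..U+0E44 and Lao U+0EC0..U+0EC4.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)} | {
    chr(cp) for cp in range(0x0EC0, 0x0EC5)
}


def is_consonant(ch: str) -> bool:
    """Coarse consonant test by code point range (an assumption of this sketch)."""
    cp = ord(ch)
    return 0x0E01 <= cp <= 0x0E2E or 0x0E81 <= cp <= 0x0EAE


def to_phonetic_order(text: str) -> str:
    """Return text with every 'Vp C' pair rewritten as 'C Vp'."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS and is_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)
```

One way to use it is as a key function, for example `sorted(words, key=to_phonetic_order)`. That compares raw code points after reordering, so it only approximates dictionary order.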
Indic scripts, visual-order vs phonetic-order
Hello, I'm wondering about the practice of using visual-order vs phonetic-order in Indic writing on typewriter vs computer vs handwritten. Are they all the same? I also heard that there are two input-method styles for Indic, visual-order and phonetic-order. Is it true? And what is more popular? -- Samphan Raruenrom Information Research and Development Division, National Electronics and Computer Technology Center, Thailand. http://www.nectec.or.th/home/index.html
Phonetic grouping in UniHan
In the on-line UniHan database (http://www.unicode.org/charts/unihan.html) I see a field that I have never seen before:

- Other useful dictionary-like data
  - [...]
  - A phonetic grouping for the character

The phonetic grouping seems to be an integer number, and I wonder:

- What does this information mean?
- Why do some characters not have it? Is it just missing, or does it not apply to them?
- Where does it come from? I have not seen a corresponding field in the plain-text file UniHan.txt.

Thanks in advance.

_ Marco

P.S.: I take the occasion to congratulate the author(s) of the on-line UniHan for all the recent improvements, especially the addition of the Chinese and Japanese compound words.

I also take the occasion to suggest a new field that could be very useful: the frequency of usage of each character. This information may be derived from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html). (I don't know the licensing terms for using these data.)

_ M.
Re: Phonetic grouping in UniHan
On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:

> In the on-line UniHan database (http://www.unicode.org/charts/unihan.html) I see a field that I have never seen before:
>
> - Other useful dictionary-like data
>   - [...]
>   - A phonetic grouping for the character
>
> The phonetic grouping seems to be an integer number, and I wonder:
>
> - What does this information mean?
> - Why some characters don't have it? Is it just missing or it does not apply to them?
> - Where does it come from? I have not seen a corresponding field in the plain-text file UniHan.txt.

You need the latest Unihan.txt. In there you have:

# kPhonetic*
# The phonetic index for the character from _Ten Thousand Characters: An
# Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
# 1980.

The asterisk indicates that it's a field we're still populating.

> I also take the occasion to suggest a new field that could be very useful: the frequency of usage of each character. This information may be derived from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html). (I don't know the licensing terms for using these data.)

We also have a newish kFrequency field.

# kFrequency
# A rough frequency measurement for the character based on analysis of Chinese
# USENET postings

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/
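For readers who want to check which characters carry kPhonetic or kFrequency, here is a minimal parsing sketch. It assumes Unihan.txt data lines have the tab-separated form `U+XXXX<TAB>kField<TAB>value` and that comment lines start with `#`. The function name `parse_unihan` is hypothetical, and the sample values are invented for illustration, not real Unihan data.

```python
from collections import defaultdict

# Hypothetical sample in the assumed tab-separated layout; the values are
# made up for illustration and do not come from the real Unihan.txt.
SAMPLE = """\
# comment lines start with '#'
U+4E00\tkPhonetic\t1
U+4E00\tkFrequency\t1
U+4E01\tkFrequency\t3
"""


def parse_unihan(lines):
    """Map field name -> {character: value} for every data line."""
    fields = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> the character itself
        fields[field][char] = value
    return fields


fields = parse_unihan(SAMPLE.splitlines())
# Characters that have kFrequency but no kPhonetic entry in the sample.
missing = sorted(set(fields["kFrequency"]) - set(fields["kPhonetic"]))
print([f"U+{ord(c):04X}" for c in missing])
```

With the real file, the same function could be called on an open file handle. A character absent from `fields["kPhonetic"]` would then simply have no entry yet, which fits the remark that the field is still being populated.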
Re: Phonetic grouping in UniHan
On Mon, 4 Feb 2002, Marco Cimarosti wrote:

> I also take the occasion to suggest a new field that could be very useful: the frequency of usage of each character. This information may be derived from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html). (I don't know the licensing terms for using these data.)

I think that whatever frequency data is included, the particulars of how it was arrived at (or where to find such information) should be included as well; e.g., Tsai's findings were based on 1993-1994 Big5 Usenet postings.

There's also frequency data buried under the kFenn field (as yet unpopulated), where A, B, C, D, E, F, G, H, I, K (J is omitted) indicate whether the character falls in the first, second, third, etc. group of five hundred characters, based on earliness of occurrence in the textbooks of 1926. (The P code is also used for something that is not quite clear to me from the explanation in the dictionary alone -- I presume it might refer to characters in the dictionary that were not in the 1926 study.)

P.S. Recently you asked about estimates of usage of Plane 2 characters. Since a large percentage are CNS 11643-1992 characters (and perhaps the oldest IT source), that may provide a clue. In the Concluding Remarks section of Taming the Masses [1], Christian Wittern notes that the higher CNS planes (ignore 1 and 2, which are in the BMP, and perhaps some parts of 3) are rarely used in historic texts, and he expects even lower usage in modern texts.

[1] http://www.gwdg.de/~cwitter/cw/taming.html

Thomas Chan
[EMAIL PROTECTED]
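The letter grouping described above for kFenn can be worked through as a small sketch. It assumes only what the message states: the letters A through K (skipping J) name successive groups of five hundred characters. The function name is hypothetical, and the P code is left unmapped because its meaning is not settled here.

```python
# Frequency-group letters as described in the message: J is omitted.
FENN_LETTERS = "ABCDEFGHIK"
GROUP_SIZE = 500


def fenn_rank_range(letter):
    """Return (first_rank, last_rank) for a group letter, or None if unmapped."""
    if len(letter) != 1:
        return None
    index = FENN_LETTERS.find(letter.upper())
    if index < 0:
        return None  # covers J (omitted) and P (meaning unclear)
    first = index * GROUP_SIZE + 1
    return (first, first + GROUP_SIZE - 1)


for letter in "ABKJP":
    print(letter, fenn_rank_range(letter))
```

Under this reading the ten letters cover ranks 1 through 5000, with K standing for the tenth group because J is skipped.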
Re: Information about curly-tailed phonetic letters
Ar 23:05 -0800 2000-12-17, scríobh Richard Cook:

> And as for the consonant symbols, why stop with t, d, n, l, c, z? Why not include the rest of the curly-tail and other symbols in the following chart:
>
> http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf
>
> there are a few other bits of data you might glean also, including usage of the apical vowel symbols.

I think we need to consult, offline, with the IPA about this matter.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Information about curly-tailed phonetic letters
The curly-tail consonants t, d, n, l, c, z are also included in the TeX IPA (tipa fonts). The documentation of those fonts is available on ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz --J"org Knappen
Re: Information about curly-tailed phonetic letters
"J%ORG KNAPPEN" wrote: The curly-tail consonants t, d, n, l, c, z are also included in the TeX IPA (tipa fonts). The documentation of those fonts is available on ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz --J"org Knappen Hi J"org, It looks as if you sent the wrong url. The right path is, I believe: ftp://ftp.dante.de/tex-archive/fonts/tipa/ And as for the consonant symbols, why stop with t, d, n, l, c, z? Why not include the rest of the curly-tail and other symbols in the following chart: http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf there are a few other bits of data you might glean also, including usage of the apical vowel symbols. -Richard
Re: curly-tailed phonetic letters
This table has undergone some further revision:

http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf

Please note in the center of the table:

U+0291/U+0293 and U+0255/U+0286

These 4 may in fact be 2 pairs of functional equivalents (synographs), each pair pointing to the same place of articulation. According to Pullum & Ladusaw (1996), IPA approval of U+0286 and U+0293 was withdrawn in 1989.

Please note that the above table also contains symbols for the 2 pairs of so-called "apical" vowels. These include U+0285 and U+027F (the unrounded apicals, relatively front and back, respectively), as well as their rounded counterparts. None of these 4 symbols is IPA-sanctioned.

Richard S. COOK, Jr.
STEDT Project, Linguistics Department
University of California, Berkeley
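To see which characters the code points in this message refer to, a short lookup with Python's standard `unicodedata` module can help. This is just a convenience sketch. The grouping into `PAIRS` and `APICALS` mirrors the message, and the names printed depend on the Unicode data bundled with the Python build in use.

```python
import unicodedata

# The two candidate "synograph" pairs and the two unrounded apical vowel
# symbols named in the message above.
PAIRS = [(0x0291, 0x0293), (0x0255, 0x0286)]
APICALS = [0x0285, 0x027F]


def describe(cp):
    """Format a code point as 'U+XXXX NAME', tolerating unnamed code points."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"


for a, b in PAIRS:
    print(describe(a), "/", describe(b))
for cp in APICALS:
    print(describe(cp))
```

The rounded apical counterparts are not included because the message gives no code points for them.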
Re: Information about curly-tailed phonetic letters
From: JÖRG KNAPPEN [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: Information about curly-tailed phonetic letters
Date: Fri, 24 Nov 2000 01:33:05 -0800 (GMT-0800)

> The curly-tail consonants t, d, n, l, c, z are also included in the TeX IPA (tipa fonts). The documentation of those fonts is available on ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz
>
> --J"org Knappen

Hello!

Most IPA fonts include these lowercase right-tailed retroflex letters: t, d, z, c, j, l, n, r; however, SIL's *Encore* Series Fonts (currently in version 3.0) also have the uppercase versions of those 8, plus curly-tailed s, esh, and ezh in both upper- and lowercase.

I'd use a curly-tailed s to pair up with curly-tailed z for the retroflex sibilants; that'll save the curly-tailed c to pair with curly-tailed j for your retroflex laminal affricates, but only if you don't want to use a diacritic accent (like an underring) to represent retroflexion.

Thank You!

Robert Lloyd Wheelock
Re: Information about curly-tailed phonetic letters
Ar 13:10 -0800 2000-11-23, scríobh Richard Cook:

> Hi everyone,
>
> This paper, brought to your attention last June
>
> http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
> http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg
>
> has been updated recently. Still working on getting the formal proposal together, and still welcoming comments and/or suggestions.

Ah. I forgot. Richard, I'd come across these characters independently some time ago, when at the Beijing meeting of WG2 I'd collected a number of books on Yi, in which these characters occur. I think your arguments about the productivity of the curl in the IPA are spot on.

In short, I think these characters should be added and that there should be no impediment to doing so. In fact, in September I was updating one of the fonts Asmus and I use to prepare tables and I added these characters for future use.

Michael Everson ** Everson Gunn Teoranta ** http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
Re: Information about curly-tailed phonetic letters
Michael Everson wrote:

> Ar 13:10 -0800 2000-11-23, scríobh Richard Cook:
>
> > Hi everyone,
> >
> > This paper, brought to your attention last June
> >
> > http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
> > http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg
> >
> > has been updated recently. Still working on getting the formal proposal together, and still welcoming comments and/or suggestions.
>
> Ah. I forgot. Richard, I'd come across these characters independently some time ago, when at the Beijing meeting of WG2 I'd collected a number of books on Yi, in which these characters occur. I think your arguments about the productivity of the curl in the IPA are spot on.

Michael,

Yes, transcription of Yi (Lolo) and other Lolo-ish and Lolo-Burmese languages is one of the things I'm talking about in the above paper. And phonetic transcriptions of Tibetan etc. ...

> In short, I think these characters should be added and that there should be no impediment to doing so. In fact, in September I was updating one of the fonts Asmus and I use to prepare tables and I added these characters for future use.

Did you add curly-tail-l and curly-tail-r too? As I mention in the paper, the productivity of symbols for this place of articulation admits the possibility of curly-tail-r as well ... though I've never seen it except in my transcription font. I added it to my font just for the production of that paper ... but haven't added the symbol to the paper yet. Wondering if I should also add it to the paper title ...

But I think that some phonologists or phoneticians may in fact one day take it into their heads to use curly-tail-l and curly-tail-r more widely ... so the chars for this place series ought to be available to everyone.

Richard S. COOK, Jr.
STEDT Project, Linguistics Department
University of California, Berkeley
Re: Information about curly-tailed phonetic letters
"J%ORG KNAPPEN" wrote: The curly-tail consonants t, d, n, l, c, z are also included in the TeX IPA (tipa fonts). The documentation of those fonts is available on ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz --J"org Knappen Thanks. The URL should have a hyphen in it: ftp://ftp.dante.de/tex-archive/fonts/tipa/ and I don't see the curly-tail-l in the tipaman.pdf ... which is not really surprising. and no curly-tail-r either :-)
Re: Information about curly-tailed phonetic letters
Hi everyone,

This paper, brought to your attention last June

http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg

has been updated recently. Still working on getting the formal proposal together, and still welcoming comments and/or suggestions.

Best,
Richard

Date: Mon, 5 Jun 2000 14:48:09 -0800 (GMT-0800)
Kenneth Whistler wrote:

> Richard S. Cook, of the STEDT Project at the University of California, Berkeley, passes on the following URL's, which contain documentation regarding the use of curly-tailed phonetic letters in the Sinological and Sino-Tibetan traditions.
>
> --Ken
>
> Hi there,
>
> You may recall that we (on the Unicode list and elsewhere) discussed the issue of certain phonetic transcription characters and their possible inclusion in the Unicode standard. Here is a copy of a paper that I prepared some time ago on this subject.
>
> [old URL's deleted]
>
> I welcome any comments or suggestions, and please feel free to pass these URL's on to the Unicode list, as I am currently not subscribed.
>
> Best,
> Richard

Richard S. COOK, Jr.
STEDT Project, Linguistics Department
University of California, Berkeley
mailto:[EMAIL PROTECTED]
http://stedt.berkeley.edu/
Phonetic?
Exactly what constitutes a phonetic sound, besides being made by a human being? I mean, clapping isn't phonetic, is it?

Robert Lozyniak

"Don't stop movin',
It's your life, keep on groovin',
Get it right,
You've got to get it right"
-- some dance song I can't remember who by