RE: Indic scripts, visual-order vs phonetic-order

2002-06-06 Thread Maurice Bauhahn

When we looked into this at the Cambodian Ministry of Education, Youth and
Sport, it was decided that Khmer handwriting order should {largely} follow
phonetic order. Of course, typewriters had to follow visual order. Most
earlier computer implementations were not able to handle phonetic order,
so they too were in visual order.

The 'largely' above reflects a subsequent discovery that ROBAT (an
analog to Indic REPHA) is written in visual order in handwriting (it is a
superscript which phonetically is an initial RO).

Possibly this relatively recent habit of using visual order has begun to
affect the handwriting order...so many Khmers now write in visual order as
well.

In Khmer one of the problems visual order brings up for computer
implementations is the large variety of character orders this could involve.
There are two-glyph vowels with pre- and post-consonant placement, one-glyph
vowels which precede, and one-glyph vowels which follow (super or sub or
post). Failure to lock those into a standard order would result in quite a
bit of preprocessing for sorting, not to mention the problems of
searching/spell checking.

Maurice

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Samphan Raruenrom
Sent: 05 June 2002 15:16
To: Unicode Public List
Subject: Indic scripts, visual-order vs phonetic-order


Hello,

I'm wondering about the practice of using visual-order vs phonetic-order
in Indic writing on typewriter vs computer vs handwritten. Are they
all the same?

I also heard that there are two input-method styles for Indic,
visual-order and phonetic-order. Is it true? And what is more popular?

--
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html








RE: Indic scripts, visual-order vs phonetic-order

2002-06-06 Thread Peter_Constable


On 06/06/2002 12:45:15 AM Maurice Bauhahn wrote:

 In Khmer one of the problems visual order brings up for computer
 implementations is the large variety of character orders this could involve.
 There are two-glyph vowels with pre- and post-consonant placement, one-glyph
 vowels which precede, and one-glyph vowels which follow (super or sub or
 post). Failure to lock those into a standard order would result in quite a
 bit of preprocessing for sorting, not to mention the problems of
 searching/spell checking.

It seems to me that this is a non-issue in relation to searching and spell
checking since both of those processes are sensitive only to sequences of
encoded characters and do not need to know what any given character is used
to represent (unless you're doing something akin to sound-based searching).
As for sorting, the preprocessing is not necessarily a big deal -- at
least, Thai and Lao have visually-ordered encoding that requires a bit of
reordering before creating sort keys (or as part of the process of creating
sort keys), but the preprocessing is pretty trivial: Vp C -> C Vp.
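
The Vp/C swap described above (a prevowel stored before its consonant is
moved after it before a sort key is built) might look roughly like the
following Python sketch. The function name and the code point ranges are
assumptions for illustration: Thai U+0E40..U+0E44 and Lao U+0EC0..U+0EC4 are
taken as the prevowels, and Thai U+0E01..U+0E2E and Lao U+0E81..U+0EAE as the
consonants. A real collation implementation would handle more cases than
this.

```python
import re

# Assumed ranges (not taken from the thread): Thai/Lao prevowels followed
# by a Thai/Lao consonant. The Lao consonant range is a simplification.
PREVOWEL_THEN_CONSONANT = re.compile(
    "([\u0E40-\u0E44\u0EC0-\u0EC4])([\u0E01-\u0E2E\u0E81-\u0EAE])"
)


def reorder_for_sorting(text: str) -> str:
    """Swap each prevowel + consonant pair so the consonant comes first."""
    return PREVOWEL_THEN_CONSONANT.sub(r"\2\1", text)


if __name__ == "__main__":
    words = ["\u0E44\u0E17\u0E22", "\u0E01\u0E32"]
    # Sort on the reordered form, but keep the stored text unchanged.
    for word in sorted(words, key=reorder_for_sorting):
        print(word)
```

The stored text stays in visual order; only the key used for comparison is
rearranged.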



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]






Indic scripts, visual-order vs phonetic-order

2002-06-05 Thread Samphan Raruenrom

Hello,

I'm wondering about the practice of using visual-order vs phonetic-order
in Indic writing on typewriter vs computer vs handwritten. Are they
all the same?

I also heard that there are two input-method styles for Indic,
visual-order and phonetic-order. Is it true? And what is more popular?

-- 
Samphan Raruenrom
Information Research and Development Division,
National Electronics and Computer Technology Center, Thailand.
http://www.nectec.or.th/home/index.html





Phonetic grouping in UniHan

2002-02-04 Thread Marco Cimarosti

In the on-line UniHan database (http://www.unicode.org/charts/unihan.html) I
see a field that I have never seen before:

-  Other useful dictionary-like data 
-   [...]
-   A phonetic grouping for the character

The phonetic grouping seems to be an integer number, and I wonder:

- What does this information mean?

- Why some characters don't have it? Is it just missing or it does not apply
to them?

- Where does it come from? I have not seen a corresponding field in the
plain-text file UniHan.txt.

Thanks in advance.
_ Marco


P.S.: I take the occasion to congratulate the author(s) of the on-line
UniHan for all the recent improvements, especially the addition of the
Chinese and Japanese compound words.

I also take the occasion to suggest a new field that could be very useful:
the frequency of usage of each character. This information may be derived
from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
(http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
KanjiDic database, (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
(I don't know the licensing terms for using these data.)

_ M.




Re: Phonetic grouping in UniHan

2002-02-04 Thread John H. Jenkins


On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:

 In the on-line UniHan database (http://www.unicode.org/charts/unihan.html) I
 see a field that I have never seen before:

   -  Other useful dictionary-like data
   -   [...]
   -   A phonetic grouping for the character

 The phonetic grouping seems to be an integer number, and I wonder:

 - What does this information mean?

 - Why some characters don't have it? Is it just missing or it does not apply
 to them?

 - Where does it come from? I have not seen a corresponding field in the
 plain-text file UniHan.txt.


You need the latest Unihan.txt.  In there you have:

#   kPhonetic*
#   The phonetic index for the character from _Ten Thousand Characters: An
#   Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
#   1980.

The asterisk indicates that it's a field we're still populating.

 I also take the occasion to suggest a new field that could be very useful:
 the frequency of usage of each character. This information may be derived
 from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
 (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
 KanjiDic database, (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
 (I don't know the licensing terms for using these data.)



We also have a newish kFrequency field.

#   kFrequency
#   A rough frequency measurement for the character based on analysis of
#   Chinese USENET postings
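
For anyone wanting to pull these fields out of the plain-text file, here is a
minimal Python sketch. It assumes the usual Unihan.txt layout of one
tab-separated record per line (code point, field name, value) with '#'
comment lines. The function name is hypothetical and the sample values are
made up for illustration, not real data.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Collect field values per code point from Unihan-style text lines."""
    data = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        data[codepoint][field] = value
    return data


if __name__ == "__main__":
    # Illustrative records only; the values are invented for this sketch.
    sample = [
        "#   kFrequency",
        "U+4E00\tkFrequency\t1",
        "U+4E00\tkPhonetic\t1515",
        "U+4E01\tkFrequency\t3",
    ]
    table = parse_unihan(sample)
    # Characters lacking a field simply have no entry for it.
    print(table["U+4E00"].get("kPhonetic"))
    print(table["U+4E01"].get("kPhonetic"))
```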

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





Re: Phonetic grouping in UniHan

2002-02-04 Thread Thomas Chan

On Mon, 4 Feb 2002, Marco Cimarosti wrote:

 I also take the occasion to suggest a new field that could be very useful:
 the frequency of usage of each character. This information may be derived
 from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
 (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
 KanjiDic database, (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
 (I don't know the licensing terms for using these data.)

I think whatever frequency data is included, the particulars of how they
were arrived at (or where to find such information) should be included,
e.g., Tsai's findings were based on 1993-1994 Big5 Usenet postings.

There's also frequency data buried under the kFenn field (as yet
unpopulated), where A, B, C, D, E, F, G, H, I, K (J is omitted)
indicates if it falls in the first, second, third, etc group of five
hundred characters, based on earliness of occurrence in the textbooks of
1926.  (The P code is also used for something that is not quite clear to
me from the explanation in the dictionary alone--I presume it might refer
to characters in the dictionary that were not in the 1926 study.)
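
As a worked illustration of the letter scheme just described, the following
Python sketch maps a letter to its group of five hundred. It covers only the
ten letters A through K with J omitted; the P code is left out because its
meaning is unclear, and how the letter is embedded in an actual kFenn value
is not addressed here. The names are hypothetical.

```python
# Letters for the first through tenth group of five hundred (J is omitted).
FENN_LETTERS = "ABCDEFGHIK"


def fenn_group_range(letter):
    """Return (first, last) position in order of occurrence for a letter."""
    if len(letter) != 1 or letter not in FENN_LETTERS:
        raise ValueError(f"not a frequency-group letter: {letter!r}")
    index = FENN_LETTERS.index(letter)
    return index * 500 + 1, (index + 1) * 500


if __name__ == "__main__":
    for letter in FENN_LETTERS:
        first, last = fenn_group_range(letter)
        print(f"{letter}: characters {first} to {last}")
```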

P.S. Recently you asked about estimates of usage of Plane 2
characters--since a large percentage are CNS 11643-1992 characters (and
perhaps the oldest IT source), that may provide a clue.  In the
Concluding Remarks section of Christian Wittern's Taming the
Masses[1], the higher CNS planes (ignore 1 and 2, which are in the
BMP, and perhaps some parts of 3) are rarely used in historic texts, and
he expects even lower usage in modern texts.

[1] http://www.gwdg.de/~cwitter/cw/taming.html


Thomas Chan
[EMAIL PROTECTED]





Re: Information about curly-tailed phonetic letters

2000-12-18 Thread Michael Everson

Ar 23:05 -0800 2000-12-17, scríobh Richard Cook:

 And as for the consonant symbols, why stop with t, d, n, l, c, z? Why
 not include the rest of the curly-tail and other symbols in the
 following chart:

 http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf

 there are a few other bits of data you might glean also, including usage
 of the apical vowel symbols.

I think we need to consult, offline, with the IPA about this matter.

Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Mob +353 86 807 9169 ** Fax +353 1 478 2597 ** Vox +353 1 478 2597
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire





Re: Information about curly-tailed phonetic letters

2000-12-17 Thread JÖRG KNAPPEN

The curly-tail consonants t, d, n, l, c, z are also included in the
TeX IPA (tipa fonts). The documentation of those fonts is available
on 

ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz

--J"org Knappen





Re: Information about curly-tailed phonetic letters

2000-12-17 Thread Richard Cook

"JÖRG KNAPPEN" wrote:
 
 The curly-tail consonants t, d, n, l, c, z are also included in the
 TeX IPA (tipa fonts). The documentation of those fonts is available
 on
 
 ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz
 
 --J"org Knappen

Hi J"org,
It looks as if you sent the wrong url. The right path is, I believe:

ftp://ftp.dante.de/tex-archive/fonts/tipa/

And as for the consonant symbols, why stop with t, d, n, l, c, z? Why
not include the rest of the curly-tail and other symbols in the
following chart:

http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf

there are a few other bits of data you might glean also, including usage
of the apical vowel symbols.

-Richard



Re: curly-tailed phonetic letters

2000-12-08 Thread Richard Cook

This table has undergone some further revision:

http://stedt.berkeley.edu/pdf/curly-tail-table3.pdf

Please note in the center of the table:

U+0291/U+0293 and U+0255/U+0286

These 4 may in fact be 2 pairs of functional equivalents (synographs),
pointing to the same place of articulation. According to Pullum &
Ladusaw (1996), IPA approval of U+0286 and U+0293 was withdrawn in 1989.

Please note that also in the above table are symbols for the 2 pairs of
so-called "apical" vowels. These include U+0285 and U+027F (the
unrounded apicals, relatively front and back, respectively), as well as
their rounded counterparts. These are all 4 non-IPA-sanctioned symbols.
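
To check which characters the code points mentioned above refer to, one can
ask Python's standard unicodedata module for their names. This is only a
lookup sketch: the helper name is hypothetical, and the names printed depend
on the Unicode data shipped with the Python build in use.

```python
import unicodedata

# Code points cited in this message: the two curl pairs, then the two
# unrounded "apical" vowel symbols.
CODE_POINTS = [0x0291, 0x0293, 0x0255, 0x0286, 0x0285, 0x027F]


def describe(cp):
    """Format a code point as 'U+XXXX <char> <character name>'."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```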


Richard S. COOK, Jr.
STEDT Project, Linguistics Department
University of California, Berkeley



Re: Information about curly-tailed phonetic letters

2000-11-27 Thread Robert Wheelock




From: JÖRG KNAPPEN [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: Information about curly-tailed phonetic letters
Date: Fri, 24 Nov 2000 01:33:05 -0800 (GMT-0800)

The curly-tail consonants t, d, n, l, c, z are also included in the
TeX IPA (tipa fonts). The documentation of those fonts is available
on

ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz

--J"org Knappen


Hello!
Most IPA fonts include these lowercase right-tailed retroflex letters:  t, 
d, z, c, j, l, n, r; however, SIL's *Encore* Series Fonts (currently in 
version 3.0) also have uppercase versions of those 8, plus curly-tailed s, 
esh, and ezh in both upper- and lowercase.  I'd use a curly-tailed s to pair 
up with curly-tailed z for the retroflex sibilants; that saves the 
curly-tailed c to pair with curly-tailed j for your retroflex laminal 
affricates, but only if you don't want to use a diacritic (like an 
underring) to represent retroflexion.  Thank you!

Robert Lloyd Wheelock






Re: Information about curly-tailed phonetic letters

2000-11-25 Thread Michael Everson

Ar 13:10 -0800 2000-11-23, scríobh Richard Cook:
 Hi everyone,
 This paper, brought to your attention last June

 http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
 http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg

 has been updated recently. Still working on getting the formal
 proposal together, and still welcoming comments and/or suggestions.

Ah. I forgot. Richard, I'd come across these characters independently some
time ago, when at the Beijing meeting of WG2 I'd collected a number of
books on Yi, in which these characters occur. I think your arguments about
the productivity of the curl in the IPA are spot on.

In short, I think these characters should be added and that there should be
no impediment to doing so. In fact, in September I was updating one of the
fonts Asmus and I use to prepare tables and I added these characters for
future use.


Michael Everson  **  Everson Gunn Teoranta  **   http://www.egt.ie
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Vox +353 1 478 2597 ** Fax +353 1 478 2597 ** Mob +353 86 807 9169
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire





Re: Information about curly-tailed phonetic letters

2000-11-25 Thread Richard Cook

Michael Everson wrote:
 
 Ar 13:10 -0800 2000-11-23, scríobh Richard Cook:
 Hi everyone,
 This paper, brought to your attention last June
 
 http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
 http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg
 
 has been updated recently. Still working on getting the formal
 proposal together, and still welcoming comments and/or suggestions.
 
 Ah. I forgot. Richard, I'd come across these characters independently some
 time ago, when at the Beijing meeting of WG2 I'd collected a number of
 books on Yi, in which these characters occur. I think your arguments about
 the productivity of the curl in the IPA are spot on.

Michael,
Yes, transcription of Yi (Lolo) and other Lolo-ish and Lolo-Burmese
languages is one of the things I'm talking about in the above paper. And
phonetic transcriptions of Tibetan etc. ...
 
 In short, I think these characters should be added and that there should be
 no impediment to doing so. In fact, in September I was updating one of the
 fonts Asmus and I use to prepare tables and I added these characters for
 future use.
 

Did you add curly-tail-l and curly-tail-r too? As I mention in the
paper, the productivity of symbols for this place of articulation admits
the possibility of curly-tail-r as well ... though I've never seen it
except in my transcription font. I added it to my font just for the
production of that paper ... but haven't added the symbol to the paper
yet. Wondering if I should also add it to the paper title ...

But I think that some phonologists or phoneticians may in fact one day
take it into their heads to use curly-tail-l and curly-tail-r more
widely ... so, the chars for this place series ought to be available to everyone.


Richard S. COOK, Jr.
STEDT Project, Linguistics Department
University of California, Berkeley



Re: Information about curly-tailed phonetic letters

2000-11-24 Thread JÖRG KNAPPEN

The curly-tail consonants t, d, n, l, c, z are also included in the
TeX IPA (tipa fonts). The documentation of those fonts is available
on 

ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz

--J"org Knappen





Re: Information about curly-tailed phonetic letters

2000-11-24 Thread Richard Cook

"JÖRG KNAPPEN" wrote:
 
 The curly-tail consonants t, d, n, l, c, z are also included in the
 TeX IPA (tipa fonts). The documentation of those fonts is available
 on
 
 ftp://ftp.dante.de/texarchive/fonts/tipa/tipaman.ps.gz
 
 --J"org Knappen

Thanks. The URL should have a hyphen in it:

ftp://ftp.dante.de/tex-archive/fonts/tipa/

and I don't see the curly-tail-l in the tipaman.pdf ... which is not
really surprising. And no curly-tail-r either :-)



Re: Information about curly-tailed phonetic letters

2000-11-23 Thread Richard Cook

Hi everyone,
This paper, brought to your attention last June

http://stedt.berkeley.edu/pdf/curly-tailed-tdnlcz.pdf
http://stedt.berkeley.edu/pdf/TranscriptionTable-WUZongji.jpg

has been updated recently. Still working on getting the formal
proposal together, and still welcoming comments and/or suggestions.

Best,
Richard


Date: Mon, 5 Jun 2000 14:48:09 -0800 (GMT-0800)
Kenneth Whistler wrote:
 
 Richard S. Cook, of the STEDT Project at the University
 of California, Berkeley, passes on the following URL's, which
 contain documentation regarding the use of curly-tailed phonetic
 letters in the Sinological and Sino-Tibetan traditions.
 
 --Ken
 
  Hi there,
  You may recall that we (on the Unicode list and elsewhere) discussed the
  issue of certain phonetic transcription characters and their possible
  inclusion in the Unicode standard. Here is a copy of a paper that I
  prepared some time ago on this subject.
 

old URL's deleted

 
  I welcome any comments or suggestions, and please feel free to pass
  these URL's on to the Unicode list, as I am currently not subscribed.
 
  Best,
  Richard
  
  Richard S. COOK, Jr.
  STEDT Project, Linguistics Department
  University of California, Berkeley
  mailto:[EMAIL PROTECTED]
  http://stedt.berkeley.edu/
 
 



Phonetic?

2000-06-28 Thread rampshot

Exactly what constitutes a phonetic sound, besides being made by a human
being? I mean, clapping isn't phonetic, is it?

Robert Lozyniak

 01 02  03 04 05 06
"Don't stop   movin',
 07   08   09  10  11  12  13  14
It's your life,   keep on groovin',
 15 16 17 18 19 20 21
Get it right,
2223  24 25  26 27 28 29 30 31 32
You've got to get it right"

-- some dance song I can't remember who by

