Re: h in Greek epigraphy

2002-12-15 Thread Doug Ewell
David J. Perry wrote:

> My first answer to my correspondent was "just use Roman h."  Then I
> got to thinking: are there any situations in Unicode where actual
> letters of the alphabet are unified across scripts?  There are lots of
> punctuation marks and symbols that can be used with multiple scripts;
> but I can't think of a situation where an actual letter of the
> alphabet is so used.  A program that was sorting text, or trying to
> determine what script a word was written in, would get confused by
> hε̄γεμο̄ν.  Would this justify a proposal for "Greek small letter
> epigraphical h"?

One classic case of letters being unified across scripts is Kurdish,
which uses Latin Q and W in an otherwise all-Cyrillic alphabet.

-Doug Ewell
 Fullerton, California





Re: h in Greek epigraphy

2002-12-15 Thread Peter_Constable

On 12/15/2002 06:59:33 AM "David J. Perry" wrote:

> My first answer to my correspondent was "just use Roman h."  Then I got to
> thinking: are there any situations in Unicode where actual letters of the
> alphabet are unified across scripts?  There are lots of punctuation marks
> and symbols that can be used with multiple scripts; but I can't think of a
> situation where an actual letter of the alphabet is so used.  A program
> that was sorting text, or trying to determine what script a word was
> written in, would get confused by hε̄γεμο̄ν.  Would this justify a proposal
> for "Greek small letter epigraphical h"?

This seems to be a variation on the question I asked recently having to do
with gamma, delta and theta being used in an otherwise Latin writing system
for Wakhi (and whether we needed to encode Latin versions of these). The
answer that most respondents gave was to simply say that this writing
system is based on more than one script.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485





Re: Documenting in Tamil Computing

2002-12-15 Thread Jungshik Shin



On Sun, 15 Dec 2002, Avarangal wrote:

> If you are preparing a Tamil document, intended for long term use you must
> use Unicode Encoding. Any other approach you take can be considered a

  Absolutely.

> Unfortunately Windows 95 and Windows 98 can only read Unicode pages.
> You can write in Unicode using Windows NT, 2000, XP and linux.

  Even under Win 9x/ME, there are free and commercial word processors
and editors that let you create files in UTF-8 or UTF-16.  For instance,
yudit (http://www.yudit.org) has supported Tamil (both UTF-8 and TSCII)
for over a year now.


> 2/
>
> You know we all use Tamil eMail and for that we can not use Unicode.
> For Tamil eMail we use 8bit encoding called TSCii. I'm sorry to say that
> you still need to use this 8 bit encoding (which is not Unicode),

 What is 'Tamil eMail'? Is it a web mail service for Tamil?

> because
> Unicode is not mature enough to be used in multilingual email yet.
> You just have to make do with the 8bit TSCII encoding for Tamil eMail.

  I don't understand what you meant by Unicode not being
mature enough to support multilingual emails. Modern email clients like
Netscape7/Mozilla, MS Outlook (Express), and Mutt support UTF-8 very well.
If you believe in Unicode, there's no reason not to promote UTF-8 right
now for email exchange. Of course, some people relying on **broken**
web mail services that assume there's a one-to-one relationship
between languages and encodings would have trouble reading UTF-8
messages, but that's not the fault of Unicode but of those web mail
services. Unfortunately, most web mail services (Hotmail, Yahoo, Lycos,
etc.) are broken in that respect. (BTW, I have made a patch to a popular
open-source web mail program, IMP, to make it better support multilingual
emails, but there are still rough edges in my patch.)

  Jungshik
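
As a rough illustration of the point above that UTF-8 mail can carry
several scripts in one message, here is a minimal sketch using Python's
standard email package. The addresses and the sample text are made up for
the example; nothing here comes from the clients or web mail services
named in the message.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

subject = "தமிழ்"                       # hypothetical Tamil subject line
body = "வணக்கம்! Hello! 안녕하세요!"      # Tamil, English and Korean in one body

msg = EmailMessage()
msg["From"] = "sender@example.org"       # placeholder address
msg["To"] = "recipient@example.org"      # placeholder address
msg["Subject"] = subject                 # non-ASCII headers are RFC 2047 encoded on output
msg.set_content(body)                    # str content is labelled charset="utf-8" by default

raw = msg.as_bytes()                     # the bytes that would go on the wire

# Parse the wire form back, as a receiving client would.
parsed = message_from_bytes(raw, policy=policy.default)
print(parsed.get_content_charset())
print(parsed["Subject"])
print(parsed.get_content())
```

A receiving program only needs to honour the declared charset; it does not
have to guess an encoding from the language of the text.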





Documenting in Tamil Computing

2002-12-15 Thread Avarangal



fwd:
fyi: Below is a copy of a mail I circulated on the subject of
Documenting in Tamil Computing
<<
From: "sisrivas <[EMAIL PROTECTED]>" <[EMAIL PROTECTED]>
Date: Sun Dec 15, 2002 11:24pm
Subject: Documenting in Tamil Computing
 
We need to be clear as to the direction that Tamil is going with regard
to Tamil computing. I'm writing this again and again as there is some
misunderstanding about what font encodings are doing to Tamil computing. (TSC is
temporary. TAB is temporary. OldType (alias Bamini) is temporary.)
 
1/ If you are preparing a Tamil document, intended for long term use you
must use Unicode Encoding. Any other approach you take can be considered a
waste of time if your content is intended for long term use. So do yourself
and others a favour, prepare your documents using Tamil Unicode.
 
see item 7 at the URL http://www.gbizg.com/Tamilfonts/ekalappai.htm 
on how to get Unicode keyboard drivers.
 
Unfortunately Windows 95 and Windows 98 can only read Unicode pages. You 
can write in Unicode using Windows NT, 2000, XP and linux.
 
So what can you do if you only have Windows 95, 98 or 3.1? Well, sorry,
you need to use TSC or TAB or even OldType (alias Bamini) encoding. You can
assume that these documents that you make will not be usable in the near
future.
 
Are you going to write a book, are you going to publish some research
materials, etc, etc? Do yourself a favour: use Unicode and nothing else.

DO NOT WASTE YOUR TIME. TIME IS PRECIOUS.
 
2/ Catch-22

You know we all use Tamil eMail and for that we can not use Unicode. For
Tamil eMail we use 8bit encoding called TSCii. I'm sorry to say that you
still need to use this 8 bit encoding (which is not Unicode), because
Unicode is not mature enough to be used in multilingual email yet.

You just have to make do with the 8bit TSCII encoding for Tamil
eMail.

For more info: http://www.geocities.com/avarangal/
 
Sinnathuirai Srivas
 


Searching for CJK etc characters on electronic media

2002-12-15 Thread Smith, Mike
Hello

I frequently need to search computer storage media for words in
languages such as Chinese, Japanese, Korean, Russian etc.  Currently I
have been using tools that primarily display the computer values as
ASCII or hex.  The search tool has no display capability for Unicode
values (or glyphs).

When I re-order the UTF-16 value to hex (i.e. flip the byte order) I get
a large number of false positive hits on the hex values.  Further, when I
look at the surrounding hex values to ascertain the context of the keyword
'hit' I am finding that it is extremely difficult to deduce a meaningful
context.

Can anyone suggest an approach and/or tools for reading and searching
computer media that contain CJK etc. characters?

Thanks in advance

Mike Smith
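
One possible approach to the question above, sketched in Python under
stated assumptions: encode the keyword in each encoding the media might
plausibly hold (UTF-16 in both byte orders, and UTF-8), search the raw
bytes for each form, and decode a window around every hit so the context
can be read as text rather than hex. The function names find_keyword and
context are hypothetical helpers invented for this sketch, and legacy
encodings (Shift_JIS, EUC-KR, KOI8-R and so on) are not covered.

```python
from typing import Iterator, Tuple

ENCODINGS = ("utf-16-le", "utf-16-be", "utf-8")  # assumed candidate encodings

def find_keyword(data: bytes, keyword: str) -> Iterator[Tuple[str, int]]:
    """Yield (encoding, byte offset) for each occurrence of keyword in data."""
    for enc in ENCODINGS:
        needle = keyword.encode(enc)
        start = 0
        while (pos := data.find(needle, start)) != -1:
            yield enc, pos
            start = pos + 1

def context(data: bytes, enc: str, pos: int, length: int, width: int = 16) -> str:
    """Decode up to `width` code units on either side of a hit.

    For UTF-16 the window is stepped back in 2-byte units from the hit,
    so the decoder stays aligned with it. Undecodable bytes are shown as
    U+FFFD instead of raising an error.
    """
    unit = 2 if enc.startswith("utf-16") else 1
    back = min(width, pos // unit) * unit
    chunk = data[pos - back : pos + length + width * unit]
    return chunk.decode(enc, errors="replace")

# Small self-made sample: UTF-16LE text followed by the same word in UTF-8.
keyword = "日本語"
sample = "検索 日本語 text".encode("utf-16-le") + keyword.encode("utf-8")

for enc, pos in find_keyword(sample, keyword):
    hit_len = len(keyword.encode(enc))
    print(enc, pos, repr(context(sample, enc, pos, hit_len, width=3)))
```

Searching for the full encoded keyword, rather than a single 16-bit value,
tends to cut down on false positives, since a multi-byte sequence is much
less likely to occur by chance in unrelated binary data.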





Re: IPA for "hard g"

2002-12-15 Thread Eric Muller




Doug Ewell wrote:

> I didn't know (and had not checked) whether the Handbook was available to
> non-members of the Association.

ISBN 0-521-63751-1. Cambridge University Press. List price is $18 in the
US. Available through Amazon and such.

Eric.







h in Greek epigraphy

2002-12-15 Thread David J. Perry
I had a question about how to handle the use of lowercase h in Greek epigraphy.  For 
example, the word spelled ἡγεμών in modern standardized texts might be found 
on a stone written in one of the archaic Greek alphabets as ΗΕΓΕΜΟΝ, where the 
capital Eta represents the "h" sound.  This would be transcribed by an epigrapher 
using lowercase letters as hε̄γεμο̄ν  (note the use of Roman h to represent 
the aspirate and the combining macrons on epsilon and omicron).

My first answer to my correspondent was "just use Roman h."  Then I got to thinking: 
are there any situations in Unicode where actual letters of the alphabet are unified 
across scripts?  There are lots of punctuation marks and symbols that can be used with 
multiple scripts; but I can't think of a situation where an actual letter of the 
alphabet is so used.  A program that was sorting text, or trying to determine what 
script a word was written in, would get confused by hε̄γεμο̄ν.  Would this 
justify a proposal for "Greek small letter epigraphical h"?

David
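
To make the script-detection concern above concrete, here is a small
Python sketch. Python's standard library does not expose the Unicode
Script property, so this uses the first word of each letter's character
name ("LATIN", "GREEK") as a stand-in; that heuristic, and the helper name
scripts_in, are assumptions of this example and not how any particular
sorting or script-detection program works.

```python
import unicodedata

def scripts_in(word: str) -> set:
    """Return script-like labels for the letters in word.

    Heuristic only: the label is the first word of the Unicode character
    name. Combining marks and other non-letters are skipped.
    """
    found = set()
    for ch in word:
        if unicodedata.category(ch).startswith("L"):
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

# Epigraphic transcription: Latin h, then Greek letters with combining macrons.
epigraphic = "h\u03b5\u0304\u03b3\u03b5\u03bc\u03bf\u0304\u03bd"
# Modern standardized spelling, all Greek.
standard = "\u1f21\u03b3\u03b5\u03bc\u03ce\u03bd"

print(scripts_in(epigraphic))
print(scripts_in(standard))
```

A program that expects exactly one label per word would have to decide how
to treat the transcription, which is the situation the question describes.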