Unicode Transcriptions

2001-02-15 Thread Mark Davis

I am still missing

Bopomofo, Khmer, Mongolian, Myanmar, Sinhala, Syriac, Thaana

on http://www.macchiato.com/unicode/Unicode_transcriptions.html

If anyone could supply one of these, I would appreciate it.

Also, Ken suggested that the Bopomofo should be a Bopomofo transcription of
the Chinese for Unicode, not a transliteration from English. Can anyone
supply that?

Mark

--
http://www.macchiato.com




RE: Unicode Transcriptions

2001-02-15 Thread ROBERT HODGSON

Dear colleagues,

The American Bible Society is undertaking a prototype project which will
bring an elementary Hebrew course online in the next year or so.

We anticipate using the Unicode fonts to display the Hebrew characters.

I would very much appreciate being in touch with other colleagues who have
had experience in displaying Hebrew characters with Unicode. Of interest to
us would be experiences in
1) browser selection
2) operating systems selection
3) size of Hebrew character set to choose

I'm sure there will be other questions as we move forward, but these are the
ones we are looking at just now.

Thanks in advance to the Unicode list members for any leads they can give
me.

Best wishes,

Bob Hodgson
American Bible Society




remapping devanagari

2001-02-15 Thread Pam Lothspeich

Hello,

I am a new user of unicode for devanagari (Hindi) in Microsoft Word.  I am
very impressed with this font, but I'm wondering if there is a way to remap
the keyboard, so that I don't have to use shortcut keys which require
multiple keystrokes in order to type devanagari.

Also, I noticed that not all of the roman diacriticals for devanagari are
available in independent form, requiring one to use the "spacing modifier
letters."   Does anyone know if there are plans to make the roman
transliteration fonts more comprehensive?

Pam Lothspeich
Columbia University  




Re: Surrogate space in Unicode

2001-02-15 Thread jgo

 At 2001-02-06 07:48:29 -0800 Mark Davis wrote:
 At 2001-02-06 01:51 "nikita k" [EMAIL PROTECTED] wrote:
 What is surrogate space in unicode?

 It is the set of code points that can be addressed using
 surrogate code points.  For more information, see the
 glossary at www.unicode.org.

+ Supplementary Code Point.  A Unicode code point between
+ U+10000 and U+10FFFF.

+ Surrogate Code Point.  A Unicode code point in the range U+D800
+ through U+DC00.  Reserved for use by UTF-16, where a pair of
+ surrogate code units (a high surrogate followed by a low
+ surrogate) "stand in" for a supplementary code point.

+ Surrogate Character.  A misnomer.  It would be an encoded
+ character having a surrogate code point, which is impossible.
+ Do not use this term.

+ Surrogate Pair.  A coded character representation for a single
+ abstract character that consists of a sequence of two code units,
+ where the first unit of the pair is a high-surrogate and the
+ second is a low-surrogate. (See Definition D27 in Section 3.7,
+ Surrogates.)

So, I guess it's safe to say that a surrogate code point is
a surrogate code point... which is a surrogate for a supplementary
code point, which is a code point between something and something
else.

Someone needs to take a break from the bureaucrateze and learn
again how to communicate clearly.  Is that not a part of the
goal, here?

John G. Otto Nisus Software, Engineering
www.infoclick.com  www.mathhelp.com  www.nisus.com  software4usa.com
EasyAlarms  PowerSleuth  NisusEMail  NisusWriter  MailKeeper  QUED/M
   My opinions are probably not those of Nisus Software, Inc.





Re: Surrogate space in Unicode

2001-02-15 Thread DougEwell2

In a message dated 2001-02-15 15:26:55 Pacific Standard Time, [EMAIL PROTECTED] 
writes:

  At 2001-02-06 07:48:29 -0800 Mark Davis wrote:
   At 2001-02-06 01:51 "nikita k" [EMAIL PROTECTED] wrote:
   What is surrogate space in unicode?
  
  (Mark defines various terms relating to 'supplementary' and 'surrogate')
  
  So, I guess it's safe to say that a surrogate code point is
  a surrogate code point... which is a surrogate for a supplementary
  code point, which is a code point between something and something
  else.
  
  Someone needs to take a break from the bureaucrateze and learn
  again how to communicate clearly.  Is that not a part of the
  goal, here?

I thought Mark's definitions were both accurate and clear, unlike John's 
rejoinder, which was neither.

It has proven difficult to come up with convenient terms for the Unicode 
characters encoded at U+10000 and beyond.  The term 'surrogate' has been 
misused in an attempt to do this.  It is important to use consistent terms 
that demonstrate an understanding of what is going on.

I am not a member of the Consortium, and certainly would not consider myself 
a bureaucrat, so I will take a stab at this in the plainest English I can find 
that does not sacrifice accuracy.

1.  A Unicode 'code point' is a number between 0 and 1,114,111 inclusive, 
usually expressed in hexadecimal (U+0000 through U+10FFFF).  Not every code 
point necessarily represents a valid character, although most do.  For 
example, there is no character encoded at U+FFFF.

2.  A 'basic' code point, which may represent a 'basic character', can range 
from U+0000 through U+FFFF.  The remaining code points (U+10000 through 
U+10FFFF) are 'supplementary' code points, each of which may represent a 
'supplementary character'.

3.  'Surrogate' code points range from U+D800 through U+DFFF (not U+DC00).  
They do not directly represent characters (so there is no such thing as a 
'surrogate character'), but two of them may be used together according to the 
rules of UTF-16 to represent a supplementary character.  The two surrogate 
code points used for this purpose would be called a 'surrogate pair'.  Don't 
separate them.
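
As a rough illustration of the three definitions above, here is a minimal
Python sketch.  The function names (classify, to_surrogate_pair,
from_surrogate_pair) are made up for this example and do not come from the
Unicode Standard or any library; the arithmetic is the usual UTF-16
surrogate calculation.

```python
def classify(cp: int) -> str:
    """Label a code point as 'basic', 'surrogate', or 'supplementary'."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"          # reserved for UTF-16, never a character
    if cp <= 0xFFFF:
        return "basic"
    return "supplementary"


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use a surrogate pair")
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high surrogate followed by a low surrogate")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, to_surrogate_pair(0x1D11E) gives (0xD834, 0xDD1E), and
from_surrogate_pair(0xD834, 0xDD1E) gives back 0x1D11E.  Either half on its
own classifies as 'surrogate' and represents nothing.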

Is that better?

-Doug Ewell
 Fullerton, California



Re: Surrogate space in Unicode

2001-02-15 Thread Tom Lord


It has proven difficult to come up with convenient terms for
the Unicode characters encoded at U+10000 and beyond.
[]
2.  A 'basic' code point, which may represent a 'basic
character', can range from U+0000 through U+FFFF.

For what purpose is such a distinction needed?  

-t



Re: Surrogate space in Unicode

2001-02-15 Thread DougEwell2

In a message dated 2001-02-15 23:15:23 Pacific Standard Time, [EMAIL PROTECTED] 
writes:

   It has proven difficult to come up with convenient terms for
   the Unicode characters encoded at U+10000 and beyond.
   []
   2.  A 'basic' code point, which may represent a 'basic
   character', can range from U+0000 through U+FFFF.
  
  For what purpose is such a distinction needed?  

It is needed because of UTF-16, which requires two 16-bit code points to 
represent a character with a value of U+10000 or higher (a supplementary 
character) but only one 16-bit code point to represent a basic character.
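
A small Python sketch of this difference, assuming only the standard
"utf-16-be" codec; the helper name utf16_code_units is hypothetical and
chosen just for this example:

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit values needed to store text in UTF-16."""
    # "utf-16-be" writes no byte order mark, so every 2 bytes is one value.
    return len(text.encode("utf-16-be")) // 2


basic = "\u0041"              # U+0041, a basic character
supplementary = "\U00010000"  # U+10000, the first supplementary code point

print(utf16_code_units(basic))          # one 16-bit value
print(utf16_code_units(supplementary))  # two 16-bit values (a surrogate pair)
```

Software that assumes one 16-bit value per character will count the second
string as two characters.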

Many descriptions on the Web erroneously claim that Unicode contains only the 
first 64K characters of ISO 10646.  Even the Unicode Standard Version 3.0 
states, "Plain Unicode text consists of sequences of 16-bit character codes." 
 To me this sentence is very misleading and requires that special attention 
be paid to the nature of supplementary characters, those to be assigned in 
Unicode 3.1 and those to be assigned in future versions.

Because of the widespread belief that Unicode stops at U+FFFF, many fonts and 
applications that claim to support Unicode can only handle basic characters, 
not supplementary characters.

-Doug Ewell
 Fullerton, California