RE: Unicode String literals on various

2000-08-13 Thread Edward Cherlin

At 9:58 AM -0800 8/8/00, [EMAIL PROTECTED] wrote:
Hi, Antoine.

  I can continue to dissert on this subject

Please!

  (all of this should finally be cooked in a FAQ anyway),

I'll help, which means I need as much of your dissertings as possible.

but I do not want to flood the list
  with a marginally interesting subject.

Merci beaucoup. It was very informative!

Ciao.
   Marco

   P.S. You should not be so shy: up to date information
   about how Unicode may be used in the world's most
   important programming language does not sound so
   "off topic" or "marginally interesting" to me.

Second the motion. All in favor, please say "Aye" to Marco off-list.

   Ciao++
   M.

-- 

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland



Re: Taiwanese: unicode of o with dot right above

2000-08-13 Thread Doug Ewell

(Summary for the impatient:
A new character, COMBINING DOT ABOVE RIGHT, should be proposed.)

Kiatgak [EMAIL PROTECTED] wrote:

 1. U+0186/U+0254 (LATIN CAPITAL/SMALL LETTER OPEN O)
 with alternative form in font design.

 This solution is based on the pronunciation and needs help from the
 font design, but it gives OPEN O a different appearance. Is that
 allowed or adequate?

Probably not, regardless of the related meaning.  The "alternative form"
is too different from the standard glyph, and IPA glyphs, perhaps more
than any others except dingbats, need to be constant.

 2. U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT)
 with GSUB to fix the appearance in the font design.

It scares me whenever someone proposes that proprietary font mechanisms
should be necessary to render Unicode correctly.  Marco Cimarosti
expressed the same concern earlier.  No matter how "open" OpenType,
TrueType, AAT, ATSUI, etc. are or claim to be, there *must* be a way to
render Unicode characters correctly without relying on them.  I don't
even know what a GSUB or a GPOS is, and as a Unicode user and
implementor (but not a font designer) I shouldn't have to.

 The problem is U+00B7 is not a combining character.

That's another problem, yes.  This is supposed to be one character, so
it should not be encoded with two spacing characters.
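
The difference shows up in the character properties themselves. Here is a minimal sketch using Python's standard unicodedata module. The helper name `describe` is only for illustration, and U+0307 is included purely as a known combining mark for contrast:

```python
import unicodedata

def describe(ch):
    """Return (name, general category, canonical combining class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# Compare the spacing MIDDLE DOT with a genuine combining mark.
for ch in ("\u00b7", "\u0307"):
    name, category, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, combining class={ccc}")
```

A general category beginning with "M" marks a combining mark. The middle dot reports a non-mark category and combining class 0, so it is a spacing character in its own right.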

 Another problem, for the same reason:
 Is it a valid sequence if a combining character follows them,
 e.g. U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT) + U+0301(COMBINING
 ACUTE ACCENT)?
 Is such a solution allowed or adequate?

You could do it, but as Peter Constable pointed out, the acute accent
would be centered over the dot rather than the 'o', which is probably
not what you want.
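
To see why the accent lands on the dot, it may help to group each combining mark with the character it follows. The sketch below is a deliberate simplification, not the full Unicode text segmentation rules, and the function name `clusters` is hypothetical:

```python
import unicodedata

def clusters(text):
    """Group each combining mark with the nearest preceding non-mark character."""
    groups = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

# o + MIDDLE DOT + COMBINING ACUTE ACCENT
for group in clusters("o\u00b7\u0301"):
    print([f"U+{ord(c):04X}" for c in group])
```

Under this simplified model the acute accent is grouped with U+00B7, not with the 'o', because the middle dot is itself a base character.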

 3. U+004F/U+006F(O/o) + U+05C1(HEBREW POINT SHIN DOT).

 U+05C1 is the only combining character with a dot in the northeast
 corner that I can find in Unicode 3.0.
 Using it is based only on appearance.

It does look correct, but somehow I don't think it's a great idea to mix
characters from scripts like this if at all possible.  Better to create
a combining character intended for use with the Latin script.
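
A quick check of what U+05C1 is, again with Python's unicodedata module. The standard library has no script property, so the name prefix is used here as a rough stand-in for the script. That is a heuristic assumed for this sketch, not a proper script lookup:

```python
import unicodedata

shin_dot = "\u05c1"
name = unicodedata.name(shin_dot)

# A general category starting with "M" means the character is a combining mark.
is_mark = unicodedata.category(shin_dot).startswith("M")

# Heuristic only: treat the character name prefix as an indication of script.
looks_hebrew = name.startswith("HEBREW")

print(name, is_mark, looks_hebrew)
```

So the character does combine, but it belongs to a different script from the Latin base letter it would be attached to.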

 4. U+004F/U+006F(O/o) + U+031B (COMBINING HORN) or precomposed ones
U+01A0/U+01A1(LATIN CAPITAL/SMALL LETTER O WITH HORN).

This solution is based on the similar appearance.

Except that it's not even all that similar.

It has the same problem as 3.: it uses a character for its appearance,
not its meaning.  Even worse: it changes the shape of a horn into a dot.

More likely, it *wouldn't* change, and you'd end up looking at a horn.
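
One reason to expect a horn: the sequence o + U+031B is canonically equivalent to the precomposed letter, so any process that normalizes the text may fold it into U+01A1 outright. A small sketch with Python's unicodedata module, shown only as an illustration of that equivalence:

```python
import unicodedata

sequence = "o\u031b"  # LATIN SMALL LETTER O + COMBINING HORN

# NFC composes the pair into the precomposed character where one exists.
composed = unicodedata.normalize("NFC", sequence)

print([f"U+{ord(c):04X}" for c in composed])
print(unicodedata.name(composed))
```

After normalization there is no separate mark left for a font to reinterpret as a dot.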

 5. To apply for a new combining character.

This is the way to go.  A new character, COMBINING DOT ABOVE RIGHT
(analogous to U+0307 COMBINING DOT ABOVE), should be proposed.  This
doesn't seem to be an especially productive diacritic -- so far it only
appears with 'O' and 'o' -- but a combining character is much more
likely to be approved than a precomposed character.

Fill out the form and gather the necessary examples of this character
(possibly with the help of Tè Khái-su) and send them to the Unicode
Consortium and WG2.  Good luck.

-Doug Ewell
 Fullerton, California