Re: Combining diacriticals and Cyrillic

2003-07-11 Thread Jungshik Shin


On Fri, 11 Jul 2003, Andrew Cunningham wrote:

 Although the actual application of the theory will differ from operating
 system to operating system.

'OS' here has to be interpreted a bit broadly to include 'APIs and
toolkits' used in text rendering. I guess that's what you meant.

For instance, on Win 9x/ME, MS IE, which (apparently) uses the Uniscribe
APIs directly, can render complex scripts, but Mozilla, which uses the
standard Win32 Text APIs (such as TextOut), cannot render them as well
(except for Thai, Arabic, Hebrew, Tamil and Korean, for which it has a
built-in glyph-based solution). The same is true of MacOS X (ATSUI vs
non-ATSUI) and Unix/X11 (Pango/Qt vs lower-level APIs). And as
you wrote, a version of Mozilla (SILA: http://sila.mozdev.org)
that uses Graphite can render complex scripts well.


 2) You need a rendering system that supports the features. On Windows,
 this means that you will need a version of Uniscribe that supports the
 use of combining diacritics with cyrillic characters. Currently none are
 available, except for the version in the MS Office 2003 Beta. I did a

 With this version of Uniscribe (which I don't have) installed, I
 guess MS IE 6 and the regular build of Mozilla (without Graphite
 support) would work as well. In the case of Mozilla, you need to
 be on Win 2k/XP (and possibly need MS Office 2003 beta to be
 installed). Could you check that out?

 Jungshik





Re: Combining diacriticals and Cyrillic

2003-07-11 Thread Philippe Verdy
On Friday, July 11, 2003 12:14 PM, Jungshik Shin [EMAIL PROTECTED] wrote:
 'OS' here has to be interpreted a bit broadly to include 'APIs and
 toolkits' used in text rendering. I guess that's what you meant.
 
 For instance, on Win 9x/ME, MS IE that (appears to) use Uniscribe
 APIs directly can render complex scripts but Mozilla that uses
 standard Win32 Text APIs (such as TextOut) does not as well (except
 for Thai, Arabic, Hebrew, Tamil and Korean  for which it has built-in
 glyph-based solution).

The Win32 Text APIs (such as TextOut) actually DO support UniScribe transparently on 
Windows XP... In most applications, this means that the UniScribe support works 
without requiring explicit calls to the Uniscribe API.

So there's a difference in terms of usable APIs: on Win9x/ME, the basic Win32 Text 
APIs don't have UniScribe support built-in, but the UniScribe API is available 
separately as an additional system component (installed with Internet Explorer which 
uses it if available). On Windows XP, an application can use either the Basic Win32 
Text API, or the UniScribe API for finer controls of glyph substitutions according to 
user preferences and customizable or dynamic locales.

However, on XP, UniScribe is only installed if a user selects (in the regional 
settings) the support for complex scripts (which includes the minimum system support 
for Hebrew, Arabic, Thai, Vietnamese). This support can be installed even if the 
regional locale data and fonts for these scripts and languages are not loaded.

I don't know why UniScribe is not always installed by default, as it is also useful 
for Latin, Greek and Cyrillic. The label of the regional settings checkbox is quite 
confusing, as users may think they don't need it for their language. It would have been 
better named "Support for text rendering using Unicode combining sequences", or just 
"Support for UniScribe and OpenType fonts", with some accessible help explaining its 
benefits, such as the use of linked fonts for missing glyphs or additional glyph 
substitution tables. Font linking allows the Arial font to be internally linked to Arial 
Unicode MS, or Lucida Console to be linked to Courier New, or allows the browser to 
create and use an internal sans-serif font linked to a stack of fonts customized 
according to per-script user preferences and stylesheets.




RE: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Kent Karlsson

 Note also: the Soft_Dotted property was created and considered
 specially for Turkish and Azeri.

Adding to the long, and unfortunately getting longer, list of misleading
statements from Philippe!  No, the reason for the Soft_Dotted property
was/is to mark which characters (regardless of language) don't display
their intrinsic "dot(s) above" subglyph(s) when (another) combining
character above is applied to them (and to then keep the dot(s), a
combining dot above or a combining diaeresis, as appropriate, must be
used explicitly).
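
A minimal sketch of the encoding consequence described above, assuming
Python's standard unicodedata module (the helper name accented_i is
hypothetical, introduced only for illustration). It shows that a text
which wants a visible dot under an accent has to encode U+0307 explicitly
before the accent:

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE


def accented_i(keep_dot: bool) -> str:
    """Return 'i' plus an acute accent (hypothetical helper).

    With keep_dot=True an explicit COMBINING DOT ABOVE is encoded before
    the accent, which is how a soft-dotted letter keeps a visible dot.
    """
    marks = (DOT_ABOVE if keep_dot else "") + ACUTE
    return "i" + marks


plain = accented_i(keep_dot=False)   # renderer is expected to drop the dot
dotted = accented_i(keep_dot=True)   # dot is requested explicitly

for label, text in (("plain", plain), ("dotted", dotted)):
    print(label, [unicodedata.name(ch) for ch in text])
```

Whether the dot is actually drawn or suppressed is still up to the font
and renderer; the sketch only shows what the encoded text looks like.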

 In this language context the ASCII i is always rendered with a dot,
 kept also for uppercases.

I hope you don't mean to use a dotted glyph for U+0069!

B.t.w., it is perfectly legal to use a ligature (in the TECHNICAL sense,
perhaps not the typographic sense) for <f, i> also for Turkish and related
languages, especially if the f and i would otherwise overlap.  The point
is that <f, i> and <f, dotless i> must be clearly distinguishable for these
languages, and that may mean that one has to use a TECHNICAL ligature
for <f, i> having a glyph where the dot on the i is clearly visible (the
horizontal bar of the f and the top serif of the i may still merge).
That may be done by whatever means looks best for that particular font,
e.g. moving the loop of the f to the left, right, or up.
(Using ZWNJ should not do that, if correctly implemented, but can
instead, mistakenly, result in overlapping f and dot-of-i glyphs, since
not even a technical ligature, IIUC (correct me if I'm wrong), would be
allowed...)

/kent k




RE: Combining diacriticals and Cyrillic

2003-07-11 Thread Jon Hanna
 The Win32 Text APIs (such as TextOut) actually DO support
 UniScribe transparently on Windows XP... In most applications,
 this means that the UniScribe support works without requiring
 explicit calls to the Uniscribe API.

And Windows 2000. However, some ways of using the Text APIs will mean that
few of the benefits of UniScribe are gained. In particular, an application
may use the API a character at a time (for fine control of placement), and
base and combining characters will then be separated unless particular care
is taken to avoid this.

Hence, while it is true that an application using TextOut is indirectly
using UniScribe, it does not follow that it is doing so in a manner
appropriate for solving the problem described.
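
To illustrate the "particular care" mentioned above, here is a rough
sketch in Python (not Win32 code) of grouping each base character with
the combining marks that follow it, so that the whole unit can be handed
to a text-output call at once. The function name clusters is
hypothetical, and testing for a non-zero canonical combining class is
only an approximation of real grapheme cluster rules:

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Approximation: a character is treated as a combining mark when its
    canonical combining class is non-zero. Full grapheme cluster
    segmentation has more rules than this.
    """
    out: list[str] = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # start a new unit
    return out


# Cyrillic small a + combining acute accent, then Cyrillic small be
sample = "\u0430\u0301\u0431"
for unit in clusters(sample):
    print([f"U+{ord(ch):04X}" for ch in unit])
```

An application that positions text unit by unit rather than code point by
code point keeps the accent with its base letter.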




Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy
On Friday, July 11, 2003 1:12 PM, Kent Karlsson [EMAIL PROTECTED] wrote:

  Note also: the Soft_Dotted property was created and considered
  specially for Turkish and Azeri.
 
 Adding to the long, and unfortunately getting longer, list of
 misleading statements from Philippe!  No, the reason for the
 Soft_Dotted property was/is to mark which characters (regardless of
 language) that don't display intrinsic dot(s) above subglyph(s)
 when (another) combining character above
 is applied to it (and to then keep the dot(s) a combining dot above
 or a combining diaeresis, as appropriate, must be used explicitly).

I don't know how I can say things, with my limited English, without
always being accused of creating misleading statements.

Correct things if you think my words create possible confusion in
their interpretation, but please don't make an exhibition of them. I
don't know how non-native English writers can participate here if all
differences of interpretation caused by a possibly inappropriate use of
English terms are answered with flames. This is really frustrating...

The important words in my sentence are "considered specially",
where "specially" does not imply "only". It's just that Turkish and
Azeri are already given special treatment in Unicode, which already
includes language exceptions in its technical algorithms (notably
for character foldings).

And according to this treatment, the U+0069 character is already
intended to have a semantic value of a dotted i and not a dotless
i in languages where this creates a semantic difference, so the
question of the Soft_Dotted property is more glyphic than purely
semantic, and it has a semantic behavior (at the abstract text
level where Unicode is supposed to standardize things) mostly in
case folding operations where the actual encoding of the converted
abstract text is important.

The rest of the description of the Soft_Dotted property is mostly a
recommendation for authors of fonts and text renderers, so that
they *preserve this semantic difference* in the rendered text
between the abstract letters dotted and dotless i... And this does
not affect the encoding of the abstract text or any algorithmic
transformation of the encoded abstract text.

By saying *preserve this semantic difference*, I do not imply that
the U+0069 must/should have a dot above: it remains a font design
problem, out of scope of Unicode. There are certainly many ways
to preserve the semantic difference in the rendered text when this
is really appropriate (for example in Turkish and Azeri, or with a
distinct and emphasized rendering of the Turkish dot, including
in possible ligatures with other letters).

<FLAME-OFF>
And please, do not flame me if this message contains new
terms that also create confusion. I reread it as best I can,
and there are certainly other, better ways to say the same thing
in English without these unintentionally confusing interpretations,
and I am sorry in advance if such confusion still persists.

Accept the fact that I'm not a Unicode member, that Unicode
is only one of my interests, and that I have a lot of other
terminologies to work with.

If you can't accept that approximate English may
be used by participants here, and refuse to understand the
real intent of users when they write here, then have this
group be moderated, but don't say it is open to discussions
from anybody using Unicode.

For normative aspects, with all exact terms, Unicode has its
web site, its publications, its data files, its working draft
documents, its technical committees, its permanent members,
its chairmen, and even bug and comment report forms to
interact with users at the normative level.
And I am sure that permanent Unicode members do not even
need this newsgroup to exchange their work on normative
documents that are directly sent to the working committee
bureaus, or via private email, phone calls, snail mail, or
their own web sites.
Please don't expect the same level of linguistic quality here.

Also don't complain if my messages are long: the constant
criticism of what I am supposed to imply gives me no
other choice than always explaining what I mean, and this is
particularly lengthy, and really boring in a newsgroup.
</FLAME-OFF>

Thanks for your patience.

-- Philippe.




Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Peter Kirk
On 11/07/2003 05:56, Philippe Verdy wrote:

Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
 

Whatever it was that was specially created or adjusted for Turkish and 
Azeri, was it specifically restricted to these two languages? These are 
I think the only relatively major languages which use the special dotted 
and dotless i case mappings. But they are also used, at least in a small 
way, for minority languages of Turkey and Azerbaijan. (Use of these 
minority languages in Turkey is illegal, but that's another matter.) 
They were used in the 1930's for many Central Asian languages, and were 
at least proposed in the 1990's for newly introduced Latin alphabets. So 
I hope that what is fixed by Unicode is the name not of two languages 
but of an extensible family of scripts.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Language kits for Mac...

2003-07-11 Thread Rick McGowan
Ran across a place that has a number of language kits for Mac OS X,  
including Burmese, Cherokee, Inuktitut, Kannada, Malayalam, Telugu, and  
Tibetan. I haven't seen any blurbs about them anywhere...

http://www.xenotypetech.com

Rick




Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy
On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:
 So I hope that what is fixed by Unicode is the name not
 of two languages but of an extensible family of scripts.

I think you mean a family of languages?

Good luck with the ISO language codes, which do not even
define them, and contain many duplicate codes even in
the Alpha-2 space (he/iw, in/id), or imprecise codes
sometimes matching very imprecise families of languages
overlapping other language codes...

Until it is demonstrated that a language needs such a fix
in the Unicode support tables, it's best to just say that these
fixes are needed for some recognized language codes,
that applications are allowed to add their own fixes or
language tailorings, and that the existing language
tailorings in Unicode databases are just non-normative
samples.

-- Philippe.




Re: Combining diacriticals and Cyrillic

2003-07-11 Thread Andrew C. West
On Fri, 11 Jul 2003 13:15:14 +0200, Philippe Verdy wrote:

 The Win32 Text APIs (such as TextOut) actually DO support UniScribe
 transparently on Windows XP... In most applications, this means that the
 UniScribe support works without requiring explicit calls to the Uniscribe API.

Surely some mistake here.

<quote src=MSDN>
Starting with Microsoft Windows 2000, these functions [TextOut, ExtTextOut,
TabbedTextOut, DrawText, and GetTextExtentExPoint] have been extended to support
complex scripts. In general, this support is transparent to the application.
</quote>

<quote src=MSDN>
The [Uniscribe] ScriptTextOut function takes the output of both ScriptShape and
ScriptPlace calls and calls the operating system ExtTextOut function
appropriately.
</quote>

Now if Uniscribe's ScriptTextOut function calls ExtTextOut, and according to
Philippe ExtTextOut utilises Uniscribe to output text ...

No, I don't think so. There is a big difference between "support complex
scripts" (MSDN) and "support UniScribe" (Philippe). I don't know what the exact
implementation of complex script support is for ExtTextOut etc., but I'm pretty
sure that it is independent of Uniscribe. Maybe I'm wrong, but at least I'm not
going to dress up a wild guess as a statement of certain fact as Philippe so
likes to do (and it is disingenuous of him to pretend that we are all picking on
him because his English is not good enough - there's nothing ambiguous about his
misleading statements, and if he wants to repeat them in French they'll still be
misleading or just plain wrong).

Andrew



RE: Combining diacriticals and Cyrillic

2003-07-11 Thread Rick Cameron
Ah, but what you don't realise [and it's not surprising, because MSDN
doesn't make it clear] is that when ScriptTextOut calls ExtTextOut, it
passes glyph indices, and uses the ETO_GLYPH_INDEX option. 

Thus, the two statements are perfectly consistent.  For once, Philippe's
bold statement of fact is right. ;^)

(BTW, the authority for my bold statement of fact above is a conversation
with David Brown, the architect of Uniscribe)

Cheers

- rick cameron




Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Peter Kirk
On 11/07/2003 08:51, Philippe Verdy wrote:

 On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:

  So I hope that what is fixed by Unicode is the name not
  of two languages but of an extensible family of scripts.

 I think you speak about family of languages?

Not really. A set of languages, but they are not all related in any way, 
and many of them have more than one script or alphabet so this is not 
really a property of the languages. Perhaps "set of alphabets" would be 
a better way to put it.

 Good luck with ISO language codes which does not even
 define them, and contain many duplicate codes even in
 the Alpha-2 space (he/iw, in/id), or imprecise codes
 matching sometimes very imprecise families of languages
 overlapping other language codes...
 Until it is demonstrated that a language needs such fix
 in Unicode support tables, ...

If necessary I can collect some data to demonstrate this, at least for 
some languages.

 ... it's best to just say that these
 fixes are needed for some recognized language codes and
 that applications are allowed to add their own fixes or
 language tailorings, and that the existing language
 tailorings in Unicode databases are just non-normative
 samples.
 -- Philippe.

Agreed. But does Unicode actually treat them as non-normative samples?

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-11 Thread Philippe Verdy
On Friday, July 11, 2003 6:43 PM, Peter Kirk [EMAIL PROTECTED] wrote:

 Agreed. But does Unicode actually treat them as non-normative samples?

Not clear here: the reference documents say that these tables are
normative for applications that want to implement a conforming
case folding. But UTR#30 (character foldings) still contains many
areas marked as "to be done", so it is not clear that all folding issues
have been solved. It seems reasonable, however, that the elements in the
CaseFolding table that are not language-specific are normative, as they
are computed from the UCD...

I see this comment:
[quote]
# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple
#    characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.
# T: special case for uppercase I and dotted uppercase I
#    - For non-Turkic languages, this mapping is normally not used.
#    - For Turkic languages (tr, az), this mapping can be used instead of the normal
#      mapping for these characters.
#      Note that the Turkic mappings do not maintain canonical equivalence without
#      additional processing.
#      See the discussions of case mapping in the Unicode Standard for more
#      information.
#
# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
#
#    The mappings with status T can be used or omitted depending on the desired
#    case-folding behavior. (The default option is to exclude them.)
#
[/quote]

Simple case folding (C+S) is not marked "to be done" in UTR#30, but the other special 
mappings with status T are off by default (so they depend on a specific tailoring, a 
non-normative behavior if I interpret it correctly, as applications are free to use or 
not use them, under unspecified conditions, i.e. here the "desired behavior").
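
As a rough sketch of the usage rules quoted above (C + S for simple
folding, C + F for full folding, T only on request), the following Python
parses entries in the CaseFolding format and builds the three tables. The
function names are hypothetical. The 00DF, 0130 F and 1F88 S sample lines
come from the listing in this message; the 0049 lines and the 0130 T line
are quoted from memory of the data file and should be checked against the
real CaseFolding.txt:

```python
SAMPLE = """\
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
"""


def parse_entries(lines):
    """Return {status: {source_char: folded_string}} from CaseFolding lines."""
    by_status = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        source = chr(int(code, 16))
        target = "".join(chr(int(cp, 16)) for cp in mapping.split())
        by_status.setdefault(status, {})[source] = target
    return by_status


def build_table(by_status, statuses):
    """Merge the chosen statuses; later statuses override earlier ones."""
    table = {}
    for status in statuses:
        table.update(by_status.get(status, {}))
    return table


def fold(text, table):
    """Apply a folding table character by character."""
    return "".join(table.get(ch, ch) for ch in text)


entries = parse_entries(SAMPLE.splitlines())
simple = build_table(entries, "CS")    # usage A
full = build_table(entries, "CF")      # usage B
turkic = build_table(entries, "CFT")   # usage B plus the optional T mappings

print(repr(fold("I", full)), repr(fold("I", turkic)))
```

Applying T last is a design choice of this sketch so that the Turkic
mappings replace the default ones for the same characters; the data file
itself only says that T "can be used instead of the normal mapping".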

This concerns many more characters than just Turkish/Azeri uses, and there is some 
overlap with the informative and unfinished UTR#30 reference:

(1) Simple mappings (are they normative?):

1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
1F89; S; 1F81; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PROSGEGRAMMENI
1F8A; S; 1F82; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F8B; S; 1F83; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F8C; S; 1F84; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F8D; S; 1F85; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F8E; S; 1F86; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F8F; S; 1F87; # GREEK CAPITAL LETTER ALPHA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1F98; S; 1F90; # GREEK CAPITAL LETTER ETA WITH PSILI AND PROSGEGRAMMENI
1F99; S; 1F91; # GREEK CAPITAL LETTER ETA WITH DASIA AND PROSGEGRAMMENI
1F9A; S; 1F92; # GREEK CAPITAL LETTER ETA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1F9B; S; 1F93; # GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1F9C; S; 1F94; # GREEK CAPITAL LETTER ETA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1F9D; S; 1F95; # GREEK CAPITAL LETTER ETA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1F9E; S; 1F96; # GREEK CAPITAL LETTER ETA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1F9F; S; 1F97; # GREEK CAPITAL LETTER ETA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1FA8; S; 1FA0; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PROSGEGRAMMENI
1FA9; S; 1FA1; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PROSGEGRAMMENI
1FAA; S; 1FA2; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND VARIA AND PROSGEGRAMMENI
1FAB; S; 1FA3; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND VARIA AND PROSGEGRAMMENI
1FAC; S; 1FA4; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND OXIA AND PROSGEGRAMMENI
1FAD; S; 1FA5; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND OXIA AND PROSGEGRAMMENI
1FAE; S; 1FA6; # GREEK CAPITAL LETTER OMEGA WITH PSILI AND PERISPOMENI AND PROSGEGRAMMENI
1FAF; S; 1FA7; # GREEK CAPITAL LETTER OMEGA WITH DASIA AND PERISPOMENI AND PROSGEGRAMMENI

1FBC; S; 1FB3; # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FCC; S; 1FC3; # GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FFC; S; 1FF3; # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI

(2) Full mappings (clearly optional):

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0149; F; 02BC 006E; # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
01F0; F; 006A 030C; # LATIN SMALL LETTER J WITH CARON

0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
03B0; F; 03C5 0308 0301; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS

0587; F; 0565 0582; # ARMENIAN SMALL LIGATURE ECH YIWN

1E96; F; 0068 0331; # LATIN SMALL LETTER H WITH LINE BELOW
1E97; 

Unicode Conf. in Africa? India? (Re: 24th Unicode Conference - Atlanta, GA ...)

2003-07-11 Thread Don Osborn
I've come back to my mailbox a bit amazed to find so much mail on this
address issue. I'd have to agree with Peter's last on the topic - not to
prolong the thread (!) but to pick up on his description of Gabon as a
country not particularly known for involvement in this industry.  Is there
any likelihood of a Unicode conference being held somewhere in Africa with a
bit more going on in ICT (South Africa? Ghana?)?  Might that be helpful in
Unicode outreach to that continent?

For that matter what about India, which I did not notice on the list of past
conference venues?  It would seem that multilingual regions such as
sub-Saharan Africa and South Asia would draw particular local benefits from
Unicode, and India at least is certainly not lacking in involvement in
ICT...

Don Osborn
Bisharat.net


- Original Message - 
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, July 10, 2003 6:57 PM
Subject: Re: 24th Unicode Conference - Atlanta, GA - September 3-5, 2003

[ . . . ]

 Sure, it's better not to assume USA is understood, but the criticism was
 being taken too far. The original criticism was certainly valid, but
 taking it to the extent of suggesting that people will do a lot of
 research on villages and suburbs in Gabon is, IMO, absurd. If you do a
 google search on atlanta, ga, you'd have to wade through pages and pages
 of results related to the US city in the state of Georgia before you'd
 come close to anything else, and if somebody hasn't figured out by that
 point that the conference is probably in Atlanta, Georgia, USA rather than
 a village in a country not particularly known for involvement in this
 industry, then they're probably not intelligent enough to be involved in
 this industry anyway.



 - Peter







Re: Combining diacriticals and Cyrillic

2003-07-11 Thread Jungshik Shin


On Fri, 11 Jul 2003, Philippe Verdy wrote:

 On Friday, July 11, 2003 12:14 PM, Jungshik Shin [EMAIL PROTECTED] wrote:


  For instance, on Win 9x/ME, MS IE that (appears to) use Uniscribe
  APIs directly can render complex scripts but Mozilla that uses
  standard Win32 Text APIs (such as TextOut) does not as well (except
  for Thai, Arabic, Hebrew, Tamil and Korean  for which it has built-in
  glyph-based solution).

 The Win32 Text APIs (such as TextOut) actually DO support UniScribe
 transparently on Windows XP... In most applications, this means
 that the UniScribe support works without requiring explicit calls
 to the Uniscribe API.

 You can add Win2k wherever you have Win XP. Anyway, didn't I write
(or at least imply) that Mozilla (for that matter, any application)
relying on the basic Win32 Text APIs _can_ *render* complex scripts
on Win2k/XP? What different conclusion did you reach from mine?
Most people here are well aware that Win 9x/ME and Win 2k/XP are
two *different* breeds of OS.

 So there's a difference in terms of usable APIs: on Win9x/ME, the
 basic Win32 Text APIs don't have UniScribe support built-in, but

  As Andrew noted in his response to your message, MSDN articles
on related issues are a bit confusing as to what exactly is going
on inside *TextOut*.  You might be right, but also could be wrong
about ' built-in  '  and 'transparently'.


 the UniScribe API is available separately as an additional system
 component (installed with Internet Explorer which uses it if
 available). On Windows XP, an application can use either the Basic
 Win32 Text API, or the UniScribe API for finer controls of glyph
 substitutions according to user preferences and customizable or
 dynamic locales.

  Well, actually there are two more ways to deal with CTL rendering
on Win32. And a much more important difference between the two approaches
you mentioned lies in text selection and caret/cursor movement.
With the basic Text APIs, you cannot move the cursor grapheme by grapheme
or make copy and cut always fall on grapheme boundaries.
Well, sure you can, but you have to do all the work yourself.
One can just look up the MSDN articles if interested.
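
To give an idea of what "doing all the work yourself" involves, here is a
small Python sketch (not Win32 code) that computes the offsets where a
caret may rest so that it never lands between a base character and its
combining marks. The names caret_stops and move_right are hypothetical,
and the rule used (canonical combining class non-zero) is a
simplification: it does not cover conjoining Hangul jamo or other cluster
rules that a real implementation would need.

```python
import unicodedata


def caret_stops(text: str) -> list[int]:
    """Offsets where a caret may rest: start, end, and before each base."""
    stops = {0, len(text)}
    stops.update(
        i for i, ch in enumerate(text) if unicodedata.combining(ch) == 0
    )
    return sorted(stops)


def move_right(text: str, pos: int) -> int:
    """Move the caret to the next allowed stop, or stay at the end."""
    return next((s for s in caret_stops(text) if s > pos), len(text))


sample = "e\u0301x"   # e + combining acute accent, then x
print(caret_stops(sample))
print(move_right(sample, 0))
```

Selection, copy and cut can then be snapped to the same list of stops.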


 I don't know why UniScribe is not always installed by default,
 as it is also useful for Latin, Greek and Cyrillic (the regional

   Add Korean to the list. After making a Korean OpenType font,
I was confused because it worked fine inside font editing tools,
but it didn't work with MS IE and Mozilla under Win2k (although I
had heard that the Korean OTFs shipped with the Korean version of MS
Office XP worked well with MS IE). On that particular Win2k box, I hadn't
enabled support for any of the South and Southeast Asian scripts
(needless to say, Korean support had been installed). After installing
Devanagari, Tamil and Thai support (just one should be sufficient),
the font worked well with both MS IE and Mozilla.


 settings checkbox label is quite confusing as users may think
 they don't need it for their language, and it should have been
 better named "Support for text rendering using Unicode combining
 sequences", or just "Support for UniScribe and OpenType fonts" with

  This is a very good suggestion.


  Jungshik




ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-11 Thread Doug Ewell
Philippe Verdy verdy_p at wanadoo dot fr wrote:

 Good luck with ISO language codes which does not even
 define them, and contain many duplicate codes even in
 the Alpha-2 space (he/iw, in/id), or imprecise codes
 matching sometimes very imprecise families of languages
 overlapping other language codes...

The codes "iw" for Hebrew and "in" for Indonesian were deprecated
FOURTEEN YEARS AGO.  It is not accurate or fair to refer to them as
"duplicates" of "he" and "id".  The Registration Authority deprecates
such codes, rather than deleting them, for backward compatibility with
any data that might contain the old codes.
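
For software that has to read such older data, one possible approach is
to map the deprecated codes to their replacements on input. The sketch
below covers only the two pairs named in this thread; the table and
function names are hypothetical:

```python
# Deprecated two-letter codes mentioned in this thread and their replacements.
DEPRECATED = {"iw": "he", "in": "id"}


def canonical_language_code(code: str) -> str:
    """Return the current code for a deprecated one, else the code itself."""
    code = code.lower()
    return DEPRECATED.get(code, code)


for old in ("iw", "in", "he", "fr"):
    print(old, "->", canonical_language_code(old))
```

Keeping the mapping in a table reflects the backward-compatibility
purpose described above: the old codes stay recognizable, but new data is
written with the current ones.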

The part about codes for language families overlapping other codes for
specific languages is, regrettably, true.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/