On Mon, 4 Sep 2006 09:58:05 +0900,
Matthias Schmidt ([EMAIL PROTECTED]) wrote:


>Mikael,
>
>there are 2 approaches to solve multi-language problems:
>
>- Unicode in different variants. This requires a Unicode font.
>Unicode text is usually much bigger than normal text, because you need
>up to 4 characters to encode one letter.
>Unicode fonts are also much bigger than normal fonts, because they
>include thousands of characters from different writing systems, like
>Arabic or Chinese.

Please allow me to chime in on this thread to dispel some
apparent misconceptions:

First of all, Mac OS X does not distinguish between "Unicode"
fonts and "WorldScript" (or "normal") fonts.
Most fonts that ship with Mac OS X, from Times to Osaka, can be
used by both Unicode and WorldScript applications, though
WorldScript applications can typically access only a subset of
the font repertoire.  Some fonts, like Al Bayan (used to write
Arabic), are Unicode-only, meaning that they cannot be used by
WorldScript code, like the version of WASTE currently being
used in PowerMail.  But these are the exception rather than the rule.

Secondly, there is no single font that includes glyphs for
each and every Unicode character.  Some third-party fonts,
like Code2000 by James Kass, have a huge repertoire covering
a large number of different writing systems, but are by no
means complete.

Thirdly, how many bytes (not characters) are needed to encode
a Unicode character depends on the Unicode transformation format
being used (UTF-8, UTF-16 or UTF-32).  Most, if not all, Unicode
applications running on Mac OS X use the UTF-16 transformation
format internally because that's the format required by Carbon
and Cocoa APIs.  When represented in UTF-16, the vast majority
of currently defined Unicode characters, including most Chinese/Japanese
ideograms, take up two bytes (the rest, which lie outside the Basic
Multilingual Plane, take four).  When represented in legacy
WorldScript encodings, letters in small alphabets like Latin,
Cyrillic or Arabic take up one byte, and Chinese/Japanese ideograms
take up two bytes.  So Unicode increases an application's memory
requirements for text storage by a factor of two *at most*.
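
To make the byte counts above concrete, here is a rough sketch in
Python.  It is only an illustration: the helper encoded_sizes() is a
name made up for this example, and Python's mac_roman, mac_cyrillic
and shift_jis codecs are assumed to be close enough stand-ins for the
corresponding WorldScript encodings.

```python
# Illustrative sketch only: compare how many bytes the same text needs
# in a legacy single-script encoding and in the three Unicode
# transformation formats. `encoded_sizes` is a hypothetical helper, and
# the Python codecs below merely approximate the WorldScript encodings.

def encoded_sizes(text, legacy_codec):
    """Return the number of bytes `text` occupies in each encoding."""
    return {
        "legacy": len(text.encode(legacy_codec)),
        "utf-8": len(text.encode("utf-8")),
        # The "-be" variants are used so that no byte order mark is
        # prepended and counted.
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }

samples = [
    ("Latin", "hello", "mac_roman"),
    ("Cyrillic", "привет", "mac_cyrillic"),
    ("Japanese", "日本語", "shift_jis"),
]

for label, text, codec in samples:
    sizes = encoded_sizes(text, codec)
    ratio = sizes["utf-16"] / sizes["legacy"]
    print(label, sizes, "UTF-16/legacy ratio:", ratio)
```

For these samples the Latin and Cyrillic strings double in size when
stored as UTF-16, while the Japanese string takes the same number of
bytes in Shift-JIS and in UTF-16.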

>- you use a script-based system, which uses different scripts and code
>pages for different languages. This requires different fonts (Osaka for
>Japanese for example) for every language. This is what the WASTE text
>engine is doing.

This is correct, but only applies to the old version (2.x) of
the WASTE text engine currently being used by PowerMail.
The current version of WASTE (3.0) has abandoned WorldScript
in favor of Unicode.


                                    -- marco

-- 
It's not the data universe only, it's human conversation.
They want to turn it into a one-way flow that they have entirely
monetized. I look at the collective human mind as a kind of
ecosystem. They want to clear cut it. They want to go into the
rainforest of human thought and mow the thing down.

