On Thu, 10 Jul 2003, Wu Yongwei wrote:

> Jungshik Shin wrote:
>
> > I think it's not so much due to defects in programs as due to the lack of
> > high-quality fonts. These days, most Linux distributions come with free
> > truetype fonts for zh, ja, ko, th and other Asian scripts. However,
> > the number and the quality of fonts for Linux desktop are still
> > inferior to those for Windows.
>
> The problem is mainly not the font itself, but the font combination.  I really
> cannot bear the display of ASCII characters in Song Ti, which is simply ugly
> (and fixed width).

  Why don't you specify a variable-width font as the system default?
I understand you still don't like Latin glyphs in Chinese fonts. I hate
Latin glyphs in Korean fonts, too.


> locale Linux seems to be able to do so, but in the Chinese locale all is in
> the Chinese font, which is not suitable at all for Latin characters.

  I don't think there's any difference between English and Chinese locales,
provided that you meant en_*.UTF-8 and zh_*.UTF-8. You may get the impression
that it works under en_US.UTF-8 because the 'system default font'
for en_US.UTF-8 does not cover Chinese characters, so the automatic font
selection mechanism picks a Chinese font for Chinese characters while
using the default font for Latin letters. On the other hand, under zh_*.UTF-8
the system default font covers Latin letters as well as Chinese characters,
so both Latin and Chinese are rendered with the default font.

  A way to work around this is to specify your favorite Latin font ahead
of your Chinese font wherever a CSS-style font list can be used.
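
  For instance, with Pango/Gtk+ you can hand a comma-separated family list
to a font description. A minimal sketch (the family names are only examples;
use whatever Latin and Chinese fonts you actually have installed):

    #include <gtk/gtk.h>
    #include <pango/pango.h>

    /* A minimal sketch: list a Latin family before a Chinese family, so
     * Latin text is drawn with the Latin face while Han characters fall
     * back to the Chinese face.  Family names are only examples. */
    void set_label_font(GtkWidget *label)
    {
        PangoFontDescription *desc =
            pango_font_description_from_string(
                "Bitstream Vera Sans,AR PL SungtiL GB 12");
        gtk_widget_modify_font(label, desc);
        pango_font_description_free(desc);   /* modify_font keeps a copy */
    }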

> Beginning with Windows 2000, Windows could choose the
> font to use based on the Unicode range (Java does this too).  In the English

  This is a good feature to have, although a CSS-style font list works
most of the time.  Almost everything we need for this is already in
place (fontconfig, pango). BTW, I haven't seen this available in
Win2k. How can I do that? (not that I don't believe you; I'm just
curious)



> I used a Windows Gtk application, which used Tahoma (a good sans-serif
> font) at first.  But after an upgrade it automatically chose to use the
> system default font, which is the Chinese Song Ti.  It took me several hours
> to "correct" the ugly and corrupt (yes, because dialogue dimensions are
> different) display.

  Again, I haven't run Gtk programs under Win32, so I don't know how
they select fonts. Do they use fontconfig? fontconfig can make a big
difference.


> >> There seems little sense now arguing the virtues of UTF-8 and UTF-16.
> >> Technically they both have advantages and disadvantages.  I suppose we

> >   If MS had decided to use UTF-8 (instead of coming up with a whole new
> > set of APIs for UTF-16) with  'A' APIs, Mozilla developers' headache(and
....
> > UTF-8/'A' APIs vs UTF-16/'W' APIs and there are many other things to
> > consider in case of Win32.
>

> It seems impossible because there are so many legacy applications.  On the
> Simplified Chinese versions of Windows, 'A' always implies GB2312/GBK.
> Switching ALL to UTF-8 seemed too radical an idea around 1994.  At the time

 Using 'A' APIs and UTF-8 does not mean that 'A' APIs are made to work ONLY
with UTF-8.  As you know well, 'A' APIs are basically APIs that deal with
'char *'. As such, in theory, they can be used for any single-byte or multibyte
encoding, including Windows 932, 936, 949, 950 and 6xxxx (I forgot the codepage
designation for UTF-8).
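
 Just to illustrate (a rough sketch; I haven't compiled this on Win32 here):
the same 'char *' bytes can be interpreted under whatever codepage you pick,
including the UTF-8 one, so the 'A'-style interfaces are not tied to any
single encoding.

    #include <windows.h>
    #include <wchar.h>

    /* Sketch of the point above: 'A' entry points just take 'char *' in
     * some multibyte codepage.  The same text can be handed over as GBK
     * (codepage 936) or as UTF-8 (CP_UTF8, i.e. codepage 65001); the
     * encoding is a property of the codepage used to interpret the bytes,
     * not of the API family. */
    int main(void)
    {
        const char *utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; /* "中文" in UTF-8 */
        const char *gbk  = "\xD6\xD0\xCE\xC4";         /* same text in GBK */
        wchar_t w1[8], w2[8];

        MultiByteToWideChar(CP_UTF8, 0, utf8, -1, w1, 8);
        MultiByteToWideChar(936,     0, gbk,  -1, w2, 8);
        /* w1 and w2 now hold the identical UTF-16 string L"\x4e2d\x6587" */
        return wcscmp(w1, w2);   /* 0: same characters either way */
    }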

 As Unix (e.g. Solaris and AIX, and to a lesser degree Linux) has demonstrated,
a single application (written to support multibyte encodings) can work
well both under legacy-encoding-based locales and under UTF-8 locales.
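
 For instance, something along these lines (just a sketch) behaves the same
whether it is run under a GBK, EUC-KR or UTF-8 locale, because the decoding
is left to the C library:

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Just a sketch: an application written against the multibyte C APIs
     * ('char *' plus mbstowcs() and friends) does not care whether the
     * locale's codeset is GBK, EUC-KR or UTF-8 -- the C library decodes
     * according to whatever locale the user runs it under. */
    int main(int argc, char **argv)
    {
        wchar_t wbuf[256];
        size_t nchars;

        setlocale(LC_ALL, "");                 /* pick up the user's locale */
        if (argc < 2)
            return 1;
        nchars = mbstowcs(wbuf, argv[1], 256); /* decode per locale encoding */
        if (nchars == (size_t)-1)
            return 1;                          /* invalid multibyte sequence */
        printf("%lu characters in %lu bytes\n",
               (unsigned long)nchars, (unsigned long)strlen(argv[1]));
        return 0;
    }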


> Microsoft adopted Unicode, people might truly have believed UCS-2 was enough
> for most applications, and Microsoft did not have the file-name compatibility
> burden of Unix

  Well, this is an orthogonal issue. The POSIX
file system is so 'simple' (which is a virtue in some respects) that it doesn't
have an inherent notion of 'codeset/encoding/charset'. However, Windows
doesn't use a POSIX file system, and using 'A' APIs does NOT mean that they
couldn't use VFAT or NTFS, where filenames are stored in a form of Unicode.
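
  To illustrate (a sketch only, assuming the active ANSI codepage is 936/GBK):
a file created through the 'A' API with a GBK-encoded name still gets a
Unicode name on VFAT/NTFS, and the 'W' API opens the very same file.

    #include <windows.h>

    /* Sketch only: CreateFileA() converts the name from the active ANSI
     * codepage (assumed here to be 936/GBK) to UTF-16 before it reaches
     * the file system, so the on-disk name is Unicode either way. */
    int main(void)
    {
        const char *gbk_name = "\xD6\xD0\xCE\xC4.txt"; /* "中文.txt" in GBK */
        HANDLE ha, hw;

        ha = CreateFileA(gbk_name, GENERIC_WRITE, 0, NULL,
                         CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (ha == INVALID_HANDLE_VALUE)
            return 1;
        CloseHandle(ha);

        /* the same file, opened by its UTF-16 name through the 'W' API */
        hw = CreateFileW(L"\x4e2d\x6587.txt", GENERIC_READ, 0, NULL,
                         OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hw == INVALID_HANDLE_VALUE)
            return 1;
        CloseHandle(hw);
        return 0;
    }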


> (I suppose you all know that the long file names in Windows are in
> UTF-16).

  Actually, VFAT documentation is so hard to come by that we can only
speculate that it's UTF-16 (it could well be just UCS-2 in Windows 95).


> I would not blame Microsoft for this.

  I wouldn't either, and I didn't mean to. I believe they weighed
all the pros and cons of the different options and decided to go with their
two-tiered API approach. In my previous message, I just gave one downside of
that approach, aggregating all the other arguments into the single phrase
'there are many other things to consider...'

> Also consider the following
> fact:  Windows 95 emerged at a time when many people had only 8MB of RAM.
> Yah, I don't think AT THAT TIME we could tolerate a 50% growth in memory
> occupation.

 Windows 95/98/ME is not Unicode-enabled in many senses, while Win 2k/XP (and NT4
to a lesser degree) is [1].  Therefore, it was not an issue for Win95 in 1994/95
simply because Win95 still used legacy encodings.


[1] Win 9x/ME is rather like a POSIX system running under locales with legacy
encodings, whereas Win 2k/XP is similar to a POSIX system running under UTF-8 locales.


 Jungshik

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
