Wu Yongwei wrote on 2003-07-11:

> Regarding the 'A' APIs in Windows.  Do you mean that there should be
> some API to change the interpretation of strings in 'A' APIs (esp.
> regarding file names, etc.)?  If that were the case, the OS must
> speak Unicode in some form internally.  In my previous message I
> interpreted your talk about UTF-8 in 'A' APIs as all things are to
> be encoded in UTF-8 (instead of the language-specific encodings),
> which I thought could not be acceptable at the time of Windows 95.
>
For win95 the 'W' APIs were faked anyway, so this is not a big issue.
The internal representation of the OS would be the local encoding of
the specific win95 version (if that's the most they were ready to
implement at that stage).  There would indeed be some
forward-compatible API to change the interpretation of strings, but it
would only accept UTF-8, and OS calls would refuse to accept UTF-8
characters that are not in the local encoding.
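That acceptance rule can be sketched roughly as follows (a Python
illustration with cp936/GBK standing in for the local encoding; the
function name and shape are my own, not anything Windows actually
exposes):

```python
def a_api_accepts(name_bytes, local_enc="cp936"):
    """Hypothetical check for the forward-compatible 'A' API described
    above: accept input only if it is well-formed UTF-8 *and* every
    character is representable in the local encoding (here GBK/cp936)."""
    try:
        text = name_bytes.decode("utf-8")   # must be valid UTF-8
        text.encode(local_enc)              # must fit the local codepage
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return True

print(a_api_accepts("中文".encode("utf-8")))   # True: GBK has these characters
print(a_api_accepts("😀".encode("utf-8")))     # False: outside GBK
```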

On winNT and up, however, the UTF-8 locale would be made fully
functional (the internal OS encoding could be UTF-8, UTF-16 or whatever
they would like).  This way the whole issue would be far less painful.
Just like on Linux with UTF-8 :-).  BTW, win2k does have the option of
switching the encoding used in the 'A' APIs; it's just global and
requires a reboot.

Granted, it would take a lot of foresight to go this way, but the
choice they made is still bad.  In fact, they can still adopt UTF-8
for the 'A' functions.  I think it's a good idea even now.

> When talking about the file system, I really like NTFS much better.

Trusting your files to a complex, undocumented FS is a strange idea,
but I'm digressing ;-).

> POSIX file system is *too* simple.  I hate the fact that when I
> switch from en_US.UTF-8 to zh_CN.GB18030, the file names with
> characters beyond ASCII are corrupt.  If the file is on a Windows
> partition, it is possible to remount the partition in an appropriate
> encoding;

This doesn't solve the issue.  Don't you have filenames lying around
in playlists?  Makefiles?  A zillion other places?  At least on unix,
filenames fly around in text too frequently to afford a mismatch
between your text encoding and your filename encoding.  So you would
still need to go over all these file lists and recode them.
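To see the mismatch concretely, here is a minimal Python illustration
("歌曲", "song", is just an arbitrary Chinese filename of my choosing):

```python
# Filename bytes as written under a zh_CN.GB18030 locale:
name = "歌曲.mp3".encode("gb18030")
print(name.decode("gb18030"))        # fine under the old locale
try:
    name.decode("utf-8")             # how an en_US.UTF-8 playlist reader sees it
except UnicodeDecodeError:
    print("invalid UTF-8 -- the playlist entry is now garbage")
```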

> if it is on an EXT2/3 partition or on a CD-ROM, then I am out of
> luck.  Maybe the mount tool should do something to handle this? :-)
>
Probably.  Recoding at the mount level would be both possible and
transparent to programs' APIs.  An alternative would be a tool to
rename all files, run once when switching the locale.  I'm not aware
of either of these having been implemented, which is a hint: use
UTF-8 ;-).

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

> --- From Original Message from Jungshik Shin ---
> On Thu, 10 Jul 2003, Wu Yongwei wrote:
>
> > Jungshik Shin wrote:
> >
> > >   If MS had decided to use UTF-8 (instead of coming up with a whole new
> > > set of APIs for UTF-16) with  'A' APIs, Mozilla developers' headache(and
> ....
> > > UTF-8/'A' APIs vs UTF-16/'W' APIs and there are many other things to
> > > consider in case of Win32.
> >
>
> > It seems impossible because there are so many legacy
> > applications.  On the Simplified Chinese versions of Windows, 'A'
> > always implies GB2312/GBK.  Switching ALL to UTF-8 seems too
> > radical an idea around 1994.  At the time
>
> Using 'A' APIs and UTF-8 does not mean that 'A' APIs are made to
> work ONLY with UTF-8.  As you know well, 'A' APIs are basically
> APIs to deal with 'char *'.  As such, in theory, they can be used
> for any single- or multibyte encoding, including Windows 932, 936,
> 949, 950 and 6xxxx (I forgot the codepage designation for UTF-8).
>
> As Unix (e.g. Solaris and AIX, and to a lesser degree Linux) has
> demonstrated, a single application (written to support multibyte
> encodings) can work well both under legacy-encoding-based locales
> and under UTF-8 locales.
>
> > Microsoft adopted Unicode, people might truly believe UCS-2 is
> > enough for most application, and Microsoft had not the file name
> > compatibility burden in Unix
>
> Well, this is an orthogonal issue. POSIX file system is so 'simple'
> (which is a virtue in some aspects) that it doesn't have an inherent
> notion of 'codeset/encoding/charset'.  However, Windows doesn't use
> a POSIX file system, and using 'A' APIs does NOT mean that they
> couldn't use VFAT or NTFS, where filenames are in some form of
> Unicode.
>
> > (I suppose you all know that the long file names in Windows are in
> > UTF-16).
>
> Actually, VFAT documentation is so hard to come by that we can just
> speculate that it's UTF-16 (it could well be just UCS-2 in Windows
> 95).
>
> > I would not blame Microsoft for this.
>
> I wouldn't either, and I didn't mean to.  I believe they weighed all
> the pros and cons of different options and decided to go with their
> two-tiered API approach.  In my previous message, I just gave one
> downside of that approach, aggregating all the other arguments into
> a single phrase: 'there are many other things to consider.....'
>
> > Also consider the following fact:  Windows 95 emerged at a time
> > when many people had only 8MB of RAM. Yah, I don't think AT THAT
> > TIME we could tolerate a 50% growth in memory occupation.
>
> Windows 95/98/ME are not Unicode-enabled in many senses while Win
> 2k/XP (NT4 to a lesser degree) are [1].  Therefore, it was not an
> issue for Win95 in 1994/95 simply because Win95 still used legacy
> encodings.
>
> [1] Win 9x/ME is rather like POSIX system running under locales with
> legacy encodings whereas Win 2k/XP is similar to POSIX system
> running under UTF-8 locales.
>
>  Jungshik
>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
