Re: perlunicode comment - when Unicode does not happen

Nick Ing-Simmons Sun, 28 Dec 2003 08:57:27 -0800

Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
>>    Let's not 'fix' it (not carve it on a stone), but offer a few
>> well-thought-out options. For instance, Perl may offer (not that these
>> are particularly well-thought-out) 'just treat this as a sequence of
>> octets', 'locale', and 'unicode'. 'locale' on Unix means multibyte
>> encoding returned by  nl_langinfo(CODESET) or equivalent.  On Windows,
>> it's whatever 'A' APIs accept or is returned by ACP_??().  'unicode'
>> is utf8 on Unix-like OS, BeOS and 'utf-16(le)' on Windows.
>
>Something like that could work, yes.


Agreed.
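
For what it is worth, here is a rough sketch of how those three options
might be told apart. It is written in Python purely for illustration and is
not a proposal for Perl's actual interface. The names filename_codec and
encode_filename and the policy strings are hypothetical, and
locale.getpreferredencoding() stands in for nl_langinfo(CODESET) on Unix
and the ANSI code page on Windows.

```python
import locale
import sys


def filename_codec(policy, platform=sys.platform):
    """Return the codec used for filenames under a policy.

    None means "treat the name as a sequence of octets".
    The policy names are hypothetical, taken from the options quoted above.
    """
    if policy == "octets":
        return None
    if policy == "locale":
        # Assumption: stands in for nl_langinfo(CODESET) or the 'A' API code page.
        return locale.getpreferredencoding(False)
    if policy == "unicode":
        # UTF-16LE on Windows, UTF-8 elsewhere, as in the quoted proposal.
        return "utf-16-le" if platform.startswith("win") else "utf-8"
    raise ValueError("unknown filename policy: %r" % (policy,))


def encode_filename(name, policy, platform=sys.platform):
    """Turn a filename into the bytes handed to the OS under a policy."""
    codec = filename_codec(policy, platform)
    if codec is None:
        if not isinstance(name, bytes):
            raise TypeError("the 'octets' policy expects bytes")
        return name  # passed through untouched
    return name.encode(codec)
```

The point of keeping the policy explicit is that the caller chooses it,
rather than having it guessed from the locale.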

>
>> creating files with UTF-8 names while still using en_GB.ISO-8859-1
>> locale. Why does Perl have to be held responsible for your intentional
>> act that is bound to break things?
>
>Whoa!  It's the other way round here.  Nick is using a locale that suits
>him for other reasons (e.g. getting time and date formats in proper
>British ways), but why should he be constrained not to use for his
>filenames whatever he wants?

I was at least partly playing devil's (UTF-8) advocate anyway, and to that
end Jungshik Shin's suggestion to use a UTF-8 locale is a positive one.
When I want non-ASCII it is for one of the following:
  Phonetics for the speech synthesis stuff
  Representing the Euro currency symbol
  Typesetting my mother-in-law's Welsh poetry
  Cross-references for Japanese customers of my day job

There is no "locale" for phonetics. There is one for Euro issues of course,
but setting my locale to "cy_GB" so I can name files by poem is going
to render dates and the like opaque to me, the user; likewise
for Japanese. So for _my_ use UTF-8 is what I want - but I _don't_ want
some locale-derived multi-byte guess. Unicode suits me.


>
>>   Well, actually, if your WinXP file system has only characters covered
>> by Windows-1252,

Well AFAIK there isn't a Windows code page that covers Welsh accented
characters (and certainly not if you mix in phonetics).
The shared drives I mount at work have users who are native speakers
of not only English, Italian, Norwegian and Swedish, but also two kinds of
Chinese and various Indian languages - and we have Japanese customers,
so even in a small English startup cp1252 does not give them
the freedom to give files natural names.
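
A quick way to check this kind of coverage claim, sketched in Python for
illustration only (the helper name unencodable is made up; "cp1252" is
Python's standard codec name for Windows-1252):

```python
def unencodable(name, codec="cp1252"):
    """List the characters of name that the given codec cannot represent."""
    missing = []
    for ch in name:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Welsh "dŵr" (water), an IPA symbol, and a Japanese word as sample names.
for sample in ("d\u0175r", "\u0283", "\u65e5\u672c"):
    print(sample, unencodable(sample))
```

Each of those samples has at least one character that cp1252 cannot
encode, while all of them encode without trouble as UTF-8.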

>
>And how would Nick know that, or how could he guarantee that, if the
>Windows share is in multiuser use?
>
>PLEASE, PEOPLE: stop thinking of this in terms of an environment
>controlled solely by one user.

Exactly - a file system should be able to cope even if files
are named in English, Welsh, Chinese, ...

So IMHO Perl's -d etc. should be helping the move to Unicode, not
pandering to multi-byte compromises. I have no objection to some
way to name files in Shift-JIS if that has been done, but I hope
"unicode" becomes the common practice.
