On Thu, Mar 29, 2007 at 04:24:39PM -0400, Rich Felker wrote:

Hi,

> Using accented characters in your own language has always been
> possible with legacy codepage locales

Of course.

> The only thing that's not
> possible in legacy codepage locales is handling text from other
> languages that need characters not present in your codepage.

You say it's not possible? Just launch firefox/opera/konqueror or any other
modern browser with a legacy locale and see whether it displays all foreign
letters. It _does_, though you believe it's "not possible".

But let's reverse the whole story. I write a homepage in Hungarian, using
either the latin2 or the utf8 charset. Someone who lives in Western Europe,
America, Asia, the Christmas Island... anywhere else happens to visit this
page. It's not only important for him, it's also important for me that my
accents get displayed correctly there, under a locale unknown to me. And
luckily that's how all good browsers work. I can't see why you're reasoning
that this shouldn't (or mustn't?) work.

> At this point anyone who wants multilingual text support should be
> using UTF-8 natively,

At this point everyone should be using UTF-8 natively. But not everybody
does.

> I’ve always used UTF-8 since I started with Linux; until recently it
> was just restricted to the first 128 characters of Unicode, though. :)

:-)

> UTF-8 has been around for almost 15 years now, longer than any real
> character-aware 8bit locale support on Linux. It was a mistake that
> 8bit locales were ever implemented on Linux. If things had been done
> right from the beginning we wouldn't even be having this discussion.

I agree.

> I’m sure you did have legitimate reasons to use Latin-2 when you did,
> namely broken software without proper support for UTF-8.

Yes.

> Here’s where
> we have to agree to disagree I think: you’re in favor of workarounds
> which get quick results while increasing the long-term maintenance
> cost and corner-case usability, while I’m in favor of omitting
> functionality (even very desirable functions) until someone does it
> right, with the goal of increasing the incentive for someone to do it
> right.

Imagine the following. Way back in 1996 I had to create homepages, text
files, LaTeX documents etc. that contained Hungarian accented characters.
There were two ways to go. One was to use the legacy 8-bit encoding
(iso-8859-2 for Hungarian), the other was to fix software to work with UTF-8.
(Oh, there's a third way for homepages and LaTeX files: use their disgusting
escaping mechanisms.) In the first case I'm done with my job in a few
minutes. In the second case I would have had to fix dozens (maybe even
hundreds) of pieces of software, a job that has taken all the developers
around the world more than 10 years and still isn't finished. Imagine that
my boss asked me to create a homepage and I answered him "okay, but first I
have to fix a complete operating system with its applications, I'll be ready
in N years". Are you still convinced that the 1st solution was just a
"workaround" that increased long-term maintenance cost? You would clearly be
right if software had been usable with UTF-8 in those days, but the fact is
that it wasn't.

Nowadays, when ninety-some percent of software is ready for UTF-8, you can
force UTF-8 and fix the few remaining applications if needed. This approach
-- though it might have been theoretically better -- just couldn't have
worked in the last century, when only a small fraction of software supported
UTF-8.

> Nonsense. If you don’t have kanji fonts installed then it can’t
> display kanji anyway. Not having a compatible encoding is a comparable
> obstacle to not having fonts.

You're mixing up two things: installed (having support for it) vs. default.

Of course, if you have no kanji fonts installed then you won't expect an
application to display them. If you don't have UTF-8 support _available_
then no-one expects software to handle it. But if UTF-8 support is
_available_, just not set as the _default_, then it's reasonable to expect
that software which needs it uses it despite the default locale.

> I see no reason that a system without
> support for _doing_ anything with Japanese text should be able to
> display it.

See? You're talking about "support" too. If I set LC_ALL=hu_HU.ISO-8859-2
then my system still _supports_ UTF-8 and kanji, it's just not the
default.
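
A quick sketch to illustrate the point (assuming glibc, with the
hu_HU.ISO-8859-2 locale generated; the locale name and the calls are just my
example of the idea, nothing more): the _default_ codeset is ISO-8859-2, yet
UTF-8 conversion is still _available_ on the very same system.

/* Sketch: the system's default charset is Latin-2, yet UTF-8 support
 * is still installed and usable. Assumes glibc and that the
 * hu_HU.ISO-8859-2 locale has been generated. */
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>
#include <iconv.h>

int main(void)
{
    if (!setlocale(LC_ALL, "hu_HU.ISO-8859-2"))
        fprintf(stderr, "locale not generated on this box\n");

    /* The _default_ codeset is ISO-8859-2... */
    printf("default codeset: %s\n", nl_langinfo(CODESET));

    /* ...but UTF-8 conversion is still _available_: */
    iconv_t cd = iconv_open("ISO-8859-2", "UTF-8");
    printf("UTF-8 -> ISO-8859-2 conversion is %s\n",
           cd == (iconv_t)-1 ? "missing" : "available");
    if (cd != (iconv_t)-1)
        iconv_close(cd);
    return 0;
}

An application is free to use that available support (or the wchar
functions) no matter what the default locale says.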

> What happens if you copy and paste it from your browser
> into a terminal or text editor???

Minor detail. Try it and see what happens. Depending on the terminal
emulator, it might skip unknown characters, replace them with a question
mark, or refuse to insert anything if the clipboard/selection contains an
out-of-locale character. It's not important at all.
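
For what it's worth, those three behaviors map pretty well onto what the
conversion itself can do. A sketch using glibc's iconv (the //TRANSLIT and
//IGNORE suffixes are glibc extensions, and the sample selection is made up
by me):

/* Sketch: converting a pasted UTF-8 selection into the ISO-8859-2
 * locale charset. The three variants roughly match what different
 * terminal emulators do with out-of-locale characters.
 * //TRANSLIT and //IGNORE are glibc extensions. */
#include <stdio.h>
#include <string.h>
#include <iconv.h>

static void convert(const char *to, const char *input)
{
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1) {
        perror(to);
        return;
    }
    char inbuf[256], outbuf[256];
    strcpy(inbuf, input);
    char *in = inbuf, *out = outbuf;
    size_t inleft = strlen(inbuf), outleft = sizeof outbuf - 1;
    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    *out = '\0';
    /* Note: outbuf is ISO-8859-2, so it looks garbled on a UTF-8 terminal. */
    printf("%-24s -> \"%s\"%s\n", to, outbuf,
           rc == (size_t)-1 ? "  (iconv reported an error)" : "");
    iconv_close(cd);
}

int main(void)
{
    /* "árvíztűrő 漢" in UTF-8: Latin-2 covers the accents, not the kanji. */
    const char *sel = "\xc3\xa1rv\xc3\xadzt\xc5\xb1r\xc5\x91 \xe6\xbc\xa2";

    convert("ISO-8859-2", sel);            /* stops at the kanji */
    convert("ISO-8859-2//TRANSLIT", sel);  /* substitutes '?' for it */
    convert("ISO-8859-2//IGNORE", sel);    /* drops it from the output */
    return 0;
}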

> Even the Unicode standards talk about “supported subset” and give
> official blessing to displaying characters outside the supported
> subset as a ? or replacement glyph or whatever.

Sure. But I still see no reason why any application should restrict its own
"supported subset" to the current locale's charset if it has no sane reason
to do so.



-- 
Egmont

