Re: utf8_heavy noise

Jarkko Hietaniemi Sun, 22 Jun 2003 14:18:59 -0700

On Sun, Jun 22, 2003 at 05:28:03PM -0400, Daniel Yacob wrote:
> 
> > For your information:
> > Unicode 4.0 adds two sets of decimal digits.  :-)
>  
> > 1946..194F    ; Nd #  [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE
> > 104A0..104A9  ; Nd #  [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
> 
> Thanks!  I wasn't aware of these additions.  I gave them a try
> but it appears Perl 5.8.0 was treating Unicode 4.0 chars as invalid.


In what way invalid?

> My GNOME terminal seemed to be converting Osmanya into something else
> also.

Unicode 4.0 came out this spring (about 9 months after Perl 5.8.0), so
I wouldn't be surprised if much software (or data, like fonts) isn't
yet updated for it.
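
To see what a given perl knows about, something like the following might help. It is only a sketch: is_decimal_digit() is a made-up helper name, and the result for the Limbu and Osmanya digits depends on which Unicode tables your perl ships with (5.8.0 predates Unicode 4.0).

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Hypothetical helper: returns 1 if the code point is a decimal digit (Nd)
# according to the Unicode tables bundled with this perl, 0 otherwise.
sub is_decimal_digit {
    my ($code_point) = @_;
    return chr($code_point) =~ /\A\p{Nd}\z/ ? 1 : 0;
}

# Report the Unicode version of the bundled tables, then probe
# ASCII zero, LIMBU DIGIT ZERO and OSMANYA DIGIT ZERO.
print "Unicode tables: ", Unicode::UCD::UnicodeVersion(), "\n";
for my $cp (0x0030, 0x1946, 0x104A0) {
    printf "U+%04X %s\n", $cp, is_decimal_digit($cp) ? "is Nd" : "is not Nd";
}
```

If the two new digits come back as "is not Nd", the tables are older than Unicode 4.0, which would explain characters looking "invalid".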

> I'd like to bring up another utf8 issue.  My scripts that work with
> utf8 text always seem to start with:
> 
> use utf8;
> if ( $] >= 5.007 ) {
>       binmode (STDOUT, ":utf8");
> }
> 
> 
> It would be nice if "use utf8" set IO modes for utf8 automagically.
> Perhaps a pragma could be passed such as:  use utf8 ':all'  (or something),
> that set everything to utf8 that is settable.

And fixing that in Perl 5.8.1 would help Perl 5.8.0 how? :-)

But more seriously, "use utf8" is "an evolutionary dead end".
The only thing it means these days is "my script is in UTF-8".
As for "all the other" things, I don't think there can ever be a
consensus on what "all" should cover, since there are so many things
that could be made UTF-8.  Better to be very explicit about the things
you want to "UTF-8-ize".
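
A minimal sketch of what "explicit" could look like, assuming Perl 5.8 or later with the Encode module. The helper names utf8_char_count() and to_utf8_bytes() are invented for illustration; they are not part of any pragma or module.

```perl
use strict;
use warnings;
use utf8;                        # only says: this source file is in UTF-8
use Encode qw(decode encode);

# Explicit per-handle layer: characters printed to STDOUT
# are written out as UTF-8.
binmode(STDOUT, ":utf8");

# Hypothetical helper: decode UTF-8 bytes explicitly and count characters.
sub utf8_char_count {
    my ($bytes) = @_;
    return length decode("UTF-8", $bytes);
}

# Hypothetical helper: encode a character string to UTF-8 bytes explicitly.
sub to_utf8_bytes {
    my ($chars) = @_;
    return encode("UTF-8", $chars);
}

my $bytes = "\xC3\xA9";          # the two UTF-8 bytes of U+00E9
printf "%d byte(s), %d character(s)\n", length($bytes), utf8_char_count($bytes);
```

Each thing that gets "UTF-8-ized" (the source, one filehandle, one piece of data) is named on its own line, so nothing depends on a pragma guessing what "all" means.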

> cheers,
> 
> /Daniel

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
