Re: How to handle unicode strings in utf8 and pre-utf8 pragma perls

Richard Evans Sun, 01 Jun 2003 01:33:39 -0700

David Graff wrote:

> If I understand Nicholas Clark's suggestion, it would mean that for any
> perl version prior to 5.8.0, the script won't compile unless "if.pm"
> has been installed from CPAN.
> 
> The fact that "if.pm" exists and is usable on older perl5 versions is
> really good news, but it still might be a hurdle for some users who
> depend on remote web-server sys-admins (or other uncontrollable forces)
> for perl support...


But as the modules I'm writing are not core perl modules, they'd have to be
installed anyway - I guess that's a problem whatever way I do it.

I've got to say if.pm looks like a brilliantly simple way of handling my
problem.
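
For reference, a minimal sketch of how the if.pm approach might look. The
5.006 threshold and the sample string are my own assumptions, not
necessarily the exact line Nicholas suggested, and the file has to be
saved as UTF-8 for the literal to mean anything:

```perl
use strict;

# Load the utf8 pragma only where it exists. The 5.006 cut-off is an
# assumption; if.pm itself must be installed from CPAN on older perls.
use if $] >= 5.006, 'utf8';

# Hypothetical constant: U+04E9 typed directly as a UTF-8 literal.
my $barred_o = "ө";

# With the pragma active the literal is one character; without it,
# perl sees the two raw bytes of the UTF-8 encoding.
sub char_count { return length $_[0] }

print "characters in literal: ", char_count($barred_o), "\n";
```

The appeal is that the version test stays on one line at the top of the
module and the string constants themselves don't change.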

> In any case, one work-around for handling utf8 text in a version-neutral
> way would be to store this text in a file, not hard-coded into the perl
> script; then decide how to read the file, depending on the version; e.g.
> 
>  open( DAYS, "day_names.utf8" ) or die "Cannot open day_names.utf8: $!";
>  binmode( DAYS, ":utf8" ) if ( $] >= 5.008 );
>  @day_names = <DAYS>;
>  close DAYS;
> 
> Depending on what you do with the data elsewhere in your script, I'm not
> sure whether 5.6 will treat the data as utf8 characters when read from
> a file like this (5.6 does not support "binmode FH, ':utf8'"), but
> there's a good chance that it will work.
> 
> You can also attach this text content at the end of your script, in a
> __DATA__ segment, and set DATA as the file handle in the code sample
> shown above (rather than DAYS).
> 
> Of course even using __DATA__, it can get tedious and hard to maintain
> if you have a lot of little string constants scattered throughout.

Thanks - these are useful ideas which I'll use in some other modules I'm
doing, but if.pm just feels right for what I'm trying to do ATM.
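
For those other modules, the file-based idea could be wrapped up roughly
like this. It is only a sketch: read_day_names is a hypothetical helper
name, the two sample entries are placeholders rather than real day names,
and the demo part assumes File::Temp is available:

```perl
use strict;
use File::Temp qw(tempfile);

# Read one entry per line, decoding as UTF-8 only where perl has I/O layers
# (5.8 and later). Older perls get the raw bytes back unchanged.
sub read_day_names {
    my ($path) = @_;
    local *DAYS;
    # Two-argument open keeps this usable on perls older than 5.6.
    open( DAYS, "< $path" ) or die "Cannot open $path: $!";
    binmode( DAYS, ":utf8" ) if $] >= 5.008;
    my @names = <DAYS>;
    close DAYS;
    chomp @names;
    return @names;
}

# Demo: write two placeholder entries as raw UTF-8 bytes, then read them back.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;
print $fh "\xD3\xA9\n";    # U+04E9 encoded as UTF-8
print $fh "\xD2\xAF\n";    # U+04AF encoded as UTF-8
close $fh;

my @day_names = read_day_names($path);
print scalar(@day_names), " entries read\n";
```

Keeping the version check inside one helper means the rest of the module
never needs to know which perl it is running on.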

> (P.S.: for some reason, three of the characters in your first string
> didn't map to proper Cyrillic code points for me: \u04e9 and the two
> occurrences of \u04af -- I don't know the language, but were those
> typos?)

Ah, I picked the example at random - I'm using data from the OpenI18N/ICU
locales, and looking at the Kirghiz locale using the IBM ICU
LocaleExplorer:

  http://oss.software.ibm.com/cgi-bin/icu/lx/en/utf-8/?_=ky

I see the same result. The page also carries this note at the top:

"Note: You're viewing an experimental locale. This locale is not part of the
official ICU installation! Please do not file bugs against this locale"

So who knows!
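
One way to check those two code points independently of any font or
browser is to ask perl for their Unicode names. A rough sketch, assuming
perl 5.8 or later (charnames::viacode is not available before that); both
values do sit inside the Cyrillic block, U+0400 to U+04FF:

```perl
use strict;
use charnames ':full';

# The two code points queried in the P.S. above.
my @suspects = ( 0x04E9, 0x04AF );

for my $code (@suspects) {
    # viacode returns undef when the code point has no assigned name.
    my $name = charnames::viacode($code);
    printf "U+%04X %s\n", $code, defined $name ? $name : '(no name assigned)';
}
```

If a name comes back, the character is assigned and any display problem is
more likely down to the font than to a typo in the data.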

I hate having to use languages that I don't understand and, based on
feedback so far, there are problems with the ICU data as it stands.

But I suppose a "comprehensive" set of locale date modules consisting of
English and basic French wouldn't be quite so useful ;->

Thanks for the feedback,
-- 
Richard Evans
[EMAIL PROTECTED]
