Re: Weird interaction of ord, split, and substr with UTF-8?

Paul Hoffman Tue, 31 Oct 2000 13:09:23 -0800

At 9:01 PM +0100 10/31/00, Andreas J. Koenig wrote:
>I'd highly recommend falling back to Unicode::String, there are too
>many bugs in all perls since the model was changed from marking code
>to marking strings.

This sounds reasonable to me. It was exciting to try, however!

>  You do not need UCS-4 for your example, there is
>$u->substr and $u->ord!

<thwack> (The sound of my palm hitting my forehead) You mean I should 
read past the first ten lines of the Unicode::String man page?!? :-) 
Yep, this looks exactly right. Now let's see if it works with real 
data. Thanks!

FWIW, I'm writing a program to do domain name preparation, which is 
being worked on in the IETF's IDN WG. I'm doing lowercasing and 
checking for prohibited characters myself, and handing off 
normalization to Martin Dürst's charlint.pl. I'll be making my 
program public, and will let this list know when I think it is 
somewhat ready.

--Paul Hoffman
