> > Once you have a regex library that handles codepoints, the code that uses
> > it doesn't have to care about them in particular.
>
> It's not so simple. Suppose you have the byte sequence (decimal) 65 195 129
> 66. (This is the beginning of the Hungarian alphabet, AÁB..., encoded in
> UTF-8.)

Why is it not so simple? I just want to know two basic things: does it
match or not, and what range of bytes in the string was matched.

I don't care what the regex library does under the covers, and I
shouldn't have to care...
If it did its job right, I can safely extract substrings on those boundaries.

If it knows how to match "Á" against ".", then I don't have to know how
it goes about doing so.
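As a sketch of the point above: decode the bytes once at the boundary, let the regex engine work in codepoints, and map the match offsets back to byte offsets only when you need them. This uses Python's stdlib `re` rather than any particular Perl setup; the offset-mapping step is my own illustration, not a library feature.

```python
import re

# The bytes from the example: "AÁB", the start of the Hungarian alphabet,
# encoded in UTF-8 as (decimal) 65 195 129 66.
data = bytes([65, 195, 129, 66])

# Decode once; from here on the regex engine sees codepoints, not bytes.
text = data.decode("utf-8")          # 3 codepoints: A, Á, B

# "." matches exactly one codepoint, so it matches the two-byte "Á".
m = re.search("A(.)B", text)
assert m is not None
assert m.group(1) == "\u00c1"        # Á

# Map codepoint offsets back to byte offsets for safe byte-level slicing.
cp_start, cp_end = m.start(1), m.end(1)
byte_start = len(text[:cp_start].encode("utf-8"))
byte_end = len(text[:cp_end].encode("utf-8"))

# The matched byte range is [1, 3): the two bytes 195 129.
assert (byte_start, byte_end) == (1, 3)
assert data[byte_start:byte_end].decode("utf-8") == "Á"
```

The calling code never inspects individual bytes; the decode/encode pair at the edges is the only place the encoding is visible.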
Even better if the regex engine handles both normalization forms
transparently. My code should never have to care. I shouldn't have to
jump through hoops, calling all sorts of fancy "binmode" settings or
performing "Encode::decode" incantations everywhere just to turn my
scalars back into plain old strings.
