At 5:19 PM -0800 1/18/01, Gisle Aas wrote:
>Paul Hoffman <[EMAIL PROTECTED]> writes:
>  >          { $OutString .= utf8(uchr(hex("0x$PartString"))); }
>  > Why is uchr putting out UTF16 instead of UTF8 for the non-BMP character?
>
>Unicode::String is simply UTF16 internally.  The ->length and ->substr
>methods all operate directly on the UTF16 representation without
>looking for surrogates.  This is actually the wrong thing to do.  If
>these were fixed to know about surrogates then I think this example
>would work as you expected.  The ->hex function should probably also
>be made surrogate aware.

Fully agree. Given that Unicode 3.1 is about to come out with >40,000 
characters outside of the BMP, doing this soon would be a Very Good 
Thing.
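
To make "surrogate aware" concrete, here is a rough sketch of what a
length count might look like. It is plain Perl, not Unicode::String
code; the name utf16_codepoint_length is made up for illustration, and
it assumes the string is already available as a list of 16-bit code
units.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::String: count characters in a
# list of UTF-16 code units, treating a high surrogate (D800..DBFF) that is
# followed by a low surrogate (DC00..DFFF) as a single character.
sub utf16_codepoint_length {
    my @units = @_;
    my $count = 0;
    my $i     = 0;
    while ($i < @units) {
        my $u    = $units[$i];
        my $next = $i + 1 < @units ? $units[$i + 1] : -1;
        if (   $u >= 0xD800 && $u <= 0xDBFF
            && $next >= 0xDC00 && $next <= 0xDFFF) {
            $i += 2;    # well-formed surrogate pair: one character
        }
        else {
            $i += 1;    # BMP code unit, or an unpaired surrogate
        }
        $count++;
    }
    return $count;
}

# "A" followed by U+10000 (surrogates D800 DC00) is two characters,
# even though it is three code units.
print utf16_codepoint_length(0x0041, 0xD800, 0xDC00), "\n";
```

An unpaired surrogate is counted as one character here; whether it
should be an error instead is a design choice I am not trying to settle.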

>  > Even if uchr is putting out UTF16, why isn't the utf8() call coercing
>  > the value from UTF16 to UTF8?
>
>utf8() is actually converting from UTF8 to UTF16.  uchr() is
>converting a numeric value to UTF16.

Well, a UTF8 version of uchr would be good, even if it has a new name.
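
Something along these lines is what I have in mind. This is only a
sketch in plain Perl; the name codepoint_to_utf8 is hypothetical and is
not an existing Unicode::String function. It takes a numeric code point
and returns the UTF8 octets directly, with no UTF16 step in between.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::String: encode one numeric
# code point as a string of UTF-8 octets.
sub codepoint_to_utf8 {
    my ($cp) = @_;
    die "code point out of range\n" if $cp < 0 || $cp > 0x10FFFF;
    die "surrogate value is not a character\n"
        if $cp >= 0xD800 && $cp <= 0xDFFF;

    # 1 octet: 0xxxxxxx
    return pack("C", $cp) if $cp < 0x80;

    # 2 octets: 110xxxxx 10xxxxxx
    return pack("C2", 0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F))
        if $cp < 0x800;

    # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
    return pack("C3",
        0xE0 | ($cp >> 12),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F)) if $cp < 0x10000;

    # 4 octets (non-BMP): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return pack("C4",
        0xF0 | ($cp >> 18),
        0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F));
}

# U+10000, the first non-BMP character, comes out as four octets.
print join(" ", map { sprintf "%02X", $_ } unpack("C*", codepoint_to_utf8(0x10000))), "\n";
# prints "F0 90 80 80"
```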

>  > How do I get this to put out UTF8, which is what I need?
>
>The ->utf8 method should do that.

Sorry, I don't understand this. Do you mean change
         { $OutString .= utf8(uchr(hex("0x$PartString"))); }
to
         { $OutString .= uchr(hex("0x$PartString"))->utf8; }
If so, that doesn't change the output at all. It is still a surrogate.

--Paul Hoffman
