At 5:19 PM -0800 1/18/01, Gisle Aas wrote:
>Paul Hoffman <[EMAIL PROTECTED]> writes:
> > { $OutString .= utf8(uchr(hex("0x$PartString"))); }
> > Why is uchr putting out UTF16 instead of UTF8 for the non-BMP character?
>
>Unicode::String is simply UTF16 internally. The ->length and ->substr
>methods all operate directly on the UTF16 representation without
>looking for surrogates. This is actually the wrong thing to do. If
>these were fixed to know about surrogates then I think this example
>would work as you expected. The ->hex function should probably also
>be made surrogate aware.
Fully agree. Given that Unicode 3.1 is about to come out with more than 40,000
characters outside of the BMP, doing this soon would be a Very Good
Thing.
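For reference, the surrogate arithmetic being discussed can be sketched in plain Perl. This is only an illustration of how UTF16 represents a non-BMP character, not code from Unicode::String; the helper names to_surrogates and from_surrogates are hypothetical.

```perl
use strict;
use warnings;

# Split a code point above U+FFFF into its UTF16 surrogate pair.
# (Hypothetical helper, not part of Unicode::String.)
sub to_surrogates {
    my ($cp) = @_;
    die "not a non-BMP code point\n" if $cp < 0x10000 || $cp > 0x10FFFF;
    my $v    = $cp - 0x10000;           # 20 significant bits remain
    my $high = 0xD800 + ($v >> 10);     # top 10 bits
    my $low  = 0xDC00 + ($v & 0x3FF);   # bottom 10 bits
    return ($high, $low);
}

# Recombine a high/low surrogate pair into a single code point.
sub from_surrogates {
    my ($high, $low) = @_;
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

my @units = to_surrogates(0x10400);
printf "U+%04X -> %s (%d UTF16 code units)\n",
    0x10400, join(" ", map { sprintf "%04X", $_ } @units), scalar @units;
# prints: U+10400 -> D801 DC00 (2 UTF16 code units)
```

A length or substr method that counts 16-bit units without looking for surrogates would treat this one character as two.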
> > Even if uchr is putting out UTF16, why isn't the utf8() call coercing
> > the value from UTF16 to UTF8?
>
>utf8() is actually converting from UTF8 to UTF16. uchr() is
>converting a numeric value to UTF16.
Well, a UTF8 version of uchr would be good, even if it has a new name.
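As a rough sketch of what such a function might look like, the following encodes a numeric code point straight to UTF8 bytes without going through UTF16. The name uchr_utf8 is hypothetical and is not part of Unicode::String; the code uses only Perl's built-in pack.

```perl
use strict;
use warnings;

# Encode one code point as a UTF8 byte string, with no UTF16 step.
# (Hypothetical helper, not part of Unicode::String.)
sub uchr_utf8 {
    my ($cp) = @_;
    die "code point out of range\n" if $cp < 0 || $cp > 0x10FFFF;
    die "surrogates are not characters\n" if $cp >= 0xD800 && $cp <= 0xDFFF;
    return pack("C*", $cp) if $cp < 0x80;                 # 1 byte
    return pack("C*", 0xC0 | ($cp >> 6),
                      0x80 | ($cp & 0x3F)) if $cp < 0x800;    # 2 bytes
    return pack("C*", 0xE0 | ($cp >> 12),
                      0x80 | (($cp >> 6) & 0x3F),
                      0x80 | ($cp & 0x3F)) if $cp < 0x10000;  # 3 bytes
    return pack("C*", 0xF0 | ($cp >> 18),                 # 4 bytes
                      0x80 | (($cp >> 12) & 0x3F),
                      0x80 | (($cp >> 6) & 0x3F),
                      0x80 | ($cp & 0x3F));
}

my $bytes = uchr_utf8(hex("10400"));
print join(" ", map { sprintf "%02X", ord } split //, $bytes), "\n";
# prints: F0 90 90 80
```

A non-BMP character comes out as a single four-byte sequence here, rather than as two three-byte sequences for the two surrogates.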
> > How do I get this to put out UTF8, which is what I need?
>
>The ->utf8 method should do that.
Sorry, I don't understand this. Do you mean changing this:
{ $OutString .= utf8(uchr(hex("0x$PartString"))); }
to this?
{ $OutString .= uchr(hex("0x$PartString"))->utf8; }
If so, that doesn't change the output at all. It is still a surrogate.
--Paul Hoffman