Using Unicode-String-2.06, I have the following test program:

=====

#!/usr/bin/perl -w

use Unicode::String qw(utf8 utf16 uchr);
Unicode::String->stringify_as('utf8');

# Each entry is a whitespace-separated list of code points in hex.
@TestArr = ("0061 0062", "0063 12345");

foreach $TheString (@TestArr) {
     @AllHexIn = split(/\s+/, $TheString);

     # Build the string one code point at a time.
     $OutString = '';
     foreach $PartString (@AllHexIn)
         { $OutString .= utf8(uchr(hex("0x$PartString"))); }

     $TheLen = utf8($OutString)->length;

     # Dump each unit of the string, as reported by length and substr.
     $HexOfInput = '';
     for ($i=0; $i<utf8($OutString)->length; $i++) {
         $HexOfInput .= utf8($OutString)->substr($i, 1)->hex . ' | ';
     }
     print "$TheString  $TheLen    $HexOfInput\n";
}

=====

The output is:

0061 0062  2    U+0061 | U+0062 |
0063 12345  3    U+0063 | U+d808 | U+df45 |
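
For what it's worth, the two extra values in the second line look like the UTF-16 surrogate pair for U+12345. A minimal sketch of that arithmetic in plain Perl, where surrogate_pair is my own illustrative helper and not part of Unicode::String:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not part of Unicode::String): split a code point
# above U+FFFF into its UTF-16 high and low surrogates.
sub surrogate_pair {
    my ($cp) = @_;
    my $offset = $cp - 0x10000;               # 20-bit value
    my $high   = 0xD800 + ($offset >> 10);    # top 10 bits
    my $low    = 0xDC00 + ($offset & 0x3FF);  # bottom 10 bits
    return ($high, $low);
}

# Prints the same two values that appear after U+0063 above.
printf "U+%04x | U+%04x |\n", surrogate_pair(0x12345);
```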

Why does uchr produce UTF-16 instead of UTF-8 for the non-BMP character?

Even if uchr does produce UTF-16, why doesn't the utf8() call coerce
the value from UTF-16 to UTF-8?

How do I get this to produce UTF-8, which is what I need?

--Paul Hoffman
