Re: utf8::upgrade,utf8::encode and utf8::is_utf8 on EBCDIC platform

SADAHIRO Tomoyuki Thu, 01 Sep 2005 15:18:43 -0700

Hello.
I think it is correct.

On EBCDIC platforms, perl uses UTF-EBCDIC instead of UTF-8,
although perl still calls it "utf8".


In general, chr(0xFF) (equal to "\xFF") in EBCDIC encodings
corresponds to U+009F, which is a single-octet control character;
thus the single-octet sequence "\xFF" is well-formed in UTF-EBCDIC too.
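
A minimal sketch of the two calls from the question, using only core
perl. According to the outputs reported in this thread, both numbers
should be 2 on an ASCII platform and 1 on an EBCDIC platform. The
variable names are only for illustration.

```perl
use strict;
use warnings;

# Native code point 0xFF: U+00FF on ASCII platforms and, per this
# message, U+009F on EBCDIC platforms in general.
my $upgraded = chr(0xFF);

# utf8::upgrade returns the number of octets of the UTF-X form.
my $octets = utf8::upgrade($upgraded);

# utf8::encode replaces the characters with their UTF-X octets in place.
my $encoded = chr(0xFF);
utf8::encode($encoded);

printf "upgrade returned %d\n", $octets;
printf "length after encode: %d\n", length($encoded);
```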

If you want to convert an integer to a character according to
Unicode scalar values, you can use pack('U') rather than chr().
For example, pack('U', 0xFF) should correspond to U+00FF
(y with diaeresis) everywhere (both on ASCII and on EBCDIC).
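
A small sketch of the difference, assuming a perl that provides
utf8::native_to_unicode (the variable names are only for illustration).
On an ASCII platform both lines should report the same code point; on
an EBCDIC platform the chr() line is expected to differ, as described
above.

```perl
use strict;
use warnings;

my $native  = chr(0xFF);        # character whose native code point is 0xFF
my $unicode = pack('U', 0xFF);  # character whose Unicode code point is U+00FF

# utf8::native_to_unicode maps a native code point to its Unicode value;
# on ASCII platforms it returns its argument unchanged.
printf "chr(0xFF)       is U+%04X\n", utf8::native_to_unicode(ord($native));
printf "pack('U', 0xFF) is U+%04X\n", utf8::native_to_unicode(ord($unicode));
```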

Regards,
SADAHIRO Tomoyuki

> Hi,
> 
> These are the test cases I'm running on an EBCDIC platform:
> 
> my $b = chr(0x0FF);
> $p=utf8::upgrade($b);
> print "\n$p";
> 
> utf8::upgrade returns the number of octets necessary
> to represent the string as UTF-X.
> 
> The EBCDIC output is 1, whereas the ASCII platform output is 2.
> Is the return value I'm getting on EBCDIC correct?
> 
> my $c=chr(0x0FF);
> print "before $c\n";
> print "\n";
> utf8::encode($c);
> print "after $c\n";
> print length($c);
> 
> On ASCII, the string is a single-octet representation before
> encode and two bytes after it; length is 2.
> 
> On EBCDIC, it is a single octet both before and after encode,
> and length is 1.
> 
> Is this correct on EBCDIC, or is it a bug in the code for
> EBCDIC?
> 
> utf8::is_utf8 tests whether STRING is in UTF-8, so is 0x0FF
> UTF-8 on EBCDIC?
