Thanks, applied in my repository. New tests and a documentation fix are in progress. When I am done w/ that, I will release Encode-2.0901 on my website (not CPAN yet). Once the porters have finished cross-checking, I will release Encode-2.10.
Dan the Encode Maintainer
I am now writing test suites and have found that some of the strictures are missing.
Surrogate -- OK
% perl -Mblib -MEncode -le '$a="\x{d801}"; print encode("UTF-8", $a, 1)'
"\x{d801}" does not map to utf8 at /gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
U+FFFF -- OK
% perl -Mblib -MEncode -le '$a="\x{ffff}"; print encode("UTF-8", $a, 1)'
"\x{ffff}" does not map to utf8 at /gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
Chars above U+10FFFF -- NOT OK
%> perl -Mblib -MEncode -le '$a="\x{11ffff}"; print encode("UTF-8", $a, 1)'
????
Since Gisle's patch makes use of utf8n_to_uvuni(), this seems to be a problem in the perl core. So I checked utf8.c, which defines that function. It does not seem to make use of PERL_UNICODE_MAX.
The patch against utf8.c at the end of this message fixes that.
> ~/danperl/bin/perl5.8.6 -Mblib -MEncode -le '$a="\x{11FFFF}"; print encode("UTF-8", $a, 1)'
"\x{00f4}" does not map to utf8 at /gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
As you can see, the warning is still funny. In fact, every case that goes through UTF8_WARN_LONG behaves oddly, as follows:
> perl -Mblib -MEncode -le '$a="\x{7fff_ffff}"; print encode("UTF-8", $a, 1)'
??????
> perl -Mblib -MEncode -le '$a="\x{8000_0000}"; print encode("UTF-8", $a, 1)'
"\x{00fe}" does not map to utf8 at /gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
I have tracked this down and found that the warning is handled by Encode, so Gisle and I can fix that.
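One possible reading of the odd values in these warnings, offered as an assumption and not something confirmed in this thread: 0xF4 and 0xFE are the first bytes of the UTF-8 encodings (Perl's extended form, for the larger value) of 0x11FFFF and 0x8000_0000, so the message may be showing the first byte of the sequence instead of the code point. A minimal C sketch that computes that lead byte follows; lead_byte() is a hypothetical helper written for illustration, not a function from utf8.c or Encode.

```c
#include <stdint.h>

/*
 * Hypothetical helper: first byte of the UTF-8 encoding of cp, using the
 * extended layout (up to 7 bytes) that Perl accepts for values < 2**36.
 */
static unsigned lead_byte(uint64_t cp)
{
    if (cp < 0x80)       return (unsigned)cp;                /* 1 byte  */
    if (cp < 0x800)      return 0xC0 | (unsigned)(cp >> 6);  /* 2 bytes */
    if (cp < 0x10000)    return 0xE0 | (unsigned)(cp >> 12); /* 3 bytes */
    if (cp < 0x200000)   return 0xF0 | (unsigned)(cp >> 18); /* 4 bytes */
    if (cp < 0x4000000)  return 0xF8 | (unsigned)(cp >> 24); /* 5 bytes */
    if (cp < 0x80000000) return 0xFC | (unsigned)(cp >> 30); /* 6 bytes */
    return 0xFE;         /* 7 bytes: the lead byte carries no payload bits */
}
```

With this helper, lead_byte(0x11FFFF) is 0xF4 and lead_byte(0x80000000) is 0xFE, which matches the "\x{00f4}" and "\x{00fe}" shown in the two warnings above.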
Dan the Encode Maintainer
--- perl-5.8.x/utf8.c Wed Nov 17 23:11:04 2004
+++ perl-5.8.x.dan/utf8.c Sun Dec 5 11:38:52 2004
@@ -429,6 +429,13 @@
         }
         else
             uv = UTF8_ACCUMULATE(uv, *s);
+        /* Checks if ord() > 0x10FFFF -- dankogai */
+        if (uv > PERL_UNICODE_MAX){
+            if (!(flags & UTF8_ALLOW_LONG)) {
+                warning = UTF8_WARN_LONG;
+                goto malformed;
+            }
+        }
         if (!(uv > ouv)) {
             /* These cannot be allowed. */
             if (uv == ouv) {
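The idea behind the patch can be sketched outside of perl as a small standalone decoder. This is only an illustration under stated assumptions, not the code in utf8.c: sketch_decode() and SKETCH_UNICODE_MAX are hypothetical names (the latter standing in for PERL_UNICODE_MAX), the decoder handles only 1 to 4 byte sequences, and it does not reject surrogates or noncharacters such as U+FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PERL_UNICODE_MAX. */
#define SKETCH_UNICODE_MAX 0x10FFFFUL

/*
 * Decode one UTF-8 sequence of 1 to 4 bytes from s[0..len-1].
 * Returns 0 and stores the code point in *out on success, or -1 if the
 * sequence is truncated, malformed, overlong, or above SKETCH_UNICODE_MAX.
 */
static int sketch_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t uv;

    if (len == 0)
        return -1;
    if (s[0] < 0x80)                { need = 1; uv = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; uv = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; uv = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; uv = s[0] & 0x07; }
    else
        return -1;                  /* stray continuation or longer form */
    if (len < need)
        return -1;

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        uv = (uv << 6) | (s[i] & 0x3F);
    }

    if (uv < min_for_len[need])
        return -1;                  /* overlong encoding */
    if (uv > SKETCH_UNICODE_MAX)
        return -1;                  /* the range check the patch is about */

    *out = uv;
    return 0;
}
```

Under these assumptions the bytes F4 8F BF BF decode to 0x10FFFF, while F4 9F BF BF (which would be 0x11FFFF) are rejected by the final range check.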