Dan Kogai <[EMAIL PROTECTED]> writes:

> --- perl-5.8.x/utf8.c Wed Nov 17 23:11:04 2004
> +++ perl-5.8.x.dan/utf8.c Sun Dec 5 11:38:52 2004
> @@ -429,6 +429,13 @@
>         }
>         else
>             uv = UTF8_ACCUMULATE(uv, *s);
> +       /* Checks if ord() > 0x10FFFF -- dankogai */
> +       if (uv > PERL_UNICODE_MAX){
> +           if (!(flags & UTF8_ALLOW_LONG)) {
> +               warning = UTF8_WARN_LONG;
> +               goto malformed;
> +           }
> +       }
>         if (!(uv > ouv)) {
>             /* These cannot be allowed. */
>             if (uv == ouv) {
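The quoted patch gates its new "ord() > 0x10FFFF" test on UTF8_ALLOW_LONG. An overlong sequence (a code point encoded with more bytes than its shortest form needs, e.g. 0xC0 0xAF for "/") and a code point above 0x10FFFF are independent properties. The following is a minimal standalone sketch, not perl's decoder in utf8.c, that gives each property its own flag. The DEMO_* names and demo_decode() are hypothetical, and the sketch handles only the original 1 to 6 byte UTF-8 forms.

```c
#include <stddef.h>

/* Hypothetical flags for this sketch only; NOT perl's UTF8_ALLOW_* values. */
#define DEMO_ALLOW_LONG  0x01  /* accept overlong (non-shortest) encodings */
#define DEMO_ALLOW_SUPER 0x02  /* accept code points above 0x10FFFF */

#define DEMO_UNICODE_MAX 0x10FFFFUL

enum demo_status { DEMO_OK, DEMO_MALFORMED, DEMO_OVERLONG, DEMO_SUPER };

/* Length of the shortest 1-6 byte encoding of uv. */
static int shortest_len(unsigned long uv)
{
    if (uv < 0x80UL)      return 1;
    if (uv < 0x800UL)     return 2;
    if (uv < 0x10000UL)   return 3;
    if (uv < 0x200000UL)  return 4;
    if (uv < 0x4000000UL) return 5;
    return 6;
}

/* Decode exactly one sequence of len bytes into *out and classify it. */
static enum demo_status demo_decode(const unsigned char *s, size_t len,
                                    unsigned flags, unsigned long *out)
{
    int expect;
    unsigned long uv;
    size_t i;

    if (len == 0) return DEMO_MALFORMED;
    if (s[0] < 0x80)      { expect = 1; uv = s[0]; }
    else if (s[0] < 0xC0) return DEMO_MALFORMED; /* stray continuation byte */
    else if (s[0] < 0xE0) { expect = 2; uv = s[0] & 0x1F; }
    else if (s[0] < 0xF0) { expect = 3; uv = s[0] & 0x0F; }
    else if (s[0] < 0xF8) { expect = 4; uv = s[0] & 0x07; }
    else if (s[0] < 0xFC) { expect = 5; uv = s[0] & 0x03; }
    else if (s[0] < 0xFE) { expect = 6; uv = s[0] & 0x01; }
    else return DEMO_MALFORMED;

    if ((size_t)expect != len) return DEMO_MALFORMED;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return DEMO_MALFORMED;
        uv = (uv << 6) | (s[i] & 0x3F);   /* accumulate 6 payload bits */
    }
    *out = uv;

    /* Two independent questions, each gated by its own flag. */
    if (shortest_len(uv) < expect && !(flags & DEMO_ALLOW_LONG))
        return DEMO_OVERLONG;
    if (uv > DEMO_UNICODE_MAX && !(flags & DEMO_ALLOW_SUPER))
        return DEMO_SUPER;
    return DEMO_OK;
}
```

With this split, passing DEMO_ALLOW_LONG alone still rejects F4 90 80 80 (U+110000), whereas the quoted patch would accept that sequence whenever UTF8_ALLOW_LONG is set.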
I think this patch is wrong, since UTF8_ALLOW_LONG is about allowing overlong sequences. What we need is a UTF8_ALLOW_SUPER flag (matching UNICODE_ALLOW_SUPER) that would indicate that code points past 0x10FFFF should be allowed. This would be the flag that UTF8_ALLOW_ANYUV should contain instead of UTF8_ALLOW_LONG. Unfortunately there is no more room for UTF8_ALLOW_* flags in the UTF8_ALLOW_ANY space, so we would have to add some bits to this mask, which would break binary compatibility with extensions that use the old UTF8_ALLOW_ANY value.

UTF8_ALLOW_FFFF should also allow 0x1FFFF, 0x2FFFF, etc., as well as the corresponding FFFE variants. This would match the UNICODE_ALLOW_FFFF behaviour. Currently it only allows 0xFFFF.

The UTF8_ALLOW_FDD0 flag to match UNICODE_ALLOW_FDD0 is also missing, but instead of introducing UTF8_ALLOW_FDD0 it seems better to collapse the *_ALLOW_FFFF and *_ALLOW_FDD0 flags into a single *_ALLOW_ILLEGAL and then make UNICODE_IS_ILLEGAL() match this.

--Gisle
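The single *_ALLOW_ILLEGAL flag proposed in this reply would cover two sets of noncharacters: U+FDD0..U+FDEF and the last two code points of every plane (U+FFFE/U+FFFF, U+1FFFE/U+1FFFF, and so on up to U+10FFFE/U+10FFFF). A minimal sketch of such a combined predicate follows, assuming hypothetical names demo_is_illegal(), demo_accept() and DEMO_ALLOW_ILLEGAL. It is not perl's UNICODE_IS_ILLEGAL() macro, and it leaves code points above 0x10FFFF to a separate "super" check.

```c
/* Hypothetical stand-in for the proposed single *_ALLOW_ILLEGAL flag. */
#define DEMO_ALLOW_ILLEGAL 0x04

/* True for the Unicode noncharacters: U+FDD0..U+FDEF, plus U+xxFFFE and
 * U+xxFFFF in every plane 0..16. Values above 0x10FFFF return false here;
 * they belong to a separate "super" check. */
static int demo_is_illegal(unsigned long uv)
{
    if (uv >= 0xFDD0UL && uv <= 0xFDEFUL)
        return 1;
    /* Masking off bit 0 maps both ...FFFE and ...FFFF to ...FFFE. */
    return (uv & 0xFFFEUL) == 0xFFFEUL && uv <= 0x10FFFFUL;
}

/* One flag controls both kinds of noncharacter. */
static int demo_accept(unsigned long uv, unsigned flags)
{
    return !demo_is_illegal(uv) || (flags & DEMO_ALLOW_ILLEGAL) != 0;
}
```

Testing the low 16 bits with a mask, rather than comparing against 0xFFFF only, is what extends the check to 0x1FFFF, 0x2FFFF and the FFFE variants in one step.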