As you probably know, Perl's version of UTF-8 is not the real thing. I thought I would hack up a patch to support the encoding as defined by Unicode. That involves rejecting illegal characters (such as surrogates, "\x{FFFF}" and "\x{FDD0}"), characters above 0x10FFFF, overlong sequences, and so on.
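To make the restrictions concrete, here is a minimal sketch of the code point test a strict encoder would need. The function name is_strict_code_point is hypothetical and not part of Encode, and I have assumed that "\x{FFFF}" and "\x{FDD0}" stand for the full set of Unicode noncharacters. Overlong sequences are a property of the byte stream, so they would have to be caught on the decode side and are not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode: returns true if the code
# point $cp would be allowed in the strict form of UTF-8.
sub is_strict_code_point {
    my ($cp) = @_;
    return 0 if $cp > 0x10FFFF;                  # beyond the Unicode range
    return 0 if $cp >= 0xD800 && $cp <= 0xDFFF;  # UTF-16 surrogates
    return 0 if $cp >= 0xFDD0 && $cp <= 0xFDEF;  # noncharacter block
    return 0 if ($cp & 0xFFFE) == 0xFFFE;        # U+nFFFE and U+nFFFF
    return 1;
}

# Report a few sample code points when run as a script.
printf "U+%04X %s\n", $_, is_strict_code_point($_) ? "ok" : "rejected"
    for 0x41, 0xD800, 0xFDD0, 0xFFFF, 0x10FFFF, 0x110000;
```

The relaxed encoding accepts all of the sample values above, so this check marks exactly where the two forms would differ for encode.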
Before I do this I would like to get some feedback on the interface. My preferred interface would be to make encode("UTF-8", $string) imply the official restricted form, and to have encode("UTF-8-Perl", $string) be the name for Perl's relaxed and extended version of the encoding. The encode_utf8($string) function would continue to be the same as encode("UTF-8-Perl", $string). This implies that encode("UTF-8", $string) can start failing where previously it could not.

Another approach would be to add an FB_STRICT flag that could be passed in the CHECK argument. I'm not sure this would make sense for any encoding besides UTF-8, though.

Other suggestions or comments?

Regards,
Gisle