Re: Encode utf-16 problem
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Tuesday, Dec 3, 2002, at 11:24 Asia/Tokyo, Dan Kogai wrote:
>> Aw. You can't use 'utf16' for "use encoding" or PerlIO. You have to
>> specify the endianness. Because of the BOM mark you can't use it for
>> PerlIO stream.
>
>Hmm Even with endianness strictly set PerlIO still warns w/ partial
>character warning. Should I mark all UTF as non-PerlIO-savvy (at least
>BOMless ones should be done so). Since Partial Character warnings are
>handled by PerlIO::encoding, it takes NI-XS to fix the prob

The partial-character handling needs the encoding to use the same rules
as Encode::XS. I will take a look if it isn't fixed yet.

>
>Dan the Encode Maintainer

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
Re: Encode utf-16 problem
I think this was (at least partly) a false alarm: it seems that it's
the byte sequence 0x00 0x0a that makes Perl groan about partial
characters. If I do things "right" and also convert the "\n" (aka 0x0a)
to (little-endian) UTF-16 (0x0a 0x00), things work without warnings.
(I've not yet figured out what really goes on with the 0x00 0x0a case.)

$ ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00     ..t.e.s.t.
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
$ ./perl -le 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00 0a  ..t.e.s.t..
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
UTF-16:Partial character at -e line 1.
UTF-16:Partial character at -e line 1.
UTF-16:Partial character at -e line 1, <FH> line 1.
kosh:~/pp4/maint-5.8/perl ; ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test\n"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00 0a 00  ..t.e.s.t...
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
test
$

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'."
-- Jack Cohen
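For readers following along, here is a minimal sketch of the three byte strings from the session above, built in memory rather than written to a file. It only illustrates the byte counts: the `-l` variant ends in a single 0x0a byte, so the data has an odd length and the last 16-bit code unit is incomplete. The variable names are made up for this illustration, and the final decode assumes a reasonably recent Encode where "UTF-16" consumes the BOM.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same three payloads as in the shell session (names are hypothetical).
my $no_newline  = pack("v*", 0xFEFF, unpack("C*", "test"));         # perl -e
my $raw_newline = pack("v*", 0xFEFF, unpack("C*", "test")) . "\n";  # perl -le adds one 0x0a byte
my $enc_newline = pack("v*", 0xFEFF, unpack("C*", "test\n"));       # "\n" packed as 0a 00

for my $bytes ($no_newline, $raw_newline, $enc_newline) {
    my $hex    = join(" ", unpack("(H2)*", $bytes));
    my $parity = length($bytes) % 2 ? "odd (trailing partial code unit)" : "even";
    printf "%-36s %2d bytes, %s\n", $hex, length($bytes), $parity;
}

# Only the well-formed, even-length data is decoded here; what happens on
# the odd-length case depends on the Encode version, so it is left out.
my $text = decode("UTF-16", $enc_newline);
print "decoded: ", ($text eq "test\n" ? "ok" : "unexpected"), "\n";
```

The point of the sketch is just that UTF-16 data must be a whole number of 2-byte units, so a newline appended as a raw byte by `-l` cannot belong to a complete character.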
Re: Encode utf-16 problem
> >Aw. You can't use 'utf16' for "use encoding" or PerlIO. You have to
> >specify the endianness. Because of the BOM mark you can't use it for
> >PerlIO stream.
>
> Hmm Even with endianness strictly set PerlIO still warns w/ partial
> character warning. Should I mark all UTF as non-PerlIO-savvy (at least
> BOMless ones should be done so). Since Partial Character warnings are
> handled by PerlIO::encoding, it takes NI-XS to fix the prob
>
> Dan the Encode Maintainer

Actually this was originally reported by David Dyck, wondering why
":utf16" doesn't work in the three-argument form of open:
open(FH, ":utf16", $file). (One gets a complaint of PerlIO/utf16.pm not
existing.) I thought: 'Ha! Of course one needs to do ":encoding(utf16)"
instead!', but that didn't work much better, as you can see.

See http://bugs6.perl.org/rt2/Ticket/Display.html?id=15533

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'."
-- Jack Cohen
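As an aside, and not something proposed in this thread: one way to avoid the encoding layer altogether is to read the file as raw bytes and decode the whole buffer in one call. The sketch below assumes a small BOM-prefixed little-endian file, which it creates itself in a temporary location, and it uses only File::Temp and Encode::decode/encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical test file: BOM (ff fe) followed by "test\n" in UTF-16LE.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xFF\xFE" . encode("UTF-16LE", "test\n");
close($out) or die "close: $!";

# Read the raw bytes with no encoding layer involved ...
open(my $in, "<:raw", $path) or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close($in);

# ... then decode the complete buffer at once; "UTF-16" uses the BOM
# to pick the byte order and drops it from the result.
my $text = decode("UTF-16", $bytes);
print $text;
```

Because the decoder sees the full buffer, it never has to deal with a character split across read boundaries. The cost is that the whole file is held in memory.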
Re: Encode utf-16 problem
On Tuesday, Dec 3, 2002, at 11:24 Asia/Tokyo, Dan Kogai wrote:
> Aw. You can't use 'utf16' for "use encoding" or PerlIO. You have to
> specify the endianness. Because of the BOM mark you can't use it for
> PerlIO stream.

Hmm Even with endianness strictly set PerlIO still warns w/ partial
character warning. Should I mark all UTF as non-PerlIO-savvy (at least
BOMless ones should be done so). Since Partial Character warnings are
handled by PerlIO::encoding, it takes NI-XS to fix the prob

Dan the Encode Maintainer
Re: Encode utf-16 problem
On Tuesday, Dec 3, 2002, at 11:12 Asia/Tokyo, Jarkko Hietaniemi wrote:
> Why the 'Partial character' warnings? I would have thought the input
> files are just right. Also, the warnings are given to stderr
> unconditionally; I would have to redirect stderr to /dev/null to get
> rid of the warnings.
>
> $ perl -le 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
> $ hex utf16
> ff fe 74 00 65 00 73 00 74 00 0a  ..t.e.s.t..
> $ ./perl -Ilib -e 'open(FH, "<:encoding(utf16)", "utf16");$a=<FH>;print $a'|hex
> UTF-16:Partial character at -e line 1.
> UTF-16:Partial character at -e line 1.
> 74 65 73 74  test
> $ perl -le 'print pack("n*", 0xFEFF, unpack("C*", "test"))' >! utf16
> $ hex utf16
> fe ff 00 74 00 65 00 73 00 74 0a  ...t.e.s.t.
> $ ./perl -Ilib -e 'open(FH, "<:encoding(utf16)", "utf16");$a=<FH>;print $a'|hex
> UTF-16:Partial character at -e line 1.
> UTF-16:Partial character at -e line 1.
> 74 65 73 74  test
> $

Aw. You can't use 'utf16' for "use encoding" or PerlIO. You have to
specify the endianness. Because of the BOM mark you can't use it for
PerlIO stream.

I'll tweak Unicode.pm so that perlio_ok returns 0 for BOMless UTF's in
the next version.

Dan the Encode Maintainer
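To see what a given installation reports, a small sketch like the following can query perlio_ok for the UTF-16 variants. It relies on Encode::find_encoding and Encode::perlio_ok from Encode's public interface. The values printed depend on the Encode version installed, so none are shown here.

```perl
use strict;
use warnings;
use Encode ();

# Ask Encode whether each UTF-16 flavour claims to be usable as a PerlIO layer.
for my $name (qw(UTF-16 UTF-16BE UTF-16LE)) {
    my $enc = Encode::find_encoding($name)
        or die "unknown encoding: $name";
    printf "%-8s perlio_ok=%d\n",
        $enc->name, Encode::perlio_ok($name) ? 1 : 0;
}
```

A result of 0 means the encoding declares itself unsuitable for ":encoding(...)", in which case decoding the data outside the I/O layer is the safer route.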