Re: Encode utf-16 problem

2003-01-06 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes:
>On Tuesday, Dec 3, 2002, at 11:24 Asia/Tokyo, Dan Kogai wrote:
>> Aw.  You can't use 'utf16' for "use encoding" or PerlIO.  You have to 
>> specify the endianness.  Because of the BOM mark you can't use it for 
>> PerlIO stream.
>
>Hmm.  Even with endianness strictly set, PerlIO still warns with a partial 
>character warning.  Should I mark all UTFs as non-PerlIO-savvy?  (At least 
>the BOMless ones should be.)  Since partial character warnings are 
>handled by PerlIO::encoding, it takes NI-XS to fix the problem.

The partial-character handling needs the encoding to use the same rules as
Encode::XS.  I will take a look if it isn't fixed yet.


>
>Dan the Encode Maintainer
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/




Re: Encode utf-16 problem

2002-12-04 Thread Jarkko Hietaniemi
I think this was (at least partly) a false alarm: it seems that it's
the byte sequence 0x00 0x0a that makes <FH> groan about partial
characters.  If I do things "right" and also convert the "\n" (aka 0x0a)
to (little-endian) UTF-16 (0x0a 0x00), things work without warnings.
(I've not figured out yet what really goes on with the 0x00 0x0a case.)

$ ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00   ..t.e.s.t.
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
$ ./perl -le 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00 0a   ..t.e.s.t..
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
UTF-16:Partial character at -e line 1.
UTF-16:Partial character at -e line 1.
UTF-16:Partial character at -e line 1, <FH> line 1.
$ ./perl -e 'print pack("v*", 0xFEFF, unpack("C*", "test\n"))' >! utf16
$ hex utf16
ff fe 74 00 65 00 73 00 74 00 0a 00 ..t.e.s.t...
$ ./perl -Ilib -we 'open(FH, "<:encoding(utf16)", "utf16");print <FH>'
test
$ 
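
The difference between the failing and the working file can also be checked
without the shell.  A minimal sketch, assuming a stock Encode: it rebuilds
what the -l one-liner writes (UTF-16LE code units followed by a single raw
0x0a byte) next to a buffer where the newline is encoded too.  The variable
names are mine, for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# What the -l one-liner writes: UTF-16LE code units plus one raw 0x0a byte.
my $broken = pack("v*", 0xFEFF, unpack("C*", "test")) . "\n";

# Encoding the newline as well gives every character two bytes.
# UTF-16LE adds no BOM of its own, so U+FEFF is prepended by hand.
my $fixed = encode("UTF-16LE", "\x{FEFF}test\n");

for my $bytes ($broken, $fixed) {
    my $hex = join " ", map { sprintf "%02x", $_ } unpack("C*", $bytes);
    printf "%-36s %2d bytes (%s)\n", $hex, length($bytes),
        length($bytes) % 2 ? "odd: trailing partial character" : "even";
}

# The well-formed buffer round-trips through the BOM-aware "UTF-16" decoder.
my $text = decode("UTF-16", $fixed);
print $text eq "test\n" ? "round trip ok\n" : "round trip failed\n";
```

The odd byte count in the first buffer is the lone 0x0a that has no partner
byte, which matches the hex dumps above.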

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen



Re: Encode utf-16 problem

2002-12-03 Thread Jarkko Hietaniemi
> >Aw.  You can't use 'utf16' for "use encoding" or PerlIO.  You have to 
> >specify the endianness.  Because of the BOM mark you can't use it for 
> >PerlIO stream.
> 
> Hmm.  Even with endianness strictly set, PerlIO still warns with a partial 
> character warning.  Should I mark all UTFs as non-PerlIO-savvy?  (At least 
> the BOMless ones should be.)  Since partial character warnings are 
> handled by PerlIO::encoding, it takes NI-XS to fix the problem.
> 
> Dan the Encode Maintainer

Actually this was originally reported by David Dyck wondering why ":utf16"
doesn't work in the three argument form of open: open(FH, ":utf16", $file).
(One gets a complaint of PerlIO/utf16.pm not existing.)  I thought:
'Ha!  Of course one needs to do ":encoding(utf16)" instead!'-- but that
didn't work much better, as you can see.
See http://bugs6.perl.org/rt2/Ticket/Display.html?id=15533
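
For reference, the layer spelling under discussion looks like this.  This is
only a sketch, assuming a reasonably recent Perl and Encode (the builds
discussed in this thread warned on similar input); it follows Dan's advice to
name the endianness explicitly, and the scratch file is a hypothetical one
created just for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical scratch file holding BOM + "test\n" as UTF-16LE bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode("UTF-16LE", "\x{FEFF}test\n");
close $tmp or die "close: $!";

# The encoding name goes inside :encoding(...), with the endianness
# spelled out, rather than being used as a layer name by itself.
open(my $fh, "<:encoding(UTF-16LE)", $path) or die "open $path: $!";
my $line = <$fh>;
close $fh;

# UTF-16LE does not consume the BOM, so drop it by hand.
$line =~ s/^\x{FEFF}//;
print $line;
```

With the endianness fixed in the layer name the decoder never has to inspect
the BOM, so the BOM arrives as an ordinary U+FEFF character and is removed
explicitly.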

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen



Re: Encode utf-16 problem

2002-12-02 Thread Dan Kogai
On Tuesday, Dec 3, 2002, at 11:24 Asia/Tokyo, Dan Kogai wrote:

> Aw.  You can't use 'utf16' for "use encoding" or PerlIO.  You have to 
> specify the endianness.  Because of the BOM mark you can't use it for 
> PerlIO stream.

Hmm.  Even with endianness strictly set, PerlIO still warns with a partial 
character warning.  Should I mark all UTFs as non-PerlIO-savvy?  (At least 
the BOMless ones should be.)  Since partial character warnings are 
handled by PerlIO::encoding, it takes NI-XS to fix the problem.

Dan the Encode Maintainer




Re: Encode utf-16 problem

2002-12-02 Thread Dan Kogai
On Tuesday, Dec 3, 2002, at 11:12 Asia/Tokyo, Jarkko Hietaniemi wrote:

> Why the 'Partial character' warnings?  I would have thought the input
> files are just right.  Also, the warnings are given to stderr
> unconditionally, I would have to redirect stderr to /dev/null to get
> rid of the warnings.
>
> $ perl -le 'print pack("v*", 0xFEFF, unpack("C*", "test"))' >! utf16
> $ hex utf16
> ff fe 74 00 65 00 73 00 74 00 0a   ..t.e.s.t..
> $ ./perl -Ilib -e 'open(FH, "<:encoding(utf16)", "utf16");$a=<FH>;print $a'|hex
> UTF-16:Partial character at -e line 1.
> UTF-16:Partial character at -e line 1.
> 74 65 73 74 test
> $ perl -le 'print pack("n*", 0xFEFF, unpack("C*", "test"))' >! utf16
> $ hex utf16
> fe ff 00 74 00 65 00 73 00 74 0a   ...t.e.s.t.
> $ ./perl -Ilib -e 'open(FH, "<:encoding(utf16)", "utf16");$a=<FH>;print $a'|hex
> UTF-16:Partial character at -e line 1.
> UTF-16:Partial character at -e line 1.
> 74 65 73 74 test
> $

Aw.  You can't use 'utf16' for "use encoding" or PerlIO.  You have to 
specify the endianness.  Because of the BOM mark you can't use it for 
PerlIO stream.

I'll tweak Unicode.pm so that perlio_ok returns 0 for BOMless UTFs in 
the next version.
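
The flag mentioned here can be queried through Encode::perlio_ok.  A small
sketch that only prints what the installed Encode reports; the list of
encoding names is my own choice for illustration, and the answers depend on
the Encode version, so no expected output is given.

```perl
use strict;
use warnings;
use Encode ();

# Ask Encode whether each encoding claims to be usable as a PerlIO layer.
my %savvy;
for my $name (qw(UTF-16 UTF-16LE UTF-16BE UTF-8)) {
    $savvy{$name} = Encode::perlio_ok($name) ? 1 : 0;
    print "$name: perlio_ok = $savvy{$name}\n";
}
```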

Dan the Encode Maintainer