Fwd: copy unicode (UCS-2) file

Hans Ginzel Wed, 26 Nov 2014 05:52:28 -0800

Hello!

Consider the small Perl script below.
It should copy a text file while removing leading and trailing whitespace
from each line.


while (<>) {
 s/^\s+//; s/\s+$//;
 print $_, "\n"; # say;
}

I run it with shell redirection:
perl copy.pl <src.txt >dst.txt

It works well for Windows ANSI and UTF-8 text files. But when I tried
a Unicode (UCS-2LE) source file containing
"a\nb", which is
FF FE 61 00 0D 00 0A 00 62 00 0D 00 0A 00
in hex (with Byte Order Mark), I got these bytes in hex:
FF FE 61 00 0D 00 0D 0A 00 62 00 0D 00 0D 0A 00 0D 0A
I have attached these files, but I am not sure what mail agents do with
them.
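
The output bytes quoted above can be reproduced by assuming that perl
treated the file as plain bytes: lines end after every 0A byte, \s removes
that byte, and the Windows text-mode (:crlf) layer on STDOUT writes each
"\n" as 0D 0A. The sketch below simulates this on any platform. The sub
name simulate_byte_copy is made up for illustration, and the :crlf
behaviour is an assumption about a default Windows setup.

```perl
use strict;
use warnings;

# Source bytes from the message: BOM, "a", CR LF, "b", CR LF in UCS-2LE.
my $src = pack 'H*', 'FFFE61000D000A0062000D000A00';

# Hypothetical helper: mimic the byte-oriented copy loop.
sub simulate_byte_copy {
    my ($bytes) = @_;
    my $out = '';
    # <> ends a line after every 0A byte, even in the middle of a 16-bit unit.
    for my $line (split /(?<=\x0A)/, $bytes) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;            # strips the 0A byte; its 00 partner starts the next line
        $out .= $line . "\x0D\x0A";   # assumed: :crlf writes "\n" as 0D 0A
    }
    return $out;
}

my $hex = join ' ', map { sprintf '%02X', ord } split //, simulate_byte_copy($src);
print "$hex\n";
# prints: FF FE 61 00 0D 00 0D 0A 00 62 00 0D 00 0D 0A 00 0D 0A
```

The stray 00 bytes and the doubled 0D are therefore consistent with the
data never having been decoded from UCS-2LE at all.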

The PERL_UNICODE variable is not set.

I have tried adding -CS to the command line, but got a message about a
Malformed UTF-8 character.
I have tried adding each of these pragmas at the beginning of the script:
use open ':encoding(UCS-2LE)';
use open IO => ':encoding(UCS-2LE)';
use open ':std' => ':encoding(UCS-2LE)';
but none of them gave the desired result. I also tried combining each
pragma with the -CS option.
I have tried use feature qw/say/; say; instead of print $_, "\n"; but
still without correct results.

perl --version
This is perl 5, version 18, subversion 2 (v5.18.2)
built for MSWin32-x86-multi-thread-64int

What is the correct way to set stdin/out to UCS-2LE, please?
What is the correct way to print an "encoding-independent" newline
character, please?
What is the correct way to say that \s should match the "UCS-2LE way",
please?
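
One possible way to sidestep layer-ordering problems is to keep STDIN and
STDOUT in binary mode and convert explicitly with the core Encode module,
so that \s and the line endings operate on decoded characters rather than
on raw bytes. This is only a sketch under stated assumptions: it treats the
data as UTF-16LE (a superset of UCS-2LE), always writes CR LF line endings,
and the sub name trim_utf16le is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: raw UTF-16LE bytes in, trimmed UTF-16LE bytes out.
sub trim_utf16le {
    my ($bytes) = @_;
    my $text = decode('UTF-16LE', $bytes);                 # bytes -> characters
    my $bom  = $text =~ s/^\x{FEFF}// ? "\x{FEFF}" : '';   # set the BOM aside
    my @lines = split /\r?\n/, $text, -1;
    pop @lines if @lines && $lines[-1] eq '';   # no empty field after the final newline
    for my $line (@lines) {
        $line =~ s/^\s+//;                      # \s now sees whole characters
        $line =~ s/\s+$//;
    }
    # Re-attach the BOM and write every line with an explicit CR LF.
    return encode('UTF-16LE', $bom . join('', map { "$_\r\n" } @lines));
}

# Possible use with redirection (perl copy.pl <src.txt >dst.txt):
#   binmode STDIN; binmode STDOUT;       # no CRLF translation on the raw bytes
#   my $in = do { local $/; <STDIN> };   # slurp the whole input
#   print trim_utf16le($in);
```

Because the newline is encoded together with the rest of the text, it
comes out as 0D 00 0A 00 instead of a bare 0D 0A.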

In addition, is there a standardised way to auto-detect the input encoding
(legacy 8-bit/UTF-8/UCS-2), please?
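
No method can identify an encoding from the bytes alone with certainty, but
a byte order mark, when present, is a strong hint. The following is a
minimal heuristic sketch, not a standard: the sub name guess_encoding and
the cp1252 fallback for "legacy" files are assumptions, and the core
Encode::Guess module offers a more general mechanism.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess an encoding name from the raw bytes of a file.
sub guess_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;   # UTF-8 BOM
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;       # little-endian BOM (UTF-32LE not handled)
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;       # big-endian BOM
    # No BOM: accept the data as UTF-8 only if it decodes strictly.
    my $copy = $bytes;                                # decode may modify its argument
    return 'UTF-8' if eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return 'cp1252';                                  # assumed legacy 8-bit fallback
}
```

Pure ASCII input is reported as UTF-8 here, which is harmless because
ASCII is valid in both UTF-8 and the assumed 8-bit fallback.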

Best regards
Hans Ginzel

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
