On 01/18/2007 05:13 AM, Eugene Kosov wrote:
Hi, everyone!
First of all, I'm sorry if this isn't the right place for such a question.
I'm having trouble concatenating two UTF-8 strings. The only
difference (if I'm not missing something) is that the first one is hardcoded in the
script (see below) and the second is received from STDIN. I get something
strange when I try to concatenate them.
Does anybody know what I am doing wrong? What is the right way to do this?
Any help, links, or man refs will be greatly appreciated.
$ cat test.pl
#!/usr/bin/perl -w
use strict;
binmode($_, ':utf8')
for(\*STDIN);
my $A = "\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82\x0a";
my $B = <STDIN>;
my $C = $A.$B;
print "A: ", unpack('H*', $A), "\n";
print "B: ", unpack('H*', $B), "\n";
print "C: ", unpack('H*', $C), "\n";
$ hd input
00000000 d0 a2 d0 b5 d1 81 d1 82 0a
00000009
$ perl test.pl < input
A: d0a2d0b5d181d1820a
B: d0a2d0b5d181d1820a
C: c390c2a2c390c2b5c391c281c391c2820ad0a2d0b5d181d1820a
--
BR,
Eugene Kosov
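It's related to the UTF8 flag. $B was read through the :utf8 layer, so
its UTF8 flag is on; $A is a plain byte string with the flag off. When
the two are concatenated, perl upgrades $A to its internal UTF-8 form,
treating each byte as a Latin-1 character, so every byte above 0x7f
turns into two bytes (0xd0 becomes c3 90, 0xa2 becomes c2 a2, and so
on). That is the c390c2a2... prefix you see in C. Try this program: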
use strict;
use warnings;
use File::Slurp;
use Encode ();
my $playfile = '/tmp/play.file';
my $A = "\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82\x0a";
unless (-f $playfile) {
write_file $playfile, { binmode => ':raw' }, $A;
}
my $tf = sub { shift() ? 'true' : 'false' };
open (STDIN, '<', $playfile) or die ("Redirection failed: $!\n");
binmode(STDIN,':utf8');
my $B = join ('',<STDIN>);
Encode::_utf8_off($B); # Comment this out to get a bug.
my $C = $A.$B;
print "A: ", unpack('H*', $A), "\n";
print "B: ", unpack('H*', $B), "\n";
print "C: ", unpack('H*', $C), "\n";
print "UTF8 for \$A: ", $tf->(Encode::is_utf8($A)), "\n";
print "UTF8 for \$B: ", $tf->(Encode::is_utf8($B)), "\n";
print "UTF8 for \$C: ", $tf->(Encode::is_utf8($C)), "\n";
__HTH__
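
For what it's worth, here is a minimal sketch of another way to get a
sane result without touching the flag directly (the Encode docs mark
_utf8_off as internal). The idea is to decode the hardcoded bytes into
characters, so both operands of the concatenation are character strings,
and to encode only when bytes are needed again. The names $bytes and
$mixed are mine, for illustration only, and $B is built with decode()
as a stand-in for a line read from a :utf8 handle.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same nine bytes as in the original script (UTF-8 encoded text plus "\n").
my $bytes = "\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82\x0a";

# Stand-in for a line read from a handle with a :utf8 layer:
# a character string of four letters plus a newline.
my $B = decode('UTF-8', $bytes);

# Problem case: raw bytes joined with a character string. Perl treats
# each byte of $bytes as a Latin-1 character when it joins the two.
my $mixed = $bytes . $B;

# Alternative: decode the literal first, so both sides are characters.
my $A = decode('UTF-8', $bytes);
my $C = $A . $B;

# Encode back to bytes before dumping them as hex.
print "mixed: ", unpack('H*', encode('UTF-8', $mixed)), "\n";
print "C:     ", unpack('H*', encode('UTF-8', $C)), "\n";
```

The "mixed" line should reproduce the doubled c390c2a2... prefix from
the original report, while "C" is simply the nine bytes twice. In a real
script you would probably put an encoding layer on the output handle as
well, e.g. binmode(STDOUT, ':encoding(UTF-8)'), instead of calling
encode() by hand.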