Hi Perl Gurus,
I am using the functions decode_entities() (from HTML::Entities) and
decode_utf8() (from Encode) to decode HTML codes and UTF-8 (Latin
characters), respectively.
These functions work as I expect for characters up to decimal 255, but above
that they behave differently.
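The sketch below shows where the 255 boundary may come from. It assumes the input consists of UTF-8 bytes. decode_utf8() returns characters numbered by their Unicode code points, so À to ÿ land in the range 0 to 255 (the Latin-1 range), while characters such as Œ and € land well above 255. The sample names and byte strings are illustrative choices.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# UTF-8 byte sequences for three characters taken from the lists in this message.
my @samples = (
    [ 'A-grave', "\xC3\x80" ],        # A with grave accent
    [ 'OE',      "\xC5\x92" ],        # OE ligature
    [ 'euro',    "\xE2\x82\xAC" ],    # euro sign
);

for my $sample (@samples) {
    my ($name, $bytes) = @$sample;
    my $char = decode_utf8($bytes);
    # ord() returns the Unicode code point of the decoded character.
    printf "%-8s U+%04X (decimal %d)\n", $name, ord($char), ord($char);
}
```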
This is the page I referred to for the list of HTML codes and Latin
characters: [http://www.ascii.cl/htmlcodes.htm].
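The decimal values 128 to 159 in lists like this one may be Windows-1252 (cp1252) byte values rather than Unicode code points. That reading is an assumption and has not been checked against the page. Under it, the characters stored there (for example €, Œ and ™) have Unicode code points far above 255. Encode's cp1252 decoder shows the mapping.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte values where cp1252 places the euro sign, the OE ligature and the trade mark sign.
for my $byte (128, 140, 153) {
    my $char = decode('cp1252', chr($byte));
    printf "cp1252 byte %d -> Unicode U+%04X (decimal %d)\n",
        $byte, ord($char), ord($char);
}
```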
The sample script is included below.
Its input values are the ones I got from an XML SOAP response (the SOAP
response doesn't contain the HTML numbers or HTML codes from the list at the
above URL).
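The sketch below shows what the two calls do to such input. As the message says, it assumes the SOAP values contain no HTML entities. In that case decode_entities() leaves the raw bytes untouched, and only decode_utf8() changes anything. For comparison, entity input such as &OElig; or &#8364; is decoded directly to characters above 255. The sample strings are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::Entities qw(decode_entities);

# Raw UTF-8 bytes for "OE ligature, S caron" with no entities, like the SOAP values.
my $raw            = "\xC5\x92\xC5\xA0";
my $after_entities = decode_entities($raw);    # non-void context: returns a decoded copy
my $after_utf8     = decode_utf8($after_entities);

# Entity input, like the codes in the referenced list.
my $entities = decode_entities('&OElig;&#8364;');

printf "raw unchanged by decode_entities: %s\n",
    ($after_entities eq $raw ? 'yes' : 'no');
printf "after decode_utf8: %s\n",
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $after_utf8);
printf "entities decoded:  %s\n",
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $entities);
```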
The script gives me what I expected for arr_val[0] to arr_val[4] (i.e., the
decimal range 0-255), but for arr_val[5] (whose characters have decimal values
greater than 255) the decoded values are different.
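One possible explanation is that the decoding is fine and the difference comes from printing. This is an assumption and has not been verified against the real data. When Perl prints a string whose characters all fall in 0 to 255 to a handle with no encoding layer, it writes one Latin-1 byte per character. When the string contains a character above 255, Perl warns "Wide character in print" and writes the UTF-8 bytes of the whole string instead. The sketch below captures those bytes with in-memory filehandles. printed_bytes() is a hypothetical helper written for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: returns the exact bytes that print() writes for $str,
# optionally with a PerlIO layer such as ':encoding(UTF-8)' on the handle.
sub printed_bytes {
    my ($str, $layer) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    {
        no warnings 'utf8';    # hide the "Wide character in print" warning
        print {$fh} $str;
    }
    close($fh) or die "cannot close in-memory handle: $!";
    return $buf;
}

my $latin = decode_utf8("\xC3\x80");    # A grave, U+00C0, fits in 0-255
my $wide  = decode_utf8("\xC5\x92");    # OE ligature, U+0152, above 255

# No layer: the Latin-1 character is written as the single byte 0xC0 ...
printf "%-20s %s\n", 'latin, no layer:', unpack('H*', printed_bytes($latin));
# ... but the wide character is written as its UTF-8 bytes 0xC5 0x92.
printf "%-20s %s\n", 'wide, no layer:', unpack('H*', printed_bytes($wide));
# With an explicit encoding layer, both are written consistently as UTF-8.
printf "%-20s %s\n", 'latin, UTF-8 layer:',
    unpack('H*', printed_bytes($latin, ':encoding(UTF-8)'));
```

If this is the cause, declaring the output encoding would likely make all six values come out the same way. That means adding binmode(STDOUT, ':encoding(UTF-8)') near the top of the script and a matching layer on TEMP_OUT. This assumes the terminal and the file's consumer expect UTF-8. If they expect Latin-1 or Windows-1252 instead, the layer would need to name that encoding.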
Below are the array values and their expected decoded values. The decoding
fails for arr_val[5].
I also need to do the reverse (encoding) in the same way.
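The sketch below covers the encoding direction. It assumes the goal is to turn a decoded character string back into UTF-8 bytes, numeric HTML entities, or both. encode_utf8() from Encode and encode_entities_numeric() from HTML::Entities reverse the two decode calls, and both handle characters above 255. The sample string is illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use HTML::Entities qw(decode_entities encode_entities_numeric);

# A decoded string mixing Latin-1 and above-255 characters.
my $text = "others \x{C0}\x{152}\x{20AC}\x{2122}";

# Back to UTF-8 bytes (for example, for a UTF-8 SOAP request).
my $utf8_bytes = encode_utf8($text);

# Back to ASCII-only HTML: unsafe characters become numeric &#x...; entities.
my $html = encode_entities_numeric($text);

print "UTF-8 length in bytes: ", length($utf8_bytes), "\n";
print "HTML: $html\n";
```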
$arr_val[0] = '!"#$%&\'()*+,-./ 0123456789:;<=>?' ;
expected decoded values -- !"#$%&'()*+,-./ 0123456789:;<=>?
$arr_val[1] =
'@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~' ;
expected decoded values --
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
$arr_val[2] =
'�...@Ã~aÃ~bÃ~cÃ~dÃ~eÃ~fÃ~gÃ~hÃ~iÃ~jÃ~kÃ~lÃ~mÃ~nÃ~oÃ~pÃ~qÃ~rÃ~sÃ~tÃ~uÃ~vÃ~wÃ~xÃ~yÃ~zÃ~[Ã~\Ã~]Ã~^Ã~_'
;
expected decoded values -- ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
$arr_val[3] = 'Ã|
áâãäåæçèéêëìÃîïðñòóôõö÷øùúûüýþÿ' ;
expected decoded values -- àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
$arr_val[4] =
'¡¢£¤¥¦§¨©ª«¬Â®¯°±²³´µ¶·¸¹º»¼½¾¿' ;
expected decoded values -- ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿
$arr_val[5] = 'others Å~RÅ~SÅ|
šŸÆ~r...@~sâ~@~t...@~xâ~@~y...@~zâ~@~\...@~]â~@~^â~@| �...@¡â~@¢...@¦' ;
expected decoded values -- others ŒœŠšŸƒ–—‘’‚“”„†‡•…‰€™
Could you please help me find out what I am missing or doing wrong?
I would greatly appreciate the help.
Thanks
Saravanan Balaji.
#!/ms/dist/perl5/bin/perl5.8 -I ../
use MSDW::Version
'HTML-Parser' => '3.56', # HTML::Entities may be used by HTTP::Response
;
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);
use Data::Dumper;
use HTML::Entities qw(decode_entities encode_entities encode_entities_numeric);
my @arr_val = ();
$arr_val[0] = '!"#$%&\'()*+,-./ 0123456789:;<=>?' ;
$arr_val[1] =
'@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~' ;
$arr_val[2] = 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß' ;
$arr_val[3] = 'àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ' ;
$arr_val[4] = '¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿' ;
$arr_val[5] = 'others ŒœŠšŸƒ–—‘’‚“”„†‡•…‰€™' ;
my $bcp_in_file = "/tmp/testbcp.in" ;
my $out_str = "" ;
open(TEMP_OUT, '>', $bcp_in_file)
    or die "Error: cannot open $bcp_in_file: $!\n";
foreach my $temp_var (@arr_val)
{
print "\nProcessing value [$temp_var] \n";
decode_entities($temp_var) ;
print "After HTML decode [$temp_var] \n";
my $temp_var2 = decode_utf8($temp_var);
print "After UTF8 decode [$temp_var2] \n\n";
print TEMP_OUT $temp_var2 ;
#my $temp_var3 = encode_utf8($temp_var2);
#print "After UTF8 encode [$temp_var3] \n";
#my $temp_var4 = encode_entities($temp_var3, '"&<>' );
#print "After HTML encode [$temp_var4] \n";
}
1;
############ End of Script #################