Hello,

I have run into the trouble described below:
XML::Parser Encoding (UTF-8 -> ISO-8859-1)
http://www.netwise.it/xml/perlmonks/?node_id=197119

I'm porting some scripts written for Perl 5.005 to Perl 5.8.
These scripts use:
  XML::Grove           0.46 alpha
  XML::Parser          2.34
  XML::Parser::PerlSAX 0.07

I upgraded Perl to 5.8, but the modules above are unchanged.


The original author of these scripts knew that multibyte strings were not
handled well, so they decided to "URL-encode" the multibyte data, like this:
'%82%B5%82%E8%81%5B%82%B8',
have XML::Parser parse it, and then "URL-decode" it into
x"82B582E8815B82B8" (8 bytes long).

$str =~ s/%([0-9A-Fa-f][0-9A-Fa-f])/pack('H2', $1)/eg;

But now the result of this "URL-decode", as shown by "unpack('H*',$str)", is
x"C282C2B5C282C3A8C2815BC282C2B8", which appears to be utf8.
The length function returns 8, but 15 under "use bytes;".
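
A minimal sketch that reproduces this symptom without XML::Parser. It assumes
(not confirmed here) that the string handed back by the parser carries Perl's
utf8 flag; utf8::upgrade is used only to mimic that. The helper names
url_decode and internal_length are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the same "URL-decode" substitution as above.
sub url_decode {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f][0-9A-Fa-f])/pack('H2', $1)/eg;
    return $str;
}

# Hypothetical helper: byte length of Perl's internal representation.
sub internal_length {
    my ($str) = @_;
    use bytes;
    return length($str);
}

my $encoded = '%82%B5%82%E8%81%5B%82%B8';

# Case 1: a plain byte string, as the scripts saw it under perl 5.005.
my $plain = url_decode($encoded);

# Case 2: ASSUMPTION: the parser returned the text with the utf8 flag on.
my $flagged_in = $encoded;
utf8::upgrade($flagged_in);
my $flagged = url_decode($flagged_in);

printf "plain:   flag=%d chars=%d internal bytes=%d\n",
    utf8::is_utf8($plain) ? 1 : 0, length($plain), internal_length($plain);
printf "flagged: flag=%d chars=%d internal bytes=%d\n",
    utf8::is_utf8($flagged) ? 1 : 0, length($flagged), internal_length($flagged);
```

If the assumption holds, the bytes produced by pack are inserted into a
utf8-flagged string and each byte above 0x7F is stored as two bytes
internally, which would account for 8 characters but 15 internal bytes.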

I tried the following, without success:

(1) Inserted "no encoding;" into the main script, after all the "use xxxx" lines.
(2) Called the following after the "URL-decode":
    sub de_utf8 {
      use bytes;
      return "$_[0]";
    }
(3) $parser->parse( Source => { SystemId => $file_name, ProtocolEncoding => "iso-8859-1" });
(4) $parser->parse( Source => { SystemId => $file_name, Encoding => "iso-8859-1" });
(3) and (4) were tested with the declaration '<?xml version="1.0" encoding="iso-8859-1"?>'.

Finally, the following approach works:

use utf8;
my $latin = pack("C*", unpack('U*', $utf));
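
For comparison, a minimal sketch of two other ways to get the same byte
string, using utf8::downgrade and Encode::encode (both ship with perl 5.8).
The helper name to_bytes and the sample string are assumptions made for
illustration; this is not a claim about which approach is best for these
scripts.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return a byte-string copy of $str, or undef if it
# contains a character above 0xFF and so cannot be downgraded.
sub to_bytes {
    my ($str) = @_;
    return utf8::downgrade($str, 1) ? $str : undef;
}

# Sample data: the 8 characters from the example, with the utf8 flag forced on.
my $utf = join '', map { chr } 0x82, 0xB5, 0x82, 0xE8, 0x81, 0x5B, 0x82, 0xB8;
utf8::upgrade($utf);

my $via_pack      = pack('C*', unpack('U*', $utf));      # the approach above
my $via_downgrade = to_bytes($utf);                      # flag removed on a copy
my $via_encode    = Encode::encode('iso-8859-1', $utf);  # explicit encoding step

print unpack('H*', $_), "\n" for $via_pack, $via_downgrade, $via_encode;
```

utf8::downgrade with a true second argument returns false instead of dying
when a character does not fit in one byte, so the caller can decide what to
do with such data.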


My tests may be incomplete, and I may be missing something.
I can now strip this (probable) utf8 flag after the "URL-decode", but the data
becomes utf8 again just before it reaches the DB2 insert modules. The data is
held in complicated hash references.
The "URL-decode" code lives in a single module, so I can fix it in one place,
but there is not just one DB2 insert module; there are 3 or 4.
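
One option would be a single shared routine that each DB2 insert module calls
just before the insert. The sketch below walks nested hash and array
references and downgrades every string value in place. The names
strip_utf8_deep and _strip_value are hypothetical, and it assumes the data
holds only hash refs, array refs, and plain scalars.

```perl
use strict;
use warnings;

# Hypothetical helper: walk nested hash/array references and downgrade
# every string value in place. Hash keys are left untouched.
sub strip_utf8_deep {
    my ($data) = @_;
    my $type = ref $data;
    if ($type eq 'HASH') {
        _strip_value($_) for values %$data;
    }
    elsif ($type eq 'ARRAY') {
        _strip_value($_) for @$data;
    }
    return $data;
}

sub _strip_value {
    # $_[0] is an alias to the caller's element, so it is changed in place.
    if (ref $_[0]) {
        strip_utf8_deep($_[0]);
    }
    elsif (defined $_[0]) {
        # True second argument: return false rather than die on chars above 0xFF.
        utf8::downgrade($_[0], 1);
    }
    return;
}

# Sample data with the utf8 flag forced on, standing in for the real records.
my $s = "\x82\xB5";
utf8::upgrade($s);
my $row = { name => $s, tags => [$s], nested => { deep => $s } };

strip_utf8_deep($row);
print utf8::is_utf8($row->{nested}{deep}) ? "still flagged\n" : "bytes\n";
```

This only helps if nothing touches the data afterwards: concatenating a
downgraded string with a utf8-flagged one upgrades the result again.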


I have reached this point after a week of work, and I still can't figure out
the best solution. I would like to understand how XML::Parser handles
encodings, how to check the current encoding status of a string, and how to
suppress utf8 effectively.

Regards,
Hirosi Taguti
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs