I think I've had the misfortune to run into a bug in the Win32::OLE
encoding logic, or possibly at a deeper layer. But maybe there is
something that I have not yet understood about this issue.
So what is it about?
One of the options for the Win32::OLE module is "CP", the codepage.
The documentation has the following to say about this option:
    This variable is used to determine the codepage used by all
    translations between Perl strings and Unicode strings used by
    the OLE interface. The default value is CP_ACP, which is the
    default ANSI codepage. Other possible values are CP_OEMCP,
    CP_MACCP, CP_UTF7 and CP_UTF8. These constants are not exported
    by default.
Let's see how this works in practice using some code involving
the Microsoft XML library (msxml6.dll):
           \,,,/
           (o o)
------oOOo-(_)-oOOo------
use strict;
use warnings;
use utf8;
use Win32::OLE;
my $P56 = $] < 5.008;
require Encode unless $P56;
sub add {
    my( $dom, $txt ) = @_;
    my $node = $dom->createElement( 'E' );
    $node->appendChild( $dom->createTextNode( $txt ) );
    $dom->documentElement->appendChild( $node );
    my $ret = $dom->xml;
    # The return value is unreliable. Looks like you'll get Unicode
    # characters or legacy bytes depending on content. You can fix this
    # starting from Perl 5.8. But should you have to?
    $ret = Encode::encode_utf8( $ret ) unless $P56; # force octets
    print $ret;
}
Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 ); # UTF-8, please
# binmode STDOUT, ':utf8' unless $P56;
my $xml = '<U/>';
# $xml = '<?xml version="1.0" encoding="utf-8"?>' . $xml; # no use
my $dom = Win32::OLE->new( 'Msxml2.DOMDocument.6.0' );
$dom->loadXML( $xml );
add $dom, 'eins';
add $dom, 'blöd';
add $dom, 'weiß';
add $dom, 'café'; # still Latin1, no UTF-8 encoding
add $dom, 'ανεργία'; # Greek or Russian force UTF-8 encoding
add $dom, 'убит';
-------------------------
A data-dependent return value encoding is difficult to work with.
Is this a bug, or an oversight or misunderstanding on my part?
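
To narrow this down, here is a minimal sketch that leaves Win32::OLE
out of the picture entirely. It assumes that $dom->xml hands back an
ordinary Perl character string, which I have not verified. The helpers
printed_octets and hex_dump are made up for this illustration. The
sketch compares the octets a plain print() emits on a handle without
an encoding layer with the octets produced by an explicit
Encode::encode_utf8 call, so it may show whether the data dependence
can arise in print() alone.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the octets that a plain print() emits on a
# handle without an encoding layer, captured via an in-memory filehandle.
sub printed_octets {
    my( $str ) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8'; # silence "Wide character in print"
        print {$fh} $str;
    }
    close $fh;
    return $buf;
}

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# Stand-ins for the OLE return value (an assumption): one string that
# fits into Latin-1, one that does not. Escapes keep the example
# independent of the encoding of the source file.
for my $str ( "caf\x{E9}", "\x{3B1}\x{3BD}" ) {
    printf "plain print: %-15s explicit UTF-8: %s\n",
        hex_dump( printed_octets( $str ) ),
        hex_dump( Encode::encode_utf8( $str ) );
}
```

For the first string the two columns differ (63 61 66 E9 versus
63 61 66 C3 A9); for the second they are identical (CE B1 CE BD). If
the OLE return value behaves like these strings, the data-dependent
output would come from print() and not from the codepage conversion,
but that remains an assumption.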
--
Michael Ludwig
_______________________________________________
ActivePerl mailing list