I think I've had the misfortune to run into a bug in the Win32::OLE
encoding logic, or possibly in a deeper layer. But maybe there is
something about this issue that I have not yet understood.

So what is it about?

One of the options for the Win32::OLE module is "CP", the codepage.
The documentation has the following to say about this option:

  This variable is used to determine the codepage used by all
  translations between Perl strings and Unicode strings used by
  the OLE interface. The default value is CP_ACP, which is the
  default ANSI codepage. Other possible values are CP_OEMCP,
  CP_MACCP, CP_UTF7 and CP_UTF8. These constants are not exported
  by default.

Let's see how this works in practice using some code involving
the Microsoft XML library (msxml6.dll):

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
use strict;
use warnings;
use utf8;
use Win32::OLE;
my $P56 = $] < 5.008;
require Encode unless $P56;

sub add {
  my( $dom, $txt ) = @_;
  my $node = $dom->createElement( 'E' );
  $node->appendChild( $dom->createTextNode( $txt ) );
  $dom->documentElement->appendChild( $node );
  my $ret = $dom->xml;
  # The encoding of the return value is unreliable: depending on the
  # content, you seem to get either Unicode characters or legacy bytes.
  # From Perl 5.8 on you can work around this. But should you have to?
  $ret = Encode::encode_utf8( $ret ) unless $P56; # force octets
  print $ret;
}

Win32::OLE->Option( CP => Win32::OLE::CP_UTF8 ); # UTF-8, please

# binmode STDOUT, ':utf8' unless $P56;
my $xml = '<U/>';
# $xml = '<?xml version="1.0" encoding="utf-8"?>' . $xml; # no effect
my $dom = Win32::OLE->new( 'Msxml2.DOMDocument.6.0' );
$dom->loadXML( $xml );
add $dom, 'eins';
add $dom, 'blöd';
add $dom, 'weiß';
add $dom, 'café';       # still Latin1, no UTF-8 encoding
add $dom, 'ανεργία';    # Greek or Russian forces UTF-8 encoding
add $dom, 'убит';
-------------------------

A return value whose encoding depends on the data is difficult to
work with.
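For reference, here is a minimal sketch of the workaround already used
in add() above, pulled out into a standalone script that needs only
the core Encode module (Perl 5.8 or later) and does not touch
Win32::OLE at all. The helper name to_utf8_octets is hypothetical, and
the sketch assumes that what comes back from $dom->xml is an ordinary
Perl character string. Encoding explicitly yields UTF-8 octets whether
or not the string contains characters above U+00FF:

```perl
use strict;
use warnings;
use utf8;                    # the string literals below are UTF-8 source text
use Encode qw(encode_utf8);

# Hypothetical helper (not part of Win32::OLE): turn a Perl character
# string, e.g. the value of $dom->xml, into UTF-8 octets regardless of
# which characters it contains.
sub to_utf8_octets {
    my ($chars) = @_;
    return encode_utf8($chars);
}

for my $word ('café', 'ανεργία', 'убит') {
    my $octets = to_utf8_octets($word);
    # length() counts characters before encoding and bytes after it.
    printf "%d chars -> %d bytes: %s\n",
        length($word), length($octets), $octets;
}
```

The commented-out binmode STDOUT, ':utf8' line in the script above is
the other common approach. With an encoding layer on STDOUT, print
encodes character strings itself, so the explicit encode_utf8 call
would then have to go to avoid encoding the output twice.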

Is this a bug, or an oversight or misunderstanding on my part?
-- 
Michael Ludwig 
_______________________________________________
ActivePerl mailing list
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
