> my $ret = $dom->xml;
> # The return value is unreliable. Looks like you'll get Unicode
> # characters or legacy bytes depending on content. You can fix this
> # starting from Perl 5.8. But should you have to?

You do get a string of characters. The characters may be UTF-8 encoded internally when they cannot be represented by the ANSI codepage. In general this should not really matter, as Perl can upgrade/downgrade encodings internally as it sees fit.

The one problem of course is that Win32::OLE uses CP_ACP for the regular encoding whereas Perl internals use Latin1, so any code points where CP_ACP is different from Latin1 will get mangled.

The downgrading of results to CP_ACP is probably a mistake; I can't see how this would ever be useful. It helps scripts that don't know how to deal with Unicode strings, but those shouldn't ask for CP_UTF8 results in the first place.

The internal confusion between Latin1 and CP_ACP is harder to deal with: the core text functions all assume Latin1, and the filesystem APIs all assume CP_ACP. So if we were to fix this to always assume Latin1 internally, then all scripts that read filenames from backticks/qx(), or receive them from GUI dialogs, or read them from ANSI encoded text files will break unless they convert them to Latin1 explicitly.

Maybe that breakage is necessary eventually, but it won't happen for Perl 5.14, so any change there is a long way off.

> A data-dependent return value encoding is difficult to work with.

Ignoring the CP_ACP/Latin1 issue, why does it matter which internal encoding is used for your strings? You commented out the line that put STDOUT into Unicode mode:

    # binmode STDOUT, ':utf8' unless $P56;

But if you re-activate the line, then you will see that the characters are written out the same way, regardless of the way they have been encoded internally.

Cheers,
-Jan
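
To illustrate the two points above, here is a minimal sketch; it is not taken from the original script. The helper name encoded_output and the sample strings are made up for the example, and cp1252 is only assumed as a stand-in for CP_ACP, which in reality depends on the system locale. The first part shows that a string compares and prints identically whichever internal form Perl holds it in; the second shows a byte that Latin1 and cp1252 interpret differently.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write a string through a UTF-8 encoding layer
# into an in-memory buffer and return the resulting bytes.
sub encoded_output {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>:encoding(UTF-8)', \$buf or die "open: $!";
    print {$fh} $str;
    close $fh;
    return $buf;
}

# Two copies of the same characters ("caf" plus U+00E9).
my $bytes = "caf\x{e9}";
my $wide  = "caf\x{e9}";
utf8::downgrade($bytes);   # force the one-byte-per-character internal form
utf8::upgrade($wide);      # force the UTF-8 internal form

# The internal flag differs between the two copies ...
printf "is_utf8: %d vs %d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), (utf8::is_utf8($wide) ? 1 : 0);

# ... but they are the same string as far as Perl code is concerned,
# and they produce the same bytes once an encoding layer is in place.
print "strings are equal\n" if $bytes eq $wide;
print "output is identical\n"
    if encoded_output($bytes) eq encoded_output($wide);

# The CP_ACP/Latin1 problem, assuming CP_ACP is cp1252 (an assumption):
# byte 0x80 is the euro sign in cp1252 but a control character in Latin1.
my $as_cp1252 = decode('cp1252',     "\x80");
my $as_latin1 = decode('iso-8859-1', "\x80");
printf "cp1252: U+%04X, Latin1: U+%04X\n", ord $as_cp1252, ord $as_latin1;
```

Only bytes in the 0x80-0x9F range differ between those two encodings, so plain Western European text often survives the confusion and the mangling only shows up on characters such as the euro sign or curly quotes.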
