> -----Original Message-----
> From: Jan Dubois
> On Fri, 10 Sep 2010, Ludwig, Michael wrote:
> > The Win32::OLE manual says the following about the CP option:
> >
> > ----
> > This variable is used to determine the codepage used by all
> > translations between Perl strings and Unicode strings used by the
> > OLE interface. The default value is CP_ACP, which is the default
> > ANSI codepage. Other possible values are CP_OEMCP, CP_MACCP, CP_UTF7
> > and CP_UTF8. These constants are not exported by default.
> > ----
> >
> > I don't understand the impact of this setting. I presume there isn't
> > any, but I want to be sure.
> 
> OLE Automation transfers strings internally encoded in UTF-16 (as BSTR 
> types).  Win32::OLE needs to transform them into regular Perl strings.
> By default it converts to CP_ACP, the standard 8-bit character set on 
> Windows.  That means any Unicode character that is not representable 
> in CP_ACP will be translated to a "replacement" character (e.g. '?').
> 
> If you want to preserve the original Unicode string, then you need to 
> tell Win32::OLE to use CP_UTF8 instead.

I can confirm that it works, and does make a difference. For Unicode
text processing, you need CP_UTF8. Interaction with MSXML and Unicode
documents didn't make sense before I specified CP_UTF8.
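
The difference can be illustrated outside of Win32::OLE with a minimal
Python sketch. It assumes that CP_ACP is Windows-1252 (the ANSI codepage
on Western European systems; other systems use a different one) and uses
Python's codecs only as a stand-in for the conversion Win32::OLE performs.
The sample string and the helper names are made up for the example.

```python
# Illustration only: Python's "cp1252" codec stands in for CP_ACP on a
# Western European Windows system. Win32::OLE itself is not involved.

SAMPLE = "Gr\u00fc\u00dfe, \u0141\u00f3d\u017a"  # "Grüße, Łódź"


def via_ansi(text: str) -> str:
    """Round-trip through the 8-bit codepage, replacing unmappable characters."""
    return text.encode("cp1252", errors="replace").decode("cp1252")


def via_utf8(text: str) -> str:
    """Round-trip through UTF-8, which can represent every Unicode character."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    print(via_ansi(SAMPLE))  # Grüße, ?ód?  (Ł and ź are not in Windows-1252)
    print(via_utf8(SAMPLE))  # Grüße, Łódź  (unchanged)
```

Characters that exist in the 8-bit codepage (ü, ß, ó) survive the first
round trip, so the loss only shows up once a document contains something
outside it.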

> CP_ACP is just the default for backwards compatibility reasons.
> 
> You probably don't want to use any of the other encodings, like 
> CP_MACCP or CP_UTF7, ever. :)

I serialized a document to UTF-7 the other day. It does look funny. :-)
Reparsing it appears to be a problem, though ...

Best,

Michael
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs