> -----Original Message-----
> From: Jan Dubois
> On Fri, 10 Sep 2010, Ludwig, Michael wrote:
> > The Win32::OLE manual says the following about the CP option:
> >
> > ----
> > This variable is used to determine the codepage used by all
> > translations between Perl strings and Unicode strings used by the
> > OLE interface. The default value is CP_ACP, which is the default
> > ANSI codepage. Other possible values are CP_OEMCP, CP_MACCP, CP_UTF7
> > and CP_UTF8. These constants are not exported by default.
> > ----
> >
> > I don't understand the impact of this setting. I presume there isn't
> > any, but I want to be sure.
>
> OLE Automation transfers strings internally encoded in UTF-16 (as BSTR
> types). Win32::OLE needs to transform them into regular Perl strings.
> By default it converts to CP_ACP, the standard 8-bit character set on
> Windows. That means any Unicode character that is not representable
> in CP_ACP will be translated to a "replacement" character (e.g. '?').
>
> If you want to preserve the original Unicode string, then you need to
> tell Win32::OLE to use CP_UTF8 instead.
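
To make the conversion Jan describes concrete, here is a minimal sketch
that is not from the thread. It uses the core Encode module to imitate
what happens to a string under the two code pages, assuming cp1252
stands in for CP_ACP (the real ANSI code page depends on the Windows
locale). The sample string and the variable names are made up for
illustration. The Win32::OLE call at the end runs only on Windows.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: U+20AC (euro sign) exists in cp1252,
# U+4E2D (a CJK character) does not.
my $text = "price: \x{20AC}5 \x{4E2D}";

# Roughly what a CP_ACP conversion does (cp1252 is an assumption here):
# characters outside the code page become a substitution character.
my $ansi      = encode('cp1252', $text);
my $back_ansi = decode('cp1252', $ansi);

# Roughly what a CP_UTF8 conversion does: every character survives
# as a UTF-8 byte sequence.
my $utf8      = encode('UTF-8', $text);
my $back_utf8 = decode('UTF-8', $utf8);

print "cp1252 round trip: ", ($back_ansi eq $text ? "lossless" : "lossy"), "\n";
print "UTF-8 round trip:  ", ($back_utf8 eq $text ? "lossless" : "lossy"), "\n";

# On Windows, switch Win32::OLE itself to UTF-8. The CP_* constants are
# not exported by default, so the constant is called by its full name.
if ($^O eq 'MSWin32') {
    require Win32::OLE;
    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());
}
```

In a Windows-only script the usual spelling would be
"use Win32::OLE qw(CP_UTF8);" followed by
"Win32::OLE->Option(CP => CP_UTF8);" before any OLE objects are used.
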
I can confirm that it works, and does make a difference. For Unicode
text processing, you need CP_UTF8. Interaction with MSXML and Unicode
documents didn't make sense before I specified CP_UTF8.

> CP_ACP is just the default for backwards compatibility reasons.
>
> You probably don't want to use any of the other encodings, like
> CP_MACCP or CP_UTF7, ever. :)

Serialized a doc to UTF-7 the other day. It does look funny. :-)
But reparsing appears to be a problem ...

Best,

Michael

_______________________________________________
ActivePerl mailing list
