Here's a document containing some non-ASCII text.

<Unicode>
<Deutsch>Süße Ödnis statt ärgerem Übel.</Deutsch>
<Französisch>D'où vient que ça m'est égal?</Französisch>
<Griechisch>Στο 11,6% η ανεργία τον Ιούνιο</Griechisch>
<Russisch>Реальная погода в Москве</Russisch>
</Unicode>


Here's the VBScript/ASP that generated it:

<%@ language=vbscript codepage=65001 %>
<%
Option Explicit

Function NewDom()
        Set NewDom = Server.CreateObject( "Msxml2.DOMDocument.6.0" )
End Function

Dim domXml, domRes, nd1, txt

Set domXml = NewDom
Set domRes = NewDom

domXml.loadXML "<Unicode/>"

Set nd1 = domXml.createElement("Deutsch")
Set txt = domXml.createTextNode("Süße Ödnis statt ärgerem Übel.")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )

Set nd1 = domXml.createElement("Französisch")
Set txt = domXml.createTextNode("D'où vient que ça m'est égal?")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )

Set nd1 = domXml.createElement("Griechisch")
Set txt = domXml.createTextNode("Στο 11,6% η ανεργία τον Ιούνιο")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )

Set nd1 = domXml.createElement("Russisch")
Set txt = domXml.createTextNode("Реальная погода в Москве")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )

Response.ContentType = "text/xml"
Response.CharSet     = "UTF-8"
domXml.save( Response )
%>


And here's the dysfunctional PerlScript code I derived the VBScript version 
from:

<%@ language=perlscript codepage=65001 %>
<%
use strict;
use warnings;
# use utf8; # messes up German and French as well
use Win32::OLE;
use constant DOMCLASS => "Msxml2.DOMDocument.6.0";
our( $Server, $Response, $Request );
my %OLEoptions = ( Warn => 3 );
# $OLEoptions{CP} = Win32::OLE::CP_UTF8; # no effect
Win32::OLE->Option( %OLEoptions );

sub NewDom { return Win32::OLE->new( DOMCLASS ) }

my( $domXml, $domRes, $nd1, $txt );

$domXml = NewDom;
$domRes = NewDom;

$domXml->loadXML( "<Unicode/>" );

$nd1 = $domXml->createElement("Deutsch");
$txt = $domXml->createTextNode("Süße Ödnis statt ärgerem Übel.");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );

$nd1 = $domXml->createElement("Französisch");
$txt = $domXml->createTextNode("D'où vient que ça m'est égal?");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );

$nd1 = $domXml->createElement("Griechisch");
$txt = $domXml->createTextNode("Στο 11,6% η ανεργία τον Ιούνιο");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );

$nd1 = $domXml->createElement("Russisch");
$txt = $domXml->createTextNode("Реальная погода в Москве");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );

$Response->{ContentType} = "text/xml";
$Response->{CharSet}     = "UTF-8";
$domXml->save( $Response );
%>


The VBScript version generates a correct document, but my PerlScript version 
garbles the Greek and Russian text:

<Unicode>
<Deutsch>Süße Ödnis statt ärgerem Übel.</Deutsch>
<Französisch>D'où vient que ça m'est égal?</Französisch>
<Griechisch>St? 11,6% ? a?e???a t?? ??????</Griechisch>
<Russisch>???????? ?????? ? ??????</Russisch>
</Unicode>

What's wrong with my code? Or am I running into some PerlScript engine or COM 
interop limitation?

I noticed that the engine appears to map alpha, epsilon, sigma and tau to their 
Latin look-alikes (a, e, S and t), which is arguably the wrong thing to do, but 
turns the rest of the Greek letters into question marks; the Cyrillic letters, 
on the other hand, are all replaced by question marks, completely and 
consistently.
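The pattern above would be consistent with the text passing through a 
single-byte Windows code page somewhere along the way. Which code page the 
server uses is not stated; Windows-1252 (the usual ANSI code page on German or 
Western European Windows) is assumed here. The following is a minimal Perl 
sketch, runnable outside ASP, that uses the core Encode module to list which 
characters of each sample have no Windows-1252 mapping. The names 
not_in_cp1252 and %samples are hypothetical, made up for this check.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the code points (as "U+XXXX") of all
# characters in $str that have no mapping in Windows-1252 (cp1252).
sub not_in_cp1252 {
    my ($str) = @_;
    my @missing;
    for my $ch (split //, $str) {
        # FB_CROAK makes encode() die on an unmappable character;
        # LEAVE_SRC stops encode() from modifying its input in place.
        my $ok = eval { encode('cp1252', $ch, FB_CROAK | LEAVE_SRC); 1 };
        push @missing, sprintf('U+%04X', ord $ch) unless $ok;
    }
    return @missing;
}

# Shortened samples written with \x{...} escapes, so this file is pure
# ASCII and does not depend on how its own source text gets decoded.
my %samples = (
    German  => "S\x{FC}\x{DF}e \x{D6}dnis",                  # start of the German line
    French  => "D'o\x{F9} vient que \x{E7}a",                # start of the French line
    Greek   => "\x{03A3}\x{03C4}\x{03BF} 11,6% \x{03B7}",     # start of the Greek line
    Russian => "\x{041C}\x{043E}\x{0441}\x{043A}\x{0432}\x{0435}",  # last Russian word
);

for my $lang (sort keys %samples) {
    my @missing = not_in_cp1252($samples{$lang});
    printf "%-8s %s\n", $lang,
        @missing ? "@missing" : 'all characters map to cp1252';
}
```

Run from the command line, it should report no unmappable characters for the 
German and French samples and every Greek and Cyrillic letter for the other 
two. Encode only flags these characters as unmappable; it does not reproduce 
the alpha-to-"a" substitutions seen above, which look more like a "best fit" 
fallback in the Windows conversion routines (an assumption, not verified here).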

Sorry for the long-winded XML examples (maybe they'll be useful to future 
readers). Stripping the example down to plain text, with no MSXML involved, 
still exposes the problem:

<%@ language=perlscript codepage=65001 %>
<%
use strict;
use warnings;
# use utf8; # messes up German and French as well
our $Response;

my $txt = '';
$txt .= "Süße Ödnis statt ärgerem Übel.\n";
$txt .= "D'où vient que ça m'est égal?\n";
$txt .= "Στο 11,6% η ανεργία τον Ιούνιο\n";
$txt .= "Реальная погода в Москве";

$Response->{ContentType} = "text/plain";
$Response->{CharSet}     = "UTF-8";
$Response->Write( $txt );
%>


Yields:

Süße Ödnis statt ärgerem Übel.
D'où vient que ça m'est égal?
St? 11,6% ? a?e???a t?? ??????
???????? ?????? ? ??????


Any clues?
-- 
Michael Ludwig
_______________________________________________
ActivePerl mailing list