Here's a document containing some non-ASCII text.
<Unicode>
<Deutsch>Süße Ödnis statt ärgerem Übel.</Deutsch>
<Französisch>D'où vient que ça m'est égal?</Französisch>
<Griechisch>Στο 11,6% η ανεργία τον Ιούνιο</Griechisch>
<Russisch>Реальная погода в Москве</Russisch>
</Unicode>
Here's the VBScript/ASP code that generated it:
<%@ language=vbscript codepage=65001 %>
<%
Option Explicit
Function NewDom()
Set NewDom = Server.CreateObject( "Msxml2.DOMDocument.6.0" )
End Function
Dim domXml, domRes, nd1, txt
Set domXml = NewDom
Set domRes = NewDom
domXml.loadXML "<Unicode/>"
Set nd1 = domXml.createElement("Deutsch")
Set txt = domXml.createTextNode("Süße Ödnis statt ärgerem Übel.")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )
Set nd1 = domXml.createElement("Französisch")
Set txt = domXml.createTextNode("D'où vient que ça m'est égal?")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )
Set nd1 = domXml.createElement("Griechisch")
Set txt = domXml.createTextNode("Στο 11,6% η ανεργία τον Ιούνιο")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )
Set nd1 = domXml.createElement("Russisch")
Set txt = domXml.createTextNode("Реальная погода в Москве")
domXml.documentElement.appendChild( nd1 ).appendChild( txt )
Response.ContentType = "text/xml"
Response.CharSet = "UTF-8"
domXml.save( Response )
%>
And here's the non-working PerlScript code from which I derived the VBScript
version:
<%@ language=perlscript codepage=65001 %>
<%
use strict;
use warnings;
# use utf8; # messes up German and French as well
use Win32::OLE;
use constant DOMCLASS => "Msxml2.DOMDocument.6.0";
our( $Server, $Response, $Request );
my %OLEoptions = ( Warn => 3 );
# $OLEoptions{CP} = Win32::OLE::CP_UTF8; # no effect
Win32::OLE->Option( %OLEoptions );
sub NewDom { return Win32::OLE->new( DOMCLASS ) }
my( $domXml, $domRes, $nd1, $txt );
$domXml = NewDom;
$domRes = NewDom;
$domXml->loadXML( "<Unicode/>" );
$nd1 = $domXml->createElement("Deutsch");
$txt = $domXml->createTextNode("Süße Ödnis statt ärgerem Übel.");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );
$nd1 = $domXml->createElement("Französisch");
$txt = $domXml->createTextNode("D'où vient que ça m'est égal?");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );
$nd1 = $domXml->createElement("Griechisch");
$txt = $domXml->createTextNode("Στο 11,6% η ανεργία τον Ιούνιο");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );
$nd1 = $domXml->createElement("Russisch");
$txt = $domXml->createTextNode("Реальная погода в Москве");
$domXml->documentElement->appendChild( $nd1 )->appendChild( $txt );
$Response->{ContentType} = "text/xml";
$Response->{CharSet} = "UTF-8";
$domXml->save( $Response );
%>
The VBScript version generates a perfect document, but my PerlScript ASP page
mangles the Greek and Russian text:
<Unicode>
<Deutsch>Süße Ödnis statt ärgerem Übel.</Deutsch>
<Französisch>D'où vient que ça m'est égal?</Französisch>
<Griechisch>St? 11,6% ? a?e???a t?? ??????</Griechisch>
<Russisch>???????? ?????? ? ??????</Russisch>
</Unicode>
What's wrong with my code? Or am I running into some PerlScript engine or COM
interop limitation?
I noticed that the engine appears to map alpha, epsilon, sigma and tau to their
Latin look-alikes (which is arguably the wrong thing to do) but turns the rest
of the Greek alphabet into question marks, whereas every Cyrillic letter is
consistently replaced by a question mark.
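For comparison, the question-mark pattern can be reproduced outside ASP with Perl's Encode module. This is only a sketch of one hypothesis, not a confirmed diagnosis: it assumes the script text is narrowed to the Windows-1252 code page somewhere between the ASP host and the Perl interpreter. Encode substitutes '?' for anything cp1252 cannot represent and does no best-fit mapping, so unlike the engine it also turns sigma and tau into '?'.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothesis being illustrated (an assumption): the page text is narrowed to
# Windows-1252 before Perl sees it. The samples use \x{...} escapes so this
# script's own source stays pure ASCII.
my $greek   = "\x{03A3}\x{03C4}\x{03BF} 11,6%";                    # first words of the Greek line
my $russian = "\x{041C}\x{043E}\x{0441}\x{043A}\x{0432}\x{0435}";  # last word of the Russian line

# With the default CHECK value, encode() replaces every character that has
# no cp1252 equivalent with '?' (no Windows-style best-fit substitution).
my $greek_1252   = encode('cp1252', $greek);
my $russian_1252 = encode('cp1252', $russian);

print "$greek_1252\n";     # ??? 11,6%
print "$russian_1252\n";   # ??????
```

The Russian result matches the engine's output exactly; the Greek one differs only where the engine apparently substituted Latin look-alikes.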
Okay, sorry for the long-winded XML examples (maybe they'll be useful for
future generations); I've stripped the example down to plain text, which still
exposes the problem:
<%@ language=perlscript codepage=65001 %>
<%
use strict;
use warnings;
# use utf8; # messes up German and French as well
our $Response;
my $txt = '';
$txt .= "Süße Ödnis statt ärgerem Übel.\n";
$txt .= "D'où vient que ça m'est égal?\n";
$txt .= "Στο 11,6% η ανεργία τον Ιούνιο\n";
$txt .= "Реальная погода в Москве";
$Response->{ContentType} = "text/plain";
$Response->{CharSet} = "UTF-8";
$Response->Write( $txt );
%>
Yields:
Süße Ödnis statt ärgerem Übel.
D'où vient que ça m'est égal?
St? 11,6% ? a?e???a t?? ??????
???????? ?????? ? ??????
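One further check that might narrow this down is to dump the code points of a literal from inside the page. If the Greek letters come out as U+003F or as Latin letters, the text was presumably altered before Perl parsed it; if they come out as UTF-8 byte values (U+00CE U+00A3 for capital sigma), Perl received the raw bytes. The `codepoints` helper below is hypothetical, written only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper (not part of any module): describe each
# character of a string as U+XXXX.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# Inside the ASP page one would pass the Greek literal exactly as typed and
# send the result with $Response->Write(...). An escaped stand-in (capital
# sigma, tau, omicron) is used here so the sketch runs as a plain script.
my $dump = codepoints("\x{03A3}\x{03C4}\x{03BF}");
print "$dump\n";   # U+03A3 U+03C4 U+03BF
```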
Any clues?
--
Michael Ludwig
_______________________________________________
ActivePerl mailing list
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs