Hi John,

  "In other words: regardless of platform, use binmode() on binary
  data, like for example images." (cf. perldoc -f binmode).

XML, regardless of platform and encoding, is text data.

XML::LibXML sets the UTF8 flag on scalars properly (read: "as described
in perlguts, perlxs, perlapi, perlxstut, ...").

What happens in XML::LibXML::Document::toString() is pretty much the
same thing as in XML::LibXML::Document::toFile(). The major difference
is that toString() dumps the result into a memory buffer and toFile()
dumps it into a file (ok, that was easy, huh?).

Additionally, toString() sets the utf8 flag on the returned value if the
encoding of the document is UTF-8. This is required because otherwise
perl's regex engine would not be able to process the result correctly,
length() would not return the correct length of the string, and so on.
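
To make the point about the flag concrete, here is a minimal sketch in
core Perl only (no XML::LibXML involved). It assumes the data is the
UTF-8 encoding of "ä" and compares the undecoded bytes with the decoded
character string; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of "ä" (U+00E4) as two raw bytes; no utf8 flag set.
my $bytes = "\xC3\xA4";

# The same data decoded into a character string; the utf8 flag is on.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length=%d flag=%s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length=%d flag=%s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';

# /^.$/ matches exactly one character, so only the decoded string matches.
print "bytes match one char: ", ($bytes =~ /^.$/ ? 'yes' : 'no'), "\n";
print "chars match one char: ", ($chars =~ /^.$/ ? 'yes' : 'no'), "\n";
```

Without the flag, perl treats each byte as a character of its own, which
is why length() and the regex engine disagree with what you expect.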

You should read/write your 'raw' data directly and check whether you run
into any problems. A simple test would be:

$ perl dump_non_utf8_dom_to_stdout.pl | xmllint -
or
$ perl dump_non_utf8_dom_to_stdout.pl | xmllint --encode utf8 -

If xmllint reports no errors, there are no errors (at least in the
encoding). And believe me, it works ;)

NB: I ran the following code as a script:

        use XML::LibXML;
        my $doc = XML::LibXML::Document->new();
        my $node = XML::LibXML::Element->new('test');
        my $in = pack('U', 0x00e4); # "ä" (U+00E4)
        $node->setAttribute( "foo", $in );
        $doc->setDocumentElement($node);
        $doc->setEncoding('iso-8859-1');
        print $doc->toString();
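
For the :raw case discussed in the quoted thread, here is a rough sketch
of what a flagged character string looks like once it is printed to a
handle without an encoding layer. It uses core Perl only. The string
$xml is just a stand-in for a toString() result from a UTF-8 document,
and written_bytes() is a hypothetical helper, so treat both as
assumptions of the sketch rather than as XML::LibXML behaviour.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in (assumption) for a flagged character string as returned for
# a UTF-8 document: one non-ASCII character, utf8 flag switched on.
my $xml = qq{<test foo="\x{e4}"/>};
utf8::upgrade($xml);

# Hypothetical helper: print $data to an in-memory handle in :raw mode
# and return the bytes that were actually written.
sub written_bytes {
    my ($data) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer)
        or die "cannot open in-memory handle: $!";
    binmode($fh, ':raw');
    print {$fh} $data;
    close($fh);
    return $buffer;
}

# Printed as is, perl downgrades the character to a single Latin-1 byte.
my $as_is = written_bytes($xml);

# Encoded explicitly first, the handle receives real UTF-8 bytes.
my $encoded = written_bytes(encode('UTF-8', $xml));

printf "as is:   %s\n", unpack('H*', $as_is);
printf "encoded: %s\n", unpack('H*', $encoded);
```

If a handle has to stay in :raw mode because not all output is UTF-8,
encoding each string explicitly before print is one way to keep the
bytes predictable.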

Christian.

On Mon, 2005-02-14 at 11:35 +0000, [EMAIL PROTECTED] wrote:
> Vaclav,
> 
> > On Wednesday 09 February 2005 16:05, [EMAIL PROTECTED] wrote:
> > > When the code below is run the xml is not well formed.
> > > To be more precise, the output via toString isn't well formed, the
> > > output via toFile _is_ well formed.
> > > What seems to happening is that toString is decoding the utf string
> > > \x{C3}\x{84} for some reason. toFile does not do this.
> > IMHO STDOUT does it - when I change the binmode parameter to ":utf8", I get
> > the same data on standard output as in the file (perl 5.8.0, libxml 2.6.16
> > on Linux).
> 
> Yes, as it should, however, the output isn't set to :utf8 it's set to :raw. 
> I don't set it to :utf8 since not all output is going to be utf8.
> 
> John
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
-- 
Christian Glahn <[EMAIL PROTECTED]>

