David Bertoni wrote:
> Stephen Collyer wrote:
>> I have a SAX2 parser which is exhibiting odd behaviour.
>>
>> If I give it some XML with an XML declaration like:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>>
>> it fails with an "Invalid document structure" error.
>> If I remove the encoding declaration, then it parses correctly.
> This is quite strange, since the parser will assume the encoding is
> UTF-8 without an encoding declaration.  The only case where I could
> imagine this might happen is with a UTF-16 document with an encoding
> declaration that indicates a byte-oriented encoding.  You can verify
> this by looking at a binary dump of the XML stream.

Dave, thanks for that - I suspect I know what the problem is.
I am, in fact, handing Xerces a UTF-16 document with an encoding
declaration that says UTF-8 - is that what you mean by a "byte-oriented
encoding", i.e. a variable-length encoding?

The reason for this is that I am receiving a document in UTF-8 with
a declaration that indicates UTF-8, but I'm transcoding it to UTF-16
early on to make it fit in a Qt QString (I'm using the Trolltech Qt
libs). However, of course, if I hand that off to Xerces, the encoding
declaration no longer matches the true encoding, which I guess is the
cause of the problem. This only dawned on me after I'd read your comment.
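
For what it's worth, here is a rough sketch of the check a binary dump
would give me, so the mismatch can be spotted in code. It only looks at
the first few bytes for a UTF-16 byte-order mark or for "<?" stored as
16-bit units. The names sniffEncoding and SniffedEncoding are made up
for illustration, and UTF-32 is deliberately ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: what the leading bytes of the buffer suggest.
enum class SniffedEncoding { Utf16LE, Utf16BE, Utf8OrOther };

// Hypothetical helper: guesses whether a raw byte buffer holds UTF-16 text.
// Assumption: UTF-32 input is out of scope (its LE byte-order mark also
// starts with FF FE, so it would be misreported here).
SniffedEncoding sniffEncoding(const std::string& bytes)
{
    auto at = [&](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };

    // A byte-order mark settles the question straight away.
    if (bytes.size() >= 2) {
        if (at(0) == 0xFF && at(1) == 0xFE) return SniffedEncoding::Utf16LE;
        if (at(0) == 0xFE && at(1) == 0xFF) return SniffedEncoding::Utf16BE;
    }

    // Without one, "<?" stored as two 16-bit units gives the game away.
    if (bytes.size() >= 4) {
        if (at(0) == '<' && at(1) == 0 && at(2) == '?' && at(3) == 0)
            return SniffedEncoding::Utf16LE;
        if (at(0) == 0 && at(1) == '<' && at(2) == 0 && at(3) == '?')
            return SniffedEncoding::Utf16BE;
    }

    return SniffedEncoding::Utf8OrOther;
}
```

If this reports UTF-16 while the declaration inside the buffer still
says UTF-8, that is exactly the situation I have created.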

The only way I can see to fix this is to edit the declaration in code.
Or can I tell Xerces to ignore it somehow? Advice appreciated.
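
To show what I mean by editing the declaration in code, here is a
minimal sketch using only the standard library. It assumes the document
is already held as UTF-16 code units in a std::u16string and simply
drops the encoding pseudo-attribute from the XML declaration. The
function name stripEncodingDecl is hypothetical, and this is a plain
string edit rather than a real parse of the declaration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or Xerces): returns a copy of a UTF-16
// document with the encoding pseudo-attribute removed from its XML declaration.
std::u16string stripEncodingDecl(const std::u16string& doc)
{
    const std::size_t npos = std::u16string::npos;
    const std::u16string ws = u" \t\r\n";

    // The XML declaration must open the document, optionally after a BOM.
    std::size_t start = (!doc.empty() && doc[0] == char16_t(0xFEFF)) ? 1 : 0;
    if (doc.compare(start, 5, u"<?xml") != 0 ||
        doc.size() <= start + 5 || ws.find(doc[start + 5]) == npos)
        return doc;                       // no XML declaration present

    std::size_t end = doc.find(u"?>", start);
    std::size_t enc = doc.find(u"encoding", start);
    if (end == npos || enc == npos || enc > end)
        return doc;                       // no encoding pseudo-attribute

    // Locate the quoted value that follows "encoding =".
    std::size_t open = doc.find_first_of(u"\"'", enc);
    if (open == npos || open > end)
        return doc;
    std::size_t close = doc.find(doc[open], open + 1);
    if (close == npos || close > end)
        return doc;

    // Also remove the whitespace that separated it from the version attribute.
    std::size_t from = enc;
    while (from > start && ws.find(doc[from - 1]) != npos)
        --from;

    std::u16string out = doc;
    out.erase(from, close + 1 - from);
    return out;
}
```

With the declaration reduced to just the version, the parser would have
to work the encoding out from the bytes themselves instead of trusting a
stale label. How the QString contents get into and out of a
std::u16string is left out here.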

-- 
Regards

Steve Collyer
Netspinner Ltd
