David Bertoni wrote:
> Stephen Collyer wrote:
>> I have a SAX2 parser which is exhibiting odd behaviour.
>>
>> If I give it some XML with an XML declaration like:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>>
>> it fails with a "Invalid document structure" error.
>> If I remove the encoding element, then it parses correctly.
>
> This is quite strange, since the parser will assume the encoding is
> UTF-8 without an encoding declaration. The only case where I could
> imagine this might happen is with a UTF-16 document with an encoding
> declaration that indicates a byte-oriented encoding. You can verify
> this by looking at a binary dump of the XML stream.

Dave, thanks for that - I suspect I know what the problem is.

I am, in fact, handing Xerces a UTF-16 document with an encoding
declaration that says UTF-8. Is that what you mean by a "byte-oriented
encoding", i.e. a variable-length encoding?

The reason for this is that I am receiving a document in UTF-8 with a
declaration that indicates UTF-8, but I'm transcoding it to UTF-16
early on to make it fit in a Qt QString (I'm using the Trolltech Qt
libs). However, if I then hand that off to Xerces, the encoding
declaration no longer matches the true encoding, which I guess is the
cause of the problem. This only dawned on me after I'd read your
comment.

The only way I can see to fix this is to edit the declaration in code.
Or can I tell Xerces to ignore it somehow?

Advice appreciated.

-- 
Regards

Steve Collyer
Netspinner Ltd
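
For reference, a minimal sketch of the "edit the declaration in code" option mentioned above. It assumes the transcoded document has been copied into a std::u16string; the function name fixDeclaredEncoding and the default replacement value "UTF-16" are hypothetical, chosen for illustration only, and nothing here touches the Xerces or Qt APIs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces or Qt): rewrite the value of the
// encoding pseudo-attribute in the XML declaration of a document that is
// already held in memory as UTF-16. The input is returned unchanged if it
// has no XML declaration or the declaration has no encoding.
std::u16string fixDeclaredEncoding(const std::u16string& doc,
                                   const std::u16string& newEncoding = u"UTF-16")
{
    const std::u16string open = u"<?xml";
    const std::u16string close = u"?>";
    const std::u16string key = u"encoding";
    const std::size_t npos = std::u16string::npos;

    // Skip a leading byte order mark if one is present.
    const std::size_t start = (!doc.empty() && doc[0] == u'\uFEFF') ? 1 : 0;

    // The declaration must be the very first thing in the document and
    // must be followed by whitespace (this rules out "<?xml-stylesheet").
    if (doc.compare(start, open.size(), open) != 0) return doc;
    const std::size_t afterOpen = start + open.size();
    if (afterOpen >= doc.size()) return doc;
    const char16_t c = doc[afterOpen];
    if (c != u' ' && c != u'\t' && c != u'\r' && c != u'\n') return doc;

    // Everything we look at has to sit before the closing "?>".
    const std::size_t declEnd = doc.find(close, afterOpen);
    if (declEnd == npos) return doc;

    const std::size_t keyPos = doc.find(key, afterOpen);
    if (keyPos == npos || keyPos > declEnd) return doc;

    // Locate the quoted value; either quote character is allowed.
    const std::size_t quoteOpen = doc.find_first_of(u"\"'", keyPos + key.size());
    if (quoteOpen == npos || quoteOpen > declEnd) return doc;
    const std::size_t quoteClose = doc.find(doc[quoteOpen], quoteOpen + 1);
    if (quoteClose == npos || quoteClose > declEnd) return doc;

    // Replace only the text between the quotes.
    std::u16string out = doc;
    out.replace(quoteOpen + 1, quoteClose - quoteOpen - 1, newEncoding);
    return out;
}
```

Called on the declaration quoted earlier in the thread, this would turn encoding="UTF-8" into encoding="UTF-16" and leave the rest of the document alone. It is a plain string edit, not an XML parser, so it only handles a declaration at the very start of the document.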
