For what it's worth, the VXML file you attached to the bug appears to be encoded in UTF-16 (with BOM), not UTF-8 as indicated by the XML declaration. If you change the XML declaration to match the actual encoding, or the actual encoding to match the declaration, I imagine you'll get better results.
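A quick way to catch this kind of mismatch before handing a file to the parser is to look at its first bytes for a byte order mark and compare that with the encoding named in the XML declaration. The following is only a minimal sketch of that idea, not anything Xerces provides: `BomKind`, `detectBom`, and `declarationMatchesBom` are hypothetical names made up for illustration, and the check is deliberately simplified (it ignores UTF-32 and files without a BOM).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Encoding family suggested by a byte order mark (BOM), if any.
// Hypothetical helper type for illustration; not part of Xerces-C++.
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a buffer for a BOM.
// Simplification: UTF-32 BOMs are not distinguished (FF FE 00 00 would be
// reported as Utf16LE here).
BomKind detectBom(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BomKind::Utf8;
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BomKind::Utf16LE;
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return BomKind::Utf16BE;
    return BomKind::None;
}

// Upper-case an ASCII encoding name so comparisons ignore case.
std::string upperAscii(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Return true when the encoding named in the XML declaration is consistent
// with the BOM that was found. With no BOM there is nothing to contradict,
// so this returns true (assumption: the caller only wants to flag clear
// mismatches, not validate the encoding fully).
bool declarationMatchesBom(BomKind bom, const std::string& declared) {
    const std::string name = upperAscii(declared);
    switch (bom) {
        case BomKind::Utf8:    return name == "UTF-8";
        case BomKind::Utf16LE:
        case BomKind::Utf16BE: return name == "UTF-16";
        case BomKind::None:    return true;
    }
    return true;
}
```

For a file like the one attached to the bug, which starts with a UTF-16 BOM but declares `encoding="UTF-8"`, `declarationMatchesBom` returns false, which is the same disagreement the parser is complaining about.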
It's often useful with questions like this to try parsing the offending document with one of the samples, such as DOMPrint. In this case, I found that DOMPrint (version 2.5) said the following:

Fatal Error at file "[some-path]test1012.1.vxml", line 1, column 39
   Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (ê) of a 1-byte sequence.

While it may not be obvious what exactly caused the problem, there are some pretty strong hints there. When I changed the declaration to match the actual encoding, DOMPrint processed the document successfully.

-----Original Message-----
From: Jesse Pelton (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 22, 2006 12:48 PM
To: [email protected]
Subject: [jira] Closed: (XERCESC-1606) Failing in parsing Unicode XML file

     [ http://issues.apache.org/jira/browse/XERCESC-1606?page=all ]

Jesse Pelton closed XERCESC-1606:
---------------------------------

    Resolution: Invalid

Please use the mailing lists for questions about using Xerces. See http://xml.apache.org/mail.html#xerces-c-user.

> Failing in parsing Unicode XML file
> -----------------------------------
>
>          Key: XERCESC-1606
>          URL: http://issues.apache.org/jira/browse/XERCESC-1606
>      Project: Xerces-C++
>         Type: Bug
>   Components: SAX/SAX2
>     Versions: 1.7.0
>  Environment: Operating System: Linux, Kernel Version:2.4 Software: xerces 1.7.0
>     Reporter: Shailendra Verma
>  Attachments: VXIlog.txt, test1012.1.vxml
>
> Hello all,
> I am using xerces 1.7.0 and want to parse the following page in Unicode format.
> <?xml version="1.0" encoding="UTF-8"?>
> <vxml version="2.0">
>   <form>
>     <block>
>       <prompt xml:lang="zh-CN">
>         <paragraph>
>           <sentence>???????</sentence>
>         </paragraph>
>       </prompt>
>     </block>
>   </form>
> </vxml>
> while parsing it is failing in parse->parse ( ) and caught by catch (const SAXParseException & exception) .
> Can anyone give me idea about which version of xerces-c can be used to parse Unicode file?
> Waiting for responses from you.
> Thanx & Rgds,
> Shailendra
> --------------------------------------------------------------------------------------------------------------------------------------------
> try {
>   if (isDefaults && lastParse != DocumentParser::DEFAULTS) {
>     parser->parse(MemBufInputSource(DUMMY_VXML_DEFAULTS_DOC,
>                                     DUMMY_VXML_DEFAULTS_DOC_SIZE,
>                                     "vxml 1.0 defaults"), false);
>     lastParse = DocumentParser::DEFAULTS;
>   }
>   else if (!isDefaults && lastParse != DocumentParser::DOCUMENT) {
>     parser->parse(MemBufInputSource(DUMMY_VXML_DOC, DUMMY_VXML_DOC_SIZE,
>                                     "vxml 1.0 dtd"), false);
>     lastParse = DocumentParser::DOCUMENT;
>   }
> }
> catch (const SAXParseException & exception) {
>   log.StartDiagnostic(0) << L"DocumentParser::FetchDocument - Parse error "
>                          << L"in file \""
>                          << XMLChToVXIchar(exception.getSystemId())
>                          << L"\", line " << exception.getLineNumber()
>                          << L", column " << exception.getColumnNumber()
>                          << L" - " <<
>                          XMLChToVXIchar(exception.getMessage());
>   log.EndDiagnostic();
>   log.LogError(999, SimpleLogger::MESSAGE, L"unable to load VXML DTD");
>   return 4;
> }

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
