For what it's worth, the VXML file you attached to the bug appears to be 
encoded in UTF-16 (with BOM), not UTF-8 as indicated by the XML declaration.  
If you change the XML declaration to match the actual encoding, or the actual 
encoding to match the declaration, I imagine you'll get better results.
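
If it helps to check this sort of thing programmatically, here is a minimal sketch of a byte-order-mark check on the first few bytes of a file. The name sniffBom is my own invention and not part of Xerces-C++, and the function only looks at the BOM; it does not read the XML declaration or guess the encoding of files that lack a BOM.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C++ API): report the encoding implied by
// a leading byte order mark, or an empty string when no BOM is present.
std::string sniffBom(const unsigned char* data, std::size_t size) {
    // The UTF-32 marks are tested first because the UTF-32LE BOM
    // (FF FE 00 00) begins with the same two bytes as the UTF-16LE BOM.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "";
}
```

A result of "UTF-16LE" or "UTF-16BE" for a file whose declaration says encoding="UTF-8" would point to the kind of mismatch described above.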

It's often useful with questions like this to try parsing the offending 
document with one of the samples, such as DOMPrint.  In this case, I found that 
DOMPrint (version 2.5) said the following:

  Fatal Error at file "[some-path]test1012.1.vxml", line 1, column 39

   Message: An exception occurred! Type:UTFDataFormatException, Message:invalid byte 1 (ê) of a 1-byte sequence.

While it may not be obvious what exactly caused the problem, there are some 
pretty strong hints there.  When I changed the declaration to match the actual 
encoding, DOMPrint processed the document successfully.

-----Original Message-----
From: Jesse Pelton (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 22, 2006 12:48 PM
To: [email protected]
Subject: [jira] Closed: (XERCESC-1606) Failing in parsing Unicode XML file

     [ http://issues.apache.org/jira/browse/XERCESC-1606?page=all ]
     
Jesse Pelton closed XERCESC-1606:
---------------------------------

    Resolution: Invalid

Please use the mailing lists for questions about using Xerces. See 
http://xml.apache.org/mail.html#xerces-c-user.

> Failing in parsing Unicode XML file
> -----------------------------------
>
>          Key: XERCESC-1606
>          URL: http://issues.apache.org/jira/browse/XERCESC-1606
>      Project: Xerces-C++
>         Type: Bug
>
>   Components: SAX/SAX2
>     Versions: 1.7.0
>  Environment: Operating System: Linux, Kernel Version:2.4 Software: xerces 
> 1.7.0
>     Reporter: Shailendra Verma
>  Attachments: VXIlog.txt, test1012.1.vxml
>
> Hello all,
> I am using xerces 1.7.0 and want to parse the following page in Unicode 
> format.
> <?xml version="1.0" encoding="UTF-8"?>
> <vxml version="2.0">
> <form>
> <block>
>    <prompt xml:lang="zh-CN">
>       <paragraph>
>          <sentence>???????</sentence>
>       </paragraph>
>    </prompt>
> </block>
> </form>
> </vxml>
> while parsing it is failing in parse->parse ( ) and caught by catch (const 
> SAXParseException & exception) .
> Can anyone give me idea about which version of xerces-c can be used to parse 
> Unicode file?
> Waiting for responses from you.
> Thanx & Rgds,
> Shailendra
> --------------------------------------------------------------------------------------------------------------------------------------------
> try {
>     if (isDefaults && lastParse != DocumentParser::DEFAULTS) {
>       parser->parse(MemBufInputSource(DUMMY_VXML_DEFAULTS_DOC,
>                                       DUMMY_VXML_DEFAULTS_DOC_SIZE,
>                                       "vxml 1.0 defaults"), false);
>       lastParse = DocumentParser::DEFAULTS;
>     }
>     else if (!isDefaults && lastParse != DocumentParser::DOCUMENT) {
>       parser->parse(MemBufInputSource(DUMMY_VXML_DOC, DUMMY_VXML_DOC_SIZE,
>                                       "vxml 1.0 dtd"), false);
>       lastParse = DocumentParser::DOCUMENT;
>     }
>   }
> catch (const SAXParseException & exception) {
>     log.StartDiagnostic(0) << L"DocumentParser::FetchDocument - Parse error "
>                            << L"in file \""
>                            << XMLChToVXIchar(exception.getSystemId())
>                            << L"\", line " << exception.getLineNumber()
>                            << L", column " << exception.getColumnNumber()
>                            << L" - "
>                            << XMLChToVXIchar(exception.getMessage());
>     log.EndDiagnostic();
>     log.LogError(999, SimpleLogger::MESSAGE, L"unable to load VXML DTD");
>     return 4;
>   }

-- 

This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]