On 8/13/2010 12:17 AM, Mithun wrote:
Hi,
I'm looking for a solution for setEncoding when parsing an XML file.
I'm writing a wrapper-like module for Xerces that will help parse XML strings
coming from the network, so I have used DOMLSInput's setStringData().
I can parse files with encoding="UTF-8" by default. But when I use a
sample file with encoding='" MEMPARSE_ENCODING "', I get a segmentation fault.
I have searched and tried some options from "XMLUni", but the result is the same.
This is neither a valid encoding, nor valid syntax for specifying an
encoding. Still, Xerces-C should not cause a segmentation violation.
Can you provide a minimal program and an input document to reproduce
this and create a Jira issue?
2) How can I determine the encoding on the fly, before passing the data to the parser?
There's no way to determine the encoding of a document for certain.
There are ways to do statistical analysis of a document to guess its
encoding, but that analysis can fail.
Dave