Benjamin Fritz created XERCESC-2240:
---------------------------------------
             Summary: Junk characters (including null) allowed in XML declaration
                 Key: XERCESC-2240
                 URL: https://issues.apache.org/jira/browse/XERCESC-2240
             Project: Xerces-C++
          Issue Type: Bug
    Affects Versions: 3.2.3
         Environment: Linux
            Reporter: Benjamin Fritz

In a library we've written using Xerces-C++ to validate XML files against a given XSD, we have discovered that the XercesDOMParser::parse() function does not record any errors if the XML declaration at the beginning of an XML document contains "junk" characters, including control characters (^K) or null bytes (shown below as ^@). The null control character specifically should be invalid in any XML document.

For example, the following XML file (attaching as basic_bad_bytes.xml) parses without error, but it should not:

    <?xml version="1.0" encoding^@^@^@^@^@="UTF-8" ?>
    <root_elem>
      <child_elem some_attr="abc" />
      <child_elem some_attr="def" />
    </root_elem>

The following XML (attaching as basic_bad_bytes2.xml) correctly reports an error:

    <?xml version="1.0" encoding="UTF-8" ?>
    <root_elem^@^@^@^@^@>
      <child_elem some_attr="abc" />
      <child_elem some_attr="def" />
    </root_elem>

This is similar to XERCESC-1701, where the end of the document after the root element was found to allow "junk" characters during parsing.
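
Until the parser rejects such input itself, a caller could scan the raw bytes before handing them to XercesDOMParser::parse(). The sketch below is only an illustration of that idea, not part of Xerces-C++ and not a proposed fix: the helper names isAllowedXmlByte and findIllegalXmlByte are hypothetical, and it assumes the document is UTF-8 or another ASCII-compatible encoding (in UTF-16 or UTF-32, zero bytes are legitimate). It relies on the XML 1.0 rule that the only characters allowed below 0x20 are tab, line feed and carriage return.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not a Xerces-C++ API).
// Returns true if byte b may appear in an XML 1.0 document stored in an
// ASCII-compatible encoding such as UTF-8. Below 0x20, XML 1.0 permits only
// tab (0x09), line feed (0x0A) and carriage return (0x0D); NUL and ^K (0x0B)
// are therefore rejected. Bytes >= 0x80 are left to the parser's decoder.
constexpr bool isAllowedXmlByte(unsigned char b) {
    return b >= 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
}

// Hypothetical helper (not a Xerces-C++ API).
// Returns the offset of the first disallowed byte in doc, or std::nullopt
// if every byte passes isAllowedXmlByte().
std::optional<std::size_t> findIllegalXmlByte(std::string_view doc) {
    for (std::size_t i = 0; i < doc.size(); ++i) {
        if (!isAllowedXmlByte(static_cast<unsigned char>(doc[i]))) {
            return i;
        }
    }
    return std::nullopt;
}
```

One way to use this would be to read the file in binary mode into a buffer, call findIllegalXmlByte() on it, and report an error at the returned offset instead of parsing. Because string literals with embedded nulls are truncated when treated as C strings, the buffer should be passed with an explicit length, e.g. std::string_view(data, size).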