[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration
[ https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Fritz updated XERCESC-2240:

Description:

In a library we've written using Xerces-C++ to validate XML files against a given XSD, we have discovered that the XercesDOMParser::parse() function does not record any errors if the XML declaration at the beginning of an XML document contains "junk" characters, including control characters (^K) or null bytes. The null control character specifically should be invalid in any XML document.

The attached file basic_bad_bytes.xml parses without error, but it should not.

The attached file basic_bad_bytes2.xml correctly reports an error.

This is similar to XERCESC-1701, where the end of the document after the root element was found to allow "junk" characters during parsing.
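As a possible workaround on the caller's side, the raw bytes could be pre-screened before they are handed to XercesDOMParser::parse(). The following is a minimal sketch, assuming the document uses an ASCII-compatible encoding such as UTF-8 (in UTF-16 or UTF-32, zero bytes are legitimate and this check would not apply). `findForbiddenControlByte` is a hypothetical helper written for illustration; it is not part of the Xerces-C++ API. It flags NUL and the other C0 control bytes, such as ^K (0x0B), that the XML 1.0 Char production excludes.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of Xerces-C++.
// Returns the offset of the first byte that XML 1.0 forbids as a literal
// character in an ASCII-compatible encoding: NUL and every other C0 control
// byte except tab (0x09), LF (0x0A) and CR (0x0D).
// Returns std::nullopt if no such byte is present.
std::optional<std::size_t> findForbiddenControlByte(std::string_view doc)
{
    for (std::size_t i = 0; i < doc.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(doc[i]);
        if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D) {
            return i;
        }
    }
    return std::nullopt;
}
```

A caller could read the file into a buffer, run this check, and report the returned offset as an error before parsing. The sketch requires C++17 for std::optional and std::string_view.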
> Junk characters (including null) allowed in XML declaration
> ---
>
> Key: XERCESC-2240
> URL: https://issues.apache.org/jira/browse/XERCESC-2240
> Project: Xerces-C++
> Issue Type: Bug
> Components: Non-Validating Parser
> Affects Versions: 3.2.3
> Environment: Linux
> Reporter: Benjamin Fritz
> Priority: Minor
> Attachments: basic_bad_bytes.xml, basic_bad_bytes2.xml

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org
[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration
[ https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Fritz updated XERCESC-2240:

Component/s: Non-Validating Parser
[jira] [Updated] (XERCESC-1701) Xerces-C++ Allows junk after root element (null characters)
[ https://issues.apache.org/jira/browse/XERCESC-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Fritz updated XERCESC-1701:

Attachment: basic_bad_bytes3.xml

> Xerces-C++ Allows junk after root element (null characters)
> ---
>
> Key: XERCESC-1701
> URL: https://issues.apache.org/jira/browse/XERCESC-1701
> Project: Xerces-C++
> Issue Type: Bug
> Components: Non-Validating Parser
> Affects Versions: 3.0.1
> Environment: WindowsXP
> Reporter: Maarten Koskamp
> Priority: Major
> Attachments: basic_bad_bytes3.xml, sample.xml, version.incl
>
> Xerces-C allows a sequence of null characters after the document root at the
> end of the XML instance.
> The XML Specification states that only white-space is allowed after the
> document root.
> See attached sample for details.
> Info about the affected version of the parser is also added as an attachment
> to this issue.
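The trailing-junk condition described in this issue can be approximated with a cheap byte-level check. In XML 1.0 the root element may be followed only by comments, processing instructions and white-space, and each of the first two ends with '>', so nothing but white-space may follow the last '>' in a well-formed document. The sketch below tests only that necessary condition, assuming an ASCII-compatible encoding; `tailIsWhitespaceOnly` is a hypothetical name, not a Xerces-C++ function, and junk that itself contains '>' would not be caught.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of Xerces-C++.
// True if every byte after the last '>' in `doc` is XML white-space
// (space, tab, CR or LF). This is a necessary condition for
// well-formedness, not a sufficient one.
bool tailIsWhitespaceOnly(std::string_view doc)
{
    const std::size_t lastGt = doc.rfind('>');
    if (lastGt == std::string_view::npos) {
        return false; // no markup at all, so not a well-formed document
    }
    for (const char c : doc.substr(lastGt + 1)) {
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
            return false;
        }
    }
    return true;
}
```

A document ending in a run of null bytes after the root's closing tag fails this check, while one ending in a newline passes.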
[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration
[ https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Fritz updated XERCESC-2240:

Attachment: basic_bad_bytes.xml
            basic_bad_bytes2.xml
[jira] [Created] (XERCESC-2240) Junk characters (including null) allowed in XML declaration
Benjamin Fritz created XERCESC-2240:
---

Summary: Junk characters (including null) allowed in XML declaration
Key: XERCESC-2240
URL: https://issues.apache.org/jira/browse/XERCESC-2240
Project: Xerces-C++
Issue Type: Bug
Affects Versions: 3.2.3
Environment: Linux
Reporter: Benjamin Fritz
[jira] [Updated] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML characters are allowed by DOMWriter
[ https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Leffingwell updated XERCESC-2239:
---
Environment: Operating System: All
             Platform: All

> When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML
> characters are allowed by DOMWriter
> ---
>
> Key: XERCESC-2239
> URL: https://issues.apache.org/jira/browse/XERCESC-2239
> Project: Xerces-C++
> Issue Type: Bug
> Components: DOM
> Affects Versions: 3.2.0
> Environment: Operating System: All
>              Platform: All
> Reporter: David Leffingwell
> Priority: Major
>
> // Create a Document with a CDATA section that contains an invalid XML
> // character (e.g. 0x1b).
> // This should fail when serializing the Document, but it does not when
> // XMLUni::fgDOMWRTSplitCdataSections is true.
> struct XercesDeleter
> {
>     template <typename T>
>     void operator()(T* data) const
>     {
>         if (data) { data->release(); }
>     }
> };
> typedef std::unique_ptr<DOMLSSerializer, XercesDeleter> DOMWriterPtr;
> typedef std::unique_ptr<DOMDocument, XercesDeleter> DOMDocumentPtr;
> XMLPlatformUtils::Initialize();
> DOMImplementation* impl =
>     DOMImplementationRegistry::getDOMImplementation(XMLString::transcode("LS"));
> // Create DOM with a CDATA section
> DOMDocumentPtr document(impl->createDocument());
> DOMElement* element =
>     document->createElementNS(XMLString::transcode("http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
>                               XMLString::transcode("w:t"));
> document->appendChild(element);
> // 0x1B is not a valid XML 1.0 character
> DOMCDATASection* codesection = document->createCDATASection(XercesString("c = '\x1b';"));
> element->appendChild(codesection);
> DOMWriterPtr writer(impl->createLSSerializer());
> writer->writeToString(document.get());
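The report rests on 0x1B falling outside the XML 1.0 Char production. For readers who want to check arbitrary text against that production, here is a minimal standalone sketch, assuming the text is available as UTF-16 code units. `isXml10Char` and `isValidXml10Text` are hypothetical names used for illustration; they are not Xerces-C++ functions and do not reflect how the serializer is implemented.

```cpp
#include <cstddef>
#include <string_view>

// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0x10FFFF);
}

// Hypothetical helper: true if `text` (UTF-16) holds only XML 1.0 characters.
// Surrogate pairs are combined into one code point; unpaired surrogates fail.
bool isValidXml10Text(std::u16string_view text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t c = text[i];
        if (c >= 0xD800 && c <= 0xDBFF) { // high surrogate, needs a low one
            if (i + 1 >= text.size()) {
                return false;
            }
            const char32_t low = text[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) {
                return false;
            }
            c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
            ++i; // the low surrogate has been consumed
        }
        if (!isXml10Char(c)) {
            return false;
        }
    }
    return true;
}
```

Applied to the CDATA content in the report, the check fails on the 0x1B code unit, which is the outcome the reporter expects from serialization.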
[jira] [Updated] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML characters are allowed by DOMWriter
[ https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Leffingwell updated XERCESC-2239:
---
Summary: When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML characters are allowed by DOMWriter (was: When XMLUni::fgDOMWRTSplitCdataSections is true (the default) invalid XML characters are allowed by DOMWriter)
[jira] [Comment Edited] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default) invalid XML characters are allowed by DOMWriter
[ https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601489#comment-17601489 ]

David Leffingwell edited comment on XERCESC-2239 at 9/8/22 5:39 PM:

It looks like ensureValidString() (or something equivalent) is not being done for DOMNode::CDATA_SECTION_NODE when fgDOMWRTSplitCdataSections is true.
https://github.com/apache/xerces-c/blob/fc1f7d3a41328e978d7f517193367af8966a40f8/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp

was (Author: JIRAUSER295485):
It looks like ensureValidString() (or something equivalent) is not being done for DOMNode::CDATA_SECTION_NODE.
https://github.com/apache/xerces-c/blob/fc1f7d3a41328e978d7f517193367af8966a40f8/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp
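To illustrate what "something equivalent" to a validity check on the split path might look like, here is a standalone sketch. It is not the Xerces-C++ implementation and makes no claim about how DOMLSSerializerImpl.cpp is structured. It assumes UTF-8 input and checks only C0 control bytes; `serializeCdataChecked` is a hypothetical name. The function validates the content first, then splits at each "]]>" terminator so that the marker never appears inside a single section.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical sketch, not Xerces-C++ code.
// Rejects forbidden control bytes first, then writes `content` as one or more
// CDATA sections, splitting wherever the "]]>" terminator occurs.
std::string serializeCdataChecked(std::string_view content)
{
    for (const unsigned char b : content) {
        if (b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D) {
            throw std::invalid_argument("invalid XML 1.0 character in CDATA section");
        }
    }

    std::string out = "<![CDATA[";
    std::size_t pos = 0;
    for (;;) {
        const std::size_t hit = content.find("]]>", pos);
        if (hit == std::string_view::npos) {
            out.append(content.substr(pos));
            break;
        }
        // Keep "]]" in the current section, close it, and let the next
        // section start with the ">" that completed the terminator.
        out.append(content.substr(pos, hit - pos + 2));
        out += "]]><![CDATA[";
        pos = hit + 2;
    }
    out += "]]>";
    return out;
}
```

The design point is only the ordering: validation runs before any splitting, so enabling the split behavior cannot bypass the character check.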