[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

2022-09-08 Thread Benjamin Fritz (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Fritz updated XERCESC-2240:

Description: 
In a library we've written using Xerces-C++ to validate XML files against a 
given XSD, we have discovered that the XercesDOMParser::parse() function does 
not record any errors if the XML declaration at the beginning of an XML 
document contains "junk" characters, such as control characters (^K) or null 
bytes. The null character in particular is invalid anywhere in an XML 
document. For example, the following XML file (attached as 
basic_bad_bytes.xml) parses without error, but it should not:



  
  


The following XML (attaching as basic_bad_bytes2.xml) correctly reports an 
error:



  
  


This is similar to XERCESC-1701, where the end of the document after the root 
element was found to allow "junk" characters during parsing.
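
Until the parser rejects such input, a caller could scan the raw bytes before handing them to parse(). The following is only a minimal sketch of that idea, not part of Xerces-C++: the helper name findForbiddenControlByte is hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8. It relies on the XML 1.0 Char production, which admits only #x9, #xA and #xD below #x20.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check, not a Xerces-C++ API.
// Returns the offset of the first byte below 0x20 that XML 1.0 forbids
// (everything except tab, line feed and carriage return), or
// std::string::npos if no such byte is present.
// Assumption: the input uses an ASCII-compatible encoding (e.g. UTF-8).
std::size_t findForbiddenControlByte(const std::string& bytes)
{
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        const bool allowedControl = (c == 0x09 || c == 0x0A || c == 0x0D);
        if (c < 0x20 && !allowedControl) {
            return i;
        }
    }
    return std::string::npos;
}
```

A caller would read the file into a std::string (keeping embedded null bytes) and treat any result other than std::string::npos as a fatal error before invoking the parser.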


> Junk characters (including null) allowed in XML declaration
> ---
>
> Key: XERCESC-2240
> URL: https://issues.apache.org/jira/browse/XERCESC-2240
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Non-Validating Parser
>Affects Versions: 3.2.3
> Environment: Linux
>Reporter: Benjamin Fritz
>Priority: Minor
> Attachments: basic_bad_bytes.xml, basic_bad_bytes2.xml



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

2022-09-08 Thread Benjamin Fritz (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Fritz updated XERCESC-2240:

Component/s: Non-Validating Parser

> Junk characters (including null) allowed in XML declaration
> ---
>
> Key: XERCESC-2240
> URL: https://issues.apache.org/jira/browse/XERCESC-2240
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Non-Validating Parser
>Affects Versions: 3.2.3
> Environment: Linux
>Reporter: Benjamin Fritz
>Priority: Minor
> Attachments: basic_bad_bytes.xml, basic_bad_bytes2.xml






[jira] [Updated] (XERCESC-1701) Xerces-C++ Allows junk after root element (null characters)

2022-09-08 Thread Benjamin Fritz (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Fritz updated XERCESC-1701:

Attachment: basic_bad_bytes3.xml

> Xerces-C++ Allows junk after root element (null characters)
> ---
>
> Key: XERCESC-1701
> URL: https://issues.apache.org/jira/browse/XERCESC-1701
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: Non-Validating Parser
>Affects Versions: 3.0.1
> Environment: WindowsXP
>Reporter: Maarten Koskamp
>Priority: Major
> Attachments: basic_bad_bytes3.xml, sample.xml, version.incl
>
>
> Xerces-C allows a sequence of null characters after the document root at the 
> end of the XML instance.
> The XML specification states that only white-space is allowed after the 
> document root. 
> See attached sample for details.
> Info about the affected version of the parser is also added as an attachment 
> to this issue.
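
As a stopgap on the caller's side, the trailing bytes can be checked before parsing. This is a rough sketch only: the function name hasOnlyWhitespaceAfterLastMarkup is hypothetical and not a Xerces-C++ API, and an ASCII-compatible encoding is assumed. Besides white-space, XML 1.0 also permits comments and processing instructions after the root element, but each of those ends with '>', so in a well-formed document everything after the final '>' must be white-space.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check, not a Xerces-C++ API.
// The last markup construct in a well-formed document (root end tag,
// comment, or processing instruction) ends with '>', so every byte after
// the final '>' must be XML white-space (space, tab, LF, CR).
// Assumption: the input uses an ASCII-compatible encoding (e.g. UTF-8).
bool hasOnlyWhitespaceAfterLastMarkup(const std::string& bytes)
{
    const std::size_t lastClose = bytes.rfind('>');
    if (lastClose == std::string::npos) {
        return false; // no markup at all, so this cannot be a document
    }
    for (std::size_t i = lastClose + 1; i < bytes.size(); ++i) {
        const char c = bytes[i];
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            return false; // e.g. a trailing null byte
        }
    }
    return true;
}
```

The check only looks at the tail of the buffer, so it says nothing about the well-formedness of the rest of the document.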






[jira] [Updated] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

2022-09-08 Thread Benjamin Fritz (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Fritz updated XERCESC-2240:

Attachment: basic_bad_bytes.xml
basic_bad_bytes2.xml

> Junk characters (including null) allowed in XML declaration
> ---
>
> Key: XERCESC-2240
> URL: https://issues.apache.org/jira/browse/XERCESC-2240
> Project: Xerces-C++
>  Issue Type: Bug
>Affects Versions: 3.2.3
> Environment: Linux
>Reporter: Benjamin Fritz
>Priority: Minor
> Attachments: basic_bad_bytes.xml, basic_bad_bytes2.xml






[jira] [Created] (XERCESC-2240) Junk characters (including null) allowed in XML declaration

2022-09-08 Thread Benjamin Fritz (Jira)
Benjamin Fritz created XERCESC-2240:
---

 Summary: Junk characters (including null) allowed in XML 
declaration
 Key: XERCESC-2240
 URL: https://issues.apache.org/jira/browse/XERCESC-2240
 Project: Xerces-C++
  Issue Type: Bug
Affects Versions: 3.2.3
 Environment: Linux
Reporter: Benjamin Fritz


In a library we've written using Xerces-C++ to validate XML files against a 
given XSD, we have discovered that the XercesDOMParser::parse() function does 
not record any errors if the XML declaration at the beginning of an XML 
document contains "junk" characters, such as control characters (^K) or null 
bytes. The null character in particular is invalid anywhere in an XML 
document. For example, the following XML file (attached as 
basic_bad_bytes.xml) parses without error, but it should not:



  
  


The following XML (attaching as basic_bad_bytes2.xml) correctly reports an 
error:



  
  


This is similar to XERCESC-1701, where the end of the document after the root 
element was found to allow "junk" characters during parsing.






[jira] [Updated] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML characters are allowed by DOMWriter

2022-09-08 Thread David Leffingwell (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Leffingwell updated XERCESC-2239:
---
Environment: 
Operating System: All
Platform: All

> When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML 
> characters are allowed by DOMWriter
> --
>
> Key: XERCESC-2239
> URL: https://issues.apache.org/jira/browse/XERCESC-2239
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: DOM
>Affects Versions: 3.2.0
> Environment: Operating System: All
> Platform: All
>Reporter: David Leffingwell
>Priority: Major
>
> // Create a Document with a CDATA section that contains an invalid XML 
> // character (e.g. 0x1b). 
> // This should fail when serializing the Document, but it does not when 
> // XMLUni::fgDOMWRTSplitCdataSections is true.
> struct XercesDeleter
> {
>     template <typename T>
>     void operator()(T* data) const
>     {
>         if (data) { data->release(); }
>     }
> };
> typedef std::unique_ptr<DOMLSSerializer, XercesDeleter> DOMWriterPtr;
> typedef std::unique_ptr<DOMDocument, XercesDeleter> DOMDocumentPtr;
> XMLPlatformUtils::Initialize();
> DOMImplementation* impl = 
>     DOMImplementationRegistry::getDOMImplementation(XMLString::transcode("LS"));
> // Create DOM with a CDATA section
> DOMDocumentPtr document(impl->createDocument());
> DOMElement* element = document->createElementNS(
>     XMLString::transcode("http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
>     XMLString::transcode("w:t"));
> document->appendChild(element);
> // 0x1B is not a valid XML 1.0 character
> DOMCDATASection* codesection = 
>     document->createCDATASection(XercesString("c = '\x1b';"));
> element->appendChild(codesection); 
> DOMWriterPtr writer(impl->createLSSerializer());
> writer->writeToString(document.get());






[jira] [Updated] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML characters are allowed by DOMWriter

2022-09-08 Thread David Leffingwell (Jira)


 [ 
https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Leffingwell updated XERCESC-2239:
---
Summary: When XMLUni::fgDOMWRTSplitCdataSections is true (the default), 
invalid XML characters are allowed by DOMWriter  (was: When 
XMLUni::fgDOMWRTSplitCdataSections is true (the default) invalid XML characters 
are allowed by DOMWriter)

> When XMLUni::fgDOMWRTSplitCdataSections is true (the default), invalid XML 
> characters are allowed by DOMWriter
> --
>
> Key: XERCESC-2239
> URL: https://issues.apache.org/jira/browse/XERCESC-2239
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: DOM
>Affects Versions: 3.2.0
>Reporter: David Leffingwell
>Priority: Major






[jira] [Comment Edited] (XERCESC-2239) When XMLUni::fgDOMWRTSplitCdataSections is true (the default) invalid XML characters are allowed by DOMWriter

2022-09-08 Thread David Leffingwell (Jira)


[ 
https://issues.apache.org/jira/browse/XERCESC-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601489#comment-17601489
 ] 

David Leffingwell edited comment on XERCESC-2239 at 9/8/22 5:39 PM:


It looks like ensureValidString() (or an equivalent check) is not applied to 
DOMNode::CDATA_SECTION_NODE when fgDOMWRTSplitCdataSections is true.

https://github.com/apache/xerces-c/blob/fc1f7d3a41328e978d7f517193367af8966a40f8/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp
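
For illustration, a check of the kind described above can be sketched in standalone C++. This is not the library's ensureValidString() and makes no claim about how the serializer should be fixed; the names isXml10Char and firstInvalidXml10Unit are hypothetical. It assumes the text is UTF-16 held in a std::u16string, which matches XMLCh only on builds where XMLCh is char16_t.

```cpp
#include <cstddef>
#include <string>

// XML 1.0 Char production:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXml10Char(char32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper, not a Xerces-C++ API.
// Returns the index of the first UTF-16 code unit that begins an invalid
// XML 1.0 character or an unpaired surrogate, or std::u16string::npos if
// the whole string is valid.
std::size_t firstInvalidXml10Unit(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i + 1 >= text.size()) {
                return i;
            }
            const char32_t low = text[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) {
                return i;
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            if (!isXml10Char(cp)) {
                return i;
            }
            ++i; // skip the low surrogate that was just consumed
            continue;
        }
        // Lone low surrogates fall outside the Char ranges and fail here.
        if (!isXml10Char(cp)) {
            return i;
        }
    }
    return std::u16string::npos;
}
```

Applied to the CDATA text from the report, c = '<0x1B>';, the helper returns index 5, the position of the escape character, so a caller could reject the string before creating the CDATA section.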


was (Author: JIRAUSER295485):
It looks like ensureValidString() (or something equivalent) is not being done 
for DOMNode::CDATA_SECTION_NODE.

https://github.com/apache/xerces-c/blob/fc1f7d3a41328e978d7f517193367af8966a40f8/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp

> When XMLUni::fgDOMWRTSplitCdataSections is true (the default) invalid XML 
> characters are allowed by DOMWriter
> -
>
> Key: XERCESC-2239
> URL: https://issues.apache.org/jira/browse/XERCESC-2239
> Project: Xerces-C++
>  Issue Type: Bug
>  Components: DOM
>Affects Versions: 3.2.0
>Reporter: David Leffingwell
>Priority: Major


