[ 
https://issues.apache.org/jira/browse/XERCESJ-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Glavassevich resolved XERCESJ-1475.
-------------------------------------------

    Resolution: Invalid

XML Schema processors operate over the XML Information Set [1], not the 
serialized form of an XML document.  There are no character references in the 
Infoset: the XML parser replaces each one with the character it denotes 
before an XML Schema processor ever sees it.  The whiteSpace facet 
defined in the XML Schema specification is applied to the Infoset value, so 
what Xerces is doing is correct.  Don't confuse this with the attribute value 
normalization algorithm described in the XML 1.0/1.1 specs.  That is an 
entirely different process, which occurs before the content of the document is 
reported by the XML parser to the application or any other component (including 
the schema validator) further down the pipeline.

[1] http://www.w3.org/TR/xml-infoset/#intro
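
The ordering described above can be illustrated with a small sketch.  It 
uses Python's standard xml.dom.minidom parser rather than Xerces, and the 
collapse() helper is a hypothetical stand-in for the whiteSpace="collapse" 
facet, not the Xerces implementation.  It assumes only that the parser 
resolves character references before reporting content, which any 
conforming XML parser should do.

```python
import re
from xml.dom import minidom


def collapse(value: str) -> str:
    """Approximate whiteSpace="collapse": turn tab/LF/CR into spaces,
    squeeze runs of spaces, and trim both ends (illustrative helper)."""
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")


# Element content: the parser resolves &#x20; and &#xA; on its own, so the
# value handed downstream contains plain characters, not references.
doc = minidom.parseString("<x>&#x20;a &#xA; b</x>")
infoset_value = "".join(node.data for node in doc.documentElement.childNodes)
print(repr(infoset_value))            # ' a \n b'
print(repr(collapse(infoset_value)))  # 'a b'

# Attribute value normalization is a separate step inside the parser:
# a literal line break becomes a space, while a line break written as a
# character reference survives as "\n".
attr_doc = minidom.parseString('<x a="1\n2&#xA;3"/>')
attr_value = attr_doc.documentElement.getAttribute("a")
print(repr(attr_value))               # '1 2\n3'
```

By the time collapse() runs, nothing in the value records which whitespace 
characters were originally written as character references, which is why 
the facet treats them all alike.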

> whitespace normalization for whitespace facet removes character references
> --------------------------------------------------------------------------
>
>                 Key: XERCESJ-1475
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1475
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: DOM (Level 3 Load & Save), JAXP (javax.xml.parsers), XML 
> Schema 1.0 Datatypes, XML Schema 1.1 Datatypes
>            Reporter: Martin Thomson
>
> Parsing an element that has a simple type with the whitespace facet set to 
> collapse or replace results in character references being normalized.  
> Character references must not be replaced or collapsed by whitespace 
> normalization.
> For example, <x>&#x20;a &#xA; b</x> should produce a value of " a \n b" in 
> the PSVI.  Instead, it produces "a b" if the whitespace facet is collapse 
> (for example, when the type of <x> is xs:token).  The character references 
> are replaced prior to normalization and are not properly preserved.
> The description of the whitespace facet [1] does not make this immediately 
> apparent, but it is relatively explicit in XML [2], though the text and 
> example seem to be in conflict on the use of a character reference for the 
> space character (&#x20;).
> [1] http://www.w3.org/TR/xmlschema11-2/#rf-whiteSpace
> [2] http://www.w3.org/TR/xml11/#AVNormalize


