[ 
https://issues.apache.org/jira/browse/DAFFODIL-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168132#comment-17168132
 ] 

Mike Beckerle edited comment on DAFFODIL-2363 at 7/30/20, 6:30 PM:
-------------------------------------------------------------------

Changed this to "beginner". There are quite a few things to understand in order 
to fix this issue, but once those are understood, the fix should be simple.
 * Daffodil's conversion of XML-illegal characters into the Unicode Private Use 
Area (PUA), e.g., NUL becomes U+E000 and U+000B becomes U+E00B. (see: 
[https://daffodil.apache.org/infoset/#xml-illegal-characters])
 * That this only applies to the infoset once it has been converted into XML, 
not before.
 * That Daffodil's internal validation (aka "limited" validation) does NOT 
operate on these remapped characters, but on the original characters of the 
DFDL Infoset, which allows all the XML-illegal code points.
 * That the pattern used for the pattern facet must be interpretable by regular 
old XML Schema (and so cannot use DFDL character entities like %NUL; or %#x0b;).
 * That the pattern to be used by Daffodil's limited (internal) validation 
could be expressed in terms of the remapped XML PUA characters like "&#xE000;", 
provided we process the pattern to convert these characters using the same 
PUA-to-regular-DFDL-character conversion used when converting an XML Infoset 
into a DFDL Infoset when unparsing.

Ultimately this is a one-line fix: call the PUA-to-regular-DFDL-character 
conversion before using the pattern in Daffodil's internal validation mechanism.
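
A minimal sketch of the idea, in Python rather than Daffodil's own Scala, and not the actual Daffodil code. It assumes only the C0-control part of the remapping (original code point + 0xE000, which is consistent with the U+E000 and U+E00B examples above) and uses Python's re module as a stand-in for XML Schema pattern matching. The names pua_to_original and pattern_matches are hypothetical.

```python
import re

# Assumption: only the C0-control part of the remapping is modeled.
# XML-illegal U+0000..U+001F (TAB, LF and CR are legal, so they are skipped)
# show up in XML as U+E000 + original code point.
PUA_BASE = 0xE000
XML_LEGAL_C0 = {0x09, 0x0A, 0x0D}


def pua_to_original(text: str) -> str:
    """Map remapped PUA characters back to the original XML-illegal characters."""
    out = []
    for ch in text:
        cp = ord(ch) - PUA_BASE
        if 0 <= cp <= 0x1F and cp not in XML_LEGAL_C0:
            out.append(chr(cp))
        else:
            out.append(ch)
    return "".join(out)


def pattern_matches(pattern: str, infoset_value: str) -> bool:
    """Hypothetical stand-in for limited validation of a pattern facet.

    The pattern is written with PUA characters (as it must be in XML),
    but the infoset value still holds the original characters, so the
    pattern is converted before matching.
    """
    return re.fullmatch(pua_to_original(pattern), infoset_value) is not None


pattern = "\ue000*"          # what the schema author wrote as &#xE000;*
value = "\x00\x00\x00"       # NUL padding as it appears in the DFDL Infoset

print(re.fullmatch(pattern, value) is not None)  # unconverted pattern
print(pattern_matches(pattern, value))           # converted pattern
```

Without the conversion, the pattern is compared against characters that never appear in the DFDL Infoset, which is why the facet check fails on data that is in fact all NUL.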



> pattern facet can't use &#xE000; notation. Makes validating NUL very hard. 
> ---------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2363
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2363
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 2.7.0
>            Reporter: Mike Beckerle
>            Priority: Critical
>              Labels: beginner
>             Fix For: 3.0.0
>
>
> See test_nulPattern1. 
> This bug is a real pain in the neck. 
> I want to capture NUL padding regions and ensure they are all NUL. 
> These come through to XML as U+E000 characters. I need to ensure that 
> the string contains only those.
> So I'd like to use a 
> {code:xml}
> <xs:simpleType name="allNULStringType">
>   <xs:restriction base="xs:string">
>      <xs:pattern value="&#xE000;*"/>
>   </xs:restriction>
> </xs:simpleType>
> {code}
> I consider this data well-formed (should parse) even if other bytes are there 
> that aren't NUL, but such data is invalid. So I want the facet to check for 
> all NUL chars (or these E000 things that Daffodil puts in XML because XML 
> can't contain actual NUL chars).
> The test fails with 
> {code:java}
> org.apache.daffodil.tdml.TDMLExceptionImpl: (Implementation: daffodil) 
> Validation errors found where none were expected by the test case.
> Validation Error: ex:foo failed facet checks due to: facet pattern(s): *
> {code}
> If github does the right thing that pattern will look like the E000 box char 
> and *. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
