[
https://issues.apache.org/jira/browse/DAFFODIL-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168132#comment-17168132
]
Mike Beckerle edited comment on DAFFODIL-2363 at 7/30/20, 6:30 PM:
-------------------------------------------------------------------
Changed this to "beginner". There are quite a few things to understand in order
to fix this issue, but once those are understood, the fix should be simple.
* Daffodil's conversion of XML-illegal characters into the Unicode Private Use
Area (PUA), e.g., U+E00B or U+E000. (see:
[https://daffodil.apache.org/infoset/#xml-illegal-characters])
* That this only applies to the infoset once it has been converted into XML,
not before.
* That Daffodil's internal validation (aka "limited" validation) does NOT
operate on these remapped characters, but on the original characters of the
DFDL Infoset, which allows all the XML-illegal code points.
* That the pattern used for the pattern facet must be interpretable by regular
old XML Schema (and so cannot use DFDL character entities like %NUL; or %#x0b;).
* That the pattern to be used by Daffodil's limited (internal) validation
can be expressed in terms of the remapped XML PUA characters like "&#xE000;",
and Daffodil can then process the pattern to convert these sequences using the
same PUA-to-regular-DFDL-character conversion used when converting an XML
Infoset into a DFDL Infoset when unparsing.
Ultimately this is a one-line fix: call the PUA-to-regular-DFDL-character
conversion before using the pattern in Daffodil's internal validation mechanism.
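The idea can be sketched as follows. This is only an illustration, not Daffodil's actual code: the class and method names (PuaPatternSketch, puaToOriginal, matchesFacet) are hypothetical, the remapping is assumed to be a simple offset of 0xE000 for the control characters (as the U+E000 and U+E00B examples suggest), and java.util.regex stands in for whatever the internal validator really uses.

```java
import java.util.regex.Pattern;

class PuaPatternSketch {

    // Hypothetical helper (not Daffodil's API): map PUA code points
    // U+E000..U+E01F back to the control characters U+0000..U+001F.
    // Simplified assumption: the real remapping covers more cases.
    static String puaToOriginal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xE000 && c <= 0xE01F) {
                sb.append((char) (c - 0xE000));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Convert the schema's pattern first, then match it against the
    // un-remapped infoset value, which may contain real NUL characters.
    static boolean matchesFacet(String patternFromSchema, String infosetValue) {
        String converted = puaToOriginal(patternFromSchema);
        return Pattern.matches(converted, infosetValue);
    }

    public static void main(String[] args) {
        // Pattern as the XML schema parser would deliver it: U+E000 then '*'.
        String pattern = "\uE000*";
        // Infoset value before conversion to XML: three real NUL characters.
        String allNul = "\u0000\u0000\u0000";

        System.out.println(Pattern.matches(pattern, allNul)); // without the conversion
        System.out.println(matchesFacet(pattern, allNul));    // with the conversion
    }
}
```

Without the conversion step the pattern looks for U+E000 while the value holds U+0000, so the facet check fails on valid data, which is the behavior reported in this issue.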
> pattern facet can't use &#xE000; notation. Makes validating NUL very hard.
> ---------------------------------------------------------------------------
>
> Key: DAFFODIL-2363
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2363
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Affects Versions: 2.7.0
> Reporter: Mike Beckerle
> Priority: Critical
> Labels: beginner
> Fix For: 3.0.0
>
>
> See test_nulPattern1.
> This bug is a real pain in the neck.
> I want to capture NUL padding regions and ensure they are all NUL.
> These come through to XML as U+E000 characters. I need to ensure that
> the string contains only those.
> So I'd like to use a simple type like this:
> {code:xml}
> <xs:simpleType name="allNULStringType">
> <xs:restriction base="xs:string">
> <xs:pattern value="&#xE000;*"/>
> </xs:restriction>
> </xs:simpleType>
> {code}
> I consider this data well-formed (should parse) even if other bytes are there
> that aren't NUL, but such data is invalid. So I want the facet to check for
> all NUL chars (or these E000 things that Daffodil puts in XML because XML
> can't contain actual NUL chars).
> The test fails with
> {code:java}
> org.apache.daffodil.tdml.TDMLExceptionImpl: (Implementation: daffodil)
> Validation errors found where none were expected by the test case.
> Validation Error: ex:foo failed facet checks due to: facet pattern(s): *
> {code}
> If github does the right thing that pattern will look like the E000 box char
> and *.