Thanks for running this up the chain, so to speak. I agree that an SDE would probably be best for situations like this, as I can't imagine any sane data format using a combination of separators and escape characters like this.
Josh

________________________________
From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Monday, May 3, 2021 3:32 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Escape character parsing bug?

So you have a separator the first char of which is the escape character. Yikes.

I think the DFDL spec should, ideally, make this an SDE. Feels entirely ambiguous to me.

The part of the spec you quote is quite problematic, but was updated by one word in the final DFDL Spec version.

    Occurrences of the dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed from the data, unless the dfdl:escapeCharacter is preceded by the dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter does not precede the dfdl:escapeCharacter, respectively.

So breaking that into two independent statements:

1. An escapeCharacter is removed unless it is preceded by the escape-escape.
2. An escape-escape is removed unless it does not precede the escape character.

So (1) means an escape char that is floating around not in front of any delimiter is removed. (2) means an escape-escape floating around, not in front of any escape char, is preserved.

That still doesn't help with your specific issue. If a delimiter begins with the escapeCharacter, will that delimiter appearing in the data be interpreted as an escape character followed by the 2nd and subsequent characters of the delimiter? Or will the delimiter be recognized?

Consider dfdl:separator="/ // ///" with escapeCharacter="/" and escapeEscapeCharacter="/". What takes priority, interpretation of escapeCharacters and escapeEscapeCharacters or recognizing delimiters?

I have posed this issue for consideration of the other DFDL workgroup members and I'll report back.

________________________________
From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Monday, May 3, 2021 2:38 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Escape character parsing bug?

Consider the following schema:

    <dfdl:defineEscapeScheme name="scenario3">
      <dfdl:escapeScheme escapeCharacter='/'
        escapeKind="escapeCharacter" escapeEscapeCharacter="$"
        extraEscapedCharacters="" generateEscapeBlock="whenNeeded" />
    </dfdl:defineEscapeScheme>

    <xs:element name="e_infix">
      <xs:complexType>
        <xs:sequence dfdl:separator="/;" dfdl:separatorPosition="infix">
          <xs:element name="x" type="xs:string"
            dfdl:escapeSchemeRef="tns:scenario3" />
          <xs:element name="y" type="xs:string" minOccurs="0"
            dfdl:escapeSchemeRef="tns:scenario3" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

We then have the following test case:

    <parserTestCase name="scenario3_3" model="es3"
      description="Section 13 - escapeCharacter - DFDL-13-029R"
      root="e_infix" roundTrip="true">
      <!-- See DFDL-1556 for to make roundTrip="true" -->
      <document>foo$$/;bar</document>
      <infoset>
        <dfdlInfoset>
          <tns:e_infix>
            <x>foo$/;bar</x>
          </tns:e_infix>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>

Shouldn't this parse as:

    <tns:e_infix>
      <x>foo$$</x>
      <y>bar</y>
    </tns:e_infix>

The spec says the following:

    On parsing any in-scope terminating delimiter encountered in the data is not interpreted as such when it is immediately preceded by the dfdl:escapeCharacter (when not itself preceded by the dfdl:escapeEscapeCharacter). Occurrences of the dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed from the data, unless the dfdl:escapeCharacter is preceded by the dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter does not precede the dfdl:escapeCharacter.

It seems to me that the '/;' terminator shouldn't be getting escaped in this case, but want to double check.

Josh
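
For illustration only, the two candidate readings of `foo$$/;bar` can be sketched as a pair of toy scanners in Python. This is a minimal sketch under stated assumptions, not Daffodil's implementation and not a statement of what the DFDL spec requires: the function names `parse_escape_first` and `parse_delimiter_first` are hypothetical, only a single separator is handled, and escape blocks and `extraEscapedCharacters` are ignored. The first scanner gives escape handling priority over delimiter recognition; the second gives delimiter recognition priority.

```python
def parse_escape_first(data, sep, esc, esc_esc):
    """Toy reading A (assumption): escape handling wins over delimiter matching."""
    fields, cur, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if ch == esc_esc and data.startswith(esc, i + 1):
            # Escape-escape directly before the escape char:
            # drop the escape-escape, keep the escape char as literal data.
            cur.append(esc)
            i += 2
        elif ch == esc:
            # Escape char: removed, and the next char (if any) is literal.
            if i + 1 < len(data):
                cur.append(data[i + 1])
            i += 2
        elif data.startswith(sep, i):
            fields.append("".join(cur))
            cur = []
            i += len(sep)
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields


def parse_delimiter_first(data, sep, esc, esc_esc):
    """Toy reading B (assumption): delimiter matching wins over escape handling."""
    fields, cur, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if data.startswith(sep, i):
            # An unescaped separator ends the current field.
            fields.append("".join(cur))
            cur = []
            i += len(sep)
        elif (ch == esc_esc and data.startswith(esc, i + 1)
              and not data.startswith(sep, i + 1)):
            # Escape-escape before a genuine escape char (not a separator start).
            cur.append(esc)
            i += 2
        elif ch == esc:
            if data.startswith(sep, i + 1):
                # Escaped separator: keep the separator text as data.
                cur.append(sep)
                i += 1 + len(sep)
            elif i + 1 < len(data):
                cur.append(data[i + 1])
                i += 2
            else:
                i += 1
        else:
            cur.append(ch)
            i += 1
    fields.append("".join(cur))
    return fields


doc = "foo$$/;bar"
print(parse_escape_first(doc, "/;", "/", "$"))     # ['foo$/;bar']
print(parse_delimiter_first(doc, "/;", "/", "$"))  # ['foo$$', 'bar']
```

Under these assumptions, the escape-first scanner reproduces the infoset in the test case (a single `x` of `foo$/;bar`), while the delimiter-first scanner yields `x` = `foo$$` and `y` = `bar`. Note that the escape-first toy can never recognize `/;` as a separator at all, since the leading `/` is always consumed as an escape character.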