This sounds good to me. Less complexity, and therefore fewer tests, is a good thing.
________________________________
From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Wednesday, May 5, 2021 5:05 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Escape character parsing bug?

So, after making the change to throw a Schema Definition Error whenever a
terminator or separator begins with the escapeCharacter or
escapeEscapeCharacter, around half of our escape scenario tests fail, because
they were all testing weird edge cases involving delimiters that start with
the escapeCharacter or escapeEscapeCharacter. I'm guessing that most of these
tests can simply be purged after a review to make sure we aren't losing
coverage (other than for this scenario, which now throws an SDE). I just
wanted to get some opinions before moving forward with this change.
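A minimal sketch of the check described above, written in Python purely for
illustration (Daffodil itself is written in Scala, and this is not its code).
`SchemaDefinitionError` and `check_delimiters` are hypothetical names. The
sketch assumes the delimiter property is a whitespace-separated list of
literal strings and ignores DFDL character entities such as `%NL;`.

```python
class SchemaDefinitionError(Exception):
    """Hypothetical stand-in for a DFDL Schema Definition Error (SDE)."""


def check_delimiters(delimiter_list, escape_character, escape_escape_character=""):
    """Raise an SDE if any delimiter begins with the escape scheme's
    escapeCharacter or escapeEscapeCharacter.

    Assumes `delimiter_list` is a whitespace-separated list of literal
    delimiters (e.g. dfdl:separator="/ // ///"); DFDL character entities
    such as %NL; are not handled in this sketch.
    """
    for delim in delimiter_list.split():
        for prop, ch in (("escapeCharacter", escape_character),
                         ("escapeEscapeCharacter", escape_escape_character)):
            # An empty escape property means "not set", so skip it.
            if ch and delim.startswith(ch):
                raise SchemaDefinitionError(
                    f"delimiter {delim!r} begins with the {prop} {ch!r}")


# The scenario3 schema (separator "/;", escapeCharacter "/") would be rejected:
try:
    check_delimiters("/;", escape_character="/", escape_escape_character="$")
except SchemaDefinitionError as err:
    print("SDE:", err)  # SDE: delimiter '/;' begins with the escapeCharacter '/'
```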

Josh
________________________________
From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Tuesday, May 4, 2021 12:44 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Escape character parsing bug?

I'll begin making the change to add an SDE for these, then. It seems that most
of the escape scheme tests that weren't round-tripping were cases like this.

Josh

On May 4, 2021 12:15 PM, "Beckerle, Mike" <mbecke...@owlcyberdefense.com> wrote:
I asked Steve Hanson of IBM (the other co-chair of the DFDL workgroup, and one
of the leads on one of IBM's DFDL implementations). He said that when he tries
this situation, with the escape character "/" matching the start of the
separator, he gets an SDE.

The DFDL spec does not appear to call this out as an SDE, so that omission
will likely become the first erratum to the official final DFDL v1.0 spec.


________________________________
From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Monday, May 3, 2021 3:35 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Escape character parsing bug?

Thanks for running this up the chain, so to speak. I agree that an SDE would
probably be best for situations like this, as I wouldn't expect any sane data
format to use a combination of separators and escape characters like this.

Josh
________________________________
From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Monday, May 3, 2021 3:32 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Escape character parsing bug?

So you have a separator the first char of which is the escape character.

Yikes. I think the DFDL spec should, ideally, make this an SDE. Feels entirely 
ambiguous to me.

The part of the spec you quote is quite problematic, but it was updated by one
word ("respectively") in the final DFDL spec version:

Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter, respectively.

So breaking that into two independent statements:

  1.  An escapeCharacter is removed unless it is preceded by the escape-escape.
  2.  An escape-escape is removed unless it does not precede the escape character.

So (1) means an escape char that is floating around, not in front of any
delimiter, is removed.
(2) means an escape-escape floating around, not in front of any escape char,
is preserved.
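A minimal Python sketch of statements (1) and (2), applied to text that
contains no delimiters at all. `remove_escapes` is a hypothetical helper, not
Daffodil code. It assumes the escapeCharacter and escapeEscapeCharacter
differ, and (an assumption not stated in the quoted text) that an unescaped
escapeCharacter makes the character after it literal.

```python
def remove_escapes(text, esc="/", esc_esc="$"):
    """Apply rules (1) and (2) to delimiter-free text.

    Assumes esc != esc_esc, and that an unescaped escape char makes the
    following character literal (an assumption for this sketch).
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        nxt = text[i + 1] if i + 1 < n else None
        if c == esc_esc and nxt == esc:
            # Rule (2): an escape-escape in front of an escape char is removed.
            # Rule (1): that escape char is kept, because it is preceded by
            # the escape-escape.
            out.append(esc)
            i += 2
        elif c == esc:
            # Rule (1): an unescaped escape char is removed; the character
            # after it (if any) is kept literally.
            if nxt is not None:
                out.append(nxt)
            i += 2
        else:
            # Rule (2): an escape-escape not in front of an escape char is
            # preserved, as is any ordinary character.
            out.append(c)
            i += 1
    return "".join(out)


print(remove_escapes("a/b"))   # ab   : lone escape char removed (1)
print(remove_escapes("a$b"))   # a$b  : escape-escape not before "/" kept (2)
print(remove_escapes("a$/b"))  # a/b  : "$" removed (2), escaped "/" kept (1)
```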

That still doesn't help with your specific issue. If a delimiter begins with 
the escapeCharacter, will that delimiter appearing in the data be interpreted 
as an escape character followed by the 2nd and subsequent characters of the 
delimiter? Or will the delimiter be recognized?

Consider dfdl:separator="/ // ///" with escapeCharacter="/" and 
escapeEscapeCharacter="/"

What takes priority, interpretation of escapeCharacters and 
escapeEscapeCharacters or recognizing delimiters?

I have posed this issue to the other DFDL workgroup members for consideration,
and I'll report back.

________________________________
From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Monday, May 3, 2021 2:38 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Escape character parsing bug?

Consider the following schema:

    <dfdl:defineEscapeScheme name="scenario3">
      <dfdl:escapeScheme escapeCharacter='/'
        escapeKind="escapeCharacter" escapeEscapeCharacter="$"
        extraEscapedCharacters="" generateEscapeBlock="whenNeeded" />
    </dfdl:defineEscapeScheme>

    <xs:element name="e_infix">
      <xs:complexType>
        <xs:sequence dfdl:separator="/;" dfdl:separatorPosition="infix">
          <xs:element name="x" type="xs:string"
            dfdl:escapeSchemeRef="tns:scenario3" />
          <xs:element name="y" type="xs:string" minOccurs="0"
            dfdl:escapeSchemeRef="tns:scenario3" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>

We then have the following test case:
  <parserTestCase name="scenario3_3" model="es3"
    description="Section 13 - escapeCharacter - DFDL-13-029R" root="e_infix"
    roundTrip="true">
    <!-- See DFDL-1556 for to make roundTrip="true" -->
    <document>foo$$/;bar</document>
    <infoset>
      <dfdlInfoset>
        <tns:e_infix>
          <x>foo$/;bar</x>
        </tns:e_infix>
      </dfdlInfoset>
    </infoset>
  </parserTestCase>

Shouldn't this parse as:
<tns:e_infix>
  <x>foo$$</x>
  <y>bar</y>
</tns:e_infix>

The spec says the following:
On parsing any in-scope terminating delimiter encountered in the data
is not interpreted as such when it is immediately preceded by the
dfdl:escapeCharacter (when not itself preceded by the
dfdl:escapeEscapeCharacter). Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter.

It seems to me that the '/;' separator shouldn't be getting escaped in this
case, but I want to double check.
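To make the two competing readings concrete, here is a hypothetical Python
sketch (not Daffodil's implementation) of a scanner for the `e_infix` case,
with a switchable priority between delimiter recognition and escape
processing. `split_fields` and its `delimiters_first` flag are invented for
illustration, and dropping a lone escape character while keeping the next
character literally is an assumption. In this sketch, giving delimiters
priority yields the `foo$$` / `bar` split suggested above, while giving escapes
priority reproduces the test's current `foo$/;bar` infoset and means the `/;`
separator can never be matched at all.

```python
def split_fields(data, sep="/;", esc="/", esc_esc="$", delimiters_first=True):
    """Hypothetical sketch: split `data` on an infix separator under an
    escapeKind="escapeCharacter" scheme, using one of two priority policies.

    delimiters_first=True : a separator match wins, even if it starts with esc.
    delimiters_first=False: escape processing wins, so a separator that starts
                            with esc can never be recognized.
    """
    fields, cur = [], []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        nxt = data[i + 1] if i + 1 < n else None
        if delimiters_first and data.startswith(sep, i):
            fields.append("".join(cur))  # close the current field
            cur, i = [], i + len(sep)
            continue
        # esc_esc + esc: the escape-escape is dropped and esc is kept as data.
        # Under delimiters_first, an esc that begins a separator does not
        # count, so the preceding esc_esc stays as ordinary data.
        if c == esc_esc and nxt == esc and not (
                delimiters_first and data.startswith(sep, i + 1)):
            cur.append(esc)
            i += 2
            continue
        if c == esc:
            # Unescaped escape char: dropped; the next character is literal.
            if nxt is not None:
                cur.append(nxt)
            i += 2
            continue
        if not delimiters_first and data.startswith(sep, i):
            fields.append("".join(cur))
            cur, i = [], i + len(sep)
            continue
        cur.append(c)
        i += 1
    fields.append("".join(cur))
    return fields


print(split_fields("foo$$/;bar", delimiters_first=True))   # ['foo$$', 'bar']
print(split_fields("foo$$/;bar", delimiters_first=False))  # ['foo$/;bar']
```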

Josh

