Apologies for the tardy reply. I missed parts of this thread because of my
spam email filter.

(I learned that MS Outlook 365 is misclassifying some Apache email as junk
email.)

Here's the link to the proposal for checksum calculations. It links to some
mock-ups showing how the checksum/CRC layering is supposed to work.

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Checksums%2C+CRC%2C+Parity+-+Layering+Enhancements

I do think this could be used to couple a generic hash into data that is 
verified at unparse.
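
As a rough illustration of the generic-hash idea, here is a minimal sketch done
entirely outside Daffodil. It assumes the parse side records a SHA-256 digest of
the original bytes and the unparse side recomputes it over the reconstituted
bytes. The function names (sha256_hex, verify_round_trip) are hypothetical and
are not part of any Daffodil API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_round_trip(recorded_digest: str, unparsed: bytes) -> bool:
    """Check reconstituted bytes against the digest recorded at parse time."""
    return sha256_hex(unparsed) == recorded_digest


# Hypothetical example data standing in for the source gateway's input.
original = b"\x01\x02example payload"
digest = sha256_hex(original)  # recorded alongside the infoset at parse time

print(verify_round_trip(digest, original))            # identical bytes
print(verify_round_trip(digest, original + b"\x00"))  # altered bytes
```

The digest would have to travel with the infoset somehow (an element, or a
comment as suggested further down the thread). How that is carried is left open
here.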


________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, August 30, 2021 9:50 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Fwd: FW: DFDL: potential problem

Interesting idea.

I was thinking you could do something like this once we have this new
feature implemented:

  <xs:element name="FormatAndChecksum">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence dfdlx:layer="checksum">
          <xs:element ref="Format" />
        </xs:sequence>
        <xs:element name="checksum" type="xs:string"
          dfdl:inputValueCalc="{ $checksum }" />
        <xs:sequence>
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
              <dfdl:assert test="{ ./checksum eq $checksum }" />
            </xs:appinfo>
          </xs:annotation>
        </xs:sequence>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

So we parse and checksum the entire data format, add the checksum to the
infoset via input value calc, and then add an assert that the calculated
checksum matches the value in the infoset.

On parse, these two should always be the same. But on unparse, it's
possible they could be different and the assert would fail.
Unfortunately, this doesn't actually work because asserts are not
evaluated during unparse.

This seems like a reasonable use case for asserts during unparse, and I
imagine there are others, so maybe that's a feature worth considering to
allow this type of unparse validation.




On 8/25/21 9:20 AM, Attila Horvath wrote:
>
> *Subject:* DFDL: potential problem
>
> ALCON
>
> re: idea for checksum calculations in DFDL
> <https://lists.apache.org/thread.html/r85112d45e552a1f5b467406aeeee0f0a4bcaf143372b95c8e72f2669%40%3Cdev.daffodil.apache.org%3E>
>
> We may have a potential ‘situation’ as part of our DFDL/Daffodil offering as
> follows…
>
> My DFDL schema development process consists of examining the exit codes of a
> four (4) part mechanism:
>
>  1. DFDL parsing – “Houston, we have a go.”
>  2. DFDL unparsing – “Houston, we have a go.”
>  3. *End-to-end source/destination data comparison – “Houston, we have a
>     problem.”*
>  4. Intermediate xml validation against reconstituted data – “Houston, we
>     have a go.”
>
> I have an *_unintentional_* error in my DFDL schema. Unfortunately, the
> data/schema that created this situation is lost. Per above, both parse and
> unparse execute successfully, and xmllint validates Daffodil’s intermediate XML
> file successfully against the reconstituted/unparsed data as well as against
> the DFDL [erroneous] schema.
>
> However, the source and target data are *_NOT_* congruent. This is one
> situation I did not anticipate.
>
> This means our model and incorporation of Daffodil leaves [albeit] a
> /possibility/ of having an erroneous DFDL schema that will ultimately send
> data end-to-end, but because the two [gateway] ends do not communicate
> directly w/ each other, there is no way for the destination gateway to verify
> that the data is identical w/ the data received by the source gateway.
>
> To address the above, and perhaps along the lines of 'checksum calculations'
> re: IPV4 element, what is the collective opinion of having a SHASUM capability
> added to Daffodil, allowing the parser to optionally ("invisibly") incorporate
> a SHASUM in the intermediate XML file so that the destination unparser can
> validate the reconstituted data against the incorporated SHASUM?
>
> Perhaps a lame suggestion: could Daffodil optionally insert a comment tag
> while parsing, identifying it as a Daffodil-inserted shasum comment, which the
> unparser can identify and use to validate the reconstituted data?
>
> Thx in advance,
>
> v/r
>
> Attila
>
>
