Given the recent discussion re: releases, is there any inclination to implement a validation strategy for reconstituted data in a lossless environment?

Thx - Attila

On 2021/09/17 20:11:54, "Beckerle, Mike" <mbecke...@owlcyberdefense.com> wrote:
> Apologies for the tardy reply. I missed parts of this thread due to a spam
> email filter.
>
> (I learned that MS Outlook 365 is misclassifying some Apache email as junk
> email.)
>
> Here's the link to what is proposed for checksum calculations, and it has
> links to some mock-ups showing how this checksum/CRC stuff is supposed to
> work.
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Checksums%2C+CRC%2C+Parity+-+Layering+Enhancements
>
> I do think this could be used to couple a generic hash into data that is
> verified at unparse.
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Monday, August 30, 2021 9:50 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Fwd: FW: DFDL: potential problem
>
> Interesting idea.
>
> I was thinking you could do something like this once we have this new
> feature implemented:
>
> <xs:element name="FormatAndChecksum">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:sequence dfdlx:layer="checksum">
>         <xs:element ref="Format" />
>       </xs:sequence>
>       <xs:element name="checksum" type="xs:string"
>         dfdl:inputValueCalc="{ $checksum }" />
>       <xs:sequence>
>         <xs:annotation>
>           <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:assert test="{ ./checksum eq $checksum }" />
>           </xs:appinfo>
>         </xs:annotation>
>       </xs:sequence>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> So we parse and checksum the entire data format, add the checksum to the
> infoset via input value calc, and then add an assert that the calculated
> checksum matches the value in the infoset.
>
> On parse, these two should always be the same. But on unparse, it's
> possible they could be different and the assert would fail.
> Unfortunately, this doesn't actually work because asserts are not evaluated
> during unparse.
>
> This seems like a reasonable use case for asserts during unparse, and I
> imagine there are others, so maybe that's a feature worth considering to
> allow this type of unparse validation.
>
>
> On 8/25/21 9:20 AM, Attila Horvath wrote:
> >
> > *Subject:* DFDL: potential problem
> >
> > ALCON
> >
> > re: idea for checksum calculations in DFDL
> > <https://lists.apache.org/thread.html/r85112d45e552a1f5b467406aeeee0f0a4bcaf143372b95c8e72f2669%40%3Cdev.daffodil.apache.org%3E>
> >
> > We may have a potential ‘situation’ as part of our DFDL/Daffodil offering,
> > as follows…
> >
> > My DFDL schema development process consists of examining the exit codes of
> > a four (4) part mechanism:
> >
> > 1. DFDL parsing – “Houston, we have a go.”
> > 2. DFDL unparsing – “Houston, we have a go.”
> > 3. *End-to-end source/destination data comparison – “Houston, we have a
> >    problem.”*
> > 4. Intermediate XML validation against reconstituted data – “Houston, we
> >    have a go.”
> >
> > I have an *_unintentional_* error in my DFDL schema; unfortunately, the
> > data/schema that created this situation is lost. Per the above, both parse
> > and unparse execute successfully, and xmllint validates Daffodil’s
> > intermediate XML file successfully against the reconstituted/unparsed data
> > as well as against the [erroneous] DFDL schema.
> >
> > However, the source and target data are *_NOT_* congruent. This is one
> > situation I did not anticipate.
> >
> > This means our model and incorporation of Daffodil leaves open [albeit
> > only] a /possibility/ of having an erroneous DFDL schema that will
> > ultimately send data end-to-end, but because the two [gateway] ends do not
> > communicate directly w/ each other, there is no way for the destination
> > gateway to verify that its data is identical w/ the data received by the
> > source gateway.
> >
> > To address the above, and perhaps along the lines of the 'checksum
> > calculations' re: IPV4 element, what is the collective opinion of having a
> > SHASUM capability added to Daffodil, allowing the parser to optionally
> > ("invisibly") incorporate a SHASUM in the intermediate XML file so that the
> > destination unparser can validate the reconstituted data against the
> > incorporated SHASUM?
> >
> > Perhaps a lame suggestion: could Daffodil optionally insert a comment tag
> > while parsing, identifying it as a Daffodil-inserted shasum comment, which
> > the unparser can identify and use to validate the reconstituted data?
> >
> > Thx in advance,
> >
> > v/r
> >
> > Attila
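
For illustration only, the comment-tag idea above could be sketched outside of Daffodil roughly as follows. This is not an existing Daffodil feature: the `daffodil-sha256` marker and the helper names `tag_infoset` and `verify_unparsed` are hypothetical, and the sketch assumes the source gateway appends the comment after parsing and the destination gateway checks it after unparsing.

```python
import hashlib
import re

# Hypothetical comment marker; not part of Daffodil or the DFDL specification.
MARKER = "daffodil-sha256"
_PATTERN = re.compile(r"<!--\s*" + MARKER + r":([0-9a-f]{64})\s*-->")


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def tag_infoset(infoset_xml: str, source: bytes) -> str:
    """Source side: append a comment carrying the digest of the original data.

    A comment after the root element keeps the document well-formed XML.
    """
    return f"{infoset_xml.rstrip()}\n<!-- {MARKER}:{sha256_hex(source)} -->\n"


def verify_unparsed(infoset_xml: str, unparsed: bytes) -> bool:
    """Destination side: compare the unparsed bytes with the embedded digest."""
    match = _PATTERN.search(infoset_xml)
    if match is None:
        raise ValueError("no checksum comment found in infoset")
    return match.group(1) == sha256_hex(unparsed)


if __name__ == "__main__":
    source = b"\x01\x02payload"
    tagged = tag_infoset("<Format><field>payload</field></Format>", source)
    print(verify_unparsed(tagged, source))               # identical data
    print(verify_unparsed(tagged, b"\x01\x02PAYLOAD"))   # altered data
```

Because the digest travels with the infoset, the destination gateway can detect a non-congruent round trip without talking to the source gateway. Note that this only catches schema errors that change the bytes on unparse; it says nothing about whether the infoset itself was modified in transit.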