Re: Please review mock up idea for checksum calculations in DFDL

Steve Lawrence Wed, 25 Aug 2021 12:05:54 -0700


On 8/23/21 1:51 PM, Beckerle, Mike wrote:
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Monday, August 9, 2021 12:18 PM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Please review mock up idea for checksum calculations in DFDL
>


--- snip ---

> 
> 2) For the IPv4 layer, it feels a bit unfortunate to have to split the
> CRC into two separate layers, since the CRC algorithm is really just a
> checksum over the whole header with just the checksum field treated as
> if it were zero. Is it possible to have a property that just specifies
> that the Nth byte doesn't contribute? Maybe something like:
> 
>   <xs:sequence dfdlx:layerTransform="checksum"
> dfdlx:runtimeProperties="ignoreByte=5">...
> 
> @@@ In the case of the IPv4 checksum, it can just hardcode the fact
> that it skips those specific bytes.  I included the splitting into two
> separate layers just to illustrate that this complexity could be
> handled. I will look at recasting this as just one checksum layer and
> see how it comes out. I think the other example of the GPS data format
> with parity bit computations, is worth looking at as that one is fairly
> complicated in which bits contribute in what ways.

Thinking more about this, I'm wondering whether it is even possible to
have a checksum field inside the checksum layer, as I suggested. I
*think* that would cause a circularity during unparse.

Say we have this schema, which is a simplified version of IPv4:

  <xs:sequence dfdlx:layer="uri:checksum">
    <xs:element name="field1" ... />
    <xs:element name="field2" dfdl:outputValueCalc="{ $checksum }" ... />
    <xs:element name="field3" ... />
  </xs:sequence>

So we have multiple fields that are all checksummed, where one of the
fields (field2 in this case) actually stores the checksum. And the bytes
associated with that field are just skipped during the checksum
calculation.
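
For reference, here is a minimal Python sketch of that kind of
calculation for the real IPv4 header, where the checksum field sits at
byte offsets 10 and 11 and is treated as zero. The function name is
made up for illustration and is not anything in Daffodil.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement checksum of an IPv4 header, with the checksum
    field (byte offsets 10 and 11) treated as zero."""
    if len(header) < 20 or len(header) % 2:
        raise ValueError("expected an even-length header of at least 20 bytes")
    data = bytearray(header)
    data[10:12] = b"\x00\x00"  # the checksum field does not contribute
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Because the field is zeroed before summing, the result is the same
whether or not the header already carries a checksum value.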

First field1 is unparsed. This goes to some InputStream, which the
checksum layer can start reading from and calculating the checksum. All
good so far.

Then field2 is unparsed. But because it is an OVC element, we create a
buffer for the eventual data, write nothing, and suspend until the
$checksum variable is set. All normal so far.

Then field3 is unparsed. But because the previous field is buffered,
this too must be buffered. We can still unparse data to this buffer, but
because it's being buffered, nothing is written to the InputStream that
the checksum layer is reading from.

And now we're in a deadlock. field2 is suspended waiting for $checksum
to be set. But we can't deliver any of these buffers to the underlying
InputStream, so the checksum layer can't finish its calculation. Which
means $checksum is never set. So field2 can never be unsuspended, etc.
We're in a loop.
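
A toy Python model of the sequence above, just to make the stall
concrete. This is not how Daffodil is implemented: the function names,
the use of None for a suspended buffer, and the 2-byte checksum width
are all assumptions for illustration.

```python
def deliverable_prefix(buffers):
    """Buffers are delivered strictly in order, so delivery stops at
    the first buffer that is still suspended (modelled as None)."""
    out = bytearray()
    for buf in buffers:
        if buf is None:
            break
        out += buf
    return bytes(out)


def try_unparse(field1: bytes, field3: bytes, width: int = 2):
    # field2 is the OVC element: it has no bytes until $checksum is set.
    buffers = [field1, None, field3]
    delivered = deliverable_prefix(buffers)
    expected = len(field1) + width + len(field3)
    # The layer only sets $checksum once it has seen all of its input.
    checksum_set = len(delivered) == expected
    return delivered, checksum_set
```

Calling try_unparse(b"\x01\x02", b"\x03\x04") delivers only the field1
bytes and reports that $checksum was not set, which is the deadlock.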



A potential workaround might be to have special logic where the field3
buffer can be written to the checksum layer (since field2 doesn't
matter in the calculation). And the checksum layer just knows field2 was
skipped. This would then allow the checksum layer to finish, and thus
field2 to be unparsed. But then the checksum layer needs to also keep a
buffer so that it can insert the unparsed field2 OVC value before the
field3 data. This seems pretty specialized though. And doesn't take into
account things like potential alignment that might not even be known
until field2 is actually unparsed, which would change the checksum value.
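
Roughly, that workaround would look like the Python sketch below. It
assumes a fixed-width field2 and a plain byte-sum checksum (both made
up for illustration), and it does not model the alignment problem at
all.

```python
def byte_sum(data: bytes) -> int:
    # Stand-in checksum, assumed for illustration: sum of bytes mod 2**16.
    return sum(data) & 0xFFFF


def unparse_with_skip(field1: bytes, field3: bytes, width: int = 2) -> bytes:
    # The layer is handed field1 and field3 and is told that `width`
    # bytes of field2 were skipped between them (counted as zeros).
    value = byte_sum(field1 + bytes(width) + field3)
    field2 = value.to_bytes(width, "big")
    # The layer had to hold field3 back so that field2 can be inserted
    # ahead of it in the final output.
    return field1 + field2 + field3
```

The last line is the extra buffering the layer would have to do itself.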

So I think we do need to use the approach where we split the checksum
into two different layers and combine them.
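
As a sketch of the split approach in Python: one layer covers the data
before field2, another covers the data after it, and field2 is
computed from the two partial results. The byte-sum checksum is an
assumption chosen only because partial results combine trivially, and
the sketch assumes each layer can finish over its own content even
while its output is buffered behind field2.

```python
def partial_sum(data: bytes) -> int:
    # Partial result produced by one layer over its own content.
    return sum(data)


def combine(part1: int, part2: int) -> int:
    # Combine the two layers' results into the final 16-bit value.
    return (part1 + part2) & 0xFFFF


def unparse_split(field1: bytes, field3: bytes) -> bytes:
    part1 = partial_sum(field1)  # first layer ends before field2
    part2 = partial_sum(field3)  # second layer starts after field2
    field2 = combine(part1, part2).to_bytes(2, "big")
    return field1 + field2 + field3
```

Neither layer contains field2, so neither has to wait on it.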
