Re: Please review mock up idea for checksum calculations in DFDL

2021-08-25 Thread Steve Lawrence


On 8/23/21 1:51 PM, Beckerle, Mike wrote:
> From: Steve Lawrence 
> Sent: Monday, August 9, 2021 12:18 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Please review mock up idea for checksum calculations in DFDL
> 

--- snip ---

> 
> 2) For the IPv4 layer, it feels a bit unfortunate to have to split the
> CRC into two separate layers, since the CRC algorithm is really just a
> checksum over the whole header with just the checksum field treated as
> if it were zero. Is it possible to have a property that just specifies
> that the Nth byte doesn't contribute? Maybe something like:
> 
>dfdlx:runtimeProperties="ignoreByte=5">...
> 
> @@@ In the case of the IPv4 checksum, it can just hardcode the fact
> that it skips those specific bytes.  I included the splitting into two
> separate layers just to illustrate that this complexity could be
> handled. I will look at recasting this as just one checksum layer and
> see how it comes out. I think the other example of the GPS data format
> with parity bit computations, is worth looking at as that one is fairly
> complicated in which bits contribute in what ways.

Thinking more about this, I'm wondering whether it is even possible to have
a checksum field inside the checksum layer, as I suggested. I *think*
that would cause circularities during unparse.

Say we have this schema, which is a simplified version of IPv4:

  



  

So we have multiple fields that are all checksummed, where one of the
fields (field2 in this case) actually stores the checksum. And the bytes
associated with field2 are just skipped during the checksum calculation.

First field1 is unparsed. This goes to some InputStream, which the
checksum layer can start reading from and calculating the checksum. All
good so far.

Then field2 is unparsed. But because it is an OVC (dfdl:outputValueCalc)
element, we create a buffer for the eventual data, write nothing, and suspend
until the $checksum variable is set. All normal so far.

Then field3 is unparsed. But because the previous field is buffered,
this too must be buffered. We can still unparse data to this buffer, but
because it's being buffered, nothing is written to the InputStream that
the checksum layer is reading from.

And now we're in a deadlock. field2 is suspended waiting for $checksum
to be set. But we can't deliver any of these buffers to the underlying
InputStream, so the checksum layer can't finish its calculation, which
means $checksum is never set. So field2 can't be unsuspended, etc. We're
in a loop.
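To make the circularity concrete, here is a toy Python model of the buffering described above. It is a hedged sketch, not Daffodil's unparser: `Buffer`, `try_unparse`, the byte values, and the stand-in checksum arithmetic are all hypothetical. The one rule it encodes is that buffered output can only be delivered in order, so anything behind the suspended OVC buffer for field2 never reaches the checksum layer, even when field2's own bytes are skipped.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Buffer:
    """One element's unparsed output (a toy model, not Daffodil's real classes)."""
    name: str
    data: bytes = b""
    suspended: bool = False  # True while an OVC element waits for a variable

def try_unparse(stream, layer_fields):
    """Return $checksum, or None if no further progress is possible (deadlock).

    `stream` lists buffers in output order; `layer_fields` names the fields whose
    bytes the checksum layer must see (a skipped checksum field is left out).
    """
    checksum = None
    while True:
        # Buffers can only be delivered in order: stop at the first suspended one.
        delivered = set()
        for buf in stream:
            if buf.suspended:
                break
            delivered.add(buf.name)
        # The layer sets $checksum once it has seen every field it covers.
        if checksum is None and set(layer_fields) <= delivered:
            data = b"".join(b.data for b in stream if b.name in layer_fields)
            checksum = sum(data) & 0xFFFF  # stand-in arithmetic, not the IPv4 algorithm
        # A suspended OVC element resumes once $checksum is available.
        resumed = False
        for buf in stream:
            if buf.suspended and checksum is not None:
                buf.data, buf.suspended = checksum.to_bytes(2, "big"), False
                resumed = True
        if not resumed:
            return checksum

# Checksum field inside the layer: field3 is stuck behind the suspended field2,
# so the layer never completes even though field2's own bytes are skipped.
inside = [Buffer("field1", b"\x45\x00"), Buffer("field2", suspended=True),
          Buffer("field3", b"\x00\x1c")]
print(try_unparse(inside, {"field1", "field3"}))  # None

# For contrast, a checksum field placed after the layer blocks nothing the layer needs.
after = [Buffer("field1", b"\x45\x00"), Buffer("field3", b"\x00\x1c"),
         Buffer("field2", suspended=True)]
print(try_unparse(after, {"field1", "field3"}))  # 97
```

The model deliberately ignores layering streams, alignment, and the real checksum; it only shows why ordered delivery plus a suspension inside the layer can never make progress.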



A potential workaround might be to have special logic where the field3
buffer can be written to the checksum layer (since field2 doesn't
matter in the calculation), and the checksum layer just knows field2 was
skipped. This would then allow the checksum layer to finish, and thus
field2 to be unparsed. But then the checksum layer also needs to keep a
buffer so that it can insert the unparsed field2 OVC value before the
field3 data. This seems pretty specialized, though, and doesn't take into
account things like potential alignment that might not even be known
until field2 is actually unparsed, which would change the checksum value.

So I think we do need to use the approach where we split the checksum
into two different layers and combine them.
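Why combining two layers works for IPv4, as a minimal Python sketch: the IPv4 header checksum (called a "CRC" earlier in the thread) is really the 16-bit ones'-complement sum of RFC 791 and RFC 1071. That sum is associative and commutative, so a hypothetical "part1" layer over the bytes before the checksum field and a "part2" layer over the bytes after it can each produce a partial sum, and the two combine into the same result as a single pass with the checksum field zeroed. This assumes both parts start on even byte offsets (true for bytes 0-9 and 12-19 of a 20-byte header without options); the function names are illustrative, and the sample header is one whose stored checksum is 0xB861.

```python
def partial_sum(data: bytes) -> int:
    """Uncomplemented 16-bit ones'-complement sum of big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def combine(*sums: int) -> int:
    """Combine partial sums, e.g. from a 'part1' layer and a 'part2' layer."""
    total = 0
    for s in sums:
        total += s
        total = (total & 0xFFFF) + (total >> 16)
    return total

header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
part1 = partial_sum(header[:10])  # bytes before the checksum field
part2 = partial_sum(header[12:])  # bytes after the checksum field
checksum = ~combine(part1, part2) & 0xFFFF
print(hex(checksum))  # 0xb861
```

If a split ever fell on an odd byte offset, the second part's words would be shifted by one byte and its partial sum would need byte-swapping before combining, so the simple version above would not apply.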


Re: Please review mock up idea for checksum calculations in DFDL

2021-08-25 Thread Steve Lawrence


On 8/25/21 1:41 PM, Beckerle, Mike wrote:
> One further comment at the end.
> 
> --- snip ---
> 
> I think this is a "let's see" kind of issue. We can use hardwired variables 
> for now, and add a feature later to pass in QNames of variables for the layer 
> to use if we find it too clumsy.

Good point. Keeping it easy at first makes sense. It should be easy to add a
feature to override the hardwired name if we realize it's needed.


Re: Please review mock up idea for checksum calculations in DFDL

2021-08-25 Thread Beckerle, Mike
One further comment at the end.


--- snip ---

I think this is a "let's see" kind of issue. We can use hardwired variables for 
now, and add a feature later to pass in QNames of variables for the layer to 
use if we find it too clumsy.

...



Re: Please review mock up idea for checksum calculations in DFDL

2021-08-23 Thread Steve Lawrence
On 8/23/21 1:51 PM, Beckerle, Mike wrote:
> Comments below see @@@mb
> 
> 
> From: Steve Lawrence 
> Sent: Monday, August 9, 2021 12:18 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Please review mock up idea for checksum calculations in DFDL
> 
> Some comments:
> 
> 1) I like the idea that the layers write to a variable, but it seems
> like the variables are hard coded in the layer transformer? What are
> your thoughts on having the variable defined in a property so that the
> user has more control over the naming/definition of it, maybe via
> something like dfdlx:runtimeProperties? For example:
> 
>dfdlx:runtimeProperties="resultVariable=checksumPart1">...
> 
> @@@ given that a layer transform can be defined with a unique namespace 
> defined by way of a URI, there's never a need to be
> concerned about naming conflicts. So I think ability to choose the variables 
> names and provide them is overkill.

This is maybe a bit contrived, but one benefit of some configurability
is that if you have a format with two of the same checksums for
different parts of the data, you don't need newVariableInstance stuff.
For example:

  
  

  

  


  

  

So it's just a bit cleaner looking, though I'm not sure that's a strong
argument for configuring the variables. I imagine that in most formats with
multiple checksums of the same kind, the checksums are in an array, and you'd
need newVariableInstance anyway since the number of checksums isn't known.
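A rough Python model of this trade-off. The `res` parameter, the whitespace-separated `key=value` syntax, and the names `run_checksum_layer` and `VariableAlreadySet` are hypothetical (loosely following the `dfdl:layerParameters="res=..."` idea above), and CRC-32 from Python's `zlib` stands in for whatever checksum the layer computes. The sketch relies on DFDL variables being single-assignment: a variable instance can be set at most once, which is why a hardwired result name forces `dfdl:newVariableInstance` when the same layer appears twice.

```python
import zlib

class VariableAlreadySet(Exception):
    """Hypothetical error modelling DFDL's single-assignment rule for variables."""

def parse_params(text: str) -> dict:
    """Parse whitespace-separated key=value pairs (assumed syntax), e.g. 'res=checksumHeader'."""
    return dict(item.split("=", 1) for item in text.split())

def run_checksum_layer(data: bytes, params: str, variables: dict) -> None:
    """Store a stand-in CRC-32 of `data` in the variable named by 'res'.

    Without a 'res' parameter the layer falls back to a hardwired name, 'checksum'.
    """
    name = parse_params(params).get("res", "checksum")
    if name in variables:
        raise VariableAlreadySet(name)
    variables[name] = zlib.crc32(data)

# Configurable names: two layer instances write two distinct variables.
variables = {}
run_checksum_layer(b"header bytes", "res=checksumHeader", variables)
run_checksum_layer(b"payload bytes", "res=checksumPayload", variables)
print(sorted(variables))  # ['checksumHeader', 'checksumPayload']

# Hardwired name: the second instance collides, which is what
# dfdl:newVariableInstance would be needed to avoid in a real schema.
hardwired = {}
run_checksum_layer(b"header bytes", "", hardwired)
try:
    run_checksum_layer(b"payload bytes", "", hardwired)
except VariableAlreadySet as err:
    print("already set:", err)  # already set: checksum
```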


> I think of the variable definitions as coming from an imported schema that 
> one must have to use the layer transform.
> Right now we don't have a way of declaring a layer transform when defined 
> outside of the daffodil code base in a pluggable fashion, but assume we had 
> something like  className="com.myco.foobar"/> which would
> also appear in that import file, then accessing and using the layer transform 
> and its associated variables would all be obtained from the one import 
> statement.
> 
> 2) For the IPv4 layer, it feels a bit unfortunate to have to split the
> CRC into two separate layers, since the CRC algorithm is really just a
> checksum over the whole header with just the checksum field treated as
> if it were zero. Is it possible to have a property that just specifies
> that the Nth byte doesn't contribute? Maybe something like:
> 
>dfdlx:runtimeProperties="ignoreByte=5">...
> 
> @@@ In the case of the IPv4 checksum, it can just hardcode the fact that it 
> skips those specific bytes.  I included the splitting into two separate 
> layers just to illustrate that this complexity could be handled. I will look 
> at recasting this as just one checksum layer and see how it comes out. I 
> think the other example of the GPS data format with parity bit computations, 
> is worth looking at as that one is fairly complicated in which bits 
> contribute in what ways.

Agreed. I wasn't sure if the IPv4 checksum is specific to IPv4 or if
there are other uses where different bytes (or no bytes) are ignored.
Just thinking about reusability, but that's maybe more of an
implementation detail.

I'll take a look at the GPS example.

> 3) As for implementing the checksums, have you put any thought into
> making that extensible? For example, I'm wondering if we only have a
> single "checksum" layer, and then the dfdlx:runtimeProperties determines
> which algorithm to use? E.g.
> 
>dfdlx:runtimeProperties="algorithm=crc32">...
> 
>dfdlx:runtimeProperties="algorithm=ipv4header">...
> 
> And then people can register different checksum algorithms without
> having to reimplement their own layer? Or maybe we keep it simple and
> the default checksum layer just supports a handful of the most common
> checksums (maybe those supported by some preexisting checksum library?)
> 
> People could still implement their own pluggable checksum layer if they
> need something we don't support, but this would cover the most common
> cases and avoids a proliferation of a bunch of different layers that are
> basically the same except for some minor algorithm details.
> 
> @@@ This refactoring can of course be done. But isn't needed to get started. 
> Parameters to transform algorithms can be passed in variables, or could be 
> specified using an extensible property bag such as dfdlx:runtimeProperties as 
> you have shown. We may want a dedicated dfdl:layerParameters property since 
> we have other layering-specific properties (e.g., for layering length kind, 
> etc.) rather than using a generic hook. Ideally layering transformers could 
> check these properties statically and issue SDEs if misused.

Agreed.

> 
> On 7/30/21 

Re: Please review mock up idea for checksum calculations in DFDL

2021-08-23 Thread Beckerle, Mike
Comments below see @@@mb


From: Steve Lawrence 
Sent: Monday, August 9, 2021 12:18 PM
To: dev@daffodil.apache.org 
Subject: Re: Please review mock up idea for checksum calculations in DFDL

Some comments:

1) I like the idea that the layers write to a variable, but it seems
like the variables are hard coded in the layer transformer? What are
your thoughts on having the variable defined in a property so that the
user has more control over the naming/definition of it, maybe via
something like dfdlx:runtimeProperties? For example:

  ...

@@@ Given that a layer transform can be defined with a unique namespace defined
by way of a URI, there's never a need to be
concerned about naming conflicts. So I think the ability to choose the variable
names and provide them is overkill.

I think of the variable definitions as coming from an imported schema that one
must have in order to use the layer transform.
Right now we don't have a way of declaring, in a pluggable fashion, a layer
transform that is defined outside of the Daffodil codebase, but assume we had
something like  className="com.myco.foobar"/>, which would
also appear in that import file. Then the layer transform and its associated
variables would all be obtained from the one import statement.

2) For the IPv4 layer, it feels a bit unfortunate to have to split the
CRC into two separate layers, since the CRC algorithm is really just a
checksum over the whole header with just the checksum field treated as
if it were zero. Is it possible to have a property that just specifies
that the Nth byte doesn't contribute? Maybe something like:

  ...

@@@ In the case of the IPv4 checksum, it can just hardcode the fact that it
skips those specific bytes. I included the splitting into two separate layers
just to illustrate that this complexity could be handled. I will look at
recasting this as just one checksum layer and see how it comes out. I think the
other example, the GPS data format with parity bit computations, is worth
looking at, as that one is fairly complicated in terms of which bits contribute
in what ways.
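A minimal Python sketch of the hardcoded single-layer alternative: one IPv4 checksum layer that always treats the checksum field as zero. In the IPv4 header that field occupies bytes 10 and 11 (the 16-bit word at index 5, per RFC 791); the arithmetic is the RFC 1071 ones'-complement sum. On parse, summing the received header with its checksum included folds to 0xFFFF when the header is intact. The function names and the fixed 20-byte, no-options header are assumptions for illustration.

```python
# Assumption: the checksum field sits at byte offset 10 of the IPv4 header (RFC 791).
CHECKSUM_OFFSET = 10

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum of big-endian words (RFC 1071), not complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def ipv4_header_checksum(header: bytes) -> int:
    """Value to write on unparse: the checksum field's own bytes are treated as zero."""
    zeroed = header[:CHECKSUM_OFFSET] + b"\x00\x00" + header[CHECKSUM_OFFSET + 2:]
    return ~ones_complement_sum(zeroed) & 0xFFFF

def ipv4_header_ok(header: bytes) -> bool:
    """Check on parse: summing an intact header, checksum included, gives 0xFFFF."""
    return ones_complement_sum(header) == 0xFFFF

header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(ipv4_header_checksum(header)))  # 0xb861
print(ipv4_header_ok(header))             # True
```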

3) As for implementing the checksums, have you put any thought into
making that extensible? For example, I'm wondering if we only have a
single "checksum" layer, and then the dfdlx:runtimeProperties determines
which algorithm to use? E.g.

  ...

  ...

And then people can register different checksum algorithms without
having to reimplement their own layer? Or maybe we keep it simple and
the default checksum layer just supports a handful of the most common
checksums (maybe those supported by some preexisting checksum library?)

People could still implement their own pluggable checksum layer if they
need something we don't support, but this would cover the most common
cases and avoids a proliferation of a bunch of different layers that are
basically the same except for some minor algorithm details.

@@@ This refactoring can of course be done, but it isn't needed to get started.
Parameters to transform algorithms can be passed in variables, or could be
specified using an extensible property bag such as dfdlx:runtimeProperties as
you have shown. We may want a dedicated dfdl:layerParameters property, since we
have other layering-specific properties (e.g., for layering length kind, etc.),
rather than using a generic hook. Ideally, layering transformers could check
these properties statically and issue SDEs (schema definition errors) if misused.
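A minimal Python sketch of what checking layer parameters statically could look like: the parameter string is validated once, when the schema is compiled, instead of failing later while data is parsed. `SchemaDefinitionError`, `ALLOWED`, `REQUIRED`, and `check_layer_parameters` are hypothetical names, and the keys `algorithm` and `res` are borrowed from examples in this thread; none of this is an existing Daffodil API.

```python
class SchemaDefinitionError(Exception):
    """Stand-in for an SDE reported at schema compile time."""

# Hypothetical parameters a checksum layer might accept (None = any value allowed).
ALLOWED = {
    "algorithm": {"crc32", "ipv4header"},
    "res": None,
}
REQUIRED = {"algorithm"}

def check_layer_parameters(text: str) -> dict:
    """Validate whitespace-separated key=value pairs, raising an SDE on misuse."""
    params = {}
    for item in text.split():
        key, sep, value = item.partition("=")
        if not sep or not value:
            raise SchemaDefinitionError(f"malformed layer parameter: {item!r}")
        if key not in ALLOWED:
            raise SchemaDefinitionError(f"unknown layer parameter: {key}")
        allowed_values = ALLOWED[key]
        if allowed_values is not None and value not in allowed_values:
            raise SchemaDefinitionError(f"{key}={value} is not one of {sorted(allowed_values)}")
        if key in params:
            raise SchemaDefinitionError(f"duplicate layer parameter: {key}")
        params[key] = value
    missing = REQUIRED - set(params)
    if missing:
        raise SchemaDefinitionError(f"missing layer parameter(s): {sorted(missing)}")
    return params

print(check_layer_parameters("algorithm=crc32 res=checksumHeader"))
# {'algorithm': 'crc32', 'res': 'checksumHeader'}
try:
    check_layer_parameters("algorithm=md5")
except SchemaDefinitionError as sde:
    print("SDE:", sde)  # SDE: algorithm=md5 is not one of ['crc32', 'ipv4header']
```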


On 7/30/21 2:29 PM, Beckerle, Mike wrote:
> --- snip ---

RE: Please review: DFDL parity calculations also - was: Fw: Please review mock up idea for checksum calculations in DFDL

2021-08-10 Thread Interrante, John A (GE Research, US)
Mike,

I'll take a look.  When you say these pull requests aren't working yet, what 
are the missing parts?  I know at a minimum you need to implement the following 
transform names:







Do you plan to implement these transform names in Scala within the Daffodil 
codebase or as pluggable algorithms akin to UDFs?   Do you need to implement or 
change anything else in the layering functionality, given that the following 
initial transform names already are enumerated in 
daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/DFDL_part1_simpletypes.xsd?


  *   fourbyteswap: Swap bytes in 32 bit words
  *   base64_MIME: IETF RFC 2045, max line length 76 characters
  *   base64: IETF RFC 4648 - not the URL-SAFE version
  *   base64url: IETF RFC 4648 - the URL-SAFE version
  *   lineFolded_IMF: IETF RFC 2822 Internet Message Format (IMF)
  *   lineFolded_iCalendar: IETF RFC 5545 Internet Calendaring and Scheduling (iCalendar)
  *   quotedPrintable: IETF RFC 2045 Quoted Printable Content Transfer Encoding
  *   aisPayloadArmor: Automatic Identification System - ITU-R M.1371-1
  *   compress: Lempel-Ziv-Welch compression per Unix 'compress' command
  *   gzip: GZIP per https://www.ietf.org/rfc/rfc1952.txt

Actually, I don't see implementations of all these transform names - base64,
base64url, and compress still seem to be unimplemented in the Daffodil codebase
at this time. But anyway, is all you need to do to add these new transform
names to the enumerated transform names and write the corresponding
LayerTransformer & LayerTransformerFactory classes, tests, etc.?

I also saw Steve's comments, and I think he has some good points on things like
combining checksumLayerPart1 and checksumLayerPart2 into a single checksumLayer
with a parameter specifying where the checksum field is, so its value can be
treated as zero during the checksum computation.

John

From: Beckerle, Mike 
Sent: Tuesday, August 3, 2021 4:36 PM
To: dev@daffodil.apache.org
Subject: EXT: Please review: DFDL parity calculations also - was: Fw: Please 
review mock up idea for checksum calculations in DFDL

--- snip ---



Re: Please review mock up idea for checksum calculations in DFDL

2021-08-09 Thread Steve Lawrence
Some comments:

1) I like the idea that the layers write to a variable, but it seems
like the variables are hard coded in the layer transformer? What are
your thoughts on having the variable defined in a property so that the
user has more control over the naming/definition of it, maybe via
something like dfdlx:runtimeProperties? For example:

  ...

2) For the IPv4 layer, it feels a bit unfortunate to have to split the
CRC into two separate layers, since the CRC algorithm is really just a
checksum over the whole header with just the checksum field treated as
if it were zero. Is it possible to have a property that just specifies
that the Nth byte doesn't contribute? Maybe something like:

  ...


3) As for implementing the checksums, have you put any thought into
making that extensible? For example, I'm wondering if we only have a
single "checksum" layer, and then the dfdlx:runtimeProperties determines
which algorithm to use? E.g.

  ...

  ...

And then people can register different checksum algorithms without
having to reimplement their own layer? Or maybe we keep it simple and
the default checksum layer just supports a handful of the most common
checksums (maybe those supported by some preexisting checksum library?)

People could still implement their own pluggable checksum layer if they
need something we don't support, but this would cover the most common
cases and avoids a proliferation of a bunch of different layers that are
basically the same except for some minor algorithm details.
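A minimal Python sketch of the registry idea: one generic checksum layer looks up its algorithm by name from a `key=value` property string. The registry, the `register` decorator, and the property syntax are hypothetical; `zlib.crc32` and `zlib.adler32` are real Python standard-library functions used here as example algorithms, and an `ipv4header` entry would be registered the same way.

```python
import zlib
from typing import Callable, Dict

CHECKSUM_ALGORITHMS: Dict[str, Callable[[bytes], int]] = {}

def register(name: str):
    """Decorator that adds a checksum function to the registry under `name`."""
    def decorator(fn):
        if name in CHECKSUM_ALGORITHMS:
            raise ValueError(f"checksum algorithm {name!r} already registered")
        CHECKSUM_ALGORITHMS[name] = fn
        return fn
    return decorator

@register("crc32")
def crc32(data: bytes) -> int:
    return zlib.crc32(data)

@register("adler32")
def adler32(data: bytes) -> int:
    return zlib.adler32(data)

def checksum_layer(data: bytes, runtime_properties: str) -> int:
    """One generic layer; the 'algorithm' property picks the implementation."""
    props = dict(item.split("=", 1) for item in runtime_properties.split())
    try:
        algorithm = CHECKSUM_ALGORITHMS[props["algorithm"]]
    except KeyError as missing:
        raise ValueError(f"unsupported or missing checksum algorithm: {missing}") from None
    return algorithm(data)

print(hex(checksum_layer(b"123456789", "algorithm=crc32")))  # 0xcbf43926
```

Adding an algorithm then means registering one function rather than writing a new layer, which is the proliferation concern raised above.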


On 7/30/21 2:29 PM, Beckerle, Mike wrote:
> --- snip ---



Please review: DFDL parity calculations also - was: Fw: Please review mock up idea for checksum calculations in DFDL

2021-08-03 Thread Beckerle, Mike
A second example focused on DFDL with parity calculations in a GPS format has 
also been "mocked up"

https://github.com/DFDLSchemas/gps-sps/pull/1

Please review and comment on this pull request also. The GPS spec this is based
on is also in the repository, in the doc directory.

Thank you


From: Beckerle, Mike
Sent: Friday, July 30, 2021 2:29 PM
To: dev@daffodil.apache.org 
Subject: Please review mock up idea for checksum calculations in DFDL


--- snip ---



Please review mock up idea for checksum calculations in DFDL

2021-07-30 Thread Beckerle, Mike
I would like comments on the layering enhancement to enable checksum 
computations in DFDL schemas.


This is a high-priority feature for Daffodil's next release 3.2.0, especially 
for cybersecurity applications of Daffodil, which I know a number of us are 
involved in.


I've produced a mock-up of how it would look, with lots of annotations in a WIP 
pull request on the ethernetIP DFDL schema. I only did the mock-up for the IPV4 
element, so look at that element in the ethernetIP.dfdl.xsd.

(UDP and TCP packets have their own additional checksums - I didn't mock up 
those, just IPV4)


This is at https://github.com/DFDLSchemas/ethernetIP/pull/1


This doesn't run; it's just an initial mock-up of the ideas for
checksum/CRC/parity recomputation capability as a further simple extension of
the existing DFDL layering extension.


The layering extension itself is described here:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+Base64%2C+Line-Folding%2C+Compression%2C+Etc


I did notice that none of the published DFDLSchemas actually use the layering 
transforms that we've built into Daffodil. There are some non-public DFDL 
schemas that do use this extension to do line-folding transformations.


There are, however, tests showing the DFDL layering extension in Daffodil's
codebase. See

https://github.com/apache/daffodil/blob/master/daffodil-test/src/test/resources/org/apache/daffodil/layers/layers.tdml
and search for dfdlx:layerTransform property.


The mock-up effectively proposes allowing layer transforms to read and write 
DFDL variables, as a means of them accepting input parameters, and as the means 
of them computing and returning output results.
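To illustrate layers returning results through DFDL variables, here is a hedged Python model of a record with a trailing CRC-32 field. On parse, the layer computes the checksum over the covered bytes and sets a variable, and the schema compares it with the stored field, as an assertion might; on unparse, the layer sets the same variable and the checksum field's outputValueCalc reads it. The variable name `crc`, the record layout, and the functions are hypothetical, only `zlib.crc32` and `struct` are real library calls, and the sketch covers the output-result direction only.

```python
import struct
import zlib

def crc_layer(covered: bytes, variables: dict) -> None:
    """The layer's only output: it sets the (hypothetical) variable 'crc'."""
    variables["crc"] = zlib.crc32(covered)

def parse(record: bytes) -> dict:
    body, stored = record[:-4], struct.unpack(">I", record[-4:])[0]
    variables = {}
    crc_layer(body, variables)
    # Analogous to an assertion comparing the stored field with the variable.
    if stored != variables["crc"]:
        raise ValueError(f"checksum mismatch: stored {stored:#010x}, "
                         f"computed {variables['crc']:#010x}")
    return {"body": body, "crc": stored}

def unparse(infoset: dict) -> bytes:
    variables = {}
    crc_layer(infoset["body"], variables)
    # Analogous to outputValueCalc on the checksum field reading the variable;
    # whatever 'crc' value the infoset carries is ignored and recomputed.
    return infoset["body"] + struct.pack(">I", variables["crc"])

record = unparse({"body": b"123456789", "crc": 0})
print(record[-4:].hex())                   # cbf43926
print(parse(record)["crc"] == 0xCBF43926)  # True
```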


I plan to do a couple of other mock-ups of a check-digit calculation, and some
parity bit computations, but this IPV4 is enough to get the gist of the idea.


I'd appreciate feedback on this, which you can do on the pull request in the
usual GitHub code review manner.


-mikeb



Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412