Re: How to list offset and length of DFDL elements within native data?

2021-07-01 Thread Beckerle, Mike
Daffodil doesn't currently have this ability.

The raw ingredients are largely there.

For example, the dfdl:valueLength or dfdl:contentLength function can be used as 
rulers to measure how big something is.

So if you organized a DFDL schema as




Then you can put an element in the schema and literally ask for 
dfdl:valueLength(../measureThis) in a dfdl:outputValueCalc element.

The idea that we should be able to annotate every element with its start 
position and length, and carry this through as annotated Infoset output, is a 
good one. The debugger hooks already have this information and show it in the 
trace output.



From: Interrante, John A (GE Research, US) 
Sent: Thursday, July 1, 2021 10:05 AM
To: dev@daffodil.apache.org 
Subject: How to list offset and length of DFDL elements within native data?

I've been asked a Daffodil / DFDL question that I don't know how to answer.  
The question is:

How would one implement a function like get_offset_len(data, schema, 
field_path) -> (offset, length)?

Do you know a good way (using Daffodil library functions or 
DFDL constructs) to pass some native data, a DFDL schema, an XPath or DPath 
expression referring to an element in the DFDL schema, and get the offset and 
length of that element's field within the native data?

Alternatively, does Daffodil have a way to apply a DFDL schema 
to some native data, construct an infoset from the native data, and list all 
the elements in the infoset along with their DPath, offset, and length?
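If Daffodil could emit an infoset annotated with positions (it does not do so today), listing every element with its path, offset, and length would be a short tree walk. A minimal sketch, assuming a hypothetical XML infoset in which each element carries offset and length attributes measured in bits from the start of the native data. Those attribute names are invented for illustration and are not Daffodil output. The walk builds DPath-style paths, adds a 1-based [n] index only for repeated siblings, and ignores XML namespaces, which a real Daffodil infoset would carry.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterator, Tuple

# Hypothetical annotated infoset: offset/length attributes (in bits) are an
# assumption for illustration; Daffodil does not currently produce them.
ANNOTATED_INFOSET = """
<file offset="0" length="80">
  <header offset="0" length="16">
    <version offset="0" length="8">1</version>
    <count offset="8" length="8">2</count>
  </header>
  <rec offset="16" length="32">7</rec>
  <rec offset="48" length="32">9</rec>
</file>
"""

Row = Tuple[str, int, int]  # (path, offset in bits, length in bits)


def list_elements(xml_text: str) -> Iterator[Row]:
    """Yield (path, offset, length) for every element, in document order."""
    root = ET.fromstring(xml_text.strip())

    def walk(elem: ET.Element, path: str) -> Iterator[Row]:
        yield path, int(elem.get("offset")), int(elem.get("length"))
        totals = Counter(child.tag for child in elem)  # how many of each name
        seen: Counter = Counter()
        for child in elem:
            if totals[child.tag] > 1:
                # Repeated siblings get a 1-based index, as in DPath.
                seen[child.tag] += 1
                child_path = f"{path}/{child.tag}[{seen[child.tag]}]"
            else:
                child_path = f"{path}/{child.tag}"
            yield from walk(child, child_path)

    yield from walk(root, "/" + root.tag)


for path, offset, length in list_elements(ANNOTATED_INFOSET):
    print(f"{path}\t{offset}\t{length}")
```

With real annotations, the same walk would answer the per-field question too: get_offset_len would just be a lookup of one path in this list.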

I searched the Daffodil codebase and wasn't able to find a specific API like 
that, although I may have missed something usable. I scanned the DFDL 
specification and did find a DFDL function called "dfdl:contentLength" in 
section 18.5.3. The function's signature is:

dfdl:contentLength($node, $lengthUnits)

Returns the length of the supplied node's SimpleContent region 
for elements of simple type, or ComplexContent region for elements of complex 
type. These regions are defined in Section 9.2 DFDL Data Syntax Grammar. The 
value is returned as an xs:unsignedLong.
The second argument is of type xs:string and must be 'bytes', 'characters', or 
'bits' (Schema Definition Error otherwise) and determines the units of length.

Being able to get each element's length looks like it could help, although a 
note in the same section says that the content length returned by 
dfdl:contentLength() excludes any alignment filling as well as any leading or 
trailing skip bytes. That is, the returned length tells you the length of the 
content, but not the position of the content in the native data stream, which 
is what I was asked to find. Nevertheless, if the native data is not text but 
binary data with fixed-size fields, being able to list each content field with 
its length might be sufficient to deduce the position of each content field as 
well.
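For that fixed-size binary case, get_offset_len reduces to a running sum of the lengths of the fields that precede the target. A minimal sketch, assuming a hypothetical flat "schema" given as a list of (element name, length in bytes) pairs, and a field_path that is just an element name. Real DFDL schemas with nesting, delimiters, or variable-length fields would still need an actual DFDL processor such as Daffodil to find the lengths.

```python
from typing import Sequence, Tuple

# Hypothetical stand-in for a DFDL schema: a flat sequence of contiguous,
# fixed-length binary fields, each given as (element name, length in bytes).
FieldSpec = Tuple[str, int]


def get_offset_len(data: bytes, schema: Sequence[FieldSpec],
                   field_path: str) -> Tuple[int, int]:
    """Return (offset, length) in bytes of the field named field_path."""
    offset = 0
    for name, length in schema:
        if name == field_path:
            if offset + length > len(data):
                raise ValueError(f"{field_path} runs past the end of the data")
            return offset, length
        offset += length  # contiguous fields: the next one starts right here
    raise KeyError(field_path)


schema = [("magic", 4), ("version", 2), ("count", 2), ("payload", 8)]
data = bytes(16)
print(get_offset_len(data, schema, "count"))  # (6, 2)
```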

I wonder which would be easier to do?


  1.  Write a Scala program which calls some Daffodil API to parse some native 
data, construct an infoset from the native data, and list all the elements in 
the infoset along with their DPath, offset, and length?  This would require 
Daffodil to have an API to iterate over each element in the infoset and return 
each element's content length.
  2.  Add DFDL constructs to a DFDL schema which call dfdl:contentLength and 
dfdl:outputValueCalc to append the same information to the infoset?  This would 
require saving the infoset as XML and writing a program or command to read the 
information as a list.
  3.  Another way which I don't know about yet?
  4.  How would we handle any alignment filling as well as any leading or 
trailing skip bytes if the DFDL schema uses them?
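On question 4: because dfdl:contentLength excludes alignment fill and leading/trailing skip, those regions have to be added back when accumulating positions. A minimal sketch, assuming a flat sequence of elements whose content lengths and dfdl:leadingSkip, dfdl:alignment, and dfdl:trailingSkip values are already known and converted to bits. It applies the leading skip before the alignment fill, following the order of the LeadingSkip and AlignmentFill regions in the DFDL grammar, and treats position 0 as the start of the data stream. Initiators, terminators, separators, padding, and prefix lengths are ignored. Elem and layout are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Elem:
    name: str
    content_bits: int       # what dfdl:contentLength(., 'bits') would report
    leading_skip: int = 0   # dfdl:leadingSkip, converted to bits
    alignment: int = 1      # dfdl:alignment, converted to bits (1 = unaligned)
    trailing_skip: int = 0  # dfdl:trailingSkip, converted to bits


def layout(elems: List[Elem], start: int = 0) -> List[Tuple[str, int, int]]:
    """Return (name, content offset, content length) in bits for each element."""
    pos = start
    out = []
    for e in elems:
        pos += e.leading_skip                      # LeadingSkip region
        pos += (-pos) % e.alignment                # AlignmentFill up to boundary
        out.append((e.name, pos, e.content_bits))  # content starts here
        pos += e.content_bits + e.trailing_skip    # content, then TrailingSkip
    return out


print(layout([
    Elem("flag", 3),
    Elem("value", 32, alignment=8, trailing_skip=8),
    Elem("tail", 8, leading_skip=16),
]))
# [('flag', 0, 3), ('value', 8, 32), ('tail', 64, 8)]
```

Here "value" is pushed from bit 3 to bit 8 by alignment fill, and "tail" starts at bit 64 because of the 8-bit trailing skip after "value" plus its own 16-bit leading skip.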

Thanks,
John

