Daffodil doesn't currently have this ability.
The raw ingredients are largely there.
For example, the dfdl:valueLength and dfdl:contentLength functions can be used
as rulers to measure how big something is.
So if you organize a DFDL schema so that the thing to measure is an element
named measureThis, then you can put another element in the schema and literally
ask for dfdl:valueLength(../measureThis, 'bytes') in that element's
dfdl:outputValueCalc property.
The idea that we should be able to annotate every element with its start
position and length, and carry this through as annotated Infoset output is a
good one. The debugger hooks have this information and output it in the trace
output.
From: Interrante, John A (GE Research, US)
Sent: Thursday, July 1, 2021 10:05 AM
To: dev@daffodil.apache.org
Subject: How to list offset and length of DFDL elements within native data?
I've been asked a Daffodil / DFDL question that I don't know how to answer.
The question is:
How to implement a function like
get_offset_len(data, schema, field_path) -> (offset, length)?
Do you know a good way (using Daffodil library functions or
DFDL constructs) to pass some native data, a DFDL schema, an XPath or DPath
expression referring to an element in the DFDL schema, and get the offset and
length of that element's field within the native data?
Alternatively, does Daffodil have a way to apply a DFDL schema
to some native data, construct an infoset from the native data, and list all
the elements in the infoset along with their DPath, offset, and length?
I searched the Daffodil codebase and wasn't able to find a specific API like
that, although I may have missed something usable. I scanned the DFDL
specification and I did find a DFDL function called "dfdl:contentLength" in
section 18.5.3. The function's signature is:
dfdl:contentLength($node, $lengthUnits)
Returns the length of the supplied node's SimpleContent region
for elements of simple type, or ComplexContent region for elements of complex
type. These regions are defined in Section 9.2 DFDL Data Syntax Grammar. The
value is returned as an xs:unsignedLong.
The second argument is of type xs:string and must be 'bytes', 'characters', or
'bits' (Schema Definition Error otherwise) and determines the units of length.
Being able to get each element's length looks like it could help although a
note in the same section said that the content length returned by
dfdl:contentLength() excludes any alignment filling as well as any leading or
trailing skip bytes. That is, the returned length tells you about the length
of the content, but does not tell you about the position of the content in the
native data stream which is what I was asked to find. Nevertheless, if the
native data is not text but rather binary data with fixed-size fields, being
able to list each content field with its length might be sufficient to deduce
the position of each content field as well.
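As a minimal sketch of that deduction, assuming fixed-size binary fields laid out back to back with no alignment fill or skip bytes, a running sum of the content lengths gives each field's offset. The function name, field paths, and lengths below are hypothetical and are not output from Daffodil.

```python
from itertools import accumulate


def offsets_from_lengths(fields):
    """Map each field path to (offset, length) in bytes.

    Assumes the fields are contiguous: no alignment fill and no
    leading or trailing skip bytes between them.
    """
    lengths = [length for _, length in fields]
    # Each offset is the sum of the lengths of all preceding fields.
    starts = [0] + list(accumulate(lengths))[:-1]
    return {name: (start, length)
            for (name, length), start in zip(fields, starts)}


# Hypothetical record layout: (field path, content length in bytes)
fields = [("/record/id", 4), ("/record/flags", 1), ("/record/payload", 16)]
layout = offsets_from_lengths(fields)
print(layout["/record/payload"])  # (5, 16)
```

The lengths would have to come from somewhere else, for example from dfdl:contentLength values written into the infoset.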
I wonder which would be easier to do?
1. Write a Scala program which calls some Daffodil API to parse some native
data, construct an infoset from the native data, and list all the elements in
the infoset along with their DPath, offset, and length? This would require
Daffodil to have an API to iterate over each element in the infoset and return
each element's content length.
2. Add DFDL constructs to a DFDL schema which call dfdl:contentLength and
dfdl:outputValueCalc to append the same information to the infoset? This would
require saving the infoset as XML and writing a program or command to read the
information as a list.
3. Another way which I don't know about yet?
4. How would we handle any alignment filling as well as any leading or
trailing skip bytes if the DFDL schema uses them?
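To illustrate why item 4 matters, here is a rough sketch of how alignment fill and skip bytes shift positions. It assumes every quantity is in bytes and that leading skip is applied before alignment fill (an assumption to check against the DFDL specification). The function names, dictionary keys, and field values are hypothetical and are not a Daffodil API.

```python
def align_up(pos, alignment):
    """Round pos up to the next multiple of alignment (alignment >= 1)."""
    return -(-pos // alignment) * alignment


def layout_with_framing(fields):
    """Map each field path to (offset, length) in bytes.

    Each field is a dict with "name" and "length", plus optional
    "alignment", "leading_skip", and "trailing_skip" (all in bytes).
    """
    pos = 0
    result = {}
    for f in fields:
        pos += f.get("leading_skip", 0)             # skip bytes first (assumed order)
        pos = align_up(pos, f.get("alignment", 1))  # then alignment fill
        result[f["name"]] = (pos, f["length"])      # content starts here
        pos += f["length"] + f.get("trailing_skip", 0)
    return result


# Hypothetical fields; none of these values come from a real schema.
framed_fields = [
    {"name": "/record/tag", "length": 1},
    {"name": "/record/count", "length": 4, "alignment": 4},
    {"name": "/record/crc", "length": 2, "leading_skip": 2},
]
framed_layout = layout_with_framing(framed_fields)
print(framed_layout["/record/count"])  # (4, 4)
```

Here the 1-byte tag is followed by 3 bytes of alignment fill, so content lengths alone would place count at offset 1 instead of 4.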
Thanks,
John