Daffodil Devs,

Ludo Visser of ESA has attempted to run DFDL4S schemas with Daffodil,
modifying them as appropriate, but without success.

The email discussion thread is captured below.

If the C-code backend had more complete expression support, it might well
handle the DFDL4S use case. Regular Daffodil with the JVM backend should
certainly be able to handle it, although converting data into XML or JSON
is not part of their use case.

DFDL4S has an extension to the expression language. I suspect this could be
worked around, though may require the schemas to be modified a bit. They
use a regex in the element names of expressions to provide a specific kind
of expression polymorphism (see
https://github.com/OpenGridForum/DFDL/issues/13). Moving these sorts of
referenced child fields so that polymorphism is not needed to access them
may be possible. I think this may be another case where the schema shape
and nesting that a schema author prefers does not match the DFDL schema one
must write to make it actually work.

We have JIRA ticket https://issues.apache.org/jira/browse/DAFFODIL-1431
which was for DFDL4S Compatibility testing, but this ticket was closed for
being too "wish-list". Perhaps we should reopen this as Ludo has done some
of the work to identify data and schemas to use for this testing.

Mike Beckerle
Apache Daffodil | daffodil.apache.org
OGF DFDL Workgroup | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl



---------- Forwarded message ---------
From: Ludo Visser <[email protected]>
Date: Thu, Dec 18, 2025, 6:19 AM
Subject: Re: Participation in DFDL ISO workgroup
To: [email protected] <[email protected]>
Cc: Michele Zundo <[email protected]>, Montserrat Pinol Sole <
[email protected]>, Andrea Della Vecchia <
[email protected]>


Dear Mike,

Our schemas are publicly available on our website:
https://eop-cfi.esa.int/index.php/applications/s2g-data-viewer/mission-files
(or directly:
https://eop-cfi.esa.int/Repo/PUBLIC/DOCUMENTATION/MISSION_DATA/TELEMETRY_SCHEMA_FILES/)

They are .jar files for use with S2G (
https://eop-cfi.esa.int/index.php/applications/s2g-data-viewer), but they
can be renamed to .zip and extracted. On the S2G page you can also find
some example data files.

I’ve attached here for convenience the Sentinel 1 and 3 schemas and the
example data.

I tried to parse e.g. the Sentinel 1 file with:

 ./daffodil/bin/daffodil parse -s Sentinel1X-bandTMISP.xsd -- demo_data_files/S1A_SAR_ISP.BIN

This will immediately complain about this line in
Sentinel1X-bandTMISPData.xsd:

 <xs:element name="AISR_Source_Data" type="TypeAISRSourceData"
     dfdl:occursCountKind="expression" dfdl:occursCount="672"/>

Fixing that (adding { } around 672) then produces a long list of errors,
including those I mentioned in my previous email.
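
For reference, a sketch of the change described above, with braces marking
the occursCount value as a DFDL expression. Only the braces are new; the
element name, type, and count are taken from the line quoted earlier, and the
edited schema itself is not shown in this thread:

```xml
<!-- With occursCountKind="expression", dfdl:occursCount holds a DFDL
     expression, which is written inside { }. -->
<xs:element name="AISR_Source_Data" type="TypeAISRSourceData"
    dfdl:occursCountKind="expression" dfdl:occursCount="{ 672 }"/>
```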

As mentioned, I’m very interested to get this to work, so please reach out
if further clarification is needed or if I can be of help otherwise.

Best regards,
Ludo


-- 

Ludo Visser

EOP-PES
ESA ESTEC
[email protected] | T +31 6 414 71061

*From: *Mike Beckerle <[email protected]>
*Date: *Monday, 24 November 2025 at 15:37
*To: *Ludo Visser <[email protected]>
*Cc: *Michele Zundo <[email protected]>, Montserrat Pinol Sole <
[email protected]>, Andrea Della Vecchia <
[email protected]>
*Subject: *Re: Participation in DFDL ISO workgroup

If we could get the schema and a piece of example data we could dig into
why it doesn't work.
I'd like to first get your schema working in regular Daffodil running on
the JVM just to make sure we understand the requirements.

Right now expression support in the C-code generator is very partial and
I'm not at all surprised it doesn't work. We do have an open ticket (
https://issues.apache.org/jira/browse/DAFFODIL-2536) on refactoring
Daffodil to enable the C-code generator or other back-ends to traverse the
parsed expression tree, and generally work without having to be part of
Daffodil's core library. At that point adding real expression support to
the C-code generator would be much easier.

As far as whether your schema is "standard" DFDL, other than your path-step
regex extension feature, the other thing missing is a prefix "dfdl:" on the
contentLength function.
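
As a sketch of that second point, here is the dfdl:length expression from
Ludo's message quoted further down, with the path-step regex removed and the
function written as dfdl:contentLength. This assumes the secondary header can
be reached under the plain name Packet_Secondary_Header, which may not hold
for every schema. The paths are copied unchanged, the DFDL4S-specific
dmx:representation attribute is left out, and the result has not been run
against Daffodil:

```xml
<!-- Assumption: the xs: and dfdl: prefixes are bound as in the original schema. -->
<!-- Assumption: Packet_Secondary_Header is addressable without a name prefix. -->
<xs:element name="Data" type="xs:hexBinary"
    dfdl:lengthKind="explicit"
    dfdl:lengthUnits="bytes"
    dfdl:length="{ /Packet_Primary_Header/Packet_Data_Length + 1
                   - dfdl:contentLength(/Packet_Data_Field/Packet_Secondary_Header, 'bytes')
                   - 2 }"/>
```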

The regex feature you added to DFDL4S to create these polymorphic paths has
gotten some discussion, and there are similar needs from different use
cases. This experimental DFDL feature issue was opened:
https://github.com/OpenGridForum/DFDL/issues/13. In that proposal the regex
aspect is absent, so instead of "(.*)Packet_Secondary_Header" matching
"some_prefix_Packet_Secondary_Header" one would write
"*/Packet_Secondary_Header" matching "some_prefix/Packet_Secondary_Header",
which is closer to ordinary XPath wildcards.
This changes the schema: instead of multiple fields sharing a name prefix,
those fields become sub-fields within a common parent field, and the shared
prefix becomes the name of that parent.
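
A rough sketch of that restructuring follows. The name "some_prefix" is the
placeholder used above, and the type name TypeSecondaryHeader is hypothetical
and not taken from the DFDL4S schemas. The wildcard step itself is the
experimental proposal from the issue above, so whether a given Daffodil
release supports it would need to be checked:

```xml
<!-- Before: the variant is encoded as a prefix in the field name. -->
<xs:element name="some_prefix_Packet_Secondary_Header" type="TypeSecondaryHeader"/>

<!-- After: the prefix becomes a parent element, and the header keeps one
     fixed name that a path can refer to. TypeSecondaryHeader is a
     hypothetical type name. -->
<xs:element name="some_prefix">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Packet_Secondary_Header" type="TypeSecondaryHeader"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```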


On Mon, Nov 24, 2025 at 3:53 AM Ludo Visser <[email protected]> wrote:

Hi Michele,

I had some time to play around with Daffodil. Unfortunately, and somewhat
unexpectedly, the tool does not accept any of our schemas.

Some issues seem to be related to Daffodil being a bit stricter than our
tools. For example, an attribute like “truncateSpecifiedLengthString” is
only used in unparsing, but Daffodil requires it to be present even when
parsing a file, while our tools happily accept it being omitted. Such
issues can be readily resolved.

More problematic are features that are not (yet) implemented by Daffodil.
For example, we make use of the following construct to dynamically
compute element lengths based on information found in other nodes:

<xs:element name="Data" type="xs:hexBinary"
    dfdl:lengthKind="explicit"
    dfdl:lengthUnits="bytes"
    dfdl:length="{/Packet_Primary_Header/Packet_Data_Length + 1 - contentLength(/Packet_Data_Field/(.*)Packet_Secondary_Header,'bytes') - 2}"
    dmx:representation="Hexadecimal">
</xs:element>

I’m not entirely sure if the expression is 100% compliant with the
specification, especially the regex part, but in any case, Daffodil is
unable to parse this even with the regex removed.

Unfortunately, due to these issues, I was unable to generate the C code and
benchmark the tool against our tools, but I am interested to see if we can
find a way to make Daffodil and our DFDL4S schemas work together.

Best regards,
Ludo


-- 

*Satellite System Analysis Engineering Service*

*Service Delivered by Starion Nederland B.V. for the European Space Agency*

Ludo Visser

Akkodis Netherlands International B.V.



EOP-P
ESTEC

[email protected]
T +31 6 414 71061


*From: *Michele Zundo <[email protected]>
*Date: *Friday, 4 July 2025 at 10:10
*To: *[email protected] <[email protected]>
*Cc: *Ludo Visser <[email protected]>, Montserrat Pinol Sole <
[email protected]>, Andrea Della Vecchia <
[email protected]>
*Subject: *Re: Participation in DFDL ISO workgroup

Hi Mike, thanks. I was not aware.



I will ask my colleagues to have a look at generating C code for one of
our schemas and see how fast it goes.



Regards



Michele





*From: *Mike Beckerle <[email protected]>
*Date: *Thursday, 3 July 2025 at 18:32
*To: *Michele Zundo <[email protected]>
*Cc: *Ludo Visser <[email protected]>, Montserrat Pinol Sole <
[email protected]>, Andrea Della Vecchia <
[email protected]>
*Subject: *Re: Participation in DFDL ISO workgroup

Thanks for this Michele,



Are you aware of the C-code generator sub-effort for Daffodil? It is very
partial currently, but it converts DFDL schemas to C code for parse/unparse
and creates a C struct infoset object corresponding to the logical schema.



Most recent status is this:
https://daffodil.apache.org/dev/design-notes/daffodilc-todos/



I know the group that initiated this had very high speed as their goal, and
even generated SystemVerilog or VHDL to directly implement the
parse/unparse on FPGA logic, though that aspect was not made open-source.





On Thu, Jul 3, 2025 at 6:07 AM Michele Zundo <[email protected]> wrote:

Hi Mike,



Some brief comments from our side, taking into account our specific use
case.

Indeed, requiring an XML/XSD parser is generally seen as heavy by all our
collaborators. While DFDL offers great flexibility, we’ve found its
performance suboptimal—making it suitable for tooling purposes but not fast
enough for use within high-performance data processors that need to read
binary data streams efficiently. In such contexts, manual coding of readers
is still the preferred approach.

What would be genuinely helpful is a *code generator* that could take a
DFDL schema and generate optimised read() and write() functions,
effectively hardcoding the logic for that specific schema to improve
execution speed.

On the functional side, there are several aspects of XSD/DFDL that we
appreciate:

·      The same schema used with DFDL can also be used to read/write data
in XML or other formats. We rely on XSD as a central *data model*, and we
benefit from standard XSD validation tools.

·      Using XML as a descriptive layer for mapping the logical structure
to a database is quite standard and well supported.

·      The syntax is rich and expressive, supporting *conditional
structures*, *variants*, and *polymorphism*—which we use extensively,
especially for CCSDS-based data where the structure depends on the value of
specific fields.

·      *Assertions* can be embedded in the schema itself in a standardised
way, allowing inline value checks.

·      A wide range of tools support schema editing and visualisation. For
example, we use *Oxygen XML Editor* on macOS (similar to *XMLSpy* on
Windows) for maintaining our schemas.

·      Through the XSD include mechanism, we can modularise and maintain
specific data components in separate files. This allows us to update only
the changed elements without affecting the full structure.

·      Finally, and importantly, we have full *bit-level control* over the
binary encoding through our XSD-defined DFDL schemas, which is crucial for
our applications.

There is currently a trend towards adopting lightweight languages like
JSON. Personally, I remain unconvinced that JSON is strict or expressive
enough for our needs—especially when it comes to polymorphism and
conditional structures. While support for such features is being added to
JSON schema, doing so risks undermining the original simplicity and purpose
of the format.

We’ve never faced issues with the size of the XSD schemas themselves—they
are generally compact. Nor do we typically serialise binary data to XML, so
the verbosity of XML isn’t a problem for us. Our use of DFDL is aimed at
enabling machine-to-machine communication, specifically to allow algorithms
to read and write binary data reliably and consistently.

From this perspective, it’s not entirely clear how the use of *EXI* would
improve our workflow. Our primary goal is *not* to make XML compact, but
rather to retain precise control over the *bit-level encoding*, which is
not the focus of EXI.

Best regards,

*Michele*
