Daffodil Devs,

Ludo Visser of ESA has attempted to run DFDL4S schemas with Daffodil, modifying them as appropriate, but without success. The email discussion thread is captured below.

If the C-code backend had more complete expression support, it might well handle the DFDL4S use case. Regular Daffodil with the JVM backend should certainly be able to handle the use case, although converting data into XML or JSON is not part of their use case.

DFDL4S has an extension to the expression language. I suspect this could be worked around, though it may require the schemas to be modified a bit. They use a regex in the element names of expressions to provide a specific kind of expression polymorphism. (See https://github.com/OpenGridForum/DFDL/issues/13) Moving these sorts of referenced child fields so that polymorphism is not needed to access them may be possible. I think this may be another case where the schema shape and nesting that a schema author prefers does not match the DFDL schema one must write to make it actually work.

We have JIRA ticket https://issues.apache.org/jira/browse/DAFFODIL-1431, which was for DFDL4S compatibility testing, but this ticket was closed for being too "wish-list". Perhaps we should reopen it, as Ludo has done some of the work to identify data and schemas to use for this testing.

Mike Beckerle
Apache Daffodil | daffodil.apache.org
OGF DFDL Workgroup | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl

---------- Forwarded message ---------
From: Ludo Visser <[email protected]>
Date: Thu, Dec 18, 2025, 6:19 AM
Subject: Re: Participation in DFDL ISO workgroup
To: [email protected] <[email protected]>
Cc: Michele Zundo <[email protected]>, Montserrat Pinol Sole <[email protected]>, Andrea Della Vecchia <[email protected]>

Dear Mike,

Our schemas are publicly available on our website: https://eop-cfi.esa.int/index.php/applications/s2g-data-viewer/mission-files (or directly: https://eop-cfi.esa.int/Repo/PUBLIC/DOCUMENTATION/MISSION_DATA/TELEMETRY_SCHEMA_FILES/). They are .jar files for use with S2G (https://eop-cfi.esa.int/index.php/applications/s2g-data-viewer), but they can be renamed to .zip and extracted. On the S2G page you can also find some example data files. I’ve attached here for convenience the Sentinel 1 and 3 schemas and the example data.

I tried to parse e.g. the Sentinel 1 file with:

./daffodil/bin/daffodil parse -s Sentinel1X-bandTMISP.xsd -- demo_data_files/S1A_SAR_ISP.BIN

This will immediately complain about this line in Sentinel1X-bandTMISPData.xsd:

<xs:element name="AISR_Source_Data" type="TypeAISRSourceData" dfdl:occursCountKind="expression" dfdl:occursCount="672"/>

Fixing that (adding { } around 672) then produces a long list of errors, including those I mentioned in my previous email.

As mentioned, I’m very interested in getting this to work, so please reach out if further clarification is needed or if I can be of help otherwise.

Best regards,
Ludo

--
Ludo Visser
EOP-PES ESA ESTEC
[email protected] | T +31 6 414 71061

From: Mike Beckerle <[email protected]>
Date: Monday, 24 November 2025 at 15:37
To: Ludo Visser <[email protected]>
Cc: Michele Zundo <[email protected]>, Montserrat Pinol Sole <[email protected]>, Andrea Della Vecchia <[email protected]>
Subject: Re: Participation in DFDL ISO workgroup

If we could get the schema and a piece of example data, we could dig into why it doesn't work. I'd like to first get your schema working in regular Daffodil running on the JVM just to make sure we understand the requirements.

Right now expression support in the C-code generator is very partial and I'm not at all surprised it doesn't work. We do have an open ticket (https://issues.apache.org/jira/browse/DAFFODIL-2536) on refactoring Daffodil to enable the C-code generator or other back-ends to traverse the parsed expression tree, and generally work without having to be part of Daffodil's core library. At that point adding real expression support to the C-code generator would be much easier.

As far as whether your schema is "standard" DFDL, other than your path-step regex extension feature, the other thing missing is a prefix "dfdl:" on the contentLength function.

The regex feature you added to DFDL4S to create these polymorphic paths has gotten some discussion and there are similar needs from different use cases. This experimental DFDL feature issue was opened: https://github.com/OpenGridForum/DFDL/issues/13. In that proposal the regex aspect is not there, so instead of "(.*)Packet_Secondary_Header" matching "some_prefix_Packet_Secondary_Header" one would instead write "*/Packet_Secondary_Header" matching "some_prefix/Packet_Secondary_Header", which is closer to ordinary XPath wildcards.
This changes the schema so that instead of multiple fields with a shared prefix in their names, those become sub-fields within a common parent field, and the shared name prefix becomes the name of the common parent.

On Mon, Nov 24, 2025 at 3:53 AM Ludo Visser <[email protected]> wrote:

Hi Michele,

I had some time to play around with Daffodil. Unfortunately, and somewhat unexpectedly, the tool does not accept any of our schemas.

Some issues seem to be related to Daffodil being a bit stricter than our tools. For example, an attribute like “truncateSpecifiedLengthString” is only used in unparsing, but Daffodil requires it to be present even when parsing a file, while our tools happily accept it being omitted. Such issues can be readily resolved.

More problematic are features that are not (yet) implemented by Daffodil. For example, we make use of the following construct to dynamically compute element lengths based on information found in other nodes:

<xs:element name="Data" type="xs:hexBinary"
            dfdl:lengthKind="explicit"
            dfdl:lengthUnits="bytes"
            dfdl:length="{/Packet_Primary_Header/Packet_Data_Length + 1 - contentLength(/Packet_Data_Field/(.*)Packet_Secondary_Header,'bytes') - 2}"
            dmx:representation="Hexadecimal">
</xs:element>

I’m not entirely sure if the expression is 100% compliant with the specification, especially the regex part, but in any case, Daffodil is unable to parse this even with the regex removed.

Unfortunately, due to these issues, I was unable to generate the C code and benchmark the tool against our tools, but I am interested to see if we can find a way to make Daffodil and our DFDL4S schemas work together.

Best regards,
Ludo

--
Satellite System Analysis Engineering Service
Service Delivered by Starion Nederland B.V. for the European Space Agency
Ludo Visser
Akkodis Netherlands International B.V.
EOP-P ESTEC
[email protected]
T +31 6 414 71061

From: Michele Zundo <[email protected]>
Date: Friday, 4 July 2025 at 10:10
To: [email protected] <[email protected]>
Cc: Ludo Visser <[email protected]>, Montserrat Pinol Sole <[email protected]>, Andrea Della Vecchia <[email protected]>
Subject: Re: Participation in DFDL ISO workgroup

Hi Mike,

Thanks. I was not aware. I will ask my colleagues to have a look at generating C code for one of our schemas and see how fast it goes.

Regards
Michele

From: Mike Beckerle <[email protected]>
Date: Thursday, 3 July 2025 at 18:32
To: Michele Zundo <[email protected]>
Cc: Ludo Visser <[email protected]>, Montserrat Pinol Sole <[email protected]>, Andrea Della Vecchia <[email protected]>
Subject: Re: Participation in DFDL ISO workgroup

Thanks for this Michele,

Are you aware of the C-code generator sub-effort for Daffodil? It is very partial currently, but converts DFDL schemas to C code for parse/unparse. It creates a C struct infoset object corresponding to the logical schema. The most recent status is here: https://daffodil.apache.org/dev/design-notes/daffodilc-todos/

I know the group that initiated this had very high speed as their goal, and even generated System Verilog or VHDL to directly implement the parse/unparse on FPGA logic, though that aspect was not made open-source.

On Thu, Jul 3, 2025 at 6:07 AM Michele Zundo <[email protected]> wrote:

Hi Mike,

Some brief comments from our side, taking into account our specific use case.

Indeed, requiring an XML/XSD parser is generally seen as heavy by all our collaborators. While DFDL offers great flexibility, we’ve found its performance suboptimal, making it suitable for tooling purposes but not fast enough for use within high-performance data processors that need to read binary data streams efficiently.
In such contexts, manual coding of readers is still the preferred approach. What would be genuinely helpful is a *code generator* that could take a DFDL schema and generate optimised read() and write() functions, effectively hardcoding the logic for that specific schema to improve execution speed.

On the functional side, there are several aspects of XSD/DFDL that we appreciate:

· The same schema used with DFDL can also be used to read/write data in XML or other formats. We rely on XSD as a central *data model*, and we benefit from standard XSD validation tools.
· Using XML as a descriptive layer for mapping the logical structure to a database is quite standard and well supported.
· The syntax is rich and expressive, supporting *conditional structures*, *variants*, and *polymorphism*, which we use extensively, especially for CCSDS-based data where the structure depends on the value of specific fields.
· *Assertions* can be embedded in the schema itself in a standardised way, allowing inline value checks.
· A wide range of tools support schema editing and visualisation. For example, we use *Oxygen XML Editor* on macOS (similar to *XMLSpy* on Windows) for maintaining our schemas.
· Through the XSD include mechanism, we can modularise and maintain specific data components in separate files. This allows us to update only the changed elements without affecting the full structure.
· Finally, and importantly, we have full *bit-level control* over the binary encoding through our XSD-defined DFDL schemas, which is crucial for our applications.

There is currently a trend towards adopting lightweight languages like JSON. Personally, I remain unconvinced that JSON is strict or expressive enough for our needs, especially when it comes to polymorphism and conditional structures. While support for such features is being added to JSON schema, doing so risks undermining the original simplicity and purpose of the format.

We’ve never faced issues with the size of the XSD schemas themselves; they are generally compact. Nor do we typically serialise binary data to XML, so the verbosity of XML isn’t a problem for us. Our use of DFDL is aimed at enabling machine-to-machine communication, specifically to allow algorithms to read and write binary data reliably and consistently.

From this perspective, it’s not entirely clear how the use of *EXI* would improve our workflow. Our primary goal is *not* to make XML compact, but rather to retain precise control over the *bit-level encoding*, which is not the focus of EXI.

Best regards,
Michele
