With the nifi-daffodil-impl-nar approach, there is only one possible version of Daffodil for all of NiFi. There is no way to have multiple flows with different versions of Daffodil.

The way things currently are, where each nifi-daffodil nar embeds a specific Daffodil version, you can have different flows that use different Daffodil versions. You just have to add multiple nifi-daffodil nars to NiFi and when you add a processor you can select which version of the processor you want. You just have to know which nifi-processor version maps to which Daffodil version, but the new README matrix makes that more clear. Note that this also works for Daffodil plugins--since each processor defines where to look for its Daffodil plugins, you just need separate directories for different plugin versions and configure the processor accordingly.

So it sounds like we do not want to switch to the alternative approach. Staying with the current approach means you can't use newer nifi-daffodil processor features with older versions of Daffodil, but I'm not sure that's a big limitation since new features are rarely added.

On 2025-09-03 11:29 AM, Mike Beckerle wrote:

  From a user perspective, they now need to install two nars: nifi-daffodil-nar
and *a nifi-daffodil-impl-nar version of their choosing*. So it's not too much
effort, but does complicate things a bit.


Your phrase "a nifi-daffodil-impl-nar version of their choosing" suggests
singular, as in one NiFi flow = one daffodil version.
Or did you mean per NiFi Daffodil processor instance? So one NiFi flow
could be using many different Daffodil versions?

Indulge me either way to restate what I think is the fully general use case.

    - A single NiFi flow needs to be able to use inputs from multiple
    different data formats whose DFDL schemas require distinct versions of
    Daffodil, both for DFDL language compatibility reasons and because they
    need distinct, incompatible versions of plugins, even distinct versions
    of the same plugins.
    - A concrete example could be this:
       - A NiFi flow processes PCAP files. There are XSLT rules in the flow
       written to deal with Version X of the DFDL PCAP schema. This uses a
       particular EthernetIP schema which contains a Daffodil 3.7.0 layer plugin
       to compute IPv4 checksums.
       - There are other XSLT rules in the same flow written to deal with
       Version Y of the PCAP schema, which has distinct element names and
       element nesting, so those rules must be used with that other PCAP
       schema. This version of PCAP depends on a particular EthernetIP schema
       which has a Daffodil 4.0.0 layer plugin.
       - The same PCAP file is fed as an input to BOTH Daffodil NiFi
       processors in the same flow, feeding separate downstream XSLT processors,
       so as to determine if both sets of XSLT rules are capable of producing
       output. If both are able to produce output, then the output of the old
       XSLT is discarded and the output from the new XSLT is passed on in the
       flow.

I think this is not even an unlikely corner case scenario. We know of DFDL
users with 300+ DFDL schemas, presumably all tested and known working on a
particular version of Daffodil. They are unlikely to be willing to
requalify all of these for a new Daffodil version. But new DFDL schemas
they are creating will want to be for a newer (current) Daffodil version,
and some of these will be updates to existing DFDL schemas for new versions
of those formats, which need to co-exist in the same NiFi flow (though on
different paths within that flow) with the older versions of those formats,
and hence use the older schema which requires the older Daffodil version.


On Wed, Sep 3, 2025 at 9:57 AM Steve Lawrence <[email protected]> wrote:

Yeah, NiFi comes with slightly different issues. NiFi .nars usually embed all
their dependencies, so when we build a NiFi .nar that uses Daffodil, that
version of the .nar is stuck with that version of Daffodil. This means newer
nars with newer features cannot be used with older versions of Daffodil. In
general this isn't an issue because we rarely add new features to the
nar--usually the only change in a new .nar version is bumping the Daffodil
version.

I believe NiFi has a way to deal with this, but it's a little bit more
complicated.

The idea is we would create a nar for every version of Daffodil we want to
support, and that nar literally just contains Daffodil jars and transitive
dependencies, no actual processor logic. We build a new one for every version
of Daffodil, for example:

nifi-daffodil-impl-nar-4.0.0.nar
nifi-daffodil-impl-nar-3.11.0.nar
nifi-daffodil-impl-nar-3.10.0.nar
...

Then we have a single nar that provides the actual processor logic, called
nifi-daffodil-nar. Its version only changes when we add a new feature to the
processor, nothing changes for new Daffodil releases. This new nar defines a
nar dependency to "nifi-daffodil-impl" so that the nar can share the classpath
with nifi-daffodil-impl-nar-XXX and find the Daffodil jars. It would also need
to use reflection to deal with any API differences between Daffodil versions,
though this is fairly minimal, and we already do something similar in our SBT
plugin.
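
As a rough illustration of the reflection idea above, here is a minimal Scala
sketch. It probes the classpath for the first entry-point class that can be
loaded, so one processor nar could adapt to whichever Daffodil jars the impl
nar provides. The helper name `firstAvailableClass` is hypothetical, the
candidate class names in the usage note are assumptions, and this is not taken
from the existing processor or SBT plugin code.

```scala
import scala.util.Try

object ApiProbe {
  /** Returns the first class in `candidates` that `loader` can load, if any.
    * Classes are looked up by name only and are not initialized.
    */
  def firstAvailableClass(
      candidates: Seq[String],
      loader: ClassLoader = getClass.getClassLoader
  ): Option[Class[_]] =
    candidates.iterator
      .map(name => Try(Class.forName(name, false, loader)).toOption)
      .collectFirst { case Some(cls) => cls }
}
```

A caller might pass the newer entry point first and the older one second, for
example `Seq("org.apache.daffodil.api.Daffodil", "org.apache.daffodil.japi.Daffodil")`
(both names assumed here), and then branch on which class was found.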

  From a user perspective, they now need to install two nars: nifi-daffodil-nar
and a nifi-daffodil-impl-nar version of their choosing. So it's not too much
effort, but does complicate things a bit.

We can build these nars so they work on any NiFi version (we actually already
do that, so that's not an issue).

I think the question is how important it is to be able to use newer
nifi-daffodil features with older versions of Daffodil, and whether all this
extra complication is worth it. The last feature added to NiFi Daffodil was in
v1.15 (added support for setting external variables), and that uses Daffodil
3.6.0 which is pretty old, so I'm not sure many users would need this, they
just need to install the right version of the processor for the version of
Daffodil they want. In the past it has been hard to know which version of
Daffodil the processor uses, but I've added a matrix to the processor README
which should help to make it more clear:


https://github.com/OwlCyberDefense/nifi-daffodil?tab=readme-ov-file#nifi-compatibility

So in most cases users can just find the version of Daffodil they want in that
list and install the associated processor.



On 2025-09-03 09:07 AM, Sood, Harinder wrote:
We also need to consider how to provide releases for different NiFi versions.

Sincerely,
   Harinder Sood


   Senior Program Manager
    [email protected]
    240 805 4219
    owlcyberdefense.com


-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Wednesday, September 3, 2025 8:59 AM
To: [email protected]
Subject: Re: [DISCUSS] Release Apache Daffodil v4.0.0 and Apache
Daffodil SBT Plugin v1.5.0

Agreed about the need for testing multiple versions.

It might be convenient if we could come up with a solution that doesn't
require multiple branches. Multiple branches requires extra work to keep
the branches in sync, and also makes it difficult to share common code or
tests.

I'm imagining a project structure where there could be multiple
sub-projects, each one designed for a specific Daffodil version. The
projects would share the src and test directories, so would only differ in
things like dependencies to Daffodil or plugins.

And for situations where code differences are needed between versions (e.g.
plugins) we could have daffodil version specific directories, e.g.

src/main/scala                # code shared between all daffodil versions
src/main/scala-daffodil3110   # code only used for daffodil 3.11.0 subproject
src/main/scala-daffodil400    # code only used for daffodil 4.0.0 subproject

This would also work for src/main/resources and src/test.
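
To make the naming convention above concrete, here is a small Scala sketch
that derives the subproject id and source directories from a Daffodil version
string. The object and method names are hypothetical and the dot-stripping
rule is inferred from the examples in this message; it is not an existing
sbt-daffodil feature.

```scala
object VersionDirs {
  /** "3.11.0" becomes "daffodil3110": drop the dots, prefix with "daffodil". */
  def projectId(daffodilVersion: String): String =
    "daffodil" + daffodilVersion.replace(".", "")

  /** Source directories for one subproject: the shared directory first,
    * then the directory specific to that Daffodil version.
    */
  def scalaSourceDirs(daffodilVersion: String): Seq[String] =
    Seq("src/main/scala", s"src/main/scala-${projectId(daffodilVersion)}")
}
```

One caveat with simply dropping the dots is that the mapping is not unique:
"3.11.0" and "3.1.10" would both become "daffodil3110", so a real
implementation might need a separator.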

This is kind of similar to how sbt supports projects building with Scala
2.x vs 3.x, and I think is not too difficult to do in SBT plugins.

To support Daffodil plugins, I think we could have a new sbt-daffodil
setting that defines daffodil plugin dependencies and each subproject would
mutate that to depend on the right version.

So an alternative SBT configuration might look something like this:

     name := "myFormat"

     version := "1.0.0"

     libraryDependencies := Seq(...) // normal non-plugin dependencies

     enablePlugins(DaffodilPlugin)

     daffodilPluginDependencies := Seq(
       "com.example.layers" %% "checksum" % "1.0.0"
     )

     daffodilProjectVersion("3.11.0")

     daffodilProjectVersion("4.0.0")


That would create subprojects for 3.11.0 and 4.0.0, so you could do
something like:

     sbt daffodil3110/test

     sbt daffodil400/test

And something like "sbt publish" would publish the main schema jar and
saved parsers if configured.

Each of those subprojects would depend on the "checksum" plugin but they
would depend on slightly modified names (e.g. "checksum_daffodil3110" vs
"checksum_daffodil400"). Additional features would be needed to publish
plugins with multiple versions and mutate the name to match. I imagine that
would work similarly to the multiple subprojects.
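
The name mutation described above could look roughly like the following Scala
sketch. `Dep` and `forDaffodil` are hypothetical stand-ins for whatever the
plugin would use internally, and the Scala binary suffix that `%%` adds to
artifact names is ignored here to keep the example small.

```scala
object PluginNames {
  /** Simplified stand-in for a dependency's organization, name and version. */
  final case class Dep(org: String, name: String, version: String)

  /** Appends a Daffodil-version suffix to the artifact name,
    * e.g. "checksum" becomes "checksum_daffodil3110" for Daffodil 3.11.0.
    */
  def forDaffodil(dep: Dep, daffodilVersion: String): Dep =
    dep.copy(name = s"${dep.name}_daffodil${daffodilVersion.replace(".", "")}")
}
```

Each subproject would apply this to every entry in its plugin dependency list,
leaving the organization and plugin version unchanged.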

I think there's still a lot of details to work out, and it needs some
testing to figure out if it will actually work, but I think in theory it's
possible to not require branches so common code and tests can be easily
shared and all tests run without needing to change branches. And I think it
could be done without too much boilerplate.

- Steve


On 2025-09-03 08:17 AM, Mike Beckerle wrote:
+1 for creating a 4.0.0 release

To me the biggest thing needed is not anything to hold up the release,
it's documentation on how to evolve a DFDL schema project on github
(or similar) and package server (e.g., like Artifactory) in such a way
that it can be maintained to work with Daffodil 3.7.0, 3.10.0, 3.11.0,
and 4.0.0 (and subsequent) releases simultaneously, easily rebuilt and
tested in regression, etc. I pick those because the API (specifically
packages) changed after 3.7.0, 3.10.0 is the last of the Scala 2.12
releases, 3.11.0 is Scala 2.13, and 4.0.0 is Scala 3.

There are two scenarios - First is a pure schema - single component.
The DFDLSchemas FakeTDL is one such.

The second is a complex multi-component schema.  The PCAP schema on
DFDLSchemas is such. It uses EthernetIP as a component and that has a
Daffodil layer extension to compute IPv4 checksums. This should
illustrate all the challenges.

I think this methodology is going to require use of multiple git
branches, since there are clearly code changes required for the
layers. I think these use only standard TDML tests however, so ad-hoc
test rigs shouldn't muddy the waters. But despite these git branches,
much of the schema will be independent of them, and an objective
should be to share what can be shared so as not to do too much
cross-branch duplication of changes.







On Tue, Sep 2, 2025 at 3:03 PM Steve Lawrence <[email protected]> wrote:

Hi all,

Although Daffodil 3.11.0 was released fairly recently, the current
main branch of Daffodil has a number of major improvements, including
switching to Scala 3, dropping official support for Java 8 and 11, a
much improved API, and a number of important bug fixes. With these
major changes complete, I think now would be a good time to release
Daffodil 4.0.0.

For reference, here is a page that describes all the breaking changes
and how to migrate to the new API:

https://daffodil.apache.org/migration-guides/4.0.0/

We should also plan to release the Daffodil SBT plugin 1.5.0
concurrently, since it has modifications needed to work with Daffodil
4.0.0.

There are a dozen or so open pull requests to update dependencies
that I think we can merge in the coming days in time for 4.0.0.
Please let us know if there are any other issues you think should be
fixed.

If there are no additional changes or objections, I will volunteer as
the release manager and plan to start the vote on Monday, Sep 8.

Thanks,
- Steve






