You shouldn't need to recompile a charset plugin for new versions of Daffodil, as long as the plugin was built with a relatively recent version.

The Daffodil charset API hasn't changed since 3.4.0, so a charset plugin compiled with 3.4.0 or newer *should* work with any version of Daffodil from 3.4.0 on.

The Daffodil layer API was updated more recently, in 3.8.0, so a layer built with 3.8.0 or 3.9.0 will work with either, as well as with future versions of Daffodil.

The Daffodil UDF API hasn't changed since before 3.0.0, I believe, so UDFs are pretty much universally supported as long as you're not using an ancient Daffodil version.

This is all to say that I think we are at a point where our plugin APIs are stable, and a plugin built with a modern-ish version of Daffodil should work with all future versions of Daffodil. There really isn't a need to publish both a 3.8.0 version and a 3.9.0 version of a plugin anymore. You *might* want to do that for older plugins that need to be used with older versions of Daffodil (e.g. a layer written for Daffodil 3.7.0), but modern plugins shouldn't need it, and we should strive to move away from the practice.

I think if we ever do need to change one of these APIs, we'll need to be better about maintaining backwards compatibility, so that we don't run into situations where we have to maintain different plugin versions for different Daffodil versions. That just isn't maintainable and is a big headache.

Also, the daffodil-udf/etc. dependency issue should be fixed in the latest development version of the daffodil-sbt plugin, so plugins no longer add a transitive dependency on the Daffodil version they were built with. We may want to plan a release for that sometime soon so those dependencies don't get accidentally pulled in.
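
If a plugin can't wait for that release, a similar effect can be had by hand by scoping the Daffodil artifacts as Provided in the plugin's build.sbt, so they're compiled against but kept out of the published POM's compile scope. A minimal sketch (the artifact and version here are just examples):

    // Sketch only: consumers of the plugin won't inherit this dependency
    // transitively, since Provided-scope deps aren't propagated.
    libraryDependencies +=
      "org.apache.daffodil" %% "daffodil-udf" % "3.9.0" % Provided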


On 2024-10-25 02:02 PM, Mike Beckerle wrote:
I ran into an interesting aspect of this today.

We have an extendedCharsets project, which was compiled using Daffodil 3.8.0.

The jar file is, however, not marked as being specific to Daffodil 3.8.0. It is just extendedcharsets_2.12.jar, which marks it as requiring Scala 2.12, but not as requiring Daffodil 3.8. That information is in the dependency metadata stored next to the jar file, but once I retrieve the jar, there's no telling by looking at the jar itself.

So, when including the jar in some sort of nar/war package, I need to consult the dependency information to know which Daffodil version the jars for plugins like layers, charsets, and UDFs were compiled against.

It turns out the jar pulls in a dependency (not sure why) on the daffodil-udf library, and that dependency is specifically on the 3.8.0 version of that library.

Meanwhile, I was building my schema, which uses extendedCharsets, with Daffodil 3.9.0.

The dependency on Daffodil 3.8.0 was hidden until we started doing this:

setDaffodilClasspath() {
   # -batch and -error suppress sbt's [info] and other log output, so only
   # the classpath string is captured
   export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
   echo "DAFFODIL_CLASSPATH is set to: $DAFFODIL_CLASSPATH"
}

Looking at the resulting DAFFODIL_CLASSPATH, the only Daffodil jar on it was the daffodil-udf jar for Daffodil 3.8.0.

This seems quite problematic.

I'd like to be able to compile multiple versions of this extendedCharsets schema, for 3.8.0, 3.9.0, etc., and have them co-resident in our artifactory as reusable components.

I don't know how to achieve that. I mean, I could abuse the version numbering to create versions 1.1.380, 1.1.390, etc., but I don't really like that.

On Fri, Oct 25, 2024 at 12:33 PM Steve Lawrence <slawre...@apache.org>
wrote:

Agreed, I don't think it necessarily *must* be added to 4.0, since it wouldn't have any backwards compatibility concerns, but it would be really useful sooner rather than later.

It's probably worth a discussion on how it might be implemented. As I recall, Java doesn't really make the classpath contents available; all the magic to find things on the classpath is done by ClassLoaders, which don't even need to be backed by jars.

So my first thought is to make it so the compile* API functions accept an optional list of jars. And we would create a custom ClassLoader that makes those jars available during compilation/parse/unparse. When saving a DataProcessor we would copy those jars into the saved parser binary. And when reloading a saved parser we would extract those jars, recreate the custom ClassLoader, and use that ClassLoader for parse/unparse operations. Overall I imagine the modifications wouldn't be too big.
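
To make the "copy those jars into the saved parser binary" step concrete, here's a rough sketch assuming the saved parser were written as a zip-style container (it isn't one today; the entry names and helper are invented purely for illustration):

    // Sketch only: write the serialized parser plus its plugin jars into
    // one container file. Entry names are invented for illustration.
    import java.io.{File, FileOutputStream}
    import java.nio.file.Files
    import java.util.zip.{ZipEntry, ZipOutputStream}

    def saveWithJars(out: File, parserBytes: Array[Byte], jars: Seq[File]): Unit = {
      val zos = new ZipOutputStream(new FileOutputStream(out))
      try {
        // the serialized DataProcessor itself
        zos.putNextEntry(new ZipEntry("parser.bin"))
        zos.write(parserBytes)
        zos.closeEntry()
        // plugin and dependency jars, preserved alongside it
        for (jar <- jars) {
          zos.putNextEntry(new ZipEntry(s"jars/${jar.getName}"))
          Files.copy(jar.toPath, zos)
          zos.closeEntry()
        }
      } finally zos.close()
    }

Reloading would then just be the reverse: extract the jars/ entries to a temp directory and deserialize parser.bin with a ClassLoader over them.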

The daffodil-sbt plugin already has a list of dependency jars, so using the new compile API and passing those in should be straightforward.

We would also probably want a new CLI option. Whenever the -s option is used we could also support a new option (maybe -j/--jar ?), which is just a list of plugins and dependency jars. E.g.

    daffodil save-parser -s /org/example/message.dfdl.xsd \
      --jar schema-jars/*.jar \
      > message.bin

It makes the save-parser command a bit more complex, but it completely avoids having to deal with the classpath at all. And the added complexity is only in save-parser; unparse/parse wouldn't need it when using a saved parser. And with the daffodil-sbt plugin now becoming more standard, that command is becoming less and less needed.

For backwards compatibility, the custom ClassLoader could fall back to the real classpath if it fails to find a resource/class, so we could still support DAFFODIL_CLASSPATH and normal classpath stuff if people want; it just wouldn't be able to include dependencies in the saved parsers.
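
For illustration, the reload-side loader could be as simple as a URLClassLoader over the extracted jars whose parent is the normal application loader. Note that standard parent-first delegation actually consults the real classpath *before* the jars (a true jars-first-with-fallback lookup would need a child-first loader), but either way DAFFODIL_CLASSPATH keeps working:

    // Sketch only: classes/resources not found via the parent (the normal
    // classpath, including DAFFODIL_CLASSPATH) resolve from the jars.
    import java.io.File
    import java.net.URLClassLoader

    def pluginLoader(extractedJars: Seq[File]): ClassLoader =
      new URLClassLoader(
        extractedJars.map(_.toURI.toURL).toArray,
        Thread.currentThread.getContextClassLoader)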

I'm sure there are plenty of other things to think about, but something like this seems like it would be a big usability improvement now that layers/charsets/etc are getting more and more common. It also means that on systems where it's not easy to install dependency jars, these saved parsers would just work.


On 2024-10-25 11:24 AM, Adams, Joshua wrote:
All that said, I think what we really need is a way to move away from requiring DAFFODIL_CLASSPATH at all. For example, if we could embed dependencies in the actual saved parser file, then using that saved parser wouldn't need any classpath modifications; it's all already there, and Daffodil internals would use those embedded dependencies for class lookups. I'm unsure of exactly how to do that, but it's definitely possible--NiFi does something very similar with its "nar" format.

Changing the saved parser format to include classpath JARs sounds like something perfect for a 4.0.0 release of Daffodil.

Josh
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, October 24, 2024 9:48 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?

I've also just opened a PR to daffodil-sbt to fix the bug that causes "show fullClasspath" and "export fullClasspath" to include Daffodil dependencies:

https://github.com/apache/daffodil-sbt/pull/64


On 2024-10-24 08:24 AM, Steve Lawrence wrote:
Instead of `sbt dependencyTree`, you can run `sbt "show fullClasspath"` to output all the dependencies that `packageDaffodilBin` uses. You can also run `sbt "export fullClasspath"` to get an actual classpath string that you can drop into DAFFODIL_CLASSPATH. In one line, I think you could do:

export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")

Note that the -batch and -error are needed to disable [info] and other output messages.

Also note that this includes the Scala library dependency, and I think we might have a bug in daffodil-sbt that causes it to also include Daffodil dependencies if any schemas are layers/charsets/etc. I *think* that, given the way the Daffodil CLI builds up the classpath, those extra dependencies will all be ignored, but if not you might have to manually build up the classpath with just the paths you want.

If we want we could add a special sbt task that essentially mimics this behavior, but I'd rather we just document this magic export command somewhere so we don't have to maintain it.
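
(For the record, such a task would only be a few lines -- something like this sketch, with a made-up key name -- but it would still be one more thing to maintain:)

    // Sketch only; the task key name is invented.
    val daffodilClasspath = taskKey[String]("classpath string for DAFFODIL_CLASSPATH")
    daffodilClasspath := (Compile / fullClasspath).value
      .map(_.data.getAbsolutePath)
      .mkString(java.io.File.pathSeparator)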


The Daffodil synonym is an interesting idea. I guess it would just fork the daffodil script with DAFFODIL_CLASSPATH already set, and pass any task arguments to the script? I think that's possible in sbt, but I think the export magic above is a bit more flexible and efficient, since you don't need to keep sbt running to run a Daffodil command.


All that said, I think what we really need is a way to move away from requiring DAFFODIL_CLASSPATH at all. For example, if we could embed dependencies in the actual saved parser file, then using that saved parser wouldn't need any classpath modifications; it's all already there, and Daffodil internals would use those embedded dependencies for class lookups. I'm unsure of exactly how to do that, but it's definitely possible--NiFi does something very similar with its "nar" format.


On 2024-10-23 06:00 PM, Mike Beckerle wrote:
I am trying to go from 'sbt test' to a schema I can play with from the
daffodil CLI.

The schema of interest is the DFDLSchemas envelope-payload example.

This schema depends on
* tcpMessage
* mil-std-2045
* pcap

The pcap schema in turn depends on
* ethernetIP

The ethernetIP schema defines a Daffodil layer plugin that exists in its jar.

So far, if I clone these all and 'sbt publishLocal' all of the components, then I can 'sbt test' in envelope-payload and it passes all tests.

So now I'd like to do:

    daffodil save-parser \
      -s src/main/resources/io/github/dfdlschemas/envelopepayload/xsd/envelopePayload.dfdl.xsd \
      -o /tmp/envPay.bin

By adding these to the build.sbt:

daffodilPackageBinVersions := Seq(daffodilVersion.value)
daffodilPackageBinInfos := Seq(
  ("/io/github/dfdlschemas/tcpMessage/xsd/tcpMessage.dfdl.xsd", Some("message"), None)
)

Then 'sbt packageDaffodilBin' will create a compiled schema under the target/ directory named:

dfdl-envelope-payload-1.1.0-daffodil390.bin

So far so good.

Now the challenge.

But this can't be used with the daffodil CLI without also setting up
DAFFODIL_CLASSPATH to have at least the ethernetIP jar file.

Which is where? (Yes, I know, in the ~/.ivy2/local cache, but there is tons of stuff in there.)

If I want to use Xerces (aka full) validation, then I also have to have all the other component schema jar files on the DAFFODIL_CLASSPATH as well.

I tried issuing 'sbt dependencyTree', but the dependency tree is not just the schemas; it includes all the dependencies on Daffodil and everything it transitively uses.

This is much too hard.

Why not have the daffodil-sbt plugin output a shell script that appends all the necessary component schema jars to DAFFODIL_CLASSPATH?

Then users could just run that script to establish the DAFFODIL_CLASSPATH once, and then use the daffodil CLI normally.
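
Roughly, I imagine something like the following sketch in the plugin (the task and file names are made up, just to show the idea):

    // Sketch only: an sbt task that writes a sourceable script which adds
    // the project's jars to DAFFODIL_CLASSPATH. Names are invented.
    val daffodilEnvScript = taskKey[File]("write a DAFFODIL_CLASSPATH setup script")
    daffodilEnvScript := {
      val cp = (Compile / fullClasspath).value
        .map(_.data.getAbsolutePath)
        .mkString(java.io.File.pathSeparator)
      val script = target.value / "daffodil-classpath.sh"
      IO.write(script, s"""export DAFFODIL_CLASSPATH="$cp:$${DAFFODIL_CLASSPATH}"\n""")
      script
    }

Users would then 'source target/daffodil-classpath.sh' once per shell.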

In principle, a file could also be written to inform the Daffodil VSCode extension of the classpath to construct for the schema.

Thoughts?

One possible alternative is to add a daffodil command to sbt via the plugin, so that one can run daffodil command lines directly from the sbt prompt:

sbt
daffodil parse -p target/...bin foo.dat

This is really just a synonym for issuing 'sbt run' for the Main class of the daffodil-cli module (after doing packageDaffodilBin), so it might be very easy to do.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com
