I ran into an interesting aspect of this today.

We have an extendedCharsets project that was compiled using Daffodil 3.8.0.

The jar file, however, is not marked as being specific to Daffodil 3.8.0.
It is just extendedcharsets_2.12.jar.
The name says Scala 2.12 is required, but not that Daffodil 3.8.0 is required.
That information is in the dependency info stored next to the jar file, but
once I retrieve the jar file, there is no way to tell by looking at the jar
file itself.

So, if I include the jar in some sort of nar/war package, I need to use the
dependency information to know which Daffodil version the jars for plugins
like layers, charsets, and UDFs were compiled for.
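
As a rough sketch of reading that dependency information, assuming it is a
Maven-style .pom with one XML element per line (as sbt-generated POMs
typically are), a small shell function can list the Daffodil artifacts and
versions a jar was built against. The function name daffodilDepsFromPom is
hypothetical, not part of any Daffodil tooling.

```shell
# Print "artifactId version" for each org.apache.daffodil dependency in a POM.
# Assumption: each XML element sits on its own line.
daffodilDepsFromPom() {
  awk '
    /<dependency>/   { group = ""; artifact = ""; version = "" }
    /<groupId>/      { group = $0;    gsub(/.*<groupId>|<\/groupId>.*/, "", group) }
    /<artifactId>/   { artifact = $0; gsub(/.*<artifactId>|<\/artifactId>.*/, "", artifact) }
    /<version>/      { version = $0;  gsub(/.*<version>|<\/version>.*/, "", version) }
    /<\/dependency>/ { if (group == "org.apache.daffodil") print artifact, version }
  ' "$1"
}
```

It would be called with the path of the .pom that sits next to the jar, and
prints nothing if the POM declares no Daffodil dependencies.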

It turns out the extendedCharsets jar pulls in a dependency (not sure why) on
the daffodil-udf library, and that dependency is specifically on the 3.8.0
version of that library.

Meanwhile, I was building my schema, which uses extendedCharsets, with
Daffodil 3.9.0.

The dependency on Daffodil 3.8.0 was hidden until we started doing this:

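setDaffodilClasspath() {
  # -batch and -error suppress [info] and other sbt output.
  # Assign first, then export, so a failing sbt run is not masked.
  DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath") || return 1
  export DAFFODIL_CLASSPATH
  echo "DAFFODIL_CLASSPATH is set to: $DAFFODIL_CLASSPATH"
}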

Looking at the resulting DAFFODIL_CLASSPATH, the only Daffodil jar on it
was the daffodil-udf jar for Daffodil 3.8.0.
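
A quick way to make that check repeatable is to split the classpath and keep
only the jar names that start with "daffodil-". This is a minimal sketch; the
function name listDaffodilJars is hypothetical, and it assumes the Unix ":"
path separator.

```shell
# Print the file name of every classpath entry whose name starts with
# "daffodil-". Assumption: entries are separated by ":" (Unix convention).
listDaffodilJars() {
  printf '%s\n' "$1" | tr ':' '\n' | sed 's|.*/||' | grep '^daffodil-'
}

# Usage: listDaffodilJars "$DAFFODIL_CLASSPATH"
```

Any version embedded in the printed jar names that differs from the Daffodil
version in use is a sign of the mismatch described here.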

This seems quite problematic.

I'd like to be able to compile multiple versions of this extendedCharsets
schema, for 3.8.0, 3.9.0, etc. and have them co-resident in our artifactory
as reusable components.

I don't know how to achieve that. I could abuse the version numbering to
create versions 1.1.380, 1.1.390, etc., but I don't really like that.
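
For illustration only, one alternative to encoding the Daffodil version in
the version number is to carry it as a name suffix, in the style of the
"-daffodil390" suffix that packageDaffodilBin puts on compiled .bin files
elsewhere in this thread. The sketch below only builds such a name; the
function daffodilArtifactName and the idea of applying this suffix to jars
are assumptions, not existing daffodil-sbt behavior.

```shell
# Build a hypothetical artifact file name that carries the Daffodil version
# as a suffix, leaving the component's own version number untouched.
daffodilArtifactName() {
  name=$1; version=$2; daffodilVersion=$3
  suffix="daffodil$(printf '%s' "$daffodilVersion" | tr -d '.')"
  printf '%s-%s-%s.jar\n' "$name" "$version" "$suffix"
}

# Usage: daffodilArtifactName extendedcharsets_2.12 1.1.0 3.8.0
```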




On Fri, Oct 25, 2024 at 12:33 PM Steve Lawrence <slawre...@apache.org>
wrote:

> Agreed, I don't think it necessarily *must* be added to 4.0, since it
> wouldn't have any backwards compatibility concerns, but it would be really
> useful sooner rather than later.
>
> It's probably worth a discussion on how it might be implemented. As I
> recall, Java doesn't really make the classpath contents available; all the
> magic to find things on the classpath is done by ClassLoaders, which don't
> even need to be backed by jars.
>
> So my first thought is to make it so the compile* API functions accept an
> optional list of jars. And we would create a custom ClassLoader that makes
> those jars available during compilation/parse/unparse. When saving a
> DataProcessor we would copy those jars into the saved parser binary. And
> when reloading a saved parser we would extract those jars, recreate the
> custom ClassLoader, and use that ClassLoader for parse/unparse operations.
> Overall I imagine the modifications wouldn't be too big.
>
> The daffodil-sbt plugin already has a list of dependency jars, so using the
> new compile API and passing those in should be straightforward.
>
> We would also probably want a new CLI option. Whenever the -s option is
> used we could also support a new option (maybe -j/--jar ?), which is just a
> list of plugins and dependency jars. E.g.
>
>    daffodil save-parser -s /org/example/message.dfdl.xsd \
>      --jar schema-jars/*.jar \
>      > message.bin
>
> It makes the save-parser command a bit more complex, but it completely
> avoids having to deal with the classpath at all. And the added complexity
> is only in save-parser; unparse/parse wouldn't need it when using a saved
> parser. And with the daffodil-sbt plugin now becoming more standard, that
> command is becoming less and less needed.
>
> For backwards compatibility, the custom ClassLoader could fall back to the
> real classpath if it fails to find a resource/class, so we could still
> support DAFFODIL_CLASSPATH and normal classpath stuff if people want; it
> just wouldn't be able to include dependencies in the saved parsers.
>
> I'm sure there's plenty of other things to think about, but something like
> this seems like it would be a big usability improvement now that
> layers/charsets/etc are getting more and more common. It also means systems
> that don't easily support installing dependency jars would just work when
> using these saved parsers.
>
>
> On 2024-10-25 11:24 AM, Adams, Joshua wrote:
> >> All that said, I think what we really need is a way to move away from
> >> requiring DAFFODIL_CLASSPATH at all. For example, if we could embed
> >> dependencies in the actual saved parser file, then using that saved
> >> parser wouldn't need any classpath modifications, it's just all already
> >> there and Daffodil internals would use those embedded dependencies for
> >> class lookups. I'm unsure of exactly how to do that, but it's definitely
> >> possible--NiFi does something very similar with its "nar" format
> >
> > Changing the saved parser format to include classpath JARs sounds like
> > something perfect for a 4.0.0 release of Daffodil.
> >
> > Josh
> > ________________________________
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Thursday, October 24, 2024 9:48 AM
> > To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> > Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?
> >
> > I've also just opened a PR to daffodil-sbt to fix the bug that causes
> > "show fullClasspath" and "export fullClasspath" to include daffodil
> > dependencies:
> >
> > https://github.com/apache/daffodil-sbt/pull/64
> >
> >
> > On 2024-10-24 08:24 AM, Steve Lawrence wrote:
> >> Instead of `sbt dependencyTree`, you can run `sbt "show fullClasspath"`
> >> to output all the dependencies that `packageDaffodilBin` uses. You can
> >> also run `sbt "export fullClasspath"` to get an actual classpath string
> >> that you can drop into DAFFODIL_CLASSPATH. In one line, I think you
> >> could do:
> >>
> >> export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
> >>
> >> Note that the -batch and -error are needed to disable [info] and other
> >> output messages.
> >>
> >> Also note that this includes the scala dependency and I think we might
> >> have a bug in daffodil-sbt that causes it to also include Daffodil
> >> dependencies if any schemas are layers/charsets/etc. I *think* the way
> >> the Daffodil CLI builds up the classpath those extra dependencies will
> >> all be ignored, but if not you might have to manually build up the
> >> classpath with just the paths you want.
> >>
> >> If we want we could add a special sbt task that essentially mimics this
> >> behavior, but I'd rather we just document this magic export command
> >> somewhere so we don't have to maintain it.
> >>
> >>
> >> The Daffodil synonym is an interesting idea. I guess it would just fork
> >> the daffodil script with DAFFODIL_CLASSPATH already set, and just pass
> >> any task arguments to the script? I think that's possible in SBT, but I
> >> think the export magic above is a bit more flexible and efficient since
> >> you don't need to keep sbt running to run a Daffodil command.
> >>
> >>
> >> All that said, I think what we really need is a way to move away from
> >> requiring DAFFODIL_CLASSPATH at all. For example, if we could embed
> >> dependencies in the actual saved parser file, then using that saved
> >> parser wouldn't need any classpath modifications, it's just all already
> >> there and Daffodil internals would use those embedded dependencies for
> >> class lookups. I'm unsure of exactly how to do that, but it's definitely
> >> possible--NiFi does something very similar with its "nar" format.
> >>
> >>
> >> On 2024-10-23 06:00 PM, Mike Beckerle wrote:
> >>> I am trying to go from 'sbt test' to a schema I can play with from the
> >>> daffodil CLI.
> >>>
> >>> The schema of interest is the DFDLSchemas envelope-payload example.
> >>>
> >>> This schema depends on
> >>> * tcpMessage
> >>> * mil-std-2045
> >>> * pcap
> >>>
> >>> The pcap schema in turn depends on
> >>> * ethernetIP
> >>>
> >>> The ethernetIP schema defines a Daffodil layer plugin that exists in
> >>> its jar.
> >>>
> >>> So far if I clone these all, and 'sbt publishLocal' all of the
> >>> components, then I can 'sbt test' in envelope-payload and it passes all
> >>> tests.
> >>>
> >>> So now I'd like to do 'daffodil save-parser
> >>> -s src/main/resources/io/github/dfdlschemas/envelopepayload/xsd/envelopePayload.dfdl.xsd
> >>> -o /tmp/envPay.bin'
> >>>
> >>> By adding these to the build.sbt
> >>>
> >>> daffodilPackageBinVersions := Seq(daffodilVersion.value)
> >>> daffodilPackageBinInfos := Seq(
> >>>     ("/io/github/dfdlschemas/tcpMessage/xsd/tcpMessage.dfdl.xsd",
> >>> Some("message"), None)
> >>> )
> >>>
> >>> Then 'sbt packageDaffodilBin' will create a compiled schema under the
> >>> target/ directory named:
> >>>
> >>> dfdl-envelope-payload-1.1.0-daffodil390.bin
> >>>
> >>> So far so good.
> >>>
> >>> Now the challenge.
> >>>
> >>> But this can't be used with the daffodil CLI without also setting up
> >>> DAFFODIL_CLASSPATH to have at least the ethernetIP jar file.
> >>>
> >>> Which is where? (yes I know in the ~/.ivy2/local cache, but there is
> >>> tons of stuff in there.)
> >>>
> >>> If I want to use xerces (aka full) validation then I also have to have
> >>> all the other component schema jar files on the DAFFODIL_CLASSPATH as
> >>> well.
> >>>
> >>> I tried issuing 'sbt dependencyTree', but the dependency tree is not
> >>> just the schemas, but all the dependencies on daffodil and everything
> >>> it transitively uses.
> >>>
> >>> This is much too hard.
> >>>
> >>> Why not have the daffodil-sbt plugin output a shell script that appends
> >>> all the necessary component schema jars to DAFFODIL_CLASSPATH?
> >>>
> >>> Then users could just run that script to establish the
> >>> DAFFODIL_CLASSPATH once, and then they could use the daffodil CLI
> >>> normally.
> >>>
> >>> In principle, a file could also be written to inform the daffodil
> >>> VSCode extension of the classpath to construct for the schema.
> >>>
> >>> Thoughts?
> >>>
> >>> One possible alternative is to add a daffodil command to sbt via the
> >>> plugin so that one can run daffodil command lines directly from the sbt
> >>> prompt:
> >>>
> >>> sbt
> >>> > daffodil parse -p target/...bin foo.dat
> >>>
> >>> This is really just a synonym for issuing sbt run for the Main class of
> >>> the daffodil-cli module (after doing packageDaffodilBin), so it might be
> >>> very easy to do.
> >>>
> >>>
> >>> Mike Beckerle
> >>> Apache Daffodil PMC | daffodil.apache.org
> >>> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> >>> Owl Cyber Defense | www.owlcyberdefense.com
> >>>
> >>
> >
> >
>
>
