I ran into an interesting aspect of this today.

We have an extendedCharsets project which was compiled using Daffodil 3.8.0.
The jar file is, however, not marked as being specific to Daffodil 3.8.0. It
is just extendedcharsets_2.12.jar. That name says Scala 2.12 is required, but
not that Daffodil 3.8.0 is required. The information is in the dependency
info stored next to the jar file, but once I retrieve the jar file, there's
no telling by looking at the jar file itself. So, if including the jar in
some sort of nar/war file package, I need to use the dependency information
to know which Daffodil version the jars for plugins like layers, charsets,
and UDFs were compiled for.

Turns out it pulls in a dependency (not sure why) on the daffodil-udf
library, and that dependency is specifically on the 3.8.0 version of that
library.

I was building my schema, which uses extendedCharsets, with Daffodil 3.9.0.
The dependency on Daffodil 3.8.0 was hidden until we started doing this:

    setDaffodilClasspath() {
      export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
      echo "DAFFODIL_CLASSPATH is set to: $DAFFODIL_CLASSPATH"
    }

Looking at the resulting DAFFODIL_CLASSPATH, the only daffodil jar on it was
the daffodil-udf jar for Daffodil 3.8.0.

This seems quite problematic. I'd like to be able to compile multiple
versions of this extendedCharsets schema, for 3.8.0, 3.9.0, etc., and have
them co-resident in our artifactory as reusable components.

I don't know how to achieve that. I mean I could abuse the version numbering
to create versions 1.1.380, 1.1.390, etc., but I don't really like it.

On Fri, Oct 25, 2024 at 12:33 PM Steve Lawrence <slawre...@apache.org> wrote:

> Agreed, I don't think it necessarily *must* be added to 4.0, since it
> wouldn't have any backwards compatibility concerns, but it would be
> really useful sooner rather than later.
>
> It's probably worth a discussion on how it might be implemented.
> As I recall, java doesn't really make the classpath contents available,
> all the magic to find things on the classpath is done by ClassLoaders,
> which don't even need to be backed by jars.
>
> So my first thought is to make it so the compile* API functions accept
> an optional list of jars. And we would create a custom ClassLoader that
> makes those jars available during compilation/parse/unparse. When saving
> a DataProcessor we would copy those jars into the saved parser binary.
> And when reloading a saved parser we would extract those jars, recreate
> the custom ClassLoader, and use that ClassLoader for parse/unparse
> operations. Overall I imagine the modifications wouldn't be too big.
>
> The daffodil-sbt plugin already has a list of dependency jars, so using
> the new compile API and passing those in should be straightforward.
>
> We would also probably want a new CLI option. Whenever the -s option is
> used we could also support a new option (maybe -j/--jar ?), which is
> just a list of plugins and dependency jars. E.g.
>
>   daffodil save-parser -s /org/example/message.dfdl.xsd \
>     --jar schema-jars/*.jar \
>     > message.bin
>
> It makes the save-parser command a bit more complex, but it completely
> avoids having to deal with the classpath at all. And the added
> complexity is only in save-parser, unparse/parse wouldn't need it when
> using a saved parser. And with the daffodil-sbt plugin now becoming more
> standard, that command is becoming less and less needed.
>
> For backwards compatibility, the custom ClassLoader could fall back to
> the real classpath if it fails to find a resource/class, so we could
> still support DAFFODIL_CLASSPATH and normal classpath stuff if people
> want, it just wouldn't be able to include dependencies in the saved
> parsers.
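
To make the ClassLoader idea above a bit more concrete, here is a minimal
Java sketch of a loader that looks in an explicit list of jars first and
falls back to the normal classpath when a class or resource is not found
there. Nothing in it is Daffodil API: the class name JarFirstClassLoader and
its constructor arguments are made up for illustration, and it says nothing
about how jars would be embedded in or extracted from a saved parser.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration only; not part of Daffodil.
class JarFirstClassLoader extends URLClassLoader {

    // jars: plugin/dependency jars to search first.
    // fallback: loader for the "real" classpath, used when the jars miss.
    public JarFirstClassLoader(List<Path> jars, ClassLoader fallback) {
        super(toUrls(jars), fallback);
    }

    private static URL[] toUrls(List<Path> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        return urls;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Search the supplied jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the jars: fall back to the parent (real classpath).
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public URL getResource(String name) {
        // Same order for resources: supplied jars, then the parent.
        URL url = findResource(name);
        return url != null ? url : super.getResource(name);
    }
}
```

One design point to be aware of: checking the jars before the parent is the
reverse of Java's default delegation order, so a jar in the list that bundles
an older copy of a shared library would shadow the copy on the real
classpath. A plain URLClassLoader (parent first) avoids that, at the cost of
letting the real classpath override the supplied jars.
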
>
> I'm sure there's plenty of other things to think about, but something
> like this seems like it would be a big usability improvement now that
> layers/charsets/etc are getting more and more common. It also means
> systems that don't easily support installing dependency jars using these
> saved parsers would just work.
>
>
> On 2024-10-25 11:24 AM, Adams, Joshua wrote:
> >> All that said, I think what we really need is a way to move away from
> >> requiring DAFFODIL_CLASSPATH at all. For example, if we could embed
> >> dependencies in the actual saved parser file, then using that saved
> >> parser wouldn't need any classpath modifications, it's just all
> >> already there and Daffodil internals would use those embedded
> >> dependencies for class lookups. I'm unsure of exactly how to do that,
> >> but it's definitely possible--NiFi does something very similar with
> >> its "nar" format
> >
> > Changing the saved parser format to include classpath JARs sounds like
> > something perfect for a 4.0.0 release of Daffodil.
> >
> > Josh
> > ________________________________
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Thursday, October 24, 2024 9:48 AM
> > To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> > Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?
> >
> > I've also just opened a PR to daffodil-sbt to fix the bug that causes
> > "show fullClasspath" and "export fullClasspath" to include daffodil
> > dependencies:
> >
> > https://github.com/apache/daffodil-sbt/pull/64
> >
> >
> > On 2024-10-24 08:24 AM, Steve Lawrence wrote:
> >> Instead of `sbt dependencyTree`, you can run `sbt "show fullClasspath"`
> >> to output all the dependencies that `packageDaffodilBin` uses. You can
> >> also run `sbt "export fullClasspath"` to get an actual classpath
> >> string that you can drop into DAFFODIL_CLASSPATH.
> >> In one line, I think you could do:
> >>
> >>   export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
> >>
> >> Note that the -batch and -error are needed to disable [info] and other
> >> output messages.
> >>
> >> Also note that this includes the scala dependency and I think we might
> >> have a bug in daffodil-sbt that causes it to also include Daffodil
> >> dependencies if any schemas are layers/charsets/etc. I *think* the way
> >> the Daffodil CLI builds up the classpath those extra dependencies will
> >> all be ignored, but if not you might have to manually build up the
> >> classpath with just the paths you want.
> >>
> >> If we want we could add a special sbt task that essentially mimics
> >> this behavior, but I'd rather we just document this magic export
> >> command somewhere so we don't have to maintain it.
> >>
> >>
> >> The Daffodil synonym is an interesting idea. I guess it would just
> >> fork the daffodil script with DAFFODIL_CLASSPATH already set, and just
> >> pass any task arguments to the script? I think that's possible in SBT,
> >> but I think the export magic above is a bit more flexible and
> >> efficient since you don't need to keep sbt running to run a Daffodil
> >> command.
> >>
> >>
> >> All that said, I think what we really need is a way to move away from
> >> requiring DAFFODIL_CLASSPATH at all. For example, if we could embed
> >> dependencies in the actual saved parser file, then using that saved
> >> parser wouldn't need any classpath modifications, it's just all
> >> already there and Daffodil internals would use those embedded
> >> dependencies for class lookups. I'm unsure of exactly how to do that,
> >> but it's definitely possible--NiFi does something very similar with
> >> its "nar" format.
> >>
> >>
> >> On 2024-10-23 06:00 PM, Mike Beckerle wrote:
> >>> I am trying to go from 'sbt test' to a schema I can play with from the
> >>> daffodil CLI.
> >>>
> >>> The schema of interest is the DFDLSchemas envelope-payload example.
> >>>
> >>> This schema depends on
> >>> * tcpMessage
> >>> * mil-std-2045
> >>> * pcap
> >>>
> >>> The pcap schema in turn depends on
> >>> * ethernetIP
> >>>
> >>> The ethernetIP schema defines a Daffodil layer plugin that exists in
> >>> its jar.
> >>>
> >>> So far if I clone these all, and 'sbt publishLocal' all of the
> >>> components, then I can 'sbt test' in envelope-payload and it passes
> >>> all tests.
> >>>
> >>> So now I'd like to do 'daffodil save-parser
> >>> -s src/main/resources/io/github/dfdlschemas/envelopepayload/xsd/envelopePayload.dfdl.xsd
> >>> -o /tmp/envPay.bin'
> >>>
> >>> By adding these to the build.sbt
> >>>
> >>> daffodilPackageBinVersions := Seq(daffodilVersion.value)
> >>> daffodilPackageBinInfos := Seq(
> >>>   ("/io/github/dfdlschemas/tcpMessage/xsd/tcpMessage.dfdl.xsd",
> >>>    Some("message"), None)
> >>> )
> >>>
> >>> Then 'sbt packageDaffodilBin' will create a compiled schema under the
> >>> target/ directory named:
> >>>
> >>> dfdl-envelope-payload-1.1.0-daffodil390.bin
> >>>
> >>> So far so good.
> >>>
> >>> Now the challenge.
> >>>
> >>> But this can't be used with the daffodil CLI without also setting up
> >>> DAFFODIL_CLASSPATH to have at least the ethernetIP jar file.
> >>>
> >>> Which is where? (yes I know in the ~/.ivy2/local cache, but there is
> >>> tons of stuff in there.)
> >>>
> >>> If I want to use xerces (aka full) validation then I also have to have
> >>> all the other component schema jar files on the DAFFODIL_CLASSPATH as
> >>> well.
> >>>
> >>> I tried issuing 'sbt dependencyTree', but the dependency tree is not
> >>> just the schemas, but all the dependencies on daffodil and everything
> >>> transitively it uses.
> >>>
> >>> This is much too hard.
> >>>
> >>> Why not have the daffodil-sbt plugin output a shell script that
> >>> appends to DAFFODIL_CLASSPATH with all the necessary component schema
> >>> jars.
> >>>
> >>> Then users could just run that script to establish the
> >>> DAFFODIL_CLASSPATH once, and then they could use the daffodil CLI
> >>> normally.
> >>>
> >>> In principle, a file could also be written intended to inform the
> >>> daffodil VSCode extension of the classpath to construct for the schema.
> >>>
> >>> Thoughts?
> >>>
> >>> One possible alternative is to add a daffodil command to sbt via the
> >>> plugin so that one can run daffodil command lines directly from the
> >>> sbt prompt:
> >>>
> >>> sbt
> >>> > daffodil parse -p target/...bin foo.dat
> >>>
> >>> This is really just a synonym for issuing sbt run for the Main class
> >>> of the daffodil-cli module, (after doing packageDaffodilBin) so might
> >>> be very easy to do.
> >>>
> >>> Mike Beckerle
> >>> Apache Daffodil PMC | daffodil.apache.org
> >>> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> >>> Owl Cyber Defense | www.owlcyberdefense.com