How about something similar in format to a WAR file, which is basically a jar file with some defined structure? Instead of a WEB-INF directory, there could be a specific daffodil directory that contains metadata about the contents of the archive. It could contain the binary parser, extra jar files, class files, and any other resources that may be needed, all in one nice package. An advantage is that standard jar tools could be used to inspect, extract, or create it.
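
To make that a bit more concrete, here is a minimal sketch using only the standard java.util.jar classes. Everything specific in it is an assumption for illustration: the DAFFODIL-INF directory name, the metadata.properties file, the entry layout, and the class name are all made up, and the byte arrays stand in for a real saved parser and a real dependency jar. None of this is an existing Daffodil format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical archive layout (an assumption, not a Daffodil format):
//   DAFFODIL-INF/metadata.properties   describes the archive contents
//   DAFFODIL-INF/parser.bin            the saved binary parser
//   DAFFODIL-INF/lib/<name>.jar        plugin and dependency jars
public class ParserArchiveSketch {
    static final String META_DIR = "DAFFODIL-INF/";

    // Builds the archive in memory from a saved parser and one dependency jar.
    public static byte[] write(byte[] parser, String jarName, byte[] jarBytes)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(out)) {
            byte[] meta = "parser=parser.bin\n".getBytes(StandardCharsets.UTF_8);
            put(jar, META_DIR + "metadata.properties", meta);
            put(jar, META_DIR + "parser.bin", parser);
            put(jar, META_DIR + "lib/" + jarName, jarBytes);
        }
        return out.toByteArray();
    }

    // Writes a single named entry into the archive.
    private static void put(JarOutputStream jar, String name, byte[] data)
            throws IOException {
        jar.putNextEntry(new JarEntry(name));
        jar.write(data);
        jar.closeEntry();
    }

    // Lists the entry names, the same information `jar tf` would show.
    public static List<String> list(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream jar =
                 new JarInputStream(new ByteArrayInputStream(archive))) {
            for (JarEntry e = jar.getNextJarEntry(); e != null;
                 e = jar.getNextJarEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes stand in for a real parser and a real jar.
        byte[] archive = write(new byte[] {1, 2, 3}, "ethernetIP.jar",
                               new byte[] {4, 5});
        list(archive).forEach(System.out::println);
    }
}
```

Since the result is an ordinary jar, something like `jar tf` or `unzip -l` should be able to list it without any Daffodil-specific tooling.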
// Mike

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, October 25, 2024 12:34
To: dev@daffodil.apache.org
Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?

Agreed, I don't think it necessarily *must* be added to 4.0, since it wouldn't have any backwards compatibility concerns, but it would be really useful sooner rather than later.

It's probably worth a discussion on how it might be implemented. As I recall, Java doesn't really make the classpath contents available; all the magic to find things on the classpath is done by ClassLoaders, which don't even need to be backed by jars.

So my first thought is to make it so the compile* API functions accept an optional list of jars. We would create a custom ClassLoader that makes those jars available during compilation/parse/unparse. When saving a DataProcessor we would copy those jars into the saved parser binary. And when reloading a saved parser we would extract those jars, recreate the custom ClassLoader, and use that ClassLoader for parse/unparse operations.

Overall I imagine the modifications wouldn't be too big. The daffodil-sbt plugin already has a list of dependency jars, so using the new compile API and passing those in should be straightforward.

We would also probably want a new CLI option. Whenever the -s option is used we could also support a new option (maybe -j/--jar ?), which is just a list of plugins and dependency jars. E.g.

  daffodil save-parser -s /org/example/message.dfdl.xsd \
    --jar schema-jars/*.jar \
    > message.bin

It makes the save-parser command a bit more complex, but it completely avoids having to deal with the classpath at all. And the added complexity is only in save-parser; unparse/parse wouldn't need it when using a saved parser. And with the daffodil-sbt plugin now becoming more standard, that command is becoming less and less needed.
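
A rough sketch of the ClassLoader part, just to make the idea concrete. The class name is made up and nothing here is existing Daffodil API. The lookup order (supplied jars first, then whatever loader it is given as a fallback) is an assumption of this sketch, not a settled design.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical class, not part of Daffodil: looks in the supplied jars first
// and only then asks the fallback loader (e.g. the normal classpath).
public class SchemaJarClassLoader extends URLClassLoader {

    public SchemaJarClassLoader(URL[] jars, ClassLoader fallback) {
        super(jars, fallback);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Search the supplied jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the jars: use the standard parent delegation.
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public URL getResource(String name) {
        // Same order for resources such as schema files inside the jars.
        URL url = findResource(name);
        return url != null ? url : super.getResource(name);
    }

    public static void main(String[] args) throws Exception {
        // Each argument is treated as a path to a plugin or dependency jar.
        URL[] jars = new URL[args.length];
        for (int i = 0; i < args.length; i++) {
            jars[i] = Paths.get(args[i]).toUri().toURL();
        }
        ClassLoader fallback = SchemaJarClassLoader.class.getClassLoader();
        try (SchemaJarClassLoader cl = new SchemaJarClassLoader(jars, fallback)) {
            // JDK classes are never in the schema jars, so this exercises
            // the fallback path.
            System.out.println(cl.loadClass("java.lang.String"));
        }
    }
}
```

One thing to watch with jars-first lookup: if a supplied jar happened to contain Daffodil's own classes, they would shadow the ones already loaded, so the jar list would probably need to exclude those.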
For backwards compatibility, the custom ClassLoader could fall back to the real classpath if it fails to find a resource/class, so we could still support DAFFODIL_CLASSPATH and normal classpath stuff if people want; it just wouldn't be able to include dependencies in the saved parsers.

I'm sure there are plenty of other things to think about, but something like this seems like it would be a big usability improvement now that layers/charsets/etc. are getting more and more common. It also means saved parsers would just work on systems that don't easily support installing dependency jars.

On 2024-10-25 11:24 AM, Adams, Joshua wrote:
>> All that said, I think what we really need is a way to move away
>> from requiring DAFFODIL_CLASSPATH at all. For example, if we could
>> embed dependencies in the actual saved parser file, then using that
>> saved parser wouldn't need any classpath modifications, it's just all
>> already there and Daffodil internals would use those embedded
>> dependencies for class lookups. I'm unsure of exactly how to do that,
>> but it's definitely possible--NiFi does something very similar with
>> its "nar" format
>
> Changing the saved parser format to include classpath JARs sounds like
> something perfect for a 4.0.0 release of Daffodil.
>
> Josh
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, October 24, 2024 9:48 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?
>
> I've also just opened a PR to daffodil-sbt to fix the bug that causes
> "show fullClasspath" and "export fullClasspath" to include daffodil
> dependencies:
>
> https://github.com/apache/daffodil-sbt/pull/64
>
>
> On 2024-10-24 08:24 AM, Steve Lawrence wrote:
>> Instead of `sbt dependencyTree`, you can run `sbt "show
>> fullClasspath"` to output all the dependencies that
>> `packageDaffodilBin` uses.
>> You can also run `sbt "export fullClasspath"` to get an actual
>> classpath string that you can drop into DAFFODIL_CLASSPATH. In one
>> line, I think you could do:
>>
>>   export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
>>
>> Note that the -batch and -error are needed to disable [info] and
>> other output messages.
>>
>> Also note that this includes the scala dependency and I think we
>> might have a bug in daffodil-sbt that causes it to also include
>> Daffodil dependencies if any schemas are layers/charsets/etc. I
>> *think*, given the way the Daffodil CLI builds up the classpath, those
>> extra dependencies will all be ignored, but if not you might have to
>> manually build up the classpath with just the paths you want.
>>
>> If we want we could add a special sbt task that essentially mimics
>> this behavior, but I'd rather we just document this magic export
>> command somewhere so we don't have to maintain it.
>>
>>
>> The Daffodil synonym is an interesting idea. I guess it would just
>> fork the daffodil script with DAFFODIL_CLASSPATH already set, and
>> just pass any task arguments to the script? I think that's possible
>> in SBT, but I think the export magic above is a bit more flexible and
>> efficient since you don't need to keep sbt running to run a Daffodil
>> command.
>>
>>
>> All that said, I think what we really need is a way to move away
>> from requiring DAFFODIL_CLASSPATH at all. For example, if we could
>> embed dependencies in the actual saved parser file, then using that
>> saved parser wouldn't need any classpath modifications, it's just all
>> already there and Daffodil internals would use those embedded
>> dependencies for class lookups. I'm unsure of exactly how to do that,
>> but it's definitely possible--NiFi does something very similar with its
>> "nar" format.
>>
>>
>> On 2024-10-23 06:00 PM, Mike Beckerle wrote:
>>> I am trying to go from 'sbt test' to a schema I can play with from
>>> the daffodil CLI.
>>>
>>> The schema of interest is the DFDLSchemas envelope-payload example.
>>>
>>> This schema depends on
>>> * tcpMessage
>>> * mil-std-2045
>>> * pcap
>>>
>>> The pcap schema in turn depends on
>>> * ethernetIP
>>>
>>> The ethernetIP schema defines a Daffodil layer plugin that exists in
>>> its jar.
>>>
>>> So far if I clone these all, and 'sbt publishLocal' all of the
>>> components, then I can 'sbt test' in envelope-payload and it passes all
>>> tests.
>>>
>>> So now I'd like to do 'daffodil save-parser -s
>>> src/main/resources/io/github/dfdlschemas/envelopepayload/xsd/envelopePayload.dfdl.xsd
>>> -o /tmp/envPay.bin'
>>>
>>> By adding these to the build.sbt
>>>
>>> daffodilPackageBinVersions := Seq(daffodilVersion.value)
>>> daffodilPackageBinInfos := Seq(
>>>   ("/io/github/dfdlschemas/tcpMessage/xsd/tcpMessage.dfdl.xsd",
>>>    Some("message"), None)
>>> )
>>>
>>> Then 'sbt packageDaffodilBin' will create a compiled schema under
>>> the target/ directory named:
>>>
>>> dfdl-envelope-payload-1.1.0-daffodil390.bin
>>>
>>> So far so good.
>>>
>>> Now the challenge.
>>>
>>> But this can't be used with the daffodil CLI without also setting up
>>> DAFFODIL_CLASSPATH to have at least the ethernetIP jar file.
>>>
>>> Which is where? (yes I know in the ~/.ivy2/local cache, but there is
>>> tons of stuff in there.)
>>>
>>> If I want to use xerces (aka full) validation then I also have to
>>> have all the other component schema jar files on the DAFFODIL_CLASSPATH as
>>> well.
>>>
>>> I tried issuing 'sbt dependencyTree', but the dependency tree is not
>>> just the schemas, but all the dependencies on daffodil and
>>> everything transitively it uses.
>>>
>>> This is much too hard.
>>>
>>> Why not have the daffodil-sbt plugin output a shell script that
>>> appends to DAFFODIL_CLASSPATH with all the necessary component schema jars.
>>>
>>> Then users could just run that script to establish the
>>> DAFFODIL_CLASSPATH once, and then they could use the daffodil CLI normally.
>>>
>>> In principle, a file could also be written intended to inform the
>>> daffodil VSCode extension of the classpath to construct for the schema.
>>>
>>> Thoughts?
>>>
>>> One possible alternative is to add a daffodil command to sbt via the
>>> plugin so that one can run daffodil command lines directly from the sbt
>>> prompt:
>>>
>>> sbt
>>> > daffodil parse -p target/...bin foo.dat
>>>
>>> This is really just a synonym for issuing sbt run for the Main class
>>> of the daffodil-cli module, (after doing packageDaffodilBin) so
>>> might be very easy to do.
>>>
>>> Mike Beckerle
>>> Apache Daffodil PMC | daffodil.apache.org
>>> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>> Owl Cyber Defense | www.owlcyberdefense.com