Yep, I think something like that makes perfect sense. Apache NiFi also has a
similar format (called nar) that's just a zip/jar file with a directory
structure and files specific to NiFi.
On 2024-10-25 12:55 PM, McGann, Mike wrote:
How about something that is similar in format to a war file which is basically
a jar file but with some defined structure. Instead of a WEB-INF directory,
there could be a specific daffodil directory that contains metadata about the
contents of the archive. And it could contain the binary parser, extra jar
files, class files, and any other resource that may be needed in one nice
package. An advantage to that is standard jar tools could be used to inspect,
extract, or create.
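For illustration, here is a rough sketch of what such an archive could look like when built with the standard java.util.jar classes. The DAFFODIL-INF directory name, the archive.properties file, and the parser.bin and lib/ entries are all hypothetical names made up for this example; nothing like this exists in Daffodil today.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

class ParserArchiveSketch {
    // Hypothetical directory name, by analogy with WEB-INF; not a Daffodil convention.
    static final String META_DIR = "DAFFODIL-INF/";

    /** Builds an in-memory archive holding metadata, a parser blob, and one dependency jar. */
    static byte[] build(byte[] parser, String depName, byte[] depJar) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            byte[] meta = "parser=parser.bin\n".getBytes(StandardCharsets.UTF_8);
            put(jar, META_DIR + "archive.properties", meta);
            put(jar, META_DIR + "parser.bin", parser);
            put(jar, META_DIR + "lib/" + depName, depJar);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void put(JarOutputStream jar, String name, byte[] data) throws IOException {
        jar.putNextEntry(new JarEntry(name));
        jar.write(data);
        jar.closeEntry();
    }

    /** Lists the entry names in archive order, much like `jar tf` would. */
    static List<String> list(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(archive))) {
            for (JarEntry entry = in.getNextJarEntry(); entry != null; entry = in.getNextJarEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] archive = build(new byte[] {1, 2, 3}, "ethernetIP.jar", new byte[0]);
        list(archive).forEach(System.out::println);
    }
}
```

Because the result is an ordinary jar/zip, the usual jar tooling could list or unpack it without knowing anything about Daffodil.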
// Mike
-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, October 25, 2024 12:34
To: dev@daffodil.apache.org
Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?
Agreed. I don't think it necessarily *must* be added to 4.0, since it wouldn't
have any backwards compatibility concerns, but it would be really useful
sooner rather than later.
It's probably worth a discussion on how it might be implemented. As I recall,
Java doesn't really make the classpath contents available; all the magic to
find things on the classpath is done by ClassLoaders, which don't even need to
be backed by jars.
So my first thought is to make it so the compile* API functions accept an
optional list of jars. And we would create a custom ClassLoader that makes
those jars available during compilation/parse/unparse. When saving a
DataProcessor we would copy those jars into the saved parser binary. And when
reloading a saved parser we would extract those jars, recreate the custom
ClassLoader, and use that ClassLoader for parse/unparse operations. Overall I
imagine the modifications wouldn't be too big.
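As a rough sketch of the "custom ClassLoader from a list of jars" part only, the standard URLClassLoader already does most of the work. The class and method names below (JarListLoaderSketch, loaderFor, tempJar, readResource) are made up for this example and are not Daffodil API; the temp jar just stands in for a schema or plugin jar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarListLoaderSketch {
    /** Creates a loader that serves classes and resources from the given jars. */
    static URLClassLoader loaderFor(List<Path> jars, ClassLoader parent) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = jars.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(e);
            }
        }
        // Standard delegation: the parent is asked first, then the jars.
        return new URLClassLoader(urls, parent);
    }

    /** Writes a one-entry jar to a temp file; a stand-in for a real dependency jar. */
    static Path tempJar(String entryName, String content) {
        try {
            Path jar = Files.createTempFile("schema-", ".jar");
            jar.toFile().deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry(entryName));
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a resource as UTF-8 text, or returns null if the loader cannot find it. */
    static String readResource(ClassLoader loader, String name) {
        try (InputStream in = loader.getResourceAsStream(name)) {
            return in == null ? null : new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = tempJar("org/example/message.dfdl.xsd", "<schema/>");
        ClassLoader parent = JarListLoaderSketch.class.getClassLoader();
        try (URLClassLoader loader = loaderFor(List.of(jar), parent)) {
            System.out.println(readResource(loader, "org/example/message.dfdl.xsd"));
        }
    }
}
```

Reloading a saved parser would presumably do the same thing after extracting the embedded jars to a temporary location, since URLClassLoader needs URLs it can open.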
The daffodil-sbt plugin already has a list of dependency jars, so using the new
compile API and passing those in should be straightforward.
We would also probably want a new CLI option. Whenever the -s option is used we
could also support a new option (maybe -j/--jar ?), which is just a list of
plugins and dependency jars. E.g.
daffodil save-parser -s /org/example/message.dfdl.xsd \
--jar schema-jars/*.jar \
> message.bin
It makes the save-parser command a bit more complex, but it completely avoids
having to deal with the classpath at all. And the added complexity is only in
save-parser, unparse/parse wouldn't need it when using a saved parser. And with
the daffodil-sbt plugin now becoming more standard, that command is becoming
less and less needed.
For backwards compatibility, the custom ClassLoader could fall back to the real
classpath if it fails to find a resource/class, so we could still support
DAFFODIL_CLASSPATH and normal classpath stuff if people want. It just wouldn't
be able to include those dependencies in the saved parsers.
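A minimal sketch of that lookup order, assuming the supplied jars should win and the normal classpath is only consulted on a miss. Note this reverses the JVM's default parent-first delegation. JarsFirstLoader and tryLoad are hypothetical names for this example, not anything in Daffodil.

```java
import java.net.URL;
import java.net.URLClassLoader;

class JarsFirstLoader extends URLClassLoader {
    JarsFirstLoader(URL[] jars, ClassLoader fallback) {
        super(jars, fallback);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in the supplied jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException notInJars) {
                    // Fall back to the normal classpath via the parent loader.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public URL getResource(String name) {
        // Same order for resources: supplied jars first, then the parent.
        URL fromJars = findResource(name);
        return fromJars != null ? fromJars : super.getResource(name);
    }

    /** Convenience for the demo: returns null instead of throwing on a miss. */
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // With no jars supplied, every lookup goes through the fallback path.
        ClassLoader loader = new JarsFirstLoader(new URL[0], ClassLoader.getSystemClassLoader());
        System.out.println(tryLoad(loader, "java.lang.String") == String.class);
        System.out.println(tryLoad(loader, "org.example.Missing"));
    }
}
```

Whether jars-first or classpath-first is the right order is a design choice; plain URLClassLoader gives classpath-first without any overrides.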
I'm sure there are plenty of other things to think about, but something like
this seems like it would be a big usability improvement now that
layers/charsets/etc are getting more and more common. It also means these saved
parsers would just work on systems that don't easily support installing
dependency jars.
On 2024-10-25 11:24 AM, Adams, Joshua wrote:
All that said, I think what we really need is a way to move away
from requiring DAFFODIL_CLASSPATH at all. For example, if we could
embed dependencies in the actual saved parser file, then using that
saved parser wouldn't need any classpath modifications, it's just all
already there and Daffodil internals would use those embedded
dependencies for class lookups. I'm unsure of exactly how to do that,
but it's definitely possible--NiFi does something very similar with
its "nar" format.
Changing the saved parser format to include classpath JARs sounds like
something perfect for a 4.0.0 release of Daffodil.
Josh
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, October 24, 2024 9:48 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: How to get DAFFODIL_CLASSPATH for a complex DFDL schema?
I've also just opened a PR to daffodil-sbt to fix the bug that causes
"show fullClasspath" and "export fullClasspath" to include daffodil
dependencies:
https://github.com/apache/daffodil-sbt/pull/64
On 2024-10-24 08:24 AM, Steve Lawrence wrote:
Instead of `sbt dependencyTree`, you can run `sbt "show
fullClasspath"` to output all the dependencies that
`packageDaffodilBin` uses. You can also run `sbt "export
fullClasspath"` to get an actual classpath string that you can drop into
DAFFODIL_CLASSPATH. In one line, I think you could do:
export DAFFODIL_CLASSPATH=$(sbt -batch -error "export fullClasspath")
Note that the -batch and -error are needed to disable [info] and
other output messages.
Also note that this includes the scala dependency, and I think we
might have a bug in daffodil-sbt that causes it to also include
Daffodil dependencies if any schemas are layers/charsets/etc. I
*think*, given the way the Daffodil CLI builds up the classpath, those extra
dependencies will all be ignored, but if not you might have to manually build
up the classpath with just the paths you want.
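If manual trimming does turn out to be necessary, it amounts to filtering the exported classpath string by jar file name. A small sketch follows; the "daffodil-" and "scala-library" name prefixes are assumptions about how those jars are named, and ClasspathFilterSketch and keepOnly are made-up names for this example.

```java
import java.io.File;
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ClasspathFilterSketch {
    /** Keeps only the classpath entries whose file name passes the given test. */
    static String keepOnly(String classpath, Predicate<String> keepFileName) {
        return Arrays.stream(classpath.split(Pattern.quote(File.pathSeparator)))
            .filter(entry -> !entry.isEmpty())
            .filter(entry -> keepFileName.test(new File(entry).getName()))
            .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Expects the output of `sbt -batch -error "export fullClasspath"` as the first argument.
        String full = args.length > 0 ? args[0] : "";
        // Assumed naming: drop Daffodil's own jars and the Scala library, keep everything else.
        Predicate<String> schemaJarsOnly =
            name -> !name.startsWith("daffodil-") && !name.startsWith("scala-library");
        System.out.println(keepOnly(full, schemaJarsOnly));
    }
}
```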
If we want we could add a special sbt task that essentially mimics
this behavior, but I'd rather we just document this magic export
command somewhere so we don't have to maintain it.
The Daffodil synonym is an interesting idea. I guess it would just
fork the daffodil script with DAFFODIL_CLASSPATH already set, and
just pass any task arguments to the script? I think that's possible
in SBT, but I think the export magic above is a bit more flexible and
efficient since you don't need to keep sbt running to run a Daffodil command.
All that said, I think what we really need is a way to move away
from requiring DAFFODIL_CLASSPATH at all. For example, if we could
embed dependencies in the actual saved parser file, then using that
saved parser wouldn't need any classpath modifications, it's just all
already there and Daffodil internals would use those embedded
dependencies for class lookups. I'm unsure of exactly how to do that,
but it's definitely possible--NiFi does something very similar with its "nar"
format.
On 2024-10-23 06:00 PM, Mike Beckerle wrote:
I am trying to go from 'sbt test' to a schema I can play with from
the daffodil CLI.
The schema of interest is the DFDLSchemas envelope-payload example.
This schema depends on
* tcpMessage
* mil-std-2045
* pcap
The pcap schema in turn depends on
* ethernetIP
The ethernetIP schema defines a Daffodil layer plugin that exists in
its jar.
So far if I clone these all, and 'sbt publishLocal' all of the
components, then I can 'sbt test' in envelope-payload and it passes all tests.
So now I'd like to do
'daffodil save-parser -s src/main/resources/io/github/dfdlschemas/envelopepayload/xsd/envelopePayload.dfdl.xsd -o /tmp/envPay.bin'
By adding these to the build.sbt
  daffodilPackageBinVersions := Seq(daffodilVersion.value)
  daffodilPackageBinInfos := Seq(
    ("/io/github/dfdlschemas/tcpMessage/xsd/tcpMessage.dfdl.xsd", Some("message"), None)
  )
Then 'sbt packageDaffodilBin' will create a compiled schema under
the target/ directory named:
dfdl-envelope-payload-1.1.0-daffodil390.bin
So far so good.
Now the challenge.
But this can't be used with the daffodil CLI without also setting up
DAFFODIL_CLASSPATH to have at least the ethernetIP jar file.
Which is where? (yes I know in the ~/.ivy2/local cache, but there is
tons of stuff in there.)
If I want to use xerces (aka full) validation then I also have to
have all the other component schema jar files on the DAFFODIL_CLASSPATH as well.
I tried issuing 'sbt dependencyTree', but the dependency tree is not
just the schemas, but all the dependencies on daffodil and
everything transitively it uses.
This is much too hard.
Why not have the daffodil-sbt plugin output a shell script that
appends all the necessary component schema jars to DAFFODIL_CLASSPATH?
Then users could just run that script to establish the
DAFFODIL_CLASSPATH once, and then they could use the daffodil CLI normally.
In principle, a file could also be written intended to inform the
daffodil VSCode extension of the classpath to construct for the schema.
Thoughts?
One possible alternative is to add a daffodil command to sbt via the
plugin so that one can run daffodil command lines directly from the sbt prompt:
sbt
daffodil parse -p target/...bin foo.dat
This is really just a synonym for issuing sbt run for the Main class
of the daffodil-cli module (after doing packageDaffodilBin), so it
might be very easy to do.
Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com