Do these products support saved parsers? I believe tunables are
serialized along with the parser, so if the tunables were set when the
parser was built it should work as expected, and we wouldn't need a new
Daffodil version to support this.
If not, a top-level setTunable annotation seems reasonable to me. Though
I imagine things could get tricky if any tunables are used prior to
actually parsing the schema. With the lazy evaluations of Daffodil,
that's potentially an issue. Unless we do some sort of pre-pass to
extract the annotations? Or maybe the annotations are read as part of
creating the ProcessorFactory, and they are applied to the DataProcessor
when onPath is called? It means tunables used during PF creation
wouldn't have an effect, but maybe that's a reasonable limitation? Might
be some technical things to figure out, but the annotation seems
reasonable. I'd use a saved parser if I could though.
I like the idea of a plugin-path annotation a bit less, just because
paths can be so different on different machines. It makes creating
easily distributed schemas that much more difficult. We already have
issues with schema imports and those are just simple relative paths. And
some products might not even have a good way to add jars?
Maybe an alternative approach is to make our saved parser format more
complex. Instead of just being a serialized Java object, it could become
an archive that contains the serialized parser plus dependencies. Very
similar to how NiFi works with its nar files (maybe we call them .dar
files?). When Daffodil reloads this archive, it can extract and
deserialize the saved parser as well as put the jars on the classpath.
This could also contain a README and other docs.
It could also contain the text schemas and config files. This
way users could extract the schemas and use for normal schema validation
with other tools. Or they could compile the extracted schemas with the
tunables. Or we could maybe even have a way to "compile" a dar if it
doesn't contain a serialized parser, using the embedded schemas
and configs.
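
As a rough sketch of the reload side, assuming a .dar is just a zip
with a made-up layout (parser.bin for the serialized parser, lib/*.jar
for plugin jars, everything else such as schemas, configs, or a README
extracted as-is), unpacking it and building a class loader for the jars
might look something like this. The class name DarSketch, the entry
names, and the layout are all hypothetical; nothing like this exists in
Daffodil today, and handing the extracted parser to Daffodil's reload is
left out.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch only: the ".dar" layout used here is an assumption.
final class DarSketch {

    /** Locations of the interesting pieces after unpacking. */
    static final class Unpacked {
        final Path savedParser; // null if the archive holds no parser.bin
        final List<Path> jars;  // plugin jars found under lib/

        Unpacked(Path savedParser, List<Path> jars) {
            this.savedParser = savedParser;
            this.jars = jars;
        }
    }

    /** Extracts every entry of the archive into targetDir. */
    static Unpacked unpack(Path dar, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        Path savedParser = null;
        List<Path> jars = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(dar))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                Path out = root.resolve(name).normalize();
                // Refuse entries like "../../x" that would escape targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Copies just the current entry; the zip stream stays open.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                if (name.equals("parser.bin")) {
                    savedParser = out;
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    jars.add(out);
                }
            }
        }
        return new Unpacked(savedParser, jars);
    }

    /** Builds a class loader that can see the extracted plugin jars. */
    static URLClassLoader pluginLoader(List<Path> jars, ClassLoader parent)
            throws IOException {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = jars.get(i).toUri().toURL();
        }
        return new URLClassLoader(urls, parent);
    }
}
```

The schemas and configs end up as ordinary files under the target
directory, so other tools can use them for validation, and a dar without
parser.bin just comes back with a null savedParser that a caller could
treat as "needs compiling".
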
This does need some new tool to create the archive though. An sbt plugin
could do it pretty easily, I think. Maybe this is the impetus to finally
create a Daffodil sbt plugin?
On 2023-07-21 12:17 PM, Mike Beckerle wrote:
An issue has come up and I want to discuss a possible idea to help our
users manage this.
Products have embedded Daffodil.
Those products do not provide means for setting Daffodil tunables.
But we know some of the tunables are important to enable a schema to work,
e.g., size limits on regex matches.
A given schema may want to tighten these down, or open them up.
But Daffodil is now embedded in products that do not provide means to set
the tunables.
These products are regulated. Changing them to upgrade daffodil to a new
version is relatively easy. Adding new features to them triggers a
regulatory review.
So here's the idea:
Add a feature so that tunables can be set from inside the schema. E.g., a
top-level annotation like
<dfdlx:setTunables ..../>
That would go at the top level of the schema.
I also think that if you use a schema that requires a jar plugin (layering
plugin, UDF, charset, or validator plugin) that the jar file should be
automatically loaded by Daffodil without the user having to worry about
setting up classpaths properly before calling Daffodil via API.
I found on Stack Overflow that it is possible for a Java program to extend
the classpath dynamically.
https://stackoverflow.com/questions/271506/why-cant-system-setproperty-change-the-classpath-at-runtime/1198693#1198693
and
https://stackoverflow.com/questions/402330/is-it-possible-to-add-to-classpath-dynamically-in-java
but Daffodil could also, given a plugin, scan its own "plugin-path" defined
in the schema, and force load the jars directly rather than letting
ordinary class loader classpath stuff do it.
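
To make the force-loading idea concrete, here is a minimal sketch using
only the JDK: scan a plugin directory for jars, put them on a dedicated
URLClassLoader, and ask java.util.ServiceLoader for implementations of a
plugin interface through that loader. The class name PluginPathSketch
and the pluginDir parameter are hypothetical, and it is an assumption of
this sketch that the plugins register themselves under META-INF/services
so that ServiceLoader can find them.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch only: not an existing Daffodil API.
final class PluginPathSketch {

    /** Builds a class loader over every *.jar directly inside pluginDir. */
    static URLClassLoader loaderFor(Path pluginDir, ClassLoader parent)
            throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars =
                Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /**
     * Finds implementations of the given service interface that are visible
     * through the loader (and its parents), assuming they are registered
     * under META-INF/services.
     */
    static <T> List<T> discover(Class<T> service, ClassLoader loader) {
        List<T> found = new ArrayList<>();
        for (T impl : ServiceLoader.load(service, loader)) {
            found.add(impl);
        }
        return found;
    }
}
```

With something like this the embedding product never touches its own
classpath. The loader would have to be kept open for as long as the
plugins are in use, and closed afterwards.
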
Taken together, these would let people add Daffodil to a product without
having to add features for tuning, for specifying plug-ins, etc. The schema
could specify all these things.
Thoughts?
Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com