Do these products support saved parsers? I believe tunables are serialized along with the parser, so if the tunables were set when the parser was built, it should work as expected and wouldn't need a new Daffodil version to support it.

If not, a top-level setTunables annotation seems reasonable to me. I imagine things could get tricky, though, if any tunables are used before the schema is actually parsed; with Daffodil's lazy evaluation, that's potentially an issue. Unless we do some sort of pre-pass to extract the annotations? Or maybe the annotations are read as part of creating the ProcessorFactory and applied to the DataProcessor when onPath is called? That would mean tunables used during ProcessorFactory creation wouldn't have an effect, but maybe that's a reasonable limitation. There might be some technical things to figure out, but the annotation seems reasonable. I'd use a saved parser if I could, though.
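
To make the pre-pass idea a bit more concrete, here is a minimal sketch using only the JDK's DOM parser, so it doesn't touch any of the lazily evaluated schema machinery. It assumes a hypothetical dfdlx:setTunables element placed under xs:schema/xs:annotation/xs:appinfo, with each unprefixed attribute naming a tunable. That attribute-per-tunable syntax and the TunablePrePass name are my assumptions, not an agreed design. The resulting map could then be applied before compilation, e.g. through something like the compiler's existing withTunables method, so even tunables consulted during ProcessorFactory creation would take effect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TunablePrePass {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";
    static final String DFDLX_NS = "http://www.ogf.org/dfdl/dfdl-1.0/extensions";

    /**
     * Pre-pass: reads only top-level dfdlx:setTunables annotations
     * (xs:schema > xs:annotation > xs:appinfo > dfdlx:setTunables) and
     * returns their unprefixed attributes as tunable name/value pairs.
     * The annotation and its attribute syntax are hypothetical.
     */
    static Map<String, String> extractTunables(String schemaXml) {
        Element schema;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            schema = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("schema is not well-formed XML", e);
        }
        Map<String, String> tunables = new LinkedHashMap<>();
        for (Element annotation : children(schema, XSD_NS, "annotation")) {
            for (Element appinfo : children(annotation, XSD_NS, "appinfo")) {
                for (Element set : children(appinfo, DFDLX_NS, "setTunables")) {
                    NamedNodeMap attrs = set.getAttributes();
                    for (int i = 0; i < attrs.getLength(); i++) {
                        Node attr = attrs.item(i);
                        // Skip xmlns declarations and namespaced attributes.
                        if (attr.getNamespaceURI() == null) {
                            tunables.put(attr.getLocalName(), attr.getNodeValue());
                        }
                    }
                }
            }
        }
        return tunables;
    }

    /** Direct element children of parent with the given namespace and local name. */
    private static List<Element> children(Element parent, String ns, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && ns.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                out.add((Element) n);
            }
        }
        return out;
    }
}
```

Because the sketch only looks at the root's direct annotations, it's cheap and predictable; the trade-off is that tunables couldn't be set from included or imported schemas, or from annotations nested on elements.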

I like the idea of a plugin-path annotation a bit less, just because paths can be so different on different machines. It makes creating easily distributed schemas that much more difficult. We already have issues with schema imports, and those are just simple relative paths. And some products might not even have a good way to add jars?


Maybe an alternative approach is to make our saved parser format more complex. Instead of just being a serialized Java object, it could become an archive that contains the serialized parser plus its dependencies, very similar to how NiFi works with its .nar files (maybe we call them .dar files?). When Daffodil reloads this archive, it can extract and deserialize the saved parser as well as put the jars on the classpath.

This could also contain a README and other docs.

It could also contain the text schemas and config files. This way users could extract the schemas and use them for normal schema validation with other tools, or compile the extracted schemas with the tunables. We could maybe even have a way to "compile" a dar that doesn't contain a serialized parser, using the embedded schemas and configs.
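
A rough sketch of what reloading such an archive might involve, using only java.util.zip and URLClassLoader. The layout (parser.bin, lib/*.jar, schemas/, README) and the DarArchive name are made up for illustration; the real format would still need to be designed. It returns the raw saved-parser bytes (or nothing, meaning the embedded schemas would be compiled instead), the schema entries, and a class loader over the bundled jars.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical ".dar" layout: parser.bin, lib/*.jar, schemas/..., README. */
class DarArchive {
    final Optional<byte[]> savedParser; // absent => compile from schemas/ instead
    final List<String> schemaEntries;   // entry names under schemas/
    final URLClassLoader pluginLoader;  // loads the jars extracted from lib/

    private DarArchive(Optional<byte[]> savedParser, List<String> schemaEntries,
                       URLClassLoader pluginLoader) {
        this.savedParser = savedParser;
        this.schemaEntries = schemaEntries;
        this.pluginLoader = pluginLoader;
    }

    static DarArchive open(Path dar, Path extractDir) throws IOException {
        byte[] parser = null;
        List<String> schemas = new ArrayList<>();
        List<URL> jars = new ArrayList<>();
        try (ZipFile zip = new ZipFile(dar.toFile())) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                String name = entry.getName();
                if (entry.isDirectory()) {
                    continue;
                }
                if (name.equals("parser.bin")) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        parser = in.readAllBytes();
                    }
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    // Keep only the file name so entries can't escape extractDir.
                    Path target = extractDir.resolve(Paths.get(name).getFileName().toString());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                    jars.add(target.toUri().toURL());
                } else if (name.startsWith("schemas/")) {
                    schemas.add(name);
                }
            }
        }
        URLClassLoader loader =
                new URLClassLoader(jars.toArray(new URL[0]), DarArchive.class.getClassLoader());
        return new DarArchive(Optional.ofNullable(parser), schemas, loader);
    }
}
```

Actually deserializing the parser bytes would presumably need to resolve classes through pluginLoader (for example by installing it as the thread's context class loader), so that plugin classes referenced by the saved parser can be found; how that fits with Daffodil's existing reload logic is an open question.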

This does need some new tool to create the archive, though. An sbt plugin could do it pretty easily, I think. Maybe this is the impetus to finally create a Daffodil sbt plugin?




On 2023-07-21 12:17 PM, Mike Beckerle wrote:
An issue has come up and I want to discuss a possible idea to help our
users manage this.

Products have embedded Daffodil.

Those products do not provide means for setting daffodil tunables.

But we know some of the tunables are important to enable a schema to work,
e.g., size limits on regex matches.

A given schema may want to tighten these down, or open them up.

But Daffodil is now embedded in products that do not provide means to set
the tunables.
These products are regulated. Changing them to upgrade Daffodil to a new
version is relatively easy. Adding new features to them triggers a
regulatory review.

So here's the idea:

Add a feature so that tunables can be set from inside the schema. E.g., a
top-level annotation like

<dfdlx:setTunables ..../>

That would go at the top level of the schema.

I also think that if you use a schema that requires a jar plugin (layering
plugin, UDF, charset, or validator plugin) that the jar file should be
automatically loaded by Daffodil without the user having to worry about
setting up classpaths properly before calling Daffodil via API.

I found on Stack Overflow that it is possible for a Java program to extend
the classpath dynamically.
https://stackoverflow.com/questions/271506/why-cant-system-setproperty-change-the-classpath-at-runtime/1198693#1198693
and
https://stackoverflow.com/questions/402330/is-it-possible-to-add-to-classpath-dynamically-in-java
but Daffodil could also, for a schema that requires a plugin, scan its own
"plugin-path" defined in the schema and force-load the jars directly rather
than letting the ordinary class loader classpath mechanism do it.

Taken together, these would let people add Daffodil to a product without
having to add features for tuning, for specifying plug-ins, etc. The schema
could specify all these things.

Thoughts?

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com

