+1 on the ".dar" idea. The Daffodil Archive (".dar") idea Steve introduced, and its parallels with the Apache NiFi ecosystem, are exciting. Bundling the serialized parser, the runtime dependencies, and a README is a great non-intrusive solution to both the tunables and classpath problems. Daffodil tooling together with these ".dar" files ought to solve the problem Mike describes and the "works for me" problem, and help with the air-gapped system problem. It could help on the TDML side too: combining the test cases with a ".dar" would give consistent results.
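To make the bundling idea a bit more concrete, here is a rough sketch of what such an archive might look like. It assumes a ".dar" is just a zip archive (as jar and NiFi nar files are). The entry names (parser.bin, lib/, README.txt, META-INF/dar.json), the manifest fields, and the helper functions are all hypothetical; nothing here is an agreed format or an existing Daffodil API.

```python
import io
import json
import zipfile


def build_dar(parser_bytes, jars, readme, tunables):
    """Pack a saved parser, its plugin jars, and a README into one archive.

    The layout is an assumption made for illustration only:
      parser.bin         the serialized (saved) parser
      lib/<name>.jar     runtime dependencies (UDFs, layers, charsets, ...)
      README.txt         human-readable notes
      META-INF/dar.json  hypothetical manifest recording the tunables used
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as dar:
        dar.writestr("parser.bin", parser_bytes)
        # Sort jar names so the entry order is deterministic.
        for name, data in sorted(jars.items()):
            dar.writestr(f"lib/{name}", data)
        dar.writestr("README.txt", readme)
        manifest = {"formatVersion": 1, "tunables": tunables}
        dar.writestr("META-INF/dar.json", json.dumps(manifest, sort_keys=True))
    return buf.getvalue()


def read_manifest(dar_bytes):
    """Return the hypothetical manifest stored in the archive as a dict."""
    with zipfile.ZipFile(io.BytesIO(dar_bytes)) as dar:
        return json.loads(dar.read("META-INF/dar.json"))


def list_entries(dar_bytes):
    """Return the archive's entry names in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(dar_bytes)) as dar:
        return dar.namelist()
```

A loader on the Daffodil side would do the reverse: read the manifest, put the lib/ jars on the classpath, and deserialize parser.bin. Keeping everything in one file is also what would make signing and verification straightforward.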

Given an established ".dar" file standard, I'm excited by the possibility of having a browseable repository of these ".dar" files "at the ready" for data processing. That would provide a new consistency across all supported platforms. They could even be signed and verified, as in many other packaging systems.

On Fri, Jul 21, 2023 at 2:03 PM Steve Lawrence <slawre...@apache.org> wrote:

> Do these products support saved parsers? I believe tunables are
> serialized along with the parser, so if the tunables were set when it
> was built it should work as expected, and wouldn't need a new Daffodil
> version to support.
>
> If not, a top-level setTunable annotation seems reasonable to me. Though
> I imagine things could get tricky if any tunables are used prior to
> actually parsing the schema. With the lazy evaluations of Daffodil,
> that's potentially an issue. Unless we do some sort of pre-pass to
> extract the annotations? Or maybe the annotations are read as part of
> creating the ProcessorFactory, and they are applied to the DataProcessor
> when onPath is called? It means tunables used during PF creation
> wouldn't have an effect, but maybe that's a reasonable limitation? Might
> be some technical things to figure out, but the annotation seems
> reasonable. I'd use a saved parser if I could though.
>
> I like the idea of a plugin-path annotation a bit less, just because
> paths can be so different on different machines. It makes creating
> easily distributed schemas that much more difficult. We already have
> issues with schema imports and those are just simple relative paths. And
> some products might not even have a good way to add jars?
>
>
> Maybe an alternative approach is to make our saved parser format more
> complex. Instead of just being a serialized Java object, it could become
> an archive that contains the serialized parser plus dependencies. Very
> similar to how NiFi works with its nar files (maybe we call them .dar
> files?). When Daffodil reloads this archive, it can extract and
> deserialize the saved parser as well as put the jars on the classpath.
>
> This could also contain README and other docs as well.
>
> It could also contain the text schemas and config files as well. This
> way users could extract the schemas and use them for normal schema
> validation with other tools. Or they could compile the extracted schemas
> with the tunables. Or we could maybe even have a way to "compile" a dar
> if it doesn't contain a serialized parser, using the embedded schemas
> and configs.
>
> This does need some new tool to create the archive though; an SBT plugin
> could do it pretty easily I think. Maybe this is the impetus to finally
> create a Daffodil sbt plugin?
>
>
> On 2023-07-21 12:17 PM, Mike Beckerle wrote:
> > An issue has come up and I want to discuss a possible idea to help our
> > users manage this.
> >
> > Products have embedded Daffodil.
> >
> > Those products do not provide means for setting daffodil tunables.
> >
> > But we know some of the tunables are important to enable a schema to
> > work - e.g., size limits on regex matches.
> >
> > A given schema may want to tighten these down, or open them up.
> >
> > But daffodil is now embedded in products that do not provide means to
> > set the tunables.
> > These products are regulated. Changing them to upgrade daffodil to a
> > new version is relatively easy. Adding new features to them triggers a
> > regulatory review.
> >
> > So here's the idea:
> >
> > Add a feature so that tunables can be set from inside the schema. E.g.,
> > a top-level annotation like
> >
> > <dfdlx:setTunables ..../>
> >
> > That would go at the top level of the schema.
> >
> > I also think that if you use a schema that requires a jar plugin
> > (layering plugin, UDF, charset, or validator plugin) that the jar file
> > should be automatically loaded by Daffodil without the user having to
> > worry about setting up classpaths properly before calling Daffodil via
> > API.
> >
> > I found on stack overflow that it is possible for a java program to
> > extend the classpath dynamically.
> >
> > https://stackoverflow.com/questions/271506/why-cant-system-setproperty-change-the-classpath-at-runtime/1198693#1198693
> > and
> > https://stackoverflow.com/questions/402330/is-it-possible-to-add-to-classpath-dynamically-in-java
> >
> > but Daffodil could also, given a plugin, scan its own "plugin-path"
> > defined in the schema, and force load the jars directly rather than
> > letting ordinary class loader classpath stuff do it.
> >
> > Taken together, these would let people add Daffodil to a product
> > without having to add features for tuning, for specifying plug-ins,
> > etc. The schema could specify all these things.
> >
> > Thoughts?
> >
> > Mike Beckerle
> > Apache Daffodil PMC | daffodil.apache.org
> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> > Owl Cyber Defense | www.owlcyberdefense.com