+1 on the ".dar" SBT plugin. Once a Daffodil Archive format is established, an SBT plugin would let developers create Daffodil Archives easily and would further enforce consistency.
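
As a rough illustration of what such a plugin task might do under the hood, here is a minimal Java sketch that bundles a saved parser, plugin jars, and a README into one zip-based archive. Nothing here is an agreed format: the class name `DarPacker` and the entry names `parser.bin`, `lib/`, and `README.md` are hypothetical and only stand in for whatever layout the project eventually settles on.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: the archive layout below is an assumption, not a defined .dar format. */
final class DarPacker {

    /** Writes the saved parser, the plugin jars, and the README into a zip archive at darFile. */
    static void pack(Path darFile, Path savedParser, List<Path> pluginJars, Path readme)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(darFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            // Serialized parser under an assumed fixed entry name.
            addEntry(zip, "parser.bin", savedParser);
            // Runtime dependencies (UDFs, charsets, layers, validators) under lib/.
            for (Path jar : pluginJars) {
                addEntry(zip, "lib/" + jar.getFileName(), jar);
            }
            addEntry(zip, "README.md", readme);
        }
    }

    /** Copies one file into the archive under the given entry name. */
    private static void addEntry(ZipOutputStream zip, String name, Path source)
            throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        Files.copy(source, zip);
        zip.closeEntry();
    }
}
```

An SBT task would presumably call something like `DarPacker.pack(...)` after compiling the schema and saving the parser, so every archive comes out with the same layout regardless of who builds it.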
On Mon, Jul 24, 2023 at 9:10 AM Davin Shearer <da...@apache.org> wrote:

> +1 on the ".dar" idea.
>
> The Daffodil Archive ".dar" idea Steve introduced and its parallels with
> the Apache NiFi ecosystem are exciting. The idea of bundling the serialized
> parser, the runtime dependencies, and a README is a great non-intrusive
> solution to both the tunables and classpath problems. Daffodil
> tooling and these ".dar" files ought to solve the problem Mike describes,
> the "works for me" problem, and help to solve the air-gapped system
> problem. This can help on the TDML side too, combining the test cases with
> a ".dar" to provide consistent results.
>
> Given an established ".dar" file standard, I'm excited by the possibility
> of having a browseable repository of these ".dar" files "at the ready" for
> data processing. It provides a new consistency across all supported
> platforms. They could even be signed and verified, as in many
> other package systems.
>
> On Fri, Jul 21, 2023 at 2:03 PM Steve Lawrence <slawre...@apache.org>
> wrote:
>
>> Do these products support saved parsers? I believe tunables are
>> serialized along with the parser, so if the tunables were set when it was
>> built it should work as expected and wouldn't need a new Daffodil version
>> to support it.
>>
>> If not, a top-level setTunable annotation seems reasonable to me. Though
>> I imagine things could get tricky if any tunables are used prior to
>> actually parsing the schema. With the lazy evaluations of Daffodil,
>> that's potentially an issue. Unless we do some sort of pre-pass to
>> extract the annotations? Or maybe the annotations are read as part of
>> creating the ProcessorFactory, and they are applied to the DataProcessor
>> when onPath is called? It means tunables used during PF creation
>> wouldn't have an effect, but maybe that's a reasonable limitation? Might
>> be some technical things to figure out, but the annotation seems
>> reasonable. I'd use a saved parser if I could though.
>>
>> I like the idea of a plugin-path annotation a bit less, just because
>> paths can be so different on different machines. It makes creating
>> easily distributed schemas that much more difficult. We already have
>> issues with schema imports and those are just simple relative paths. And
>> some products might not even have a good way to add jars?
>>
>> Maybe an alternative approach is to make our saved parser format more
>> complex. Instead of just being a serialized Java object, it could become
>> an archive that contains the serialized parser plus dependencies. Very
>> similar to how NiFi works with its nar files (maybe we call them .dar
>> files?). When Daffodil reloads this archive, it can extract and
>> deserialize the saved parser as well as put the jars on the classpath.
>>
>> This could also contain a README and other docs.
>>
>> It could also contain the text schemas and config files. This
>> way users could extract the schemas and use them for normal schema
>> validation with other tools. Or they could compile the extracted schemas
>> with the tunables. Or we could maybe even have a way to "compile" a dar if
>> it doesn't contain a serialized parser, using the embedded schemas
>> and configs.
>>
>> This does need some new tool to create the archive though. An SBT plugin
>> could do it pretty easily, I think. Maybe this is the impetus to finally
>> create a Daffodil sbt plugin?
>>
>> On 2023-07-21 12:17 PM, Mike Beckerle wrote:
>> > An issue has come up and I want to discuss a possible idea to help our
>> > users manage this.
>> >
>> > Products have embedded Daffodil.
>> >
>> > Those products do not provide means for setting Daffodil tunables.
>> >
>> > But we know some of the tunables are important to enable a schema to
>> > work - e.g., size limits on regex matches.
>> >
>> > A given schema may want to tighten these down, or open them up.
>> >
>> > But Daffodil is now embedded in products that do not provide means to
>> > set the tunables.
>> > These products are regulated. Changing them to upgrade Daffodil to a new
>> > version is relatively easy. Adding new features to them triggers a
>> > regulatory review.
>> >
>> > So here's the idea:
>> >
>> > Add a feature so that tunables can be set from inside the schema. E.g., a
>> > top-level annotation like
>> >
>> > <dfdlx:setTunables ..../>
>> >
>> > That would go at the top level of the schema.
>> >
>> > I also think that if you use a schema that requires a jar plugin (layering
>> > plugin, UDF, charset, or validator plugin), the jar file should be
>> > automatically loaded by Daffodil without the user having to worry about
>> > setting up classpaths properly before calling Daffodil via the API.
>> >
>> > I found on Stack Overflow that it is possible for a Java program to extend
>> > the classpath dynamically.
>> > https://stackoverflow.com/questions/271506/why-cant-system-setproperty-change-the-classpath-at-runtime/1198693#1198693
>> > and
>> > https://stackoverflow.com/questions/402330/is-it-possible-to-add-to-classpath-dynamically-in-java
>> > but Daffodil could also, given a plugin, scan its own "plugin-path" defined
>> > in the schema, and force-load the jars directly rather than letting
>> > the ordinary class loader classpath mechanism do it.
>> >
>> > Taken together, these would let people add Daffodil to a product without
>> > having to add features for tuning, for specifying plug-ins, etc. The
>> > schema could specify all these things.
>> >
>> > Thoughts?
>> >
>> > Mike Beckerle
>> > Apache Daffodil PMC | daffodil.apache.org
>> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> > Owl Cyber Defense | www.owlcyberdefense.com
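
On the dynamic classpath point in Mike's original message, a minimal sketch using only the JDK might look like the following. It assumes the jars have already been extracted from an archive (or found on a "plugin-path") into a single directory; the class name `PluginJarLoader` and its methods are hypothetical and are not part of Daffodil's API.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds a class loader over every *.jar in one directory. */
final class PluginJarLoader {

    /** Returns the URLs of all *.jar files directly inside pluginDir. */
    static List<URL> jarUrls(Path pluginDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls;
    }

    /** Creates a child class loader that can see the plugin jars plus everything in parent. */
    static URLClassLoader loaderFor(Path pluginDir, ClassLoader parent) throws IOException {
        return new URLClassLoader(jarUrls(pluginDir).toArray(new URL[0]), parent);
    }
}
```

This avoids touching the system classpath at all: the caller keeps the returned loader and closes it when done. If plugins are discovered through `java.util.ServiceLoader`, passing this loader to `ServiceLoader.load(Class, ClassLoader)` would be one way to make the bundled jars visible, though whether that fits Daffodil's plugin lookup is something to verify.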