+1 on the ".dar" SBT plugin.

Once a Daffodil Archive format is established, an SBT plugin would enable
developers to easily create Daffodil Archives and further enforce
consistency.
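
To make that concrete, here is a rough sketch of what a packaging task in
such a plugin might look like. Everything here (the plugin name, the
daffodilPackageDar key, the parser.bin/lib/README.md archive layout) is
invented for illustration; no such plugin exists yet.

  // Hypothetical sbt auto-plugin; daffodilPackageDar and the .dar layout
  // (parser.bin, lib/, README.md) are illustrative, not an existing API.
  import sbt._
  import sbt.Keys._

  object DaffodilDarPlugin extends AutoPlugin {
    object autoImport {
      val daffodilPackageDar =
        taskKey[File]("Bundle the saved parser, dependency jars, and docs into a .dar")
    }
    import autoImport._

    override def projectSettings: Seq[Setting[_]] = Seq(
      daffodilPackageDar := {
        val out = target.value / s"${name.value}-${version.value}.dar"
        // Assume the saved parser was produced by an earlier build step.
        val savedParser = target.value / "parser.bin"
        val jars = (Runtime / dependencyClasspath).value.map(_.data)
          .filter(_.getName.endsWith(".jar"))
        val entries =
          Seq(savedParser -> "parser.bin",
              baseDirectory.value / "README.md" -> "README.md") ++
            jars.map(j => j -> s"lib/${j.getName}")
        IO.zip(entries, out)
        out
      }
    )
  }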

On Mon, Jul 24, 2023 at 9:10 AM Davin Shearer <da...@apache.org> wrote:

> +1 on the ".dar" idea.
>
> The Daffodil Archive ".dar" idea Steve introduced and the parallels with
> the Apache NiFi ecosystem are exciting.  Bundling the serialized parser,
> the runtime dependencies, and a README is a great non-intrusive solution
> to both the tunables and classpath problems.  Given Daffodil tooling and
> these ".dar" files, we ought to be able to solve the problem Mike
> describes, the "works for me" problem, and the air-gapped system problem.
> This can help on the TDML side too, by combining the test cases with a
> ".dar" to provide consistent results.
>
> Given an established ".dar" file standard, I'm excited by the possibility
> of having a browsable repository of these ".dar" files "at the ready" for
> data processing.  That would provide consistency across all supported
> platforms.  The archives could even be signed and verified, as in many
> other package systems.
>
> On Fri, Jul 21, 2023 at 2:03 PM Steve Lawrence <slawre...@apache.org>
> wrote:
>
>> Do these products support saved parsers? I believe tunables are
>> serialized along with the parser, so if the tunables were set when the
>> parser was built it should work as expected, and no new Daffodil version
>> would be needed to support it.
>>
>> If not, a top-level setTunable annotation seems reasonable to me. Though
>> I imagine things could get tricky if any tunables are used prior to
>> actually parsing the schema. With Daffodil's lazy evaluation, that's
>> potentially an issue. Unless we do some sort of pre-pass to extract the
>> annotations? Or maybe the annotations are read as part of creating the
>> ProcessorFactory, and they are applied to the DataProcessor when onPath
>> is called? It means tunables used during PF creation wouldn't have an
>> effect, but maybe that's a reasonable limitation? There might be some
>> technical things to figure out, but the annotation seems reasonable. I'd
>> use a saved parser if I could, though.
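>>
>> Roughly, the flow I have in mind, sketched in Scala against the Java
>> API. The method names (withTunable, compileFile, onPath, save, reload)
>> are from memory and worth double-checking against the release in use,
>> and the tunable name is just a placeholder:
>>
>>   import java.io.File
>>   import java.nio.channels.FileChannel
>>   import java.nio.file.StandardOpenOption
>>   import org.apache.daffodil.japi.Daffodil
>>
>>   object SaveParserWithTunables {
>>     def main(args: Array[String]): Unit = {
>>       // Bake the tunables in at schema compile time.
>>       val compiler = Daffodil.compiler()
>>         .withTunable("someTunable", "8192") // placeholder tunable name
>>       val pf = compiler.compileFile(new File("schema.dfdl.xsd"))
>>       val dp = pf.onPath("/")
>>
>>       // The tunables travel with the serialized DataProcessor ...
>>       val out = FileChannel.open(new File("schema.bin").toPath,
>>         StandardOpenOption.CREATE, StandardOpenOption.WRITE)
>>       dp.save(out)
>>       out.close()
>>
>>       // ... so a product that only reloads saved parsers picks them up
>>       // without needing any tunables API of its own.
>>       val reloaded = Daffodil.compiler().reload(new File("schema.bin"))
>>     }
>>   }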
>>
>> I like the idea of a plugin-path annotation a bit less, just because
>> paths can be so different on different machines. It makes creating
>> easily distributed schemas that much more difficult. We already have
>> issues with schema imports, and those are just simple relative paths.
>> And some products might not even have a good way to add jars?
>>
>>
>> Maybe an alternative approach is to make our saved parser format more
>> complex. Instead of just being a serialized Java object, it could become
>> an archive that contains the serialized parser plus its dependencies,
>> very similar to how NiFi works with its nar files (maybe we call them
>> .dar files?). When Daffodil reloads this archive, it can extract and
>> deserialize the saved parser as well as put the jars on the classpath.
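>>
>> Something like this for the reload side, assuming a made-up layout of
>> lib/*.jar plus a parser.bin entry. Only the zip and classloader plumbing
>> is standard JDK; whether plugin discovery honors the context class
>> loader is something we'd have to make true:
>>
>>   import java.io.File
>>   import java.net.URLClassLoader
>>   import java.nio.file.Files
>>   import java.util.zip.ZipFile
>>   import scala.jdk.CollectionConverters._
>>   import org.apache.daffodil.japi.Daffodil
>>
>>   object DarReloader {
>>     def reload(dar: File) = {
>>       // Extract every entry of the .dar to a temp dir.
>>       val workDir = Files.createTempDirectory("dar").toFile
>>       val zip = new ZipFile(dar)
>>       zip.entries().asScala.filterNot(_.isDirectory).foreach { e =>
>>         val dest = new File(workDir, e.getName)
>>         dest.getParentFile.mkdirs()
>>         Files.copy(zip.getInputStream(e), dest.toPath)
>>       }
>>       zip.close()
>>
>>       // Put the bundled jars on a class loader so layer/UDF/charset
>>       // plugins can be found, then reload the saved parser.
>>       val jars = Option(new File(workDir, "lib").listFiles())
>>         .getOrElse(Array.empty[File]).map(_.toURI.toURL)
>>       val loader = new URLClassLoader(jars, getClass.getClassLoader)
>>       Thread.currentThread().setContextClassLoader(loader)
>>       Daffodil.compiler().reload(new File(workDir, "parser.bin"))
>>     }
>>   }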
>>
>> The archive could also contain a README and other docs.
>>
>> It could contain the text schemas and config files too. That way users
>> could extract the schemas and use them for normal schema validation with
>> other tools, or compile the extracted schemas with the tunables. We
>> could maybe even have a way to "compile" a dar that doesn't contain a
>> serialized parser, using the embedded schemas and configs.
>>
>> This does need some new tool to create the archive, though; an SBT
>> plugin could do it pretty easily, I think. Maybe this is the impetus to
>> finally create a Daffodil sbt plugin?
>>
>>
>> On 2023-07-21 12:17 PM, Mike Beckerle wrote:
>> > An issue has come up and I want to discuss a possible idea to help our
>> > users manage this.
>> >
>> > Products have embedded Daffodil.
>> >
>> > Those products do not provide means for setting Daffodil tunables.
>> >
>> > But we know some of the tunables are important to enable a schema to
>> > work at all, e.g., size limits on regex matches.
>> >
>> > A given schema may want to tighten these down, or open them up.
>> >
>> > But Daffodil is now embedded in products that do not provide a means
>> > to set the tunables. These products are regulated: changing them to
>> > upgrade Daffodil to a new version is relatively easy, but adding new
>> > features to them triggers a regulatory review.
>> >
>> > So here's the idea:
>> >
>> > Add a feature so that tunables can be set from inside the schema,
>> > e.g., a top-level annotation like
>> >
>> > <dfdlx:setTunables ..../>
>> >
>> > That would go at the top level of the schema.
>> >
>> > I also think that if you use a schema that requires a jar plugin
>> > (layering plugin, UDF, charset, or validator plugin), the jar file
>> > should be automatically loaded by Daffodil without the user having to
>> > worry about setting up classpaths properly before calling Daffodil via
>> > the API.
>> >
>> > I found on Stack Overflow that it is possible for a Java program to
>> > extend the classpath dynamically:
>> > https://stackoverflow.com/questions/271506/why-cant-system-setproperty-change-the-classpath-at-runtime/1198693#1198693
>> > and
>> > https://stackoverflow.com/questions/402330/is-it-possible-to-add-to-classpath-dynamically-in-java
>> > But Daffodil could also, given a plugin, scan its own "plugin-path"
>> > defined in the schema, and force-load the jars directly rather than
>> > letting the ordinary class loader / classpath machinery do it.
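>> >
>> > As a sketch of what I mean (the plugin-path input is a hypothetical
>> > feature; the rest is just standard URLClassLoader / context class
>> > loader mechanics that Daffodil could drive itself):
>> >
>> >   import java.io.File
>> >   import java.net.URLClassLoader
>> >
>> >   object PluginPathLoader {
>> >     // Build a class loader over the jars named by a schema-declared
>> >     // plugin-path (hypothetical), so Daffodil can discover
>> >     // layer/UDF/charset/validator plugins without the embedding
>> >     // product ever touching its classpath.
>> >     def loaderFor(pluginJars: Seq[File]): ClassLoader =
>> >       new URLClassLoader(
>> >         pluginJars.map(_.toURI.toURL).toArray,
>> >         Thread.currentThread().getContextClassLoader)
>> >
>> >     def main(args: Array[String]): Unit = {
>> >       val loader = loaderFor(args.toSeq.map(new File(_)))
>> >       // Making this the context class loader is one way to let
>> >       // ServiceLoader-style discovery see the plugin jars.
>> >       Thread.currentThread().setContextClassLoader(loader)
>> >     }
>> >   }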
>> >
>> > Taken together, these would let people add Daffodil to a product
>> > without having to add features for tuning, for specifying plug-ins,
>> > etc. The schema could specify all these things.
>> >
>> > Thoughts?
>> >
>> > Mike Beckerle
>> > Apache Daffodil PMC | daffodil.apache.org
>> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>> > Owl Cyber Defense | www.owlcyberdefense.com
>> >
>>
>>
