Yeah, so I'm thinking of turning this idea into a tutorial overview of how to add a feature to Daffodil.
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, November 13, 2017 8:34:59 AM
To: dev@daffodil.apache.org; Mike Beckerle
Subject: Re: Extensibility for Daffodil - Concept/Idea for commentary

This is very interesting, but your criticisms were exactly what I was
thinking as I read through this idea.

Extensibility is really nice, but it could make future updates much more
difficult, since we now have to worry about not breaking existing
extensions. Backwards compatibility becomes much harder to maintain. Prod
names can't easily change or be moved/removed, DSOM can't easily be changed,
etc. The trait idea helps with that, but I think there's still a lot of
effort required to figure out what needs to be public and what doesn't.

I would much rather have people just do pull requests, have a proper review,
and incorporate the code into Daffodil so that others can use it and have
confidence in its quality. Creating documentation on how to add a new
prod/extension would, I think, be more valuable and would also help attract
more developers.

I'd also be concerned that any really interesting extensions would need to be
implemented as multiple prods, or need to deal with the I/O layer, which
would just add one more level of complexity to backwards compatibility and
opening things up.

It seems the biggest argument for this is that it makes it easier to add
functionality without having to wait for a new release, but we should follow
the "release early, release often" philosophy, in which case that's not as
big of an issue.

This just sounds like a maintainability nightmare to me.

- Steve

On 11/10/2017 01:42 PM, Mike Beckerle wrote:
> We need additional features in Daffodil.
>
> The number of variations on data formats is likely endless, and while DFDL
> is quite comprehensive there are still things it cannot express.
>
> Here's a way to extend Daffodil quite fundamentally, but from outside.
>
> First we declare a DFDL extension like this:
>
>     <daf:defineExtension point="extensionPointName"
>         class="my.package.MyExtension"/>
>
> So what is this extension point name?
>
> This is the name of a grammar Prod (production) in the daffodil.grammar
> mixins. It doesn't matter which mixin the Prod appears on, so long as the
> name of the Prod is unique.
>
> Most of these Prod names are the names of grammar regions in the DFDL spec
> data syntax grammar.
>
> Prods are defined by this sort of thing in the Daffodil code:
>
>     lazy val foo = prod('foo){ guard } { production rule }
>
> A Prod with name "foo" is created by this.
>
> That name "foo" is the extension point name this creates.
>
> The role of Prod is to look for a defined extension on its extension point.
> If it finds one (defined by the above daf:defineExtension), then it
> dynamically loads the associated class by searching for it on the
> classpath.
>
> It then creates an instance of the class, passing the current DSOM object
> as an argument.
>
> So what does this class do?
>
> The class implements the GrammarExtension trait, which means it has:
>
>     def guard: Boolean
>     def gram: Gram
>
> The developer of this extension can then access the DSOM object to
> implement the guard and the gram.
>
> Note that a Gram implements
>
>     def parser: Parser
>     def unparser: Unparser
>
> So implementing an extension requires that one implement this
> GrammarExtension, but also most likely new Parser and Unparser classes.
>
> The Prod, having found the extension, invokes the extension object's
> guard() method. If false, the extension will not be used, and the Prod does
> exactly what it does today. If true, the extension will be used, and
> overrides the existing production. The guard could also call SDE to
> indicate compile-time errors, and if it throws unexpectedly, the Prod would
> catch this and issue an SDE itself saying that the extension failed.
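
The lookup-then-guard flow described above can be sketched in a few lines of
Scala. This is only an illustration under stated assumptions: Parser, Gram,
Dsom, the registry map, and ReversedOrderExtension are hypothetical stand-ins,
not Daffodil's real classes. A plain map replaces the classpath loading that
daf:defineExtension would trigger, and an IllegalStateException stands in for
an SDE.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-ins; none of these are Daffodil's real classes.
trait Parser { def parse(input: String): String }
trait Gram { def parser: Parser }

// Shape of the trait described in the proposal (unparser omitted for brevity).
trait GrammarExtension {
  def guard: Boolean
  def gram: Gram
}

// Stand-in for a DSOM object: just a bag of properties.
final case class Dsom(properties: Map[String, String])

object ExtensionSketch {
  // Extension point name -> factory. The proposal would instead load the
  // class named in daf:defineExtension from the classpath.
  type Factory = Dsom => GrammarExtension

  // A trivial Gram whose parser tags its input, so the caller can see
  // which production was chosen.
  def tagged(tag: String): Gram = new Gram {
    def parser: Parser = new Parser {
      def parse(input: String): String = s"$tag:$input"
    }
  }

  // Returns the extension's Gram when one is registered and its guard is
  // true; otherwise returns the built-in Gram. A guard that throws is
  // reported as a failure of the extension.
  def prod(name: String, dsom: Dsom, registry: Map[String, Factory])(builtIn: => Gram): Gram =
    registry.get(name) match {
      case None => builtIn
      case Some(factory) =>
        val ext = factory(dsom)
        val enabled =
          try ext.guard
          catch {
            case NonFatal(e) =>
              throw new IllegalStateException(s"extension on '$name' failed", e)
          }
        if (enabled) ext.gram else builtIn
    }
}

// Hypothetical extension: active only when the property says "reversed".
final class ReversedOrderExtension(dsom: Dsom) extends GrammarExtension {
  def guard: Boolean = dsom.properties.get("foo:binaryBitOrder").contains("reversed")
  def gram: Gram = ExtensionSketch.tagged("extension")
}
```

With a registry that maps "binaryValue" to ReversedOrderExtension, prod returns
the extension's Gram only for a Dsom carrying foo:binaryBitOrder="reversed",
and falls back to the built-in Gram in every other case.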
>
> If the guard is true, then the Prod would implement its own gram method by
> delegating to the gram method of the extension instance.
>
> Presumably, the Gram that is created by the extension is going to lay down
> different new primitives and/or combinators, or detect SDEs. If there isn't
> an SDE, then those primitives and combinators would then implement parser()
> and unparser(), which would return instances (presumably also newly
> implemented as part of the extension) of Parser and Unparser.
>
> That's it. That's the entire mechanism.
>
> Extensions would be able to override almost everything about Daffodil's
> implementation by this means.
>
> Let's look at a simple example. It turns out there is a format that
> interprets the bits of binary integers reversed. That is, 0x53 (decimal 83)
> would not be interpreted as such, but by reversing the bits. If 8 bits
> wide, then this would be 0xCA, which is decimal 202.
>
> As an extension...? Let's just consider parsing. We need to hook into where
> binary integers are created. In the grammar that's the binaryValue
> extension point.
>
> (Not a Prod today, but would be changed into one.)
>
> When the binaryValue Prod is executed, the extension would be detected and
> found.
>
> The guard() method of the extension would look at the DSOM object and
> examine the XML, perhaps look up properties. Let's assume it finds a
> "property" foo:binaryBitOrder="reversed". (Detail: the extension API has to
> allow looking up these non-DFDL properties using DFDL's scoping rules.)
>
> Finding that foo:binaryBitOrder is specified and is "reversed" rather than
> "normal" (any other value would be an SDE), guard() would return true.
>
> The gram() method would be this:
>
> Assume base is the DSOM object being extended here. We know that's an
> ElementBase in this case, since we're dealing with a simple value.
>
>     lazy val gram =
>       new ReverseBitsCombinator(this,
>         base.asInstanceOf[ElementBase].binaryIntegerValue)
>
> The ReverseBitsCombinator would provide
>
>     class ReverseBitsCombinator(base: Term, originalBitsGram: Gram)
>       extends Gram {
>
>       def parser() = new ReverseBitsParser(base.runtimeData,
>         originalBitsGram.parser)
>     }
>
> and similarly for unparser.
>
> The ReverseBitsParser would determine the length in bits, and after
> invoking the child parser it would modify the infoset value per the
> bit-reversal computation.
>
> There are numerous details to work out, but this could be made to work.
>
> More advanced extensions, such as the base64 or compress/decompress
> features, would be implemented by combinators. The parsers/unparsers would
> be more advanced - these would involve building a wrapper around the I/O
> layer DataInputStream and DataOutputStream, and the parser/unparser would
> use the wrapped streams for the duration of the decoding, then revert to
> the original stream afterward. This seems feasible.
>
> So.....
>
> As initial criticism - if someone can write such an extension, well, hey,
> Daffodil is open source, and they could just build it into Daffodil - which
> is to say that the skill level needed to do that is pretty much the same.
> They could add the feature, issue a pull request, etc.
>
> But the extension mechanism would let them build it without waiting for a
> Daffodil release to incorporate it. And there is much less to learn if you
> can start from an example extension and just copy the pattern.
>
> Another criticism is that this exposes almost every aspect of the internals
> of Daffodil. Very little is hidden.
>
> I think that's ok - assuming we do some cleanup - making methods private,
> final, etc. wherever they can be.
>
> We would probably want to build proxy traits so that the extensions API
> could remain stable even if we change the internals somewhat.
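
One way to read the proxy-trait idea is sketched below. Everything here is an
assumption made for illustration: InternalElement, ElementView, and
ElementViews are invented names, not Daffodil classes. The point is only that
extensions would compile against a small stable trait, while a single adapter
absorbs changes to the internals.

```scala
// Hypothetical internal class; its member names may change between releases.
final class InternalElement(val nameV2: String, val props: Map[String, String])

// Stable, deliberately small view that extensions would compile against.
trait ElementView {
  def name: String
  def findProperty(qname: String): Option[String]
}

object ElementViews {
  // The only place that touches the internals. A rename inside
  // InternalElement is absorbed here and extensions keep compiling.
  def of(e: InternalElement): ElementView = new ElementView {
    def name: String = e.nameV2
    def findProperty(qname: String): Option[String] = e.props.get(qname)
  }
}
```

The cost Steve points out still applies: someone has to decide what belongs in
the stable trait, and every member added to it becomes a compatibility promise.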
>
> I'm thinking of trying this idea out on the base64 feature.
>
> Thoughts?
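
The bit-reversal arithmetic in the quoted example (0x53 read as 0xCA when the
field is 8 bits wide) can be checked with a small Scala function. This is a
minimal sketch, not the proposed ReverseBitsParser: BitReversal and reverseBits
are hypothetical names, and the only library call is the JDK's
java.lang.Long.reverse.

```scala
object BitReversal {
  // Reverses the low `width` bits of `value`; bits above `width` are ignored.
  def reverseBits(value: Long, width: Int): Long = {
    require(width >= 1 && width <= 64, "width must be between 1 and 64")
    // Long.reverse flips all 64 bits, which leaves the wanted bits at the
    // top of the word; the unsigned shift moves them back to the low end.
    java.lang.Long.reverse(value) >>> (64 - width)
  }
}

// 0x53 is 01010011; reversed over 8 bits it is 11001010, i.e. 0xCA (202).
val reversed = BitReversal.reverseBits(0x53L, 8)
```

Reversing twice at the same width gives back the original value, which is why
the unparser can reuse the same computation.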