We need additional features in Daffodil.
The number of variations on data formats is likely endless, and while DFDL is quite comprehensive, there are still things it cannot express. Here is a way to extend Daffodil quite fundamentally, but from outside.

First we declare a DFDL extension like this:

    <daf:defineExtension point="extensionPointName" class="my.package.MyExtension"/>

So what is this extension point name? It is the name of a grammar Prod (production) defined in one of the daffodil.grammar mixins. It doesn't matter which mixin, so long as the name of the Prod is unique. Most of these Prod names are the names of grammar regions in the data syntax grammar of the DFDL spec. Prods are defined in the Daffodil code by this sort of thing:

    lazy val foo = prod('foo){ guard } { production rule }

This creates a Prod named "foo", and "foo" is therefore also the name of the extension point it creates.

The new role of a Prod is to look for an extension defined on its extension point. If it finds one (defined by daf:defineExtension as above), it dynamically loads the associated class by searching for it on the classpath. It then creates an instance of the class, passing the current DSOM object as an argument.

So what does this class do? The class implements the GrammarExtension trait, which means it has:

    def guard: Boolean
    def gram: Gram

The developer of the extension can then access the DSOM object to implement the guard and the gram. Note that a Gram implements:

    def parser: Parser
    def unparser: Unparser

So implementing an extension requires implementing GrammarExtension, and most likely new Parser and Unparser classes as well.

The Prod, having found the extension, invokes the extension object's guard() method. If it returns false, the extension is not used, and the Prod does exactly what it does today. If it returns true, the extension is used and overrides the existing production.
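The lookup-and-delegate flow described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the types below are simplified stand-ins, not Daffodil's real classes, and ExtensionLoader, Prod.resolve, DSOMObject, and SchemaDefinitionError are hypothetical names introduced for this sketch. It also assumes the extension class has a public constructor taking a single DSOM object.

```scala
import scala.util.control.NonFatal

// Simplified stand-ins for illustration only; these are NOT Daffodil's real classes.
trait Parser
trait Unparser
trait Gram {
  def parser: Parser
  def unparser: Unparser
}
trait DSOMObject // placeholder for the schema component passed to the extension

// The trait an extension class would implement, as described in the text.
trait GrammarExtension {
  def guard: Boolean
  def gram: Gram
}

// Hypothetical error type standing in for a Schema Definition Error (SDE).
final class SchemaDefinitionError(msg: String, cause: Throwable = null)
  extends Exception(msg, cause)

object ExtensionLoader {
  // Loads the class named in daf:defineExtension from the classpath and
  // instantiates it reflectively with the current DSOM object.
  // Assumes a public constructor taking exactly one DSOMObject parameter.
  def load(className: String, context: DSOMObject): GrammarExtension =
    Class.forName(className)
      .getConstructor(classOf[DSOMObject])
      .newInstance(context)
      .asInstanceOf[GrammarExtension]
}

object Prod {
  // Chooses the Gram for the extension point `name`: the extension's gram if
  // an extension is defined and its guard is true, otherwise the built-in
  // production (passed by name, so it is only evaluated when needed).
  def resolve(name: String, extension: Option[GrammarExtension])(builtIn: => Gram): Gram =
    extension match {
      case None => builtIn
      case Some(ext) =>
        val useExtension =
          try ext.guard
          catch {
            // An SDE raised deliberately by the guard passes through unchanged.
            case sde: SchemaDefinitionError => throw sde
            // Any other failure is reported as an SDE by the Prod itself.
            case NonFatal(e) =>
              throw new SchemaDefinitionError(
                s"Extension on '$name' failed: ${e.getMessage}", e)
          }
        if (useExtension) ext.gram else builtIn
    }
}
```

In this sketch a Prod would call something like Prod.resolve("foo", maybeExtension)(existingProduction), so with no extension defined, or with a false guard, behavior is unchanged from today.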
The guard could also call SDE (Schema Definition Error) to indicate compile-time errors, and if it throws unexpectedly, the Prod would catch this and issue an SDE itself saying that the extension failed. If the guard is true, then the Prod would implement its own gram method by delegating to the gram method of the extension instance. Presumably, the Gram created by the extension is going to lay down different new primitives and/or combinators, or detect SDEs. If there isn't an SDE, then those primitives and combinators would implement parser() and unparser(), which would return instances of Parser and Unparser (presumably also newly implemented as part of the extension).

That's it. That's the entire mechanism. Extensions would be able to override almost everything about Daffodil's implementation by this means.

Let's look at a simple example. It turns out there is a format that interprets the bits of binary integers in reverse order. That is, 0x53 (decimal 83) would not be interpreted as such, but with its bits reversed. If the integer is 8 bits wide, 0x53 (binary 01010011) becomes binary 11001010, which is 0xCA, decimal 202.

How would this work as an extension? Let's just consider parsing. We need to hook into where binary integers are created. In the grammar that's the binaryValue extension point. (It is not a Prod today, but would be changed into one.) When the binaryValue Prod is evaluated, the extension would be found. The guard() method of the extension would look at the DSOM object and examine the XML, perhaps looking up properties. Let's assume it finds a "property" foo:binaryBitOrder="reversed". (Detail: the extension API has to allow looking up these non-DFDL properties using DFDL's scoping rules.) Finding that foo:binaryBitOrder is specified and is "reversed" rather than "normal" (any other value would be an SDE), guard() would return true.

The gram() method would be as follows. Assume base is the DSOM object being extended here. We know that's an ElementBase in this case, since we're dealing with a simple value.
    lazy val gram = {
      val elem = base.asInstanceOf[ElementBase]
      new ReverseBitsCombinator(elem, elem.binaryIntegerValue)
    }

The ReverseBitsCombinator would provide:

    class ReverseBitsCombinator(base: Term, originalBitsGram: Gram) extends Gram {
      def parser = new ReverseBitsParser(base.runtimeData, originalBitsGram.parser)
    }

and similarly for unparser. The ReverseBitsParser would determine the length in bits and, after invoking the child parser, would modify the infoset value per the bit-reversal computation. There are numerous details to work out, but this could be made to work.

More advanced extensions, such as the base64 or compress/decompress features, would also be implemented by combinators. Their parsers/unparsers would be more advanced: they would build a wrapper around the I/O layer's DataInputStream and DataOutputStream, use the wrapped streams for the duration of the decoding, and revert to the original stream afterward. This seems feasible.

So, some initial criticism. If someone can write such an extension, well, hey, Daffodil is open source, and they could just build it into Daffodil, which is to say that the skill level needed is pretty much the same. They could add the feature, issue a pull request, etc. But the extension mechanism would let them build it without waiting for a Daffodil release to incorporate it. And there is much less to learn if you can start from an example extension and just copy the pattern.

Another criticism is that this exposes almost every aspect of the internals of Daffodil. Very little is hidden. I think that's OK, assuming we do some cleanup: making methods private, final, etc. where we can. We would probably want to build proxy traits so that the extensions API could remain stable even if we change the internals somewhat.

I'm thinking of trying this idea out on the base64 feature. Thoughts?
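
As a footnote to the reversed-bits example: the computation that a ReverseBitsParser would apply to the parsed value could look roughly like the sketch below. BitReversal and reverseBits are hypothetical names introduced here, not part of Daffodil, and the sketch assumes the value fits in a Long and is treated as an unsigned field of bitLength bits.

```scala
object BitReversal {
  // Reverses the low `bitLength` bits of `value`. Any bits above `bitLength`
  // are ignored. For bitLength == 64 the result may be a negative Long,
  // since Long is signed.
  def reverseBits(value: Long, bitLength: Int): Long = {
    require(bitLength >= 1 && bitLength <= 64, s"bitLength out of range: $bitLength")
    // java.lang.Long.reverse mirrors all 64 bits, which moves the low
    // bitLength bits to the top; the unsigned shift brings them back down.
    java.lang.Long.reverse(value) >>> (64 - bitLength)
  }
}
```

With an 8-bit width, BitReversal.reverseBits(0x53L, 8) gives 0xCA (decimal 202), matching the example above, and applying it twice returns the original value, which is what the unparser direction would rely on.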