We need additional features in Daffodil.

The number of variations on data formats is likely endless, and while DFDL is 
quite comprehensive there are still things it cannot express.


Here's a way to extend Daffodil quite fundamentally, but from outside.


First we declare a DFDL extension like this:


<daf:defineExtension point="extensionPointName" class="my.package.MyExtension"/>


So what is this extension point name?


This is the name of a grammar Prod (production) defined in one of the 
daffodil.grammar mixins. It doesn't matter which mixin, so long as the name 
of the Prod is unique.


Most of these Prod names are the names of grammar regions in the DFDL spec data 
syntax grammar.


Prods are defined by this sort of thing in the Daffodil code:


lazy val foo = prod('foo){ guard } { production rule }


A Prod with name "foo" is created by this.


That name "foo" is the extension point name this creates.


The role of Prod is to look for a defined extension on its extension point. If 
it finds one (defined by a daf:defineExtension declaration like the one above), 
then it dynamically loads the associated class by searching for it on the 
classpath.

It then creates an instance of the class, passing the current DSOM object as an 
argument.
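
As a rough sketch of that loading step, assuming the extension class has a 
public one-argument constructor taking the DSOM object: DsomObject, Gram and 
ExtensionLoader below are stand-in names I made up for illustration, not 
existing Daffodil types.

```scala
// Stand-ins for the real DSOM and grammar types (hypothetical names).
trait DsomObject
trait Gram
trait GrammarExtension { def guard: Boolean; def gram: Gram }

object ExtensionLoader {
  // Find the class named in daf:defineExtension on the classpath and
  // call its one-argument constructor with the current DSOM object.
  // Assumes that constructor exists and is public.
  def load(className: String, dsom: DsomObject): GrammarExtension = {
    val cls = Class.forName(className)
    val ctor = cls.getConstructor(classOf[DsomObject])
    ctor.newInstance(dsom).asInstanceOf[GrammarExtension]
  }
}
```

A missing class or constructor surfaces here as a reflection exception, which 
the Prod would presumably want to turn into an SDE.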


So what does this class do?


The class implements the GrammarExtension trait, which means it has:


def guard: Boolean

def gram: Gram


The developer of this extension then can access the DSOM object to implement 
the guard, and the gram.

Note that a Gram implements


def parser: Parser

def unparser: Unparser


So implementing an extension requires that one implement this GrammarExtension, 
but also most likely new Parser and Unparser classes.


The Prod, having found the extension, invokes the extension object's guard 
method. If it returns false, the extension is not used, and the Prod does 
exactly what it does today. If it returns true, the extension is used and 
overrides the existing production. The guard could also call SDE to indicate 
compile-time errors, and if it throws unexpectedly, the Prod would catch this 
and issue an SDE itself stating that the extension failed.


If the guard is true, then the Prod would implement its own gram method by 
delegating to the gram method of the extension instance.
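
To make the guard and delegation logic concrete, here is a minimal sketch. 
All of the types are simplified stand-ins (the real Prod, Gram and SDE 
machinery look different), and passing the registered extension in as an 
Option is just my assumption about how the lookup result would be handed over.

```scala
import scala.util.control.NonFatal

// Simplified stand-ins for Daffodil's real types (hypothetical).
trait Gram
final case class NamedGram(name: String) extends Gram
trait GrammarExtension { def guard: Boolean; def gram: Gram }
final class SchemaDefinitionError(msg: String, cause: Throwable = null)
  extends Exception(msg, cause)

// `extension` is whatever was registered for this extension point
// (None if nothing was); `builtIn` is the production rule used today.
final class Prod(name: String, extension: Option[GrammarExtension], builtIn: => Gram) {
  lazy val gram: Gram = extension match {
    case Some(ext) if guardOf(ext) => ext.gram // extension overrides the production
    case _                         => builtIn  // no extension, or guard was false
  }

  private def guardOf(ext: GrammarExtension): Boolean =
    try ext.guard
    catch {
      // An SDE raised deliberately by the guard passes through unchanged.
      case sde: SchemaDefinitionError => throw sde
      // Anything else unexpected is reported as an SDE by the Prod itself.
      case NonFatal(e) =>
        throw new SchemaDefinitionError(s"Extension on '$name' failed: $e", e)
    }
}
```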


Presumably, the Gram that is created by the extension is going to lay down 
different new primitives and/or combinators, or detect SDEs. If there isn't an 
SDE, then those primitives and combinators would then implement parser() and 
unparser() which would return instances (presumably also newly implemented as 
part of the extension) of Parser and Unparser.


That's it. That's the entire mechanism.


Extensions would be able to override almost everything about Daffodil's 
implementation by this means.


Let's look at a simple example. It turns out there is a format which 
interprets the bits of binary integers in reverse order. That is, 0x53 
(decimal 83) would not be interpreted as such, but with its bits reversed. If 
the field is 8 bits wide, the value would be 0xCA, which is decimal 202.
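
The bit-reversal computation itself is small. A sketch, with reverseBits being 
a name chosen here for illustration:

```scala
object BitReversal {
  // Reverse the low `width` bits of `value` (1 <= width <= 64).
  def reverseBits(value: Long, width: Int): Long = {
    require(width >= 1 && width <= 64, "width must be between 1 and 64")
    // Long.reverse flips all 64 bits, so the field ends up in the top
    // `width` bits; the unsigned shift brings it back down and drops
    // anything that was above the field.
    java.lang.Long.reverse(value) >>> (64 - width)
  }
}
```

With this, BitReversal.reverseBits(0x53L, 8) gives 0xCA: 0101 0011 read 
backwards is 1100 1010.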


How would this work as an extension? Let's just consider parsing. We need to 
hook into where binary integers are created. In the grammar that's the 
binaryValue extension point.


(Not a Prod today, but would be changed into one.)


When the binaryValue Prod is executed, the extension would be found.


The guard() method of the extension would look at the DSOM object and examine 
the XML, perhaps looking up properties. Let's assume it finds a "property" 
foo:binaryBitOrder="reversed". (Detail: the extension API has to allow looking 
up these non-DFDL properties using DFDL's scoping rules.)


Finding that foo:binaryBitOrder is specified and is "reversed" rather than 
"normal", the guard() would return true. Any other value would be an SDE.

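A sketch of that guard decision, with the scoped property lookup left out: 
the looked-up value is simply passed in as an Option. Treating an unspecified 
property the same as "normal" is my assumption, and SchemaDefinitionError is a 
stand-in for the real SDE mechanism.

```scala
// Stand-in for Daffodil's real SDE mechanism (hypothetical).
final class SchemaDefinitionError(msg: String) extends Exception(msg)

object BinaryBitOrderGuard {
  // `prop` is the value of foo:binaryBitOrder found by scoped lookup,
  // or None if the schema does not specify it.
  def guard(prop: Option[String]): Boolean = prop match {
    case Some("reversed")      => true  // use the extension
    case Some("normal") | None => false // assumption: fall back to built-in behavior
    case Some(other) =>
      throw new SchemaDefinitionError(
        s"foo:binaryBitOrder must be 'normal' or 'reversed', but was '$other'")
  }
}
```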

The gram() method would be this:


Assume base is the DSOM object being extended here. We know that's an 
ElementBase in this case, since we're dealing with a simple value.


lazy val gram = {
   // The combinator wants the Term being extended, not the extension itself.
   val elem = base.asInstanceOf[ElementBase]
   new ReverseBitsCombinator(elem, elem.binaryIntegerValue)
}


The ReverseBitsCombinator would provide


   class ReverseBitsCombinator(base: Term, originalBitsGram: Gram) extends Gram {

     // Wrap the parser of the original gram so its result can be bit-reversed.
     def parser = new ReverseBitsParser(base.runtimeData,
                                        originalBitsGram.parser)

   }


and similarly for unparser.


The ReverseBitsParser would determine the length in bits, and after invoking 
the child parser it would modify the infoset value per the bit-reversal 
computation.
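
Roughly, the wrapping parser could look like the sketch below. It is heavily 
simplified: ParseState and the one-method Parser trait are stand-ins for the 
real runtime classes, the infoset value is modeled as a plain Long, and the 
length in bits is passed in directly rather than taken from runtime data.

```scala
// Simplified stand-ins for the real runtime types (hypothetical).
final class ParseState(var value: Long) // holds the current infoset value
trait Parser { def parse(state: ParseState): Unit }

final class ReverseBitsParser(lengthInBits: Int, child: Parser) extends Parser {
  require(lengthInBits >= 1 && lengthInBits <= 64)

  def parse(state: ParseState): Unit = {
    // Let the original binary integer parser produce the value first.
    child.parse(state)
    // Then replace it with the bit-reversed value of the same width.
    state.value = java.lang.Long.reverse(state.value) >>> (64 - lengthInBits)
  }
}
```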


There are numerous details to work out, but this could be made to work.


More advanced extensions, such as the base64 or compress/decompress features, 
would also be implemented by combinators. The parsers/unparsers would be more 
advanced: they would build a wrapper around the I/O layer's DataInputStream 
and DataOutputStream, use the wrapped streams for the duration of the 
decoding, and then revert to the original stream afterward. This seems 
feasible.
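
For the base64 case, the wrap-then-revert pattern might look like this sketch. 
It uses plain java.io streams and java.util.Base64 rather than Daffodil's own 
DataInputStream, ParseState and Parser are stand-ins, and it ignores the 
question of where the encoded region ends.

```scala
import java.io.InputStream
import java.util.Base64

// Simplified stand-ins for the real runtime types (hypothetical).
final class ParseState(var in: InputStream)
trait Parser { def parse(state: ParseState): Unit }

final class Base64LayerParser(child: Parser) extends Parser {
  def parse(state: ParseState): Unit = {
    val original = state.in
    // The child parser sees decoded bytes for the duration of the layer.
    state.in = Base64.getDecoder.wrap(original)
    try child.parse(state)
    finally state.in = original // always revert to the original stream
  }
}
```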


So.....


As an initial criticism: if someone can write such an extension, well, hey, 
Daffodil is open source, and they could just build it into Daffodil. The 
skill level needed is pretty much the same. They could add the feature, issue 
a pull request, etc.


But the extension mechanism would let them build it without waiting for a 
Daffodil release to incorporate it. And there is much less to learn if you can 
start from an example extension and just copy the pattern.


Another criticism is that this exposes almost every aspect of the internals of 
Daffodil. Very little is hidden.

I think that's ok, assuming we do some cleanup first: making methods private 
or final wherever they can be, etc.


We would probably want to build proxy traits so that the extensions API could 
remain stable even if we change the internals somewhat.


I'm thinking of trying this idea out on the base64 feature.


Thoughts?


