I have been looking at the code branch Julian F created (while we were at 
ApacheCon NA 2019) based on modifying the BinaryIntegerKnownLengthParser class 
to have a code-generator.

Previously I thought implementing this on the parser/unparser classes was ok, 
but having refreshed my knowledge of Daffodil internals in the grammar and 
parser/unparsers, I think the approach needs to evolve.

The BinaryIntegerKnownLengthParser already is a somewhat specialized parser. It 
is selected by the schema compiler based on
(a) binary (twos complement) integer
(b) length is a known constant

What we want is for the compiler to select parsers that are further specialized 
for:
(a) signed/unsigned
(b) known bitOrder unchanged from prior element
(c) known byteOrder (not an expression) unchanged from prior element
(d) alignment known to be 8-bit aligned
(e) length known to be 8, 16, 32, or 64 (signed) only (or at least, a multiple
of 8 bits)

So the reduction in "interpretation" overhead we seek here is moving all the 
conditionals related to these (a) to (e) to compile time from run time.

That's my first cut at everything Daffodil must prove in its compiler about the 
format in order to achieve the same performance as hand-written code that makes 
all these same assumptions.
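
To make the difference concrete, here is a rough Java sketch. The class and
method names are made up for illustration and are not actual Daffodil classes.
The first parser decides everything on every call; the second is what you get
once (a) to (e) have been proven at compile time.

```java
import java.math.BigInteger;

// Hypothetical sketch; none of these names are Daffodil APIs.
final class GenericBinaryInt {
    // "Interpreted" style: every property is examined on every call.
    static BigInteger parse(byte[] data, int bitPos, int lengthInBits,
                            boolean signed, boolean bigEndian) {
        if (bitPos % 8 != 0 || lengthInBits % 8 != 0) {
            throw new UnsupportedOperationException("unaligned case omitted from this sketch");
        }
        int start = bitPos / 8;
        int n = lengthInBits / 8;
        byte[] be = new byte[n];
        for (int i = 0; i < n; i++) {
            // Normalize to big-endian order so BigInteger can consume it.
            be[i] = bigEndian ? data[start + i] : data[start + n - 1 - i];
        }
        return signed ? new BigInteger(be) : new BigInteger(1, be);
    }
}

final class SignedInt32BigEndianAligned {
    // Specialized style: signedness, byte order, alignment, and length are
    // all fixed, so no conditionals remain at run time.
    static int parse(byte[] data, int bytePos) {
        return ((data[bytePos] & 0xFF) << 24)
             | ((data[bytePos + 1] & 0xFF) << 16)
             | ((data[bytePos + 2] & 0xFF) << 8)
             |  (data[bytePos + 3] & 0xFF);
    }
}
```

Both return the same value for an aligned, signed, big-endian 32-bit field; the
second simply has nothing left to decide.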

Then the runtime library has to be factored such that given this information 
you can generate calls to primitive parsers that actually are specialized on 
these things and so avoid overhead. The daffodil I/O library currently doesn't 
have these operations called out.

None of the above requires code-generation for a java implementation.
It's just about enabling the compiler to select more specialized parse/unparse 
primitive operations.
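
A minimal sketch of what such a selection step might look like, again in Java
with hypothetical names (StaticFacts, IntParser, and ParserSelector are
assumptions for illustration, not existing Daffodil types). The selection runs
once per element when the schema is compiled; the chosen primitive then runs
without conditionals.

```java
import java.util.Optional;

// Hypothetical sketch; these names are illustrative, not Daffodil's.
interface IntParser {
    long parse(byte[] data, int bytePos);
}

// What the schema compiler has proven statically about one element.
final class StaticFacts {
    final boolean signed;
    final boolean bigEndian;
    final boolean byteAligned;
    final int lengthInBits;

    StaticFacts(boolean signed, boolean bigEndian, boolean byteAligned, int lengthInBits) {
        this.signed = signed;
        this.bigEndian = bigEndian;
        this.byteAligned = byteAligned;
        this.lengthInBits = lengthInBits;
    }
}

final class ParserSelector {
    // Specialized primitives: nothing is decided inside them.
    static final IntParser S16_BE =
        (d, p) -> (short) (((d[p] & 0xFF) << 8) | (d[p + 1] & 0xFF));
    static final IntParser U16_BE =
        (d, p) -> ((d[p] & 0xFF) << 8) | (d[p + 1] & 0xFF);

    // Runs once per element at schema compile time, not per parse call.
    // An empty result means "fall back to the general-purpose parser".
    static Optional<IntParser> select(StaticFacts f) {
        if (f.byteAligned && f.bigEndian && f.lengthInBits == 16) {
            return Optional.of(f.signed ? S16_BE : U16_BE);
        }
        return Optional.empty();
    }
}
```

Only the 16-bit big-endian cases are shown; the other widths and byte orders
would follow the same pattern.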

The reason to generate separate code is really more about:

1) reducing the footprint for all the primitives and runtime aspects that are 
unused by a given format. This is more like an issue of selective linking.

2) populating different non-generic infoset slots corresponding to named 
elements (e.g., pojo data members) without using reflection. This requires 
generating code that literally contains assignments to object members. This 
requires inline code generation so that an assignment can be ordinary 
non-reflective code.
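
For illustration, here is a small Java sketch contrasting the two styles. The
Header class and its two fields are invented for the example; they stand in for
whatever POJO would correspond to a schema's named elements.

```java
import java.lang.reflect.Field;

// Hypothetical POJO for a schema with elements "version" and "length".
final class Header {
    public int version;
    public int length;
}

// Generic style: the slot is located by name at run time via reflection.
final class ReflectiveInfosetWriter {
    static void set(Object target, String elementName, int value) {
        try {
            Field f = target.getClass().getField(elementName);
            f.setInt(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no such slot: " + elementName, e);
        }
    }
}

// What generated code might look like: ordinary assignments to members,
// with the element names baked in when the code is generated.
final class GeneratedHeaderParser {
    static Header parse(byte[] data) {
        Header h = new Header();
        h.version = data[0] & 0xFF;                              // 8-bit unsigned
        h.length = ((data[1] & 0xFF) << 8) | (data[2] & 0xFF);   // 16-bit unsigned, big-endian
        return h;
    }
}
```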

Expression evaluation is something further we need to consider. E.g., if the 
length of something is to be computed, doing that in generated code requires 
that we compile DPath expressions into the generated code language.
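
As a toy illustration of what that compilation step involves, the Java sketch
below turns a tiny expression tree into a Java source fragment. The tree
classes and the "msg." receiver are assumptions made up for this example; this
is not Daffodil's DPath implementation, and real DPath has far more to handle
(types, paths, functions).

```java
// Hypothetical sketch: a tiny expression tree standing in for a compiled
// DPath expression such as { ../payloadLength * 8 }.
interface Expr {
    String toJava();   // emit a Java source fragment for this expression
}

final class Literal implements Expr {
    final long value;
    Literal(long value) { this.value = value; }
    public String toJava() { return value + "L"; }
}

final class ElementRef implements Expr {
    final String fieldName;   // POJO member that holds the referenced element
    ElementRef(String fieldName) { this.fieldName = fieldName; }
    public String toJava() { return "msg." + fieldName; }
}

final class Multiply implements Expr {
    final Expr left;
    final Expr right;
    Multiply(Expr left, Expr right) { this.left = left; this.right = right; }
    public String toJava() { return "(" + left.toJava() + " * " + right.toJava() + ")"; }
}

final class LengthExprCompiler {
    // Emits one statement of generated code that computes a length.
    static String emitLengthStatement(Expr lengthInBits) {
        return "long lengthInBits = " + lengthInBits.toJava() + ";";
    }
}
```

For the expression above, the emitted statement reads the payloadLength member
of the generated object and multiplies it by 8 in plain Java, with no
expression interpreter involved at run time.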

As a next step, I plan to write up a design note on the wiki and get some
feedback on it to solidify the requirements and approach. It is definitely time
we considered all angles on this code-generation notion, since numerous people
have expressed interest in this means of using Daffodil.





________________________________
From: Julian Feinauer <j.feina...@pragmaticminds.de>
Sent: Wednesday, September 11, 2019 1:30 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: [GENERATION] Code generation with Daffodil

Hi guys,

I just had a discussion yesterday with Mike and Steve and we already had 
several discussions before in the PLC4X project.
We like Daffodil, but the “Interpreter” runtime it currently is does not fit 
our needs, mainly for two reasons: performance and interoperability.
So ideally, I would like to have a piece of code which takes a DFDL schema file 
and generates code which specifically parses the given schema, probably into a 
given output form. Ideally in multiple languages.

As it's not (yet) Christmas, I guess I will not get that for free, so I played 
around a bit with the code and tried to understand it as well as possible. To 
me it seems that it is not as infeasible as I initially thought (I already 
checked some months ago).
In fact, if I get it right, the key would be to add another method `translate: 
AstNode` to the `Parser` trait.
This should then generate an AST (sub-)node which represents all the actions 
that would be performed in the regular `parse` method.
Then we could finally try to translate this AST to code and dump it to a file 
(I guess this is the rather easy part).
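
A rough sketch of the idea, written in Java rather than Scala. Everything here 
is hypothetical: the Parser interface is a drastically simplified stand-in for 
Daffodil's trait, and AstNode, Assign, and Block are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical AST for generated code; emit() renders Java source text.
interface AstNode {
    String emit();
}

final class Assign implements AstNode {
    final String target;
    final String exprSource;
    Assign(String target, String exprSource) { this.target = target; this.exprSource = exprSource; }
    public String emit() { return "msg." + target + " = " + exprSource + ";"; }
}

final class Block implements AstNode {
    final List<AstNode> statements;
    Block(List<AstNode> statements) { this.statements = statements; }
    public String emit() {
        List<String> lines = new ArrayList<>();
        for (AstNode s : statements) lines.add(s.emit());
        return String.join("\n", lines);
    }
}

// Simplified stand-in for the Parser trait, with the proposed extra method.
interface Parser {
    int parse(byte[] data, int pos, Map<String, Long> infoset); // returns the new position
    AstNode translate();                                        // same actions, as an AST
}

final class UnsignedByteParser implements Parser {
    final String name;
    UnsignedByteParser(String name) { this.name = name; }
    public int parse(byte[] data, int pos, Map<String, Long> infoset) {
        infoset.put(name, (long) (data[pos] & 0xFF));
        return pos + 1;
    }
    public AstNode translate() { return new Assign(name, "data[pos++] & 0xFF"); }
}

final class SequenceParser implements Parser {
    final List<Parser> children;
    SequenceParser(List<Parser> children) { this.children = children; }
    public int parse(byte[] data, int pos, Map<String, Long> infoset) {
        for (Parser c : children) pos = c.parse(data, pos, infoset);
        return pos;
    }
    public AstNode translate() {
        List<AstNode> nodes = new ArrayList<>();
        for (Parser c : children) nodes.add(c.translate());
        return new Block(nodes);
    }
}
```

Each parser mirrors its `parse` logic in `translate`, so walking the parser 
tree once yields an AST that can be printed as source for a target language.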

This is just a rough thought, but I wanted to get it to the list and probably 
we will find some time to discuss it at ACNA.

Julian
