Re: Code generation from Daffodil runtime2

Beckerle, Mike Sat, 23 May 2020 12:46:30 -0700

I'm not available until Tuesday, but would be happy to participate then or 
after.


But you also don't have to wait for me.

If we do something next week, then given timezones I assume this would be early 
in the day. On Tuesday my only bookings are at 11am and 2pm US.ET; on Wednesday 
I have one at 10am US.ET.


Some Runtime 2 thoughts for the weekend here:

re: Switch to C/C++ from Java

I think we want to do C/C++ as the initial target for generation. In the earlier 
work we did trying out Julian's codegen framework, we assumed we were generating 
Java POJOs, but I think we should switch because it's just confusing to me that 
we're still in Java, but not reusing any of the Runtime 1 library.

Generating C/C++ rather than Java will create a clearer separation between 
Runtime 2 and the existing Scala/Java Runtime 1.

re: Infrastructure, TDML/Testing

We will have to consider how to make it easy to test, and reuse existing tests. 
We still want our TDML tests and that test discipline to work regardless of 
whether we're testing Runtime1 or Runtime2.

E.g., our TDML tests will have to compile a schema, generate runtime2 code from 
it, compile that code and link with the runtime2 libraries, and then run that 
code, feeding it the TDML data, and obtaining the infoset back from it, 
converting that infoset into XML for comparison with the expected XML in the 
TDML test.
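
As a rough illustration of the last few steps of that pathway (feed the TDML 
data in, get an infoset back, render it as XML, compare), here is a minimal C 
sketch. Everything in it is an assumption made for illustration: the function 
names (bits_to_bytes, parse_uI_01, infoset_to_xml, run_parser_test) are 
hypothetical, parse_uI_01 is a hand-written stand-in for generated code, 
big-endian byte order is assumed, and the comparison is a plain string compare 
rather than the XML-aware comparison a real TDML processor would do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a TDML-style bit string (only '0'/'1', whole bytes) into out.
   Returns the byte count, or -1 on malformed input or too small a buffer.
   Simplification: whitespace inside the bit string is not accepted. */
int bits_to_bytes(const char *bits, uint8_t *out, size_t cap)
{
    assert(bits != NULL && out != NULL);
    size_t nbits = strlen(bits);
    if (nbits % 8 != 0 || nbits / 8 > cap)
        return -1;
    memset(out, 0, nbits / 8);
    for (size_t i = 0; i < nbits; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        out[i / 8] = (uint8_t)((out[i / 8] << 1) | (bits[i] - '0'));
    }
    return (int)(nbits / 8);
}

/* Hypothetical stand-in for the generated parser of uI_01, assumed to be a
   32-bit big-endian unsigned int. Returns 0 on success, -1 otherwise. */
int parse_uI_01(const uint8_t *data, size_t len, uint32_t *value)
{
    if (len != 4)
        return -1;
    *value = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
             ((uint32_t)data[2] << 8) | (uint32_t)data[3];
    return 0;
}

/* Render the infoset as XML text so it can be compared with the expected
   infoset from the TDML test. Returns 0 on success, -1 if xml is too small. */
int infoset_to_xml(uint32_t value, char *xml, size_t cap)
{
    int n = snprintf(xml, cap, "<uI_01>%lu</uI_01>", (unsigned long)value);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Run one parser test case end to end: returns 1 on pass, 0 on failure. */
int run_parser_test(const char *bits, const char *expected_xml)
{
    uint8_t data[16];
    uint32_t value;
    char xml[64];
    int len = bits_to_bytes(bits, data, sizeof data);
    if (len < 0)
        return 0;
    if (parse_uI_01(data, (size_t)len, &value) != 0)
        return 0;
    if (infoset_to_xml(value, xml, sizeof xml) != 0)
        return 0;
    return strcmp(xml, expected_xml) == 0;
}
```

Calling run_parser_test with the 32-bit document from the unsignedInt_binary 
test case and "<uI_01>8</uI_01>" as the expected XML should report a pass. In 
the real rig the schema compile, code generation, and C compile/link steps 
would come before any of this.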

Paving all of that pathway early on will be key to success, along with arranging 
to easily run the C/C++ IDE debugger against the generated code and its 
runtime library.

The TDML "system" is divided into "tdml-lib", which is part of Daffodil,  and 
tdml "processors". The tdml-lib is the common framework for all TDML 
processors. The processors are independent DFDL implementations that implement 
interfaces/traits defined by tdml-lib.

Implementing this means writing a runtime2 equivalent of the 
DaffodilTDMLDFDLProcessor.scala file, which implements 4 classes. We know 
this is doable because we've implemented an alternate TDML processor that 
drives TDML tests against the IBM DFDL libraries, and we use this cross-tester 
rig to verify that numerous DFDL schemas are portable to IBM DFDL and Daffodil.

In theory one could use Java JNI to call to C/C++ code as a means of driving 
the C/C++ runtime from the TDML processor. I had planned to do a TDML processor 
for the ESA's DFDL4Space DFDL implementations (they have a C and a Java one) 
but never got around to it.

This IBM cross tester is in https://github.com/OpenDFDL/ibmDFDLCrossTester for 
reference. A current open Pull Request is for the port to use daffodil 2.6.0's 
TDML-lib.

re: Parsing and Unparsing

Another piece of guidance is that we should not repeat a prior mistake of 
focusing on parsing, and leaving unparsing for later. That was a pretty bad 
mistake we made 
with Daffodil early on. The parse and unparse capabilities for any 
backend/runtime should be built out simultaneously and symmetrically.  This 
means the tests also have to convert XML infosets into the corresponding C/C++ 
structures, unparse those and verify the unparsed data against what is expected.
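
To make the symmetry concrete, here is a minimal sketch in C of what a 
parse/unparse pair for a single 32-bit unsigned int element could look like. 
The struct and function names (uI_01_infoset, uI_01_parse, uI_01_unparse, 
uI_01_roundtrip_ok) are hypothetical, and big-endian byte order is an 
assumption; nothing here is taken from existing runtime2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C structure that generated code might define for
   <xs:element name="uI_01" type="xs:unsignedInt"/>. */
typedef struct {
    uint32_t uI_01;
} uI_01_infoset;

/* Parse: read 4 big-endian bytes (byte order is an assumption) into the
   infoset. Returns 0 on success, -1 if fewer than 4 bytes are available. */
int uI_01_parse(const uint8_t *data, size_t len, uI_01_infoset *out)
{
    assert(data != NULL && out != NULL);
    if (len < 4)
        return -1;
    out->uI_01 = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                 ((uint32_t)data[2] << 8) | (uint32_t)data[3];
    return 0;
}

/* Unparse: the exact mirror of uI_01_parse. Writes 4 big-endian bytes.
   Returns 0 on success, -1 if the buffer holds fewer than 4 bytes. */
int uI_01_unparse(const uI_01_infoset *in, uint8_t *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    if (cap < 4)
        return -1;
    buf[0] = (uint8_t)(in->uI_01 >> 24);
    buf[1] = (uint8_t)(in->uI_01 >> 16);
    buf[2] = (uint8_t)(in->uI_01 >> 8);
    buf[3] = (uint8_t)(in->uI_01);
    return 0;
}

/* Round-trip check a test could use: parse the first 4 bytes, unparse the
   result, and verify the bytes match. Returns 1 on match, 0 otherwise. */
int uI_01_roundtrip_ok(const uint8_t *data, size_t len)
{
    uI_01_infoset infoset;
    uint8_t out[4];
    if (uI_01_parse(data, len, &infoset) != 0)
        return 0;
    if (uI_01_unparse(&infoset, out, sizeof out) != 0)
        return 0;
    return memcmp(data, out, sizeof out) == 0;
}
```

Writing both directions side by side like this from the very first element 
type is one way to keep the unparser from lagging behind the parser.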


re: Where is the code so far?

Look in daffodil-core for the runtime2 package. That's the DFDL schema compiler 
which will invoke the code generator.
What is there is a mock up at this point. The notion is that the schema 
compiler creates DSOM objects, and those generate the Gram objects. The Gram 
objects act as a simple rule-based optimizer that drops out parts of the data 
"grammar" that are not needed.

All that DSOM and Gram stuff is backend independent.

The Gram objects have parser and unparser methods which are part of runtime 1. 
They construct runtime 1 Parser and Unparser objects, and the runtime 1 runtime 
data structures they use.

We will add generateCode methods to the Gram objects, which will be Runtime 2's 
mechanism. Asking a Gram object to generate code starts the recursive walk over 
the whole structure generating/emitting code.
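
The shape of that recursive walk can be sketched as follows. This is only an 
illustration written in C for consistency with the proposed target language; 
the real Gram objects are Scala classes in daffodil-core, and every name here 
(gram, gram_kind, code_buf, emit, gram_generate_code, and the emitted 
parse_be_uint32 call) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for Gram objects: either a
   sequence of child grammar nodes or a 32-bit binary unsigned int leaf. */
typedef enum { GRAM_SEQ, GRAM_BINARY_UINT32 } gram_kind;

typedef struct gram {
    gram_kind kind;
    const char *name;               /* element name (leaf nodes only) */
    const struct gram *children[4]; /* child nodes (sequence nodes only) */
    size_t n_children;
} gram;

/* Fixed-size buffer that collects the emitted source text. */
typedef struct {
    char text[512];
    size_t len;
    int truncated; /* set to 1 if some text did not fit */
} code_buf;

/* Append s to the buffer, or mark the buffer truncated if it does not fit. */
void emit(code_buf *b, const char *s)
{
    size_t n = strlen(s);
    if (b->len + n < sizeof b->text) {
        memcpy(b->text + b->len, s, n + 1); /* copies the trailing NUL too */
        b->len += n;
    } else {
        b->truncated = 1;
    }
}

/* Asking the root node to generate code starts a recursive walk over the
   whole structure; each leaf emits one line of (hypothetical) C code. */
void gram_generate_code(const gram *g, code_buf *b)
{
    assert(g != NULL && b != NULL);
    switch (g->kind) {
    case GRAM_SEQ:
        for (size_t i = 0; i < g->n_children; i++)
            gram_generate_code(g->children[i], b);
        break;
    case GRAM_BINARY_UINT32:
        emit(b, "    parse_be_uint32(&infoset->");
        emit(b, g->name);
        emit(b, ");\n");
        break;
    }
}
```

For a sequence containing a single uI_01 leaf, the buffer ends up holding one 
line that calls the hypothetical parse_be_uint32 runtime function on 
infoset->uI_01. A matching walk would emit the unparser code.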

None of this works right now. The initial goal is to get it to emit some code.

There is also the daffodil-runtime2 module - the code there was a mock up of 
the start of a java-callable runtime library, but that was for Runtime 2 
generating Java, not C/C++. If we switch to C/C++, then this runtime 
library will obviously be written in C/C++ or a C/C++ linkable language, not a 
JVM language.


________________________________
From: Interrante, John A (GE Research, US) <inter...@research.ge.com>
Sent: Saturday, May 23, 2020 9:45 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: RE: Code generation from Daffodil runtime2

Hi Julian,

Let's use Google Meet since it has the best captions.  I can schedule the 
Google Meet if you can't and I'm happy to discuss the topic on Sat, Sun, or 
Mon, although I don't know if anyone else like Mike who might want to join us 
is available before Tues.  This Mon is Memorial Day in the USA and people may 
have special plans during a three-day holiday weekend.

Mike and anyone else interested, do you want to join the discussion and which 
days/times between now and Tues work for you?

Thanks!
John

-----Original Message-----
From: Julian Feinauer <j.feina...@pragmaticminds.de>
Sent: Saturday, May 23, 2020 3:53 AM
To: dev@daffodil.apache.org
Subject: EXT: Re: Code generation from Daffodil runtime2

Hi John,

thanks for taking the initiative here and sorry for being a bit silent in the 
last weeks / months.
Especially I hope that all of you are well!

I can offer to do a little Zoom / Teams / Google Hangout Webmeeting later today 
or tomorrow (or also next week) to discuss the topic a bit and get started 
together.

What do you think?

Julian

Am 22.05.20, 23:04 schrieb "Interrante, John A (GE Research, US)" 
<inter...@research.ge.com>:

    I'd like to work on the code generation framework 
(https://issues.apache.org/jira/browse/DAFFODIL-2202) initiated by Julian 
Feinauer and discussed by Mike Beckerle in the Daffodil design notes 
(https://cwiki.apache.org/confluence/display/DAFFODIL/WIP%3A+Daffodil+Runtime+2).
  Steve suggested that I start a discussion on the dev mailing list.

    Mike said that our first goal should be to get a very simple code-gen 
example working.  Write a schema that describes a single element (a 32-bit 
binary int), get Daffodil to compile that schema, generate the source code for 
a parser, compile that parser, and run a test against it showing it is working 
as expected:


    <xs:element name="uI_01" type="xs:unsignedInt" />


    <tdml:parserTestCase name="unsignedInt_binary"
      root="uI_01" model="SimpleTypes-binary"
      description="Section 5 Schema types-unsignedInt - DFDL-5-018R">
      <tdml:document>
        <tdml:documentPart type="bits">00000000000000000000000000001000</tdml:documentPart>
      </tdml:document>
      <tdml:infoset>
        <tdml:dfdlInfoset>
          <uI_01>8</uI_01>
        </tdml:dfdlInfoset>
      </tdml:infoset>
    </tdml:parserTestCase>


    @Test def test_unsignedInt_binary() { runner.runOneTest("unsignedInt_binary") }

    Julian has contributed some code 
(https://github.com/apache/incubator-daffodil/compare/master...JulianFeinauer:feature/daffodil-2202-initial-codegen)
 and Mike has made some changes to it 
(https://github.com/apache/incubator-daffodil/compare/master...mbeckerle:daffodil-2202-initial-codegen).

    We will have to solve a lot of problems before we can run even that test, 
so it would be a big step forward.   I want all the help I can get to get this 
tiny first example working.  I would welcome a suggestion from Julian, Mike, 
Steve, or anyone else how to continue what's been done so far.

    Thanks,
    John

    John Interrante
    Sr. Software Engineer, Software & Analytics
    GE Global Research Center, Niskayuna NY
