RE: Keep compiled C code or throw it away?

Interrante, John A (GE Research, US) Tue, 06 Oct 2020 05:31:13 -0700

Yes, I have implemented a Runtime2TDMLDFDLProcessor and it works as Steve 
described although it delegates some of the actual work to a 
Runtime2DataProcessor.  I already had to implement generating, compiling, 
linking, and running the generated C code in the Runtime2DataProcessor and 
Runtime2TDMLDFDLProcessor classes.   The parts that I may change based on the 
pull request comments and dev list discussion will be:


1) No need to provide a Scala parser, unparser API to the "daffodil parse", 
"daffodil unparse" CLI commands (merge Runtime2DataProcessor into 
Runtime2TDMLDFDLProcessor).

2) Continue compiling C source files with sbt for the sake of quicker 
compile/edit/fix cycles, but don't include machine binary files in the Daffodil 
distribution (put only C source and header files in a distribution jar).

3) Make the C code generator and Runtime2TDMLDFDLProcessor logic work the same 
regardless of when and where the code's called (on a dev's computer before 
building a distribution or after a distribution is installed on a user's 
computer).

4) Change the snap compilation, linking, and execution implementation to 
extract all C source and header files from jars and use caching where possible 
(in user's XDG_CACHE_HOME and using ideas from NixOS for building immutable, 
reproducible package stores)

5) Use the same jar extraction, snap compilation, caching where possible, 
linking, and execution implementation whether invoked by the IDE running Scala 
test cases, sbt test, the "daffodil test <tdml-file>" command line, or the 
"daffodil generate C" command line.  

Does all that sound good?  
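
To make items 4 and 5 concrete, here is a minimal sketch of the extract-and-cache idea. It is written in Python only to stay language-neutral and is not the runtime2 implementation. The "daffodil" cache subdirectory, the ".built" marker file, and the build callback are all assumptions made for illustration. The store directory is named by a hash of the C source and header contents, loosely in the spirit of the NixOS store, so identical sources always map to the same path and are built once.

```python
import hashlib
import os
import zipfile
from pathlib import Path


def cache_root():
    """Return $XDG_CACHE_HOME/daffodil, falling back to ~/.cache/daffodil."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "daffodil"  # subdirectory name is an assumption


def c_entries(jar):
    """List the C source and header entries of an open jar in a stable order."""
    return sorted(n for n in jar.namelist() if n.endswith((".c", ".h")))


def content_key(jar):
    """Hash entry names and contents so identical sources share one store path."""
    digest = hashlib.sha256()
    for name in c_entries(jar):
        data = jar.read(name)
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def build_cached(jar_path, build):
    """Extract C files from jar_path and call build(store) only on a cache miss."""
    with zipfile.ZipFile(jar_path) as jar:
        store = cache_root() / content_key(jar)
        done = store / ".built"  # hypothetical marker, written after a good build
        if not done.exists():
            store.mkdir(parents=True, exist_ok=True)
            jar.extractall(store, members=c_entries(jar))
            build(store)  # e.g. invoke the C compiler and linker here
            done.touch()
    return store
```

Because the same function would be called from the IDE, sbt test, and the command line, each entry point gets the same store path for the same sources, and only the first caller pays for compilation.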


-----Original Message-----
From: Beckerle, Mike <[email protected]> 
Sent: Monday, October 5, 2020 5:26 PM
To: [email protected]
Subject: EXT: Re: Keep compiled C code or throw it away?

re: new TDMLDFDLProcessor is needed. This exists and I think works as you have 
mentioned.


________________________________
From: Steve Lawrence <[email protected]>
Sent: Monday, October 5, 2020 5:12 PM
To: [email protected] <[email protected]>
Subject: Re: Keep compiled C code or throw it away?

> This would work quite like daffodil-propgen then. Just at test-compile
> time, not regular compile time.

That requires an sbt source/resource generator, which means it depends on the 
sbt configuration in order to test. Might make testing with other IDEs more 
difficult. It also means something like the "daffodil test" CLI command 
couldn't work since that doesn't use sbt.

What if we just create a new TDMLDFDLProcessor that is specific to the new C 
generator backend? This new TDMLDFDLProcessor can generate code based on the 
schema being tested, compile the schemas (using caching where possible), and 
execute whatever is compiled to parse/unparse, capture the result, and return 
it as a ParseResult/UnparseResult that the TDMLDaffodilProcessor can use. This 
TDMLDFDLProcessor essentially mimics how a normal user would use it, just like 
the current TDMLDFDLProcessor does.

This is analogous to how the IBM DFDL implementation works. This 
TDMLDFDLProcessor just happens to use the same Daffodil frontend but with a 
different Daffodil backend.
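
As a rough sketch of the "execute whatever is compiled, capture the result" step, something like the following would do. It is in Python purely for illustration and is not Daffodil code. The RunResult type and the convention that the executable takes "parse" or "unparse" as an argument and reads its input from stdin are assumptions, not something settled in this thread.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class RunResult:
    """What a TDML runner would need back from one parse or unparse run."""
    output: bytes      # infoset (parse) or data (unparse) read from stdout
    diagnostics: str   # anything the executable wrote to stderr
    exit_code: int

    @property
    def is_error(self):
        return self.exit_code != 0


def run_generated(command, mode, data, timeout=60):
    """Run `command mode`, feed data on stdin, and capture the outcome.

    command is an argv list for the compiled executable; mode is "parse" or
    "unparse" (a hypothetical CLI convention for this sketch).
    """
    proc = subprocess.run(
        list(command) + [mode],
        input=data,
        capture_output=True,
        timeout=timeout,
    )
    return RunResult(
        output=proc.stdout,
        diagnostics=proc.stderr.decode("utf-8", "replace"),
        exit_code=proc.returncode,
    )
```

The processor would then compare `output` against the expected infoset or data in the TDML test case and report `diagnostics` when `is_error` is true.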


On 10/5/20 5:02 PM, Beckerle, Mike wrote:
> There are 3 different kinds of code in Daffodil:
>
> 1) static code humans write - compiler code and runtime code, and test rig 
> code. This includes scala, java, TDML, and soon enough, C code.
>
> 2) code that is generated that becomes part of daffodil itself. This is 
> generated by code in the daffodil-propgen library and creates src-managed, 
> and resource-managed code and resources in the daffodil-lib.
>
> The above are taken care of by SBT, whether scala, java, or C code.
>
> None of the above has anything to do with a DFDL schema created by a user.
>
> 3) The C-code generator creates C code from a user's schema.
>
> I would expect that generator to perhaps lay down not just the C code, but 
> make/build files so the user can build and run their code stand-alone.
>
> But I think of this as 100% separate from daffodil's build.sbt build system. 
> It could use sbt even, but it's not daffodil's build.
>
> The place where things get confusing is that in order to test (3) above, we 
> need to incorporate generating, compiling, linking, and running the generated 
> C code into daffodil's build, for testing purposes.
>
> So I think as a part of daffodil's build, analogous to how daffodil-propgen 
> puts Scala code into daffodil-lib/src-managed/scala/... the C-code 
> generator's src/test/scala code can be used to put C code into 
> daffodil-runtime2/test-managed/C/....
>
> That test-managed/C code would only be for test, but sbt would see it there 
> and compile it almost as if it were hand-written C code.
>
> This would work quite like daffodil-propgen then. Just at test-compile time, 
> not regular compile time.
>
> Does that make sense?
>
>
>
> ________________________________
> From: Interrante, John A (GE Research, US) <[email protected]>
> Sent: Monday, October 5, 2020 2:11 PM
> To: [email protected] <[email protected]>
> Subject: Keep compiled C code or throw it away?
>
> The timing of when to compile the C source files that we will be adding to 
> the Daffodil source tree is another topic I would like to discuss on the dev 
> list.  I am using an sbt C compiler plugin in my runtime2 pull request to 
> allow Daffodil's sbt build to compile C source files as well as Scala source 
> files.  We would have to include both the libraries built by the C compiler 
> (there would be several, not just one, as Mike pointed out) and some 
> corresponding C header/source files in a Daffodil distribution and/or the 
> output directory of a "daffodil generate C" command.
>
> The current discussion in the pull request is now wavering between:
>
>   1) Build the C libraries and distribute them with daffodil in its 
> daffodil/include and daffodil/lib directories
>   2) Build the C libraries, put them along with source files in a jar, and 
> distribute the jar with Daffodil
>   3) Put just the C source files in a jar and distribute the jar with 
> Daffodil; the "daffodil generate C" and "daffodil test <.tdml>" 
> commands will snap compile and/or execute the C files
>
> The question comes down to this: what is the best time to build the C source 
> files?
>
>   - Before distribution: This allows us to verify that C source files build 
> and we can test them before we distribute them
>   - After distribution: We simplify the sbt build and don't need to 
> build multiple daffodil distributions for different platforms
>
> Are there other choices too?  Actually, I think we need to do BOTH.  We can 
> fix compilation errors quicker if we can build C source files immediately 
> after editing them.  We also need to test the C code by running TDML tests 
> every time we run sbt test or sbt c-generator/test, which implies we need to 
> build the C source files before distribution as well as after distribution.  
> However, throwing away the C-code libraries during distribution time does 
> mean that we need to compile 50K lines of C code possibly multiple times or 
> cache built C libraries somewhere in order to improve the user's experience.
>
> So the question really is this - do we want to throw away the compiled 
> libraries (".a" files) and distribute only the C source code in 
> platform-independent jars, or distribute compiled machine binary files along 
> with the C source files in or with the platform-independent jars?
>
> -----Original Message-----
> From: Steve Lawrence <[email protected]>
> Sent: Monday, October 5, 2020 10:49 AM
> To: [email protected]
> Subject: EXT: Re: Subproject names proposed for discussion
>
> A handful of unrelated thoughts, maybe overthinking things and I don't feel 
> strongly about anything below, but renaming is always a pain so it'd be nice to 
> ensure we have something future proof.
>
> 1) Is there any benefit organizationally to having all backends being in the 
> same directory?
>
> 2) From a sorting perspective, it'd be nice if the scala projects were 
> together, so having it be scala-parser and scala-unparser rather than 
> parser-scala and unparser-scala has advantages.
>
> 3) Maybe the scala parser/unparser should be considered the same "scala"
> runtime, and so parser/unparser should be subdirectories of a 
> "daffodil-backend-scala" subdirectory?
>
> 4) Is there even a benefit to separating parser/unparser into separate jars? 
> There's so much shared logic between the two, and there's even a bunch of 
> unparsing stuff in the parser jar. Should we just combine them under the same 
> backend?
>
> Taking all of the above into account, perhaps something like this:
>
> ...
> |-- daffodil-backends
> |   |-- daffodil-scala
> |   |   `-- src
> |   `-- daffodil-generator-c
> |       `-- src
> |-- daffodil-lib
> |   `-- src
> |-- daffodil-schema-compiler
> |   `-- src
> ...
>
> 5) Is there something better than "backend" for describing these? I can't 
> think of anything. Does the DFDL spec have a concept of this?
>
> 6) Are there any benefits to using "codenames"? My thinking is maybe someday 
> there could be multiple "scala" backends with different goals/extensions, and 
> so "daffodil-scala" is too generic. Codenames would be more like what we have 
> today, except real code names might be easier to remember than "runtime1" and 
> "runtime2". Disadvantage is there's less discoverability, but a README could 
> be added with short descriptions about what the backends try to accomplish. 
> Not sure I like this, but thought I'd throw it out there.
>
>
>
> On 10/5/20 10:23 AM, Beckerle, Mike wrote:
>> +1 from me.
>>
>> ________________________________
>> From: Interrante, John A (GE Research, US) <[email protected]>
>> Sent: Monday, October 5, 2020 9:28 AM
>> To: [email protected] <[email protected]>
>> Subject: Subproject names proposed for discussion
>>
>> Steve Lawrence and I would like to bring a topic to the dev list for 
>> discussion since not everyone is paying attention to the review of my 
>> runtime2 pull request.  Steve suggested, and I agree, that renaming some of 
>> the Daffodil subprojects might make their meanings more obvious to newcomer 
>> devs.  If we do rename some subprojects after discussing it on this list, we 
>> will do it immediately in its own pull request since mixing changes with 
>> renames makes it difficult to see which changes are just renames instead of 
>> actual changes.
>>
>> What do devs think about us renaming some subprojects like this?
>>
>>     rename daffodil-core to daffodil-schema-compiler
>>     leave daffodil-lib alone
>>     rename daffodil-runtime1 to daffodil-backend-parser-scala
>>     rename daffodil-runtime1-unparser to daffodil-backend-unparser-scala
>>     rename daffodil-runtime2 to daffodil-backend-generator-c
>>
>>
>
>
