Sounds like a good idea. Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will the dependency on Janino still be there (that question applies to JDK as well), and what is the footprint?

Regards,
Berthold Reinwald
IBM Almaden Research Center
office: (408) 927 2208; T/L: 457 2208
e-mail: reinw...@us.ibm.com

From: Matthias Boehm <mboe...@googlemail.com>
To: dev@systemml.incubator.apache.org
Date: 03/31/2017 08:17 PM
Subject: Java compiler for code generation

Hi all,

currently, our new code generator for operator fusion uses the programmatic javax.tools.JavaCompiler, which is Java's standard API for compilation. Despite a plan cache that mitigates unnecessary compilation and recompilation overheads, we still see significant end-to-end overhead, especially for small input data. Moving forward, I'd like to switch to Janino (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java compiler with restricted language support. The advantages are:

(1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM, and MLogreg, Janino reduced the total compilation time, compared to javac, from 2.039 to 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854 to 0.283 (46 operators), respectively. At the same time, there was no measurable negative impact on runtime efficiency; JIT compilation overhead was even slightly reduced.

(2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler requires the existence of a JDK, while Janino only requires a JRE, which makes it easier to apply code generation by default.

However, I'm raising this here because Janino would add another explicit dependency (with BSD license). Fortunately, Spark also uses Janino for whole-stage codegen, so we should be able to mark Janino as a provided library. The only issue is a pure Hadoop environment, where we still want to use code generation for CP operations. To simplify the build, I could imagine using javax.tools.JavaCompiler for Hadoop execution types, but Janino by default.

If you have any concerns, please let me know by Monday; otherwise I'd like to push this change into our upcoming 0.14 release.

Regards,
Matthias
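
For the question of which compiler is actually usable in a given environment (Spark, pure Hadoop, or a plain Java scoring setup), a minimal sketch of a runtime probe might look as follows. The class `CodegenCompilerProbe`, the enum `CompilerType`, and the "prefer Janino, fall back to javac" policy are hypothetical illustrations, not SystemML's actual implementation; only `javax.tools.ToolProvider.getSystemJavaCompiler()` (which returns null when no system compiler is available, e.g. on a plain JRE) and the class name `org.codehaus.janino.SimpleCompiler` come from the JDK and from the proposal above.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: probes which Java compiler is usable at runtime.
public class CodegenCompilerProbe {
    enum CompilerType { JANINO, JAVAC, NONE }

    // Returns true if the named class is present on the current classpath.
    // The class is located but not initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                CodegenCompilerProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Assumed policy: prefer Janino if present, else javac, else no codegen.
    static CompilerType select(boolean janinoAvailable, boolean javacAvailable) {
        if (janinoAvailable) return CompilerType.JANINO;
        if (javacAvailable) return CompilerType.JAVAC;
        return CompilerType.NONE;
    }

    static CompilerType detect() {
        boolean janino = isClassAvailable("org.codehaus.janino.SimpleCompiler");
        // getSystemJavaCompiler() returns null if no compiler is provided,
        // which is typically the case on a JRE without the JDK tools.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        return select(janino, javac != null);
    }

    public static void main(String[] args) {
        System.out.println("Selected compiler: " + detect());
    }
}
```

Running `main` in each target environment shows which of the two compilers would be picked up there; in a scoring environment with neither Janino on the classpath nor a JDK, the probe reports `NONE`, in which case code generation would have to be disabled.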