Is dependency really an issue today, particularly when we bundle the dependencies with the SystemML jar? I'd rather include a dependency than reinvent the wheel and rewrite that code ourselves (unless the dependency code is flawed).
Also, +1 for continuously reviewing / updating / trimming dependencies.

On Mon, Apr 3, 2017 at 11:04 AM, <dusenberr...@gmail.com> wrote:
> Using Janino sounds like a great idea. As for the footprint size for
> Java-only execution modes, it might make sense to do an audit of our
> current dependencies to see if anything can be removed to make up for the
> additional amount. Then we could just use it in all scenarios without
> worry.
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
> > On Mar 31, 2017, at 9:25 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> >
> > that is a good question. Yes, if we want to enable code generation in such
> > a scenario it would also need Janino, which increases our footprint by
> > roughly 0.6MB.
> >
> > Btw, Janino fits much better into such an in-memory deployment because it
> > compiles classes in-memory without the need to write class files into a
> > local working directory. The same could be done for
> > javax.tools.JavaCompiler, but would require a custom in-memory
> > JavaFileManager.
> >
> > Regards,
> > Matthias
> >
> > On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald <reinw...@us.ibm.com>
> > wrote:
> >
> >> Sounds like a good idea.
> >>
> >> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will
> >> the dependency on Janino still be there (that question applies to the JDK
> >> as well), and what is the footprint?
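The in-memory compilation Matthias describes could be sketched with Janino's SimpleCompiler roughly as follows; note that the generated class (`GenOp`) and its operator body are purely illustrative stand-ins, not SystemML's actual codegen output:

```java
import org.codehaus.janino.SimpleCompiler;

public class JaninoSketch {
    public static void main(String[] args) throws Exception {
        SimpleCompiler compiler = new SimpleCompiler();
        // cook() compiles the source entirely in memory; no .java or
        // .class files are written to a local working directory.
        compiler.cook(
              "public class GenOp {\n"
            + "  public static double exec(double a, double b) {\n"
            + "    return a * b + 1;\n"
            + "  }\n"
            + "}");
        Class<?> cls = compiler.getClassLoader().loadClass("GenOp");
        double r = (Double) cls.getMethod("exec", double.class, double.class)
                               .invoke(null, 2.0, 3.0);
        System.out.println(r); // 7.0
    }
}
```

This only needs a JRE plus the janino jar (the ~0.6MB footprint mentioned above), since Janino ships its own compiler rather than delegating to the JDK's.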
> >>
> >> Regards,
> >> Berthold Reinwald
> >> IBM Almaden Research Center
> >> office: (408) 927 2208; T/L: 457 2208
> >> e-mail: reinw...@us.ibm.com
> >>
> >>
> >>
> >> From: Matthias Boehm <mboe...@googlemail.com>
> >> To: dev@systemml.incubator.apache.org
> >> Date: 03/31/2017 08:17 PM
> >> Subject: Java compiler for code generation
> >>
> >>
> >>
> >> Hi all,
> >>
> >> currently, our new code generator for operator fusion uses the
> >> programmatic javax.tools.JavaCompiler, which is Java's standard API for
> >> compilation. Despite a plan cache that mitigates unnecessary compilation
> >> and recompilation overheads, we still see significant end-to-end overhead,
> >> especially for small input data.
> >>
> >> Moving forward, I'd like to switch to Janino
> >> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
> >> compiler with restricted language support. The advantages are:
> >>
> >> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
> >> and MLogreg, Janino improved total javac compilation time from 2.039 to
> >> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854
> >> to 0.283 (46 operators), respectively. At the same time, there was no
> >> measurable impact on runtime efficiency, and even slightly reduced JIT
> >> compilation overhead.
> >>
> >> (2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler
> >> requires the existence of a JDK, while Janino only requires a JRE, which
> >> makes it easier to apply code generation by default.
> >>
> >> However, I'm raising this here as Janino would add another explicit
> >> dependency (with a BSD license). Fortunately, Spark also uses Janino for
> >> whole-stage codegen, so we should be able to mark Janino as a provided
> >> library. The only issue is a pure Hadoop environment, where we still want
> >> to use code generation for CP operations.
> >> To simplify the build, I could
> >> imagine using the javax.tools.JavaCompiler for Hadoop execution types, but
> >> Janino by default.
> >>
> >> If you have any concerns, please let me know by Monday; otherwise I'd like
> >> to push this change into our upcoming 0.14 release.
> >>
> >> Regards,
> >> Matthias

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
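For reference, the javax.tools.JavaCompiler fallback that the thread contrasts with Janino could look roughly like the sketch below. The class name, operator body, and temp-directory handling are all illustrative assumptions, not SystemML's actual code; the point is the JDK requirement (ToolProvider returns null on a plain JRE) and the on-disk working directory that Janino avoids:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JavacFallbackSketch {
    public static void main(String[] args) throws Exception {
        // Returns null when running on a JRE without compiler tools --
        // this is the JDK requirement that Janino removes.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null)
            throw new IllegalStateException("a JDK is required");

        // Without a custom in-memory JavaFileManager, the standard API
        // works through .java/.class files in a local working directory.
        Path dir = Files.createTempDirectory("codegen");
        Path src = dir.resolve("GenOp.java");
        Files.write(src, ("public class GenOp {\n"
            + "  public static double exec(double a, double b) {\n"
            + "    return a * b + 1;\n"
            + "  }\n"
            + "}").getBytes());

        int rc = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (rc != 0)
            throw new IllegalStateException("compilation failed");

        // Load the compiled class back from the working directory.
        try (URLClassLoader cl = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = cl.loadClass("GenOp");
            double r = (Double) cls.getMethod("exec", double.class, double.class)
                                   .invoke(null, 2.0, 3.0);
            System.out.println(r); // 7.0
        }
    }
}
```

The extra process of writing sources, invoking javac, and reloading class files from disk is one plausible source of the per-operator overhead reported in the numbers above.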