Some high level comments without looking at the code. I'm in favor from abandoning the thrift generated java code in favor of immutable objects. I think it is easier to reason about and will ensure we have less errors in our code. If I understand correctly, the ProtoBuf format does this by default, so there some precedent for this style of code generation already.
I think using Facebook's swift is the best approach here. I would be hesitant to accept any custom code generation that involved us parsing thrift IDL files or thrift formats over the wire because I poses a very high maintenance burden. I also think generating the MyBatis mutable classes is superior to our current strategy of manually creating them. Finally, the performance improvements look fantastic. As an operator of a large cluster I am excited to see wholesale performance improvements as I am always concerned that my cluster is approaching the limits of what Aurora can handle safely. Overall, I think this change merits a serious discussion from all contributors. On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsir...@apache.org> wrote: > On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsir...@apache.org> wrote: > > > Context: Aurora uses the official Apache Thrift compiler today plus a > > home-grown python code generator [1] for immutable "entity" (I*) > wrappers. > > > > The proposal is to switch from using the Apache Thrift code generator to > a > > home grown generator. The proposal comes with a concrete example in the > > form of the actual RBs to effect this change: > > 1. A custom java thrift code generator: > > https://reviews.apache.org/r/42748/ > > 2. A custom MyBatis binding code generator powered by 1 above: > > https://reviews.apache.org/r/42749/ > > 3. Integration of 1 & 2 above into the Aurora codebase: > > https://reviews.apache.org/r/42756/ > > > > Since the RBs are large, I wanted to provide some extra context on the > > idea at a higher level. I provide rationale, pros and cons below for > those > > interested in the idea but wary of diving in on code review until the > idea > > itself passes a sniff test. > > Thanks in advance for your feedback - and if we get there - for your > > review effort. > > > > I just added wfarner and zmanji as reviewers for the 3 RBs above since > they've expressed direct interest. Happy to add others, just speak up or > else just comment on the reviews as you see fit. > I'll formally only submit if 1st this email thread reaches consensus, and > second, reviews are approved. > > == > > > > In the course of an initial run at creating a first-class REST-like > > scheduler interface [2] I came to the conclusion generating the json API > > from the thrift one might be a good path. That idea has been scrapped > with > > community feedback, but an initial experiment in custom thrift code-gen > for > > java that accompanied that idea seemed worth pursuing for its own > > independent benefits, chief among these being 1st class immutable thrift > > structs and the ability to leverage thrift annotations. > > > > Immutability: > > The benefits of having an immutable by default data model are the > standard > > ones; namely, its trivial to reason about safety of concurrent operations > > on the data model, stability of collections containing data model > entities > > and it opens up straight-forward optimizations that are easy to reason > > about. > > An example optimization is caching hashCodes for the immutable thrift > > structs. This was done after comparing jmh benchmarks run against master > > and then again against the proposal branch. Perf was comparable - within > > 10% plus and minus depending on the benchmark, but with the optimization > > added many benchmarks showed pronounced improvement in the proposal > branch > > [3]. The optimization is clearly safe and was quick and easy to > > implement. Further optimizations can be experimented with in a > > straightforward way. > > > > Thrift Annotations: > > The thrift IDL grammar has supported these for quite some time, but they > > are not plumbed to the generated java code. Uses are many and varied. I > > initially had my eye on annotation of thrift services with REST verbs, > > routes, etc - but immediately we can leverage these annotations to kill > > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that > > needs to be maintained. > > > > There are a few downsides to switching to our own java thrift code gen: > > 1. We own more code to maintain: Even though we have the custom python > > "immutable" wrapper generator [1] today, this new generator - even with > the > > python generator removed - represents a 5-6x increase in line count of > > custom code (~4.1k lines of code and tests in the new custom gen, ~700 > > lines in the existing python custom gen) > > 2. We conceptually fork from a sibling Apache project. > > > > The fork could be mitigated by turning our real experience iterating the > > custom code generator into a well-founded patch back into the Apache > Thrift > > project, but saying we'll do that is easier than following through and > > actually doing it. > > > > == > > Review guide / details: > > > > The technology stack: > > The thrift IDL parsing and thrift wire parsing are both handled by the > > Facebook swift project [4]. We only implement the middle bit that > > generates java code stubs. This gives higher confidence that the tricky > > bits out at either edge are done right. > > The thrift struct code generation is done using Square's javapoet [5] in > > favor of templating for the purpose of easier to read generator code. > This > > characterization is debatable though and template are certainly more > > flexible the minute you need to gen a second language (say we like this > and > > want to do javascript codegen this way too for example). > > The MyBatis codegen is forced by the thrift codegen for technical > > reasons. In short, there is no simple way to teach MyBatis to read and > > write immutable objects with builders. So the MyBatis code is generated > > via an annotation processor that runs after thrift code gen, but reading > > thrift annotations that survive that codegen process. > > The codegen unit testing is done with the help of Google's compile-tester > > [6]. NB that this has an expected output comparison that checks the > > generated AST and not the text, so its fairly lenient. Whitepsace and > > comments certainly don't matter. > > > > Review strategy: > > The code generator RBs (#1 & #2 in the 3 part series) are probably easier > > to review looking at samples of the generated code. Both the thrift > > codegen and MyBatis codegen samples are conveniently contained in the > > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/). The unit > > test uses resource files that contain both the thrift codegen inputs the > > annotation processor runs over and the annotation processor expected > > outputs - the MyBatis peer classes. So have a look there if you need > > something concrete and don't want to patch the RBs in and actually run > the > > codegen (`./gradlew api:compileJava`). > > The conversion RB (#3) is large but the changes are mainly mechanical > > conversions from the current mutable thrift + I* wrappers to pure > immutable > > thrift mutated via `.toBuilder` and `.with`'er methods. The main changes > > of note are to the portions of the codebase tightly tied to thrift as a > > technology: > > + Gson/thrift converters > > + Shiro annotated auth param interception > > + Thrift/Servlet binding > > > > [1] > > > https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py > > [2] https://issues.apache.org/jira/browse/AURORA-987 > > [3] > > > https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346 > > [4] https://github.com/facebook/swift > > [5] https://github.com/square/javapoet > > [6] https://github.com/google/compile-testing > > > > -- > Zameer Manji > >