Re: [PROPOSAL] Change java thrift code gen

Zameer Manji Wed, 27 Jan 2016 14:46:41 -0800

Some high level comments without looking at the code.

I'm in favor from abandoning the thrift generated java code in favor of
immutable objects. I think it is easier to reason about and will ensure we
have less errors in our code. If I understand correctly, the ProtoBuf
format does this by default, so there some precedent for this style of code
generation already.


I think using Facebook's swift is the best approach here. I would be
hesitant to accept any custom code generation that involved us parsing
thrift IDL files or thrift formats over the wire because I poses a very
high maintenance burden.

I also think generating the MyBatis mutable classes is superior to our
current strategy of manually creating them.

Finally, the performance improvements look fantastic. As an operator of a
large cluster I am excited to see wholesale performance improvements as I
am always concerned that my cluster is approaching the limits of what
Aurora can handle safely.

Overall, I think this change merits a serious discussion from all
contributors.

On Tue, Jan 26, 2016 at 9:19 PM, John Sirois <jsir...@apache.org> wrote:

> On Tue, Jan 26, 2016 at 8:47 PM, John Sirois <jsir...@apache.org> wrote:
>
> > Context: Aurora uses the official Apache Thrift compiler today plus a
> > home-grown python code generator [1] for immutable "entity" (I*)
> wrappers.
> >
> > The proposal is to switch from using the Apache Thrift code generator to
> a
> > home grown generator.  The proposal comes with a concrete example in the
> > form of the actual RBs to effect this change:
> > 1. A custom java thrift code generator:
> > https://reviews.apache.org/r/42748/
> > 2. A custom MyBatis binding code generator powered by 1 above:
> > https://reviews.apache.org/r/42749/
> > 3. Integration of 1 & 2 above into the Aurora codebase:
> > https://reviews.apache.org/r/42756/
> >
> > Since the RBs are large, I wanted to provide some extra context on the
> > idea at a higher level.  I provide rationale, pros and cons below for
> those
> > interested in the idea but wary of diving in on code review until the
> idea
> > itself passes a sniff test.
> > Thanks in advance for your feedback - and if we get there - for your
> > review effort.
> >
>
> I just added wfarner and zmanji as reviewers for the 3 RBs above since
> they've expressed direct interest.  Happy to add others, just speak up or
> else just comment on the reviews as you see fit.
> I'll formally only submit if 1st this email thread reaches consensus, and
> second, reviews are approved.
>
> ==
> >
> > In the course of an initial run at creating a first-class REST-like
> > scheduler interface [2] I came to the conclusion generating the json API
> > from the thrift one might be a good path.  That idea has been scrapped
> with
> > community feedback, but an initial experiment in custom thrift code-gen
> for
> > java that accompanied that idea seemed worth pursuing for its own
> > independent benefits, chief among these being 1st class immutable thrift
> > structs and the ability to leverage thrift annotations.
> >
> > Immutability:
> > The benefits of having an immutable by default data model are the
> standard
> > ones; namely, its trivial to reason about safety of concurrent operations
> > on the data model, stability of collections containing data model
> entities
> > and it opens up straight-forward optimizations that are easy to reason
> > about.
> > An example optimization is caching hashCodes for the immutable thrift
> > structs.  This was done after comparing jmh benchmarks run against master
> > and then again against the proposal branch.  Perf was comparable - within
> > 10% plus and minus depending on the benchmark, but with the optimization
> > added many benchmarks showed pronounced improvement in the proposal
> branch
> > [3].  The optimization is clearly safe and was quick and easy to
> > implement.  Further optimizations can be experimented with in a
> > straightforward way.
> >
> > Thrift Annotations:
> > The thrift IDL grammar has supported these for quite some time, but they
> > are not plumbed to the generated java code.  Uses are many and varied.  I
> > initially had my eye on annotation of thrift services with REST verbs,
> > routes, etc - but immediately we can leverage these annotations to kill
> > AnnotatedAuroraAdmin and reduce the amount of MyBatis binding code that
> > needs to be maintained.
> >
> > There are a few downsides to switching to our own java thrift code gen:
> > 1. We own more code to maintain:  Even though we have the custom python
> > "immutable" wrapper generator [1] today, this new generator - even with
> the
> > python generator removed - represents a 5-6x increase in line count of
> > custom code (~4.1k lines of code and tests in the new custom gen, ~700
> > lines in the existing python custom gen)
> > 2. We conceptually fork from a sibling Apache project.
> >
> > The fork could be mitigated by turning our real experience iterating the
> > custom code generator into a well-founded patch back into the Apache
> Thrift
> > project, but saying we'll do that is easier than following through and
> > actually doing it.
> >
> > ==
> > Review guide / details:
> >
> > The technology stack:
> > The thrift IDL parsing and thrift wire parsing are both handled by the
> > Facebook swift project [4].  We only implement the middle bit that
> > generates java code stubs.  This gives higher confidence that the tricky
> > bits out at either edge are done right.
> > The thrift struct code generation is done using Square's javapoet [5] in
> > favor of templating for the purpose of easier to read generator code.
> This
> > characterization is debatable though and template are certainly more
> > flexible the minute you need to gen a second language (say we like this
> and
> > want to do javascript codegen this way too for example).
> > The MyBatis codegen is forced by the thrift codegen for technical
> > reasons.  In short, there is no simple way to teach MyBatis to read and
> > write immutable objects with builders.  So the MyBatis code is generated
> > via an annotation processor that runs after thrift code gen, but reading
> > thrift annotations that survive that codegen process.
> > The codegen unit testing is done with the help of Google's compile-tester
> > [6].  NB that this has an expected output comparison that checks the
> > generated AST and not the text, so its fairly lenient.  Whitepsace and
> > comments certainly don't matter.
> >
> > Review strategy:
> > The code generator RBs (#1 & #2 in the 3 part series) are probably easier
> > to review looking at samples of the generated code.  Both the thrift
> > codegen and MyBatis codegen samples are conveniently contained in the
> > MyBatis codegen RB (#2: https://reviews.apache.org/r/42749/).  The unit
> > test uses resource files that contain both the thrift codegen inputs the
> > annotation processor runs over and the annotation processor expected
> > outputs  - the MyBatis peer classes.  So have a look there if you need
> > something concrete and don't want to patch the RBs in and actually run
> the
> > codegen (`./gradlew api:compileJava`).
> > The conversion RB (#3) is large but the changes are mainly mechanical
> > conversions from the current mutable thrift + I* wrappers to pure
> immutable
> > thrift mutated via `.toBuilder` and `.with`'er methods.  The main changes
> > of note are to the portions of the codebase tightly tied to thrift as a
> > technology:
> > + Gson/thrift converters
> > + Shiro annotated auth param interception
> > + Thrift/Servlet binding
> >
> > [1]
> >
> https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/tools/java/thrift_wrapper_codegen.py
> > [2] https://issues.apache.org/jira/browse/AURORA-987
> > [3]
> >
> https://docs.google.com/spreadsheets/d/1-CYMnEjzknAsY5_r_NVX8r85wxtrEByZ5YRiAbgMhP0/edit#gid=840229346
> > [4] https://github.com/facebook/swift
> > [5] https://github.com/square/javapoet
> > [6] https://github.com/google/compile-testing
> >
>
> --
> Zameer Manji
>
>

Re: [PROPOSAL] Change java thrift code gen

Reply via email to