[
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120017#comment-13120017
]
Christian Rakow commented on THRIFT-1115:
-----------------------------------------
I noticed a little bug when using slots and dynamic generation:
Currently TApplicationException is not imported anymore in the service file,
but still in use. The new TExceptionBase class is never used.
Best regards
> python TBase class for dynamic (de)serialization, and __slots__ option for
> memory savings
> -----------------------------------------------------------------------------------------
>
> Key: THRIFT-1115
> URL: https://issues.apache.org/jira/browse/THRIFT-1115
> Project: Thrift
> Issue Type: New Feature
> Components: Build Process, Python - Compiler, Python - Library
> Reporter: Will Pierce
> Assignee: Will Pierce
> Attachments: THRIFT-1115.python_dynamic_code_and_slots_v1.patch,
> THRIFT-1115.python_dynamic_code_and_slots_v2.patch,
> THRIFT-1115.python_dynamic_code_and_slots_v3.patch,
> THRIFT-1115.python_dynamic_code_and_slots_v4.patch,
> THRIFT-1115.python_dynamic_code_and_slots_v5.patch,
> THRIFT-1115.python_dynamic_code_and_slots_v6.patch, test_dynser.py,
> test_size.py
>
>
> This patch adds several new features to the compiler for python and the
> python libraries, and exercises the new features with expanded unit testing.
> This adds support for generating python classes that have no {{read()}} or
> {{write()}} methods. Instead, the generated classes inherit from a new base
> class, {{TProtocolDynamic}}. This new class implements the de/serialization
> with {{read()}} and {{write()}} methods that iterate over the derived class's
> "{{thrift_spec}}" class member that describes the structure and types of the
> thrift. This dynamic evaluation works with both binary and compact
> protocols, and has the same hook as before for delegating de/serialization to
> fastbinary.so for the "accelerated binary" protocol. This new baseclass
> {{read()}} method may even be more efficient than the generated explicit
> {{read()}} code for objects with lots of attributes because it doesn't have a
> case/switch style series of "{{if field_id == X...}}" nested inside a loop.
> Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple
> directly. That efficiency gain is probably just noise though, since the
> dynamic process probably uses more CPU later on, though I haven't benchmarked
> it. (Up[date: see the benchmarking results posted below for
> construction/serialization/deserialization comparisons.)
> If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the
> generated classes no longer get individual {{\_\_repr\_\_}} and
> {{\_\_eq\_\_}} and {{\_\_ne\_\_}} methods, instead they inherit from the
> TProtocolDynamic base class implementation, which uses {{\_\_slots\_\_}}
> instead of {{\_\_dict\_\_}} for repr and equality testing.
> When "dynamic" python classes are generated, they have very little code, just
> a constructor and class data. All the work of serialization and
> deserialization is done by the base class. This produces about 980 lines for
> DebugProtoTest vs. 3540 lines in default "\-gen py" mode, or about 1/3 the
> original code size.
> The {{\_\_slots\_\_}} support is available without requiring the dynamic base
> class, so users can save memory using the slots flag to generate non-dict
> based instances. The memory difference between dict and slots based objects
> is hard to measure, but seems to be around 10x smaller using slots, as long
> as the base class also uses {{\_\_slots\_\_}}. If the generated classes are
> old-style, and use slots, there's no memory savings at all, because the base
> class still creates a {{\_\_dict\_\_}} object for every instance. Python is
> just tricky when it comes to using {{\_\_slots\_\_}} best.
> The memory savings is pretty astounding using new-style classes and
> {{\_\_slots\_\_}}. Building DebugProtoTest.thrift with: -gen
> py:dynamic,slots versus \-gen py results in some pretty amazing memory
> savings. I tested by instantiating 1 million of the heavy
> DebugProtoTest.thrift's {{CompactProtoTestStruct()}}, which has 49 attributes
> in it, using regular "\-gen py" code versus "{{\-gen py:dynamic,slots}}" and
> compared the VmRSS resident memory usage of both processes. I didn't set any
> values to any attributes, so every attribute was left with the null value,
> None. The slots technique used 441 MB with slots vs. 3485 MB using
> non-slots, non-dynamic generated code. That's about 13% of the original
> size, or 87% memory savings.
> I tried the same test using a very tiny object instead, the DebugThrift.thift
> {{Bonk()}} class, which only has two fields. For this, I made 10 million
> instances of {{Bonk()}} and the results were very similar: 730 MB with slots
> vs. 2622 MB using non-slots.
> The speed difference between non-slots and slots is about 25% better with
> slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1
> seconds for the same slots test, or a savings of 24% time. This only
> measured object creation time, not serialization/deserialization or instance
> attribute access. (See the benchmarking results below for
> serialization/deserialization comparisons).
> If the user wants to subclass the {{TProtocolDynamic}} class and override the
> read/write methods, or add more functionality, this patch also adds a
> {{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import
> *}}" option, to specify a different base class (CLS) for the generated
> "dynamic" objects, and an import line that brings that base class into scope.
> The python unit tests now get yet another test loop layer, with the
> {{Makefile.am}} specifying multiple thrift generation lines, exercising the
> possible combinations of {{thrift \-gen py:blah}}, sending the output to
> multiple gen-py-blah output directories (like gen\-py\-default,
> gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that
> {{RunClientServer.py}} can then loop over each of these and do all of its
> testing (which is combinatorially exhaustive) on each of those
> subdirectories. It's a pretty insane number of tests now, over 390
> individual tests on python >= 2.5 (ssl + multiprocessing). Since the test/py
> "make check" tests aren't done for the 'casual' installer, I think it's
> acceptable to have the "make check" tests take this long to complete- but it
> is annoying to have to wait a few minutes for all the tests to
> finish/succeed. I think it could be improved a fair bit.
> This patch also makes a few other small changes:
> * adds a comment to the top of each generated .py file with the options to
> \-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the
> thrift compiler generated code was created, after the fact. This should
> probably be backported to other languages.
> * added another test for inequality to {{SerializationTest.py}} to compare
> objects of different types, to exercise the {{\_\_eq\_\_}} code more
> * added \-\-genpydir cmdline options to {{TestServer.py}} and
> {{TestClient.py}} to let {{RunClientServer.py}} loop over the gen\-py\-\*
> directories as a kind of test controller.
> * reduced verbosity on the TestServer/Client.py code from 2 to 1, since the
> number of invocations is now huge.
> * changed TestServer's intentional Exception to have a message "Exception
> test PASSES." instead of "foo", which previously made the unit test output
> look kind of scary.
> * Adds container read/write methods to {{TProtocol}}, only used by
> {{TProtocolDynamic}} for "dynamic" generated classes
> * Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it
> easier to log about {{TType}} related (de)serialization issues.
> * Changes the import line in generated code from "{{from thrift.Thrift import
> *}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little
> more controlled about what is imported into the generated\-code namespace.
> May be slightly more efficient too.
> * reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the
> "make check" unit testing a bit. hopefully the delays are large enough that
> no spurious errors pop up.
> Things to ponder:
> Using -gen py:dynamic,slots actually results in python rejecting any attempt
> to assign to a non-existent member variable of a thrift object. It's not the
> same as statically-typed languages blowing up at compile time for the same
> reason, but it could expose more subtle user-code errors that otherwise are
> undetected during serialization.
> There's an interesting edge case in python 2.4 that arises due to old-style
> classes and exception processing that only shows up with new-style classes
> (those derived ultimately from object). I'm not sure how to handle this,
> since it's a problem that you don't know exists at code-generation time,
> since we can't really know what version of python is going to execute the
> generated code. Usually it just doesn't matter, but the case of exceptions
> with multiple inheritance is an edge case, it seems. Since this is only a
> tricky problem for the unit tests, I put some failure detection code into
> {{TestClient.py}} to skip the exception testing if the generated class causes
> a {{TypeError}} exception when raised, instead of itself. The test code
> prints a warning: "{{Skipping testing of Xception because python (2, 4) does
> not support this type of exception (gen py is: gen\-py\-dynamic)}}".
> I'm not sure the {{TProtocolDynamic}} class name is the best possible name
> for the base class for "dynamic" objects to derive from. The
> {{TProtocolDynamic}} class does embody the underlying protocol, at an
> abstract level, but the only other class in thrift/protocol/ that it is
> similar to is the {{TProtocolBase}} class.
> This patch adds more dynamic behavior to the thrift library, and it makes
> sense that we'd want to take it another step and implement a python class
> that can parse a .thrift file, and generate the corresponding
> message/struct/etc classes at runtime. This patch doesn't do that, but I
> think it makes it easier. Obviously the use-cases are compelling for
> generating the actual per-class {{read()}} and {{write()}} methods at
> thrift-compile time. But there's some interesting possibilities ahead.
> The unit testing code needs a little more TLC... I'm not too familiar with
> best practices here, but I'd love to see some code-coverage metrics produced
> by the unit testing / test cases. It would be nice to know if some code
> paths aren't being exercised. I won't be surprised if there are edge-cases
> that break the dynamic/slots generated code. It would be nice to know of
> any, so we can update the unit tests to explore it (and fix whatever it is).
> The unit tests don't exercise the py:twisted compatible code at all, so
> there's likely some lurking bugs in there.
> With this patch, the new output from {{thrift -h}} relating to python is:
> {noformat}
> --gen STR Generate code with a dynamically-registered generator.
> STR has the form language[:key1=val1[,key2,[key3=val3]]].
> Keys and values are options passed to the generator.
> Many options will not require values.
> Available generators (and options):
> [ ... snip ... ]
> py (Python):
> new_style: Generate new-style classes.
> twisted: Generate Twisted-friendly RPC services.
> slots: Generate code using slots for instance members.
> dynamic: Generate dynamic code, less code generated but slower.
> dynbase=CLS Derive generated classes from class CLS instead of
> TProtocolDynamic.
> dynimport='from foo.bar import CLS'
> Add an import line to generated code to find the dynbase
> class.
> {noformat}
> Sorry for such a long ticket body. I'll attach the patch in a minute or so,
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira