[jira] [Updated] (THRIFT-1115) python TBase class for dynamic (de)serialization, and slots option for memory savings

Will Pierce (JIRA) Tue, 24 May 2011 10:30:32 -0700

     [ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Will Pierce updated THRIFT-1115:
--------------------------------

    Attachment: THRIFT-1115.python_dynamic_code_and_slots_v4.patch

Hi David, thanks for the feedback.  I have an updated patch now.  I tried to 
make as few additional changes as possible, so there's forward progress, but 
unfortunately found a couple bugs lurking in there that needed a structural 
change (item 3 below).  I really appreciate the time you've put into helping 
get this patch refined.  (This code is important to get right.)

Answering your questions in-order:

1. Options: (Dynamic or newstyle) and slots
The dynamic option doesn't require the slots option, but there's no memory 
savings in that case.  The slots option can be used without the dynamic (TBase) 
feature, which gives memory savings but the classes use statically generated 
read/write code in the usual way.  The "make check" test-cases exercise all 
four combinations of (dynamic, slots, dynamicslots, <neither>).  It used to be 
a single option 'dynslots', but I split them apart because the slots feature 
can be used independently with the 'newstyle' option to get memory savings 
without TBase dynamic decoding. The test cases exercise each of (slots, 
newstyle, newstyleslots, <neither>).

2. Options: Dynamic and newstyle
The dynamic option produces new-style classes, so I think it would make sense 
for the compiler to ignore the 'newstyle' option if it is included with the 
'dynamic' option.  In this version of the patch, I've adjusted the compiler 
code so that the 'dynamic' option overrides the 'newstyle' option (sets it to 
false), so the code paths won't intersect.  The python test cases don't 
exercise this combination of options, as it treats 'dynamic,newstyle' the same 
as just 'dynamic'.  If you think it makes sense, though, I can add that 
combination to the "make check" testing in trunk/tests/py/Makefile.am file.

3. TExceptionBase read/write method copying:
I've tried it and uncovered some problems.  The method copying that I used for 
{{__eq__}} and {{__repr__}} in TExceptionBase was just wrong, and wasn't being 
exercised by any test code.  After I added the tests, I found a latent bug I 
introduced into the compiler generated non-dynamic, slots-enabled generated 
repr() method, which I've fixed in this updated patch.  (I also fixed some 
inconsistent indentation in TestClient.py)  Also, I reversed the inheritance 
order of generated exception structs, so they inherit from 
(TExceptionBase,Exception) instead of Exception,TExceptionBase, which fixes a 
bug where python's Exception repr() overrode TExceptionBase's.  That 
multiple-inheritance happens within TBase.py now, instead of in the generated 
code, keeping it in one place.  This is fixed in this latest version of the 
patch and the unit tests in TestClient now check the output of Xception's 
repr() using an explicit test assertion.

There was still a problem though, because python doesn't actually allow method 
copying across classes like I thought.  It throws "TypeError: unbound method 
write() must be called with TBase instance as first argument (got 
TBinaryProtocol instance instead)" when the copied method is invoked. (More 
detail: 
[http://mousebender.wordpress.com/2007/02/17/copying-methods-in-python/])  So 
that's a broken path, and I've removed the method copying... One alternative is 
delegating to module-level functions like a shared\_read(), but that would add 
significant runtime cost. It doesn't seem worthwhile to add an extra function 
call in every repr/eq/ne/read/write just to avoid duplicating code.

The only way I can think of to avoid cut-n-paste code in TBase without 
incurring runtime cost is to build a new old-style subclass, TRoot which 
implements what TBase did, then derive TBase from TRoot and 'object' to make it 
new-style (slots capable), and then derive TExceptionBase from TRoot and 
Exception, keeping it old-style for python2.4 compatibility.  The only code 
left in TBase and TExceptionBase are the mandatory {{__slots__ = []}} lines to 
stop python from generating a {{__dict__}} in between TRoot and a struct.  The 
result looks even cleaner than any previous version of this patch.

A picture will demonstrate it better than words.  For a thrift struct named Foo 
and an exception named Xcept, the class hierarchy would have three possible 
shapes, for dynamic vs. newstyle-nondynamic vs. default compiler options:
{noformat}
Dynamic                      Newstyle-nondynamic     Default
========================     ===================     ==============
object  TRoot  Exception     object  Exception       Foo  Exception
    |   /  \      |             |       |                    |
    |  /    \     |             |       |                  Xcept
   TBase   TExceptionBase      Foo    Xcept
    |             |
   Foo          Xcept
{noformat}

The only extra runtime cost here is the internal python method resolution 
lookup for handling on extra level of inheritance.  Given that TBase implements 
{{__slots__}} and has no local methods of its own, I think it's an acceptable 
cost.  The dynamic read() and write() methods are already somewhat slower than 
the auto-generated static code, so this extra method resolution cost is not 
significant.

Note: The impact of this inheritance change is that code generated using the 
"dynamic,slots" option isn't quite as memory-efficient as code generated with 
"newstyle,slots".  This is because the {{TRoot}} class is old-style, and even 
though is implements {{__slots__}}, the python interpreter sticks a 
{{__dict__}} in there.  It's an empty dict though, so its size is constant even 
for thrift structs with very many fields.  The resulting objects still far less 
RAM than the {{__dict__}} based ones, and this constant difference is less 
significant with larger objects.

I did some proof of concept timing tests, and the extra cost for invoking a 
method that is two inheritance levels up versus one level is pretty small.  For 
thrift object construction (Bonk class), the difference is 0.9837 microseconds 
_with_ TRoot versus 0.9488 microseconds without.  Serialization is 18.1025 
microseconds _with_ TRoot versus 18.3218 microseconds without (just noise).  
And deserialization is 24.3407 microseconds _with_ TRoot versus 24.5639 
microseconds without (noise).  So the extra level of inheritance doesn't make a 
practical speed difference.

When python2.4 support is no longer needed (2014? 2015?), we can collapse TRoot 
into TBase and inherit TExceptionBase from TBase and Exception.  The linecount 
will hardly change, but the class hierarchy will be a bit simpler.

And of course, an alternative to using this TRoot parent class would be to 
simply cut and paste the same code between TBase and TExceptionBase, and let 
the only difference be the 'class' line ({{object}} vs. {{Exception}}).

4. TypeError in TProtocol.py's readContainerMap() at line 303:
I've removed the TypeError exception handling, and left a 1-line comment in 
place to help anyone who traces their issues to that point.

Aside: The python deserializer will now fail to decode a message that has a 
map/dict using mutable keys, instead of silently eating the error but decoding 
the rest of the message.  If there was a mechanism for indicating "partial 
success" at decoding, we could allow for different degrees of decoder 
"strictness".  A user-settable strictness property might fit in with some of 
the other discussions about limiting the maximum sizes of some message types, 
like list and string lengths.  I'm definitely interested in adding that kind of 
robustness to the code in a followup patch, in a way that would work well 
across platforms of course.

5. unnecessary whitespace change
I added the whitespace back in, but also have inserted two lines above it to 
implement the repr() result testing, after fixing up my bug in the generated 
TException {{__repr__}} code.

6. unnecessary C++ comment left in
Zapped, thanks!

7. extra hdr string assignment
I merged the two hdr += assignments and made the indentation consistent, so it 
looks a lot cleaner.  I left hdr in for now.  I don't think it does any harm.

8. conflict between newstyle and dynamic
This is resolved now with the options handling code that makes 'dynamic' take 
precedence over newstyle (since dynamic TBase is itself new-style).

9. "else if" to previous line and indentation fix
Done.  I switched back to emacs for the C++ edits, since it auto indents, and 
the GUI IDE I was using before didn't do the job.  Also fixed a 
vertical-alignment on line 805 for {{def __eq__}} output.  Also touched a 
couple other places where my indentation was off a space or two.

10. "perhaps we should try slots" comment
That was a note to myself when I first started this patch.  It's zapped now.

11. trailing whitespace
Zapped teh trailing whitespace, and moved the << endl up to the previous line 
so it reads cleaner.

12. no space after !
Fixed in both places.

13. {{__eq__}} should match TBase's
Fixed up, good catch, thank you!

14. Service class wrong base class
I removed the TBase reference, and made the else check compare (gen_newstyle_ 
|| gen_dynamic_) so services generated with 'dyanmic' end up as new-style, to 
repeat the structs behavior.

15. service class for client
Updated the check to make sure '(object)' is the base class when _either_ 
newstyle or dynamic is selected, not just newstyle.  I'm not entirely sure this 
is what you meant, though.  I'm not as familiar with the service code as other 
parts of thrift.


I'm attaching version of the patch now which passes all unit tests under 
python2.7 and python2.4 on my host at home.


> python TBase class for dynamic (de)serialization, and __slots__ option for 
> memory savings
> -----------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1115
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1115
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Build Process, Python - Compiler, Python - Library
>            Reporter: Will Pierce
>            Assignee: Will Pierce
>         Attachments: THRIFT-1115.python_dynamic_code_and_slots_v1.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v2.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v3.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v4.patch, test_dynser.py, 
> test_size.py
>
>
> This patch adds several new features to the compiler for python and the 
> python libraries, and exercises the new features with expanded unit testing.
> This adds support for generating python classes that have no {{read()}} or 
> {{write()}} methods. Instead, the generated classes inherit from a new base 
> class, {{TProtocolDynamic}}. This new class implements the de/serialization 
> with {{read()}} and {{write()}} methods that iterate over the derived class's 
> "{{thrift_spec}}" class member that describes the structure and types of the 
> thrift.  This dynamic evaluation works with both binary and compact 
> protocols, and has the same hook as before for delegating de/serialization to 
> fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
> {{read()}} method may even be more efficient than the generated explicit 
> {{read()}} code for objects with lots of attributes because it doesn't have a 
> case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
> Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
> directly.  That efficiency gain is probably just noise though, since the 
> dynamic process probably uses more CPU later on, though I haven't benchmarked 
> it. (Up[date: see the benchmarking results posted below for 
> construction/serialization/deserialization comparisons.)
> If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
> generated classes no longer get individual {{\_\_repr\_\_}} and 
> {{\_\_eq\_\_}} and {{\_\_ne\_\_}} methods, instead they inherit from the 
> TProtocolDynamic base class implementation, which uses {{\_\_slots\_\_}} 
> instead of {{\_\_dict\_\_}} for repr and equality testing.
> When "dynamic" python classes are generated, they have very little code, just 
> a constructor and class data.  All the work of serialization and 
> deserialization is done by the base class.  This produces about 980 lines for 
> DebugProtoTest vs. 3540 lines in default "\-gen py" mode, or about 1/3 the 
> original code size.
> The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
> class, so users can save memory using the slots flag to generate non-dict 
> based instances.  The memory difference between dict and slots based objects 
> is hard to measure, but seems to be around 10x smaller using slots, as long 
> as the base class also uses {{\_\_slots\_\_}}.  If the generated classes are 
> old-style, and use slots, there's no memory savings at all, because the base 
> class still creates a {{\_\_dict\_\_}} object for every instance.  Python is 
> just tricky when it comes to using {{\_\_slots\_\_}} best.
> The memory savings is pretty astounding using new-style classes and 
> {{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen 
> py:dynamic,slots versus \-gen py results in some pretty amazing memory 
> savings.  I tested by instantiating 1 million of the heavy 
> DebugProtoTest.thrift's {{CompactProtoTestStruct()}}, which has 49 attributes 
> in it, using regular "\-gen py" code versus "{{\-gen py:dynamic,slots}}" and 
> compared the VmRSS resident memory usage of both processes.  I didn't set any 
> values to any attributes, so every attribute was left with the null value, 
> None.  The slots technique used 441 MB with slots vs. 3485 MB using 
> non-slots, non-dynamic generated code.  That's about 13% of the original 
> size, or 87% memory savings.
> I tried the same test using a very tiny object instead, the DebugThrift.thift 
> {{Bonk()}} class, which only has two fields.  For this, I made 10 million 
> instances of {{Bonk()}} and the results were very similar: 730 MB with slots 
> vs. 2622 MB using non-slots.
> The speed difference between non-slots and slots is about 25% better with 
> slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1 
> seconds for the same slots test, or a savings of 24% time.  This only 
> measured object creation time, not serialization/deserialization or instance 
> attribute access. (See the benchmarking results below for 
> serialization/deserialization comparisons).
> If the user wants to subclass the {{TProtocolDynamic}} class and override the 
> read/write methods, or add more functionality, this patch also adds a 
> {{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import 
> *}}" option, to specify a different base class (CLS) for the generated 
> "dynamic" objects, and an import line that brings that base class into scope.
> The python unit tests now get yet another test loop layer, with the 
> {{Makefile.am}} specifying multiple thrift generation lines, exercising the 
> possible combinations of {{thrift \-gen py:blah}}, sending the output to 
> multiple gen-py-blah output directories (like gen\-py\-default, 
> gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that 
> {{RunClientServer.py}} can then loop over each of these and do all of its 
> testing (which is combinatorially exhaustive) on each of those 
> subdirectories.  It's a pretty insane number of tests now, over 390 
> individual tests on python >= 2.5 (ssl + multiprocessing).  Since the test/py 
> "make check" tests aren't done for the 'casual' installer, I think it's 
> acceptable to have the "make check" tests take this long to complete- but it 
> is annoying to have to wait a few minutes for all the tests to 
> finish/succeed.  I think it could be improved a fair bit.
> This patch also makes a few other small changes:
> * adds a comment to the top of each generated .py file with the options to 
> \-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the 
> thrift compiler generated code was created, after the fact.  This should 
> probably be backported to other languages.
> * added another test for inequality to {{SerializationTest.py}} to compare 
> objects of different types, to exercise the {{\_\_eq\_\_}} code more
> * added \-\-genpydir cmdline options to {{TestServer.py}} and 
> {{TestClient.py}} to let {{RunClientServer.py}} loop over the gen\-py\-\* 
> directories as a kind of test controller.
> * reduced verbosity on the TestServer/Client.py code from 2 to 1, since the 
> number of invocations is now huge.
> * changed TestServer's intentional Exception to have a message "Exception 
> test PASSES." instead of "foo", which previously made the unit test output 
> look kind of scary.
> * Adds container read/write methods to {{TProtocol}}, only used by 
> {{TProtocolDynamic}} for "dynamic" generated classes
> * Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it 
> easier to log about {{TType}} related (de)serialization issues.
> * Changes the import line in generated code from "{{from thrift.Thrift import 
> *}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little 
> more controlled about what is imported into the generated\-code namespace.  
> May be slightly more efficient too.
> * reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the 
> "make check" unit testing a bit.  hopefully the delays are large enough that 
> no spurious errors pop up.
> Things to ponder:
> Using -gen py:dynamic,slots actually results in python rejecting any attempt 
> to assign to a non-existent member variable of a thrift object.  It's not the 
> same as statically-typed languages blowing up at compile time for the same 
> reason, but it could expose more subtle user-code errors that otherwise are 
> undetected during serialization.
> There's an interesting edge case in python 2.4 that arises due to old-style 
> classes and exception processing that only shows up with new-style classes 
> (those derived ultimately from object).  I'm not sure how to handle this, 
> since it's a problem that you don't know exists at code-generation time, 
> since we can't really know what version of python is going to execute the 
> generated code.  Usually it just doesn't matter, but the case of exceptions 
> with multiple inheritance is an edge case, it seems.  Since this is only a 
> tricky problem for the unit tests, I put some failure detection code into 
> {{TestClient.py}} to skip the exception testing if the generated class causes 
> a {{TypeError}} exception when raised, instead of itself.  The test code 
> prints a warning: "{{Skipping testing of Xception because python (2, 4) does 
> not support this type of exception (gen py is: gen\-py\-dynamic)}}".
> I'm not sure the {{TProtocolDynamic}} class name is the best possible name 
> for the base class for "dynamic" objects to derive from.  The 
> {{TProtocolDynamic}} class does embody the underlying protocol, at an 
> abstract level, but the only other class in thrift/protocol/ that it is 
> similar to is the {{TProtocolBase}} class.
> This patch adds more dynamic behavior to the thrift library, and it makes 
> sense that we'd want to take it another step and implement a python class 
> that can parse a .thrift file, and generate the corresponding 
> message/struct/etc classes at runtime.  This patch doesn't do that, but I 
> think it makes it easier.  Obviously the use-cases are compelling for 
> generating the actual per-class {{read()}} and {{write()}} methods at 
> thrift-compile time.  But there's some interesting possibilities ahead.
> The unit testing code needs a little more TLC... I'm not too familiar with 
> best practices here, but I'd love to see some code-coverage metrics produced 
> by the unit testing / test cases.  It would be nice to know if some code 
> paths aren't being exercised.  I won't be surprised if there are edge-cases 
> that break the dynamic/slots generated code.  It would be nice to know of 
> any, so we can update the unit tests to explore it (and fix whatever it is).  
> The unit tests don't exercise the py:twisted compatible code at all, so 
> there's likely some lurking bugs in there.
> With this patch, the new output from {{thrift -h}} relating to python is:
> {noformat}
>   --gen STR   Generate code with a dynamically-registered generator.
>                 STR has the form language[:key1=val1[,key2,[key3=val3]]].
>                 Keys and values are options passed to the generator.
>                 Many options will not require values.
> Available generators (and options):
> [ ... snip ... ]
>   py (Python):
>     new_style:       Generate new-style classes.
>     twisted:         Generate Twisted-friendly RPC services.
>     slots:           Generate code using slots for instance members.
>     dynamic:         Generate dynamic code, less code generated but slower.
>     dynbase=CLS      Derive generated classes from class CLS instead of 
> TProtocolDynamic.
>     dynimport='from foo.bar import CLS'
>                      Add an import line to generated code to find the dynbase 
> class.
> {noformat}
> Sorry for such a long ticket body.  I'll attach the patch in a minute or so,

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1115) python TBase class for dynamic (de)serialization, and __slots__ option for memory savings

Reply via email to

[jira] [Updated] (THRIFT-1115) python TBase class for dynamic (de)serialization, and slots option for memory savings