[jira] [Updated] (THRIFT-1115) dynamic python object (de)serialization, using slots for less memory, and custom object inheritance

Will Pierce (JIRA) Mon, 04 Apr 2011 05:19:57 -0700

     [ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Will Pierce updated THRIFT-1115:
--------------------------------

    Description: 
This patch adds several new features to the compiler for python and the python 
libraries, and exercises the new features with expanded unit testing.

This adds support for generating python classes that have no {{read()}} or 
{{write()}} methods. Instead, the generated classes inherit from a new base 
class, {{TProtocolDynamic}}. This new class implements the de/serialization 
with {{read()}} and {{write()}} methods that iterate over the derived class's 
"{{thrift_spec}}" class member that describes the structure and types of the 
thrift.  This dynamic evaluation works with both binary and compact protocols, 
and has the same hook as before for delegating de/serialization to 
fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
{{read()}} method may even be more efficient than the generated explicit 
{{read()}} code for objects with lots of attributes because it doesn't have a 
case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
directly.  That efficiency gain is probably just noise though, since the 
dynamic process probably uses more CPU later on, though I haven't benchmarked 
it. (Up[date: see the benchmarking results posted below for 
construction/serialization/deserialization comparisons.)

If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
generated classes no longer get individual {{\_\_repr\_\_}} and {{\_\_eq\_\_}} 
and {{\_\_ne\_\_}} methods, instead they inherit from the TProtocolDynamic base 
class implementation, which uses {{\_\_slots\_\_}} instead of {{\_\_dict\_\_}} 
for repr and equality testing.

When "dynamic" python classes are generated, they have very little code, just a 
constructor and class data.  All the work of serialization and deserialization 
is done by the base class.  This produces about 980 lines for DebugProtoTest 
vs. 3540 lines in default "\-gen py" mode, or about 1/3 the original code size.

The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
class, so users can save memory using the slots flag to generate non-dict based 
instances.  The memory difference between dict and slots based objects is hard 
to measure, but seems to be around 10x smaller using slots, as long as the base 
class also uses {{\_\_slots\_\_}}.  If the generated classes are old-style, and 
use slots, there's no memory savings at all, because the base class still 
creates a {{\_\_dict\_\_}} object for every instance.  Python is just tricky 
when it comes to using {{\_\_slots\_\_}} best.

The memory savings is pretty astounding using new-style classes and 
{{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen py:dynamic,slots 
versus \-gen py results in some pretty amazing memory savings.  I tested by 
instantiating 1 million of the heavy DebugProtoTest.thrift's 
{{CompactProtoTestStruct()}}, which has 49 attributes in it, using regular 
"\-gen py" code versus "{{\-gen py:dynamic,slots}}" and compared the VmRSS 
resident memory usage of both processes.  I didn't set any values to any 
attributes, so every attribute was left with the null value, None.  The slots 
technique used 441 MB with slots vs. 3485 MB using non-slots, non-dynamic 
generated code.  That's about 13% of the original size, or 87% memory savings.

I tried the same test using a very tiny object instead, the DebugThrift.thift 
{{Bonk()}} class, which only has two fields.  For this, I made 10 million 
instances of {{Bonk()}} and the results were very similar: 730 MB with slots 
vs. 2622 MB using non-slots.

The speed difference between non-slots and slots is about 25% better with 
slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1 
seconds for the same slots test, or a savings of 24% time.  This only measured 
object creation time, not serialization/deserialization or instance attribute 
access. (See the benchmarking results below for serialization/deserialization 
comparisons).

If the user wants to subclass the {{TProtocolDynamic}} class and override the 
read/write methods, or add more functionality, this patch also adds a 
{{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import *}}" 
option, to specify a different base class (CLS) for the generated "dynamic" 
objects, and an import line that brings that base class into scope.

The python unit tests now get yet another test loop layer, with the 
{{Makefile.am}} specifying multiple thrift generation lines, exercising the 
possible combinations of {{thrift \-gen py:blah}}, sending the output to 
multiple gen-py-blah output directories (like gen\-py\-default, 
gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that {{RunClientServer.py}} 
can then loop over each of these and do all of its testing (which is 
combinatorially exhaustive) on each of those subdirectories.  It's a pretty 
insane number of tests now, over 390 individual tests on python >= 2.5 (ssl + 
multiprocessing).  Since the test/py "make check" tests aren't done for the 
'casual' installer, I think it's acceptable to have the "make check" tests take 
this long to complete- but it is annoying to have to wait a few minutes for all 
the tests to finish/succeed.  I think it could be improved a fair bit.

This patch also makes a few other small changes:
* adds a comment to the top of each generated .py file with the options to 
\-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the 
thrift compiler generated code was created, after the fact.  This should 
probably be backported to other languages.
* added another test for inequality to {{SerializationTest.py}} to compare 
objects of different types, to exercise the {{\_\_eq\_\_}} code more
* added \-\-genpydir cmdline options to {{TestServer.py}} and {{TestClient.py}} 
to let {{RunClientServer.py}} loop over the gen\-py\-\* directories as a kind 
of test controller.
* reduced verbosity on the TestServer/Client.py code from 2 to 1, since the 
number of invocations is now huge.
* changed TestServer's intentional Exception to have a message "Exception test 
PASSES." instead of "foo", which previously made the unit test output look kind 
of scary.
* Adds container read/write methods to {{TProtocol}}, only used by 
{{TProtocolDynamic}} for "dynamic" generated classes
* Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it 
easier to log about {{TType}} related (de)serialization issues.
* Changes the import line in generated code from "{{from thrift.Thrift import 
*}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little more 
controlled about what is imported into the generated\-code namespace.  May be 
slightly more efficient too.
* reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the 
"make check" unit testing a bit.  hopefully the delays are large enough that no 
spurious errors pop up.

Things to ponder:

Using -gen py:dynamic,slots actually results in python rejecting any attempt to 
assign to a non-existent member variable of a thrift object.  It's not the same 
as statically-typed languages blowing up at compile time for the same reason, 
but it could expose more subtle user-code errors that otherwise are undetected 
during serialization.

There's an interesting edge case in python 2.4 that arises due to old-style 
classes and exception processing that only shows up with new-style classes 
(those derived ultimately from object).  I'm not sure how to handle this, since 
it's a problem that you don't know exists at code-generation time, since we 
can't really know what version of python is going to execute the generated 
code.  Usually it just doesn't matter, but the case of exceptions with multiple 
inheritance is an edge case, it seems.  Since this is only a tricky problem for 
the unit tests, I put some failure detection code into {{TestClient.py}} to 
skip the exception testing if the generated class causes a {{TypeError}} 
exception when raised, instead of itself.  The test code prints a warning: 
"{{Skipping testing of Xception because python (2, 4) does not support this 
type of exception (gen py is: gen\-py\-dynamic)}}".

I'm not sure the {{TProtocolDynamic}} class name is the best possible name for 
the base class for "dynamic" objects to derive from.  The {{TProtocolDynamic}} 
class does embody the underlying protocol, at an abstract level, but the only 
other class in thrift/protocol/ that it is similar to is the {{TProtocolBase}} 
class.

This patch adds more dynamic behavior to the thrift library, and it makes sense 
that we'd want to take it another step and implement a python class that can 
parse a .thrift file, and generate the corresponding message/struct/etc classes 
at runtime.  This patch doesn't do that, but I think it makes it easier.  
Obviously the use-cases are compelling for generating the actual per-class 
{{read()}} and {{write()}} methods at thrift-compile time.  But there's some 
interesting possibilities ahead.

The unit testing code needs a little more TLC... I'm not too familiar with best 
practices here, but I'd love to see some code-coverage metrics produced by the 
unit testing / test cases.  It would be nice to know if some code paths aren't 
being exercised.  I won't be surprised if there are edge-cases that break the 
dynamic/slots generated code.  It would be nice to know of any, so we can 
update the unit tests to explore it (and fix whatever it is).  The unit tests 
don't exercise the py:twisted compatible code at all, so there's likely some 
lurking bugs in there.

With this patch, the new output from {{thrift -h}} relating to python is:
{noformat}
  --gen STR   Generate code with a dynamically-registered generator.
                STR has the form language[:key1=val1[,key2,[key3=val3]]].
                Keys and values are options passed to the generator.
                Many options will not require values.

Available generators (and options):

[ ... snip ... ]

  py (Python):
    new_style:       Generate new-style classes.
    twisted:         Generate Twisted-friendly RPC services.
    slots:           Generate code using slots for instance members.
    dynamic:         Generate dynamic code, less code generated but slower.
    dynbase=CLS      Derive generated classes from class CLS instead of 
TProtocolDynamic.
    dynimport='from foo.bar import CLS'
                     Add an import line to generated code to find the dynbase 
class.
{noformat}

Sorry for such a long ticket body.  I'll attach the patch in a minute or so,


  was:

This patch adds several new features to the compiler for python and the python 
libraries, and exercises the new features with expanded unit testing.

This adds support for generating python classes that have no {{read()}} or 
{{write()}} methods. Instead, the generated classes inherit from a new base 
class, {{TProtocolDynamic}}. This new class implements the de/serialization 
with {{read()}} and {{write()}} methods that iterate over the derived class's 
"{{thrift_spec}}" class member that describes the structure and types of the 
thrift.  This dynamic evaluation works with both binary and compact protocols, 
and has the same hook as before for delegating de/serialization to 
fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
{{read()}} method may even be more efficient than the generated explicit 
{{read()}} code for objects with lots of attributes because it doesn't have a 
case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
directly.  That efficiency gain is probably just noise though, since the 
dynamic process probably uses more CPU later on, though I haven't benchmarked 
it.

If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
generated classes no longer get individual {{\_\_repr\_\_}} and {{\_\_eq\_\_}} 
and {{\_\_ne\_\_}} methods, instead they inherit from the TProtocolDynamic base 
class implementation, which uses {{\_\_slots\_\_}} instead of {{\_\_dict\_\_}} 
for repr and equality testing.

When "dynamic" python classes are generated, they have very little code, just a 
constructor and class data.  All the work of serialization and deserialization 
is done by the base class.  This produces about 980 lines for DebugProtoTest 
vs. 3540 lines in default "\-gen py" mode, or about 1/3 the original code size.

The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
class, so users can save memory using the slots flag to generate non-dict based 
instances.  The memory difference between dict and slots based objects is hard 
to measure, but seems to be around 10x smaller using slots, as long as the base 
class also uses {{\_\_slots\_\_}}.  If the generated classes are old-style, and 
use slots, there's no memory savings at all, because the base class still 
creates a {{\_\_dict\_\_}} object for every instance.  Python is just tricky 
when it comes to using {{\_\_slots\_\_}} best.

The memory savings is pretty astounding using new-style classes and 
{{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen py:dynamic,slots 
versus \-gen py results in some pretty amazing memory savings.  I tested by 
instantiating 1 million of the heavy DebugProtoTest.thrift's 
{{CompactProtoTestStruct()}}, which has 49 attributes in it, using regular 
"\-gen py" code versus "{{\-gen py:dynamic,slots}}" and compared the VmRSS 
resident memory usage of both processes.  I didn't set any values to any 
attributes, so every attribute was left with the null value, None.  The slots 
technique used 441 MB with slots vs. 3485 MB using non-slots, non-dynamic 
generated code.  That's about 13% of the original size, or 87% memory savings.

I tried the same test using a very tiny object instead, the DebugThrift.thift 
{{Bonk()}} class, which only has two fields.  For this, I made 10 million 
instances of {{Bonk()}} and the results were very similar: 730 MB with slots 
vs. 2622 MB using non-slots.

The speed difference between non-slots and slots is about 25% better with 
slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1 
seconds for the same slots test, or a savings of 24% time.  This only measured 
object creation time, not serialization/deserialization or instance attribute 
access.

If the user wants to subclass the {{TProtocolDynamic}} class and override the 
read/write methods, or add more functionality, this patch also adds a 
{{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import *}}" 
option, to specify a different base class (CLS) for the generated "dynamic" 
objects, and an import line that brings that base class into scope.

The python unit tests now get yet another test loop layer, with the 
{{Makefile.am}} specifying multiple thrift generation lines, exercising the 
possible combinations of {{thrift \-gen py:blah}}, sending the output to 
multiple gen-py-blah output directories (like gen\-py\-default, 
gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that {{RunClientServer.py}} 
can then loop over each of these and do all of its testing (which is 
combinatorially exhaustive) on each of those subdirectories.  It's a pretty 
insane number of tests now, over 390 individual tests on python >= 2.5 (ssl + 
multiprocessing).  Since the test/py "make check" tests aren't done for the 
'casual' installer, I think it's acceptable to have the "make check" tests take 
this long to complete- but it is annoying to have to wait a few minutes for all 
the tests to finish/succeed.  I think it could be improved a fair bit.

This patch also makes a few other small changes:
* adds a comment to the top of each generated .py file with the options to 
\-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the 
thrift compiler generated code was created, after the fact.  This should 
probably be backported to other languages.
* added another test for inequality to {{SerializationTest.py}} to compare 
objects of different types, to exercise the {{\_\_eq\_\_}} code more
* added \-\-genpydir cmdline options to {{TestServer.py}} and {{TestClient.py}} 
to let {{RunClientServer.py}} loop over the gen\-py\-\* directories as a kind 
of test controller.
* reduced verbosity on the TestServer/Client.py code from 2 to 1, since the 
number of invocations is now huge.
* changed TestServer's intentional Exception to have a message "Exception test 
PASSES." instead of "foo", which previously made the unit test output look kind 
of scary.
* Adds container read/write methods to {{TProtocol}}, only used by 
{{TProtocolDynamic}} for "dynamic" generated classes
* Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it 
easier to log about {{TType}} related (de)serialization issues.
* Changes the import line in generated code from "{{from thrift.Thrift import 
*}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little more 
controlled about what is imported into the generated\-code namespace.  May be 
slightly more efficient too.
* reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the 
"make check" unit testing a bit.  hopefully the delays are large enough that no 
spurious errors pop up.

Things to ponder:

Using -gen py:dynamic,slots actually results in python rejecting any attempt to 
assign to a non-existent member variable of a thrift object.  It's not the same 
as statically-typed languages blowing up at compile time for the same reason, 
but it could expose more subtle user-code errors that otherwise are undetected 
during serialization.

There's an interesting edge case in python 2.4 that arises due to old-style 
classes and exception processing that only shows up with new-style classes 
(those derived ultimately from object).  I'm not sure how to handle this, since 
it's a problem that you don't know exists at code-generation time, since we 
can't really know what version of python is going to execute the generated 
code.  Usually it just doesn't matter, but the case of exceptions with multiple 
inheritance is an edge case, it seems.  Since this is only a tricky problem for 
the unit tests, I put some failure detection code into {{TestClient.py}} to 
skip the exception testing if the generated class causes a {{TypeError}} 
exception when raised, instead of itself.  The test code prints a warning: 
"{{Skipping testing of Xception because python (2, 4) does not support this 
type of exception (gen py is: gen\-py\-dynamic)}}".

I'm not sure the {{TProtocolDynamic}} class name is the best possible name for 
the base class for "dynamic" objects to derive from.  The {{TProtocolDynamic}} 
class does embody the underlying protocol, at an abstract level, but the only 
other class in thrift/protocol/ that it is similar to is the {{TProtocolBase}} 
class.

This patch adds more dynamic behavior to the thrift library, and it makes sense 
that we'd want to take it another step and implement a python class that can 
parse a .thrift file, and generate the corresponding message/struct/etc classes 
at runtime.  This patch doesn't do that, but I think it makes it easier.  
Obviously the use-cases are compelling for generating the actual per-class 
{{read()}} and {{write()}} methods at thrift-compile time.  But there's some 
interesting possibilities ahead.

The unit testing code needs a little more TLC... I'm not too familiar with best 
practices here, but I'd love to see some code-coverage metrics produced by the 
unit testing / test cases.  It would be nice to know if some code paths aren't 
being exercised.  I won't be surprised if there are edge-cases that break the 
dynamic/slots generated code.  It would be nice to know of any, so we can 
update the unit tests to explore it (and fix whatever it is).  The unit tests 
don't exercise the py:twisted compatible code at all, so there's likely some 
lurking bugs in there.

With this patch, the new output from {{thrift -h}} relating to python is:
{noformat}
  --gen STR   Generate code with a dynamically-registered generator.
                STR has the form language[:key1=val1[,key2,[key3=val3]]].
                Keys and values are options passed to the generator.
                Many options will not require values.

Available generators (and options):

[ ... snip ... ]

  py (Python):
    new_style:       Generate new-style classes.
    twisted:         Generate Twisted-friendly RPC services.
    slots:           Generate code using slots for instance members.
    dynamic:         Generate dynamic code, less code generated but slower.
    dynbase=CLS      Derive generated classes from class CLS instead of 
TProtocolDynamic.
    dynimport='from foo.bar import CLS'
                     Add an import line to generated code to find the dynbase 
class.
{noformat}

Sorry for such a long ticket body.  I'll attach the patch in a minute or so,



> dynamic python object (de)serialization, using __slots__ for less memory, and 
> custom object inheritance
> -------------------------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1115
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1115
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Python - Compiler, Python - Library
>            Reporter: Will Pierce
>            Assignee: Will Pierce
>         Attachments: THRIFT-1115.python_dynamic_code_and_slots_v1.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v2.patch, test_dynser.py, 
> test_size.py
>
>
> This patch adds several new features to the compiler for python and the 
> python libraries, and exercises the new features with expanded unit testing.
> This adds support for generating python classes that have no {{read()}} or 
> {{write()}} methods. Instead, the generated classes inherit from a new base 
> class, {{TProtocolDynamic}}. This new class implements the de/serialization 
> with {{read()}} and {{write()}} methods that iterate over the derived class's 
> "{{thrift_spec}}" class member that describes the structure and types of the 
> thrift.  This dynamic evaluation works with both binary and compact 
> protocols, and has the same hook as before for delegating de/serialization to 
> fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
> {{read()}} method may even be more efficient than the generated explicit 
> {{read()}} code for objects with lots of attributes because it doesn't have a 
> case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
> Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
> directly.  That efficiency gain is probably just noise though, since the 
> dynamic process probably uses more CPU later on, though I haven't benchmarked 
> it. (Up[date: see the benchmarking results posted below for 
> construction/serialization/deserialization comparisons.)
> If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
> generated classes no longer get individual {{\_\_repr\_\_}} and 
> {{\_\_eq\_\_}} and {{\_\_ne\_\_}} methods, instead they inherit from the 
> TProtocolDynamic base class implementation, which uses {{\_\_slots\_\_}} 
> instead of {{\_\_dict\_\_}} for repr and equality testing.
> When "dynamic" python classes are generated, they have very little code, just 
> a constructor and class data.  All the work of serialization and 
> deserialization is done by the base class.  This produces about 980 lines for 
> DebugProtoTest vs. 3540 lines in default "\-gen py" mode, or about 1/3 the 
> original code size.
> The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
> class, so users can save memory using the slots flag to generate non-dict 
> based instances.  The memory difference between dict and slots based objects 
> is hard to measure, but seems to be around 10x smaller using slots, as long 
> as the base class also uses {{\_\_slots\_\_}}.  If the generated classes are 
> old-style, and use slots, there's no memory savings at all, because the base 
> class still creates a {{\_\_dict\_\_}} object for every instance.  Python is 
> just tricky when it comes to using {{\_\_slots\_\_}} best.
> The memory savings is pretty astounding using new-style classes and 
> {{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen 
> py:dynamic,slots versus \-gen py results in some pretty amazing memory 
> savings.  I tested by instantiating 1 million of the heavy 
> DebugProtoTest.thrift's {{CompactProtoTestStruct()}}, which has 49 attributes 
> in it, using regular "\-gen py" code versus "{{\-gen py:dynamic,slots}}" and 
> compared the VmRSS resident memory usage of both processes.  I didn't set any 
> values to any attributes, so every attribute was left with the null value, 
> None.  The slots technique used 441 MB with slots vs. 3485 MB using 
> non-slots, non-dynamic generated code.  That's about 13% of the original 
> size, or 87% memory savings.
> I tried the same test using a very tiny object instead, the DebugThrift.thift 
> {{Bonk()}} class, which only has two fields.  For this, I made 10 million 
> instances of {{Bonk()}} and the results were very similar: 730 MB with slots 
> vs. 2622 MB using non-slots.
> The speed difference between non-slots and slots is about 25% better with 
> slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1 
> seconds for the same slots test, or a savings of 24% time.  This only 
> measured object creation time, not serialization/deserialization or instance 
> attribute access. (See the benchmarking results below for 
> serialization/deserialization comparisons).
> If the user wants to subclass the {{TProtocolDynamic}} class and override the 
> read/write methods, or add more functionality, this patch also adds a 
> {{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import 
> *}}" option, to specify a different base class (CLS) for the generated 
> "dynamic" objects, and an import line that brings that base class into scope.
> The python unit tests now get yet another test loop layer, with the 
> {{Makefile.am}} specifying multiple thrift generation lines, exercising the 
> possible combinations of {{thrift \-gen py:blah}}, sending the output to 
> multiple gen-py-blah output directories (like gen\-py\-default, 
> gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that 
> {{RunClientServer.py}} can then loop over each of these and do all of its 
> testing (which is combinatorially exhaustive) on each of those 
> subdirectories.  It's a pretty insane number of tests now, over 390 
> individual tests on python >= 2.5 (ssl + multiprocessing).  Since the test/py 
> "make check" tests aren't done for the 'casual' installer, I think it's 
> acceptable to have the "make check" tests take this long to complete- but it 
> is annoying to have to wait a few minutes for all the tests to 
> finish/succeed.  I think it could be improved a fair bit.
> This patch also makes a few other small changes:
> * adds a comment to the top of each generated .py file with the options to 
> \-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the 
> thrift compiler generated code was created, after the fact.  This should 
> probably be backported to other languages.
> * added another test for inequality to {{SerializationTest.py}} to compare 
> objects of different types, to exercise the {{\_\_eq\_\_}} code more
> * added \-\-genpydir cmdline options to {{TestServer.py}} and 
> {{TestClient.py}} to let {{RunClientServer.py}} loop over the gen\-py\-\* 
> directories as a kind of test controller.
> * reduced verbosity on the TestServer/Client.py code from 2 to 1, since the 
> number of invocations is now huge.
> * changed TestServer's intentional Exception to have a message "Exception 
> test PASSES." instead of "foo", which previously made the unit test output 
> look kind of scary.
> * Adds container read/write methods to {{TProtocol}}, only used by 
> {{TProtocolDynamic}} for "dynamic" generated classes
> * Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it 
> easier to log about {{TType}} related (de)serialization issues.
> * Changes the import line in generated code from "{{from thrift.Thrift import 
> *}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little 
> more controlled about what is imported into the generated\-code namespace.  
> May be slightly more efficient too.
> * reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the 
> "make check" unit testing a bit.  hopefully the delays are large enough that 
> no spurious errors pop up.
> Things to ponder:
> Using -gen py:dynamic,slots actually results in python rejecting any attempt 
> to assign to a non-existent member variable of a thrift object.  It's not the 
> same as statically-typed languages blowing up at compile time for the same 
> reason, but it could expose more subtle user-code errors that otherwise are 
> undetected during serialization.
> There's an interesting edge case in python 2.4 that arises due to old-style 
> classes and exception processing that only shows up with new-style classes 
> (those derived ultimately from object).  I'm not sure how to handle this, 
> since it's a problem that you don't know exists at code-generation time, 
> since we can't really know what version of python is going to execute the 
> generated code.  Usually it just doesn't matter, but the case of exceptions 
> with multiple inheritance is an edge case, it seems.  Since this is only a 
> tricky problem for the unit tests, I put some failure detection code into 
> {{TestClient.py}} to skip the exception testing if the generated class causes 
> a {{TypeError}} exception when raised, instead of itself.  The test code 
> prints a warning: "{{Skipping testing of Xception because python (2, 4) does 
> not support this type of exception (gen py is: gen\-py\-dynamic)}}".
> I'm not sure the {{TProtocolDynamic}} class name is the best possible name 
> for the base class for "dynamic" objects to derive from.  The 
> {{TProtocolDynamic}} class does embody the underlying protocol, at an 
> abstract level, but the only other class in thrift/protocol/ that it is 
> similar to is the {{TProtocolBase}} class.
> This patch adds more dynamic behavior to the thrift library, and it makes 
> sense that we'd want to take it another step and implement a python class 
> that can parse a .thrift file, and generate the corresponding 
> message/struct/etc classes at runtime.  This patch doesn't do that, but I 
> think it makes it easier.  Obviously the use-cases are compelling for 
> generating the actual per-class {{read()}} and {{write()}} methods at 
> thrift-compile time.  But there's some interesting possibilities ahead.
> The unit testing code needs a little more TLC... I'm not too familiar with 
> best practices here, but I'd love to see some code-coverage metrics produced 
> by the unit testing / test cases.  It would be nice to know if some code 
> paths aren't being exercised.  I won't be surprised if there are edge-cases 
> that break the dynamic/slots generated code.  It would be nice to know of 
> any, so we can update the unit tests to explore it (and fix whatever it is).  
> The unit tests don't exercise the py:twisted compatible code at all, so 
> there's likely some lurking bugs in there.
> With this patch, the new output from {{thrift -h}} relating to python is:
> {noformat}
>   --gen STR   Generate code with a dynamically-registered generator.
>                 STR has the form language[:key1=val1[,key2,[key3=val3]]].
>                 Keys and values are options passed to the generator.
>                 Many options will not require values.
> Available generators (and options):
> [ ... snip ... ]
>   py (Python):
>     new_style:       Generate new-style classes.
>     twisted:         Generate Twisted-friendly RPC services.
>     slots:           Generate code using slots for instance members.
>     dynamic:         Generate dynamic code, less code generated but slower.
>     dynbase=CLS      Derive generated classes from class CLS instead of 
> TProtocolDynamic.
>     dynimport='from foo.bar import CLS'
>                      Add an import line to generated code to find the dynbase 
> class.
> {noformat}
> Sorry for such a long ticket body.  I'll attach the patch in a minute or so,

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (THRIFT-1115) dynamic python object (de)serialization, using __slots__ for less memory, and custom object inheritance

Reply via email to

[jira] [Updated] (THRIFT-1115) dynamic python object (de)serialization, using slots for less memory, and custom object inheritance