[ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038066#comment-13038066
 ] 

David Reiss commented on THRIFT-1115:
-------------------------------------

Looks solid.  Mostly style changes.


Does dynamic require the slots option and conflict with the newstyle option?  
If so, this should be enforced.  If not, there are a few places where they 
conflict.

In TExceptionBase, can you copy the read and write methods the way you do with 
__eq__?

{noformat}
+      except TypeError:
+        pass # TODO: add logging about unhashable types. i.e. d=dict(); 
d[[0,1]] = 2 fails
{noformat}

This should be an exception.

{noformat}
@@ -176,7 +180,6 @@
     except Xception, x:
       self.assertEqual(x.errorCode, 1001)
       self.assertEqual(x.message, 'Xception')
-
     try:
       self.client.testException("throw_undeclared")
       self.fail("should have thrown exception")
@@ -225,4 +228,4 @@
         self.createTests()
{noformat}

Looks like this is an unnecessary change.

{noformat}
+    // hdr += std::string("from thrift.protocol.TBase import TBase\n");
{noformat}

Remove commented-out code.

{noformat}
+    hdr +=
+      "from thrift.transport import TTransport\n"
+      "from thrift.protocol import TBinaryProtocol, TProtocol\n";
+    hdr += "try:\n"
     "  from thrift.protocol import fastbinary\n"
     "except:\n"
     "  fastbinary = None\n";
{noformat}

The two "hdr += " statements can be combined into one.
In fact, it looks like the hdr variable can be eliminated entirely if you want.

{noformat}
@@ -384,6 +439,7 @@
   f_types_ <<
     "class " << tenum->get_name() <<
     (gen_newstyle_ ? "(object)" : "") <<
+    (gen_dynamic_ ? "(" + gen_dynbaseclass_ + ")" : "") <<  
     ":" << endl;
{noformat}

This is one conflict between newstyle and dynamic.  I think the generation of 
__eq__ and friends is another.

{noformat}
+    if (gen_newstyle_) {
+     out << "(object)";
+    }
+    else if (gen_dynamic_) {
+      out << "(" << gen_dynbaseclass_ << ")";
+    }
{noformat}

Combine "else if" to previous line.

{noformat}
+    indent_down();
+  } else {
+    if (! gen_dynamic_) {
+      // no base class available to implement __eq__ and __repr__ and __ne__ 
for us
{noformat}

Combine "else" with "if".  Also, indentation is off inside the "if".

{noformat}
+    // perhaps we should try __slots__?
{noformat}

What does this mean?

{noformat}
+    indent_up();
+    for (m_iter = sorted_members.begin(); m_iter != sorted_members.end(); 
++m_iter) {
+      indent(out) <<  "'" << (*m_iter)->get_name()  << "', "
+            << endl;
+    }
{noformat}

This puts trailing whitespace in generated code.

{noformat}
+  if (! gen_dynamic_) {
{noformat}

No space after "!".

{noformat}
+      indent() << "  for attr in self.__slots__:" << endl <<
+      indent() << "    my_val, other_val = [getattr(obj, attr) for obj in 
[self, other]]" << endl <<
+      indent() << "    if my_val is other_val:" << endl <<
+      indent() << "      continue" << endl <<
+      indent() << "    if my_val != other_val:" << endl <<
+      indent() << "        return False" << endl <<
+      indent() << "  return True" << endl <<
{noformat}

Please make the same updates as you did to the TBase implementation (two 
separte statements to extract vals, and skip the is check).

{noformat}
     if (gen_twisted_) {
       extends_if = "(Interface)";
+    } else if (gen_dynamic_) {
+      extends_if = "(" + gen_dynbaseclass_  + ")"; // TBase
{noformat}

The service interface should not be a TBase.

{noformat}
+    if (gen_twisted_ && gen_newstyle_) { // TODO: handle dynbase here?
{noformat}

Same for the client.


> python TBase class for dynamic (de)serialization, and __slots__ option for 
> memory savings
> -----------------------------------------------------------------------------------------
>
>                 Key: THRIFT-1115
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1115
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Build Process, Python - Compiler, Python - Library
>            Reporter: Will Pierce
>            Assignee: Will Pierce
>         Attachments: THRIFT-1115.python_dynamic_code_and_slots_v1.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v2.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v3.patch, test_dynser.py, 
> test_size.py
>
>
> This patch adds several new features to the compiler for python and the 
> python libraries, and exercises the new features with expanded unit testing.
> This adds support for generating python classes that have no {{read()}} or 
> {{write()}} methods. Instead, the generated classes inherit from a new base 
> class, {{TProtocolDynamic}}. This new class implements the de/serialization 
> with {{read()}} and {{write()}} methods that iterate over the derived class's 
> "{{thrift_spec}}" class member that describes the structure and types of the 
> thrift.  This dynamic evaluation works with both binary and compact 
> protocols, and has the same hook as before for delegating de/serialization to 
> fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
> {{read()}} method may even be more efficient than the generated explicit 
> {{read()}} code for objects with lots of attributes because it doesn't have a 
> case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
> Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
> directly.  That efficiency gain is probably just noise though, since the 
> dynamic process probably uses more CPU later on, though I haven't benchmarked 
> it. (Up[date: see the benchmarking results posted below for 
> construction/serialization/deserialization comparisons.)
> If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
> generated classes no longer get individual {{\_\_repr\_\_}} and 
> {{\_\_eq\_\_}} and {{\_\_ne\_\_}} methods, instead they inherit from the 
> TProtocolDynamic base class implementation, which uses {{\_\_slots\_\_}} 
> instead of {{\_\_dict\_\_}} for repr and equality testing.
> When "dynamic" python classes are generated, they have very little code, just 
> a constructor and class data.  All the work of serialization and 
> deserialization is done by the base class.  This produces about 980 lines for 
> DebugProtoTest vs. 3540 lines in default "\-gen py" mode, or about 1/3 the 
> original code size.
> The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
> class, so users can save memory using the slots flag to generate non-dict 
> based instances.  The memory difference between dict and slots based objects 
> is hard to measure, but seems to be around 10x smaller using slots, as long 
> as the base class also uses {{\_\_slots\_\_}}.  If the generated classes are 
> old-style, and use slots, there's no memory savings at all, because the base 
> class still creates a {{\_\_dict\_\_}} object for every instance.  Python is 
> just tricky when it comes to using {{\_\_slots\_\_}} best.
> The memory savings is pretty astounding using new-style classes and 
> {{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen 
> py:dynamic,slots versus \-gen py results in some pretty amazing memory 
> savings.  I tested by instantiating 1 million of the heavy 
> DebugProtoTest.thrift's {{CompactProtoTestStruct()}}, which has 49 attributes 
> in it, using regular "\-gen py" code versus "{{\-gen py:dynamic,slots}}" and 
> compared the VmRSS resident memory usage of both processes.  I didn't set any 
> values to any attributes, so every attribute was left with the null value, 
> None.  The slots technique used 441 MB with slots vs. 3485 MB using 
> non-slots, non-dynamic generated code.  That's about 13% of the original 
> size, or 87% memory savings.
> I tried the same test using a very tiny object instead, the DebugThrift.thift 
> {{Bonk()}} class, which only has two fields.  For this, I made 10 million 
> instances of {{Bonk()}} and the results were very similar: 730 MB with slots 
> vs. 2622 MB using non-slots.
> The speed difference between non-slots and slots is about 25% better with 
> slots, or 14.6 seconds for non-slots 10 million Bonk() test, versus 11.1 
> seconds for the same slots test, or a savings of 24% time.  This only 
> measured object creation time, not serialization/deserialization or instance 
> attribute access. (See the benchmarking results below for 
> serialization/deserialization comparisons).
> If the user wants to subclass the {{TProtocolDynamic}} class and override the 
> read/write methods, or add more functionality, this patch also adds a 
> {{dynbase=CLS}} thrift compiler option and a "{{dynimport='from foo import 
> *}}" option, to specify a different base class (CLS) for the generated 
> "dynamic" objects, and an import line that brings that base class into scope.
> The python unit tests now get yet another test loop layer, with the 
> {{Makefile.am}} specifying multiple thrift generation lines, exercising the 
> possible combinations of {{thrift \-gen py:blah}}, sending the output to 
> multiple gen-py-blah output directories (like gen\-py\-default, 
> gen\-py\-dynamicslots, gen\-py\-newstyle, etc), so that 
> {{RunClientServer.py}} can then loop over each of these and do all of its 
> testing (which is combinatorially exhaustive) on each of those 
> subdirectories.  It's a pretty insane number of tests now, over 390 
> individual tests on python >= 2.5 (ssl + multiprocessing).  Since the test/py 
> "make check" tests aren't done for the 'casual' installer, I think it's 
> acceptable to have the "make check" tests take this long to complete- but it 
> is annoying to have to wait a few minutes for all the tests to 
> finish/succeed.  I think it could be improved a fair bit.
> This patch also makes a few other small changes:
> * adds a comment to the top of each generated .py file with the options to 
> \-gen, i.e. "{{py:dynamic,slots}}", to make it easier to figure out how the 
> thrift compiler generated code was created, after the fact.  This should 
> probably be backported to other languages.
> * added another test for inequality to {{SerializationTest.py}} to compare 
> objects of different types, to exercise the {{\_\_eq\_\_}} code more
> * added \-\-genpydir cmdline options to {{TestServer.py}} and 
> {{TestClient.py}} to let {{RunClientServer.py}} loop over the gen\-py\-\* 
> directories as a kind of test controller.
> * reduced verbosity on the TestServer/Client.py code from 2 to 1, since the 
> number of invocations is now huge.
> * changed TestServer's intentional Exception to have a message "Exception 
> test PASSES." instead of "foo", which previously made the unit test output 
> look kind of scary.
> * Adds container read/write methods to {{TProtocol}}, only used by 
> {{TProtocolDynamic}} for "dynamic" generated classes
> * Added a {{_VALUES_TO_NAMES}} list to {{TType}} in {{Thrift.py}} to make it 
> easier to log about {{TType}} related (de)serialization issues.
> * Changes the import line in generated code from "{{from thrift.Thrift import 
> *}}" to "{{from thrift.Thrift import TType, TMessageType}}" to be a little 
> more controlled about what is imported into the generated\-code namespace.  
> May be slightly more efficient too.
> * reduces the inter\-test {{sleep()}} delays somewhat, to try to speed up the 
> "make check" unit testing a bit.  hopefully the delays are large enough that 
> no spurious errors pop up.
> Things to ponder:
> Using -gen py:dynamic,slots actually results in python rejecting any attempt 
> to assign to a non-existent member variable of a thrift object.  It's not the 
> same as statically-typed languages blowing up at compile time for the same 
> reason, but it could expose more subtle user-code errors that otherwise are 
> undetected during serialization.
> There's an interesting edge case in python 2.4 that arises due to old-style 
> classes and exception processing that only shows up with new-style classes 
> (those derived ultimately from object).  I'm not sure how to handle this, 
> since it's a problem that you don't know exists at code-generation time, 
> since we can't really know what version of python is going to execute the 
> generated code.  Usually it just doesn't matter, but the case of exceptions 
> with multiple inheritance is an edge case, it seems.  Since this is only a 
> tricky problem for the unit tests, I put some failure detection code into 
> {{TestClient.py}} to skip the exception testing if the generated class causes 
> a {{TypeError}} exception when raised, instead of itself.  The test code 
> prints a warning: "{{Skipping testing of Xception because python (2, 4) does 
> not support this type of exception (gen py is: gen\-py\-dynamic)}}".
> I'm not sure the {{TProtocolDynamic}} class name is the best possible name 
> for the base class for "dynamic" objects to derive from.  The 
> {{TProtocolDynamic}} class does embody the underlying protocol, at an 
> abstract level, but the only other class in thrift/protocol/ that it is 
> similar to is the {{TProtocolBase}} class.
> This patch adds more dynamic behavior to the thrift library, and it makes 
> sense that we'd want to take it another step and implement a python class 
> that can parse a .thrift file, and generate the corresponding 
> message/struct/etc classes at runtime.  This patch doesn't do that, but I 
> think it makes it easier.  Obviously the use-cases are compelling for 
> generating the actual per-class {{read()}} and {{write()}} methods at 
> thrift-compile time.  But there's some interesting possibilities ahead.
> The unit testing code needs a little more TLC... I'm not too familiar with 
> best practices here, but I'd love to see some code-coverage metrics produced 
> by the unit testing / test cases.  It would be nice to know if some code 
> paths aren't being exercised.  I won't be surprised if there are edge-cases 
> that break the dynamic/slots generated code.  It would be nice to know of 
> any, so we can update the unit tests to explore it (and fix whatever it is).  
> The unit tests don't exercise the py:twisted compatible code at all, so 
> there's likely some lurking bugs in there.
> With this patch, the new output from {{thrift -h}} relating to python is:
> {noformat}
>   --gen STR   Generate code with a dynamically-registered generator.
>                 STR has the form language[:key1=val1[,key2,[key3=val3]]].
>                 Keys and values are options passed to the generator.
>                 Many options will not require values.
> Available generators (and options):
> [ ... snip ... ]
>   py (Python):
>     new_style:       Generate new-style classes.
>     twisted:         Generate Twisted-friendly RPC services.
>     slots:           Generate code using slots for instance members.
>     dynamic:         Generate dynamic code, less code generated but slower.
>     dynbase=CLS      Derive generated classes from class CLS instead of 
> TProtocolDynamic.
>     dynimport='from foo.bar import CLS'
>                      Add an import line to generated code to find the dynbase 
> class.
> {noformat}
> Sorry for such a long ticket body.  I'll attach the patch in a minute or so,

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to