[jira] [Updated] (THRIFT-1115) python TBase class for dynamic (de)serialization, and __slots__ option for memory savings

2011-05-25 Thread Will Pierce (JIRA)

 [ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Pierce updated THRIFT-1115:


Attachment: THRIFT-1115.python_dynamic_code_and_slots_v5.patch

Version 5 of the patch is attached. It changes the method copying for 
TExceptionBase from TBase to use the .im_func special attribute, and removes 
the TRoot class entirely.

The 'make check' tests pass for python2.7 and python2.4 on my box.

> python TBase class for dynamic (de)serialization, and __slots__ option for 
> memory savings
> -
>
> Key: THRIFT-1115
> URL: https://issues.apache.org/jira/browse/THRIFT-1115
> Project: Thrift
>  Issue Type: New Feature
>  Components: Build Process, Python - Compiler, Python - Library
>Reporter: Will Pierce
>Assignee: Will Pierce
> Attachments: THRIFT-1115.python_dynamic_code_and_slots_v1.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v2.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v3.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v4.patch, 
> THRIFT-1115.python_dynamic_code_and_slots_v5.patch, test_dynser.py, 
> test_size.py
>
>
> This patch adds several new features to the compiler for python and the 
> python libraries, and exercises the new features with expanded unit testing.
> This adds support for generating python classes that have no {{read()}} or 
> {{write()}} methods. Instead, the generated classes inherit from a new base 
> class, {{TProtocolDynamic}}. This new class implements the de/serialization 
> with {{read()}} and {{write()}} methods that iterate over the derived class's 
> "{{thrift_spec}}" class member that describes the structure and types of the 
> thrift.  This dynamic evaluation works with both binary and compact 
> protocols, and has the same hook as before for delegating de/serialization to 
> fastbinary.so for the "accelerated binary" protocol.  This new baseclass 
> {{read()}} method may even be more efficient than the generated explicit 
> {{read()}} code for objects with lots of attributes because it doesn't have a 
> case/switch style series of "{{if field_id == X...}}" nested inside a loop.  
> Instead, it indexes the decoded field ID into the {{thrift_spec}} tuple 
> directly.  That efficiency gain is probably just noise though, since the 
> dynamic process probably uses more CPU later on, though I haven't benchmarked 
> it. (Update: see the benchmarking results posted below for 
> construction/serialization/deserialization comparisons.)
> If the 'dynamic' flag is given as a -gen py: flag to the compiler, then the 
> generated classes no longer get individual {{\_\_repr\_\_}} and 
> {{\_\_eq\_\_}} and {{\_\_ne\_\_}} methods, instead they inherit from the 
> TProtocolDynamic base class implementation, which uses {{\_\_slots\_\_}} 
> instead of {{\_\_dict\_\_}} for repr and equality testing.
> When "dynamic" python classes are generated, they have very little code, just 
> a constructor and class data.  All the work of serialization and 
> deserialization is done by the base class.  This produces about 980 lines for 
> DebugProtoTest vs. 3540 lines in default "\-gen py" mode, or about 1/3 the 
> original code size.
> The {{\_\_slots\_\_}} support is available without requiring the dynamic base 
> class, so users can save memory using the slots flag to generate non-dict 
> based instances.  The memory difference between dict and slots based objects 
> is hard to measure, but seems to be around 10x smaller using slots, as long 
> as the base class also uses {{\_\_slots\_\_}}.  If the generated classes are 
> old-style, and use slots, there's no memory savings at all, because the base 
> class still creates a {{\_\_dict\_\_}} object for every instance.  Python is 
> just tricky when it comes to using {{\_\_slots\_\_}} best.
> The memory savings is pretty astounding using new-style classes and 
> {{\_\_slots\_\_}}.  Building DebugProtoTest.thrift with: -gen 
> py:dynamic,slots versus \-gen py results in some pretty amazing memory 
> savings.  I tested by instantiating 1 million of the heavy 
> DebugProtoTest.thrift's {{CompactProtoTestStruct()}}, which has 49 attributes 
> in it, using regular "\-gen py" code versus "{{\-gen py:dynamic,slots}}" and 
> compared the VmRSS resident memory usage of both processes.  I didn't set any 
> values to any attributes, so every attribute was left with the null value, 
> None.  The slots technique used 441 MB vs. 3485 MB using 
> non-slots, non-dynamic generated code.  That's about 13% of the original 
> size, or 87% memory savings.
> I tried the same test using a very tiny object instead, the DebugThrift.thrift 
> {{Bonk()}} class, which only has two fields.  For this, I made 10 million 
> instances of {{Bonk()}} and the results were very similar
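The slots-based repr/equality and the memory effect described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical base class `SlotsBase` and a two-field class with made-up field names; it is not the code from the patch, and `shallow_size` is a rough per-instance estimate rather than the VmRSS measurement used in the description.

```python
import sys


class SlotsBase(object):
    """Hypothetical stand-in for a slots-aware base class (not Thrift's TBase)."""
    # The base class must declare __slots__ too, or every instance
    # still gets a per-instance __dict__.
    __slots__ = ()

    def __repr__(self):
        fields = ', '.join('%s=%r' % (name, getattr(self, name))
                           for name in self.__slots__)
        return '%s(%s)' % (self.__class__.__name__, fields)

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return False
        return all(getattr(self, name) == getattr(other, name)
                   for name in self.__slots__)

    def __ne__(self, other):
        return not (self == other)


class TinySlots(SlotsBase):
    # Field names are illustrative only.
    __slots__ = ('first', 'second')

    def __init__(self, first=None, second=None):
        self.first = first
        self.second = second


class TinyDict(object):
    """Same fields, but stored in the usual per-instance __dict__."""

    def __init__(self, first=None, second=None):
        self.first = first
        self.second = second


def shallow_size(obj):
    """Size of the instance plus its __dict__, if any (field values excluded)."""
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size
```

Comparing `shallow_size(TinySlots())` with `shallow_size(TinyDict())` gives a rough idea of the per-instance overhead; the exact numbers depend on the interpreter version and platform.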

[jira] [Commented] (THRIFT-1115) python TBase class for dynamic (de)serialization, and __slots__ option for memory savings

2011-05-25 Thread Will Pierce (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039502#comment-13039502
 ] 

Will Pierce commented on THRIFT-1115:
-

Aha, the {{.im_func}} is the secret sauce that will make the method copying 
work.  I wondered if it was possible to get around python's instance type 
checking, and im_func is it.  Thanks for pointing it out!

The method copying in Second seems the cleanest to me.  I'm a little queasy 
about the Third option, mostly because seeing a 'self' argument in a 
module-level function will confusingly look like a bug in a year or so when 
we've forgotten all about this detour.

I think you're right about the performance being equal.  I disassembled with 
dis.dis three functions that exercised each version (instance construction and 
a .go() call), and all three produced the same bytecode:
{code}
  1           0 LOAD_GLOBAL              0 (Third)  # or Second, or First
              3 CALL_FUNCTION            0
              6 STORE_FAST               0 (x)
              9 LOAD_FAST                0 (x)
             12 LOAD_ATTR                2 (go)
             15 POP_TOP
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE
{code}

I'll post an updated patch in a moment, after running the tests.
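A comparison like the one above can be reproduced with a small script. This is a sketch, not the exact functions that were disassembled: the `use_*` helpers and `opnames` are hypothetical names, `dis.get_instructions` needs Python 3.4 or later, and the `getattr` fallback stands in for `.im_func`, which only exists on Python 2.

```python
import dis


class First(object):
    def go(self, val):
        return val + 1


class Second(object):
    # On Python 2 this is First.go.im_func; on Python 3 First.go is already
    # a plain function, so the getattr default is used instead.
    go = getattr(First.go, 'im_func', First.go)


def go_impl(self, val):
    return val + 1


class Third(object):
    go = go_impl


def use_first():
    x = First()
    x.go(1)


def use_second():
    x = Second()
    x.go(1)


def use_third():
    x = Third()
    x.go(1)


def opnames(func):
    """Return the opcode names of func, ignoring operands such as class names."""
    return [ins.opname for ins in dis.get_instructions(func)]
```

If the three callers compile to the same opcode sequence, any difference between the variants has to come from attribute lookup at run time, not from the calling code.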



[jira] [Commented] (THRIFT-1115) python TBase class for dynamic (de)serialization, and __slots__ option for memory savings

2011-05-25 Thread David Reiss (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039471#comment-13039471
 ] 

David Reiss commented on THRIFT-1115:
-

{noformat}
class First(object):
  def go(self, val):
    return val + 1

class Second(object):
  go = First.go.im_func

def go_impl(self, val):
  return val + 1

class Third(object):
  go = go_impl
{noformat}

I think First, Second, and Third are all equivalent in terms of functionality 
and performance, though I haven't verified this.  I'm not sure how they compare 
in terms of debug-ability.  I'm sure it's subjective.

Do either of these options strike you as preferable to copying the 
implementations?

If not, I'd be fine with the copying.  I think we might do the same thing in 
PHP.
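For readers trying the three variants on a current interpreter, here is a hedged sketch that runs on both Python 2 and Python 3. The helper `plain_function` is a hypothetical name introduced only for this illustration: on Python 2, `First.go` is an unbound method whose function lives in `.im_func`, while on Python 3 `First.go` is already a plain function.

```python
class First(object):
    def go(self, val):
        return val + 1


def plain_function(method):
    """Return the underlying function of a method (hypothetical helper).

    Uses .im_func where it exists (Python 2 unbound methods) and otherwise
    returns the object unchanged (Python 3 plain functions).
    """
    return getattr(method, 'im_func', method)


class Second(object):
    # Reuses First's implementation without inheriting from First.
    go = plain_function(First.go)


def go_impl(self, val):
    return val + 1


class Third(object):
    # Module-level function assigned as a method.
    go = go_impl
```

All three classes respond to `.go(1)` the same way, and `Second` shares the implementation without `First` appearing in its inheritance chain.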


[jira] [Commented] (THRIFT-731) configure doesn't check for ant >= 1.7

2011-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039356#comment-13039356
 ] 

Hudson commented on THRIFT-731:
---

Integrated in Thrift #148 (See 
[https://builds.apache.org/hudson/job/Thrift/148/])
Thrift-731: configure doesn't check for ant >= 1.7
Client: java, build process
Patch: Harlan Lieberman-Berg, Jake Farrell

Adds a configure check to verify that the current version of ant is >= 1.7; 
otherwise it sets WITH_JAVA to no.


> configure doesn't check for ant >= 1.7
> --
>
> Key: THRIFT-731
> URL: https://issues.apache.org/jira/browse/THRIFT-731
> Project: Thrift
>  Issue Type: Bug
>  Components: Java - Compiler
>Reporter: Henry Robinson
>Assignee: Jake Farrell
>Priority: Minor
> Fix For: 0.7
>
> Attachments: Thrift-731.patch, ant.diff, configure.diff
>
>
> ./configure on a machine with ant 1.6 successfully runs, even though ant >= 1.7 is 
> required for the Java build step - otherwise you get
> BUILD FAILED
> /home/henry/thrift-0.2.0/lib/java/build.xml:86: Class 
> org.apache.tools.ant.taskdefs.ConditionTask doesn't support the nested 
> "typefound" element.
> Upgrading to ant 1.7.1 fixed the build failure, but would be nice if 
> configure gave a clue. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (THRIFT-1115) python TBase class for dynamic (de)serialization, and __slots__ option for memory savings

2011-05-25 Thread Will Pierce (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039352#comment-13039352
 ] 

Will Pierce commented on THRIFT-1115:
-

After thinking about this latest version of the patch, issue 3 w/ the 
duplicated code between TBase and TExceptionBase gnaws at me. We can eliminate 
TRoot if we just cut and paste its contents into both TBase and TExceptionBase. 
 It isn't super elegant, but the inheritance diagram is a lot simpler.  The 
only reason TRoot exists at all is to work around the python2.4 requirement 
that Exceptions cannot be new-style classes, otherwise the inheritance diagram 
would be simple: TBase(object) and TExceptionBase(TBase,Exception)

I'm not sure it's worth complicating the object hierarchy just to avoid having 
two copies of the same code, especially when they live in the same file.  I'm 
curious what others think of the tradeoff.  It's ~25 lines of code for 5 
methods, and 4 lines of whitespace.
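For comparison, the "simple" hierarchy mentioned above can be sketched as follows. It only works where exceptions are new-style classes (Python 2.5 and later), which is why it was not an option for python2.4. The class names are hypothetical stand-ins, not the patch's TBase and TExceptionBase.

```python
class SharedBase(object):
    """Hypothetical stand-in for TBase: holds the shared helper methods."""

    def __repr__(self):
        fields = ', '.join('%s=%r' % item
                           for item in sorted(vars(self).items()))
        return '%s(%s)' % (self.__class__.__name__, fields)

    def __eq__(self, other):
        return isinstance(other, self.__class__) and vars(self) == vars(other)

    def __ne__(self, other):
        return not (self == other)


class SharedExceptionBase(SharedBase, Exception):
    """Hypothetical stand-in for TExceptionBase: inherits instead of copying."""


class NotFound(SharedExceptionBase):
    """Example exception with a single illustrative field."""

    def __init__(self, key=None):
        self.key = key
```

With this layout the helper methods exist once, at the cost of requiring an interpreter where an exception class may also derive from `object`.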




[jira] [Updated] (THRIFT-731) configure doesn't check for ant >= 1.7

2011-05-25 Thread Jake Farrell (JIRA)

 [ 
https://issues.apache.org/jira/browse/THRIFT-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Farrell updated THRIFT-731:


Attachment: Thrift-731.patch

New function using part of the previous patches to detect the current version 
of ant and set WITH_JAVA = no if ant < 1.7. Also cleans up the check for 
included dep packages since they will be auto downloaded if not available with 
the new build. 
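The version test itself is simple to state. The sketch below expresses the same decision in Python purely as an illustration; the actual patch does this inside configure, and the function names and the assumed format of the `ant -version` output line are assumptions, not taken from the patch.

```python
import re


def ant_version(version_output):
    """Parse a version tuple out of `ant -version` style output.

    Assumes a line such as 'Apache Ant version 1.7.1 compiled on ...'.
    Returns None if no version number is found.
    """
    match = re.search(r'version (\d+(?:\.\d+)*)', version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split('.'))


def with_java(version_output, minimum=(1, 7)):
    """Return True only when ant is present and at least the minimum version."""
    version = ant_version(version_output)
    return version is not None and version >= minimum
```

Comparing tuples of integers avoids the string-comparison trap where "1.10" would sort before "1.7".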




[jira] [Closed] (THRIFT-731) configure doesn't check for ant >= 1.7

2011-05-25 Thread Jake Farrell (JIRA)

 [ 
https://issues.apache.org/jira/browse/THRIFT-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Farrell closed THRIFT-731.
---

Resolution: Fixed
  Assignee: Jake Farrell  (was: Harlan Lieberman-Berg)

