[ 
https://issues.apache.org/jira/browse/PIG-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283785#comment-13283785
 ] 

Jonathan Coveney commented on PIG-2638:
---------------------------------------

Bump. This bad boy is incremental but has literally no downside. I'm beginning 
to think we should rip out the custom intermediate and replace it with Kryo but 
until such a time, I think this is appropriate.
                
> Optimize BinInterSedes treatment of longs
> -----------------------------------------
>
>                 Key: PIG-2638
>                 URL: https://issues.apache.org/jira/browse/PIG-2638
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2638-0.patch, PIG-2638-1.patch
>
>
> During adventures in BinInterSedes, I noticed that ints are written in an 
> optimized fashion, but longs are not. Given that, in the general case, we 
> have to write type information anyway, we might as well do the same 
> optimization for longs. That is to say, given that most longs won't have 8 
> bytes of information in them, why should we always spend 8 bytes serializing 
> them?
> This patch takes its inspiration from varint encoding per these two sources:
> http://javasourcecode.org/html/open-source/mahout/mahout-0.5/org/apache/mahout/math/Varint.java.html
> https://developers.google.com/protocol-buffers/docs/encoding
> Though, nicely enough, we don't actually have to use varints. Since we HAVE 
> to write a one-byte (8-bit) type header anyway, we might as well encode in 
> it the number of bytes we had to write. I use zig zag encoding so that 
> negative numbers with a small magnitude see the benefit too.
> This should decrease the amount of serialized long data by a good bit.
> Patch incoming. It passes test-commit in 0.11.
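
For readers skimming the thread, here is a rough sketch of the scheme the
description outlines: zig zag encode the long, count how many bytes the result
actually needs, fold that count into the one-byte type header, then write only
those bytes. It is not the code from the attached patches. The class name, the
LONG_BASE constant, and the helper methods are all made up for illustration,
and the real BinInterSedes type bytes may be laid out differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class CompactLongSketch {
    // Hypothetical base for the type byte; NOT one of Pig's real constants.
    // The header written is LONG_BASE + n, where n (0..8) is the payload size.
    static final int LONG_BASE = 0x40;

    // Zig zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so that negative
    // values of small magnitude also end up with few significant bytes.
    static long zigZagEncode(long v) { return (v << 1) ^ (v >> 63); }
    static long zigZagDecode(long z) { return (z >>> 1) ^ -(z & 1); }

    // Number of bytes holding significant bits: 0 for zero, otherwise 1..8.
    static int bytesNeeded(long z) {
        return (64 - Long.numberOfLeadingZeros(z) + 7) / 8;
    }

    static void writeLong(DataOutput out, long v) throws IOException {
        long z = zigZagEncode(v);
        int n = bytesNeeded(z);
        out.writeByte(LONG_BASE + n);             // type header carries the length
        for (int i = n - 1; i >= 0; i--) {
            out.writeByte((int) (z >>> (8 * i))); // big-endian payload, n bytes
        }
    }

    static long readLong(DataInput in) throws IOException {
        int n = in.readUnsignedByte() - LONG_BASE;
        if (n < 0 || n > 8) {
            throw new IOException("unexpected type byte for a compact long");
        }
        long z = 0;
        for (int i = 0; i < n; i++) {
            z = (z << 8) | in.readUnsignedByte();
        }
        return zigZagDecode(z);
    }

    // Convenience wrappers for trying the sketch out in memory.
    static byte[] serialize(long v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeLong(new DataOutputStream(bytes), v);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long deserialize(byte[] data) {
        try {
            return readLong(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, 0 costs a single byte (header only), -1 costs two, and
only values close to Long.MIN_VALUE or Long.MAX_VALUE pay the full nine bytes
that a fixed-width type byte plus 8-byte long would always cost.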

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
