[ 
https://issues.apache.org/jira/browse/PIG-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431362#comment-13431362
 ] 

Jonathan Coveney commented on PIG-2862:
---------------------------------------

Tuple serialization/deserialization is central to every Pig job (especially 
for the intermediate data between the map and reduce phases), so the fact that 
the test suite still runs successfully is good evidence that the change is ok.

The savings come from one fewer byte being serialized per Tuple (both a size 
and a CPU saving). The magnitude of that saving isn't gigantic: at one byte per 
Tuple it is roughly 1 MB (8 Mbit) per 1,000,000 Tuples serialized.
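
To make the one-byte saving concrete, here is a minimal sketch of the two header 
layouts. It assumes a small tuple is currently written as a type byte followed 
by a one-byte size, and that the proposal folds sizes below 10 into the type 
byte itself. The class name, method names, and byte values are hypothetical and 
are not the constants used in BinInterSedes or in the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TupleHeaderSketch {
    // Hypothetical type-byte values for illustration only;
    // these are NOT the real BinInterSedes constants.
    static final byte TINYTUPLE = 100; // followed by an explicit 1-byte size
    static final byte TUPLE_0 = 110;   // TUPLE_0 + n marks a tuple of n fields, n in 0..9
    static final int MAX_HARDCODED = 9;

    /** Assumed current layout for small tuples: type byte, then a one-byte size. */
    static void writeHeaderWithSize(DataOutputStream out, int size) throws IOException {
        out.writeByte(TINYTUPLE);
        out.writeByte(size); // sketch only covers sizes that fit in one byte
    }

    /** Proposed layout: sizes below 10 are encoded in the type byte itself. */
    static void writeHeaderHardcoded(DataOutputStream out, int size) throws IOException {
        if (size >= 0 && size <= MAX_HARDCODED) {
            out.writeByte(TUPLE_0 + size);
        } else {
            writeHeaderWithSize(out, size); // fall back to the explicit size byte
        }
    }

    /** Number of header bytes written for a tuple with the given field count. */
    static int headerBytes(int size, boolean hardcoded) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (hardcoded) {
                writeHeaderHardcoded(out, size);
            } else {
                writeHeaderWithSize(out, size);
            }
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int tuples = 1_000_000;
        int perTuple = headerBytes(3, false) - headerBytes(3, true);
        System.out.println("bytes saved per " + tuples + " tuples: "
                + (long) tuples * perTuple);
    }
}
```

Tuples with 10 or more fields take the fallback path in this sketch, so they 
see no change in header size.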
                
> Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier
> ---------------------------------------------------------------------------
>
>                 Key: PIG-2862
>                 URL: https://issues.apache.org/jira/browse/PIG-2862
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>         Attachments: PIG-2862-0.patch
>
>
> Right now, there is TUPLE, SMALLTUPLE, and TINYTUPLE to try and save space 
> when writing out Tuples. There is no reason, however, that this can't be 
> hardcoded for common tuple sizes (<10) to further save space. A quick fix 
> that has positive benefits.


        
