[ 
https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159739#comment-13159739
 ] 

Scott Carey commented on PIG-2359:
----------------------------------

Yes, codegen should probably be a different ticket.

I'm not sure I understand why there is much to do on the 
serialization/deserialization side, but I have not looked into it in detail.

Conceptually, one can build an API for constructing Pig tuples of this sort 
that is independent of the serialization used.  Given a Schema, get a 
TupleBuilder.  Fill it with data as you deserialize, then create the Tuple.  
Both tuple implementations (code-generated and the old Object[]-style Tuple) 
can be supported behind that API.  Am I missing something?

If the schema is not known or varies per record, then a completely different 
code path must be taken.
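
To make the idea concrete, here is a rough sketch of what such a builder could 
look like.  Every name in it (FieldType, SimpleTuple, TupleBuilder, 
TupleBuilders, LongOnlyTuple, ObjectArrayTuple) is hypothetical and is not 
Pig's actual API.  LongOnlyTuple is a hand-written stand-in for what a 
code-generated tuple might look like.

```java
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical illustrations, not Pig's real classes.
enum FieldType { INT, LONG, CHARARRAY }

interface SimpleTuple {
    int size();
    Object get(int i);
}

interface TupleBuilder {
    TupleBuilder set(int i, Object value);
    SimpleTuple build();
}

// Old-style tuple: every field is held as an Object.
final class ObjectArrayTuple implements SimpleTuple {
    private final Object[] fields;
    ObjectArrayTuple(Object[] fields) { this.fields = fields; }
    public int size() { return fields.length; }
    public Object get(int i) { return fields[i]; }
}

// Stand-in for a generated tuple: all-long schemas are stored unboxed.
final class LongOnlyTuple implements SimpleTuple {
    private final long[] fields;
    LongOnlyTuple(long[] fields) { this.fields = fields; }
    public int size() { return fields.length; }
    public Object get(int i) { return fields[i]; } // boxes only on access
}

final class TupleBuilders {
    private TupleBuilders() {}

    // Given a schema, pick a builder. The deserializer only ever sees
    // the TupleBuilder interface, not the concrete tuple class.
    static TupleBuilder forSchema(List<FieldType> schema) {
        boolean allLongs = schema.stream().allMatch(t -> t == FieldType.LONG);
        return allLongs ? new LongBuilder(schema.size())
                        : new ObjectBuilder(schema.size());
    }

    private static final class LongBuilder implements TupleBuilder {
        private final long[] values;
        LongBuilder(int n) { values = new long[n]; }
        public TupleBuilder set(int i, Object value) {
            values[i] = (Long) value; // assumes the caller honors the schema
            return this;
        }
        public SimpleTuple build() { return new LongOnlyTuple(values.clone()); }
    }

    private static final class ObjectBuilder implements TupleBuilder {
        private final Object[] values;
        ObjectBuilder(int n) { values = new Object[n]; }
        public TupleBuilder set(int i, Object value) {
            values[i] = value;
            return this;
        }
        public SimpleTuple build() { return new ObjectArrayTuple(values.clone()); }
    }
}
```

A deserializer would call something like 
TupleBuilders.forSchema(Arrays.asList(FieldType.LONG, FieldType.LONG)), call 
set(i, value) for each field it reads, and then call build().  It never needs 
to know which tuple class it gets back.  A real version would presumably add 
typed setters (setLong and so on) so that primitives are not boxed on the way 
in.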


                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.


        
