[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159739#comment-13159739 ]
Scott Carey commented on PIG-2359:
----------------------------------

Yes, codegen should probably be a different ticket.

I'm not sure I understand why there is much to do on the serialization/deserialization side, but I have not looked into it in detail.

Conceptually, one can build an API for constructing Pig tuples of this sort that is independent of the serialization used. Given a Schema, get a TupleBuilder. Fill it with data as you deserialize, then create the Tuple. Both tuple implementations (e.g. code-generated vs. the old Object[]-style Tuple) can be supported behind it. Am I missing something?

If the schema is not known or varies per record, then a completely different code path must be taken.

> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible to avoid this overhead, which would result in significant memory savings.
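
To make the builder idea from the comment concrete, here is a rough sketch of what "given a Schema, get a TupleBuilder" might look like. Every name in it (FieldType, SimpleTuple, TupleBuilder, TupleBuilders.forSchema, and the two tuple classes) is hypothetical and chosen for illustration; none of it is Pig's actual API or the contents of the attached patch. LongOnlyTuple stands in for a code-generated primitive tuple, and ObjectArrayTuple for the existing Object[]-style one.

```java
// Hypothetical sketch only: these types are NOT Pig's real classes.
enum FieldType { LONG, OBJECT }

interface SimpleTuple {
    int size();
    Object get(int i);
}

/** Old-style tuple: every field is held as an Object. */
final class ObjectArrayTuple implements SimpleTuple {
    private final Object[] fields;
    ObjectArrayTuple(Object[] fields) { this.fields = fields; }
    public int size() { return fields.length; }
    public Object get(int i) { return fields[i]; }
}

/** Compact tuple for all-long schemas; a stand-in for a code-generated class. */
final class LongOnlyTuple implements SimpleTuple {
    private final long[] fields;
    LongOnlyTuple(long[] fields) { this.fields = fields; }
    public int size() { return fields.length; }
    public Object get(int i) { return fields[i]; } // boxes only on generic access
    long getLong(int i) { return fields[i]; }      // no boxing
}

/** What a deserializer would program against, whatever the tuple layout. */
interface TupleBuilder {
    TupleBuilder setLong(int i, long v);
    TupleBuilder set(int i, Object v);
    SimpleTuple build();
}

final class LongOnlyBuilder implements TupleBuilder {
    private final long[] values;
    LongOnlyBuilder(int n) { values = new long[n]; }
    public TupleBuilder setLong(int i, long v) { values[i] = v; return this; }
    public TupleBuilder set(int i, Object v) { values[i] = ((Number) v).longValue(); return this; }
    public SimpleTuple build() { return new LongOnlyTuple(values.clone()); }
}

final class ObjectBuilder implements TupleBuilder {
    private final Object[] values;
    ObjectBuilder(int n) { values = new Object[n]; }
    public TupleBuilder setLong(int i, long v) { values[i] = v; return this; } // autoboxes to Long
    public TupleBuilder set(int i, Object v) { values[i] = v; return this; }
    public SimpleTuple build() { return new ObjectArrayTuple(values.clone()); }
}

final class TupleBuilders {
    /** Picks the compact layout when every field is a long, else the generic one. */
    static TupleBuilder forSchema(FieldType... schema) {
        for (FieldType t : schema) {
            if (t != FieldType.LONG) return new ObjectBuilder(schema.length);
        }
        return new LongOnlyBuilder(schema.length);
    }
}
```

A deserializer would call TupleBuilders.forSchema(...) once per known schema, then fill and build a tuple per record without knowing which layout it got. The case where the schema is unknown or varies per record is deliberately not covered here, since the comment notes it needs a different code path.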