[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150608#comment-13150608 ]
Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------

Ashutosh, your comment is correct in the general case, but in this case we only use these special tuples when we know the schema from the get-go.

I think 1 new byte for each of the possible primitive single-field tuples (serialization: tuple_type, null_bit, serialized value), and 1 new primitive tuple (serialization: tuple_type, size, types, n_null_bits, bytearray) should work.

To get around the limitation of 1 byte for expressing the number of fields in a tuple (just in case), we can limit ourselves to 127 per byte and save the high bit to indicate that the next byte should be interpreted to contain the rest of the number. So it would take 2 bytes to express 256, but we could. Most of the time this won't matter, as the vast majority of tuples are going to be well under 128 fields.

There's some nastiness in Tuples where they all have an internal "isNull" field that might be true to indicate the whole tuple is null. I am not sure what the deal is there. Do we need to write an extra bit per tuple to differentiate "all fields are null" from "the tuple is null"? Or should we just write the NULL byte (currently used to express null tuples) and let it be deserialized into a normal, non-primitive tuple?

> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible to avoid this overhead, which would result in significant memory savings.

--
This message is automatically generated by JIRA.
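For illustration, the field-count encoding proposed in the comment (7 payload bits per byte, with the high bit set when another byte follows) could look roughly like the sketch below. This is not taken from PIG-2359.1.patch: the class name FieldCountCodec, its method names, and the choice to write the low-order 7 bits first are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Pig: encodes a tuple's field count
// using 7 bits per byte, with the high bit as a "more bytes follow" flag.
class FieldCountCodec {

    // Writes a non-negative count, low-order 7-bit group first (an assumption).
    static void writeCount(DataOutputStream out, int n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("count must be non-negative");
        }
        while (n > 0x7F) {
            out.writeByte((n & 0x7F) | 0x80); // high bit set: another byte follows
            n >>>= 7;
        }
        out.writeByte(n); // high bit clear: last byte
    }

    // Reads a count written by writeCount.
    static int readCount(DataInputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 28) {
                throw new IOException("field count does not fit in an int");
            }
        }
    }

    // Convenience wrappers for in-memory round trips.
    static byte[] encode(int n) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeCount(new DataOutputStream(buf), n);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int decode(byte[] bytes) {
        try {
            return readCount(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 5, 127, 128, 256}) {
            System.out.println(n + " fields -> " + encode(n).length + " byte(s)");
        }
    }
}
```

Under this scheme any tuple with up to 127 fields pays a single byte for its size, and 256 fields cost two bytes, so the common case is no more expensive than a fixed one-byte count.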