[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150608#comment-13150608 ]

Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------

Ashutosh, your comment is correct in the general case, but in this case, we 
only use these special tuples when we know the schema from the get-go.

I think 1 new type byte for each of the possible primitive single-field tuples 
(serialization: tuple_type, null_bit, serialized value), plus 1 new primitive 
tuple type (serialization: tuple_type, size, types, n_null_bits, bytearray), 
should work.
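A minimal Java sketch of the two layouts proposed above, assuming hypothetical marker bytes (SINGLE_INT_TUPLE, PRIMITIVE_TUPLE) and hypothetical per-field type codes (INT, LONG, DOUBLE). None of these are Pig's actual BinInterSedes or DataType constants, and the class is not wired into Pig. In this sketch the single-field null bit is stored as a whole byte, the multi-field null bits are packed into ceil(size/8) bytes, and null fields contribute no value bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PrimitiveTupleSketch {
    // Hypothetical marker bytes; NOT Pig's real BinInterSedes/DataType constants.
    static final byte SINGLE_INT_TUPLE = 0x70;
    static final byte PRIMITIVE_TUPLE = 0x71;
    // Hypothetical per-field type codes for the "types" section.
    static final byte INT = 1, LONG = 2, DOUBLE = 3;

    // Layout: tuple_type, null_bit (stored as a byte), serialized value (omitted when null).
    static byte[] writeSingleInt(Integer value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(SINGLE_INT_TUPLE);
            out.writeByte(value == null ? 1 : 0);
            if (value != null) out.writeInt(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Layout: tuple_type, size, types, null bitmap (ceil(size/8) bytes), packed non-null values.
    static byte[] writePrimitive(byte[] types, Object[] fields) {
        if (types.length != fields.length) throw new IllegalArgumentException("types/fields length mismatch");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(PRIMITIVE_TUPLE);
            out.writeByte(fields.length);  // one-byte size; larger counts need a multi-byte scheme
            out.write(types);
            byte[] nullBits = new byte[(fields.length + 7) / 8];
            for (int i = 0; i < fields.length; i++) {
                if (fields[i] == null) nullBits[i / 8] |= (byte) (1 << (i % 8));
            }
            out.write(nullBits);
            for (int i = 0; i < fields.length; i++) {
                if (fields[i] == null) continue;  // null fields contribute no value bytes
                switch (types[i]) {
                    case INT: out.writeInt((Integer) fields[i]); break;
                    case LONG: out.writeLong((Long) fields[i]); break;
                    case DOUBLE: out.writeDouble((Double) fields[i]); break;
                    default: throw new IllegalArgumentException("not a primitive type code: " + types[i]);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeSingleInt(42).length);
        System.out.println(writeSingleInt(null).length);
        byte[] types = {INT, LONG, DOUBLE};
        System.out.println(writePrimitive(types, new Object[] {7, null, 1.5}).length);
    }
}
```

Under these assumptions a non-null single-int tuple takes 6 bytes (marker, null flag, 4-byte int), and the (int, null, double) tuple in main takes 18 bytes (marker, size, 3 type codes, 1 null-bitmap byte, 4 + 8 value bytes).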

To get around the limitation of 1 byte for expressing the # of fields in a 
tuple (just in case), we can limit ourselves to 127 per byte and save the high 
bit to indicate that the next byte should be interpreted to contain the rest of 
the number (so it would take more than one byte to express 256, but we could. 
Most of the time this won't matter, as the vast majority of tuples are going 
to have well under 128 fields).
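One concrete reading of the high-bit continuation idea is a standard base-128 varint (the scheme Protocol Buffers uses for integers): each byte carries 7 payload bits, low-order group first, and the 0x80 bit means "another byte follows". The sketch below assumes that reading; the class and method names are hypothetical and not part of Pig.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class FieldCountVarint {
    // Writes n in 7-bit groups, low group first; the high bit (0x80) means "another byte follows".
    static byte[] encode(int n) {
        if (n < 0) throw new IllegalArgumentException("field count must be non-negative");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (n >= 0x80) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);  // final byte has the high bit clear
        return out.toByteArray();
    }

    // Accumulates 7 payload bits per byte until a byte with the high bit clear is seen.
    static int decode(byte[] buf) {
        int result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 127, 128, 256, 16383, 16384}) {
            byte[] enc = encode(n);
            System.out.println(n + " -> " + enc.length + " byte(s) " + Arrays.toString(enc) + " -> " + decode(enc));
        }
    }
}
```

With this encoding any count up to 127 still fits in the single byte used today, 256 takes two bytes, and two bytes cover counts up to 16383.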

There's some nastiness in Tuples: they all have an internal "isNull" field 
that might be true to indicate that the whole tuple is null. I am not sure what 
the deal is there. Do we need to write an extra bit per tuple to differentiate 
"all fields are null" from "the tuple is null"? Or should we just write the NULL 
byte (currently used to express null tuples) and let it be deserialized into a 
normal, non-primitive tuple?
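A minimal sketch of the second option, assuming hypothetical marker values NULL and SINGLE_INT_TUPLE (not Pig's real constants): a null tuple is written as the plain NULL marker and never as a primitive tuple, so no extra per-tuple bit is needed, and "tuple is null" stays distinct from "all fields are null" on read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullTupleSketch {
    // Hypothetical marker values; NOT Pig's real constants.
    static final byte NULL = 0x01;
    static final byte SINGLE_INT_TUPLE = 0x70;

    // A null tuple becomes the bare NULL marker; otherwise use the primitive single-field layout.
    static void write(DataOutputStream out, boolean tupleIsNull, Integer field) throws IOException {
        if (tupleIsNull) {
            out.writeByte(NULL);
            return;
        }
        out.writeByte(SINGLE_INT_TUPLE);
        out.writeByte(field == null ? 1 : 0);
        if (field != null) out.writeInt(field);
    }

    // Returns null for a null tuple, otherwise a one-element array holding the (possibly null) field.
    static Integer[] read(DataInputStream in) throws IOException {
        byte type = in.readByte();
        if (type == NULL) return null;  // a real reader would build a normal, non-primitive null tuple here
        if (type != SINGLE_INT_TUPLE) throw new IOException("unexpected type byte " + type);
        boolean fieldIsNull = in.readByte() != 0;
        Integer value = fieldIsNull ? null : Integer.valueOf(in.readInt());
        return new Integer[] {value};
    }

    static Integer[] roundTrip(boolean tupleIsNull, Integer field) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            write(out, tupleIsNull, field);
            out.flush();
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(true, null) == null);      // whole tuple is null
        System.out.println(roundTrip(false, null)[0] == null);  // tuple present, its only field is null
        System.out.println(roundTrip(false, 5)[0]);
    }
}
```

The trade-off is that null tuples fall back to the generic, non-primitive representation on read, which seems acceptable if they are rare.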
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are
> Objects. When a Tuple only contains primitive fields (ints, longs, etc.), it's
> possible to avoid this overhead, which would result in significant memory
> savings.
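To illustrate the overhead described in the issue, here is a hedged Java sketch contrasting a generic list-of-Objects layout with a hypothetical specialized class for a known (int, long) schema. IntLongTuple is an illustrative name only; it does not implement Pig's Tuple interface, and the sketch does not measure memory (actual savings depend on the JVM's object headers and reference size).

```java
import java.util.ArrayList;
import java.util.List;

public class TupleLayoutSketch {
    // Generic layout: every field is an Object reference, so each primitive is boxed
    // into a separate heap object, plus a reference slot in the list's backing array.
    static List<Object> genericTuple(int a, long b) {
        List<Object> fields = new ArrayList<>(2);
        fields.add(a);  // autoboxed to Integer
        fields.add(b);  // autoboxed to Long
        return fields;
    }

    // Hypothetical specialized layout for a known (int, long) schema:
    // values are stored inline as primitive fields, with no per-field boxing.
    static final class IntLongTuple {
        private final int f0;
        private final long f1;
        private byte nullBits;  // bit i set means field i is null

        IntLongTuple(int f0, long f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        boolean isNull(int i) { return (nullBits & (1 << i)) != 0; }

        void setNull(int i) { nullBits |= (byte) (1 << i); }

        int getInt0() { return f0; }  // primitive accessor, no boxing

        // Generic accessor: boxing happens only when a caller asks for an Object.
        Object get(int i) {
            if (isNull(i)) return null;
            switch (i) {
                case 0: return f0;
                case 1: return f1;
                default: throw new IndexOutOfBoundsException("field " + i);
            }
        }
    }

    public static void main(String[] args) {
        List<Object> g = genericTuple(7, 42L);
        IntLongTuple p = new IntLongTuple(7, 42L);
        System.out.println(g.get(0).equals(p.get(0)) && g.get(1).equals(p.get(1)));
        p.setNull(1);
        System.out.println(p.get(1) == null);
    }
}
```

Both layouts return equal values through the generic accessor; the specialized one simply avoids allocating a boxed object per field while the tuple is held in memory.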

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
