[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149422#comment-13149422 ]
Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------
Oops, fastutil wasn't supposed to make it into this patch; that's an experiment
for later. I'll get rid of the PrimitiveBags in the next version of the patch.
Putting the schemas into the JobConf is possible. I am a little worried, though,
about having binary data that can't be read without an ephemeral job conf.
Also, it'll be hard to figure out which tuple a given schema is supposed to
apply to, so we'll need to prefix the tuple with something anyway. Maybe a header?
The header would look like:
*magic_header_start*;
byte tuple_schema_id; *magic_schema_start*; byte type; byte type; ... *magic_schema_end*;
byte tuple_schema_id; *magic_schema_start*; byte type; byte type; ... *magic_schema_end*;
byte tuple_schema_id; *magic_schema_start*; byte type; byte type; ... *magic_schema_end*;
*magic_header_end*;
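
To make the header layout above concrete, here is a rough Java sketch of writing and reading it. This is an illustration only, not code from the patch: the class name `SchemaHeaderSketch`, the marker byte values, and the restriction of schema ids and type codes to 0..127 are all assumptions made so that a marker can never be mistaken for an id or a type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaHeaderSketch {
    // Hypothetical marker bytes; the proposal does not assign values.
    // All four are negative as Java bytes, so they cannot collide with
    // schema ids or type codes as long as those stay in 0..127.
    static final byte HEADER_START = (byte) 0xF0;
    static final byte HEADER_END = (byte) 0xF1;
    static final byte SCHEMA_START = (byte) 0xF2;
    static final byte SCHEMA_END = (byte) 0xF3;

    /** Writes each (schema id, field type bytes) entry between the header markers. */
    static byte[] writeHeader(Map<Byte, byte[]> schemas) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(HEADER_START);
        for (Map.Entry<Byte, byte[]> entry : schemas.entrySet()) {
            out.writeByte(entry.getKey());   // tuple_schema_id
            out.writeByte(SCHEMA_START);
            out.write(entry.getValue());     // one type byte per field
            out.writeByte(SCHEMA_END);
        }
        out.writeByte(HEADER_END);
        out.flush();
        return buf.toByteArray();
    }

    /** Parses a header back into a schema id -> field type bytes map. */
    static Map<Byte, byte[]> readHeader(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readByte() != HEADER_START) {
            throw new IOException("missing header start marker");
        }
        Map<Byte, byte[]> schemas = new LinkedHashMap<>();
        byte id;
        // Each iteration consumes one schema entry; HEADER_END stops the loop.
        while ((id = in.readByte()) != HEADER_END) {
            if (in.readByte() != SCHEMA_START) {
                throw new IOException("missing schema start marker");
            }
            ByteArrayOutputStream types = new ByteArrayOutputStream();
            byte type;
            while ((type = in.readByte()) != SCHEMA_END) {
                types.write(type);
            }
            schemas.put(id, types.toByteArray());
        }
        return schemas;
    }
}
```

A single-byte schema id caps the number of distinct schemas per file, and terminating the type list with a marker (rather than a length prefix) only works while no type code equals the marker value.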
Then we introduce a SCHEMATUPLE value, as we did with TINY_TUPLE et al., and
the serialized tuple would look like:
*SCHEMATUPLE* *tuple_schema_id* bytes.....
*SCHEMATUPLE* *tuple_schema_id* bytes.....
I suppose we could also write the header in a separate tuple with a high
replication factor, to avoid a hotspot.
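
As a rough illustration of the SCHEMATUPLE layout sketched above, the following Java snippet writes a marker byte, the schema id, and then the raw field values with no per-field type tags; the reader recovers the field types by looking the id up in a schema table. Everything named here is hypothetical: the class `SchemaTupleSketch`, the `SCHEMATUPLE` byte value, and the `INT`/`LONG` type codes are placeholders, and only two primitive types are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

final class SchemaTupleSketch {
    // Hypothetical values; the proposal does not define them.
    static final byte SCHEMATUPLE = 0x7E;
    static final byte INT = 10;
    static final byte LONG = 15;

    /** Serializes one tuple as: SCHEMATUPLE, schema id, then untagged field bytes. */
    static byte[] write(byte schemaId, byte[] types, Object[] fields) throws IOException {
        if (types.length != fields.length) {
            throw new IllegalArgumentException("schema and tuple differ in arity");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(SCHEMATUPLE);
        out.writeByte(schemaId);
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                    out.writeInt((Integer) fields[i]);
                    break;
                case LONG:
                    out.writeLong((Long) fields[i]);
                    break;
                default:
                    throw new IOException("unsupported type code " + types[i]);
            }
        }
        out.flush();
        return buf.toByteArray();
    }

    /** Deserializes one tuple, using the schema table to decide how to read each field. */
    static Object[] read(byte[] data, Map<Byte, byte[]> schemas) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readByte() != SCHEMATUPLE) {
            throw new IOException("not a SCHEMATUPLE");
        }
        byte[] types = schemas.get(in.readByte());
        if (types == null) {
            throw new IOException("unknown schema id");
        }
        Object[] fields = new Object[types.length];
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                    fields[i] = in.readInt();
                    break;
                case LONG:
                    fields[i] = in.readLong();
                    break;
                default:
                    throw new IOException("unsupported type code " + types[i]);
            }
        }
        return fields;
    }
}
```

With this sketch, an (int, long) tuple takes 14 bytes: 2 bytes of prefix plus 4 and 8 bytes of payload. The cost is that the bytes are unreadable without the matching schema table, which is the concern about an ephemeral job conf raised above.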
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
> Key: PIG-2359
> URL: https://issues.apache.org/jira/browse/PIG-2359
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible
> to avoid this overhead, which would result in significant memory savings.