[ 
https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149539#comment-13149539
 ] 

Gianmarco De Francisci Morales commented on PIG-2359:
-----------------------------------------------------

We certainly need to represent the tuple like this:

SCHEMATUPLE tuple_schema_id bytes...

Because we do not want to put the schema in every tuple, we need a single place 
that holds a Map: tuple_schema_id -> tuple_schema.
This map is job-specific anyway, because tuple_schema_ids will be generated on 
the fly for the specific schemas of the tuples used in the job.
JobConf looks like a good place to put this information, precisely because it 
is job-specific.
I personally do not much like the idea of having metadata in a tuple with high 
replication; it looks a bit hacky to me.
I don't understand your concern; could you explain it in more detail? In any 
case, the BinInterSedes tuple is as ephemeral as the JobConf and is not used 
for persisting data. So I assume your concern is not for users, but for 
developers and debugging, right?
It is true that the tuple will no longer be self-describing, but this is the 
price to pay for a more efficient serialization format. What kind of problems 
do you think could arise?

We can then discuss the best way to represent this Map.
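
To make the proposed layout concrete, here is a minimal sketch of the 
"SCHEMATUPLE tuple_schema_id bytes..." idea in plain Java. It is only an 
illustration under stated assumptions: the class name SchemaTupleSketch, the 
marker value 0x7F, the FieldType enum, and the in-memory SCHEMAS map are all 
hypothetical and are not taken from the patch or from BinInterSedes. In a real 
job the map would have to be shipped to the tasks (for example through the 
JobConf, as suggested above) instead of living in a static field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SchemaTupleSketch {
    // Hypothetical marker byte; NOT the actual BinInterSedes constant.
    static final byte SCHEMATUPLE = 0x7F;

    // Only two primitive types, to keep the sketch short.
    enum FieldType { INT, LONG }

    // Stand-in for the job-specific map: tuple_schema_id -> tuple_schema.
    static final Map<Integer, FieldType[]> SCHEMAS = new HashMap<>();

    // Generate a schema id on the fly and remember its schema.
    static int register(FieldType... schema) {
        int id = SCHEMAS.size();
        SCHEMAS.put(id, schema);
        return id;
    }

    // Write: marker, schema id, then the raw field bytes with no per-field type tags.
    static byte[] write(int schemaId, long... values) throws IOException {
        FieldType[] schema = SCHEMAS.get(schemaId);
        if (schema == null || schema.length != values.length) {
            throw new IOException("unknown schema id or wrong field count");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(SCHEMATUPLE);
        out.writeInt(schemaId);
        for (int i = 0; i < schema.length; i++) {
            if (schema[i] == FieldType.INT) {
                out.writeInt((int) values[i]);
            } else {
                out.writeLong(values[i]);
            }
        }
        out.flush();
        return buf.toByteArray();
    }

    // Read: the tuple is not self-describing, so the schema is looked up by id.
    static long[] read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        if (in.readByte() != SCHEMATUPLE) {
            throw new IOException("not a SCHEMATUPLE");
        }
        FieldType[] schema = SCHEMAS.get(in.readInt());
        if (schema == null) {
            throw new IOException("unknown schema id");
        }
        long[] values = new long[schema.length];
        for (int i = 0; i < schema.length; i++) {
            values[i] = (schema[i] == FieldType.INT) ? in.readInt() : in.readLong();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        int id = register(FieldType.INT, FieldType.LONG);
        byte[] bytes = write(id, 42, 1L << 40);
        long[] back = read(bytes);
        System.out.println(bytes.length + " bytes; fields: " + back[0] + ", " + back[1]);
    }
}
```

The serialized (int, long) tuple takes 1 + 4 + 4 + 8 = 17 bytes, and a reader 
that does not have the map cannot decode it. That is exactly the loss of 
self-description discussed above.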
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
