[jira] [Commented] (PIG-2359) Support more efficient Tuples when schemas are known

Ashutosh Chauhan (Commented) (JIRA) Tue, 15 Nov 2011 06:18:20 -0800

    [ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150510#comment-13150510 ]


Ashutosh Chauhan commented on PIG-2359:
---------------------------------------

bq. I think that if we really want a more efficient tuple implementation when 
schemas are known, we need to strip the schema from the data. What's the point 
of repeating the schema in each tuple apart from ease of implementation?

Be careful with the assumption that the schema is going to be the same for all 
the rows in a data set. Currently, Pig doesn't make this assumption and is thus 
able to work with tuples of varying schema within the same data. See PIG-1131, 
where a related optimization was attempted (and also PIG-1188).
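
To make the concern concrete, here is a minimal Java sketch of one way to keep 
that flexibility: use a compact primitive layout only for rows that actually 
match the expected shape, and fall back to a general Object-based row 
otherwise. All names here (SchemaTupleSketch, Row, LongRow, ObjectRow, makeRow) 
are hypothetical and are not part of Pig's Tuple API or of the attached patch; 
the all-long schema is an assumption chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaTupleSketch {

    /** Minimal row abstraction; hypothetical, not Pig's Tuple interface. */
    interface Row {
        int size();
        Object get(int i);
    }

    /** Compact row: every field is a long, stored unboxed in one array. */
    static final class LongRow implements Row {
        private final long[] fields;
        LongRow(long[] fields) { this.fields = fields; }
        public int size() { return fields.length; }
        public Object get(int i) { return fields[i]; } // boxes only on access
    }

    /** General row: any field types, one Object reference per field. */
    static final class ObjectRow implements Row {
        private final List<Object> fields;
        ObjectRow(List<Object> fields) { this.fields = new ArrayList<>(fields); }
        public int size() { return fields.size(); }
        public Object get(int i) { return fields.get(i); }
    }

    /**
     * Picks the compact layout only when this particular row has exactly
     * expectedSize fields and all of them are non-null Longs; any other
     * row falls back to the general layout instead of failing.
     */
    static Row makeRow(List<Object> values, int expectedSize) {
        if (values.size() != expectedSize) {
            return new ObjectRow(values);
        }
        long[] packed = new long[expectedSize];
        for (int i = 0; i < expectedSize; i++) {
            Object v = values.get(i);
            if (!(v instanceof Long)) {
                return new ObjectRow(values);
            }
            packed[i] = (Long) v;
        }
        return new LongRow(packed);
    }

    public static void main(String[] args) {
        Row a = makeRow(Arrays.<Object>asList(1L, 2L), 2);      // matches the schema
        Row b = makeRow(Arrays.<Object>asList(1L, "x", 3L), 2); // different shape
        System.out.println(a.getClass().getSimpleName()); // prints "LongRow"
        System.out.println(b.getClass().getSimpleName()); // prints "ObjectRow"
    }
}
```

The per-row check costs something, so this only illustrates the trade-off: the 
memory saving applies to conforming rows, while rows of a different shape still 
load.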
                
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are 
> Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible 
> to avoid this overhead, which would result in significant memory savings.

