[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162547#comment-13162547 ]
Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------
Very rough (read: probably invalid) speed test: I modified SUM / LongSum to use
a PLongTuple and ran this script on excite.log from the tutorial:
{code}
l = load 'tutorial/data/excite-big.log' as (id:chararray, val:long, query:chararray);
x = foreach (group l all) generate SUM(l.val);
store x into '/tmp/foo';
{code}
Before optimization:
{code}
real 0m14.785s
user 0m22.516s
sys 0m1.203s

real 0m15.323s
user 0m22.605s
sys 0m1.182s

real 0m14.841s
user 0m22.600s
sys 0m1.176s
{code}
After optimization:
{code}
real 0m14.347s
user 0m20.442s
sys 0m1.095s

real 0m14.344s
user 0m20.241s
sys 0m1.064s

real 0m14.577s
user 0m20.671s
sys 0m1.087s
{code}
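To put a number on the two sets of runs, the timings can be averaged. The following is a minimal sketch in Java; the class and method names are made up for illustration, and the figures are copied from the runs listed in this comment. With only three runs per side this is a rough indication, not a benchmark.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing the `time` output quoted in this comment.
public class TimingSummary {

    // Arithmetic mean of the given samples (NaN if there are none).
    static double mean(double... samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative reduction from `before` to `after`, in percent.
    static double percentReduction(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Seconds, taken from the three runs before and after the change.
        double realBefore = mean(14.785, 15.323, 14.841);
        double realAfter  = mean(14.347, 14.344, 14.577);
        double userBefore = mean(22.516, 22.605, 22.600);
        double userAfter  = mean(20.442, 20.241, 20.671);

        System.out.printf("real: %.3fs -> %.3fs (%.1f%% less)%n",
                realBefore, realAfter, percentReduction(realBefore, realAfter));
        System.out.printf("user: %.3fs -> %.3fs (%.1f%% less)%n",
                userBefore, userAfter, percentReduction(userBefore, userAfter));
    }
}
```

On these numbers the mean user time drops by roughly 9% and the mean wall-clock time by roughly 4%.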
> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
> Key: PIG-2359
> URL: https://issues.apache.org/jira/browse/PIG-2359
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Attachments: PIG-2359.1.patch, PIG-2359.2.patch
>
>
> Pig Tuples have significant overhead because every field is stored as an
> Object.
> When a Tuple contains only primitive fields (ints, longs, etc.), this overhead
> can be avoided, which would yield significant memory savings.
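The idea in the description can be sketched as follows. This is only an illustration under stated assumptions: `ObjectTuple` and `SingleLongTuple` are hypothetical classes written for this example, they do not implement Pig's `Tuple` interface, and they are not the PLongTuple from the attached patches. The sketch contrasts a tuple that boxes every field with one that keeps a single `long` in a primitive field.

```java
import java.util.ArrayList;
import java.util.List;

public class PrimitiveTupleSketch {

    // Generic layout: every field is an Object, so each long is boxed
    // into a separate java.lang.Long and referenced from the list.
    static final class ObjectTuple {
        private final List<Object> fields = new ArrayList<>();

        void append(Object value) { fields.add(value); }
        Object get(int index) { return fields.get(index); }
        int size() { return fields.size(); }
    }

    // Hypothetical schema-aware layout for a tuple known to hold one long:
    // the value lives in a primitive field, with a flag to represent null.
    static final class SingleLongTuple {
        private long value;
        private boolean isNull = true;

        void set(long newValue) { value = newValue; isNull = false; }
        void setNull() { isNull = true; }

        // Primitive accessor: no boxing on the hot path.
        long getLong() {
            if (isNull) throw new IllegalStateException("field is null");
            return value;
        }

        // Object accessor for generic callers; boxes only on demand.
        Object get(int index) {
            if (index != 0) throw new IndexOutOfBoundsException("index " + index);
            return isNull ? null : Long.valueOf(value);
        }

        int size() { return 1; }
    }

    // Sums values by reusing one primitive-backed tuple instead of
    // allocating a boxed Long per input value.
    static long sum(long[] values) {
        SingleLongTuple tuple = new SingleLongTuple();
        long total = 0;
        for (long v : values) {
            tuple.set(v);
            total += tuple.getLong();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new long[] {1L, 2L, 3L}));
    }
}
```

The trade-off in a layout like this is that generic code calling `get(int)` still pays for boxing, so the savings depend on callers that know the schema and use the primitive accessor.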