[ https://issues.apache.org/jira/browse/PIG-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162547#comment-13162547 ]
Dmitriy V. Ryaboy commented on PIG-2359:
----------------------------------------

Very rough (read: invalid, probably) speed test: modified SUM / LongSum to use a PLongTuple, and ran this code on excite.log from the tutorial:

{code}
l = load 'tutorial/data/excite-big.log' as (id:chararray, val:long, query:chararray);
x = foreach (group l all) generate SUM(l.val);
store x into '/tmp/foo';
{code}

Before optimization:

{code}
real 0m14.785s
user 0m22.516s
sys  0m1.203s

real 0m15.323s
user 0m22.605s
sys  0m1.182s

real 0m14.841s
user 0m22.600s
sys  0m1.176s
{code}

After:

{code}
real 0m14.347s
user 0m20.442s
sys  0m1.095s

real 0m14.344s
user 0m20.241s
sys  0m1.064s

real 0m14.577s
user 0m20.671s
sys  0m1.087s
{code}

> Support more efficient Tuples when schemas are known
> ----------------------------------------------------
>
>                 Key: PIG-2359
>                 URL: https://issues.apache.org/jira/browse/PIG-2359
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: PIG-2359.1.patch, PIG-2359.2.patch
>
>
> Pig Tuples have significant overhead due to the fact that all the fields are Objects.
> When a Tuple only contains primitive fields (ints, longs, etc), it's possible to avoid this overhead, which would result in significant memory savings.
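
For illustration only, the idea in the issue description (avoiding per-field Object overhead when every field is a known primitive) can be sketched in Java roughly as follows. The class names BoxedTuple, PrimitiveLongTuple and TupleSketch are hypothetical; they are not Pig's Tuple interface and not the PLongTuple from the attached patches, whose actual layout may differ. The sketch only assumes a single-field tuple holding a long.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generic tuple: every field is held as an Object,
// so a long is stored as a separately allocated java.lang.Long.
final class BoxedTuple {
    private final List<Object> fields = new ArrayList<>();

    void append(Object value) { fields.add(value); }

    Object get(int index) { return fields.get(index); }
}

// Hypothetical schema-aware tuple for a single long field: the value lives in a
// primitive field, so no wrapper object is kept per field.
final class PrimitiveLongTuple {
    private long value;

    void set(long v) { value = v; }

    // Typed accessor: callers that know the schema read the long without boxing.
    long getLong() { return value; }

    // Generic accessor: boxes on demand for callers that only know about Objects.
    Object get(int index) {
        if (index != 0) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value; // autoboxed to Long only when requested
    }
}

final class TupleSketch {
    // Summing through the generic path casts and unboxes each field.
    static long sumBoxed(List<BoxedTuple> tuples) {
        long sum = 0;
        for (BoxedTuple t : tuples) {
            sum += (Long) t.get(0);
        }
        return sum;
    }

    // Summing through the typed path reads the primitive directly.
    static long sumPrimitive(List<PrimitiveLongTuple> tuples) {
        long sum = 0;
        for (PrimitiveLongTuple t : tuples) {
            sum += t.getLong();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<BoxedTuple> boxed = new ArrayList<>();
        List<PrimitiveLongTuple> primitive = new ArrayList<>();
        for (long i = 1; i <= 3; i++) {
            BoxedTuple b = new BoxedTuple();
            b.append(i);
            boxed.add(b);

            PrimitiveLongTuple p = new PrimitiveLongTuple();
            p.set(i);
            primitive.add(p);
        }
        System.out.println(sumBoxed(boxed));
        System.out.println(sumPrimitive(primitive));
    }
}
```

Both paths compute the same sum; the difference is only in how each field is stored and read. This says nothing about how much time or memory the real patch saves.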