[ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708551#action_12708551
 ] 

Pradeep Kamath commented on PIG-802:
------------------------------------

Adding some more details:
A new kind of bag, ReadOnceBag, needs to be implemented. This bag will hold a 
reference to the key currently being processed and to the iterator over values 
that Hadoop provides in reduce(). The ReadOnceBag's iterator will simply advance 
the Hadoop iterator on each call and construct a tuple from the key 
and value (see POPackage.java for details on how this is done). Either POPackage 
should be changed, or a new class introduced, to create ReadOnceBags 
instead of regular bags. Creating the bag should only initialize it with the 
key and iterator, without reading any values up front.
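
A rough sketch of the idea, not the actual Pig implementation: the class 
below is a standalone stand-in that does not implement Pig's DataBag 
interface, uses java.util.Map.Entry in place of a Pig Tuple, and takes a 
plain java.util.Iterator in place of the iterator Hadoop passes to reduce(). 
The guard against a second iteration is my assumption about what "read once" 
should mean; the real tuple construction in POPackage.java is more involved.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical, simplified ReadOnceBag: holds only the current key and the
 * value iterator, and builds (key, value) pairs lazily during iteration.
 */
final class ReadOnceBag<K, V> implements Iterable<Map.Entry<K, V>> {
    private final K key;              // key currently being processed
    private final Iterator<V> values; // stand-in for the Hadoop value iterator
    private boolean consumed = false; // assumption: allow a single pass only

    // Construction just stores the references; no values are read here.
    ReadOnceBag(K key, Iterator<V> values) {
        this.key = key;
        this.values = values;
    }

    @Override
    public Iterator<Map.Entry<K, V>> iterator() {
        if (consumed) {
            throw new IllegalStateException("ReadOnceBag can only be iterated once");
        }
        consumed = true;
        return new Iterator<Map.Entry<K, V>>() {
            @Override
            public boolean hasNext() {
                return values.hasNext();
            }

            @Override
            public Map.Entry<K, V> next() {
                // Pair the key with the next value; stands in for tuple construction.
                return new SimpleImmutableEntry<>(key, values.next());
            }
        };
    }
}

class ReadOnceBagDemo {
    public static void main(String[] args) {
        ReadOnceBag<String, Integer> bag =
                new ReadOnceBag<>("k", Arrays.asList(1, 2, 3).iterator());
        for (Map.Entry<String, Integer> t : bag) {
            System.out.println(t.getKey() + " -> " + t.getValue());
        }
    }
}
```

Since nothing is buffered, memory use stays constant regardless of how many 
values share a key, which is the point of avoiding a regular bag here.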

> PERFORMANCE: not creating bags for ORDER BY
> -------------------------------------------
>
>                 Key: PIG-802
>                 URL: https://issues.apache.org/jira/browse/PIG-802
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> Order by should be changed to not use POPackage to put all of the tuples in a 
> bag on the reduce side, as the bag is just immediately flattened. It can 
> instead work like join does for the last input in the join. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
