[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847604#action_12847604 ]

Alan Gates commented on PIG-1309:
---------------------------------

Comments:

A liberal dose of comments would help greatly in understanding what the various 
helper methods are doing.

You use LocalRearrange (LR) to split the keys and values.  What's the overhead of 
that?  Would it be more efficient to factor the key-splitting code out of LR 
and share it between LR and this operator?

I don't understand the need for pullTuplesFromSideLoaders().  In setup() you 
put one tuple from each input into the heap.  Then you pull from the heap until 
you see a key change.  But I don't understand the next step: at the key change you 
call pullTuplesFromSideLoaders().  If you've been adding to the heap as 
you pull tuples, there's no need to pull anything from the side loaders at this 
point.  All you should need to do is package up the bags you've built and 
return them as your tuple.
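To illustrate the "package up the bags at the key change" step, here is a minimal sketch, assuming the merge already delivers tuples in key order, each tagged with the index of the input it came from, and that keys are non-null Strings. The names `CogroupPackager`, `Tagged`, `Group` and `packageGroups` are hypothetical and do not correspond to classes in the patch or in Pig; it needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Pig's code): records arrive from the merge already
 * sorted by key. At each key change the bags built so far are complete, so they
 * are emitted as one cogroup result without reading anything more from the inputs.
 */
public class CogroupPackager {

    /** A value tagged with its key and the index of the input it came from. */
    public record Tagged(String key, int input, String value) {}

    /** One cogroup result: the key plus one bag (list of values) per input. */
    public record Group(String key, List<List<String>> bags) {}

    public static List<Group> packageGroups(List<Tagged> sorted, int numInputs) {
        List<Group> out = new ArrayList<>();
        String currentKey = null;
        List<List<String>> bags = null;
        for (Tagged t : sorted) {
            if (!t.key().equals(currentKey)) {
                // Key change: every tuple for the previous key has been seen.
                if (currentKey != null) {
                    out.add(new Group(currentKey, bags));
                }
                currentKey = t.key();
                bags = newBags(numInputs);
            }
            bags.get(t.input()).add(t.value());
        }
        if (currentKey != null) {
            out.add(new Group(currentKey, bags)); // flush the final key
        }
        return out;
    }

    /** One empty bag per input, so an input with no tuples for a key yields an empty bag. */
    private static List<List<String>> newBags(int numInputs) {
        List<List<String>> bags = new ArrayList<>(numInputs);
        for (int i = 0; i < numInputs; i++) {
            bags.add(new ArrayList<>());
        }
        return bags;
    }

    public static void main(String[] args) {
        List<Tagged> sorted = List.of(
                new Tagged("a", 0, "a0"), new Tagged("a", 1, "a1"),
                new Tagged("b", 1, "b1"),
                new Tagged("c", 0, "c0"));
        for (Group g : packageGroups(sorted, 2)) {
            System.out.println(g);
        }
    }
}
```

In the sample run, key `b` gets an empty bag for input 0 and key `c` an empty bag for input 1; nothing extra is read from any input when the key changes.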

Also, it appears you're using pullTuplesFromSideLoaders() to fill the heap.  You 
shouldn't be pulling all tuples for the current key from the side loaders, as you're 
likely to miss tuples whose keys are in the side loaders but not in the 
main loader.  The algorithm should be that as you pull a tuple from the heap, 
you place the next tuple from that same stream into the heap.  The heap will 
then guarantee that your tuples come out in order.
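The refill rule described above is a standard k-way merge. A minimal sketch, assuming each input is already sorted by key and that keys can be modeled as plain Strings (real Pig tuples and key types are not modeled). The names `KWayMerge`, `Entry` and `merge` are hypothetical, not part of the patch or Pig's API; it needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Pig's code) of a k-way merge over sorted inputs.
 * The heap holds at most one pending entry per input; each time an entry is
 * removed, the next entry from that same input replaces it.
 */
public class KWayMerge {

    /** The current head of one input: its key and the input's index. */
    private record Entry(String key, int input) {}

    /** Returns "key@inputIndex" strings in merged (sorted) order. */
    public static List<String> merge(List<List<String>> sortedInputs) {
        // Order by key; break ties by input index so the output is deterministic.
        PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> {
            int c = a.key().compareTo(b.key());
            return c != 0 ? c : Integer.compare(a.input(), b.input());
        });
        List<Iterator<String>> streams = new ArrayList<>();

        // Setup: seed the heap with the first entry of every non-empty input.
        for (int i = 0; i < sortedInputs.size(); i++) {
            Iterator<String> it = sortedInputs.get(i).iterator();
            streams.add(it);
            if (it.hasNext()) {
                heap.add(new Entry(it.next(), i));
            }
        }

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Entry e = heap.poll();
            out.add(e.key() + "@" + e.input());
            // Refill from the stream that produced e, never from a fixed "main" input.
            Iterator<String> it = streams.get(e.input());
            if (it.hasNext()) {
                heap.add(new Entry(it.next(), e.input()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(
                List.of("a", "c"),   // main input
                List.of("b", "c"),   // side input 1
                List.of("d"));       // side input 2: "d" never appears in main
        System.out.println(merge(inputs)); // prints [a@0, b@1, c@0, c@1, d@2]
    }
}
```

Because the heap never holds more than one entry per input, memory stays proportional to the number of inputs, and key `d`, which exists only in a side input, is still emitted in order.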


> Map-side Cogroup
> ----------------
>
>                 Key: PIG-1309
>                 URL: https://issues.apache.org/jira/browse/PIG-1309
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: mapsideCogrp.patch
>
>
> In the never-ending quest to make Pig go faster, we want to parallelize as many 
> relational operations as possible. It's already possible to do Group-by 
> (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This JIRA 
> is to add a map-side implementation of Cogroup in Pig. Details to follow.
