[ 
https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725368#action_12725368
 ] 

Ashutosh Chauhan commented on PIG-865:
--------------------------------------

Thanks for the review, Pradeep. 
While looking into the code, I also found that the bags used to hold the 
replicate contents are recreated every time. Instead, the same bag object can 
be cleared and reused, which minimizes object overhead. In the extreme case 
where every tuple of the replicate input has a different join key value, but 
each matches tuples of the fragment, we end up creating as many bags as there 
are tuples where one bag would do. I will include this change and upload a new 
patch.
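
To illustrate the bag reuse idea, here is a minimal sketch in plain Java. It 
does not use Pig's classes: the class BagReuseSketch, its methods, and the use 
of java.util.List as a stand-in for a bag are all assumptions made for 
illustration. It only compares how many bag objects are allocated by each 
strategy.

```java
import java.util.ArrayList;
import java.util.List;

public class BagReuseSketch {
    // Counts bag allocations so the two strategies can be compared.
    static int allocations = 0;

    // Hypothetical stand-in for creating a new bag.
    static List<String> newBag() {
        allocations++;
        return new ArrayList<>();
    }

    // Allocates a fresh bag for every group of matching replicate tuples.
    static int joinWithFreshBags(List<List<String>> matchesPerKey) {
        int emitted = 0;
        for (List<String> matches : matchesPerKey) {
            List<String> bag = newBag();
            bag.addAll(matches);
            emitted += bag.size(); // stand-in for feeding the bag downstream
        }
        return emitted;
    }

    // Allocates one bag up front, then clears and refills it per group.
    static int joinWithReusedBag(List<List<String>> matchesPerKey) {
        int emitted = 0;
        List<String> bag = newBag();
        for (List<String> matches : matchesPerKey) {
            bag.clear();
            bag.addAll(matches);
            emitted += bag.size(); // stand-in for feeding the bag downstream
        }
        return emitted;
    }
}
```

Reuse of this kind is only safe if whatever consumes the bag is finished with 
it before the next clear(); whether that holds in POFRJoin is not something 
this sketch can confirm.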

   

> Performance: Unnecessary computation in FRJoin
> ----------------------------------------------
>
>                 Key: PIG-865
>                 URL: https://issues.apache.org/jira/browse/PIG-865
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Ashutosh Chauhan
>            Priority: Minor
>             Fix For: 0.4.0
>
>         Attachments: pig-865.patch
>
>
> In the POFRJoin implementation, POLocalRearrange is used to extract join keys 
> from the input tuples. If the keys match, the input tuples are fed to a 
> Foreach, which performs the actual join by doing a cross on its inputs. After 
> the keys are extracted using the POLocalRearrange output, the function 
> getValueTuple(POLocalRearrange lr, Tuple tuple) is called to reconstruct the 
> input tuple. This call seems unnecessary, since we already have the input 
> tuple at that point. This is not a bug, but since the function is called for 
> every tuple, eliminating it should help improve performance. 
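
The redundancy described above can be sketched in plain Java. This is not 
Pig's code: the class KeyExtractSketch, the Rearranged holder, and the 
rearrange/rebuild methods are hypothetical stand-ins, and the layout assumed 
here (key pulled out, remaining fields kept separately) is an assumption 
rather than the actual POLocalRearrange output format. The point is only that 
rebuilding the tuple yields something equal to the input that was already in 
hand.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyExtractSketch {
    // Hypothetical rearranged form: the join key, the remaining fields,
    // and the position the key was taken from.
    static final class Rearranged {
        final Object key;
        final List<Object> rest;
        final int keyIndex;

        Rearranged(Object key, List<Object> rest, int keyIndex) {
            this.key = key;
            this.rest = rest;
            this.keyIndex = keyIndex;
        }
    }

    // Splits a tuple into its join key and the remaining fields.
    static Rearranged rearrange(List<Object> tuple, int keyIndex) {
        List<Object> rest = new ArrayList<>(tuple);
        Object key = rest.remove(keyIndex); // remove(int) returns the element
        return new Rearranged(key, rest, keyIndex);
    }

    // Reassembles the original tuple from the rearranged form. This is the
    // kind of per-tuple work that can be skipped when the caller still
    // holds the input tuple.
    static List<Object> rebuild(Rearranged r) {
        List<Object> tuple = new ArrayList<>(r.rest);
        tuple.add(r.keyIndex, r.key);
        return tuple;
    }
}
```

In this sketch, a caller that only needs the key can read it from the 
rearranged form and pass the original tuple on unchanged, skipping rebuild 
entirely.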

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.