[ https://issues.apache.org/jira/browse/PIG-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915880#action_12915880 ]

Daniel Dai commented on PIG-1637:
---------------------------------

test-patch result for PIG-1637-2.patch:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Combiner not used because optimizer inserts a foreach between group and algebraic function
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-1637
>                 URL: https://issues.apache.org/jira/browse/PIG-1637
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.0
>
>         Attachments: PIG-1637-1.patch, PIG-1637-2.patch
>
>
> The following script does not use the combiner after the new optimization change.
> {code}
> A = load ':INPATH:/pigmix/page_views'
>     using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
>     as (user, action, timespent, query_term, ip_addr, timestamp,
>         estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent,
>     (double)estimated_revenue as estimated_revenue;
> C = group B all;
> D = foreach C generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> This is because, after the group, the optimizer detects that the group key is
> not used afterward and adds a foreach statement after C. This is how the
> script looks after optimization:
> {code}
> A = load ':INPATH:/pigmix/page_views'
>     using org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
>     as (user, action, timespent, query_term, ip_addr, timestamp,
>         estimated_revenue, page_info, page_links);
> B = foreach A generate user, (int)timespent as timespent,
>     (double)estimated_revenue as estimated_revenue;
> C = group B all;
> C1 = foreach C generate B;
> D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
> store D into ':OUTPATH:';
> {code}
> That cancels the combiner optimization for D.
> The way to solve the issue is to merge the inserted C1 with D. Currently, we
> do not merge these two foreach statements. The reason is that one output of
> the first foreach (B) is referenced twice in D, and the current rule assumes
> that after the merge B would have to be calculated twice in D. In fact, C1
> only does a projection and no calculation of B, so merging C1 and D will not
> result in calculating B twice. C1 and D should therefore be merged.
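
The merge condition described in the issue can be sketched roughly as follows. This is an illustrative Python model, not Pig's Java optimizer code: the `Output` class and the `can_merge` function are hypothetical names, and the real rule may check more conditions than the single one shown here. The sketch assumes a merge is blocked only when an output of the first foreach is both computed (not a plain projection) and used more than once by the second foreach.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """One output column of a foreach (hypothetical model, not a Pig class)."""
    name: str
    is_projection: bool  # True if the column is passed through without computation


def can_merge(first_outputs, second_references):
    """Return True if the second foreach can be folded into the first.

    first_outputs: list of Output produced by the first foreach.
    second_references: names of first-foreach outputs used by the second
        foreach, one entry per use (duplicates mean repeated use).
    """
    uses = Counter(second_references)
    for out in first_outputs:
        # A computed column used more than once would be recomputed after
        # merging, so block the merge. A plain projection costs nothing extra.
        if uses[out.name] > 1 and not out.is_projection:
            return False
    return True


# C1 = foreach C generate B;   -> B is a plain projection
c1_outputs = [Output("B", is_projection=True)]
# D = foreach C1 generate SUM(B.timespent), AVG(B.estimated_revenue);
d_references = ["B", "B"]

print(can_merge(c1_outputs, d_references))
```

Under this model, C1 and D are mergeable even though B is referenced twice, because C1 only projects B. If B were the result of a computation in the first foreach, the same two references would block the merge.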

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
