[ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
---------------------------------

    Attachment: pig-834_2.patch

The correct approach is the following: if the leaf of a ForEach inner plan is
not combinable, we do not use the combiner in any case. If it is combinable,
there must not be any other combinable POUserFunc in the ForEach's inner plan.
The first check already exists in trunk. This patch adds the second check and
makes sure the combiner is not fired if the ForEach inner plan contains any
combinable POUserFunc other than the leaf POUserFunc.
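
The two checks can be sketched roughly as below. This is only an illustration
and not the code in the patch: Op and CombinerCheckSketch are hypothetical
stand-ins, not Pig's physical operator classes, and the sketch assumes an
inner plan can be walked as a tree from its leaf down through its inputs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one operator of a ForEach inner plan.
// NOT a Pig class; the name and fields are assumptions for illustration.
final class Op {
    final String name;
    final boolean combinableUdf; // true for a combinable (algebraic) user function
    final List<Op> inputs;       // operators feeding this one

    Op(String name, boolean combinableUdf, Op... inputs) {
        this.name = name;
        this.combinableUdf = combinableUdf;
        this.inputs = Arrays.asList(inputs);
    }
}

final class CombinerCheckSketch {
    // Counts combinable user functions in the subtree rooted at op, op included.
    static int countCombinable(Op op) {
        int count = op.combinableUdf ? 1 : 0;
        for (Op input : op.inputs) {
            count += countCombinable(input);
        }
        return count;
    }

    // Check 1: the leaf itself must be combinable.
    // Check 2: the leaf must be the only combinable user function in the plan.
    static boolean canUseCombiner(Op leaf) {
        if (!leaf.combinableUdf) {
            return false;
        }
        return countCombinable(leaf) == 1;
    }

    public static void main(String[] args) {
        Op project = new Op("Project", false);
        // COUNT(Distinct(...)): a second combinable function sits below the leaf.
        Op nested = new Op("COUNT", true, new Op("Distinct", true, project));
        // COUNT(...) directly over a projection.
        Op plain = new Op("COUNT", true, project);
        System.out.println("nested: " + canUseCombiner(nested));
        System.out.println("plain: " + canUseCombiner(plain));
    }
}
```

With this model, the nested COUNT(Distinct(...)) plan from the issue is
rejected for the combiner, while a plain COUNT over a projection is accepted.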

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch
>
>
> a = load 'students.txt' as (c1,c2,c3,c4); 
> c = group a by c2;  
> f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
> Notice that the Distinct UDF is missing from the combine and reduce stages.
> As a result, Distinct has no effect and incorrect results are produced.
> Distinct should have been evaluated in all 3 stages, and the output of
> Distinct should be given to COUNT in the reduce stage.
> {code}
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node 1-122
> Map Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-139
> |   |
> |   Project[bytearray][1] - 1-140
> |
> |---New For Each(false,false)[bag] - 1-127
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
>     |   |
>     |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
>     |       |
>     |       |---Project[bag][2] - 1-123
>     |           |
>     |           |---Project[bag][1] - 1-124
>     |   |
>     |   Project[bytearray][0] - 1-133
>     |
>     |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
>         |
>         |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
> Combine Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-143
> |   |
> |   Project[bytearray][1] - 1-144
> |
> |---New For Each(false,false)[bag] - 1-132
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
>     |   |
>     |   |---Project[bag][0] - 1-135
>     |   |
>     |   Project[bytearray][1] - 1-134
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-137--------
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
> |
> |---New For Each(false)[bag] - 1-120
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
>     |   |
>     |   |---Project[bag][0] - 1-136
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-145--------
> Global sort: false
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
