[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2009-06-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-834:
--

Description: 
a = load 'students.txt' as (c1,c2,c3,c4); 
c = group a by c2;  
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct UDF is missing from the combine and reduce stages. As 
a result, Distinct has no effect and incorrect results are produced.
Distinct should have been evaluated in all three stages, with the output of 
Distinct given to COUNT in the reduce stage.

{code}
# Map Reduce Plan  
#--
MapReduce node 1-122
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-139
|   |
|   Project[bytearray][1] - 1-140
|
|---New For Each(false,false)[bag] - 1-127
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
|   |
|   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
|   |
|   |---Project[bag][2] - 1-123
|   |
|   |---Project[bag][1] - 1-124
|   |
|   Project[bytearray][0] - 1-133
|
|---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
|
|---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
Combine Plan
Local Rearrange[tuple]{bytearray}(false) - 1-143
|   |
|   Project[bytearray][1] - 1-144
|
|---New For Each(false,false)[bag] - 1-132
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
|   |
|   |---Project[bag][0] - 1-135
|   |
|   Project[bytearray][1] - 1-134
|
|---POCombinerPackage[tuple]{bytearray} - 1-137
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
|
|---New For Each(false)[bag] - 1-120
|   |
|   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
|   |
|   |---Project[bag][0] - 1-136
|
|---POCombinerPackage[tuple]{bytearray} - 1-145
Global sort: false
{code}
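
To make the effect of this plan concrete, here is a rough simulation of it 
next to the intended result. It is only a sketch under assumptions that are 
not stated in the issue: that the map-side ForEach sees one record at a time 
(so Distinct runs over a single-record bag and COUNT$Initial emits 1), and that 
the combine and reduce stages only sum partial counts. The function names and 
sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in the shape (c1, c2, c3, c4).
rows = [
    ("a", "g1", "x", "_"),
    ("b", "g1", "x", "_"),
    ("c", "g1", "y", "_"),
    ("d", "g2", "z", "_"),
]


def count_distinct_expected(rows, key_idx, val_idx):
    """Intended semantics: distinct values per group, then count them."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_idx]].add(row[val_idx])
    return {key: len(values) for key, values in groups.items()}


def count_distinct_buggy_plan(rows, key_idx, val_idx, num_splits=2):
    """Mimics the plan above: Distinct appears only in the map stage."""
    splits = [rows[i::num_splits] for i in range(num_splits)]
    partials = []
    for split in splits:
        partial = defaultdict(int)
        for row in split:
            # Assumption: Distinct is applied to a bag holding one record,
            # so it cannot remove anything and COUNT$Initial yields 1.
            single_record_bag = {row[val_idx]}
            # COUNT$Intermediate: sum the partial counts per key.
            partial[row[key_idx]] += len(single_record_bag)
        partials.append(partial)
    # COUNT$Final: sum the combiner outputs per key.
    final = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            final[key] += value
    return dict(final)


expected = count_distinct_expected(rows, key_idx=1, val_idx=2)
buggy = count_distinct_buggy_plan(rows, key_idx=1, val_idx=2)
```

With these rows, `expected` gives 2 for group g1 while `buggy` gives 3, since 
the duplicate value is never removed across records.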


> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Priority: Critical
>

[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-01-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-834:
--

Fix Version/s: 0.7.0

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Priority: Critical
> Fix For: 0.7.0
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-01-27 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-834:
---

Priority: Major  (was: Critical)

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Fix For: 0.7.0
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Attachment: pig-834.patch

In this patch, I look for a POUserFunc followed by another POUserFunc in the 
inner plan of ForEach, and if that pattern is found I flag the combiner 
optimizer not to fire. This disables the combiner for this particular query 
(test case included). Is this fix sufficient for this bug?
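
The check described above might look roughly like the following. This is a 
sketch only, not the code in pig-834.patch: the Op class and the function name 
are hypothetical stand-ins for Pig's physical operators, and "followed by" is 
assumed to mean that one POUserFunc is the direct input of another.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a physical operator in an inner plan."""
    kind: str                      # e.g. "POUserFunc" or "POProject"
    name: str = ""
    inputs: list = field(default_factory=list)


def has_udf_feeding_udf(op):
    """Return True if any POUserFunc takes another POUserFunc as direct input."""
    if op.kind == "POUserFunc" and any(c.kind == "POUserFunc" for c in op.inputs):
        return True
    return any(has_udf_feeding_udf(child) for child in op.inputs)


# COUNT(Distinct($1.$2)), as in the query from the issue.
nested = Op("POUserFunc", "COUNT", [
    Op("POUserFunc", "Distinct", [
        Op("POProject", "$2", [Op("POProject", "$1")]),
    ]),
])

# COUNT($1), a plain aggregate with no nested function.
plain = Op("POUserFunc", "COUNT", [Op("POProject", "$1")])

# The combiner would be skipped for `nested` and left enabled for `plain`.
skip_combiner_for_nested = has_udf_feeding_udf(nested)
skip_combiner_for_plain = has_udf_feeding_udf(plain)
```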

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Attachment: pig-834_2.patch

The correct approach is the following: if the leaf of the inner plan of ForEach 
is not combinable, we do not use the combiner in any case. If it is combinable, 
there must not be any other combinable POUserFunc in the ForEach's inner plan. 
The first check already exists in trunk. This patch adds the second check and 
makes sure the combiner does not fire if the ForEach inner plan contains any 
combinable POUserFunc apart from the leaf POUserFunc.
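
The two conditions can be sketched as below. This is an illustration under 
assumptions, not the patch itself: Op, its combinable flag, and both function 
names are hypothetical, and Distinct is marked combinable only because the 
issue title treats it as an algebraic function.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a physical operator in an inner plan."""
    kind: str
    name: str = ""
    combinable: bool = False
    inputs: list = field(default_factory=list)


def other_combinable_udfs(leaf):
    """Names of combinable POUserFuncs anywhere below the leaf."""
    found = []
    stack = list(leaf.inputs)
    while stack:
        op = stack.pop()
        if op.kind == "POUserFunc" and op.combinable:
            found.append(op.name)
        stack.extend(op.inputs)
    return found


def may_use_combiner(leaf):
    # First check: the leaf itself must be a combinable POUserFunc.
    if not (leaf.kind == "POUserFunc" and leaf.combinable):
        return False
    # Second check: no other combinable POUserFunc in the inner plan.
    return not other_combinable_udfs(leaf)


nested = Op("POUserFunc", "COUNT", True, [
    Op("POUserFunc", "Distinct", True, [Op("POProject", "$2")]),
])
plain = Op("POUserFunc", "COUNT", True, [Op("POProject", "$1")])
not_combinable = Op("POUserFunc", "SomeUdf", False, [Op("POProject", "$1")])
```

Here `may_use_combiner(nested)` is False, so the query from this issue would 
run without the combiner, while `may_use_combiner(plain)` stays True.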

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-05 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Status: Patch Available  (was: Open)

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Status: Patch Available  (was: Open)

Trying to get Hudson going on this.

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Status: Open  (was: Patch Available)

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Status: Open  (was: Patch Available)

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Attachment: pig-834_3.patch

Instead of having a recursive function walk the plan, it is better to have a 
visitor do that. So this patch replaces that function with a visitor.
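
As a rough picture of that change, the same search can be written as a visitor 
that dispatches on operator kind. Everything here is hypothetical and does not 
use Pig's own visitor classes: Op, PlanVisitor, CombinableUdfFinder and the 
visit_* naming are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a physical operator in an inner plan."""
    kind: str
    name: str = ""
    combinable: bool = False
    inputs: list = field(default_factory=list)


class PlanVisitor:
    """Walks a plan and calls visit_<kind> for each operator, if defined."""

    def visit(self, op):
        handler = getattr(self, "visit_" + op.kind, self.visit_default)
        handler(op)
        for child in op.inputs:
            self.visit(child)

    def visit_default(self, op):
        pass


class CombinableUdfFinder(PlanVisitor):
    """Collects the names of combinable POUserFuncs that it visits."""

    def __init__(self):
        self.found = []

    def visit_POUserFunc(self, op):
        if op.combinable:
            self.found.append(op.name)


leaf = Op("POUserFunc", "COUNT", True, [
    Op("POUserFunc", "Distinct", True, [Op("POProject", "$2")]),
])

# Visit only the operators below the leaf; the leaf is checked separately.
finder = CombinableUdfFinder()
for child in leaf.inputs:
    finder.visit(child)
combiner_allowed = not finder.found
```

The traversal logic lives in one place, and a new check only needs another 
visit_* method rather than another hand-written recursion.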

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Status: Patch Available  (was: Open)

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch
>



[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.

> incorrect plan when algebraic functions are nested
> --
>
> Key: PIG-834
> URL: https://issues.apache.org/jira/browse/PIG-834
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 0.7.0
>
> Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch
>