Is there any way to guarantee the sequence of “group” field as the input when using “group” operator in pig

Zhang, Liyun Wed, 17 Dec 2014 22:41:41 -0800

Hi all,
   I have run into a problem: the “group” operator produces different results on 
different execution engines, such as "spark" and "mapreduce" 
(PIG-4282 <https://issues.apache.org/jira/browse/PIG-4282>).


groupdistinct.pig:
A = load 'input1.txt' as (age:int,gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    generate group, MIN(E);
};
dump C;
input1.txt is:
10 89
20 78
10 68
10 89
20 92
the mapreduce output is:
(10,68),(20,78)
the spark output is:
(20,78),(10,68)
The two outputs contain the same tuples, but they differ because the order of 
the ‘group’ keys is not the same.

Is there any way to guarantee that the “group” field comes out in the same 
order as in the input when using the “group” operator in Pig?
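For reference: Pig Latin relations are unordered, so the order in which GROUP emits its groups is not part of Pig's contract. The only order a DUMP is guaranteed to show is one imposed by an ORDER BY directly before it. A minimal sketch, assuming input1.txt is tab-separated (PigStorage's default delimiter) and using a hypothetical alias name `F`:

```pig
-- Same pipeline as groupdistinct.pig, plus an explicit sort on the group key.
A = load 'input1.txt' as (age:int, gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    generate group, MIN(E);
};
-- F is a hypothetical alias; $0 is the group key (age), sorted ascending.
F = order C by $0;
dump F;
-- Expected on either engine for the sample input:
-- (10,68)
-- (20,78)
```

Note that this sorts by the key value rather than reproducing the first-appearance order of the input; for this sample data the two happen to coincide.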


Best regards
Zhang,Liyun
