Re: Is there any way to guarantee the sequence of “group” field as the input when using “group” operator in pig

Rohini Palaniswamy Mon, 22 Dec 2014 18:46:37 -0800

I see that the JIRA is for unit tests, not e2e tests. In that case, please use

Util.checkQueryOutputsAfterSort(iter, expectedResults);
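
As its name suggests, that helper compares the results after sorting them, so the order in which each engine emits the groups stops mattering. A minimal standalone sketch of the idea follows. It is not Pig's actual implementation: the class OrderInsensitiveCheck and the method sameAfterSort are hypothetical names, and rows are modelled as plain strings rather than Pig tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Pig's Util class: compares two result lists
// while ignoring the order in which the rows were produced.
class OrderInsensitiveCheck {

    // Returns true when both lists hold the same rows, in any order.
    static boolean sameAfterSort(List<String> actual, List<String> expected) {
        // Copy first so the caller's lists are left untouched.
        List<String> a = new ArrayList<>(actual);
        List<String> e = new ArrayList<>(expected);
        Collections.sort(a);
        Collections.sort(e);
        return a.equals(e);
    }

    public static void main(String[] args) {
        // The two outputs reported in the original question.
        List<String> mapreduce = Arrays.asList("(10,68)", "(20,78)");
        List<String> spark = Arrays.asList("(20,78)", "(10,68)");
        System.out.println(sameAfterSort(mapreduce, spark)); // prints true
    }
}
```

With a check like this the test asserts on the set of rows produced, which is all the group operator promises, rather than on an engine-specific ordering.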


-Rohini

On Mon, Dec 22, 2014 at 6:39 PM, Rohini Palaniswamy <[email protected]
> wrote:
>
> I have usually fixed these kinds of tests by adding an order by, as I did
> when adding the new Union tests for Tez. In this case you can add an order
> by after the distinct in the nested foreach.
>
> Daniel,
>     Any better suggestions?
>
> Regards,
> Rohini
>
>
> On Wed, Dec 17, 2014 at 10:38 PM, Zhang, Liyun <[email protected]>
> wrote:
>>
>> Hi all,
>>    I ran into a problem: the “group” operator produces results in a
>> different order on different engines, such as "spark" and "mapreduce"
>> (PIG-4282 <https://issues.apache.org/jira/browse/PIG-4282>).
>>
>> groupdistinct.pig
>> A = load 'input1.txt' as (age:int,gpa:int);
>> B = group A by age;
>> C = foreach B {
>>     D = A.gpa;
>>     E = distinct D;
>>     generate group, MIN(E);
>> };
>> dump C;
>> input1.txt is:
>> 10 89
>> 20 78
>> 10 68
>> 10 89
>> 20 92
>> the mapreduce output is:
>> (10,68),(20,78)
>> the spark output is:
>> (20,78),(10,68)
>> These two results differ because the order of the ‘group’ field is not
>> the same.
>>
>> Is there any way to guarantee that the “group” field follows the input
>> order when using the “group” operator in Pig?
>>
>>
>> Best regards
>> Zhang,Liyun
>>
>>
