Hi all,

If you need any kind of ordering in the output, use the sort operator 
(ORDER BY in Pig Latin). It was designed for exactly that need. Different 
engines produce differently ordered groups because each engine applies its own 
optimizations. If you ask Pig to re-order the groups, you lose any benefit of 
those optimizations. I would rather keep "group" the way it is: I know I can 
rely on sorting and pay its price when I need an order, and get the best speed 
when I don't need any specific ordering.
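
For example, here is a minimal sketch based on the script quoted below. The 
aliases age and min_gpa and the relation name S are names I added for 
illustration, and I have not run this on both engines. Ordering the grouped 
result explicitly before the dump should give the same sequence whichever 
engine runs the job:

```pig
-- Same pipeline as groupdistinct.pig, with explicit field names (assumed).
A = load 'input1.txt' as (age:int, gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    -- 'age' and 'min_gpa' are illustrative aliases, not in the original script.
    generate group as age, MIN(E) as min_gpa;
};
-- Explicit sort: this is the step that fixes the output order.
S = order C by age;
dump S;
```

With the sample input from the original message, this should print (10,68) 
before (20,78), at the cost of one extra sort.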

My conclusion: "group" makes no ordering guarantee by contract, so this is 
neither a problem nor a bug. It is a misuse of "group" where a sort was needed.

Regards,
Remi

-----Original Message-----
From: Zhang, Liyun [mailto:[email protected]] 
Sent: Thursday, 18 December 2014 07:38
To: [email protected]
Subject: Is there any way to guarantee the sequence of "group" field as the 
input when using "group" operator in pig

Hi all,
   I ran into a problem: the "group" operator gives different results on 
different engines such as "spark" and "mapreduce" 
(PIG-4282 <https://issues.apache.org/jira/browse/PIG-4282>).

groupdistinct.pig:
A = load 'input1.txt' as (age:int,gpa:int);
B = group A by age;
C = foreach B {
    D = A.gpa;
    E = distinct D;
    generate group, MIN(E);
};
dump C;

input1.txt is:
10 89
20 78
10 68
10 89
20 92
the mapreduce output is:
(10,68),(20,78)
the spark output is:
(20,78),(10,68)
These two results differ only because the groups (the values of the 'group' 
field) come out in a different order.

Is there any way to guarantee that the "group" operator in Pig emits the groups 
in the same order as they appear in the input?


Best regards
Zhang,Liyun

