Koji Noguchi created PIG-4819:
---------------------------------

             Summary: RANDOM() udf can lead to missing or redundant records
                 Key: PIG-4819
                 URL: https://issues.apache.org/jira/browse/PIG-4819
             Project: Pig
          Issue Type: Bug
            Reporter: Koji Noguchi
            Assignee: Koji Noguchi


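When a RANDOM() value is used as a key for grouping, distinct, etc., it breaks
the MapReduce assumption that a task produces the same output every time it
runs. If a task is retried, this can lead to redundant or missing records.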
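
A minimal sketch of the failure mode, not taken from the Pig code base. It
assumes a single map task whose output is partitioned across two reducers by a
random key, and a retry that happens after reducer 0 has already fetched its
partition. The function name, the record values, and the two fixed seeds (which
stand in for two unrelated, unseeded generators) are all hypothetical.

```python
import random

def run_map_attempt(records, rng, num_reducers=2):
    """Assign each record to a reducer partition using a random key."""
    partitions = {r: [] for r in range(num_reducers)}
    for rec in records:
        key = rng.random()                      # stands in for RANDOM()
        partitions[int(key * num_reducers)].append(rec)
    return partitions

records = list(range(20))

# Two attempts of the same map task. The different seeds model the fact
# that an unseeded generator yields a different sequence on each run.
attempt1 = run_map_attempt(records, random.Random(1))
attempt2 = run_map_attempt(records, random.Random(2))

# Reducer 0 fetched its partition from attempt 1; the map output was then
# lost, the task was rerun, and reducer 1 fetched from attempt 2.
seen = attempt1[0] + attempt2[1]

missing = sorted(set(records) - set(seen))
duplicated = sorted(r for r in set(seen) if seen.count(r) > 1)

print("missing:   ", missing)
print("duplicated:", duplicated)
```

A record that lands in partition 0 on the first attempt and in partition 1 on
the retry is delivered twice; one that moves the other way is never delivered.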

Some discussion can be found at
https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195

We should make RANDOM less random so that it produces the same sequence of
random values across task retries.
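
One way this could look, as a sketch under stated assumptions rather than the
actual change made for this issue: seed the generator from an identifier that
stays the same across attempts of a task. The task ID strings and the helper
names below are hypothetical, and the CRC32-based seed is just one stable hash
chosen for illustration.

```python
import random
import zlib

def task_seed(task_id: str) -> int:
    """Derive a stable seed from a task ID (assumed identical across retries)."""
    # zlib.crc32 is used instead of hash() because Python salts str hashes
    # per process, which would defeat the purpose.
    return zlib.crc32(task_id.encode("utf-8"))

def random_values(task_id: str, n: int):
    """Return the first n random values a task with this ID would generate."""
    rng = random.Random(task_seed(task_id))
    return [rng.random() for _ in range(n)]

first_attempt = random_values("task_0001_m_000003", 5)
retry = random_values("task_0001_m_000003", 5)
other_task = random_values("task_0001_m_000004", 5)

print(first_attempt == retry)       # same task, same sequence
print(first_attempt == other_task)  # different task, different sequence
```

The seed must not include the attempt number; otherwise each retry would
again see a different sequence.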



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
