Koji Noguchi created PIG-4819:
---------------------------------
Summary: RANDOM() udf can lead to missing or redundant records
Key: PIG-4819
URL: https://issues.apache.org/jira/browse/PIG-4819
Project: Pig
Issue Type: Bug
Reporter: Koji Noguchi
Assignee: Koji Noguchi
When a RANDOM() value is used as a key for grouping, DISTINCT, etc., it breaks the
MapReduce assumption that a re-executed task produces the same output as its
earlier attempt, and can lead to redundant or missing records.
Some discussion can be found in
https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
We should make RANDOM less random so that it produces the same sequence of
random values across task retries.
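
A minimal sketch of the idea, not Pig's actual RANDOM implementation: seed the
generator from values that stay the same for every attempt of a given task, and
leave the attempt number out of the seed. The class name RetryStableRandom and
the jobId/taskIndex parameters are hypothetical, chosen only for illustration.

```java
import java.util.Random;

// Hypothetical sketch: a generator whose sequence depends only on the job
// and the task, so a retried attempt replays the same values.
final class RetryStableRandom {
    private final Random rng;

    // jobId and taskIndex are assumed to be identical across all attempts of
    // the same task. The attempt number is deliberately NOT part of the seed.
    RetryStableRandom(String jobId, int taskIndex) {
        long seed = 31L * jobId.hashCode() + taskIndex;
        this.rng = new Random(seed);
    }

    // Returns the next value in [0.0, 1.0), like RANDOM().
    double next() {
        return rng.nextDouble();
    }
}
```

With this, two instances built as new RetryStableRandom("job_0001", 3) return the
same sequence from next(), so a record gets the same key whether it is processed
by the first attempt or by a retry.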