[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174466#comment-15174466 ]

Rohini Palaniswamy commented on PIG-4819:
-----------------------------------------

bq. Let me know if I should make the same change.

Yes. That would be good.

Other comments on the patch:
 - Indentation in the exec() method should be 4 spaces, not 2.
 - Can we remove System.err.println(tmpresult[i]); or switch it to debug logging?

> RANDOM() udf can lead to missing or redundant records
> -----------------------------------------------------
>
>                 Key: PIG-4819
>                 URL: https://issues.apache.org/jira/browse/PIG-4819
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-4819-v01.patch
>
>
> When a RANDOM() value is used for grouping/distinct/etc, it breaks the
> mapreduce rule and can lead to redundant or missing records.
> Some discussion can be found in
> https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
> We should make RANDOM less random so that it'll produce the same sequence of
> random values across task retries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
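
For readers who want to see the idea in the issue description concretely, here is a minimal sketch of a generator that produces the same sequence on a task retry. It is not the code in pig-4819-v01.patch: the class name RetryStableRandom, the jobId and taskIndex parameters, and the seed formula are all assumptions made for illustration. The only point is that the seed is built from values that stay the same across attempts of one task and never from the attempt number.

```java
import java.util.Random;

// Hypothetical helper for illustration only; not Pig's RANDOM implementation.
final class RetryStableRandom {
    private final Random rng;

    // Assumption: jobId and taskIndex are identical for every attempt of the
    // same task. The attempt number is deliberately left out of the seed.
    RetryStableRandom(String jobId, int taskIndex) {
        long seed = 31L * jobId.hashCode() + taskIndex;
        this.rng = new Random(seed);
    }

    // Returns the next value in [0.0, 1.0), one call per input record.
    double next() {
        return rng.nextDouble();
    }

    public static void main(String[] args) {
        // Two generators standing in for the first attempt and a retry.
        RetryStableRandom firstAttempt = new RetryStableRandom("job_0001", 3);
        RetryStableRandom retryAttempt = new RetryStableRandom("job_0001", 3);
        for (int i = 0; i < 5; i++) {
            double a = firstAttempt.next();
            double b = retryAttempt.next();
            System.out.println(a + (a == b ? " (same on retry)" : " (differs on retry)"));
        }
    }
}
```

Because java.util.Random is deterministic for a given seed, a retried task that reads the same records in the same order assigns each record the same value, so grouping or distinct on that value sends the record to the same place as before.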