[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-4819:
------------------------------
Attachment: pig-4819-v02_fix_v03.patch
Rohini pointed out that I need to re-initialize the static variables to handle
JVM reuse in Tez.
{quote}
http://pig.apache.org/docs/r0.15.0/udf.html
_Clean up static variable in Tez_
{quote}
Added. I am still trying to see whether I can get rid of the static variable
dependency altogether.
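Here is why the static state matters. When Tez reuses a JVM, static fields written by one task are still set when the next task starts in that JVM. The minimal Java sketch below simulates two tasks running back to back in one JVM. `StaticStateReuse`, `runTask` and `resetStatics` are hypothetical names, and none of this is Pig API. Without cleanup, the second task inherits the first task's generator and call counter. With `resetStatics()` run first, it starts from its own seed.

```java
import java.util.Random;

/**
 * Hypothetical sketch (not Pig code): static fields outlive a single task
 * when the JVM is reused, so they must be reset before the next task starts.
 */
public class StaticStateReuse {
    // Static state lives as long as the JVM, not as long as the task.
    static Random sharedRng = null;
    static long callCount = 0;

    // Lazily seeds the static generator on first use.
    static double nextRandom(long taskSeed) {
        if (sharedRng == null) {
            sharedRng = new Random(taskSeed);
        }
        callCount++;
        return sharedRng.nextDouble();
    }

    // Hypothetical cleanup hook, to be invoked between tasks in a reused JVM.
    static void resetStatics() {
        sharedRng = null;
        callCount = 0;
    }

    // Simulates one task that draws a single value, optionally cleaning up first.
    static double runTask(long taskSeed, boolean cleanUpFirst) {
        if (cleanUpFirst) {
            resetStatics();
        }
        return nextRandom(taskSeed);
    }

    public static void main(String[] args) {
        // Two tasks in the same JVM, no cleanup: task 42 reuses task 7's generator.
        runTask(7L, false);
        runTask(42L, false);
        System.out.println("callCount without cleanup = " + callCount); // 2

        // Same two tasks with cleanup: task 42 starts from its own seed.
        runTask(7L, true);
        double v = runTask(42L, true);
        System.out.println("callCount with cleanup = " + callCount); // 1
        System.out.println(v == new Random(42L).nextDouble()); // true
    }
}
```

In a real UDF the reset would have to be triggered by whatever per-task cleanup mechanism the execution engine provides. The sketch only calls it by hand.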
> RANDOM() udf can lead to missing or redundant records
> -----------------------------------------------------
>
> Key: PIG-4819
> URL: https://issues.apache.org/jira/browse/PIG-4819
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Fix For: 0.16.0
>
> Attachments: pig-4819-v01.patch, pig-4819-v02.patch,
> pig-4819-v02_fix_v01.patch, pig-4819-v02_fix_v02.patch,
> pig-4819-v02_fix_v03.patch
>
>
> When a RANDOM() value is used for grouping/distinct/etc., it breaks the
> MapReduce rule that a retried task must produce the same output as the
> original attempt, and can lead to redundant or missing records.
> Some discussion can be found in
> https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
> We should make RANDOM less random so that it produces the same sequence of
> random values across task retries.
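One way to make RANDOM "less random" in this sense is to seed the generator from something that stays the same across attempts of the same task. The Java sketch below makes two assumptions. First, it uses a task identifier that excludes the attempt number (for example `m_000003`). Second, it uses an index that distinguishes multiple RANDOM() calls within one script. `DeterministicRandom` and both constructor parameters are hypothetical, and this is not necessarily what the attached patches do.

```java
import java.util.Random;

/**
 * Hypothetical sketch: a RANDOM()-style generator seeded from a stable task
 * identifier, so every attempt of the same task replays the same sequence.
 */
public class DeterministicRandom {
    private final Random rng;

    // taskId is assumed to identify the task but NOT the attempt;
    // udfInstanceIndex is assumed to distinguish RANDOM() calls in one script.
    public DeterministicRandom(String taskId, int udfInstanceIndex) {
        // High 32 bits from the task id, low 32 bits from the instance index.
        long seed = ((long) taskId.hashCode() << 32) | (udfInstanceIndex & 0xffffffffL);
        this.rng = new Random(seed);
    }

    public double next() {
        return rng.nextDouble();
    }

    public static void main(String[] args) {
        // Simulate the original attempt and a retry of the same task.
        DeterministicRandom attempt0 = new DeterministicRandom("m_000003", 0);
        DeterministicRandom attempt1 = new DeterministicRandom("m_000003", 0);
        for (int i = 0; i < 5; i++) {
            System.out.println(attempt0.next() == attempt1.next()); // prints true five times
        }
    }
}
```

`String.hashCode()` is specified in the Java API documentation, so the seed is the same in whichever JVM the retry lands. A retried task therefore regenerates the same keys and sends each record to the same reducer as the original attempt did.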
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)