[jira] [Commented] (PIG-4819) RANDOM() udf can lead to missing or redundant records

2016-03-09 Thread Koji Noguchi (JIRA)

[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187597#comment-15187597 ]

Koji Noguchi commented on PIG-4819:
-----------------------------------

bq. TestBuiltin.testURIWithCurlyBrace is failing after addition of 
testRandomJob with -Dhadoopversion=23 -Dexectype=tez. 

Fixed in PIG-4833.

bq. Also would be good to put this in Pig 0.15.1 as well.

Not sure. I do like my change, but I'm still afraid of how it will perform 
for our users. For now, I prefer to keep it only in trunk.


> RANDOM() udf can lead to missing or redundant records
> -----------------------------------------------------
>
> Key: PIG-4819
> URL: https://issues.apache.org/jira/browse/PIG-4819
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Fix For: 0.16.0
>
> Attachments: pig-4819-v01.patch, pig-4819-v02.patch, 
> pig-4819-v02_fix_v01.patch, pig-4819-v02_fix_v02.patch, 
> pig-4819-v02_fix_v03.patch, pig-4819-v02_fix_v04.patch, 
> pig-4819-v02_fix_v05.patch, pig-4819-v02_fix_v06.patch
>
>
> When a RANDOM() value is used for grouping, distinct, etc., it breaks the 
> MapReduce assumption that a retried task reproduces the same output, which 
> can lead to redundant or missing records. 
> Some discussion can be found in 
> https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
> We should make RANDOM less random so that it produces the same sequence of 
> random values across task retries.
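
The idea in the description can be illustrated with a minimal Java sketch. This is not the attached patch: the class name, the seedFor helper, and the choice of a job id plus task index as seed inputs are all assumptions made for illustration. It only shows that a generator seeded from values that stay fixed across attempts of the same task replays the same sequence on a retry.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomSketch {

    // Hypothetical helper: build the seed only from values that do not
    // change between attempts of the same task (never the attempt number
    // or the wall-clock time).
    static long seedFor(String jobId, int taskIndex) {
        return 31L * jobId.hashCode() + taskIndex;
    }

    // Produce the first `count` values a task would see from its generator.
    static double[] sequence(String jobId, int taskIndex, int count) {
        Random rng = new Random(seedFor(jobId, taskIndex));
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        double[] firstAttempt = sequence("job_0001", 3, 5);
        double[] retry = sequence("job_0001", 3, 5);
        // Same job id and task index, so the retry replays the same values.
        System.out.println(Arrays.equals(firstAttempt, retry)); // prints "true"
    }
}
```

With an unseeded generator, a retried task would assign different values to the same input records, so records could be sent to different groups or partitions than in the first attempt.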



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4819) RANDOM() udf can lead to missing or redundant records

2016-03-03 Thread Rohini Palaniswamy (JIRA)

[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179448#comment-15179448 ]

Rohini Palaniswamy commented on PIG-4819:
-----------------------------------------

[~knoguchi],
TestBuiltin.testURIWithCurlyBrace is failing after addition of 
testRandomJob with -Dhadoopversion=23 -Dexectype=tez. Could you take a look 
at it? Also would be good to put this in Pig 0.15.1 as well.






[jira] [Commented] (PIG-4819) RANDOM() udf can lead to missing or redundant records

2016-03-03 Thread Rohini Palaniswamy (JIRA)

[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178597#comment-15178597 ]

Rohini Palaniswamy commented on PIG-4819:
-----------------------------------------

+1






[jira] [Commented] (PIG-4819) RANDOM() udf can lead to missing or redundant records

2016-03-02 Thread Rohini Palaniswamy (JIRA)

[ https://issues.apache.org/jira/browse/PIG-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176100#comment-15176100 ]

Rohini Palaniswamy commented on PIG-4819:
-----------------------------------------

+1

bq. But should I simply extend org.apache.pig.builtin.RANDOM from 
org.apache.pig.piggybank.evaluation.math.RANDOM

That would be ideal, but if users run a newer piggybank jar with an older 
version of Pig, it will break. So I think duplicating the code is better for now.


> RANDOM() udf can lead to missing or redundant records
> -----------------------------------------------------
>
> Key: PIG-4819
> URL: https://issues.apache.org/jira/browse/PIG-4819
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Attachments: pig-4819-v01.patch, pig-4819-v02.patch
>
>
> When a RANDOM() value is used for grouping, distinct, etc., it breaks the 
> MapReduce assumption that a retried task reproduces the same output, which 
> can lead to redundant or missing records. 
> Some discussion can be found in 
> https://issues.apache.org/jira/browse/PIG-3257?focusedCommentId=13669195#comment-13669195
> We should make RANDOM less random so that it produces the same sequence of 
> random values across task retries.


