[ https://issues.apache.org/jira/browse/PIG-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1831:
----------------------------

    Attachment: PIG-1831-1.patch

Richard suggested a better fix: use a ThreadLocal variable instead of a static
variable. The static sJobConf is still kept for backward compatibility, though it
has been marked deprecated since 0.7. In theory, a UDF that still uses the
deprecated sJobConf might see the same issue, but the chance of that should be
very low.
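To illustrate why the ThreadLocal removes the nondeterminism, here is a minimal Java sketch. It is not Pig's actual code: the class `JobConfHolder`, its methods, and the use of a plain `Map<String, String>` in place of Hadoop's `JobConf` are all hypothetical. It assumes, as in local mode, that tasks from concurrently running jobs execute as threads in one JVM. Two "tasks" each store their own configuration, wait until both have written, then read it back through the ThreadLocal and through the shared static field.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical holder, loosely modeled on the sJobConf pattern; not Pig's real class.
public class JobConfHolder {
    // Legacy shared field: every thread in the JVM sees (and overwrites) the same slot.
    @Deprecated
    public static volatile Map<String, String> sJobConf;

    // Per-thread slot: each task thread only ever sees the value it set itself.
    private static final ThreadLocal<Map<String, String>> JOB_CONF = new ThreadLocal<>();

    public static void setJobConf(Map<String, String> conf) {
        JOB_CONF.set(conf);
        sJobConf = conf; // still written, only for backward compatibility
    }

    public static Map<String, String> getJobConf() {
        return JOB_CONF.get();
    }

    // Runs two tasks in parallel. seen[task][0] is read via the ThreadLocal,
    // seen[task][1] via the static field.
    public static String[][] runDemo() {
        String[] jobs = {"job-A", "job-B"};
        String[][] seen = new String[2][2];
        CyclicBarrier barrier = new CyclicBarrier(2);
        Thread[] threads = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int task = i;
            threads[i] = new Thread(() -> {
                setJobConf(Collections.singletonMap("job.name", jobs[task]));
                try {
                    barrier.await(); // both tasks have now written their conf
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
                seen[task][0] = getJobConf().get("job.name");
                seen[task][1] = sJobConf.get("job.name");
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join also makes the writes to seen visible here
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        String[][] seen = runDemo();
        for (int i = 0; i < seen.length; i++) {
            System.out.println("task " + i + ": ThreadLocal=" + seen[i][0]
                    + ", static=" + seen[i][1]);
        }
    }
}
```

Each task reads back its own job name through the ThreadLocal, while both reads of the static field return the same value, so one of the two tasks sees the other job's configuration. Which task loses varies from run to run, which matches the random behavior described in the report.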

> Indeterministic behavior in local mode due to static variable 
> PigMapReduce.sJobConf
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1831
>                 URL: https://issues.apache.org/jira/browse/PIG-1831
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>         Attachments: PIG-1831-0.patch, PIG-1831-1.patch
>
>
> The script below gives different output when run in local mode. It looks
> like in local mode I have to store a relation obtained through streaming in
> order to use it afterwards.
>  For example, consider the script below:
> DEFINE MySTREAMUDF `test.sh`;
> A  = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3, data4);
> B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
> --STORE B into 'output.B';
> C = JOIN B by wId LEFT OUTER, A by myId;
> D = FOREACH C GENERATE B::wId,B::num,data4 ;
> D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);
> --STORE D into 'output.D';
> E = foreach B GENERATE wId,num;
> F = DISTINCT E;
> G = GROUP F ALL;
> H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
> I = CROSS D,H;
> STORE I  into 'output.I';
> test.sh
> ---------
> #!/bin/bash
> cut -f1,3
> And the input is:
> abcd    label1  11      feature1
> acbd    label2  22      feature2
> adbc    label3  33      feature3
> If I store relations B and D, then every time I get the result:
> acbd            3
> abcd            3
> adbc            3
> But if I don't store relations B and D, I get empty output. I have also
> observed that this behaviour is random, i.e. sometimes, in about 1 out of 5
> runs, there will be output.
