[ https://issues.apache.org/jira/browse/PIG-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-1831:
----------------------------

    Attachment: PIG-1831-1.patch

Richard suggested a better fix that uses a ThreadLocal variable instead of a static variable. The static sJobConf is still kept for backward compatibility, though it was already marked as deprecated in 0.7. In theory, a UDF that still uses the deprecated sJobConf might see the same issue, but the chance of that should be very low.

> Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1831
>                 URL: https://issues.apache.org/jira/browse/PIG-1831
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>         Attachments: PIG-1831-0.patch, PIG-1831-1.patch
>
>
> The script below gives me different output from run to run in local mode. It looks
> like in local mode I have to store a relation obtained through streaming in
> order to use it afterwards.
> For example, consider the script below:
> DEFINE MySTREAMUDF `test.sh`;
> A = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3, data4);
> B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
> --STORE B into 'output.B';
> C = JOIN B by wId LEFT OUTER, A by myId;
> D = FOREACH C GENERATE B::wId,B::num,data4 ;
> D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);
> --STORE D into 'output.D';
> E = foreach B GENERATE wId,num;
> F = DISTINCT E;
> G = GROUP F ALL;
> H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
> I = CROSS D,H;
> STORE I into 'output.I';
>
> test.sh
> ---------
> #!/bin/bash
> cut -f1,3
>
> And the input is
> abcd label1 11 feature1
> acbd label2 22 feature2
> adbc label3 33 feature3
>
> Here, if I store relations B and D, then every time I get the result:
> acbd 3
> abcd 3
> adbc 3
>
> But if I don't store relations B and D, then I get an empty output. Here again
> I have observed that this behaviour is random, i.e. sometimes, about 1 out of 5
> runs, there will be output.
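
To illustrate why replacing a static variable with a ThreadLocal helps when several jobs run as threads inside one JVM (as in local mode), here is a minimal sketch. It is not the code from PIG-1831-1.patch: the class JobConfDemo, the fields sharedConf and localConf, and the method countCorrect are hypothetical names, and a plain String stands in for the job configuration object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; a String stands in for a per-job configuration.
class JobConfDemo {
    // One slot shared by every thread: the last writer wins.
    static volatile String sharedConf;
    // One slot per thread: each thread reads back only what it wrote.
    static final ThreadLocal<String> localConf = new ThreadLocal<>();

    // Runs two "jobs" on two threads and returns how many of them
    // read back their own configuration after both have written.
    static int countCorrect(boolean useThreadLocal) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch bothWritten = new CountDownLatch(2);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (String name : new String[] {"job-1", "job-2"}) {
                results.add(pool.submit(() -> {
                    if (useThreadLocal) {
                        localConf.set(name);
                    } else {
                        sharedConf = name;
                    }
                    // Wait until both threads have stored their value.
                    bothWritten.countDown();
                    bothWritten.await();
                    String seen = useThreadLocal ? localConf.get() : sharedConf;
                    return name.equals(seen);
                }));
            }
            int correct = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    correct++;
                }
            }
            return correct;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("static:      " + countCorrect(false) + " of 2 correct");
        System.out.println("ThreadLocal: " + countCorrect(true) + " of 2 correct");
    }
}
```

With the shared static field, both threads read whichever value was written last, so only one of the two jobs sees its own configuration. With the ThreadLocal, both do. The latch makes the overlap deterministic in this sketch; in a real run the overlap depends on thread timing, which would be consistent with the intermittent results described in the report.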