[ https://issues.apache.org/jira/browse/PIG-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Padmanabhan updated PIG-1831:
-----------------------------------
    Description:
The script below produces a different output when run in local mode. It looks like, in local mode, a relation obtained through streaming has to be stored before it can be used afterwards.

For example, consider the script below:

DEFINE MySTREAMUDF `test.sh`;
A = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3, data4);
B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
--STORE B into 'output.B';
C = JOIN B by wId LEFT OUTER, A by myId;
D = FOREACH C GENERATE B::wId, B::num, data4;
D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray, f2:int);
--STORE D into 'output.D';
E = foreach B GENERATE wId, num;
F = DISTINCT E;
G = GROUP F ALL;
H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
I = CROSS D, H;
STORE I into 'output.I';

test.sh
---------
#/bin/bash
cut -f1,3

The input is:

abcd label1 11 feature1
acbd label2 22 feature2
adbc label3 33 feature3

If I store relations B and D, I get this result every time:

acbd 3
abcd 3
adbc 3

But if I don't store relations B and D, I get an empty output. I have also observed that this behaviour is random, i.e. occasionally (about 1 out of 5 runs) there is output.

  was:
The script below produces a different output when run in local mode. It looks like, in local mode, a relation obtained through streaming has to be stored before it can be used afterwards.

For example, consider the script below:

DEFINE MySTREAMUDF `test.sh`;
A = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3, data4);
B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
--STORE B into 'output.B';
C = JOIN B by wId LEFT OUTER, A by myId;
D = FOREACH C GENERATE B::wId, B::num, data4;
D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray, f2:int);
--STORE D into 'output.D';
E = foreach B GENERATE wId, num;
F = DISTINCT E;
G = GROUP F ALL;
H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
I = CROSS D, H;
STORE I into 'output.I';

#/bin/bash
cut -f1,3

The input is:

abcd label1 11 feature1
acbd label2 22 feature2
adbc label3 33 feature3

If I store relations B and D, I get this result every time:

acbd 3
abcd 3
adbc 3

But if I don't store relations B and D, I get an empty output.

> Variation in output while using streaming udfs in local mode
> ------------------------------------------------------------
>
>                 Key: PIG-1831
>                 URL: https://issues.apache.org/jira/browse/PIG-1831
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
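To rule out the streaming command itself, test.sh can be exercised outside Pig. The sketch below is only an illustration: `stream_udf` and `sample_input` are hypothetical helper names, and the sample rows are assumed to be tab-separated (the default delimiter for both PigStorage and `cut`).

```shell
#!/bin/bash
# Hypothetical stand-in for test.sh from the report: keep fields 1 and 3.
stream_udf() {
  cut -f1,3
}

# Sample input from the report, assumed here to be tab-separated.
sample_input() {
  printf 'abcd\tlabel1\t11\tfeature1\n'
  printf 'acbd\tlabel2\t22\tfeature2\n'
  printf 'adbc\tlabel3\t33\tfeature3\n'
}

# Emits three rows, each holding the first and third field of the input.
sample_input | stream_udf
```

Under these assumptions the command emits one row per input row on every run, which suggests the intermittent empty output comes from how local mode handles the unstored streamed relations rather than from the script.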