Variation in output while using streaming udfs in local mode
------------------------------------------------------------
Key: PIG-1831
URL: https://issues.apache.org/jira/browse/PIG-1831
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Vivek Padmanabhan

The script below gives me different output when run in local mode. It looks like in local mode I have to store a relation obtained through streaming in order to use it afterwards.

For example, consider the script below:

{code:lang=scala|title=}
DEFINE MySTREAMUDF `test.sh`;

A = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3, data4);
B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
--STORE B into 'output.B';

C = JOIN B by wId LEFT OUTER, A by myId;
D = FOREACH C GENERATE B::wId, B::num, data4;
D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray, f2:int);
--STORE D into 'output.D';

E = foreach B GENERATE wId, num;
F = DISTINCT E;
G = GROUP F ALL;
H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
I = CROSS D, H;
STORE I into 'output.I';
{code}

{code:lang=scala|title=test.sh}
#!/bin/bash
cut -f1,3
{code}

And the input is:

>abcd label1 11 feature1
>acbd label2 22 feature2
>adbc label3 33 feature3

If I store relations B and D, then every time I get the result:

acbd 3
abcd 3
adbc 3

But if I don't store relations B and D, I get an empty output.
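
To make the expected contents of the streamed relations easier to check, the streaming command can be run by hand outside Pig. The sketch below is not part of the original report: `stream_udf` and `sample_input` are hypothetical helper names, and the sample rows are assumed to be tab-separated, which is the default delimiter for both PigStorage() and cut.

```shell
#!/bin/bash
# Hypothetical helper with the same body as test.sh from the report.
stream_udf() {
  cut -f1,3
}

# The three sample rows from the report, ASSUMED to be tab-separated.
sample_input() {
  printf 'abcd\tlabel1\t11\tfeature1\n'
  printf 'acbd\tlabel2\t22\tfeature2\n'
  printf 'adbc\tlabel3\t33\tfeature3\n'
}

# First pass (relation B): keeps fields 1 and 3, i.e. the id and the number.
sample_input | stream_udf

# Second pass (relation D), shown for one row of (wId, num, data4):
# fields 1 and 3 are now the id and data4, so f2 receives a non-numeric string.
printf 'abcd\t11\tfeature1\n' | stream_udf
```

Under these assumptions the first pass yields one (id, number) pair per input row, so relation B should never be empty for this input, whether or not it is stored. In the second pass the second field is data4 rather than the number, which the script then declares as f2:int.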