Grouping & streamer interaction causes all but first line be lost
-----------------------------------------------------------------

                 Key: PIG-1726
                 URL: https://issues.apache.org/jira/browse/PIG-1726
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
         Environment: Pig 0.7.0+9-1~jaunty-cdh3b2 (Cloudera's CDH3b2) on Ubuntu 
10.04.1, pseudo-cluster
            Reporter: Tommi Virtanen


With test.data:

foo
bar
bar
xyzzy
foo
foo
frob

this script test.pig:

orig = LOAD 'test.data' USING PigStorage();
DEFINE cat `cat`;
a1 = GROUP orig BY $0;
a2 = STREAM a1 THROUGH cat;
STORE a2 INTO 'one' USING PigStorage();
b1 = GROUP orig BY $0;
b2 = STREAM b1 THROUGH cat;
STORE b2 INTO 'two' USING PigStorage();

causes this output:

$ hadoop fs -cat one/part\*
bar     {(bar),(bar)}
$ hadoop fs -cat two/part\*
bar     {(bar),(bar)}
$ 

that is, all but one line is lost from both results. In comparison, taking out 
one of the branches makes the other one behave right; this script is works.pig:

orig = LOAD 'test.data' USING PigStorage();
DEFINE cat `cat`;
a1 = GROUP orig BY $0;
a2 = STREAM a1 THROUGH cat;
STORE a2 INTO 'one' USING PigStorage();

and it produces this output:

$ hadoop fs -cat one/part\*
bar     {(bar),(bar)}
foo     {(foo),(foo),(foo)}
frob    {(frob)}
xyzzy   {(xyzzy)}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to