Grouping & streamer interaction causes all but the first line to be lost
------------------------------------------------------------------------
Key: PIG-1726
URL: https://issues.apache.org/jira/browse/PIG-1726
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Environment: Pig 0.7.0+9-1~jaunty-cdh3b2 (Cloudera's CDH3b2) on Ubuntu
10.04.1, pseudo-cluster
Reporter: Tommi Virtanen
With test.data:
foo
bar
bar
xyzzy
foo
foo
frob
this script, test.pig:
orig = LOAD 'test.data' USING PigStorage();
DEFINE cat `cat`;
a1 = GROUP orig BY $0;
a2 = STREAM a1 THROUGH cat;
STORE a2 INTO 'one' USING PigStorage();
b1 = GROUP orig BY $0;
b2 = STREAM b1 THROUGH cat;
STORE b2 INTO 'two' USING PigStorage();
causes this output:
$ hadoop fs -cat one/part\*
bar {(bar),(bar)}
$ hadoop fs -cat two/part\*
bar {(bar),(bar)}
$
That is, all but one output line is lost from both results. In comparison,
removing one of the branches makes the other behave correctly; this script is works.pig:
orig = LOAD 'test.data' USING PigStorage();
DEFINE cat `cat`;
a1 = GROUP orig BY $0;
a2 = STREAM a1 THROUGH cat;
STORE a2 INTO 'one' USING PigStorage();
and it produces this output:
$ hadoop fs -cat one/part\*
bar {(bar),(bar)}
foo {(foo),(foo),(foo)}
frob {(frob)}
xyzzy {(xyzzy)}
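For comparison, the grouping that both 'one' and 'two' would be expected to contain can be reproduced outside Pig. The following is a minimal Python sketch, not part of the original report: the function name expected_groups is hypothetical, and it assumes single-field rows and a tab between the key and the bag, as PigStorage writes by default.

```python
from collections import defaultdict

# Contents of test.data from the report, one record per line.
TEST_DATA = ["foo", "bar", "bar", "xyzzy", "foo", "foo", "frob"]


def expected_groups(lines):
    """Mimic GROUP orig BY $0 on single-field rows (illustrative helper)."""
    groups = defaultdict(list)
    for line in lines:
        groups[line].append(line)
    # Sort by key so the rows line up with the sorted output of works.pig.
    # Assumed format: key, a tab, then the bag written as {(v),(v),...}.
    return [
        "{}\t{{{}}}".format(key, ",".join("({})".format(v) for v in values))
        for key, values in sorted(groups.items())
    ]


if __name__ == "__main__":
    for row in expected_groups(TEST_DATA):
        print(row)
```

Run against test.data, this yields four groups (bar, foo, frob, xyzzy), which matches what works.pig stores and is what test.pig would be expected to store in both 'one' and 'two'.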