IsEmpty returns the wrong value after using LIMIT
-------------------------------------------------
Key: PIG-1543
URL: https://issues.apache.org/jira/browse/PIG-1543
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu
1. Two input files:
1a: limit_empty.input_a
1
1
1
1b: limit_empty.input_b
2
2
2.
The pig script: limit_empty.pig
-- A contains only 1's & B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_a' as (b1:int);
C =COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A),
COUNT(B);
store D into 'limit_empty.output/d';
-- After the script done, we see the right results:
-- {(1),(1),(1)} {} 1 0 3 0
-- {} {(2),(2)} 0 1 0 2
C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1),
COUNT(Alim), COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script done, we see the unexpected results:
-- {(1)} {} 1 1 1 0
-- {} {(2)} 1 1 0 1
dump D;
dump D1;
3. Run the scrip and redirect the stdout (2 dumps) file. There are two issues:
The major one:
IsEmpty() returns FALSE for empty bag in limit_empty.output/d1/*, while
IsEmpty() returns correctly in limit_empty.output/d/*.
The difference is that one has been applied with "LIMIT" before using IsEmpty().
The minor one:
The redirected output only contains the first dump:
({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)
We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)
Besides, there is error says:
[main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob -
java.lang.ClassCastException: java.lang.Integer cannot be cast to
org.apache.pig.data.Tuple
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.