Filtering bag in nested foreach does not produce expected results
-----------------------------------------------------------------

                 Key: PIG-710
                 URL: https://issues.apache.org/jira/browse/PIG-710
             Project: Pig
          Issue Type: Bug
            Reporter: David Ciemiewicz


I have an idiom I used to use in older versions of pig (prior to types branch) 
which would group into a collection and then filter the output if any of the 
collection contained a particular string.

This relies on FILTER statements within a FOREACH ... { ... GENERATE ... } 
statement.

ORDER ... BY in the FOREACH ... { ... GENERATE ... } statement does not seem to 
have a problem so it seems to be something isolated to the FILTER.

{code}
A = load 'filterbug.data' using PigStorage() as ( id, str );

B = group A by ( id );
describe B;
dump B;

D = foreach B generate
        group,
        COUNT(A),
        A.str;
describe D;
dump D;

C = foreach B {
        D = order A by str;
        matchedcount = COUNT(D);
        generate
                group,
                matchedcount as matchedcount,
                D.str;
        };
describe C;
dump C;

Cfiltered = foreach B {
        D = filter A by (
                str matches 'hello'
                );
        matchedcount = COUNT(D);
        generate
                group,
                matchedcount as matchedcount,
                A.str;
        };
describe Cfiltered;
dump Cfiltered;
{code}

Here's the output:

{code}
-bash-3.00$ pig -exectype local -latest filterbug.pig
USING: /grid/0/gs/pig/current

B: {group: bytearray,A: {id: bytearray,str: bytearray}}
2009-03-10 03:14:14,838 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-03-10 03:14:14,839 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,{(a,hello),(a,goodbye)})
(b,{(b,goodbye)})
(c,{(c,hello),(c,hello),(c,hello)})
(d,{(d,what)})

D: {group: bytearray,long,str: {str: bytearray}}
2009-03-10 03:14:14,920 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-03-10 03:14:14,920 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,2L,{(hello),(goodbye)})
(b,1L,{(goodbye)})
(c,3L,{(hello),(hello),(hello)})
(d,1L,{(what)})

C: {group: bytearray,matchedcount: long,str: {str: bytearray}}
2009-03-10 03:14:14,985 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-03-10 03:14:14,985 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,2L,{(goodbye),(hello)})
(b,1L,{(goodbye)})
(c,3L,{(hello),(hello),(hello)})
(d,1L,{(what)})
2009-03-10 03:14:15,018 [main] WARN  org.apache.pig.PigServer - Encountered 
Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

Cfiltered: {group: bytearray,matchedcount: long,str: {str: bytearray}}
2009-03-10 03:14:15,044 [main] WARN  org.apache.pig.PigServer - Encountered 
Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
2009-03-10 03:14:15,057 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-03-10 03:14:15,057 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(a,1L,{(hello),(goodbye)})
{code}

What I expect for the output of Cfiltered is actually:

(a,1L,{(hello),(goodbye)})
(b,0L,{(goodbye)})
(c,3L,{(hello),(hello),(hello)})
(d,0L,{(what)})


The data file is:

{code}
a       hello
a       goodbye
b       goodbye
c       hello
c       hello
c       hello
d       what
{code}



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to