COUNT returns no results as a result of two filter statements in FOREACH
------------------------------------------------------------------------
Key: PIG-514
URL: https://issues.apache.org/jira/browse/PIG-514
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Viraj Bhat
Fix For: types_branch
The following sample code uses a nested FOREACH to count the filtered student
records, once for record_type == 1 with age == scores and once for
record_type == 0. It does not seem to return any results.
{code}
mydata = LOAD 'mystudentfile.txt' AS (record_type,name,age,scores,gpa);
--keep only what we need
mydata_filtered = FOREACH mydata GENERATE record_type, name, age, scores ;
--group
mydata_grouped = GROUP mydata_filtered BY (record_type,age);
myfinaldata = FOREACH mydata_grouped {
    myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
    myfilter2 = FILTER mydata_filtered BY record_type == 0;
    GENERATE FLATTEN(group),
        -- Only this count causes the problem ??
        COUNT(myfilter1) as col2,
        SUM(myfilter2.scores) as col3,
        COUNT(myfilter2) as col4;
};
--this set of statements confirms that the count on the first filter returns 1
--mycountdata = FOREACH mydata_grouped
--{
-- myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
-- GENERATE
-- COUNT(myfilter1) as colcount;
--};
--dump mycountdata;
dump myfinaldata;
{code}
But if you comment out the {code} COUNT(myfilter1) as col2, {code} line, it
seems to work, with the following results:
(0,22,45.0,2L)
(0,24,133.0,6L)
(0,25,22.0,1L)
I have also tried to verify whether this is an issue with the
{code} COUNT(myfilter1) as col2, {code} returning zero. That does not seem to be
the case. If the commented-out block ending in {code} dump mycountdata; {code}
is uncommented, it returns:
(1L)
(1L)
I am attaching the tab-separated 'mystudentfile.txt' file used in this Pig
script. Is this an issue with having two filters in the FOREACH followed by a
COUNT on each of them?
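
To help narrow this down, here is a minimal sketch of two reduced variants that could be run against the same input. The relation names variant_a and variant_b are hypothetical and not part of the original script; mydata_grouped and mydata_filtered are the relations defined above. No output is shown because these variants have not been run.
{code}
-- Variant A (hypothetical): both filters are declared, but only the first is
-- aggregated. Shows whether merely declaring a second filter matters.
variant_a = FOREACH mydata_grouped {
    myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
    myfilter2 = FILTER mydata_filtered BY record_type == 0;
    GENERATE FLATTEN(group), COUNT(myfilter1) as col2;
};
dump variant_a;

-- Variant B (hypothetical): both filters are counted, with no SUM.
-- Shows whether the combination of the two COUNTs alone is enough.
variant_b = FOREACH mydata_grouped {
    myfilter1 = FILTER mydata_filtered BY record_type == 1 AND age == scores;
    myfilter2 = FILTER mydata_filtered BY record_type == 0;
    GENERATE FLATTEN(group), COUNT(myfilter1) as col2, COUNT(myfilter2) as col4;
};
dump variant_b;
{code}
If Variant A returns rows and Variant B does not, that would suggest the problem lies in aggregating over both filters in the same GENERATE rather than in the filters themselves.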