[ https://issues.apache.org/jira/browse/PIG-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237368#comment-13237368 ]
Dmitriy V. Ryaboy commented on PIG-2610:
----------------------------------------

Ok so the Jira I *meant* to ask to open on this wasn't about a GC error (just push the filter above the group), but about the fact that the optimizer can do this automatically, with a little bit of trickiness (the filters need to be turned into generates, and the counts into sums).

> GC errors on using FILTER within nested FOREACH
> -----------------------------------------------
>
>                 Key: PIG-2610
>                 URL: https://issues.apache.org/jira/browse/PIG-2610
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Prashant Kommireddi
>
> User has reported running into GC overhead errors while trying to use FILTER
> within FOREACH and aggregating the filtered field. Here is the sample
> PigLatin script provided by the user that generated this issue.
> {code}
> raw = LOAD 'input' using MyCustomLoader();
> searches = FOREACH raw GENERATE
>                day, searchType,
>                FLATTEN(impBag) AS (adType, clickCount)
>            ;
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                type1 = FILTER searches BY adType == 'type1';
>                type2 = FILTER searches BY adType == 'type2';
>                GENERATE
>                    FLATTEN(group) AS (day, searchType),
>                    COUNT(searches) numSearches,
>                    SUM(clickCount) AS clickCountPerSearchType,
>                    SUM(type1.clickCount) AS type1ClickCount,
>                    SUM(type2.clickCount) AS type2ClickCount;
>        };
> {code}
> Pig should be able to handle this case.
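
For illustration, here is a rough sketch of the manual rewrite the comment describes, applied to the reported script. It is not what the optimizer emits, only one way to do by hand what the comment suggests could be automated. The alias `tagged` and the fields `type1Clicks`, `type2Clicks`, and `isType1` are hypothetical names introduced here, and the `(long)` casts assume `clickCount` is numeric (the loader's schema is not shown in the report). Each nested FILTER becomes a conditional column computed before the GROUP, and a count over a filtered bag becomes a sum over a 0/1 indicator.

```pig
raw = LOAD 'input' USING MyCustomLoader();
searches = FOREACH raw GENERATE
               day, searchType,
               FLATTEN(impBag) AS (adType, clickCount);

-- Filters turned into generates: one conditional column per former FILTER.
-- (hypothetical field names; casts assume clickCount is numeric)
tagged = FOREACH searches GENERATE
             day, searchType,
             (long)clickCount AS clickCount,
             ((adType == 'type1') ? (long)clickCount : 0L) AS type1Clicks,
             ((adType == 'type2') ? (long)clickCount : 0L) AS type2Clicks,
             -- Not in the original script: shows how a COUNT over a
             -- filtered bag would become a SUM over a 0/1 indicator.
             ((adType == 'type1') ? 1L : 0L) AS isType1;

groupedSearches = GROUP tagged BY (day, searchType) PARALLEL 50;

-- No nested FILTER remains; only aggregates over the grouped bag.
counts = FOREACH groupedSearches GENERATE
             FLATTEN(group) AS (day, searchType),
             COUNT(tagged) AS numSearches,
             SUM(tagged.clickCount) AS clickCountPerSearchType,
             SUM(tagged.type1Clicks) AS type1ClickCount,
             SUM(tagged.type2Clicks) AS type2ClickCount,
             SUM(tagged.isType1) AS type1SearchCount;
```

With the nested filters gone, the FOREACH contains only COUNT and SUM over the grouped bag, which should let Pig apply the combiner instead of materializing each group's full bag on the reducer. One behavioral difference to keep in mind: for a group with no 'type1' rows, this version yields 0 for `type1ClickCount`, whereas SUM over an empty filtered bag in the original would yield null.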