[ https://issues.apache.org/jira/browse/PIG-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248136#comment-13248136 ]
Prashant Kommireddi commented on PIG-2610:
------------------------------------------

How is this case different (from Pig Latin basics page)?

{code}
A = LOAD 'data' AS (url:chararray,outlink:chararray);

DUMP A;
(www.ccc.com,www.hjk.com)
(www.ddd.com,www.xyz.org)
(www.aaa.com,www.cvn.org)
(www.www.com,www.kpt.net)
(www.www.com,www.xyz.org)
(www.ddd.com,www.xyz.org)

B = GROUP A BY url;

DUMP B;
(www.aaa.com,{(www.aaa.com,www.cvn.org)})
(www.ccc.com,{(www.ccc.com,www.hjk.com)})
(www.ddd.com,{(www.ddd.com,www.xyz.org),(www.ddd.com,www.xyz.org)})
(www.www.com,{(www.www.com,www.kpt.net),(www.www.com,www.xyz.org)})

X = FOREACH B {
        FA= FILTER A BY outlink == 'www.xyz.org';
        PA = FA.outlink;
        DA = DISTINCT PA;
        GENERATE group, COUNT(DA);
}

DUMP X;
(www.aaa.com,0)
(www.ccc.com,0)
(www.ddd.com,1)
(www.www.com,1)
{code}

> GC errors on using FILTER within nested FOREACH
> -----------------------------------------------
>
>                 Key: PIG-2610
>                 URL: https://issues.apache.org/jira/browse/PIG-2610
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1
>            Reporter: Prashant Kommireddi
>
> User has reported running into GC overhead errors while trying to use FILTER
> within FOREACH and aggregating the filtered field. Here is the sample
> PigLatin script provided by the user that generated this issue.
> {code}
> raw = LOAD 'input' using MyCustomLoader();
> searches = FOREACH raw GENERATE
>                 day, searchType,
>                 FLATTEN(impBag) AS (adType, clickCount)
>             ;
> groupedSearches = GROUP searches BY (day, searchType) PARALLEL 50;
> counts = FOREACH groupedSearches{
>                 type1 = FILTER searches BY adType == 'type1';
>                 type2 = FILTER searches BY adType == 'type2';
>                 GENERATE
>                     FLATTEN(group) AS (day, searchType),
>                     COUNT(searches) numSearches,
>                     SUM(clickCount) AS clickCountPerSearchType,
>                     SUM(type1.clickCount) AS type1ClickCount,
>                     SUM(type2.clickCount) AS type2ClickCount;
>             };
> {code}
> Pig should be able to handle this case.

--
This message is automatically generated by JIRA.