[ https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765682#action_12765682 ]
Daniel Dai commented on PIG-1022:
---------------------------------

Actually, we cannot push the filter even before f2. Because we do not keep track of the source of data inside a tuple, gid should be treated as a generated field of f2. However, the projection map of f2 gives us the wrong result: it reports gid as a directly mapped field of group (which is a tuple (name, gid)), and this triggers all the subsequent problems. The fix is to modify the projection map generation logic for mapped fields.

Santhosh, do you have any comments?

> optimizer pushes filter before the foreach that generates column used by filter
> -------------------------------------------------------------------------------
>
> Key: PIG-1022
> URL: https://issues.apache.org/jira/browse/PIG-1022
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Thejas M Nair
> Assignee: Daniel Dai
>
> grunt> l = load 'students.txt' using PigStorage() as (name:chararray, gender:chararray, age:chararray, score:chararray);
> grunt> f = foreach l generate name, gender, age, score, '200' as gid:chararray;
> grunt> g = group f by (name, gid);
> grunt> f2 = foreach g generate group.name as name: chararray, group.gid as gid: chararray;
> grunt> filt = filter f2 by gid == '200';
> grunt> explain filt;
>
> In the generated plan, filt is pushed up after the load and before the first
> foreach, even though the filter is on gid, which is generated in the first foreach.
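
To illustrate the mapped-versus-generated distinction from the comment above, here is a minimal Python sketch. It is not Pig's optimizer code: the function `can_push_filter_before`, the dictionary layout, and the labels "mapped" and "generated" are all hypothetical, chosen only to show why a wrong projection map lets the filter move past f2. Treating name as generated as well is an assumption that follows the same reasoning as for gid.

```python
# Hypothetical model of a projection map: output column -> how it was produced.
# "mapped"    = copied unchanged from an input column
# "generated" = computed by the operator (here: pulled out of the group tuple)

def can_push_filter_before(projection_map, filter_columns):
    """A filter may move above an operator only if every column it reads
    is a directly mapped field of that operator's input."""
    return all(projection_map.get(col) == "mapped" for col in filter_columns)

# What the comment says f2's projection map reports today (the bug):
buggy_f2_map = {"name": "mapped", "gid": "mapped"}

# What it should report, since the source of data inside the group tuple
# is not tracked (name handled like gid is an assumption of this sketch):
fixed_f2_map = {"name": "generated", "gid": "generated"}

pushed_with_bug = can_push_filter_before(buggy_f2_map, ["gid"])
pushed_with_fix = can_push_filter_before(fixed_f2_map, ["gid"])
```

With the buggy map the check allows the push, which is how the filter ends up above f2 and then keeps moving; with the corrected map the filter stays below f2.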