[ https://issues.apache.org/jira/browse/PIG-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062898#comment-13062898 ]
Daniel Dai commented on PIG-1916:
---------------------------------

Hi Zhijie,

I browsed through the patch and it looks good to me. I will certainly look into the details later. In the meantime, can you add test cases to the patch? We need to test:
1. The simple case (the one on Jira)
2. More than two inputs in cogroup
3. A statement before the cross (e.g., do a filter, then do the cross)

You can follow an existing test (e.g., TestEvalPipeline2) to write a separate test suite.

> Nested cross
> ------------
>
>          Key: PIG-1916
>          URL: https://issues.apache.org/jira/browse/PIG-1916
>      Project: Pig
>   Issue Type: New Feature
>   Components: impl
>     Reporter: Daniel Dai
>     Assignee: Zhijie Shen
>       Labels: gsoc2011
>      Fix For: 0.10
>
>  Attachments: PIG-1916_1.patch, PIG-1916_2.patch, PIG-1916_3.patch
>
>
> It is useful to have cross inside a nested foreach statement. One typical use
> case for nested foreach is that, after cogrouping two relations, we want to
> flatten the records that share a key and do some processing on them. This is
> naturally achieved by cross. E.g.:
> {code}
> C = cogroup user by uid, session by uid;
> D = foreach C {
>     crossed = cross user, session; -- To flatten two input bags
>     filtered = filter crossed by user::region == session::region;
>     result = foreach filtered generate processSession(user::age, user::gender,
>         session::ip); -- Nested foreach, Jira: PIG-1631
>     generate result;
> }
> {code}
> Without cross, the user has to write a UDF that processes the bags user and
> session. That is much harder than writing a UDF that processes flattened
> tuples. This is especially true now that we have the nested foreach
> statement (PIG-1631).
> This is a candidate project for Google Summer of Code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
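
To make the three requested test scenarios concrete, here is a minimal Python model of what a nested cross is expected to compute after a cogroup. It is only a sketch of the semantics, not Pig code and not part of the patch. The helper names (cogroup, nested_cross, foreach_cross), the sample relations, and the click relation are hypothetical, and it assumes that crossing with an empty bag yields no rows.

```python
from collections import defaultdict
from itertools import product


def cogroup(key_index, *relations):
    """Group several relations by key; each key maps to one bag per relation."""
    groups = defaultdict(lambda: [[] for _ in relations])
    for i, relation in enumerate(relations):
        for row in relation:
            groups[row[key_index]][i].append(row)
    return dict(groups)


def nested_cross(*bags):
    """Cross the bags of one group: concatenate one tuple taken from each bag."""
    return [sum(combo, ()) for combo in product(*bags)]


def foreach_cross(groups, prepare=lambda bags: bags):
    """Model of 'foreach C { ...; crossed = cross ...; generate crossed; }'.

    `prepare` stands in for any nested statement that runs before the cross.
    """
    rows = []
    for key in sorted(groups):
        rows.extend(nested_cross(*prepare(groups[key])))
    return rows


# Hypothetical sample data.
user = [(1, "us", 30), (2, "eu", 41)]                   # (uid, region, age)
session = [(1, "us", "10.0.0.1"), (1, "eu", "10.0.0.2"),
           (2, "eu", "10.0.0.3")]                       # (uid, region, ip)
click = [(1, "home"), (2, "cart"), (2, "pay")]          # (uid, page)

# 1. Simple case: cogroup two relations, then cross the two bags.
simple = foreach_cross(cogroup(0, user, session))

# 2. More than two inputs in the cogroup.
three_way = foreach_cross(cogroup(0, user, session, click))

# 3. A statement before the cross: filter the session bag, then cross.
filtered_first = foreach_cross(
    cogroup(0, user, session),
    prepare=lambda bags: [bags[0], [s for s in bags[1] if s[1] == "us"]],
)
```

In a real test suite each scenario would be a Pig script run through the test harness, with the output compared against rows like the ones this model produces.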