[ https://issues.apache.org/jira/browse/PIG-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064249#comment-13064249 ]
Daniel Dai commented on PIG-1916:
---------------------------------
Hi, Zhijie,
The patch looks pretty good. The simple test case you use is fine, and a local
mode test should be good enough. I see you also tested the case where one side
is an empty bag, and it works as expected. Illustrate also works. I think the
patch is good to go.
One future improvement is to optimize bag creation: we can create n-1 bags
instead of n bags. For the last relation, we can iterate tuple by tuple and
create results on the fly, so we use less memory. We can open a separate Jira
ticket for this optimization.
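A minimal Python sketch of the n-1 bags idea follows. It is not Pig's actual cross operator: the function name `streaming_cross` and the representation of bags as Python iterables of tuples are assumptions for illustration. The first n-1 inputs are copied into memory because they are re-scanned for every tuple of the last input, while the last input is consumed lazily, so its size no longer affects memory use. An empty input on any side produces no output, as in any Cartesian product.

```python
from itertools import product
from typing import Iterable, Iterator, List, Sequence, Tuple


def streaming_cross(bags: Sequence[Iterable[tuple]]) -> Iterator[tuple]:
    """Cartesian product of n bags holding only the first n-1 in memory.

    Hypothetical sketch of the proposed optimization; not Pig's actual
    cross implementation.
    """
    if not bags:
        return
    # The first n-1 bags are re-scanned once per tuple of the last bag,
    # so they are materialized as tuples.
    held: List[Tuple[tuple, ...]] = [tuple(bag) for bag in bags[:-1]]
    # The last bag is consumed lazily, one tuple at a time, and never copied.
    for last in bags[-1]:
        # product() yields combinations of the held bags one at a time
        # instead of building the full product in memory.
        for combo in product(*held):
            # Concatenate the fields of every input tuple into one flat row.
            yield tuple(field for t in combo for field in t) + tuple(last)


# Two bags from one cogroup record; the session bag is a one-shot iterator,
# which works because it is the streamed (last) input.
users = [("alice", "us"), ("bob", "eu")]
sessions = iter([("10.0.0.1", "us"), ("10.0.0.2", "eu")])
for row in streaming_cross([users, sessions]):
    print(row)
```

Putting the largest bag last would give the most savings, since it is the one that never has to be held in memory.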
> Nested cross
> ------------
>
> Key: PIG-1916
> URL: https://issues.apache.org/jira/browse/PIG-1916
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Reporter: Daniel Dai
> Assignee: Zhijie Shen
> Labels: gsoc2011
> Fix For: 0.10
>
> Attachments: PIG-1916_1.patch, PIG-1916_2.patch, PIG-1916_3.patch,
> PIG-1916_4.patch
>
>
> It is useful to have cross inside a nested foreach statement. One typical use
> case for nested foreach is: after cogrouping two relations, we want to flatten
> the records of the same key and do some processing. This is naturally
> achieved by cross. E.g.:
> {code}
> C = cogroup user by uid, session by uid;
> D = foreach C {
> crossed = cross user, session; -- To flatten two input bags
> filtered = filter crossed by user::region == session::region;
> result = foreach filtered generate processSession(user::age, user::gender,
>     session::ip); -- Nested foreach Jira: PIG-1631
> generate result;
> }
> {code}
> Without cross, users have to write a UDF that processes the bags user and
> session. That is much harder than writing a UDF that processes flattened
> tuples, especially once we have nested foreach statements (PIG-1631).
> This is a candidate project for Google Summer of Code 2011. More information
> about the program can be found at http://wiki.apache.org/pig/GSoc2011
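The per-key semantics of the quoted script can be modeled in Python. This is a sketch under stated assumptions, not Pig code: relations are lists of dicts, `cogroup` and `nested_cross_filter` are hypothetical helpers, and `process_session` is a placeholder for the `processSession` UDF. As with Pig's default (outer) cogroup, a uid present in only one relation gets an empty bag on the other side, so crossing it yields nothing. The sketch applies the region filter before projecting, which is what the `filtered` alias in the script appears intended for.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

Bags = Tuple[List[dict], List[dict]]


def cogroup(users: Iterable[dict], sessions: Iterable[dict], key: str) -> Dict[object, Bags]:
    """Model of `cogroup user by uid, session by uid` (hypothetical helper)."""
    groups: Dict[object, Bags] = defaultdict(lambda: ([], []))
    for u in users:
        groups[u[key]][0].append(u)
    for s in sessions:
        groups[s[key]][1].append(s)
    # A key seen in only one relation keeps an empty bag for the other.
    return dict(groups)


def process_session(age, gender, ip):
    """Placeholder for the processSession UDF named in the issue."""
    return (age, gender, ip)


def nested_cross_filter(groups: Dict[object, Bags]) -> Dict[object, List[tuple]]:
    """Per group: cross the two bags, filter on region, then project."""
    result = {}
    for uid, (user_bag, session_bag) in groups.items():
        # crossed = cross user, session;
        crossed = product(user_bag, session_bag)
        # filtered = filter crossed by user::region == session::region;
        filtered = ((u, s) for u, s in crossed if u["region"] == s["region"])
        # project each surviving pair through the placeholder UDF
        result[uid] = [process_session(u["age"], u["gender"], s["ip"]) for u, s in filtered]
    return result


users = [{"uid": 1, "age": 30, "gender": "f", "region": "us"},
         {"uid": 2, "age": 41, "gender": "m", "region": "eu"}]
sessions = [{"uid": 1, "ip": "10.0.0.1", "region": "us"},
            {"uid": 1, "ip": "10.0.0.2", "region": "eu"}]
print(nested_cross_filter(cogroup(users, sessions, "uid")))
```

Only `process_session` is per-record user code here; the pairing loop that a bag-level UDF would otherwise have to re-implement is handled by the cross step.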
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira