[ 
https://issues.apache.org/jira/browse/PIG-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009905#comment-13009905
 ] 

Daniel Dai commented on PIG-1916:
---------------------------------

Yes, you will need to change LogicalPlanGenerator.g and buildNestedCrossOp so 
that Pig can recognize the nested operator and generate the logical plan 
correctly. This is the first step. Other places also need to change, for example:
1. LogToPhyTranslationVisitor.java, which translates the logical plan into the 
physical plan. It currently does not recognize nested cross.
2. Also in LogToPhyTranslationVisitor.java, Pig translates a top-level "cross" 
into the UDF GFCross. Nested cross will use a different implementation, so we 
should translate it into a different physical plan (using POCross).
3. MRCompiler.java, which does not know how to handle POCross yet.
4. Pipeline execution. I hope POCross will work, but this needs review. (A 
little background: POCross was used to implement Pig local mode, but it was 
dropped in Pig 0.7 when we moved to Hadoop local mode. Currently no one is 
using POCross; hopefully it is still functional.)
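
For anyone reviewing the pipeline behaviour, the per-group result a nested 
cross is expected to produce can be sketched outside Pig. The Python below is 
only an illustration of the semantics, not Pig's POCross code; the function 
name nested_cross and the sample user/session tuples are made up for the 
example.

```python
from itertools import product


def nested_cross(cogrouped):
    """Return one (key, bag) row per cogroup key, where the bag holds every
    (user, session) pairing for that key. Illustrative only."""
    rows = []
    for key, users, sessions in cogrouped:
        # product() pairs each user tuple with each session tuple of the same
        # key; if either bag is empty, the crossed bag is empty too.
        rows.append((key, list(product(users, sessions))))
    return rows


# Hypothetical cogroup output: (uid, user bag, session bag).
# Users are (uid, region); sessions are (uid, region, ip).
cogrouped = [
    (1, [(1, "us")], [(1, "us", "10.0.0.1"), (1, "eu", "10.0.0.2")]),
    (2, [(2, "eu")], []),
]

crossed = nested_cross(cogrouped)

# Rough equivalent of: filter crossed by user::region == session::region
filtered = [
    (key, [(u, s) for u, s in bag if u[1] == s[1]])
    for key, bag in crossed
]
```

Each key keeps its own bag of pairs, which is what lets later nested 
statements (filter, foreach) run per group instead of on the whole relation.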

> Nested cross
> ------------
>
>                 Key: PIG-1916
>                 URL: https://issues.apache.org/jira/browse/PIG-1916
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>              Labels: gsoc2011
>             Fix For: 0.10
>
>
> It is useful to have cross inside a nested foreach statement. One typical 
> use case for nested foreach is that, after cogrouping two relations, we want 
> to flatten the records with the same key and do some processing. This is 
> naturally achieved by cross. E.g.:
> {code}
> C = cogroup user by uid, session by uid;
> D = foreach C {
>     crossed = cross user, session; -- To flatten two input bags
>     filtered = filter crossed by user::region == session::region;
>     result = foreach filtered generate processSession(user::age, user::gender,
>         session::ip);  -- Nested foreach Jira: PIG-1631
>     generate result;
> }
> {code}
> Without cross, the user has to write a UDF that processes the bags user and 
> session. That is much harder than writing a UDF that processes flattened 
> tuples. This is especially true when we have nested foreach statements 
> (PIG-1631).
> This is a candidate project for Google Summer of Code 2011. More information 
> about the program can be found at http://wiki.apache.org/pig/GSoc2011

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
