[ https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771712#action_12771712 ]
Pradeep Kamath commented on PIG-920:
------------------------------------

In MultiQueryOptimizer.java (the numbers in the code blocks below are line numbers):

It would be good to add some comments in the following code explaining why the plan size should be 2 or 3 and what the POForEach is:
{noformat}
223         if (pl.size() == 2 || pl.size() == 3) {
224             PhysicalOperator root = pl.getRoots().get(0);
225             PhysicalOperator leaf = pl.getLeaves().get(0);
226             if (root instanceof POLoad && leaf instanceof POStore) {
227                 if (pl.size() == 3) {
228                     PhysicalOperator mid = pl.getSuccessors(root).get(0);
229                     if (mid instanceof POForEach) {
230                         rtn = true;
231                     }
232                 } else {
233                     rtn = true;
234                 }
235             }
236         }
237     }
{noformat}

Just to be safe, it might be better to check that there is only 1 successor before this code:
{noformat}
265         PhysicalOperator opSucc = succ.mapPlan.getSuccessors(op).get(0);
{noformat}

Is the following by design even in the case where multiple successors are present for the splitter?
{noformat}
309         return 1;
{noformat}

> optimizing diamond queries
> --------------------------
>
>                 Key: PIG-920
>                 URL: https://issues.apache.org/jira/browse/PIG-920
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>         Attachments: PIG-920.patch
>
>
> The following query
> A = load 'foo';
> B = filter A by $0>1;
> C = filter A by $1 == 'foo';
> D = COGROUP C by $0, B by $0;
> ......
> does not get efficiently executed. Currently, it runs a map-only job that
> basically reads and writes the same data before doing the query processing.
> A query where the data is loaded twice actually executes more efficiently.
> This is not an uncommon query and we should fix this issue.
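
For reference, the single-successor check suggested above for line 265 could look roughly like the sketch below. It is only an illustration, not code from the patch: SingleSuccessorGuard and onlySuccessor are hypothetical names, the helper works on a plain java.util.List rather than on Pig's plan classes, and it assumes Java 8+ for Optional. It treats a null list the same as an empty one, on the assumption that a successor lookup may return null for an operator with no successors.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of Pig): hands back the successor only
// when the list holds exactly one element.
final class SingleSuccessorGuard {
    private SingleSuccessorGuard() {}

    static <T> Optional<T> onlySuccessor(List<T> successors) {
        // Assumption: a null list means "no successors".
        if (successors == null || successors.size() != 1) {
            return Optional.empty();
        }
        // ofNullable so that a null element yields empty instead of throwing.
        return Optional.ofNullable(successors.get(0));
    }
}
```

A caller would pass in the result of the successor lookup and skip the optimization when the returned Optional is empty, rather than calling get(0) unconditionally.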