[ https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946895#comment-15946895 ]
Nandor Kollar commented on PIG-5163:
------------------------------------

[~kellyzly] the plan you described above as "after multiquery optimization" is in fact the plan after both multiquery optimization and JoinGroupOptimizerSpark. The plan I mentioned is after multiquery optimization but before JoinGroupOptimizerSpark.

As far as I understand, the join group optimizer just merges LocalRearrange, GlobalRearrange and Package into one operator, POJoinGroupSpark. When it tries to merge the LR, GR, P pattern in scope-22, POJoinGroupSpark will have only one predecessor, because the multiquery optimizer deleted the other predecessor (the loading of the temporary file in scope-34). The list returned by
{code}
List<PhysicalOperator> predOfLRAList = plan.getPredecessors(lra);
{code}
will be null for scope-22. Correct me if I'm wrong, but based on this *I think the bug is somewhere in MultiQueryOptimizerSpark#visitSparkOp*.

By the way, I think JoinGroupSparkConverter#convert should have failed: joining a single RDD doesn't make sense at all, does it? So instead of
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 0)
{code}
we should check for
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 1)
{code}

> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
>                 Key: PIG-5163
>                 URL: https://issues.apache.org/jira/browse/PIG-5163
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)