[
https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946895#comment-15946895
]
Nandor Kollar commented on PIG-5163:
------------------------------------
[~kellyzly] the plan after multiquery optimization you mentioned above is in
fact after multiquery optimization and JoinGroupOptimizerSpark. The plan I
mentioned is after multiquery optimization and before JoinGroupOptimizerSpark.
The join group optimizer, as far as I understand, just merges LocalRearrange,
GlobalRearrange and Package into one operator, POJoinGroupSpark. When it tries
to merge the LR, GR, P pattern in scope-22, since the multiquery optimizer
deleted the predecessor (the loading of the temporary file in scope-34),
POJoinGroupSpark will have only one predecessor:
{code}
List<PhysicalOperator> predOfLRAList = plan.getPredecessors(lra);
{code}
for scope-22 will be null. Correct me if I'm wrong, but based on this *I
think the bug is somewhere in MultiQueryOptimizerSpark#visitSparkOp*.
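To illustrate the situation described above, here is a toy sketch, not Pig code. PredecessorLookupSketch and its methods are hypothetical stand-ins, and it assumes the predecessor lookup returns null (rather than an empty list) once the only predecessor has been removed from the plan:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical toy plan; it only mimics the null-on-missing lookup behaviour.
public class PredecessorLookupSketch {
    // Maps an operator name to the names of its predecessors.
    private final Map<String, List<String>> predecessors = new HashMap<>();

    public void connect(String from, String to) {
        predecessors.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Removes an operator and every edge that starts from it.
    public void remove(String op) {
        predecessors.remove(op);
        Iterator<Map.Entry<String, List<String>>> it = predecessors.entrySet().iterator();
        while (it.hasNext()) {
            List<String> preds = it.next().getValue();
            preds.remove(op);
            if (preds.isEmpty()) {
                it.remove(); // no entry left, so later lookups return null
            }
        }
    }

    // Assumption: returns null, not an empty list, when nothing is registered.
    public List<String> getPredecessors(String op) {
        return predecessors.get(op);
    }

    public static void main(String[] args) {
        PredecessorLookupSketch plan = new PredecessorLookupSketch();
        plan.connect("scope-34", "scope-22"); // temporary-file load feeds the LR
        plan.remove("scope-34");              // what the multiquery optimizer is said to do
        List<String> predOfLRAList = plan.getPredecessors("scope-22");
        System.out.println("predecessors of scope-22: " + predOfLRAList);
    }
}
```

In this sketch any caller that merges operators would need a null check on the returned list before using it.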
By the way, I think JoinGroupSparkConverter#convert should have failed: I don't
think joining 1 RDD makes sense at all, does it? So instead of
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 0)
{code}
we should check for
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 1)
{code}
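The difference between the two thresholds can be sketched as follows. This is a standalone illustration, not Pig's actual SparkUtil implementation: the class PredecessorCheckSketch, the simplified signature (an operator name instead of a PhysicalOperator) and the exception type are assumptions made for the example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the size check discussed above.
public class PredecessorCheckSketch {

    // Throws unless the list holds strictly more than `size` elements.
    public static <T> void assertPredecessorSizeGreaterThan(
            List<T> predecessors, String opName, int size) {
        int actual = (predecessors == null) ? 0 : predecessors.size();
        if (actual <= size) {
            throw new IllegalStateException("Expected more than " + size
                    + " predecessor(s) for " + opName + ", got " + actual);
        }
    }

    public static void main(String[] args) {
        // A join/group operator left with a single input, as in scope-22.
        List<String> onePredecessor = Collections.singletonList("rdd-for-scope-22");

        // Threshold 0: one predecessor is accepted, so the problem goes unnoticed.
        assertPredecessorSizeGreaterThan(onePredecessor, "POJoinGroupSpark", 0);
        System.out.println("threshold 0: accepted");

        // Threshold 1: one predecessor is rejected, so the conversion fails early.
        try {
            assertPredecessorSizeGreaterThan(onePredecessor, "POJoinGroupSpark", 1);
            System.out.println("threshold 1: accepted");
        } catch (IllegalStateException e) {
            System.out.println("threshold 1: rejected (" + e.getMessage() + ")");
        }
    }
}
```

With the stricter threshold the failure would surface in the converter instead of as an empty second output.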
> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
> Key: PIG-5163
> URL: https://issues.apache.org/jira/browse/PIG-5163
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)