[jira] [Commented] (PIG-5163) MultiQuery_Streaming_1 is failing with spark exec type

Nandor Kollar (JIRA) Wed, 29 Mar 2017 03:27:07 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946895#comment-15946895
 ]


Nandor Kollar commented on PIG-5163:
------------------------------------

[~kellyzly] the plan you mentioned above as the one after multiquery optimization is in 
fact the plan after both multiquery optimization and JoinGroupOptimizerSpark. The plan I 
mentioned is the one after multiquery optimization but before JoinGroupOptimizerSpark. 
As far as I understand, the join group optimizer just merges LocalRearrange, 
GlobalRearrange and Package into a single operator, POJoinGroupSpark. When it tries to 
merge the LR, GR, P pattern in scope-22, POJoinGroupSpark ends up with only one 
predecessor, because the multiquery optimizer has already deleted the other predecessor 
(the loading of the temporary file in scope-34):
{code}
List<PhysicalOperator> predOfLRAList = plan.getPredecessors(lra);
{code}
For scope-22, predOfLRAList is null. Correct me if I'm wrong, but based on this *I 
think the bug is somewhere in MultiQueryOptimizerSpark#visitSparkOp*.
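To make the failure mode concrete, here is a minimal sketch using a hypothetical toy plan. ToyPlan and ToyJoinGroupMerge are illustrative names, not Pig's PhysicalPlan or JoinGroupOptimizerSpark. The sketch assumes only what is described above: getPredecessors returns null for an operator with no predecessors, and the merged join operator takes one input per LocalRearrange predecessor. Under those assumptions, deleting the temporary-file load silently leaves the merged operator with a single input.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy plan, NOT Pig's PhysicalPlan API.
final class ToyPlan {
    // operator name -> operators that have an edge into it
    private final Map<String, List<String>> preds = new HashMap<>();

    void connect(String from, String to) {
        preds.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Mirrors the behaviour described above: null, not an empty list,
    // when an operator has no predecessors.
    List<String> getPredecessors(String op) {
        return preds.get(op);
    }

    // Deletes an operator and the edges into and out of it (assumed to be
    // roughly what happens when the temporary-file load is removed).
    void remove(String op) {
        preds.remove(op);
        for (List<String> in : preds.values()) {
            in.remove(op);
        }
        // Drop now-empty entries so getPredecessors returns null for them.
        preds.values().removeIf(List::isEmpty);
    }
}

// Hypothetical sketch of the LR/GR/P merge, NOT JoinGroupOptimizerSpark itself.
final class ToyJoinGroupMerge {
    // The merged join operator gets one input per LocalRearrange predecessor;
    // a null predecessor list silently contributes no input.
    static int joinInputs(ToyPlan plan, List<String> localRearranges) {
        int inputs = 0;
        for (String lra : localRearranges) {
            List<String> predOfLRAList = plan.getPredecessors(lra);
            if (predOfLRAList != null) {
                inputs += predOfLRAList.size();
            }
        }
        return inputs;
    }
}
{code}

For example, connecting load-input to lr-1 and load-tmp to lr-2 gives 2 join inputs; after remove("load-tmp"), getPredecessors("lr-2") is null and joinInputs returns 1, with no error raised at merge time.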

By the way, I think JoinGroupSparkConverter#convert should have failed here: I don't 
think joining a single RDD makes sense at all, does it? So instead of
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 0);
{code}
we should check for
{code}
SparkUtil.assertPredecessorSizeGreaterThan(predecessors, op, 1);
{code}
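A minimal sketch of why the threshold matters, assuming assertPredecessorSizeGreaterThan throws when predecessors.size() <= size, as its name suggests. PredecessorAssert is a hypothetical stand-in that takes a List<?> instead of Spark RDDs so it compiles without Spark; it is not the actual SparkUtil code. With a threshold of 0 a single-RDD join passes; with 1 it is rejected.

{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SparkUtil.assertPredecessorSizeGreaterThan, NOT Pig's code.
// Takes List<?> instead of a list of Spark RDDs so it compiles without Spark.
final class PredecessorAssert {
    // Throws unless the operator has strictly more than `size` predecessors.
    static void assertPredecessorSizeGreaterThan(List<?> predecessors, String op, int size) {
        if (predecessors.size() <= size) {
            throw new IllegalStateException("Expected more than " + size
                    + " predecessors for " + op + ", got " + predecessors.size());
        }
    }

    // Returns true if a join with `predecessorCount` inputs passes the check
    // for the given threshold.
    static boolean passes(int predecessorCount, int threshold) {
        List<String> fakeRdds = Collections.nCopies(predecessorCount, "rdd");
        try {
            assertPredecessorSizeGreaterThan(fakeRdds, "POJoinGroupSpark", threshold);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
{code}

Here passes(1, 0) is true, passes(1, 1) is false and passes(2, 1) is true, so the stricter check only admits joins with at least two inputs and would have surfaced the scope-22 problem at conversion time.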

> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
>                 Key: PIG-5163
>                 URL: https://issues.apache.org/jira/browse/PIG-5163
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> The 2nd output was empty; it looks like Pig on Spark didn't generate any data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
