[ https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944842#comment-15944842 ]
Nandor Kollar commented on PIG-5163:
------------------------------------
It looks like this is an issue with multiquery optimization. Before multiquery
optimization, the Spark plan looks like this:
{code}
Spark node scope-30
Store(hdfs://localhost:50373/tmp/temp274219070/tmp-1212075796:org.apache.pig.impl.io.InterStorage) - scope-31
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-8
|
|---A: New For Each(false,false,false)[bag] - scope-7
| |
| Project[bytearray][0] - scope-1
| |
| Project[bytearray][1] - scope-3
| |
| Project[bytearray][2] - scope-5
|
|---A: Load(hdfs://localhost:50373/user/nkollar/studenttab10k:org.apache.pig.builtin.PigStorage) - scope-0--------
Spark node scope-33
B: Store(hdfs://localhost:50373/user/nkollar/out.1:org.apache.pig.builtin.PigStorage) - scope-12
|
|---Load(hdfs://localhost:50373/tmp/temp274219070/tmp-1212075796:org.apache.pig.impl.io.InterStorage) - scope-32--------
Spark node scope-38
D: Store(hdfs://localhost:50373/user/nkollar/out.2:org.apache.pig.builtin.PigStorage) - scope-29
|
|---D: New For Each(true,true)[tuple] - scope-28
| |
| Project[bag][1] - scope-26
| |
| Project[bag][2] - scope-27
|
|---D: Package(Packager)[tuple]{bytearray} - scope-21
|
|---D: Global Rearrange[tuple] - scope-20
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-22
| | |
| | Project[bytearray][0] - scope-23
| |
|
|---Load(hdfs://localhost:50373/tmp/temp274219070/tmp-1212075796:org.apache.pig.impl.io.InterStorage) - scope-34
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-24
| |
| Project[bytearray][0] - scope-25
|
|---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-17
|
|---Load(hdfs://localhost:50373/tmp/temp274219070/tmp-1212075796:org.apache.pig.impl.io.InterStorage) - scope-36--------
{code}
After multiquery optimization, it looks like this:
{code}
Spark node scope-30
Split - scope-42
| |
| B: Store(hdfs://localhost:50373/user/nkollar/out.1:org.apache.pig.builtin.PigStorage) - scope-12
| |
| D: Store(hdfs://localhost:50373/user/nkollar/out.2:org.apache.pig.builtin.PigStorage) - scope-29
| |
| |---D: New For Each(true,true)[tuple] - scope-28
| | |
| | Project[bag][1] - scope-26
| | |
| | Project[bag][2] - scope-27
| |
| |---D: Package(Packager)[tuple]{bytearray} - scope-21
| |
| |---D: Global Rearrange[tuple] - scope-20
| |
| |---D: Local Rearrange[tuple]{bytearray}(false) - scope-22
| | | |
| | | Project[bytearray][0] - scope-23
| |
| |---D: Local Rearrange[tuple]{bytearray}(false) - scope-24
| | |
| | Project[bytearray][0] - scope-25
| |
| |---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-17
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-8
|
|---A: New For Each(false,false,false)[bag] - scope-7
| |
| Project[bytearray][0] - scope-1
| |
| Project[bytearray][1] - scope-3
| |
| Project[bytearray][2] - scope-5
|
|---A: Load(hdfs://localhost:50373/user/nkollar/studenttab10k:org.apache.pig.builtin.PigStorage) - scope-0--------
{code}
The local rearrange in scope-22 doesn't have an input. [~kellyzly], shouldn't
scope-22 have gone away after multiquery optimization?
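One quick way to see which operators the optimization dropped is to diff the scope ids that appear in the two dumps. The following is only a minimal sketch: the ids were transcribed by hand from the plans above, and the helper names `scope_ids` and `diff_plans` are hypothetical, not part of Pig.

```python
import re

# Scope ids transcribed by hand from the two plan dumps above.
# Assumption: no id was missed or mistyped while copying.
BEFORE = """
scope-30 scope-31 scope-8 scope-7 scope-1 scope-3 scope-5 scope-0
scope-33 scope-12 scope-32
scope-38 scope-29 scope-28 scope-26 scope-27 scope-21 scope-20
scope-22 scope-23 scope-34 scope-24 scope-25 scope-17 scope-36
"""
AFTER = """
scope-30 scope-42 scope-12 scope-29 scope-28 scope-26 scope-27
scope-21 scope-20 scope-22 scope-23 scope-24 scope-25 scope-17
scope-8 scope-7 scope-1 scope-3 scope-5 scope-0
"""


def scope_ids(plan_text):
    """Return the set of numeric scope ids mentioned in a plan dump."""
    return {int(n) for n in re.findall(r"scope-(\d+)", plan_text)}


def diff_plans(before, after):
    """Return (removed, added) scope ids as sorted lists."""
    b, a = scope_ids(before), scope_ids(after)
    return sorted(b - a), sorted(a - b)


removed, added = diff_plans(BEFORE, AFTER)
print("removed:", removed)
print("added:", added)
```

Under that transcription, the removed ids are the temporary Store (scope-31), the three Loads of the temporary file (scope-32, scope-34, scope-36) and the Spark nodes scope-33 and scope-38, while the only new id is the Split (scope-42). In the first plan scope-34 appears to be the Load feeding scope-22, and scope-22 itself is still present in the second plan.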
> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
> Key: PIG-5163
> URL: https://issues.apache.org/jira/browse/PIG-5163
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: spark-branch
>
>
> The 2nd output was empty; it looks like Pig on Spark didn't generate any data.