[
https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946683#comment-15946683
]
liyunzhang_intel commented on PIG-5163:
---------------------------------------
[~nkollar]: This is a bug in POJoinGroupSpark#setPredecessors. On my cluster,
the plan before multiquery optimization is:
{code}
before multiquery optimization:
scope-74->scope-77 scope-82
scope-77
scope-82
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-74
Store(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage)
- scope-75
|
|---B: POStream[perl -ne 'print $_;'
(stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)]
- scope-52
|
|---A: New For Each(false,false,false)[bag] - scope-51
| |
| Project[bytearray][0] - scope-45
| |
| Project[bytearray][1] - scope-47
| |
| Project[bytearray][2] - scope-49
|
|---A:
Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage)
- scope-44--------
Spark node scope-77
B:
Store(hdfs://bdpe42:8020/user/root/ms_1.out.1:org.apache.pig.builtin.PigStorage)
- scope-56
|
|---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage)
- scope-76--------
Spark node scope-82
D:
Store(hdfs://bdpe42:8020/user/root/ms_1.out.2:org.apache.pig.builtin.PigStorage)
- scope-73
|
|---D: New For Each(true,true)[tuple] - scope-72
| |
| Project[bag][1] - scope-70
| |
| Project[bag][2] - scope-71
|
|---D: Package(Packager)[tuple]{bytearray} - scope-65
|
|---D: Global Rearrange[tuple] - scope-64
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-66
| | |
| | Project[bytearray][0] - scope-67
| |
|
|---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage)
- scope-78
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-68
| |
| Project[bytearray][0] - scope-69
|
|---C: POStream[perl -ne 'print $_;'
(stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)]
- scope-61
|
|---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage)
- scope-80--------
{code}
The plan after multiquery optimization is:
{code}
after multiquery optimization:
scope-74
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-74
Split - scope-86
| |
| B:
Store(hdfs://bdpe42:8020/user/root/ms_1.out.1:org.apache.pig.builtin.PigStorage)
- scope-56
| |
| D:
Store(hdfs://bdpe42:8020/user/root/ms_1.out.2:org.apache.pig.builtin.PigStorage)
- scope-73
| |
| |---D: New For Each(true,true)[tuple] - scope-72
| | |
| | Project[bag][1] - scope-70
| | |
| | Project[bag][2] - scope-71
| |
| |---POJoinGroupSpark[tuple] - scope-64
| |
| |---C: POStream[perl -ne 'print $_;'
(stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)]
- scope-61
|
|---B: POStream[perl -ne 'print $_;'
(stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)]
- scope-52
|
|---A: New For Each(false,false,false)[bag] - scope-51
| |
| Project[bytearray][0] - scope-45
| |
| Project[bytearray][1] - scope-47
| |
| Project[bytearray][2] - scope-49
|
|---A:
Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage)
- scope-44--------
{code}
The predecessors of scope-64 should be scope-52 and scope-61, but in the current
code the only predecessor of scope-64 is scope-61.
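
The expected edges can be sketched in plain Java. This is only an illustration of the plan shape read from the dumps above, not Pig code: the `Plan` class and its `connect` and `predecessorsOf` methods are hypothetical stand-ins, and the edges (scope-52 feeding both scope-61 and scope-64) are inferred from the plan before multiquery optimization, where one input of the cogroup loads B's output directly and the other goes through C.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PredecessorSketch {
    // Hypothetical minimal plan: maps each operator key to the keys of its inputs.
    static final class Plan {
        private final Map<String, List<String>> preds = new LinkedHashMap<>();

        // Record an edge from -> to, registering both operators.
        void connect(String from, String to) {
            preds.computeIfAbsent(from, k -> new ArrayList<>());
            preds.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
        }

        // Inputs of the given operator, in the order they were connected.
        List<String> predecessorsOf(String key) {
            return preds.getOrDefault(key, new ArrayList<>());
        }
    }

    // Edges as inferred from the plan dumps in this comment (an assumption).
    static Plan buildMergedPlan() {
        Plan plan = new Plan();
        plan.connect("scope-52", "scope-61"); // B (POStream) feeds C (POStream)
        plan.connect("scope-52", "scope-64"); // B also feeds the join group directly
        plan.connect("scope-61", "scope-64"); // C feeds the join group
        return plan;
    }

    public static void main(String[] args) {
        Plan plan = buildMergedPlan();
        // A join group over B and C needs both inputs as predecessors.
        System.out.println("predecessors of scope-64: " + plan.predecessorsOf("scope-64"));
    }
}
```

If only scope-61 is registered as a predecessor of scope-64, the join group sees a single input, which would be consistent with the empty second output reported in the issue.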
> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
> Key: PIG-5163
> URL: https://issues.apache.org/jira/browse/PIG-5163
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.