[
https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948262#comment-15948262
]
liyunzhang_intel commented on PIG-5163:
---------------------------------------
[~nkollar]: you can try modifying the plan the way you have in mind, but I
guess it will fail at
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder#physicalToRDD
because we don't use the RDD results that are stored in
JobGraphBuilder#physicalOpRdds.
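
To make the kind of reuse in question concrete, here is a minimal sketch in plain Java. It is not Pig code: `Op`, `convert` and the `converted` map are hypothetical stand-ins, and the operator shape is only an assumption about what the plan would look like if scope-29 and scope-31 were replaced with scope-8. It shows a plan walk that looks up an already converted operator by key instead of converting it again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MemoizedPlanWalk {
    /** Hypothetical stand-in for a physical operator; not a Pig class. */
    static final class Op {
        final String key;
        final List<Op> predecessors;

        Op(String key, Op... predecessors) {
            this.key = key;
            this.predecessors = List.of(predecessors);
        }
    }

    // Results of operators that were already converted, keyed by operator key.
    private final Map<String, String> converted = new HashMap<>();
    // Every real conversion is recorded here so repeated work would be visible.
    final List<String> conversions = new ArrayList<>();

    String convert(Op op) {
        String cached = converted.get(op.key);
        if (cached != null) {
            return cached; // reuse the stored result instead of converting again
        }
        List<String> inputs = new ArrayList<>();
        for (Op pred : op.predecessors) {
            inputs.add(convert(pred)); // predecessors first, depth-first
        }
        conversions.add(op.key);
        String result = op.key + inputs; // placeholder for the real result
        converted.put(op.key, result);
        return result;
    }

    public static void main(String[] args) {
        // Assumed shape: both branches of the union read from scope-8.
        Op stream8 = new Op("scope-8");
        Op rearrange19 = new Op("scope-19", stream8);
        Op stream14 = new Op("scope-14", stream8);
        Op rearrange21 = new Op("scope-21", stream14);
        Op union34 = new Op("scope-34", rearrange19, rearrange21);

        MemoizedPlanWalk walk = new MemoizedPlanWalk();
        walk.convert(union34);
        // scope-8 is listed once even though two branches depend on it.
        System.out.println(walk.conversions);
    }
}
```

In this sketch the lookup happens before the predecessors are visited, so a shared operator such as scope-8 is converted only once no matter how many branches read from it.
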
The multiquery optimizer algorithm is a bit complex, but most of the logic in
mapReduceLayer.MultiQueryOptimizer and
spark.optimizer.MultiQueryOptimizerSpark is similar. How does this case behave
in MR mode with multiquery? I found that scope-31 and scope-29 are not replaced
with scope-8, so the plan loads
"hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004" twice.
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-27
Map Plan
Store(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-28
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-8
|
|---A: New For Each(false,false,false)[bag] - scope-7
| |
| Project[bytearray][0] - scope-1
| |
| Project[bytearray][1] - scope-3
| |
| Project[bytearray][2] - scope-5
|
|---A: Load(/user/pig/tests/data/singlefile/studenttab10k.mock:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
MapReduce node scope-33
Map Plan
Union[tuple] - scope-34
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-19
| | |
| | Project[bytearray][0] - scope-20
| |
|
|---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-29
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-21
| |
| Project[bytearray][0] - scope-22
|
|---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-14
|
|---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-31--------
Reduce Plan
D: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---D: Package(JoinPackager(true,true))[tuple]{bytearray} - scope-18--------
Global sort: false
----------------
{code}
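
A quick way to check a plan dump like the one above for repeated loads is to count the `Load(...)` locations in the text. The following is only a rough sketch: `DuplicateLoads` and `countLoads` are hypothetical names, and the regular expression assumes the `Load(<location>:<loader class>)` form seen in this plan, not a format guaranteed by Pig.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateLoads {
    // Assumed format: Load(<location>:<fully qualified loader class>).
    // The greedy group stops at the last ':' that is followed by a class name and ')'.
    private static final Pattern LOAD = Pattern.compile("Load\\((.+):[\\w.$]+\\)");

    /** Counts how often each location appears in a Load(...) operator. */
    static Map<String, Integer> countLoads(String planText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = LOAD.matcher(planText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three Load lines copied from the plan above.
        String plan = String.join("\n",
            "|---A: Load(/user/pig/tests/data/singlefile/studenttab10k.mock:org.apache.pig.builtin.PigStorage) - scope-0--------",
            "|---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-29",
            "|---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-31--------");

        // Report only locations that are loaded more than once.
        countLoads(plan).forEach((location, n) -> {
            if (n > 1) {
                System.out.println(location + " loaded " + n + " times");
            }
        });
    }
}
```

For the plan above this should flag only the intermediate file under /tmp/temp66298541, which is read by both scope-29 and scope-31.
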
> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
> Key: PIG-5163
> URL: https://issues.apache.org/jira/browse/PIG-5163
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: PIG-5163_1.patch
>
>
> The 2nd output was empty; it looks like Pig on Spark didn't generate any data.