[ 
https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948262#comment-15948262
 ] 

liyunzhang_intel commented on PIG-5163:
---------------------------------------

[~nkollar]: You can try modifying the plan along the lines you suggest, but I 
guess it will fail at 
org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder#physicalToRDD
 because we don't use the RDD result that is stored in 
JobGraphBuilder#physicalOpRdds. 
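
To make the concern concrete, here is a toy Java sketch of what reusing a stored result per operator would mean. It is not Pig's actual code: the class, the method names, and the use of strings in place of physical operators and RDDs are all assumptions made for illustration. The plan shape (A loaded, B streamed from A, C streamed from B, D joining B and C) is read off the aliases in the plan dump in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: strings stand in for physical operators and RDDs.
// None of these names are taken from Pig's JobGraphBuilder.
class SharedResultSketch {
    // operator alias -> aliases of its predecessors (hypothetical plan)
    static final Map<String, List<String>> PREDECESSORS = new HashMap<>();
    // operator alias -> result already built for it (plays the role of a cache)
    static final Map<String, String> RESULTS = new HashMap<>();
    // every operator that was actually converted, in conversion order
    static final List<String> CONVERTED = new ArrayList<>();

    // Convert an operator after converting its predecessors.
    // When reuseStored is true, a previously stored result is returned as-is.
    static String convert(String op, boolean reuseStored) {
        if (reuseStored && RESULTS.containsKey(op)) {
            return RESULTS.get(op);
        }
        List<String> inputs = new ArrayList<>();
        List<String> preds = PREDECESSORS.getOrDefault(op, Collections.<String>emptyList());
        for (String pred : preds) {
            inputs.add(convert(pred, reuseStored));
        }
        CONVERTED.add(op);
        String result = op + inputs;
        RESULTS.put(op, result);
        return result;
    }

    // Build the A -> B -> C, (B, C) -> D plan, convert D, and count how
    // often the given operator was converted along the way.
    static int timesConverted(String op, boolean reuseStored) {
        PREDECESSORS.clear();
        RESULTS.clear();
        CONVERTED.clear();
        PREDECESSORS.put("B", Arrays.asList("A"));
        PREDECESSORS.put("C", Arrays.asList("B"));
        PREDECESSORS.put("D", Arrays.asList("B", "C"));
        convert("D", reuseStored);
        return Collections.frequency(CONVERTED, op);
    }

    public static void main(String[] args) {
        System.out.println("B converted with reuse:    " + timesConverted("B", true));
        System.out.println("B converted without reuse: " + timesConverted("B", false));
    }
}
```

In this toy model, B is converted once when stored results are reused and twice when they are not, because both D and C pull on it.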

The algorithm of the multiquery optimizer is a bit complex, but most of the 
logic in mapReduceLayer.MultiQueryOptimizer and 
spark.optimizer.MultiQueryOptimizerSpark is similar. How does this case behave 
in MR mode with multiquery? I found that scope-31 and scope-29 are not replaced 
with scope-8, so the plan loads 
"hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004" twice. 
{code}
#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-27
Map Plan
Store(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-28
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-8
    |
    |---A: New For Each(false,false,false)[bag] - scope-7
        |   |
        |   Project[bytearray][0] - scope-1
        |   |
        |   Project[bytearray][1] - scope-3
        |   |
        |   Project[bytearray][2] - scope-5
        |
        |---A: Load(/user/pig/tests/data/singlefile/studenttab10k.mock:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------

MapReduce node scope-33
Map Plan
Union[tuple] - scope-34
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-19
|   |   |
|   |   Project[bytearray][0] - scope-20
|   |
|   |---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-29
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-21
    |   |
    |   Project[bytearray][0] - scope-22
    |
    |---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-14
        |
        |---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-31--------
Reduce Plan
D: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---D: Package(JoinPackager(true,true))[tuple]{bytearray} - scope-18--------
Global sort: false
----------------
{code}


> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
>                 Key: PIG-5163
>                 URL: https://issues.apache.org/jira/browse/PIG-5163
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5163_1.patch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
