[ https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948262#comment-15948262 ]
liyunzhang_intel commented on PIG-5163:
---------------------------------------

[~nkollar]: You can try modifying the plan as you suggest, but I suspect it will fail at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder#physicalToRDD, because we don't use the RDD results stored in JobGraphBuilder#physicalOpRdds. The multiquery optimizer's algorithm is somewhat complex, but most of the logic in mapReduceLayer.MultiQueryOptimizer and spark.optimizer.MultiQueryOptimizerSpark is similar.

How does this case behave in MR mode with multiquery? I found that scope-31 and scope-29 are not replaced with scope-8; the plan loads "hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004" twice.

{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-27
Map Plan
Store(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-28
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-8
    |
    |---A: New For Each(false,false,false)[bag] - scope-7
        |   |
        |   Project[bytearray][0] - scope-1
        |   |
        |   Project[bytearray][1] - scope-3
        |   |
        |   Project[bytearray][2] - scope-5
        |
        |---A: Load(/user/pig/tests/data/singlefile/studenttab10k.mock:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------

MapReduce node scope-33
Map Plan
Union[tuple] - scope-34
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-19
|   |   |
|   |   Project[bytearray][0] - scope-20
|   |
|   |---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-29
|
|---D: Local Rearrange[tuple]{bytearray}(false) - scope-21
    |   |
    |   Project[bytearray][0] - scope-22
    |
    |---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-14
        |
        |---Load(hdfs://bdpe42:8020/tmp/temp66298541/tmp1505642004:org.apache.pig.impl.io.InterStorage) - scope-31--------
Reduce Plan
D: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---D: Package(JoinPackager(true,true))[tuple]{bytearray} - scope-18--------
Global sort: false
----------------
{code}

> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
> Key: PIG-5163
> URL: https://issues.apache.org/jira/browse/PIG-5163
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: PIG-5163_1.patch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)