    [ https://issues.apache.org/jira/browse/PIG-5163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946683#comment-15946683 ]

liyunzhang_intel commented on PIG-5163:
---------------------------------------

[~nkollar]: This is a bug in POJoinGroupSpark#setPredecessors. On my cluster, the plan before multiquery optimization is:
{code}
before multiquery optimization:
scope-74->scope-77 scope-82
scope-77
scope-82
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------

Spark node scope-74
Store(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage) - scope-75
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-52
    |
    |---A: New For Each(false,false,false)[bag] - scope-51
        |   |
        |   Project[bytearray][0] - scope-45
        |   |
        |   Project[bytearray][1] - scope-47
        |   |
        |   Project[bytearray][2] - scope-49
        |
        |---A: Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage) - scope-44--------

Spark node scope-77
B: Store(hdfs://bdpe42:8020/user/root/ms_1.out.1:org.apache.pig.builtin.PigStorage) - scope-56
|
|---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage) - scope-76--------

Spark node scope-82
D: Store(hdfs://bdpe42:8020/user/root/ms_1.out.2:org.apache.pig.builtin.PigStorage) - scope-73
|
|---D: New For Each(true,true)[tuple] - scope-72
    |   |
    |   Project[bag][1] - scope-70
    |   |
    |   Project[bag][2] - scope-71
    |
    |---D: Package(Packager)[tuple]{bytearray} - scope-65
        |
        |---D: Global Rearrange[tuple] - scope-64
            |
            |---D: Local Rearrange[tuple]{bytearray}(false) - scope-66
            |   |   |
            |   |   Project[bytearray][0] - scope-67
            |   |
            |   |---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage) - scope-78
            |
            |---D: Local Rearrange[tuple]{bytearray}(false) - scope-68
                |   |
                |   Project[bytearray][0] - scope-69
                |
                |---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-61
                    |
                    |---Load(hdfs://bdpe42:8020/tmp/temp1378261290/tmp-519681347:org.apache.pig.impl.io.InterStorage) - scope-80--------
{code}
After multiquery optimization the plan is:
{code}
after multiquery optimization:
scope-74
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------

Spark node scope-74
Split - scope-86
|   |
|   B: Store(hdfs://bdpe42:8020/user/root/ms_1.out.1:org.apache.pig.builtin.PigStorage) - scope-56
|   |
|   D: Store(hdfs://bdpe42:8020/user/root/ms_1.out.2:org.apache.pig.builtin.PigStorage) - scope-73
|   |
|   |---D: New For Each(true,true)[tuple] - scope-72
|       |   |
|       |   Project[bag][1] - scope-70
|       |   |
|       |   Project[bag][2] - scope-71
|       |
|       |---POJoinGroupSpark[tuple] - scope-64
|           |
|           |---C: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-61
|
|---B: POStream[perl -ne 'print $_;' (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)] - scope-52
    |
    |---A: New For Each(false,false,false)[bag] - scope-51
        |   |
        |   Project[bytearray][0] - scope-45
        |   |
        |   Project[bytearray][1] - scope-47
        |   |
        |   Project[bytearray][2] - scope-49
        |
        |---A: Load(/user/pig/tests/data/singlefile/studenttab10k:org.apache.pig.builtin.PigStorage) - scope-44--------
{code}
The predecessors of scope-64 should be scope-52 and scope-61, but in the current code the only predecessor of scope-64 is scope-61.


> MultiQuery_Streaming_1 is failing with spark exec type
> ------------------------------------------------------
>
>                 Key: PIG-5163
>                 URL: https://issues.apache.org/jira/browse/PIG-5163
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> 2nd output was empty, looks like pig on spark didn't generate any data.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)