[ https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039583#comment-15039583 ]
liyunzhang_intel commented on PIG-4594:
---------------------------------------

[~mohitsabharwal] and [~kexianda]:
Let me explain why we need to add the forceConnect method to PhysicalPlan.java:
{quote}
Suppose we want to connect x to y and z. In the MR implementation, we clone two copies, x1 and x2: x1 is connected to y and x2 is connected to z.
{quote}
I agree with Xianda. Here is an example:
{code}
cat bin/testMultiQueryJiraPig983_2.pig
a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
b = filter a by uid < 5;
c = filter a by uid >= 5;
d = join b by uname, c by uname;
store d into './testMultiQueryJiraPig983_2.out';
{code}
As the following result shows, after multiquery optimization scope-57 and scope-67 are actually the same operator. In the MR implementation, scope-67 is a copy of scope-57, made to avoid the exception "This operator does not support multiple outputs". As a result, the load is done *twice* even though both loads are *the same*.
{code}
before multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-39
Map Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
Global sort: false
----------------
MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
|   |   |
|   |   Project[chararray][0] - scope-32
|   |
|   |---b: Filter[bag] - scope-17
|       |   |
|       |   Less Than[boolean] - scope-20
|       |   |
|       |   |---Project[int][2] - scope-18
|       |   |
|       |   |---Constant(5) - scope-19
|       |
|       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-41
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
    |   |
    |   Project[chararray][0] - scope-34
    |
    |---c: Filter[bag] - scope-23
        |   |
        |   Greater Than or Equal[boolean] - scope-26
        |   |
        |   |---Project[int][2] - scope-24
        |   |
        |   |---Constant(5) - scope-25
        |
        |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage) - scope-43--------
Reduce Plan
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------

after multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
|   |   |
|   |   Project[chararray][0] - scope-32
|   |
|   |---b: Filter[bag] - scope-17
|       |   |
|       |   Less Than[boolean] - scope-20
|       |   |
|       |   |---Project[int][2] - scope-18
|       |   |
|       |   |---Constant(5) - scope-19
|       |
|       |---a: New For Each(false,false,false,false)[bag] - scope-67
|           |   |
|           |   Cast[chararray] - scope-60
|           |   |
|           |   |---Project[bytearray][0] - scope-59
|           |   |
|           |   Cast[chararray] - scope-62
|           |   |
|           |   |---Project[bytearray][1] - scope-61
|           |   |
|           |   Cast[int] - scope-64
|           |   |
|           |   |---Project[bytearray][2] - scope-63
|           |   |
|           |   Cast[int] - scope-66
|           |   |
|           |   |---Project[bytearray][3] - scope-65
|           |
|           |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-58
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
    |   |
    |   Project[chararray][0] - scope-34
    |
    |---c: Filter[bag] - scope-23
        |   |
        |   Greater Than or Equal[boolean] - scope-26
        |   |
        |   |---Project[int][2] - scope-24
        |   |
        |   |---Constant(5) - scope-25
        |
        |---a: New For Each(false,false,false,false)[bag] - scope-57
            |   |
            |   Cast[chararray] - scope-50
            |   |
            |   |---Project[bytearray][0] - scope-49
            |   |
            |   Cast[chararray] - scope-52
            |   |
            |   |---Project[bytearray][1] - scope-51
            |   |
            |   Cast[int] - scope-54
            |   |
            |   |---Project[bytearray][2] - scope-53
            |   |
            |   Cast[int] - scope-56
            |   |
            |   |---Project[bytearray][3] - scope-55
            |
            |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-48--------
Reduce Plan
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------
{code}
*But* with multiquery optimization in Spark mode we *don't* need to do the load *twice*: the plan has only one load (*scope-0*). The result in Spark mode:
{code}
before multiquery optimization:
scope-39->scope-45
scope-45
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-39
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
    |   |
    |   Cast[chararray] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |   |
    |   Cast[int] - scope-8
    |   |
    |   |---Project[bytearray][2] - scope-7
    |   |
    |   Cast[int] - scope-11
    |   |
    |   |---Project[bytearray][3] - scope-10
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
Spark node scope-45
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
    |   |
    |   Project[bag][1] - scope-35
    |   |
    |   Project[bag][2] - scope-36
    |
    |---d: Package(Packager)[tuple]{chararray} - scope-30
        |
        |---d: Global Rearrange[tuple] - scope-29
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-31
            |   |   |
            |   |   Project[chararray][0] - scope-32
            |   |
            |   |---b: Filter[bag] - scope-17
            |       |   |
            |       |   Less Than[boolean] - scope-20
            |       |   |
            |       |   |---Project[int][2] - scope-18
            |       |   |
            |       |   |---Constant(5) - scope-19
            |       |
            |       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-41
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-33
                |   |
                |   Project[chararray][0] - scope-34
                |
                |---c: Filter[bag] - scope-23
                    |   |
                    |   Greater Than or Equal[boolean] - scope-26
                    |   |
                    |   |---Project[int][2] - scope-24
                    |   |
                    |   |---Constant(5) - scope-25
                    |
                    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage) - scope-43--------

after multiquery optimization:
scope-39
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-39
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage) - scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
    |   |
    |   Project[bag][1] - scope-35
    |   |
    |   Project[bag][2] - scope-36
    |
    |---d: Package(Packager)[tuple]{chararray} - scope-30
        |
        |---d: Global Rearrange[tuple] - scope-29
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-31
            |   |   |
            |   |   Project[chararray][0] - scope-32
            |   |
            |   |---b: Filter[bag] - scope-17
            |       |   |
            |       |   Less Than[boolean] - scope-20
            |       |   |
            |       |   |---Project[int][2] - scope-18
            |       |   |
            |       |   |---Constant(5) - scope-19
            |       |
            |       |---a: New For Each(false,false,false,false)[bag] - scope-13
            |           |   |
            |           |   Cast[chararray] - scope-2
            |           |   |
            |           |   |---Project[bytearray][0] - scope-1
            |           |   |
            |           |   Cast[chararray] - scope-5
            |           |   |
            |           |   |---Project[bytearray][1] - scope-4
            |           |   |
            |           |   Cast[int] - scope-8
            |           |   |
            |           |   |---Project[bytearray][2] - scope-7
            |           |   |
            |           |   Cast[int] - scope-11
            |           |   |
            |           |   |---Project[bytearray][3] - scope-10
            |           |
            |           |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0
            |
            |---d: Local Rearrange[tuple]{chararray}(false) - scope-33
                |   |
                |   Project[chararray][0] - scope-34
                |
                |---c: Filter[bag] - scope-23
                    |   |
                    |   Greater Than or Equal[boolean] - scope-26
                    |   |
                    |   |---Project[int][2] - scope-24
                    |   |
                    |   |---Constant(5) - scope-25
                    |
                    |---a: New For Each(false,false,false,false)[bag] - scope-13
                        |   |
                        |   Cast[chararray] - scope-2
                        |   |
                        |   |---Project[bytearray][0] - scope-1
                        |   |
                        |   Cast[chararray] - scope-5
                        |   |
                        |   |---Project[bytearray][1] - scope-4
                        |   |
                        |   Cast[int] - scope-8
                        |   |
                        |   |---Project[bytearray][2] - scope-7
                        |   |
                        |   Cast[int] - scope-11
                        |   |
                        |   |---Project[bytearray][3] - scope-10
                        |
                        |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
{code}
The *forceConnect* method I added in PIG-4594.patch connects y and z to x even though x does not support multiple outputs, because I *removed* the check on whether the operator supports multiple outputs.

> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
>                 Key: PIG-4594
>                 URL: https://issues.apache.org/jira/browse/PIG-4594
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4594-3.patch, PIG-4594.patch, PIG-4594_1.patch, PIG-4594_2.patch
>
>
> https://builds.apache.org/job/Pig-spark/211/#showFailuresLink shows that
> the following unit tests fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438
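
As a rough illustration of the forceConnect point made in the comment, the sketch below models a plan whose checked connect rejects a second output from a single-output operator, while forceConnect wires the same edge without that check. The classes `ForceConnectSketch`, `Op`, and `Plan`, and the use of `IllegalStateException`, are hypothetical stand-ins chosen for this example; they are not Pig's actual `PhysicalPlan` code or the contents of PIG-4594.patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Pig's PhysicalPlan API.
final class ForceConnectSketch {

    /** A plan operator that may or may not allow more than one successor. */
    static final class Op {
        final String name;
        final boolean supportsMultipleOutputs;

        Op(String name, boolean supportsMultipleOutputs) {
            this.name = name;
            this.supportsMultipleOutputs = supportsMultipleOutputs;
        }
    }

    /** A minimal plan graph that stores, per operator, its list of successors. */
    static final class Plan {
        private final Map<Op, List<Op>> successors = new HashMap<>();

        /** Checked connect: rejects a second successor for a single-output operator. */
        void connect(Op from, Op to) {
            List<Op> out = successors.computeIfAbsent(from, k -> new ArrayList<>());
            if (!out.isEmpty() && !from.supportsMultipleOutputs) {
                throw new IllegalStateException(
                    from.name + ": This operator does not support multiple outputs");
            }
            out.add(to);
        }

        /** Unchecked connect: same wiring, but the multiple-outputs check is skipped. */
        void forceConnect(Op from, Op to) {
            successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        }

        List<Op> successorsOf(Op op) {
            return successors.getOrDefault(op, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        Op x = new Op("x", false);
        Op y = new Op("y", false);
        Op z = new Op("z", false);

        Plan plan = new Plan();
        plan.connect(x, y);
        try {
            plan.connect(x, z); // second output from x: rejected by the check
        } catch (IllegalStateException e) {
            System.out.println("connect rejected: " + e.getMessage());
        }
        plan.forceConnect(x, z); // same edge, added without the check
        System.out.println("successors of x: " + plan.successorsOf(x).size());
    }
}
```

In this toy model, cloning x into x1 and x2 (the MR approach quoted in the comment) would keep every operator at one output, at the cost of duplicating the upstream work; sharing x through the unchecked connect keeps a single copy.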