[ https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586116#comment-14586116 ]
liyunzhang_intel commented on PIG-4594:
---------------------------------------
Multiquery optimization is enabled automatically. If you want to disable it, just use the following command:
# pig -no_multiquery script
When a POSplit is encountered in SparkCompiler#visitSplit, a new SparkOperator is generated. For example:
{code}
testSplit.pig
A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
store x into './testSplit_x.out';
store y into './testSplit_y.out';
store z into './testSplit_z.out';
{code}
{code}
cat bin/testSplit.txt
1 2 3
4 5 6
7 8 9
{code}
PhysicalPlan
{code}
x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16 | |---x: Filter[bag] - scope-12 | | | Less Than[boolean] - scope-15 | | | |---Project[int][0] - scope-13 | | | |---Constant(7) - scope-14 | |---1-1: Split - scope-11 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0
y: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21 | |---y: Filter[bag] - scope-17 | | | Equal To[boolean] - scope-20 | | | |---Project[int][1] - scope-18 | | | |---Constant(5) - scope-19 | |---1-1: Split - scope-11 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0
z:
Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30 | |---z: Filter[bag] - scope-22 | | | Or[boolean] - scope-29 | | | |---Less Than[boolean] - scope-25 | | | | | |---Project[int][2] - scope-23 | | | | | |---Constant(6) - scope-24 | | | |---Greater Than[boolean] - scope-28 | | | |---Project[int][2] - scope-26 | | | |---Constant(6) - scope-27 | |---1-1: Split - scope-11 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0 {code} SparkPlan {code} before multiquery optimization: scope-31->scope-34 scope-36 scope-38 scope-34 scope-36 scope-38 #-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-31 Store(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-32 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0-------- Spark node scope-34 x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16 | |---x: Filter[bag] - scope-12 | | | Less Than[boolean] - scope-15 | | | |---Project[int][0] - scope-13 | | | |---Constant(7) - scope-14 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-33-------- Spark node scope-36 y: 
Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21 | |---y: Filter[bag] - scope-17 | | | Equal To[boolean] - scope-20 | | | |---Project[int][1] - scope-18 | | | |---Constant(5) - scope-19 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-35-------- Spark node scope-38 z: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30 | |---z: Filter[bag] - scope-22 | | | Or[boolean] - scope-29 | | | |---Less Than[boolean] - scope-25 | | | | | |---Project[int][2] - scope-23 | | | | | |---Constant(6) - scope-24 | | | |---Greater Than[boolean] - scope-28 | | | |---Project[int][2] - scope-26 | | | |---Constant(6) - scope-27 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp160363562/tmp-326156769:org.apache.pig.impl.io.InterStorage) - scope-37-------- {code} After multiquery optimization: scope-39 {code} Split - scope-39 | | | x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16 | | | |---x: Filter[bag] - scope-12 | | | | | Less Than[boolean] - scope-15 | | | | | |---Project[int][0] - scope-13 | | | | | |---Constant(7) - scope-14 | | | y: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_y.out:org.apache.pig.builtin.PigStorage) - scope-21 | | | |---y: Filter[bag] - scope-17 | | | | | Equal To[boolean] - scope-20 | | | | | |---Project[int][1] - scope-18 | | | | | |---Constant(5) - scope-19 | | | z: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_z.out:org.apache.pig.builtin.PigStorage) - scope-30 | | | |---z: Filter[bag] - scope-22 | | | | | Or[boolean] - scope-29 | | | | | |---Less Than[boolean] - scope-25 | | | | | | | |---Project[int][2] - scope-23 | | | | | | | |---Constant(6) - scope-24 | | | | | |---Greater Than[boolean] - scope-28 | | | | | |---Project[int][2] - scope-26 | | | | | |---Constant(6) - scope-27 | |---A: New For 
Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0
{code}
In the above case, MultiQueryOptimizerSpark removes the unnecessary loads (scope-33, scope-35, scope-37) and store (scope-32), and merges the 4 spark nodes (scope-31, scope-34, scope-36, scope-38) into 1 spark node (scope-39).
In PIG-4594.patch, NoopFilterRemover.java is added. NoopFilterRemover removes the no-op filters produced by the POSplit; in the following example it removes the filters scope-15 and scope-29. Before NoopFilterRemover#visit is executed:
{code}
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-42 Store(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-43 | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | Cast[chararray] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[chararray] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | | | Cast[int] - scope-11 | | | |---Project[bytearray][3] - scope-10 | |---a: Load(file:///home/zly/prj/oss/kellyzly/pig/pig-976.txt:org.apache.pig.builtin.PigStorage) - scope-0-------- Spark node scope-45 d: Store(file:///home/zly/prj/oss/kellyzly/pig/output1:org.apache.pig.builtin.PigStorage) - scope-28 | |---d: New For Each(false,false)[bag] - scope-27 | | | Project[int][0] - scope-21 | | | POUserFunc(org.apache.pig.builtin.LongSum)[long] - scope-25 | | | |---Project[bag][3] - scope-24 | | | |---Project[bag][1] - scope-23 | |---b: Package(Packager)[tuple]{int} - scope-18 | |---b: Global Rearrange[tuple] - scope-17 | |---b: Local Rearrange[tuple]{int}(false) - scope-19 | | | 
Project[int][2] - scope-20 | |---a: Filter[bag] - scope-15 | | | Constant(true) - scope-16 | |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-44-------- Spark node scope-47 e: Store(file:///home/zly/prj/oss/kellyzly/pig/output2:org.apache.pig.builtin.PigStorage) - scope-41 | |---e: New For Each(false,false)[bag] - scope-40 | | | POUserFunc(org.apache.pig.builtin.COUNT)[long] - scope-36 | | | |---Project[bag][1] - scope-35 | | | Project[int][0] - scope-38 | |---c: Package(Packager)[tuple]{int} - scope-32 | |---c: Global Rearrange[tuple] - scope-31 | |---c: Local Rearrange[tuple]{int}(false) - scope-33 | | | Project[int][3] - scope-34 | |---a: Filter[bag] - scope-29 | | | Constant(true) - scope-30 | |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-46--------
{code}
After NoopFilterRemover#visit is executed:
{code}
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-42 Store(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-43 | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | Cast[chararray] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[chararray] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | | | Cast[int] - scope-11 | | | |---Project[bytearray][3] - scope-10 | |---a: Load(file:///home/zly/prj/oss/kellyzly/pig/pig-976.txt:org.apache.pig.builtin.PigStorage) - scope-0-------- Spark node scope-45 d: Store(file:///home/zly/prj/oss/kellyzly/pig/output1:org.apache.pig.builtin.PigStorage) - scope-28 | |---d: New For Each(false,false)[bag] - scope-27 | | | Project[int][0] - scope-21 | | | POUserFunc(org.apache.pig.builtin.LongSum)[long] - scope-25 | | | |---Project[bag][3] - scope-24 | | | |---Project[bag][1] - scope-23 | |---b: Package(Packager)[tuple]{int} - scope-18 
| |---b: Global Rearrange[tuple] - scope-17 | |---b: Local Rearrange[tuple]{int}(false) - scope-19 | | | Project[int][2] - scope-20 | |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-44-------- Spark node scope-47 e: Store(file:///home/zly/prj/oss/kellyzly/pig/output2:org.apache.pig.builtin.PigStorage) - scope-41 | |---e: New For Each(false,false)[bag] - scope-40 | | | POUserFunc(org.apache.pig.builtin.COUNT)[long] - scope-36 | | | |---Project[bag][1] - scope-35 | | | Project[int][0] - scope-38 | |---c: Package(Packager)[tuple]{int} - scope-32 | |---c: Global Rearrange[tuple] - scope-31 | |---c: Local Rearrange[tuple]{int}(false) - scope-33 | | | Project[int][3] - scope-34 | |---Load(file:/tmp/temp1964795825/tmp-1375252005:org.apache.pig.impl.io.InterStorage) - scope-46--------
{code}
In MultiQueryOptimizerSpark, we divide all the cases into 3 situations. Two concepts, "splitter" and "splittee", are introduced here: the splitter is the sparkOperator that computes the shared (split) result, and a splittee is a sparkOperator that consumes that result, i.e. a successor of the splitter. 1. If a splittee has more than one predecessor, we do not do multiquery optimization.
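The decision logic of the three situations can be sketched with a small, self-contained Java model. This is only an illustration of the rules described in this comment; Node, addSuccessor and decide are hypothetical stand-ins, not the real SparkOperator or MultiQueryOptimizerSpark APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class MultiQueryDecisionSketch {

    // Hypothetical stand-in for a SparkOperator in the spark plan DAG;
    // not one of the real org.apache.pig classes.
    static class Node {
        final String name;
        final List<Node> predecessors = new ArrayList<>();
        final List<Node> successors = new ArrayList<>();

        Node(String name) { this.name = name; }

        void addSuccessor(Node s) {
            successors.add(s);
            s.predecessors.add(this);
        }
    }

    // Decide what to do with a splitter and its splittees (its successors):
    // case 1: a splittee with more than one predecessor blocks the optimization;
    // case 2: a single splittee is merged directly into the splitter;
    // case 3: several splittees are merged under a newly created POSplit.
    static String decide(Node splitter) {
        for (Node splittee : splitter.successors) {
            if (splittee.predecessors.size() > 1) {
                return "skip";   // case 1
            }
        }
        if (splitter.successors.size() == 1) {
            return "merge";      // case 2
        }
        return "split";          // case 3
    }

    public static void main(String[] args) {
        // Case 1 shape (like testMultiQueryWithFJ_2): scope-69 has two predecessors.
        Node s57 = new Node("scope-57");
        Node s61 = new Node("scope-61");
        Node s69 = new Node("scope-69");
        s57.addSuccessor(s69);
        s61.addSuccessor(s69);
        System.out.println(decide(s57)); // skip

        // Case 2 shape: one splittee.
        Node s17 = new Node("scope-17");
        s17.addSuccessor(new Node("scope-20"));
        System.out.println(decide(s17)); // merge

        // Case 3 shape: two independent splittees.
        Node s28 = new Node("scope-28");
        s28.addSuccessor(new Node("scope-31"));
        s28.addSuccessor(new Node("scope-33"));
        System.out.println(decide(s28)); // split
    }
}
```

The three shapes in main correspond to the three examples worked through below.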
For example, TestMultiQuery#testMultiQueryWithFJ_2:
{code}
a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
b = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
c = filter a by uid > 5;
store c into './testMultiQueryWithFJ_2.output1';
d = filter b by gid > 10;
store d into './testMultiQueryWithFJ_2.output2';
e = join c by gid, d by gid using 'repl';
store e into './testMultiQueryWithFJ_2.output3';
{code}
Scope-69's predecessors are scope-57 and scope-61. If we merged the physical plans of scope-57 and scope-61 into scope-69's physical plan and removed scope-57 and scope-61 (as shown under "after multiquery optimization" below), scope-60 and scope-64 would no longer find their predecessors (scope-57, scope-61). Because of this, this kind of case cannot be multiquery optimized.
before multiquery optimization:
{code}
scope-57->scope-60 scope-69
scope-60
scope-61->scope-64 scope-69
scope-64
scope-69
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-57 Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-58 | |---c: Filter[bag] - scope-14 | | | Greater Than[boolean] - scope-17 | | | |---Project[int][2] - scope-15 | | | |---Constant(5) - scope-16 | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | Cast[chararray] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[chararray] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | | | Cast[int] - scope-11 | | | |---Project[bytearray][3] - scope-10 | |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0-------- Spark node scope-60 c: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output1:org.apache.pig.builtin.PigStorage) - scope-21 | 
|---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-59-------- Spark node scope-69 e: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output3:org.apache.pig.builtin.PigStorage) - scope-56 | |---e: FRJoin[tuple] - scope-50 | | | Project[int][3] - scope-48 | | | Project[int][3] - scope-49 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-65 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-67-------- Spark node scope-61 Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-62 | |---d: Filter[bag] - scope-36 | | | Greater Than[boolean] - scope-39 | | | |---Project[int][3] - scope-37 | | | |---Constant(10) - scope-38 | |---b: New For Each(false,false,false,false)[bag] - scope-35 | | | Cast[chararray] - scope-24 | | | |---Project[bytearray][0] - scope-23 | | | Cast[chararray] - scope-27 | | | |---Project[bytearray][1] - scope-26 | | | Cast[int] - scope-30 | | | |---Project[bytearray][2] - scope-29 | | | Cast[int] - scope-33 | | | |---Project[bytearray][3] - scope-32 | |---b: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22-------- Spark node scope-64 d: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output2:org.apache.pig.builtin.PigStorage) - scope-43 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-63-------- {code} after multiquery optimization: {code} scope-60 scope-64 scope-69 #-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-60 c: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output1:org.apache.pig.builtin.PigStorage) - scope-21 | 
|---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-1373171069:org.apache.pig.impl.io.InterStorage) - scope-59-------- Spark node scope-69 e: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output3:org.apache.pig.builtin.PigStorage) - scope-56 | |---e: FRJoin[tuple] - scope-50 | | | Project[int][3] - scope-48 | | | Project[int][3] - scope-49 | |---c: Filter[bag] - scope-14 | | | Greater Than[boolean] - scope-17 | | | |---Project[int][2] - scope-15 | | | |---Constant(5) - scope-16 | | | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | | | Cast[chararray] - scope-2 | | | | | |---Project[bytearray][0] - scope-1 | | | | | Cast[chararray] - scope-5 | | | | | |---Project[bytearray][1] - scope-4 | | | | | Cast[int] - scope-8 | | | | | |---Project[bytearray][2] - scope-7 | | | | | Cast[int] - scope-11 | | | | | |---Project[bytearray][3] - scope-10 | | | |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0-------- | |---d: Filter[bag] - scope-36 | | | Greater Than[boolean] - scope-39 | | | |---Project[int][3] - scope-37 | | | |---Constant(10) - scope-38 | |---b: New For Each(false,false,false,false)[bag] - scope-35 | | | Cast[chararray] - scope-24 | | | |---Project[bytearray][0] - scope-23 | | | Cast[chararray] - scope-27 | | | |---Project[bytearray][1] - scope-26 | | | Cast[int] - scope-30 | | | |---Project[bytearray][2] - scope-29 | | | Cast[int] - scope-33 | | | |---Project[bytearray][3] - scope-32 | |---b: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22-------- Spark node scope-64 d: Store(hdfs://zly2.sh.intel.com:8020/user/root/testMultiQueryWithFJ_2.output2:org.apache.pig.builtin.PigStorage) - scope-43 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1880430179/tmp-654400409:org.apache.pig.impl.io.InterStorage) - scope-63-------- {code} 2. 
If there is only one splittee, its physical plan is merged directly into the splitter, so the temporary store (scope-18) and load (scope-19) disappear:
{code}
A = load './testSplit.txt' as (f1:int, f2:int,f3:int);
split A into x if f1<7, y if f2==5, z if (f3<6 or f3>6);
store x into './testSplit_x.out';
{code}
before multiquery optimization:
{code}
scope-17->scope-20
scope-20
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-17 Store(hdfs://zly2.sh.intel.com:8020/tmp/temp756348234/tmp748022356:org.apache.pig.impl.io.InterStorage) - scope-18 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0-------- Spark node scope-20 x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16 | |---x: Filter[bag] - scope-12 | | | Less Than[boolean] - scope-15 | | | |---Project[int][0] - scope-13 | | | |---Constant(7) - scope-14 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp756348234/tmp748022356:org.apache.pig.impl.io.InterStorage) - scope-19--------
{code}
after multiquery optimization:
{code}
scope-17
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-17 x: Store(hdfs://zly2.sh.intel.com:8020/user/root/testSplit_x.out:org.apache.pig.builtin.PigStorage) - scope-16 | |---x: Filter[bag] - scope-12 | | | Less Than[boolean] - scope-15 | | | |---Project[int][0] - scope-13 | | | |---Constant(7) - scope-14 | |---A: New For Each(false,false,false)[bag] - scope-10 | | | Cast[int] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[int] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | |---A: 
Load(hdfs://zly2.sh.intel.com:8020/user/root/testSplit.txt:org.apache.pig.builtin.PigStorage) - scope-0--------
{code}
3. If there is more than one splittee and case 1 does not apply, we create a new split operator (a POSplit), merge the physical plans of all the splittees into the physical plan of the split, and remove the splittees.
{code}
a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
b = filter a by uid < 5;
store b into './multiquery.b.out';
c = foreach b generate uname;
store c into './multiquery.out';
{code}
In this case, after multiquery optimization we create a POSplit (scope-34) and merge all the splittees into the split.
before multiquery optimization:
{code}
scope-28->scope-31 scope-33
scope-31
scope-33
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-28 Store(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-29 | |---b: Filter[bag] - scope-14 | | | Less Than[boolean] - scope-17 | | | |---Project[int][2] - scope-15 | | | |---Constant(5) - scope-16 | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | Cast[chararray] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[chararray] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | | | Cast[int] - scope-11 | | | |---Project[bytearray][3] - scope-10 | |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0-------- Spark node scope-31 b: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.b.out:org.apache.pig.builtin.PigStorage) - scope-21 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-30-------- Spark node scope-33 c: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.out:org.apache.pig.builtin.PigStorage) - scope-27 | |---c: New 
For Each(false)[bag] - scope-26 | | | Project[chararray][0] - scope-24 | |---Load(hdfs://zly2.sh.intel.com:8020/tmp/temp-1156248777/tmp1448287392:org.apache.pig.impl.io.InterStorage) - scope-32--------
{code}
after multiquery optimization:
{code}
scope-28
#-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-28 Split - scope-34 | | | b: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.b.out:org.apache.pig.builtin.PigStorage) - scope-21 | | | c: Store(hdfs://zly2.sh.intel.com:8020/user/root/multiquery.out:org.apache.pig.builtin.PigStorage) - scope-27 | | | |---c: New For Each(false)[bag] - scope-26 | | | | | Project[chararray][0] - scope-24 | |---b: Filter[bag] - scope-14 | | | Less Than[boolean] - scope-17 | | | |---Project[int][2] - scope-15 | | | |---Constant(5) - scope-16 | |---a: New For Each(false,false,false,false)[bag] - scope-13 | | | Cast[chararray] - scope-2 | | | |---Project[bytearray][0] - scope-1 | | | Cast[chararray] - scope-5 | | | |---Project[bytearray][1] - scope-4 | | | Cast[int] - scope-8 | | | |---Project[bytearray][2] - scope-7 | | | Cast[int] - scope-11 | | | |---Project[bytearray][3] - scope-10 | |---a: Load(hdfs://zly2.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0--------
{code}

> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
>         Key: PIG-4594
>         URL: https://issues.apache.org/jira/browse/PIG-4594
>     Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>    Reporter: liyunzhang_intel
>    Assignee: liyunzhang_intel
>     Fix For: spark-branch
>
> Attachments: PIG-4594.patch
>
>
> in https://builds.apache.org/job/Pig-spark/211/#showFailuresLink, it shows
> that the following unit tests fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)