[ https://issues.apache.org/jira/browse/PIG-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965389#comment-15965389 ]
liyunzhang_intel commented on PIG-5165:
---------------------------------------

MultiQuery_Union_7.pig
{code}
a = load './studentnulltab10k' as (name, age, gpa:float);
b = filter a by gpa >= 3.9;
b1 = foreach b generate *;
b2 = foreach b generate *;
b3 = union onschema b1, b2;
c = filter a by gpa < 2;
c1 = foreach c generate *;
c2 = foreach c generate *;
c3 = union onschema c1, c2;
a1 = union onschema b3, c3;
store a1 into './MultiQuery_Union_7.out.1';
d = load './voternulltab10k' as (name, age, registration, contributions);
e = join a1 by name right outer, d by name using 'skewed' PARALLEL 3;
store e into './MultiQuery_Union_7.out.2';
explain e;
{code}

In the Spark plan, the predecessors of POSkewedJoin (scope-226) are POForEach (scope-223), which comes from 'd', and POLoad (scope-246), which comes from 'a1', so the order of the join inputs is inverted:
{code}
----
scope-228->scope-243 scope-243 scope-243->scope-245 scope-264 scope-245 scope-249->scope-264 scope-264
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-228
Split - scope-265
| | | Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-231
| | | |---b: Filter[bag] - scope-156
| | | | | Greater Than or Equal[boolean] - scope-160
| | | | | |---Cast[double] - scope-158
| | | | | | | |---Project[float][2] - scope-157
| | | | | |---Constant(3.9) - scope-159
| | | Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-238
| | | |---c: Filter[bag] - scope-183
| | | | | Less Than[boolean] - scope-186
| | | | | |---Project[float][2] - scope-184
| | | | | |---Constant(2.0) - scope-185
| |---a: New For Each(false,false,false)[bag] - scope-152
| | | Project[bytearray][0] - scope-145
| | | Project[bytearray][1] - scope-147
| | | Cast[float] - scope-150
| | | |---Project[bytearray][2] - scope-149
| |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-144--------
Spark node scope-243
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-244
| |---a1: Union[bag] - scope-207
| |---b3: Union[bag] - scope-180
| | | |---b1: New For Each(false,false,false)[bag] - scope-170
| | | | | | | Project[bytearray][0] - scope-164
| | | | | | | Project[bytearray][1] - scope-166
| | | | | | | Project[float][2] - scope-168
| | | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-156
| | | |---b2: New For Each(false,false,false)[bag] - scope-179
| | | | | Project[bytearray][0] - scope-173
| | | | | Project[bytearray][1] - scope-175
| | | | | Project[float][2] - scope-177
| | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-233
| |---c3: Union[bag] - scope-206
| |---c1: New For Each(false,false,false)[bag] - scope-196
| | | | | Project[bytearray][0] - scope-190
| | | | | Project[bytearray][1] - scope-192
| | | | | Project[float][2] - scope-194
| | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-183
| |---c2: New For Each(false,false,false)[bag] - scope-205
| | | Project[bytearray][0] - scope-199
| | | Project[bytearray][1] - scope-201
| | | Project[float][2] - scope-203
| |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-240--------
Spark node scope-245
a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-211
| |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-207--------
Spark node scope-264
e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-227
| |---e: SkewedJoin[tuple] - scope-226
| | | Project[bytearray][0] - scope-224
| | | Project[bytearray][0] - scope-225
| |---d: New For Each(false,false,false,false)[bag] - scope-223
| | | | | Project[bytearray][0] - scope-215
| | | | | Project[bytearray][1] - scope-217
| | | | | Project[bytearray][2] - scope-219
| | | | | Project[bytearray][3] - scope-221
| | | |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-214
| |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-246--------
Spark node scope-249
BroadcastSpark - scope-263
| |---New For Each(false)[tuple] - scope-262
| | | POUserFunc(org.apache.pig.impl.builtin.PartitionSkewedKeys)[tuple] - scope-261
| | | |---Project[tuple][*] - scope-260
| |---New For Each(false,false)[tuple] - scope-259
| | | Constant(3) - scope-258
| | | Project[bag][1] - scope-257
| |---POSparkSort[tuple]() - scope-226
| | | Project[bytearray][0] - scope-224
| |---New For Each(false,true)[tuple] - scope-256
| | | Project[bytearray][0] - scope-224
| | | POUserFunc(org.apache.pig.impl.builtin.GetMemNumRows)[tuple] - scope-254
| | | |---Project[tuple][*] - scope-253
| |---PoissonSampleSpark - scope-255
| |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-252--------
{code}

With a common (non-skewed) join, the Spark plan is as follows. The predecessors of POJoinGroupSpark (scope-205) are Union (scope-186), which comes from 'a1', and ForEach (scope-202), which comes from 'd':
{code}
scope-219->scope-234 scope-234 scope-234
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-219
Split - scope-252
| | | Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-222
| | | |---b: Filter[bag] - scope-135
| | | | | Greater Than or Equal[boolean] - scope-139
| | | | | |---Cast[double] - scope-137
| | | | | | | |---Project[float][2] - scope-136
| | | | | |---Constant(3.9) - scope-138
| | | Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-229
| | | |---c: Filter[bag] - scope-162
| | | | | Less Than[boolean] - scope-165
| | | | | |---Project[float][2] - scope-163
| | | | | |---Constant(2.0) - scope-164
| |---a: New For Each(false,false,false)[bag] - scope-131
| | | Project[bytearray][0] - scope-124
| | | Project[bytearray][1] - scope-126
| | | Cast[float] - scope-129
| | | |---Project[bytearray][2] - scope-128
| |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-123--------
Spark node scope-234
Split - scope-251
| | | a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-190
| | | e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-218
| | | |---e: New For Each(true,true)[tuple] - scope-217
| | | | | POBinCond[bag] - scope-215
| | | | | |---Project[bag][1] - scope-211
| | | | | |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - scope-213
| | | | | | | |---Project[bag][1] - scope-212
| | | | | |---Constant({(,,)}) - scope-214
| | | | | Project[bag][2] - scope-216
| | | |---POJoinGroupSpark[tuple] - scope-205
| | | |---d: New For Each(false,false,false,false)[bag] - scope-202
| | | | | Project[bytearray][0] - scope-194
| | | | | Project[bytearray][1] - scope-196
| | | | | Project[bytearray][2] - scope-198
| | | | | Project[bytearray][3] - scope-200
| | | |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-193
| |---a1: Union[bag] - scope-186
| |---b3: Union[bag] - scope-159
| | | |---b1: New For Each(false,false,false)[bag] - scope-149
| | | | | | | Project[bytearray][0] - scope-143
| | | | | | | Project[bytearray][1] - scope-145
| | | | | | | Project[float][2] - scope-147
| | | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-135
| | | |---b2: New For Each(false,false,false)[bag] - scope-158
| | | | | Project[bytearray][0] - scope-152
| | | | | Project[bytearray][1] - scope-154
| | | | | Project[float][2] - scope-156
| | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-224
| |---c3: Union[bag] - scope-185
| |---c1: New For Each(false,false,false)[bag] - scope-175
| | | | | Project[bytearray][0] - scope-169
| | | | | Project[bytearray][1] - scope-171
| | | | | Project[float][2] - scope-173
| | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-162
| |---c2: New For Each(false,false,false)[bag] - scope-184
| | | Project[bytearray][0] - scope-178
| | | Project[bytearray][1] - scope-180
| | | Project[float][2] - scope-182
| |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-231--------
{code}

> MultiQuery_Union_7 is failing with spark exec type
> --------------------------------------------------
>
> Key: PIG-5165
> URL: https://issues.apache.org/jira/browse/PIG-5165
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>
> 1st output is fine, 2nd is different

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)