[ https://issues.apache.org/jira/browse/PIG-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965398#comment-15965398 ]
liyunzhang_intel commented on PIG-5165: --------------------------------------- the predecessors of POSkewedJoin are valid in order after combine optimization. but before multiquery optimization, the predecessor of POSkewedJoin are inverted. see following {code} after combiner optimization: scope-228->scope-230 scope-237 scope-230->scope-243 scope-237->scope-243 scope-243->scope-245 scope-264 scope-245 scope-249->scope-264 scope-264 #-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-228 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-229 | |---a: New For Each(false,false,false)[bag] - scope-152 | | | Project[bytearray][0] - scope-145 | | | Project[bytearray][1] - scope-147 | | | Cast[float] - scope-150 | | | |---Project[bytearray][2] - scope-149 | |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-144-------- Spark node scope-230 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-231 | |---b: Filter[bag] - scope-156 | | | Greater Than or Equal[boolean] - scope-160 | | | |---Cast[double] - scope-158 | | | | | |---Project[float][2] - scope-157 | | | |---Constant(3.9) - scope-159 | |---a: Filter[bag] - scope-154 | | | Constant(true) - scope-155 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-152-------- Spark node scope-243 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-244 | |---a1: Union[bag] - scope-207 | |---b3: Union[bag] - scope-180 | | | |---b1: New For Each(false,false,false)[bag] - scope-170 | | | | | | | Project[bytearray][0] - scope-164 | | | | | | | Project[bytearray][1] - scope-166 | | | | | | | Project[float][2] - scope-168 | | | | | |---b: Filter[bag] - scope-162 | | | | | | | Constant(true) - scope-163 | | | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-156 | | | |---b2: New For Each(false,false,false)[bag] - scope-179 | | | | | Project[bytearray][0] - scope-173 | | | | | Project[bytearray][1] - scope-175 | | | | | Project[float][2] - scope-177 | | | |---b: Filter[bag] - scope-171 | | | | | Constant(true) - scope-172 | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-233 | |---c3: Union[bag] - scope-206 | |---c1: New For Each(false,false,false)[bag] - scope-196 | | | | | Project[bytearray][0] - scope-190 | | | | | Project[bytearray][1] - scope-192 | | | | | Project[float][2] - scope-194 | | | |---c: Filter[bag] - scope-188 | | | | | Constant(true) - scope-189 | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-183 | |---c2: New For Each(false,false,false)[bag] - scope-205 | | | Project[bytearray][0] - scope-199 | | | Project[bytearray][1] - scope-201 | | | Project[float][2] - scope-203 | |---c: Filter[bag] - scope-197 | | | Constant(true) - scope-198 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-240-------- Spark node scope-245 a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-211 | |---a1: Filter[bag] - scope-209 | | | Constant(true) - scope-210 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-207-------- Spark node scope-264 e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-227 | |---e: SkewedJoin[tuple] - scope-226 | | | Project[bytearray][0] - scope-224 | | | Project[bytearray][0] - scope-225 | |---a1: Filter[bag] - scope-212 | | | | | Constant(true) - scope-213 | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-246 | |---d: New For Each(false,false,false,false)[bag] - scope-223 | | | Project[bytearray][0] - scope-215 | | | Project[bytearray][1] - scope-217 | | | Project[bytearray][2] - scope-219 | | | Project[bytearray][3] - scope-221 | |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-214-------- Spark node scope-237 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-238 | |---c: Filter[bag] - scope-183 | | | Less Than[boolean] - scope-186 | | | |---Project[float][2] - scope-184 | | | |---Constant(2.0) - scope-185 | |---a: Filter[bag] - scope-181 | | | Constant(true) - scope-182 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-236-------- Spark node scope-249 BroadcastSpark - scope-263 | |---New For Each(false)[tuple] - scope-262 | | | POUserFunc(org.apache.pig.impl.builtin.PartitionSkewedKeys)[tuple] - scope-261 | | | |---Project[tuple][*] - scope-260 | |---New For Each(false,false)[tuple] - scope-259 | | | Constant(3) - scope-258 | | | Project[bag][1] - scope-257 | |---POSparkSort[tuple]() - scope-226 | | | Project[bytearray][0] - scope-224 | |---New For Each(false,true)[tuple] - scope-256 | | | Project[bytearray][0] - scope-224 | | | POUserFunc(org.apache.pig.impl.builtin.GetMemNumRows)[tuple] - scope-254 | | | |---Project[tuple][*] - scope-253 | |---PoissonSampleSpark - scope-255 | |---a1: Filter[bag] - scope-250 | | | Constant(true) - scope-251 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-252-------- before multiquery optimization: scope-228->scope-230 scope-237 scope-230->scope-243 scope-237->scope-243 scope-243->scope-245 scope-264 scope-245 scope-249->scope-264 scope-264 #-------------------------------------------------- # Spark Plan #-------------------------------------------------- Spark node scope-228 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-229 | |---a: New For Each(false,false,false)[bag] - scope-152 | | | Project[bytearray][0] - scope-145 | | | Project[bytearray][1] - scope-147 | | | Cast[float] - scope-150 | | | |---Project[bytearray][2] - scope-149 | |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-144-------- Spark node scope-230 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-231 | |---b: Filter[bag] - scope-156 | | | Greater Than or Equal[boolean] - scope-160 | | | |---Cast[double] - scope-158 | | | | | |---Project[float][2] - scope-157 | | | |---Constant(3.9) - scope-159 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-152-------- Spark node scope-243 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-244 | |---a1: Union[bag] - scope-207 | |---b3: Union[bag] - scope-180 | | | |---b1: New For Each(false,false,false)[bag] - scope-170 | | | | | | | Project[bytearray][0] - scope-164 | | | | | | | Project[bytearray][1] - scope-166 | | | | | | | Project[float][2] - scope-168 | | | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-156 | | | |---b2: New For Each(false,false,false)[bag] - scope-179 | | | | | Project[bytearray][0] - scope-173 | | | | | Project[bytearray][1] - scope-175 | | | | | Project[float][2] - scope-177 | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-2031963854:org.apache.pig.impl.io.InterStorage) - scope-233 | |---c3: Union[bag] - scope-206 | |---c1: New For Each(false,false,false)[bag] - scope-196 | | | | | Project[bytearray][0] - scope-190 | | | | | Project[bytearray][1] - scope-192 | | | | | Project[float][2] - scope-194 | | | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-183 | |---c2: New For Each(false,false,false)[bag] - scope-205 | | | Project[bytearray][0] - scope-199 | | | Project[bytearray][1] - scope-201 | | | Project[float][2] - scope-203 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-240-------- Spark node scope-245 a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-211 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-207-------- Spark node scope-264 e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-227 | |---e: SkewedJoin[tuple] - scope-226 | | | Project[bytearray][0] - scope-224 | | | Project[bytearray][0] - scope-225 | |---d: New For Each(false,false,false,false)[bag] - scope-223 | | | | | Project[bytearray][0] - scope-215 | | | | | Project[bytearray][1] - scope-217 | | | | | Project[bytearray][2] - scope-219 | | | | | Project[bytearray][3] - scope-221 | | | |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-214 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-246-------- Spark node scope-237 Store(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp-1315218881:org.apache.pig.impl.io.InterStorage) - scope-238 | |---c: Filter[bag] - scope-183 | | | Less Than[boolean] - scope-186 | | | |---Project[float][2] - scope-184 | | | |---Constant(2.0) - scope-185 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp615375446:org.apache.pig.impl.io.InterStorage) - scope-236-------- Spark node scope-249 BroadcastSpark - scope-263 | |---New For Each(false)[tuple] - scope-262 | | | POUserFunc(org.apache.pig.impl.builtin.PartitionSkewedKeys)[tuple] - scope-261 | | | |---Project[tuple][*] - scope-260 | |---New For Each(false,false)[tuple] - scope-259 | | | Constant(3) - scope-258 | | | Project[bag][1] - scope-257 | |---POSparkSort[tuple]() - scope-226 | | | Project[bytearray][0] - scope-224 | |---New For Each(false,true)[tuple] - scope-256 | | | Project[bytearray][0] - scope-224 | | | POUserFunc(org.apache.pig.impl.builtin.GetMemNumRows)[tuple] - scope-254 | | | |---Project[tuple][*] - scope-253 | |---PoissonSampleSpark - scope-255 | |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp691912264/tmp287701784:org.apache.pig.impl.io.InterStorage) - scope-252-------- {code} > MultiQuery_Union_7 is failing with spark exec type > -------------------------------------------------- > > Key: PIG-5165 > URL: https://issues.apache.org/jira/browse/PIG-5165 > Project: Pig > Issue Type: Sub-task > Components: spark > Reporter: Nandor Kollar > Assignee: liyunzhang_intel > Fix For: spark-branch > > > 1st output is fine, 2nd is different -- This message was sent by Atlassian JIRA (v6.3.15#6346)