[ https://issues.apache.org/jira/browse/PIG-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965389#comment-15965389 ]

liyunzhang_intel commented on PIG-5165:
---------------------------------------

MultiQuery_Union_7.pig 
{code}
a = load './studentnulltab10k' as (name, age, gpa:float);
b = filter a by gpa >= 3.9;
b1 = foreach b generate *;
b2 = foreach b generate *;
b3 = union onschema b1, b2;
c = filter a by gpa < 2;
c1 = foreach c generate *;
c2 = foreach c generate *;
c3 = union onschema c1, c2;
a1 = union onschema b3, c3;
store a1 into './MultiQuery_Union_7.out.1';
d = load './voternulltab10k' as (name, age, registration, contributions);
e = join a1 by name right outer, d by name using 'skewed' PARALLEL 3;
store e into './MultiQuery_Union_7.out.2';
explain e;
{code}

In the Spark plan for the skewed join, the predecessors of
POSkewedJoin(scope-226) are, in order, POForeach(scope-223), which is related
to 'd', and POLoad(scope-246), which is related to 'a1'. The script joins a1
with d, so the order of the join inputs is inverted:
{code}
----
scope-228->scope-243 scope-243 
scope-243->scope-245 scope-264 
scope-245
scope-249->scope-264 
scope-264
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-228
Split - scope-265
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-231
|   |
|   |---b: Filter[bag] - scope-156
|       |   |
|       |   Greater Than or Equal[boolean] - scope-160
|       |   |
|       |   |---Cast[double] - scope-158
|       |   |   |
|       |   |   |---Project[float][2] - scope-157
|       |   |
|       |   |---Constant(3.9) - scope-159
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-238
|   |
|   |---c: Filter[bag] - scope-183
|       |   |
|       |   Less Than[boolean] - scope-186
|       |   |
|       |   |---Project[float][2] - scope-184
|       |   |
|       |   |---Constant(2.0) - scope-185
|
|---a: New For Each(false,false,false)[bag] - scope-152
    |   |
    |   Project[bytearray][0] - scope-145
    |   |
    |   Project[bytearray][1] - scope-147
    |   |
    |   Cast[float] - scope-150
    |   |
    |   |---Project[bytearray][2] - scope-149
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-144--------

Spark node scope-243
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-244
|
|---a1: Union[bag] - scope-207
    |
    |---b3: Union[bag] - scope-180
    |   |
    |   |---b1: New For Each(false,false,false)[bag] - scope-170
    |   |   |   |
    |   |   |   Project[bytearray][0] - scope-164
    |   |   |   |
    |   |   |   Project[bytearray][1] - scope-166
    |   |   |   |
    |   |   |   Project[float][2] - scope-168
    |   |   |
    |   |   |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-156
    |   |
    |   |---b2: New For Each(false,false,false)[bag] - scope-179
    |       |   |
    |       |   Project[bytearray][0] - scope-173
    |       |   |
    |       |   Project[bytearray][1] - scope-175
    |       |   |
    |       |   Project[float][2] - scope-177
    |       |
    |       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage) - scope-233
    |
    |---c3: Union[bag] - scope-206
        |
        |---c1: New For Each(false,false,false)[bag] - scope-196
        |   |   |
        |   |   Project[bytearray][0] - scope-190
        |   |   |
        |   |   Project[bytearray][1] - scope-192
        |   |   |
        |   |   Project[float][2] - scope-194
        |   |
        |   |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-183
        |
        |---c2: New For Each(false,false,false)[bag] - scope-205
            |   |
            |   Project[bytearray][0] - scope-199
            |   |
            |   Project[bytearray][1] - scope-201
            |   |
            |   Project[float][2] - scope-203
            |
            |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage) - scope-240--------

Spark node scope-245
a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-211
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-207--------

Spark node scope-264
e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-227
|
|---e: SkewedJoin[tuple] - scope-226
    |   |
    |   Project[bytearray][0] - scope-224
    |   |
    |   Project[bytearray][0] - scope-225
    |
    |---d: New For Each(false,false,false,false)[bag] - scope-223
    |   |   |
    |   |   Project[bytearray][0] - scope-215
    |   |   |
    |   |   Project[bytearray][1] - scope-217
    |   |   |
    |   |   Project[bytearray][2] - scope-219
    |   |   |
    |   |   Project[bytearray][3] - scope-221
    |   |
    |   |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-214
    |
    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-246--------

Spark node scope-249
BroadcastSpark - scope-263
|
|---New For Each(false)[tuple] - scope-262
    |   |
    |   POUserFunc(org.apache.pig.impl.builtin.PartitionSkewedKeys)[tuple] - scope-261
    |   |
    |   |---Project[tuple][*] - scope-260
    |
    |---New For Each(false,false)[tuple] - scope-259
        |   |
        |   Constant(3) - scope-258
        |   |
        |   Project[bag][1] - scope-257
        |
        |---POSparkSort[tuple]() - scope-226
            |   |
            |   Project[bytearray][0] - scope-224
            |
            |---New For Each(false,true)[tuple] - scope-256
                |   |
                |   Project[bytearray][0] - scope-224
                |   |
                |   POUserFunc(org.apache.pig.impl.builtin.GetMemNumRows)[tuple] - scope-254
                |   |
                |   |---Project[tuple][*] - scope-253
                |
                |---PoissonSampleSpark - scope-255
                    |
                    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage) - scope-252--------
{code}                                  
With a common join instead of the skewed join, the predecessors of
POJoinGroupSpark(scope-205) are Union(scope-186), which is related to 'a1',
and ForEach(scope-202), which is related to 'd'. The Spark plan is:
{code}
scope-219->scope-234 scope-234 
scope-234
#--------------------------------------------------
# Spark Plan                                  
#--------------------------------------------------

Spark node scope-219
Split - scope-252
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-222
|   |
|   |---b: Filter[bag] - scope-135
|       |   |
|       |   Greater Than or Equal[boolean] - scope-139
|       |   |
|       |   |---Cast[double] - scope-137
|       |   |   |
|       |   |   |---Project[float][2] - scope-136
|       |   |
|       |   |---Constant(3.9) - scope-138
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-229
|   |
|   |---c: Filter[bag] - scope-162
|       |   |
|       |   Less Than[boolean] - scope-165
|       |   |
|       |   |---Project[float][2] - scope-163
|       |   |
|       |   |---Constant(2.0) - scope-164
|
|---a: New For Each(false,false,false)[bag] - scope-131
    |   |
    |   Project[bytearray][0] - scope-124
    |   |
    |   Project[bytearray][1] - scope-126
    |   |
    |   Cast[float] - scope-129
    |   |
    |   |---Project[bytearray][2] - scope-128
    |
    |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage) - scope-123--------

Spark node scope-234
Split - scope-251
|   |
|   a1: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage) - scope-190
|   |
|   e: Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage) - scope-218
|   |
|   |---e: New For Each(true,true)[tuple] - scope-217
|       |   |
|       |   POBinCond[bag] - scope-215
|       |   |
|       |   |---Project[bag][1] - scope-211
|       |   |
|       |   |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - scope-213
|       |   |   |
|       |   |   |---Project[bag][1] - scope-212
|       |   |
|       |   |---Constant({(,,)}) - scope-214
|       |   |
|       |   Project[bag][2] - scope-216
|       |
|       |---POJoinGroupSpark[tuple] - scope-205
|           |
|           |---d: New For Each(false,false,false,false)[bag] - scope-202
|               |   |
|               |   Project[bytearray][0] - scope-194
|               |   |
|               |   Project[bytearray][1] - scope-196
|               |   |
|               |   Project[bytearray][2] - scope-198
|               |   |
|               |   Project[bytearray][3] - scope-200
|               |
|               |---d: Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage) - scope-193
|
|---a1: Union[bag] - scope-186
    |
    |---b3: Union[bag] - scope-159
    |   |
    |   |---b1: New For Each(false,false,false)[bag] - scope-149
    |   |   |   |
    |   |   |   Project[bytearray][0] - scope-143
    |   |   |   |
    |   |   |   Project[bytearray][1] - scope-145
    |   |   |   |
    |   |   |   Project[float][2] - scope-147
    |   |   |
    |   |   |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-135
    |   |
    |   |---b2: New For Each(false,false,false)[bag] - scope-158
    |       |   |
    |       |   Project[bytearray][0] - scope-152
    |       |   |
    |       |   Project[bytearray][1] - scope-154
    |       |   |
    |       |   Project[float][2] - scope-156
    |       |
    |       |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage) - scope-224
    |
    |---c3: Union[bag] - scope-185
        |
        |---c1: New For Each(false,false,false)[bag] - scope-175
        |   |   |
        |   |   Project[bytearray][0] - scope-169
        |   |   |
        |   |   Project[bytearray][1] - scope-171
        |   |   |
        |   |   Project[float][2] - scope-173
        |   |
        |   |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-162
        |
        |---c2: New For Each(false,false,false)[bag] - scope-184
            |   |
            |   Project[bytearray][0] - scope-178
            |   |
            |   Project[bytearray][1] - scope-180
            |   |
            |   Project[float][2] - scope-182
            |
            |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage) - scope-231--------
{code}  
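
Why the order of the join predecessors matters for this script can be illustrated with a toy model. The sketch below is not Pig's or Spark's join implementation; `right_outer_join` and the sample rows are hypothetical and only mimic the semantics of `join a1 by name right outer, d by name`. It shows that swapping the two inputs of a right outer join changes both which side is preserved and the column order of the result.

```python
def right_outer_join(left, right, key=0):
    """Toy right outer join on column `key`.

    Every row of `right` is kept; a row with no match on the left
    is padded with None for the left-hand columns.
    """
    left_width = len(left[0]) if left else 0
    out = []
    for r in right:
        # Null keys never match, as in a regular equi-join.
        matches = [l for l in left if l[key] is not None and l[key] == r[key]]
        if matches:
            out.extend(l + r for l in matches)
        else:
            out.append((None,) * left_width + r)
    return out


# Hypothetical sample rows, shaped like a1 (name, age, gpa) and
# d (name, age, registration, contributions).
a1 = [("alice", 20, 3.9)]
d = [("alice", 20, "democrat", 10.0), ("bob", 30, "green", 5.0)]

in_script_order = right_outer_join(a1, d)  # a1 first, d second
inverted = right_outer_join(d, a1)         # inputs swapped

print(in_script_order)
print(inverted)
```

With the inputs in script order, the unmatched row of d survives with a null-padded a1 side. With the inputs swapped, that row is dropped and the columns of d come first, so the two results cannot be equal.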


> MultiQuery_Union_7 is failing with spark exec type
> --------------------------------------------------
>
>                 Key: PIG-5165
>                 URL: https://issues.apache.org/jira/browse/PIG-5165
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> 1st output is fine, 2nd is different



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
