[
https://issues.apache.org/jira/browse/PIG-5165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965389#comment-15965389
]
liyunzhang_intel commented on PIG-5165:
---------------------------------------
MultiQuery_Union_7.pig
{code}
a = load './studentnulltab10k' as (name, age, gpa:float);
b = filter a by gpa >= 3.9;
b1 = foreach b generate *;
b2 = foreach b generate *;
b3 = union onschema b1, b2;
c = filter a by gpa < 2;
c1 = foreach c generate *;
c2 = foreach c generate *;
c3 = union onschema c1, c2;
a1 = union onschema b3, c3;
store a1 into './MultiQuery_Union_7.out.1';
d = load './voternulltab10k' as (name, age, registration, contributions);
e = join a1 by name right outer, d by name using 'skewed' PARALLEL 3;
store e into './MultiQuery_Union_7.out.2';
explain e;
{code}
In the Spark plan, the predecessors of POSkewedJoin(scope-226) are
POForEach(scope-223), which belongs to 'd', followed by POLoad(scope-246),
which belongs to 'a1', so the order of the join inputs is inverted:
{code}
----
scope-228->scope-243 scope-243
scope-243->scope-245 scope-264
scope-245
scope-249->scope-264
scope-264
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-228
Split - scope-265
| |
|
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage)
- scope-231
| |
| |---b: Filter[bag] - scope-156
| | |
| | Greater Than or Equal[boolean] - scope-160
| | |
| | |---Cast[double] - scope-158
| | | |
| | | |---Project[float][2] - scope-157
| | |
| | |---Constant(3.9) - scope-159
| |
|
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage)
- scope-238
| |
| |---c: Filter[bag] - scope-183
| | |
| | Less Than[boolean] - scope-186
| | |
| | |---Project[float][2] - scope-184
| | |
| | |---Constant(2.0) - scope-185
|
|---a: New For Each(false,false,false)[bag] - scope-152
| |
| Project[bytearray][0] - scope-145
| |
| Project[bytearray][1] - scope-147
| |
| Cast[float] - scope-150
| |
| |---Project[bytearray][2] - scope-149
|
|---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage)
- scope-144--------
Spark node scope-243
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage)
- scope-244
|
|---a1: Union[bag] - scope-207
|
|---b3: Union[bag] - scope-180
| |
| |---b1: New For Each(false,false,false)[bag] - scope-170
| | | |
| | | Project[bytearray][0] - scope-164
| | | |
| | | Project[bytearray][1] - scope-166
| | | |
| | | Project[float][2] - scope-168
| | |
| |
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage)
- scope-156
| |
| |---b2: New For Each(false,false,false)[bag] - scope-179
| | |
| | Project[bytearray][0] - scope-173
| | |
| | Project[bytearray][1] - scope-175
| | |
| | Project[float][2] - scope-177
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-684697133:org.apache.pig.impl.io.InterStorage)
- scope-233
|
|---c3: Union[bag] - scope-206
|
|---c1: New For Each(false,false,false)[bag] - scope-196
| | |
| | Project[bytearray][0] - scope-190
| | |
| | Project[bytearray][1] - scope-192
| | |
| | Project[float][2] - scope-194
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage)
- scope-183
|
|---c2: New For Each(false,false,false)[bag] - scope-205
| |
| Project[bytearray][0] - scope-199
| |
| Project[bytearray][1] - scope-201
| |
| Project[float][2] - scope-203
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp1279533794:org.apache.pig.impl.io.InterStorage)
- scope-240--------
Spark node scope-245
a1:
Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage)
- scope-211
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage)
- scope-207--------
Spark node scope-264
e:
Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage)
- scope-227
|
|---e: SkewedJoin[tuple] - scope-226
| |
| Project[bytearray][0] - scope-224
| |
| Project[bytearray][0] - scope-225
|
|---d: New For Each(false,false,false,false)[bag] - scope-223
| | |
| | Project[bytearray][0] - scope-215
| | |
| | Project[bytearray][1] - scope-217
| | |
| | Project[bytearray][2] - scope-219
| | |
| | Project[bytearray][3] - scope-221
| |
| |---d:
Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage)
- scope-214
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage)
- scope-246--------
Spark node scope-249
BroadcastSpark - scope-263
|
|---New For Each(false)[tuple] - scope-262
| |
| POUserFunc(org.apache.pig.impl.builtin.PartitionSkewedKeys)[tuple] -
scope-261
| |
| |---Project[tuple][*] - scope-260
|
|---New For Each(false,false)[tuple] - scope-259
| |
| Constant(3) - scope-258
| |
| Project[bag][1] - scope-257
|
|---POSparkSort[tuple]() - scope-226
| |
| Project[bytearray][0] - scope-224
|
|---New For Each(false,true)[tuple] - scope-256
| |
| Project[bytearray][0] - scope-224
| |
|
POUserFunc(org.apache.pig.impl.builtin.GetMemNumRows)[tuple] - scope-254
| |
| |---Project[tuple][*] - scope-253
|
|---PoissonSampleSpark - scope-255
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-868851627/tmp-1787691485:org.apache.pig.impl.io.InterStorage)
- scope-252--------
{code}
With a common (non-skewed) join, the Spark plan is shown below. Here the
predecessors of POJoinGroupSpark(scope-205) are Union(scope-186), which
belongs to 'a1', and ForEach(scope-202), which belongs to 'd':
{code}
scope-219->scope-234 scope-234
scope-234
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-219
Split - scope-252
| |
|
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage)
- scope-222
| |
| |---b: Filter[bag] - scope-135
| | |
| | Greater Than or Equal[boolean] - scope-139
| | |
| | |---Cast[double] - scope-137
| | | |
| | | |---Project[float][2] - scope-136
| | |
| | |---Constant(3.9) - scope-138
| |
|
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage)
- scope-229
| |
| |---c: Filter[bag] - scope-162
| | |
| | Less Than[boolean] - scope-165
| | |
| | |---Project[float][2] - scope-163
| | |
| | |---Constant(2.0) - scope-164
|
|---a: New For Each(false,false,false)[bag] - scope-131
| |
| Project[bytearray][0] - scope-124
| |
| Project[bytearray][1] - scope-126
| |
| Cast[float] - scope-129
| |
| |---Project[bytearray][2] - scope-128
|
|---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/studentnulltab10k:org.apache.pig.builtin.PigStorage)
- scope-123--------
Spark node scope-234
Split - scope-251
| |
| a1:
Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.1:org.apache.pig.builtin.PigStorage)
- scope-190
| |
| e:
Store(hdfs://zly1.sh.intel.com:8020/user/root/MultiQuery_Union_7.out.2:org.apache.pig.builtin.PigStorage)
- scope-218
| |
| |---e: New For Each(true,true)[tuple] - scope-217
| | |
| | POBinCond[bag] - scope-215
| | |
| | |---Project[bag][1] - scope-211
| | |
| | |---POUserFunc(org.apache.pig.builtin.IsEmpty)[boolean] - scope-213
| | | |
| | | |---Project[bag][1] - scope-212
| | |
| | |---Constant({(,,)}) - scope-214
| | |
| | Project[bag][2] - scope-216
| |
| |---POJoinGroupSpark[tuple] - scope-205
| |
| |---d: New For Each(false,false,false,false)[bag] - scope-202
| | |
| | Project[bytearray][0] - scope-194
| | |
| | Project[bytearray][1] - scope-196
| | |
| | Project[bytearray][2] - scope-198
| | |
| | Project[bytearray][3] - scope-200
| |
| |---d:
Load(hdfs://zly1.sh.intel.com:8020/user/root/voternulltab10k:org.apache.pig.builtin.PigStorage)
- scope-193
|
|---a1: Union[bag] - scope-186
|
|---b3: Union[bag] - scope-159
| |
| |---b1: New For Each(false,false,false)[bag] - scope-149
| | | |
| | | Project[bytearray][0] - scope-143
| | | |
| | | Project[bytearray][1] - scope-145
| | | |
| | | Project[float][2] - scope-147
| | |
| |
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage)
- scope-135
| |
| |---b2: New For Each(false,false,false)[bag] - scope-158
| | |
| | Project[bytearray][0] - scope-152
| | |
| | Project[bytearray][1] - scope-154
| | |
| | Project[float][2] - scope-156
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-1986449257:org.apache.pig.impl.io.InterStorage)
- scope-224
|
|---c3: Union[bag] - scope-185
|
|---c1: New For Each(false,false,false)[bag] - scope-175
| | |
| | Project[bytearray][0] - scope-169
| | |
| | Project[bytearray][1] - scope-171
| | |
| | Project[float][2] - scope-173
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage)
- scope-162
|
|---c2: New For Each(false,false,false)[bag] - scope-184
| |
| Project[bytearray][0] - scope-178
| |
| Project[bytearray][1] - scope-180
| |
| Project[float][2] - scope-182
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp2007961741/tmp-2779392:org.apache.pig.impl.io.InterStorage)
- scope-231--------
{code}
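The exact script used for the common-join run is not shown in this comment. A minimal sketch of the join statement, assuming the only change to MultiQuery_Union_7.pig was dropping the 'skewed' hint and that every other line stayed the same:
{code}
-- Assumption: identical to the join in MultiQuery_Union_7.pig, minus "using 'skewed'".
-- a1 is still the left input and d the right input of the right outer join.
e = join a1 by name right outer, d by name PARALLEL 3;
{code}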
> MultiQuery_Union_7 is failing with spark exec type
> --------------------------------------------------
>
> Key: PIG-5165
> URL: https://issues.apache.org/jira/browse/PIG-5165
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Nandor Kollar
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
>
> 1st output is fine, 2nd is different