Mostafa Mokhtar created HIVE-10537:
--------------------------------------
Summary: ConvertJoinMapJoin should take into account
JOINNOCONDITIONALTASKTHRESHOLD when packing tables in a Vertex
Key: HIVE-10537
URL: https://issues.apache.org/jira/browse/HIVE-10537
Project: Hive
Issue Type: Bug
Reporter: Mostafa Mokhtar
Assignee: Vikram Dixit K
Vertex "Map 2" has inputs that add up to 803MB while NCTS is 610MB.
Query
{code}
set hive.auto.convert.join.noconditionaltask.size=640000000;
explain
select i_item_desc
,w_warehouse_name
,d1.d_week_seq
,count(case when p_promo_sk is null then 1 else 0 end) no_promo
,count(case when p_promo_sk is not null then 1 else 0 end) promo
,count(*) total_cnt
from catalog_sales
join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
join item on (item.i_item_sk = catalog_sales.cs_item_sk)
join customer_demographics on (catalog_sales.cs_bill_cdemo_sk =
customer_demographics.cd_demo_sk)
join household_demographics on (catalog_sales.cs_bill_hdemo_sk =
household_demographics.hd_demo_sk)
join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
left outer join promotion on (catalog_sales.cs_promo_sk=promotion.p_promo_sk)
left outer join catalog_returns on (catalog_returns.cr_item_sk =
catalog_sales.cs_item_sk and catalog_returns.cr_order_number =
catalog_sales.cs_order_number)
where d1.d_week_seq = d2.d_week_seq
and inv_quantity_on_hand < cs_quantity
and d3.d_date > d1.d_date + 5
and hd_buy_potential = '1001-5000'
and d1.d_year = 2001
and hd_buy_potential = '1001-5000'
and cd_marital_status = 'M'
and d1.d_year = 2001
group by i_item_desc,w_warehouse_name,d1.d_week_seq
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
limit 100
{code}
Plan
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 11
(BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 13 (BROADCAST_EDGE), Map 5
(BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8
(BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
DagName: jenkins_20150429044838_bd0a1cf7-235f-48db-9321-c13899fed7b3:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: catalog_returns
Statistics: Num rows: 28798881 Data size: 2942039156 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cr_item_sk (type: int), cr_order_number (type:
int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 28798881 Data size: 230391048 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: int), _col1
(type: int)
Statistics: Num rows: 28798881 Data size: 230391048 Basic
stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 10
Map Operator Tree:
TableScan
alias: warehouse
filterExpr: w_warehouse_sk is not null (type: boolean)
Statistics: Num rows: 6 Data size: 6166 Basic stats: COMPLETE
Column stats: COMPLETE
Filter Operator
predicate: w_warehouse_sk is not null (type: boolean)
Statistics: Num rows: 6 Data size: 618 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: w_warehouse_sk (type: int), w_warehouse_name
(type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6 Data size: 618 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 6 Data size: 618 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized
Map 11
Map Operator Tree:
TableScan
alias: d1
filterExpr: (((d_year = 2001) and d_date_sk is not null) and
d_week_seq is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((d_year = 2001) and d_date_sk is not null) and
d_week_seq is not null) (type: boolean)
Statistics: Num rows: 652 Data size: 69112 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int), d_date (type:
string), d_week_seq (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 652 Data size: 66504 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 652 Data size: 66504 Basic stats:
COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string), _col2 (type:
int)
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 652 Data size: 66504 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 652 Data size: 66504 Basic
stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: catalog_sales
Partition key expr: cs_sold_date_sk
Statistics: Num rows: 652 Data size: 66504 Basic
stats: COMPLETE Column stats: COMPLETE
Target column: cs_sold_date_sk
Target Vertex: Map 2
Execution mode: vectorized
Map 12
Map Operator Tree:
TableScan
alias: d1
filterExpr: (d_date_sk is not null and d_week_seq is not
null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_date_sk is not null and d_week_seq is not
null) (type: boolean)
Statistics: Num rows: 73049 Data size: 584392 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int), d_week_seq (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 73049 Data size: 584392 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: int), _col1
(type: int)
Statistics: Num rows: 73049 Data size: 584392 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 584392 Basic
stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 73049 Data size: 584392 Basic
stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: inventory
Partition key expr: inv_date_sk
Statistics: Num rows: 73049 Data size: 584392 Basic
stats: COMPLETE Column stats: COMPLETE
Target column: inv_date_sk
Target Vertex: Map 5
Execution mode: vectorized
Map 13
Map Operator Tree:
TableScan
alias: promotion
Statistics: Num rows: 450 Data size: 530848 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: p_promo_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 450 Data size: 1800 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 450 Data size: 1800 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: catalog_sales
filterExpr: (((cs_item_sk is not null and cs_bill_hdemo_sk is
not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null)
(type: boolean)
Statistics: Num rows: 286549727 Data size: 37743959324 Basic
stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((cs_item_sk is not null and cs_bill_hdemo_sk
is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null)
(type: boolean)
Statistics: Num rows: 284396955 Data size: 9086416352 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cs_ship_date_sk (type: int),
cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type:
int), cs_promo_sk (type: int), cs_order_number (type: int), cs_quantity (type:
int), cs_sold_date_sk (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7
Statistics: Num rows: 284396955 Data size: 9086416352
Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col3 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7, _col9, _col10, _col11
input vertices:
1 Map 5
Statistics: Num rows: 275157677926 Data size:
12106937828744 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Filter Operator
predicate: (_col10 < _col6) (type: boolean)
Statistics: Num rows: 91719225975 Data size:
4035645942900 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int), _col1 (type: int),
_col11 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int),
_col5 (type: int), _col7 (type: int), _col9 (type: int)
outputColumnNames: _col0, _col1, _col11, _col2,
_col3, _col4, _col5, _col7, _col9
Statistics: Num rows: 91719225975 Data size:
3301892135100 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1, _col3, _col4,
_col5, _col7, _col9, _col11
input vertices:
1 Map 6
Statistics: Num rows: 18343845888 Data size:
587003068416 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col3, _col4, _col5,
_col7, _col9, _col11
input vertices:
1 Map 7
Statistics: Num rows: 2620549632 Data size:
73375389696 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col4, _col5,
_col7, _col9, _col11, _col17
input vertices:
1 Map 8
Statistics: Num rows: 2620549632 Data size:
309224856576 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col3 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col4, _col5,
_col7, _col9, _col11, _col17, _col19
input vertices:
1 Map 9
Statistics: Num rows: 2620549632 Data size:
791405988864 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col9 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col4, _col5,
_col7, _col11, _col17, _col19, _col21
input vertices:
1 Map 10
Statistics: Num rows: 2620549632 Data
size: 1040358203904 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col7 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col4, _col5,
_col11, _col17, _col19, _col21, _col23, _col24
input vertices:
1 Map 11
Statistics: Num rows: 2925682123 Data
size: 1436509922393 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col11 (type: int), _col24 (type:
int)
1 _col0 (type: int), _col1 (type:
int)
outputColumnNames: _col3, _col4,
_col5, _col17, _col19, _col21, _col23, _col24
input vertices:
1 Map 12
Statistics: Num rows: 248727 Data
size: 121130049 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Filter Operator
predicate: (UDFToDouble(_col17) >
(UDFToDouble(_col23) + 5.0)) (type: boolean)
Statistics: Num rows: 82909 Data
size: 40376683 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col21 (type:
string), _col19 (type: string), _col24 (type: int), _col3 (type: int), _col4
(type: int), _col5 (type: int)
outputColumnNames: _col13,
_col15, _col22, _col3, _col4, _col5
Statistics: Num rows: 82909 Data
size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col4 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3,
_col5, _col13, _col15, _col22, _col28
input vertices:
1 Map 13
Statistics: Num rows: 82909
Data size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Select Operator
expressions: _col13 (type:
string), _col15 (type: string), _col22 (type: int), _col28 (type: int), _col3
(type: int), _col5 (type: int)
outputColumnNames: _col13,
_col15, _col22, _col28, _col3, _col5
Statistics: Num rows: 82909
Data size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Right Outer Join0 to 1
keys:
0 _col0 (type: int),
_col1 (type: int)
1 _col3 (type: int),
_col5 (type: int)
outputColumnNames: _col15,
_col17, _col24, _col30
input vertices:
0 Map 1
Statistics: Num rows: 2237
Data size: 650967 Basic stats: COMPLETE Column stats: COMPLETE
HybridGraceHashJoin: true
Select Operator
expressions: _col17
(type: string), _col15 (type: string), _col24 (type: int), CASE WHEN (_col30 is
null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col30 is not null) THEN
(1) ELSE (0) END (type: int)
outputColumnNames: _col0,
_col1, _col2, _col3, _col4
Statistics: Num rows:
2237 Data size: 650967 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations:
count(_col3), count(_col4), count()
keys: _col0 (type:
string), _col1 (type: string), _col2 (type: int)
mode: hash
outputColumnNames:
_col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 1
Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions:
_col0 (type: string), _col1 (type: string), _col2 (type: int)
sort order: +++
Map-reduce partition
columns: _col0 (type: string), _col1 (type: string), _col2 (type: int)
Statistics: Num rows:
1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE
value expressions:
_col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint)
Execution mode: vectorized
Map 5
Map Operator Tree:
TableScan
alias: inventory
filterExpr: (inv_item_sk is not null and inv_warehouse_sk is
not null) (type: boolean)
Statistics: Num rows: 37584000 Data size: 443485104 Basic
stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (inv_item_sk is not null and inv_warehouse_sk is
not null) (type: boolean)
Statistics: Num rows: 37584000 Data size: 593821104 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: inv_item_sk (type: int), inv_warehouse_sk
(type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 37584000 Data size: 593821104 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 37584000 Data size: 593821104
Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: int), _col2 (type:
int), _col3 (type: int)
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: household_demographics
filterExpr: ((hd_buy_potential = '1001-5000') and hd_demo_sk
is not null) (type: boolean)
Statistics: Num rows: 7200 Data size: 770400 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((hd_buy_potential = '1001-5000') and hd_demo_sk
is not null) (type: boolean)
Statistics: Num rows: 1440 Data size: 138240 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: hd_demo_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1440 Data size: 5760 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 1440 Data size: 5760 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: customer_demographics
filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is not
null) (type: boolean)
Statistics: Num rows: 1920800 Data size: 718379200 Basic
stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((cd_marital_status = 'M') and cd_demo_sk is not
null) (type: boolean)
Statistics: Num rows: 274400 Data size: 24421600 Basic
stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cd_demo_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 274400 Data size: 1097600 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 274400 Data size: 1097600 Basic
stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 8
Map Operator Tree:
TableScan
alias: d1
filterExpr: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: d_date_sk is not null (type: boolean)
Statistics: Num rows: 73049 Data size: 7158802 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int), d_date (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 73049 Data size: 7158802 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 73049 Data size: 7158802 Basic
stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized
Map 9
Map Operator Tree:
TableScan
alias: item
filterExpr: i_item_sk is not null (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: i_item_sk is not null (type: boolean)
Statistics: Num rows: 48000 Data size: 9024000 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int), i_item_desc (type:
string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 48000 Data size: 9024000 Basic
stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 48000 Data size: 9024000 Basic
stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0), count(VALUE._col1),
count(VALUE._col2)
keys: KEY._col0 (type: string), KEY._col1 (type: string),
KEY._col2 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
key expressions: _col5 (type: bigint), _col0 (type: string),
_col1 (type: string), _col2 (type: int)
sort order: -+++
Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE
Column stats: COMPLETE
TopN Hash Memory Usage: 0.04
value expressions: _col3 (type: bigint), _col4 (type: bigint)
Reducer 4
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey1 (type: string),
KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: int), VALUE._col0
(type: bigint), VALUE._col1 (type: bigint), KEY.reducesinkkey0 (type: bigint)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE
Column stats: COMPLETE
Limit
Number of rows: 100
Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE
Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 311 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)