Mostafa Mokhtar created HIVE-10537: -------------------------------------- Summary: ConvertJoinMapJoin should take into account JOINNOCONDITIONALTASKTHRESHOLD when packing tables in a Vertex Key: HIVE-10537 URL: https://issues.apache.org/jira/browse/HIVE-10537 Project: Hive Issue Type: Bug Reporter: Mostafa Mokhtar Assignee: Vikram Dixit K
Vertex "Map 2" has inputs that add up to 803MB while NCTS is 610MB. Query {code} set hive.auto.convert.join.noconditionaltask.size=640000000; explain select i_item_desc ,w_warehouse_name ,d1.d_week_seq ,count(case when p_promo_sk is null then 1 else 0 end) no_promo ,count(case when p_promo_sk is not null then 1 else 0 end) promo ,count(*) total_cnt from catalog_sales join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk) join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk) join item on (item.i_item_sk = catalog_sales.cs_item_sk) join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk) join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk) join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk) join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk) join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk) left outer join promotion on (catalog_sales.cs_promo_sk=promotion.p_promo_sk) left outer join catalog_returns on (catalog_returns.cr_item_sk = catalog_sales.cs_item_sk and catalog_returns.cr_order_number = catalog_sales.cs_order_number) where d1.d_week_seq = d2.d_week_seq and inv_quantity_on_hand < cs_quantity and d3.d_date > d1.d_date + 5 and hd_buy_potential = '1001-5000' and d1.d_year = 2001 and hd_buy_potential = '1001-5000' and cd_marital_status = 'M' and d1.d_year = 2001 group by i_item_desc,w_warehouse_name,d1.d_week_seq order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq limit 100 {code} Plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE), Map 11 (BROADCAST_EDGE), Map 12 (BROADCAST_EDGE), Map 13 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) Reducer 3 <- Map 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) DagName: jenkins_20150429044838_bd0a1cf7-235f-48db-9321-c13899fed7b3:1 Vertices: Map 1 Map Operator Tree: TableScan alias: catalog_returns Statistics: Num rows: 28798881 Data size: 2942039156 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cr_item_sk (type: int), cr_order_number (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 28798881 Data size: 230391048 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int) Statistics: Num rows: 28798881 Data size: 230391048 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: warehouse filterExpr: w_warehouse_sk is not null (type: boolean) Statistics: Num rows: 6 Data size: 6166 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: w_warehouse_sk is not null (type: boolean) Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: w_warehouse_sk (type: int), w_warehouse_name (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 6 Data size: 618 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Map 11 Map Operator Tree: TableScan alias: d1 filterExpr: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((d_year = 2001) and d_date_sk is not null) and d_week_seq is not null) (type: boolean) Statistics: Num rows: 652 Data size: 69112 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int), d_date (type: string), d_week_seq (type: int) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: int) Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: catalog_sales Partition key expr: cs_sold_date_sk Statistics: Num rows: 652 Data size: 66504 Basic stats: COMPLETE Column stats: COMPLETE Target column: cs_sold_date_sk Target Vertex: Map 2 Execution mode: vectorized Map 12 Map Operator Tree: TableScan alias: d1 filterExpr: (d_date_sk is not null and d_week_seq is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (d_date_sk is not null and d_week_seq is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int), d_week_seq (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int) Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int) outputColumnNames: _col0 Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator keys: _col0 (type: int) mode: hash outputColumnNames: _col0 Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Dynamic Partitioning Event Operator Target Input: inventory Partition key expr: inv_date_sk Statistics: Num rows: 73049 Data size: 584392 Basic stats: COMPLETE Column stats: COMPLETE Target column: inv_date_sk Target Vertex: Map 5 Execution mode: vectorized Map 13 Map Operator Tree: TableScan alias: promotion Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: p_promo_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 450 Data size: 1800 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 2 Map Operator Tree: TableScan alias: catalog_sales filterExpr: (((cs_item_sk is not null and cs_bill_hdemo_sk is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null) (type: boolean) Statistics: Num rows: 286549727 Data size: 37743959324 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (((cs_item_sk is not null and cs_bill_hdemo_sk is not null) and cs_bill_cdemo_sk is not null) and cs_ship_date_sk is not null) (type: boolean) Statistics: Num rows: 284396955 Data size: 9086416352 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cs_ship_date_sk (type: int), cs_bill_cdemo_sk (type: int), cs_bill_hdemo_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: int), cs_order_number (type: int), cs_quantity (type: int), cs_sold_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7 Statistics: Num rows: 284396955 Data size: 9086416352 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col3 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col9, _col10, _col11 input vertices: 1 Map 5 Statistics: Num rows: 275157677926 Data size: 12106937828744 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Filter Operator predicate: (_col10 < _col6) (type: boolean) Statistics: Num rows: 91719225975 Data size: 4035645942900 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: int), _col1 (type: int), _col11 (type: int), _col2 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int), _col7 (type: int), _col9 (type: int) outputColumnNames: _col0, _col1, _col11, _col2, _col3, _col4, _col5, _col7, _col9 Statistics: Num rows: 91719225975 Data size: 3301892135100 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col2 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col7, _col9, _col11 input vertices: 1 Map 6 Statistics: Num rows: 18343845888 Data size: 587003068416 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col3, _col4, _col5, _col7, _col9, _col11 input vertices: 1 Map 7 Statistics: Num rows: 2620549632 Data size: 73375389696 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col4, _col5, _col7, _col9, _col11, _col17 input vertices: 1 Map 8 Statistics: Num rows: 2620549632 Data size: 309224856576 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col3 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col4, _col5, _col7, _col9, _col11, _col17, _col19 input vertices: 1 Map 9 Statistics: Num rows: 2620549632 Data size: 791405988864 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col9 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col4, _col5, _col7, _col11, _col17, _col19, _col21 input vertices: 1 Map 10 Statistics: Num rows: 2620549632 Data size: 1040358203904 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col7 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col4, _col5, _col11, _col17, _col19, _col21, _col23, _col24 input vertices: 1 Map 11 Statistics: Num rows: 2925682123 Data size: 1436509922393 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Map Join Operator condition map: Inner Join 0 to 1 keys: 0 _col11 (type: int), _col24 (type: int) 1 _col0 (type: int), _col1 (type: int) outputColumnNames: _col3, _col4, _col5, _col17, _col19, _col21, _col23, _col24 input vertices: 1 Map 12 Statistics: Num rows: 248727 Data size: 121130049 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Filter Operator predicate: (UDFToDouble(_col17) > (UDFToDouble(_col23) + 5.0)) (type: boolean) Statistics: Num rows: 82909 Data size: 40376683 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col21 (type: string), _col19 (type: string), _col24 (type: int), _col3 (type: int), _col4 (type: int), _col5 (type: int) outputColumnNames: _col13, _col15, _col22, _col3, _col4, _col5 Statistics: Num rows: 82909 Data size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Left Outer Join0 to 1 keys: 0 _col4 (type: int) 1 _col0 (type: int) outputColumnNames: _col3, _col5, _col13, _col15, _col22, _col28 input vertices: 1 Map 13 Statistics: Num rows: 82909 Data size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Select Operator expressions: _col13 (type: string), _col15 (type: string), _col22 (type: int), _col28 (type: int), _col3 (type: int), _col5 (type: int) outputColumnNames: _col13, _col15, _col22, _col28, _col3, _col5 Statistics: Num rows: 82909 Data size: 24789791 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Right Outer Join0 to 1 keys: 0 _col0 (type: int), _col1 (type: int) 1 _col3 (type: int), _col5 (type: int) outputColumnNames: _col15, _col17, _col24, _col30 input vertices: 0 Map 1 Statistics: Num rows: 2237 Data size: 650967 Basic stats: COMPLETE Column stats: COMPLETE HybridGraceHashJoin: true Select Operator expressions: _col17 (type: string), _col15 (type: string), _col24 (type: int), CASE WHEN (_col30 is null) THEN (1) ELSE (0) END (type: int), CASE WHEN (_col30 is not null) THEN (1) ELSE (0) END (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4 Statistics: Num rows: 2237 Data size: 650967 Basic stats: COMPLETE Column stats: COMPLETE Group By Operator aggregations: count(_col3), count(_col4), count() keys: _col0 (type: string), _col1 (type: string), _col2 (type: int) mode: hash outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: int) sort order: +++ Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: int) Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col3 (type: bigint), _col4 (type: bigint), _col5 (type: bigint) Execution mode: vectorized Map 5 Map Operator Tree: TableScan alias: inventory filterExpr: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean) Statistics: Num rows: 37584000 Data size: 443485104 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (inv_item_sk is not null and inv_warehouse_sk is not null) (type: boolean) Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: inv_item_sk (type: int), inv_warehouse_sk (type: int), inv_quantity_on_hand (type: int), inv_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3 Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 37584000 Data size: 593821104 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int) Execution mode: vectorized Map 6 Map Operator Tree: TableScan alias: household_demographics filterExpr: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 7200 Data size: 770400 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((hd_buy_potential = '1001-5000') and hd_demo_sk is not null) (type: boolean) Statistics: Num rows: 1440 Data size: 138240 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: hd_demo_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 1440 Data size: 5760 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 7 Map Operator Tree: TableScan alias: customer_demographics filterExpr: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean) Statistics: Num rows: 1920800 Data size: 718379200 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((cd_marital_status = 'M') and cd_demo_sk is not null) (type: boolean) Statistics: Num rows: 274400 Data size: 24421600 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: cd_demo_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 274400 Data size: 1097600 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 8 Map Operator Tree: TableScan alias: d1 filterExpr: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: d_date_sk is not null (type: boolean) Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int), d_date (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 73049 Data size: 7158802 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Map 9 Map Operator Tree: TableScan alias: item filterExpr: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_desc (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 48000 Data size: 9024000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized Reducer 3 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0), count(VALUE._col1), count(VALUE._col2) keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: int) mode: mergepartial outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col5 (type: bigint), _col0 (type: string), _col1 (type: string), _col2 (type: int) sort order: -+++ Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE TopN Hash Memory Usage: 0.04 value expressions: _col3 (type: bigint), _col4 (type: bigint) Reducer 4 Reduce Operator Tree: Select Operator expressions: KEY.reducesinkkey1 (type: string), KEY.reducesinkkey2 (type: string), KEY.reducesinkkey3 (type: int), VALUE._col0 (type: bigint), VALUE._col1 (type: bigint), KEY.reducesinkkey0 (type: bigint) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 100 Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 311 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: 100 Processor Tree: ListSink {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)