Mostafa Mokhtar created HIVE-8747: ------------------------------------- Summary: Estimate number of rows for table with 0 rows overflows resulting in an in-efficient plan Key: HIVE-8747 URL: https://issues.apache.org/jira/browse/HIVE-8747 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth J Priority: Critical Fix For: 0.14.0
Query {code} select count(*) from web_sales ,date_dim ,ship_mode where web_sales.ws_sold_date_sk = date_dim.d_date_sk and web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk and d_year = 2002 and sm_carrier in ('DIAMOND','AIRBORNE') {code} Explain {code} STAGE PLANS: Stage: Stage-1 Tez Edges: Map 1 <- Map 4 (BROADCAST_EDGE) Map 4 <- Map 3 (BROADCAST_EDGE) Reducer 2 <- Map 1 (SIMPLE_EDGE) DagName: mmokhtar_20141105180404_59e6fb65-529f-4eaa-9446-7f34d12bffac:30 Vertices: Map 1 Map Operator Tree: TableScan alias: ship_mode filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 0 Data size: 45 Basic stats: PARTIAL Column stats: COMPLETE Filter Operator predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: sm_ship_mode_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 1 keys: 0 _col1 (type: int) 1 _col0 (type: int) input vertices: 0 Map 4 Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Select Operator Statistics: Num rows: 9223372036854775807 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Group By Operator aggregations: count() mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col0 (type: bigint) Execution mode: vectorized Map 3 Map Operator Tree: TableScan alias: date_dim filterExpr: ((d_year = 2002) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: ((d_year = 2002) and d_date_sk is not null) (type: boolean) Statistics: Num rows: 652 Data size: 5216 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: d_date_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 652 Data size: 2608 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Map 4 Map Operator Tree: TableScan alias: web_sales filterExpr: (ws_sold_date_sk is not null and ws_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 143966864 Data size: 19577477788 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: (ws_sold_date_sk is not null and ws_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 143948856 Data size: 1151518824 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: ws_sold_date_sk (type: int), ws_ship_mode_sk (type: int) outputColumnNames: _col0, _col1 Statistics: Num rows: 143948856 Data size: 1151518824 Basic stats: COMPLETE Column stats: COMPLETE Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {_col1} 1 keys: 0 _col0 (type: int) 1 _col0 (type: int) outputColumnNames: _col1 input vertices: 1 Map 3 Statistics: Num rows: 1284818 Data size: 5139272 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col1 (type: int) sort order: + Map-reduce partition columns: _col1 (type: int) Statistics: Num rows: 1284818 Data size: 5139272 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized Reducer 2 Reduce Operator Tree: Group By Operator aggregations: count(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Execution mode: vectorized Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)