[ https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laljo John Pullokkaran updated HIVE-7985: ----------------------------------------- Status: Patch Available (was: Open) > With CBO enabled cross product is generated when a subquery is present > ---------------------------------------------------------------------- > > Key: HIVE-7985 > URL: https://issues.apache.org/jira/browse/HIVE-7985 > Project: Hive > Issue Type: Bug > Components: CBO > Affects Versions: 0.14.0 > Reporter: Mostafa Mokhtar > Assignee: Laljo John Pullokkaran > Priority: Critical > Fix For: 0.14.0 > > Attachments: HIVE-7985.1.patch, HIVE-7985.patch > > > This is a regression introduced in the latest build of the CBO branch. > Removing the subquery for item will remove the cross products > Query > {code} > select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, > date_dim, item where item.i_item_id in (select i.i_item_id from item i where > i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and > ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by > i_item_id; > {code} > {code} > Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a > cross product > OK > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Map 1 <- Map 3 (BROADCAST_EDGE) > Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE) > Reducer 5 <- Map 4 (SIMPLE_EDGE) > DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: item > filterExpr: (true and i_item_id is not null) (type: boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: i_item_id is not null (type: boolean) > Statistics: Num rows: 231000 Data size: 331931080 Basic > stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {i_item_sk} {i_item_id} > 1 {d_date_sk} > keys: > 0 > 1 > outputColumnNames: _col0, _col1, _col25 > input vertices: > 1 Map 3 > Statistics: Num rows: 254100 Data size: 365124192 Basic > stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: int), _col1 (type: string), > _col25 (type: int) > outputColumnNames: _col0, _col1, _col25 > Statistics: Num rows: 254100 Data size: 365124192 > Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 254100 Data size: 365124192 > Basic stats: COMPLETE Column stats: NONE > value expressions: _col0 (type: int), _col1 (type: > string), _col25 (type: int) > Execution mode: vectorized > Map 2 > Map Operator Tree: > TableScan > alias: i > filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') > and i_item_id is not null) (type: boolean) > Statistics: Num rows: 462000 Data size: 663862160 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((i_color) IN ('purple', 'burlywood', > 'indian') and i_item_id is not null) (type: boolean) > Statistics: Num rows: 115500 Data size: 165965540 Basic > stats: COMPLETE Column stats: NONE > Select Operator > expressions: i_item_id (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 115500 Data size: 165965540 Basic > stats: COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 115500 Data size: 165965540 > Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: string) > Statistics: Num rows: 115500 Data size: 165965540 > Basic stats: COMPLETE Column stats: NONE > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: date_dim > filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: > boolean) > Statistics: Num rows: 73049 Data size: 81741831 Basic > stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((d_year = 2001) and (d_moy = 1)) (type: > boolean) > Statistics: Num rows: 18262 Data size: 20435178 Basic > stats: COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 18262 Data size: 20435178 Basic > stats: COMPLETE Column stats: NONE > value expressions: d_date_sk (type: int) > Execution mode: vectorized > Map 4 > Map Operator Tree: > TableScan > alias: store_sales > Statistics: Num rows: 82510879939 Data size: 7203833257964 > Basic stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Inner Join 0 to 1 > condition expressions: > 0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price} > 1 {_col0} {_col1} {_col25} > keys: > 0 > 1 > outputColumnNames: _col0, _col2, _col15, _col27, _col28, > _col52 > input vertices: > 1 Map 1 > Statistics: Num rows: 90761969664 Data size: > 7924217282560 Basic stats: COMPLETE Column stats: NONE > Filter Operator > predicate: ((_col2 = _col27) and (_col0 = _col52)) > (type: boolean) > Statistics: Num rows: 22690492416 Data size: > 1981054320640 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col15 (type: float), _col28 (type: > string) > outputColumnNames: _col15, _col59 > Statistics: Num rows: 22690492416 Data size: > 1981054320640 Basic stats: COMPLETE Column stats: NONE > Map Join Operator > condition map: > Left Semi Join 0 to 1 > condition expressions: > 0 {_col15} {_col59} > 1 > keys: > 0 _col59 (type: string) > 1 _col0 (type: string) > outputColumnNames: _col15, _col59 > input vertices: > 1 Map 2 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col59 (type: string), _col15 (type: > float) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Group By Operator > aggregations: sum(_col1) > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0, _col1 > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > Reduce Output Operator > key expressions: _col0 (type: string) > sort order: + > Map-reduce partition columns: _col0 (type: > string) > Statistics: Num rows: 24959541248 Data size: > 2179159818240 Basic stats: COMPLETE Column stats: NONE > value expressions: _col1 (type: double) > Execution mode: vectorized > Reducer 5 > Reduce Operator Tree: > Group By Operator > aggregations: sum(VALUE._col0) > keys: KEY._col0 (type: string) > mode: mergepartial > outputColumnNames: _col0, _col1 > Statistics: Num rows: 12479770624 Data size: 1089579909120 > Basic stats: COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string), _col1 (type: double) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 12479770624 Data size: 1089579909120 > Basic stats: COMPLETE Column stats: NONE > File Output Operator > compressed: false > Statistics: Num rows: 12479770624 Data size: > 1089579909120 Basic stats: COMPLETE Column stats: NONE > table: > input format: org.apache.hadoop.mapred.TextInputFormat > output format: > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat > serde: > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe > Stage: Stage-0 > Fetch Operator > limit: -1 > Processor Tree: > ListSink > {code} > No cross product generated for this query > {code} > select i_item_id,sum(ss_ext_sales_price) total_sales > from > store_sales, > date_dim, > item > where i_color in ('purple','burlywood','indian') > and ss_item_sk = i_item_sk > and ss_sold_date_sk = d_date_sk > and d_year = 2001 > and d_moy = 1 > group by i_item_id; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)