[ https://issues.apache.org/jira/browse/HIVE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltan Haindrich updated HIVE-21795:
------------------------------------
       Resolution: Fixed
    Fix Version/s: 4.0.0
           Status: Resolved  (was: Patch Available)

pushed to master. Thank you Jesus for reviewing the changes!

> Rollup summary row might be missing when a mapjoin is happening on a partitioned table
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-21795
>                 URL: https://issues.apache.org/jira/browse/HIVE-21795
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21795.01.patch, HIVE-21795.02.patch, HIVE-21795.02.patch, HIVE-21795.02.patch, HIVE-21795.03.patch
>
>
> * join between 2 tables; the larger is partitioned
> * mapjoin is selected
> * dpp is sending events from the small table to the large one
> * rollup: summary row is missing if dpp removes all input partitions
> the following should have a 1 row result.
> {code}
> set hive.auto.convert.join=true;
> drop table if exists store_sales_s0;
> drop table if exists store_s0;
> CREATE TABLE store_sales_s0 (ss_item_sk int,payload string,payload2 string,payload3 string) PARTITIONED BY (ss_store_sk int) stored as orc TBLPROPERTIES( 'transactional'='false');
> CREATE TABLE store_s0 (s_item_sk int,s_store_sk int,s_state string) stored as orc TBLPROPERTIES( 'transactional'='false');
> insert into store_s0 values
> 	(1,10,'XX'),
> 	(2,20,'AA'),
> 	(3,30,'ZZ')
> ;
> insert into store_sales_s0 partition(ss_store_sk=9) values (1,'xxx','xxx','xxx'),(2,'xxx','xxx','xxx'),(3,'xxx','xxx','xxx'),(4,'xxx','xxx','xxx'),(5,'xxx','xxx','xxx');
> insert into store_sales_s0 partition(ss_store_sk=39) values (1,'xxx','xxx','xxx'),(2,'xxx','xxx','xxx'),(3,'xxx','xxx','xxx'),(4,'xxx','xxx','xxx'),(5,'xxx','xxx','xxx');
> explain select grouping(s_state) from store_s0, store_sales_s0 where ss_store_sk = s_store_sk and s_state in ('SD','FL', 'MI', 'LA', 'MO', 'SC') group by rollup(ss_item_sk, s_state) order by s_state;
> select grouping(s_state) from store_s0, store_sales_s0 where ss_store_sk = s_store_sk and s_state in ('SD','FL', 'MI', 'LA', 'MO', 'SC') group by rollup(ss_item_sk, s_state) order by s_state;
> {code}
> explain:
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Edges:
>         Map 2 <- Map 1 (BROADCAST_EDGE)
> [...]
> #### A masked pattern was here ####
>       Vertices:
>         Map 1
>             Map Operator Tree:
> [...]
>                     Dynamic Partitioning Event Operator
>                       Target column: ss_store_sk (int)
>                       Target Input: store_sales_s0
>                       Partition key expr: ss_store_sk
>                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
>                       Target Vertex: Map 2
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
> [...]
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales_s0
>                   filterExpr: ss_store_sk is not null (type: boolean)
>                   Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: ss_item_sk (type: int), ss_store_sk (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       keys:
>                         0 _col0 (type: int)
>                         1 _col1 (type: int)
>                       outputColumnNames: _col1, _col2
>                       input vertices:
>                         0 Map 1
>                       Statistics: Num rows: 10 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
> [...]
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
> [...]
> {code}
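
For readers wondering why the query above "should have a 1 row result" even though the join matches nothing: ROLLUP(ss_item_sk, s_state) expands to the grouping sets (ss_item_sk, s_state), (ss_item_sk) and (), and the empty grouping set produces one grand-total row even when its input is empty. The following is only a rough Python model of that semantics, not Hive code; the function name `rollup_grouping_rows` and the in-memory data layout are made up for illustration, and the 0/1 convention (1 = column is aggregated away) is assumed to match the `grouping()` behaviour of recent Hive releases.

```python
def rollup_grouping_rows(rows, keys):
    """Model GROUP BY ROLLUP(keys): one output row per group per grouping set."""
    out = []
    # ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()
    for n in range(len(keys), -1, -1):
        active = keys[:n]
        if n == 0:
            # The empty grouping set always has exactly one group,
            # even if there are no input rows at all.
            groups = {()}
        else:
            groups = {tuple(r[k] for k in active) for r in rows}
        for g in sorted(groups):
            row = dict(zip(active, g))
            for k in keys[n:]:
                row[k] = None  # rolled-up columns appear as NULL
            # Assumed convention: 1 = column is aggregated away, 0 = grouped on
            row["grouping"] = {k: 0 if k in active else 1 for k in keys}
            out.append(row)
    return out


# Data from the reproduction case in the issue description.
store = [(1, 10, "XX"), (2, 20, "AA"), (3, 30, "ZZ")]  # (s_item_sk, s_store_sk, s_state)
sales_partitions = {9: [1, 2, 3, 4, 5], 39: [1, 2, 3, 4, 5]}  # ss_store_sk -> ss_item_sk values
wanted_states = {"SD", "FL", "MI", "LA", "MO", "SC"}

# No store row passes the s_state filter, so the join output is empty
# (this is also why dpp can prune every partition of the large table).
joined = [
    {"ss_item_sk": item, "s_state": state}
    for (_, store_sk, state) in store
    if state in wanted_states
    for item in sales_partitions.get(store_sk, [])
]

result = rollup_grouping_rows(joined, ["ss_item_sk", "s_state"])
grouping_s_state = [r["grouping"]["s_state"] for r in result]
print(grouping_s_state)
```

In this model the only surviving row is the grand total from the empty grouping set, which is the summary row the issue reports as missing once dpp has removed all input partitions.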