[ https://issues.apache.org/jira/browse/HIVE-24388?focusedWorklogId=522688&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522688 ]
ASF GitHub Bot logged work on HIVE-24388:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Dec/20 12:08
            Start Date: 10/Dec/20 12:08
    Worklog Time Spent: 10m
      Work Description: kgyrtkirk commented on a change in pull request #1750:
URL: https://github.com/apache/hive/pull/1750#discussion_r540116764


##########
File path: ql/src/test/results/clientpositive/llap/swo_event_merge.q.out
##########

@@ -0,0 +1,291 @@
+PREHOOK: query: drop table if exists x1_store_sales
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_store_sales
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists x1_date_dim
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_date_dim
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: drop table if exists x1_item
+PREHOOK: type: DROPTABLE
+POSTHOOK: query: drop table if exists x1_item
+POSTHOOK: type: DROPTABLE
+PREHOOK: query: create table x1_store_sales
+(
+  ss_item_sk int
+)
+partitioned by (ss_sold_date_sk int)
+stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@x1_store_sales
+POSTHOOK: query: create table x1_store_sales
+(
+  ss_item_sk int
+)
+partitioned by (ss_sold_date_sk int)
+stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@x1_store_sales
+PREHOOK: query: create table x1_date_dim
+(
+  d_date_sk int,
+  d_month_seq int,
+  d_year int,
+  d_moy int
+)
+stored as orc
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: create table x1_date_dim
+(
+  d_date_sk int,
+  d_month_seq int,
+  d_year int,
+  d_moy int
+)
+stored as orc
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@x1_date_dim
+PREHOOK: query: insert into x1_date_dim values (1,1,2000,2),
+        (2,2,2001,2)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: insert into x1_date_dim values (1,1,2000,2),
+        (2,2,2001,2)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_date_dim
+POSTHOOK: Lineage: x1_date_dim.d_date_sk SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_month_seq SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_moy SCRIPT []
+POSTHOOK: Lineage: x1_date_dim.d_year SCRIPT []
+PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) values (1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=1) values (1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=1).ss_item_sk SCRIPT []
+PREHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) values (2)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2
+POSTHOOK: query: insert into x1_store_sales partition (ss_sold_date_sk=2) values (2)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=2
+POSTHOOK: Lineage: x1_store_sales PARTITION(ss_sold_date_sk=2).ss_item_sk SCRIPT []
+PREHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) update statistics set(
+'numRows'='123456',
+'rawDataSize'='1234567')
+PREHOOK: type: ALTERTABLE_UPDATEPARTSTATS
+PREHOOK: Input: default@x1_store_sales
+PREHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: query: alter table x1_store_sales partition (ss_sold_date_sk=1) update statistics set(
+'numRows'='123456',
+'rawDataSize'='1234567')
+POSTHOOK: type: ALTERTABLE_UPDATEPARTSTATS
+POSTHOOK: Input: default@x1_store_sales
+POSTHOOK: Input: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: Output: default@x1_store_sales@ss_sold_date_sk=1
+PREHOOK: query: alter table x1_date_dim update statistics set(
+'numRows'='56',
+'rawDataSize'='81449')
+PREHOOK: type: ALTERTABLE_UPDATETABLESTATS
+PREHOOK: Input: default@x1_date_dim
+PREHOOK: Output: default@x1_date_dim
+POSTHOOK: query: alter table x1_date_dim update statistics set(
+'numRows'='56',
+'rawDataSize'='81449')
+POSTHOOK: type: ALTERTABLE_UPDATETABLESTATS
+POSTHOOK: Input: default@x1_date_dim
+POSTHOOK: Output: default@x1_date_dim
+PREHOOK: query: explain
+select count(*) cnt
+ from
+   x1_store_sales s
+   ,x1_date_dim d
+ where
+   1=1
+   and s.ss_sold_date_sk = d.d_date_sk
+   and d.d_year=2000
+union
+select s.ss_item_sk*d_date_sk
+ from
+   x1_store_sales s
+   ,x1_date_dim d
+ where
+   1=1
+   and s.ss_sold_date_sk = d.d_date_sk
+   and d.d_year=2001
+ group by s.ss_item_sk*d_date_sk
+PREHOOK: type: QUERY
+PREHOOK: Input: default@x1_date_dim
+PREHOOK: Input: default@x1_store_sales
+PREHOOK: Input: default@x1_store_sales@ss_sold_date_sk=1
+PREHOOK: Input: default@x1_store_sales@ss_sold_date_sk=2
+#### A masked pattern was here ####
+POSTHOOK: query: explain
+select count(*) cnt
+ from
+   x1_store_sales s
+   ,x1_date_dim d
+ where
+   1=1
+   and s.ss_sold_date_sk = d.d_date_sk
+   and d.d_year=2000
+union
+select s.ss_item_sk*d_date_sk
+ from
+   x1_store_sales s
+   ,x1_date_dim d
+ where
+   1=1
+   and s.ss_sold_date_sk = d.d_date_sk
+   and d.d_year=2001
+ group by s.ss_item_sk*d_date_sk
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@x1_date_dim
+POSTHOOK: Input: default@x1_store_sales
+POSTHOOK: Input: default@x1_store_sales@ss_sold_date_sk=1
+POSTHOOK: Input: default@x1_store_sales@ss_sold_date_sk=2
+#### A masked pattern was here ####
+Plan optimized by CBO.
+
+Vertex dependency in root stage
+Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE)
+Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE), Union 4 (CONTAINS)
+Reducer 5 <- Union 4 (SIMPLE_EDGE)
+Reducer 6 <- Map 1 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE)
+Reducer 7 <- Reducer 6 (SIMPLE_EDGE), Union 4 (CONTAINS)
+
+Stage-0
+  Fetch Operator
+    limit:-1
+    Stage-1
+      Reducer 5 vectorized, llap
+      File Output Operator [FS_89]
+        Group By Operator [GBY_88] (rows=1 width=8)
+          Output:["_col0"],keys:KEY._col0
+        <-Union 4 [SIMPLE_EDGE]
+          <-Reducer 3 [CONTAINS] vectorized, llap
+            Reduce Output Operator [RS_87]
+              PartitionCols:_col0
+              Group By Operator [GBY_86] (rows=1 width=8)
+                Output:["_col0"],keys:_col0
+                Group By Operator [GBY_85] (rows=1 width=8)
+                  Output:["_col0"],aggregations:["count(VALUE._col0)"]
+                <-Reducer 2 [CUSTOM_SIMPLE_EDGE] llap
+                  PARTITION_ONLY_SHUFFLE [RS_11]
+                    Group By Operator [GBY_10] (rows=1 width=8)
+                      Output:["_col0"],aggregations:["count()"]
+                      Merge Join Operator [MERGEJOIN_51] (rows=1728398 width=8)
+                        Conds:RS_71._col0=RS_77._col0(Inner)
+                      <-Map 1 [SIMPLE_EDGE] vectorized, llap
+                        SHUFFLE [RS_71]
+                          PartitionCols:_col0
+                          Select Operator [SEL_69] (rows=123457 width=4)
+                            Output:["_col0"]
+                            Filter Operator [FIL_68]
+                              predicate:ss_sold_date_sk is not null
+                              TableScan [TS_0] (rows=123457 width=14)
+                                default@x1_store_sales,s,Tbl:COMPLETE,Col:COMPLETE,Output:["ss_item_sk"]

Review comment:
       I think right now we don't have a sanity-check filter right before the TS to limit the shuffled data size to the previous amount; but reducing the number of scans is beneficial - IIRC in the q23 query the scanned partitions were the same, so the benefit was real. If we made these available the same way as we have the SJ filters, then we could for sure avoid shuffling more data.
       Note: I think it would enable some further opportunities if we changed the SJ data-transmission method from RS to EVENTOP - that way we wouldn't have to worry about parallel edges anymore...
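The SJ (semijoin) runtime filters mentioned above reduce the amount of data shuffled to a join by building a filter from the small side's join keys and applying it before the large side's rows leave the map stage. A minimal sketch of that idea in plain Python (not Hive internals; the function names and the plain-set filter are illustrative assumptions, real Hive uses min/max and bloom filters):

```python
# Sketch of semijoin reduction: collect the dimension side's join keys,
# then drop fact rows that cannot possibly join BEFORE they are shuffled.

def build_sj_filter(dim_rows, key):
    # In Hive this would be a bloom filter; a plain set suffices here.
    return {row[key] for row in dim_rows}

def shuffle_with_filter(fact_rows, key, sj_filter):
    # Only rows whose join key survives the filter get shuffled to the join.
    return [row for row in fact_rows if row[key] in sj_filter]

# Toy data mirroring the tables in the test above.
dim = [{"d_date_sk": 1, "d_year": 2000}]
fact = [{"ss_sold_date_sk": 1, "ss_item_sk": 1},
        {"ss_sold_date_sk": 2, "ss_item_sk": 2}]

keys = build_sj_filter(dim, "d_date_sk")
shuffled = shuffle_with_filter(fact, "ss_sold_date_sk", keys)
# Only the row that can actually join is shuffled; the other is pruned early.
```

The comment's point is that after merging scans, no such filter sits in front of the shared TS, so the merged scan may shuffle more rows than the two separate scans did.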
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 522688)
    Time Spent: 40m  (was: 0.5h)

> Enhance swo optimizations to merge EventOperators
> -------------------------------------------------
>
>                 Key: HIVE-24388
>                 URL: https://issues.apache.org/jira/browse/HIVE-24388
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {code}
> EVENT1->TS1
> EVENT2->TS2
> {code}
> are not merged because a TS may only handle the first event properly;
> sending two events would cause one of them to be ignored



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
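The failure mode described in the issue can be modeled in a few lines of plain Python (a hypothetical sketch, not Hive code; the `TableScan` class and its event-handling rule are illustrative assumptions based on the description above):

```python
# Model of the merge hazard: a table scan that only applies the FIRST
# dynamic-pruning event it receives. Merging two EVENT->TS pairs into one
# shared scan means the second event is silently ignored.

class TableScan:
    def __init__(self, partitions):
        self.partitions = set(partitions)
        self.event_applied = False

    def on_pruning_event(self, keep):
        if self.event_applied:
            return False          # second event is dropped on the floor
        self.partitions &= set(keep)
        self.event_applied = True
        return True

# Unmerged: each event lands on its own scan -- both prunings take effect.
ts1, ts2 = TableScan({1, 2}), TableScan({1, 2})
ts1.on_pruning_event({1})
ts2.on_pruning_event({2})

# Naively merged: both events target the single shared scan.
merged = TableScan({1, 2})
merged.on_pruning_event({1})              # applied
accepted = merged.on_pruning_event({2})   # ignored -- partition 2 is lost
```

Under this model the merged scan keeps only the partitions of the first event, which is why the optimization has to be enhanced to merge the EventOperators themselves rather than just rewiring both onto one TS.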