[ https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hanumath Rao Maduri updated DRILL-6997: --------------------------------------- Labels: ready-to-commit (was: ) > Semijoin is changing the join ordering for some tpcds queries. > -------------------------------------------------------------- > > Key: DRILL-6997 > URL: https://issues.apache.org/jira/browse/DRILL-6997 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 1.15.0 > Reporter: Hanumath Rao Maduri > Assignee: Hanumath Rao Maduri > Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, > 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill > > > TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join > disabled at scale factor 100. It runs 100% slower at scale factor 1000. This > issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. > DRILL-6798: Planner changes to support semi-join. > {code:java} > with ws_wh as > (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2 > from web_sales ws1,web_sales ws2 > where ws1.ws_order_number = ws2.ws_order_number > and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk) > [_LIMITA] select [_LIMITB] > count(distinct ws_order_number) as "order count" > ,sum(ws_ext_ship_cost) as "total shipping cost" > ,sum(ws_net_profit) as "total net profit" > from > web_sales ws1 > ,date_dim > ,customer_address > ,web_site > where > d_date between '[YEAR]-[MONTH]-01' and > (cast('[YEAR]-[MONTH]-01' as date) + 60 days) > and ws1.ws_ship_date_sk = d_date_sk > and ws1.ws_ship_addr_sk = ca_address_sk > and ca_state = '[STATE]' > and ws1.ws_web_site_sk = web_site_sk > and web_company_name = 'pri' > and ws1.ws_order_number in (select ws_order_number > from ws_wh) > and ws1.ws_order_number in (select wr_order_number > from web_returns,ws_wh > where wr_order_number = ws_wh.ws_order_number) > order by count(distinct ws_order_number) > [_LIMITC]; > {code} > I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has > semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join > disabled. Both are executed with commit id > 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100. > The plan with semi-join enabled has moved the first hash join: > and ws1.ws_order_number in (select ws_order_number > from ws_wh) > It used to be on the build side of the first HJ on the left hand side > (04-05). It is now on the build side of the fourth HJ on the left hand side > (01-13). > The plan with semi-join enabled has a hash_partition_sender (operator 05-00) > that takes 10 seconds to execute. But all the fragments take about the same > amount of time. > The plan with semi-join enabled has two HJ that process 1B rows while the > plan with semi-joins disabled has one HJ that processes 1B rows. > The plan with semi-join enabled has several senders and receivers that wait > more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, > 17-00). When disabled, there is no operator waiting more than 10 seconds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)