[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels:   (was: ready-to-commit)

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-30 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6997:

Reviewer: Aman Sinha

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-29 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Labels: ready-to-commit  (was: )

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Attachment: 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6997) Semijoin is changing the join ordering for some tpcds queries.

2019-01-23 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-6997:
---
Attachment: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill

> Semijoin is changing the join ordering for some tpcds queries.
> --
>
> Key: DRILL-6997
> URL: https://issues.apache.org/jira/browse/DRILL-6997
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: 240aa5f8-24c4-e678-8d42-0fc06e5d2465.sys.drill, 
> 240abc6d-b816-5320-93b1-2a07d850e734.sys.drill
>
>
> TPCDS query 95 runs 50% slower with semi-join enabled compared to semi-join 
> disabled at scale factor 100. It runs 100% slower at scale factor 1000. This 
> issue was introduced with commit 71809ca6216d95540b2a41ce1ab2ebb742888671. 
> DRILL-6798: Planner changes to support semi-join.
> {code:java}
> with ws_wh as
>  (select ws1.ws_order_number,ws1.ws_warehouse_sk wh1,ws2.ws_warehouse_sk wh2
>  from web_sales ws1,web_sales ws2
>  where ws1.ws_order_number = ws2.ws_order_number
>  and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>  [_LIMITA] select [_LIMITB]
>  count(distinct ws_order_number) as "order count"
>  ,sum(ws_ext_ship_cost) as "total shipping cost"
>  ,sum(ws_net_profit) as "total net profit"
>  from
>  web_sales ws1
>  ,date_dim
>  ,customer_address
>  ,web_site
>  where
>  d_date between '[YEAR]-[MONTH]-01' and
>  (cast('[YEAR]-[MONTH]-01' as date) + 60 days)
>  and ws1.ws_ship_date_sk = d_date_sk
>  and ws1.ws_ship_addr_sk = ca_address_sk
>  and ca_state = '[STATE]'
>  and ws1.ws_web_site_sk = web_site_sk
>  and web_company_name = 'pri'
>  and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  and ws1.ws_order_number in (select wr_order_number
>  from web_returns,ws_wh
>  where wr_order_number = ws_wh.ws_order_number)
>  order by count(distinct ws_order_number)
>  [_LIMITC];
> {code}
>  I have attached two profiles. 240abc6d-b816-5320-93b1-2a07d850e734 has 
> semi-join enabled. 240aa5f8-24c4-e678-8d42-0fc06e5d2465 has semi-join 
> disabled. Both are executed with commit id 
> 6267185823c4c50ab31c029ee5b4d9df2fc94d03 and scale factor 100.
> The plan with semi-join enabled has moved the first hash join:
> and ws1.ws_order_number in (select ws_order_number
>  from ws_wh)
>  It used to be on the build side of the first HJ on the left hand side 
> (04-05). It is now on the build side of the fourth HJ on the left hand side 
> (01-13).
> The plan with semi-join enabled has a hash_partition_sender (operator 05-00) 
> that takes 10 seconds to execute. But all the fragments take about the same 
> amount of time.
> The plan with semi-join enabled has two HJ that process 1B rows while the 
> plan with semi-joins disabled has one HJ that processes 1B rows.
> The plan with semi-join enabled has several senders and receivers that wait 
> more than 10 seconds, (00-07, 01-07, 03-00, 04-00, 07-00, 08-00, 14-00, 
> 17-00). When disabled, there is no operator waiting more than 10 seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)