[jira] [Updated] (HAWQ-830) Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times

2016-07-14 Thread Goden Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Goden Yao updated HAWQ-830:
---
Fix Version/s: backlog

> Wrong result in CTE query due to CTE is treated as init plan by planner and 
> evaluated multiple times
> 
>
> Key: HAWQ-830
> URL: https://issues.apache.org/jira/browse/HAWQ-830
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.0.0.0-incubating
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
> Fix For: backlog
>
>
> In CTE query, if the CTE itself is referenced multiple times, it should be 
> evaluated only once and then be used multiple time. However, it is treated as 
> init plan and evaluated multiple times in hawq 1.x and 2.0. This has two 
> issues here:
> 1. If the query in CTE is "volatile" (i.e., select volatile function) or has 
> side effect (create/drop object in database), it may generate wrong result
> 2. The performance of the query is not so efficient since the query in CTE is 
> evaluated multiple times.
> Here is the steps to reproduce:
> 1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the 
> result is incorrect
> {noformat}
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   random   |  random
> ---+---
>  0.519145511090755 | 0.751198637764901
> (1 row)
> EXPLAIN
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   QUERY PLAN
> --
>  Nested Loop  (cost=0.04..0.77 rows=20 width=16)
>->  Result  (cost=0.01..0.02 rows=1 width=0)
>  InitPlan
>->  Result  (cost=0.00..0.01 rows=1 width=0)
>->  Materialize  (cost=0.03..0.09 rows=6 width=8)
>  ->  Result  (cost=0.01..0.02 rows=1 width=0)
>InitPlan
>  ->  Result  (cost=0.00..0.01 rows=1 width=0)
>  Settings:  default_hash_table_bucket_number=6
>  Optimizer status: legacy query optimizer
> (10 rows)
> {noformat}
> 2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the 
> result is i
> {noformat}
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   random   |  random
> ---+---
>  0.989214501809329 | 0.989214501809329
> (1 row)
> EXPLAIN
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
> QUERY PLAN
> --
>  Nested Loop  (cost=0.01..0.06 rows=1 width=16)
>CTE r
>  ->  Result  (cost=0.00..0.01 rows=1 width=0)
>->  CTE Scan on r r1  (cost=0.00..0.02 rows=1 width=8)
>->  CTE Scan on r r2  (cost=0.00..0.02 rows=1 width=8)
> (5 rows){noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HAWQ-830) Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times

2016-06-28 Thread Ruilong Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HAWQ-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruilong Huo updated HAWQ-830:
-
Affects Version/s: 2.0.0

> Wrong result in CTE query due to CTE is treated as init plan by planner and 
> evaluated multiple times
> 
>
> Key: HAWQ-830
> URL: https://issues.apache.org/jira/browse/HAWQ-830
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.0.0
>Reporter: Ruilong Huo
>Assignee: Ruilong Huo
>
> In CTE query, if the CTE itself is referenced multiple times, it should be 
> evaluated only once and then be used multiple time. However, it is treated as 
> init plan and evaluated multiple times in hawq 1.x and 2.0. This has two 
> issues here:
> 1. If the query in CTE is "volatile" (i.e., select volatile function) or has 
> side effect (create/drop object in database), it may generate wrong result
> 2. The performance of the query is not so efficient since the query in CTE is 
> evaluated multiple times.
> Here is the steps to reproduce:
> 1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the 
> result is incorrect
> {noformat}
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   random   |  random
> ---+---
>  0.519145511090755 | 0.751198637764901
> (1 row)
> EXPLAIN
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   QUERY PLAN
> --
>  Nested Loop  (cost=0.04..0.77 rows=20 width=16)
>->  Result  (cost=0.01..0.02 rows=1 width=0)
>  InitPlan
>->  Result  (cost=0.00..0.01 rows=1 width=0)
>->  Materialize  (cost=0.03..0.09 rows=6 width=8)
>  ->  Result  (cost=0.01..0.02 rows=1 width=0)
>InitPlan
>  ->  Result  (cost=0.00..0.01 rows=1 width=0)
>  Settings:  default_hash_table_bucket_number=6
>  Optimizer status: legacy query optimizer
> (10 rows)
> {noformat}
> 2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the 
> result is i
> {noformat}
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
>   random   |  random
> ---+---
>  0.989214501809329 | 0.989214501809329
> (1 row)
> EXPLAIN
> WITH r AS (SELECT random())
> SELECT r1.*, r2.*
> FROM r AS r1, r AS r2;
> QUERY PLAN
> --
>  Nested Loop  (cost=0.01..0.06 rows=1 width=16)
>CTE r
>  ->  Result  (cost=0.00..0.01 rows=1 width=0)
>->  CTE Scan on r r1  (cost=0.00..0.02 rows=1 width=8)
>->  CTE Scan on r r2  (cost=0.00..0.02 rows=1 width=8)
> (5 rows){noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)