[jira] [Updated] (HAWQ-830) Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times
[ https://issues.apache.org/jira/browse/HAWQ-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Goden Yao updated HAWQ-830: --- Fix Version/s: backlog > Wrong result in CTE query due to CTE is treated as init plan by planner and > evaluated multiple times > > > Key: HAWQ-830 > URL: https://issues.apache.org/jira/browse/HAWQ-830 > Project: Apache HAWQ > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.0.0.0-incubating >Reporter: Ruilong Huo >Assignee: Ruilong Huo > Fix For: backlog > > > In CTE query, if the CTE itself is referenced multiple times, it should be > evaluated only once and then be used multiple time. However, it is treated as > init plan and evaluated multiple times in hawq 1.x and 2.0. This has two > issues here: > 1. If the query in CTE is "volatile" (i.e., select volatile function) or has > side effect (create/drop object in database), it may generate wrong result > 2. The performance of the query is not so efficient since the query in CTE is > evaluated multiple times. > Here is the steps to reproduce: > 1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the > result is incorrect > {noformat} > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > random | random > ---+--- > 0.519145511090755 | 0.751198637764901 > (1 row) > EXPLAIN > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > QUERY PLAN > -- > Nested Loop (cost=0.04..0.77 rows=20 width=16) >-> Result (cost=0.01..0.02 rows=1 width=0) > InitPlan >-> Result (cost=0.00..0.01 rows=1 width=0) >-> Materialize (cost=0.03..0.09 rows=6 width=8) > -> Result (cost=0.01..0.02 rows=1 width=0) >InitPlan > -> Result (cost=0.00..0.01 rows=1 width=0) > Settings: default_hash_table_bucket_number=6 > Optimizer status: legacy query optimizer > (10 rows) > {noformat} > 2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the > result is i > {noformat} > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > random | random > ---+--- > 0.989214501809329 | 0.989214501809329 > (1 row) > EXPLAIN > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > QUERY PLAN > -- > Nested Loop (cost=0.01..0.06 rows=1 width=16) >CTE r > -> Result (cost=0.00..0.01 rows=1 width=0) >-> CTE Scan on r r1 (cost=0.00..0.02 rows=1 width=8) >-> CTE Scan on r r2 (cost=0.00..0.02 rows=1 width=8) > (5 rows){noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HAWQ-830) Wrong result in CTE query due to CTE is treated as init plan by planner and evaluated multiple times
[ https://issues.apache.org/jira/browse/HAWQ-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruilong Huo updated HAWQ-830: - Affects Version/s: 2.0.0 > Wrong result in CTE query due to CTE is treated as init plan by planner and > evaluated multiple times > > > Key: HAWQ-830 > URL: https://issues.apache.org/jira/browse/HAWQ-830 > Project: Apache HAWQ > Issue Type: Bug > Components: Optimizer >Affects Versions: 2.0.0 >Reporter: Ruilong Huo >Assignee: Ruilong Huo > > In CTE query, if the CTE itself is referenced multiple times, it should be > evaluated only once and then be used multiple time. However, it is treated as > init plan and evaluated multiple times in hawq 1.x and 2.0. This has two > issues here: > 1. If the query in CTE is "volatile" (i.e., select volatile function) or has > side effect (create/drop object in database), it may generate wrong result > 2. The performance of the query is not so efficient since the query in CTE is > evaluated multiple times. > Here is the steps to reproduce: > 1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the > result is incorrect > {noformat} > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > random | random > ---+--- > 0.519145511090755 | 0.751198637764901 > (1 row) > EXPLAIN > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > QUERY PLAN > -- > Nested Loop (cost=0.04..0.77 rows=20 width=16) >-> Result (cost=0.01..0.02 rows=1 width=0) > InitPlan >-> Result (cost=0.00..0.01 rows=1 width=0) >-> Materialize (cost=0.03..0.09 rows=6 width=8) > -> Result (cost=0.01..0.02 rows=1 width=0) >InitPlan > -> Result (cost=0.00..0.01 rows=1 width=0) > Settings: default_hash_table_bucket_number=6 > Optimizer status: legacy query optimizer > (10 rows) > {noformat} > 2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the > result is i > {noformat} > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > random | random > ---+--- > 0.989214501809329 | 0.989214501809329 > (1 row) > EXPLAIN > WITH r AS (SELECT random()) > SELECT r1.*, r2.* > FROM r AS r1, r AS r2; > QUERY PLAN > -- > Nested Loop (cost=0.01..0.06 rows=1 width=16) >CTE r > -> Result (cost=0.00..0.01 rows=1 width=0) >-> CTE Scan on r r1 (cost=0.00..0.02 rows=1 width=8) >-> CTE Scan on r r2 (cost=0.00..0.02 rows=1 width=8) > (5 rows){noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)