[ https://issues.apache.org/jira/browse/CALCITE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803368#comment-17803368 ]
Jay Narale edited comment on CALCITE-6188 at 1/5/24 12:55 AM: -------------------------------------------------------------- We in Presto have recently supported CTE materialization [https://github.com/prestodb/presto/pull/20887. |https://github.com/prestodb/presto/pull/20887]The implementation heavily uses a new type of logical planNode called a sequence node which makes sure that deadlocks don't exist. Though our current implementation is not 100% optimized, we have a similar goal of optimizing all DAGs together. was (Author: jaynarale): We in Presto have recently supported CTE materialization [https://github.com/prestodb/presto/pull/20887. |https://github.com/prestodb/presto/pull/20887]The implementation heavily uses a new type of logical planNode called a sequence node. Though our current implementation is not 100% optimized, we have a similar goal of optimizing all DAGs together. > Multi-query optimization > ------------------------ > > Key: CALCITE-6188 > URL: https://issues.apache.org/jira/browse/CALCITE-6188 > Project: Calcite > Issue Type: Improvement > Reporter: Julian Hyde > Priority: Major > > Devise extensions to SQL so that queries with multiple input tables, multiple > intermediate tables, and multiple output data sets (result sets and DML > operations) can be defined in one SQL statement, optimized, and executed > atomically. > There are many flavors of multi-query optimization, depending on whether each > occurrence of "multiple" in the previous paragraph is replaced with 0, 1 or > "several". Our goal, here, is to allow them all to be expressed. We can then > devise planning strategies that work for particular flavors. > Examples of multi-queries: > * {*}Multiple DML outputs{*}. An INSERT statement that writes into a table > but also updates an index, > * {*}Multiple DML outputs, complex intermediate tables{*}. A DAG that > represents an ETL/ELT job; > * {*}Multiple query outputs{*}. A query that produces several data sets (say > a list of invoices for > orders and a list of products that need to be restocked); > * {*}DAG query{*}. A query that uses intermediate results more than once. > See discussion in the [Multi-query optimization email > thread|https://lists.apache.org/thread/mcdqwrtpx0os54t2nn9vtk17spkp5o5k]. > Here are some SQL examples. > We add a new keyword {{MULTI}} that represents a statement whose output > contains multiple data sets and DML operations, each with a unique name. For > intermediate results, we use the existing {{WITH}} clause. > h3. 1. Multi DML > Read from one or more tables, write to one or more tables. > An example is inserting into a table and also an index on that table > (represented as a sorted table). > {code:sql} > WITH > empDelta AS ( > SELECT * FROM emp2 > EXCEPT > SELECT * FROM emp) > MULTI > insertEmp AS ( > INSERT INTO emp > TABLE empDelta), > insertEmpDeptno AS ( > MERGE empDeptno AS e > USING TABLE empDelta AS d > ON e.deptno = d.deptno > WHEN NOT MATCHED THEN INSERT VALUES (deptno)); > {code} > h3. 2. Query that creates temporary table and uses it more than once > {code:sql} > WITH > temp AS ( > SELECT * > FROM emp AS e > JOIN dept USING (deptno) > WHERE e.job = 'MANAGER' > OR d.location = 'CHICAGO') > SELECT deptno, > (SELECT AVG(sal) > FROM temp AS t > WHERE t.deptno = e.deptno) AS deptAvgSal, > (SELECT AVG(sal) > FROM temp AS t > WHERE t.job = e.job) AS jobAvgSal > FROM e > WHERE e.deptno IN (10, 20); > {code} > h3. 3. Query whose optimal plan might use a temporary table > This query produces the same result as the previous query. There is a common > relational expression, so the optimizer should consider a DAG plan with a > reified intermediate result. > {code:sql} > SELECT deptno, > (SELECT AVG(e2.sal) > FROM emp AS e2 > JOIN dept AS d USING (deptno) > WHERE (e2.job = 'MANAGER' > OR d.location = 'CHICAGO') > AND e2.deptno = e.deptno) AS deptAvgSal, > (SELECT AVG(e3.sal) > FROM emp AS e3 > JOIN dept AS d USING (deptno) > WHERE (e3.job = 'MANAGER' > OR d.location = 'CHICAGO') > AND e3.job = e.job) AS jobAvgSal > FROM e > WHERE e.deptno IN (10, 20); > {code} > h3. 4. Query that produces several data sets > {code:sql} > WITH > newOrders AS ( > SELECT * > FROM orders > WHERE orderDate > DATE '2023-01-25') > MULTI > invoices AS ( > SELECT customerName, SUM(amount) > FROM newOrders > GROUP BY customerName), > restock AS ( > SELECT productId > FROM inventory > WHERE productId IN ( > SELECT productId FROM newOrders) > AND itemsOnHand < 10); > {code} > h3. 5. Query with a complex DAG, multiple output data sets and one DML > {code:sql} > WITH > t0 AS ( > SELECT * FROM t WHERE x > 5), > t00 AS ( > SELECT * FROM t0 WHERE y < 10), > t000 AS ( > SELECT x, MIN(y) FROM t00 GROUP BY x), > t1 AS ( > SELECT * FROM t WHERE y > 3), > t10 AS ( > SELECT * FROM t1 WHERE x < 8), > t2 AS ( > SELECT * FROM t000 > INTERSECT > SELECT * FROM t10), > t3 AS ( > SELECT * FROM u > INTERSECT > SELECT * FROM t00) > MULTI > q0 AS (SELECT * FROM t3), > q1 AS (SELECT * FROM t2), > d0 AS (DELETE FROM v WHERE v.x IN (SELECT x FROM t1)); > {code} > The data flow is the following DAG: > {noformat} > u --------------------+ > +--------------> t3 (q0) > +--> t0 --> t00 --+ > | +--> t000 --+ > t --+ +--> t2 (q1) > | +--> t10 -----------+ > +--> t1 --+ > +----------------------> d0 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)