Multi-query optimization is one of the big challenges of our field. Examples of multi-queries:

* an INSERT statement that writes into a table but also updates an index;
* a DAG that represents an ETL/ELT job;
* a query that produces several data sets (say, a list of invoices for orders and a list of products that need to be restocked);
* a query that uses intermediate results more than once.
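To make the invoices/restock example concrete, here is a minimal, purely illustrative sketch in Python. It is not Calcite's API: the `Node` class, the `evaluate` function and the sample `ORDERS`/`PRODUCTS` tables are all hypothetical. One intermediate result (orders joined to products) feeds two outputs, so the plan is a DAG rather than a tree, and the evaluator computes the shared node only once.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory tables, for illustration only.
ORDERS = [
    {"order_id": 1, "customer": "acme", "product": "bolt", "qty": 10},
    {"order_id": 2, "customer": "zenith", "product": "nut", "qty": 5},
    {"order_id": 3, "customer": "acme", "product": "nut", "qty": 20},
]
PRODUCTS = {
    "bolt": {"price": 0.5, "stock": 100},
    "nut": {"price": 0.25, "stock": 15},
}


@dataclass
class Node:
    """One operator in a (hypothetical) dataflow graph; inputs may be shared."""
    name: str
    fn: Callable[..., list]
    inputs: list = field(default_factory=list)


def evaluate(sinks):
    """Evaluate every sink (output), computing each shared node exactly once."""
    cache = {}        # keyed by node identity, so a shared node is reused
    evaluations = []  # order in which nodes were actually computed

    def run(node):
        if id(node) in cache:
            return cache[id(node)]
        args = [run(i) for i in node.inputs]
        evaluations.append(node.name)
        cache[id(node)] = node.fn(*args)
        return cache[id(node)]

    return {s.name: run(s) for s in sinks}, evaluations


# Shared intermediate result: each order line joined to its product.
lines = Node("order_lines",
             lambda: [{**o, **PRODUCTS[o["product"]]} for o in ORDERS])

# Output 1: an invoice amount per order.
invoices = Node("invoices", lambda rows: [
    {"order_id": r["order_id"], "customer": r["customer"],
     "amount": r["qty"] * r["price"]}
    for r in rows
], [lines])


# Output 2: products whose total ordered quantity exceeds stock.
def restock_fn(rows):
    demand = {}
    for r in rows:
        demand[r["product"]] = demand.get(r["product"], 0) + r["qty"]
    return sorted(p for p, q in demand.items() if q > PRODUCTS[p]["stock"])


restock = Node("restock", restock_fn, [lines])

results, evaluations = evaluate([invoices, restock])
print(results["restock"])  # ['nut']
print(evaluations)         # ['order_lines', 'invoices', 'restock']
```

Had each output been planned as a separate tree, the join would appear twice; treating the two outputs as one unit is what lets the shared node be computed once.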
Common features of multi-queries are multiple output data sets and re-used intermediate results. In other words, the dataflow graph is a DAG rather than just a tree.

I think it would be useful to frame the problem by extending SQL so that such multi-queries can be represented as a single unit. From that would follow extensions to relational algebra and improvements to planner algorithms and cost models.

I would like to hear people's thoughts before I log a Jira case with a sketch of the problem.

Related work:

* https://issues.apache.org/jira/browse/CALCITE-481 Spool operator
* https://issues.apache.org/jira/browse/CALCITE-1440 Multiple RelNodes
* https://issues.apache.org/jira/browse/CALCITE-4568 Incremental query optimization
* https://issues.apache.org/jira/browse/CALCITE-129 Recursive queries

Julian