[jira] [Commented] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

2019-08-02 Thread Ruben Quesada Lopez (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898622#comment-16898622
 ] 

Ruben Quesada Lopez commented on CALCITE-481:
---------------------------------------------

Ok, [~julianhyde].

> Add "Spool" operator, to allow re-use of relational expressions
> ---------------------------------------------------------------
>
> Key: CALCITE-481
> URL: https://issues.apache.org/jira/browse/CALCITE-481
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Jesus Camacho Rodriguez
> Priority: Major
> Fix For: 1.21.0
>
>
> If a sub-tree occurs more than once in a query, an efficient plan would 
> probably evaluate it once and have two readers read the same data. We propose a 
> "Spool" relational expression for this purpose.
> Spool would have one input, the expression that populates it.
> In the VolcanoPlanner, any RelNode can already have multiple consumers (each 
> of which sees the same row type and the same data) but an optimal plan does 
> not typically include multiple uses of the same node, so most implementors 
> (e.g. EnumerableRelImplementor) would just not notice, and generate the same 
> code twice. Having an explicit Spool would alert the implementor to re-use 
> the result.
> We do not prescribe a mechanism for implementing Spool as a physical 
> operator. A job that populates a temporary table is one possible mechanism.
> As part of this case, we should implement Spool in Enumerable convention, and 
> use it to evaluate some test queries.
> The other reason to implement Spool is costing. The cost of a Spool with N 
> consumers is typically something like A + B * N, where A, the fixed cost, is 
> significantly larger than B, the re-play cost.
> Volcano's dynamic programming model does not make it easy to account for 
> re-use. There are approaches in academia based on integer linear programming; 
> see e.g. http://www.slideshare.net/INRIA-OAK/plreuse and 
> https://hal.inria.fr/hal-01353891/document.
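
The costing argument in the description can be illustrated with a small sketch. It is not Calcite code: the class and method names are hypothetical, and the comparison against re-evaluating the sub-tree once per consumer (at an assumed cost C each time) is an added assumption, used only to show where a Spool with fixed cost A and re-play cost B starts to pay off.

```java
public class SpoolCost {
  /** Cost of a Spool with n consumers: fixed cost a plus re-play cost b per consumer. */
  static double spoolCost(double a, double b, int n) {
    return a + b * n;
  }

  /** Assumed cost of not spooling: the sub-tree (cost c) is evaluated once per consumer. */
  static double recomputeCost(double c, int n) {
    return c * n;
  }

  /** True if spooling is strictly cheaper than re-evaluating the sub-tree n times. */
  static boolean spoolIsCheaper(double a, double b, double c, int n) {
    return spoolCost(a, b, n) < recomputeCost(c, n);
  }

  public static void main(String[] args) {
    // Illustrative numbers only: A = 100, B = 1, C = 40.
    System.out.println(spoolIsCheaper(100, 1, 40, 1)); // false: 101 vs 40
    System.out.println(spoolIsCheaper(100, 1, 40, 3)); // true: 103 vs 120
  }
}
```

Because A is much larger than B, a Spool with a single consumer is a net loss under these assumptions, and the benefit only appears once N is large enough to amortize the fixed cost.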



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

2019-04-05 Thread Ruben Quesada Lopez (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810857#comment-16810857
 ] 

Ruben Quesada Lopez commented on CALCITE-481:
---------------------------------------------

FYI, a very, very basic (and experimental) Spool API has been proposed in this 
[PR|https://github.com/apache/calcite/pull/1020], since a simple TableSpool was 
needed in the implementation of CALCITE-2812.



[jira] [Commented] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

2019-03-26 Thread Danny Chan (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801636#comment-16801636
 ] 

Danny Chan commented on CALCITE-481:
------------------------------------

Cool topic, I just found it for the first time. I would like to contribute to 
this; [~R0ger], maybe we can work together.



[jira] [Commented] (CALCITE-481) Add "Spool" operator, to allow re-use of relational expressions

2019-03-25 Thread Roger Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800496#comment-16800496
 ] 

Roger Shi commented on CALCITE-481:
-----------------------------------

To find the cheapest plan in Volcano's dynamic programming model, I'd like to 
propose a possible solution. The result may not be truly the cheapest, but it 
at least takes re-use into account.

The key point is to detect re-use when computing the cumulative cost of a 
RelNode. Currently the cumulative cost is computed recursively, without 
tracking which RelNodes have already been visited. If we return the 
cost-contributing RelNodes along with the cost itself, it would be easy to 
detect the re-used RelNodes.

For instance,
{code:java}
UnionAll
input(0) ==  Filter - Scan
input(1) ==  Filter - Scan
{code}
When we compute the cumulative cost of input(0), the RelNodes "Filter" and 
"Scan" are returned along with it. When we then visit input(1), we find that 
"Filter" and "Scan" have been visited before, so their cost should not be added 
again. The final cumulative cost of "UnionAll" is cost(UnionAll) + 
cost(Filter) + cost(Scan).

Compared with integer linear programming, the benefit of this method is its 
low computational complexity.
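
The idea above can be sketched as follows. This is a minimal, self-contained illustration, not Calcite's actual cost computation: `Node` is a hypothetical stand-in for a RelNode, the self costs are made-up numbers, and a visited set replaces the "return the contributing RelNodes along with the cost" bookkeeping, which has the same effect for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReuseAwareCost {
  /** Hypothetical stand-in for a RelNode: a name, its own (non-cumulative) cost, and inputs. */
  static final class Node {
    final String name;
    final double selfCost;
    final List<Node> inputs;

    Node(String name, double selfCost, Node... inputs) {
      this.name = name;
      this.selfCost = selfCost;
      this.inputs = Arrays.asList(inputs);
    }
  }

  /** Sums the self cost of every distinct node reachable from root; shared nodes count once. */
  static double cumulativeCost(Node root) {
    // Identity-based set: two nodes are "the same" only if they are the same object.
    Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
    return cumulativeCost(root, visited);
  }

  private static double cumulativeCost(Node node, Set<Node> visited) {
    if (!visited.add(node)) {
      return 0d; // already counted via another consumer, so contribute nothing
    }
    double cost = node.selfCost;
    for (Node input : node.inputs) {
      cost += cumulativeCost(input, visited);
    }
    return cost;
  }

  public static void main(String[] args) {
    Node scan = new Node("Scan", 100d);
    Node filter = new Node("Filter", 10d, scan);
    // Both inputs of UnionAll are the same Filter - Scan sub-tree.
    Node unionAll = new Node("UnionAll", 5d, filter, filter);
    System.out.println(cumulativeCost(unionAll)); // 115.0, not 225.0
  }
}
```

Each node is visited at most once per call, so the traversal is linear in the number of distinct nodes and edges, which is where the low complexity relative to integer linear programming comes from.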
