Re: question about catalyst and TreeNode

2016-03-15 Thread Michael Armbrust
Trees are immutable, and TreeNode takes care of copying unchanged parts of
the tree when you are doing transformations.  As a result, even if you do
construct a DAG with the Dataset API, the first transformation will turn it
back into a tree.
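
To make the point above concrete, here is a minimal toy sketch in Python. It is not Catalyst code: the `Node` class, the `transform_down` helper, and the `push_limit` rule are all hypothetical stand-ins, assumed only for illustration. It builds a diamond in which one `Project` node has two parents, then pushes a limit down on one side only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable toy plan node (hypothetical, not Catalyst's TreeNode)."""
    name: str
    children: Tuple["Node", ...] = ()


def transform_down(node: Node, rule: Callable[[Node], Node]) -> Node:
    """Apply `rule` to a node, then recurse into the children of the result.

    Nothing is mutated: a new Node is built whenever a child changed, and
    the existing object is returned when nothing below it changed.
    """
    after = rule(node)
    new_children = tuple(transform_down(c, rule) for c in after.children)
    if all(n is o for n, o in zip(new_children, after.children)):
        return after
    return Node(after.name, new_children)


def push_limit(node: Node) -> Node:
    """Hypothetical rule: rewrite Limit(Project(x)) as Project(Limit(x))."""
    if node.name == "Limit" and node.children[0].name == "Project":
        project = node.children[0]
        return Node("Project", (Node("Limit", project.children),))
    return node


scan = Node("Scan")
shared = Node("Project", (scan,))
# Diamond: `shared` has two parents, so by object identity this is a DAG.
plan = Node("Union", (Node("Limit", (shared,)), Node("Filter", (shared,))))

result = transform_down(plan, push_limit)
left, right = result.children
```

In this sketch the rewrite builds a new `Project(Limit(Scan))` for the left branch only. The `Filter` branch still points at the original, untouched `Project`, and the input `plan` is unchanged, so the other parent never sees the pushed-down limit. The toy version reuses unchanged nodes instead of copying them, which is harmless here because nodes are never mutated.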

The only exception to this rule is when we share the results of plans after
an Exchange operator.  This is the last step before execution and sometimes
turns the query into a DAG to avoid redundant computation.
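
As a rough illustration of that last step, the toy sketch below replaces structurally equal `Exchange` subtrees with a single shared object, so the plan becomes a DAG by object identity. This is a sketch under stated assumptions, not Spark's implementation: `Node`, `reuse_exchanges`, and `branch` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable toy plan node; equality and hashing are structural."""
    name: str
    children: Tuple["Node", ...] = ()


def reuse_exchanges(node: Node, seen: Dict[Node, Node]) -> Node:
    """Rebuild the plan bottom-up, sharing structurally equal Exchange subtrees."""
    new_children = tuple(reuse_exchanges(c, seen) for c in node.children)
    rebuilt = Node(node.name, new_children)
    if rebuilt.name == "Exchange":
        # setdefault returns the first equal subtree already recorded,
        # or records and returns this one if none was seen before.
        return seen.setdefault(rebuilt, rebuilt)
    return rebuilt


def branch() -> Node:
    """Build a fresh Exchange(Scan) subtree on every call."""
    return Node("Exchange", (Node("Scan"),))


# Both sides of the join are equal in structure but are distinct objects.
plan = Node("Join", (branch(), branch()))
shared_plan = reuse_exchanges(plan, {})
```

Before the pass, the two children of `Join` compare equal but are separate objects. Afterwards both child slots hold the same `Exchange` object, which is the sense in which the plan stops being a tree.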

On Tue, Mar 15, 2016 at 9:01 AM, Koert Kuipers wrote:

> i am trying to understand some parts of the catalyst optimizer. but i
> struggle with one bigger picture issue:
>
> LogicalPlan extends TreeNode, which makes sense since the optimizations
> rely on tree transformations like transformUp and transformDown.
>
> but how can a LogicalPlan be a tree? isn't it really a DAG? if it is
> possible to create diamond-like operator dependencies, then assumptions
> made in tree transformations could be wrong? for example pushing a limit
> operator down into a child sounds safe, but if that same child is also used
> by another operator (so it has another parent, no longer a tree) then it's
> not safe at all.
>
> what am i missing here?
> thanks! koert
>


question about catalyst and TreeNode

2016-03-15 Thread Koert Kuipers
i am trying to understand some parts of the catalyst optimizer. but i
struggle with one bigger picture issue:

LogicalPlan extends TreeNode, which makes sense since the optimizations
rely on tree transformations like transformUp and transformDown.

but how can a LogicalPlan be a tree? isn't it really a DAG? if it is
possible to create diamond-like operator dependencies, then assumptions
made in tree transformations could be wrong? for example pushing a limit
operator down into a child sounds safe, but if that same child is also used
by another operator (so it has another parent, no longer a tree) then it's
not safe at all.

what am i missing here?
thanks! koert