Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387

> Why pushdown is happening in logical optimization and not during query planning. My first instinct would be to have the optimizer get operators as close to the leaves as possible and then fuse (or push down) as we convert to physical plan. I'm probably missing something.

I think there are two reasons, but I'm not fully convinced by either one:

* [`computeStats`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L232) is defined on logical plans, so the result of filter push-down needs to be a logical plan if we want to use accurate stats for a scan. My goal here is to make sure we decide to broadcast relations based on the actual scan stats, not the table-level stats. Maybe there's another way to do this?
* One of the tests for DSv2 ends up invoking the push-down rule twice, which made me think about whether that should be valid. I think it probably should be. For example, what if a plan has nodes that can all be pushed, but they aren't in the right order? Or what if a projection wasn't pushed through a filter because of a rule problem, but it can still be pushed down? Incremental fusing during optimization might be an extensible way to handle odd cases, or it may be useless. I'm not sure yet.

It would be great to hear your perspective on these.
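To make the second point concrete, here is a minimal toy sketch (not Spark's actual rule or `LogicalPlan` API; all names are hypothetical) of why re-running a push-down rule should be valid: applying it to an already-pushed subtree is a no-op, so stacked operators that weren't in the right order on the first pass can still be fused on a later pass.

```scala
// Toy plan nodes, standing in for Catalyst logical operators.
sealed trait Plan
case class Scan(table: String, pushedFilters: Seq[String] = Nil) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan

object PushDownFilters {
  // Push any Filter sitting directly on a Scan into the scan itself;
  // everything else just recurses. Re-applying the rule to a plan with
  // nothing left to push returns the plan unchanged (idempotent at fixpoint).
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, Scan(table, pushed)) => Scan(table, pushed :+ cond)
    case Filter(cond, child)               => Filter(cond, apply(child))
    case Project(cols, child)              => Project(cols, apply(child))
    case leaf                              => leaf
  }
}

// Two stacked filters: one pass only pushes the inner filter, because the
// outer one doesn't touch the scan yet. A second pass pushes the rest.
val plan  = Filter("a > 0", Filter("b = 1", Scan("t")))
val once  = PushDownFilters(plan)  // Filter("a > 0", Scan("t", Seq("b = 1")))
val twice = PushDownFilters(once)  // Scan("t", Seq("b = 1", "a > 0"))
```

The same shape also illustrates the first point: `twice` is still a (toy) logical node, so a stats hook defined on logical plans could report the post-push-down scan's size rather than the table-level size.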