Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387

> Why pushdown is happening in logical optimization and not during query planning. My first instinct would be to have the optimizer get operators as close to the leaves as possible and then fuse (or push down) as we convert to physical plan. I'm probably missing something.

I think there are two reasons, but I'm not fully convinced by either one:

* [`computeStats`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L232) is defined on logical plans, so the result of filter push-down needs to be a logical plan if we want to use accurate stats for a scan. My goal here is to make sure we decide to broadcast relations based on the actual scan stats, not the table-level stats. Maybe there's another way to do this?
* One of the tests for DSv2 ends up invoking the push-down rule twice, which made me think about whether that should be valid. I think it probably should be. For example, what if a plan has nodes that can all be pushed, but they aren't in the right order? Or what if a projection wasn't pushed through a filter because of a rule problem, but it can still be pushed down? Incremental fusing during optimization might be an extensible way to handle odd cases, or it may be useless. I'm not sure yet.

It would be great to hear your perspective on these.
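To make the second point concrete, here is a minimal toy sketch (not Spark's actual rule or `LogicalPlan` API; all names are hypothetical) of why re-running a push-down rule should be valid: applying it to an already-pushed subtree is a no-op, so stacked operators that weren't in the right order on the first pass can still be fused on a later pass.

```scala
// Toy plan nodes, standing in for Catalyst logical operators.
sealed trait Plan
case class Scan(table: String, pushedFilters: Seq[String] = Nil) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan

object PushDownFilters {
  // Push any Filter sitting directly on a Scan into the scan itself;
  // everything else just recurses. Re-applying the rule to a plan with
  // nothing left to push returns the plan unchanged (idempotent at fixpoint).
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, Scan(table, pushed)) => Scan(table, pushed :+ cond)
    case Filter(cond, child)               => Filter(cond, apply(child))
    case Project(cols, child)              => Project(cols, apply(child))
    case leaf                              => leaf
  }
}

// Two stacked filters: one pass only pushes the inner filter, because the
// outer one doesn't touch the scan yet. A second pass pushes the rest.
val plan  = Filter("a > 0", Filter("b = 1", Scan("t")))
val once  = PushDownFilters(plan)  // Filter("a > 0", Scan("t", Seq("b = 1")))
val twice = PushDownFilters(once)  // Scan("t", Seq("b = 1", "a > 0"))
```

The same shape also illustrates the first point: `twice` is still a (toy) logical node, so a stats hook defined on logical plans could report the post-push-down scan's size rather than the table-level size.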