GitHub user henryr commented on the issue:

https://github.com/apache/spark/pull/21049

I might be a bit of a hardliner on this, but I think it's correct to eliminate the `ORDER BY` from common table expressions (e.g. MSSQL agrees with me, see [this link](https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017#guidelines-for-creating-and-using-common-table-expressions)).

However, given the principle of least surprise, I agree it might be a good idea to at least start with scalar and nested subqueries, and leave inline views for another day. That might be a bit harder to do (I think the rule will need a whitelist of operators below which it's OK to eliminate sorts), and in general I think there'll be some missed opportunities, but it's a start :)

Alternatively, we could extend the analyzed logical plan to explicitly mark the different subquery types (i.e. have an `InlineView` node, a `NestedSubquery` node and so on). That would make these optimizations easier to express, but I have some reservations about the semantics of introducing those nodes.

What do you think, @dilipbiswal / @gatorsmile?
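To make the whitelist idea concrete, here is a rough toy sketch in Python. It is not Catalyst code: the `Plan` class, the operator names used as labels, and the contents of `ORDER_INSENSITIVE` are all assumptions for illustration. It drops a sort only when the sort is reached from a scalar or nested subquery boundary through whitelisted operators, and it leaves inline views alone.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a logical plan node (not Spark's LogicalPlan)."""
    op: str
    children: Tuple["Plan", ...] = ()


# Assumed whitelist: operators whose output rows do not depend on input order.
# Which real operators belong here is exactly the open question.
ORDER_INSENSITIVE = {"Project", "Filter"}

# Subquery kinds handled for now; "InlineView" is deliberately left out.
SUBQUERY_BOUNDARIES = {"ScalarSubquery", "NestedSubquery"}


def drop_sorts_below(plan: Plan) -> Plan:
    """Remove Sort nodes reachable from `plan` via whitelisted operators only."""
    if plan.op == "Sort":
        return drop_sorts_below(plan.children[0])
    if plan.op in ORDER_INSENSITIVE:
        return replace(plan, children=tuple(drop_sorts_below(c) for c in plan.children))
    # Any other operator (e.g. Limit) may depend on input order, so stop here.
    return plan


def eliminate_subquery_sorts(plan: Plan) -> Plan:
    """Apply drop_sorts_below at every scalar / nested subquery boundary."""
    children = tuple(eliminate_subquery_sorts(c) for c in plan.children)
    if plan.op in SUBQUERY_BOUNDARIES:
        children = tuple(drop_sorts_below(c) for c in children)
    return replace(plan, children=children)
```

With this sketch, a `Sort` under a `Project` inside a `ScalarSubquery` is removed, while a `Sort` under a `Limit` survives because `Limit` is not on the whitelist. That is the "missed opportunities" trade-off: anything the whitelist doesn't know about blocks the elimination.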