GitHub user rdblue opened a pull request: https://github.com/apache/spark/pull/20387
SPARK-22386: DataSourceV2: Use immutable logical plans.

## What changes were proposed in this pull request?

DataSourceV2 should use immutable catalyst trees instead of wrapping a mutable DataSourceV2Reader. This commit updates DataSourceV2Relation and consolidates much of the DataSourceV2 API requirements for the read path in it. Instead of wrapping a reader that changes, the relation lazily produces a reader from its configuration.

This commit also updates the predicate and projection push-down. Instead of the implementation from SPARK-22197, this reuses the rule matching from the Hive and DataSource read paths (using `PhysicalOperation`) and copies most of the implementation of `SparkPlanner.pruneFilterProject`, with updates for DataSourceV2. By reusing the implementation from other read paths, this should have fewer regressions from other read paths and is less code to maintain.

The new push-down rules also support the following edge cases:

* The output of DataSourceV2Relation should be what is returned by the reader, in case the reader can only partially satisfy the requested schema projection
* The requested projection passed to the DataSourceV2Reader should include filter columns
* The push-down rule may be run more than once if filters are not pushed through projections

## How was this patch tested?

Existing push-down and read tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rdblue/spark SPARK-22386-push-down-immutable-trees

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20387.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20387

----
commit d3233e1a8b1d4d153146b1a536dee34246920b0d
Author: Ryan Blue <blue@...>
Date: 2018-01-17T21:58:12Z

SPAKR-22386: DataSourceV2: Use immutable logical plans.
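
The read-path design described in the pull request summary above can be illustrated with a small toy model. This is only a sketch of the idea, written in Python rather than against Spark: `ToyRelation`, `ToyReader`, `push_down`, and the tuple-based filter representation are all hypothetical names invented here, and none of them are part of the DataSourceV2 API. The sketch assumes a reader that can only push filters on one column, to show an immutable relation that builds its reader lazily from its configuration, requests filter columns along with the projection, and reports the reader's schema as its output.

```python
from dataclasses import dataclass, replace
from functools import cached_property
from typing import Tuple

# Hypothetical filter representation: (column, operator, value).
Filter = Tuple[str, str, object]


class ToyReader:
    """Hypothetical stand-in for a source reader; not Spark's DataSourceV2Reader."""

    # Assumption for this sketch: only filters on "id" can be pushed to the source.
    PUSHABLE_COLUMNS = frozenset({"id"})

    def __init__(self, table_columns, requested_columns, filters):
        # The reader decides the schema it actually returns (kept in table order).
        self.schema = tuple(c for c in table_columns if c in requested_columns)
        self.pushed = tuple(f for f in filters if f[0] in self.PUSHABLE_COLUMNS)
        self.residual = tuple(f for f in filters if f[0] not in self.PUSHABLE_COLUMNS)


@dataclass(frozen=True)
class ToyRelation:
    """Immutable plan node: it holds configuration only, never a mutable reader."""

    table_columns: Tuple[str, ...]
    projection: Tuple[str, ...] = ()
    filters: Tuple[Filter, ...] = ()

    @cached_property
    def reader(self) -> ToyReader:
        # Built lazily from the configuration. The requested projection
        # includes the columns referenced by the filters.
        wanted = set(self.projection or self.table_columns)
        wanted |= {f[0] for f in self.filters}
        return ToyReader(self.table_columns, wanted, self.filters)

    @property
    def output(self) -> Tuple[str, ...]:
        # The relation's output is whatever the reader reports.
        return self.reader.schema


def push_down(relation: ToyRelation, projection, filters) -> ToyRelation:
    """Return a new relation; the input is left untouched.

    Filters are merged rather than replaced, so applying the rule a second
    time with the same arguments yields an equal relation.
    """
    merged = relation.filters + tuple(f for f in filters if f not in relation.filters)
    return replace(relation, projection=tuple(projection), filters=merged)


base = ToyRelation(table_columns=("id", "name", "ts"))
pushed = push_down(base, ["name"], [("id", ">", 5), ("ts", "<", 100)])
print(pushed.output)           # columns the toy reader will return
print(pushed.reader.pushed)    # filters handled by the toy source
print(pushed.reader.residual)  # filters that would still be evaluated afterwards
```

Because `push_down` returns a copy instead of mutating its input, `base` can still be reused elsewhere in a plan, which is the property the pull request is after when it replaces the wrapped mutable reader.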