GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/20387

    SPARK-22386: DataSourceV2: Use immutable logical plans.

    ## What changes were proposed in this pull request?
    
    DataSourceV2 should use immutable catalyst trees instead of wrapping a 
mutable DataSourceV2Reader. This commit updates DataSourceV2Relation and 
consolidates much of the DataSourceV2 API requirements for the read path in it. 
Instead of wrapping a reader that changes, the relation lazily produces a 
reader from its configuration.
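
To illustrate the idea of an immutable relation that lazily produces its reader, here is a minimal sketch in Python. It is not Spark code: `Relation`, `Reader`, and their fields are hypothetical stand-ins, and the actual change is implemented in Scala against the DataSourceV2 API.

```python
from dataclasses import dataclass, replace
from functools import cached_property
from typing import Tuple


class Reader:
    """Hypothetical stand-in for a configured data source reader."""

    def __init__(self, projection, filters):
        self.projection = tuple(projection)
        self.filters = tuple(filters)


@dataclass(frozen=True)
class Relation:
    """Hypothetical immutable plan node: it holds configuration only."""

    path: str
    projection: Tuple[str, ...] = ()
    filters: Tuple[str, ...] = ()

    @cached_property
    def reader(self) -> Reader:
        # Built on first access from this node's own configuration and
        # then cached; the node's fields are never modified.
        return Reader(self.projection, self.filters)


base = Relation("/data/events")
# "Pushing down" creates a new node instead of mutating a shared reader.
pruned = replace(base, projection=("id",), filters=("ts > 10",))
```

In this sketch `base` is unchanged after `replace`, so a plan tree that still references it stays valid, and each node's reader always agrees with that node's configuration.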
    
    This commit also updates the predicate and projection push-down. Instead of 
the implementation from SPARK-22197, this reuses the rule matching from the 
Hive and DataSource read paths (using `PhysicalOperation`) and copies most of 
the implementation of `SparkPlanner.pruneFilterProject`, with updates for 
DataSourceV2. Reusing the implementation from the other read paths should 
cause fewer regressions relative to those paths and leaves less code to maintain.
    
    The new push-down rules also support the following edge cases:
    
    * The output of DataSourceV2Relation should be what is returned by the 
reader, in case the reader can only partially satisfy the requested schema 
projection
    * The requested projection passed to the DataSourceV2Reader should include 
filter columns
    * The push-down rule may be run more than once if filters are not pushed 
through projections
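
The first two edge cases can be sketched as follows. This is an illustration under stated assumptions, not the rule in this pull request: `SketchReader`, `plan_scan`, and the `unprunable` parameter are hypothetical names, and columns are modeled as plain strings.

```python
from typing import List, Sequence, Tuple


class SketchReader:
    """Hypothetical reader that may only partially honor column pruning."""

    def __init__(self, schema: Sequence[str], unprunable: Sequence[str] = ()):
        self.schema = list(schema)
        # Columns this reader always returns, even when not requested.
        self.unprunable = set(unprunable)

    def prune(self, requested: Sequence[str]) -> List[str]:
        wanted = set(requested) | self.unprunable
        return [col for col in self.schema if col in wanted]


def plan_scan(
    reader: SketchReader,
    project_cols: Sequence[str],
    filter_cols: Sequence[str],
) -> Tuple[List[str], List[str], bool]:
    # Request the projected columns plus any columns the filters reference,
    # de-duplicated while preserving order.
    requested = list(dict.fromkeys([*project_cols, *filter_cols]))
    # The scan's output is whatever the reader actually returns.
    scan_output = reader.prune(requested)
    # A projection is still needed on top if the output differs from
    # the columns the query asked for.
    needs_project = scan_output != list(project_cols)
    return requested, scan_output, needs_project


reader = SketchReader(["id", "ts", "payload", "debug"], unprunable=["debug"])
requested, scan_output, needs_project = plan_scan(reader, ["id"], ["ts"])
```

Here the request includes `ts` because a filter references it, and the scan output keeps `debug` because this hypothetical reader cannot drop it, so a projection on top of the scan is still required.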
    
    ## How was this patch tested?
    
    Existing push-down and read tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark SPARK-22386-push-down-immutable-trees

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20387.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20387
    
----
commit d3233e1a8b1d4d153146b1a536dee34246920b0d
Author: Ryan Blue <blue@...>
Date:   2018-01-17T21:58:12Z

    SPARK-22386: DataSourceV2: Use immutable logical plans.

----


---

