[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

gengliangwang Tue, 07 Aug 2018 00:54:01 -0700

Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22009#discussion_r208114465
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsPushDownRequiredColumns.java ---
    @@ -21,22 +21,25 @@
     import org.apache.spark.sql.types.StructType;
     
     /**
    - * A mix-in interface for {@link DataSourceReader}. Data source readers can implement this
    + * A mix-in interface for {@link ScanConfigBuilder}. Data sources can implement this
      * interface to push down required columns to the data source and only read these columns during
      * scan to reduce the size of the data to be read.
      */
     @InterfaceStability.Evolving
    -public interface SupportsPushDownRequiredColumns extends DataSourceReader {
    +public interface SupportsPushDownRequiredColumns extends ScanConfigBuilder {
     
       /**
        * Applies column pruning w.r.t. the given requiredSchema.
        *
        * Implementation should try its best to prune the unnecessary columns or nested fields, but it's
        * also OK to do the pruning partially, e.g., a data source may not be able to prune nested
        * fields, and only prune top-level columns.
    -   *
    -   * Note that, data source readers should update {@link DataSourceReader#readSchema()} after
    -   * applying column pruning.
        */
       void pruneColumns(StructType requiredSchema);
    --- End diff --
    
    Since we now have a new method `prunedSchema`, should we rename this one to
`pruneSchema`? Its parameter is also a schema, so the two names would pair up.
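
To make the naming question concrete, here is a rough, self-contained sketch of how the two methods might sit side by side. It does not use Spark's types: `TopLevelPruningBuilder` is a hypothetical stand-in for a `ScanConfigBuilder` implementation, a list of column names stands in for `StructType`, and the `prunedSchema()` signature is only my assumption about the new method. It also shows the partial pruning that the Javadoc allows: a request for a nested field keeps the whole top-level column.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a ScanConfigBuilder implementation; not Spark's API.
// Column names (with "." for nesting) stand in for a StructType.
class TopLevelPruningBuilder {
    private final List<String> fullSchema;
    private List<String> pruned;

    TopLevelPruningBuilder(List<String> fullSchema) {
        this.fullSchema = new ArrayList<>(fullSchema);
        // Until pruning is requested, every column is read.
        this.pruned = new ArrayList<>(fullSchema);
    }

    // Counterpart of pruneColumns(StructType requiredSchema) in the diff.
    // Only top-level columns are pruned: "address.city" keeps all of "address".
    void pruneColumns(List<String> requiredColumns) {
        Set<String> requiredTopLevel = new LinkedHashSet<>();
        for (String name : requiredColumns) {
            int dot = name.indexOf('.');
            requiredTopLevel.add(dot < 0 ? name : name.substring(0, dot));
        }
        List<String> kept = new ArrayList<>();
        // Iterate over the full schema so the original column order is preserved.
        for (String column : fullSchema) {
            if (requiredTopLevel.contains(column)) {
                kept.add(column);
            }
        }
        this.pruned = kept;
    }

    // Assumed shape of the new method: reports what the scan will actually read.
    List<String> prunedSchema() {
        return new ArrayList<>(pruned);
    }
}
```

With a full schema of `id`, `name`, `address`, calling `pruneColumns` with `address.city` and `id` would leave `prunedSchema()` reporting `id` and `address`. Read this way, the verb/participle pairing of `pruneSchema` and `prunedSchema` is the argument for the rename.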


