Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r208347697

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsPushDownRequiredColumns.java ---
@@ -21,22 +21,25 @@ import org.apache.spark.sql.types.StructType;

 /**
- * A mix-in interface for {@link DataSourceReader}. Data source readers can implement this
+ * A mix-in interface for {@link ScanConfigBuilder}. Data sources can implement this
  * interface to push down required columns to the data source and only read these columns during
  * scan to reduce the size of the data to be read.
  */
 @InterfaceStability.Evolving
-public interface SupportsPushDownRequiredColumns extends DataSourceReader {
+public interface SupportsPushDownRequiredColumns extends ScanConfigBuilder {

   /**
    * Applies column pruning w.r.t. the given requiredSchema.
    *
    * Implementation should try its best to prune the unnecessary columns or nested fields, but it's
    * also OK to do the pruning partially, e.g., a data source may not be able to prune nested
    * fields, and only prune top-level columns.
-   *
-   * Note that, data source readers should update {@link DataSourceReader#readSchema()} after
-   * applying column pruning.
    */
   void pruneColumns(StructType requiredSchema);
+
+  /**
+   * Returns the schema after the column pruning is applied, so that Spark can know if some
+   * columns/nested fields are not pruned.
+   */
+  StructType prunedSchema();
--- End diff --

I don't see a reason to add this. Why not get the final schema from the `ScanConfig`?

Getting the schema from the `ScanConfig` is better because it is clear when the pruned schema will be accessed: after all pushdown methods are called. That matters because filters may cause the source to require more columns and the source may choose to return those columns to Spark instead of adding a projection. Deferring the projection to Spark is more efficient if Spark was going to add one anyway.
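
To illustrate the ordering concern, here is a rough sketch with simplified stand-in types. It does not use the real Spark interfaces: `PushdownSketch`, `Builder`, `pushFilterOn`, and the list-of-column-names "schema" are all hypothetical, and the order in which the pushdown calls are made is chosen only for the example. The point it tries to show is that the schema is read once, from the built config, after every pushdown call has been made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PushdownSketch {

  // Hypothetical stand-in for ScanConfig: the immutable result of all pushdown.
  static final class ScanConfig {
    private final List<String> readSchema;

    ScanConfig(List<String> readSchema) {
      this.readSchema = Collections.unmodifiableList(new ArrayList<>(readSchema));
    }

    // The final schema, known only once the builder has seen every pushdown call.
    List<String> readSchema() {
      return readSchema;
    }
  }

  // Hypothetical stand-in for a ScanConfigBuilder with column pruning and filter pushdown.
  static final class Builder {
    private final List<String> tableColumns;
    private List<String> requiredColumns;
    private final Set<String> filterColumns = new LinkedHashSet<>();

    Builder(List<String> tableColumns) {
      this.tableColumns = tableColumns;
      this.requiredColumns = tableColumns; // no pruning until pruneColumns is called
    }

    void pruneColumns(List<String> requiredSchema) {
      this.requiredColumns = requiredSchema;
    }

    // Assumption for this sketch: a pushed filter makes the source read its column.
    void pushFilterOn(String column) {
      filterColumns.add(column);
    }

    ScanConfig build() {
      // The source returns the filter columns too instead of projecting them away,
      // leaving any final projection to the caller.
      Set<String> columns = new LinkedHashSet<>(requiredColumns);
      columns.addAll(filterColumns);
      columns.retainAll(tableColumns); // ignore columns the table does not have
      return new ScanConfig(new ArrayList<>(columns));
    }
  }

  static List<String> demo() {
    Builder builder = new Builder(Arrays.asList("id", "name", "ts"));
    builder.pruneColumns(Arrays.asList("name"));
    // A schema read at this point would be [name], which is not what the scan returns.
    builder.pushFilterOn("ts");
    return builder.build().readSchema();
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [name, ts]
  }
}
```

With a separate `prunedSchema()` on the builder, a caller could read the schema between the two pushdown calls and see a result that no longer matches what the scan produces.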