Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20397#discussion_r164335994

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java ---
    @@ -30,21 +30,21 @@
     @InterfaceStability.Evolving
     public interface SupportsScanColumnarBatch extends DataSourceV2Reader {
       @Override
    -  default List<ReadTask<Row>> createReadTasks() {
    +  default List<DataReaderFactory<Row>> createDataReaderFactories() {
    --- End diff --

    `DataReaderFactory` is responsible for serialization and for initializing the actual data readers, so data reader creation must happen on the executor side. Before that, we need to determine how many RDD partitions we want, which is what this method does.
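The split the comment describes can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not the real Spark interfaces: `DataReader`, `DataReaderFactory`, and `RangeSource` here are simplified stand-ins. The key point it shows is that the driver-side method only decides how many partitions exist by returning serializable factories; the actual reader is created later, from the factory, on the executor side.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the reader created on the executor side.
interface DataReader<T> extends AutoCloseable {
    boolean next();
    T get();
    default void close() {}
}

// Simplified stand-in: the factory is built on the driver, serialized,
// and only creates the actual DataReader once it reaches an executor.
interface DataReaderFactory<T> extends Serializable {
    DataReader<T> createDataReader(); // executor side
}

public class RangeSource {
    // Driver side: decide partitioning only; no readers are created here.
    // One factory per RDD partition.
    static List<DataReaderFactory<Integer>> createDataReaderFactories() {
        return Arrays.asList(rangeFactory(0, 5), rangeFactory(5, 10));
    }

    static DataReaderFactory<Integer> rangeFactory(int start, int end) {
        // The lambda is serializable because the interface extends Serializable.
        return () -> new DataReader<Integer>() {
            int current = start - 1;
            public boolean next() { return ++current < end; }
            public Integer get() { return current; }
        };
    }

    public static void main(String[] args) {
        List<DataReaderFactory<Integer>> factories = createDataReaderFactories();
        // Number of factories == number of RDD partitions.
        System.out.println("partitions=" + factories.size());
        // Executor side (simulated): only now is the reader created.
        try (DataReader<Integer> reader = factories.get(0).createDataReader()) {
            int sum = 0;
            while (reader.next()) sum += reader.get();
            System.out.println("sum=" + sum); // 0+1+2+3+4
        }
    }
}
```

This mirrors why the rename matters: a "read task" sounds like work done eagerly, while a "data reader factory" makes explicit that `createDataReaderFactories()` only plans the partitioning and defers reader construction to the executors.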