[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23266#discussion_r240074711 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.sources.v2.reader.Scan; import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; /** - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan. - * - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)} - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()} - * implemented. - * + * A mix-in interface for {@link Table} to provide data reading ability of batch processing. */ @Evolving -public interface SupportsBatchRead extends Table { } +public interface SupportsBatchRead extends Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); --- End diff -- To me, +1 for the current change. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23266#discussion_r240073831 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.sources.v2.reader.Scan; import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; /** - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan. - * - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)} - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()} - * implemented. - * + * A mix-in interface for {@link Table} to provide data reading ability of batch processing. */ @Evolving -public interface SupportsBatchRead extends Table { } +public interface SupportsBatchRead extends Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); + + /** + * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. The built + * {@link Scan} must implement {@link Scan#toBatch()}. Spark will call this method for each data + * scanning query. + * + * The builder can take some query specific information to do operators pushdown, and keep these + * information in the created {@link Scan}. + * + */ + ScanBuilder newScanBuilder(DataSourceOptions options); --- End diff -- +1 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23266#discussion_r240053406 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java --- @@ -18,9 +18,6 @@ package org.apache.spark.sql.sources.v2; import org.apache.spark.annotation.Evolving; -import org.apache.spark.sql.sources.v2.reader.Scan; -import org.apache.spark.sql.sources.v2.reader.ScanBuilder; -import org.apache.spark.sql.types.StructType; /** * An interface representing a logical structured data set of a data source. For example, the --- End diff -- According to this update, maybe `a logical structured data set` -> `a logical named data set`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23266#discussion_r240029373 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving; import org.apache.spark.sql.sources.v2.reader.Scan; import org.apache.spark.sql.sources.v2.reader.ScanBuilder; +import org.apache.spark.sql.types.StructType; /** - * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan. - * - * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)} - * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()} - * implemented. - * + * A mix-in interface for {@link Table} to provide data reading ability of batch processing. */ @Evolving -public interface SupportsBatchRead extends Table { } +public interface SupportsBatchRead extends Table { + + /** + * Returns the schema of this table. + */ + StructType schema(); --- End diff -- I'm not sure about this. Maybe it's ok to leave `schema` in `Table`, and asks write-only table to report schema as empty. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23266 [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits ## What changes were proposed in this pull request? As discussed in https://github.com/apache/spark/pull/23208/files#r239684490 , we should put read related methods in read related mix-in traits like `SupportsBatchRead`, to support write-only table. In the `Append` operator, we should skip schema validation if the table is write-only. This will be done when we finish the write API refactor in https://github.com/apache/spark/pull/23208/ ## How was this patch tested? existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark ds-read Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23266.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23266 commit da520cc6e7daa36c7d7fabdb0a08d4b4341250b9 Author: Wenchen Fan Date: 2018-12-09T08:32:51Z move read related methods from Table to read related mix-in traits --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org