[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r241054082

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java ##

@@ -37,23 +35,14 @@ public interface Table {

   /**
-   * A name to identify this table. Implementations should provide a meaningful name, like the
-   * database and table name from catalog, or the location of files for this table.
-   */
-  String name();
-
-  /**
-   * Returns the schema of this table.
+   * Returns the schema of this table. If the table is not readable and doesn't have a schema, an
+   * empty schema can be returned here.
    */
   StructType schema();

   /**
-   * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call
-   * this method for each data scanning query.
-   *
-   * The builder can take some query specific information to do operators pushdown, and keep these
-   * information in the created {@link Scan}.
-   *
+   * A name to identify this table. Implementations should provide a meaningful name, like the
+   * database and table name from catalog, or the location of files for this table.
    */
-  ScanBuilder newScanBuilder(DataSourceOptions options);
+  String name();

Review comment:
Can we move `name()` to line 37 before `schema()` like before? That will reduce the number of changed lines.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r241054082

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java ##

@@ -37,23 +35,14 @@ public interface Table {

   /**
-   * A name to identify this table. Implementations should provide a meaningful name, like the
-   * database and table name from catalog, or the location of files for this table.
-   */
-  String name();
-
-  /**
-   * Returns the schema of this table.
+   * Returns the schema of this table. If the table is not readable and doesn't have a schema, an
+   * empty schema can be returned here.
    */
   StructType schema();

   /**
-   * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call
-   * this method for each data scanning query.
-   *
-   * The builder can take some query specific information to do operators pushdown, and keep these
-   * information in the created {@link Scan}.
-   *
+   * A name to identify this table. Implementations should provide a meaningful name, like the
+   * database and table name from catalog, or the location of files for this table.
    */
-  ScanBuilder newScanBuilder(DataSourceOptions options);
+  String name();

Review comment:
Can we move `name()` before `schema()` like before? That will reduce the number of changed lines.
[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240739659

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ##

@@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;

 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- *
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- *
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();

Review comment:
Okay. You think so. But, obviously, we already wasted a lot of time on this `laughably unlikely` stuff, didn't we?
[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240702810

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ##

@@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;

 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- *
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- *
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();

Review comment:
I'd move the schema stuff to a new single interface. That would be better for the Single Responsibility Principle.

@rdblue, in that case, we are good, right? Are there any other concerns?

> I don't see the value in moving the schema to a different interface, and I think that moving the schema to an interface specific to the read path is worse because it causes the problem that a table must be readable to be writable.
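For readers following the thread, here is a minimal sketch of the split being proposed, under stated assumptions. It does not use Spark's real types: `HasSchema`, `BatchReadable`, `BatchWritable`, and `WriteOnlySink` are hypothetical names invented for illustration, and a `List<String>` of column names stands in for `StructType`. The point it tries to show is the one raised in the quoted objection: if the schema lives in its own mix-in rather than in a read-specific interface, a write-only table can still expose a schema without having to be readable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MixinSketch {

    /** Reduced Table: only identifies a named entity. */
    interface Table {
        String name();
    }

    /** Hypothetical mix-in for structural information (stand-in for StructType). */
    interface HasSchema extends Table {
        List<String> schema();
    }

    /** Hypothetical read capability; not Spark's SupportsBatchRead. */
    interface BatchReadable extends Table {
        List<String> scan();
    }

    /** Hypothetical write capability. */
    interface BatchWritable extends Table {
        void append(String row);
    }

    /** A write-only table that still has a schema, without being readable. */
    static final class WriteOnlySink implements HasSchema, BatchWritable {
        private final List<String> rows = new ArrayList<>();

        @Override public String name() { return "sink"; }
        @Override public List<String> schema() { return Arrays.asList("id", "value"); }
        @Override public void append(String row) { rows.add(row); }

        int rowCount() { return rows.size(); }
    }

    /** Reports which capabilities a table advertises through its mix-ins. */
    static String describe(Table t) {
        StringBuilder sb = new StringBuilder(t.name());
        if (t instanceof HasSchema) {
            sb.append(" schema=").append(((HasSchema) t).schema());
        }
        if (t instanceof BatchReadable) {
            sb.append(" readable");
        }
        if (t instanceof BatchWritable) {
            sb.append(" writable");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WriteOnlySink sink = new WriteOnlySink();
        sink.append("1,a");
        System.out.println(describe(sink)); // sink schema=[id, value] writable
    }
}
```

Whether this layout is preferable to keeping `schema()` on `Table` itself is exactly what the thread is debating; the sketch only illustrates that the two concerns can be separated.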
[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240406374

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ##

@@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;

 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- *
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- *
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();

Review comment:
I understand what you mean. The way I see this is:
- This PR is reducing the scope of the `Table` interface to specify a named entity.
- Then, we can split it out to specify structural information because it's orthogonal.
[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240388355

## File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java ##

@@ -20,14 +20,27 @@ import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;

 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- *
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- *
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();

Review comment:
@rdblue. Validating is one of the important use cases, but there are other use cases. We've already received this request previously.

> The schema is defined by the dataframe itself, not by the data source, i.e. it should be extracted from df.schema and not by source.createReader

- http://apache-spark-developers-list.1001551.n3.nabble.com/Possible-bug-in-DatasourceV2-td25343.html