[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
URL: https://github.com/apache/spark/pull/25651#discussion_r323785172

## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroDataSourceV2.scala

```diff
@@ -35,7 +36,10 @@ class AvroDataSourceV2 extends FileDataSourceV2 {
     AvroTable(tableName, sparkSession, options, paths, None, fallbackFileFormat)
   }
 
-  override def getTable(options: CaseInsensitiveStringMap, schema: StructType): Table = {
+  override def getTable(
+      options: CaseInsensitiveStringMap,
+      schema: StructType,
+      partitions: Array[Transform]): Table = {
```

Review comment: Read options should be passed in `Table.newScanBuilder`; the `options` here are the table properties.
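To illustrate the distinction the comment draws, here is a minimal sketch of a table that takes its properties at construction time (from `TableProvider.getTable`) and receives per-scan read options only in `newScanBuilder`. The `DemoTable` name is hypothetical, and the import paths assume the PR-era `org.apache.spark.sql.sources.v2` packages (renamed to `org.apache.spark.sql.connector` in later releases):

```scala
import java.util

import org.apache.spark.sql.sources.v2.{SupportsRead, Table, TableCapability}
import org.apache.spark.sql.sources.v2.reader.ScanBuilder
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// `properties` arrives via TableProvider.getTable and describes the table itself
// (e.g. its file path); it is not the place for per-query read options.
class DemoTable(properties: CaseInsensitiveStringMap) extends Table with SupportsRead {

  override def name(): String = properties.get("path")

  override def schema(): StructType = new StructType().add("i", "int")

  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)

  // Per-scan read options (e.g. DataFrameReader.option values) are delivered
  // here, separately from the table properties above.
  override def newScanBuilder(readOptions: CaseInsensitiveStringMap): ScanBuilder = {
    ??? // build a scan that honors `readOptions`; elided in this sketch
  }
}
```

With that split, per-query options flow into each scan, while the map handed to `getTable` plays the role of table properties, which is the intent the comment describes.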
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
URL: https://github.com/apache/spark/pull/25651#discussion_r320769069

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanSupportCheck.scala

```diff
@@ -20,17 +20,20 @@ package org.apache.spark.sql.execution.datasources.v2
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.streaming.{StreamingRelation, StreamingRelationV2}
-import org.apache.spark.sql.sources.v2.TableCapability.{CONTINUOUS_READ, MICRO_BATCH_READ}
+import org.apache.spark.sql.sources.v2.TableCapability.{BATCH_READ, CONTINUOUS_READ, MICRO_BATCH_READ}
 
 /**
  * This rules adds some basic table capability check for streaming scan, without knowing the actual
  * streaming execution mode.
  */
-object V2StreamingScanSupportCheck extends (LogicalPlan => Unit) {
+object V2ScanSupportCheck extends (LogicalPlan => Unit) {
   import DataSourceV2Implicits._
 
   override def apply(plan: LogicalPlan): Unit = {
     plan.foreach {
+      case r: DataSourceV2Relation if !r.table.supports(BATCH_READ) =>
+        throw new AnalysisException(
+          s"Table ${r.table.name()} does not support batch scan.")
```

Review comment: I've created a separate PR to fix it: https://github.com/apache/spark/pull/25679
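For context, the new case in the rule amounts to a membership test on the table's declared capability set. Below is a hedged sketch of the equivalent check written directly against the public `Table` API; the `supports` method in the diff comes from Spark's internal `DataSourceV2Implicits`, which this assumes is just a `capabilities()` lookup, and `AnalysisException` is constructible here only because the rule lives inside Spark's own `sql` packages:

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.sources.v2.{Table, TableCapability}

// Standalone form of the rule's batch-scan check: a v2 relation can only be
// read in a batch query if its table declares the BATCH_READ capability.
def assertSupportsBatchScan(table: Table): Unit = {
  if (!table.capabilities().contains(TableCapability.BATCH_READ)) {
    throw new AnalysisException(s"Table ${table.name()} does not support batch scan.")
  }
}
```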
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
URL: https://github.com/apache/spark/pull/25651#discussion_r319931123

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java

```diff
@@ -44,18 +45,28 @@
   Table getTable(CaseInsensitiveStringMap options);
 
   /**
-   * Return a {@link Table} instance to do read/write with user-specified schema and options.
+   * Return a {@link Table} instance to do read/write with user-specified options and additional
+   * schema/partitions information. The additional schema/partitions information can be specified
+   * by users (e.g. {@code session.read.format("myFormat").schema(...)}) or retrieved from the
+   * metastore (e.g. {@code CREATE TABLE t(i INT) USING myFormat}).
+   *
+   * The returned table must report the same schema/partitions with the ones that are passed in.
    *
    * By default this method throws {@link UnsupportedOperationException}, implementations should
-   * override this method to handle user-specified schema.
+   * override this method to handle the additional schema/partitions information.
    *
    * @param options the user-specified options that can identify a table, e.g. file path, Kafka
    *                topic name, etc. It's an immutable case-insensitive string-to-string map.
-   * @param schema the user-specified schema.
+   * @param schema the additional schema information.
+   * @param partitions the additional partitions information.
    * @throws UnsupportedOperationException
    */
-  default Table getTable(CaseInsensitiveStringMap options, StructType schema) {
+  default Table getTable(
```

Review comment: I'll refine the classdoc of this interface after we reach agreement on the proposal.
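As a sketch of the contract the new Javadoc describes, a provider implementing the proposed three-argument overload should pass the schema and partitioning through unchanged, so that the returned `Table` reports exactly what it was given. All names below other than the `TableProvider` methods are hypothetical; the `Transform` import path and the presence of a `partitioning()` method on `Table` are assumptions based on the PR-era API:

```scala
import java.util

import org.apache.spark.sql.catalog.v2.expressions.Transform
import org.apache.spark.sql.sources.v2.{Table, TableCapability, TableProvider}
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MyFormatProvider extends TableProvider {

  // No schema/partitioning supplied: infer both from the options (e.g. file path).
  override def getTable(options: CaseInsensitiveStringMap): Table =
    getTable(options, inferSchema(options), Array.empty[Transform])

  // Proposed overload: the returned table must report exactly this
  // schema/partitioning, per the contract in the Javadoc above.
  override def getTable(
      options: CaseInsensitiveStringMap,
      schema: StructType,
      partitions: Array[Transform]): Table =
    new MyFormatTable(options, schema, partitions)

  private def inferSchema(options: CaseInsensitiveStringMap): StructType =
    ??? // schema inference elided in this sketch
}

class MyFormatTable(
    properties: CaseInsensitiveStringMap,
    declaredSchema: StructType,
    declaredPartitioning: Array[Transform]) extends Table {

  override def name(): String = properties.get("path")

  // Report back exactly what was passed in, as the contract requires.
  override def schema(): StructType = declaredSchema
  override def partitioning(): Array[Transform] = declaredPartitioning

  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
}
```

This makes the two entry points symmetric: the one-argument `getTable` infers what it is not given, while the three-argument form trusts the caller (user-specified schema or the metastore) and merely echoes it back.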