[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-12 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r241054082
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java
 ##
 @@ -37,23 +35,14 @@
 public interface Table {
 
   /**
-   * A name to identify this table. Implementations should provide a meaningful name, like the
-   * database and table name from catalog, or the location of files for this table.
-   */
-  String name();
-
-  /**
-   * Returns the schema of this table.
+   * Returns the schema of this table. If the table is not readable and doesn't have a schema, an
+   * empty schema can be returned here.
*/
   StructType schema();
 
   /**
-   * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call
-   * this method for each data scanning query.
-   * 
-   * The builder can take some query specific information to do operators pushdown, and keep these
-   * information in the created {@link Scan}.
-   * 
+   * A name to identify this table. Implementations should provide a meaningful name, like the
+   * database and table name from catalog, or the location of files for this table.
*/
-  ScanBuilder newScanBuilder(DataSourceOptions options);
+  String name();
 
 Review comment:
   Can we move `name()` to line 37 before `schema()` like before? That will reduce the number of changed lines.
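The revised `Table` Javadoc in the diff above lets a table that is not readable return an empty schema. The following is a minimal sketch of that contract with `name()` declared before `schema()`, as the review asks. It assumes a tiny hypothetical `StructType` stand-in (field names only) instead of Spark's `org.apache.spark.sql.types.StructType`, and `WriteOnlyLogTable` is a made-up table, not part of Spark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.spark.sql.types.StructType (field names only).
final class StructType {
  private final List<String> fieldNames;

  StructType(List<String> fieldNames) {
    this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
  }

  static StructType empty() {
    return new StructType(Collections.emptyList());
  }

  List<String> fieldNames() {
    return fieldNames;
  }

  boolean isEmpty() {
    return fieldNames.isEmpty();
  }
}

// Simplified Table contract: name() first, then schema(), matching the pre-PR order.
interface Table {
  /** A meaningful name, e.g. "db.table" from a catalog or a file location. */
  String name();

  /** The table schema; a table that is not readable may return an empty schema. */
  StructType schema();
}

// Hypothetical write-only table: it has no readable schema, so it returns an empty one.
final class WriteOnlyLogTable implements Table {
  private final String location;

  WriteOnlyLogTable(String location) {
    this.location = location;
  }

  @Override
  public String name() {
    return location;
  }

  @Override
  public StructType schema() {
    return StructType.empty();
  }
}
```

With this sketch, `new WriteOnlyLogTable("/tmp/logs").schema().isEmpty()` returns `true`, while `name()` still identifies the table.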


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-12 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r241054082
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/Table.java
 ##
 @@ -37,23 +35,14 @@
 public interface Table {
 
   /**
-   * A name to identify this table. Implementations should provide a meaningful name, like the
-   * database and table name from catalog, or the location of files for this table.
-   */
-  String name();
-
-  /**
-   * Returns the schema of this table.
+   * Returns the schema of this table. If the table is not readable and doesn't have a schema, an
+   * empty schema can be returned here.
*/
   StructType schema();
 
   /**
-   * Returns a {@link ScanBuilder} which can be used to build a {@link Scan} later. Spark will call
-   * this method for each data scanning query.
-   * 
-   * The builder can take some query specific information to do operators pushdown, and keep these
-   * information in the created {@link Scan}.
-   * 
+   * A name to identify this table. Implementations should provide a meaningful name, like the
+   * database and table name from catalog, or the location of files for this table.
*/
-  ScanBuilder newScanBuilder(DataSourceOptions options);
+  String name();
 
 Review comment:
   Can we move `name()` before `schema()` like before? That will reduce the number of changed lines.





[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240739659
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   Okay. You think so. But obviously we have already wasted a lot of time on this `laughably unlikely` stuff, haven't we?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240702810
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   I'd suggest moving the schema stuff to a new single interface. That would better follow the Single Responsibility Principle. @rdblue, in that case we are good, right? Are there any other concerns?
   > I don't see the value in moving the schema to a different interface, and I think that moving the schema to an interface specific to the read path is worse because it causes the problem that a table must be readable to be writable.
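The objection quoted above can be made concrete. In this sketch `schema()` is reachable only through the read mix-in, so a check that wants a table schema to validate a write has to look for `SupportsBatchRead`, and a write-only table has nothing to offer even though writing to it is legitimate. The interfaces are simplified stand-ins; `SupportsBatchWrite`, `BatchLogSink`, `EventsTable`, and `WriteValidator` are hypothetical names for illustration, and `StructType` is reduced to a list of field names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for Spark's StructType: an ordered list of field names.
final class StructType {
  final List<String> fieldNames;

  StructType(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }
}

// Table reduced to a named entity.
interface Table {
  String name();
}

// The shape under review: schema() is only reachable through the read mix-in.
interface SupportsBatchRead extends Table {
  StructType schema();
}

// Hypothetical write mix-in that carries no schema of its own.
interface SupportsBatchWrite extends Table {
  void write(List<String> row);
}

// Hypothetical write-only sink: writable but not readable.
final class BatchLogSink implements SupportsBatchWrite {
  @Override public String name() { return "log_sink"; }
  @Override public void write(List<String> row) { /* append the row somewhere */ }
}

// Hypothetical table that is both readable and writable.
final class EventsTable implements SupportsBatchRead, SupportsBatchWrite {
  @Override public String name() { return "events"; }
  @Override public StructType schema() { return new StructType(Arrays.asList("id", "ts")); }
  @Override public void write(List<String> row) { /* append the row somewhere */ }
}

final class WriteValidator {
  // The only place a schema can come from is SupportsBatchRead.
  static Optional<StructType> tableSchema(Table table) {
    if (table instanceof SupportsBatchRead) {
      return Optional.of(((SupportsBatchRead) table).schema());
    }
    return Optional.empty();
  }

  // A write can be validated against the table only if the table is also readable.
  static boolean canValidateWrite(Table table) {
    return table instanceof SupportsBatchWrite && tableSchema(table).isPresent();
  }
}
```

Here `canValidateWrite(new BatchLogSink())` is `false` while `canValidateWrite(new EventsTable())` is `true`: the write path's ability to use a schema ends up coupled to read support.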





[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240406374
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   I understand what you mean. The way I see it:
   - This PR reduces the scope of the `Table` interface so that it only specifies a named entity.
   - Then we can split the structural information out into a separate interface, because it is orthogonal.
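A minimal sketch of that split, assuming a hypothetical schema-only mix-in named `SupportsSchema` (not a Spark interface): `Table` carries only the name, structural information lives in its own mix-in, and a write-only table can declare its schema without implementing any read interface. The read and write mix-ins are reduced to empty markers here (in this PR the read mix-in also takes over the read-related methods such as scan building), and `AuditSink` and `Capabilities` are made-up names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Spark's StructType: an ordered list of field names.
final class StructType {
  final List<String> fieldNames;

  StructType(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }
}

// Table only identifies a named entity.
interface Table {
  String name();
}

// Hypothetical mix-in for structural information, independent of read/write.
interface SupportsSchema extends Table {
  StructType schema();
}

// Simplified capability markers for read and write support.
interface SupportsBatchRead extends Table { }

interface SupportsBatchWrite extends Table { }

// Hypothetical write-only table that still declares its schema.
final class AuditSink implements SupportsSchema, SupportsBatchWrite {
  @Override public String name() { return "audit"; }
  @Override public StructType schema() { return new StructType(Arrays.asList("user", "action")); }
}

final class Capabilities {
  // Lists which orthogonal capabilities a table mixes in.
  static String describe(Table table) {
    StringBuilder sb = new StringBuilder(table.name()).append(':');
    if (table instanceof SupportsSchema) sb.append(" schema");
    if (table instanceof SupportsBatchRead) sb.append(" read");
    if (table instanceof SupportsBatchWrite) sb.append(" write");
    return sb.toString();
  }
}
```

`Capabilities.describe(new AuditSink())` returns `"audit: schema write"`: the schema is available even though the table is not readable, which is the coupling the read-only placement of `schema()` would prevent.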





[GitHub] dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits

2018-12-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #23266: [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits
URL: https://github.com/apache/spark/pull/23266#discussion_r240388355
 
 

 ##
 File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java
 ##
 @@ -20,14 +20,27 @@
 import org.apache.spark.annotation.Evolving;
 import org.apache.spark.sql.sources.v2.reader.Scan;
 import org.apache.spark.sql.sources.v2.reader.ScanBuilder;
+import org.apache.spark.sql.types.StructType;
 
 /**
- * An empty mix-in interface for {@link Table}, to indicate this table supports batch scan.
- * 
- * If a {@link Table} implements this interface, its {@link Table#newScanBuilder(DataSourceOptions)}
- * must return a {@link ScanBuilder} that builds {@link Scan} with {@link Scan#toBatch()}
- * implemented.
- * 
+ * A mix-in interface for {@link Table} to provide data reading ability of batch processing.
  */
 @Evolving
-public interface SupportsBatchRead extends Table { }
+public interface SupportsBatchRead extends Table {
+
+  /**
+   * Returns the schema of this table.
+   */
+  StructType schema();
 
 Review comment:
   @rdblue, validating is one of the important use cases, but there are other use cases as well. We have already received such a request:
   > The schema is defined by the dataframe itself, not by the data source, i.e. it should be extracted from df.schema and not by source.createReader
   
   - http://apache-spark-developers-list.1001551.n3.nabble.com/Possible-bug-in-DatasourceV2-td25343.html
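One reading of the quoted request, sketched under stated assumptions: for a sink whose schema is defined by the incoming DataFrame, the write path takes the schema from the data being written (`df.schema`) and consults the table's schema only to validate it, and only when the table declares one. `SupportsSchema` and `resolveWriteSchema` are hypothetical names, `StructType` is a field-name stand-in, and this is not how Spark itself resolves write schemas.

```java
import java.util.List;

// Hypothetical stand-in for Spark's StructType: an ordered list of field names.
final class StructType {
  final List<String> fieldNames;

  StructType(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }
}

// Table only identifies a named entity.
interface Table {
  String name();
}

// Hypothetical mix-in for tables that declare a fixed schema.
interface SupportsSchema extends Table {
  StructType schema();
}

final class WriteSchemas {
  // Hypothetical helper: the DataFrame's schema is the write schema. If the table
  // declares a schema, it is used only to validate the incoming columns.
  static StructType resolveWriteSchema(Table table, StructType dfSchema) {
    if (table instanceof SupportsSchema) {
      StructType declared = ((SupportsSchema) table).schema();
      if (!declared.fieldNames.equals(dfSchema.fieldNames)) {
        throw new IllegalArgumentException(
            "DataFrame columns " + dfSchema.fieldNames + " do not match table "
                + table.name() + " columns " + declared.fieldNames);
      }
    }
    // No reader is involved: the schema always comes from the data being written.
    return dfSchema;
  }
}
```

A free-form sink (a plain `Table`) accepts whatever columns the DataFrame has, while a table implementing `SupportsSchema` rejects mismatched columns; in neither case is a reader created just to discover the schema.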

