Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22009#discussion_r226780862
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
    @@ -169,15 +174,16 @@ object DataSourceV2Relation {
           options: Map[String, String],
           tableIdent: Option[TableIdentifier] = None,
           userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
    -    val reader = source.createReader(options, userSpecifiedSchema)
    +    val readSupport = source.createReadSupport(options, userSpecifiedSchema)
    --- End diff ---
    
    In the long term, I don't think that sources should use the reader to get a schema. This is a temporary hack until we have catalog support, which is really where schemas should come from.
    
    The way this works in our version (which is substantially ahead of upstream Spark, unfortunately) is that a Table is loaded by a Catalog. The schema reported by that table is used to validate writes. That way, the table can report its schema and Spark knows that data written to it must be compatible with that schema, but the source isn't required to be readable.
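    
    To make that separation concrete, here is a minimal sketch. The `Catalog`, `Table`, and `WriteValidation` names below are hypothetical, only meant to show the shape of the API, not what this PR or upstream Spark actually defines:
    
        import org.apache.spark.sql.catalyst.TableIdentifier
        import org.apache.spark.sql.types.StructType
    
        // Hypothetical interfaces, for illustration only.
        trait Table {
          // The table reports its own schema.
          def schema: StructType
        }
    
        trait Catalog {
          // Tables are loaded by the catalog, not by instantiating a reader.
          def loadTable(ident: TableIdentifier): Table
        }
    
        object WriteValidation {
          // Write-side validation against the table's reported schema. The
          // source never needs to expose a read path for this check to work.
          def validateWrite(
              catalog: Catalog,
              ident: TableIdentifier,
              dataSchema: StructType): Unit = {
            val table = catalog.loadTable(ident)
            val expected = table.schema.fields.map(f => (f.name, f.dataType)).toSeq
            val actual = dataSchema.fields.map(f => (f.name, f.dataType)).toSeq
            require(actual == expected,
              s"Cannot write incompatible data to table $ident")
          }
        }
    
    The point is that write analysis only needs loadTable and the table's schema, so a write-only source never has to provide a reader.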

