Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23086#discussion_r237178976

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala ---
    @@ -23,29 +23,28 @@
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.physical
     import org.apache.spark.sql.catalyst.plans.physical.SinglePartition
     import org.apache.spark.sql.execution.{ColumnarBatchScan, LeafExecNode, WholeStageCodegenExec}
    -import org.apache.spark.sql.execution.streaming.continuous._
     import org.apache.spark.sql.sources.v2.DataSourceV2
     import org.apache.spark.sql.sources.v2.reader._
    -import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousPartitionReaderFactory, ContinuousReadSupport, MicroBatchReadSupport}

     /**
    - * Physical plan node for scanning data from a data source.
    + * Physical plan node for scanning a batch of data from a data source.
      */
     case class DataSourceV2ScanExec(
         output: Seq[AttributeReference],
         @transient source: DataSourceV2,
         @transient options: Map[String, String],
    --- End diff --

    With a catalog, there is no expectation that a `source` will be passed. This could be a string that identifies either the source or the catalog, to give a good string representation of the physical plan. This is another area where I think `Table.name` would be helpful, because the table's identifying information is really what should be shown here instead of its source or catalog.

    For options: these are part of the scan and don't affect the behavior of this physical node, so I think they shouldn't be part of the node's arguments.

    I think a good way to solve this problem is to change the pretty string format to use `Scan` instead. The scan carries the information that defines what this node is doing, like the filters, projection, and options. And being able to convert a logical scan to text would be useful across all 3 execution modes.
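    To sketch the idea (this is a hypothetical illustration, not Spark's actual API: the `Scan` and `ScanExec` classes below are stand-ins), the physical node's pretty string would delegate to the logical scan's description, so the same text works across batch, micro-batch, and continuous execution:

    ```scala
    // Stand-in for a logical Scan: it carries the information that defines
    // the node -- the projection and filters (options omitted for brevity).
    case class Scan(table: String, projection: Seq[String], filters: Seq[String]) {
      def description: String =
        s"$table [${projection.mkString(", ")}] filters=[${filters.mkString(", ")}]"
    }

    // Stand-in physical node: its string form is derived from the Scan,
    // not from a source or catalog reference and not from raw options.
    case class ScanExec(scan: Scan) {
      def simpleString: String = s"ScanExec ${scan.description}"
    }

    object Demo extends App {
      val scan = Scan("db.events", Seq("id", "ts"), Seq("ts > 100"))
      // Prints: ScanExec db.events [id, ts] filters=[ts > 100]
      println(ScanExec(scan).simpleString)
    }
    ```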