[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

zsxwing Thu, 31 Mar 2016 12:51:28 -0700

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12049#discussion_r58116652
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -71,9 +71,18 @@ class StreamExecution(
       /** The current batchId or -1 if execution has not yet been initialized. */
       private var currentBatchId: Long = -1
     
    +  private[sql] val logicalPlan = _logicalPlan.transform {
    +    case StreamingRelation(sourceCreator, output) =>
    +      // Materialize source to avoid creating it in every batch
    +      val source = sourceCreator()
    +      // We still need to use the previous `output` instead of `source.schema` as attributes in
    +      // "_logicalPlan" has already used attributes of the previous `output`.
    +      StreamingRelation(() => source, output)
    --- End diff --
    
    I tried `Map[DataSource, Source]`, but it failed because of `RichSource`.
    
    ```
      implicit class RichSource(s: Source) {
        def toDF(): DataFrame = Dataset.ofRows(sqlContext, StreamingRelation(s))
    
        def toDS[A: Encoder](): Dataset[A] = Dataset(sqlContext, StreamingRelation(s))
      }
    ```
    If we only have `StreamingRelation(DataSource)`, then `RichSource` needs to create a `DataSource` for each `Source` dynamically.
    
    So the code above would become:
    ```
      implicit class RichSource(s: Source) {
        def toDF(): DataFrame =
          Dataset.ofRows(sqlContext, StreamingRelation(DataSource(sqlContext, className = ...)))
    
        def toDS[A: Encoder](): Dataset[A] =
          Dataset(sqlContext, StreamingRelation(DataSource(sqlContext, className = ...)))
      }
    ```
    
    Here I don't know what to fill in for `className`. Without code generation, we won't be able to create a new class for each different `Source` instance. This seems too complicated.
    
    Therefore, I ended up using the `StreamExecutionRelation` idea.
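    
    As a rough illustration of that idea, here is a minimal, self-contained sketch. It is a toy
    model, not Spark's actual classes: `Plan`, `Project`, `materialize`, `sources`, `toPlan`, and the
    simplified `Source` and `StreamingRelation` are hypothetical stand-ins, and the shape of
    `StreamExecutionRelation` is an assumption. It shows a relation that wraps an already
    created `Source`, so each source is created once and a helper like `RichSource` needs no
    `className`.
    
    ```scala
    object StreamExecutionRelationSketch {
      // Toy stand-in for a streaming source; `schema` is just a list of column names here.
      final class Source(val schema: Seq[String])
    
      sealed trait Plan
      // User-facing leaf: only knows how to create a source, holds no live instance.
      final case class StreamingRelation(sourceCreator: () => Source, output: Seq[String]) extends Plan
      // Execution-time leaf (assumed shape): wraps one concrete, already created Source.
      final case class StreamExecutionRelation(source: Source, output: Seq[String]) extends Plan
      final case class Project(child: Plan) extends Plan
    
      // Create each source exactly once, keeping the original `output` attributes.
      def materialize(plan: Plan): Plan = plan match {
        case StreamingRelation(create, output) => StreamExecutionRelation(create(), output)
        case Project(child)                    => Project(materialize(child))
        case other                             => other
      }
    
      // Collect the materialized sources of a plan; every batch can reuse these instances.
      def sources(plan: Plan): Seq[Source] = plan match {
        case StreamExecutionRelation(source, _) => Seq(source)
        case Project(child)                     => sources(child)
        case _: StreamingRelation               => Seq.empty
      }
    
      // A helper can wrap an existing Source directly; no `className` is required.
      implicit class RichSource(s: Source) {
        def toPlan: Plan = StreamExecutionRelation(s, s.schema)
      }
    }
    ```
    
    In this sketch, calling `materialize` once on the user-facing plan yields a plan whose leaves
    hold fixed `Source` instances, which is the property the diff above is after.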




