GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/14214

    [SPARK-16545][SQL] Eliminate one unnecessary round of physical planning in 
ForeachSink

    ## Problem
    
    As reported by 
[SPARK-16545](https://issues.apache.org/jira/browse/SPARK-16545), 
`ForeachSink` currently triggers 3 rounds of physical planning.
    
    Specifically:
    
    [1] In `StreamExecution`, 
[lastExecution.executedPlan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L369)
    
    [2] In `ForeachSink`, 
[foreachPartition()](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L69)
 calls withNewExecutionId(..., **_queryExecution_**), which in turn calls 
[**_queryExecution_**.executedPlan](https://github.com/apache/spark/blob/9a5071996b968148f6b9aba12e0d3fe888d9acd8/sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala#L55)
     
    [3] In `ForeachSink`, [val rdd = { ... incrementalExecution = new 
IncrementalExecution 
...}](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L53)
    
    ## What changes were proposed in this pull request?
    
    [1] should not be eliminated in general;
    
    **[2] is eliminated by this patch, by replacing the `queryExecution` with 
`incrementalExecution` provided by [3];**
    
    [3] should be eliminated, but that cannot be done at this stage; let's 
revisit it once SPARK-16264 is resolved.
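
To illustrate why swapping the argument removes one round, here is a minimal toy model of the idea, not the actual Spark code. It assumes only what is described above: the physical plan is computed lazily once per execution object, and `withNewExecutionId` reads `executedPlan` of whichever execution it is given. `PlanningRounds`, `ToyExecution`, `before`, and `after` are hypothetical names introduced for this sketch.

```scala
// Toy model only: every name here is a hypothetical stand-in, not Spark API.
object PlanningRounds {
  // Counts how many times "physical planning" runs.
  var rounds = 0

  // Stand-in for a query execution: the plan is built lazily,
  // at most once per instance.
  class ToyExecution(name: String) {
    lazy val executedPlan: String = { rounds += 1; s"plan($name)" }
  }

  // Stand-in for withNewExecutionId: it reads executedPlan of the
  // execution it is handed, then runs the body.
  def withNewExecutionId[T](qe: ToyExecution)(body: => T): T = {
    qe.executedPlan
    body
  }

  // Before the patch: the sink plans its own incremental execution [3]
  // and then hands a different execution to withNewExecutionId [2].
  def before(): Int = {
    rounds = 0
    val queryExecution = new ToyExecution("queryExecution")
    val incrementalExecution = new ToyExecution("incrementalExecution")
    incrementalExecution.executedPlan         // round [3]
    withNewExecutionId(queryExecution) { () } // round [2]
    rounds
  }

  // After the patch: the already-planned incremental execution is reused,
  // so the lazy val is not evaluated a second time.
  def after(): Int = {
    rounds = 0
    val incrementalExecution = new ToyExecution("incrementalExecution")
    incrementalExecution.executedPlan               // round [3]
    withNewExecutionId(incrementalExecution) { () } // no extra round
    rounds
  }

  def main(args: Array[String]): Unit = {
    println(s"before: ${before()} rounds inside the sink")
    println(s"after: ${after()} round inside the sink")
  }
}
```

In this model the sink goes from two rounds to one; adding round [1] from `StreamExecution` gives the 3 and 2 totals quoted in this description.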
    
    
    ## How was this patch tested?
    
    - checked manually that only 2 rounds of physical planning remain in 
ForeachSink after this patch
    - existing tests ensure it causes no regression


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark physical-3x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14214
    
----
commit 8ec635fe7403baf5149e3f6714872bf706b37cd7
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-07-15T02:12:02Z

    Fix foreachPartition

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
