[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval

2016-07-14 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378811#comment-15378811
 ] 

Mario Briggs edited comment on SPARK-16545 at 7/15/16 3:57 AM:
---

Thanks. I looked into it too and could not find a fix that satisfied me. The 
problem seems to be that Dataset assumes it has a QueryExecution with the 
physical plan (which is true in the batch case), since most of the 
Listener/metrics-gathering functions want to dump this info, whereas in 
streaming we want only the 'inner' IncrementalExecution to produce the 
physical plan. I will also submit what I have tried so far, to ease the 
discussion.


> Structured Streaming : foreachSink creates the Physical Plan multiple times 
> per TriggerInterval 
> 
>
> Key: SPARK-16545
> URL: https://issues.apache.org/jira/browse/SPARK-16545
> Project: Spark
> Issue Type: Bug
> Components: SQL, Streaming
> Affects Versions: 2.0.0
> Reporter: Mario Briggs
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval

2016-07-14 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376758#comment-15376758
 ] 

Mario Briggs edited comment on SPARK-16545 at 7/14/16 11:01 AM:


While looking at the performance of Structured Streaming, I found excessive 
time being spent in the driver. 

Looking into this further, I found that the time was spent in multiple (3, to 
be exact) initialisations of 
[QueryExecution.executedPlan|https://github.com/mariobriggs/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L85]
, caused by multiple instances of QueryExecution being created in 
ForeachSink.addBatch. 

Creating the physical plan is expensive and hence shouldn't be done more than 
once per TriggerInterval.
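
To illustrate the effect described above, here is a minimal toy model in Python. It is not Spark code: ToyQueryExecution, its executed_plan property and both add_batch_* functions are hypothetical names invented for this sketch, and cached_property merely stands in for a lazily initialised, per-instance field. It assumes only what the comment states, namely that the plan is built lazily once per QueryExecution instance, so creating several instances within one trigger repeats the planning work.

```python
from functools import cached_property


class ToyQueryExecution:
    """Hypothetical stand-in for a QueryExecution: the plan is lazy and cached per instance."""

    plans_built = 0  # class-wide counter of how many physical plans were built

    def __init__(self, logical_plan):
        self.logical_plan = logical_plan

    @cached_property
    def executed_plan(self):
        # Stand-in for the expensive planning step; runs once per instance.
        ToyQueryExecution.plans_built += 1
        return f"PhysicalPlan({self.logical_plan})"


def add_batch_new_instance_each_time(logical_plan, accesses=3):
    """Every access goes through a fresh instance, so the per-instance cache never helps."""
    return [ToyQueryExecution(logical_plan).executed_plan for _ in range(accesses)]


def add_batch_shared_instance(logical_plan, accesses=3):
    """All accesses share one instance, so the plan is built only once."""
    qe = ToyQueryExecution(logical_plan)
    return [qe.executed_plan for _ in range(accesses)]


def count_plans(add_batch, logical_plan="batch-0"):
    """Reset the counter, run one simulated trigger, and report the plans built."""
    ToyQueryExecution.plans_built = 0
    add_batch(logical_plan)
    return ToyQueryExecution.plans_built


if __name__ == "__main__":
    print(count_plans(add_batch_new_instance_each_time))  # prints 3
    print(count_plans(add_batch_shared_instance))         # prints 1
```

The sketch only shows why per-instance laziness does not prevent repeated planning. How the instances should be shared or avoided inside Spark itself is the open question of this issue.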


> Structured Streaming : foreachSink creates the Physical Plan multiple times 
> per TriggerInterval 
> 
>
> Key: SPARK-16545
> URL: https://issues.apache.org/jira/browse/SPARK-16545
> Project: Spark
> Issue Type: Bug
> Components: SQL, Streaming
> Affects Versions: 2.0.0
> Reporter: Mario Briggs
> Fix For: 2.0.0
>
>







[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval

2016-07-14 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376758#comment-15376758
 ] 

Mario Briggs edited comment on SPARK-16545 at 7/14/16 10:53 AM:


While looking at the performance of Structured Streaming, I found excessive 
time being spent in the driver. 

Looking into this further, I found that the time was spent in multiple (3, to 
be exact) initialisations of 
[QueryExecution.executedPlan|https://github.com/mariobriggs/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L85]
, caused by multiple instances of QueryExecution being created in 
ForeachSink.addBatch. 

Creating the physical plan is expensive and hence shouldn't be done more than 
once.





