[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval
[ https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378811#comment-15378811 ]

Mario Briggs edited comment on SPARK-16545 at 7/15/16 3:57 AM:
---

Thanks. I looked into it too, but I could not come up with a fix that I was satisfied with. The problem seems to be that Dataset assumes it has a QueryExecution with the physical plan (which is true in the batch case), since most of the Listener/metrics-gathering functions want to dump this information, whereas in streaming we want only the 'inner' IncrementalExecution to produce the physical plan. I will submit what I have tried as well, to ease the discussion.

> Structured Streaming : foreachSink creates the Physical Plan multiple times
> per TriggerInterval
>
> Key: SPARK-16545
> URL: https://issues.apache.org/jira/browse/SPARK-16545
> Project: Spark
> Issue Type: Bug
> Components: SQL, Streaming
> Affects Versions: 2.0.0
> Reporter: Mario Briggs

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
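The design tension in the 7/15 comment (an outer Dataset whose listeners want a physical plan, versus an inner IncrementalExecution that should be the only thing producing one) can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the patch Mario said he would submit and not Spark's API: `ToyIncrementalExecution` and `ToyDataset` are hypothetical classes, and `functools.cached_property` stands in for a Scala `lazy val`. The outer wrapper delegates its `executed_plan` to the inner execution instead of building its own, so planning runs once no matter which side reads the plan first.

```python
from functools import cached_property


class ToyIncrementalExecution:
    """Hypothetical stand-in for the 'inner' per-trigger execution."""

    def __init__(self, logical_plan: str):
        self.logical_plan = logical_plan
        self.planning_calls = 0  # how many times planning actually ran

    @cached_property
    def executed_plan(self) -> str:
        # Stand-in for the expensive planning step; cached after first access.
        self.planning_calls += 1
        return f"Physical({self.logical_plan})"


class ToyDataset:
    """Hypothetical outer wrapper that listeners/metrics read the plan from.

    Instead of creating its own execution object (and planning again),
    it delegates to the inner execution it was given.
    """

    def __init__(self, inner: ToyIncrementalExecution):
        self._inner = inner

    @property
    def executed_plan(self) -> str:
        return self._inner.executed_plan


inner = ToyIncrementalExecution("trigger-1")
ds = ToyDataset(inner)
listener_view = ds.executed_plan   # e.g. read by a listener
sink_view = inner.executed_plan    # e.g. read when running the batch
print(inner.planning_calls)        # 1
```

In this toy model both readers see the same plan object and the planning counter stays at 1; whether Spark's real Dataset could be restructured this way is exactly what the comment leaves open.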
[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval
[ https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376758#comment-15376758 ]

Mario Briggs edited comment on SPARK-16545 at 7/14/16 11:01 AM:

While looking at the performance of Structured Streaming, I found excessive time being spent in the driver. Looking into this further, I found that the time goes into multiple (3, to be exact) initialisations of [QueryExecution.executedPlan|https://github.com/mariobriggs/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L85], caused by multiple instances of QueryExecution being created in forEachSink.addBatch. Creating the physical plan is comparatively expensive, so it shouldn't be done more than once per TriggerInterval.

> Structured Streaming : foreachSink creates the Physical Plan multiple times
> per TriggerInterval
>
> Key: SPARK-16545
> URL: https://issues.apache.org/jira/browse/SPARK-16545
> Project: Spark
> Issue Type: Bug
> Components: SQL, Streaming
> Affects Versions: 2.0.0
> Reporter: Mario Briggs
> Fix For: 2.0.0
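A toy Python model of the reported behaviour: a lazily computed plan is cached per instance, so wrapping the same logical plan in a fresh execution object triggers planning again each time. This is an illustrative sketch, assuming that `QueryExecution.executedPlan` behaves like a per-instance lazy value (modelled here with `functools.cached_property`); `ToyQueryExecution` and the two `add_batch_*` helpers are hypothetical names, and `n=3` only mirrors the count reported in the comment, not the actual call sites in forEachSink.addBatch.

```python
from functools import cached_property


class ToyQueryExecution:
    """Hypothetical stand-in for a query-execution object (not Spark's class).

    Like a Scala `lazy val`, `executed_plan` is computed on first access
    and then cached, but only within this one instance.
    """

    planning_calls = 0  # class-wide counter of how often planning ran

    def __init__(self, logical_plan: str):
        self.logical_plan = logical_plan

    @cached_property
    def executed_plan(self) -> str:
        # Stand-in for the expensive logical -> physical planning step.
        ToyQueryExecution.planning_calls += 1
        return f"Physical({self.logical_plan})"


def add_batch_with_fresh_executions(logical_plan: str, n: int = 3) -> int:
    """Each step wraps the same logical plan in a new execution object."""
    ToyQueryExecution.planning_calls = 0
    for _ in range(n):
        _plan = ToyQueryExecution(logical_plan).executed_plan
    return ToyQueryExecution.planning_calls


def add_batch_with_shared_execution(logical_plan: str, n: int = 3) -> int:
    """All steps reuse one execution object, so planning runs once."""
    ToyQueryExecution.planning_calls = 0
    qe = ToyQueryExecution(logical_plan)
    for _ in range(n):
        _plan = qe.executed_plan
    return ToyQueryExecution.planning_calls


print(add_batch_with_fresh_executions("batch"))  # 3
print(add_batch_with_shared_execution("batch"))  # 1
```

The shared-instance variant is the property the issue title asks for (one physical plan per TriggerInterval); the toy model says nothing about how to achieve it inside Spark itself.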
[jira] [Comment Edited] (SPARK-16545) Structured Streaming : foreachSink creates the Physical Plan multiple times per TriggerInterval
[ https://issues.apache.org/jira/browse/SPARK-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376758#comment-15376758 ]

Mario Briggs edited comment on SPARK-16545 at 7/14/16 10:53 AM:

While looking at the performance of Structured Streaming, I found excessive time being spent in the driver. Looking into this further, I found that the time goes into multiple (3, to be exact) initialisations of [QueryExecution.executedPlan|https://github.com/mariobriggs/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L85], caused by multiple instances of QueryExecution being created in forEachSink.addBatch. Creating the physical plan is comparatively expensive, so it shouldn't be done more than once.