[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14214 @mariobriggs Thanks for the information! > 1 can be eliminated because 'executedPlan' is a ' lazy val' on QueryExecution ? Yea indeed. Its being there can provide us debug info but on second thought it might not be worth it. So let's also skip it in the case of `ForeachSink`. > ... However this cannot be the solution since SparkListenerSQLExecutionStart is a public API already Yea we probably do not want to modify this public API; so what we did in this patch was, passing [3]'s `incrementalExecution` into the listener so we'll only initialize physical planning only once for [2] and [3]. > ... why not keep [1] and the change to [2] be the simple case of changing L52 to the following: new Dataset(data.sparkSession, data.queryExecution, implicitly[Encoder[T]]) This is great. If reviews decide that 2 rounds of physical planning is acceptable, then let's do it your way! > ... ConsoleSink ... but it is only for Debug purposes So maybe let us live with it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14214 **[Test build #62368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62368/consoleFull)** for PR 14214 at commit [`9334105`](https://github.com/apache/spark/commit/933410558429ea82f063f21236b8c5c645650a78). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62362/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14214 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14214 **[Test build #62362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)** for PR 14214 at commit [`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/14214 What i tried to do as a 'side fix' was like this, eliminate [1] since it was a lazy val. Move [2] out of the code path of the main thread i.e. let ListenerBus thread pay the penalty of producing the physical plan for logging ( i was coming from a performance test scenario, so it allowed me to proceed :-) ) . So the change was that SparkListenerSQLExecutionStart only take QueryExecution as a input parameter and not physicalPlanDescription & SparkPlanInfo . However this cannot be the solution since SparkListenerSQLExecutionStart is a public API already. [3] remains. As you might have already noticed ConsoleSink also suffers from the same problem of [2] and these are inside Dataset.withTypedCallback/withCallback, but it is only for Debug purposes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user mariobriggs commented on the issue: https://github.com/apache/spark/pull/14214 > [1] should not be eliminated in general; I dont understand the full internal aspects of IncrementalExecution, but my generally thinking was that 1 can be eliminated because 'executedPlan' is a ' lazy val' on QueryExecution ? >[2] is eliminated by this patch, by replacing the queryExecution with incrementalExecution provided by [3]; If the goal is to get it to just as minimal as possible for now and wait for SPARK-16264 (which i was also thinking where it will have to finally wait for full resolution), why not keep [1] and the change to [2] be the simple case of changing [L52](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L52) to the following ``` new Dataset(data.sparkSession, data.queryExecution, implicitly[Encoder[T]]) ``` and no further changes required to your ealier code. Will it be the case that the wrong physical plan will logged in SparkListenerSQLExecutionStart ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14214: [SPARK-16545][SQL] Eliminate one unnecessary round of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14214 **[Test build #62362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62362/consoleFull)** for PR 14214 at commit [`8ec635f`](https://github.com/apache/spark/commit/8ec635fe7403baf5149e3f6714872bf706b37cd7). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org