[GitHub] [spark] liucht-inspur commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
liucht-inspur commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994#issuecomment-541521437 > @liucht-inspur Did you handle the tooltip for the Live UI page for Spark? Yeah, exactly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal edited a comment on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation
dilipbiswal edited a comment on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation URL: https://github.com/apache/spark/pull/26011#issuecomment-541519499 The idea looks reasonable to me. cc @cloud-fan
[GitHub] [spark] dilipbiswal commented on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation
dilipbiswal commented on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation URL: https://github.com/apache/spark/pull/26011#issuecomment-541519499 looks reasonable to me. cc @cloud-fan
[GitHub] [spark] zhengruifeng commented on issue #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning
zhengruifeng commented on issue #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning URL: https://github.com/apache/spark/pull/26057#issuecomment-541518393 merged to master, thanks all
[GitHub] [spark] zhengruifeng closed pull request #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning
zhengruifeng closed pull request #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning URL: https://github.com/apache/spark/pull/26057
[GitHub] [spark] AbhishekNew commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
AbhishekNew commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994#issuecomment-541516449 @liucht-inspur Did you handle the tooltip for the Live UI page for Spark?
[GitHub] [spark] dilipbiswal commented on a change in pull request #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation
dilipbiswal commented on a change in pull request #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation URL: https://github.com/apache/spark/pull/26011#discussion_r334335996

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RemoveSortInSubquery.scala

## @@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{NamedExpression, PredicateHelper}
+import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, OrderIrrelevantAggs}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+
+/**
+ * [[Sort]] without [[Limit]] in a subquery is useless. For example,
+ *
+ * {{{
+ *   SELECT * FROM
+ *     (SELECT f1 FROM tbl1 ORDER BY f2) temp1
+ *   JOIN
+ *     (SELECT f3 FROM tbl2) temp2
+ *   ON temp1.f1 = temp2.f3
+ * }}}
+ *
+ * is equivalent to
+ *
+ * {{{
+ *   SELECT * FROM
+ *     (SELECT f1 FROM tbl1) temp1
+ *   JOIN
+ *     (SELECT f3 FROM tbl2) temp2
+ *   ON temp1.f1 = temp2.f3
+ * }}}
+ *
+ * This rule tries to remove this kind of [[Sort]] operator.
+ */
+object RemoveSortInSubquery extends Rule[LogicalPlan] with PredicateHelper {

Review comment: Should the existing RemoveRedundantSorts handle this as well? The reason I ask is, I don't see anything subquery-specific in the new rule.
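The rewrite being reviewed above can be sketched on a toy plan algebra. This is an illustrative sketch only, not Spark's actual `RemoveSortInSubquery` rule: the `Plan` node classes and `remove_useless_sorts` below are hypothetical stand-ins, while the real rule works on Catalyst's `LogicalPlan` and must also account for order-sensitive aggregates.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical toy plan nodes -- stand-ins for Catalyst's LogicalPlan,
# used only to illustrate the idea of the rule under review.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Sort:
    child: Any

@dataclass(frozen=True)
class Limit:
    n: int
    child: Any

@dataclass(frozen=True)
class Join:
    left: Any
    right: Any

def remove_useless_sorts(plan):
    """Drop Sort nodes feeding a Join, since a join does not preserve
    its children's ordering -- unless the Sort sits under a Limit,
    where removing it would change which rows are returned."""
    if isinstance(plan, Join):
        return Join(_strip(plan.left), _strip(plan.right))
    return plan

def _strip(plan):
    if isinstance(plan, Sort):
        return _strip(plan.child)   # ordering is irrelevant to the join
    if isinstance(plan, Limit):
        return plan                 # Sort under Limit is semantically significant
    if isinstance(plan, Join):
        return Join(_strip(plan.left), _strip(plan.right))
    return plan
```

For instance, `remove_useless_sorts(Join(Sort(Scan("tbl1")), Scan("tbl2")))` drops the `Sort`, mirroring the ORDER BY elimination in the doc comment, while a `Sort` guarded by a `Limit` is left untouched.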
[GitHub] [spark] LantaoJin commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
LantaoJin commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334335548

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala

## @@ -66,4 +91,37 @@ class SQLHadoopMapReduceCommitProtocol(
     logInfo(s"Using output committer class ${committer.getClass.getCanonicalName}")
     committer
   }
+
+  /**
+   * Called on the driver after a task commits. This can be used to access task commit messages
+   * before the job has finished. These same task commit messages will be passed to commitJob()
+   * if the entire job succeeds.
+   * Override it to check the dynamic partition limitation on the driver side.
+   */
+  override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {

Review comment: `SQLHadoopMapReduceCommitProtocol.onTaskCommit` overrides `FileCommitProtocol.onTaskCommit` on purpose.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541514738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112005/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541514733 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541514733 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541514738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112005/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
SparkQA removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541512741 **[Test build #112005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112005/testReport)** for PR 25894 at commit [`0cb396b`](https://github.com/apache/spark/commit/0cb396bc50d0c3e9a3c6528f99ac58bc5fdc3901).
[GitHub] [spark] SparkQA commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
SparkQA commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541514654 **[Test build #112005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112005/testReport)** for PR 25894 at commit [`0cb396b`](https://github.com/apache/spark/commit/0cb396bc50d0c3e9a3c6528f99ac58bc5fdc3901). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] LantaoJin commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
LantaoJin commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334335126

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala

## @@ -66,4 +91,37 @@ class SQLHadoopMapReduceCommitProtocol(
     logInfo(s"Using output committer class ${committer.getClass.getCanonicalName}")
     committer
   }
+
+  /**
+   * Called on the driver after a task commits. This can be used to access task commit messages
+   * before the job has finished. These same task commit messages will be passed to commitJob()
+   * if the entire job succeeds.
+   * Override it to check the dynamic partition limitation on the driver side.
+   */
+  override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = {

Review comment: > this implementation completely hides org.apache.spark.internal.io.HadoopMapReduceCommitProtocol#commitTask

No. `onTaskCommit` doesn't hide `commitTask`. Actually, `commitTask` is called on the executor side, but `onTaskCommit` is called on the driver side.
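The executor/driver split described in this exchange can be illustrated with a small sketch. Everything here is hypothetical (the `LimitingCommitProtocol` class, the `max_dynamic_partitions` parameter, and the dict-shaped commit message are illustrative, not Spark's API); the point is only that a `commitTask`-style hook runs per task on executors and returns a message, while an `onTaskCommit`-style hook runs on the driver and can enforce a global limit across all tasks.

```python
# Hypothetical sketch of a driver-side commit hook enforcing a cap on
# dynamic partitions; names and message shapes are made up for illustration.
class TooManyDynamicPartitions(Exception):
    pass

class LimitingCommitProtocol:
    def __init__(self, max_dynamic_partitions):
        self.max_dynamic_partitions = max_dynamic_partitions
        self._partitions = set()  # accumulated on the driver only

    def commit_task(self, task_partitions):
        # Runs on an executor: finalize the task's output and report
        # which partitions it wrote via a commit message.
        return {"partitions": set(task_partitions)}

    def on_task_commit(self, task_commit):
        # Runs on the driver after each task commits: accumulate the
        # partitions seen so far and fail fast once the cap is exceeded.
        self._partitions |= task_commit["partitions"]
        if len(self._partitions) > self.max_dynamic_partitions:
            raise TooManyDynamicPartitions(
                f"{len(self._partitions)} > {self.max_dynamic_partitions}")
```

Because only the driver sees every task's commit message, the global check naturally belongs in `on_task_commit` rather than in the per-task `commit_task`.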
[GitHub] [spark] SparkQA commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
SparkQA commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541512741 **[Test build #112005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112005/testReport)** for PR 25894 at commit [`0cb396b`](https://github.com/apache/spark/commit/0cb396bc50d0c3e9a3c6528f99ac58bc5fdc3901).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541512163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112004/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
SparkQA removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541508496 **[Test build #112004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112004/testReport)** for PR 26107 at commit [`83d87bd`](https://github.com/apache/spark/commit/83d87bdea517ef3fcb8f5bee4a2b8692dbd3dd64).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins removed a comment on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541512157 Merged build finished. Test FAILed.
[GitHub] [spark] dilipbiswal commented on a change in pull request #26039: [SPARK-29366][SQL] Subqueries created for DPP are not printed in EXPLAIN FORMATTED
dilipbiswal commented on a change in pull request #26039: [SPARK-29366][SQL] Subqueries created for DPP are not printed in EXPLAIN FORMATTED URL: https://github.com/apache/spark/pull/26039#discussion_r334333472

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala

## @@ -199,8 +199,8 @@ object ExplainUtils {
       case s: BaseSubqueryExec =>
         subqueries += ((p, e, s))
         getSubqueries(s, subqueries)
+      case _ =>

Review comment: @cloud-fan Got it... I will send a small follow-up. Thank you.
[GitHub] [spark] SparkQA commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
SparkQA commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541512138 **[Test build #112004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112004/testReport)** for PR 26107 at commit [`83d87bd`](https://github.com/apache/spark/commit/83d87bdea517ef3fcb8f5bee4a2b8692dbd3dd64). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541512163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112004/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541512157 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541511410 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17017/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins removed a comment on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541511406 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541511410 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17017/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
AmplabJenkins commented on issue #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#issuecomment-541511406 Merged build finished. Test PASSed.
[GitHub] [spark] dilipbiswal commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
dilipbiswal commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#discussion_r334332626

## File path: docs/sql-getting-started.md

## @@ -346,6 +346,9 @@ For example:
+## Scalar Functions
+(to be filled soon)

Review comment: @gatorsmile created [here](https://issues.apache.org/jira/browse/SPARK-29458)
[GitHub] [spark] gatorsmile commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
gatorsmile commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#discussion_r334331120

## File path: docs/sql-getting-started.md

## @@ -346,6 +346,9 @@ For example:
+## Scalar Functions
+(to be filled soon)

Review comment: Create a JIRA?
[GitHub] [spark] dilipbiswal commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
dilipbiswal commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#discussion_r334331187

## File path: docs/sql-ref-syntax-ddl-create-function.md

## @@ -19,4 +19,154 @@ license: |
 limitations under the License.
 ---

-**This page is under construction**
+### Description
+The `CREATE FUNCTION` statement is used to create a temporary or permanent function
+in Spark. Temporary functions are scoped at a session level, whereas permanent
+functions are created in the persistent catalog and are made available to
+all sessions. The resources specified in the `USING` clause are made available
+to all executors when they are executed for the first time. In addition to the
+SQL interface, Spark allows users to create custom user-defined scalar and
+aggregate functions using the Scala, Python and Java APIs. Please refer to
+[scalar functions](sql-getting-started.html#scalar-functions) and
+[aggregate functions](sql-getting-started#aggregations) for more information.
+
+### Syntax
+{% highlight sql %}
+CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]
+    function_name AS class_name [ resource_locations ]
+{% endhighlight %}
+
+### Parameters
+
+OR REPLACE
+  If specified, the resources for the function are reloaded. This is mainly useful
+  to pick up any changes made to the implementation of the function. This
+  parameter is mutually exclusive with IF NOT EXISTS; the two cannot
+  be specified together.
+
+TEMPORARY
+  Indicates the scope of the function being created. When TEMPORARY is specified, the
+  created function is valid and visible in the current session. No persistent
+  entry is made in the catalog for this kind of function.
+
+IF NOT EXISTS
+  If specified, creates the function only when it does not exist. The creation
+  of the function succeeds (no error is thrown) if the specified function already
+  exists in the system. This parameter is mutually exclusive with OR REPLACE;
+  the two cannot be specified together.
+
+function_name
+  Specifies the name of the function to be created. The function name may be
+  optionally qualified with a database name.
+  Syntax: [database_name.]function_name
+
+class_name
+  Specifies the name of the class that provides the implementation for the function
+  to be created. The implementing class should extend one of the following base classes:
+  - UDF or UDAF in the org.apache.hadoop.hive.ql.exec package.
+  - AbstractGenericUDAFResolver, GenericUDF, or GenericUDTF in the
+    org.apache.hadoop.hive.ql.udf.generic package.
+  - UserDefinedAggregateFunction in the org.apache.spark.sql.expressions package.
+
+resource_locations
+  Specifies the list of resources that contain the implementation of the function
+  along with its dependencies.
+  Syntax: USING { { (JAR | FILE) resource_uri } , ... }
+
+### Examples
+{% highlight sql %}
+-- 1. Create a simple UDF `SimpleUdf` that adds 10 to the supplied integer value.
+--    import org.apache.hadoop.hive.ql.exec.UDF;
+--    public class SimpleUdf extends UDF {
+--      public int evaluate(int value) {
+--        return value + 10;
+--      }
+--    }
+-- 2. Compile and place it in a jar file called `SimpleUdf.jar` in /tmp.
+
+-- Create a table called `test` and insert two rows.
+CREATE TABLE test(c1 INT);
+INSERT INTO test VALUES (1), (2);
+
+-- Create a permanent function called `simple_udf`.
+CREATE FUNCTION simple_udf AS 'SimpleUdf'
+    USING JAR '/tmp/SimpleUdf.jar';
+
+-- Verify that the function is in the registry.
+SHOW USER FUNCTIONS;
+  +------------------+
+  |          function|
+  +------------------+
+  |default.simple_udf|
+  +------------------+
+
+-- Invoke the function. Every selected value should be incremented by 10.
+SELECT simple_udf(c1) AS function_return_value FROM test;
+  +---------------------+
+  |function_return_value|
+  +---------------------+
+  |                   11|
+  |                   12|
+  +---------------------+
+
+-- Create a temporary function.
+CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'
+    USING JAR '/tmp/SimpleUdf.jar';
+
+-- Verify that the newly created temporary function is in the registry.
+-- Please note that the temporary function does not have a qualified
+-- database associated with it.
+SHOW USER FUNCTIONS;
+  +------------------+
+  |          function|
+  +------------------+
+  |default.simple_udf|
+  |   simple_temp_udf|
+  +------------------+
+
+-- 1. Modify `SimpleUdf`'s implementation to add 20 to the supplied integer value.
+--    import org.apache.hadoop.hive.ql.exec.UDF;
+--    public class SimpleUdfR extends UDF {
+--      public int evaluate(int value) {
+
[GitHub] [spark] gatorsmile commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference
gatorsmile commented on a change in pull request #25894: [SPARK-28793][DOC][SQL] Document CREATE FUNCTION in SQL Reference URL: https://github.com/apache/spark/pull/25894#discussion_r334330780 ## File path: docs/sql-ref-syntax-ddl-create-function.md ## @@ -19,4 +19,154 @@ license: | limitations under the License. --- -**This page is under construction** +### Description +The `CREATE FUNCTION` statement is used to create a temporary or permanent function +in Spark. Temporary functions are scoped at a session level where as permanent +functions are created in the persistent catalog and are made available to +all sessions. The resources specified in the `USING` clause are made available +to all executors when they are executed for the first time. In addition to the +SQL interface, spark allows users to create custom user defined scalar and +aggregate functions using Scala, Python and Java APIs. Please refer to +[scalar_functions](sql-getting-started.html#scalar-functions) and +[aggregate functions](sql-getting-started#aggregations) for more information. + +### Syntax +{% highlight sql %} +CREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ] +function_name AS class_name [ resource_locations ] +{% endhighlight %} + +### Parameters + + OR REPLACE + +If specified, the resources for function are reloaded. This is mainly useful +to pick up any changes made to the implementation of the function. This +parameter is mutually exclusive to IF NOT EXISTS and can not +be specified together. + + TEMPORARY + +Indicates the scope of function being created. When TEMPORARY is specified, the +created function is valid and visible in the current session. No persistent +entry is made in the catalog for these kind of functions. + + IF NOT EXISTS + +If specified, creates the function only when it does not exist. The creation +of function succeeds (no error is thrown), if the specified function already +exists in the system. 
This parameter is mutually exclusive to OR REPLACE +and can not be specified together. + + function_name + +Specifies a name of funnction to be created. The function name may be +optionally qualified with a database name. +Syntax: + +[database_name.]function_name + + + class_name + +Specifies the name of the class that provides the implementation for function to be created. +The implementing class should extend from one of the base classes as follows: + + Should extend UDF or UDAF in org.apache.hadoop.hive.ql.exec package. + Should extend AbstractGenericUDAFResolver, GenericUDF, or + GenericUDTF in org.apache.hadoop.hive.ql.udf.generic package. + Should extend UserDefinedAggregateFunction in org.apache.spark.sql.expressions package. + + + resource_locations + +Specifies the list of resources that contain the implementation of the function +along with its dependencies. +Syntax: + +USING { { (JAR | FILE ) resource_uri} , ...} + + + + +### Examples +{% highlight sql %} +-- 1. Create a simple UDF `SimpleUdf` that adds the supplied integral value by 10. +--import org.apache.hadoop.hive.ql.exec.UDF; +--public class SimpleUdf extends UDF { +-- public int evaluate(int value) { +-- return value + 10; +-- } +--} +-- 2. Compile and place it in a jar file called `SimpleUdf.jar` in /tmp. + +-- Create a table called `test` and insert two rows. +CREATE TABLE test(c1 INT); +INSERT INTO test VALUES (1), (2); + +-- Create a permanent function called `simple_udf`. +CREATE FUNCTION simple_udf AS 'SimpleUdf' + USING JAR '/tmp/SimpleUdf.jar'; + +-- Verify that the function is in the registry. +SHOW USER FUNCTIONS; + +--+ + | function| + +--+ + |default.simple_udf| + +--+ + +-- Invoke the function. Every selected value should be incremented by 10. +SELECT simple_udf(c1) AS function_return_value FROM t1; + +-+ + |function_return_value| + +-+ + | 11| + | 12| + +-+ + +-- Created a temporary function. 
+CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf' + USING JAR '/tmp/SimpleUdf.jar'; + +-- Verify that the newly created temporary function is in the registry. +-- Please note that the temporary function does not have a qualified +-- database associated with it. +SHOW USER FUNCTIONS; + +--+ + | function| + +--+ + |default.simple_udf| + | simple_temp_udf| + +--+ + +-- 1. Mofify `SimpleUdf`'s implementation to add supplied integral value by 20. +--import org.apache.hadoop.hive.ql.exec.UDF; + +--public class SimpleUdfR extends UDF { +-- public int evaluate(int value) { +-
[GitHub] [spark] AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541508728 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
AmplabJenkins commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541508732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17016/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
SparkQA commented on issue #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107#issuecomment-541508496 **[Test build #112004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112004/testReport)** for PR 26107 at commit [`83d87bd`](https://github.com/apache/spark/commit/83d87bdea517ef3fcb8f5bee4a2b8692dbd3dd64).
[GitHub] [spark] gengliangwang opened a new pull request #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default
gengliangwang opened a new pull request #26107: [SPARK-28885][SQL] Follow ANSI store assignment rules in table insertion by default URL: https://github.com/apache/spark/pull/26107 ### What changes were proposed in this pull request? When inserting a value into a column of a different data type, Spark performs type coercion. Currently, we support 3 policies for the store assignment rules: ANSI, legacy and strict, which can be set via the option "spark.sql.storeAssignmentPolicy": 1. ANSI: Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions, such as converting `string` to `int` and `double` to `boolean`. It will throw a runtime exception if the value is out-of-range (overflow). 2. Legacy: Spark allows the type coercion as long as it is a valid `Cast`, which is very loose. E.g., converting either `string` to `int` or `double` to `boolean` is allowed. It is the current behavior in Spark 2.x for compatibility with Hive. When inserting an out-of-range value into an integral field, the low-order bits of the value are inserted (the same as Java/Scala numeric type casting). For example, if 257 is inserted into a field of Byte type, the result is 1. 3. Strict: Spark doesn't allow any possible precision loss or data truncation in store assignment, e.g., converting either `double` to `int` or `decimal` to `double` is not allowed. These rules were originally for the Dataset encoder. As far as I know, no mainstream DBMS uses this policy by default. Currently, the V1 data source uses the "Legacy" policy by default, while V2 uses "Strict". This proposal is to use the "ANSI" policy by default for both V1 and V2 in Spark 3.0. ### Why are the changes needed? Following the ANSI SQL standard is the most reasonable among the 3 policies. ### Does this PR introduce any user-facing change? Yes. The default store assignment policy is ANSI for both V1 and V2 data sources. ### How was this patch tested?
Unit test
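The legacy policy's low-order-bits behavior described in the PR text can be reproduced outside Spark. The following is a plain-Python sketch of a Java/Scala-style narrowing `(byte)` cast; the helper name is invented for illustration and is not a Spark API:

```python
def legacy_byte_cast(value: int) -> int:
    # Keep only the low-order 8 bits and reinterpret them as a signed byte,
    # which is what a Java/Scala narrowing numeric cast does.
    low = value & 0xFF
    return low - 256 if low >= 128 else low

print(legacy_byte_cast(257))  # 1, matching the example in the PR description
print(legacy_byte_cast(128))  # -128: the sign bit flips on overflow
```

Under the ANSI policy proposed here, the same out-of-range insert would instead raise a runtime error rather than silently wrap.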
[GitHub] [spark] JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution
JkSelf commented on issue #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#issuecomment-541502159 @cloud-fan Can you help review the updated patch? Thanks
[GitHub] [spark] sandeep-katta commented on a change in pull request #26095: [SPARK-29435][Core]Shuffle is not working when spark.shuffle.useOldFetchProtocol=true
sandeep-katta commented on a change in pull request #26095: [SPARK-29435][Core]Shuffle is not working when spark.shuffle.useOldFetchProtocol=true URL: https://github.com/apache/spark/pull/26095#discussion_r334324057 ## File path: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ## @@ -47,8 +47,7 @@ private[spark] class BlockStoreShuffleReader[K, C]( context, blockManager.blockStoreClient, blockManager, - mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition, -SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)), + mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition), Review comment: This is redundant code: since the shuffle writer writes the mapId based on the `spark.shuffle.useOldFetchProtocol` flag, `MapStatus.mapTaskId` always gives the mapId set by the ShuffleWriter
[GitHub] [spark] cloud-fan commented on issue #26092: [SPARK-29440][SQL] Support java.time.Duration as an external type of CalendarIntervalType
cloud-fan commented on issue #26092: [SPARK-29440][SQL] Support java.time.Duration as an external type of CalendarIntervalType URL: https://github.com/apache/spark/pull/26092#issuecomment-541499455 My opinion on this is to expose the `CalendarInterval` class directly, with 2 new methods `extractPeriod` and `extractDuration`. Semantically, `CalendarInterval` is java `Period` + `Duration`. I don't think we can map `CalendarInterval` to `Duration` as it's kind of a truncation. Like @MaxGekk said, we can also separate the interval type to year-month interval and day-time interval. But it's a lot of effort to change the type system and is not compatible with parquet.
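The objection that mapping `CalendarInterval` to `Duration` is "kind of a truncation" comes down to calendar months having no fixed length in seconds. A small stand-alone Python sketch makes the point; the `add_one_month` helper is illustrative only (not Spark or java.time API), and it ignores leap years for brevity:

```python
from datetime import date

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def add_one_month(d: date) -> date:
    # Advance the calendar month, clamping the day-of-month; leap years ignored.
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(year, month, min(d.day, DAYS_IN_MONTH[month - 1]))

# The same "1 month" interval spans a different number of days depending on
# the anchor date, so no single fixed-length Duration can represent it.
print((add_one_month(date(2019, 1, 15)) - date(2019, 1, 15)).days)  # 31
print((add_one_month(date(2019, 6, 15)) - date(2019, 6, 15)).days)  # 30
```

This is why java.time itself splits the concept into `Period` (calendar-based) and `Duration` (exact seconds), matching the `extractPeriod`/`extractDuration` proposal above.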
[GitHub] [spark] huaxingao commented on issue #26103: [SPARK-29381][PYTHON][ML] Add _ before the XXXParams classes
huaxingao commented on issue #26103: [SPARK-29381][PYTHON][ML] Add _ before the XXXParams classes URL: https://github.com/apache/spark/pull/26103#issuecomment-541498973 Almost all of these quasi-internal `_XXXParams` classes were newly added in these parity JIRAs, with very few exceptions. One of them is ```LSHParams```. If a user has subclassed ```LSHParams```, then with the name changed to ```_LSHParams``` they have to import the class explicitly, like ```from pyspark.ml.feature import _LSHParams```, because `import *` does not import objects whose names start with an underscore.
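The `import *` behavior cited above is standard Python: a star import skips underscore-prefixed names unless the module defines `__all__`. A self-contained sketch using a throwaway module (the module and class names are illustrative, not the real pyspark ones):

```python
import sys
import types

# Build a throwaway module with a public class and an underscore-prefixed one.
mod = types.ModuleType("params_demo")
exec("class LSHParams: pass\nclass _LSHParams: pass", mod.__dict__)
sys.modules["params_demo"] = mod

ns = {}
exec("from params_demo import *", ns)
print("LSHParams" in ns)    # True
print("_LSHParams" in ns)   # False: star import skips underscore names

# Subclassing the renamed class requires an explicit import:
exec("from params_demo import _LSHParams", ns)
print("_LSHParams" in ns)   # True
```

This is exactly the compatibility cost being weighed in the PR discussion: code that previously relied on `from pyspark.ml.feature import *` would stop seeing the renamed class.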
[GitHub] [spark] cloud-fan commented on a change in pull request #26095: [SPARK-29435][Core]Shuffle is not working when spark.shuffle.useOldFetchProtocol=true
cloud-fan commented on a change in pull request #26095: [SPARK-29435][Core]Shuffle is not working when spark.shuffle.useOldFetchProtocol=true URL: https://github.com/apache/spark/pull/26095#discussion_r334322988 ## File path: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ## @@ -47,8 +47,7 @@ private[spark] class BlockStoreShuffleReader[K, C]( context, blockManager.blockStoreClient, blockManager, - mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition, -SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)), + mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition), Review comment: This is the shuffle read side and we need to know the value of `SHUFFLE_USE_OLD_FETCH_PROTOCOL`. I think the bug is in the shuffle write side which is fixed in this PR. Do we really need to change the shuffle read side?
[GitHub] [spark] itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334320998 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala ## @@ -32,10 +35,14 @@ import org.apache.spark.sql.internal.SQLConf class SQLHadoopMapReduceCommitProtocol( jobId: String, path: String, -dynamicPartitionOverwrite: Boolean = false) +dynamicPartitionOverwrite: Boolean = false, Review comment: thanks.
[GitHub] [spark] itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334321860 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala ## @@ -66,4 +91,37 @@ class SQLHadoopMapReduceCommitProtocol( logInfo(s"Using output committer class ${committer.getClass.getCanonicalName}") committer } + + /** + * Called on the driver after a task commits. This can be used to access task commit messages + * before the job has finished. These same task commit messages will be passed to commitJob() + * if the entire job succeeds. + * Override it to check dynamic partition limitation on driver side. + */ + override def onTaskCommit(taskCommit: TaskCommitMessage): Unit = { Review comment: this implementation completely hides org.apache.spark.internal.io.HadoopMapReduceCommitProtocol#commitTask, which was the behaviour earlier. Is this intentional?
[GitHub] [spark] itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334320842 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala ## @@ -63,7 +70,29 @@ class SQLHadoopMapReduceCommitProtocol( committer = ctor.newInstance() } } +totalPartitions = new AtomicInteger(0) logInfo(s"Using output committer class ${committer.getClass.getCanonicalName}") committer } + + override def newTaskTempFile( + taskContext: TaskAttemptContext, dir: Option[String], ext: String): String = { +val path = super.newTaskTempFile(taskContext, dir, ext) +totalPartitions.incrementAndGet() +if (dynamicPartitionOverwrite) { + if (totalPartitions.get > maxDynamicPartitions) { Review comment: oh thanks, that makes things clearer now.
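The quoted `newTaskTempFile` override boils down to a thread-safe counter that fails fast once a cap is exceeded (the Scala diff uses an `AtomicInteger`). A minimal Python sketch of the same pattern; the class name and error message are invented for illustration and are not Spark code:

```python
import threading

class DynamicPartitionLimiter:
    """Thread-safe counter that raises once a partition cap is exceeded,
    sketching the check performed in the quoted newTaskTempFile override."""

    def __init__(self, max_dynamic_partitions: int):
        self._max = max_dynamic_partitions
        self._count = 0
        self._lock = threading.Lock()

    def register_partition(self) -> int:
        # Equivalent to totalPartitions.incrementAndGet() followed by the cap check.
        with self._lock:
            self._count += 1
            if self._count > self._max:
                raise RuntimeError(
                    f"Exceeded limit of {self._max} dynamic partitions")
            return self._count

limiter = DynamicPartitionLimiter(max_dynamic_partitions=2)
print(limiter.register_partition())  # 1
print(limiter.register_partition())  # 2
# A third call would raise RuntimeError.
```

Note that in the real patch the counter lives in the commit protocol on each task, which is why reviewers also discuss moving the check to the driver via `onTaskCommit`.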
[GitHub] [spark] itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table
itskals commented on a change in pull request #25840: [SPARK-29166][SQL] Add parameters to limit the number of dynamic partitions for data source table URL: https://github.com/apache/spark/pull/25840#discussion_r334320752 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala ## @@ -66,4 +68,18 @@ class SQLHadoopMapReduceCommitProtocol( logInfo(s"Using output committer class ${committer.getClass.getCanonicalName}") committer } + + override def newTaskTempFile( Review comment: Did we have any limitation like this before? Can you help me recollect a case where the config is only for SQL and not for other modes of operation?
[GitHub] [spark] AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541493412 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541493408 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17015/
[GitHub] [spark] AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541493414 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17015/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file
AmplabJenkins commented on issue #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file URL: https://github.com/apache/spark/pull/26106#issuecomment-541492632 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file
AmplabJenkins commented on issue #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file URL: https://github.com/apache/spark/pull/26106#issuecomment-541492452 Can one of the admins verify this patch?
[GitHub] [spark] LuciferYang opened a new pull request #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file
LuciferYang opened a new pull request #26106: [SPARK-29454][SQL]Reduce one unsafeProjection call when read parquet file URL: https://github.com/apache/spark/pull/26106 ### What changes were proposed in this pull request? ParquetGroupConverter calls the unsafeProjection function to convert a SpecificInternalRow to an UnsafeRow every time a Parquet data file is read through ParquetRecordReader. ParquetFileFormat then calls unsafeProjection again to convert that UnsafeRow to another UnsafeRow when partitionSchema is not empty; likewise, PartitionReaderWithPartitionValues always performs this conversion when DataSourceV2 is used. I think the first conversion in ParquetGroupConverter is redundant, and it is enough for ParquetRecordReader to return a SpecificInternalRow. ### How was this patch tested? Existing test cases are sufficient.
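The redundant conversion described above can be illustrated with a small, hypothetical Python sketch (the function names are illustrative stand-ins, not Spark's actual code path): converting the reader's row to an "unsafe" form in the converter and then again in the file format produces the same result as converting once at the end.

```python
def to_unsafe_row(row):
    # Stand-in for an unsafeProjection call: materialize the row once.
    return tuple(row)

def read_with_double_conversion(specific_row, partition_values):
    # Before: the group converter already produces an "unsafe" row...
    unsafe = to_unsafe_row(specific_row)  # first conversion (redundant)
    # ...and the file format converts again to append partition values.
    return to_unsafe_row(list(unsafe) + partition_values)

def read_with_single_conversion(specific_row, partition_values):
    # After: the reader hands back the specific row and we convert once.
    return to_unsafe_row(list(specific_row) + partition_values)
```

Both paths yield the same final row, e.g. `read_with_single_conversion([1, 2], [3])` gives `(1, 2, 3)`, which is why the PR argues the first projection can be dropped.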
[GitHub] [spark] SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541491201 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17015/
[GitHub] [spark] srowen commented on a change in pull request #26088: [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario
srowen commented on a change in pull request #26088: [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario URL: https://github.com/apache/spark/pull/26088#discussion_r334317158

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala

```diff
@@ -142,6 +142,12 @@ private[spark] object Config extends Logging {
     .stringConf
     .createOptional

+  val KUBERNETES_EXECUTOR_SCHEDULER_NAME =
+    ConfigBuilder("spark.kubernetes.executor.scheduler.name")
+      .doc("Specify the scheduler name for each executor pod")
+      .stringConf
+      .createWithDefault("")
```

Review comment: Can you make this an optional config and only set it if present?
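The reviewer's suggestion, setting schedulerName on the executor pod only when the user actually configured it, can be sketched in Python (a hypothetical illustration; the config key mirrors the one above, but this is not Spark's builder API or pod-construction code):

```python
def build_executor_pod(conf: dict) -> dict:
    """Return a minimal pod spec, adding schedulerName only when configured."""
    pod = {"kind": "Pod", "spec": {}}
    # dict.get returns None for an absent key, so an unset (or empty)
    # config leaves the pod spec untouched instead of writing "".
    scheduler = conf.get("spark.kubernetes.executor.scheduler.name")
    if scheduler:
        pod["spec"]["schedulerName"] = scheduler
    return pod
```

With an optional config, Kubernetes falls back to its default scheduler when the field is absent, which is why an empty-string default is worth avoiding.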
[GitHub] [spark] SparkQA removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
SparkQA removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541486166 **[Test build #112003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112003/testReport)** for PR 25648 at commit [`f8eeaae`](https://github.com/apache/spark/commit/f8eeaae7152bd8fc8112a8a83ed8bbc97ba815ea).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541487339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112003/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541487306 **[Test build #112003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112003/testReport)** for PR 25648 at commit [`f8eeaae`](https://github.com/apache/spark/commit/f8eeaae7152bd8fc8112a8a83ed8bbc97ba815ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541487337 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins removed a comment on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541487337 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
AmplabJenkins commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541487339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112003/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
SparkQA commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541486166 **[Test build #112003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112003/testReport)** for PR 25648 at commit [`f8eeaae`](https://github.com/apache/spark/commit/f8eeaae7152bd8fc8112a8a83ed8bbc97ba815ea).
[GitHub] [spark] yaooqinn commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
yaooqinn commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness URL: https://github.com/apache/spark/pull/25648#issuecomment-541486004 retest this please
[GitHub] [spark] merrily01 commented on issue #26088: [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario
merrily01 commented on issue #26088: [SPARK-29436][K8S] Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario URL: https://github.com/apache/spark/pull/26088#issuecomment-541485438 Could you please kindly review? @dongjoon-hyun @srowen
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541484029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112002/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541484028 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
SparkQA removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541481939 **[Test build #112002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112002/testReport)** for PR 25960 at commit [`4aa2dd2`](https://github.com/apache/spark/commit/4aa2dd2b7441b35fe80c1656dc8487bf4afef1c7).
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541484028 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541484029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112002/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541483996 **[Test build #112002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112002/testReport)** for PR 25960 at commit [`4aa2dd2`](https://github.com/apache/spark/commit/4aa2dd2b7441b35fe80c1656dc8487bf4afef1c7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] WangGuangxin commented on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation
WangGuangxin commented on issue #26011: [SPARK-29343][SQL] Eliminate sorts without limit in the subquery of Join/Aggregation URL: https://github.com/apache/spark/pull/26011#issuecomment-541482908 @dongjoon-hyun @dilipbiswal @maropu @gatorsmile Could you please take a look at this PR when you have time?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541482108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17014/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541482104 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541482108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17014/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541482104 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541481992 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541481992 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541481998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112001/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541481998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112001/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541481939 **[Test build #112002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112002/testReport)** for PR 25960 at commit [`4aa2dd2`](https://github.com/apache/spark/commit/4aa2dd2b7441b35fe80c1656dc8487bf4afef1c7).
[GitHub] [spark] SparkQA removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
SparkQA removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465420 **[Test build #112001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112001/testReport)** for PR 26105 at commit [`7fc8117`](https://github.com/apache/spark/commit/7fc8117c30555046de313e19b78c445986b0297e).
[GitHub] [spark] SparkQA commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
SparkQA commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541481840 **[Test build #112001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112001/testReport)** for PR 26105 at commit [`7fc8117`](https://github.com/apache/spark/commit/7fc8117c30555046de313e19b78c445986b0297e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
LantaoJin commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-541481584 Retest this please.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26094: [SPARK-29442][SQL][PYSPARK] Set `default` mode should override the existing mode
dongjoon-hyun commented on a change in pull request #26094: [SPARK-29442][SQL][PYSPARK] Set `default` mode should override the existing mode URL: https://github.com/apache/spark/pull/26094#discussion_r334309190

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -77,7 +77,7 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
  * `overwrite`: overwrite the existing data.
  * `append`: append the data.
  * `ignore`: ignore the operation (i.e. no-op).
- * `error` or `errorifexists`: default option, throw an exception at runtime.
+ * `error`, `errorifexists`, or `default`: default option, throw an exception at runtime.
```

Review comment: I also want to hide this `default` from the document. In any case, we need to fix the code.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26094: [SPARK-29442][SQL][PYSPARK] Set `default` mode should override the existing mode
dongjoon-hyun commented on a change in pull request #26094: [SPARK-29442][SQL][PYSPARK] Set `default` mode should override the existing mode URL: https://github.com/apache/spark/pull/26094#discussion_r334309160

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -87,10 +87,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       case "overwrite" => mode(SaveMode.Overwrite)
       case "append" => mode(SaveMode.Append)
       case "ignore" => mode(SaveMode.Ignore)
-      case "error" | "errorifexists" => mode(SaveMode.ErrorIfExists)
-      case "default" => this
```

Review comment: Yes. It seems that we didn't document it before.
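The behavior change under review, treating `default` like `error`/`errorifexists` rather than as a no-op that keeps the previously set mode, can be sketched as a small Python lookup (a hypothetical illustration only; Spark's real implementation is the Scala match in DataFrameWriter shown above):

```python
# Hypothetical normalization table for the discussed behavior; the values
# name Spark's SaveMode variants, but this is not Spark's actual code.
SAVE_MODES = {
    "overwrite": "Overwrite",
    "append": "Append",
    "ignore": "Ignore",
    "error": "ErrorIfExists",
    "errorifexists": "ErrorIfExists",
    # After the change, "default" overrides any earlier mode and behaves
    # like "error" instead of silently leaving the writer unchanged.
    "default": "ErrorIfExists",
}

def normalize_save_mode(mode: str) -> str:
    try:
        return SAVE_MODES[mode.lower()]
    except KeyError:
        raise ValueError(f"Unknown save mode: {mode}")
```

For example, `normalize_save_mode("Default")` now resolves to `ErrorIfExists` just like `normalize_save_mode("error")`, which is the override semantics the PR title describes.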
[GitHub] [spark] HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-541469976 Also cc @srowen, as this PR helps identify the correctness problem described in #24890 in a running query. The background knowledge required for this patch is similar to #24890; we may need to evaluate which is the better approach between #21617 and this.
[GitHub] [spark] HeartSaVioR commented on issue #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query
HeartSaVioR commented on issue #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query URL: https://github.com/apache/spark/pull/22952#issuecomment-541469585 cc @tdas @zsxwing @jose-torres @gaborgsomogyi Kind reminder.
[GitHub] [spark] HeartSaVioR edited a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
HeartSaVioR edited a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-541469251 cc @tdas @zsxwing @jose-torres @gaborgsomogyi Kind reminder.
[GitHub] [spark] HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay
HeartSaVioR commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark plus allowed delay URL: https://github.com/apache/spark/pull/24936#issuecomment-541469251 cc @tdas @zsxwing @jose-torres @arunmahadevan @gaborgsomogyi Kind reminder.
[GitHub] [spark] HeartSaVioR commented on issue #26032: [SPARK-29361][SQL] Enable DataFrame with streaming source support on DSv1
HeartSaVioR commented on issue #26032: [SPARK-29361][SQL] Enable DataFrame with streaming source support on DSv1 URL: https://github.com/apache/spark/pull/26032#issuecomment-541468753 Closing this, as it was explained that DSv2 should be the first to support streaming sources. https://lists.apache.org/thread.html/2684c0fd155a21ef100377a23e135feeabd0b0a7a098ca5e40f20e37@%3Cdev.spark.apache.org%3E https://lists.apache.org/thread.html/3f0f5306b8d61a43023114bbcae7bb404ca5a0ddc3ba56f01876f8f6@
[GitHub] [spark] HeartSaVioR closed pull request #26032: [SPARK-29361][SQL] Enable DataFrame with streaming source support on DSv1
HeartSaVioR closed pull request #26032: [SPARK-29361][SQL] Enable DataFrame with streaming source support on DSv1 URL: https://github.com/apache/spark/pull/26032
[GitHub] [spark] AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17013/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465525 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17013/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
AmplabJenkins removed a comment on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465525 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
SparkQA commented on issue #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105#issuecomment-541465420 **[Test build #112001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112001/testReport)** for PR 26105 at commit [`7fc8117`](https://github.com/apache/spark/commit/7fc8117c30555046de313e19b78c445986b0297e).
[GitHub] [spark] viirya opened a new pull request #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles
viirya opened a new pull request #26105: [SPARK-26570][SQL] Prevent OOM when transforming very many filestatus in InMemoryFileIndex.bulkListLeafFiles URL: https://github.com/apache/spark/pull/26105
### What changes were proposed in this pull request?
This PR proposes to wrap the collected SerializableFileStatus in SoftReference when collecting file statuses in InMemoryFileIndex.bulkListLeafFiles. Later, when we transform SerializableFileStatus back to FileStatus, already-traversed SerializableFileStatus objects become candidates for GC if there is memory pressure.
### Why are the changes needed?
We get an array of (String, Seq[SerializableFileStatus]) when collecting file statuses in InMemoryFileIndex.bulkListLeafFiles. Later we transform it back to a sequence of (Path, Seq[FileStatus]). During the transformation, the items in the array are held and cannot be released by GC, so we effectively double memory consumption. With very many file statuses, this can cause OOM. This change wraps each sequence of SerializableFileStatus in the array with SoftReference, so that while transforming SerializableFileStatus back to FileStatus, traversed SerializableFileStatus objects can be reclaimed by GC under memory pressure. If a SerializableFileStatus that has not yet been traversed is cleared by GC, we log a message suggesting that users increase driver memory, and we re-list the file statuses for that path.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Unit test.
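The wrap-with-SoftReference-then-re-list approach described in the PR can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual Spark code: `listFiles` is a hypothetical stand-in for the real Hadoop listing call, and plain `Seq[String]` stands in for `Seq[SerializableFileStatus]`.

```scala
import java.lang.ref.SoftReference

// Hypothetical stand-in for the expensive file-listing call.
def listFiles(path: String): Seq[String] =
  Seq(s"$path/part-0", s"$path/part-1")

// Collect phase: wrap each listed sequence in a SoftReference so the GC
// may reclaim it under memory pressure instead of OOMing.
val collected: Array[(String, SoftReference[Seq[String]])] =
  Array("dir1", "dir2").map(p => (p, new SoftReference(listFiles(p))))

// Transform phase: if the referent was cleared by GC (get returns null),
// fall back to re-listing the path, as the PR description suggests.
val resolved = collected.map { case (path, ref) =>
  val statuses = Option(ref.get).getOrElse(listFiles(path))
  (path, statuses)
}
```

The design trade-off is that a cleared reference costs an extra listing pass for that path, but the job survives memory pressure instead of failing with OOM while both representations are held at once.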