[GitHub] [spark] AmplabJenkins removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706913968 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
SparkQA removed a comment on pull request #29975: URL: https://github.com/apache/spark/pull/29975#issuecomment-706811835 **[Test build #129650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129650/testReport)** for PR 29975 at commit [`2414bb0`](https://github.com/apache/spark/commit/2414bb041f47c00de28428014372ab6c55435e56).
[GitHub] [spark] SparkQA removed a comment on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA removed a comment on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706886780 **[Test build #129666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)** for PR 29893 at commit [`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
AmplabJenkins removed a comment on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706914459
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
AmplabJenkins removed a comment on pull request #29975: URL: https://github.com/apache/spark/pull/29975#issuecomment-706913861
[GitHub] [spark] SparkQA removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
SparkQA removed a comment on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706849086 **[Test build #129662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129662/testReport)** for PR 29881 at commit [`9ee1a86`](https://github.com/apache/spark/commit/9ee1a861e650f716870befcefd9820dc8425b9ae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
AmplabJenkins removed a comment on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706911022
[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
SparkQA commented on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706914439 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34269/
[GitHub] [spark] AmplabJenkins commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
AmplabJenkins commented on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706914459
[GitHub] [spark] AmplabJenkins commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
AmplabJenkins commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706913968
[GitHub] [spark] AmplabJenkins commented on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
AmplabJenkins commented on pull request #29975: URL: https://github.com/apache/spark/pull/29975#issuecomment-706913861
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706913662 **[Test build #129666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)** for PR 29893 at commit [`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
HeartSaVioR commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-706912198 You're right. Would you like to go with a revert and another PR, or is this just for information?
[GitHub] [spark] SparkQA commented on pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
SparkQA commented on pull request #29975: URL: https://github.com/apache/spark/pull/29975#issuecomment-706912508 **[Test build #129650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129650/testReport)** for PR 29975 at commit [`2414bb0`](https://github.com/apache/spark/commit/2414bb041f47c00de28428014372ab6c55435e56).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR edited a comment on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
HeartSaVioR edited a comment on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-706912198 You're right. Would you like to go with a revert and another PR, or is this just for information? Either is fine with me.
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706911985 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34270/
[GitHub] [spark] AmplabJenkins commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
AmplabJenkins commented on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706911022
[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
SparkQA commented on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706910030 **[Test build #129662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129662/testReport)** for PR 29881 at commit [`9ee1a86`](https://github.com/apache/spark/commit/9ee1a861e650f716870befcefd9820dc8425b9ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30011: [WIP][SPARK-32281][SQL] Spark keep SORTED spec in metastore
SparkQA commented on pull request #30011: URL: https://github.com/apache/spark/pull/30011#issuecomment-706909487 **[Test build #129669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129669/testReport)** for PR 30011 at commit [`9364dd9`](https://github.com/apache/spark/commit/9364dd9699f2a756c191389a6923be70c6e22759).
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706907596 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34271/
[GitHub] [spark] AngersZhuuuu opened a new pull request #30011: [WIP][SPARK-32281][SQL] Spark keep SORTED spec in metastore
AngersZhuuuu opened a new pull request #30011: URL: https://github.com/apache/spark/pull/30011

### What changes were proposed in this pull request?
Spark only supports bucket columns with all-ASC ordering, so the `CatalogTable` metadata loses Hive's BUCKET sort (SORTED BY) information. After certain commands cause Spark to update the table metadata, the SORTED BY spec is lost from the metastore.

### Why are the changes needed?
Spark should not change a Hive table's SORTED BY metadata.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
WIP
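The metadata loss described in this PR can be illustrated with a hedged, self-contained sketch. These are simplified stand-ins, not Spark's actual `CatalogTable`/`BucketSpec` classes: a Hive bucket spec carries a per-column sort direction, while a catalog-side spec that only models ASC sort column names drops any DESC ordering on a round trip.

```scala
// Hedged illustration of the metadata loss described above. These are
// simplified stand-ins, NOT Spark's actual CatalogTable/BucketSpec types.
case class HiveSortCol(name: String, ascending: Boolean)
case class HiveBucketSpec(numBuckets: Int, bucketCols: Seq[String], sortCols: Seq[HiveSortCol])
// The catalog-side spec only records sort column names, implicitly all ASC.
case class CatalogBucketSpec(numBuckets: Int, bucketCols: Seq[String], sortColNames: Seq[String])

// Converting to the catalog form keeps sort columns only when all are ASC...
def toCatalog(h: HiveBucketSpec): CatalogBucketSpec =
  CatalogBucketSpec(
    h.numBuckets,
    h.bucketCols,
    if (h.sortCols.nonEmpty && h.sortCols.forall(_.ascending)) h.sortCols.map(_.name)
    else Seq.empty)

// ...so converting back cannot recover a DESC (or dropped) sort spec.
def toHive(c: CatalogBucketSpec): HiveBucketSpec =
  HiveBucketSpec(c.numBuckets, c.bucketCols,
    c.sortColNames.map(HiveSortCol(_, ascending = true)))

val descTable = HiveBucketSpec(4, Seq("id"), Seq(HiveSortCol("ts", ascending = false)))
val roundTripped = toHive(toCatalog(descTable))
// roundTripped.sortCols is empty: the DESC sort information was lost.
```

Writing the table metadata back through such a lossy representation is exactly the "SORT meta loss" the PR aims to avoid.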
[GitHub] [spark] LantaoJin commented on pull request #29890: [SPARK-33014][SQL] Support multiple bucket columns in DataSourceV2 table
LantaoJin commented on pull request #29890: URL: https://github.com/apache/spark/pull/29890#issuecomment-706906267 Gentle ping @cloud-fan @HyukjinKwon @dongjoon-hyun
[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero
leanken commented on pull request #29983: URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860

> > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well, and change the config name to spark.sql.legacy.divideByZeroEvaluation?
>
> Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?

Changed it to `spark.sql.legacy.statisticalAggregate`.
[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero
leanken edited a comment on pull request #29983: URL: https://github.com/apache/spark/pull/29983#issuecomment-706905860

> > Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well, and change the config name to spark.sql.legacy.divideByZeroEvaluation?
>
> Or, `spark.sql.legacy.centralMomentAndCovarianceAgg`?

Changed it to `spark.sql.legacy.statisticalAggregate`; all of these methods are statistical aggregate functions.
[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero
leanken commented on a change in pull request #29983: URL: https://github.com/apache/spark/pull/29983#discussion_r503069104

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala

```diff
 test("windowing.q -- 15. testExpressions") {
-  // Moved because:
-  // - Spark uses a different default stddev (sample instead of pop)
-  // - Tiny numerical differences in stddev results.
-  // - Different StdDev behavior when n=1 (NaN instead of 0)
-  checkAnswer(sql(s"""
-    |select p_mfgr,p_name, p_size,
-    |rank() over(distribute by p_mfgr sort by p_name) as r,
-    |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-    |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-    |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-    |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-    |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-    |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-    |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-    |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-    |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-    |first_value(p_size) over w1 as fvW1
-    |from part
-    |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-    |              rows between 2 preceding and 2 following)
+  withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {
```

Review comment: done.

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala

```diff
+  withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {
+    // Moved because:
+    // - Spark uses a different default stddev (sample instead of pop)
+    // - Tiny numerical differences in stddev results.
+    // - Different StdDev behavior when n=1 (NaN instead of 0)
```

Review comment: done

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala

```diff
 case class StddevSamp(child: Expression) extends CentralMomentAgg(child) {
   override val evaluateExpression: Expression = {
     If(n === 0.0, Literal.create(null, DoubleType),
-      If(n === 1.0, Double.NaN, sqrt(m2 / (n - 1.0))))
+      If(n === 1.0,
+        if (SQLConf.get.legacyCentralMomentAgg) Double.NaN else Literal.create(null, DoubleType),
+        sqrt(m2 / (n - 1.0))))
```

Review comment: done
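The behavior change reviewed above (null instead of Double.NaN for sample stddev when n = 1) can be sketched outside Spark. This is an illustrative plain-Scala function, not the PR's actual `Expression`-based implementation; `None` stands in for SQL null:

```scala
// Hedged sketch of the semantics discussed in this thread, NOT Spark's
// actual Expression-based StddevSamp: sample standard deviation returning
// None (SQL null) instead of Double.NaN when n <= 1.
def stddevSamp(xs: Seq[Double]): Option[Double] = {
  val n = xs.length
  if (n <= 1) None // n == 0 was already null; the PR makes n == 1 null too
  else {
    val mean = xs.sum / n
    val m2 = xs.map(x => (x - mean) * (x - mean)).sum
    Some(math.sqrt(m2 / (n - 1)))
  }
}
```

Under the legacy flag discussed above, the n = 1 branch would instead yield Double.NaN, matching the old behavior.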
[GitHub] [spark] SparkQA commented on pull request #29983: [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero
SparkQA commented on pull request #29983: URL: https://github.com/apache/spark/pull/29983#issuecomment-706905385 **[Test build #129668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129668/testReport)** for PR 29983 at commit [`16bc5d5`](https://github.com/apache/spark/commit/16bc5d5c4310452fcbd0dea6e7e19e2fcbd10442).
[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
SparkQA commented on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706904773 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34269/
[GitHub] [spark] cloud-fan commented on a change in pull request #29880: [SPARK-33004][SQL] Migrate DESCRIBE column to use UnresolvedTableOrView to resolve the identifier
cloud-fan commented on a change in pull request #29880: URL: https://github.com/apache/spark/pull/29880#discussion_r503068179

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala

```diff
 case class DescribeRelation(
   override def output: Seq[Attribute] = DescribeTableSchema.describeTableAttributes()
 }

+/**
+ * The logical plan of the DESCRIBE relation_name col_name command that works for v2 tables.
+ */
+case class DescribeColumn(
+    relation: LogicalPlan,
+    colNameParts: Seq[String],
```

Review comment: Yeah, we can't break it. When we add a v2 `DescribeColumnExec` later, we may want to change `Seq[String]` to `Attribute` here, so that we can unify the column resolution logic for v1 and v2 commands.
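The plan shape under discussion can be sketched with self-contained stand-ins. These are hedged illustrations, not Spark's actual `v2Commands.scala` definitions (names are suffixed with `Sketch` to make that explicit, and `isExtended` is an assumed extra field for illustration):

```scala
// Hedged sketch of the logical-plan shape reviewed above, NOT Spark's
// actual definitions. The review suggests colNameParts: Seq[String]
// may later become a resolved Attribute to unify v1/v2 column resolution.
sealed trait LogicalPlanSketch
case class UnresolvedTableOrViewSketch(nameParts: Seq[String]) extends LogicalPlanSketch
case class DescribeColumnSketch(
    relation: LogicalPlanSketch,
    colNameParts: Seq[String], // candidate for Seq[String] -> Attribute
    isExtended: Boolean)       // assumed field, for illustration only
  extends LogicalPlanSketch

// The analyzer would later resolve the unresolved relation and, if the
// suggested change lands, the column name parts as well.
val plan = DescribeColumnSketch(
  UnresolvedTableOrViewSketch(Seq("db", "tbl")),
  Seq("tbl", "col"),
  isExtended = false)
```

Keeping the column as name parts until execution is what forces v1 and v2 commands to duplicate resolution logic, which is the motivation for the suggested `Attribute` change.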
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
AmplabJenkins removed a comment on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-706898226
[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
SparkQA commented on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-706898191 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34268/
[GitHub] [spark] AmplabJenkins commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
AmplabJenkins commented on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-706898226
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706895987
[GitHub] [spark] AmplabJenkins commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
AmplabJenkins commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706895987
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706895485 **[Test build #129667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)** for PR 27735 at commit [`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA removed a comment on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706886873 **[Test build #129667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)** for PR 27735 at commit [`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).
[GitHub] [spark] ScrapCodes commented on a change in pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
ScrapCodes commented on a change in pull request #27735: URL: https://github.com/apache/spark/pull/27735#discussion_r503061670

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountConfDirExecutorFeatureStep.scala

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.deploy.k8s.features

import scala.collection.JavaConverters._

import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder}

import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
import org.apache.spark.deploy.k8s.Constants.{SPARK_CONF_DIR_INTERNAL, SPARK_CONF_VOL_EXEC}
import org.apache.spark.deploy.k8s.submit.KubernetesClientUtils

/**
 * Populate the contents of SPARK_CONF_DIR, except spark-(conf|properties) and templates to
 * configMap to be mounted on executor pods.
 */
private[spark] class MountConfDirExecutorFeatureStep(conf: KubernetesConf)
```

Review comment: Hi @dongjoon-hyun, thanks for taking a look. It could be merged with `BasicExecutorFeatureStep`, but I found that step is already doing a lot of stuff. Do you think keeping it like this improves readability? I have no strong preference one way or the other.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue
AmplabJenkins removed a comment on pull request #30008: URL: https://github.com/apache/spark/pull/30008#issuecomment-706888990
[GitHub] [spark] AmplabJenkins commented on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue
AmplabJenkins commented on pull request #30008: URL: https://github.com/apache/spark/pull/30008#issuecomment-706888990
[GitHub] [spark] dongjoon-hyun commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
dongjoon-hyun commented on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706888560 Thank you for the review and approval, @HyukjinKwon and @maropu.
[GitHub] [spark] SparkQA removed a comment on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue
SparkQA removed a comment on pull request #30008: URL: https://github.com/apache/spark/pull/30008#issuecomment-706823341 **[Test build #129654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129654/testReport)** for PR 30008 at commit [`1161bd0`](https://github.com/apache/spark/commit/1161bd073e9fd4012903fd8fe05d62b198c60824).
[GitHub] [spark] SparkQA commented on pull request #30008: [SPARK-21708][BUILD][FOLLOWUP] Rename hdpVersion to hadoopVersionValue
SparkQA commented on pull request #30008: URL: https://github.com/apache/spark/pull/30008#issuecomment-706887924 **[Test build #129654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129654/testReport)** for PR 30008 at commit [`1161bd0`](https://github.com/apache/spark/commit/1161bd073e9fd4012903fd8fe05d62b198c60824).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
SparkQA commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706886780 **[Test build #129666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129666/testReport)** for PR 29893 at commit [`c76a2aa`](https://github.com/apache/spark/commit/c76a2aacb86411ef9d0c58ac996b274dce49beb7).
[GitHub] [spark] SparkQA commented on pull request #27735: [SPARK-30985][k8s] Support propagating SPARK_CONF_DIR files to driver and executor pods.
SparkQA commented on pull request #27735: URL: https://github.com/apache/spark/pull/27735#issuecomment-706886873 **[Test build #129667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129667/testReport)** for PR 27735 at commit [`20fe56f`](https://github.com/apache/spark/commit/20fe56fbe80b3e2714dcd4fcae7ebfceb95e5443).
[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
SparkQA commented on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-706886426 Kubernetes integration test starting. URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34268/
[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
Ngone51 commented on a change in pull request #29855: URL: https://github.com/apache/spark/pull/29855#discussion_r503055352

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java

@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.net.ConnectException;
+
+/**
+ * Plugs into {@link RetryingBlockFetcher} to further control when an exception should be retried
+ * and logged.
+ * Note: {@link RetryingBlockFetcher} will delegate the exception to this handler only when
+ *  - remaining retries < max retries
+ *  - exception is an IOException
+ */
+public interface ErrorHandler {
+
+  boolean shouldRetryError(Throwable t);
+
+  default boolean shouldLogError(Throwable t) {
+    return true;
+  }
+
+  /**
+   * A no-op error handler instance.
+   */
+  ErrorHandler NOOP_ERROR_HANDLER = t -> true;
+
+  /**
+   * The error handler for pushing shuffle blocks to remote shuffle services.
+   */
+  class BlockPushErrorHandler implements ErrorHandler {
+    /**
+     * String constant used for generating exception messages indicating a block to be merged
+     * arrives too late on the server side, and also for later checking such exceptions on the
+     * client side. When we get a block push failure because the block arrives too late, we
+     * will not retry pushing the block nor log the exception on the client side.
+     */
+    public static final String TOO_LATE_MESSAGE_SUFFIX =
+      "received after merged shuffle is finalized";
+
+    /**
+     * String constant used for generating exception messages indicating the server couldn't
+     * append a block after all available attempts due to collision with other blocks belonging

Review comment: BTW, how do you know the partition info of a certain shuffle block? Do you use the `mapId` of the shuffle block? If so, you should be aware of the new shuffle fetch protocol in 3.0: we use `taskAttemptId` (which is unique among multiple task attempts) instead of `partitionId` as `mapId` by default. This might break the current assumption for preventing the collision.
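As a rough illustration of the contract described in the quoted interface, here is a standalone sketch (not the actual Spark implementation; the suffix-matching logic is an assumption inferred from the javadoc, which says "too late" pushes are neither retried nor logged on the client side):

```java
// Sketch of the ErrorHandler contract quoted above. The endsWith check is an
// assumption inferred from the javadoc: pushes the server rejected as arriving
// too late are neither retried nor logged on the client side.
interface ErrorHandler {
  boolean shouldRetryError(Throwable t);

  default boolean shouldLogError(Throwable t) {
    return true;
  }
}

class BlockPushErrorHandler implements ErrorHandler {
  static final String TOO_LATE_MESSAGE_SUFFIX =
      "received after merged shuffle is finalized";

  @Override
  public boolean shouldRetryError(Throwable t) {
    // Retry everything except blocks the server rejected as arriving too late.
    String message = t.getMessage();
    return message == null || !message.endsWith(TOO_LATE_MESSAGE_SUFFIX);
  }

  @Override
  public boolean shouldLogError(Throwable t) {
    // Mirror the retry decision: "too late" failures are expected, so skip logging.
    return shouldRetryError(t);
  }
}

public class ErrorHandlerSketch {
  public static void main(String[] args) {
    ErrorHandler handler = new BlockPushErrorHandler();
    Throwable tooLate =
        new RuntimeException("Block b received after merged shuffle is finalized");
    Throwable transientFailure = new java.io.IOException("connection reset by peer");
    System.out.println(handler.shouldRetryError(tooLate));          // false: do not retry
    System.out.println(handler.shouldRetryError(transientFailure)); // true: eligible for retry
  }
}
```

A caller such as a retrying fetcher would consult `shouldRetryError` before scheduling another attempt and `shouldLogError` before emitting a warning.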
[GitHub] [spark] yaooqinn commented on pull request #29893: [SPARK-32976][SQL]Support column list in INSERT statement
yaooqinn commented on pull request #29893: URL: https://github.com/apache/spark/pull/29893#issuecomment-706885025 Thanks for reviewing, and sorry for the late reply; I spent a long holiday with my family. I've addressed the comment; please check again, @gatorsmile @cloud-fan @wangyum. Thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
AmplabJenkins removed a comment on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706879549
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode
AmplabJenkins removed a comment on pull request #29644: URL: https://github.com/apache/spark/pull/29644#issuecomment-706879116
[GitHub] [spark] SparkQA removed a comment on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode
SparkQA removed a comment on pull request #29644: URL: https://github.com/apache/spark/pull/29644#issuecomment-706823356 **[Test build #129656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129656/testReport)** for PR 29644 at commit [`a0bdce9`](https://github.com/apache/spark/commit/a0bdce946d8d5df9a9dcf480e4a7a00ff43a0335).
[GitHub] [spark] cloud-fan commented on pull request #29767: [SPARK-32896][SS] Add DataStreamWriter.table API
cloud-fan commented on pull request #29767: URL: https://github.com/apache/spark/pull/29767#issuecomment-706883233 We need to update the PR title; it's `saveAsTable`, not `table`.
[GitHub] [spark] SparkQA commented on pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
SparkQA commented on pull request #30010: URL: https://github.com/apache/spark/pull/30010#issuecomment-706880332 **[Test build #129665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129665/testReport)** for PR 30010 at commit [`aa7004e`](https://github.com/apache/spark/commit/aa7004e44a948ad86f6756e6ca3085e75462ee47).
[GitHub] [spark] cloud-fan commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
cloud-fan commented on a change in pull request #29975: URL: https://github.com/apache/spark/pull/29975#discussion_r503051694

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
       case ref: BoundReference if ctx.currentVars != null &&
           ctx.currentVars(ref.ordinal) != null =>
-        val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-        collectLocalVariable(value)
-        collectLocalVariable(isNull)
+        val exprCode = ctx.currentVars(ref.ordinal)
+        // If the referred variable is not evaluated yet.
+        if (exprCode.code != EmptyBlock) {
+          exprCodesNeedEvaluate += exprCode.copy()

Review comment: Sorry, I was wrong; the copy here is necessary.
[GitHub] [spark] viirya commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
viirya commented on a change in pull request #29975: URL: https://github.com/apache/spark/pull/29975#discussion_r503051291

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
       case ref: BoundReference if ctx.currentVars != null &&
           ctx.currentVars(ref.ordinal) != null =>
-        val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-        collectLocalVariable(value)
-        collectLocalVariable(isNull)
+        val exprCode = ctx.currentVars(ref.ordinal)
+        // If the referred variable is not evaluated yet.
+        if (exprCode.code != EmptyBlock) {
+          exprCodesNeedEvaluate += exprCode.copy()

Review comment: `exprCode.code = EmptyBlock`, do you mean this?
[GitHub] [spark] maropu commented on a change in pull request #29544: [SPARK-32704][SQL] Logging plan changes for execution
maropu commented on a change in pull request #29544: URL: https://github.com/apache/spark/pull/29544#discussion_r503051134

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -229,17 +229,17 @@ object SQLConf {
         "'trace', 'debug', 'info', 'warn' and 'error'.")
       .createWithDefault("trace")

-  val OPTIMIZER_PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.optimizer.planChangeLog.rules")
+  val PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.planChangeLog.rules")
     .internal()
-    .doc("Configures a list of rules to be logged in the optimizer, in which the rules are " +
+    .doc("Configures a list of rules for logging plan changes, in which the rules are " +
       "specified by their rule names and separated by comma.")
     .version("3.0.0")
     .stringConf
     .createOptional

-  val OPTIMIZER_PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.optimizer.planChangeLog.batches")
+  val PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.planChangeLog.batches")
     .internal()
-    .doc("Configures a list of batches to be logged in the optimizer, in which the batches " +
+    .doc("Configures a list of batches for logging plan changes, in which the batches " +
       "are specified by their batch names and separated by comma.")
     .version("3.0.0")

Review comment: Ah, I missed it. I'll update it in a followup. Thanks!
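For context, the renamed confs discussed above are internal SQL configurations that gate plan-change logging across the whole query lifecycle rather than just the optimizer. A hedged usage sketch, assuming a Spark build that contains this rename; the application jar name is a placeholder and the rule name is one illustrative optimizer rule:

```shell
# Log plan changes at INFO level, restricted to a single rule.
# "my-app.jar" is a placeholder; the rule name is illustrative.
spark-submit \
  --conf spark.sql.planChangeLog.level=info \
  --conf spark.sql.planChangeLog.rules=org.apache.spark.sql.catalyst.optimizer.PushDownPredicates \
  my-app.jar
```

Leaving `spark.sql.planChangeLog.rules` unset would log changes from all rules, which can be verbose at levels above `trace`.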
[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
SparkQA commented on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706879523 Kubernetes integration test status: success. URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34267/
[GitHub] [spark] AmplabJenkins commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
AmplabJenkins commented on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706879549
[GitHub] [spark] AmplabJenkins commented on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode
AmplabJenkins commented on pull request #29644: URL: https://github.com/apache/spark/pull/29644#issuecomment-706879116
[GitHub] [spark] cloud-fan commented on a change in pull request #29975: [SPARK-33092][SQL] Support subexpression elimination in ProjectExec
cloud-fan commented on a change in pull request #29975: URL: https://github.com/apache/spark/pull/29975#discussion_r503050704

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1761,16 +1772,21 @@ object CodeGenerator extends Logging {
       case ref: BoundReference if ctx.currentVars != null &&
           ctx.currentVars(ref.ordinal) != null =>
-        val ExprCode(_, isNull, value) = ctx.currentVars(ref.ordinal)
-        collectLocalVariable(value)
-        collectLocalVariable(isNull)
+        val exprCode = ctx.currentVars(ref.ordinal)
+        // If the referred variable is not evaluated yet.
+        if (exprCode.code != EmptyBlock) {
+          exprCodesNeedEvaluate += exprCode.copy()

Review comment:
> We need to empty code of exprCode so we don't re-evaluate the code.

AFAIK, when we empty the code, we also do a copy, right?
[GitHub] [spark] dongjoon-hyun opened a new pull request #30010: [SPARK-33117][BUILD] Update zstd-jni to 1.4.5-6
dongjoon-hyun opened a new pull request #30010: URL: https://github.com/apache/spark/pull/30010

### What changes were proposed in this pull request?
This PR aims to upgrade the ZStandard library for Apache Spark 3.1.0.

### Why are the changes needed?
This will bring the latest bug fixes.
- https://github.com/luben/zstd-jni/commit/2662fbdc320ce482a24c20b8fcac8b1d5b79fe33
- https://github.com/luben/zstd-jni/commit/bbe140b758be2e0ba64566e16d44cafd6e4ba142

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CI.
[GitHub] [spark] SparkQA commented on pull request #29644: [SPARK-32598][Scheduler] Fix missing driver logs under UI App-Executors tab in standalone cluster mode
SparkQA commented on pull request #29644: URL: https://github.com/apache/spark/pull/29644#issuecomment-706878158 **[Test build #129656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129656/testReport)** for PR 29644 at commit [`a0bdce9`](https://github.com/apache/spark/commit/a0bdce946d8d5df9a9dcf480e4a7a00ff43a0335).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] gatorsmile commented on a change in pull request #29544: [SPARK-32704][SQL] Logging plan changes for execution
gatorsmile commented on a change in pull request #29544: URL: https://github.com/apache/spark/pull/29544#discussion_r503049350

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -229,17 +229,17 @@ object SQLConf {
         "'trace', 'debug', 'info', 'warn' and 'error'.")
       .createWithDefault("trace")

-  val OPTIMIZER_PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.optimizer.planChangeLog.rules")
+  val PLAN_CHANGE_LOG_RULES = buildConf("spark.sql.planChangeLog.rules")
     .internal()
-    .doc("Configures a list of rules to be logged in the optimizer, in which the rules are " +
+    .doc("Configures a list of rules for logging plan changes, in which the rules are " +
       "specified by their rule names and separated by comma.")
     .version("3.0.0")
     .stringConf
     .createOptional

-  val OPTIMIZER_PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.optimizer.planChangeLog.batches")
+  val PLAN_CHANGE_LOG_BATCHES = buildConf("spark.sql.planChangeLog.batches")
     .internal()
-    .doc("Configures a list of batches to be logged in the optimizer, in which the batches " +
+    .doc("Configures a list of batches for logging plan changes, in which the batches " +
       "are specified by their batch names and separated by comma.")
     .version("3.0.0")

Review comment: Doesn't the version of these three configurations need to be updated to 3.1.0?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
AmplabJenkins removed a comment on pull request #29817: URL: https://github.com/apache/spark/pull/29817#issuecomment-706875853
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
AmplabJenkins removed a comment on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706875583
[GitHub] [spark] AmplabJenkins commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
AmplabJenkins commented on pull request #29817: URL: https://github.com/apache/spark/pull/29817#issuecomment-706875853
[GitHub] [spark] AmplabJenkins commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
AmplabJenkins commented on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706875583
[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
SparkQA commented on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706875569 Kubernetes integration test status: success. URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34266/
[GitHub] [spark] SparkQA removed a comment on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
SparkQA removed a comment on pull request #29817: URL: https://github.com/apache/spark/pull/29817#issuecomment-706821615 **[Test build #129652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129652/testReport)** for PR 29817 at commit [`17bbbd9`](https://github.com/apache/spark/commit/17bbbd9461a898c898b87635e4167ef5e01360d2).
[GitHub] [spark] SparkQA commented on pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
SparkQA commented on pull request #29817: URL: https://github.com/apache/spark/pull/29817#issuecomment-706874710 **[Test build #129652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129652/testReport)** for PR 29817 at commit [`17bbbd9`](https://github.com/apache/spark/commit/17bbbd9461a898c898b87635e4167ef5e01360d2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
AmplabJenkins removed a comment on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706874052
[GitHub] [spark] AmplabJenkins commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
AmplabJenkins commented on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706874052
[GitHub] [spark] SparkQA removed a comment on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
SparkQA removed a comment on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706855156 **[Test build #129663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129663/testReport)** for PR 29933 at commit [`942ce58`](https://github.com/apache/spark/commit/942ce58331e5c535f7e3f1885f4caf3a93eb176d).
[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
SparkQA commented on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706873675 **[Test build #129663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129663/testReport)** for PR 29933 at commit [`942ce58`](https://github.com/apache/spark/commit/942ce58331e5c535f7e3f1885f4caf3a93eb176d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29933: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver
SparkQA commented on pull request #29933: URL: https://github.com/apache/spark/pull/29933#issuecomment-706871871 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34267/
[GitHub] [spark] leanken commented on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero
leanken commented on pull request #29983: URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405

> > Seems plausible to me.
>
> ```
> case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
>   override val evaluateExpression: Expression = {
>     If(n === 0.0, Literal.create(null, DoubleType),
>       If(n === 1.0, Double.NaN, ck / (n - 1.0)))
>   }
>
> case class Corr(x: Expression, y: Expression)
>   extends PearsonCorrelation(x, y) {
>
>   override val evaluateExpression: Expression = {
>     If(n === 0.0, Literal.create(null, DoubleType),
>       If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
>   }
> ```
>
> Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
> And change the config name to `spark.sql.legacy.divideByZeroEvaluation`?

@cloud-fan how about CovSample and Corr? Should I update them as well in this patch?
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706869300 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34265/
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706869315
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706869315
[GitHub] [spark] leanken edited a comment on pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero
leanken edited a comment on pull request #29983: URL: https://github.com/apache/spark/pull/29983#issuecomment-706869405

> > Seems plausible to me.
>
> ```
> case class CovSample(left: Expression, right: Expression) extends Covariance(left, right) {
>   override val evaluateExpression: Expression = {
>     If(n === 0.0, Literal.create(null, DoubleType),
>       If(n === 1.0, Double.NaN, ck / (n - 1.0)))
>   }
>
> case class Corr(x: Expression, y: Expression)
>   extends PearsonCorrelation(x, y) {
>
>   override val evaluateExpression: Expression = {
>     If(n === 0.0, Literal.create(null, DoubleType),
>       If(n === 1.0, Double.NaN, ck / sqrt(xMk * yMk)))
>   }
> ```
>
> Found two more places that return Double.NaN in the divide-by-zero case. Should I update these as well?
> And change the config name to `spark.sql.legacy.divideByZeroEvaluation`?

@cloud-fan how about CovSample and Corr? Should I update them as well in this patch?
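[Editor's note] The null-vs-NaN behavior discussed above can be sketched in plain Java (this is an illustration, not Spark's actual implementation; `sampleCovarianceResult` and its parameters are hypothetical names): an empty group yields SQL NULL, a single row yields NaN, and otherwise the sample statistic is computed normally.

```java
public class DivideByZeroSketch {
    // Hypothetical sketch of the CovSample expression quoted above:
    // null (SQL NULL) when n == 0, NaN when n == 1, ck / (n - 1) otherwise.
    static Double sampleCovarianceResult(double n, double ck) {
        if (n == 0.0) {
            return null;        // proposed behavior: NULL instead of Double.NaN
        } else if (n == 1.0) {
            return Double.NaN;  // sample covariance is undefined for a single row
        } else {
            return ck / (n - 1.0);
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleCovarianceResult(0.0, 0.0)); // null
        System.out.println(sampleCovarianceResult(1.0, 0.0)); // NaN
        System.out.println(sampleCovarianceResult(3.0, 4.0)); // 2.0
    }
}
```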
[GitHub] [spark] SparkQA commented on pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
SparkQA commented on pull request #29881: URL: https://github.com/apache/spark/pull/29881#issuecomment-706868381 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34266/
[GitHub] [spark] SparkQA commented on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
SparkQA commented on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-706867654 **[Test build #129664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129664/testReport)** for PR 29965 at commit [`77923c5`](https://github.com/apache/spark/commit/77923c5542ba762646c8a56fe97a1a2e3f553fab).
[GitHub] [spark] leanken commented on a change in pull request #29983: [SPARK-13860][SQL] Change CentralMomentAgg to return null instead of Double.NaN when divideByZero
leanken commented on a change in pull request #29983: URL: https://github.com/apache/spark/pull/29983#discussion_r503043290

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala

@@ -59,56 +60,115 @@ class WindowQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleto

  test("windowing.q -- 15. testExpressions") {
-    // Moved because:
-    // - Spark uses a different default stddev (sample instead of pop)
-    // - Tiny numerical differences in stddev results.
-    // - Different StdDev behavior when n=1 (NaN instead of 0)
-    checkAnswer(sql(s"""
-      |select p_mfgr,p_name, p_size,
-      |rank() over(distribute by p_mfgr sort by p_name) as r,
-      |dense_rank() over(distribute by p_mfgr sort by p_name) as dr,
-      |cume_dist() over(distribute by p_mfgr sort by p_name) as cud,
-      |percent_rank() over(distribute by p_mfgr sort by p_name) as pr,
-      |ntile(3) over(distribute by p_mfgr sort by p_name) as nt,
-      |count(p_size) over(distribute by p_mfgr sort by p_name) as ca,
-      |avg(p_size) over(distribute by p_mfgr sort by p_name) as avg,
-      |stddev(p_size) over(distribute by p_mfgr sort by p_name) as st,
-      |first_value(p_size % 5) over(distribute by p_mfgr sort by p_name) as fv,
-      |last_value(p_size) over(distribute by p_mfgr sort by p_name) as lv,
-      |first_value(p_size) over w1 as fvW1
-      |from part
-      |window w1 as (distribute by p_mfgr sort by p_mfgr, p_name
-      |  rows between 2 preceding and 2 following)
+    withSQLConf(SQLConf.LEGACY_CENTRAL_MOMENT_AGG.key -> "true") {

Review comment: I see your point. The legacy coverage code will only stay in DataFrameWindowFunctionsSuite; the other UTs will just be updated to the expected answers.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
AmplabJenkins removed a comment on pull request #30005: URL: https://github.com/apache/spark/pull/30005#issuecomment-706865981 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129655/ Test FAILed.
[GitHub] [spark] leanken commented on a change in pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
leanken commented on a change in pull request #29965: URL: https://github.com/apache/spark/pull/29965#discussion_r503042637

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala

@@ -341,7 +341,7 @@ class SQLAppStatusListener(
   val exec = getOrCreateExecution(executionId)
   exec.physicalPlanDescription = physicalPlanDescription
-  exec.metrics = sqlPlanMetrics
+  exec.metrics = exec.metrics ++ sqlPlanMetrics

Review comment: done

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala

@@ -754,8 +754,6 @@ class SQLAppStatusListenerSuite extends SharedSparkSession with JsonTestUtils
   SparkPlanGraph(newPlan)
     .allNodes.flatMap(_.metrics.map(_.accumulatorId))
-  // Assume that AQE update sparkPlanInfo with newPlan
-  // ExecutionMetrics will be replaced using newPlan's SQLMetrics

Review comment: done
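[Editor's note] The one-character fix above (`exec.metrics = exec.metrics ++ sqlPlanMetrics`) relies on the semantics of Scala's `++` on maps: entries only present in the old map survive, entries from the new plan are added, and colliding keys take the new value. A rough Java equivalent, with illustrative names rather than Spark's actual types:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetricsMergeSketch {
    // Rough equivalent of Scala's `oldMetrics ++ newMetrics` for maps:
    // start from the old entries, then let the new entries overwrite on collision.
    static Map<Long, String> merge(Map<Long, String> oldMetrics, Map<Long, String> newMetrics) {
        Map<Long, String> merged = new LinkedHashMap<>(oldMetrics);
        merged.putAll(newMetrics);
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, String> oldM = Map.of(1L, "numOutputRows", 2L, "peakMemory");
        Map<Long, String> newM = Map.of(2L, "peakMemory(updated)", 3L, "spillSize");
        // Key 1 survives from the old plan, key 2 is overwritten, key 3 is added.
        System.out.println(merge(oldM, newM).size()); // 3
    }
}
```

With plain replacement (`exec.metrics = sqlPlanMetrics`), metrics registered by the old plan but absent from the new one would be dropped, which is the UI display issue the PR addresses.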
[GitHub] [spark] AmplabJenkins commented on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
AmplabJenkins commented on pull request #30005: URL: https://github.com/apache/spark/pull/30005#issuecomment-706865978
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
AmplabJenkins removed a comment on pull request #30005: URL: https://github.com/apache/spark/pull/30005#issuecomment-706865978 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
SparkQA removed a comment on pull request #30005: URL: https://github.com/apache/spark/pull/30005#issuecomment-706823328 **[Test build #129655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129655/testReport)** for PR 30005 at commit [`286ccc5`](https://github.com/apache/spark/commit/286ccc5d3025b6ff2da34c22e670746fd92e21f4).
[GitHub] [spark] SparkQA commented on pull request #30005: [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier
SparkQA commented on pull request #30005: URL: https://github.com/apache/spark/pull/30005#issuecomment-706865108 **[Test build #129655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129655/testReport)** for PR 30005 at commit [`286ccc5`](https://github.com/apache/spark/commit/286ccc5d3025b6ff2da34c22e670746fd92e21f4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
AngersZhuuuu commented on a change in pull request #29881: URL: https://github.com/apache/spark/pull/29881#discussion_r503041748

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

@@ -396,6 +417,93 @@ private[spark] object HiveUtils extends Logging {
       config = configurations,
       barrierPrefixes = hiveMetastoreBarrierPrefixes,
       sharedPrefixes = hiveMetastoreSharedPrefixes)
+    } else if (hiveMetastoreJars == "path") {
+
+      def addLocalHiveJars(file: File): Seq[URL] = {
+        if (file.getName == "*") {
+          val files = file.getParentFile.listFiles()
+          if (files == null) {
+            logWarning(s"Hive jar path '${file.getPath}' does not exist.")
+            Nil
+          } else {
+            files.filter(_.getName.toLowerCase(Locale.ROOT).endsWith(".jar")).map(_.toURL).toSeq
+          }
+        } else {
+          file.toURL :: Nil
+        }
+      }
+
+      def checkRemoteHiveJars(path: String): Seq[URL] = {
+        try {
+          val hadoopPath = new Path(path)
+          val fs = hadoopPath.getFileSystem(hadoopConf)
+          if (hadoopPath.getName == "*") {
+            val parent = hadoopPath.getParent
+            if (!fs.exists(parent)) {
+              logWarning(s"Hive Jar ${path} does not exist.")
+              Nil
+            } else if (!fs.getFileStatus(parent).isDirectory) {
+              logWarning(s"Hive Jar ${parent} is not a directory.")
+              Nil
+            } else {
+              fs.listStatus(parent).map(_.getPath.toUri.toURL)
+            }
+          } else {
+            if (!fs.exists(hadoopPath)) {
+              logWarning(s"Hive Jar ${path} does not exist.")
+              Nil
+            } else if (fs.getFileStatus(hadoopPath).isDirectory) {
+              logWarning(s"Hive Jar ${path} not allow directory without `*`")
+              Nil
+            } else {
+              // Since tar/tar.gz file we can't know it's final path yet, not support it
+              hadoopPath.toUri.toURL :: Nil

Review comment:
> I see. So Hadoop side has a `URLStreamHandlerFactory` impl which provides handlers for the Hadoop-specific URL schemes. Not sure about the S3 error though :-/

Yeah, digging into the code, it seems the S3 config lookup will call the URLClassLoader, and the non-equal comparison makes it reload the element again and again.
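[Editor's note] The wildcard handling in the `addLocalHiveJars` branch above can be sketched in isolation in plain Java (no Hadoop; `resolveJars` is an illustrative name, not the PR's API): a path whose last component is `*` means "every `.jar` file in the parent directory", a missing directory yields an empty result, and any other path is returned as a single-element list.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JarPathSketch {
    // Local-filesystem sketch of the `*` wildcard resolution discussed above.
    static List<File> resolveJars(File file) {
        List<File> jars = new ArrayList<>();
        if ("*".equals(file.getName())) {
            File[] files = file.getParentFile().listFiles();
            if (files == null) {
                return jars; // parent directory does not exist: warn-and-skip in the PR
            }
            for (File f : files) {
                if (f.getName().toLowerCase(Locale.ROOT).endsWith(".jar")) {
                    jars.add(f);
                }
            }
        } else {
            jars.add(file); // a concrete path is taken as-is
        }
        return jars;
    }
}
```

For example, `resolveJars(new File("/opt/hive/lib/*"))` would return every `.jar` under `/opt/hive/lib`, while a nonexistent directory yields an empty list instead of an error.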
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706863643 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34264/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins removed a comment on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706863638 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
AmplabJenkins commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706863638
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706863628 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34264/
[GitHub] [spark] gatorsmile edited a comment on pull request #29552: [SPARK-32481][CORE][SQL] Support truncate table to move data to trash
gatorsmile edited a comment on pull request #29552: URL: https://github.com/apache/spark/pull/29552#issuecomment-706862532 I like the concept of Trash, but I think this PR might just resolve a very specific issue by introducing a mechanism without a proper design doc. This could make the usage more complex. I think we need to consider the big picture. Trash directory is an important concept. If we decide to introduce it, we should consider all the code paths of Spark SQL that could delete the data, instead of Truncate only. We also need to consider what is the current behavior if the underlying file system does not provide the API `Trash.moveToAppropriateTrash`. Is the exception good? How about the performance when users are using the object store instead of HDFS? Will it impact the GDPR compliance? I think we should not merge the PR without the design doc and implementation plan. Above just lists some questions I have. The other community members might have more relevant questions/issues we need to resolve.
[GitHub] [spark] gatorsmile commented on pull request #29552: [SPARK-32481][CORE][SQL] Support truncate table to move data to trash
gatorsmile commented on pull request #29552: URL: https://github.com/apache/spark/pull/29552#issuecomment-706862532 I like the concept of Trash, but I think this PR might just resolve a very specific issue by introducing a mechanism without a proper design doc. This could make the usage more complex. I think we need to consider the big picture. Trash directory is an important concept. If we decide to introduce it, we should consider all the code paths of Spark SQL that could delete the data, instead of Truncate only. We also need to consider what is the current behavior if the underlying file system does not provide the API `Trash.moveToAppropriateTrash`. Is the exception good? How about the performance when users are using the object store instead of HDFS? Will it impact the GDPR compliance? Without resolving the above issues, I think we should not merge the PR without the design and implementation plan.
[GitHub] [spark] SparkQA commented on pull request #29999: [SPARK-33045][SQL] Support build-in function like_all and fix StackOverflowError issue.
SparkQA commented on pull request #29999: URL: https://github.com/apache/spark/pull/29999#issuecomment-706862463 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34265/
[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
Ngone51 commented on a change in pull request #29855: URL: https://github.com/apache/spark/pull/29855#discussion_r503040276

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java

@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.net.ConnectException;
+
+import com.google.common.base.Throwables;
+
+/**
+ * Plugs into {@link RetryingBlockFetcher} to further control when an exception should be retried
+ * and logged.
+ * Note: {@link RetryingBlockFetcher} will delegate the exception to this handler only when
+ * - remaining retries < max retries
+ * - exception is an IOException
+ */
+public interface ErrorHandler {
+
+  boolean shouldRetryError(Throwable t);
+
+  default boolean shouldLogError(Throwable t) {
+    return true;
+  }
+
+  /**
+   * A no-op error handler instance.
+   */
+  ErrorHandler NOOP_ERROR_HANDLER = t -> true;
+
+  /**
+   * The error handler for pushing shuffle blocks to remote shuffle services.
+   */
+  class BlockPushErrorHandler implements ErrorHandler {
+    /**
+     * String constant used for generating exception messages indicating a block to be merged
+     * arrives too late on the server side, and also for later checking such exceptions on the
+     * client side. When we get a block push failure because of the block arrives too late, we
+     * will not retry pushing the block nor log the exception on the client side.
+     */
+    public static final String TOO_LATE_MESSAGE_SUFFIX =
+      "received after merged shuffle is finalized";
+
+    /**
+     * String constant used for generating exception messages indicating the server couldn't
+     * append a block after all available attempts due to collision with other blocks belonging
+     * to the same shuffle partition, and also for later checking such exceptions on the client
+     * side. When we get a block push failure because of the block couldn't be written due to
+     * this reason, we will not log the exception on the client side.
+     */
+    public static final String COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX =
+      "Couldn't find an opportunity to write block";
+
+    @Override
+    public boolean shouldRetryError(Throwable t) {
+      // If it is a connection time out or a connection closed exception, no need to retry.
+      if (t.getCause() != null && t.getCause() instanceof ConnectException) {
+        return false;
+      }
+      // If the block is too late, there is no need to retry it
+      return !Throwables.getStackTraceAsString(t).contains(TOO_LATE_MESSAGE_SUFFIX);
+    }
+
+    @Override
+    public boolean shouldLogError(Throwable t) {
+      String errorStackTrace = Throwables.getStackTraceAsString(t);
+      return !errorStackTrace.contains(COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX) &&
+        !errorStackTrace.contains(TOO_LATE_MESSAGE_SUFFIX);

Review comment:
> You mean the one on line 84? It should be `&&`, since we need to check both places.

If we need to check both places, shouldn't we ensure the error message contains the 2 strings, COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX / TOO_LATE_MESSAGE_SUFFIX, separately?
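[Editor's note] The retry/log predicates under review reduce to a small sketch. This simplification checks `Throwable.getMessage()` rather than Guava's full stack-trace string, so it is illustrative only, not the PR's exact logic: connection failures and "too late" pushes are never retried, and the `&&` in `shouldLogError` means a failure is logged unless it matches *either* of the two known benign messages.

```java
import java.net.ConnectException;

public class PushErrorSketch {
    static final String TOO_LATE_MESSAGE_SUFFIX =
        "received after merged shuffle is finalized";
    static final String COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX =
        "Couldn't find an opportunity to write block";

    // Simplified: the real handler scans the whole stack-trace string.
    static boolean shouldRetryError(Throwable t) {
        if (t.getCause() instanceof ConnectException) {
            return false; // connection-level failures are not retried
        }
        // A "too late" push will never succeed on retry either.
        return !String.valueOf(t.getMessage()).contains(TOO_LATE_MESSAGE_SUFFIX);
    }

    // The `&&` under discussion: suppress logging when either benign message matches.
    static boolean shouldLogError(Throwable t) {
        String msg = String.valueOf(t.getMessage());
        return !msg.contains(COULD_NOT_FIND_OPPORTUNITY_MSG_PREFIX)
            && !msg.contains(TOO_LATE_MESSAGE_SUFFIX);
    }
}
```

With `||` instead of `&&`, the predicate would return false only when *both* messages matched the same error, so each benign case on its own would still be logged; that is why the review settles on `&&`.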
[GitHub] [spark] sunchao edited a comment on pull request #29843: [SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao edited a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-706824666 I'll take a look at the test failure, which is new.
[GitHub] [spark] sunchao commented on a change in pull request #29881: [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
sunchao commented on a change in pull request #29881: URL: https://github.com/apache/spark/pull/29881#discussion_r503039921

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala
@@ -396,6 +417,93 @@ private[spark] object HiveUtils extends Logging {
       config = configurations,
       barrierPrefixes = hiveMetastoreBarrierPrefixes,
       sharedPrefixes = hiveMetastoreSharedPrefixes)
+    } else if (hiveMetastoreJars == "path") {
+
+      def addLocalHiveJars(file: File): Seq[URL] = {
+        if (file.getName == "*") {
+          val files = file.getParentFile.listFiles()
+          if (files == null) {
+            logWarning(s"Hive jar path '${file.getPath}' does not exist.")
+            Nil
+          } else {
+            files.filter(_.getName.toLowerCase(Locale.ROOT).endsWith(".jar")).map(_.toURI.toURL).toSeq
+          }
+        } else {
+          file.toURI.toURL :: Nil
+        }
+      }
+
+      def checkRemoteHiveJars(path: String): Seq[URL] = {
+        try {
+          val hadoopPath = new Path(path)
+          val fs = hadoopPath.getFileSystem(hadoopConf)
+          if (hadoopPath.getName == "*") {
+            val parent = hadoopPath.getParent
+            if (!fs.exists(parent)) {
+              logWarning(s"Hive jar path ${path} does not exist.")
+              Nil
+            } else if (!fs.getFileStatus(parent).isDirectory) {
+              logWarning(s"Hive jar path ${parent} is not a directory.")
+              Nil
+            } else {
+              fs.listStatus(parent).map(_.getPath.toUri.toURL)
+            }
+          } else {
+            if (!fs.exists(hadoopPath)) {
+              logWarning(s"Hive jar ${path} does not exist.")
+              Nil
+            } else if (fs.getFileStatus(hadoopPath).isDirectory) {
+              logWarning(s"Hive jar path ${path} is a directory, which is not allowed without `*`.")
+              Nil
+            } else {
+              // We can't know a tar/tar.gz file's final path yet, so it is not supported.
+              hadoopPath.toUri.toURL :: Nil

Review comment: I see. So the Hadoop side has a `URLStreamHandlerFactory` implementation which provides handlers for the Hadoop-specific URL schemes. Not sure about the S3 error though :-/
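For the local-path branch quoted above, the wildcard expansion can be sketched in plain Java. This is a hypothetical sketch (the class name `HiveJarPathSketch` is illustrative; the Scala original uses Spark's `logWarning` and returns `Seq[URL]`, both omitted here). It shows the same three cases: a `*` name over a missing directory yields an empty list, a `*` name over an existing directory yields the `.jar` files in it, and a plain file path is returned as-is:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class HiveJarPathSketch {
  // toURI().toURL() avoids the deprecated File.toURL(), which does not
  // escape special characters in the path.
  private static URL toUrl(File f) {
    try {
      return f.toURI().toURL();
    } catch (MalformedURLException e) {
      throw new RuntimeException(e);
    }
  }

  // Expands a path like "/opt/hive/lib/*" to the .jar files in that directory;
  // a plain file path is returned as a single-element list.
  static List<URL> addLocalHiveJars(File file) {
    List<URL> urls = new ArrayList<>();
    if ("*".equals(file.getName())) {
      File[] files = file.getParentFile().listFiles();
      // A null listing means the parent directory does not exist: warn (omitted
      // in this sketch) and return the empty list.
      if (files != null) {
        for (File f : files) {
          if (f.getName().toLowerCase(Locale.ROOT).endsWith(".jar")) {
            urls.add(toUrl(f));
          }
        }
      }
    } else {
      urls.add(toUrl(file));
    }
    return urls;
  }
}
```

Note that a non-`*` path is converted without checking existence, matching the quoted diff; only the wildcard branch touches the filesystem.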
[GitHub] [spark] Ngone51 commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
Ngone51 commented on a change in pull request #29855: URL: https://github.com/apache/spark/pull/29855#discussion_r503039472

## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockPusher.java
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Map;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.buffer.NioManagedBuffer;
+import org.apache.spark.network.client.RpcResponseCallback;
+import org.apache.spark.network.client.TransportClient;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+
+/**
+ * Similar to {@link OneForOneBlockFetcher}, but for pushing blocks to a remote shuffle service
+ * to be merged instead of fetching them from remote shuffle services. This is used by
+ * ShuffleWriter when the block push process is initiated. The supplied BlockFetchingListener
+ * is used to handle the success or failure of pushing each block.
+ */
+public class OneForOneBlockPusher {
+  private static final Logger logger = LoggerFactory.getLogger(OneForOneBlockPusher.class);
+  private static final ErrorHandler PUSH_ERROR_HANDLER = new ErrorHandler.BlockPushErrorHandler();
+
+  private final TransportClient client;
+  private final String appId;
+  private final String[] blockIds;
+  private final BlockFetchingListener listener;
+  private final Map<String, ManagedBuffer> buffers;
+
+  public OneForOneBlockPusher(
+      TransportClient client,
+      String appId,
+      String[] blockIds,
+      BlockFetchingListener listener,
+      Map<String, ManagedBuffer> buffers) {
+    this.client = client;
+    this.appId = appId;
+    this.blockIds = blockIds;
+    this.listener = listener;
+    this.buffers = buffers;
+  }
+
+  private class BlockPushCallback implements RpcResponseCallback {
+
+    private int index;
+    private String blockId;
+
+    BlockPushCallback(int index, String blockId) {
+      this.index = index;
+      this.blockId = blockId;
+    }
+
+    @Override
+    public void onSuccess(ByteBuffer response) {
+      // On receipt of a successful block push
+      listener.onBlockFetchSuccess(blockId, new NioManagedBuffer(ByteBuffer.allocate(0)));
+    }
+
+    @Override
+    public void onFailure(Throwable e) {
+      // Since block push is best effort, i.e., if we encounter a block push failure that's not
+      // retriable or that has exceeded the max retries, we should not fail all remaining block
+      // pushes. The best-effort nature makes block push tolerant of partial completion. Thus, we
+      // only fail the block that actually failed. Note that, on the RetryingBlockFetcher side,
+      // once a retry is initiated, it would still invalidate the previous active retry listener
+      // and retry all outstanding blocks. We are preventing forwarding unnecessary block push
+      // failures to the parent listener of the retry listener. The only exceptions would be if
+      // the block push failure is due to the block arriving on the server side after merge
+      // finalization, or the client failing to establish a connection to the server side. In
+      // both cases, we would fail all remaining blocks.
+      if (PUSH_ERROR_HANDLER.shouldRetryError(e)) {
+        String[] targetBlockId = Arrays.copyOfRange(blockIds, index, index + 1);
+        failRemainingBlocks(targetBlockId, e);
+      } else {
+        String[] targetBlockId = Arrays.copyOfRange(blockIds, index, blockIds.length);

Review comment: > Here, if the exception is something that will happen for every block push, i.e., the block being too late or a client-side connection issue, all remaining blocks will fail. Would calling `failRemainingBlocks` multiple times for the same block have any side effect in this case? I mean, we may call `failRemainingBlocks` multiple times, like `failRemainingBlocks(Array(0, 1, ..., 4), e)`, `failRemainingBlocks(Array(1, ..., 4), e)`, ..., `failRemainingBlocks(Array(4), e)`.
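The slicing behavior being discussed, failing only the block at `index` on a retriable error versus failing every block from `index` onward, comes down to the two `Arrays.copyOfRange` calls. A minimal illustrative sketch (the class and method names are hypothetical, not Spark's API):

```java
import java.util.Arrays;

class BlockSliceSketch {
  // Returns the block IDs that should be failed for a push failure observed at
  // position `index`: just that block if the error is retriable, otherwise that
  // block plus every block after it.
  static String[] failedBlocks(String[] blockIds, int index, boolean retriable) {
    return retriable
        // copyOfRange is [from, to): [index, index + 1) is exactly one element.
        ? Arrays.copyOfRange(blockIds, index, index + 1)
        // [index, length) is the current block and all remaining blocks.
        : Arrays.copyOfRange(blockIds, index, blockIds.length);
  }
}
```

This also makes the reviewer's concern concrete: if a non-retriable error is reported for several indices of the same push batch, the non-retriable slices overlap, so the same trailing block IDs can be handed to `failRemainingBlocks` more than once.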
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30006: [SPARK-33106][BUILD] Fix resolvers clash in SBT
AmplabJenkins removed a comment on pull request #30006: URL: https://github.com/apache/spark/pull/30006#issuecomment-706860082