[GitHub] [spark] cloud-fan commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`
cloud-fan commented on a change in pull request #30363:
URL: https://github.com/apache/spark/pull/30363#discussion_r568391121

## File path: core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment:
   Yea, it's easier to just list all the known config objects, but I agree with @linhongliu-db that some modules are optional, and we should only load the config objects from a module once that module is loaded. FYI, the modules under the `external` directory are not loaded by default, and if they start to define configs, we will need this framework. It's also true that other Spark vendors may define their own configs. This is similar to the data source registry resource files, so I don't have a big concern here, as this is not the first time we use this pattern.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
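cloud-fan's point above is that config-holder objects register their entries only as a side effect of the object being initialized, so a command like `set -v` must force-initialize them first. Below is a minimal, Spark-free sketch of that pattern; `Registry`, `DemoConf`, and `forceInit` are illustrative names, not Spark's actual API.

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for the side effect that Spark's buildConf(...) performs.
object Registry {
  val registered = ArrayBuffer.empty[String]
}

// A config-holder object: its entries register only when the object initializes.
object DemoConf {
  val MY_FLAG: String = { Registry.registered += "demo.flag"; "demo.flag" }
}

object ConfLoader {
  // Force initialization of a Scala object by its fully-qualified class name.
  // A Scala object compiles to a class `<name>$` whose static MODULE$ field is
  // created during class initialization, which runs the object's body.
  def forceInit(className: String): Unit =
    Class.forName(className + "$")
}
```

The framework in the PR goes one step further and discovers the object names from resource files on the classpath, so optional modules can contribute configs without being hard-coded.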
[GitHub] [spark] turboFei commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
turboFei commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771444379

   The UT passed in our spark-2.3 branch; I will try to find out why it failed in the community master branch.
[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771444120

   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39352/
[GitHub] [spark] SparkQA commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
SparkQA commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771443108

   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39349/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
AmplabJenkins removed a comment on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771442106

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134753/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771442108

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134760/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31286: [SPARK-34199][SQL] Block `count(table.*)` to follow ANSI standard and other SQL engines
AmplabJenkins removed a comment on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771442107

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134754/
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771442108

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134760/
[GitHub] [spark] AmplabJenkins commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
AmplabJenkins commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771442106

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134753/
[GitHub] [spark] AmplabJenkins commented on pull request #31286: [SPARK-34199][SQL] Block `count(table.*)` to follow ANSI standard and other SQL engines
AmplabJenkins commented on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771442107

   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134754/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
HyukjinKwon commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568386466

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -2472,6 +2472,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+    buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+      .internal()
+      .doc("If it is set to true, a null partition value is parsed as a string literal of its " +
+        "text representation, e.g., string 'null'. Otherwise, null partition values are parsed " +
+        "as they are.")
+      .version("3.2.0")

Review comment:
   I think we have documented it only in 3.0.2 so far in such cases ..
[GitHub] [spark] cloud-fan closed pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines
cloud-fan closed pull request #31286:
URL: https://github.com/apache/spark/pull/31286
[GitHub] [spark] cloud-fan commented on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines
cloud-fan commented on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771440146

   thanks, merging to master!
[GitHub] [spark] cloud-fan commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
cloud-fan commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568385314

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala

@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     checkAnswer(
       sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-      Row("v1") :: Nil
+      Row("my_key1", "v1") :: Nil

Review comment:
   Yea, let's add an item in the migration guide.
[GitHub] [spark] cloud-fan commented on pull request #31434: [SPARK-33591][SQL][FOLLOW-UP] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
cloud-fan commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771439102

   @gengliangwang I've left more comments in the original PR, can you address them as well? thanks!
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568384363

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -2472,6 +2472,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+    buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+      .internal()
+      .doc("If it is set to true, a null partition value is parsed as a string literal of its " +
+        "text representation, e.g., string 'null'. Otherwise, null partition values are parsed " +
+        "as they are.")
+      .version("3.2.0")

Review comment:
   Ah I see, then how should we place it in the migration guide? Repeat it 3 times in the 3.0.2, 3.1.1 and 3.2.0 sections?
[GitHub] [spark] HyukjinKwon commented on pull request #31434: [SPARK-33591][SQL][FOLLOW-UP] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
HyukjinKwon commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771437952

   cc @cloud-fan too
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
AngersZhuuuu commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568382649

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala

@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     checkAnswer(
       sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-      Row("v1") :: Nil
+      Row("my_key1", "v1") :: Nil

Review comment:
   > Changing the output schema looks fine to me. Could you describe this change in the PR description?

   Yea, updated. Do we need to change the migration guide too?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
HyukjinKwon commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568382451

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -2472,6 +2472,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+    buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+      .internal()
+      .doc("If it is set to true, a null partition value is parsed as a string literal of its " +
+        "text representation, e.g., string 'null'. Otherwise, null partition values are parsed " +
+        "as they are.")
+      .version("3.2.0")

Review comment:
   should be 3.0.2 actually.
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568382197

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala

@@ -154,4 +154,15 @@ trait ShowPartitionsSuiteBase extends QueryTest with DDLCommandTestUtils {
       assert(partitions.first().getString(0) === "part=a")
     }
   }
+
+  test("SPARK-33591: null as string partition literal value 'null' after setting legacy conf") {
+    withSQLConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL.key -> "true") {
+      withNamespaceAndTable("ns", "tbl") { t =>
+        sql(s"CREATE TABLE $t (col1 INT, p1 STRING) $defaultUsing PARTITIONED BY (p1)")
+        sql(s"INSERT INTO TABLE $t PARTITION (p1 = null) SELECT 0")
+        runShowPartitionsSql(s"SHOW PARTITIONS $t", Row("p1=null") :: Nil)
+        runShowPartitionsSql(s"SHOW PARTITIONS $t PARTITION (p1 = null)", Row("p1=null") :: Nil)

Review comment:
   how does this show the null string behavior? shall we just test a simple table scan operation?
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381329

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -2472,6 +2472,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+    buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+      .internal()
+      .doc("If it is set to true, a null partition value is parsed as a string literal of its " +
+        "text representation, e.g., string 'null'. Otherwise, null partition values are parsed " +
+        "as they are.")
+      .version("3.2.0")

Review comment:
   the version should be 3.1.1, as we are going to backport this.
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381165

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -2472,6 +2472,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+    buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+      .internal()
+      .doc("If it is set to true, a null partition value is parsed as a string literal of its " +

Review comment:
   ditto, make it more specific.
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381010

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -472,9 +472,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
    */
  override def visitPartitionSpec(
      ctx: PartitionSpecContext): Map[String, Option[String]] = withOrigin(ctx) {
+    val processNullLiteral =

Review comment:
   this name is weird, how about `legacyNullAsString`?
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568380805

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -500,9 +502,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
    * main purpose is to prevent slight differences due to back to back conversions i.e.:
    * String -> Literal -> String.
    */
-  protected def visitStringConstant(ctx: ConstantContext): String = withOrigin(ctx) {
+  protected def visitStringConstant(
+      ctx: ConstantContext,

Review comment:
   nit: 4 spaces indentation
[GitHub] [spark] maropu commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
maropu commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771434590

   okay, could you copy the statement above into the PR description?
[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568379292

## File path: docs/sql-migration-guide.md

@@ -41,6 +41,8 @@ license: |
   - In Spark 3.2, the auto-generated `Cast` (such as those added by type coercion rules) will be stripped when generating column alias names. E.g., `sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS DOUBLE))`.

+  - In Spark 3.2, a null partition value is parsed as it is. In Spark 3.1 or earlier, it is parsed as a string literal of its text representation, e.g., string "null". To restore the legacy behavior, you can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

Review comment:
   Let's be more specific:
   ```
   In Spark 3.2, `PARTITION(col=null)` is always parsed as a null literal in the partition spec. In Spark 3.1 or earlier, it is parsed as a string literal ..., if the partition column is string type. To restore the legacy behavior, ...
   ```
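The behavior change this migration-guide entry describes can be modeled in isolation. Below is a toy sketch of the branch the legacy flag controls, not Spark's actual `AstBuilder`; `parsePartitionValue` and `legacyNullAsString` are hypothetical names.

```scala
// A parsed partition value: None models a genuine null literal.
def parsePartitionValue(raw: Option[String], legacyNullAsString: Boolean): Option[String] =
  (raw, legacyNullAsString) match {
    // legacy: PARTITION(col=null) becomes the string "null", its text representation
    case (None, true) => Some("null")
    // default: a null partition value stays null
    case _ => raw
  }
```

Under the flag, downstream code sees the four-character string "null" as the partition value; with the flag off, it sees a real null, which is what the new default makes observable in a plain table scan.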
[GitHub] [spark] SparkQA removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA removed a comment on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771320748

   **[Test build #134751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134751/testReport)** for PR 31402 at commit [`96037a8`](https://github.com/apache/spark/commit/96037a84aa6ee8f7fee2b838ee6d4cf6c89b2556).
[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771432746

   **[Test build #134751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134751/testReport)** for PR 31402 at commit [`96037a8`](https://github.com/apache/spark/commit/96037a84aa6ee8f7fee2b838ee6d4cf6c89b2556).
   * This patch passes all tests.
   * This patch merges cleanly.
   * This patch adds no public classes.
[GitHub] [spark] gengliangwang commented on pull request #31434: [SPARK-33591][SQL][FollowUp] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
gengliangwang commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771432824

   cc @HyukjinKwon I will create two backport PRs to branch 3.1 and 3.0 once this is merged.
[GitHub] [spark] sarutak edited a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
sarutak edited a comment on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771432295

   > Does this PR supports all types in PostgreSQL?

   I think so, except for serial types, as I mentioned [above](https://github.com/apache/spark/pull/31419#issuecomment-770883149). Here are all the types in PostgreSQL: https://www.postgresql.org/docs/13/datatype.html
[GitHub] [spark] gengliangwang opened a new pull request #31434: [SPARK-33591][SQL][FollowUp] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`
gengliangwang opened a new pull request #31434:
URL: https://github.com/apache/spark/pull/31434

   ### What changes were proposed in this pull request?
   Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.

   ### Why are the changes needed?
   The release version in https://github.com/apache/spark/pull/31421 was wrong.

   ### Does this PR introduce _any_ user-facing change?
   No

   ### How was this patch tested?
   Doc change
[GitHub] [spark] sarutak commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
sarutak commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771432295

   > Does this PR supports all types in PostgreSQL? are there unsupported types?

   I think so, except for serial types, as I mentioned [above](https://github.com/apache/spark/pull/31419#issuecomment-770883149). Here are all the types in PostgreSQL: https://www.postgresql.org/docs/13/datatype.html
[GitHub] [spark] maropu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
maropu commented on a change in pull request #31378: URL: https://github.com/apache/spark/pull/31378#discussion_r568376839 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ## @@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto checkAnswer( sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"), - Row("v1") :: Nil + Row("my_key1", "v1") :: Nil Review comment: Changing the output schema looks fine to me. Could you describe this change in the PR description?
[GitHub] [spark] SparkQA removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-771365500 **[Test build #134760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134760/testReport)** for PR 31204 at commit [`9009b9a`](https://github.com/apache/spark/commit/9009b9a5b5d66d5968737633db03dc30def5f36d).
[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
SparkQA commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-771429364 **[Test build #134760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134760/testReport)** for PR 31204 at commit [`9009b9a`](https://github.com/apache/spark/commit/9009b9a5b5d66d5968737633db03dc30def5f36d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
Ngone51 commented on pull request #31429: URL: https://github.com/apache/spark/pull/31429#issuecomment-771428466 thanks all!
[GitHub] [spark] SparkQA removed a comment on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines
SparkQA removed a comment on pull request #31286: URL: https://github.com/apache/spark/pull/31286#issuecomment-771318932 **[Test build #134754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134754/testReport)** for PR 31286 at commit [`19340d4`](https://github.com/apache/spark/commit/19340d4fadffc4efe9d4adc823b381c2ec76d590).
[GitHub] [spark] HyukjinKwon closed pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
HyukjinKwon closed pull request #31429: URL: https://github.com/apache/spark/pull/31429
[GitHub] [spark] HyukjinKwon commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
HyukjinKwon commented on pull request #31429: URL: https://github.com/apache/spark/pull/31429#issuecomment-771427885 Merged to master, branch-3.1 and branch-3.0.
[GitHub] [spark] SparkQA commented on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines
SparkQA commented on pull request #31286: URL: https://github.com/apache/spark/pull/31286#issuecomment-771427572 **[Test build #134754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134754/testReport)** for PR 31286 at commit [`19340d4`](https://github.com/apache/spark/commit/19340d4fadffc4efe9d4adc823b381c2ec76d590). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
SparkQA removed a comment on pull request #31429: URL: https://github.com/apache/spark/pull/31429#issuecomment-771318802 **[Test build #134753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134753/testReport)** for PR 31429 at commit [`0ab2b7a`](https://github.com/apache/spark/commit/0ab2b7a8a859535a619eb99163a614d65ecd4d09).
[GitHub] [spark] SparkQA commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
SparkQA commented on pull request #31429: URL: https://github.com/apache/spark/pull/31429#issuecomment-771427109 **[Test build #134753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134753/testReport)** for PR 31429 at commit [`0ab2b7a`](https://github.com/apache/spark/commit/0ab2b7a8a859535a619eb99163a614d65ecd4d09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31432: [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default
SparkQA commented on pull request #31432: URL: https://github.com/apache/spark/pull/31432#issuecomment-771425039 **[Test build #134762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134762/testReport)** for PR 31432 at commit [`584dcbb`](https://github.com/apache/spark/commit/584dcbb82e2f203a679feb7c368d88d97a789e1a).
[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA commented on pull request #31402: URL: https://github.com/apache/spark/pull/31402#issuecomment-771424391 **[Test build #134766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134766/testReport)** for PR 31402 at commit [`bfc1af1`](https://github.com/apache/spark/commit/bfc1af12c9d2ece5c94414a4df4291da1aa265bd).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
AmplabJenkins removed a comment on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771420996 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
SparkQA commented on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771423559 **[Test build #134767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134767/testReport)** for PR 31433 at commit [`12be68c`](https://github.com/apache/spark/commit/12be68c17cd1d4d6a623e1773eb8c315589d7004).
[GitHub] [spark] HyukjinKwon commented on pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
HyukjinKwon commented on pull request #31421: URL: https://github.com/apache/spark/pull/31421#issuecomment-771423568 Merged to master. @gengliangwang would you mind changing the version and making backporting PRs please?
[GitHub] [spark] offthewall123 commented on a change in pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
offthewall123 commented on a change in pull request #31433: URL: https://github.com/apache/spark/pull/31433#discussion_r568370590 ## File path: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala ## @@ -20,12 +20,11 @@ package org.apache.spark.shuffle.sort import org.apache.spark._ import org.apache.spark.internal.{config, Logging} import org.apache.spark.scheduler.MapStatus -import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver, ShuffleWriter} +import org.apache.spark.shuffle.{BaseShuffleHandle, ShuffleWriter} import org.apache.spark.shuffle.api.ShuffleExecutorComponents import org.apache.spark.util.collection.ExternalSorter private[spark] class SortShuffleWriter[K, V, C]( -shuffleBlockResolver: IndexShuffleBlockResolver, Review comment: Yes, it's not used here. I think someone forgot to remove it when this part of the logic was separated out into LocalDiskShuffleDataIO.
[GitHub] [spark] sunchao commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
sunchao commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568370098 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,48 @@ case class FileSourceScanExec( logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " + s"open cost is considered as scanning $openCostInBytes bytes.") +// Filter files with bucket pruning if possible +lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles +val canPrune: Path => Boolean = optionalBucketSet match { + case Some(bucketSet) => +filePath => { + BucketingUtils.getBucketId(filePath.getName) match { +case Some(id) => bucketSet.get(id) +case None => + if (ignoreCorruptFiles) { +// If ignoring corrupt file, do not prune when bucket file name is invalid Review comment: > I feel either skipping or processing the file is in no way perfect. Yes, agreed. > Given users explicitly disable bucketing here for reading the table, I would assume they want to read the table as a non-bucketed table, so they would like to read all of the input files Good point, although it seems a bit weird that someone would do this.
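The pruning predicate under review can be sketched outside of Spark's internals. This is a minimal illustration only: the regex for extracting a bucket id from a file name and the keep-on-invalid policy are assumptions for this sketch, not Spark's actual `BucketingUtils` implementation, and the invalid-name case is exactly the behavior the reviewers are debating.

```scala
// Hedged sketch: filter bucketed data files by a set of wanted bucket ids.
object BucketPruningSketch {
  // Assumed naming convention: the bucket id is the digit run after the
  // last underscore, optionally followed by extensions (e.g. "_00003.parquet").
  private val bucketIdPattern = ".*_(\\d+)(?:\\..*)?$".r

  def getBucketId(fileName: String): Option[Int] = fileName match {
    case bucketIdPattern(id) => Some(id.toInt)
    case _                   => None
  }

  // When the name carries no bucket id, the review thread debates whether to
  // keep, skip, or fail; this sketch keeps the file when corrupt files are
  // ignored and fails otherwise.
  def shouldKeep(fileName: String, wanted: Set[Int], ignoreCorrupt: Boolean): Boolean =
    getBucketId(fileName) match {
      case Some(id)               => wanted.contains(id)
      case None if ignoreCorrupt  => true
      case None                   => sys.error(s"Invalid bucket file: $fileName")
    }
}
```

For example, `shouldKeep("part-00000_00003.parquet", Set(3), ignoreCorrupt = false)` keeps the file, while a file whose name has no bucket suffix is handled per the chosen policy.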
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
AmplabJenkins removed a comment on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771416669 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
SparkQA commented on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771422618 **[Test build #134763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134763/testReport)** for PR 31431 at commit [`62e4027`](https://github.com/apache/spark/commit/62e40275f4ab616d3089ea18e1037b7ccb2c4e4d).
[GitHub] [spark] HyukjinKwon closed pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values
HyukjinKwon closed pull request #31421: URL: https://github.com/apache/spark/pull/31421
[GitHub] [spark] AmplabJenkins commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
AmplabJenkins commented on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771420996 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
AmplabJenkins removed a comment on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771411424 Can one of the admins verify this patch?
[GitHub] [spark] dongjoon-hyun commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
dongjoon-hyun commented on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771419982 ok to test
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
dongjoon-hyun commented on a change in pull request #31433: URL: https://github.com/apache/spark/pull/31433#discussion_r568367864 ## File path: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala ## @@ -20,12 +20,11 @@ package org.apache.spark.shuffle.sort import org.apache.spark._ import org.apache.spark.internal.{config, Logging} import org.apache.spark.scheduler.MapStatus -import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver, ShuffleWriter} +import org.apache.spark.shuffle.{BaseShuffleHandle, ShuffleWriter} import org.apache.spark.shuffle.api.ShuffleExecutorComponents import org.apache.spark.util.collection.ExternalSorter private[spark] class SortShuffleWriter[K, V, C]( -shuffleBlockResolver: IndexShuffleBlockResolver, Review comment: So, this is an unused variable, @offthewall123?
[GitHub] [spark] SparkQA commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
SparkQA commented on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-771418173 **[Test build #134765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134765/testReport)** for PR 31413 at commit [`9a6999d`](https://github.com/apache/spark/commit/9a6999d507073bcb56e1fe6266102fcfed6c884c).
[GitHub] [spark] SparkQA commented on pull request #31430: [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
SparkQA commented on pull request #31430: URL: https://github.com/apache/spark/pull/31430#issuecomment-771417687 **[Test build #134764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134764/testReport)** for PR 31430 at commit [`8d43399`](https://github.com/apache/spark/commit/8d43399f5e8658354d911ac4469f0613ef8d06d5).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
AmplabJenkins removed a comment on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-770997062 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134735/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
AmplabJenkins removed a comment on pull request #31402: URL: https://github.com/apache/spark/pull/31402#issuecomment-771410963
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
AmplabJenkins removed a comment on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-771410968 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39343/
[GitHub] [spark] AmplabJenkins commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
AmplabJenkins commented on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771416669 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins removed a comment on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-771410967 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39345/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
AmplabJenkins removed a comment on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771411511 Can one of the admins verify this patch?
[GitHub] [spark] maropu commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`
maropu commented on a change in pull request #30363: URL: https://github.com/apache/spark/pull/30363#discussion_r568364686 ## File path: core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.net.URL + +import scala.collection.JavaConverters._ +import scala.io.Source +import scala.reflect.runtime.{universe => ru} +import scala.util.control.NonFatal + +import org.apache.spark.internal.Logging + +object SparkConfRegisterLoader extends Logging { Review comment: Yea, we might be able to brush it up based on the current implementation. But, IMO it'd be better to avoid loading classes from a resource file.
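The registry idea debated above (each module declares its config holders, and only modules actually on the classpath contribute) is essentially the `java.util.ServiceLoader` pattern used by the data source registry that cloud-fan mentions. The following is an illustrative sketch only; the trait and object names are assumptions, not Spark's API:

```scala
// Hedged sketch: eagerly force config registration via ServiceLoader.
// Each module ships a META-INF/services/ConfRegister provider file naming
// its implementation; optional modules (e.g. under `external/`) register
// only when they are on the classpath.
import java.util.ServiceLoader
import scala.collection.JavaConverters._

trait ConfRegister {
  // Touching the module's config objects forces their static initializers,
  // so all ConfigEntry instances get registered eagerly.
  def registerAll(): Unit
}

object ConfRegistryLoader {
  def loadAll(): Unit =
    ServiceLoader.load(classOf[ConfRegister]).asScala.foreach(_.registerAll())
}
```

maropu's objection applies here too: this still loads classes named in a resource file, which is what the alternative of simply listing all known config objects in code avoids.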
[GitHub] [spark] maropu commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
maropu commented on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771414229
[GitHub] [spark] MaxGekk commented on a change in pull request #31423: [SPARK-34314][SQL] Create new file index after partition schema inferring w/ the schema
MaxGekk commented on a change in pull request #31423: URL: https://github.com/apache/spark/pull/31423#discussion_r568362848 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -413,9 +413,13 @@ case class DataSource( } else { val globbedPaths = checkAndGlobPathIfNecessary( checkEmptyGlobPath = true, checkFilesExist = checkFilesExist) - val index = createInMemoryFileIndex(globbedPaths) + val fileStatusCache = FileStatusCache.getOrCreate(sparkSession) + val indexInSchemaInferring = new InMemoryFileIndex( +sparkSession, globbedPaths, options, userSpecifiedSchema, fileStatusCache) val (resultDataSchema, resultPartitionSchema) = -getOrInferFileFormatSchema(format, () => index) +getOrInferFileFormatSchema(format, () => indexInSchemaInferring) + val index = new InMemoryFileIndex( +sparkSession, globbedPaths, options, Some(resultPartitionSchema), fileStatusCache) Review comment: The new index re-uses the file status cache `fileStatusCache`, so this should avoid additional accesses to the file system.
[GitHub] [spark] turboFei removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
turboFei removed a comment on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771405635 For a temporary view, its logical plan has been stored in the catalog and has already been analyzed. So, even if we refresh its underlying table, its logical plan would not be changed.
[GitHub] [spark] MaxGekk commented on a change in pull request #31423: [SPARK-34314][SQL] Create new file index after partition schema inferring w/ the schema
MaxGekk commented on a change in pull request #31423: URL: https://github.com/apache/spark/pull/31423#discussion_r568362026 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala ## @@ -370,7 +370,7 @@ class PartitionedTablePerfStatsSuite assert(HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount() == 0) // reads and caches all the files initially - assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 5) + assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 10) Review comment: The second partition discovery should not re-use file statuses from the cache if the cache is a NoOp cache.
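The doubled `METRIC_FILES_DISCOVERED` count (5 -> 10) can be illustrated with a toy model. This is a self-contained sketch with hypothetical names, not Spark's actual classes: a shared file-status cache makes the second index creation a cache hit, while a no-op cache forces the second discovery to re-list everything.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's FileStatusCache abstraction.
trait FileStatusCache {
  def getOrLoad(path: String, load: String => Seq[String]): Seq[String]
}

// A cache that never stores anything: every lookup re-lists the path.
object NoopCache extends FileStatusCache {
  def getOrLoad(path: String, load: String => Seq[String]): Seq[String] = load(path)
}

// A cache shared across index creations: only the first lookup lists.
class SharedCache extends FileStatusCache {
  private val cache = mutable.Map.empty[String, Seq[String]]
  def getOrLoad(path: String, load: String => Seq[String]): Seq[String] =
    cache.getOrElseUpdate(path, load(path))
}

var filesDiscovered = 0  // stands in for METRIC_FILES_DISCOVERED
def listFiles(path: String): Seq[String] = {
  filesDiscovered += 5   // pretend the path holds 5 files
  (1 to 5).map(i => s"$path/part-$i")
}

// One "index creation" = one listing of the table path through the cache.
def buildIndex(cache: FileStatusCache): Seq[String] =
  cache.getOrLoad("/table", listFiles)

// Two index creations (schema inference + final index) with a shared cache:
val shared = new SharedCache
buildIndex(shared); buildIndex(shared)
assert(filesDiscovered == 5)   // second creation is a cache hit

// The same two creations with the no-op cache list the files twice:
filesDiscovered = 0
buildIndex(NoopCache); buildIndex(NoopCache)
assert(filesDiscovered == 10)  // mirrors the 5 -> 10 assertion in the suite
```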
[GitHub] [spark] SparkQA commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas
SparkQA commented on pull request #31429: URL: https://github.com/apache/spark/pull/31429#issuecomment-771412722 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39346/
[GitHub] [spark] maropu commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
maropu commented on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-771412232 Does this PR support all types in PostgreSQL? Are there unsupported types?
[GitHub] [spark] AmplabJenkins commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
AmplabJenkins commented on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771411511 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
AmplabJenkins commented on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771411424 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
AmplabJenkins commented on pull request #31402: URL: https://github.com/apache/spark/pull/31402#issuecomment-771410963
[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters
AmplabJenkins commented on pull request #31204: URL: https://github.com/apache/spark/pull/31204#issuecomment-771410967 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39345/
[GitHub] [spark] AmplabJenkins commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
AmplabJenkins commented on pull request #31413: URL: https://github.com/apache/spark/pull/31413#issuecomment-771410968 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39343/
[GitHub] [spark] maropu commented on a change in pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
maropu commented on a change in pull request #31419: URL: https://github.com/apache/spark/pull/31419#discussion_r568360410 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala ## @@ -74,7 +74,9 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { .executeUpdate() conn.prepareStatement("CREATE TABLE st_with_array (c0 uuid, c1 inet, c2 cidr," + - "c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[])") + "c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[], c10 xml[], " + Review comment: How about adding tests for non-array cases, e.g., `col xml` and `col tsvector`?
[GitHub] [spark] Ngone51 commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
Ngone51 commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-771410418 kindly ping @holdenk
[GitHub] [spark] Ngone51 commented on pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes
Ngone51 commented on pull request #31249: URL: https://github.com/apache/spark/pull/31249#issuecomment-771409987 I left some minor comments. Overall, looks good to me.
[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
c21 commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568359233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,48 @@ case class FileSourceScanExec( logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " + s"open cost is considered as scanning $openCostInBytes bytes.") +// Filter files with bucket pruning if possible +lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles +val canPrune: Path => Boolean = optionalBucketSet match { + case Some(bucketSet) => +filePath => { + BucketingUtils.getBucketId(filePath.getName) match { +case Some(id) => bucketSet.get(id) +case None => + if (ignoreCorruptFiles) { +// If ignoring corrupt file, do not prune when bucket file name is invalid Review comment: Given users explicitly disable bucketing here for reading the table, I would assume they want to read the table as a non-bucketed table, so they would like to read all of the input files, no? cc @viirya what's the use case you are thinking of here? Thanks.
[GitHub] [spark] offthewall123 commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
offthewall123 commented on pull request #31433: URL: https://github.com/apache/spark/pull/31433#issuecomment-771408356 @srowen @HeartSaVioR Please take a look, thanks.
[GitHub] [spark] Ngone51 commented on a change in pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes
Ngone51 commented on a change in pull request #31249: URL: https://github.com/apache/spark/pull/31249#discussion_r568358423 ## File path: core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala ## @@ -554,6 +554,50 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock verify(allocationClientMock).killExecutorsOnHost("hostA") } + test("excluding decommission and kills executors when enabled") { +val allocationClientMock = mock[ExecutorAllocationClient] + +// verify we decommission when configured +conf.set(config.EXCLUDE_ON_FAILURE_KILL_ENABLED, true) +conf.set(config.DECOMMISSION_ENABLED.key, "true") +conf.set(config.EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED.key, "true") +conf.set(config.MAX_FAILURES_PER_EXEC.key, "1") +conf.set(config.MAX_FAILED_EXEC_PER_NODE.key, "2") +healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock) + +// Fail 4 tasks in one task set on executor 1, so that executor gets excluded for the whole +// application. +val taskSetExclude2 = createTaskSetExcludelist(stageId = 0) +(0 until 4).foreach { partition => + taskSetExclude2.updateExcludedForFailedTask( +"hostA", exec = "1", index = partition, failureReason = "testing") +} +healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude2.execToFailures) + +val msg1 = + "Killing excluded executor id 1 since spark.excludeOnFailure.killExcludedExecutors is set." + +verify(allocationClientMock).decommissionExecutor( + "1", ExecutorDecommissionInfo(msg1), false) + +val taskSetExclude3 = createTaskSetExcludelist(stageId = 1) +// Fail 4 tasks in one task set on executor 2, so that executor gets excluded for the whole +// application. Since that's the second executor that is excluded on the same node, we also +// exclude that node. 
+(0 until 4).foreach { partition => + taskSetExclude3.updateExcludedForFailedTask( +"hostA", exec = "2", index = partition, failureReason = "testing") +} +healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude3.execToFailures) + +val msg2 = + "Killing excluded executor id 2 since spark.excludeOnFailure.killExcludedExecutors is set." +verify(allocationClientMock).decommissionExecutor( + "2", ExecutorDecommissionInfo(msg2), false, false) +verify(allocationClientMock).decommissionExecutorsOnHost( + "hostA") Review comment: nit: turn into one line?
[GitHub] [spark] offthewall123 opened a new pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter
offthewall123 opened a new pull request #31433: URL: https://github.com/apache/spark/pull/31433 ### What changes were proposed in this pull request? Remove shuffleBlockResolver in SortShuffleWriter. ### Why are the changes needed? For better code understanding. ### Does this PR introduce _any_ user-facing change? NO. ### How was this patch tested? End to End.
[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
c21 commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568357088 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,48 @@ case class FileSourceScanExec( logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " + s"open cost is considered as scanning $openCostInBytes bytes.") +// Filter files with bucket pruning if possible +lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles +val canPrune: Path => Boolean = optionalBucketSet match { + case Some(bucketSet) => +filePath => { + BucketingUtils.getBucketId(filePath.getName) match { +case Some(id) => bucketSet.get(id) +case None => + if (ignoreCorruptFiles) { +// If ignoring corrupt file, do not prune when bucket file name is invalid Review comment: I feel neither skipping nor processing the file is perfect. There can be other corruption cases, e.g. a table specified with 1024 buckets that only has 500 files underneath. This could be due to other compute engines or users accidentally dumping data here without respecting Spark's bucketing metadata. We have no efficient way to handle the case where the number of files is fewer than the number of buckets. The existing usage of `ignoreCorruptFiles` [skips reading some of the content of a file](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L160), so it's also not completely ignoring the file. But I am fine if we think we need another config name for this.
[GitHub] [spark] turboFei commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables
turboFei commented on pull request #31431: URL: https://github.com/apache/spark/pull/31431#issuecomment-771405635 For a temporary view, its logical plan has been stored in the catalog and has already been analyzed. So, even if we refresh its underlying table, its logical plan would not be changed.
[GitHub] [spark] linhongliu-db commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`
linhongliu-db commented on a change in pull request #30363: URL: https://github.com/apache/spark/pull/30363#discussion_r568356352 ## File path: core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala ## @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util + +import java.net.URL + +import scala.collection.JavaConverters._ +import scala.io.Source +import scala.reflect.runtime.{universe => ru} +import scala.util.control.NonFatal + +import org.apache.spark.internal.Logging + +object SparkConfRegisterLoader extends Logging { Review comment: @maropu, awesome! I was not aware of this in `PythonSQLUtils`. But I'm still worried about the maintenance burden if other external modules also register SQLConf entries. And this code assumes the hive module is already included, which is not true for all other modules.
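The registry-file approach the thread compares to the data source registry can be sketched with a toy model (all names below are hypothetical, not Spark's): each module contributes an entry that eagerly initializes its config object, and the loader only runs entries for modules actually present, so optional modules under `external/` and vendor modules can opt in without the core hard-coding them.

```scala
import scala.collection.mutable

// Config entries registered so far; each module's init thunk appends its
// configs here, standing in for SQLConf registration side effects.
val registeredConfigs = mutable.Buffer.empty[String]

// Hypothetical registry: module name -> thunk that eagerly initializes the
// module's config object (one entry per line of a registry resource file).
val registry = Map(
  "core" -> (() => registeredConfigs += "spark.sql.ansi.enabled"),
  "hive" -> (() => registeredConfigs += "spark.sql.hive.metastore.version"))

// Only modules actually on the classpath get initialized, mirroring the
// concern that some modules are optional and may be absent.
val modulesOnClasspath = Set("core")
for ((module, init) <- registry if modulesOnClasspath(module)) init()

assert(registeredConfigs == Seq("spark.sql.ansi.enabled"))
```

In the real framework the thunk would be replaced by reflectively forcing initialization of a config-holder object named in the resource file; the map here only illustrates the "load what is present, skip the rest" contract.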
[GitHub] [spark] Ngone51 commented on a change in pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes
Ngone51 commented on a change in pull request #31249: URL: https://github.com/apache/spark/pull/31249#discussion_r568356238 ## File path: core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala ## @@ -554,6 +554,50 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock verify(allocationClientMock).killExecutorsOnHost("hostA") } + test("excluding decommission and kills executors when enabled") { +val allocationClientMock = mock[ExecutorAllocationClient] + +// verify we decommission when configured +conf.set(config.EXCLUDE_ON_FAILURE_KILL_ENABLED, true) +conf.set(config.DECOMMISSION_ENABLED.key, "true") +conf.set(config.EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED.key, "true") +conf.set(config.MAX_FAILURES_PER_EXEC.key, "1") +conf.set(config.MAX_FAILED_EXEC_PER_NODE.key, "2") +healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock) + +// Fail 4 tasks in one task set on executor 1, so that executor gets excluded for the whole +// application. +val taskSetExclude2 = createTaskSetExcludelist(stageId = 0) +(0 until 4).foreach { partition => + taskSetExclude2.updateExcludedForFailedTask( +"hostA", exec = "1", index = partition, failureReason = "testing") +} +healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude2.execToFailures) + +val msg1 = + "Killing excluded executor id 1 since spark.excludeOnFailure.killExcludedExecutors is set." Review comment: Shall we include a decommission hint in the message when enabled (in `killExecutor()`)? e.g., "Killing (decommission) excluded executor..."
[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-771405161 @alefischer13 Hi, please look at this PR that was merged later and changed the way you need to configure the log4j.properties: #28801
[GitHub] [spark] maropu commented on pull request #31430: [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3
maropu commented on pull request #31430: URL: https://github.com/apache/spark/pull/31430#issuecomment-771403087 LGTM. nit: could you add a link to the release note of `1.4.8-3` in the PR description?
[GitHub] [spark] sunchao commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
sunchao commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568353916 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,48 @@ case class FileSourceScanExec( logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " + s"open cost is considered as scanning $openCostInBytes bytes.") +// Filter files with bucket pruning if possible +lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles +val canPrune: Path => Boolean = optionalBucketSet match { + case Some(bucketSet) => +filePath => { + BucketingUtils.getBucketId(filePath.getName) match { +case Some(id) => bucketSet.get(id) +case None => + if (ignoreCorruptFiles) { +// If ignoring corrupt file, do not prune when bucket file name is invalid Review comment: Cool, thanks for pointing to the discussion. I'm just not sure whether the corrupted file should be ignored or processed if the flag is turned on. `ignoreCorruptFiles` seems to indicate that the problematic file should be ignored, so it is a bit confusing that we still process it here. Also, IMO ignoring it seems slightly safer (think of someone dumping garbage files into the bucketed partition dir)? cc @maropu @viirya
[GitHub] [spark] SparkQA removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA removed a comment on pull request #31402: URL: https://github.com/apache/spark/pull/31402#issuecomment-771366254 **[Test build #134758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134758/testReport)** for PR 31402 at commit [`5c4c256`](https://github.com/apache/spark/commit/5c4c2567fcbde7d8249dc18947ed283de0e287d9).
[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding
SparkQA commented on pull request #31402: URL: https://github.com/apache/spark/pull/31402#issuecomment-771401976 **[Test build #134758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134758/testReport)** for PR 31402 at commit [`5c4c256`](https://github.com/apache/spark/commit/5c4c2567fcbde7d8249dc18947ed283de0e287d9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class RowNumberLike extends AggregateWindowFunction `
[GitHub] [spark] dongjoon-hyun commented on pull request #31428: [SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap
dongjoon-hyun commented on pull request #31428: URL: https://github.com/apache/spark/pull/31428#issuecomment-771397570 Thank you, @ScrapCodes. Yes, for security, some namespaces do not allow the driver pod to create new ConfigMaps. As a result, Spark jobs are unable to be executed in those namespaces. This unblocks Spark jobs in those namespaces. As I wrote in the PR description, this doesn't aim to hurt the functionality of the original contribution. Instead, this PR aims to provide a workaround for those security-enforced namespaces.
[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
c21 commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568353327 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -591,20 +590,48 @@ case class FileSourceScanExec( logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " + s"open cost is considered as scanning $openCostInBytes bytes.") +// Filter files with bucket pruning if possible +lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles +val canPrune: Path => Boolean = optionalBucketSet match { Review comment: Changed to `shouldProcess` as I feel `shouldNotPrune` is hard to reason about.
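The shape of the `shouldProcess` predicate under discussion can be sketched in a self-contained way. This uses a simplified file-name convention and a hypothetical strict branch for invalid names (the real `BucketingUtils.getBucketId` pattern and the real corrupt-name handling differ):

```scala
import scala.collection.immutable.BitSet

// Simplified convention for this sketch: a bucketed file name ends in "_NNNNN";
// any other name yields no bucket id.
def getBucketId(name: String): Option[Int] =
  "_(\\d+)$".r.findFirstMatchIn(name).map(_.group(1).toInt)

// Returns a predicate deciding whether a file survives bucket filter pruning.
def shouldProcess(
    optionalBucketSet: Option[BitSet],
    ignoreCorruptFiles: Boolean): String => Boolean =
  optionalBucketSet match {
    case Some(bucketSet) => fileName =>
      getBucketId(fileName) match {
        case Some(id) => bucketSet(id)          // keep only selected buckets
        case None if ignoreCorruptFiles => true // invalid name: do not prune
        case None =>                            // one plausible strict handling
          sys.error(s"invalid bucket file: $fileName")
      }
    case None => _ => true                      // no bucket filter: keep everything
  }

val keep = shouldProcess(Some(BitSet(0, 2)), ignoreCorruptFiles = true)
assert(keep("part_00002"))   // bucket 2 is selected
assert(!keep("part_00001"))  // bucket 1 is pruned
assert(keep("garbage.txt"))  // invalid name is kept when ignoring corrupt files
```

Returning a `String => Boolean` (rather than filtering inline) mirrors the reviewed code's choice of binding the bucket-set lookup once and reusing the closure per file.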
[GitHub] [spark] viirya opened a new pull request #31432: [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default
viirya opened a new pull request #31432: URL: https://github.com/apache/spark/pull/31432 ### What changes were proposed in this pull request? This patch proposes to remove `TRUNCATE` from the default `capabilities` list of `FileTable`. ### Why are the changes needed? The abstract class `FileTable` now lists `TRUNCATE` in its `capabilities`, but `FileTable` does not know whether an implementation really supports truncation. Specifically, checking existing `FileTable` implementations including `AvroTable`, `CSVTable`, `JsonTable`, etc., no implementation actually implements `SupportsTruncate` in its writer builder. ### Does this PR introduce _any_ user-facing change? No, because it seems to me `FileTable` is not a user-facing public API. ### How was this patch tested? Existing unit tests.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly
AngersZh commented on a change in pull request #31378: URL: https://github.com/apache/spark/pull/31378#discussion_r568352532 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ## @@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto checkAnswer( sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"), - Row("v1") :: Nil + Row("my_key1", "v1") :: Nil Review comment: > > then shall we change the schema of the v2 command? > > IMO, changing `ShowTablePropertiesCommand` is correct since Hive's output schema is `prpt_name` and `prpt_value`; I think Hive was wrong before. Also ping @wangyum @beliefer @viirya. I'd also like your suggestions, since I believe it's a Hive bug and we don't need to follow the wrong behavior in Hive 1.x and 2.x.
[GitHub] [spark] maropu commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
maropu commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568352335

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -591,20 +590,48 @@ case class FileSourceScanExec(
     logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
       s"open cost is considered as scanning $openCostInBytes bytes.")
+    // Filter files with bucket pruning if possible
+    lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+    val canPrune: Path => Boolean = optionalBucketSet match {

Review comment: Yea, okay.
[GitHub] [spark] ScrapCodes commented on pull request #31428: [SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap
ScrapCodes commented on pull request #31428: URL: https://github.com/apache/spark/pull/31428#issuecomment-771399674

Ahh, makes sense then. I wish we could autodetect and fall back too.
[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1
c21 commented on a change in pull request #31413: URL: https://github.com/apache/spark/pull/31413#discussion_r568351355

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -591,20 +590,48 @@ case class FileSourceScanExec(
     logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
       s"open cost is considered as scanning $openCostInBytes bytes.")
+    // Filter files with bucket pruning if possible
+    lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+    val canPrune: Path => Boolean = optionalBucketSet match {
+      case Some(bucketSet) =>
+        filePath => {
+          BucketingUtils.getBucketId(filePath.getName) match {
+            case Some(id) => bucketSet.get(id)
+            case None =>
+              if (ignoreCorruptFiles) {
+                // If ignoring corrupt files, do not prune when the bucket file name is invalid

Review comment: @sunchao - this is newly introduced; updated the PR description.

> Also I'm not sure if this is the best choice: if a bucketed table is corrupted, should we read the corrupt file? It will likely lead to incorrect results. On the other hand, we can choose to ignore the file, which seems more aligned with the name of the config, although the result could still be incorrect.

Note that by default an exception is thrown here and the query fails loudly. We allow a config here to help existing users work around the issue if they want. See the relevant discussion in https://github.com/apache/spark/pull/31413#discussion_r567623746.
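The pruning predicate being discussed can be sketched in isolation. This is a simplified, self-contained version of the `canPrune` logic from the diff: the bucket id is parsed from the file name, a file is kept only if its bucket survives the filter, and a malformed name either passes through (when corrupt files are ignored) or fails loudly. The file-name regex here is an assumption for illustration, not `BucketingUtils`' actual pattern:

```scala
import java.util.BitSet

// Assumed bucket-file naming pattern: a trailing "_<bucketId>" before an
// optional extension, e.g. "part-00000_3.parquet" -> bucket 3.
val bucketedFileName = """.*_(\d+)(?:\..*)?$""".r

def getBucketId(name: String): Option[Int] = name match {
  case bucketedFileName(id) => Some(id.toInt)
  case _                    => None
}

// Returns true if the file should be read (i.e. it survives pruning).
def canPrune(bucketSet: BitSet, ignoreCorruptFiles: Boolean)(name: String): Boolean =
  getBucketId(name) match {
    case Some(id) => bucketSet.get(id)
    case None if ignoreCorruptFiles =>
      // Invalid bucket file name: do not prune, read the file anyway.
      true
    case None =>
      throw new IllegalStateException(s"Invalid bucket file name: $name")
  }
```

This makes the trade-off in the comment concrete: with `ignoreCorruptFiles` enabled, an unparseable name is read rather than skipped, which may produce incorrect results but avoids silently dropping data; with it disabled, the query fails immediately.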
[GitHub] [spark] maropu commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`
maropu commented on a change in pull request #30363: URL: https://github.com/apache/spark/pull/30363#discussion_r568350914

## File path: core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala

@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment: Why do we need this class to initialize configs eagerly? Couldn't we simply load them in `SetCommand`, as `PythonSQLUtils` does? https://github.com/apache/spark/blob/d99d0d27be875bba692bcfe376f90c930e170380/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala#L48-L58
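The simpler alternative maropu points to relies on a property of Scala `object`s: merely referencing one runs its constructor, which in Spark registers every config entry the object defines. A hedged, self-contained sketch (the `MyModuleConf` object and its `register` helper are hypothetical, standing in for Spark's config-holder objects):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical config-holder object. In Spark, objects like SQLConf define
// entries via a builder that registers them in a shared map as a side effect
// of the object's initialization.
object MyModuleConf {
  val registered = ArrayBuffer.empty[String]
  private def register(key: String): String = { registered += key; key }

  // Defining a val through register() means the entry is recorded the
  // moment the object is first touched.
  val SHUFFLE_PARTITIONS: String = register("example.shuffle.partitions")
}

// Eager initialization a la PythonSQLUtils: simply reference each known
// config object so its entries get registered before `set -v` lists them.
def loadKnownConfigs(): Unit = {
  locally(MyModuleConf)
}
```

The trade-off raised in the thread: this hard-codes the list of known config objects, whereas the resource-file approach in the PR lets optional modules (e.g. those under `external/`) and third-party vendors contribute their own config objects without touching core.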