[GitHub] [spark] cloud-fan commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #30363:
URL: https://github.com/apache/spark/pull/30363#discussion_r568391121



##
File path: 
core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment:
   Yea, it's easier to just list all the known config objects, but I agree with @linhongliu-db that some modules are optional and we should only load the config objects from a module when that module is loaded. FYI, all modules under the `external` directory are not loaded by default, and if they start to define configs, we will need this framework. It's also true that other Spark vendors may define their own configs.
   
   This is similar to the data source registry resource files, so I don't have a big concern here, as this is not the first time.
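
The resource-file approach being discussed can be sketched as follows. This is a hypothetical illustration only, assuming each optional module lists its config-holder objects in a classpath resource file, one fully qualified name per line; the resource path, object name, and registration mechanics are assumptions, not the PR's actual code:

```scala
import java.net.URL

import scala.collection.JavaConverters._
import scala.io.Source
import scala.util.control.NonFatal

// Hypothetical sketch of a registry loader: discover config-holder objects
// via classpath resource files, mirroring the data source registry pattern.
object ConfRegisterSketch {
  // Illustrative resource path; each file contains fully qualified object
  // names, one per line.
  private val resourcePath = "META-INF/services/org.apache.spark.ConfRegister"

  // Returns the names that were successfully initialized.
  def loadAll(classLoader: ClassLoader): Seq[String] = {
    val urls: Seq[URL] = classLoader.getResources(resourcePath).asScala.toSeq
    val names = urls.flatMap { url =>
      Source.fromURL(url, "UTF-8").getLines().map(_.trim).filter(_.nonEmpty)
    }.distinct
    names.flatMap { className =>
      try {
        // A Scala object compiles to a class named `<name>$`; initializing it
        // runs the object's body, which registers its configs as a side effect.
        Class.forName(className + "$", true, classLoader)
        Some(className)
      } catch {
        case NonFatal(_) => None // optional module not on the classpath
      }
    }
  }
}
```

Modules only on the classpath contribute entries; missing modules are simply skipped, which matches the "load only when the module is loaded" requirement.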





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


turboFei commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771444379


   The UT passed in our spark-2.3 branch; I will try to find out why it failed on the community master branch.






[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771444120


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39352/
   






[GitHub] [spark] SparkQA commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


SparkQA commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771443108


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39349/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771442106


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134753/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771442108


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134760/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31286: [SPARK-34199][SQL] Block `count(table.*)` to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771442107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134754/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771442108


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134760/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771442106


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134753/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31286: [SPARK-34199][SQL] Block `count(table.*)` to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771442107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134754/
   






[GitHub] [spark] HyukjinKwon commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


HyukjinKwon commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568386466



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2472,6 +2472,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+  .internal()
+  .doc("If it is set to true, a null partition value is parsed as a string 
literal of its " +
+"text representation, e.g., string 'null'. Otherwise, null partition 
values are parsed " +
+"as they are.")
+  .version("3.2.0")

Review comment:
   I think we have documented it only in the 3.0.2 section so far in such cases.








[GitHub] [spark] cloud-fan closed pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


cloud-fan closed pull request #31286:
URL: https://github.com/apache/spark/pull/31286


   






[GitHub] [spark] cloud-fan commented on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


cloud-fan commented on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771440146


   Thanks, merging to master!






[GitHub] [spark] cloud-fan commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568385314



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
##
@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleto
 
 checkAnswer(
   sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-  Row("v1") :: Nil
+  Row("my_key1", "v1") :: Nil

Review comment:
   Yea, let's add an item to the migration guide.








[GitHub] [spark] cloud-fan commented on pull request #31434: [SPARK-33591][SQL][FOLLOW-UP] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

2021-02-01 Thread GitBox


cloud-fan commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771439102


   @gengliangwang I've left more comments in the original PR. Can you address them as well? Thanks!






[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568384363



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2472,6 +2472,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+  .internal()
+  .doc("If it is set to true, a null partition value is parsed as a string 
literal of its " +
+"text representation, e.g., string 'null'. Otherwise, null partition 
values are parsed " +
+"as they are.")
+  .version("3.2.0")

Review comment:
   Ah, I see. Then how should we place it in the migration guide? Repeat it 3 times, in the 3.0.2, 3.1.1, and 3.2.0 sections?








[GitHub] [spark] HyukjinKwon commented on pull request #31434: [SPARK-33591][SQL][FOLLOW-UP] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

2021-02-01 Thread GitBox


HyukjinKwon commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771437952


   cc @cloud-fan too






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly

2021-02-01 Thread GitBox


AngersZhuuuu commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568382649



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
##
@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleto
 
 checkAnswer(
   sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-  Row("v1") :: Nil
+  Row("my_key1", "v1") :: Nil

Review comment:
   > Changing the output schema looks fine to me. Could you describe this change in the PR description?
   
   Yea, updated. Do we need to update the migration guide too?








[GitHub] [spark] HyukjinKwon commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


HyukjinKwon commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568382451



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2472,6 +2472,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+  .internal()
+  .doc("If it is set to true, a null partition value is parsed as a string 
literal of its " +
+"text representation, e.g., string 'null'. Otherwise, null partition 
values are parsed " +
+"as they are.")
+  .version("3.2.0")

Review comment:
   It should be 3.0.2, actually.








[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568382197



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
##
@@ -154,4 +154,15 @@ trait ShowPartitionsSuiteBase extends QueryTest with 
DDLCommandTestUtils {
   assert(partitions.first().getString(0) === "part=a")
 }
   }
+
+  test("SPARK-33591: null as string partition literal value 'null' after 
setting legacy conf") {
+withSQLConf(SQLConf.LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL.key 
-> "true") {
+  withNamespaceAndTable("ns", "tbl") { t =>
+sql(s"CREATE TABLE $t (col1 INT, p1 STRING) $defaultUsing PARTITIONED 
BY (p1)")
+sql(s"INSERT INTO TABLE $t PARTITION (p1 = null) SELECT 0")
+runShowPartitionsSql(s"SHOW PARTITIONS $t", Row("p1=null") :: Nil)
+runShowPartitionsSql(s"SHOW PARTITIONS $t PARTITION (p1 = null)", 
Row("p1=null") :: Nil)

Review comment:
   How does this show the null-string behavior? Shall we just test a simple table scan operation?








[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381329



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2472,6 +2472,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+  .internal()
+  .doc("If it is set to true, a null partition value is parsed as a string 
literal of its " +
+"text representation, e.g., string 'null'. Otherwise, null partition 
values are parsed " +
+"as they are.")
+  .version("3.2.0")

Review comment:
   The version should be 3.1.1, as we are going to backport this.








[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381165



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2472,6 +2472,16 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val LEGACY_PARSE_NULL_PARTITION_SPEC_AS_STRING_LITERAL =
+buildConf("spark.sql.legacy.parseNullPartitionSpecAsStringLiteral")
+  .internal()
+  .doc("If it is set to true, a null partition value is parsed as a string 
literal of its " +

Review comment:
   Ditto, make it more specific.








[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568381010



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -472,9 +472,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
*/
   override def visitPartitionSpec(
   ctx: PartitionSpecContext): Map[String, Option[String]] = 
withOrigin(ctx) {
+val processNullLiteral =

Review comment:
   This name is weird; how about `legacyNullAsString`?








[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568380805



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##
@@ -500,9 +502,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with 
SQLConfHelper with Logg
* main purpose is to prevent slight differences due to back to back 
conversions i.e.:
* String -> Literal -> String.
*/
-  protected def visitStringConstant(ctx: ConstantContext): String = 
withOrigin(ctx) {
+  protected def visitStringConstant(
+ctx: ConstantContext,

Review comment:
   nit: 4 spaces indentation








[GitHub] [spark] maropu commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


maropu commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771434590


   Okay, could you copy the statement above into the PR description?






[GitHub] [spark] cloud-fan commented on a change in pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


cloud-fan commented on a change in pull request #31421:
URL: https://github.com/apache/spark/pull/31421#discussion_r568379292



##
File path: docs/sql-migration-guide.md
##
@@ -41,6 +41,8 @@ license: |
 
   - In Spark 3.2, the auto-generated `Cast` (such as those added by type 
coercion rules) will be stripped when generating column alias names. E.g., 
`sql("SELECT floor(1)").columns` will be `FLOOR(1)` instead of `FLOOR(CAST(1 AS 
DOUBLE))`.
 
+  - In Spark 3.2, a null partition value is parsed as it is. In Spark 3.1 or 
earlier, it is parsed as a string literal of its text representation, e.g., 
string "null". To restore the legacy behavior, you can set 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.

Review comment:
   Let's be more specific
   ```
   In Spark 3.2, `PARTITION(col=null)` is always parsed as a null literal in 
the partition spec. In Spark 3.1 or earlier,
   it is parsed as a string literal ..., if the partition column is string 
type. To restore the legacy behavior, ...
   ```
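
The behavioral difference in the proposed wording can be illustrated with a minimal, stdlib-only model of the parsing decision; the function name and signature here are hypothetical and do not reflect Spark's actual parser API:

```scala
// Hypothetical model of null-partition-spec parsing as discussed above:
//   legacyNullAsString = true  -> PARTITION(p1 = null) keeps the text "null"
//   legacyNullAsString = false -> it becomes a real null value (None here)
def parsePartitionValue(
    raw: String,
    legacyNullAsString: Boolean): Option[String] = {
  if (raw.equalsIgnoreCase("null") && !legacyNullAsString) {
    None // Spark 3.2 behavior: parsed as a null literal
  } else {
    Some(raw) // legacy behavior: the string literal "null" (or any other text)
  }
}
```

Under this model, only the literal token `null` is affected; all other partition values parse identically regardless of the flag.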








[GitHub] [spark] SparkQA removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA removed a comment on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771320748


   **[Test build #134751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134751/testReport)**
 for PR 31402 at commit 
[`96037a8`](https://github.com/apache/spark/commit/96037a84aa6ee8f7fee2b838ee6d4cf6c89b2556).






[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771432746


   **[Test build #134751 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134751/testReport)**
 for PR 31402 at commit 
[`96037a8`](https://github.com/apache/spark/commit/96037a84aa6ee8f7fee2b838ee6d4cf6c89b2556).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] gengliangwang commented on pull request #31434: [SPARK-33591][SQL][FollowUp] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

2021-02-01 Thread GitBox


gengliangwang commented on pull request #31434:
URL: https://github.com/apache/spark/pull/31434#issuecomment-771432824


   cc @HyukjinKwon 
   I will create two backport PRs, to branches 3.1 and 3.0, once this is merged.






[GitHub] [spark] sarutak edited a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


sarutak edited a comment on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771432295


   > Does this PR support all types in PostgreSQL?
   
   I think so, except for the serial types, as I mentioned [above](https://github.com/apache/spark/pull/31419#issuecomment-770883149).
   
   Here are all the types in PostgreSQL:
   https://www.postgresql.org/docs/13/datatype.html






[GitHub] [spark] gengliangwang opened a new pull request #31434: [SPARK-33591][SQL][FollowUp] Correct the version of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

2021-02-01 Thread GitBox


gengliangwang opened a new pull request #31434:
URL: https://github.com/apache/spark/pull/31434


   
   
   ### What changes were proposed in this pull request?
   
   Correct the version of SQL configuration 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
   
   
   ### Why are the changes needed?
   
   The release version in https://github.com/apache/spark/pull/31421 was wrong.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Doc change






[GitHub] [spark] sarutak commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


sarutak commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771432295


   > Does this PR supports all types in PostgreSQL? are there unsupported types?
   
   I think so except for serial types as I mentioned 
[above](https://github.com/apache/spark/pull/31419#issuecomment-770883149).
   
   Here are all the types in PostgreSQL:
   https://www.postgresql.org/docs/13/datatype.html






[GitHub] [spark] maropu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly

2021-02-01 Thread GitBox


maropu commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568376839



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
##
@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleto
 
 checkAnswer(
   sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-  Row("v1") :: Nil
+  Row("my_key1", "v1") :: Nil

Review comment:
   Changing the output schema looks fine to me. Could you describe this 
change in the PR description?








[GitHub] [spark] SparkQA removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


SparkQA removed a comment on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771365500


   **[Test build #134760 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134760/testReport)**
 for PR 31204 at commit 
[`9009b9a`](https://github.com/apache/spark/commit/9009b9a5b5d66d5968737633db03dc30def5f36d).






[GitHub] [spark] SparkQA commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


SparkQA commented on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771429364


   **[Test build #134760 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134760/testReport)**
 for PR 31204 at commit 
[`9009b9a`](https://github.com/apache/spark/commit/9009b9a5b5d66d5968737633db03dc30def5f36d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] Ngone51 commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


Ngone51 commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771428466


   thanks all!






[GitHub] [spark] SparkQA removed a comment on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


SparkQA removed a comment on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771318932


   **[Test build #134754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134754/testReport)**
 for PR 31286 at commit 
[`19340d4`](https://github.com/apache/spark/commit/19340d4fadffc4efe9d4adc823b381c2ec76d590).






[GitHub] [spark] HyukjinKwon closed pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


HyukjinKwon closed pull request #31429:
URL: https://github.com/apache/spark/pull/31429


   






[GitHub] [spark] HyukjinKwon commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


HyukjinKwon commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771427885


   Merged to master, branch-3.1 and branch-3.0.






[GitHub] [spark] SparkQA commented on pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

2021-02-01 Thread GitBox


SparkQA commented on pull request #31286:
URL: https://github.com/apache/spark/pull/31286#issuecomment-771427572


   **[Test build #134754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134754/testReport)**
 for PR 31286 at commit 
[`19340d4`](https://github.com/apache/spark/commit/19340d4fadffc4efe9d4adc823b381c2ec76d590).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


SparkQA removed a comment on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771318802


   **[Test build #134753 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134753/testReport)**
 for PR 31429 at commit 
[`0ab2b7a`](https://github.com/apache/spark/commit/0ab2b7a8a859535a619eb99163a614d65ecd4d09).






[GitHub] [spark] SparkQA commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


SparkQA commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771427109


   **[Test build #134753 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134753/testReport)**
 for PR 31429 at commit 
[`0ab2b7a`](https://github.com/apache/spark/commit/0ab2b7a8a859535a619eb99163a614d65ecd4d09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31432: [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default

2021-02-01 Thread GitBox


SparkQA commented on pull request #31432:
URL: https://github.com/apache/spark/pull/31432#issuecomment-771425039


   **[Test build #134762 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134762/testReport)**
 for PR 31432 at commit 
[`584dcbb`](https://github.com/apache/spark/commit/584dcbb82e2f203a679feb7c368d88d97a789e1a).






[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771424391


   **[Test build #134766 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134766/testReport)**
 for PR 31402 at commit 
[`bfc1af1`](https://github.com/apache/spark/commit/bfc1af12c9d2ece5c94414a4df4291da1aa265bd).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771420996


   Can one of the admins verify this patch?






[GitHub] [spark] SparkQA commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


SparkQA commented on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771423559


   **[Test build #134767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134767/testReport)**
 for PR 31433 at commit 
[`12be68c`](https://github.com/apache/spark/commit/12be68c17cd1d4d6a623e1773eb8c315589d7004).






[GitHub] [spark] HyukjinKwon commented on pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


HyukjinKwon commented on pull request #31421:
URL: https://github.com/apache/spark/pull/31421#issuecomment-771423568


   Merged to master.
   
   @gengliangwang would you mind changing the version and making backporting 
PRs please?






[GitHub] [spark] offthewall123 commented on a change in pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


offthewall123 commented on a change in pull request #31433:
URL: https://github.com/apache/spark/pull/31433#discussion_r568370590



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
##
@@ -20,12 +20,11 @@ package org.apache.spark.shuffle.sort
 import org.apache.spark._
 import org.apache.spark.internal.{config, Logging}
 import org.apache.spark.scheduler.MapStatus
-import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver, 
ShuffleWriter}
+import org.apache.spark.shuffle.{BaseShuffleHandle, ShuffleWriter}
 import org.apache.spark.shuffle.api.ShuffleExecutorComponents
 import org.apache.spark.util.collection.ExternalSorter
 
 private[spark] class SortShuffleWriter[K, V, C](
-shuffleBlockResolver: IndexShuffleBlockResolver,

Review comment:
   Yes, it's not used here. I think someone forgot to remove it when this part 
of the logic was separated into LocalDiskShuffleDataIO.








[GitHub] [spark] sunchao commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


sunchao commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568370098



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
 logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, 
" +
   s"open cost is considered as scanning $openCostInBytes bytes.")
 
+// Filter files with bucket pruning if possible
+lazy val ignoreCorruptFiles = 
fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+val canPrune: Path => Boolean = optionalBucketSet match {
+  case Some(bucketSet) =>
+filePath => {
+  BucketingUtils.getBucketId(filePath.getName) match {
+case Some(id) => bucketSet.get(id)
+case None =>
+  if (ignoreCorruptFiles) {
+// If ignoring corrupt file, do not prune when bucket file 
name is invalid

Review comment:
   > I feel either skipping or processing the file is no way perfect.
   
   Yes agreed. 
   
   > Given users explicitly disable bucketing here for reading the table, I 
would assume they want to read the table as a non-bucketed table, so they would 
like to read all of input files
   
   Good point. Although it seems a bit weird that someone would do this.
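   The predicate being debated above can be sketched standalone: a file whose 
name does not encode a bucket id is kept (not pruned) when corrupt files are 
tolerated, and skipped otherwise. This is an illustrative sketch, not the 
actual Spark internals; `BucketPruningSketch`, its bucket-name regex, and the 
parameter names are assumptions made for the example.

```scala
object BucketPruningSketch {
  // Roughly mimics BucketingUtils.getBucketId: bucket files carry a numeric
  // suffix such as "part-00000_00003.parquet" (pattern assumed for this sketch).
  private val bucketPattern = """.*_(\d+)(?:\..*)?$""".r

  def getBucketId(fileName: String): Option[Int] = fileName match {
    case bucketPattern(id) => Some(id.toInt)
    case _ => None
  }

  // bucketsToKeep plays the role of optionalBucketSet: ids that survive pruning.
  def shouldKeep(fileName: String,
                 bucketsToKeep: Set[Int],
                 ignoreCorruptFiles: Boolean): Boolean =
    getBucketId(fileName) match {
      case Some(id) => bucketsToKeep.contains(id)
      // Invalid bucket file name: only keep it when corrupt files are tolerated,
      // mirroring the "do not prune when bucket file name is invalid" comment.
      case None => ignoreCorruptFiles
    }

  def main(args: Array[String]): Unit = {
    assert(shouldKeep("part-00000_00003.parquet", Set(3), ignoreCorruptFiles = false))
    assert(!shouldKeep("part-00000_00007.parquet", Set(3), ignoreCorruptFiles = false))
    assert(shouldKeep("not-a-bucket-file", Set(3), ignoreCorruptFiles = true))
    assert(!shouldKeep("not-a-bucket-file", Set(3), ignoreCorruptFiles = false))
  }
}
```

The last two assertions capture the trade-off in the thread: with invalid 
names, neither skipping nor processing is clearly right, so the behavior is 
tied to the `ignoreCorruptFiles` setting.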








[GitHub] [spark] AmplabJenkins removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771416669


   Can one of the admins verify this patch?






[GitHub] [spark] SparkQA commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


SparkQA commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771422618


   **[Test build #134763 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134763/testReport)**
 for PR 31431 at commit 
[`62e4027`](https://github.com/apache/spark/commit/62e40275f4ab616d3089ea18e1037b7ccb2c4e4d).






[GitHub] [spark] HyukjinKwon closed pull request #31421: [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

2021-02-01 Thread GitBox


HyukjinKwon closed pull request #31421:
URL: https://github.com/apache/spark/pull/31421


   






[GitHub] [spark] AmplabJenkins commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771420996


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771411424


   Can one of the admins verify this patch?






[GitHub] [spark] dongjoon-hyun commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


dongjoon-hyun commented on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771419982


   ok to test






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #31433:
URL: https://github.com/apache/spark/pull/31433#discussion_r568367864



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
##
@@ -20,12 +20,11 @@ package org.apache.spark.shuffle.sort
 import org.apache.spark._
 import org.apache.spark.internal.{config, Logging}
 import org.apache.spark.scheduler.MapStatus
-import org.apache.spark.shuffle.{BaseShuffleHandle, IndexShuffleBlockResolver, 
ShuffleWriter}
+import org.apache.spark.shuffle.{BaseShuffleHandle, ShuffleWriter}
 import org.apache.spark.shuffle.api.ShuffleExecutorComponents
 import org.apache.spark.util.collection.ExternalSorter
 
 private[spark] class SortShuffleWriter[K, V, C](
-shuffleBlockResolver: IndexShuffleBlockResolver,

Review comment:
   So, this is an unused variable, @offthewall123?








[GitHub] [spark] SparkQA commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


SparkQA commented on pull request #31413:
URL: https://github.com/apache/spark/pull/31413#issuecomment-771418173


   **[Test build #134765 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134765/testReport)**
 for PR 31413 at commit 
[`9a6999d`](https://github.com/apache/spark/commit/9a6999d507073bcb56e1fe6266102fcfed6c884c).






[GitHub] [spark] SparkQA commented on pull request #31430: [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3

2021-02-01 Thread GitBox


SparkQA commented on pull request #31430:
URL: https://github.com/apache/spark/pull/31430#issuecomment-771417687


   **[Test build #134764 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134764/testReport)**
 for PR 31430 at commit 
[`8d43399`](https://github.com/apache/spark/commit/8d43399f5e8658354d911ac4469f0613ef8d06d5).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-770997062


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134735/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771410963










[GitHub] [spark] AmplabJenkins removed a comment on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31413:
URL: https://github.com/apache/spark/pull/31413#issuecomment-771410968


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39343/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771416669


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771410967


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39345/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


AmplabJenkins removed a comment on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771411511


   Can one of the admins verify this patch?






[GitHub] [spark] maropu commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`

2021-02-01 Thread GitBox


maropu commented on a change in pull request #30363:
URL: https://github.com/apache/spark/pull/30363#discussion_r568364686



##
File path: 
core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment:
   Yea, we might be able to brush it up based on the current implementation. But IMO it'd be better to avoid loading classes from a resource file.
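   For context, the resource-file mechanism under discussion works roughly like this. This is a minimal, hypothetical sketch, not Spark's actual `SparkConfRegisterLoader` code: each module would ship a resource file listing config-holder object class names, and loading a listed class with `initialize = true` runs its static initializer, which registers the configs it defines. Classes from optional modules that are not on the classpath are simply skipped.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of a registry loader (names are illustrative).
object ConfRegistryLoaderSketch {
  // Takes the lines of a resource file (one fully-qualified class name per
  // line) and force-initializes each class that is present on the classpath,
  // returning the names that were successfully loaded.
  def loadAll(resourceLines: Seq[String]): Seq[String] =
    resourceLines.map(_.trim).filter(_.nonEmpty).flatMap { className =>
      try {
        // initialize = true runs the static initializer, which is where
        // config entries would get registered.
        Class.forName(className, true, getClass.getClassLoader)
        Some(className)
      } catch {
        case _: ClassNotFoundException => None // optional module not loaded
        case NonFatal(_) => None
      }
    }
}
```

   The skip-on-`ClassNotFoundException` behavior is what makes the approach tolerant of optional modules, which is also its maintenance downside: a typo in the resource file fails silently.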








[GitHub] [spark] maropu commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


maropu commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771414229










[GitHub] [spark] MaxGekk commented on a change in pull request #31423: [SPARK-34314][SQL] Create new file index after partition schema inferring w/ the schema

2021-02-01 Thread GitBox


MaxGekk commented on a change in pull request #31423:
URL: https://github.com/apache/spark/pull/31423#discussion_r568362848



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##
@@ -413,9 +413,13 @@ case class DataSource(
     } else {
       val globbedPaths = checkAndGlobPathIfNecessary(
         checkEmptyGlobPath = true, checkFilesExist = checkFilesExist)
-      val index = createInMemoryFileIndex(globbedPaths)
+      val fileStatusCache = FileStatusCache.getOrCreate(sparkSession)
+      val indexInSchemaInferring = new InMemoryFileIndex(
+        sparkSession, globbedPaths, options, userSpecifiedSchema, fileStatusCache)
       val (resultDataSchema, resultPartitionSchema) =
-        getOrInferFileFormatSchema(format, () => index)
+        getOrInferFileFormatSchema(format, () => indexInSchemaInferring)
+      val index = new InMemoryFileIndex(
+        sparkSession, globbedPaths, options, Some(resultPartitionSchema), fileStatusCache)

Review comment:
   The new index re-uses the file status cache `fileStatusCache`, so this should avoid additional accesses to the file system.
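   The idea can be illustrated with a generic memoizing cache (a sketch only — this is not Spark's actual `FileStatusCache` API): both index builds consult the same cache object, so the second build's path listings are served from memory rather than triggering new file-system calls.

```scala
import scala.collection.mutable

// Generic listing cache: the first lookup for a key triggers a real load
// (standing in for a file-system listing); later lookups for the same key
// are served from memory.
class ListingCacheSketch[K, V](load: K => V) {
  private val cache = mutable.Map.empty[K, V]
  var loads = 0 // counts real (non-cached) load calls

  def get(key: K): V = cache.getOrElseUpdate(key, { loads += 1; load(key) })
}
```

   With a shared instance, a second "index build" over the same paths leaves the load counter unchanged, which is the cost saving the comment describes.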








[GitHub] [spark] turboFei removed a comment on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


turboFei removed a comment on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771405635


   For a temporary view, its logical plan is stored in the catalog and has already been analyzed.
   
   So even if we refresh its underlying table, the logical plan would not change.






[GitHub] [spark] MaxGekk commented on a change in pull request #31423: [SPARK-34314][SQL] Create new file index after partition schema inferring w/ the schema

2021-02-01 Thread GitBox


MaxGekk commented on a change in pull request #31423:
URL: https://github.com/apache/spark/pull/31423#discussion_r568362026



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala
##
@@ -370,7 +370,7 @@ class PartitionedTablePerfStatsSuite
   assert(HiveCatalogMetrics.METRIC_PARTITIONS_FETCHED.getCount() == 0)
 
   // reads and caches all the files initially
-  assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 5)
+  assert(HiveCatalogMetrics.METRIC_FILES_DISCOVERED.getCount() == 10)

Review comment:
   The second partition discovery should re-use file statuses from the cache if the cache is a NoOp cache.








[GitHub] [spark] SparkQA commented on pull request #31429: [SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

2021-02-01 Thread GitBox


SparkQA commented on pull request #31429:
URL: https://github.com/apache/spark/pull/31429#issuecomment-771412722


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39346/
   






[GitHub] [spark] maropu commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


maropu commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-771412232


   Does this PR support all types in PostgreSQL? Are there unsupported types?






[GitHub] [spark] AmplabJenkins commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771411511


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771411424


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771410963










[GitHub] [spark] AmplabJenkins commented on pull request #31204: [SPARK-26399][WEBUI][CORE] Add new stage-level REST APIs and parameters

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31204:
URL: https://github.com/apache/spark/pull/31204#issuecomment-771410967


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39345/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


AmplabJenkins commented on pull request #31413:
URL: https://github.com/apache/spark/pull/31413#issuecomment-771410968


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39343/
   






[GitHub] [spark] maropu commented on a change in pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-01 Thread GitBox


maropu commented on a change in pull request #31419:
URL: https://github.com/apache/spark/pull/31419#discussion_r568360410



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala
##
@@ -74,7 +74,9 @@ class PostgresIntegrationSuite extends 
DockerJDBCIntegrationSuite {
   .executeUpdate()
 
     conn.prepareStatement("CREATE TABLE st_with_array (c0 uuid, c1 inet, c2 cidr," +
-      "c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[])")
+      "c3 json, c4 jsonb, c5 uuid[], c6 inet[], c7 cidr[], c8 json[], c9 jsonb[], c10 xml[], " +

Review comment:
   How about adding tests for non-array cases, e.g., `col xml` and `col tsvector`?








[GitHub] [spark] Ngone51 commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-02-01 Thread GitBox


Ngone51 commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-771410418


   kindly ping @holdenk 






[GitHub] [spark] Ngone51 commented on pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes

2021-02-01 Thread GitBox


Ngone51 commented on pull request #31249:
URL: https://github.com/apache/spark/pull/31249#issuecomment-771409987


   I left some minor comments. Overall, looks good to me.






[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


c21 commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568359233



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
     logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
       s"open cost is considered as scanning $openCostInBytes bytes.")
 
+    // Filter files with bucket pruning if possible
+    lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+    val canPrune: Path => Boolean = optionalBucketSet match {
+      case Some(bucketSet) =>
+        filePath => {
+          BucketingUtils.getBucketId(filePath.getName) match {
+            case Some(id) => bucketSet.get(id)
+            case None =>
+              if (ignoreCorruptFiles) {
+                // If ignoring corrupt file, do not prune when bucket file name is invalid

Review comment:
   Given that users explicitly disable bucketing here for reading the table, I would assume they want to read it as a non-bucketed table, so they would like to read all of the input files, no? cc @viirya what's the use case you are thinking of here? Thanks.
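   The pruning decision being debated can be sketched as follows. This is an illustrative sketch, not Spark's actual `BucketingUtils`; it assumes Spark's convention that a bucket file name ends with `_<bucketId>` before any extensions (e.g. `part-00000-<uuid>_00003.c000.snappy.parquet`), and the `keepInvalid` flag stands in for the contested `ignoreCorruptFiles` behavior:

```scala
// Sketch of bucket-file pruning: extract the bucket id from the file name
// and keep only files whose id is in the wanted bucket set. A file whose
// name does not encode a bucket id is kept or dropped per keepInvalid —
// that policy choice is exactly the point under discussion.
object BucketPruneSketch {
  private val bucketedName = """.*_(\d+)(?:\..*)?$""".r

  def bucketId(fileName: String): Option[Int] = fileName match {
    case bucketedName(id) => Some(id.toInt)
    case _ => None
  }

  def keep(fileName: String, wanted: Set[Int], keepInvalid: Boolean): Boolean =
    bucketId(fileName) match {
      case Some(id) => wanted.contains(id)
      case None => keepInvalid // invalid name: policy decides
    }
}
```

   With `keepInvalid = true`, garbage files dumped into a bucketed directory are still scanned; with `keepInvalid = false`, they are silently pruned — which is why neither choice is obviously right.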








[GitHub] [spark] offthewall123 commented on pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


offthewall123 commented on pull request #31433:
URL: https://github.com/apache/spark/pull/31433#issuecomment-771408356


   @srowen @HeartSaVioR Please take a look, thanks.






[GitHub] [spark] Ngone51 commented on a change in pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes

2021-02-01 Thread GitBox


Ngone51 commented on a change in pull request #31249:
URL: https://github.com/apache/spark/pull/31249#discussion_r568358423



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala
##
@@ -554,6 +554,50 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock
     verify(allocationClientMock).killExecutorsOnHost("hostA")
   }
 
+  test("excluding decommission and kills executors when enabled") {
+    val allocationClientMock = mock[ExecutorAllocationClient]
+
+    // verify we decommission when configured
+    conf.set(config.EXCLUDE_ON_FAILURE_KILL_ENABLED, true)
+    conf.set(config.DECOMMISSION_ENABLED.key, "true")
+    conf.set(config.EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED.key, "true")
+    conf.set(config.MAX_FAILURES_PER_EXEC.key, "1")
+    conf.set(config.MAX_FAILED_EXEC_PER_NODE.key, "2")
+    healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock)
+
+    // Fail 4 tasks in one task set on executor 1, so that executor gets excluded for the whole
+    // application.
+    val taskSetExclude2 = createTaskSetExcludelist(stageId = 0)
+    (0 until 4).foreach { partition =>
+      taskSetExclude2.updateExcludedForFailedTask(
+        "hostA", exec = "1", index = partition, failureReason = "testing")
+    }
+    healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude2.execToFailures)
+
+    val msg1 =
+      "Killing excluded executor id 1 since spark.excludeOnFailure.killExcludedExecutors is set."
+
+    verify(allocationClientMock).decommissionExecutor(
+      "1", ExecutorDecommissionInfo(msg1), false)
+
+    val taskSetExclude3 = createTaskSetExcludelist(stageId = 1)
+    // Fail 4 tasks in one task set on executor 2, so that executor gets excluded for the whole
+    // application.  Since that's the second executor that is excluded on the same node, we also
+    // exclude that node.
+    (0 until 4).foreach { partition =>
+      taskSetExclude3.updateExcludedForFailedTask(
+        "hostA", exec = "2", index = partition, failureReason = "testing")
+    }
+    healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude3.execToFailures)
+
+    val msg2 =
+      "Killing excluded executor id 2 since spark.excludeOnFailure.killExcludedExecutors is set."
+    verify(allocationClientMock).decommissionExecutor(
+      "2", ExecutorDecommissionInfo(msg2), false, false)
+    verify(allocationClientMock).decommissionExecutorsOnHost(
+      "hostA")

Review comment:
   nit: turn into one line?








[GitHub] [spark] offthewall123 opened a new pull request #31433: [SPARK-34325][CORE] remove_shuffleBlockResolver_in_SortShuffleWriter

2021-02-01 Thread GitBox


offthewall123 opened a new pull request #31433:
URL: https://github.com/apache/spark/pull/31433


   
   
   ### What changes were proposed in this pull request?
   Remove shuffleBlockResolver in SortShuffleWriter.
   
   
   ### Why are the changes needed?
   For better code understanding.
   
   ### Does this PR introduce _any_ user-facing change?
   NO.
   
   
   ### How was this patch tested?
   End-to-end.






[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


c21 commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568357088



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
     logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
       s"open cost is considered as scanning $openCostInBytes bytes.")
 
+    // Filter files with bucket pruning if possible
+    lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+    val canPrune: Path => Boolean = optionalBucketSet match {
+      case Some(bucketSet) =>
+        filePath => {
+          BucketingUtils.getBucketId(filePath.getName) match {
+            case Some(id) => bucketSet.get(id)
+            case None =>
+              if (ignoreCorruptFiles) {
+                // If ignoring corrupt file, do not prune when bucket file name is invalid

Review comment:
   I feel neither skipping nor processing the file is perfect. There can be other corruption cases, e.g. a table specified with 1024 buckets that only has 500 files underneath. This could be due to other compute engines or users accidentally dumping data here without respecting Spark's bucketing metadata. We have no efficient way to handle the case where the number of files is fewer than the number of buckets.
   
   The existing usage of `ignoreCorruptFiles` [skips reading some of the content of a file](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L160), so it's also not completely ignoring the file. But I am fine if we think we need another config name for this.








[GitHub] [spark] turboFei commented on pull request #31431: [SPARK-34322][SQL] When refreshing a non-temporary view, also refresh its underlying tables

2021-02-01 Thread GitBox


turboFei commented on pull request #31431:
URL: https://github.com/apache/spark/pull/31431#issuecomment-771405635


   For a temporary view, its logical plan is stored in the catalog and has already been analyzed.
   
   So even if we refresh its underlying table, the logical plan would not change.






[GitHub] [spark] linhongliu-db commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`

2021-02-01 Thread GitBox


linhongliu-db commented on a change in pull request #30363:
URL: https://github.com/apache/spark/pull/30363#discussion_r568356352



##
File path: 
core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment:
   @maropu, awesome! I was not aware of this in `PythonSQLUtils`. But I'm still worried about the maintenance burden if other external modules also register SQLConf entries. And this code assumes the hive module is already included, which is not suitable for all other modules.








[GitHub] [spark] Ngone51 commented on a change in pull request #31249: [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes

2021-02-01 Thread GitBox


Ngone51 commented on a change in pull request #31249:
URL: https://github.com/apache/spark/pull/31249#discussion_r568356238



##
File path: 
core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala
##
@@ -554,6 +554,50 @@ class HealthTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with Mock
     verify(allocationClientMock).killExecutorsOnHost("hostA")
   }
 
+  test("excluding decommission and kills executors when enabled") {
+    val allocationClientMock = mock[ExecutorAllocationClient]
+
+    // verify we decommission when configured
+    conf.set(config.EXCLUDE_ON_FAILURE_KILL_ENABLED, true)
+    conf.set(config.DECOMMISSION_ENABLED.key, "true")
+    conf.set(config.EXCLUDE_ON_FAILURE_DECOMMISSION_ENABLED.key, "true")
+    conf.set(config.MAX_FAILURES_PER_EXEC.key, "1")
+    conf.set(config.MAX_FAILED_EXEC_PER_NODE.key, "2")
+    healthTracker = new HealthTracker(listenerBusMock, conf, Some(allocationClientMock), clock)
+
+    // Fail 4 tasks in one task set on executor 1, so that executor gets excluded for the whole
+    // application.
+    val taskSetExclude2 = createTaskSetExcludelist(stageId = 0)
+    (0 until 4).foreach { partition =>
+      taskSetExclude2.updateExcludedForFailedTask(
+        "hostA", exec = "1", index = partition, failureReason = "testing")
+    }
+    healthTracker.updateExcludedForSuccessfulTaskSet(0, 0, taskSetExclude2.execToFailures)
+
+    val msg1 =
+      "Killing excluded executor id 1 since spark.excludeOnFailure.killExcludedExecutors is set."

Review comment:
   Shall we include a decommission hint in the message (in `killExecutor()`) when it's enabled? e.g., "Killing (decommission) excluded executor..."








[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor

2021-02-01 Thread GitBox


igreenfield commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-771405161


   @alefischer13 Hi, please look at this PR, which was merged later and changed the way you need to configure `log4j.properties`: #28801






[GitHub] [spark] maropu commented on pull request #31430: [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3

2021-02-01 Thread GitBox


maropu commented on pull request #31430:
URL: https://github.com/apache/spark/pull/31430#issuecomment-771403087


   LGTM. Nit: could you add a link to the release notes of `1.4.8-3` in the PR description?






[GitHub] [spark] sunchao commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


sunchao commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568353916



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
     logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
       s"open cost is considered as scanning $openCostInBytes bytes.")
 
+    // Filter files with bucket pruning if possible
+    lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+    val canPrune: Path => Boolean = optionalBucketSet match {
+      case Some(bucketSet) =>
+        filePath => {
+          BucketingUtils.getBucketId(filePath.getName) match {
+            case Some(id) => bucketSet.get(id)
+            case None =>
+              if (ignoreCorruptFiles) {
+                // If ignoring corrupt file, do not prune when bucket file name is invalid

Review comment:
   Cool, thanks for pointing to the discussion. I'm just not sure whether the corrupted file should be ignored or processed when the flag is turned on. `ignoreCorruptFiles` seems to indicate that the problematic file should be ignored, so it is a bit confusing that we still process it here. Also, IMO ignoring it seems slightly safer (think of someone dumping garbage files into the bucketed partition dir)?
   
   cc @maropu @viirya 
   
   cc @maropu @viirya 








[GitHub] [spark] SparkQA removed a comment on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA removed a comment on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771366254


   **[Test build #134758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134758/testReport)** for PR 31402 at commit [`5c4c256`](https://github.com/apache/spark/commit/5c4c2567fcbde7d8249dc18947ed283de0e287d9).






[GitHub] [spark] SparkQA commented on pull request #31402: [SPARK-34296][SQL] AggregateWindowFunction frame should not always use UnboundedPreceding

2021-02-01 Thread GitBox


SparkQA commented on pull request #31402:
URL: https://github.com/apache/spark/pull/31402#issuecomment-771401976


   **[Test build #134758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134758/testReport)** for PR 31402 at commit [`5c4c256`](https://github.com/apache/spark/commit/5c4c2567fcbde7d8249dc18947ed283de0e287d9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `abstract class RowNumberLike extends AggregateWindowFunction `






[GitHub] [spark] dongjoon-hyun commented on pull request #31428: [SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap

2021-02-01 Thread GitBox


dongjoon-hyun commented on pull request #31428:
URL: https://github.com/apache/spark/pull/31428#issuecomment-771397570


   Thank you, @ScrapCodes . Yes, for security, some namespaces do not allow the 
driver pod to create new ConfigMaps. As a result, Spark jobs cannot be 
executed in those namespaces. This unblocks Spark jobs in those namespaces. As I 
wrote in the PR description, this doesn't aim to hurt the functionality of the 
original contribution. Instead, this PR aims to provide a workaround for those 
security-enforced namespaces.






[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


c21 commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568353327



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
 logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
   s"open cost is considered as scanning $openCostInBytes bytes.")
 
+// Filter files with bucket pruning if possible
+lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+val canPrune: Path => Boolean = optionalBucketSet match {

Review comment:
   Changed to `shouldProcess` as I feel `shouldNotPrune` is hard to reason 
about.








[GitHub] [spark] viirya opened a new pull request #31432: [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default

2021-02-01 Thread GitBox


viirya opened a new pull request #31432:
URL: https://github.com/apache/spark/pull/31432


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch proposes to remove `TRUNCATE` from the default `capabilities` 
list from `FileTable`.
   
   ### Why are the changes needed?
   
   
   The abstract class `FileTable` currently lists `TRUNCATE` in its `capabilities`, 
but `FileTable` does not know whether an implementation really supports truncation 
or not. Checking the existing `FileTable` implementations, including `AvroTable`, 
`CSVTable`, `JsonTable`, etc., none of them actually implements `SupportsTruncate` 
in its writer builder.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No, because it seems to me `FileTable` is not a user-facing public API.
   
   ### How was this patch tested?
   
   
   Existing unit tests.
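The capability change described above can be sketched with a toy model: the base class advertises only what every implementation supports, and a table that really supports truncation opts in explicitly. `Capability` and the classes below are simplified stand-ins for Spark's connector API, not the real interfaces.

```scala
object CapabilitySketch {
  sealed trait Capability
  case object BatchRead extends Capability
  case object BatchWrite extends Capability
  case object Truncate extends Capability

  // After the change, the base file table only advertises capabilities
  // that every implementation actually supports.
  abstract class FileTable {
    def capabilities: Set[Capability] = Set(BatchRead, BatchWrite)
  }

  // A typical file source inherits the safe defaults: no TRUNCATE.
  class CsvTable extends FileTable

  // An implementation that really supports truncation opts in explicitly.
  class TruncatableTable extends FileTable {
    override def capabilities: Set[Capability] = super.capabilities + Truncate
  }
}
```

This keeps the contract honest: a planner checking for `Truncate` will no longer be misled by a default that the concrete writer builder cannot back up.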






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31378: [SPARK-34240][SQL] Unify output of SHOW TBLPROPERTIES pass output attribute properly

2021-02-01 Thread GitBox


AngersZhuuuu commented on a change in pull request #31378:
URL: https://github.com/apache/spark/pull/31378#discussion_r568352532



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
##
@@ -129,7 +129,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleto
 
 checkAnswer(
   sql(s"SHOW TBLPROPERTIES parquet_tab1('my_key1')"),
-  Row("v1") :: Nil
+  Row("my_key1", "v1") :: Nil

Review comment:
   > > then shall we change the schema of the v2 command?
   > 
   > IMO, changing `ShowTablePropertiesCommand` is correct since Hive's output 
schema is `prpt_name` and `prpt_value`; I think Hive was wrong before.
   
   Also pinging @wangyum @beliefer @viirya — I'd appreciate your suggestions, since I 
believe it's a Hive bug and we don't need to follow the wrong behavior of Hive 
1.x and 2.x.








[GitHub] [spark] maropu commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


maropu commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568352335



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
 logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
   s"open cost is considered as scanning $openCostInBytes bytes.")
 
+// Filter files with bucket pruning if possible
+lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+val canPrune: Path => Boolean = optionalBucketSet match {

Review comment:
   yea, okay.








[GitHub] [spark] ScrapCodes commented on pull request #31428: [SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap

2021-02-01 Thread GitBox


ScrapCodes commented on pull request #31428:
URL: https://github.com/apache/spark/pull/31428#issuecomment-771399674


   Ahh, makes sense then. Wish we could autodetect and fall back, too.






[GitHub] [spark] c21 commented on a change in pull request #31413: [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

2021-02-01 Thread GitBox


c21 commented on a change in pull request #31413:
URL: https://github.com/apache/spark/pull/31413#discussion_r568351355



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -591,20 +590,48 @@ case class FileSourceScanExec(
 logInfo(s"Planning scan with bin packing, max size: $maxSplitBytes bytes, " +
   s"open cost is considered as scanning $openCostInBytes bytes.")
 
+// Filter files with bucket pruning if possible
+lazy val ignoreCorruptFiles = fsRelation.sparkSession.sessionState.conf.ignoreCorruptFiles
+val canPrune: Path => Boolean = optionalBucketSet match {
+  case Some(bucketSet) =>
+filePath => {
+  BucketingUtils.getBucketId(filePath.getName) match {
+case Some(id) => bucketSet.get(id)
+case None =>
+  if (ignoreCorruptFiles) {
+// If ignoring corrupt file, do not prune when bucket file name is invalid

Review comment:
   @sunchao - this is newly introduced. Updated PR description.
   
   > Also I'm not sure if this is the best choice: if a bucketed table is 
corrupted, should we read the corrupt file? It will likely lead to incorrect 
results. On the other hand we can choose to ignore the file, which seems to be 
more aligned with the name of the config, although the result could still be 
incorrect.
   
   Note that by default the exception will be thrown here and the query will fail 
loudly. We allow a config here to help existing users work around it if they 
want. See the relevant discussion in 
https://github.com/apache/spark/pull/31413#discussion_r567623746 .
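The behavior under discussion can be sketched as a standalone predicate. This is a hypothetical simplification: `parseBucketId` stands in for Spark's `BucketingUtils.getBucketId` (which extracts the bucket id from the file name), and the corrupt-file branch mirrors the PR's choice of failing loudly by default while letting `ignoreCorruptFiles` opt into still reading the file.

```scala
object BucketPruningSketch {
  // Bucketed file names end with an underscore-separated bucket id,
  // e.g. "part-00000_00003.c000" belongs to bucket 3.
  private val bucketPattern = "_(\\d+)(?:\\..*)?$".r

  def parseBucketId(fileName: String): Option[Int] =
    bucketPattern.findFirstMatchIn(fileName).map(_.group(1).toInt)

  // Returns true when the file should be read: its bucket id is in the
  // selected set, or the id is unparsable and the user opted to tolerate it.
  def shouldProcess(selected: Set[Int], ignoreCorruptFiles: Boolean)
                   (fileName: String): Boolean =
    parseBucketId(fileName) match {
      case Some(id) => selected.contains(id)
      case None =>
        // Invalid bucket file name: by default fail the query loudly;
        // with ignoreCorruptFiles set, do not prune and let downstream
        // handling deal with the file.
        if (ignoreCorruptFiles) true
        else throw new IllegalStateException(s"Invalid bucket file: $fileName")
    }
}
```

For example, `shouldProcess(Set(3), ignoreCorruptFiles = true)("part-00000_00003.c000")` keeps the file, while a garbage file name in the same directory is still read rather than silently pruned — which is exactly the trade-off the reviewers are debating.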








[GitHub] [spark] maropu commented on a change in pull request #30363: [SPARK-33438][SQL] Eagerly init all SQLConf objects for command `set -v`

2021-02-01 Thread GitBox


maropu commented on a change in pull request #30363:
URL: https://github.com/apache/spark/pull/30363#discussion_r568350914



##
File path: 
core/src/main/scala/org/apache/spark/util/SparkConfRegisterLoader.scala
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.net.URL
+
+import scala.collection.JavaConverters._
+import scala.io.Source
+import scala.reflect.runtime.{universe => ru}
+import scala.util.control.NonFatal
+
+import org.apache.spark.internal.Logging
+
+object SparkConfRegisterLoader extends Logging {

Review comment:
   Why do we need this class for initializing configs eagerly? Can't we simply 
load them in `SetCommand` as `PythonSQLUtils` does?
   
https://github.com/apache/spark/blob/d99d0d27be875bba692bcfe376f90c930e170380/sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala#L48-L58
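The alternative the reviewer points to relies on a Scala property: a config object's entries are registered as a side effect of its initializer, which runs the first time the object is referenced. A minimal sketch, with `register` and the two config objects as made-up stand-ins for Spark's `ConfigEntry` machinery:

```scala
object EagerConfSketch {
  import scala.collection.mutable

  // Global registry that config objects populate on initialization,
  // analogous to SQLConf's entry registry.
  val registry = mutable.Map.empty[String, String]

  private def register(key: String, default: String): String = {
    registry(key) = default
    key
  }

  // Config objects register their entries in their object initializers,
  // which run lazily on first reference.
  object CoreConf  { val MAX_RETRIES = register("spark.core.maxRetries", "3") }
  object ExtraConf { val FOO = register("spark.extra.foo", "bar") }

  // Eagerly touch every known config object so a `set -v`-style listing
  // sees all entries; referencing an object forces its initializer.
  def initAll(): Unit = Seq(CoreConf, ExtraConf).foreach(identity)
}
```

This is the `PythonSQLUtils`-style approach: simple, but it requires hard-coding the list of config objects, which is the limitation the resource-file loader in this PR tries to avoid for optional and vendor modules.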








  1   2   3   4   5   6   7   8   >