[GitHub] [spark] cloud-fan commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33429:
URL: https://github.com/apache/spark/pull/33429#discussion_r672855707



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -139,21 +139,21 @@ class AdaptiveQueryExecSuite
 }
   }
 
-  private def checkNumLocalShuffleReaders(
-  plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Unit = {
+  private def checkNumLocalShuffleReads(

Review comment:
   or "num shuffles with local read", but it's longer




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883137733


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45812/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33431:
URL: https://github.com/apache/spark/pull/33431#issuecomment-883137742


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45809/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883088693


   **[Test build #141302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)**
 for PR 33416 at commit 
[`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883137735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141302/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883137734


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45811/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883137739


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141293/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883088661


   **[Test build #141301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)**
 for PR 33428 at commit 
[`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883137741


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883137732


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45810/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33429:
URL: https://github.com/apache/spark/pull/33429#discussion_r672855337



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -139,21 +139,21 @@ class AdaptiveQueryExecSuite
 }
   }
 
-  private def checkNumLocalShuffleReaders(
-  plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Unit = {
+  private def checkNumLocalShuffleReads(

Review comment:
   "num readers" seems more natural?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on pull request #33426: [SPARK-32920][FOLLOW-UP] Fix shuffleMergeFinalized directly calling rdd.getNumPartitions as RDD is not serialized to executor

2021-07-19 Thread GitBox


otterc commented on pull request #33426:
URL: https://github.com/apache/spark/pull/33426#issuecomment-883139314


   cc @Ngone51 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883139077


   **[Test build #141306 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141306/testReport)**
 for PR 33239 at commit 
[`b2f646c`](https://github.com/apache/spark/commit/b2f646c4eb33db410a5144a73e4676e52d5bc64c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-19 Thread GitBox


SparkQA commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-883138761


   **[Test build #141305 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141305/testReport)**
 for PR 33432 at commit 
[`294db33`](https://github.com/apache/spark/commit/294db339033b11ff8ccb14776f14f330ed77bf50).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33431:
URL: https://github.com/apache/spark/pull/33431#issuecomment-883137742


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45809/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883137739


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141293/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883137735


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141302/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883137734


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45811/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883137732


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45810/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883137741


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141301/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883137733


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45812/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #33382: [SPARK-36137][SQL] HiveShim should fallback to getAllPartitionsOf even if directSQL is enabled in remote HMS

2021-07-19 Thread GitBox


sunchao commented on pull request #33382:
URL: https://github.com/apache/spark/pull/33382#issuecomment-883134666


   I feel adding one more config on top of the existing `tryDirectSql` may make 
it too complex. What if we introduce a new config and use that to decide 
whether Spark should fallback to call `getAllPartitionsMethod`? and default it 
to true but with a helpful message to tell users that they can switch it off to 
fail the query instead if they really want so?
   
   This should only affect those queries in a few scenarios which they used to 
fail but now can succeed, which IMO is a better outcome. 
   
   For instance:
   ```scala
   val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
   val shouldFallback = 
SQLConf.get.metastorePartitionPruningFallbackOnException
   try {
 getPartitionsByFilterMethod.invoke(hive, table, filter)
   .asInstanceOf[JArrayList[Partition]]
   } catch {
 case ex: InvocationTargetException if 
ex.getCause.isInstanceOf[MetaException] &&
 shouldFallback =>
   logWarning("Caught Hive MetaException attempting to get 
partition metadata by " +
 "filter from Hive. Falling back to fetching all partition 
metadata, which will " +
 "degrade performance. Modifying your Hive metastore 
configuration to set " +
 s"${tryDirectSqlConfVar.varname} to true (if it is not true 
already) may resolve " +
 "this problem. Otherwise, you can set " +
 
s"${SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ON_EXCEPTION.key} " +
 " to false and let the query fail instead.", ex)
   getAllPartitionsMethod.invoke(hive, 
table).asInstanceOf[JSet[Partition]]
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ

2021-07-19 Thread GitBox


SparkQA commented on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883131348


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45810/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883129078


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45812/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sarutak commented on pull request #31096: [SPARK-34051][SQL] Support 32-bit unicode escape in string literals

2021-07-19 Thread GitBox


sarutak commented on pull request #31096:
URL: https://github.com/apache/spark/pull/31096#issuecomment-883126736


   @cloud-fan O.K, will do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition

2021-07-19 Thread GitBox


SparkQA commented on pull request #33431:
URL: https://github.com/apache/spark/pull/33431#issuecomment-883125681


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45809/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


viirya commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672846173



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
##
@@ -79,9 +81,15 @@ abstract class FileFormatDataWriter(
 
   /** Write an iterator of records. */
   def writeWithIterator(iterator: Iterator[InternalRow]): Unit = {
+var count = 0L

Review comment:
   sounds good. let me update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


SparkQA commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883118578


   **[Test build #141301 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)**
 for PR 33428 at commit 
[`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog

2021-07-19 Thread GitBox


SparkQA commented on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883118256


   **[Test build #141302 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)**
 for PR 33416 at commit 
[`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #31096: [SPARK-34051][SQL] Support 32-bit unicode escape in string literals

2021-07-19 Thread GitBox


cloud-fan commented on pull request #31096:
URL: https://github.com/apache/spark/pull/31096#issuecomment-883115343


   Can we document this feature in `sql-ref-literals.md`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883113962


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45811/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


sunchao commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672830123



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/Write.java
##
@@ -62,4 +63,13 @@ default BatchWrite toBatch() {
   default StreamingWrite toStreaming() {
 throw new UnsupportedOperationException(description() + ": Streaming write 
is not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {
+CustomMetric[] NO_METRICS = {};
+return NO_METRICS;

Review comment:
   ditto

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -276,7 +279,12 @@ case class OverwritePartitionsDynamicExec(
 case class WriteToDataSourceV2Exec(
 batchWrite: BatchWrite,
 refreshCache: () => Unit,
-query: SparkPlan) extends V2TableWriteExec {
+query: SparkPlan,
+writeMetrics: Seq[CustomMetric]) extends V2TableWriteExec {
+
+  override val customMetrics = writeMetrics.map { customMetric =>

Review comment:
   nit: add type annotation to public member of the class

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
##
@@ -79,9 +81,15 @@ abstract class FileFormatDataWriter(
 
   /** Write an iterator of records. */
   def writeWithIterator(iterator: Iterator[InternalRow]): Unit = {
+var count = 0L

Review comment:
   could we have something like:
   ```scala
   def writeWithMetrics(record: InternalRow): Unit = {
 if (count % CustomMetrics.NUM_ROWS_PER_UPDATE == 0) {
   CustomMetrics.updateMetrics(currentMetricsValues, customMetrics)
 }
 count += 1
 write(record)
   }
   ```
   and then update all the places to use this? instead of replicating the code 
in several places?
   

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
##
@@ -292,6 +300,10 @@ trait V2ExistingTableWriteExec extends V2TableWriteExec {
   def refreshCache: () => Unit
   def write: Write
 
+  override val customMetrics = write.supportedCustomMetrics().map { 
customMetric =>

Review comment:
   ditto

##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DataWriter.java
##
@@ -104,4 +105,13 @@
* @throws IOException if failure happens during disk/network IO like 
writing files.
*/
   void abort() throws IOException;
+
+  /**
+   * Returns an array of custom task metrics. By default it returns empty 
array. Note that it is
+   * not recommended to put heavy logic in this method as it may affect 
writing performance.
+   */
+  default CustomTaskMetric[] currentMetricsValues() {
+CustomTaskMetric[] NO_METRICS = {};
+return NO_METRICS;

Review comment:
   nit: can we just do it in one line: `return new CustomTaskMetric[]{};`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883031312


   **[Test build #141293 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141293/testReport)**
 for PR 33350 at commit 
[`7473aea`](https://github.com/apache/spark/commit/7473aea9aa91586366206b7a01ed3b6e11f7236a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


SparkQA commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883106788


   **[Test build #141293 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141293/testReport)**
 for PR 33350 at commit 
[`7473aea`](https://github.com/apache/spark/commit/7473aea9aa91586366206b7a01ed3b6e11f7236a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


AngersZh commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883104576


   > The error code is the following. It looks like OOM happening again.
   > 
   > * 
https://github.com/AngersZh/spark/runs/3110053861?check_suite_focus=true
   > 
   > ```
   >  ./build/mvn: line 178:  1699 Killed  "${MVN_BIN}" "$@"
   > 2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137.
   > ```
   
   Yea, should not be related to my pr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu edited a comment on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


AngersZh edited a comment on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883104093


   > Thank you for pinging me, @AngersZh . BTW, you should use `[FOLLOWUP]` 
when you reuse the JIRA id.
   
   H, it's not a reuse jira. I write wrong jira id  `SPARK-36021` first  
and change to correct `SPARK-36201`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


AngersZh commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883104093


   > Thank you for pinging me, @AngersZh . BTW, you should use `[FOLLOWUP]` 
when you reuse the JIRA id.
   
   H, it's not a reuse jira. I write wrong jira id to `SPARK-36021` and 
change to correct `SPARK-36201`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883100612


   **[Test build #141304 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141304/testReport)**
 for PR 33429 at commit 
[`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


viirya commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672832410



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala
##
@@ -51,7 +51,7 @@ object CustomMetrics {
   currentMetricsValues: Seq[CustomTaskMetric],
   customMetrics: Map[String, SQLMetric]): Unit = {
 currentMetricsValues.foreach { metric =>
-  customMetrics(metric.name()).set(metric.value())
+  customMetrics.get(metric.name()).map(_.set(metric.value()))

Review comment:
   Added test and updated the method doc. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672831210



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -1025,17 +938,23 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
   /**
* Validates the options used for alter table commands after table and 
columns are resolved.
*/
-  private def checkAlterTableCommand(alter: AlterTableCommand): Unit = {
-def checkColumnNotExists(fieldNames: Seq[String], struct: StructType): 
Unit = {
+  private def checkAlterTableColumnCommand(alter: AlterTableColumnCommand): 
Unit = {
+def checkColumnNotExists(op: String, fieldNames: Seq[String], struct: 
StructType): Unit = {
   if (struct.findNestedField(fieldNames, includeCollections = 
true).isDefined) {
-alter.failAnalysis(s"Cannot ${alter.operation} column, because 
${fieldNames.quoted} " +
+alter.failAnalysis(s"Cannot $op column, because ${fieldNames.quoted} " 
+
   s"already exists in ${struct.treeString}")
   }
 }
 
 alter match {
+  case AlterTableAddColumns(table: ResolvedTable, colsToAdd) =>
+colsToAdd.foreach { colToAdd =>
+  checkColumnNotExists("add", colToAdd.name, table.schema)

Review comment:
   What if the to-be-added columns have duplicated names? e.g. `ADD COLUMNS 
a ..., a ...`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


HyukjinKwon commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883097397


   retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672830617



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   for top-level columns, `path` can be `ResolvedFieldName(Nil, 
StructField("root", tableSchema))` 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672830076



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   Or, we can change `QualifiedColType`
   ```
   case class QualifiedColType(
 path: FieldName,
 colName: String,
 ...
   )
   ```
   
   then we can keep the `Origin` of the column name, and save the resolution of 
the path at least.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   Or, we can change `QualifiedColType`
   ```
   case class QualifiedColType(
 path: FieldName,
 colName: String,
 ...
   )
   ```
   
   then we can keep the `Origin` of the column name, and resolve the path at 
most once.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883095290


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45812/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883094329


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141300/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883088620


   **[Test build #141300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)**
 for PR 33429 at commit 
[`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883094329


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141300/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883094271


   **[Test build #141300 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)**
 for PR 33429 at commit 
[`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait AQEShuffleReadRule extends Rule[SparkPlan] `
 * `case class CoalesceShufflePartitions(session: SparkSession) extends 
AQEShuffleReadRule `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672827401



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   This looks fine. We can remove the `if hasUnresolvedColumns(cols)` which 
is not very useful here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ

2021-07-19 Thread GitBox


SparkQA commented on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883090499


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45810/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition

2021-07-19 Thread GitBox


SparkQA commented on pull request #33431:
URL: https://github.com/apache/spark/pull/33431#issuecomment-883090213


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45809/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-882968356


   **[Test build #141288 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)**
 for PR 33424 at commit 
[`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883011701


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672821019



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   maybe we should do
   ```
   case class QualifiedColType {
 path: Option[FieldName], // None for top-level columns
 colName: String,
 ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33410: [WIP][SPARK-36204][INFRA][BUILD] Deduplicate Scala 2.13 daily build

2021-07-19 Thread GitBox


SparkQA commented on pull request #33410:
URL: https://github.com/apache/spark/pull/33410#issuecomment-883088778


   **[Test build #141303 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141303/testReport)**
 for PR 33410 at commit 
[`48bfb39`](https://github.com/apache/spark/commit/48bfb39f0d3a8d15614998297ba56addf3b756b5).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog

2021-07-19 Thread GitBox


SparkQA commented on pull request #33416:
URL: https://github.com/apache/spark/pull/33416#issuecomment-883088693


   **[Test build #141302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)**
 for PR 33416 at commit 
[`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883088620


   **[Test build #141300 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)**
 for PR 33429 at commit 
[`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


SparkQA commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883088661


   **[Test build #141301 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)**
 for PR 33428 at commit 
[`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-19 Thread GitBox


SparkQA commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-883088569


   **[Test build #141299 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141299/testReport)**
 for PR 33432 at commit 
[`dfabd0f`](https://github.com/apache/spark/commit/dfabd0fce7e9079cd66e75be0eb02a1c814c8b0b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883084988


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45811/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec

2021-07-19 Thread GitBox


SparkQA commented on pull request #33424:
URL: https://github.com/apache/spark/pull/33424#issuecomment-883082141


   **[Test build #141288 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)**
 for PR 33424 at commit 
[`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] otterc commented on a change in pull request #33425: [SPARK-32919][FOLLOW-UP] Filter out driver in the merger locations and fix the return type of RemoveShufflePushMergerLocations

2021-07-19 Thread GitBox


otterc commented on a change in pull request #33425:
URL: https://github.com/apache/spark/pull/33425#discussion_r672821698



##
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##
@@ -2093,6 +2093,9 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
   Seq("hostC", "hostB", "hostD").sorted)
 assert(master.getShufflePushMergerLocations(4, 
Set.empty).map(_.host).sorted ===
   Seq("hostB", "hostA", "hostC", "hostD").sorted)
+master.removeShufflePushMergerLocation("hostA")
+assert(master.getShufflePushMergerLocations(4, 
Set.empty).map(_.host).sorted ===
+  Seq("hostB", "hostC", "hostD").sorted)

Review comment:
   Can we extend this UT to verify the driver host is excluded? It will 
ensure that any future changes will not change this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier

2021-07-19 Thread GitBox


cloud-fan commented on a change in pull request #33200:
URL: https://github.com/apache/spark/pull/33200#discussion_r672821019



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   /**
* Rule to mostly resolve, normalize and rewrite column names based on case 
sensitivity
-   * for alter table commands.
+   * for alter table column commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
-  case a: AlterTableCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
+  case a: AlterTableColumnCommand if a.table.resolved && 
hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
-  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u)
+  case u: UnresolvedFieldName => resolveFieldNames(table, u.name, 
u.origin)
+}
+
+  case a @ AlterTableAddColumns(r: ResolvedTable, cols) if 
hasUnresolvedColumns(cols) =>

Review comment:
   maybe we should do
   ```
   case class QualifiedColType {
 path: Option[FieldName], // None for top-level columns
 colName: String,
 ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


HyukjinKwon commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883075964


   let me rebase. seems like it couldn't detect my GitHub actions job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-19 Thread GitBox


c21 commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-883074400


   cc @cloud-fan could you help take a look when you have time? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 opened a new pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-19 Thread GitBox


c21 opened a new pull request #33432:
URL: https://github.com/apache/spark/pull/33432


   
   
   ### What changes were proposed in this pull request?
   
   This is a re-work of https://github.com/apache/spark/pull/30003, here we add 
support for writing Hive bucketed table with Parquet/ORC file format (data 
source v1 write path and Hive hash as the hash function). Support for Hive's 
other file format will be added in follow up PR.
   
   The changes are mostly on:
   
   * `HiveMetastoreCatalog.scala`: When converting hive table relation to data 
source relation, pass bucket info (BucketSpec) and other hive related info as 
options into `HadoopFsRelation` and `LogicalRelation`, which can be later 
accessed by `FileFormatWriter` to customize bucket id and file name.
   
   * `FileFormatWriter.scala`: Use `HiveHash` for `bucketIdExpression` if it's 
writing to Hive bucketed table. In addition, Spark output file name should 
follow Hive/Presto/Trino bucketed file naming convention. Introduce another 
parameter `bucketFileNamePrefix` and it introduces subsequent change in 
`FileFormatDataWriter`.
   
   * `HadoopMapReduceCommitProtocol`: Implement the new file name APIs 
introduced in https://github.com/apache/spark/pull/33012, and change its 
sub-class `PathOutputCommitProtocol`, to make Hive bucketed table writing work 
with all commit protocol (including S3A commit protocol).
   
   ### Why are the changes needed?
   
   To make Spark write other-SQL-engines-compatible bucketed table. Currently 
Spark bucketed table cannot be leveraged by other SQL engines like Hive and 
Presto, because it uses a different hash function (Spark murmur3hash). With 
this PR, the Spark-written-Hive-bucketed-table can be efficiently read by 
Presto and Hive to do bucket filter pruning, join, group-by, etc. This was and 
is blocking several companies (confirmed from Facebook, Lyft, etc) migrate 
bucketing workload from Hive to Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, any Hive bucketed table (with Parquet/ORC format) written by Spark, is 
properly bucketed and can be efficiently processed by Hive and Presto/Trino.
   
   ### How was this patch tested?
   
   * Added unit test in BucketedWriteWithHiveSupportSuite.scala, to verify 
bucket file names and each row in each bucket is written properly.
   * WIP test in production. Will update later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883072761


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package

2021-07-19 Thread GitBox


SparkQA commented on pull request #33350:
URL: https://github.com/apache/spark/pull/33350#issuecomment-883071887


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45807/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069750


   It seems that the master branch's Java 17 job is suffering with the same 
reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


dongjoon-hyun edited a comment on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206


   The error code is the following. It looks like OOM happening again.
   - 
https://github.com/AngersZh/spark/runs/3110053861?check_suite_focus=true
   ```
./build/mvn: line 178:  1699 Killed  "${MVN_BIN}" "$@"
   2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206


   The error code is the following. It looks like OOM happening again.
   ```
./build/mvn: line 178:  1699 Killed  "${MVN_BIN}" "$@"
   2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too

2021-07-19 Thread GitBox


dongjoon-hyun commented on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883067722


   Wow, the GitHub Action failures look really weird.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jhu-chang commented on a change in pull request #33263: [SPARK-35027][CORE] Close the inputStream in FileAppender when writin…

2021-07-19 Thread GitBox


jhu-chang commented on a change in pull request #33263:
URL: https://github.com/apache/spark/pull/33263#discussion_r672813847



##
File path: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala
##
@@ -76,7 +80,13 @@ private[spark] class FileAppender(inputStream: InputStream, 
file: File, bufferSi
   }
 }
   } {
-closeFile()
+try {
+  if (closeStreams) {
+inputStream.close()
+  }

Review comment:
   @Ngone51 @srowen 
   It's for both normal exit and exception.
   Sorry, i don't quite understand the last comment: do you mean handling the 
error from "inputStream.close()"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


viirya commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672812727



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala
##
@@ -51,7 +51,7 @@ object CustomMetrics {
   currentMetricsValues: Seq[CustomTaskMetric],
   customMetrics: Map[String, SQLMetric]): Unit = {
 currentMetricsValues.foreach { metric =>
-  customMetrics(metric.name()).set(metric.value())
+  customMetrics.get(metric.name()).map(_.set(metric.value()))

Review comment:
   Ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


viirya commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672812642



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources
+
+import java.util.Collections
+
+import org.scalatest.BeforeAndAfter
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.connector.catalog.{Identifier, 
InMemoryTableCatalog}
+import org.apache.spark.sql.functions.lit
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.StructType
+
+class FileFormatDataWriterMetricSuite

Review comment:
   Yea, I think so.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


HyukjinKwon commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883060656


   Jenkins, ok to test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


dongjoon-hyun commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672811174



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources
+
+import java.util.Collections
+
+import org.scalatest.BeforeAndAfter
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.connector.catalog.{Identifier, 
InMemoryTableCatalog}
+import org.apache.spark.sql.functions.lit
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.StructType
+
+class FileFormatDataWriterMetricSuite

Review comment:
   For this one, I guess we need @gengliangwang 's review since he 
requested?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox


AmplabJenkins commented on pull request #33422:
URL: https://github.com/apache/spark/pull/33422#issuecomment-883059530


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141285/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox


SparkQA removed a comment on pull request #33422:
URL: https://github.com/apache/spark/pull/33422#issuecomment-882962473


   **[Test build #141285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)**
 for PR 33422 at commit 
[`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


dongjoon-hyun commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672810180



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala
##
@@ -51,7 +51,7 @@ object CustomMetrics {
   currentMetricsValues: Seq[CustomTaskMetric],
   customMetrics: Map[String, SQLMetric]): Unit = {
 currentMetricsValues.foreach { metric =>
-  customMetrics(metric.name()).set(metric.value())
+  customMetrics.get(metric.name()).map(_.set(metric.value()))

Review comment:
   Also, it would be great if you put the explanation at line 48.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe

2021-07-19 Thread GitBox


SparkQA commented on pull request #33422:
URL: https://github.com/apache/spark/pull/33422#issuecomment-883058475


   **[Test build #141285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)**
 for PR 33422 at commit 
[`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


dongjoon-hyun commented on a change in pull request #33239:
URL: https://github.com/apache/spark/pull/33239#discussion_r672809311



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala
##
@@ -51,7 +51,7 @@ object CustomMetrics {
   currentMetricsValues: Seq[CustomTaskMetric],
   customMetrics: Map[String, SQLMetric]): Unit = {
 currentMetricsValues.foreach { metric =>
-  customMetrics(metric.name()).set(metric.value())
+  customMetrics.get(metric.name()).map(_.set(metric.value()))

Review comment:
   This looks more robust. Could you add a test case for this no-op too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file

2021-07-19 Thread GitBox


mridulm commented on pull request #32401:
URL: https://github.com/apache/spark/pull/32401#issuecomment-883057074


   Thanks for the clarifications ! This sounds good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox


mridulm commented on pull request #33078:
URL: https://github.com/apache/spark/pull/33078#issuecomment-883056728


   +CC @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-19 Thread GitBox


mridulm commented on pull request #33078:
URL: https://github.com/apache/spark/pull/33078#issuecomment-883056445


   Merged to master and branch-3.2
   Thanks for working on this @zhouyejoe !
   Thanks for all the reviews @Ngone51, @otterc, @venkata91 :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] asfgit closed pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way

2021-07-19 Thread GitBox


asfgit closed pull request #33078:
URL: https://github.com/apache/spark/pull/33078


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883053190


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tobiasedwards commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation

2021-07-19 Thread GitBox


tobiasedwards commented on pull request #33428:
URL: https://github.com/apache/spark/pull/33428#issuecomment-883053001


   There we go, that should be better.
   
   When I botched the rebase the bot added some incorrect labels though, are 
you able to remove them, @HyukjinKwon?
   
   Thanks again for your help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


dongjoon-hyun commented on a change in pull request #33429:
URL: https://github.com/apache/spark/pull/33429#discussion_r672805734



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
##
@@ -88,23 +88,23 @@ case class CoalesceShufflePartitions(session: SparkSession) 
extends CustomShuffl
 val specsMap = shuffleStageInfos.zip(newPartitionSpecs).map { case 
(stageInfo, partSpecs) =>
   (stageInfo.shuffleStage.id, partSpecs)
 }.toMap
-updateShuffleReaders(plan, specsMap)
+updateShuffleRead(plan, specsMap)
   } else {
 plan
   }
 }
   }
 
-  private def updateShuffleReaders(
+  private def updateShuffleRead(

Review comment:
   Like the other places, `updateShuffleRead` -> `updateShuffleReads`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33409:
URL: https://github.com/apache/spark/pull/33409#issuecomment-883051150


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45805/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence

2021-07-19 Thread GitBox


AmplabJenkins removed a comment on pull request #33427:
URL: https://github.com/apache/spark/pull/33427#issuecomment-883051152


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45804/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path

2021-07-19 Thread GitBox


SparkQA commented on pull request #33239:
URL: https://github.com/apache/spark/pull/33239#issuecomment-883052445


   **[Test build #141298 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141298/testReport)**
 for PR 33239 at commit 
[`bccc98b`](https://github.com/apache/spark/commit/bccc98b7f3afad110ac450c183e341938dd20bc9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE

2021-07-19 Thread GitBox


SparkQA commented on pull request #33429:
URL: https://github.com/apache/spark/pull/33429#issuecomment-883052316


   **[Test build #141297 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141297/testReport)**
 for PR 33429 at commit 
[`f2b0bab`](https://github.com/apache/spark/commit/f2b0babd5835d10ee894943b49def6a9cb01fcad).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ

2021-07-19 Thread GitBox


SparkQA commented on pull request #33430:
URL: https://github.com/apache/spark/pull/33430#issuecomment-883052246


   **[Test build #141296 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141296/testReport)**
 for PR 33430 at commit 
[`955ecf6`](https://github.com/apache/spark/commit/955ecf6421b25244ad647a61a53998948064b451).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   9   10   >