[GitHub] [spark] cloud-fan commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
cloud-fan commented on a change in pull request #33429: URL: https://github.com/apache/spark/pull/33429#discussion_r672855707 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -139,21 +139,21 @@ class AdaptiveQueryExecSuite } } - private def checkNumLocalShuffleReaders( - plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Unit = { + private def checkNumLocalShuffleReads( Review comment: or "num shuffles with local read", but it's longer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins removed a comment on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883137733 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45812/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
AmplabJenkins removed a comment on pull request #33431: URL: https://github.com/apache/spark/pull/33431#issuecomment-883137742 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45809/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
SparkQA removed a comment on pull request #33416: URL: https://github.com/apache/spark/pull/33416#issuecomment-883088693 **[Test build #141302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)** for PR 33416 at commit [`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
AmplabJenkins removed a comment on pull request #33416: URL: https://github.com/apache/spark/pull/33416#issuecomment-883137735 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141302/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
AmplabJenkins removed a comment on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883137734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45811/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins removed a comment on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883137739 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141293/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
SparkQA removed a comment on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883088661 **[Test build #141301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)** for PR 33428 at commit [`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
AmplabJenkins removed a comment on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883137741 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141301/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
AmplabJenkins removed a comment on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883137732 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45810/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
cloud-fan commented on a change in pull request #33429: URL: https://github.com/apache/spark/pull/33429#discussion_r672855337 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -139,21 +139,21 @@ class AdaptiveQueryExecSuite } } - private def checkNumLocalShuffleReaders( - plan: SparkPlan, numShufflesWithoutLocalReader: Int = 0): Unit = { + private def checkNumLocalShuffleReads( Review comment: "num readers" seems more natural? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on pull request #33426: [SPARK-32920][FOLLOW-UP] Fix shuffleMergeFinalized directly calling rdd.getNumPartitions as RDD is not serialized to executor
otterc commented on pull request #33426: URL: https://github.com/apache/spark/pull/33426#issuecomment-883139314 cc @Ngone51 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883139077 **[Test build #141306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141306/testReport)** for PR 33239 at commit [`b2f646c`](https://github.com/apache/spark/commit/b2f646c4eb33db410a5144a73e4676e52d5bc64c). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
SparkQA commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-883138761 **[Test build #141305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141305/testReport)** for PR 33432 at commit [`294db33`](https://github.com/apache/spark/commit/294db339033b11ff8ccb14776f14f330ed77bf50). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
AmplabJenkins commented on pull request #33431: URL: https://github.com/apache/spark/pull/33431#issuecomment-883137742 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45809/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883137739 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141293/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
AmplabJenkins commented on pull request #33416: URL: https://github.com/apache/spark/pull/33416#issuecomment-883137735 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141302/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
AmplabJenkins commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883137734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45811/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
AmplabJenkins commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883137732 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45810/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
AmplabJenkins commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883137741 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141301/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883137733 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45812/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sunchao commented on pull request #33382: [SPARK-36137][SQL] HiveShim should fallback to getAllPartitionsOf even if directSQL is enabled in remote HMS
sunchao commented on pull request #33382: URL: https://github.com/apache/spark/pull/33382#issuecomment-883134666 I feel adding one more config on top of the existing `tryDirectSql` may make it too complex. What if we introduce a new config and use that to decide whether Spark should fallback to call `getAllPartitionsMethod`? and default it to true but with a helpful message to tell users that they can switch it off to fail the query instead if they really want so? This should only affect those queries in a few scenarios which they used to fail but now can succeed, which IMO is a better outcome. For instance: ```scala val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL val shouldFallback = SQLConf.get.metastorePartitionPruningFallbackOnException try { getPartitionsByFilterMethod.invoke(hive, table, filter) .asInstanceOf[JArrayList[Partition]] } catch { case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] && shouldFallback => logWarning("Caught Hive MetaException attempting to get partition metadata by " + "filter from Hive. Falling back to fetching all partition metadata, which will " + "degrade performance. Modifying your Hive metastore configuration to set " + s"${tryDirectSqlConfVar.varname} to true (if it is not true already) may resolve " + "this problem. Otherwise, you can set " + s"${SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ON_EXCEPTION.key} " + " to false and let the query fail instead.", ex) getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]] } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
SparkQA commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883131348 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45810/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883129078 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45812/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sarutak commented on pull request #31096: [SPARK-34051][SQL] Support 32-bit unicode escape in string literals
sarutak commented on pull request #31096: URL: https://github.com/apache/spark/pull/31096#issuecomment-883126736 @cloud-fan O.K, will do. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
SparkQA commented on pull request #33431: URL: https://github.com/apache/spark/pull/33431#issuecomment-883125681 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45809/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672846173 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala ## @@ -79,9 +81,15 @@ abstract class FileFormatDataWriter( /** Write an iterator of records. */ def writeWithIterator(iterator: Iterator[InternalRow]): Unit = { +var count = 0L Review comment: sounds good. let me update. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
SparkQA commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883118578 **[Test build #141301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)** for PR 33428 at commit [`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
SparkQA commented on pull request #33416: URL: https://github.com/apache/spark/pull/33416#issuecomment-883118256 **[Test build #141302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)** for PR 33416 at commit [`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #31096: [SPARK-34051][SQL] Support 32-bit unicode escape in string literals
cloud-fan commented on pull request #31096: URL: https://github.com/apache/spark/pull/31096#issuecomment-883115343 Can we document this feature in `sql-ref-literals.md`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883113962 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45811/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sunchao commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
sunchao commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672830123 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/Write.java ## @@ -62,4 +63,13 @@ default BatchWrite toBatch() { default StreamingWrite toStreaming() { throw new UnsupportedOperationException(description() + ": Streaming write is not supported"); } + + /** + * Returns an array of supported custom metrics with name and description. + * By default it returns empty array. + */ + default CustomMetric[] supportedCustomMetrics() { +CustomMetric[] NO_METRICS = {}; +return NO_METRICS; Review comment: ditto ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ## @@ -276,7 +279,12 @@ case class OverwritePartitionsDynamicExec( case class WriteToDataSourceV2Exec( batchWrite: BatchWrite, refreshCache: () => Unit, -query: SparkPlan) extends V2TableWriteExec { +query: SparkPlan, +writeMetrics: Seq[CustomMetric]) extends V2TableWriteExec { + + override val customMetrics = writeMetrics.map { customMetric => Review comment: nit: add type annotation to public member of the class ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala ## @@ -79,9 +81,15 @@ abstract class FileFormatDataWriter( /** Write an iterator of records. */ def writeWithIterator(iterator: Iterator[InternalRow]): Unit = { +var count = 0L Review comment: could we have something like: ```scala def writeWithMetrics(record: InternalRow): Unit = { if (count % CustomMetrics.NUM_ROWS_PER_UPDATE == 0) { CustomMetrics.updateMetrics(currentMetricsValues, customMetrics) } count += 1 write(record) } ``` and then update all the places to use this? instead of replicating the code in several places? ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala ## @@ -292,6 +300,10 @@ trait V2ExistingTableWriteExec extends V2TableWriteExec { def refreshCache: () => Unit def write: Write + override val customMetrics = write.supportedCustomMetrics().map { customMetric => Review comment: ditto ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DataWriter.java ## @@ -104,4 +105,13 @@ * @throws IOException if failure happens during disk/network IO like writing files. */ void abort() throws IOException; + + /** + * Returns an array of custom task metrics. By default it returns empty array. Note that it is + * not recommended to put heavy logic in this method as it may affect writing performance. + */ + default CustomTaskMetric[] currentMetricsValues() { +CustomTaskMetric[] NO_METRICS = {}; +return NO_METRICS; Review comment: nit: can we just do it in one line: `return new CustomTaskMetric[]{};`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA removed a comment on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883031312 **[Test build #141293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141293/testReport)** for PR 33350 at commit [`7473aea`](https://github.com/apache/spark/commit/7473aea9aa91586366206b7a01ed3b6e11f7236a). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883106788 **[Test build #141293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141293/testReport)** for PR 33350 at commit [`7473aea`](https://github.com/apache/spark/commit/7473aea9aa91586366206b7a01ed3b6e11f7236a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
AngersZh commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883104576 > The error code is the following. It looks like OOM happening again. > > * https://github.com/AngersZh/spark/runs/3110053861?check_suite_focus=true > > ``` > ./build/mvn: line 178: 1699 Killed "${MVN_BIN}" "$@" > 2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137. > ``` Yea, should not be related to my pr -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu edited a comment on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
AngersZh edited a comment on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883104093 > Thank you for pinging me, @AngersZh . BTW, you should use `[FOLLOWUP]` when you reuse the JIRA id. H, it's not a reuse jira. I write wrong jira id `SPARK-36021` first and change to correct `SPARK-36201` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AngersZhuuuu commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
AngersZh commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883104093 > Thank you for pinging me, @AngersZh . BTW, you should use `[FOLLOWUP]` when you reuse the JIRA id. H, it's not a reuse jira. I write wrong jira id to `SPARK-36021` and change to correct `SPARK-36201` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883100612 **[Test build #141304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141304/testReport)** for PR 33429 at commit [`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672832410 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: Added test and updated the method doc. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672831210 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -1025,17 +938,23 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { /** * Validates the options used for alter table commands after table and columns are resolved. */ - private def checkAlterTableCommand(alter: AlterTableCommand): Unit = { -def checkColumnNotExists(fieldNames: Seq[String], struct: StructType): Unit = { + private def checkAlterTableColumnCommand(alter: AlterTableColumnCommand): Unit = { +def checkColumnNotExists(op: String, fieldNames: Seq[String], struct: StructType): Unit = { if (struct.findNestedField(fieldNames, includeCollections = true).isDefined) { -alter.failAnalysis(s"Cannot ${alter.operation} column, because ${fieldNames.quoted} " + +alter.failAnalysis(s"Cannot $op column, because ${fieldNames.quoted} " + s"already exists in ${struct.treeString}") } } alter match { + case AlterTableAddColumns(table: ResolvedTable, colsToAdd) => +colsToAdd.foreach { colToAdd => + checkColumnNotExists("add", colToAdd.name, table.schema) Review comment: What if the to-be-added columns have duplicated names? e.g. `ADD COLUMNS a ..., a ...` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
HyukjinKwon commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883097397 retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672830617 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: for top-level columns, `path` can be `ResolvedFieldName(Nil, StructField("root", tableSchema))` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672830076 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: Or, we can change `QualifiedColType` ``` case class QualifiedColType( path: FieldName, colName: String, ... ) ``` then we can keep the `Origin` of the column name, and save the resolution of the path at least. ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: Or, we can change `QualifiedColType` ``` case class QualifiedColType( path: FieldName, colName: String, ... ) ``` then we can keep the `Origin` of the column name, and resolve the path at most once. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883095290 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45812/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
AmplabJenkins removed a comment on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883094329 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141300/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA removed a comment on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883088620 **[Test build #141300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)** for PR 33429 at commit [`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
AmplabJenkins commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883094329 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141300/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883094271 **[Test build #141300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)** for PR 33429 at commit [`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait AQEShuffleReadRule extends Rule[SparkPlan] ` * `case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule ` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672827401 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: This looks fine. We can remove the `if hasUnresolvedColumns(cols)` which is not very useful here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
SparkQA commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883090499 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45810/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33431: [SPARK-36221][SQL] Make sure CustomShuffleReaderExec has at least one partition
SparkQA commented on pull request #33431: URL: https://github.com/apache/spark/pull/33431#issuecomment-883090213 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45809/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
SparkQA removed a comment on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-882968356 **[Test build #141288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)** for PR 33424 at commit [`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins removed a comment on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins removed a comment on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
AmplabJenkins removed a comment on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883011701 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
AmplabJenkins removed a comment on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672821019 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: maybe we should do ``` case class QualifiedColType { path: Option[FieldName], // None for top-level columns colName: String, ... } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33410: [WIP][SPARK-36204][INFRA][BUILD] Deduplicate Scala 2.13 daily build
SparkQA commented on pull request #33410: URL: https://github.com/apache/spark/pull/33410#issuecomment-883088778 **[Test build #141303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141303/testReport)** for PR 33410 at commit [`48bfb39`](https://github.com/apache/spark/commit/48bfb39f0d3a8d15614998297ba56addf3b756b5). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33416: [SPARK-36207][PYTHON] Expose databaseExists in pyspark.sql.catalog
SparkQA commented on pull request #33416: URL: https://github.com/apache/spark/pull/33416#issuecomment-883088693 **[Test build #141302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141302/testReport)** for PR 33416 at commit [`ac88451`](https://github.com/apache/spark/commit/ac88451fa14154ad111c2fe2399c8576b133a03f). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883088620 **[Test build #141300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141300/testReport)** for PR 33429 at commit [`2840b47`](https://github.com/apache/spark/commit/2840b475c6de6e3bd5bd3cfce7e981a289ab1e39). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
SparkQA commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883088661 **[Test build #141301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141301/testReport)** for PR 33428 at commit [`f4330a5`](https://github.com/apache/spark/commit/f4330a5abbd87c19191764b59bb5d55bf6472432). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
SparkQA commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-883088569 **[Test build #141299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141299/testReport)** for PR 33432 at commit [`dfabd0f`](https://github.com/apache/spark/commit/dfabd0fce7e9079cd66e75be0eb02a1c814c8b0b). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
AmplabJenkins commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883087051 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45808/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
AmplabJenkins commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883087045 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45807/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
AmplabJenkins commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883087048 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141288/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883084988 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45811/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33424: [SPARK-36213][SQL] Normalize PartitionSpec for Describe Table Command with PartitionSpec
SparkQA commented on pull request #33424: URL: https://github.com/apache/spark/pull/33424#issuecomment-883082141 **[Test build #141288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141288/testReport)** for PR 33424 at commit [`afa5539`](https://github.com/apache/spark/commit/afa55393f8d0c6884ed1d47ac3ced1112a87e7b6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] otterc commented on a change in pull request #33425: [SPARK-32919][FOLLOW-UP] Filter out driver in the merger locations and fix the return type of RemoveShufflePushMergerLocations
otterc commented on a change in pull request #33425: URL: https://github.com/apache/spark/pull/33425#discussion_r672821698 ## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala ## @@ -2093,6 +2093,9 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE Seq("hostC", "hostB", "hostD").sorted) assert(master.getShufflePushMergerLocations(4, Set.empty).map(_.host).sorted === Seq("hostB", "hostA", "hostC", "hostD").sorted) +master.removeShufflePushMergerLocation("hostA") +assert(master.getShufflePushMergerLocations(4, Set.empty).map(_.host).sorted === + Seq("hostB", "hostC", "hostD").sorted) Review comment: Can we extend this UT to verify the driver host is excluded? It will ensure that any future changes will not change this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #33200: [SPARK-36006][SQL] Migrate ALTER TABLE ... ADD/REPLACE COLUMNS commands to use UnresolvedTable to resolve the identifier
cloud-fan commented on a change in pull request #33200: URL: https://github.com/apache/spark/pull/33200#discussion_r672821019 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -3574,15 +3568,64 @@ class Analyzer(override val catalogManager: CatalogManager) /** * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity - * for alter table commands. + * for alter table column commands. */ - object ResolveAlterTableCommands extends Rule[LogicalPlan] { + object ResolveAlterTableColumnCommands extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { - case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) => + case a: AlterTableColumnCommand if a.table.resolved && hasUnresolvedFieldName(a) => val table = a.table.asInstanceOf[ResolvedTable] a.transformExpressions { - case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u) + case u: UnresolvedFieldName => resolveFieldNames(table, u.name, u.origin) +} + + case a @ AlterTableAddColumns(r: ResolvedTable, cols) if hasUnresolvedColumns(cols) => Review comment: maybe we should do ``` case class QualifiedColType { path: Option[FieldName], // None for top-level columns colName: String, ... } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
HyukjinKwon commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883075964 let me rebase. seems like it couldn't detect my GitHub actions job. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
c21 commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-883074400 cc @cloud-fan could you help take a look when you have time? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 opened a new pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)
c21 opened a new pull request #33432: URL: https://github.com/apache/spark/pull/33432 ### What changes were proposed in this pull request? This is a re-work of https://github.com/apache/spark/pull/30003, here we add support for writing Hive bucketed table with Parquet/ORC file format (data source v1 write path and Hive hash as the hash function). Support for Hive's other file format will be added in follow up PR. The changes are mostly on: * `HiveMetastoreCatalog.scala`: When converting hive table relation to data source relation, pass bucket info (BucketSpec) and other hive related info as options into `HadoopFsRelation` and `LogicalRelation`, which can be later accessed by `FileFormatWriter` to customize bucket id and file name. * `FileFormatWriter.scala`: Use `HiveHash` for `bucketIdExpression` if it's writing to Hive bucketed table. In addition, Spark output file name should follow Hive/Presto/Trino bucketed file naming convention. Introduce another parameter `bucketFileNamePrefix` and it introduces subsequent change in `FileFormatDataWriter`. * `HadoopMapReduceCommitProtocol`: Implement the new file name APIs introduced in https://github.com/apache/spark/pull/33012, and change its sub-class `PathOutputCommitProtocol`, to make Hive bucketed table writing work with all commit protocol (including S3A commit protocol). ### Why are the changes needed? To make Spark write other-SQL-engines-compatible bucketed table. Currently Spark bucketed table cannot be leveraged by other SQL engines like Hive and Presto, because it uses a different hash function (Spark murmur3hash). With this PR, the Spark-written-Hive-bucketed-table can be efficiently read by Presto and Hive to do bucket filter pruning, join, group-by, etc. This was and is blocking several companies (confirmed from Facebook, Lyft, etc) migrate bucketing workload from Hive to Spark. ### Does this PR introduce _any_ user-facing change? Yes, any Hive bucketed table (with Parquet/ORC format) written by Spark, is properly bucketed and can be efficiently processed by Hive and Presto/Trino. ### How was this patch tested? * Added unit test in BucketedWriteWithHiveSupportSuite.scala, to verify bucket file names and each row in each bucket is written properly. * WIP test in production. Will update later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883072761 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33350: [SPARK-36136][SQL][TESTS] Refactor PruneFileSourcePartitionsSuite etc to a different package
SparkQA commented on pull request #33350: URL: https://github.com/apache/spark/pull/33350#issuecomment-883071887 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45807/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883069750 It seems that the master branch's Java 17 job is suffering with the same reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun edited a comment on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206 The error code is the following. It looks like OOM happening again. - https://github.com/AngersZh/spark/runs/3110053861?check_suite_focus=true ``` ./build/mvn: line 178: 1699 Killed "${MVN_BIN}" "$@" 2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883069206 The error code is the following. It looks like OOM happening again. ``` ./build/mvn: line 178: 1699 Killed "${MVN_BIN}" "$@" 2021-07-20T04:37:47.6486105Z ##[error]Process completed with exit code 137. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #33409: [SPARK-36201][SQL][FOLLOWUP] Schema check should check inner field too
dongjoon-hyun commented on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883067722 Wow, the GitHub Action failures look really weird. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] jhu-chang commented on a change in pull request #33263: [SPARK-35027][CORE] Close the inputStream in FileAppender when writin…
jhu-chang commented on a change in pull request #33263: URL: https://github.com/apache/spark/pull/33263#discussion_r672813847 ## File path: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala ## @@ -76,7 +80,13 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi } } } { -closeFile() +try { + if (closeStreams) { +inputStream.close() + } Review comment: @Ngone51 @srowen It's for both normal exit and exception. Sorry, i don't quite understand the last comment: do you mean handling the error from "inputStream.close()"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672812727 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: Ok. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
viirya commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672812642 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.util.Collections + +import org.scalatest.BeforeAndAfter +import org.scalatest.time.SpanSugar._ + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryTableCatalog} +import org.apache.spark.sql.functions.lit +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.StructType + +class FileFormatDataWriterMetricSuite Review comment: Yea, I think so. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
HyukjinKwon commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883060656 Jenkins, ok to test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672811174 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriterMetricSuite.scala ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.datasources + +import java.util.Collections + +import org.scalatest.BeforeAndAfter +import org.scalatest.time.SpanSugar._ + +import org.apache.spark.sql.QueryTest +import org.apache.spark.sql.connector.catalog.{Identifier, InMemoryTableCatalog} +import org.apache.spark.sql.functions.lit +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.StructType + +class FileFormatDataWriterMetricSuite Review comment: For this one, I guess we need @gengliangwang 's review since he requested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
AmplabJenkins commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883059530 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141285/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
SparkQA removed a comment on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-882962473 **[Test build #141285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)** for PR 33422 at commit [`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672810180 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: Also, it would be great if you put the explanation at line 48. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33422: [SPARK-34806][SQL] Add Observation helper for Dataset.observe
SparkQA commented on pull request #33422: URL: https://github.com/apache/spark/pull/33422#issuecomment-883058475 **[Test build #141285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141285/testReport)** for PR 33422 at commit [`2a21bb3`](https://github.com/apache/spark/commit/2a21bb3017643410a81305d000af5e591b8ba3bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
dongjoon-hyun commented on a change in pull request #33239: URL: https://github.com/apache/spark/pull/33239#discussion_r672809311 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala ## @@ -51,7 +51,7 @@ object CustomMetrics { currentMetricsValues: Seq[CustomTaskMetric], customMetrics: Map[String, SQLMetric]): Unit = { currentMetricsValues.foreach { metric => - customMetrics(metric.name()).set(metric.value()) + customMetrics.get(metric.name()).map(_.set(metric.value())) Review comment: This looks more robust. Could you add a test case for this no-op too? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mridulm commented on pull request #32401: [SPARK-35276][CORE] Calculate checksum for shuffle data and write as checksum file
mridulm commented on pull request #32401: URL: https://github.com/apache/spark/pull/32401#issuecomment-883057074 Thanks for the clarifications ! This sounds good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
mridulm commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883056728 +CC @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mridulm commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better
mridulm commented on pull request #33078: URL: https://github.com/apache/spark/pull/33078#issuecomment-883056445 Merged to master and branch-3.2 Thanks for working on this @zhouyejoe ! Thanks for all the reviews @Ngone51, @otterc, @venkata91 :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] asfgit closed pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better way
asfgit closed pull request #33078: URL: https://github.com/apache/spark/pull/33078 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883053190 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45808/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tobiasedwards commented on pull request #33428: [SPARK-36220][PYTHON] Fix pyspark.sql.types.Row type annotation
tobiasedwards commented on pull request #33428: URL: https://github.com/apache/spark/pull/33428#issuecomment-883053001 There we go, that should be better. When I botched the rebase the bot added some incorrect labels though, are you able to remove them, @HyukjinKwon? Thanks again for your help! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
dongjoon-hyun commented on a change in pull request #33429: URL: https://github.com/apache/spark/pull/33429#discussion_r672805734 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala ## @@ -88,23 +88,23 @@ case class CoalesceShufflePartitions(session: SparkSession) extends CustomShuffl val specsMap = shuffleStageInfos.zip(newPartitionSpecs).map { case (stageInfo, partSpecs) => (stageInfo.shuffleStage.id, partSpecs) }.toMap -updateShuffleReaders(plan, specsMap) +updateShuffleRead(plan, specsMap) } else { plan } } } - private def updateShuffleReaders( + private def updateShuffleRead( Review comment: Like the other places, `updateShuffleRead` -> `updateShuffleReads`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33409: [SPARK-36201][SQL] Schema check should check inner field too
AmplabJenkins removed a comment on pull request #33409: URL: https://github.com/apache/spark/pull/33409#issuecomment-883051150 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45805/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33427: [SPARK-36216][PYTHON][TESTS] Increase timeout for StreamingLinearRegressionWithTests. test_parameter_convergence
AmplabJenkins removed a comment on pull request #33427: URL: https://github.com/apache/spark/pull/33427#issuecomment-883051152 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45804/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33239: [SPARK-36030][SQL] Support DS v2 metrics at writing path
SparkQA commented on pull request #33239: URL: https://github.com/apache/spark/pull/33239#issuecomment-883052445 **[Test build #141298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141298/testReport)** for PR 33239 at commit [`bccc98b`](https://github.com/apache/spark/commit/bccc98b7f3afad110ac450c183e341938dd20bc9). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33429: [SPARK-36217][SQL] Rename CustomShuffleReader and OptimizeLocalShuffleReader in AQE
SparkQA commented on pull request #33429: URL: https://github.com/apache/spark/pull/33429#issuecomment-883052316 **[Test build #141297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141297/testReport)** for PR 33429 at commit [`f2b0bab`](https://github.com/apache/spark/commit/f2b0babd5835d10ee894943b49def6a9cb01fcad). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33430: [SPARK-36046][SQL][FOLLOWUP] Implement prettyName for MakeTimestampNTZ and MakeTimestampLTZ
SparkQA commented on pull request #33430: URL: https://github.com/apache/spark/pull/33430#issuecomment-883052246 **[Test build #141296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141296/testReport)** for PR 33430 at commit [`955ecf6`](https://github.com/apache/spark/commit/955ecf6421b25244ad647a61a53998948064b451). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org