[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900043288 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142538/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900042969 **[Test build #142538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142538/testReport)** for PR 33744 at commit [`c02382c`](https://github.com/apache/spark/commit/c02382cb13967c6b18621cb4967e7a0330dd0f08).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
AmplabJenkins commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900041664 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47035/
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900041615 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47035/
[GitHub] [spark] Ngone51 commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
Ngone51 commented on pull request #33759: URL: https://github.com/apache/spark/pull/33759#issuecomment-900041178 This is a bug fix that should be backported to 3.2 (and even 3.0), so cc @gengliangwang FYI.
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900040783 **[Test build #142541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142541/testReport)** for PR 33757 at commit [`e7c42ab`](https://github.com/apache/spark/commit/e7c42ab7921b003f9733ed814a82d43114023579).
[GitHub] [spark] SparkQA commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
SparkQA commented on pull request #33759: URL: https://github.com/apache/spark/pull/33759#issuecomment-900040798 **[Test build #142540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142540/testReport)** for PR 33759 at commit [`ca6e4c0`](https://github.com/apache/spark/commit/ca6e4c0b500ad90a542dbeeb4997779244517438).
[GitHub] [spark] SparkQA commented on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes
SparkQA commented on pull request #33673: URL: https://github.com/apache/spark/pull/33673#issuecomment-900040831 **[Test build #142542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142542/testReport)** for PR 33673 at commit [`9457ca5`](https://github.com/apache/spark/commit/9457ca5b2460e7ef5511a78f35f90c999434a808).
[GitHub] [spark] Ngone51 commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
Ngone51 commented on pull request #33759: URL: https://github.com/apache/spark/pull/33759#issuecomment-900039887 cc @mridulm @cloud-fan @jiangxb1987 Please take a look, thanks!
[GitHub] [spark] Ngone51 opened a new pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang
Ngone51 opened a new pull request #33759: URL: https://github.com/apache/spark/pull/33759

### What changes were proposed in this pull request?

Instead of exiting the executor within the RpcEnv's thread, exit the executor in a separate thread.

### Why are the changes needed?

The current way of exiting in `onDisconnected` can cause a deadlock, which has exactly the same root cause as https://github.com/apache/spark/pull/12012:

* `onDisconnected` -> `System.exit` are called in sequence in a thread of `MessageLoop.threadpool`.
* `System.exit` triggers the shutdown hooks, and `executor.stop` is one of the hooks.
* `executor.stop` stops the `Dispatcher`, which in turn waits for `MessageLoop.threadpool` to shut down.
* Thus, the thread that runs `System.exit` waits for the hooks to finish, while the hook waits for `MessageLoop.threadpool`, and therefore for that same thread, to finish. This mutual dependence results in the deadlock.

### Does this PR introduce _any_ user-facing change?

Yes, the executor shutdown won't hang.

### How was this patch tested?

Pass existing tests.
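The dependency cycle described in this pull request can be reproduced in miniature without Spark. The following is only a sketch under stated assumptions, not the code in the PR: `ExitSketch`, `simulate`, and `timeoutMs` are hypothetical names, a single-thread `ExecutorService` stands in for `MessageLoop.threadpool`, and a local `exit()` function stands in for `System.exit` running the `executor.stop` hook. A timeout replaces the real, unbounded wait so that the inline case returns instead of hanging.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical illustration only; none of these names come from Spark.
object ExitSketch {

  /** Returns true if the simulated shutdown hook saw the pool terminate,
    * false if it timed out (the stand-in for the real hang). */
  def simulate(exitInSeparateThread: Boolean, timeoutMs: Long = 500L): Boolean = {
    // Stand-in for MessageLoop.threadpool: one thread that delivers callbacks.
    val messageLoop = Executors.newSingleThreadExecutor()
    val hookCompleted = new AtomicBoolean(false)
    val hookFinished = new CountDownLatch(1)

    // Stand-in for System.exit running the executor.stop hook:
    // stop the pool, then wait for its threads to finish.
    def exit(): Unit = {
      messageLoop.shutdown()
      hookCompleted.set(messageLoop.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS))
      hookFinished.countDown()
    }

    // Stand-in for onDisconnected, delivered on the message loop thread.
    messageLoop.execute(new Runnable {
      override def run(): Unit = {
        if (exitInSeparateThread) {
          // The callback returns at once, so the pool thread is free to finish.
          val exitThread = new Thread(new Runnable {
            override def run(): Unit = exit()
          }, "exit-thread")
          exitThread.setDaemon(true)
          exitThread.start()
        } else {
          // The pool thread waits for the pool, i.e. for itself.
          exit()
        }
      }
    })

    hookFinished.await()
    hookCompleted.get()
  }

  def main(args: Array[String]): Unit = {
    println(s"inline exit completed: ${simulate(exitInSeparateThread = false)}")
    println(s"separate-thread exit completed: ${simulate(exitInSeparateThread = true)}")
  }
}
```

With the inline variant the wait can only end by timing out, because the thread doing the waiting is the one the pool is waiting on; with the separate thread the callback returns and the pool terminates normally.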
[GitHub] [spark] SparkQA commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API
SparkQA commented on pull request #33723: URL: https://github.com/apache/spark/pull/33723#issuecomment-900034196 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47037/
[GitHub] [spark] SparkQA commented on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH
SparkQA commented on pull request #33736: URL: https://github.com/apache/spark/pull/33736#issuecomment-900032200 **[Test build #142539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142539/testReport)** for PR 33736 at commit [`25a3b60`](https://github.com/apache/spark/commit/25a3b60dc470c5e0a4f1796bde7d25bef567d9d4).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH
AngersZhuuuu commented on a change in pull request #33736: URL: https://github.com/apache/spark/pull/33736#discussion_r690071365

## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ##
@@ -79,6 +79,17 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema {
""".stripMargin) }
+ def createTables(): Unit = {

Review comment:
> where else do we call this method?

`PlanStabilitySuite` extends `TPCDSBase`, which creates the tables in `beforeAll`. Splitting that step out into `createTables` lets the TPCH suite create only the tables it needs.
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900029261 **[Test build #142538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142538/testReport)** for PR 33744 at commit [`c02382c`](https://github.com/apache/spark/commit/c02382cb13967c6b18621cb4967e7a0330dd0f08).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH
AngersZhuuuu commented on a change in pull request #33736: URL: https://github.com/apache/spark/pull/33736#discussion_r690070608

## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ##
@@ -333,3 +340,22 @@ class TPCDSModifiedPlanStabilityWithStatsSuite extends PlanStabilitySuite {
} } }
+
+abstract class TPCHPlanStabilitySuiteBase extends PlanStabilitySuite {

Review comment:
> why add an abstract class that has only one child?

I previously had a [withState] subclass as well, but found there was no stats data for it, so I removed it. I have removed `TPCHPlanStabilitySuiteBase` now.
[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
SparkQA commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-900028834 **[Test build #142537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142537/testReport)** for PR 33753 at commit [`ac2ce78`](https://github.com/apache/spark/commit/ac2ce78ed0f1532e9d0a0bbd863dcee6ca174177).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
AmplabJenkins removed a comment on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-900026956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47036/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins removed a comment on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900026945 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47032/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
AmplabJenkins removed a comment on pull request #33758: URL: https://github.com/apache/spark/pull/33758#issuecomment-900026946 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47034/
[GitHub] [spark] AmplabJenkins commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
AmplabJenkins commented on pull request #33758: URL: https://github.com/apache/spark/pull/33758#issuecomment-900026946 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47034/
[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900026945 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47032/
[GitHub] [spark] AmplabJenkins commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
AmplabJenkins commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-900026956 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47036/
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900024067 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47035/
[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
SparkQA commented on pull request #33758: URL: https://github.com/apache/spark/pull/33758#issuecomment-900023997 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47034/
[GitHub] [spark] SparkQA removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA removed a comment on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-94006 **[Test build #142534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)** for PR 33757 at commit [`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).
[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
SparkQA commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-900022414 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47036/
[GitHub] [spark] cloud-fan commented on a change in pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
cloud-fan commented on a change in pull request #33758: URL: https://github.com/apache/spark/pull/33758#discussion_r690064618

## File path: sql/core/src/test/resources/sql-tests/inputs/ansi/group-analytics.sql ##
@@ -1 +0,0 @@
---IMPORT group-analytics.sql

Review comment: good catch!
[GitHub] [spark] HeartSaVioR commented on pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries
HeartSaVioR commented on pull request #33749: URL: https://github.com/apache/spark/pull/33749#issuecomment-900022027 cc @gengliangwang as well, since this PR targets Spark 3.2.
[GitHub] [spark] HeartSaVioR commented on pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries
HeartSaVioR commented on pull request #33749: URL: https://github.com/apache/spark/pull/33749#issuecomment-900021576 cc. @viirya
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-900016995 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47032/
[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
AmplabJenkins commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900014181 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142534/
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900013873 **[Test build #142534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)** for PR 33757 at commit [`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API
SparkQA commented on pull request #33723: URL: https://github.com/apache/spark/pull/33723#issuecomment-900013805 **[Test build #142536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142536/testReport)** for PR 33723 at commit [`38d98ed`](https://github.com/apache/spark/commit/38d98ed9e613f181018bb6266d89ab772db84b4c).
[GitHub] [spark] tanelk commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API
tanelk commented on pull request #33723: URL: https://github.com/apache/spark/pull/33723#issuecomment-900013436 pinging @maropu
[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
AmplabJenkins commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900011854 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47031/
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-900011833 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47031/
[GitHub] [spark] HyukjinKwon commented on pull request #33754: [SPARK-36526][SQL] DSV2 Index Support: Add supportsIndex interface
HyukjinKwon commented on pull request #33754: URL: https://github.com/apache/spark/pull/33754#issuecomment-900011457 Oh, okay. So it really means the concept of an index on a DBMS table.
[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
SparkQA commented on pull request #33758: URL: https://github.com/apache/spark/pull/33758#issuecomment-98151 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47034/
[GitHub] [spark] gatorsmile commented on a change in pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
gatorsmile commented on a change in pull request #33758: URL: https://github.com/apache/spark/pull/33758#discussion_r690048714 ## File path: sql/core/src/test/resources/sql-tests/inputs/ansi/group-analytics.sql ## @@ -1 +0,0 @@ ---IMPORT group-analytics.sql Review comment: Do we need to remove the result file?
[GitHub] [spark] tooptoop4 commented on pull request #33332: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker
tooptoop4 commented on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-95346 can this log level change be merged? @steveloughran
[GitHub] [spark] tooptoop4 removed a comment on pull request #33332: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker
tooptoop4 removed a comment on pull request #2: URL: https://github.com/apache/spark/pull/2#issuecomment-888246182 can this log level change be merged?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins removed a comment on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-94901 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47029/
[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-94901 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47029/
[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
SparkQA commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-94831 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47029/
[GitHub] [spark] cloud-fan commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH
cloud-fan commented on a change in pull request #33736: URL: https://github.com/apache/spark/pull/33736#discussion_r690046057 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -333,3 +340,22 @@ class TPCDSModifiedPlanStabilityWithStatsSuite extends PlanStabilitySuite { } } } + +abstract class TPCHPlanStabilitySuiteBase extends PlanStabilitySuite { Review comment: why add an abstract class that has only one child?
[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
SparkQA commented on pull request #33753: URL: https://github.com/apache/spark/pull/33753#issuecomment-94377 **[Test build #142535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142535/testReport)** for PR 33753 at commit [`de8f15f`](https://github.com/apache/spark/commit/de8f15fb68eb8085f6a1a59b98051bbd55d01519).
[GitHub] [spark] cloud-fan commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH
cloud-fan commented on a change in pull request #33736: URL: https://github.com/apache/spark/pull/33736#discussion_r690045699 ## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ## @@ -79,6 +79,17 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema { """.stripMargin) } + def createTables(): Unit = { Review comment: where else do we call this method?
[GitHub] [spark] huaxingao commented on pull request #33754: [SPARK-36526][SQL] DSV2 Index Support: Add supportsIndex interface
huaxingao commented on pull request #33754: URL: https://github.com/apache/spark/pull/33754#issuecomment-94097 @HyukjinKwon Sorry for the confusion. I didn't put enough explanation in the PR's description. I updated the description. Hope it's clear now.
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-94006 **[Test build #142534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)** for PR 33757 at commit [`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-93489 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47032/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
HyukjinKwon commented on a change in pull request #33753: URL: https://github.com/apache/spark/pull/33753#discussion_r690044302 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ## @@ -222,3 +222,10 @@ private[sql] object AnyTimestampType extends AbstractDataType with Serializable def unapply(e: Expression): Boolean = acceptsType(e.dataType) } + +/** + * The interval type which conforms to the ANSI SQL standard. + * + * @since 3.2.0 Review comment: Let's also remove these since this isn't an API.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
AmplabJenkins removed a comment on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-92450 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142530/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins removed a comment on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-92446 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47028/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source
AmplabJenkins removed a comment on pull request #33588: URL: https://github.com/apache/spark/pull/33588#issuecomment-92447 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47033/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
AmplabJenkins removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-92448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47027/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins removed a comment on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-92449 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142531/
[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
AmplabJenkins commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-92450 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142530/
[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-92446 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47028/
[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
AmplabJenkins commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-92448 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47027/
[GitHub] [spark] AmplabJenkins commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source
AmplabJenkins commented on pull request #33588: URL: https://github.com/apache/spark/pull/33588#issuecomment-92447 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47033/
[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-92449 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142531/
[GitHub] [spark] SparkQA removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA removed a comment on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899985674 **[Test build #142531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f).
[GitHub] [spark] SparkQA removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA removed a comment on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-899985652 **[Test build #142530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569).
[GitHub] [spark] SparkQA commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source
SparkQA commented on pull request #33588: URL: https://github.com/apache/spark/pull/33588#issuecomment-90124 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47033/
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-89228 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47031/
[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-89072 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47027/
[GitHub] [spark] HeartSaVioR commented on a change in pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries
HeartSaVioR commented on a change in pull request #33749: URL: https://github.com/apache/spark/pull/33749#discussion_r690031859 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1586,6 +1586,21 @@ object SQLConf { .stringConf .createWithDefault("lz4") + /** + * Note: this is defined in `RocksDBConf.FORMAT_VERSION`. These two places should be updated + * together. + */ + val STATE_STORE_ROCKSDB_FORMAT_VERSION = +buildConf("spark.sql.streaming.stateStore.rocksdb.formatVersion") + .internal() + .doc("Set the RocksDB format version. This will be stored in the checkpoint when starting " + Review comment: Could we please describe the case when end users want to set the config instead of the default one? Otherwise only a few people can understand how it works and why this configuration exists. ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1586,6 +1586,21 @@ object SQLConf { .stringConf .createWithDefault("lz4") + /** + * Note: this is defined in `RocksDBConf.FORMAT_VERSION`. These two places should be updated + * together. + */ + val STATE_STORE_ROCKSDB_FORMAT_VERSION = +buildConf("spark.sql.streaming.stateStore.rocksdb.formatVersion") + .internal() + .doc("Set the RocksDB format version. This will be stored in the checkpoint when starting " + +"a streaming query. If this configuration is not set, we will use the value in the " + +"checkpoint when restarting a streaming query.") + .version("3.2.0") + .intConf + .checkValue(_ >= 0, "Must not be negative") + .createWithDefault(5) Review comment: It may be worth having a single-line comment that 5 is the latest table format version for RocksDB 6.20.3.
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala ## @@ -497,23 +497,38 @@ case class RocksDBConf( blockSizeKB: Long, blockCacheSizeMB: Long, lockAcquireTimeoutMs: Long, -resetStatsOnLoad : Boolean) +resetStatsOnLoad : Boolean, +formatVersion: Int) object RocksDBConf { /** Common prefix of all confs in SQLConf that affects RocksDB */ val ROCKSDB_CONF_NAME_PREFIX = "spark.sql.streaming.stateStore.rocksdb" - private case class ConfEntry(name: String, default: String) { -def fullName: String = s"$ROCKSDB_CONF_NAME_PREFIX.${name}".toLowerCase(Locale.ROOT) + case class ConfEntry(name: String, default: String) { +def fullName: String = s"$ROCKSDB_CONF_NAME_PREFIX.${name}" } // Configuration that specifies whether to compact the RocksDB data every time data is committed - private val COMPACT_ON_COMMIT_CONF = ConfEntry("compactOnCommit", "false") + val COMPACT_ON_COMMIT_CONF = ConfEntry("compactOnCommit", "false") private val PAUSE_BG_WORK_FOR_COMMIT_CONF = ConfEntry("pauseBackgroundWorkForCommit", "true") private val BLOCK_SIZE_KB_CONF = ConfEntry("blockSizeKB", "4") private val BLOCK_CACHE_SIZE_MB_CONF = ConfEntry("blockCacheSizeMB", "8") - private val LOCK_ACQUIRE_TIMEOUT_MS_CONF = ConfEntry("lockAcquireTimeoutMs", "6") + val LOCK_ACQUIRE_TIMEOUT_MS_CONF = ConfEntry("lockAcquireTimeoutMs", "6") private val RESET_STATS_ON_LOAD = ConfEntry("resetStatsOnLoad", "true") + // Configuration to set the RocksDB format version. When upgrading the RocksDB version in Spark, Review comment: Nice explanation! It would be nice if we could refer to this from the config in SQLConf, which is closer to user-facing: although it's marked as internal, users will find the config in SQLConf first rather than this one.
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala ## @@ -62,8 +62,9 @@ class RocksDBStateStoreSuite extends StateStoreSuiteBase[RocksDBStateStoreProvid val testConfs = Seq( ("spark.sql.streaming.stateStore.providerClass", classOf[RocksDBStateStoreProvider].getName), -(RocksDBConf.ROCKSDB_CONF_NAME_PREFIX + ".compactOnCommit", "true"), -(RocksDBConf.ROCKSDB_CONF_NAME_PREFIX + ".lockAcquireTimeoutMs", "10") +(RocksDBConf.COMPACT_ON_COMMIT_CONF.fullName, "true"), Review comment: Should we remove this as well in RocksDBConf if we want to have consistent behavior, "case sensitive"? `val confs = CaseInsensitiveMap[String](storeConf.confs)` ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1586,6 +1586,21 @@ object SQLConf { .stringConf .createWithDefault("lz4") + /** + * Note: this is defined in `RocksDBConf.FORMAT_VERSION`. These two places should be updated + * together. + */ + val STATE_STORE_ROCKSDB_FORMAT_VERSION = +buildConf("spark.sql.streaming.stateStore.rocksdb.formatVersion") + .internal() + .doc("Set
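The case-sensitivity question in the review comment on `RocksDBStateStoreSuite.scala` can be illustrated with a small standalone sketch. It uses only the Scala standard library: `lookupIgnoringCase` is a hypothetical stand-in for a case-insensitive map and is not Spark's `CaseInsensitiveMap`, and `ConfKeyCaseSketch` and its helper names are invented for illustration. Only the prefix string and the `compactOnCommit` name come from the diff.

```scala
import java.util.Locale

object ConfKeyCaseSketch {
  // Same prefix string that appears in the quoted diff.
  val Prefix = "spark.sql.streaming.stateStore.rocksdb"

  // Key built the way the removed line did it: the whole name lowercased.
  def lowercasedName(name: String): String =
    s"$Prefix.$name".toLowerCase(Locale.ROOT)

  // Key built the way the added line does it: case preserved.
  def exactName(name: String): String = s"$Prefix.$name"

  // Hypothetical stand-in for a case-insensitive map (NOT Spark's class):
  // lowercase both the stored keys and the requested key before comparing.
  def lookupIgnoringCase(confs: Map[String, String], key: String): Option[String] = {
    val normalized = confs.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    normalized.get(key.toLowerCase(Locale.ROOT))
  }

  def main(args: Array[String]): Unit = {
    // A user sets the conf with its documented camel-case spelling.
    val userConfs = Map(s"$Prefix.compactOnCommit" -> "true")

    // Exact key against a plain Map: found.
    println(userConfs.get(exactName("compactOnCommit")))
    // Lowercased key against a plain Map: not found, the spelling differs.
    println(userConfs.get(lowercasedName("compactOnCommit")))
    // Lowercased key with a case-insensitive lookup: found again.
    println(lookupIgnoringCase(userConfs, lowercasedName("compactOnCommit")))
  }
}
```

Under these assumptions, a lowercased key only resolves when the lookup itself ignores case, so once the key stops being lowercased the case-insensitive wrapper is what decides whether differently cased spellings still match. That is one reading of why the reviewer asks whether both should change together.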
[GitHub] [spark] sumeetgajjar commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
sumeetgajjar commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-87315 > I just realized this bug does cause a real problem when working in conjunction with #24533. Basically, the re-registration issue leads the driver to think an executor is alive while it's actually dead, which in turn causes the client to retry the block on the dead executor when it shouldn't. Could you @sumeetgajjar backport this fix to 3.1/3.0 as well? > cc @mridulm @attilapiros @Ngone51, sure I will backport it to 3.1 and 3.0 as well.
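The failure mode described in the quoted comment (a late registration making the driver treat a stopped executor as alive) can be modelled with a toy sketch. Everything here is hypothetical: `ToyMaster`, `stopExecutor`, `register`, and `believesAlive` are illustrative names, not Spark classes or methods, and the guard is only one plausible way to reject such registrations, not necessarily how the PR does it.

```scala
import scala.collection.mutable

object ReRegistrationSketch {
  // Toy stand-in for driver-side bookkeeping; NOT Spark's implementation.
  final class ToyMaster(rejectStopped: Boolean) {
    private val stopped = mutable.Set.empty[String]
    private val registered = mutable.Set.empty[String]

    // The driver decides to stop an executor and forgets its block manager.
    def stopExecutor(executorId: String): Unit = {
      stopped += executorId
      registered -= executorId
    }

    // A registration request arrives; returns true if it was accepted.
    def register(executorId: String): Boolean = {
      if (rejectStopped && stopped.contains(executorId)) {
        false
      } else {
        registered += executorId
        true
      }
    }

    // Whether the driver currently believes the executor's block manager is alive.
    def believesAlive(executorId: String): Boolean = registered.contains(executorId)
  }

  def main(args: Array[String]): Unit = {
    // Without the guard, a late registration resurrects the stopped executor.
    val unguarded = new ToyMaster(rejectStopped = false)
    unguarded.register("exec-1")
    unguarded.stopExecutor("exec-1")
    unguarded.register("exec-1")
    println(unguarded.believesAlive("exec-1"))

    // With the guard, the late registration is rejected.
    val guarded = new ToyMaster(rejectStopped = true)
    guarded.register("exec-1")
    guarded.stopExecutor("exec-1")
    guarded.register("exec-1")
    println(guarded.believesAlive("exec-1"))
  }
}
```

In this toy model the unguarded master ends up with a stale entry for `exec-1`, which corresponds to the situation where a client would retry a block against an executor that is already gone.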
[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
SparkQA commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-84735 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47028/
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-84506 **[Test build #142531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-83976 **[Test build #142530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] itholic edited a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
itholic edited a comment on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-899987424 ~I'm fixing CategoricalIndexTest.~ Fixed
[GitHub] [spark] MaxGekk commented on a change in pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types
MaxGekk commented on a change in pull request #33753: URL: https://github.com/apache/spark/pull/33753#discussion_r690034234

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala ##
@@ -222,3 +222,11 @@ private[sql] object AnyTimestampType extends AbstractDataType with Serializable
   def unapply(e: Expression): Boolean = acceptsType(e.dataType)
 }
+
+/**
+ * The interval type which conforms to the ANSI SQL standard.
+ *
+ * @since 3.2.0
+ */
+@Unstable

Review comment:
ok. Let me remove `@Unstable`.
[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
SparkQA commented on pull request #33758: URL: https://github.com/apache/spark/pull/33758#issuecomment-80871 **[Test build #142533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142533/testReport)** for PR 33758 at commit [`76c697e`](https://github.com/apache/spark/commit/76c697e7fac97945087452916c5f929e8a89f880).
[GitHub] [spark] gengliangwang opened a new pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
gengliangwang opened a new pull request #33758: URL: https://github.com/apache/spark/pull/33758

### What changes were proposed in this pull request?

Revert [[SPARK-35028][SQL] ANSI mode: disallow group by aliases](https://github.com/apache/spark/pull/32129)

### Why are the changes needed?

It turns out that many users are using the group by alias feature. Spark has its own precedence rule when alias names conflict with column names in the GROUP BY clause: always use the table column. This should be reasonable and acceptable.
As we are going to announce ANSI mode GA in Spark 3.2, I suggest allowing the group by alias in ANSI mode.

### Does this PR introduce _any_ user-facing change?

No, the feature is not released yet.

### How was this patch tested?

Unit tests
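The precedence rule mentioned in the PR description above (a table column wins over a select-list alias of the same name) can be illustrated with a minimal Python sketch. This is not Spark's analyzer code: `resolve_group_by`, `table_columns`, and `select_aliases` are hypothetical names invented for this illustration, and case sensitivity and nested fields are ignored.

```python
def resolve_group_by(name, table_columns, select_aliases):
    """Resolve a GROUP BY name, preferring a table column over a select-list alias.

    Hypothetical helper for illustration only; it models the rule
    "always use the table column" and nothing else.
    """
    if name in table_columns:
        return ("column", name)
    if name in select_aliases:
        return ("alias", select_aliases[name])
    raise KeyError(f"cannot resolve '{name}' in GROUP BY")


# Models: SELECT a + 1 AS b, a * 2 AS c FROM t GROUP BY ...
# where table t has columns a and b.
table_columns = {"a", "b"}
select_aliases = {"b": "a + 1", "c": "a * 2"}

conflict = resolve_group_by("b", table_columns, select_aliases)
alias_only = resolve_group_by("c", table_columns, select_aliases)
print(conflict)    # the name clashes with a column, so the column is used
print(alias_only)  # no clash, so the alias expression is used
```

In this model `GROUP BY b` groups by the table column `b`, not by the aliased expression `a + 1`, while `GROUP BY c` falls back to the alias because no column named `c` exists.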
[GitHub] [spark] itholic commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
itholic commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-899987424 I'm fixing CategoricalIndexTest.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
HyukjinKwon commented on a change in pull request #33323: URL: https://github.com/apache/spark/pull/33323#discussion_r690027199

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##
@@ -970,7 +995,59 @@ class Dataset[T] private[sql](
   }
   /**
-   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.3.0
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+    join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with

Review comment:
Let's just go ahead with a prose then.
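The Scaladoc quoted in the diff above says the join column appears only once in the output, similar to SQL's `JOIN USING`. A minimal Python sketch of that semantics over plain lists of dicts follows. It is not Spark code: `join_using` is a hypothetical helper, it covers only the inner-join case, and it assumes the two sides share no column names other than the join column.

```python
def join_using(left, right, using_column):
    """Inner equi-join of two lists of dict rows; the join column is emitted once.

    Hypothetical illustration of JOIN USING semantics, not the Dataset.join API.
    """
    # Index the right side by join key so each left row finds its matches quickly.
    index = {}
    for row in right:
        index.setdefault(row[using_column], []).append(row)

    out = []
    for lrow in left:
        for rrow in index.get(lrow[using_column], []):
            merged = dict(lrow)  # the join column is taken from the left row
            # Copy every right-side column except the join column itself.
            merged.update({k: v for k, v in rrow.items() if k != using_column})
            out.append(merged)
    return out


people = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 7}]
joined = join_using(people, orders, "id")
print(joined)
```

Each output row has a single `id` key rather than one per side, which is the difference from a join on an explicit `left.id == right.id` predicate that the Scaladoc points out.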
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
AmplabJenkins removed a comment on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899986644 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47030/
[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
AmplabJenkins commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899986644 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47030/
[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899986625 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47030/
[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
SparkQA commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-899986237 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47029/
[GitHub] [spark] SparkQA commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source
SparkQA commented on pull request #33588: URL: https://github.com/apache/spark/pull/33588#issuecomment-899986032 **[Test build #142532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142532/testReport)** for PR 33588 at commit [`ab8d985`](https://github.com/apache/spark/commit/ab8d9854130b3312b2414da749cf1ae0d9950093).
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899985674 **[Test build #142531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f).
[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
SparkQA commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-899985652 **[Test build #142530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source
AmplabJenkins removed a comment on pull request #33588: URL: https://github.com/apache/spark/pull/33588#issuecomment-899410081
[GitHub] [spark] Ngone51 edited a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
Ngone51 edited a comment on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-899979045 I just realized this bug does cause the real problem when working in conjunction with https://github.com/apache/spark/pull/24533. Basically, the re-registration issue leads to the driver thinks an executor is alive while it's actually dead, which in turn causes the client to retry the block on the dead executor, while it shouldn't. Could you @sumeetgajjar backport this fix to 3.1/3.0 as well? cc @mridulm @attilapiros
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins removed a comment on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899983343 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47026/
[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899983343 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47026/
[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
SparkQA commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899983260 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47026/
[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
SparkQA commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-899983001 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47028/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins removed a comment on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899982648 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47025/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins removed a comment on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-899973551
[GitHub] [spark] brandondahler commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads
brandondahler commented on a change in pull request #33323: URL: https://github.com/apache/spark/pull/33323#discussion_r690022929

## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ##
@@ -970,7 +995,59 @@ class Dataset[T] private[sql](
   }
   /**
-   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.3.0
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+    join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with

Review comment:
The problem with the simple `[[join]]` link is that there's 8 total overloads that match that target reference.
[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask
AmplabJenkins commented on pull request #33744: URL: https://github.com/apache/spark/pull/33744#issuecomment-899982648 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47025/
[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
AmplabJenkins commented on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-899982652 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142528/
[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc
SparkQA commented on pull request #33748: URL: https://github.com/apache/spark/pull/33748#issuecomment-899980920 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47027/
[GitHub] [spark] SparkQA removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
SparkQA removed a comment on pull request #33735: URL: https://github.com/apache/spark/pull/33735#issuecomment-899965311
[GitHub] [spark] HyukjinKwon commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
HyukjinKwon commented on pull request #33757: URL: https://github.com/apache/spark/pull/33757#issuecomment-899980252 cc @xinrong-databricks @ueshin FYI
[GitHub] [spark] Ngone51 edited a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
Ngone51 edited a comment on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-899979045 I just realized this bug does cause the real problem when working in conjunction with https://github.com/apache/spark/pull/24533. Basically, the re-registration issue leads to the driver thinks an executor is alive while it's actually dead, which in turn causes the client to retry the block fetching on a dead executor, while it shouldn't. Could you @sumeetgajjar backport this fix to 3.1/3.0 as well? cc @mridulm @attilapiros
[GitHub] [spark] itholic opened a new pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3
itholic opened a new pull request #33757: URL: https://github.com/apache/spark/pull/33757

### What changes were proposed in this pull request?

This PR proposes to fix the behavior of `astype` for `CategoricalDtype` to follow pandas 1.3.

**Before:**
```python
>>> pcat
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

>>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
0    a
1    b
2    c
dtype: category
Categories (3, object): ['b', 'c', 'a']
```

**After:**
```python
>>> pcat
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

>>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
0    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']  # CategoricalDtype is not updated if dtype is the same
```

`CategoricalDtype` is treated as the same `dtype` if the unique values are the same.

```python
>>> pcat1 = pser.astype(CategoricalDtype(["b", "c", "a"]))
>>> pcat2 = pser.astype(CategoricalDtype(["a", "b", "c"]))
>>> pcat1.dtype == pcat2.dtype
True
```

### Why are the changes needed?

We should follow the latest pandas as much as possible.

### Does this PR introduce _any_ user-facing change?

Yes, the behavior changes as shown in the example in the PR description.

### How was this patch tested?

Unit tests
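The equality rule in the PR description above (two categorical dtypes compare equal when their unique values are the same) and its effect on `astype` can be modelled with a small pure-Python sketch. `ToyCategoricalDtype` and `astype_categorical` are hypothetical stand-ins written for this illustration; they are neither the pandas classes nor the code in this PR. The handling of `ordered=True` is an assumption based on documented pandas behavior.

```python
class ToyCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype; not the pandas class."""

    def __init__(self, categories, ordered=False):
        self.categories = tuple(categories)
        self.ordered = ordered

    def __eq__(self, other):
        if not isinstance(other, ToyCategoricalDtype):
            return NotImplemented
        if self.ordered or other.ordered:
            # Assumption: once either side is ordered, category order matters.
            return self.ordered == other.ordered and self.categories == other.categories
        # Unordered: the same set of categories means the same dtype.
        return set(self.categories) == set(other.categories)


def astype_categorical(current, target):
    """Return the dtype after a cast; keep `current` when the two compare equal."""
    return current if current == target else target


abc = ToyCategoricalDtype(["a", "b", "c"])
bca = ToyCategoricalDtype(["b", "c", "a"])
result = astype_categorical(abc, bca)
print(abc == bca)
print(result.categories)
```

Because `abc` and `bca` compare equal in this model, the cast keeps the original category order, which mirrors the "After" output in the PR description.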