[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900043288


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142538/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900042969


   **[Test build #142538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142538/testReport)**
 for PR 33744 at commit 
[`c02382c`](https://github.com/apache/spark/commit/c02382cb13967c6b18621cb4967e7a0330dd0f08).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900041664


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47035/
   





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900041615


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47035/
   





[GitHub] [spark] Ngone51 commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

2021-08-16 Thread GitBox


Ngone51 commented on pull request #33759:
URL: https://github.com/apache/spark/pull/33759#issuecomment-900041178


   This is a bug fix that should be backported to 3.2 (and even 3.0), so cc 
@gengliangwang FYI.





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900040783


   **[Test build #142541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142541/testReport)**
 for PR 33757 at commit 
[`e7c42ab`](https://github.com/apache/spark/commit/e7c42ab7921b003f9733ed814a82d43114023579).





[GitHub] [spark] SparkQA commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

2021-08-16 Thread GitBox


SparkQA commented on pull request #33759:
URL: https://github.com/apache/spark/pull/33759#issuecomment-900040798


   **[Test build #142540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142540/testReport)**
 for PR 33759 at commit 
[`ca6e4c0`](https://github.com/apache/spark/commit/ca6e4c0b500ad90a542dbeeb4997779244517438).





[GitHub] [spark] SparkQA commented on pull request #33673: [SPARK-36448][SQL] Exceptions in NoSuchItemException.scala have to be case classes

2021-08-16 Thread GitBox


SparkQA commented on pull request #33673:
URL: https://github.com/apache/spark/pull/33673#issuecomment-900040831


   **[Test build #142542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142542/testReport)**
 for PR 33673 at commit 
[`9457ca5`](https://github.com/apache/spark/commit/9457ca5b2460e7ef5511a78f35f90c999434a808).





[GitHub] [spark] Ngone51 commented on pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

2021-08-16 Thread GitBox


Ngone51 commented on pull request #33759:
URL: https://github.com/apache/spark/pull/33759#issuecomment-900039887


   cc @mridulm @cloud-fan @jiangxb1987 Please take a look, thanks!





[GitHub] [spark] Ngone51 opened a new pull request #33759: [SPARK-36532][CORE] Fix deadlock in CoarseGrainedExecutorBackend.onDisconnected to avoid executor shutdown hang

2021-08-16 Thread GitBox


Ngone51 opened a new pull request #33759:
URL: https://github.com/apache/spark/pull/33759


   
   
   ### What changes were proposed in this pull request?
   
   
   Instead of exiting the executor within the RpcEnv's thread, exit the 
executor in a separate thread.
   
   ### Why are the changes needed?
   
   
   The current way of exiting in `onDisconnected` can cause a deadlock, which 
has exactly the same root cause as https://github.com/apache/spark/pull/12012:
   
   * `onDisconnected` and then `System.exit` are called in sequence on a thread 
of `MessageLoop.threadpool`.
   * `System.exit` triggers the shutdown hooks, and `executor.stop` is one of 
the hooks.
   * `executor.stop` stops the `Dispatcher`, which in turn waits for 
`MessageLoop.threadpool` to shut down.
   * Thus, the thread that runs `System.exit` waits for the hooks to finish, 
while the hook waits for `MessageLoop.threadpool`, and therefore for that same 
thread, to finish. This mutual dependence results in the deadlock.
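
The cycle described in the list above can be reproduced in miniature with plain `java.util.concurrent`. This is only a sketch under stated assumptions, not Spark's code: the single-thread executor stands in for `MessageLoop.threadpool`, the hypothetical helper `stopAndAwait` stands in for the `executor.stop` shutdown hook, and `System.exit` is deliberately left out. Timeouts are used so that the blocked case returns `false` instead of hanging forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ExitDeadlockSketch {

    // Hypothetical stand-in for the shutdown hook: stop the loop and wait
    // (with a timeout) for its threads to finish.
    static boolean stopAndAwait(ExecutorService loop, long timeoutMs)
            throws InterruptedException {
        loop.shutdown();
        return loop.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Problematic shape: the handler waits for the loop to terminate while
    // still running on the loop's own thread, so the wait cannot succeed.
    static boolean stopFromLoopThread() throws Exception {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        Callable<Boolean> handler = () -> stopAndAwait(loop, 200);
        return loop.submit(handler).get();
    }

    // Shape of the fix: the handler hands the wait to a separate thread and
    // returns, which frees the loop thread and lets the loop terminate.
    static boolean stopFromSeparateThread() throws Exception {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        AtomicBoolean terminated = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);
        Runnable handler = () -> {
            Thread exiter = new Thread(() -> {
                try {
                    terminated.set(stopAndAwait(loop, 2000));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }, "exit-sketch");
            exiter.start();
        };
        loop.submit(handler);
        done.await();
        return terminated.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop on loop thread succeeded: " + stopFromLoopThread());
        System.out.println("stop on separate thread succeeded: " + stopFromSeparateThread());
    }
}
```

In the first method the wait times out because the thread being waited on is the one doing the waiting; with an unbounded wait, as in a shutdown hook, that would be a hang. In the second method the same wait completes once the handler has returned.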
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, the executor shutdown won't hang.
   
   ### How was this patch tested?
   
   
   Pass existing tests.
   





[GitHub] [spark] SparkQA commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API

2021-08-16 Thread GitBox


SparkQA commented on pull request #33723:
URL: https://github.com/apache/spark/pull/33723#issuecomment-900034196


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47037/
   





[GitHub] [spark] SparkQA commented on pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox


SparkQA commented on pull request #33736:
URL: https://github.com/apache/spark/pull/33736#issuecomment-900032200


   **[Test build #142539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142539/testReport)**
 for PR 33736 at commit 
[`25a3b60`](https://github.com/apache/spark/commit/25a3b60dc470c5e0a4f1796bde7d25bef567d9d4).





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox


AngersZhuuuu commented on a change in pull request #33736:
URL: https://github.com/apache/spark/pull/33736#discussion_r690071365



##
File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
##
@@ -79,6 +79,17 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema {
""".stripMargin)
   }
 
+  def createTables(): Unit = {

Review comment:
   > where else do we call this method?
   
   PlanStabilitySuite extends TPCDSBase, which creates the tables in 
`beforeAll`. Splitting that step out into `createTables` lets the TPCH suite 
create only the tables it needs.







[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900029261


   **[Test build #142538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142538/testReport)**
 for PR 33744 at commit 
[`c02382c`](https://github.com/apache/spark/commit/c02382cb13967c6b18621cb4967e7a0330dd0f08).





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox


AngersZhuuuu commented on a change in pull request #33736:
URL: https://github.com/apache/spark/pull/33736#discussion_r690070608



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -333,3 +340,22 @@ class TPCDSModifiedPlanStabilityWithStatsSuite extends 
PlanStabilitySuite {
 }
   }
 }
+
+abstract class TPCHPlanStabilitySuiteBase extends PlanStabilitySuite {

Review comment:
   > why add an abstract class that has only one child?
   
   There was a [withState] subclass before, but I found there is no stats data 
for it, so I removed it. I have removed TPCHPlanStabilitySuiteBase now as well.







[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


SparkQA commented on pull request #33753:
URL: https://github.com/apache/spark/pull/33753#issuecomment-900028834


   **[Test build #142537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142537/testReport)**
 for PR 33753 at commit 
[`ac2ce78`](https://github.com/apache/spark/commit/ac2ce78ed0f1532e9d0a0bbd863dcee6ca174177).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33753:
URL: https://github.com/apache/spark/pull/33753#issuecomment-900026956


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47036/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900026945


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47032/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33758:
URL: https://github.com/apache/spark/pull/33758#issuecomment-900026946


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47034/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33758:
URL: https://github.com/apache/spark/pull/33758#issuecomment-900026946


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47034/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900026945


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47032/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33753:
URL: https://github.com/apache/spark/pull/33753#issuecomment-900026956


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47036/
   





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900024067


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47035/
   





[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


SparkQA commented on pull request #33758:
URL: https://github.com/apache/spark/pull/33758#issuecomment-900023997


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47034/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA removed a comment on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-94006


   **[Test build #142534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)**
 for PR 33757 at commit 
[`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).





[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


SparkQA commented on pull request #33753:
URL: https://github.com/apache/spark/pull/33753#issuecomment-900022414


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47036/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


cloud-fan commented on a change in pull request #33758:
URL: https://github.com/apache/spark/pull/33758#discussion_r690064618



##
File path: sql/core/src/test/resources/sql-tests/inputs/ansi/group-analytics.sql
##
@@ -1 +0,0 @@
---IMPORT group-analytics.sql

Review comment:
   good catch!







[GitHub] [spark] HeartSaVioR commented on pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries

2021-08-16 Thread GitBox


HeartSaVioR commented on pull request #33749:
URL: https://github.com/apache/spark/pull/33749#issuecomment-900022027


   cc. @gengliangwang as well, as this PR targets Spark 3.2.





[GitHub] [spark] HeartSaVioR commented on pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries

2021-08-16 Thread GitBox


HeartSaVioR commented on pull request #33749:
URL: https://github.com/apache/spark/pull/33749#issuecomment-900021576


   cc. @viirya 
   





[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-900016995


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47032/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900014181


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142534/
   





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900013873


   **[Test build #142534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)**
 for PR 33757 at commit 
[`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API

2021-08-16 Thread GitBox


SparkQA commented on pull request #33723:
URL: https://github.com/apache/spark/pull/33723#issuecomment-900013805


   **[Test build #142536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142536/testReport)**
 for PR 33723 at commit 
[`38d98ed`](https://github.com/apache/spark/commit/38d98ed9e613f181018bb6266d89ab772db84b4c).





[GitHub] [spark] tanelk commented on pull request #33723: [SPARK-36496][SQL] Remove literals from grouping expressions when using the DataFrame withColumn API

2021-08-16 Thread GitBox


tanelk commented on pull request #33723:
URL: https://github.com/apache/spark/pull/33723#issuecomment-900013436


   pinging @maropu 





[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900011854


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47031/
   





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-900011833


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47031/
   





[GitHub] [spark] HyukjinKwon commented on pull request #33754: [SPARK-36526][SQL] DSV2 Index Support: Add supportsIndex interface

2021-08-16 Thread GitBox


HyukjinKwon commented on pull request #33754:
URL: https://github.com/apache/spark/pull/33754#issuecomment-900011457


   Oh, okay. So it really means the concept of an index on a DBMS table.





[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


SparkQA commented on pull request #33758:
URL: https://github.com/apache/spark/pull/33758#issuecomment-98151


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47034/
   





[GitHub] [spark] gatorsmile commented on a change in pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


gatorsmile commented on a change in pull request #33758:
URL: https://github.com/apache/spark/pull/33758#discussion_r690048714



##
File path: sql/core/src/test/resources/sql-tests/inputs/ansi/group-analytics.sql
##
@@ -1 +0,0 @@
---IMPORT group-analytics.sql

Review comment:
   Do we need to remove the result file?







[GitHub] [spark] tooptoop4 commented on pull request #33332: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker

2021-08-16 Thread GitBox


tooptoop4 commented on pull request #33332:
URL: https://github.com/apache/spark/pull/33332#issuecomment-95346


   can this log level change be merged? @steveloughran 





[GitHub] [spark] tooptoop4 removed a comment on pull request #33332: [SPARK-36147][SQL] Warn if less files visible after stats write in BasicWriteStatsTracker

2021-08-16 Thread GitBox


tooptoop4 removed a comment on pull request #33332:
URL: https://github.com/apache/spark/pull/33332#issuecomment-888246182


   can this log level change be merged?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-94901


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47029/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-94901


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47029/
   





[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


SparkQA commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-94831


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47029/
   





[GitHub] [spark] cloud-fan commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox


cloud-fan commented on a change in pull request #33736:
URL: https://github.com/apache/spark/pull/33736#discussion_r690046057



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -333,3 +340,22 @@ class TPCDSModifiedPlanStabilityWithStatsSuite extends PlanStabilitySuite {
 }
   }
 }
+
+abstract class TPCHPlanStabilitySuiteBase extends PlanStabilitySuite {

Review comment:
   why add an abstract class that has only one child?







[GitHub] [spark] SparkQA commented on pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


SparkQA commented on pull request #33753:
URL: https://github.com/apache/spark/pull/33753#issuecomment-94377


   **[Test build #142535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142535/testReport)** for PR 33753 at commit [`de8f15f`](https://github.com/apache/spark/commit/de8f15fb68eb8085f6a1a59b98051bbd55d01519).





[GitHub] [spark] cloud-fan commented on a change in pull request #33736: [SPARK-35991][SQL] Add PlanStability suite for TPCH

2021-08-16 Thread GitBox


cloud-fan commented on a change in pull request #33736:
URL: https://github.com/apache/spark/pull/33736#discussion_r690045699



##
File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
##
@@ -79,6 +79,17 @@ trait TPCDSBase extends SharedSparkSession with TPCDSSchema {
""".stripMargin)
   }
 
+  def createTables(): Unit = {

Review comment:
   where else do we call this method?







[GitHub] [spark] huaxingao commented on pull request #33754: [SPARK-36526][SQL] DSV2 Index Support: Add supportsIndex interface

2021-08-16 Thread GitBox


huaxingao commented on pull request #33754:
URL: https://github.com/apache/spark/pull/33754#issuecomment-94097


   @HyukjinKwon Sorry for the confusion. I didn't put enough explanation in the 
PR's description. I updated the description. Hope it's clear now.





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-94006


   **[Test build #142534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142534/testReport)** for PR 33757 at commit [`4f55ea0`](https://github.com/apache/spark/commit/4f55ea06192db0c859f86fc924395872d4b076ed).





[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-93489


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47032/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


HyukjinKwon commented on a change in pull request #33753:
URL: https://github.com/apache/spark/pull/33753#discussion_r690044302



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
##
@@ -222,3 +222,10 @@ private[sql] object AnyTimestampType extends AbstractDataType with Serializable
 
   def unapply(e: Expression): Boolean = acceptsType(e.dataType)
 }
+
+/**
+ * The interval type which conforms to the ANSI SQL standard.
+ *
+ * @since 3.2.0

Review comment:
   Let's also remove these since this isn't an API.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-92450


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142530/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-92446


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47028/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33588:
URL: https://github.com/apache/spark/pull/33588#issuecomment-92447


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47033/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-92448


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47027/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-92449


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142531/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-92450


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142530/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-92446


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47028/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-92448


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47027/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33588:
URL: https://github.com/apache/spark/pull/33588#issuecomment-92447


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47033/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-92449


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142531/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA removed a comment on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899985674


   **[Test build #142531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f).





[GitHub] [spark] SparkQA removed a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA removed a comment on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-899985652


   **[Test build #142530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569).





[GitHub] [spark] SparkQA commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source

2021-08-16 Thread GitBox


SparkQA commented on pull request #33588:
URL: https://github.com/apache/spark/pull/33588#issuecomment-90124


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47033/
   





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-89228


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47031/
   





[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


SparkQA commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-89072


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47027/
   





[GitHub] [spark] HeartSaVioR commented on a change in pull request #33749: [SPARK-36519][SS]Store RocksDB format version in the checkpoint for streaming queries

2021-08-16 Thread GitBox


HeartSaVioR commented on a change in pull request #33749:
URL: https://github.com/apache/spark/pull/33749#discussion_r690031859



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1586,6 +1586,21 @@ object SQLConf {
   .stringConf
   .createWithDefault("lz4")
 
+  /**
+   * Note: this is defined in `RocksDBConf.FORMAT_VERSION`. These two places should be updated
+   * together.
+   */
+  val STATE_STORE_ROCKSDB_FORMAT_VERSION =
+buildConf("spark.sql.streaming.stateStore.rocksdb.formatVersion")
+  .internal()
+  .doc("Set the RocksDB format version. This will be stored in the checkpoint when starting " +

Review comment:
   Could we please describe the case in which end users would want to set this 
config instead of using the default? Otherwise only a few people will understand 
how it works and why this configuration exists.

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -1586,6 +1586,21 @@ object SQLConf {
   .stringConf
   .createWithDefault("lz4")
 
+  /**
+   * Note: this is defined in `RocksDBConf.FORMAT_VERSION`. These two places should be updated
+   * together.
+   */
+  val STATE_STORE_ROCKSDB_FORMAT_VERSION =
+buildConf("spark.sql.streaming.stateStore.rocksdb.formatVersion")
+  .internal()
+  .doc("Set the RocksDB format version. This will be stored in the checkpoint when starting " +
+"a streaming query. If this configuration is not set, we will use the value in the " +
+"checkpoint when restarting a streaming query.")
+  .version("3.2.0")
+  .intConf
+  .checkValue(_ >= 0, "Must not be negative")
+  .createWithDefault(5)

Review comment:
   It may be worth adding a single-line comment noting that 5 is the latest 
table format version for RocksDB 6.20.3.
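
   The fallback described in the doc string above can be sketched roughly as 
follows. This is only an illustration: `FormatVersionResolution` and `resolve` 
are hypothetical names, not Spark APIs, and the precedence (explicit setting, 
then checkpoint value, then the default of 5) is an assumption read from the 
doc string rather than something the PR confirms.

```scala
// Hypothetical sketch; none of these names exist in Spark.
object FormatVersionResolution {
  // Default taken from the diff above (`createWithDefault(5)`).
  val DefaultFormatVersion: Int = 5

  // Assumed precedence: explicit setting, then checkpoint value, then default.
  def resolve(explicit: Option[Int], fromCheckpoint: Option[Int]): Int = {
    val version = explicit.orElse(fromCheckpoint).getOrElse(DefaultFormatVersion)
    // Mirrors the `checkValue(_ >= 0, "Must not be negative")` guard in the diff.
    require(version >= 0, "Must not be negative")
    version
  }

  def main(args: Array[String]): Unit = {
    println(resolve(None, None))       // new query, nothing set
    println(resolve(None, Some(4)))    // restart, value read from the checkpoint
    println(resolve(Some(3), Some(4))) // explicit setting (assumed to win)
  }
}
```

   A doc string that spells out this order would cover the "when would an end 
user set it" question.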

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
##
@@ -497,23 +497,38 @@ case class RocksDBConf(
 blockSizeKB: Long,
 blockCacheSizeMB: Long,
 lockAcquireTimeoutMs: Long,
-resetStatsOnLoad : Boolean)
+resetStatsOnLoad : Boolean,
+formatVersion: Int)
 
 object RocksDBConf {
   /** Common prefix of all confs in SQLConf that affects RocksDB */
   val ROCKSDB_CONF_NAME_PREFIX = "spark.sql.streaming.stateStore.rocksdb"
 
-  private case class ConfEntry(name: String, default: String) {
-def fullName: String = s"$ROCKSDB_CONF_NAME_PREFIX.${name}".toLowerCase(Locale.ROOT)
+  case class ConfEntry(name: String, default: String) {
+def fullName: String = s"$ROCKSDB_CONF_NAME_PREFIX.${name}"
   }
 
   // Configuration that specifies whether to compact the RocksDB data every 
time data is committed
-  private val COMPACT_ON_COMMIT_CONF = ConfEntry("compactOnCommit", "false")
+  val COMPACT_ON_COMMIT_CONF = ConfEntry("compactOnCommit", "false")
   private val PAUSE_BG_WORK_FOR_COMMIT_CONF = 
ConfEntry("pauseBackgroundWorkForCommit", "true")
   private val BLOCK_SIZE_KB_CONF = ConfEntry("blockSizeKB", "4")
   private val BLOCK_CACHE_SIZE_MB_CONF = ConfEntry("blockCacheSizeMB", "8")
-  private val LOCK_ACQUIRE_TIMEOUT_MS_CONF = ConfEntry("lockAcquireTimeoutMs", 
"6")
+  val LOCK_ACQUIRE_TIMEOUT_MS_CONF = ConfEntry("lockAcquireTimeoutMs", "6")
   private val RESET_STATS_ON_LOAD = ConfEntry("resetStatsOnLoad", "true")
+  // Configuration to set the RocksDB format version. When upgrading the 
RocksDB version in Spark,

Review comment:
   Nice explanation! It would be nice if we could refer to this from the config in 
SQLConf, which is closer to being user-facing: even though it's marked as internal, 
users will find the config in SQLConf first rather than this one.

##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreSuite.scala
##
@@ -62,8 +62,9 @@ class RocksDBStateStoreSuite extends StateStoreSuiteBase[RocksDBStateStoreProvid
   val testConfs = Seq(
     ("spark.sql.streaming.stateStore.providerClass",
       classOf[RocksDBStateStoreProvider].getName),
-    (RocksDBConf.ROCKSDB_CONF_NAME_PREFIX + ".compactOnCommit", "true"),
-    (RocksDBConf.ROCKSDB_CONF_NAME_PREFIX + ".lockAcquireTimeoutMs", "10")
+    (RocksDBConf.COMPACT_ON_COMMIT_CONF.fullName, "true"),

Review comment:
   If we want consistent, case-sensitive behavior, should we remove this from 
RocksDBConf as well?
   `val confs = CaseInsensitiveMap[String](storeConf.confs)`
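
To illustrate the consistency question, here is a small self-contained Scala sketch. `ConfLookupSketch` and `LowerCasedConfs` are hypothetical stand-ins (the latter only imitates a case-insensitive map by lower-casing keys; it is not Spark's `CaseInsensitiveMap`), and the `ConfEntry` mirrors the post-change definition in the diff, where `fullName` is no longer lower-cased.

```scala
import java.util.Locale

object ConfLookupSketch {
  val Prefix = "spark.sql.streaming.stateStore.rocksdb"

  // Mirrors the ConfEntry from the diff after the change: fullName keeps its original case.
  case class ConfEntry(name: String, default: String) {
    def fullName: String = s"$Prefix.$name"
  }

  // Stand-in for a case-insensitive map: keys are lower-cased when stored and when looked up.
  final class LowerCasedConfs(raw: Map[String, String]) {
    private val normalised = raw.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    def get(key: String): Option[String] = normalised.get(key.toLowerCase(Locale.ROOT))
  }

  val compactOnCommit: ConfEntry = ConfEntry("compactOnCommit", "false")
}
```

With a plain `Map`, a key written in all lower case is not found via `compactOnCommit.fullName`, whereas the lower-casing wrapper still finds it. That difference is the behavior the comment asks to make consistent.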


[GitHub] [spark] sumeetgajjar commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-08-16 Thread GitBox


sumeetgajjar commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-87315


   > I just realized this bug does cause a real problem when combined with 
#24533. Basically, the re-registration issue leads the driver to think an 
executor is alive when it is actually dead, which in turn causes the client to 
retry the block on the dead executor when it shouldn't. @sumeetgajjar, could 
you backport this fix to 3.1/3.0 as well?
   > cc @mridulm @attilapiros
   
   @Ngone51, sure I will backport it to 3.1 and 3.0 as well.





[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


SparkQA commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-84735


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47028/
   





[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-84506


   **[Test build #142531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-83976


   **[Test build #142530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] itholic edited a comment on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


itholic edited a comment on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-899987424


   ~I'm fixing CategoricalIndexTest.~
   
   Fixed





[GitHub] [spark] MaxGekk commented on a change in pull request #33753: [SPARK-36524][SQL] Common class for ANSI interval types

2021-08-16 Thread GitBox


MaxGekk commented on a change in pull request #33753:
URL: https://github.com/apache/spark/pull/33753#discussion_r690034234



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
##
@@ -222,3 +222,11 @@ private[sql] object AnyTimestampType extends AbstractDataType with Serializable
 
   def unapply(e: Expression): Boolean = acceptsType(e.dataType)
 }
+
+/**
+ * The interval type which conforms to the ANSI SQL standard.
+ *
+ * @since 3.2.0
+ */
+@Unstable

Review comment:
   ok. Let me remove `@Unstable`.







[GitHub] [spark] SparkQA commented on pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


SparkQA commented on pull request #33758:
URL: https://github.com/apache/spark/pull/33758#issuecomment-80871


   **[Test build #142533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142533/testReport)** for PR 33758 at commit [`76c697e`](https://github.com/apache/spark/commit/76c697e7fac97945087452916c5f929e8a89f880).





[GitHub] [spark] gengliangwang opened a new pull request #33758: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

2021-08-16 Thread GitBox


gengliangwang opened a new pull request #33758:
URL: https://github.com/apache/spark/pull/33758


   
   
   ### What changes were proposed in this pull request?
   
   Revert [[SPARK-35028][SQL] ANSI mode: disallow group by aliases](https://github.com/apache/spark/pull/32129)
   
   ### Why are the changes needed?
   
   It turns out that many users rely on the group-by-alias feature. Spark has 
its own precedence rule for when an alias name conflicts with a column name in 
the GROUP BY clause: always use the table column. This is reasonable and acceptable.
   
   As we are going to announce ANSI mode GA in Spark 3.2, I suggest allowing 
the group by alias in ANSI mode.
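
The precedence rule can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: `GroupByResolutionSketch`, `TableColumn`, `SelectAlias`, and `resolve` are hypothetical names rather than Spark analyzer classes, and case sensitivity of identifiers is ignored.

```scala
object GroupByResolutionSketch {
  sealed trait Resolved
  final case class TableColumn(name: String) extends Resolved
  final case class SelectAlias(name: String, expression: String) extends Resolved

  /** Resolves a GROUP BY name: table columns are tried first, select-list aliases second. */
  def resolve(
      name: String,
      tableColumns: Set[String],
      aliases: Map[String, String]): Option[Resolved] = {
    if (tableColumns.contains(name)) Some(TableColumn(name))
    else aliases.get(name).map(expr => SelectAlias(name, expr))
  }
}
```

For a table with columns `a` and `b` and a select list containing `b + 1 AS a`, grouping by `a` resolves to the table column in this sketch, while grouping by an alias that matches no column resolves to the aliased expression.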
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, the feature is not released yet.
   
   ### How was this patch tested?
   
   Unit tests





[GitHub] [spark] itholic commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


itholic commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-899987424


   I'm fixing CategoricalIndexTest.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-08-16 Thread GitBox


HyukjinKwon commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r690027199



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -970,7 +995,59 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, `left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.3.0
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+    join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with

Review comment:
   Let's just go ahead with prose then.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-899986644


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47030/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-899986644


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47030/
   





[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


SparkQA commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-899986625


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47030/
   





[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


SparkQA commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-899986237


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47029/
   





[GitHub] [spark] SparkQA commented on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source

2021-08-16 Thread GitBox


SparkQA commented on pull request #33588:
URL: https://github.com/apache/spark/pull/33588#issuecomment-899986032


   **[Test build #142532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142532/testReport)** for PR 33588 at commit [`ab8d985`](https://github.com/apache/spark/commit/ab8d9854130b3312b2414da749cf1ae0d9950093).





[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899985674


   **[Test build #142531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142531/testReport)** for PR 33744 at commit [`b239373`](https://github.com/apache/spark/commit/b239373f9e04924fc04a4417368a3097246e5d8f).





[GitHub] [spark] SparkQA commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


SparkQA commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-899985652


   **[Test build #142530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/142530/testReport)** for PR 33757 at commit [`11b9feb`](https://github.com/apache/spark/commit/11b9feb5b6a633c53b91f2f47851f20e57224569).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33588: [SPARK-36346][SQL] Support TimestampNTZ type in Orc file source

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33588:
URL: https://github.com/apache/spark/pull/33588#issuecomment-899410081









[GitHub] [spark] Ngone51 edited a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-08-16 Thread GitBox


Ngone51 edited a comment on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-899979045


   I just realized this bug does cause a real problem when combined with 
https://github.com/apache/spark/pull/24533. Basically, the re-registration 
issue leads the driver to think an executor is alive when it is actually dead, 
which in turn causes the client to retry the block on the dead executor when 
it shouldn't. @sumeetgajjar, could you backport this fix to 3.1/3.0 as well?
   cc @mridulm @attilapiros 





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899983343


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47026/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899983343


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47026/
   





[GitHub] [spark] SparkQA commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


SparkQA commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899983260


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47026/
   





[GitHub] [spark] SparkQA commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


SparkQA commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-899983001


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47028/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899982648


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47025/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins removed a comment on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-899973551









[GitHub] [spark] brandondahler commented on a change in pull request #33323: [SPARK-35739][SQL] Add Java-compatible Dataset.join overloads

2021-08-16 Thread GitBox


brandondahler commented on a change in pull request #33323:
URL: https://github.com/apache/spark/pull/33323#discussion_r690022929



##
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##
@@ -970,7 +995,59 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * Equi-join with another `DataFrame` using the given column. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
+   *
+   * Different from other join functions, the join column will only appear once in the output,
+   * i.e. similar to SQL's `JOIN USING` syntax.
+   *
+   * @param right Right side of the join operation.
+   * @param usingColumn Name of the column to join on. This column must exist on both sides.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   *                 `inner`, `cross`, `outer`, `full`, `fullouter`, `full_outer`, `left`,
+   *                 `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`,
+   *                 `semi`, `leftsemi`, `left_semi`, `anti`, `leftanti`, `left_anti`.
+   *
+   * @note If you perform a self-join using this function without aliasing the input
+   * `DataFrame`s, you will NOT be able to reference any columns after the join, since
+   * there is no way to disambiguate which side of the join you would like to reference.
+   *
+   * @group untypedrel
+   * @since 3.3.0
+   */
+  def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame = {
+    join(right, Seq(usingColumn), joinType)
+  }
+
+  /**
+   * (Java-specific) Equi-join with another `DataFrame` using the given columns. A cross join with

Review comment:
   The problem with the simple `[[join]]` link is that there are 8 overloads in 
total that match that target reference.
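
For context, the overload pattern under discussion looks roughly like the self-contained sketch below. `JoinOverloadSketch`, `Table`, and `JoinPlan` are toy stand-ins rather than Spark classes, and the `Array[String]` overload is only an assumed shape for a Java-friendly variant; the single-column overload delegates to the `Seq` one as in the quoted diff.

```scala
object JoinOverloadSketch {
  // Toy stand-in for the join result: it only records what was requested.
  final case class JoinPlan(usingColumns: Seq[String], joinType: String)

  // Toy stand-in for Dataset; the right side is accepted but otherwise unused.
  final class Table {
    // Scala-friendly overload that does the actual work.
    def join(right: Table, usingColumns: Seq[String], joinType: String): JoinPlan =
      JoinPlan(usingColumns, joinType)

    // Single-column overload, delegating to the Seq overload as in the quoted diff.
    def join(right: Table, usingColumn: String, joinType: String): JoinPlan =
      join(right, Seq(usingColumn), joinType)

    // Assumed Java-friendly overload: an array is easy to build from Java code.
    def join(right: Table, usingColumns: Array[String], joinType: String): JoinPlan =
      join(right, usingColumns.toSeq, joinType)
  }
}
```

All three methods share the name `join`, so a bare `[[join]]` Scaladoc link cannot say which one is meant. Describing the target overload in prose avoids that ambiguity.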







[GitHub] [spark] AmplabJenkins commented on pull request #33744: [SPARK-36403][PYTHON] Implement Index.putmask

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33744:
URL: https://github.com/apache/spark/pull/33744#issuecomment-899982648


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47025/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


AmplabJenkins commented on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-899982652


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/142528/
   





[GitHub] [spark] SparkQA commented on pull request #33748: [SPARK-36516][SQL] Add File Metadata cache support for Orc

2021-08-16 Thread GitBox


SparkQA commented on pull request #33748:
URL: https://github.com/apache/spark/pull/33748#issuecomment-899980920


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47027/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33735: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string

2021-08-16 Thread GitBox


SparkQA removed a comment on pull request #33735:
URL: https://github.com/apache/spark/pull/33735#issuecomment-899965311









[GitHub] [spark] HyukjinKwon commented on pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


HyukjinKwon commented on pull request #33757:
URL: https://github.com/apache/spark/pull/33757#issuecomment-899980252


   cc @xinrong-databricks @ueshin FYI





[GitHub] [spark] Ngone51 edited a comment on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-08-16 Thread GitBox


Ngone51 edited a comment on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-899979045


   I just realized this bug causes a real problem when combined with 
https://github.com/apache/spark/pull/24533. Basically, the re-registration 
issue leads the driver to think an executor is alive when it is actually dead, 
which in turn causes the client to retry block fetching on a dead executor 
when it shouldn't. @sumeetgajjar, could you backport this fix to 3.1/3.0 as 
well?
   cc @mridulm @attilapiros 





[GitHub] [spark] itholic opened a new pull request #33757: [SPARK-36368][PYTHON] Fix CategoricalOps.astype to follow pandas 1.3

2021-08-16 Thread GitBox


itholic opened a new pull request #33757:
URL: https://github.com/apache/spark/pull/33757


   ### What changes were proposed in this pull request?
   
   This PR proposes to fix the behavior of `astype` for `CategoricalDtype` to 
follow pandas 1.3.
   
   
   **Before:**
   ```python
   >>> pcat
   0    a
   1    b
   2    c
   dtype: category
   Categories (3, object): ['a', 'b', 'c']
   
   >>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
   0    a
   1    b
   2    c
   dtype: category
   Categories (3, object): ['b', 'c', 'a']
   ```
   
   **After:**
   ```python
   >>> pcat
   0    a
   1    b
   2    c
   dtype: category
   Categories (3, object): ['a', 'b', 'c']
   
   >>> pcat.astype(CategoricalDtype(["b", "c", "a"]))
   0    a
   1    b
   2    c
   dtype: category
   Categories (3, object): ['a', 'b', 'c']  # CategoricalDtype is not updated if dtype is the same
   ```
   
   A `CategoricalDtype` is treated as the same `dtype` if its unique values are 
the same, regardless of their order.
   
   ```python
   >>> pcat1 = pser.astype(CategoricalDtype(["b", "c", "a"]))
   >>> pcat2 = pser.astype(CategoricalDtype(["a", "b", "c"]))
   >>> pcat1.dtype == pcat2.dtype
   True
   ```
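
The rule described above can be sketched in plain Python, without pandas or Spark. This is only a toy model of the behavior stated in this description: `ToyCategoricalDtype` and `toy_astype` are hypothetical names invented for illustration, not the pandas or pandas-on-Spark implementation, and the model assumes unordered categories.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True, eq=False)
class ToyCategoricalDtype:
    """Hypothetical stand-in for an unordered CategoricalDtype (not the pandas class)."""

    categories: Tuple[str, ...]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyCategoricalDtype):
            return NotImplemented
        # Unordered dtypes compare by their set of unique values, not by order.
        return set(self.categories) == set(other.categories)

    def __hash__(self) -> int:
        # Keep hashing consistent with the order-insensitive equality above.
        return hash(frozenset(self.categories))


def toy_astype(current: ToyCategoricalDtype, target: ToyCategoricalDtype) -> ToyCategoricalDtype:
    """Keep the current dtype when the target compares equal to it; otherwise switch."""
    return current if current == target else target
```

Under this model, casting a dtype with categories `("a", "b", "c")` to one with `("b", "c", "a")` leaves the original category order in place, while casting to a dtype with a different set of values switches to the new dtype.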
   
   ### Why are the changes needed?
   
   We should follow the latest pandas as much as possible.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the behavior changes as shown in the examples in the PR description.
   
   ### How was this patch tested?
   
   Unit tests.




