[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-08 Thread GitBox


zhouyejoe commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r893126752


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -576,6 +661,7 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) {
   } finally {
 partition.closeAllFilesAndDeleteIfNeeded(false);
   }
+  cleanUpAppShufflePartitionInfoInDB(partition.appAttemptShuffleMergeId);

Review Comment:
   I just tried reverting this change locally, and the UT doesn't fail. I will 
double-check why it isn't failing when we write first and then immediately 
remove here, and update. Reopening the issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-08 Thread GitBox


zhouyejoe commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r893125981


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -992,6 +1233,45 @@ AppShufflePartitionInfo getPartitionInfo() {
 }
   }
 
+  /**
+   * Simply encodes an application attempt ID.
+   */
+  public static class AppAttemptId {

Review Comment:
   Added hashCode in the latest commit, and also updated equals to compare the 
cheaper fields before the more expensive ones.
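For readers following along, the "cheaper comparisons first" pattern looks roughly like the sketch below. The field names and shapes are assumed for illustration; this is not the actual `AppAttemptId` code from the PR.

```java
import java.util.Objects;

// Hypothetical shape of an application-attempt key; the real AppAttemptId
// in RemoteBlockPushResolver may differ in fields and naming.
public final class AppAttemptId {
    final String appId;
    final int attemptId;

    AppAttemptId(String appId, int attemptId) {
        this.appId = appId;
        this.attemptId = attemptId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                  // cheapest: reference check
        if (!(o instanceof AppAttemptId)) return false;
        AppAttemptId other = (AppAttemptId) o;
        // Compare the cheap primitive field before the more expensive String,
        // so unequal keys usually bail out without a character-by-character scan.
        return attemptId == other.attemptId && Objects.equals(appId, other.appId);
    }

    @Override
    public int hashCode() {
        // Required alongside equals so instances behave correctly as map keys.
        return Objects.hash(appId, attemptId);
    }

    public static void main(String[] args) {
        System.out.println(new AppAttemptId("app_1", 2).equals(new AppAttemptId("app_1", 2))); // true
        System.out.println(new AppAttemptId("app_1", 2).equals(new AppAttemptId("app_1", 3))); // false
    }
}
```

The short-circuiting `&&` is what makes the ordering pay off: the `String` comparison only runs when the int already matched.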






[GitHub] [spark] dongjoon-hyun commented on pull request #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #36815:
URL: https://github.com/apache/spark/pull/36815#issuecomment-1150727977

   `q5` failure is fixed, but `q4` seems to be non-deterministic for some reason 
in `branch-3.2`. The result seems to oscillate between the new and old values. 
Let me look into it more.
   ```
   - check simplified (tpcds-v1.4/q4) *** FAILED *** (1 second, 36 milliseconds)
   ...
   *** 1 TEST FAILED ***
   Failed: Total 9599, Failed 1, Errors 0, Passed 9598, Ignored 29
   ```





[GitHub] [spark] beliefer commented on pull request #36776: [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` need be translated to predicate too

2022-06-08 Thread GitBox


beliefer commented on PR #36776:
URL: https://github.com/apache/spark/pull/36776#issuecomment-1150725531

   @cloud-fan @huaxingao Thank you for the review.





[GitHub] [spark] singhpk234 commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


singhpk234 commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150725000

   > @singhpk234 is this your first Spark contribution?
   
   yup 
   
   > Can I have your email?
   
   Ahh sure, my email is prashant010...@gmail.com
   
   Many thanks @huaxingao for merging this :) !!! 
   
   Thank you all for your awesome reviews.





[GitHub] [spark] LuciferYang commented on pull request #36807: [SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox


LuciferYang commented on PR #36807:
URL: https://github.com/apache/spark/pull/36807#issuecomment-1150723451

   
[e803525](https://github.com/apache/spark/pull/36807/commits/e8035255231f1ead54f18cb76c9c9c05151abfd6) merged with master and re-ran GA





[GitHub] [spark] huaxingao commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


huaxingao commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150722468

   @singhpk234 is this your first Spark contribution? I tried to assign the 
jira to you but somehow couldn't find you. Then I tried to add you to the Spark 
contributors list but found multiple users with the same name. Can I have 
your email?





[GitHub] [spark] cloud-fan closed pull request #36776: [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` need be translated to predicate too

2022-06-08 Thread GitBox


cloud-fan closed pull request #36776: [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] 
`PushableColumnWithoutNestedColumn` need be translated to predicate too
URL: https://github.com/apache/spark/pull/36776





[GitHub] [spark] cloud-fan commented on pull request #36776: [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` need be translated to predicate too

2022-06-08 Thread GitBox


cloud-fan commented on PR #36776:
URL: https://github.com/apache/spark/pull/36776#issuecomment-1150721826

   thanks, merging to master/3.3!





[GitHub] [spark] huaxingao commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


huaxingao commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150711877

   Merged to master/3.3. Thanks a lot @singhpk234 et al.





[GitHub] [spark] huaxingao closed pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


huaxingao closed pull request #36810: [SPARK-39417][SQL] Handle Null partition 
values in PartitioningUtils
URL: https://github.com/apache/spark/pull/36810





[GitHub] [spark] huaxingao commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


huaxingao commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150709548

   The Python documentation failure is not related to this PR. All the other 
tests pass. I will merge this PR.





[GitHub] [spark] cxzl25 commented on pull request #36808: [SPARK-39415][CORE] Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread GitBox


cxzl25 commented on PR #36808:
URL: https://github.com/apache/spark/pull/36808#issuecomment-1150693473

   Duplicate #32009





[GitHub] [spark] cxzl25 closed pull request #36808: [SPARK-39415][CORE] Local mode supports HadoopDelegationTokenManager

2022-06-08 Thread GitBox


cxzl25 closed pull request #36808: [SPARK-39415][CORE] Local mode supports 
HadoopDelegationTokenManager
URL: https://github.com/apache/spark/pull/36808





[GitHub] [spark] HyukjinKwon commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150686914

   Python docs passed.
   
   Merged to master, branch-3.3 and branch-3.2.





[GitHub] [spark] HyukjinKwon closed pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <0.18 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon closed pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the 
docutils version <0.18 in documentation build
URL: https://github.com/apache/spark/pull/36813





[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


AngersZhuuuu commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893074610


##
core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala:
##
@@ -187,12 +181,6 @@ class OutputCommitCoordinatorSuite extends SparkFunSuite with BeforeAndAfter {
 // The authorized committer now fails, clearing the lock
 outputCommitCoordinator.taskCompleted(stage, stageAttempt, partition,
   attemptNumber = authorizedCommitter, reason = TaskKilled("test"))
-// A new task should now be allowed to become the authorized committer
-assert(outputCommitCoordinator.canCommit(stage, stageAttempt, partition,

Review Comment:
   > We need this level of this for the new behavior
   
   How about current?






[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


AngersZhuuuu commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893072393


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -461,7 +467,8 @@ class SparkContext(config: SparkConf) extends Logging {
 listenerBus.addToStatusQueue(_statusStore.listener.get)
 
 // Create the Spark execution environment (cache, map output tracker, etc)
-_env = createSparkEnv(_conf, isLocal, listenerBus)
+_env = createSparkEnv(_conf, isLocal, listenerBus,
+new OutputCommitCoordinator(conf, true, Option(this)))

Review Comment:
   Done






[GitHub] [spark] cloud-fan commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r893067774


##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java:
##
@@ -196,6 +196,90 @@
  *Since version: 3.4.0
  *   
  *  
+ *  Name: DATE_ADD
+ *   
+ *SQL semantic: DATE_ADD(start_date, num_days)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: DATE_DIFF
+ *   
+ *SQL semantic: DATE_DIFF(end_date, start_date)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: TRUNC
+ *   
+ *SQL semantic: TRUNC(date, format)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: SECOND
+ *   
+ *SQL semantic: SECOND(timestamp, timezone)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: MINUTE
+ *   
+ *SQL semantic: MINUTE(timestamp, timezone)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: HOUR
+ *   
+ *SQL semantic: HOUR(timestamp, timezone)
+ *Since version: 3.4.0
+ *   
+ *  
+ *  Name: MONTH

Review Comment:
   I'm wondering if the `EXTRACT` function is more common than these field 
extraction functions?






[GitHub] [spark] cloud-fan commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r893067221


##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala:
##
@@ -259,6 +259,55 @@ class V2ExpressionBuilder(
   } else {
 None
   }
+case date: DateAdd =>
+  val childrenExpressions = date.children.flatMap(generateExpression(_))
+  if (childrenExpressions.length == date.children.length) {
+    Some(new GeneralScalarExpression("DATE_ADD", childrenExpressions.toArray[V2Expression]))
+  } else {
+    None
+  }
+case date: DateDiff =>
+  val childrenExpressions = date.children.flatMap(generateExpression(_))
+  if (childrenExpressions.length == date.children.length) {
+    Some(new GeneralScalarExpression("DATE_DIFF", childrenExpressions.toArray[V2Expression]))
+  } else {
+    None
+  }
+case date: TruncDate =>
+  val childrenExpressions = date.children.flatMap(generateExpression(_))
+  if (childrenExpressions.length == date.children.length) {
+    Some(new GeneralScalarExpression("TRUNC", childrenExpressions.toArray[V2Expression]))
+  } else {
+    None
+  }
+case Second(child, _) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("SECOND", Array[V2Expression](v)))
+case Minute(child, _) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("MINUTE", Array[V2Expression](v)))
+case Hour(child, _) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("HOUR", Array[V2Expression](v)))
+case Month(child) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("MONTH", Array[V2Expression](v)))
+case Quarter(child) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("QUARTER", Array[V2Expression](v)))
+case Year(child) => generateExpression(child)
+  .map(v => new GeneralScalarExpression("YEAR", Array[V2Expression](v)))
+// The DAY_OF_WEEK function in Spark returns the day of the week for date/timestamp.
+// Database dialects should avoid following ISO semantics when handling DAY_OF_WEEK.
+case DayOfWeek(child) => generateExpression(child)
+case DayOfWeek(child) => generateExpression(child)

Review Comment:
   It's a bit weird to expose Spark-specific functions to the data source. 
Spark does support `extract('dayofweek_iso', ...)`, which ends up with 
`Add(WeekDay(source), Literal(1))`. Shall we match it instead?
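To illustrate the mapping cloud-fan mentions (`dayofweek_iso` ending up as `WeekDay + 1`), here is a minimal `java.time` sketch. The method names are invented for the example, and the Monday=0 convention for Spark's `WeekDay` is taken from the comment above:

```java
import java.time.LocalDate;

public class DowDemo {
    // Spark's WeekDay: Monday = 0 ... Sunday = 6 (per the review comment).
    static int weekDay(LocalDate d) {
        return d.getDayOfWeek().getValue() - 1; // java.time ISO value is Mon=1..Sun=7
    }

    // extract('dayofweek_iso', ...) = WeekDay + 1, i.e. ISO Mon=1..Sun=7.
    static int dayOfWeekIso(LocalDate d) {
        return weekDay(d) + 1;
    }

    public static void main(String[] args) {
        LocalDate monday = LocalDate.of(2022, 6, 6); // 2022-06-06 was a Monday
        System.out.println(weekDay(monday));         // 0
        System.out.println(dayOfWeekIso(monday));    // 1
    }
}
```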






[GitHub] [spark] cloud-fan commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r893062401


##
core/src/main/resources/error/error-classes.json:
##
@@ -551,5 +551,10 @@
   "Writing job aborted"
 ],
 "sqlState" : "4"
+  },
+  "NULL_COMPARISON_RESULT" : {
+"message" : [
  "The comparison result is null. If you want to handle null as 0 (equal), you can set \"\" to \"\"."

Review Comment:
   actually, why not hardcode the config name and value here?






[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893061431


##
core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala:
##
@@ -187,12 +181,6 @@ class OutputCommitCoordinatorSuite extends SparkFunSuite with BeforeAndAfter {
 // The authorized committer now fails, clearing the lock
 outputCommitCoordinator.taskCompleted(stage, stageAttempt, partition,
   attemptNumber = authorizedCommitter, reason = TaskKilled("test"))
-// A new task should now be allowed to become the authorized committer
-assert(outputCommitCoordinator.canCommit(stage, stageAttempt, partition,

Review Comment:
   We need this level of this for the new behavior






[GitHub] [spark] HeartSaVioR commented on pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-08 Thread GitBox


HeartSaVioR commented on PR #36737:
URL: https://github.com/apache/spark/pull/36737#issuecomment-1150658562

   General comment from what I see in the review comments:
   
   I see you repeating the explanation of the code you changed; I don't think 
the reviewers asked for a detailed explanation of the code changes. There is no 
high-level explanation of why it is broken (I roughly see it comes from the 
language spec of the modulo operation), and no high-level explanation of how 
you deal with it in this PR. Please look through the description of the 
reference Flink PR you linked: while it also included a code snippet, it 
explained the problem at a high level first and then introduced the proposed 
code change.
   
   As long as you update the PR description with a high-level explanation, the 
code change should be straightforward to understand, and you'd easily pass the 
reviews.
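A minimal sketch of the modulo pitfall alluded to above (an assumed illustration, not the actual Spark window code): Java's `%` takes the sign of the dividend, so a timestamp earlier than `startTime` yields a negative remainder and the naive window start lands after the timestamp, while `Math.floorMod` always returns a non-negative remainder.

```java
public class WindowStartDemo {
    // Naive computation: wrong for timestamps before startTime, because
    // (ts - start) % slide is negative there and the subtraction overshoots.
    static long naiveWindowStart(long ts, long slide, long start) {
        return ts - (ts - start) % slide;
    }

    // floorMod yields a remainder in [0, slide), so the window start is
    // always the greatest slide boundary <= ts.
    static long fixedWindowStart(long ts, long slide, long start) {
        return ts - Math.floorMod(ts - start, slide);
    }

    public static void main(String[] args) {
        long slide = 10, start = 0, ts = -3;
        System.out.println(naiveWindowStart(ts, slide, start)); // 0  (window begins after ts)
        System.out.println(fixedWindowStart(ts, slide, start)); // -10 (ts falls in [-10, 0))
    }
}
```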





[GitHub] [spark] HyukjinKwon commented on pull request #36816: [SPARK-39425][PYTHON][PS] Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36816:
URL: https://github.com/apache/spark/pull/36816#issuecomment-1150648303

   Thanks for updating the guide @Yikun 





[GitHub] [spark] HyukjinKwon commented on pull request #36814: [SPARK-39422][SQL] Improve error message for 'SHOW CREATE TABLE' with unsupported serdes

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36814:
URL: https://github.com/apache/spark/pull/36814#issuecomment-1150647932

   Hey thanks for letting me know. https://github.com/apache/spark/pull/36813 
should fix that.





[GitHub] [spark] huaxingao commented on pull request #36814: [SPARK-39422][SQL] Improve error message for 'SHOW CREATE TABLE' with unsupported serdes

2022-06-08 Thread GitBox


huaxingao commented on PR #36814:
URL: https://github.com/apache/spark/pull/36814#issuecomment-1150646179

   @HyukjinKwon 
   The Python doc generation failed. I saw the same error in other PRs too.
   ```
   /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
Python doc generation failed (RuntimeError)
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893049422


##
.github/workflows/build_and_test.yml:
##
@@ -547,6 +547,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
 # Pin the MarkupSafe to 2.0.1 to resolve the CI error.
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
 python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'

Review Comment:
   ```suggestion
   python3.9 -m pip install sphinx_plotly_directive 'numpy>=1.20.0' pyarrow pandas 'plotly>=4.8'
   python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
   ```



##
.github/workflows/build_and_test.yml:
##
@@ -547,6 +547,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
 # Pin the MarkupSafe to 2.0.1 to resolve the CI error.
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
+python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421

Review Comment:
   ```suggestion
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893049087


##
dev/requirements.txt:
##
@@ -35,6 +35,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils~=1.7.0

Review Comment:
   ```suggestion
   docutils<0.18.0
   ```



##
.github/workflows/build_and_test.yml:
##
@@ -547,6 +547,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
 # Pin the MarkupSafe to 2.0.1 to resolve the CI error.
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
+python3.9 -m pip install 'docutils~=1.7.0' # See SPARK-39421

Review Comment:
   ```suggestion
   python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893047340


##
dev/requirements.txt:
##
@@ -35,6 +35,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<1.7.0

Review Comment:
   ```suggestion
   docutils~=1.7.0
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893047298


##
.github/workflows/build_and_test.yml:
##
@@ -547,6 +547,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-35375.
 # Pin the MarkupSafe to 2.0.1 to resolve the CI error.
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
+python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421

Review Comment:
   ```suggestion
   python3.9 -m pip install 'docutils~=1.7.0' # See SPARK-39421
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893047077


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421

Review Comment:
   Thanks man






[GitHub] [spark] Yikun commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


Yikun commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893046990


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421

Review Comment:
   See also https://pypi.org/project/docutils/#history






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893046599


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.

Review Comment:
   ```suggestion
   #   See also https://issues.apache.org/jira/browse/SPARK-38279.
   python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421
   ```



##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421

Review Comment:
   ```suggestion
   ```






[GitHub] [spark] itholic commented on pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-08 Thread GitBox


itholic commented on PR #36793:
URL: https://github.com/apache/spark/pull/36793#issuecomment-1150640677

   Otherwise looks good if test passes





[GitHub] [spark] itholic commented on pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-08 Thread GitBox


itholic commented on PR #36793:
URL: https://github.com/apache/spark/pull/36793#issuecomment-1150640100

   Seems like it would be fixed when https://github.com/apache/spark/pull/36813 is merged.
   
   Let's rebase after then.





[GitHub] [spark] Yikun commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.7 in documentation build

2022-06-08 Thread GitBox


Yikun commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893045350


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421

Review Comment:
   successful: 
https://github.com/cxzl25/spark/runs/6794665905?check_suite_focus=true#step:11:466
   
   failed: 
https://github.com/HyukjinKwon/spark/runs/6805386169?check_suite_focus=true#step:11:457
   
   If we confirm it's a `docutils` problem, this might be 
   ```suggestion
   python3.9 -m pip install 'docutils<=0.17.1' # See SPARK-39421
   ```






[GitHub] [spark] Yikun commented on pull request #36816: [SPARK-39425][PYTHON][PS] Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


Yikun commented on PR #36816:
URL: https://github.com/apache/spark/pull/36816#issuecomment-1150637320

   @LuciferYang Thanks for the reminder, I guess I need a rebase after the doctest is fixed.





[GitHub] [spark] LuciferYang commented on pull request #36816: [SPARK-39425][PYTHON][PS] Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


LuciferYang commented on PR #36816:
URL: https://github.com/apache/spark/pull/36816#issuecomment-1150637062

   > @LuciferYang Hmm, this is not to fix the doc test failure :). It's to add the migration doc for [SPARK-38819](https://issues.apache.org/jira/browse/SPARK-38819)
   
   OK 





[GitHub] [spark] Yikun commented on pull request #36816: [SPARK-39425][PYTHON][PS] Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


Yikun commented on PR #36816:
URL: https://github.com/apache/spark/pull/36816#issuecomment-1150636796

   @LuciferYang Hmm, this is not to fix the doc test failure :). It's to add the migration doc for SPARK-38819





[GitHub] [spark] nyingping commented on a diff in pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-08 Thread GitBox


nyingping commented on code in PR #36737:
URL: https://github.com/apache/spark/pull/36737#discussion_r893042194


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3963,8 +3966,10 @@ object TimeWindowing extends Rule[LogicalPlan] {
 
 def getWindow(i: Int, dataType: DataType): Expression = {
   val timestamp = PreciseTimestampConversion(window.timeColumn, 
dataType, LongType)
-  val lastStart = timestamp - (timestamp - window.startTime
-+ window.slideDuration) % window.slideDuration
+  val remainder = (timestamp - window.startTime) % window.slideDuration
+  val lastStart = CaseWhen(
+Seq((LessThan(remainder, 0), timestamp - remainder - 
window.slideDuration))

Review Comment:
   @srowen Sure. As mentioned above, I think this problem is unavoidable because 
we can't control the behavior of users. All we can do is get the right results 
even if users enter illegal data.
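To make the failure mode concrete, here is a pure-Python sketch of the two formulas in the diff above (assumed semantics, not Spark code; `java_rem` emulates Java/Scala `%`, which truncates toward zero, and the timestamps and units are arbitrary):

```python
def java_rem(a: int, b: int) -> int:
    """Remainder with Java/Scala semantics: truncated division,
    so the result is negative when a is negative (b assumed positive)."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r

def last_start_old(ts: int, start: int, slide: int) -> int:
    # Old TimeWindowing formula: breaks when (ts - start) is negative.
    return ts - java_rem(ts - start + slide, slide)

def last_start_fixed(ts: int, start: int, slide: int) -> int:
    # Fixed formula: branch on the sign of the remainder (the CaseWhen above).
    remainder = java_rem(ts - start, slide)
    return ts - remainder - slide if remainder < 0 else ts - remainder

# For a timestamp before the epoch, the old formula places the window
# start *after* the timestamp, which is wrong.
print(last_start_old(-8, 0, 5))    # -5 (incorrect: window start > timestamp)
print(last_start_fixed(-8, 0, 5))  # -10 (correct: -8 falls in [-10, -5))
print(last_start_fixed(7, 0, 5))   # 5 (positive timestamps unchanged)
```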






[GitHub] [spark] LuciferYang commented on pull request #36816: Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


LuciferYang commented on PR #36816:
URL: https://github.com/apache/spark/pull/36816#issuecomment-1150634766

   @Yikun Will the current PR fix 
[SPARK-39424](https://issues.apache.org/jira/browse/SPARK-39424)? 
   
   





[GitHub] [spark] LuciferYang commented on pull request #36807: [WIP][SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox


LuciferYang commented on PR #36807:
URL: https://github.com/apache/spark/pull/36807#issuecomment-1150633518

   There are no official release notes yet
   
   





[GitHub] [spark] Yikun opened a new pull request, #36816: Add migration guide for pandas 1.4 behavior changes

2022-06-08 Thread GitBox


Yikun opened a new pull request, #36816:
URL: https://github.com/apache/spark/pull/36816

   
   
   ### What changes were proposed in this pull request?
   Add migration guide for pandas 1.4 behavior changes:
   * SPARK-39054 https://github.com/apache/spark/pull/36581: In Spark 3.4, if 
the Pandas on Spark API `Groupby.apply`'s `func` parameter return type is not 
specified and `compute.shortcut_limit` is set to `0`, the number of sampling rows will be 
set to 2 (ensuring the sampling rows are always >= 2) to make sure the inferred schema is 
accurate.
   
   * SPARK-38822 https://github.com/apache/spark/pull/36168 In Spark 3.4, if 
the Pandas on Spark API `Index.insert` is out of bounds, an IndexError 
with `index {} is out of bounds for axis 0 with size {}` will be raised to follow pandas 1.4 
behavior.
   
   * SPARK-38857 https://github.com/apache/spark/pull/36159 In Spark 3.4, the 
series name will be preserved in Pandas on Spark API `Series.mode` to follow 
pandas 1.4 behavior.
   
   * SPARK-38859 https://github.com/apache/spark/pull/36142 In Spark 3.4, the 
Pandas on Spark API `Index.__setitem__` will first check whether the `value` type is 
`Column` to avoid raising an unexpected error in `is_list_like` such as `Cannot convert 
column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when 
building DataFrame boolean expressions.`.
   
   * SPARK-38820 https://github.com/apache/spark/pull/36357 In Spark 3.4, the 
Pandas on Spark API `astype('category')` will also refresh `categories.dtype` 
according to original data `dtype` to follow pandas 1.4 behavior.
   
   * SPARK-38947 https://github.com/apache/spark/pull/36464 In Spark 3.4, the 
Pandas on Spark API supports groupby positional indexing in `GroupBy.head` and 
`GroupBy.tail` to follow pandas 1.4. Negative arguments now work correctly and 
result in ranges relative to the end and start of each group. Previously, 
negative arguments returned empty frames.
   
   * SPARK-39317 https://github.com/apache/spark/pull/36699 In Spark 3.4, the 
schema inference process of `groupby.apply` in Pandas on Spark will first infer 
the pandas type to ensure the accuracy of the pandas `dtype` as much as 
possible.
   
   * SPARK-39314 https://github.com/apache/spark/pull/36711 In Spark 3.4, the 
`Series.concat` sort parameter will be respected to follow pandas 1.4 behavior.
   
   
   For the other test-only fixes, I don't add a migration doc: SPARK-38821, 
SPARK-39053, SPARK-38982
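The groupby positional-indexing change (SPARK-38947) can be sketched in pure Python (assumed semantics for illustration, not the actual pandas-on-Spark code): with pandas >= 1.4, `head(n)` with negative `n` keeps all but the last `-n` rows of each group, where it previously returned an empty result.

```python
def group_head(groups: dict, n: int) -> dict:
    """Per-group head with pandas>=1.4 positional semantics:
    n >= 0 keeps the first n rows; n < 0 keeps all but the last -n rows."""
    # Python's slice semantics match the pandas 1.4 behavior exactly:
    # rows[:2] -> first two rows, rows[:-1] -> all but the last row.
    return {key: rows[:n] for key, rows in groups.items()}

groups = {"a": [1, 2, 3], "b": [4, 5]}
print(group_head(groups, 2))   # {'a': [1, 2], 'b': [4, 5]}
print(group_head(groups, -1))  # {'a': [1, 2], 'b': [4]}
```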
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893038044


##
core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala:
##
@@ -187,8 +188,8 @@ class OutputCommitCoordinatorSuite extends SparkFunSuite 
with BeforeAndAfter {
 // The authorized committer now fails, clearing the lock
 outputCommitCoordinator.taskCompleted(stage, stageAttempt, partition,
   attemptNumber = authorizedCommitter, reason = TaskKilled("test"))
-// A new task should now be allowed to become the authorized committer
-assert(outputCommitCoordinator.canCommit(stage, stageAttempt, partition,
+// A new task should not be allowed to become stage failed because of may 
cause data duplication

Review Comment:
   ```suggestion
   // A new task should not be allowed to become stage failed because of 
potential data duplication
   ```



##
core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala:
##
@@ -235,7 +236,8 @@ class OutputCommitCoordinatorSuite extends SparkFunSuite 
with BeforeAndAfter {
 assert(!outputCommitCoordinator.canCommit(stage, 3, partition, 
taskAttempt))
 outputCommitCoordinator.taskCompleted(stage, 1, partition, taskAttempt,
   ExecutorLostFailure("0", exitCausedByApp = true, None))
-assert(outputCommitCoordinator.canCommit(stage, 4, partition, taskAttempt))
+// A new task should not be allowed to become stage failed because of may 
cause data duplication

Review Comment:
   ditto






[GitHub] [spark] cloud-fan commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893037549


##
core/src/main/scala/org/apache/spark/SparkEnv.scala:
##
@@ -423,6 +432,7 @@ object SparkEnv extends Logging {
 
 envInstance
   }
+  // scalastyle:on argcount

Review Comment:
   this can be put right after the parameter list






[GitHub] [spark] gengliangwang commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


gengliangwang commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r893036298


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3828,6 +3828,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_ARRAY_SORT_FAILS_ON_NULL_COMPARISON_RESULT =
+buildConf("spark.sql.legacy.arraySortFailsOnNullComparisonResult")

Review Comment:
   From the new conf name, the "fail" behavior sounds like the legacy one. The 
previous name was good. Or we can follow the other conf style 
`spark.sql.legacy.allowNullComparisonResultInArraySort`.
   Sorry about the previous trivial comment: 
https://github.com/apache/spark/pull/36812#discussion_r892996318






[GitHub] [spark] yaooqinn commented on a diff in pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


yaooqinn commented on code in PR #36810:
URL: https://github.com/apache/spark/pull/36810#discussion_r893034665


##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala:
##
@@ -1259,6 +1259,14 @@ class ParquetV2PartitionDiscoverySuite extends 
ParquetPartitionDiscoverySuite {
 assert("p_int=10/p_float=1.0" === path)
   }
 
+  test("SPARK-39417: Null partition value") {
+// null partition value are replace by DEFAULT_PARTITION_NAME before 
hitting getPathFragment.

Review Comment:
   nit: `null partition value is replaced`






[GitHub] [spark] HyukjinKwon commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150621696

   It actually fails with 1.7.0 too 😂 





[GitHub] [spark] LuciferYang commented on pull request #36807: [WIP][SPARK-39414][BUILD] Upgrade Scala to 2.12.16

2022-06-08 Thread GitBox


LuciferYang commented on PR #36807:
URL: https://github.com/apache/spark/pull/36807#issuecomment-1150621586

   `Run documentation build` failed:
   
   ```
   /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: 
UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate 
the documentation; however, your version was 1.3.5
 warnings.warn(
   Warning, treated as error:
   node class 'meta' is already registered, its visitors will be overridden
   make: *** [Makefile:35: html] Error 2
   
 Jekyll 4.2.1   Please append `--trace` to the `build` command 
for any additional information or backtrace. 
   
   /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
Python doc generation failed (RuntimeError)
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `require'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `block in require_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `require_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
 `block in require_plugin_files'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `require_plugin_files'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
 `conscientious_require'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
 `setup'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:36:in
 `initialize'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `new'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:30:in
 `process'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `block in process_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/command.rb:91:in
 `process_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/commands/build.rb:18:in
 `block (2 levels) in init_with_program'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `block in execute'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/command.rb:221:in
 `execute'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary/program.rb:44:in
 `go'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/mercenary-0.4.0/lib/mercenary.rb:21:in
 `program'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/exe/jekyll:15:in
 `'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `load'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/bin/jekyll:23:in `'
   Error: Process completed with exit code 1.
   ```
   
   But I think this is not related to the current PR because the master branch has some 
issues now:
   
   - https://github.com/apache/spark/runs/6803919840?check_suite_focus=true
   - https://github.com/apache/spark/runs/6799560292?check_suite_focus=true
   - https://github.com/apache/spark/runs/6801448545?check_suite_focus=true
   - https://github.com/apache/spark/runs/6803919840?check_suite_focus=true 





[GitHub] [spark] cloud-fan commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


cloud-fan commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150620951

   I think a higher version is better? Maybe we should change the Dockerfile 
later (after 3.3 is released...)





[GitHub] [spark] HyukjinKwon commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150620405

   I matched the version to the Dockerfile.





[GitHub] [spark] zhengruifeng opened a new pull request, #35250: [SPARK-37961][SQL] Override maxRows/maxRowsPerPartition for some logical operators

2022-06-08 Thread GitBox


zhengruifeng opened a new pull request, #35250:
URL: https://github.com/apache/spark/pull/35250

   ### What changes were proposed in this pull request?
   
   
   1, override `maxRowsPerPartition` in 
`Sort`,`Expand`,`Sample`,`CollectMetrics`;
   2, override `maxRows` in `Expand`,`CollectMetrics`;
   
   ### Why are the changes needed?
   
   1, provide an accurate value if possible
   2, `SampleExec` with replacement may generate more rows than its child node;
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   added testsuites





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r893029627


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip install 'docutils<1.8.0' # See SPARK-39421

Review Comment:
   ```suggestion
   python3.9 -m pip install 'docutils<1.7.0' # See SPARK-39421
   ```



##
dev/requirements.txt:
##
@@ -35,6 +35,7 @@ numpydoc
 jinja2<3.0.0
 sphinx<3.1.0
 sphinx-plotly-directive
+docutils<1.8.0

Review Comment:
   ```suggestion
   docutils<1.7.0
   ```






[GitHub] [spark] cloud-fan closed pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-08 Thread GitBox


cloud-fan closed pull request #36586: [SPARK-39236][SQL] Make CreateTable and 
ListTables be compatible with 3 layer namespace
URL: https://github.com/apache/spark/pull/36586





[GitHub] [spark] cloud-fan commented on pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-08 Thread GitBox


cloud-fan commented on PR #36586:
URL: https://github.com/apache/spark/pull/36586#issuecomment-1150618882

   thanks, merging to master!





[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36586:
URL: https://github.com/apache/spark/pull/36586#discussion_r893029290


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##
@@ -965,6 +965,10 @@ class SessionCatalog(
 isTempView(nameParts.asTableIdentifier)
   }
 
+  def isGlobalTempViewDB(dbName: String): Boolean = {
+globalTempViewManager.database.equals(dbName)

Review Comment:
   this should use `equalsIgnoreCase`, we can fix it in a followup
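The follow-up matters because Spark resolves database names case-insensitively by default, so a plain `equals` against the global temp view database would miss input like `GLOBAL_TEMP`. A minimal Java illustration of the difference (the method names mirror the Scala one above but are hypothetical, and `global_temp` is the default value of `spark.sql.globalTempDatabase`):

```java
public class GlobalTempViewCheck {
    static final String GLOBAL_TEMP_DB = "global_temp";

    // Case-sensitive check: misses user input like "GLOBAL_TEMP".
    static boolean isGlobalTempViewDbStrict(String dbName) {
        return GLOBAL_TEMP_DB.equals(dbName);
    }

    // Case-insensitive check, as the review suggests.
    static boolean isGlobalTempViewDb(String dbName) {
        return GLOBAL_TEMP_DB.equalsIgnoreCase(dbName);
    }

    public static void main(String[] args) {
        System.out.println(isGlobalTempViewDbStrict("GLOBAL_TEMP")); // false
        System.out.println(isGlobalTempViewDb("GLOBAL_TEMP"));       // true
    }
}
```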






[GitHub] [spark] singhpk234 commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


singhpk234 commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150618735

   @cloud-fan, This seems to be introduced via 
[commit](https://github.com/apache/spark/commit/fc29c91f27d866502f5b6cc4261d4943b57e),
 [SPARK-35561](https://issues.apache.org/jira/browse/SPARK-35561) and seems to 
affect `3.3.0`





[GitHub] [spark] dongjoon-hyun commented on pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1150615271

   Merged to master.





[GitHub] [spark] dongjoon-hyun closed pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

2022-06-08 Thread GitBox


dongjoon-hyun closed pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a 
test case for HIVE-25190
URL: https://github.com/apache/spark/pull/36787





[GitHub] [spark] beliefer commented on pull request #36805: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread GitBox


beliefer commented on PR #36805:
URL: https://github.com/apache/spark/pull/36805#issuecomment-1150613902

   @huaxingao @cloud-fan Thanks a lot!





[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-08 Thread GitBox


mridulm commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r893022070


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -576,6 +661,7 @@ public MergeStatuses finalizeShuffleMerge(FinalizeShuffleMerge msg) {
   } finally {
 partition.closeAllFilesAndDeleteIfNeeded(false);
   }
+  cleanUpAppShufflePartitionInfoInDB(partition.appAttemptShuffleMergeId);

Review Comment:
   If we had a test for this, curious why it was not surfacing this issue as a 
test failure. Perhaps something needs to be augmented in the test ? Or some 
other bug ?






[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-08 Thread GitBox


mridulm commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r893021304


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -992,6 +1233,45 @@ AppShufflePartitionInfo getPartitionInfo() {
 }
   }
 
+  /**
+   * Simply encodes an application attempt ID.
+   */
+  public static class AppAttemptId {

Review Comment:
   Yes, `equals` would be required.
   I was suggesting to make sure they are all following similar pattern, and go 
from cheaper to more expensive checks.
   Also, to have `hashCode` if we are overriding `equals`
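The two points in this comment — order the checks from cheap to expensive, and always pair an overridden `equals` with a matching `hashCode` — can be sketched in Java. The field set below (`appId`, `attemptId`) is an assumption for illustration; the actual class in `RemoteBlockPushResolver` may differ:

```java
import java.util.Objects;

public class AppAttemptId {
    public final String appId;   // assumed field, for illustration
    public final int attemptId;  // assumed field, for illustration

    public AppAttemptId(String appId, int attemptId) {
        this.appId = appId;
        this.attemptId = attemptId;
    }

    @Override
    public boolean equals(Object o) {
        // Cheap checks first: reference identity, then the type test,
        // then the primitive field, and only then the String comparison.
        if (this == o) return true;
        if (!(o instanceof AppAttemptId)) return false;
        AppAttemptId other = (AppAttemptId) o;
        return attemptId == other.attemptId && appId.equals(other.appId);
    }

    @Override
    public int hashCode() {
        // Required whenever equals is overridden; otherwise hash-based
        // collections keyed on this class silently misbehave.
        return Objects.hash(appId, attemptId);
    }
}
```

The contract being relied on: two objects that are `equals` must produce the same `hashCode`, or lookups in `HashMap`/`ConcurrentHashMap` keyed by these IDs will fail to find existing entries.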






[GitHub] [spark] ulysses-you commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-08 Thread GitBox


ulysses-you commented on code in PR #36698:
URL: https://github.com/apache/spark/pull/36698#discussion_r893019308


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -734,11 +920,35 @@ case class Pmod(
 
   override def nullable: Boolean = true
 
+  override def decimalMethod: String = "remainder"
+
+  // This follows Remainder rule
+  override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
+val resultScale = max(s1, s2)
+val resultPrecision = min(p1 - s1, p2 - s2) + resultScale
+if (allowPrecisionLoss) {
+  DecimalType.adjustPrecisionScale(resultPrecision, resultScale)
+} else {
+  DecimalType.bounded(resultPrecision, resultScale)
+}
+  }
+
   private lazy val isZero: Any => Boolean = right.dataType match {
 case _: DecimalType => x => x.asInstanceOf[Decimal].isZero
 case _ => x => x == 0
   }
 
+  private lazy val pmodCache: (Any, Any) => Any = (l, r) => dataType match {

Review Comment:
   good catch..
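The Remainder result-type rule quoted in the diff above can be spot-checked numerically: for `DECIMAL(5,2) % DECIMAL(3,1)`, the scale is `max(2,1) = 2` and the precision is `min(5-2, 3-1) + 2 = 4`, giving `DECIMAL(4,2)`. A small sketch of just that arithmetic (helper names are ours, not Spark's; the precision-loss adjustment is omitted since these values stay well under the 38-digit cap):

```java
public class RemainderDecimalType {
    // Result type rule for decimal remainder, as in the diff above:
    // scale = max(s1, s2); precision = min(p1 - s1, p2 - s2) + scale.
    static int[] resultDecimalType(int p1, int s1, int p2, int s2) {
        int resultScale = Math.max(s1, s2);
        int resultPrecision = Math.min(p1 - s1, p2 - s2) + resultScale;
        return new int[] {resultPrecision, resultScale};
    }

    public static void main(String[] args) {
        int[] t = resultDecimalType(5, 2, 3, 1);
        // DECIMAL(5,2) % DECIMAL(3,1) -> DECIMAL(4,2)
        System.out.println("DECIMAL(" + t[0] + "," + t[1] + ")");
    }
}
```

The intuition: a remainder can be no larger in magnitude than the smaller operand, so the integral part needs only `min(p1 - s1, p2 - s2)` digits, while the fractional part must cover the finer of the two scales.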






[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36564: [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status

2022-06-08 Thread GitBox


AngersZhuuuu commented on code in PR #36564:
URL: https://github.com/apache/spark/pull/36564#discussion_r893017815


##
core/src/main/scala/org/apache/spark/SparkContext.scala:
##
@@ -461,7 +467,8 @@ class SparkContext(config: SparkConf) extends Logging {
 listenerBus.addToStatusQueue(_statusStore.listener.get)
 
 // Create the Spark execution environment (cache, map output tracker, etc)
-_env = createSparkEnv(_conf, isLocal, listenerBus)
+_env = createSparkEnv(_conf, isLocal, listenerBus,
+new OutputCommitCoordinator(conf, true, Option(this)))

Review Comment:
   > this looks weird, I think we just need to pass `this` to `createSparkEnv`, 
which creates driver-side `SparkEnv` that contains `OutputCommitCoordinator`
   
   How about current?






[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36698:
URL: https://github.com/apache/spark/pull/36698#discussion_r893017456


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -734,11 +920,35 @@ case class Pmod(
 
   override def nullable: Boolean = true
 
+  override def decimalMethod: String = "remainder"
+
+  // This follows Remainder rule
+  override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
+val resultScale = max(s1, s2)
+val resultPrecision = min(p1 - s1, p2 - s2) + resultScale
+if (allowPrecisionLoss) {
+  DecimalType.adjustPrecisionScale(resultPrecision, resultScale)
+} else {
+  DecimalType.bounded(resultPrecision, resultScale)
+}
+  }
+
   private lazy val isZero: Any => Boolean = right.dataType match {
 case _: DecimalType => x => x.asInstanceOf[Decimal].isZero
 case _ => x => x == 0
   }
 
+  private lazy val pmodCache: (Any, Any) => Any = (l, r) => dataType match {

Review Comment:
   this doesn't work... we should do
   ```
   dataType match {
 case _: IntegerType => (l, r) => ...
 ...
   }
   ```



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -734,11 +920,35 @@ case class Pmod(
 
   override def nullable: Boolean = true
 
+  override def decimalMethod: String = "remainder"
+
+  // This follows Remainder rule
+  override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
+val resultScale = max(s1, s2)
+val resultPrecision = min(p1 - s1, p2 - s2) + resultScale
+if (allowPrecisionLoss) {
+  DecimalType.adjustPrecisionScale(resultPrecision, resultScale)
+} else {
+  DecimalType.bounded(resultPrecision, resultScale)
+}
+  }
+
   private lazy val isZero: Any => Boolean = right.dataType match {
 case _: DecimalType => x => x.asInstanceOf[Decimal].isZero
 case _ => x => x == 0
   }
 
+  private lazy val pmodCache: (Any, Any) => Any = (l, r) => dataType match {

Review Comment:
   ```suggestion
 private lazy val pmodFunc: (Any, Any) => Any = (l, r) => dataType match {
   ```
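The performance point in this review — resolve `dataType` once and capture a per-row function, rather than matching on the type for every row — can be sketched in Java (Spark's real code is Scala and uses a `lazy val`; all names below are illustrative):

```java
import java.util.function.BinaryOperator;

public class PmodDispatch {
    enum DataType { INT, LONG }

    // Resolve the data type ONCE and return a per-row function, so the
    // hot path does no type dispatch. This mirrors the review's cached
    // `(Any, Any) => Any` lambda suggestion.
    static BinaryOperator<Number> resolve(DataType type) {
        switch (type) {
            case INT:
                return (l, r) -> pmod(l.intValue(), r.intValue());
            case LONG:
                return (l, r) -> pmod(l.longValue(), r.longValue());
            default:
                throw new IllegalArgumentException("unsupported: " + type);
        }
    }

    // Positive modulus: for a positive divisor n the result lies in [0, n).
    static int pmod(int a, int n) {
        int r = a % n;
        return r < 0 ? r + n : r;
    }

    static long pmod(long a, long n) {
        long r = a % n;
        return r < 0 ? r + n : r;
    }

    public static void main(String[] args) {
        BinaryOperator<Number> f = resolve(DataType.INT); // dispatch happens once
        System.out.println(f.apply(-7, 3)); // 2
    }
}
```

The naive alternative — a `switch` on the type inside the per-row method — performs the same dispatch millions of times; hoisting it into a lazily initialized function makes the per-row cost just a virtual call.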






[GitHub] [spark] dongjoon-hyun commented on pull request #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #36815:
URL: https://github.com/apache/spark/pull/36815#issuecomment-1150604227

   Thank you, @sunchao and @cloud-fan .





[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-08 Thread GitBox


Eugene-Mark commented on PR #36499:
URL: https://github.com/apache/spark/pull/36499#issuecomment-1150603723

   retest this please





[GitHub] [spark] dongjoon-hyun commented on pull request #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #36815:
URL: https://github.com/apache/spark/pull/36815#issuecomment-1150603278

   Thank you, @HyukjinKwon !





[GitHub] [spark] HyukjinKwon commented on pull request #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36815:
URL: https://github.com/apache/spark/pull/36815#issuecomment-1150603145

   👍 





[GitHub] [spark] dongjoon-hyun commented on pull request #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #36815:
URL: https://github.com/apache/spark/pull/36815#issuecomment-1150602874

   cc @maryannxue , @cloud-fan , @HyukjinKwon , @sunchao 





[GitHub] [spark] dongjoon-hyun commented on pull request #34929: [SPARK-37670][SQL] Support predicate pushdown and column pruning for de-duped CTEs

2022-06-08 Thread GitBox


dongjoon-hyun commented on PR #34929:
URL: https://github.com/apache/spark/pull/34929#issuecomment-1150601161

   Here is a followup.
   - https://github.com/apache/spark/pull/36815





[GitHub] [spark] dongjoon-hyun opened a new pull request, #36815: [SPARK-37670][FOLLOWUP][SQL][TESTS][3.2] Update TPCDS golden files

2022-06-08 Thread GitBox


dongjoon-hyun opened a new pull request, #36815:
URL: https://github.com/apache/spark/pull/36815

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] HyukjinKwon commented on pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36683:
URL: https://github.com/apache/spark/pull/36683#issuecomment-1150592704

   Let me merge this in few days ... assuming that we're all good. Hopefully my 
benchmark is good enough.





[GitHub] [spark] HyukjinKwon commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150589281

   seems like Max already fixed it in 
https://github.com/apache/spark/blob/master/dev/create-release/spark-rm/Dockerfile#L45





[GitHub] [spark] JoshRosen opened a new pull request, #36814: [SPARK-39422][SQL] Improve error message for 'SHOW CREATE TABLE' with unsupported serdes

2022-06-08 Thread GitBox


JoshRosen opened a new pull request, #36814:
URL: https://github.com/apache/spark/pull/36814

   ### What changes were proposed in this pull request?
   
   This PR improves the error message that is thrown when trying to run `SHOW 
CREATE TABLE` on a Hive table with an unsupported serde. Currently this results 
in an error like
   
   ```
   org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE against table rcFileTable, which is created by Hive and uses the following unsupported serde configuration
    SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
   ```
   
   This patch improves this error message by adding a suggestion to use `SHOW 
CREATE TABLE ... AS SERDE`:
   
   ```
   org.apache.spark.sql.AnalysisException: Failed to execute SHOW CREATE TABLE against table rcFileTable, which is created by Hive and uses the following unsupported serde configuration
    SERDE: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe INPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileInputFormat OUTPUTFORMAT: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
   Please use `SHOW CREATE TABLE rcFileTable AS SERDE` to show Hive DDL instead.
   ```
   
   ### Why are the changes needed?
   
   The existing error message is confusing.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it improves a user-facing error message.
   
   ### How was this patch tested?
   
   Manually tested with
   
   ```
   CREATE TABLE rcFileTable(i INT)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe'
   STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
   
   SHOW CREATE TABLE rcFileTable
   ```
   
   to trigger the error.
   





[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36698:
URL: https://github.com/apache/spark/pull/36698#discussion_r893003521


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -953,15 +952,16 @@ case class Pmod(
   // when we reach here, failOnError must be true.
   throw QueryExecutionErrors.divideByZeroError(queryContext)
 }
-input1 match {
-  case i: Integer => pmod(i, input2.asInstanceOf[java.lang.Integer])
-  case l: Long => pmod(l, input2.asInstanceOf[java.lang.Long])
-  case s: Short => pmod(s, input2.asInstanceOf[java.lang.Short])
-  case b: Byte => pmod(b, input2.asInstanceOf[java.lang.Byte])
-  case f: Float => pmod(f, input2.asInstanceOf[java.lang.Float])
-  case d: Double => pmod(d, input2.asInstanceOf[java.lang.Double])
-  case d: Decimal => checkOverflow(
-pmod(d, input2.asInstanceOf[Decimal]), dataType.asInstanceOf[DecimalType])
+
+dataType match {

Review Comment:
   e.g. maybe we can call `dataType` only once and create a `(Any, Any) => Any` 
lambda and use it here.






[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36698:
URL: https://github.com/apache/spark/pull/36698#discussion_r893003240


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##
@@ -953,15 +952,16 @@ case class Pmod(
   // when we reach here, failOnError must be true.
   throw QueryExecutionErrors.divideByZeroError(queryContext)
 }
-input1 match {
-  case i: Integer => pmod(i, input2.asInstanceOf[java.lang.Integer])
-  case l: Long => pmod(l, input2.asInstanceOf[java.lang.Long])
-  case s: Short => pmod(s, input2.asInstanceOf[java.lang.Short])
-  case b: Byte => pmod(b, input2.asInstanceOf[java.lang.Byte])
-  case f: Float => pmod(f, input2.asInstanceOf[java.lang.Float])
-  case d: Double => pmod(d, input2.asInstanceOf[java.lang.Double])
-  case d: Decimal => checkOverflow(
-pmod(d, input2.asInstanceOf[Decimal]), dataType.asInstanceOf[DecimalType])
+
+dataType match {

Review Comment:
   we should cache it somewhere, otherwise it's very expensive to call it per 
row.






[GitHub] [spark] cloud-fan closed pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox


cloud-fan closed pull request #36693: [SPARK-39349] Add a centralized 
CheckError method for QA of error path
URL: https://github.com/apache/spark/pull/36693





[GitHub] [spark] cloud-fan commented on pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox


cloud-fan commented on PR #36693:
URL: https://github.com/apache/spark/pull/36693#issuecomment-1150579697

   thanks, merging to master!





[GitHub] [spark] cloud-fan commented on pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-08 Thread GitBox


cloud-fan commented on PR #36693:
URL: https://github.com/apache/spark/pull/36693#issuecomment-1150579622

   The python doc issue is being fixed by 
https://github.com/apache/spark/pull/36813





[GitHub] [spark] gengliangwang commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


gengliangwang commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892996318


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala:
##
@@ -375,9 +375,17 @@ case class ArrayTransform(
 // scalastyle:on line.size.limit
 case class ArraySort(
 argument: Expression,
-function: Expression)
+function: Expression,
+handleComparisonResultNullAsZero: Boolean)

Review Comment:
   How about `failOnNullComparisonResult`?
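The flag being named here controls what happens when a user-supplied comparator lambda evaluates to null: the legacy behaviour silently treated null as 0 (the two elements compare as equal), while the fix raises an error. A hedged Java sketch of the two modes (not Spark's actual implementation; the sentinel-based comparator below is purely illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.BiFunction;

public class NullAwareArraySort {
    // Wraps a comparator whose result may be null. When failOnNull is
    // true, a null result aborts the sort; otherwise null is treated
    // as 0, i.e. "equal" (the legacy behaviour).
    static Comparator<Integer> wrap(BiFunction<Integer, Integer, Integer> cmp,
                                    boolean failOnNull) {
        return (a, b) -> {
            Integer r = cmp.apply(a, b);
            if (r == null) {
                if (failOnNull) {
                    throw new IllegalStateException(
                        "The comparison result is null");
                }
                return 0; // legacy: null handled as zero
            }
            return r;
        };
    }

    public static void main(String[] args) {
        // A comparator that yields null for the sentinel -1 and
        // ordinary ordering otherwise.
        BiFunction<Integer, Integer, Integer> cmp =
            (a, b) -> (a == -1 || b == -1) ? null : Integer.compare(a, b);

        Integer[] xs = {3, 1, 2};
        Arrays.sort(xs, wrap(cmp, true)); // no sentinel present, sorts fine
        System.out.println(Arrays.toString(xs)); // [1, 2, 3]
    }
}
```

Treating null as "equal" is dangerous because it violates the transitivity the sort algorithm assumes, which is why failing loudly is the safer default.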






[GitHub] [spark] cloud-fan commented on pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


cloud-fan commented on PR #36813:
URL: https://github.com/apache/spark/pull/36813#issuecomment-1150577614

   how about the docker file we use for release?





[GitHub] [spark] gengliangwang commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


gengliangwang commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892992649


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3828,6 +3828,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_ARRAY_SORT_HANDLES_COMPARISON_RESULT_NULL_VALUE_AS_ZERO =
+buildConf("spark.sql.legacy.arraySortHandlesComparisonResultNullAsZero")

Review Comment:
   +1 with @cloud-fan 






[GitHub] [spark] cloud-fan commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892991176


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3828,6 +3828,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_ARRAY_SORT_HANDLES_COMPARISON_RESULT_NULL_VALUE_AS_ZERO =
+buildConf("spark.sql.legacy.arraySortHandlesComparisonResultNullAsZero")

Review Comment:
   I don't think so, legacy config is for legacy behaviors.






[GitHub] [spark] cloud-fan commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


cloud-fan commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892990862


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -2012,4 +2012,12 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase {
 new SparkException(
   errorClass = "MULTI_VALUE_SUBQUERY_ERROR", messageParameters = 
Array(plan), cause = null)
   }
+
+  def comparisonResultIsNullError(): Throwable = {
+new NullPointerException(

Review Comment:
   yea, with the new error class framework, the java exception type doesn't 
matter any more. We can always use `SparkException`, but we need to add an 
error class for it.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r892987817


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip 'docutils<1.8.0' # See SPARK-39421

Review Comment:
   ```suggestion
   python3.9 -m pip install 'docutils<1.8.0' # See SPARK-39421
   ```






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892987033


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -2012,4 +2012,12 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase {
 new SparkException(
   errorClass = "MULTI_VALUE_SUBQUERY_ERROR", messageParameters = 
Array(plan), cause = null)
   }
+
+  def comparisonResultIsNullError(): Throwable = {
+new NullPointerException(

Review Comment:
   I remember the exception being thrown should inherit `SparkThrowable` when 
it's thrown to end-users ... cc @MaxGekk am I remembering it correctly?







[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36812:
URL: https://github.com/apache/spark/pull/36812#discussion_r892986587


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3828,6 +3828,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_ARRAY_SORT_HANDLES_COMPARISON_RESULT_NULL_VALUE_AS_ZERO =
+buildConf("spark.sql.legacy.arraySortHandlesComparisonResultNullAsZero")

Review Comment:
   qq should it be covered by ansi config @gengliangwang ?






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36793: [SPARK-39406][PYTHON] Accept NumPy array in createDataFrame

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36793:
URL: https://github.com/apache/spark/pull/36793#discussion_r892985507


##
python/pyspark/sql/tests/test_arrow.py:
##
@@ -495,6 +509,22 @@ def test_schema_conversion_roundtrip(self):
 schema_rt = from_arrow_schema(arrow_schema)
 self.assertEqual(self.schema, schema_rt)
 
+@unittest.skipIf(not have_numpy, cast(str, numpy_requirement_message))

Review Comment:
   That's fine. It's not the main code but testing. Let's just keep it simple.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon commented on code in PR #36813:
URL: https://github.com/apache/spark/pull/36813#discussion_r892981613


##
.github/workflows/build_and_test.yml:
##
@@ -549,6 +549,7 @@ jobs:
 #   See also https://issues.apache.org/jira/browse/SPARK-38279.
 python3.9 -m pip install 'sphinx<3.1.0' mkdocs pydata_sphinx_theme 
ipython nbsphinx numpydoc 'jinja2<3.0.0' 'markupsafe==2.0.1'
 python3.9 -m pip install ipython_genutils # See SPARK-38517
+python3.9 -m pip 'docutils<1.8' # See SPARK-39421

Review Comment:
   ```suggestion
   python3.9 -m pip 'docutils<1.8.0' # See SPARK-39421
   ```






[GitHub] [spark] HyukjinKwon opened a new pull request, #36813: [SPARK-39421][PYTHON][DOCS] Pin the docutils version <1.8 in documentation build

2022-06-08 Thread GitBox


HyukjinKwon opened a new pull request, #36813:
URL: https://github.com/apache/spark/pull/36813

   ### What changes were proposed in this pull request?
   
   This PR fixes the Sphinx build failure below (see 
https://github.com/singhpk234/spark/runs/6799026458?check_suite_focus=true): 
   
   ```
   Moving to python/docs directory and building sphinx.
   Running Sphinx v3.0.4
   WARNING:root:'PYARROW_IGNORE_TIMEZONE' environment variable was not set. It 
is required to set this environment variable to '1' in both driver and executor 
sides if you use pyarrow>=2.0.0. pandas-on-Spark will set it for you but it 
does not work if there is a Spark context already launched.
   /__w/spark/spark/python/pyspark/pandas/supported_api_gen.py:101: 
UserWarning: Warning: Latest version of pandas(>=1.4.0) is required to generate 
the documentation; however, your version was 1.3.5
 warnings.warn(
   Warning, treated as error:
   node class 'meta' is already registered, its visitors will be overridden
   make: *** [Makefile:35: html] Error 2
   
 Jekyll 4.2.1   Please append `--trace` to the `build` command 
for any additional information or backtrace. 
   
   ```
   
   Sphinx build fails apparently with the latest docutils (see also 
https://issues.apache.org/jira/browse/FLINK-24662). we should pin the version.
   
   ### Why are the changes needed?
   
   To recover the CI.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   CI in this PR should test it out.





[GitHub] [spark] wangyum commented on a diff in pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


wangyum commented on code in PR #36810:
URL: https://github.com/apache/spark/pull/36810#discussion_r892980250


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala:
##
@@ -359,7 +359,12 @@ object PartitioningUtils extends SQLConfHelper{
   def removeLeadingZerosFromNumberTypePartition(value: String, dataType: 
DataType): String =
 dataType match {
   case ByteType | ShortType | IntegerType | LongType | FloatType | 
DoubleType =>
-castPartValueToDesiredType(dataType, value, null).toString
+val castedPartVal = castPartValueToDesiredType(dataType, value, null)
+if (castedPartVal != null) {
+  castedPartVal.toString
+} else {
+  null
+}

Review Comment:
   `Option(castPartValueToDesiredType(dataType, value, 
null)).map(_.toString).orNull`?
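
   For readers less familiar with the idiom, the suggestion replaces the explicit 
null check with a map-over-Option pattern. A minimal sketch of the equivalent logic 
(in Java with `Optional`, since the pattern is the same; `castOrNull` here is a 
hypothetical stand-in for `castPartValueToDesiredType`, not Spark's actual code):

```java
import java.util.Optional;

public class NullSafeToString {
    // Hypothetical stand-in for castPartValueToDesiredType: returns null
    // when the partition value cannot be cast to the desired type.
    static Object castOrNull(String value) {
        try {
            return Long.valueOf(value);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // The null-safe pattern the review comment suggests: map toString over
    // an Optional and fall back to null, instead of an explicit if/else.
    static String removeLeadingZeros(String value) {
        return Optional.ofNullable(castOrNull(value))
                .map(Object::toString)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(removeLeadingZeros("007"));   // prints 7
        System.out.println(removeLeadingZeros("abc"));   // prints null
    }
}
```

   Either form behaves the same; the `Optional`/`Option` version just collapses the 
four-line if/else into one expression.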






[GitHub] [spark] cloud-fan commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


cloud-fan commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150557713

   do we know which commit caused this issue? is it a 3.3 only bug?
   





[GitHub] [spark] github-actions[bot] commented on pull request #35605: [SPARK-38280][SQL] The Rank windows to be ordered is not necessary in a query.

2022-06-08 Thread GitBox


github-actions[bot] commented on PR #35605:
URL: https://github.com/apache/spark/pull/35605#issuecomment-1150538958

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] wangyum commented on pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-08 Thread GitBox


wangyum commented on PR #36786:
URL: https://github.com/apache/spark/pull/36786#issuecomment-1150532390

   Merged to master.





[GitHub] [spark] wangyum closed pull request #36786: [SPARK-39400][SQL] spark-sql should remove hive resource dir in all case

2022-06-08 Thread GitBox


wangyum closed pull request #36786: [SPARK-39400][SQL] spark-sql should remove 
hive resource dir in all case
URL: https://github.com/apache/spark/pull/36786





[GitHub] [spark] huaxingao commented on pull request #36810: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-08 Thread GitBox


huaxingao commented on PR #36810:
URL: https://github.com/apache/spark/pull/36810#issuecomment-1150517588

   The Python doc generation failure seems to be unrelated. All the other 
tests passed.





[GitHub] [spark] ueshin opened a new pull request, #36812: [SPARK-39419][SQL] Fix ArraySort to throw an exception when the comparator returns null

2022-06-08 Thread GitBox


ueshin opened a new pull request, #36812:
URL: https://github.com/apache/spark/pull/36812

   ### What changes were proposed in this pull request?
   
   Fixes `ArraySort` to throw an exception when the comparator returns `null`.
   
   Also updates the doc to follow the corrected behavior.
   
   ### Why are the changes needed?
   
   When the comparator of `ArraySort` returns `null`, currently it handles it 
as `0` (equal).
   
   According to the doc, 
   
   ```
   It returns -1, 0, or 1 as the first element is less than, equal to, or 
greater than the second element. If the comparator function returns other 
values (including null), the function will fail and raise an error.
   ```
   
   It's fine to return integers other than -1, 0, or 1 to follow the Java 
convention (the doc still needs updating, though), but it should throw an exception for 
a `null` result.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, if a user uses a comparator that returns `null`, it will throw an error 
after this PR.
   
   The legacy flag 
`spark.sql.legacy.arraySortHandlesComparisonResultNullAsZero` can be used to 
handle `null` as `0` (equal).
   
   ### How was this patch tested?
   
   Added some tests.
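
   To make the behavioral change concrete: a minimal sketch (in Java, with a 
hypothetical `sortWithComparator` helper standing in for `array_sort`, not Spark's 
actual code path) of failing on a null comparison result instead of silently 
treating it as 0:

```java
import java.util.Arrays;
import java.util.function.BiFunction;

public class NullComparatorDemo {
    // Hypothetical helper: sorts a copy of the array with a user-supplied
    // comparator that may return null. Under the corrected behavior a null
    // result raises an error rather than being treated as 0 (equal).
    static Integer[] sortWithComparator(Integer[] arr,
                                        BiFunction<Integer, Integer, Integer> cmp) {
        Integer[] copy = Arrays.copyOf(arr, arr.length);
        Arrays.sort(copy, (a, b) -> {
            Integer r = cmp.apply(a, b);
            if (r == null) {
                throw new NullPointerException("The comparison result is null");
            }
            return r;
        });
        return copy;
    }

    public static void main(String[] args) {
        Integer[] in = {3, 1, 2};
        // A well-behaved comparator sorts normally.
        System.out.println(Arrays.toString(sortWithComparator(in, (a, b) -> a - b)));
        // A comparator returning null now fails loudly.
        try {
            sortWithComparator(in, (a, b) -> null);
        } catch (NullPointerException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

   The legacy flag described above would correspond to replacing the `throw` with 
`return 0`, restoring the old treat-null-as-equal behavior.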
   





[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-08 Thread GitBox


zhouyejoe commented on code in PR #35906:
URL: https://github.com/apache/spark/pull/35906#discussion_r892940331


##
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java:
##
@@ -576,6 +661,7 @@ public MergeStatuses 
finalizeShuffleMerge(FinalizeShuffleMerge msg) {
   } finally {
 partition.closeAllFilesAndDeleteIfNeeded(false);
   }
+  
cleanUpAppShufflePartitionInfoInDB(partition.appAttemptShuffleMergeId);

Review Comment:
   Just found that we do have a UT which tests that a [finalized shuffle](https://github.com/apache/spark/pull/35906/files#diff-6dac15558c856c231c284e221b8ae0cdd2ad79c98586eb7e49c3ddcefa53a241R263), after a restart, getOrCreateAppShufflePartitionInfo will trigger [BlockPushNonFatalFailure](https://github.com/apache/spark/pull/35906/files#diff-6dac15558c856c231c284e221b8ae0cdd2ad79c98586eb7e49c3ddcefa53a241R289) with an exception message containing "is finalized". This UT should have tested that the DB contains the partitionInfo after a successful finalization.





