[GitHub] [spark] hasnain-db commented on pull request #42685: [WIP][SPARK-44937][CORE] Add SSL/TLS support for RPC and Shuffle communications

2023-09-22 Thread via GitHub


hasnain-db commented on PR #42685:
URL: https://github.com/apache/spark/pull/42685#issuecomment-1732226082

   Thanks @mridulm ! Happy to do that. I can think of a nice split in line with 
the bullet points listed in the summary here.
   
   Just to confirm, since most changes depend on each other: you're proposing I 
put up one PR, get it approved and merged, then put up the second PR, and so 
on, right? (I'm not sure this repo has support for stacked PRs; if it does, 
please link me to an example.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43065: [SPARK-45287][TESTS] Add Java 21 
benchmark result and update Java 17 ones
URL: https://github.com/apache/spark/pull/43065





[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43065:
URL: https://github.com/apache/spark/pull/43065#issuecomment-1732206610

   I'll merge this because this PR doesn't touch any code. These are purely 
generated files that serve as a snapshot.





[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43065:
URL: https://github.com/apache/spark/pull/43065#issuecomment-1732206126

   Thank you for the thorough reviews. Yes, we should follow up on them one by 
one after having this. This helps us stay on the same page and monitor this.





[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


LuciferYang commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334916319


##
sql/core/benchmarks/SortBenchmark-results.txt:
##
@@ -2,15 +2,15 @@
 radix sort
 

 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 radix sort 2500:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 

-reference TimSort key prefix array          12374          12403          41         2.0         495.0       1.0X
-reference Arrays.sort                        3377           3381           5         7.4         135.1       3.7X
-radix sort one byte                           209            212           2       119.5           8.4      59.2X
-radix sort two bytes                          398            403           3        62.8          15.9      31.1X
-radix sort eight bytes                       1538           1538           0        16.3          61.5       8.0X
-radix sort key prefix array                  1953           1998          64        12.8          78.1       6.3X
+reference TimSort key prefix array          14141          14208          96         1.8         565.6       1.0X

Review Comment:
   ditto



##
sql/core/benchmarks/ColumnarBatchBenchmark-results.txt:
##
@@ -2,58 +2,58 @@
 Int Read/Write
 

 
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 Int Read/Write:           Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 

-Java Array                          257            266           8      1273.6           0.8       1.0X
-ByteBuffer Unsafe                   480            490           5       682.0           1.5       0.5X
-ByteBuffer API                     1994           1996           2       164.4           6.1       0.1X
-DirectByteBuffer                    756            762           7       433.6           2.3       0.3X
-Unsafe Buffer                       255            263           4      1283.1           0.8       1.0X
-Column(on heap)                     266            272           6      1231.5           0.8       1.0X
-Column(off heap)                    526            529           2       623.1           1.6       0.5X
-Column(off heap direct)             258            265           7      1270.3           0.8       1.0X
-UnsafeRow (on heap)                 556            560           6       589.0           1.7       0.5X
-UnsafeRow (off heap)                599            606           5       546.9           1.8       0.4X
-Column On Heap Append               478            488           6       686.0           1.5       0.5X
+Java Array                          254            261           5      1290.1           0.8       1.0X
+ByteBuffer Unsafe                   420            427           8       780.2           1.3       0.6X
+ByteBuffer API                      801            822          28       409.0           2.4       0.3X
+DirectByteBuffer                    661            668           7       495.8           2.0       0.4X
+Unsafe Buffer                       253            266          10      1296.0           0.8       1.0X
+Column(on heap)                     254            261           4      1292.2           0.8       1.0X
+Column(off heap)                    255            261           5      1287.3           0.8       1.0X
+Column(off heap direct)             253            258           6      1297.3           0.8       1.0X
+UnsafeRow (on heap)                 722            729           9       454.1           2.2       0.4X
+UnsafeRow (off heap)                532            543          13       616.3           1.6       0.5X
+Column On Heap Append   516

[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43069:
URL: https://github.com/apache/spark/pull/43069#issuecomment-1732200379

   Merged to master for Apache Spark 4.0.0.





[GitHub] [spark] dongjoon-hyun closed pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s 
v1.25 and lower version support
URL: https://github.com/apache/spark/pull/43069





[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43069:
URL: https://github.com/apache/spark/pull/43069#issuecomment-1732200263

   Thank you so much!





[GitHub] [spark] yaooqinn commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork

2023-09-22 Thread via GitHub


yaooqinn commented on PR #43053:
URL: https://github.com/apache/spark/pull/43053#issuecomment-1732198539

   Thank you all.





[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43069:
URL: https://github.com/apache/spark/pull/43069#issuecomment-1732194064

   Could you review this doc-only PR, @LuciferYang ?





[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732193985

   Thank you, @LuciferYang !





[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43065:
URL: https://github.com/apache/spark/pull/43065#issuecomment-1732193905

   Thank you, @LuciferYang . The PR is now ready, with 
AnsiIntervalSortBenchmark (Java 17/21) added.





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334911535


##
sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
##
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 constructor:         Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 

-arrayOfAny                       4              4           0      2491.5           0.4       1.0X
-arrayOfAnyAsObject             256            257           1        39.1          25.6       0.0X
-arrayOfAnyAsSeq                 18             18           0       551.9           1.8       0.2X
-arrayOfInt                     536            537           1        18.7          53.6       0.0X
-arrayOfIntAsObject             788            794          10        12.7          78.8       0.0X
+arrayOfAny                       7              7           0      1495.4           0.7       1.0X
+arrayOfAnyAsObject               7              7           0      1495.3           0.7       1.0X
+arrayOfAnyAsSeq                201            202           1        49.8          20.1       0.0X

Review Comment:
   Sure.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334911491


##
core/benchmarks/ZStandardBenchmark-results.txt:
##
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 

 
-OpenJDK 64-Bit Server VM 1.8.0_372-b07 on Linux 5.15.0-1041-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 
--
-Compression 1 times at level 1 without buffer pool            293            327          85         0.0       29283.2       1.0X
-Compression 1 times at level 2 without buffer pool            322            324           2         0.0       32184.8       0.9X
-Compression 1 times at level 3 without buffer pool            453            456           2         0.0       45285.1       0.6X
-Compression 1 times at level 1 with buffer pool               171            173           1         0.1       17065.2       1.7X
-Compression 1 times at level 2 with buffer pool               208            209           1         0.0       20786.5       1.4X
-Compression 1 times at level 3 with buffer pool               334            335           2         0.0       33350.3       0.9X
+Compression 1 times at level 1 without buffer pool           2800           2801           2         0.0      279995.2       1.0X

Review Comment:
   Yes, this one has been on my todo list. Will try to identify the root cause.






[GitHub] [spark] LuciferYang commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732193237

   Merged into master for Apache Spark 4.0, thanks @dongjoon-hyun 





[GitHub] [spark] LuciferYang closed pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


LuciferYang closed pull request #43066: [SPARK-45288][TESTS] Remove outdated 
benchmark result files, `*-jdk1[17]*results.txt`
URL: https://github.com/apache/spark/pull/43066





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43069:
URL: https://github.com/apache/spark/pull/43069

   ### What changes were proposed in this pull request?
   
   This PR aims to update K8s doc to recommend K8s 1.26+ for Apache Spark 4.0.0.
   
   ### Why are the changes needed?
   
   **1. Default K8s Version in Public Cloud environments**
   
   The default K8s versions of public cloud providers are already K8s 1.27+.
   
   - EKS: v1.27 (Default)
   - GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid)
   
   **2. End Of Support**
   
   In addition, K8s 1.25 and older are going to reach EOL by the time Apache 
Spark 4.0.0 arrives in June 2024. K8s 1.26 is also going to reach EOL in June.
   
   | K8s  |   AKS   |   GKE   |   EKS   |
   | ---- | ------- | ------- | ------- |
   | 1.27 | 2024-07 | 2024-08 | 2024-07 |
   | 1.26 | 2024-03 | 2024-06 | 2024-06 |
   | 1.25 | 2023-12 | 2024-02 | 2024-05 |
   | 1.24 | 2023-07 | 2023-10 | 2024-01 |
   
   - [AKS EOL 
Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar)
   - [GKE EOL 
Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule)
   - [EKS EOL 
Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar)
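
The EOL argument above can be checked mechanically. The following is a minimal Scala sketch and not part of the PR: the object name `K8sEolSketch` and the helper `eolEverywhereBy` are hypothetical, and the dates are copied from the table above.

```scala
import java.time.YearMonth

// Hypothetical helper, for illustration only.
object K8sEolSketch {
  // End-of-support months per K8s version, in the order AKS, GKE, EKS (from the table above).
  val eol: Map[String, Seq[YearMonth]] = Map(
    "1.27" -> Seq("2024-07", "2024-08", "2024-07"),
    "1.26" -> Seq("2024-03", "2024-06", "2024-06"),
    "1.25" -> Seq("2023-12", "2024-02", "2024-05"),
    "1.24" -> Seq("2023-07", "2023-10", "2024-01")
  ).map { case (version, months) => version -> months.map(m => YearMonth.parse(m)) }

  // Versions whose support ends on every listed provider no later than `cutoff`.
  def eolEverywhereBy(cutoff: YearMonth): Seq[String] =
    eol.collect {
      case (version, months) if months.forall(m => !m.isAfter(cutoff)) => version
    }.toSeq.sorted

  def main(args: Array[String]): Unit =
    println(eolEverywhereBy(YearMonth.of(2024, 6)).mkString(", "))
}
```

With a cutoff of June 2024, the versions selected are 1.24, 1.25, and 1.26, while 1.27 is still supported by all three providers.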
   
   ### Does this PR introduce _any_ user-facing change?
   
   - No, this is a documentation-only change about K8s versions.
   - Apache Spark K8s Integration Test is currently using K8s v1.26.3 on 
Minikube.
   
   ### How was this patch tested?
   
   Manual review.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   





[GitHub] [spark] LuciferYang commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1732191906

   > Thank you. I checked now. Spark doc is updated with Java 17. So, we don't 
need to mention here. It seems that we need to fix it from `Java 17` to 
`Java17/21`. I'll handle it independently because it's Java 21 stuff.
   > 
   > 
https://github.com/apache/spark/blob/06ccb6d434476afacc08936cf473670102d41010/docs/index.md?plain=1#L37
   
   
https://github.com/apache/spark/blob/51938fea36af19824a657c0326af9de03393e1dd/docs/building-spark.md?plain=1#L29-L31
   
   `building-spark.md` should also include Java 21. I apologize, I previously 
only focused on Java 17.





[GitHub] [spark] LuciferYang commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1732191440

   late LGTM





[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


LuciferYang commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334909906


##
sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
##
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 constructor:         Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 

-arrayOfAny                       4              4           0      2491.5           0.4       1.0X
-arrayOfAnyAsObject             256            257           1        39.1          25.6       0.0X
-arrayOfAnyAsSeq                 18             18           0       551.9           1.8       0.2X
-arrayOfInt                     536            537           1        18.7          53.6       0.0X
-arrayOfIntAsObject             788            794          10        12.7          78.8       0.0X
+arrayOfAny                       7              7           0      1495.4           0.7       1.0X
+arrayOfAnyAsObject               7              7           0      1495.3           0.7       1.0X
+arrayOfAnyAsSeq                201            202           1        49.8          20.1       0.0X

Review Comment:
   The results of `arrayOfAnyAsSeq` have undergone significant changes and need 
attention (it may be a known issue, but I can't remember the details).
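
To quantify the change, here is a small Scala sketch. It assumes the quoted rows read as 18 ms (before) and 201 ms (after) best time for `arrayOfAnyAsSeq`; the `BenchRow` type and the helper names are illustrative only and are not part of Spark's benchmark framework.

```scala
// Illustrative only: BenchRow and BenchmarkDeltaSketch are hypothetical names.
final case class BenchRow(name: String, bestMs: Double, perRowNs: Double)

object BenchmarkDeltaSketch {
  // Values copied from the quoted diff (removed row vs. added row).
  val before: BenchRow = BenchRow("arrayOfAnyAsSeq", bestMs = 18.0, perRowNs = 1.8)
  val after: BenchRow = BenchRow("arrayOfAnyAsSeq", bestMs = 201.0, perRowNs = 20.1)

  // Slowdown factor of the new run relative to the old one (> 1 means slower).
  def slowdown(old: BenchRow, current: BenchRow): Double = current.bestMs / old.bestMs

  // Millions of rows per second implied by a per-row time in nanoseconds.
  def rateMPerSec(row: BenchRow): Double = 1000.0 / row.perRowNs

  def main(args: Array[String]): Unit = {
    println(f"${after.name}: ${slowdown(before, after)}%.1fx slower")
    println(f"implied rate after: ${rateMPerSec(after)}%.1f M/s")
  }
}
```

With these numbers the best time is roughly 11 times higher, and the implied rate of about 49.8 M/s agrees with the Rate(M/s) column of the added row.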






[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


LuciferYang commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334909865


##
sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
##
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 constructor:         Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 

-arrayOfAny                       4              4           0      2491.5           0.4       1.0X
-arrayOfAnyAsObject             256            257           1        39.1          25.6       0.0X
-arrayOfAnyAsSeq                 18             18           0       551.9           1.8       0.2X
-arrayOfInt                     536            537           1        18.7          53.6       0.0X
-arrayOfIntAsObject             788            794          10        12.7          78.8       0.0X
+arrayOfAny                       7              7           0      1495.4           0.7       1.0X
+arrayOfAnyAsObject               7              7           0      1495.3           0.7       1.0X

Review Comment:
   The results of `arrayOfAnyAsSeq` have undergone significant changes and need 
attention (it may be a known issue, but I can't remember the details).









[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones

2023-09-22 Thread via GitHub


LuciferYang commented on code in PR #43065:
URL: https://github.com/apache/spark/pull/43065#discussion_r1334908931


##
core/benchmarks/ZStandardBenchmark-results.txt:
##
@@ -2,26 +2,26 @@
 Benchmark ZStandardCompressionCodec
 

 
-OpenJDK 64-Bit Server VM 1.8.0_372-b07 on Linux 5.15.0-1041-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
 Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Benchmark ZStandardCompressionCodec:                Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
 
--
-Compression 1 times at level 1 without buffer pool            293            327          85         0.0       29283.2       1.0X
-Compression 1 times at level 2 without buffer pool            322            324           2         0.0       32184.8       0.9X
-Compression 1 times at level 3 without buffer pool            453            456           2         0.0       45285.1       0.6X
-Compression 1 times at level 1 with buffer pool               171            173           1         0.1       17065.2       1.7X
-Compression 1 times at level 2 with buffer pool               208            209           1         0.0       20786.5       1.4X
-Compression 1 times at level 3 with buffer pool               334            335           2         0.0       33350.3       0.9X
+Compression 1 times at level 1 without buffer pool           2800           2801           2         0.0      279995.2       1.0X

Review Comment:
   From what I remember, the results of this microbenchmark are always unstable.
   
   






[GitHub] [spark] dongjoon-hyun commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43053:
URL: https://github.com/apache/spark/pull/43053#issuecomment-1732185291

   Merged to master for Apache Spark 4.0.0. Thank you, @yaooqinn and all.





[GitHub] [spark] dongjoon-hyun closed pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43053: [SPARK-45274][CORE][SQL][UI] 
Implementation of a new DAG drawing approach for job/stage/plan graphics to 
avoid fork
URL: https://github.com/apache/spark/pull/43053





[GitHub] [spark] mridulm commented on pull request #42685: [WIP][SPARK-44937][CORE] Add SSL/TLS support for RPC and Shuffle communications

2023-09-22 Thread via GitHub


mridulm commented on PR #42685:
URL: https://github.com/apache/spark/pull/42685#issuecomment-1732173622

   Thanks for working on this @hasnain-db , this is a very nice addition to 
Spark!
   Given the size of the PR, can we split this up to make it easier to review?





[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732154701

   This PR is irrelevant to the CI result. So, please note that I manually 
stopped the running pipelines on this PR to unblock my other PRs.





[GitHub] [spark] mridulm commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork

2023-09-22 Thread via GitHub


mridulm commented on PR #43053:
URL: https://github.com/apache/spark/pull/43053#issuecomment-1732154429

   Nice job @yaooqinn  !





[GitHub] [spark] mridulm commented on a diff in pull request #42950: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages

2023-09-22 Thread via GitHub


mridulm commented on code in PR #42950:
URL: https://github.com/apache/spark/pull/42950#discussion_r1334896106


##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -1903,13 +1903,20 @@ private[spark] class DAGScheduler(
 
   case smt: ShuffleMapTask =>
     val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
-    shuffleStage.pendingPartitions -= task.partitionId
+    // Ignore task completion for old attempt of indeterminate stage
+    val ignoreIndeterminate = stage.isIndeterminate &&
+      task.stageAttemptId < stage.latestInfo.attemptNumber()
+    if (!ignoreIndeterminate) {
+      shuffleStage.pendingPartitions -= task.partitionId
+    }
     val status = event.result.asInstanceOf[MapStatus]
     val execId = status.location.executorId
     logDebug("ShuffleMapTask finished on " + execId)
     if (executorFailureEpoch.contains(execId) &&
         smt.epoch <= executorFailureEpoch(execId)) {
       logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")

Review Comment:
   So this is kind of funny - take a look at what the above replaced, @cloud-fan:
   https://github.com/apache/spark/pull/16620/files#diff-85de35b2e85646ed499c545a3be1cd3ffd525a88aae835a9c621f877eebadcb6R1183 :-)
   
   Both of these actually do not account for DETERMINATE/INDETERMINATE changes 
we made subsequently.
   IMO, for INDETERMINATE stages, we should ignore task completion events from 
previous attempts - since we have already cancelled the stage attempt.
   Having said that, I have not thought through the nuances here.
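   
   A minimal sketch of the check under discussion, assuming simplified stand-in
   types: `StageModel`, `TaskModel`, and `StaleCompletion` are hypothetical names
   for illustration and are not the real `DAGScheduler` classes. It only shows
   that a completion from an older attempt of an indeterminate stage leaves the
   pending partitions untouched.
   
   ```scala
   // Hypothetical stand-ins for illustration only; not Spark's real classes.
   final case class StageModel(isIndeterminate: Boolean, latestAttemptNumber: Int)
   final case class TaskModel(stageAttemptId: Int, partitionId: Int)

   object StaleCompletion {
     // True when the completion comes from an older attempt of an indeterminate stage.
     def shouldIgnore(stage: StageModel, task: TaskModel): Boolean =
       stage.isIndeterminate && task.stageAttemptId < stage.latestAttemptNumber

     // Returns the pending partitions after processing one task completion.
     def onCompletion(pending: Set[Int], stage: StageModel, task: TaskModel): Set[Int] =
       if (shouldIgnore(stage, task)) pending else pending - task.partitionId
   }
   ```
   
   For a determinate stage the sketch always removes the partition, which matches
   the behavior of the line the diff replaces.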
   
   +CC @jiangxb1987 as well






[GitHub] [spark] Hisoka-X commented on pull request #40963: [SPARK-43288][SQL] DataSourceV2: CREATE TABLE LIKE

2023-09-22 Thread via GitHub


Hisoka-X commented on PR #40963:
URL: https://github.com/apache/spark/pull/40963#issuecomment-1732151326

   Hi @atronchi, since this PR has not been updated, I created a new one for
   CREATE TABLE LIKE as well. Please check https://github.com/apache/spark/pull/42586





[GitHub] [spark] chenyu-opensource commented on pull request #43028: [SPARK-45248][CORE]Set the timeout for spark ui server

2023-09-22 Thread via GitHub


chenyu-opensource commented on PR #43028:
URL: https://github.com/apache/spark/pull/43028#issuecomment-1732148383

   > OK, please put a comment in the code about why this is set lower than 
usual.
   
   Thank you for your suggestion; I have followed it.





[GitHub] [spark] jchen5 opened a new pull request, #43068: [SPARK-44550][SQL] Enable correctness fixes for `null IN (empty list)` under ANSI

2023-09-22 Thread via GitHub


jchen5 opened a new pull request, #43068:
URL: https://github.com/apache/spark/pull/43068

   ### What changes were proposed in this pull request?
   Enables the correctness fixes for `null IN (empty list)` expressions
   
   `null IN (empty list)` incorrectly evaluates to null, when it should evaluate
   to false. (It should be false because `a IN (b1, b2)` is defined as
   `a = b1 OR a = b2`, and an empty IN list is treated as an empty OR, which is
   false. This is specified by ANSI SQL.)
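   
   As an illustration of that definition, here is a small sketch that models SQL
   three-valued logic with `Option[Boolean]` (`None` standing for NULL). The names
   `InSemantics`, `sqlEq`, `sqlOr`, and `sqlIn` are hypothetical and are not
   Spark APIs; this is not how Spark's `In` expression is implemented.
   
   ```scala
   object InSemantics {
     // SQL three-valued logic: Some(true), Some(false), or None for NULL.
     type Tri = Option[Boolean]

     // Equality is NULL if either operand is NULL.
     def sqlEq(a: Option[Int], b: Option[Int]): Tri =
       for (x <- a; y <- b) yield x == y

     // OR is true if either side is true, false if both are false, NULL otherwise.
     def sqlOr(l: Tri, r: Tri): Tri = (l, r) match {
       case (Some(true), _) | (_, Some(true)) => Some(true)
       case (Some(false), Some(false))        => Some(false)
       case _                                 => None
     }

     // a IN (b1, ..., bn) as a fold of OR over equality; the empty OR is false.
     def sqlIn(a: Option[Int], bs: Seq[Option[Int]]): Tri =
       bs.foldLeft(Some(false): Tri)((acc, b) => sqlOr(acc, sqlEq(a, b)))
   }
   ```
   
   With this model, `InSemantics.sqlIn(None, Seq.empty)` is `Some(false)`, while a
   NULL left-hand side against a non-empty list still yields NULL.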
   
   Many places in Spark execution (In, InSet, InSubquery) and optimization 
(OptimizeIn, NullPropagation) implemented this wrong behavior. This is a 
longstanding correctness issue which has existed since null support for IN 
expressions was first added to Spark.
   
   See previous PRs where the fixes were implemented: 
https://github.com/apache/spark/pull/42007 and 
https://github.com/apache/spark/pull/42163.
   
   The behavior is under a flag. This PR enables the new behavior by default 
under ANSI, while under non-ANSI the old behavior remains the default for now. 
Later, we should switch the new behavior to default in both cases.
   
   See [this 
doc](https://docs.google.com/document/d/1k8AY8oyT-GI04SnP7eXttPDnDj-Ek-c3luF2zL6DPNU/edit)
 for more information.
   
   ### Why are the changes needed?
   Fix wrong SQL semantics
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, fix wrong SQL semantics
   
   ### How was this patch tested?
   Unit tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No





[GitHub] [spark] github-actions[bot] commented on pull request #40503: [SPARK-42830] [UI] Link skipped stages on Spark UI

2023-09-22 Thread via GitHub


github-actions[bot] commented on PR #40503:
URL: https://github.com/apache/spark/pull/40503#issuecomment-1732142967

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] github-actions[bot] commented on pull request #40529: [SPARK-42890] [UI] add repeat identifier on SQL UI

2023-09-22 Thread via GitHub


github-actions[bot] commented on PR #40529:
URL: https://github.com/apache/spark/pull/40529#issuecomment-1732142946

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] github-actions[bot] closed pull request #40821: [SPARK-43152][spark-structured-streaming] Parametrisable output metadata path (_spark_metadata)

2023-09-22 Thread via GitHub


github-actions[bot] closed pull request #40821: 
[SPARK-43152][spark-structured-streaming] Parametrisable output metadata path 
(_spark_metadata)
URL: https://github.com/apache/spark/pull/40821





[GitHub] [spark] github-actions[bot] commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

2023-09-22 Thread via GitHub


github-actions[bot] commented on PR #40782:
URL: https://github.com/apache/spark/pull/40782#issuecomment-1732142935

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732140942

   Could you review this PR, @attilapiros ? After we switched to Java 17+,
   there are several clean-up PRs like this.





[GitHub] [spark] dongjoon-hyun commented on pull request #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43064:
URL: https://github.com/apache/spark/pull/43064#issuecomment-1732140284

   Thank you. And, if you are fine with Apache Spark 4.0, that's great! I was 
worried. 😄 





[GitHub] [spark] warrenzhu25 opened a new pull request, #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false

2023-09-22 Thread via GitHub


warrenzhu25 opened a new pull request, #43067:
URL: https://github.com/apache/spark/pull/43067

   ### What changes were proposed in this pull request?
   Add a `keepReadLock` parameter to `lockNewBlockForWriting()`. When
   `keepReadLock` is `false`, skip `lockForReading()` to avoid blocking on the
   read lock or a potential deadlock.
   
   A deadlock happens when 2 tasks try to compute the same RDD with a replication
   level of 2 while running on only 2 executors. For details, refer to
   [SPARK-45057].
   
   The task thread holds the write lock and waits for replication to the remote
   executor, while the shuffle server thread that handles the block upload
   request waits on `lockForReading` in
   [BlockInfoManager.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L457C24-L457C24)
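   
   A toy, single-threaded sketch of the proposed behavior, under stated
   assumptions: `ToyBlockLocks`, `ToyLocks`, and the `Outcome` values are
   hypothetical names, and the real `BlockInfoManager` blocks the calling thread
   where this model returns `WouldBlock`. It only illustrates that passing
   `keepReadLock = false` skips the read-lock step for a block that already
   exists.
   
   ```scala
   import scala.collection.mutable

   object ToyLocks {
     sealed trait Outcome
     case object WriteLockAcquired extends Outcome // block was new; caller holds the write lock
     case object ReadLockAcquired extends Outcome  // block existed; caller holds a read lock
     case object WouldBlock extends Outcome        // block existed and a writer still holds it
     case object NoLockTaken extends Outcome       // block existed; keepReadLock was false
   }

   // Toy model for illustration only; not Spark's BlockInfoManager.
   final class ToyBlockLocks {
     import ToyLocks._
     private val known = mutable.Set.empty[String]       // blocks that have been registered
     private val writers = mutable.Set.empty[String]     // blocks currently write-locked
     private val readers = mutable.Map.empty[String, Int] // read-lock counts per block

     def lockNewBlockForWriting(id: String, keepReadLock: Boolean): Outcome =
       if (!known(id)) { known += id; writers += id; WriteLockAcquired }
       else if (!keepReadLock) NoLockTaken
       else if (writers(id)) WouldBlock
       else { readers(id) = readers.getOrElse(id, 0) + 1; ReadLockAcquired }

     // Releases the write lock so that later readers can proceed.
     def unlockWrite(id: String): Unit = writers -= id
   }
   ```
   
   In this model the second caller with `keepReadLock = true` reaches the
   `WouldBlock` case while the first writer is still replicating, which is the
   waiting step in the scenario described above.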
   
   ### Why are the changes needed?
   This avoids an unnecessary read lock acquisition and the deadlock mentioned
   above.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT in BlockInfoManagerSuite
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   





[GitHub] [spark] attilapiros commented on pull request #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore

2023-09-22 Thread via GitHub


attilapiros commented on PR #43064:
URL: https://github.com/apache/spark/pull/43064#issuecomment-1732139192

   @dongjoon-hyun 
   
   Thanks!
   
   > Are you using the current beta-1?
   
   Yes.
   
   > Is there a timeline for Hive 4.0 GA?
   
   I will ask around, but as far as I know they still have some blockers.
   
   > Although I know that you filed this as Bug for some old releases, I believe
   this PR should be a subtask for Apache Spark 4.0.0 because there are no
   existing Spark users with Apache Hive 4.0.0 Metastore.
   
   Sorry, that was my mistake. Thanks for fixing that in Jira.
   
   





[GitHub] [spark] dongjoon-hyun closed pull request #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43062: [SPARK-45285][CORE][TESTS] Remove 
deprecated `Runtime.getRuntime.exec(String)` API usage
URL: https://github.com/apache/spark/pull/43062





[GitHub] [spark] dongjoon-hyun commented on pull request #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43062:
URL: https://github.com/apache/spark/pull/43062#issuecomment-1732134994

   Thank you!





[GitHub] [spark] viirya commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


viirya commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1732134464

   Sounds good. Thanks @dongjoon-hyun 





[GitHub] [spark] dongjoon-hyun closed pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43060: [SPARK-45284][R] Update SparkR 
minimum SystemRequirements to Java 17
URL: https://github.com/apache/spark/pull/43060





[GitHub] [spark] dongjoon-hyun commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1732133925

   Thank you. I checked now. The Spark doc is updated with Java 17. It seems that
   we need to change it from `Java 17` to `Java 17/21`. I'll handle it
   independently because it's Java 21 stuff.
   
   
https://github.com/apache/spark/blob/06ccb6d434476afacc08936cf473670102d41010/docs/index.md?plain=1#L37





[GitHub] [spark] viirya commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


viirya commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1732132107

   Do we have the necessary changes in the Spark documentation?





[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732130048

   I tried to clean this up in my regeneration PR, but it made the commit log
   look weird because Git treats it as a rename.





[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43066:
URL: https://github.com/apache/spark/pull/43066#issuecomment-1732129270

   cc @LuciferYang 





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43066:
URL: https://github.com/apache/spark/pull/43066

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43065:
URL: https://github.com/apache/spark/pull/43065#issuecomment-1732121987

   cc @LuciferYang 





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43065:
URL: https://github.com/apache/spark/pull/43065

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] xiongbo-sjtu commented on pull request #43021: [SPARK-45227][CORE] Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend

2023-09-22 Thread via GitHub


xiongbo-sjtu commented on PR #43021:
URL: https://github.com/apache/spark/pull/43021#issuecomment-1732099260

   @jiangxb1987 @mridulm 
   
   All tests eventually passed in GitHub Actions. Any concerns about merging
   this pull request?
   
   As a side note, I've discovered [another minor 
issue](https://issues.apache.org/jira/browse/SPARK-45283), but will address 
that in another pull request.
   
   Thanks,
   Bo





[GitHub] [spark] attilapiros opened a new pull request, #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore

2023-09-22 Thread via GitHub


attilapiros opened a new pull request, #43064:
URL: https://github.com/apache/spark/pull/43064

   
   ### What changes were proposed in this pull request?
   
   Support the Hive 4.0 metastore, where partition filters can be pushed down
   even for CHAR and VARCHAR types.
   
   **Hive 4.0 is still in beta! This is why this PR is a work in progress.**
   
   ### Why are the changes needed?
   
   Supporting more Hive versions (with extra performance improvement) is good 
for our users.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. The documentation is updated to reflect the Hive 4.0 metastore support.
   
   ### How was this patch tested?
   
    Manually
   
   I used the docker image of apache/hive:4.0.0-beta-1 for starting a metastore 
and a hiveserver2 (along with a hadoop3 docker image).
   
   Created a table:
   ```
   CREATE EXTERNAL TABLE testTable1 ( 
 column1 String 
   ) PARTITIONED BY (partColumn1 CHAR(30), partColumn2 VARCHAR(30)) LOCATION 
'hdfs://hadoop3:8020/tmp/hive_external/';
   ```
   
   Inserted some values in beeline:
   
   ```
   insert into table testtable1 values ("column1_v1", "partcolumn1_v1", 
"partcolumn2_v1"), ("column1_v2", "partcolumn1_v2", "partcolumn2_v2");
   ```
   
   Started my spark in the hiveserver2 container as:
   ```
   ./bin/spark-shell --conf spark.sql.hive.metastore.version=4.0.0 --conf spark.sql.hive.metastore.jars="/opt/hive/lib/*"
   ```
   
   Run the query as:
   ```
   scala> sql("select * from testtable1 where partcolumn1 = 'partcolumn1_v1' and partcolumn2 = 'partcolumn2_v1'").show
   Hive Session ID = 6846fe0e-968a-474d-afec-4f67b3a2a274
   +----------+--------------------+--------------+
   |   column1|         partcolumn1|   partcolumn2|
   +----------+--------------------+--------------+
   |column1_v1|partcolumn1_v1   ...|partcolumn2_v1|
   +----------+--------------------+--------------+
   ```
   
   And check the HMS calls in the metastore container in the file 
`/tmp/hive/hive.log`:
   ```
   ...
   2023-09-22T21:06:34,293  INFO [Metastore-Handler-Pool: Thread-1356] HiveMetaStore.audit: ugi=hive   ip=172.30.0.5   cmd=source:172.30.0.5 get_partitions_by_filter : tbl=hive.default.testtable1
   ...
   ```
   
   Which contains the expected `get_partitions_by_filter`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.





[GitHub] [spark] atronchi commented on pull request #40963: [SPARK-43288][SQL] DataSourceV2: CREATE TABLE LIKE

2023-09-22 Thread via GitHub


atronchi commented on PR #40963:
URL: https://github.com/apache/spark/pull/40963#issuecomment-1732002000

   Would it be possible to re-open this PR? The `CREATE TABLE LIKE` 
functionality still does not exist for DataSourceV2...





[GitHub] [spark] bjornjorgensen commented on pull request #37234: [SPARK-39822][PYTHON][PS] Provide a good feedback to users

2023-09-22 Thread via GitHub


bjornjorgensen commented on PR #37234:
URL: https://github.com/apache/spark/pull/37234#issuecomment-1731983498

   @bzhaoopenstack will you reopen this?
   If not, can I open a new PR with your code and add you as co-author?





[GitHub] [spark] dongjoon-hyun commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43060:
URL: https://github.com/apache/spark/pull/43060#issuecomment-1731960708

   Could you review this, @LuciferYang ?





[GitHub] [spark] dongjoon-hyun commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #40390:
URL: https://github.com/apache/spark/pull/40390#issuecomment-1731959966

   Thanks. Yes, I was also tracking that, @LuciferYang.





[GitHub] [spark] srowen opened a new pull request, #43063: [SPARK-45286][DOCS] Add back Matomo analytics

2023-09-22 Thread via GitHub


srowen opened a new pull request, #43063:
URL: https://github.com/apache/spark/pull/43063

   ### What changes were proposed in this pull request?
   
   Add analytics to doc pages using the ASF's Matomo service
   
   
   ### Why are the changes needed?
   
   We had previously removed Google Analytics from the website and release 
docs, per ASF policy: https://github.com/apache/spark/pull/36310
   
   We just restored analytics using the ASF-hosted Matomo service on the 
website:
   
https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30
   
   This change would put the same tracking code back into the release docs. 
It would let us see which docs and resources are used most, I suppose.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   N/A
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No





[GitHub] [spark] ion-elgreco commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-09-22 Thread via GitHub


ion-elgreco commented on PR #38624:
URL: https://github.com/apache/spark/pull/38624#issuecomment-1731949210

   @HyukjinKwon since @igorghi has shown with his tests it's not possible to 
use repartition().mapInArrow to mimic groupbyApply, would it now make sense to 
add groupbyApplyInArrow?





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43062:
URL: https://github.com/apache/spark/pull/43062

   ### What changes were proposed in this pull request?
   
   This PR aims to remove usages of the deprecated `Runtime.exec` methods that 
take a single string command line.
   
   ### Why are the changes needed?
   
   These methods are deprecated as of Java 18.
   - https://bugs.openjdk.org/browse/JDK-8276408 (Deprecate Runtime.exec 
methods with a single string command line argument)
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manually checked the compilation log.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
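
For illustration, the kind of replacement this PR describes might look like the following sketch. It is not the PR's actual diff: `ExecSketch` and `runAndWait` are hypothetical names, and it assumes the array-based `Runtime.exec(String[])` overload, which is not deprecated, is an acceptable substitute for the single-string form.

```scala
object ExecSketch {
  // Deprecated since Java 18 (JDK-8276408): the single-string overload
  // splits the command line on whitespace, which is error-prone.
  //   Runtime.getRuntime.exec("java -version")

  /** Runs a command given as explicit arguments and returns its exit code. */
  def runAndWait(command: Array[String]): Int = {
    // Runtime.exec(String[]) passes each argument through as given.
    val process = Runtime.getRuntime.exec(command)
    process.waitFor()
  }

  def main(args: Array[String]): Unit = {
    // Use the JVM running this code so the example does not depend on PATH.
    val javaBin =
      java.nio.file.Paths.get(System.getProperty("java.home"), "bin", "java").toString
    println(runAndWait(Array(javaBin, "-version")))
  }
}
```

Passing the arguments as an array keeps the caller in control of where one argument ends and the next begins, which the single-string form cannot express for arguments that contain spaces.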





[GitHub] [spark] agubichev opened a new pull request, #43061: tests for correlated exists/IN with ORDER BY/LIMIT

2023-09-22 Thread via GitHub


agubichev opened a new pull request, #43061:
URL: https://github.com/apache/spark/pull/43061

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] LuciferYang commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-09-22 Thread via GitHub


LuciferYang commented on PR #40390:
URL: https://github.com/apache/spark/pull/40390#issuecomment-1731860178

   @ulysses-you I found that since this PR was merged, 
`InMemoryColumnarBenchmark` fails to execute.
   
   ```
   build/sbt "sql/Test/runMain 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark"
   ```
   
   ```
   [error] Exception in thread "main" java.lang.IndexOutOfBoundsException: 0
   [error]  at scala.collection.LinearSeqOps.apply(LinearSeq.scala:131)
   [error]  at scala.collection.LinearSeqOps.apply$(LinearSeq.scala:128)
   [error]  at scala.collection.immutable.List.apply(List.scala:79)
   [error]  at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.intCache(InMemoryColumnarBenchmark.scala:47)
   [error]  at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.$anonfun$runBenchmarkSuite$1(InMemoryColumnarBenchmark.scala:68)
   [error]  at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
   [error]  at 
org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:42)
   [error]  at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.runBenchmarkSuite(InMemoryColumnarBenchmark.scala:68)
   [error]  at 
org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:72)
   [error]  at 
org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark.main(InMemoryColumnarBenchmark.scala)
   [error] Nonzero exit code returned from runner: 1
   [error] (sql / Test / runMain) Nonzero exit code returned from runner: 1
   ```
   
   Should we run `InMemoryColumnarBenchmark` with the configuration 
`spark.sql.optimizer.canChangeCachedPlanOutputPartitioning=false`?
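
The top frames of the trace (`LinearSeqOps.apply` raising `IndexOutOfBoundsException: 0`) indicate that element 0 of an empty `List` was requested at `InMemoryColumnarBenchmark.scala:47`. Which list is empty there is not shown in this message, so the following sketch only reproduces the shape of the exception with hypothetical names and shows a non-throwing alternative.

```scala
import scala.util.Try

object EmptyListSketch {
  // Indexing an empty List throws IndexOutOfBoundsException,
  // the same exception type as in the trace above.
  def firstUnsafe[A](xs: List[A]): A = xs(0)

  // headOption returns None instead of throwing.
  def firstSafe[A](xs: List[A]): Option[A] = xs.headOption

  def main(args: Array[String]): Unit = {
    println(Try(firstUnsafe(List.empty[Int])).isFailure)
    println(firstSafe(List.empty[Int]))
    println(firstSafe(List(1, 2, 3)))
  }
}
```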
   
   





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43060:
URL: https://github.com/apache/spark/pull/43060

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] yaooqinn closed pull request #43016: [SPARK-45077][UI][FOLLOWUP] Update comment to link the forked repo yaooqinn/dagre-d3

2023-09-22 Thread via GitHub


yaooqinn closed pull request #43016: [SPARK-45077][UI][FOLLOWUP] Update comment 
to link the forked repo yaooqinn/dagre-d3
URL: https://github.com/apache/spark/pull/43016





[GitHub] [spark] dongjoon-hyun commented on pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43059:
URL: https://github.com/apache/spark/pull/43059#issuecomment-1731767573

   Merged to master for Apache Spark 4.0.0.





[GitHub] [spark] dongjoon-hyun closed pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43059: [SPARK-45281][CORE][TESTS] Update 
BenchmarkBase to use Java 17 as the base version
URL: https://github.com/apache/spark/pull/43059





[GitHub] [spark] dongjoon-hyun commented on pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43059:
URL: https://github.com/apache/spark/pull/43059#issuecomment-1731766295

   Thank you so much!





[GitHub] [spark] dongjoon-hyun commented on pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43056:
URL: https://github.com/apache/spark/pull/43056#issuecomment-1731765149

   Merged to master~





[GitHub] [spark] dongjoon-hyun closed pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43056: [SPARK-45277][BUILD][INFRA] Install 
Java 17 to support SparkR testing on Windows
URL: https://github.com/apache/spark/pull/43056





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334634163


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   Yes, in both ways:
   - Maven build will prevent it explicitly.
   - SBT also seems to hit compilation failure due to `-target:17`.
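
A minimal sketch of the suffix logic from the diff above, pulled out as a pure function so the effect of the new threshold can be checked. `JdkSuffixSketch`, `majorVersion`, and `suffix` are hypothetical names; the parsing expression and the comparison are taken from the diff.

```scala
object JdkSuffixSketch {
  /** Leading numeric component of a `java.version` string, e.g. "17.0.8" gives 17. */
  def majorVersion(javaVersion: String): Int =
    javaVersion.split("\\D+")(0).toInt

  /** Benchmark file suffix: empty at or below the base version, "-jdkN" above it. */
  def suffix(javaVersion: String, baseVersion: Int = 17): String = {
    val version = majorVersion(javaVersion)
    if (version > baseVersion) s"-jdk$version" else ""
  }

  def main(args: Array[String]): Unit = {
    Seq("17.0.8", "21", "1.8.0_382").foreach { v =>
      println(s"$v -> '${suffix(v)}'")
    }
  }
}
```

A Java 8 runtime reports a version such as `1.8.0_382`, so the leading component is 1 and the suffix is empty. That is the ambiguity raised in this thread; per the comment above, the build is expected to reject such a JDK before a benchmark can run.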






[GitHub] [spark] viirya commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


viirya commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334632685


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   Hmm, so currently we cannot use jdk8 with master where Java17 is enforced?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334632610


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   Hmm, I verified with Java 8; it seems to fail in another part (compilation).
   ```
   $ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/Test/runMain 
org.apache.spark.serializer.KryoBenchmark"
   ...
   [error] '17' is not a valid choice for '-target'
   [error] bad option: '-target:17'
   [error] (tags / Compile / compileIncremental) Compilation failed
   [error] Total time: 39 s, completed Sep 22, 2023 10:03:31 AM
   ```









[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334631707


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   For example, we haven't considered Java 7 up to now.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334629527


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   We have been using `maven-enforcer-plugin` to enforce the Java version. We can 
assume a Java 17+ dev environment.
   ```
   <requireJavaVersion>
     <version>${java.version}</version>
   </requireJavaVersion>
   ```






[GitHub] [spark] viirya commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


viirya commented on code in PR #43059:
URL: https://github.com/apache/spark/pull/43059#discussion_r1334628023


##
core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:
##
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
 val regenerateBenchmarkFiles: Boolean = 
System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
 if (regenerateBenchmarkFiles) {
   val version = System.getProperty("java.version").split("\\D+")(0).toInt
-  val jdkString = if (version > 8) s"-jdk$version" else ""
+  val jdkString = if (version > 17) s"-jdk$version" else ""

Review Comment:
   If it is run with jdk8, the suffix will be ""? That may be confused with the 
Java 17 base result.









[GitHub] [spark] dongjoon-hyun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #42943:
URL: https://github.com/apache/spark/pull/42943#issuecomment-1731745429

   Thank you for your decision, @dcoliversun .





[GitHub] [spark] LuciferYang commented on pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43056:
URL: https://github.com/apache/spark/pull/43056#issuecomment-1731734955

   https://github.com/apache/spark/assets/1475305/2df78203-3cfd-4e5e-b162-d0bd38b3615d
   
   Passed





[GitHub] [spark] dongjoon-hyun opened a new pull request, #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version

2023-09-22 Thread via GitHub


dongjoon-hyun opened a new pull request, #43059:
URL: https://github.com/apache/spark/pull/43059

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43035:
URL: https://github.com/apache/spark/pull/43035#issuecomment-1731694203

   Merged to master for Apache Spark 4.0.0.





[GitHub] [spark] dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter 
fails when writing more values than initial capacity 
URL: https://github.com/apache/spark/pull/43035





[GitHub] [spark] dongjoon-hyun closed pull request #43057: [SPARK-45280][INFRA] Change Maven daily test use Java 17 for testing

2023-09-22 Thread via GitHub


dongjoon-hyun closed pull request #43057: [SPARK-45280][INFRA] Change Maven 
daily test use Java 17 for testing
URL: https://github.com/apache/spark/pull/43057





[GitHub] [spark] gengliangwang commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement

2023-09-22 Thread via GitHub


gengliangwang commented on code in PR #42985:
URL: https://github.com/apache/spark/pull/42985#discussion_r1334552910


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:
##
@@ -61,68 +62,92 @@ case class PrintToStderr(child: Expression) extends 
UnaryExpression {
 /**
  * Throw with the result of an expression (used for debugging).
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Throws an exception with `expr`.",
+  usage = "_FUNC_( expr [, errorParams ]) - Throws a USER_RAISED_EXCEPTION 
with `expr` as message, or a defined error class in `expr` with a parameter 
map.",
   examples = """
 Examples:
   > SELECT _FUNC_('custom error message');
-   java.lang.RuntimeException
-   custom error message
+   [USER_RAISED_EXCEPTION] custom error message
+
+  > SELECT _FUNC_('VIEW_NOT_FOUND', Map('relationName' -> '`V1`'));
+   [VIEW_NOT_FOUND] The view `V1` cannot be found. ...
   """,
   since = "3.1.0",
   group = "misc_funcs")
-case class RaiseError(child: Expression, dataType: DataType)
-  extends UnaryExpression with ImplicitCastInputTypes {
+// scalastyle:on line.size.limit
+case class RaiseError(errorClass: Expression, errorParms: Expression, 
dataType: DataType)
+  extends BinaryExpression with ImplicitCastInputTypes {
 
-  def this(child: Expression) = this(child, NullType)
+  def this(str: Expression) = {
+this(Literal("USER_RAISED_EXCEPTION"),
+  CreateMap(Seq(Literal("errorMessage"), str)), NullType)
+  }
+
+  def this(errorClass: Expression, errorParms: Expression) = {
+this(errorClass, errorParms, NullType)
+  }
 
   override def foldable: Boolean = false
   override def nullable: Boolean = true
-  override def inputTypes: Seq[AbstractDataType] = Seq(StringType)
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(StringType, MapType(StringType, StringType))
+
+  override def left: Expression = errorClass
+  override def right: Expression = errorParms
 
   override def prettyName: String = "raise_error"
 
   override def eval(input: InternalRow): Any = {
-val value = child.eval(input)
-if (value == null) {
-  throw new RuntimeException()
-}
-throw new RuntimeException(value.toString)
+val error = errorClass.eval(input).asInstanceOf[UTF8String]
+val parms: MapData = errorParms.eval(input).asInstanceOf[MapData]
+throw raiseError(error, parms)
   }
 
   // if (true) is to avoid codegen compilation exception that statement is 
unreachable
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val eval = child.genCode(ctx)
+val error = errorClass.genCode(ctx)
+val parms = errorParms.genCode(ctx)
 ExprCode(
-  code = code"""${eval.code}
+  code = code"""${error.code}

Review Comment:
   We may need to check the nullability of error and params as well.
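
One way to read this suggestion, sketched without Spark dependencies: guard both inputs before building the exception. Everything here is an assumption for illustration (the object and method names, the rendered message format, and the choice to fall back to `USER_RAISED_EXCEPTION` and an empty map on null); the PR may handle nulls differently, for example by returning null or by raising a dedicated error.

```scala
object RaiseErrorSketch {
  // Hypothetical fallback used when the error class evaluates to null.
  val DefaultErrorClass = "USER_RAISED_EXCEPTION"

  /** Builds the exception to throw, tolerating a null class and a null parameter map. */
  def toException(errorClass: String, params: Map[String, String]): RuntimeException = {
    val cls = Option(errorClass).getOrElse(DefaultErrorClass)
    val safeParams = Option(params).getOrElse(Map.empty[String, String])
    // Sort by key so the rendered message is deterministic.
    val rendered = safeParams.toSeq
      .sortBy(_._1)
      .map { case (k, v) => s"$k=$v" }
      .mkString(", ")
    new RuntimeException(if (rendered.isEmpty) s"[$cls]" else s"[$cls] $rendered")
  }

  def main(args: Array[String]): Unit = {
    println(toException(null, null).getMessage)
    println(toException("VIEW_NOT_FOUND", Map("relationName" -> "`V1`")).getMessage)
  }
}
```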



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gengliangwang commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement

2023-09-22 Thread via GitHub


gengliangwang commented on code in PR #42985:
URL: https://github.com/apache/spark/pull/42985#discussion_r1334552099


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:
##
@@ -61,68 +62,92 @@ case class PrintToStderr(child: Expression) extends UnaryExpression {
 /**
  * Throw with the result of an expression (used for debugging).
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Throws an exception with `expr`.",
+  usage = "_FUNC_( expr [, errorParams ]) - Throws a USER_RAISED_EXCEPTION with `expr` as message, or a defined error class in `expr` with a parameter map.",
   examples = """
 Examples:
   > SELECT _FUNC_('custom error message');
-   java.lang.RuntimeException
-   custom error message
+   [USER_RAISED_EXCEPTION] custom error message
+
+  > SELECT _FUNC_('VIEW_NOT_FOUND', Map('relationName' -> '`V1`'));
+   [VIEW_NOT_FOUND] The view `V1` cannot be found. ...
   """,
   since = "3.1.0",
   group = "misc_funcs")
-case class RaiseError(child: Expression, dataType: DataType)
-  extends UnaryExpression with ImplicitCastInputTypes {
+// scalastyle:on line.size.limit
+case class RaiseError(errorClass: Expression, errorParms: Expression, dataType: DataType)
+  extends BinaryExpression with ImplicitCastInputTypes {
 
-  def this(child: Expression) = this(child, NullType)
+  def this(str: Expression) = {
+this(Literal("USER_RAISED_EXCEPTION"),
+  CreateMap(Seq(Literal("errorMessage"), str)), NullType)
+  }
+
+  def this(errorClass: Expression, errorParms: Expression) = {
+this(errorClass, errorParms, NullType)
+  }
 
   override def foldable: Boolean = false
   override def nullable: Boolean = true
-  override def inputTypes: Seq[AbstractDataType] = Seq(StringType)
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(StringType, MapType(StringType, StringType))
+
+  override def left: Expression = errorClass
+  override def right: Expression = errorParms
 
   override def prettyName: String = "raise_error"
 
   override def eval(input: InternalRow): Any = {
-val value = child.eval(input)
-if (value == null) {
-  throw new RuntimeException()
-}
-throw new RuntimeException(value.toString)
+val error = errorClass.eval(input).asInstanceOf[UTF8String]
+val parms: MapData = errorParms.eval(input).asInstanceOf[MapData]
+throw raiseError(error, parms)
   }
 
   // if (true) is to avoid codegen compilation exception that statement is unreachable
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-val eval = child.genCode(ctx)
+val error = errorClass.genCode(ctx)
+val parms = errorParms.genCode(ctx)
 ExprCode(
-  code = code"""${eval.code}
+  code = code"""${error.code}

Review Comment:
   ```suggestion
 code = code"""${error.code}
   |${parms.code}
   ```
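
The suggestion above reflects a general property of generated code: each child produces a snippet that declares its result variable, so every snippet has to be emitted before that variable is used. Below is a minimal plain-Scala sketch of the idea. `Snippet`, `EmitBoth`, and the `raise(...)` helper named inside the generated text are hypothetical stand-ins, not Spark's `ExprCode` API, and plain `s"..."` with `stripMargin` is used here instead of Spark's `code` interpolator.

```scala
// Hypothetical miniature of a codegen result: `code` declares `value`.
final case class Snippet(code: String, value: String)

object EmitBoth {
  // Emit both child snippets first, then reference the variables they declare.
  // Note: stripMargin runs after interpolation, so this sketch assumes the
  // child snippets contain no lines that begin with '|'.
  def raiseCode(error: Snippet, parms: Snippet): String =
    s"""${error.code}
       |${parms.code}
       |if (true) {
       |  throw raise(${error.value}, ${parms.value});
       |}""".stripMargin
}
```

If `parms.code` were left out, the generated text would still mention the `parms` variable but never declare it, which would fail when the generated source is compiled.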






[GitHub] [spark] srielau commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement

2023-09-22 Thread via GitHub


srielau commented on code in PR #42985:
URL: https://github.com/apache/spark/pull/42985#discussion_r1334513626


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4432,6 +4432,17 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_RAISE_ERROR_WITHOUT_ERROR_CLASS =
+buildConf("spark.sql.legacy.raiseErrorWithoutErrorClass")
+  .internal()
+  .doc("When set to true, restores the legacy behavior of `raise_error` and `assert_true` to " +
+"not return the `[USER_RAISED_EXCEPTION]` prefix." +
+"For example, `raise_error('error!')` returns `error!` instead of " +
+"`[[USER_RAISED_EXCEPTION] Error!`.")

Review Comment:
   I spoke to @gatorsmile, and he also recommended a config. There are two 
changes at play here: the exception type changed away from RuntimeException, 
and the message gained the prefix. That gives us a decent chance of breaking 
anyone who catches these exceptions.
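
To make the compatibility risk concrete, here is a minimal plain-Scala sketch of how a legacy switch could gate the message format. `RaiseErrorCompat` and its `legacy` parameter are hypothetical stand-ins for the behavior described in the config's doc string; this is not the PR's implementation and it does not model the change in exception type.

```scala
object RaiseErrorCompat {
  // `legacy` stands in for spark.sql.legacy.raiseErrorWithoutErrorClass.
  def message(userMsg: String, legacy: Boolean): String =
    if (legacy) userMsg else s"[USER_RAISED_EXCEPTION] $userMsg"

  // A caller that matches on the exact message text, as existing user code might.
  def callerRecognizes(msg: String): Boolean = msg == "error!"
}
```

With the flag off, a caller that compares the full message text no longer matches; with it on, the old text is preserved.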
   







[GitHub] [spark] srielau commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement

2023-09-22 Thread via GitHub


srielau commented on code in PR #42985:
URL: https://github.com/apache/spark/pull/42985#discussion_r1334513626


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -4432,6 +4432,17 @@ object SQLConf {
   .booleanConf
   .createWithDefault(false)
 
+  val LEGACY_RAISE_ERROR_WITHOUT_ERROR_CLASS =
+buildConf("spark.sql.legacy.raiseErrorWithoutErrorClass")
+  .internal()
+  .doc("When set to true, restores the legacy behavior of `raise_error` and `assert_true` to " +
+"not return the `[USER_RAISED_EXCEPTION]` prefix." +
+"For example, `raise_error('error!')` returns `error!` instead of " +
+"`[[USER_RAISED_EXCEPTION] Error!`.")

Review Comment:
   I spoke to Xiao, and he also recommended a config. There are two changes at 
play here: the exception type changed away from RuntimeException, and the 
message gained the prefix. That gives us a decent chance of breaking anyone 
who catches these exceptions.






[GitHub] [spark] srielau commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement

2023-09-22 Thread via GitHub


srielau commented on code in PR #42985:
URL: https://github.com/apache/spark/pull/42985#discussion_r1334511376


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:
##
@@ -61,68 +62,97 @@ case class PrintToStderr(child: Expression) extends UnaryExpression {
 /**
  * Throw with the result of an expression (used for debugging).
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Throws an exception with `expr`.",
+  usage = "_FUNC_( expr [, errorParams ]) - Throws a USER_RAISED_EXCEPTION with `expr` as message, or a defined error class in `expr` with a parameter map.",
   examples = """
 Examples:
   > SELECT _FUNC_('custom error message');
-   java.lang.RuntimeException
-   custom error message
+   [USER_RAISED_EXCEPTION] custom error message
+
+  > SELECT _FUNC_('VIEW_NOT_FOUND', Map('relationName' -> '`V1`'));
+   [VIEW_NOT_FOUND] The view `V1` cannot be found. ...
   """,
   since = "3.1.0",
   group = "misc_funcs")
-case class RaiseError(child: Expression, dataType: DataType)
-  extends UnaryExpression with ImplicitCastInputTypes {
+// scalastyle:on line.size.limit
+case class RaiseError(errorClass: Expression, errorParms: Expression, dataType: DataType)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  def this(str: Expression) = {
+this(Literal(
+  if (SQLConf.get.legacyNegativeIndexInArrayInsert) {

Review Comment:
   Abandoned effort to put the logic in the wrong spot. Removed.






[GitHub] [spark] ishnagy commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2023-09-22 Thread via GitHub


ishnagy commented on PR #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-1731539550

   Hi @ulysses-you , @attilapiros 
   
   I'd like to work on this issue and tie up all the loose ends left.
   
   If you're ok with it, I'd like to open a new PR from my private repo reusing 
a significant portion of the changes in this PR.





[GitHub] [spark] yaooqinn commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork

2023-09-22 Thread via GitHub


yaooqinn commented on PR #43053:
URL: https://github.com/apache/spark/pull/43053#issuecomment-1731521203

   cc @sarutak @cloud-fan @dongjoon-hyun @HyukjinKwon @mridulm, thanks





[GitHub] [spark] LuciferYang opened a new pull request, #43058: Test new ammonite

2023-09-22 Thread via GitHub


LuciferYang opened a new pull request, #43058:
URL: https://github.com/apache/spark/pull/43058

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] LuciferYang commented on a diff in pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11

2023-09-22 Thread via GitHub


LuciferYang commented on code in PR #43005:
URL: https://github.com/apache/spark/pull/43005#discussion_r1334415482


##
dev/infra/Dockerfile:
##
@@ -30,7 +30,7 @@ RUN apt-get update && apt-get install -y \
 pkg-config  \
 curl  \
 wget  \
-openjdk-8-jdk  \
+openjdk-17-jdk-headless  \

Review Comment:
   Thanks @Yikun ~ Let me continue to monitor the running status of GA.






[GitHub] [spark] LuciferYang opened a new pull request, #43057: [SPARK-45280] Change Maven daily test use Java 17 for testing.

2023-09-22 Thread via GitHub


LuciferYang opened a new pull request, #43057:
URL: https://github.com/apache/spark/pull/43057

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] Yikun commented on a diff in pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11

2023-09-22 Thread via GitHub


Yikun commented on code in PR #43005:
URL: https://github.com/apache/spark/pull/43005#discussion_r1334402022


##
dev/infra/Dockerfile:
##
@@ -30,7 +30,7 @@ RUN apt-get update && apt-get install -y \
 pkg-config  \
 curl  \
 wget  \
-openjdk-8-jdk  \
+openjdk-17-jdk-headless  \

Review Comment:
   Hmm, installing the JDK here seems unnecessary, because we use a GitHub 
Action to install Java. But if CI passes, it is OK.






[GitHub] [spark] LuciferYang commented on pull request #43032: [SPARK-45252][CORE] Escape the greater/less than symbols in the comments to make `sbt doc` execute successfully

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43032:
URL: https://github.com/apache/spark/pull/43032#issuecomment-1731439172

   Thanks @dongjoon-hyun and @mridulm ~





[GitHub] [spark] srowen commented on pull request #43028: [SPARK-45248][CORE]Set the timeout for spark ui server

2023-09-22 Thread via GitHub


srowen commented on PR #43028:
URL: https://github.com/apache/spark/pull/43028#issuecomment-1731429696

   OK, please put a comment in the code about why this is set lower than usual.





[GitHub] [spark] LuciferYang closed pull request #43054: Test Appveyor use pre-installed java 17

2023-09-22 Thread via GitHub


LuciferYang closed pull request #43054: Test Appveyor use pre-installed java 17
URL: https://github.com/apache/spark/pull/43054





[GitHub] [spark] LuciferYang opened a new pull request, #43056: [SPARK-45277][INFRA] Install Java 17 for Windows SparkR test

2023-09-22 Thread via GitHub


LuciferYang opened a new pull request, #43056:
URL: https://github.com/apache/spark/pull/43056

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





[GitHub] [spark] LuciferYang commented on pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11

2023-09-22 Thread via GitHub


LuciferYang commented on PR #43005:
URL: https://github.com/apache/spark/pull/43005#issuecomment-1731366636

   Thanks @dongjoon-hyun @HyukjinKwon @bjornjorgensen and @cfmcgrady ~





[GitHub] [spark] sander-goos commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity

2023-09-22 Thread via GitHub


sander-goos commented on PR #43035:
URL: https://github.com/apache/spark/pull/43035#issuecomment-1731360942

   > +1, this PR looks reasonable (Pending CIs). There is no perf regression 
for the case which fits the limit, right, @sander-goos ?
   
   There shouldn't be a perf regression; the extra call to `handleSafe` is a 
no-op when index < capacity. The Arrow writers for other types also use 
`setSafe` where applicable.
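
The no-op claim above can be illustrated with a small plain-Scala model of a growable vector. `GrowableLongVector` is a hypothetical class written for this sketch, not Arrow's API, and the doubling growth policy is an assumption. Only the method names `handleSafe` and `setSafe` are borrowed from the comment.

```scala
// Hypothetical growable vector; a sketch of the idea, not Arrow's implementation.
final class GrowableLongVector(initialCapacity: Int) {
  private var data = new Array[Long](math.max(1, initialCapacity))
  var reallocations = 0

  def capacity: Int = data.length

  // Grows by doubling only when index falls outside the current capacity;
  // when index < capacity the loop body never runs.
  private def handleSafe(index: Int): Unit = {
    while (index >= data.length) {
      data = java.util.Arrays.copyOf(data, data.length * 2)
      reallocations += 1
    }
  }

  def setSafe(index: Int, value: Long): Unit = {
    handleSafe(index)
    data(index) = value
  }

  def get(index: Int): Long = data(index)
}
```

Writes that stay within the initial capacity cost only the bounds comparison, while a write past the end triggers a reallocation instead of failing.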





[GitHub] [spark] dongjoon-hyun commented on pull request #43025: [SPARK-45247][BUILD][PYTHON][PS] Upgrade Pandas to 2.1.1

2023-09-22 Thread via GitHub


dongjoon-hyun commented on PR #43025:
URL: https://github.com/apache/spark/pull/43025#issuecomment-1731345459

   Merged to master for Apache Spark 4.0.0.




