[GitHub] [spark] hasnain-db commented on pull request #42685: [WIP][SPARK-44937][CORE] Add SSL/TLS support for RPC and Shuffle communications
hasnain-db commented on PR #42685: URL: https://github.com/apache/spark/pull/42685#issuecomment-1732226082 Thanks @mridulm ! Happy to do that. I can think of a nice split in line with the bullet points listed in the summary here. Just to confirm (since I'm not sure this repo has support for stacked PRs - if there is, please link me to an example) - you're proposing I put up one PR, get it approved and merged, then put up the second PR, and so on, right (since most changes depend on each other)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun closed pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun closed pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones URL: https://github.com/apache/spark/pull/43065
[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun commented on PR #43065: URL: https://github.com/apache/spark/pull/43065#issuecomment-1732206610 I'll merge this because this PR doesn't touch any code; these are purely generated snapshot files.
[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun commented on PR #43065: URL: https://github.com/apache/spark/pull/43065#issuecomment-1732206126 Thank you for the thorough reviews. Yes, we should catch up on them one by one after landing this. This helps us stay on the same page and monitor progress.
[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
LuciferYang commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334916319

## sql/core/benchmarks/SortBenchmark-results.txt:
@@ -2,15 +2,15 @@ radix sort
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
radix sort 2500:                    Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-reference TimSort key prefix array         12374         12403         41        2.0        495.0      1.0X
-reference Arrays.sort                       3377          3381          5        7.4        135.1      3.7X
-radix sort one byte                          209           212          2      119.5          8.4     59.2X
-radix sort two bytes                         398           403          3       62.8         15.9     31.1X
-radix sort eight bytes                      1538          1538          0       16.3         61.5      8.0X
-radix sort key prefix array                 1953          1998         64       12.8         78.1      6.3X
+reference TimSort key prefix array         14141         14208         96        1.8        565.6      1.0X

Review Comment: ditto

## sql/core/benchmarks/ColumnarBatchBenchmark-results.txt:
@@ -2,58 +2,58 @@ Int Read/Write
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Int Read/Write:             Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Java Array                            257           266          8     1273.6          0.8      1.0X
-ByteBuffer Unsafe                     480           490          5      682.0          1.5      0.5X
-ByteBuffer API                       1994          1996          2      164.4          6.1      0.1X
-DirectByteBuffer                      756           762          7      433.6          2.3      0.3X
-Unsafe Buffer                         255           263          4     1283.1          0.8      1.0X
-Column(on heap)                       266           272          6     1231.5          0.8      1.0X
-Column(off heap)                      526           529          2      623.1          1.6      0.5X
-Column(off heap direct)               258           265          7     1270.3          0.8      1.0X
-UnsafeRow (on heap)                   556           560          6      589.0          1.7      0.5X
-UnsafeRow (off heap)                  599           606          5      546.9          1.8      0.4X
-Column On Heap Append                 478           488          6      686.0          1.5      0.5X
+Java Array                            254           261          5     1290.1          0.8      1.0X
+ByteBuffer Unsafe                     420           427          8      780.2          1.3      0.6X
+ByteBuffer API                        801           822         28      409.0          2.4      0.3X
+DirectByteBuffer                      661           668          7      495.8          2.0      0.4X
+Unsafe Buffer                         253           266         10     1296.0          0.8      1.0X
+Column(on heap)                       254           261          4     1292.2          0.8      1.0X
+Column(off heap)                      255           261          5     1287.3          0.8      1.0X
+Column(off heap direct)               253           258          6     1297.3          0.8      1.0X
+UnsafeRow (on heap)                   722           729          9      454.1          2.2      0.4X
+UnsafeRow (off heap)                  532           543         13      616.3          1.6      0.5X
+Column On Heap Append                 516
[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support
dongjoon-hyun commented on PR #43069: URL: https://github.com/apache/spark/pull/43069#issuecomment-1732200379 Merged to master for Apache Spark 4.0.0.
[GitHub] [spark] dongjoon-hyun closed pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support
dongjoon-hyun closed pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support URL: https://github.com/apache/spark/pull/43069
[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support
dongjoon-hyun commented on PR #43069: URL: https://github.com/apache/spark/pull/43069#issuecomment-1732200263 Thank you so much!
[GitHub] [spark] yaooqinn commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork
yaooqinn commented on PR #43053: URL: https://github.com/apache/spark/pull/43053#issuecomment-1732198539 Thank you all.
[GitHub] [spark] dongjoon-hyun commented on pull request #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support
dongjoon-hyun commented on PR #43069: URL: https://github.com/apache/spark/pull/43069#issuecomment-1732194064 Could you review this doc-only PR, @LuciferYang?
[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732193985 Thank you, @LuciferYang !
[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun commented on PR #43065: URL: https://github.com/apache/spark/pull/43065#issuecomment-1732193905 Thank you, @LuciferYang . Now the PR is ready; I added the AnsiIntervalSortBenchmark results (Java 17/21).
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334911535

## sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
constructor:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-arrayOfAny                      4             4          0     2491.5          0.4      1.0X
-arrayOfAnyAsObject            256           257          1       39.1         25.6      0.0X
-arrayOfAnyAsSeq                18            18          0      551.9          1.8      0.2X
-arrayOfInt                    536           537          1       18.7         53.6      0.0X
-arrayOfIntAsObject            788           794         10       12.7         78.8      0.0X
+arrayOfAny                      7             7          0     1495.4          0.7      1.0X
+arrayOfAnyAsObject              7             7          0     1495.3          0.7      1.0X
+arrayOfAnyAsSeq               201           202          1       49.8         20.1      0.0X

Review Comment: Sure.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
dongjoon-hyun commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334911491

## core/benchmarks/ZStandardBenchmark-results.txt:
@@ -2,26 +2,26 @@ Benchmark ZStandardCompressionCodec
-OpenJDK 64-Bit Server VM 1.8.0_372-b07 on Linux 5.15.0-1041-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Compression 1 times at level 1 without buffer pool            293           327         85        0.0      29283.2      1.0X
-Compression 1 times at level 2 without buffer pool            322           324          2        0.0      32184.8      0.9X
-Compression 1 times at level 3 without buffer pool            453           456          2        0.0      45285.1      0.6X
-Compression 1 times at level 1 with buffer pool               171           173          1        0.1      17065.2      1.7X
-Compression 1 times at level 2 with buffer pool               208           209          1        0.0      20786.5      1.4X
-Compression 1 times at level 3 with buffer pool               334           335          2        0.0      33350.3      0.9X
+Compression 1 times at level 1 without buffer pool           2800          2801          2        0.0     279995.2      1.0X

Review Comment: Yes, this one has been on my todo list. Will try to identify the root cause.
[GitHub] [spark] LuciferYang commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
LuciferYang commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732193237 Merged into master for Apache Spark 4.0, thanks @dongjoon-hyun
[GitHub] [spark] LuciferYang closed pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
LuciferYang closed pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt` URL: https://github.com/apache/spark/pull/43066
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43069: [SPARK-44119][K8S][DOCS] Drop K8s v1.25 and lower version support
dongjoon-hyun opened a new pull request, #43069: URL: https://github.com/apache/spark/pull/43069

### What changes were proposed in this pull request?
This PR aims to update the K8s doc to recommend K8s 1.26+ for Apache Spark 4.0.0.

### Why are the changes needed?
**1. Default K8s version in public cloud environments**
The default K8s versions of public cloud providers are already K8s 1.27+.
- EKS: v1.27 (Default)
- GKE: v1.27 (Stable), v1.27 (Regular), v1.27 (Rapid)

**2. End of support**
In addition, K8s 1.25 and older versions will have reached EOL when Apache Spark 4.0.0 arrives in June 2024, and K8s 1.26 is also going to reach EOL that June.

| K8s | AKS | GKE | EKS |
| --- | --- | --- | --- |
| 1.27 | 2024-07 | 2024-08 | 2024-07 |
| 1.26 | 2024-03 | 2024-06 | 2024-06 |
| 1.25 | 2023-12 | 2024-02 | 2024-05 |
| 1.24 | 2023-07 | 2023-10 | 2024-01 |

- [AKS EOL Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar)
- [GKE EOL Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule)
- [EKS EOL Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar)

### Does this PR introduce _any_ user-facing change?
- No, this is a documentation-only change about K8s versions.
- The Apache Spark K8s integration test currently uses K8s v1.26.3 on Minikube.

### How was this patch tested?
Manual review.

### Was this patch authored or co-authored using generative AI tooling?
No.
[GitHub] [spark] LuciferYang commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
LuciferYang commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1732191906 > Thank you. I checked now. Spark doc is updated with Java 17. So, we don't need to mention here. It seems that we need to fix it from `Java 17` to `Java17/21`. I'll handle it independently because it's Java 21 stuff. > > https://github.com/apache/spark/blob/06ccb6d434476afacc08936cf473670102d41010/docs/index.md?plain=1#L37 https://github.com/apache/spark/blob/51938fea36af19824a657c0326af9de03393e1dd/docs/building-spark.md?plain=1#L29-L31 `building-spark.md` should also include Java 21. I apologize; I previously only focused on Java 17.
[GitHub] [spark] LuciferYang commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
LuciferYang commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1732191440 late LGTM
[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
LuciferYang commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334909906

## sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
constructor:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-arrayOfAny                      4             4          0     2491.5          0.4      1.0X
-arrayOfAnyAsObject            256           257          1       39.1         25.6      0.0X
-arrayOfAnyAsSeq                18            18          0      551.9          1.8      0.2X
-arrayOfInt                    536           537          1       18.7         53.6      0.0X
-arrayOfIntAsObject            788           794         10       12.7         78.8      0.0X
+arrayOfAny                      7             7          0     1495.4          0.7      1.0X
+arrayOfAnyAsObject              7             7          0     1495.3          0.7      1.0X
+arrayOfAnyAsSeq               201           202          1       49.8         20.1      0.0X

Review Comment: The results of `arrayOfAnyAsSeq` have undergone significant changes and need attention (it may be a known issue, but I can't remember the details).
[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
LuciferYang commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334909865

## sql/catalyst/benchmarks/GenericArrayDataBenchmark-results.txt:
@@ -1,10 +1,10 @@
-OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Linux 5.15.0-1031-azure
-Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
constructor:          Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-arrayOfAny                      4             4          0     2491.5          0.4      1.0X
-arrayOfAnyAsObject            256           257          1       39.1         25.6      0.0X
-arrayOfAnyAsSeq                18            18          0      551.9          1.8      0.2X
-arrayOfInt                    536           537          1       18.7         53.6      0.0X
-arrayOfIntAsObject            788           794         10       12.7         78.8      0.0X
+arrayOfAny                      7             7          0     1495.4          0.7      1.0X
+arrayOfAnyAsObject              7             7          0     1495.3          0.7      1.0X

Review Comment: The results of `arrayOfAnyAsSeq` have undergone significant changes and need attention (it may be a known issue, but I can't remember the details).
[GitHub] [spark] LuciferYang commented on a diff in pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result and update Java 17 ones
LuciferYang commented on code in PR #43065: URL: https://github.com/apache/spark/pull/43065#discussion_r1334908931

## core/benchmarks/ZStandardBenchmark-results.txt:
@@ -2,26 +2,26 @@ Benchmark ZStandardCompressionCodec
-OpenJDK 64-Bit Server VM 1.8.0_372-b07 on Linux 5.15.0-1041-azure
+OpenJDK 64-Bit Server VM 17.0.8+7-LTS on Linux 5.15.0-1046-azure
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Benchmark ZStandardCompressionCodec:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Compression 1 times at level 1 without buffer pool            293           327         85        0.0      29283.2      1.0X
-Compression 1 times at level 2 without buffer pool            322           324          2        0.0      32184.8      0.9X
-Compression 1 times at level 3 without buffer pool            453           456          2        0.0      45285.1      0.6X
-Compression 1 times at level 1 with buffer pool               171           173          1        0.1      17065.2      1.7X
-Compression 1 times at level 2 with buffer pool               208           209          1        0.0      20786.5      1.4X
-Compression 1 times at level 3 with buffer pool               334           335          2        0.0      33350.3      0.9X
+Compression 1 times at level 1 without buffer pool           2800          2801          2        0.0     279995.2      1.0X

Review Comment: From what I remember, the results of this microbenchmark are always unstable.
[GitHub] [spark] dongjoon-hyun commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork
dongjoon-hyun commented on PR #43053: URL: https://github.com/apache/spark/pull/43053#issuecomment-1732185291 Merged to master for Apache Spark 4.0.0. Thank you, @yaooqinn and all.
[GitHub] [spark] dongjoon-hyun closed pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork
dongjoon-hyun closed pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork URL: https://github.com/apache/spark/pull/43053
[GitHub] [spark] mridulm commented on pull request #42685: [WIP][SPARK-44937][CORE] Add SSL/TLS support for RPC and Shuffle communications
mridulm commented on PR #42685: URL: https://github.com/apache/spark/pull/42685#issuecomment-1732173622 Thanks for working on this @hasnain-db , this is a very nice addition to Spark! Given the size of the PR, can we split it up to make it easier to review?
[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732154701 This PR is independent of the CI result, so please note that I manually stopped the running pipelines on this PR to unblock my other PRs.
[GitHub] [spark] mridulm commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork
mridulm commented on PR #43053: URL: https://github.com/apache/spark/pull/43053#issuecomment-1732154429 Nice job @yaooqinn !
[GitHub] [spark] mridulm commented on a diff in pull request #42950: [SPARK-45182][CORE] Ignore task completion from old stage after retrying indeterminate stages
mridulm commented on code in PR #42950: URL: https://github.com/apache/spark/pull/42950#discussion_r1334896106

## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
@@ -1903,13 +1903,20 @@ private[spark] class DAGScheduler(
 case smt: ShuffleMapTask =>
   val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
-  shuffleStage.pendingPartitions -= task.partitionId
+  // Ignore task completion for old attempt of indeterminate stage
+  val ignoreIndeterminate = stage.isIndeterminate &&
+    task.stageAttemptId < stage.latestInfo.attemptNumber()
+  if (!ignoreIndeterminate) {
+    shuffleStage.pendingPartitions -= task.partitionId
+  }
   val status = event.result.asInstanceOf[MapStatus]
   val execId = status.location.executorId
   logDebug("ShuffleMapTask finished on " + execId)
   if (executorFailureEpoch.contains(execId) && smt.epoch <= executorFailureEpoch(execId)) {
     logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")

Review Comment: So this is kind of funny - take a look at what the above replaced, @cloud-fan: https://github.com/apache/spark/pull/16620/files#diff-85de35b2e85646ed499c545a3be1cd3ffd525a88aae835a9c621f877eebadcb6R1183 :-) Both of these actually do not account for the DETERMINATE/INDETERMINATE changes we made subsequently. IMO, for INDETERMINATE stages, we should ignore task completion events from previous attempts, since we have already cancelled the stage attempt. Having said that, I have not thought through the nuances here. +CC @jiangxb1987 as well
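The gist of the check being discussed is a comparison between the completing task's stage attempt and the stage's latest attempt. A minimal Python sketch of that idea, under the stated assumptions (the function name and the stand-alone `pending` set are illustrative, not Spark's actual API):

```python
def should_ignore_completion(stage_is_indeterminate: bool,
                             task_stage_attempt: int,
                             latest_stage_attempt: int) -> bool:
    """Ignore a ShuffleMapTask completion if it belongs to an older
    attempt of an indeterminate stage: that attempt was cancelled and
    its output may not be reproducible, so it must not mark the
    partition as done in the latest attempt."""
    return stage_is_indeterminate and task_stage_attempt < latest_stage_attempt

# Partitions still pending in the latest attempt of an indeterminate stage.
pending = {0, 1, 2}

# A stale completion event from attempt 0 arrives while attempt 1 is running;
# the check above prevents it from removing the partition.
if not should_ignore_completion(True, task_stage_attempt=0, latest_stage_attempt=1):
    pending.discard(0)
print(pending)  # {0, 1, 2}: the stale completion was ignored
```

For a determinate stage the old attempt's output is still valid, so the same event would be allowed to remove the partition; that asymmetry is exactly what the review comment is probing.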
[GitHub] [spark] Hisoka-X commented on pull request #40963: [SPARK-43288][SQL] DataSourceV2: CREATE TABLE LIKE
Hisoka-X commented on PR #40963: URL: https://github.com/apache/spark/pull/40963#issuecomment-1732151326 @atronchi Hi, since this PR has not been updated, I created a separate one for CREATE TABLE LIKE as well; please check https://github.com/apache/spark/pull/42586
[GitHub] [spark] chenyu-opensource commented on pull request #43028: [SPARK-45248][CORE]Set the timeout for spark ui server
chenyu-opensource commented on PR #43028: URL: https://github.com/apache/spark/pull/43028#issuecomment-1732148383 > OK, please put a comment in the code about why this is set lower than usual. Thank you for your suggestion; I have followed it.
[GitHub] [spark] jchen5 opened a new pull request, #43068: [SPARK-44550][SQL] Enable correctness fixes for `null IN (empty list)` under ANSI
jchen5 opened a new pull request, #43068: URL: https://github.com/apache/spark/pull/43068

### What changes were proposed in this pull request?
Enables the correctness fixes for `null IN (empty list)` expressions. `null IN (empty list)` incorrectly evaluates to null, when it should evaluate to false. (The reason it should be false is that `a IN (b1, b2)` is defined as `a = b1 OR a = b2`, and an empty IN list is treated as an empty OR, which is false. This is specified by ANSI SQL.) Many places in Spark execution (In, InSet, InSubquery) and optimization (OptimizeIn, NullPropagation) implemented this wrong behavior. This is a longstanding correctness issue which has existed since null support for IN expressions was first added to Spark. See the previous PRs where the fixes were implemented: https://github.com/apache/spark/pull/42007 and https://github.com/apache/spark/pull/42163. The behavior is under a flag. This PR enables the new behavior by default under ANSI, while under non-ANSI the old behavior remains the default for now. Later, we should switch the new behavior to default in both cases. See [this doc](https://docs.google.com/document/d/1k8AY8oyT-GI04SnP7eXttPDnDj-Ek-c3luF2zL6DPNU/edit) for more information.

### Why are the changes needed?
Fix wrong SQL semantics.

### Does this PR introduce _any_ user-facing change?
Yes, fix wrong SQL semantics.

### How was this patch tested?
Unit tests

### Was this patch authored or co-authored using generative AI tooling?
No
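The three-valued logic the PR description relies on can be made concrete with a small sketch. This is an illustrative model of SQL semantics, not Spark's implementation: `null` stands for SQL NULL (unknown), `IN` folds as an OR over equality checks, and the empty fold starts at FALSE:

```java
import java.util.List;

// Illustrative model of SQL's three-valued IN semantics (not Spark code).
// `a IN (b1, ..., bn)` is `a = b1 OR ... OR a = bn`; an empty OR is FALSE,
// so `null IN ()` is FALSE rather than NULL.
public class InSemantics {
    // SQL equality: comparing anything with NULL yields unknown (null here).
    static Boolean eq(Object a, Object b) {
        if (a == null || b == null) return null;
        return a.equals(b);
    }

    // SQL OR: TRUE dominates; otherwise unknown propagates; else FALSE.
    static Boolean or(Boolean x, Boolean y) {
        if (Boolean.TRUE.equals(x) || Boolean.TRUE.equals(y)) return Boolean.TRUE;
        if (x == null || y == null) return null;
        return Boolean.FALSE;
    }

    static Boolean in(Object a, List<Object> list) {
        Boolean result = Boolean.FALSE; // empty OR evaluates to FALSE
        for (Object b : list) {
            result = or(result, eq(a, b));
        }
        return result;
    }
}
```

Under this model `in(null, [])` is FALSE (the corrected behavior), while `in(null, [1])` stays unknown, which matches the distinction the PR draws.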
[GitHub] [spark] github-actions[bot] commented on pull request #40503: [SPARK-42830] [UI] Link skipped stages on Spark UI
github-actions[bot] commented on PR #40503: URL: https://github.com/apache/spark/pull/40503#issuecomment-1732142967 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #40529: [SPARK-42890] [UI] add repeat identifier on SQL UI
github-actions[bot] commented on PR #40529: URL: https://github.com/apache/spark/pull/40529#issuecomment-1732142946 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #40821: [SPARK-43152][spark-structured-streaming] Parametrisable output metadata path (_spark_metadata)
github-actions[bot] closed pull request #40821: [SPARK-43152][spark-structured-streaming] Parametrisable output metadata path (_spark_metadata) URL: https://github.com/apache/spark/pull/40821
[GitHub] [spark] github-actions[bot] commented on pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs
github-actions[bot] commented on PR #40782: URL: https://github.com/apache/spark/pull/40782#issuecomment-1732142935 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732140942 Could you review this PR, @attilapiros? After switching to Java 17+, there are several clean-up PRs like this.
[GitHub] [spark] dongjoon-hyun commented on pull request #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore
dongjoon-hyun commented on PR #43064: URL: https://github.com/apache/spark/pull/43064#issuecomment-1732140284 Thank you. And, if you are fine with Apache Spark 4.0, that's great! I was worried. 😄
[GitHub] [spark] warrenzhu25 opened a new pull request, #43067: [SPARK-45057][CORE] Avoid acquire read lock when keepReadLock is false
warrenzhu25 opened a new pull request, #43067: URL: https://github.com/apache/spark/pull/43067

### What changes were proposed in this pull request?
Add a `keepReadLock` parameter in `lockNewBlockForWriting()`. When `keepReadLock` is `false`, skip `lockForReading()` to avoid blocking on the read lock or a potential deadlock. When 2 tasks try to compute the same RDD with a replication level of 2 while running on only 2 executors, a deadlock will happen. Details are in [SPARK-45057]: the task thread holds the write lock while waiting for replication to the remote executor, while the shuffle server thread handling the block upload request waits on `lockForReading` in [BlockInfoManager.scala](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala#L457C24-L457C24).

### Why are the changes needed?
This saves an unnecessary read lock acquisition and avoids the deadlock issue mentioned above.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT in BlockInfoManagerSuite

### Was this patch authored or co-authored using generative AI tooling?
No
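The control flow described above can be sketched with a read/write lock. This is a hypothetical single-threaded illustration with invented names, not Spark's actual `BlockInfoManager`: when the caller does not need a read lock on an already-existing block, returning without acquiring one removes the step where this thread could block on a writer that is, in turn, waiting on this thread.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the change (illustrative names, not Spark code):
// skip the read-lock acquisition on an existing block when the caller does
// not ask to keep it, so the caller cannot block behind the current writer.
public class BlockLocking {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private boolean exists = false;

    // Returns true if this caller registered the new block (and now holds
    // the write lock); false if the block already existed.
    boolean lockNewBlockForWriting(boolean keepReadLock) {
        if (!exists) {
            rwLock.writeLock().lock();
            exists = true;
            return true;
        }
        if (keepReadLock) {
            rwLock.readLock().lock(); // may block until the current writer finishes
        }
        return false;
    }
}
```

With `keepReadLock = false` the second caller returns immediately instead of parking on the read lock, which is the shape of the fix the PR describes.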
[GitHub] [spark] attilapiros commented on pull request #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore
attilapiros commented on PR #43064: URL: https://github.com/apache/spark/pull/43064#issuecomment-1732139192 @dongjoon-hyun Thanks! > Are you using the current beta-1? Yes. > Is there a timeline for Hive 4.0 GA? I will ask around, but as far as I know they still have some blockers. > Although I know that you filed this as Bug for some old releases, but I believe this PR should be a subtask for Apache Spark 4.0.0 because there are no existing Spark users with Apache Hive 4.0.0 Metastore. Sorry, that was a mistake of mine; thanks for fixing that in Jira.
[GitHub] [spark] dongjoon-hyun closed pull request #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage
dongjoon-hyun closed pull request #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage URL: https://github.com/apache/spark/pull/43062
[GitHub] [spark] dongjoon-hyun commented on pull request #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage
dongjoon-hyun commented on PR #43062: URL: https://github.com/apache/spark/pull/43062#issuecomment-1732134994 Thank you!
[GitHub] [spark] viirya commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
viirya commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1732134464 Sounds good. Thanks @dongjoon-hyun
[GitHub] [spark] dongjoon-hyun closed pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
dongjoon-hyun closed pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17 URL: https://github.com/apache/spark/pull/43060
[GitHub] [spark] dongjoon-hyun commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
dongjoon-hyun commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1732133925 Thank you. I checked now. The Spark doc is updated with Java 17. It seems that we need to change it from `Java 17` to `Java 17/21`. I'll handle it independently because it's Java 21 stuff. https://github.com/apache/spark/blob/06ccb6d434476afacc08936cf473670102d41010/docs/index.md?plain=1#L37
[GitHub] [spark] viirya commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
viirya commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1732132107 Do we have the necessary change in the Spark documents?
[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732130048 I tried to clean this up in my regeneration PR, but it makes the commit log weird because Git thinks it is a rename.
[GitHub] [spark] dongjoon-hyun commented on pull request #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun commented on PR #43066: URL: https://github.com/apache/spark/pull/43066#issuecomment-1732129270 cc @LuciferYang
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43066: [SPARK-45288][TESTS] Remove outdated benchmark result files, `*-jdk1[17]*results.txt`
dongjoon-hyun opened a new pull request, #43066: URL: https://github.com/apache/spark/pull/43066 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] dongjoon-hyun commented on pull request #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result
dongjoon-hyun commented on PR #43065: URL: https://github.com/apache/spark/pull/43065#issuecomment-1732121987 cc @LuciferYang
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43065: [SPARK-45287][TESTS] Add Java 21 benchmark result
dongjoon-hyun opened a new pull request, #43065: URL: https://github.com/apache/spark/pull/43065 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] xiongbo-sjtu commented on pull request #43021: [SPARK-45227][CORE] Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend
xiongbo-sjtu commented on PR #43021: URL: https://github.com/apache/spark/pull/43021#issuecomment-1732099260 @jiangxb1987 @mridulm Eventually got all tests passed in Github Actions. Any concern on merging this pull request? As a side note, I've discovered [another minor issue](https://issues.apache.org/jira/browse/SPARK-45283), but will address that in another pull request. Thanks, Bo
[GitHub] [spark] attilapiros opened a new pull request, #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore
attilapiros opened a new pull request, #43064: URL: https://github.com/apache/spark/pull/43064

### What changes were proposed in this pull request?
Supporting Hive 4.0 metastore, where partition filters even for CHAR and VARCHAR types can be pushed down. **Hive 4.0 is still beta! This is why this is a work-in-progress PR.**

### Why are the changes needed?
Supporting more Hive versions (with an extra performance improvement) is good for our users.

### Does this PR introduce _any_ user-facing change?
Yes. The documentation is updated accordingly regarding support for the Hive 4.0 metastore.

### How was this patch tested?
Manually. I used the docker image of apache/hive:4.0.0-beta-1 for starting a metastore and a hiveserver2 (along with a hadoop3 docker image).

Created a table:
```
CREATE EXTERNAL TABLE testTable1 ( column1 String ) PARTITIONED BY (partColumn1 CHAR(30), partColumn2 VARCHAR(30)) LOCATION 'hdfs://hadoop3:8020/tmp/hive_external/';
```

Inserted some values in beeline:
```
insert into table testtable1 values ("column1_v1", "partcolumn1_v1", "partcolumn2_v1"), ("column1_v2", "partcolumn1_v2", "partcolumn2_v2");
```

Started my spark in the hiveserver2 container as:
```
./bin/spark-shell --conf spark.sql.hive.metastore.version=4.0.0 --conf spark.sql.hive.metastore.jars="/opt/hive/lib/*"
```

Ran the query as:
```
scala> sql("select * from testtable1 where partcolumn1 = 'partcolumn1_v1' and partcolumn2 = 'partcolumn2_v1'").show
Hive Session ID = 6846fe0e-968a-474d-afec-4f67b3a2a274
+----------+--------------------+--------------+
|   column1|         partcolumn1|   partcolumn2|
+----------+--------------------+--------------+
|column1_v1|partcolumn1_v1 ...  |partcolumn2_v1|
+----------+--------------------+--------------+
```

And checked the HMS calls in the metastore container in the file `/tmp/hive/hive.log`:
```
...
2023-09-22T21:06:34,293 INFO [Metastore-Handler-Pool: Thread-1356] HiveMetaStore.audit: ugi=hive ip=172.30.0.5 cmd=source:172.30.0.5 get_partitions_by_filter : tbl=hive.default.testtable1
...
```
Which contains the expected `get_partitions_by_filter`.

### Was this patch authored or co-authored using generative AI tooling?
No.
[GitHub] [spark] atronchi commented on pull request #40963: [SPARK-43288][SQL] DataSourceV2: CREATE TABLE LIKE
atronchi commented on PR #40963: URL: https://github.com/apache/spark/pull/40963#issuecomment-1732002000 Would it be possible to re-open this PR? The `CREATE TABLE LIKE` functionality still does not exist for DataSourceV2...
[GitHub] [spark] bjornjorgensen commented on pull request #37234: [SPARK-39822][PYTHON][PS] Provide a good feedback to users
bjornjorgensen commented on PR #37234: URL: https://github.com/apache/spark/pull/37234#issuecomment-1731983498 @bzhaoopenstack will you reopen this? If not, can I open a new PR with your code and add you as co-author?
[GitHub] [spark] dongjoon-hyun commented on pull request #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
dongjoon-hyun commented on PR #43060: URL: https://github.com/apache/spark/pull/43060#issuecomment-1731960708 Could you review this, @LuciferYang ?
[GitHub] [spark] dongjoon-hyun commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default
dongjoon-hyun commented on PR #40390: URL: https://github.com/apache/spark/pull/40390#issuecomment-1731959966 Thanks. Ya, I also was tracking that, @LuciferYang .
[GitHub] [spark] srowen opened a new pull request, #43063: [SPARK-45286][DOCS] Add back Matomo analytics
srowen opened a new pull request, #43063: URL: https://github.com/apache/spark/pull/43063

### What changes were proposed in this pull request?
Add analytics to doc pages using the ASF's Matomo service.

### Why are the changes needed?
We had previously removed Google Analytics from the website and release docs, per ASF policy: https://github.com/apache/spark/pull/36310 We just restored analytics using the ASF-hosted Matomo service on the website: https://github.com/apache/spark-website/commit/a1548627b48a62c2e51870d1488ca3e09397bd30 This change would put the same new tracking code back into the release docs. It would let us see what docs and resources are most used, I suppose.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

### Was this patch authored or co-authored using generative AI tooling?
No
[GitHub] [spark] ion-elgreco commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup
ion-elgreco commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1731949210 @HyukjinKwon since @igorghi has shown with his tests it's not possible to use repartition().mapInArrow to mimic groupbyApply, would it now make sense to add groupbyApplyInArrow?
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43062: [SPARK-45285][CORE][TESTS] Remove deprecated `Runtime.getRuntime.exec(String)` API usage
dongjoon-hyun opened a new pull request, #43062: URL: https://github.com/apache/spark/pull/43062

### What changes were proposed in this pull request?
This PR aims to remove the deprecated `Runtime.exec` methods with a single string command line.

### Why are the changes needed?
This is deprecated from Java 18.
- https://bugs.openjdk.org/browse/JDK-8276408 (Deprecate Runtime.exec methods with a single string command line argument)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually check the compilation log.

### Was this patch authored or co-authored using generative AI tooling?
No.
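The general shape of this migration can be sketched as follows. The PR's exact call sites are not shown here; this is an illustrative helper (invented name) showing the replacement pattern: `Runtime.exec(String)` tokenizes the whole command string naively on whitespace, while `ProcessBuilder` (or the `Runtime.exec(String[])` overload) takes each argument explicitly.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Sketch of the migration pattern: replace the deprecated single-string
// Runtime.exec(String) with ProcessBuilder, passing each argument separately
// so arguments containing spaces are not split incorrectly.
public class ExecMigration {
    static int run(String... command) {
        try {
            // Deprecated style: Runtime.getRuntime().exec("echo hello world")
            // Preferred style: each argument is its own array element.
            Process p = new ProcessBuilder(command).start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IllegalStateException("command timed out: " + String.join(" ", command));
            }
            return p.exitValue();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With the array form, an argument like `"hello world"` reaches the child process as one argument, which the deprecated overload cannot guarantee.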
[GitHub] [spark] agubichev opened a new pull request, #43061: tests for correlated exists/IN with ORDER BY/LIMIT
agubichev opened a new pull request, #43061: URL: https://github.com/apache/spark/pull/43061 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] LuciferYang commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default
LuciferYang commented on PR #40390: URL: https://github.com/apache/spark/pull/40390#issuecomment-1731860178

@ulysses-you I found that after this PR is merged, `InMemoryColumnarBenchmark` will fail to execute.

```
build/sbt "sql/Test/runMain org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark"
```

```
[error] Exception in thread "main" java.lang.IndexOutOfBoundsException: 0
[error]   at scala.collection.LinearSeqOps.apply(LinearSeq.scala:131)
[error]   at scala.collection.LinearSeqOps.apply$(LinearSeq.scala:128)
[error]   at scala.collection.immutable.List.apply(List.scala:79)
[error]   at org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.intCache(InMemoryColumnarBenchmark.scala:47)
[error]   at org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.$anonfun$runBenchmarkSuite$1(InMemoryColumnarBenchmark.scala:68)
[error]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[error]   at org.apache.spark.benchmark.BenchmarkBase.runBenchmark(BenchmarkBase.scala:42)
[error]   at org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark$.runBenchmarkSuite(InMemoryColumnarBenchmark.scala:68)
[error]   at org.apache.spark.benchmark.BenchmarkBase.main(BenchmarkBase.scala:72)
[error]   at org.apache.spark.sql.execution.columnar.InMemoryColumnarBenchmark.main(InMemoryColumnarBenchmark.scala)
[error] Nonzero exit code returned from runner: 1
[error] (sql / Test / runMain) Nonzero exit code returned from runner: 1
```

Should we run `InMemoryColumnarBenchmark` with the configuration `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning=false`?
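The stack trace above boils down to indexing element 0 of a collection that came back empty. A minimal, self-contained illustration of that failure mode (plain Java, not Spark code; the method name is a hypothetical stand-in for the benchmark's plan lookup):

```java
import java.util.List;

public class EmptyPlanList {
    // Hypothetical stand-in for the benchmark's list of collected in-memory scan nodes.
    static String firstStage(List<String> stages) {
        // Throws IndexOutOfBoundsException when the list is empty,
        // which is the shape of the benchmark failure above.
        return stages.get(0);
    }

    public static void main(String[] args) {
        // With the AQE cache change, the equivalent plan collection can come back
        // empty, so the lookup fails before any measurement runs.
        try {
            firstStage(List.of());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```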
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43060: [SPARK-45284][R] Update SparkR minimum SystemRequirements to Java 17
dongjoon-hyun opened a new pull request, #43060: URL: https://github.com/apache/spark/pull/43060

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] yaooqinn closed pull request #43016: [SPARK-45077][UI][FOLLOWUP] Update comment to link the forked repo yaooqinn/dagre-d3
yaooqinn closed pull request #43016: [SPARK-45077][UI][FOLLOWUP] Update comment to link the forked repo yaooqinn/dagre-d3 URL: https://github.com/apache/spark/pull/43016
[GitHub] [spark] dongjoon-hyun commented on pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on PR #43059: URL: https://github.com/apache/spark/pull/43059#issuecomment-1731767573 Merged to master for Apache Spark 4.0.0.
[GitHub] [spark] dongjoon-hyun closed pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun closed pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version URL: https://github.com/apache/spark/pull/43059
[GitHub] [spark] dongjoon-hyun commented on pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on PR #43059: URL: https://github.com/apache/spark/pull/43059#issuecomment-1731766295 Thank you so much!
[GitHub] [spark] dongjoon-hyun commented on pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows
dongjoon-hyun commented on PR #43056: URL: https://github.com/apache/spark/pull/43056#issuecomment-1731765149 Merged to master~
[GitHub] [spark] dongjoon-hyun closed pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows
dongjoon-hyun closed pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows URL: https://github.com/apache/spark/pull/43056
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334634163

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala:

```diff
@@ -51,7 +51,7 @@ abstract class BenchmarkBase {
   val regenerateBenchmarkFiles: Boolean = System.getenv("SPARK_GENERATE_BENCHMARK_FILES") == "1"
   if (regenerateBenchmarkFiles) {
     val version = System.getProperty("java.version").split("\\D+")(0).toInt
-    val jdkString = if (version > 8) s"-jdk$version" else ""
+    val jdkString = if (version > 17) s"-jdk$version" else ""
```

Review Comment: Yes, in both ways:
- the Maven build will prevent it explicitly.
- SBT also seems to hit a compilation failure due to `-target:17`.
[GitHub] [spark] viirya commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
viirya commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334632685

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (same hunk as above):

Review Comment: Hmm, so currently we cannot use jdk8 with master where Java17 is enforced?
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334632610

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (same hunk as above):

Review Comment: Hmm, I re-verified Java 8, it seems to fail in the other part.

```
$ SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "core/Test/runMain org.apache.spark.serializer.KryoBenchmark"
...
[error] '17' is not a valid choice for '-target'
[error] bad option: '-target:17'
[error] (tags / Compile / compileIncremental) Compilation failed
[error] Total time: 39 s, completed Sep 22, 2023 10:03:31 AM
```
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334631707

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (same hunk as above):

Review Comment: For example, we didn't consider Java 7 until now.
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334629527

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (same hunk as above):

Review Comment: We have been using `maven-enforcer-plugin` for the Java version, so we can assume a Java 17+ dev environment.

```
${java.version}
```
[GitHub] [spark] viirya commented on a diff in pull request #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
viirya commented on code in PR #43059: URL: https://github.com/apache/spark/pull/43059#discussion_r1334628023

## core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (same hunk as above):

Review Comment: If it is using jdk8, it will be `""`? It may be confused with the Java 17 base result.
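The review thread above turns on one line of version arithmetic. A standalone sketch (a plain-Java re-implementation for illustration, not Spark's actual `BenchmarkBase`) of how the `-jdkN` suffix behaves once the baseline moves to 17, including viirya's point that a JDK 8 run would also yield an empty suffix:

```java
public class JdkSuffix {
    // Mirrors the discussed logic: take the leading digits of java.version,
    // and only append a suffix for JDKs newer than the Java 17 baseline.
    static String jdkSuffix(String javaVersion) {
        int version = Integer.parseInt(javaVersion.split("\\D+")[0]);
        return version > 17 ? "-jdk" + version : "";
    }

    public static void main(String[] args) {
        System.out.println(jdkSuffix("17.0.8"));    // baseline: empty suffix
        System.out.println(jdkSuffix("21.0.1"));    // -jdk21
        System.out.println(jdkSuffix("1.8.0_292")); // also empty, hence the concern raised above
    }
}
```

In practice the ambiguity is moot because the Maven enforcer and `-target:17` already block pre-17 builds, as noted in the thread.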
[GitHub] [spark] dongjoon-hyun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s
dongjoon-hyun commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1731745429 Thank you for your decision, @dcoliversun .
[GitHub] [spark] LuciferYang commented on pull request #43056: [SPARK-45277][BUILD][INFRA] Install Java 17 to support SparkR testing on Windows
LuciferYang commented on PR #43056: URL: https://github.com/apache/spark/pull/43056#issuecomment-1731734955

(screenshot: https://github.com/apache/spark/assets/1475305/2df78203-3cfd-4e5e-b162-d0bd38b3615d)

Passed
[GitHub] [spark] dongjoon-hyun opened a new pull request, #43059: [SPARK-45281][CORE][TESTS] Update BenchmarkBase to use Java 17 as the base version
dongjoon-hyun opened a new pull request, #43059: URL: https://github.com/apache/spark/pull/43059 …

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] dongjoon-hyun commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity
dongjoon-hyun commented on PR #43035: URL: https://github.com/apache/spark/pull/43035#issuecomment-1731694203 Merged to master for Apache Spark 4.0.0.
[GitHub] [spark] dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity
dongjoon-hyun closed pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity URL: https://github.com/apache/spark/pull/43035
[GitHub] [spark] dongjoon-hyun closed pull request #43057: [SPARK-45280][INFRA] Change Maven daily test use Java 17 for testing
dongjoon-hyun closed pull request #43057: [SPARK-45280][INFRA] Change Maven daily test use Java 17 for testing URL: https://github.com/apache/spark/pull/43057
[GitHub] [spark] gengliangwang commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement
gengliangwang commented on code in PR #42985: URL: https://github.com/apache/spark/pull/42985#discussion_r1334552910

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala:

```diff
@@ -61,68 +62,92 @@ case class PrintToStderr(child: Expression) extends UnaryExpression {
 /**
  * Throw with the result of an expression (used for debugging).
  */
+// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Throws an exception with `expr`.",
+  usage = "_FUNC_( expr [, errorParams ]) - Throws a USER_RAISED_EXCEPTION with `expr` as message, or a defined error class in `expr` with a parameter map.",
   examples = """
     Examples:
       > SELECT _FUNC_('custom error message');
-       java.lang.RuntimeException
-       custom error message
+       [USER_RAISED_EXCEPTION] custom error message
+
+      > SELECT _FUNC_('VIEW_NOT_FOUND', Map('relationName' -> '`V1`'));
+       [VIEW_NOT_FOUND] The view `V1` cannot be found. ...
   """,
   since = "3.1.0",
   group = "misc_funcs")
-case class RaiseError(child: Expression, dataType: DataType)
-  extends UnaryExpression with ImplicitCastInputTypes {
+// scalastyle:on line.size.limit
+case class RaiseError(errorClass: Expression, errorParms: Expression, dataType: DataType)
+  extends BinaryExpression with ImplicitCastInputTypes {

-  def this(child: Expression) = this(child, NullType)
+  def this(str: Expression) = {
+    this(Literal("USER_RAISED_EXCEPTION"),
+      CreateMap(Seq(Literal("errorMessage"), str)), NullType)
+  }
+
+  def this(errorClass: Expression, errorParms: Expression) = {
+    this(errorClass, errorParms, NullType)
+  }

   override def foldable: Boolean = false
   override def nullable: Boolean = true

-  override def inputTypes: Seq[AbstractDataType] = Seq(StringType)
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, MapType(StringType, StringType))
+
+  override def left: Expression = errorClass
+  override def right: Expression = errorParms

   override def prettyName: String = "raise_error"

   override def eval(input: InternalRow): Any = {
-    val value = child.eval(input)
-    if (value == null) {
-      throw new RuntimeException()
-    }
-    throw new RuntimeException(value.toString)
+    val error = errorClass.eval(input).asInstanceOf[UTF8String]
+    val parms: MapData = errorParms.eval(input).asInstanceOf[MapData]
+    throw raiseError(error, parms)
   }

   // if (true) is to avoid codegen compilation exception that statement is unreachable
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val eval = child.genCode(ctx)
+    val error = errorClass.genCode(ctx)
+    val parms = errorParms.genCode(ctx)
     ExprCode(
-      code = code"""${eval.code}
+      code = code"""${error.code}
```

Review Comment: We may need to check the nullability of error and params as well.
[GitHub] [spark] gengliangwang commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement
gengliangwang commented on code in PR #42985: URL: https://github.com/apache/spark/pull/42985#discussion_r1334552099

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (same hunk as above):

Review Comment:
```suggestion
      code = code"""${error.code}
        |${parms.code}
```
[GitHub] [spark] srielau commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement
srielau commented on code in PR #42985: URL: https://github.com/apache/spark/pull/42985#discussion_r1334513626

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

```diff
@@ -4432,6 +4432,17 @@ object SQLConf {
       .booleanConf
       .createWithDefault(false)

+  val LEGACY_RAISE_ERROR_WITHOUT_ERROR_CLASS =
+    buildConf("spark.sql.legacy.raiseErrorWithoutErrorClass")
+      .internal()
+      .doc("When set to true, restores the legacy behavior of `raise_error` and `assert_true` to " +
+        "not return the `[USER_RAISED_EXCEPTION]` prefix." +
+        "For example, `raise_error('error!')` returns `error!` instead of " +
+        "`[[USER_RAISED_EXCEPTION] Error!`.")
```

Review Comment: I spoke to @gatorsmile, and he also recommended a config. There are two things at play here: the exception changed away from RuntimeException, and we got the prefix. It smells like we have a decent chance of breaking anyone who wants to catch these exceptions.
[GitHub] [spark] srielau commented on a diff in pull request #42985: [SPARK-44838][SQL][WIP] raise_error improvement
srielau commented on code in PR #42985: URL: https://github.com/apache/spark/pull/42985#discussion_r1334511376

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (same hunk as above, with this variant of the constructor):

```diff
+  def this(str: Expression) = {
+    this(Literal(
+      if (SQLConf.get.legacyNegativeIndexInArrayInsert) {
```

Review Comment: Abandoned effort to put the logic in the wrong spot. Removed.
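The examples in the diff show the new message shape: an error class in brackets, plus a parameter map substituted into a message template. A toy Java sketch of that formatting (the method name and the `<param>` placeholder syntax here are illustrative assumptions, not Spark's actual error framework):

```java
import java.util.Map;

public class RaiseErrorSketch {
    // Hypothetical formatter: substitute <name> placeholders from the parameter map,
    // then prefix the error class in brackets, as in the diff's examples.
    static String format(String errorClass, String template, Map<String, String> params) {
        String msg = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            msg = msg.replace("<" + e.getKey() + ">", e.getValue());
        }
        return "[" + errorClass + "] " + msg;
    }

    public static void main(String[] args) {
        // raise_error('custom error message')
        System.out.println(format("USER_RAISED_EXCEPTION", "<errorMessage>",
                Map.of("errorMessage", "custom error message")));
        // raise_error('VIEW_NOT_FOUND', Map('relationName' -> '`V1`'))
        System.out.println(format("VIEW_NOT_FOUND", "The view <relationName> cannot be found.",
                Map.of("relationName", "`V1`")));
    }
}
```

This also makes the compatibility concern concrete: legacy callers got the raw message with no `[USER_RAISED_EXCEPTION]` prefix, which is what the proposed legacy config would restore.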
[GitHub] [spark] ishnagy commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long
ishnagy commented on PR #33550: URL: https://github.com/apache/spark/pull/33550#issuecomment-1731539550 Hi @ulysses-you , @attilapiros I'd like to work on this issue and tie up all the loose ends left. If you're ok with it, I'd like to open a new PR from my private repo reusing a significant portion of the changes in this PR.
[GitHub] [spark] yaooqinn commented on pull request #43053: [SPARK-45274][CORE][SQL][UI] Implementation of a new DAG drawing approach for job/stage/plan graphics to avoid fork
yaooqinn commented on PR #43053: URL: https://github.com/apache/spark/pull/43053#issuecomment-1731521203 cc @sarutak @cloud-fan @dongjoon-hyun @HyukjinKwon @mridulm, thanks
[GitHub] [spark] LuciferYang opened a new pull request, #43058: Test new ammonite
LuciferYang opened a new pull request, #43058: URL: https://github.com/apache/spark/pull/43058 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] LuciferYang commented on a diff in pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11
LuciferYang commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1334415482 ## dev/infra/Dockerfile: ## @@ -30,7 +30,7 @@ RUN apt-get update && apt-get install -y \ pkg-config \ curl \ wget \ -openjdk-8-jdk \ +openjdk-17-jdk-headless \ Review Comment: Thanks @Yikun ~ Let me continue to monitor the running status of GA.
[GitHub] [spark] LuciferYang opened a new pull request, #43057: [SPARK-45280] Change Maven daily test use Java 17 for testing.
LuciferYang opened a new pull request, #43057: URL: https://github.com/apache/spark/pull/43057 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] Yikun commented on a diff in pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11
Yikun commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1334402022 ## dev/infra/Dockerfile: ## @@ -30,7 +30,7 @@ RUN apt-get update && apt-get install -y \ pkg-config \ curl \ wget \ -openjdk-8-jdk \ +openjdk-17-jdk-headless \ Review Comment: Emm, jdk install in here seems useless, because we are using github action to install the java. But if CI passed, it is ok.
[GitHub] [spark] LuciferYang commented on pull request #43032: [SPARK-45252][CORE] Escape the greater/less than symbols in the comments to make `sbt doc` execute successfully
LuciferYang commented on PR #43032: URL: https://github.com/apache/spark/pull/43032#issuecomment-1731439172 Thanks @dongjoon-hyun and @mridulm ~
[GitHub] [spark] srowen commented on pull request #43028: [SPARK-45248][CORE] Set the timeout for spark ui server
srowen commented on PR #43028: URL: https://github.com/apache/spark/pull/43028#issuecomment-1731429696 OK, please put a comment in the code about why this is set lower than usual.
[GitHub] [spark] LuciferYang closed pull request #43054: Test Appveyor use pre-installed java 17
LuciferYang closed pull request #43054: Test Appveyor use pre-installed java 17 URL: https://github.com/apache/spark/pull/43054
[GitHub] [spark] LuciferYang opened a new pull request, #43056: [SPARK-45277][INFRA] Install Java 17 for Windows SparkR test
LuciferYang opened a new pull request, #43056: URL: https://github.com/apache/spark/pull/43056 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
[GitHub] [spark] LuciferYang commented on pull request #43005: [SPARK-44112][BUILD][INFRA][DOCS] Drop support for Java 8 and Java 11
LuciferYang commented on PR #43005: URL: https://github.com/apache/spark/pull/43005#issuecomment-1731366636 Thanks @dongjoon-hyun @HyukjinKwon @bjornjorgensen and @cfmcgrady ~
[GitHub] [spark] sander-goos commented on pull request #43035: [SPARK-45256][SQL] DurationWriter fails when writing more values than initial capacity
sander-goos commented on PR #43035: URL: https://github.com/apache/spark/pull/43035#issuecomment-1731360942 > +1, this PR looks reasonable (Pending CIs). There is no perf regression for the case which fits the limit, right, @sander-goos ? There shouldn't be a perf regression; the extra call to `handleSafe` is a no-op when index < capacity. The Arrow writers for other types also use `setSafe` where applicable.
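The distinction sander-goos describes — a plain `set` assumes the index already fits within the vector's allocated capacity, while `setSafe` first grows the backing buffer when needed, so it costs nothing extra on the fast path — can be sketched without Arrow. The class below is a hypothetical toy stand-in, not Arrow's actual `ValueVector` API:

```python
class ToyDurationVector:
    """Toy fixed-width vector mimicking the Arrow set/setSafe split."""

    def __init__(self, initial_capacity: int = 4):
        self._values = [0] * initial_capacity

    @property
    def capacity(self) -> int:
        return len(self._values)

    def set(self, index: int, value: int) -> None:
        # Fast path: assumes the caller already ensured index < capacity;
        # raises IndexError otherwise, analogous to writing past the buffer.
        self._values[index] = value

    def set_safe(self, index: int, value: int) -> None:
        # The "handle safe" step is a no-op when index < capacity;
        # otherwise the backing buffer is doubled until the index fits.
        while index >= self.capacity:
            self._values.extend([0] * self.capacity)
        self.set(index, value)

vec = ToyDurationVector(initial_capacity=4)
for i in range(10):          # write past the initial capacity of 4
    vec.set_safe(i, i * 100)
print(vec.capacity)          # 16 (doubled 4 -> 8 -> 16)
```

In the common case where all writes fit the initial allocation, `set_safe` performs a single comparison before delegating to `set`, which is why no regression is expected for workloads within the limit.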
[GitHub] [spark] dongjoon-hyun commented on pull request #43025: [SPARK-45247][BUILD][PYTHON][PS] Upgrade Pandas to 2.1.1
dongjoon-hyun commented on PR #43025: URL: https://github.com/apache/spark/pull/43025#issuecomment-1731345459 Merged to master for Apache Spark 4.0.0.