Re: [PR] [SPARK-48171][CORE] Clean up the use of deprecated constructors of `o.rocksdb.Logger` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46436: [SPARK-48171][CORE] Clean up the use of deprecated constructors of `o.rocksdb.Logger` URL: https://github.com/apache/spark/pull/46436 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46402: URL: https://github.com/apache/spark/pull/46402#issuecomment-2098687774 I merged the following. Could you rebase this PR? - https://github.com/apache/spark/pull/46427

Re: [PR] [SPARK-48169][SQL] Use lazy BadRecordException cause in all parsers and remove the old constructor, which was meant for the migration [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46438: [SPARK-48169][SQL] Use lazy BadRecordException cause in all parsers and remove the old constructor, which was meant for the migration URL: https://github.com/apache/spark/pull/46438

Re: [PR] [SPARK-46841][SQL] Add collation support for ICU locales and collation specifiers [spark]

2024-05-07 Thread via GitHub
stefankandic commented on PR #46180: URL: https://github.com/apache/spark/pull/46180#issuecomment-2098821075 will we have to do the same for pyspark - as `StringType` there only supports 4 initial collations?

Re: [PR] [SPARK-48170][PYTHON][CONNECT][TESTS] Enable `ArrowPythonUDFParityTests.test_err_return_type` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46433: URL: https://github.com/apache/spark/pull/46433#issuecomment-2098820860 Merged to master for Apache Spark 4.0.0-preview.

Re: [PR] [SPARK-48170][PYTHON][CONNECT][TESTS] Enable `ArrowPythonUDFParityTests.test_err_return_type` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46433: [SPARK-48170][PYTHON][CONNECT][TESTS] Enable `ArrowPythonUDFParityTests.test_err_return_type` URL: https://github.com/apache/spark/pull/46433

Re: [PR] [SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46414: URL: https://github.com/apache/spark/pull/46414#issuecomment-2098834940 Merged to master for Apache Spark 4.0.0-preview.

Re: [PR] [SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46414: [SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply URL: https://github.com/apache/spark/pull/46414

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
viirya commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098866163 Looks good to me.

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46441: [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline URL: https://github.com/apache/spark/pull/46441

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098866227 Thank you, @viirya ! Merged to master.

[PR] [DO-NOT-REVIEW][SPARK-48089][SS][CONNECT] Fix Streaming 3.5 <> 4.0 compatibility test [spark]

2024-05-07 Thread via GitHub
WweiL opened a new pull request, #46444: URL: https://github.com/apache/spark/pull/46444 This reverts commit df633091b7147dea84a5a51a30dcf690ca7d1124. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592826648 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592837929 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -17,28 +17,36 @@ package

Re: [PR] [SPARK-47307][SQL] Add a config to optionally chunk base64 strings [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-2098956928 Gentle ping, @ted-jenks .

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592830626 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -162,10 +196,10 @@ object ModeBuilder extends ExpressionBuilder {

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2098985001 WDYT, @gengliangwang ?

Re: [PR] [SPARK-48146][SQL] Handle case of aggregate function in With expression child [spark]

2024-05-07 Thread via GitHub
kelvinjian-db commented on PR #46443: URL: https://github.com/apache/spark/pull/46443#issuecomment-2099000299 @cloud-fan

[PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun opened a new pull request, #46441: URL: https://github.com/apache/spark/pull/46441 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48173][SQL] CheckAnalysis should see the entire query plan [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46439: [SPARK-48173][SQL] CheckAnalysis should see the entire query plan URL: https://github.com/apache/spark/pull/46439

Re: [PR] [3.5][SPARK-48173][SQL] CheckAnalysis should see the entire query plan [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46442: URL: https://github.com/apache/spark/pull/46442#issuecomment-2098777015 BTW, could you review this INFRA PR when you have some time, @cloud-fan ? There are not many people at this point of time. Sorry for asking you. - #46441 The above is a

Re: [PR] [SPARK-41547][CONNECT][TESTS] Re-eneable Spark Connect function tests with ANSI mode [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46432: [SPARK-41547][CONNECT][TESTS] Re-eneable Spark Connect function tests with ANSI mode URL: https://github.com/apache/spark/pull/46432

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-07 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1592763066 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUpdateEventTimeWatermarkColumn.scala: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
cloud-fan commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098943607 late LGTM!

Re: [PR] [SPARK-47297][SQL] Add collation support for format expressions [spark]

2024-05-07 Thread via GitHub
cloud-fan commented on PR #46423: URL: https://github.com/apache/spark/pull/46423#issuecomment-2098622408 thanks, merging to master!

Re: [PR] [SPARK-48000][SQL] Enable hash join support for non-binary collations [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46166: URL: https://github.com/apache/spark/pull/46166#discussion_r1592640514 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46395][CORE] Assign Spark configs to groups for use in documentation [spark]

2024-05-07 Thread via GitHub
nchammas commented on PR #44755: URL: https://github.com/apache/spark/pull/44755#issuecomment-2098662588 Following up regarding your feedback on the API change, @holdenk. Do you recall what it was?

Re: [PR] [SPARK-48169][SQL] Use lazy BadRecordException cause in all parsers and remove the old constructor, which was meant for the migration [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46438: URL: https://github.com/apache/spark/pull/46438#issuecomment-2098821219 Merged to master for Apache Spark 4.0.0-preview.

Re: [PR] [SPARK-47764][FOLLOW-UP][WIP] Change to use ShuffleDriverComponents.removeShuffle to remove shuffle properly [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46302: URL: https://github.com/apache/spark/pull/46302#issuecomment-2098823631 I converted this to `Draft` PR because the CI is broken and the PR title has `[WIP]`.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592820188 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46273: [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data URL: https://github.com/apache/spark/pull/46273

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46273: URL: https://github.com/apache/spark/pull/46273#issuecomment-2098934956 Merged to master for Apache Spark 4.0.0-preview. Could you make backporting PRs to the release branches, @cxzl25 ?

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592824395 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592822463 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592834974 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -17,28 +17,36 @@ package

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098974153 Thank you, @cloud-fan !

Re: [PR] [SPARK-41794][SQL] Add `try_remainder` function and re-enable column tests [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46434: URL: https://github.com/apache/spark/pull/46434#issuecomment-2098973037 Could you fix Python linter failure, @grundprinzip ? ``` ./python/pyspark/sql/tests/connect/test_connect_column.py:1029:5: F401 'os' imported but unused import os

Re: [PR] [SPARK-48173][SQL] CheckAnalysis should see the entire query plan [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46439: URL: https://github.com/apache/spark/pull/46439#issuecomment-2098635035 Merged to master for Apache Spark 4.0.0-preview. Please make backporting PRs to the release branches, @cloud-fan . I believe we need to pass the CIs for this backporting.

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1592672474 ## connector/profiler/pom.xml: ## @@ -45,6 +48,7 @@ me.bechberger ap-loader-all 3.0-8 + provided Review Comment: cc @parthchandra ,

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1592673906 ## connector/profiler/README.md: ## @@ -23,7 +23,7 @@ Code profiling is currently only supported for To get maximum profiling information set the following jvm

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-07 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1592763363 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -347,6 +347,28 @@ class IncrementalExecution(

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-07 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1592764603 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala: ## @@ -107,25 +109,70 @@ case class EventTimeWatermarkExec( }

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-07 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1592763254 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -347,6 +347,28 @@ class IncrementalExecution(

Re: [PR] [SPARK-48165][BUILD] Update `ap-loader` to 3.0-9 [spark]

2024-05-07 Thread via GitHub
parthchandra commented on PR #46427: URL: https://github.com/apache/spark/pull/46427#issuecomment-2098862359 Thanks @dongjoon-hyun. Noted.

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-07 Thread via GitHub
parthchandra commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1592803402 ## connector/profiler/pom.xml: ## @@ -45,6 +48,7 @@ me.bechberger ap-loader-all 3.0-8 + provided Review Comment: It's great to

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592822930 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2098787633 Thank you for confirming. Ya, let's wait.

[PR] [SPARK-48146] Handle case of aggregate function in With expression child [spark]

2024-05-07 Thread via GitHub
kelvinjian-db opened a new pull request, #46443: URL: https://github.com/apache/spark/pull/46443 ### What changes were proposed in this pull request? This PR fixes an edge case where an aggregate function is in the child of a `With` expression, which previously

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592827444 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## Review Comment: did you run this benchmark? I wonder how this

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
cloud-fan commented on code in PR #46273: URL: https://github.com/apache/spark/pull/46273#discussion_r1592828452 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2502,6 +2502,26 @@ class AdaptiveQueryExecSuite } }

Re: [PR] [SPARK-47297][SQL] Add collation support for format expressions [spark]

2024-05-07 Thread via GitHub
cloud-fan closed pull request #46423: [SPARK-47297][SQL] Add collation support for format expressions URL: https://github.com/apache/spark/pull/46423

[PR] [3.5][SPARK-48173][SQL] CheckAnalysis should see the entire query plan [spark]

2024-05-07 Thread via GitHub
cloud-fan opened a new pull request, #46442: URL: https://github.com/apache/spark/pull/46442 backport https://github.com/apache/spark/pull/46439 to 3.5 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/38029

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098838055 Could you review this PR, @viirya ?

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
GideonPotok commented on PR #46404: URL: https://github.com/apache/spark/pull/46404#issuecomment-2098916903 @cloud-fan @MaxGekk @dbatomic just letting y'all know this is ready for first round of review. Thanks!

Re: [PR] [SPARK-48165][BUILD] Update `ap-loader` to 3.0-9 [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46427: URL: https://github.com/apache/spark/pull/46427#issuecomment-2098639686 Merged to master for Apache Spark 4.0.0-preview.

Re: [PR] [SPARK-48165][BUILD] Update `ap-loader` to 3.0-9 [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46427: [SPARK-48165][BUILD] Update `ap-loader` to 3.0-9 URL: https://github.com/apache/spark/pull/46427

Re: [PR] [SPARK-48174][INFRA] Merge `connect` back to the original test pipeline [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46441: URL: https://github.com/apache/spark/pull/46441#issuecomment-2098682742 Could you review this INFRA PR, @cloud-fan ?

Re: [PR] [SPARK-48172][SQL] Fix escaping issue for mysql [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on code in PR #46437: URL: https://github.com/apache/spark/pull/46437#discussion_r1592733398 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -351,7 +351,7 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592825050 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -800,6 +804,61 @@ class CollationStringExpressionsSuite

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592833471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } - override

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
uros-db commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592832622 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } - override

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2098984770 I understand your point, @mridulm . Do you use the following syntax? ``` spark.sparkContext.setLocalProperty("mdc." + name, "value") ``` Initially, I thought it's

Re: [PR] [SPARK-48184][PYTHON][CONNECT] Always set the seed of `Dataframe.sample` in Client side [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46456: URL: https://github.com/apache/spark/pull/46456#issuecomment-2099681039 Ah, I got your point. It's a very interesting `connector` bug. > df2 should be immutable. I was thinking the following. My bad. ``` scala>

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2099685242 No problem, @yaooqinn . We have enough time for 4.0.0. :)

Re: [PR] [SPARK-42093][SQL] Move JavaTypeInference to AgnosticEncoders [spark]

2024-05-07 Thread via GitHub
viirya commented on code in PR #39615: URL: https://github.com/apache/spark/pull/39615#discussion_r1593353161 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -166,317 +148,58 @@ object JavaTypeInference {

[PR] [WIP][SQL] Misc expressions [spark]

2024-05-07 Thread via GitHub
uros-db opened a new pull request, #46461: URL: https://github.com/apache/spark/pull/46461 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48184][PYTHON][CONNECT] Always set the seed of `Dataframe.sample` in Client side [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on code in PR #46456: URL: https://github.com/apache/spark/pull/46456#discussion_r1593374153 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -430,6 +430,11 @@ def test_sample(self): IllegalArgumentException, lambda:

Re: [PR] [SPARK-48185][SQL] Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' [spark]

2024-05-07 Thread via GitHub
LuciferYang commented on code in PR #46457: URL: https://github.com/apache/spark/pull/46457#discussion_r1593383893 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -197,8 +197,8 @@ trait SparkDateTimeUtils {

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-07 Thread via GitHub
yaooqinn commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2099683026 Thank you @dongjoon-hyun Let me look into this issue, it might take a while

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
cxzl25 commented on code in PR #46273: URL: https://github.com/apache/spark/pull/46273#discussion_r1593350763 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2502,6 +2502,26 @@ class AdaptiveQueryExecSuite } } +

[PR] [SPARK-48185][SQL] Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' [spark]

2024-05-07 Thread via GitHub
yaooqinn opened a new pull request, #46457: URL: https://github.com/apache/spark/pull/46457 ### What changes were proposed in this pull request? I met the error below while debugging UTs because of loading `sun.util.calendar.ZoneInfo` eagerly. This PR makes the

Re: [PR] [SPARK-47914][SQL] Do not display the splits parameter in Range [spark]

2024-05-07 Thread via GitHub
yaooqinn closed pull request #46136: [SPARK-47914][SQL] Do not display the splits parameter in Range URL: https://github.com/apache/spark/pull/46136

Re: [PR] [SPARK-48183][PYTHON][DOCS] Update error contribution guide to respect new error class file [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46455: URL: https://github.com/apache/spark/pull/46455#issuecomment-2099711660 Merged to master.

Re: [PR] [SPARK-48183][PYTHON][DOCS] Update error contribution guide to respect new error class file [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46455: [SPARK-48183][PYTHON][DOCS] Update error contribution guide to respect new error class file URL: https://github.com/apache/spark/pull/46455

Re: [PR] [SPARK-48185][SQL] Fix 'symbolic reference class is not accessible: class sun.util.calendar.ZoneInfo' [spark]

2024-05-07 Thread via GitHub
LuciferYang commented on code in PR #46457: URL: https://github.com/apache/spark/pull/46457#discussion_r1593382052 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -197,8 +197,8 @@ trait SparkDateTimeUtils {

Re: [PR] [SPARK-48037][CORE][3.5] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46459: URL: https://github.com/apache/spark/pull/46459#issuecomment-2099760350 Could you make a backporting PR to branch-3.4 too, @cxzl25 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-07 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2099774647 This seems to be broken in the main function of pyspark init(), what is the expected action item we should take? @HyukjinKwon -- This is an automated message from the Apache

Re: [PR] [SPARK-48037][CORE][3.5] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46459: URL: https://github.com/apache/spark/pull/46459#issuecomment-2099780272 Merged to branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-48037][CORE][3.5] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun closed pull request #46459: [SPARK-48037][CORE][3.5] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data URL: https://github.com/apache/spark/pull/46459 -- This is an automated message from the Apache Git Service. To respond to

[PR] [WIP][SQL] Add support for AbstractMapType [spark]

2024-05-07 Thread via GitHub
uros-db opened a new pull request, #46458: URL: https://github.com/apache/spark/pull/46458 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47914][SQL] Do not display the splits parameter in Range [spark]

2024-05-07 Thread via GitHub
yaooqinn commented on PR #46136: URL: https://github.com/apache/spark/pull/46136#issuecomment-2099698487 Merged to master Thank you @guixiaowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[PR] [SPARK-48037][CORE][3.5] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-07 Thread via GitHub
cxzl25 opened a new pull request, #46459: URL: https://github.com/apache/spark/pull/46459 ### What changes were proposed in this pull request? This PR aims to fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data. ### Why are the

[PR] [WIP][SQL] Add collation support for URL expressions [spark]

2024-05-07 Thread via GitHub
uros-db opened a new pull request, #46460: URL: https://github.com/apache/spark/pull/46460 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SQL] Add collation support for JSON expressions [spark]

2024-05-07 Thread via GitHub
uros-db opened a new pull request, #46462: URL: https://github.com/apache/spark/pull/46462 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-07 Thread via GitHub
GideonPotok commented on code in PR #46404: URL: https://github.com/apache/spark/pull/46404#discussion_r1592912017 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -70,20 +78,46 @@ case class Mode( buffer } -

Re: [PR] [SPARK-47365][PYTHON] Add toArrowTable() DataFrame method to PySpark [spark]

2024-05-07 Thread via GitHub
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1592915751 ## python/pyspark/sql/pandas/conversion.py: ## @@ -225,15 +225,68 @@ def toPandas(self) -> "PandasDataFrameLike": else: return pdf -def

Re: [PR] [SPARK-48167][PYTHON][TESTS][FOLLOWUP][3.5] Reformat test_readwriter.py to fix Python Linter error [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46445: URL: https://github.com/apache/spark/pull/46445#issuecomment-2099174845 cc @HyukjinKwon and @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-07 Thread via GitHub
gengliangwang commented on PR #46390: URL: https://github.com/apache/spark/pull/46390#issuecomment-2099214104 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] [SPARK-48177][BUILD]: Bump Apache Parquet to 1.14.0 [spark]

2024-05-07 Thread via GitHub
Fokko opened a new pull request, #46447: URL: https://github.com/apache/spark/pull/46447 ### What changes were proposed in this pull request? ### Why are the changes needed? Fixes quite a few bugs on the Parquet side:

[PR] [SPARK-XXX][INFRA][3.5] Pin `nbsphinx` to `0.9.3` [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun opened a new pull request, #46448: URL: https://github.com/apache/spark/pull/46448 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48177][BUILD] Upgrade `Apache Parquet` to 1.14.0 [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun commented on PR #46447: URL: https://github.com/apache/spark/pull/46447#issuecomment-2099295786 cc @cloud-fan , @HyukjinKwon , @mridulm , @sunchao , @yaooqinn , @LuciferYang , @steveloughran , @viirya , @huaxin, @parthchandra , too. -- This is an automated message from the

[PR] [SPARK-48178][INFRA] Run `build/scala-213/java-11-17` jobs of `branch-3.5` only if needed [spark]

2024-05-07 Thread via GitHub
dongjoon-hyun opened a new pull request, #46449: URL: https://github.com/apache/spark/pull/46449 ### What changes were proposed in this pull request? This PR aims to run the `build`, `scala-213`, and `java-11-17` jobs of `branch-3.5` only if needed to reduce the maximum concurrency of

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-07 Thread via GitHub
gengliangwang commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2099078293 @dongjoon-hyun Thanks, I will add a configuration about the unification. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-05-07 Thread via GitHub
SemyonSinchenko commented on PR #46368: URL: https://github.com/apache/spark/pull/46368#issuecomment-2099107642 New changes: - fixes from comments - **changing the type from Long to BigInteger** (`bytes` in proto) -- This is an automated message from the Apache Git Service. To
