Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-02 Thread via GitHub
panbingkun commented on code in PR #46301: URL: https://github.com/apache/spark/pull/46301#discussion_r1587105724 ## common/utils/src/main/java/org/apache/spark/internal/Logger.java: ## @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
panbingkun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2089669299 > Although we are waiting for `Ammonite` still, could you base this PR once more, @panbingkun ? > > * [Add support for Scala 2.13.14 

Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-02 Thread via GitHub
grundprinzip commented on code in PR #46297: URL: https://github.com/apache/spark/pull/46297#discussion_r1587143920 ## dev/requirements.txt: ## @@ -16,6 +16,7 @@ memory-profiler>=0.61.0 # PySpark test dependencies unittest-xml-reporting openpyxl +parameterized Review

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-05-02 Thread via GitHub
rajatrj20 commented on PR #45350: URL: https://github.com/apache/spark/pull/45350#issuecomment-2089760654 @cloud-fan This change broke an existing behaviour. When an aliased generator field A is referenced in another field B in the project list, it will create a situation where the B will

Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on code in PR #46297: URL: https://github.com/apache/spark/pull/46297#discussion_r1587187092 ## python/pyspark/sql/tests/connect/client/test_client.py: ## @@ -18,13 +18,15 @@ import unittest import uuid from collections.abc import Generator -from typing

Re: [PR] [SPARK-48054][PYTHON][CONNECT][INFRA] Backward compatibility test for Spark Connect [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #46298: URL: https://github.com/apache/spark/pull/46298#issuecomment-2089909746 https://github.com/HyukjinKwon/spark/actions/runs/8920156555/job/24497640296 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-05-02 Thread via GitHub
cloud-fan commented on PR #45350: URL: https://github.com/apache/spark/pull/45350#issuecomment-2090237466 > SELECT col_1, EXPLODE(MAP_KEYS(map_str_col)) AS key, map_str_col[key] AS value FROM nestedTable1; I think this can be supported with LCA. cc @anchovYu

[PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when data type is mismatched [spark]

2024-05-02 Thread via GitHub
JoshRosen opened a new pull request, #46333: URL: https://github.com/apache/spark/pull/46333 ### What changes were proposed in this pull request? While migrating the `NTile` expression's type check failures to the new error class framework, PR

Re: [PR] [SPARK-48072][SQL][TESTS] Improve SQLQuerySuite test output [spark]

2024-05-02 Thread via GitHub
vladimirg-db commented on PR #46318: URL: https://github.com/apache/spark/pull/46318#issuecomment-2089979391 Yes, @dongjoon-hyun, my intent was to add a small improvement after the [SPARK-47939](https://issues.apache.org/jira/browse/SPARK-47939), which was resolved by my PR recently.

[PR] [SPARK-48088][PYTHON][CONNECT][TESTS][3.5] Skip tests that fail in 3.5 client <> 4.0 server [spark]

2024-05-02 Thread via GitHub
HyukjinKwon opened a new pull request, #46334: URL: https://github.com/apache/spark/pull/46334 ### What changes were proposed in this pull request? This PR proposes to skip the tests that fail with 3.5 client and 4.0 server in Spark Connect (by adding

Re: [PR] [SPARK-44264][PYTHON][ML] FunctionPickler Class [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #41946: URL: https://github.com/apache/spark/pull/41946#issuecomment-2089747330 cc @WeichenXu123, @lu-wang-dl, @xinrong-meng, @rithwik-db, @maddiedawson, mind following up on this please?

Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-02 Thread via GitHub
HyukjinKwon closed pull request #46297: [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received URL: https://github.com/apache/spark/pull/46297

Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-02 Thread via GitHub
nija-at commented on code in PR #46297: URL: https://github.com/apache/spark/pull/46297#discussion_r1587297283 ## dev/requirements.txt: ## @@ -16,6 +16,7 @@ memory-profiler>=0.61.0 # PySpark test dependencies unittest-xml-reporting openpyxl +parameterized Review Comment:

Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #46297: URL: https://github.com/apache/spark/pull/46297#issuecomment-2090362203 Merged to master.

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
panbingkun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2089952489 > https://repo1.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.13.14/0.19/ I have updated the version of `genjavadoc` in the file `project/SparkBuild.scala`.

Re: [PR] [SPARK-48088][PYTHON][CONNECT][TESTS][3.5] Skip tests that fail in 3.5 client <> 4.0 server [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on code in PR #46334: URL: https://github.com/apache/spark/pull/46334#discussion_r1587434003 ## python/pyspark/ml/tests/connect/test_connect_tuning.py: ## @@ -15,16 +15,17 @@ # See the License for the specific language governing permissions and #

Re: [PR] [SPARK-45988][SPARK-45989][PYTHON] Fix typehints to handle `list` GenericAlias in Python 3.11+ [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #43888: URL: https://github.com/apache/spark/pull/43888#issuecomment-2089920144 Let me backport this to branch-3.5 as well.

Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on code in PR #46301: URL: https://github.com/apache/spark/pull/46301#discussion_r1587185157 ## common/utils/src/main/java/org/apache/spark/internal/Logger.java: ## @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-05-02 Thread via GitHub
bluzy commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2089855493 PTAL @dongjoon-hyun @mridulm

Re: [PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46333: URL: https://github.com/apache/spark/pull/46333#issuecomment-2090622363 cc @LuciferYang and @MaxGekk from - https://github.com/apache/spark/pull/38457

Re: [PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46333: [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type URL: https://github.com/apache/spark/pull/46333

Re: [PR] [SPARK-48079][BUILD] Upgrade maven-install/deploy-plugin to 3.1.2 [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46330: [SPARK-48079][BUILD] Upgrade maven-install/deploy-plugin to 3.1.2 URL: https://github.com/apache/spark/pull/46330

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46288: URL: https://github.com/apache/spark/pull/46288#discussion_r1587813517 ## connector/connect/client/jvm/pom.xml: ## @@ -73,7 +73,7 @@ com.lihaoyi - ammonite_${scala.version} + ammonite_2.13.13 Review

Re: [PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46333: URL: https://github.com/apache/spark/pull/46333#issuecomment-2090847049 Oh, there was a test failure in 3.5/3.4. Could you make a backporting PR to branch-3.5 and branch-3.4, @JoshRosen ?

Re: [PR] [SPARK-48072][SQL][TESTS] Improve SQLQuerySuite test output - use `===` instead of `sameElements` for Arrays [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46318: [SPARK-48072][SQL][TESTS] Improve SQLQuerySuite test output - use `===` instead of `sameElements` for Arrays URL: https://github.com/apache/spark/pull/46318

Re: [PR] [SPARK-48088][PYTHON][CONNECT][TESTS][3.5] Skip tests that fail in 3.5 client <> 4.0 server [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46334: URL: https://github.com/apache/spark/pull/46334#issuecomment-2090614620 cc @grundprinzip , too

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2090615461 Thanks!

Re: [PR] [SPARK-45988][SPARK-45989][PYTHON] Fix typehints to handle `list` GenericAlias in Python 3.11+ [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #43888: URL: https://github.com/apache/spark/pull/43888#issuecomment-2090629597 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
panbingkun commented on code in PR #46288: URL: https://github.com/apache/spark/pull/46288#discussion_r1587786326 ## connector/connect/client/jvm/pom.xml: ## @@ -73,7 +73,7 @@ com.lihaoyi - ammonite_${scala.version} + ammonite_2.13.13 Review

Re: [PR] [SPARK-45988][SPARK-45989][PYTHON] Fix typehints to handle `list` GenericAlias in Python 3.11+ [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #43888: URL: https://github.com/apache/spark/pull/43888#issuecomment-2090851277 I also cherry-picked it to branch-3.4, too

[PR] [Test] exclude scala.collection.Seq at java side [spark]

2024-05-02 Thread via GitHub
panbingkun opened a new pull request, #46335: URL: https://github.com/apache/spark/pull/46335 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46333: URL: https://github.com/apache/spark/pull/46333#issuecomment-2090626063 Merged to master/3.5/3.4 for Apache Spark 4.0.0-preview and 3.5.2 and 3.4.4.

Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-02 Thread via GitHub
panbingkun commented on PR #46301: URL: https://github.com/apache/spark/pull/46301#issuecomment-2090643728 @gengliangwang This pr is ready for review.

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-02 Thread via GitHub
panbingkun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2090712115 > jline I guess it may be related to the fact that the `Ammonite` supporting `scala 2.13.14` has not been released.

[PR] [SPARK-48089] Backport Client Side StreamingQueryListener to 3.5 [spark]

2024-05-02 Thread via GitHub
WweiL opened a new pull request, #46339: URL: https://github.com/apache/spark/pull/46339 ### What changes were proposed in this pull request? Backport Client Side StreamingQueryListener to 3.5 ### Why are the changes needed? To pass cross-version test ###

[PR] [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][3.5] Add config to switch between client side and server side StreamingQueryListener [spark]

2024-05-02 Thread via GitHub
WweiL opened a new pull request, #46341: URL: https://github.com/apache/spark/pull/46341 ### What changes were proposed in this pull request? We are backporting the client side StreamingQueryListener to branch-3.5. In case there is usage of server side listener and users want

Re: [PR] [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][3.5] Add config to switch between client side and server side StreamingQueryListener [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46341: [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][3.5] Add config to switch between client side and server side StreamingQueryListener URL: https://github.com/apache/spark/pull/46341

Re: [PR] [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][3.5] Add config to switch between client side and server side StreamingQueryListener [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46341: URL: https://github.com/apache/spark/pull/46341#issuecomment-2091452810 Since this is not for review, I'd recommend opening this PR in your own repository. Your GitHub Action can verify your PR, @WweiL .

Re: [PR] [SPARK-48089][SS][CONNECT][3.5] Backport Client Side StreamingQueryListener to 3.5 [spark]

2024-05-02 Thread via GitHub
WweiL commented on PR #46339: URL: https://github.com/apache/spark/pull/46339#issuecomment-2091452004 @dongjoon-hyun Thanks for the comment! May I know if it is acceptable to merge this with this added config? https://github.com/apache/spark/pull/46341 We can allow users to switch between

[PR] [SPARK-48095][INFRA] Run `build_non_ansi.yml` once per day [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun opened a new pull request, #46342: URL: https://github.com/apache/spark/pull/46342 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47681][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-02 Thread via GitHub
gene-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1588311315 ## python/pyspark/sql/variant_utils.py: ## @@ -245,47 +245,57 @@ def _get_string(cls, value: bytes, pos: int) -> str: length = cls._read_long(value,

[PR] [SPARK-48093][SS][CONNECT][4.0] Add server side config handler for 3.5 client requesting Server side StreamingQueryListener [spark]

2024-05-02 Thread via GitHub
WweiL opened a new pull request, #46340: URL: https://github.com/apache/spark/pull/46340 ### What changes were proposed in this pull request? We are backporting the client side StreamingQueryListener to branch-3.5. In case there is usage of server side listener and users want

Re: [PR] [SPARK-47681][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-02 Thread via GitHub
chenhao-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1588306639 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {

Re: [PR] [SPARK-48095][INFRA] Run `build_non_ansi.yml` once per day [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46342: URL: https://github.com/apache/spark/pull/46342#issuecomment-2091492692 Could you review this PR, @huaxingao ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-47681][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-02 Thread via GitHub
gene-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1588226317 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {

Re: [PR] [SPARK-48067][SQL] Fix variant default columns [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on PR #46312: URL: https://github.com/apache/spark/pull/46312#issuecomment-2091365435 Thanks, merging to master

Re: [PR] [SPARK-48067][SQL] Fix variant default columns [spark]

2024-05-02 Thread via GitHub
gengliangwang closed pull request #46312: [SPARK-48067][SQL] Fix variant default columns URL: https://github.com/apache/spark/pull/46312

Re: [PR] [SPARK-47681][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-02 Thread via GitHub
chenhao-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1588252744 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {

[PR] [SPARK-48081][SQL][3.5] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
JoshRosen opened a new pull request, #46336: URL: https://github.com/apache/spark/pull/46336 branch-3.5 pick of PR https://github.com/apache/spark/pull/46333 , fixing test issue due to difference in expected error message parameter formatting across branches; original description follows

Re: [PR] [SPARK-36705][FOLLOW-UP] Support the case when user's classes need to register for Kryo serialization [spark]

2024-05-02 Thread via GitHub
romibuzi commented on PR #34158: URL: https://github.com/apache/spark/pull/34158#issuecomment-2091427975 I filed a JIRA before finding this PR: https://issues.apache.org/jira/browse/SPARK-48043. I'm facing a similar issue with the call on `Utils.isPushBasedShuffleEnabled` in

[PR] [SPARK-48081][SQL][3.4] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
JoshRosen opened a new pull request, #46337: URL: https://github.com/apache/spark/pull/46337 branch-3.4 pick of PR https://github.com/apache/spark/pull/46333 , fixing test issue due to difference in expected error message parameter formatting across branches; original description follows

Re: [PR] [SPARK-48081] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
JoshRosen commented on PR #46333: URL: https://github.com/apache/spark/pull/46333#issuecomment-2091029447 Backport PRs: - 3.5: https://github.com/apache/spark/pull/46336 - 3.4: https://github.com/apache/spark/pull/46337

Re: [PR] [SPARK-48081][SQL][3.5] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46336: [SPARK-48081][SQL][3.5] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type URL: https://github.com/apache/spark/pull/46336

Re: [PR] [SPARK-48081][SQL][3.5] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46336: URL: https://github.com/apache/spark/pull/46336#issuecomment-2091437784 Merged to branch-3.5.

Re: [PR] [SPARK-48081][SQL][3.4] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46337: URL: https://github.com/apache/spark/pull/46337#issuecomment-2091448715 Merged to branch-3.4.

Re: [PR] [SPARK-48081][SQL][3.4] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46337: [SPARK-48081][SQL][3.4] Fix ClassCastException in NTile.checkInputDataTypes() when argument is non-foldable or of wrong type URL: https://github.com/apache/spark/pull/46337

Re: [PR] [SPARK-48089][SS][CONNECT][3.5] Backport Client Side StreamingQueryListener to 3.5 [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46339: [SPARK-48089][SS][CONNECT][3.5] Backport Client Side StreamingQueryListener to 3.5 URL: https://github.com/apache/spark/pull/46339

Re: [PR] [SPARK-48089][SS][CONNECT][3.5] Backport Client Side StreamingQueryListener to 3.5 [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46339: URL: https://github.com/apache/spark/pull/46339#issuecomment-2091447286 Let me close this to prevent accidental merging. We can continue to discuss on this PR after closing.

Re: [PR] [SPARK-47681][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-02 Thread via GitHub
chenhao-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1588326114 ## python/pyspark/sql/variant_utils.py: ## @@ -245,47 +245,57 @@ def _get_string(cls, value: bytes, pos: int) -> str: length =

Re: [PR] [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework [spark]

2024-05-02 Thread via GitHub
dtenedor commented on PR #46309: URL: https://github.com/apache/spark/pull/46309#issuecomment-2091585054 @gengliangwang thanks for your review, responded to comments, please look again.

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588493621 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

Re: [PR] [SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict [spark]

2024-05-02 Thread via GitHub
sunchao closed pull request #46325: [SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict URL: https://github.com/apache/spark/pull/46325

Re: [PR] [SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict [spark]

2024-05-02 Thread via GitHub
szehon-ho commented on PR #46325: URL: https://github.com/apache/spark/pull/46325#issuecomment-2091913189 Thanks for fast review! Yea will do that.

Re: [PR] [SPARK-48099][INFRA] Run `maven-build` test only on `Java 21 on MacOS14 (Apple Silicon)` [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46347: URL: https://github.com/apache/spark/pull/46347#issuecomment-2091940855 Could you review this PR, @HyukjinKwon ?

Re: [PR] [SPARK-46738][CONNECT][PYTHON] Make the display of `cast` in `Regular Spark` and `Spark Connect` consistent [spark]

2024-05-02 Thread via GitHub
github-actions[bot] commented on PR #44829: URL: https://github.com/apache/spark/pull/44829#issuecomment-2091944038 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [WIP][SQL] Avoid parquet footer reads twice [spark]

2024-05-02 Thread via GitHub
github-actions[bot] commented on PR #44853: URL: https://github.com/apache/spark/pull/44853#issuecomment-2091944028 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-42669][CONNECT] Short circuit local relation RPCs [spark]

2024-05-02 Thread via GitHub
github-actions[bot] commented on PR #40782: URL: https://github.com/apache/spark/pull/40782#issuecomment-2091944074 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

2024-05-02 Thread via GitHub
github-actions[bot] commented on PR #44725: URL: https://github.com/apache/spark/pull/44725#issuecomment-2091944051 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48088][PYTHON][CONNECT][TESTS][3.5] Skip tests that fail in 3.5 client <> 4.0 server [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #46334: URL: https://github.com/apache/spark/pull/46334#issuecomment-2091994700 `Spark Connect` is actually for testing pure python library. Spark 3.5 doesn't have it ... from 4.0 <> 4.1, we could leverage pure python library build to test them. I am

Re: [PR] [SPARK-48096][INFRA] Run `build_maven_java21_macos14.yml` every two days [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46343: URL: https://github.com/apache/spark/pull/46343#issuecomment-2091612469 Thank you, @viirya !

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46346: URL: https://github.com/apache/spark/pull/46346#issuecomment-2091823433 Thank you so much, @gengliangwang . Merged to master.

Re: [PR] [SPARK-48097][INFRA] Limit GHA job execution time to up to 3 hours in `build_and_test.yml` [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46344: [SPARK-48097][INFRA] Limit GHA job execution time to up to 3 hours in `build_and_test.yml` URL: https://github.com/apache/spark/pull/46344

Re: [PR] [SPARK-48097][INFRA] Limit GHA job execution time to up to 3 hours in `build_and_test.yml` [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46344: URL: https://github.com/apache/spark/pull/46344#issuecomment-2091828537 Thank you, @HyukjinKwon !

Re: [PR] [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][4.0] Add server side config handler for 3.5 client requesting Server side StreamingQueryListener [spark]

2024-05-02 Thread via GitHub
WweiL closed pull request #46340: [DO-NOT-REVIEW] [SPARK-48093][SS][CONNECT][4.0] Add server side config handler for 3.5 client requesting Server side StreamingQueryListener URL: https://github.com/apache/spark/pull/46340

Re: [PR] [SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict [spark]

2024-05-02 Thread via GitHub
sunchao commented on PR #46325: URL: https://github.com/apache/spark/pull/46325#issuecomment-2091895489 Merged to master, thanks @szehon-ho ! Do you think we need to backport this to branch-3.4 and branch-3.5?

Re: [PR] [SPARK-47671][Core] Enable structured logging in log4j2.properties.template and update docs [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46349: URL: https://github.com/apache/spark/pull/46349#discussion_r1588525182 ## docs/configuration.md: ## @@ -3670,14 +3670,17 @@ Note: When running Spark on YARN in `cluster` mode, environment variables need t # Configuring Logging

Re: [PR] [SPARK-47671][Core] Enable structured logging in log4j2.properties.template and update docs [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on PR #46349: URL: https://github.com/apache/spark/pull/46349#issuecomment-2091909130 @dongjoon-hyun I double checked and removed some comments. PTAL, thanks!

[PR] [SPARK-48096][INFRA] Run `build_maven_java21_macos14.yml` every two days [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun opened a new pull request, #46343: URL: https://github.com/apache/spark/pull/46343 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [SPARK-48097][INFRA] Limit GHA job execution time to up to 3 hours in `build_and_test.yml` [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun opened a new pull request, #46344: URL: https://github.com/apache/spark/pull/46344 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48096][INFRA] Run `build_maven_java21_macos14.yml` every two days [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46343: [SPARK-48096][INFRA] Run `build_maven_java21_macos14.yml` every two days URL: https://github.com/apache/spark/pull/46343

[PR] [SPARK-43046][FOLLOWUP][SS][CONNECT] Remove not used line in deduplicateWithinWatermark [spark]

2024-05-02 Thread via GitHub
WweiL opened a new pull request, #46345: URL: https://github.com/apache/spark/pull/46345 ### What changes were proposed in this pull request? An extra assignment was added when we first introduced `dropDuplicatesWithinWatermark` in

[PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun opened a new pull request, #46346: URL: https://github.com/apache/spark/pull/46346 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46346: URL: https://github.com/apache/spark/pull/46346#issuecomment-2091830231 Oh, sorry. I need to revise this PR.

Re: [PR] [SPARK-48065][SQL] SPJ: allowJoinKeysSubsetOfPartitionKeys is too strict [spark]

2024-05-02 Thread via GitHub
szehon-ho commented on PR #46325: URL: https://github.com/apache/spark/pull/46325#issuecomment-2091841349 @sunchao I think it's a simple fix, can you take a look?

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588499107 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588499484 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588500715 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588500029 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46346: URL: https://github.com/apache/spark/pull/46346#issuecomment-2091864830 Thank you for the review and approval.

Re: [PR] [SPARK-47671][Core] Enable structured logging in log4j2.properties.template and update docs [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on PR #46349: URL: https://github.com/apache/spark/pull/46349#issuecomment-2091886078 cc @dtenedor @panbingkun as well.

[PR] [SPARK-47671][Core] Enable structured logging in log4j2.properties.template and update docs [spark]

2024-05-02 Thread via GitHub
gengliangwang opened a new pull request, #46349: URL: https://github.com/apache/spark/pull/46349 ### What changes were proposed in this pull request? - Rename the current log4j2.properties.template as log4j2.properties.pattern-layout-template - Enable structured

Re: [PR] [SPARK-48098][INFRA] Enable `NOLINT_ON_COMPILE` for all except `lint` job [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on code in PR #46346: URL: https://github.com/apache/spark/pull/46346#discussion_r1588493621 ## project/SparkBuild.scala: ## @@ -257,7 +257,7 @@ object SparkBuild extends PomBuild { lazy val sharedSettings = sparkGenjavadocSettings ++

[PR] [SPARK-48102] Track duration for acquiring source/sink metrics while reporting streaming query progress [spark]

2024-05-02 Thread via GitHub
anishshri-db opened a new pull request, #46350: URL: https://github.com/apache/spark/pull/46350 ### What changes were proposed in this pull request? Track duration for acquiring source/sink metrics while reporting streaming query progress ### Why are the changes needed?

Re: [PR] [SPARK-48048][CONNECT][SS] Added client side listener support for Scala [spark]

2024-05-02 Thread via GitHub
HyukjinKwon closed pull request #46287: [SPARK-48048][CONNECT][SS] Added client side listener support for Scala URL: https://github.com/apache/spark/pull/46287

Re: [PR] [SPARK-48048][CONNECT][SS] Added client side listener support for Scala [spark]

2024-05-02 Thread via GitHub
HyukjinKwon commented on PR #46287: URL: https://github.com/apache/spark/pull/46287#issuecomment-2091982060 Merged to master.

Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-02 Thread via GitHub
gengliangwang commented on code in PR #46301: URL: https://github.com/apache/spark/pull/46301#discussion_r1588589564 ## common/utils/src/main/java/org/apache/spark/internal/Logger.java: ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] [SPARK-48096][INFRA] Run `build_maven_java21_macos14.yml` every two days [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun commented on PR #46343: URL: https://github.com/apache/spark/pull/46343#issuecomment-2091539703 Could you review this PR, @viirya ?

Re: [PR] [SPARK-48095][INFRA] Run `build_non_ansi.yml` once per day [spark]

2024-05-02 Thread via GitHub
dongjoon-hyun closed pull request #46342: [SPARK-48095][INFRA] Run `build_non_ansi.yml` once per day URL: https://github.com/apache/spark/pull/46342

Re: [PR] [SPARK-48095][INFRA] Run `build_non_ansi.yml` once per day [spark]

2024-05-02 Thread via GitHub
huaxingao commented on PR #46342: URL: https://github.com/apache/spark/pull/46342#issuecomment-2091567244 LGTM. Thanks @dongjoon-hyun
