Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-05-10 Thread via GitHub
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105548275 I raised SPARK-48238. I do not yet have a solution that avoids reverting the javax => jakarta namespace migration. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-48210][DOC]Modify the description of whether dynamic partition… [spark]

2024-05-10 Thread via GitHub
guixiaowen commented on PR #46496: URL: https://github.com/apache/spark/pull/46496#issuecomment-2105515304 > sorry this description is very hard to read. I think you are trying to say that stage level scheduling isn't supported on k8s and yarn when dynamic allocation is disabled? When you

Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]

2024-05-10 Thread via GitHub
beliefer commented on PR #46437: URL: https://github.com/apache/spark/pull/46437#issuecomment-2105511312 LGTM except @cloud-fan 's comment.

Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]

2024-05-10 Thread via GitHub
beliefer commented on code in PR #46437: URL: https://github.com/apache/spark/pull/46437#discussion_r1597343529 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -65,7 +68,6 @@ protected String

[PR] [Only Test] test jaxb-runtime [spark]

2024-05-10 Thread via GitHub
panbingkun opened a new pull request, #46533: URL: https://github.com/apache/spark/pull/46533 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #46531: [SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script URL: https://github.com/apache/spark/pull/46531

Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105497685 I have narrowed the scope of the impact of not using `local maven repo cache` to the step `MIMA test` of the job `lint`

Re: [PR] [SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() [spark]

2024-05-10 Thread via GitHub
ianmcook commented on code in PR #46529: URL: https://github.com/apache/spark/pull/46529#discussion_r1597327784 ## python/pyspark/sql/pandas/conversion.py: ## @@ -360,42 +361,52 @@ def createDataFrame( # type: ignore[misc] assert isinstance(self, SparkSession)

Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105493606 cc @LuciferYang @dongjoon-hyun

Re: [PR] [SPARK-48237][BUILD] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105481077 > Please file a JIRA issue, @panbingkun . Done, thanks!

Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]

2024-05-10 Thread via GitHub
AngersZh commented on PR #46523: URL: https://github.com/apache/spark/pull/46523#issuecomment-2105466110 > Do you think you can provide test cases for this, @AngersZh ? Added a new UT to show the difference; please take another look, @dongjoon-hyun

Re: [PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105466841 Please file a JIRA issue, @panbingkun .

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2105463595 Thank you, @beliefer

Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]

2024-05-10 Thread via GitHub
pan3793 commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105463104 @dongjoon-hyun Okay, as this fails the CI, we should revert the dependency removal first and investigate further later. As for supporting "legacy Hive UDF jars", I think if the user

Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105461805 In the `mima` step: 1. with local maven repo cache: 2. without local maven repo cache:

[PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]

2024-05-10 Thread via GitHub
panbingkun opened a new pull request, #46532: URL: https://github.com/apache/spark/pull/46532 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2024-05-10 Thread via GitHub
beliefer commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2105457208 Let me rebase again.

Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]

2024-05-10 Thread via GitHub
AngersZh commented on PR #46523: URL: https://github.com/apache/spark/pull/46523#issuecomment-2105441711 > Do you think you can provide test cases for this, @AngersZh ? `SPARK-39551: Invalid plan check - invalid broadcast query stage` Can cover this, I don't know if we need

Re: [PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105431329 cc @dongjoon-hyun

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597322128 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -292,6 +301,7 @@ object LogKeys { case object LOADED_VERSION extends LogKey case

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-10 Thread via GitHub
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1597322015 ## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala: ## @@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(

[PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]

2024-05-10 Thread via GitHub
panbingkun opened a new pull request, #46531: URL: https://github.com/apache/spark/pull/46531 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was

Re: [PR] [SPARK-48232][PYTHON][TESTS] Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build [spark]

2024-05-10 Thread via GitHub
HyukjinKwon commented on PR #46522: URL: https://github.com/apache/spark/pull/46522#issuecomment-2105404809 For the record, it's recovered: https://github.com/apache/spark/actions/runs/9036829724

Re: [PR] [WIP] docs: restructure the docs index page [spark]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #44812: [WIP] docs: restructure the docs index page URL: https://github.com/apache/spark/pull/44812

Re: [PR] [SPARK-45708][BUILD] Retry mvn deploy [spark]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #43559: [SPARK-45708][BUILD] Retry mvn deploy URL: https://github.com/apache/spark/pull/43559

Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #44022: [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source URL: https://github.com/apache/spark/pull/44022

Re: [PR] [SPARK-46912] Use worker JAVA_HOME and SPARK_HOME instead of from submitter [spark]

2024-05-10 Thread via GitHub
github-actions[bot] commented on PR #44943: URL: https://github.com/apache/spark/pull/44943#issuecomment-2105402855 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597304010 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -292,6 +301,7 @@ object LogKeys { case object LOADED_VERSION extends LogKey

Re: [PR] [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46530: URL: https://github.com/apache/spark/pull/46530#issuecomment-2105381256 All compilations passed. Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #46530: [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API URL: https://github.com/apache/spark/pull/46530

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
panbingkun commented on PR #46493: URL: https://github.com/apache/spark/pull/46493#issuecomment-2105375831 > LGTM overall. Thanks for the work! Updated, thanks! ❤️

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597295650 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -214,7 +219,9 @@ public ManagedBuffer

Re: [PR] [SPARK-48205][FOLLOWUP] Add missing tags for the dataSource API [spark]

2024-05-10 Thread via GitHub
allisonwang-db commented on PR #46530: URL: https://github.com/apache/spark/pull/46530#issuecomment-2105373242 cc @dongjoon-hyun @HyukjinKwon

[PR] [SPARK-48205][FOLLOWUP] Add missing tags for the dataSource API [spark]

2024-05-10 Thread via GitHub
allisonwang-db opened a new pull request, #46530: URL: https://github.com/apache/spark/pull/46530 ### What changes were proposed in this pull request? This is a follow-up PR for https://github.com/apache/spark/pull/46487 to add missing tags for the `dataSource` API.

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105356804 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]

2024-05-10 Thread via GitHub
allisonwang-db commented on code in PR #46487: URL: https://github.com/apache/spark/pull/46487#discussion_r1597284865 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -234,7 +234,7 @@ class SparkSession private( /** * A collection of methods for

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #46528: [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars URL: https://github.com/apache/spark/pull/46528

Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105347143 For existing Hive UDFs that assume the `jodd` library, this could be a breaking change like #46528 . ``` - import jodd.datetime.JDateTime; + import

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
gengliangwang commented on PR #46493: URL: https://github.com/apache/spark/pull/46493#issuecomment-2105340839 LGTM overall. Thanks for the work!

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105335928 Thank you, @viirya !

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #46528: URL: https://github.com/apache/spark/pull/46528#discussion_r1597270600 ## pom.xml: ## @@ -192,6 +192,8 @@ 1.17.0 1.26.1 2.16.1 + +2.6 Review Comment: Thank you. I updated the comment.

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
viirya commented on code in PR #46528: URL: https://github.com/apache/spark/pull/46528#discussion_r1597269950 ## pom.xml: ## @@ -192,6 +192,8 @@ 1.17.0 1.26.1 2.16.1 + +2.6 Review Comment:

[PR] [SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() [spark]

2024-05-10 Thread via GitHub
ianmcook opened a new pull request, #46529: URL: https://github.com/apache/spark/pull/46529 ### What changes were proposed in this pull request? This PR adds support for passing PyArrow tables to `createDataFrame()`. ### Why are the changes needed? This seems like a logical next

Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105328073 Hi, @pan3793 . It seems that we need to re-evaluate this dependency removal. Please see the following, which is related to `commons-lang:commons-lang`. - #46528

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597257739 ## common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java: ## @@ -363,7 +367,8 @@ static MergedShuffleFileManager

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597250327 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -214,7 +219,9 @@ public ManagedBuffer

Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105298144 WDYT, cc @pan3793 , @sunchao , @LuciferYang , @yaooqinn , @viirya ?

[PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #46528: URL: https://github.com/apache/spark/pull/46528 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-10 Thread via GitHub
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597243236 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -368,7 +382,8 @@ public int removeBlocks(String

Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2105258785 It turns out that Apache Spark is unable to support all legacy Hive UDF jar files. Let me make a follow-up. - https://github.com/apache/hive/pull/4892

Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2105239656 I locally verified that the failure of `HiveUDFDynamicLoadSuite` is consistent. ``` $ build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite

Re: [PR] [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression [spark]

2024-05-10 Thread via GitHub
kelvinjian-db commented on PR #46492: URL: https://github.com/apache/spark/pull/46492#issuecomment-2105225037 fixed

Re: [PR] [SPARK-48229][SQL] Add collation support for inputFile expressions [spark]

2024-05-10 Thread via GitHub
amaliujia commented on code in PR #46503: URL: https://github.com/apache/spark/pull/46503#discussion_r1597190775 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1151,6 +1151,23 @@ class CollationSQLExpressionsSuite }) } +

Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #46468: URL: https://github.com/apache/spark/pull/46468#discussion_r1597143409 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -46,7 +46,6 @@ commons-compress/1.26.1//commons-compress-1.26.1.jar

Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105092801 Please file a proper JIRA issue, @pan3793 .

Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-05-10 Thread via GitHub
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105051891 also cc the 4.0.0-preview1 release manager @cloud-fan

Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46401: URL: https://github.com/apache/spark/pull/46401#issuecomment-2105050202 Merged to master for Apache Spark 4.0.0. Thank you, @fred-db , @agubichev, @cloud-fan .

Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #46401: [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints URL: https://github.com/apache/spark/pull/46401

Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]

2024-05-10 Thread via GitHub
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105048194 @HiuKwok @dongjoon-hyun I ran into an issue on the latest master branch that seems related to this change. The brief reproduction steps are: 1. make dist `dev/make-distribution.sh

Re: [PR] [SPARK-48146][SQL] Fix aggregate function in With expression child assertion [spark]

2024-05-10 Thread via GitHub
kelvinjian-db commented on code in PR #46443: URL: https://github.com/apache/spark/pull/46443#discussion_r1597045459 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala: ## @@ -92,6 +95,21 @@ object With { val commonExprRefs =

Re: [PR] [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on PR #46492: URL: https://github.com/apache/spark/pull/46492#issuecomment-2105006573 sorry it has conflicts now

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1597016630 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: Please open a new JIRA and PR to discuss this.

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1597015882 ## dev/test-dependencies.sh: ## @@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \ --non-recursive \ org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E

Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #43579: URL: https://github.com/apache/spark/pull/43579#issuecomment-2104947313 cc @mridulm , too

Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596997363 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -106,7 +106,7 @@ class ClientSuite extends SparkFunSuite val

Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596992913 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1490,7 +1490,7 @@ private[spark] object Client extends Logging { *

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596994966 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: One additional question is should we remove

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596992306 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: the existing comments are outdated, Hadoop already

Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596991319 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -106,7 +106,7 @@ class ClientSuite extends SparkFunSuite val

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596990662 ## pom.xml: ## @@ -3446,6 +3446,7 @@ org.spark-project.spark:unused com.google.guava:guava +

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596988859 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala: ## @@ -341,7 +341,7 @@ class IntervalExpressionsSuite extends

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]

2024-05-10 Thread via GitHub
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596986722 ## dev/test-dependencies.sh: ## @@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \ --non-recursive \ org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E

Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]

2024-05-10 Thread via GitHub
cloud-fan closed pull request #46500: [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser URL: https://github.com/apache/spark/pull/46500

Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on PR #46500: URL: https://github.com/apache/spark/pull/46500#issuecomment-2104909079 thanks, merging to master!

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2104900725 I resolved the conflicts for you~

Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]

2024-05-10 Thread via GitHub
fred-db commented on PR #46401: URL: https://github.com/apache/spark/pull/46401#issuecomment-2104893008 Thanks @dongjoon-hyun ! :) Rebased the code.

Re: [PR] [SPARK-47716][SQL] Avoid view name conflict in SQLQueryTestSuite semantic sort test case [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #45855: URL: https://github.com/apache/spark/pull/45855#issuecomment-2104888508 Hi, @jchen5 and all. How do you want to proceed with this PR? I guess https://github.com/apache/spark/pull/45855#discussion_r1550652914 is the latest direction. Could you update

Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]

2024-05-10 Thread via GitHub
vladimirg-db commented on code in PR #46500: URL: https://github.com/apache/spark/pull/46500#discussion_r1596925973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala: ## @@ -67,16 +67,30 @@ case class PartialResultArrayException(

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596901652 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2104807458 Could you resolve the conflicts, @beliefer ?

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596899527 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -499,8 +500,14 @@ object JdbcUtils extends Logging with

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596898967 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596897736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1596897091 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( }

Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596897400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf

Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #46481: [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. URL: https://github.com/apache/spark/pull/46481

Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]

2024-05-10 Thread via GitHub
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1596887611 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( }

Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #46481: URL: https://github.com/apache/spark/pull/46481#issuecomment-2104801936 Let me bring this first for further monitoring. Thank you, @chaoqin-li1123 and @allisonwang-db . Merged to master.

Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun closed pull request #45565: [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI URL: https://github.com/apache/spark/pull/45565

Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #45565: URL: https://github.com/apache/spark/pull/45565#issuecomment-2104797350 Thank you, @tgravescs !

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596850227 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala: ## @@ -35,6 +35,56 @@ object EliminateView extends Rule[LogicalPlan] with

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596873275 ## sql/core/src/test/resources/sql-tests/results/view-schema-binding-config.sql.out: ## @@ -0,0 +1,794 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596870581 ## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala: ## @@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(

Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]

2024-05-10 Thread via GitHub
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596869407 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -620,3 +621,65 @@ object CollationCheck extends (LogicalPlan => Unit) {

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 32+ [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596869988 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala: ## @@ -341,7 +341,7 @@ class IntervalExpressionsSuite

Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 32+ [spark]

2024-05-10 Thread via GitHub
dongjoon-hyun commented on PR #42493: URL: https://github.com/apache/spark/pull/42493#issuecomment-2104762964 Thank you for updating this, @pan3793 .
