Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105548275

I raised SPARK-48238. I don't have a solution yet short of reverting the javax => jakarta namespace migration.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48210][DOC]Modify the description of whether dynamic partition… [spark]
guixiaowen commented on PR #46496: URL: https://github.com/apache/spark/pull/46496#issuecomment-2105515304

> sorry this description is very hard to read. I think you are trying to say that stage level scheduling isn't supported on k8s and yarn when dynamic allocation is disabled? When you say dynamic partitions do you mean dynamic allocation of executors?
>
> That is not true we added support - https://issues.apache.org/jira/browse/SPARK-45495
>
> The description as is:
>
> > * When dynamic allocation is disabled: It allows users to specify different task resource requirements at the stage level and will use the same executors requested at startup.
>
> Is correct. When dynamic allocation is off, you can specify a different resource profile that changes the task requirements but uses the existing executors.

@tgravescs Thank you for your reply. Could you help me check my configuration, my code, and the error message returned? Is there a problem with our understanding?

set spark.dynamicAllocation.enabled=false

```scala
import org.apache.spark.resource.ResourceProfileBuilder
import org.apache.spark.resource.TaskResourceRequests

val rdd = sc.range(0, 9)
val rpBuilder = new ResourceProfileBuilder()
val taskReqs = new TaskResourceRequests().cpus(1)
val rp = rpBuilder.require(taskReqs).build
rdd.withResources(rp)
```

Returned info:

```
org.apache.spark.SparkException: TaskResourceProfiles are only supported for Standalone cluster for now when dynamic allocation is disabled.
  at org.apache.spark.resource.ResourceProfileManager.isSupported(ResourceProfileManager.scala:71)
  at org.apache.spark.resource.ResourceProfileManager.addResourceProfile(ResourceProfileManager.scala:126)
  at org.apache.spark.rdd.RDD.withResources(RDD.scala:1829)
  ... 48 elided
```

According to the documentation, does '**specify different task resource requirements**' mean setting `TaskResourceRequests`?

'When dynamic allocation is disabled: It allows users to specify different task resource requirements at the stage level and will use the same executors requested at startup.'

Is there a problem with my understanding? Can you help me answer this? Thank you.
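For context on the rejection, the exception in the stack trace comes from `ResourceProfileManager.isSupported`. Below is a rough Python paraphrase of that check; the function signature, parameter names, and the master-URL test are illustrative assumptions for this sketch, not Spark's actual code.

```python
# Rough paraphrase of the check behind the SparkException above
# (ResourceProfileManager.isSupported). Names and the master-URL test
# are assumptions for illustration only.

def is_supported(master: str, dynamic_allocation: bool, is_task_only_profile: bool) -> bool:
    if is_task_only_profile and not dynamic_allocation:
        # Per the error message, task-only resource profiles with dynamic
        # allocation disabled are accepted only on Standalone masters.
        return master.startswith("spark://")
    return True

# YARN with dynamic allocation off => rejected, matching the report above:
assert not is_supported("yarn", dynamic_allocation=False, is_task_only_profile=True)
# Standalone master => accepted:
assert is_supported("spark://host:7077", dynamic_allocation=False, is_task_only_profile=True)
```

Whether YARN/K8s are accepted depends on the Spark version in use (SPARK-45495 extended support), which may explain the disagreement above.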
Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]
beliefer commented on PR #46437: URL: https://github.com/apache/spark/pull/46437#issuecomment-2105511312

LGTM except @cloud-fan's comment.
Re: [PR] [SPARK-48172][SQL] Fix escaping issues in JDBC Dialects [spark]
beliefer commented on code in PR #46437: URL: https://github.com/apache/spark/pull/46437#discussion_r1597343529

## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:

```java
@@ -65,7 +68,6 @@ protected String escapeSpecialCharsForLikePattern(String str) {
       switch (c) {
         case '_' -> builder.append("\\_");
         case '%' -> builder.append("\\%");
-        case '\'' -> builder.append("\\\'");
```

Review Comment: Got it.
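For illustration, the post-change escaping can be sketched in Python. This is a hedged paraphrase covering only the two cases visible in the diff above; the real Java method may handle additional characters.

```python
def escape_special_chars_for_like_pattern(s: str) -> str:
    """Sketch of escapeSpecialCharsForLikePattern after this PR:
    LIKE wildcards are escaped, single quotes no longer are."""
    out = []
    for c in s:
        if c == "_":
            out.append("\\_")
        elif c == "%":
            out.append("\\%")
        else:
            # the '\'' case removed by this PR: quotes now pass through,
            # leaving quote handling to string-literal escaping
            out.append(c)
    return "".join(out)

assert escape_special_chars_for_like_pattern("a_b%c") == "a\\_b\\%c"
assert escape_special_chars_for_like_pattern("it's") == "it's"
```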
[PR] [Only Test] test jaxb-runtime [spark]
panbingkun opened a new pull request, #46533: URL: https://github.com/apache/spark/pull/46533

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script [spark]
dongjoon-hyun closed pull request #46531: [SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script URL: https://github.com/apache/spark/pull/46531
Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105497685

I have narrowed the impact of not using the `local maven repo cache` down to the `MIMA test` step of the `lint` job.
Re: [PR] [SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() [spark]
ianmcook commented on code in PR #46529: URL: https://github.com/apache/spark/pull/46529#discussion_r1597327784

## python/pyspark/sql/pandas/conversion.py:

```
@@ -360,42 +361,52 @@ def createDataFrame( # type: ignore[misc]
     assert isinstance(self, SparkSession)
```

Review Comment: The git diff makes it difficult to see what I did here. I moved a few lines around at the top of this function and added conditional blocks below:

```py
if type(data).__name__ == "Table":
    # new code here to handle PyArrow Table
else:
    # existing code here to handle PandasDataFrameLike
```
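As an aside, branching on `type(data).__name__` rather than `isinstance(data, pa.Table)` presumably avoids importing pyarrow unless such an object is actually passed. A minimal, self-contained sketch of that dispatch pattern (the `Table` stand-in class and the return values are hypothetical):

```python
# Sketch of name-based dispatch: pyarrow need not be importable
# for the pandas path to work. Names here are illustrative.

class Table:  # stand-in for pyarrow.Table, which may not be installed
    pass

def create_dataframe(data):
    if type(data).__name__ == "Table":
        return "arrow-path"    # would convert the PyArrow Table
    return "pandas-path"       # existing PandasDataFrameLike handling

assert create_dataframe(Table()) == "arrow-path"
assert create_dataframe(object()) == "pandas-path"
```

One trade-off of the name test is that any class named `Table` matches; a lazy `import pyarrow` followed by an `isinstance` check inside the branch would make the dispatch stricter.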
Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105493606

cc @LuciferYang @dongjoon-hyun
Re: [PR] [SPARK-48237][BUILD] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]
panbingkun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105481077

> Please file a JIRA issue, @panbingkun .

Done, thanks!
Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]
AngersZh commented on PR #46523: URL: https://github.com/apache/spark/pull/46523#issuecomment-2105466110

> Do you think you can provide test cases for this, @AngersZh ?

Added a new UT to show the difference, please take a look again @dongjoon-hyun
Re: [PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]
dongjoon-hyun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105466841

Please file a JIRA issue, @panbingkun .
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2105463595

Thank you, @beliefer
Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]
pan3793 commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105463104

@dongjoon-hyun Okay, as this fails the CI, we should revert the dependency removal first and investigate more later. As for supporting "legacy Hive UDF jars": if a user's code imports classes that come from Hive's transitive deps, I think Spark is not responsible for providing them. Spark only includes part of Hive's deps (I mean all the jars shipped in the Hive binary tgz). For example, Hive ships `groovy-all-2.4.4.jar` but Spark does not, so if a user's UDF imports classes from `groovy-all-2.4.4.jar`, it will fail on Spark with ClassNotFound, and in that case the user should add the third-party deps themselves. I believe jodd, commons-lang 2.x, and jackson 1.x are in the same position.
Re: [PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]
panbingkun commented on PR #46532: URL: https://github.com/apache/spark/pull/46532#issuecomment-2105461805

In the `mima` step:
1. with local maven repo cache:
2. without local maven repo cache:
[PR] [TEST Only] test sbt 1.10.0 performance without local maven repo cache [spark]
panbingkun opened a new pull request, #46532: URL: https://github.com/apache/spark/pull/46532

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
beliefer commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2105457208

Let me rebase again.
Re: [PR] [SPARK-48155][SQL] AQEPropagateEmptyRelation for join should check if remain child is just BroadcastQueryStageExec [spark]
AngersZh commented on PR #46523: URL: https://github.com/apache/spark/pull/46523#issuecomment-2105441711

> Do you think you can provide test cases for this, @AngersZh ?

`SPARK-39551: Invalid plan check - invalid broadcast query stage` can cover this. I don't know whether we need to remove the `ValidateSparkPlan` rule; it's too weird and rough.
Re: [PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]
panbingkun commented on PR #46531: URL: https://github.com/apache/spark/pull/46531#issuecomment-2105431329

cc @dongjoon-hyun
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597322128

## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:

```scala
@@ -292,6 +301,7 @@ object LogKeys {
   case object LOADED_VERSION extends LogKey
   case object LOAD_FACTOR extends LogKey
   case object LOAD_TIME extends LogKey
+  case object LOCAL_DIRS extends LogKey
```

Review Comment: Hmm..., the following uses it; should we merge it into `PATH`?

https://github.com/apache/spark/assets/15246973/c62dd763-969b-46c3-b4be-8b6eb84d4395
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
srielau commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1597322015

## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala:

```scala
@@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder(
       TableCapabilityCheck +:
       CommandCheck +:
       CollationCheck +:
+      ViewSyncSchemaToMetaStore +:
```

Review Comment: There already (?)
[PR] [MINOR] After executing `test-dependencies.sh`, the dir `dev/pr-deps` should be deleted [spark]
panbingkun opened a new pull request, #46531: URL: https://github.com/apache/spark/pull/46531

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

### Was this patch authored or co-authored using generative AI tooling?
No.
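A common shell idiom for this kind of cleanup is an EXIT trap, sketched below. This is only an illustration under assumed paths; the actual PR may simply delete the directory explicitly at the end of `test-dependencies.sh`.

```shell
#!/usr/bin/env bash
# Illustrative sketch: guarantee a scratch dir such as dev/pr-deps is
# removed however the script exits (success, error under set -e, etc.).
set -euo pipefail

PR_DEPS_DIR="$(mktemp -d)/pr-deps"   # stand-in path for dev/pr-deps
mkdir -p "$PR_DEPS_DIR"
trap 'rm -rf "$PR_DEPS_DIR"' EXIT    # fires on normal exit and on failures

# ... dependency-manifest generation and diffing would happen here ...
echo "checked dependencies using $PR_DEPS_DIR"
```

The trap variant removes the directory even when a manifest diff fails partway through, whereas a trailing `rm -rf` only runs on the success path.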
Re: [PR] [SPARK-48232][PYTHON][TESTS] Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build [spark]
HyukjinKwon commented on PR #46522: URL: https://github.com/apache/spark/pull/46522#issuecomment-2105404809

It's recovered: https://github.com/apache/spark/actions/runs/9036829724 for the record.
Re: [PR] [WIP] docs: restructure the docs index page [spark]
github-actions[bot] closed pull request #44812: [WIP] docs: restructure the docs index page URL: https://github.com/apache/spark/pull/44812
Re: [PR] [SPARK-45708][BUILD] Retry mvn deploy [spark]
github-actions[bot] closed pull request #43559: [SPARK-45708][BUILD] Retry mvn deploy URL: https://github.com/apache/spark/pull/43559
Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]
github-actions[bot] closed pull request #44022: [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source URL: https://github.com/apache/spark/pull/44022
Re: [PR] [SPARK-46912] Use worker JAVA_HOME and SPARK_HOME instead of from submitter [spark]
github-actions[bot] commented on PR #44943: URL: https://github.com/apache/spark/pull/44943#issuecomment-2105402855

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597304010

## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala:

```scala
@@ -292,6 +301,7 @@ object LogKeys {
   case object LOADED_VERSION extends LogKey
   case object LOAD_FACTOR extends LogKey
   case object LOAD_TIME extends LogKey
+  case object LOCAL_DIRS extends LogKey
```

Review Comment: we don't need this anymore
Re: [PR] [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API [spark]
dongjoon-hyun commented on PR #46530: URL: https://github.com/apache/spark/pull/46530#issuecomment-2105381256

All compilations passed. Merged to master for Apache Spark 4.0.0.
Re: [PR] [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API [spark]
dongjoon-hyun closed pull request #46530: [SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API URL: https://github.com/apache/spark/pull/46530
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on PR #46493: URL: https://github.com/apache/spark/pull/46493#issuecomment-2105375831

> LGTM overall. Thanks for the work!

Updated, thanks! ❤️
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
panbingkun commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597295650

## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java:

```java
@@ -214,7 +219,9 @@ public ManagedBuffer getRddBlockData(
    * this method.
    */
   public void applicationRemoved(String appId, boolean cleanupLocalDirs) {
-    logger.info("Application {} removed, cleanupLocalDirs = {}", appId, cleanupLocalDirs);
+    logger.info("Application {} removed, cleanupLocalDirs = {}",
+      MDC.of(LogKeys.APP_ID$.MODULE$, appId),
+      MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, cleanupLocalDirs));
```

Review Comment: Updated
Re: [PR] [SPARK-48205][FOLLOWUP] Add missing tags for the dataSource API [spark]
allisonwang-db commented on PR #46530: URL: https://github.com/apache/spark/pull/46530#issuecomment-2105373242

cc @dongjoon-hyun @HyukjinKwon
[PR] [SPARK-48205][FOLLOWUP] Add missing tags for the dataSource API [spark]
allisonwang-db opened a new pull request, #46530: URL: https://github.com/apache/spark/pull/46530

### What changes were proposed in this pull request?
This is a follow-up PR for https://github.com/apache/spark/pull/46487 to add missing tags for the `dataSource` API.

### Why are the changes needed?
To address comments from a previous PR.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105356804 Merged to master for Apache Spark 4.0.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48205][PYTHON] Remove the private[sql] modifier for Python data sources [spark]
allisonwang-db commented on code in PR #46487: URL: https://github.com/apache/spark/pull/46487#discussion_r1597284865 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -234,7 +234,7 @@ class SparkSession private( /** * A collection of methods for registering user-defined data sources. */ - private[sql] def dataSource: DataSourceRegistration = sessionState.dataSourceRegistration Review Comment: Will do! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun closed pull request #46528: [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars URL: https://github.com/apache/spark/pull/46528 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]
dongjoon-hyun commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105347143 For existing Hive UDFs that assume the `jodd` library, this could be a breaking change like #46528. ``` - import jodd.datetime.JDateTime; + import org.apache.hadoop.hive.ql.io.parquet.timestamp.datetime.JDateTime; ``` Sorry, but let me revert this first. We need a more graceful removal process, or an additional verified migration path (documentation and config). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
gengliangwang commented on PR #46493: URL: https://github.com/apache/spark/pull/46493#issuecomment-2105340839 LGTM overall. Thanks for the work! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105335928 Thank you, @viirya ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun commented on code in PR #46528: URL: https://github.com/apache/spark/pull/46528#discussion_r1597270600 ## pom.xml: ## @@ -192,6 +192,8 @@ 1.17.0 1.26.1 2.16.1 + +2.6 Review Comment: Thank you. I updated the comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
viirya commented on code in PR #46528: URL: https://github.com/apache/spark/pull/46528#discussion_r1597269950 ## pom.xml: ## @@ -192,6 +192,8 @@ 1.17.0 1.26.1 2.16.1 + +2.6 Review Comment: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-48220][PYTHON] Allow passing PyArrow Table to createDataFrame() [spark]
ianmcook opened a new pull request, #46529: URL: https://github.com/apache/spark/pull/46529 ### What changes were proposed in this pull request? This PR adds support for passing PyArrow tables to `createDataFrame()`. ### Why are the changes needed? This seems like a logical next step after the addition of a `toArrow()` DataFrame method in #45481. ### Does this PR introduce _any_ user-facing change? TBD ### How was this patch tested? TBD ### Was this patch authored or co-authored using generative AI tooling? No -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
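Conceptually, accepting a PyArrow Table in `createDataFrame()` means exposing a set of named columns as rows with a schema. The sketch below is a stand-in that uses a plain dict of lists in place of a real `pyarrow.Table`; the function name and shapes are illustrative assumptions, not the PR's implementation:

```python
# Conceptual sketch (not Spark's code): a columnar table is {name: values};
# a DataFrame needs the same data as row tuples plus the column names.
def columnar_to_rows(table):
    """Turn {column_name: [values]} into (rows, schema_names)."""
    names = list(table)
    columns = [table[name] for name in names]
    # zip(*columns) walks the columns in lockstep, yielding one row at a time.
    rows = [tuple(values) for values in zip(*columns)]
    return rows, names

rows, names = columnar_to_rows({"id": [1, 2], "name": ["a", "b"]})
# rows  == [(1, "a"), (2, "b")]
# names == ["id", "name"]
```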
Re: [PR] [SPARK-48230][BUILD] Remove unused `jodd-core` [spark]
dongjoon-hyun commented on PR #46520: URL: https://github.com/apache/spark/pull/46520#issuecomment-2105328073 Hi, @pan3793. It seems that we need to re-evaluate this dependency removal. Please see the following, which is related to `commons-lang:commons-lang`. - #46528 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597257739 ## common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java: ## @@ -363,7 +367,8 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance( return mergeManagerSubClazz.getConstructor(TransportConf.class, File.class) .newInstance(conf, mergeManagerFile); } catch (Exception e) { - defaultLogger.error("Unable to create an instance of {}", mergeManagerImplClassName); + defaultLogger.error("Unable to create an instance of {}", +MDC.of(LogKeys.CLASS_NAME$.MODULE$, mergeManagerImplClassName)); Review Comment: CLASS_NAME looks good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597250327 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -214,7 +219,9 @@ public ManagedBuffer getRddBlockData( * this method. */ public void applicationRemoved(String appId, boolean cleanupLocalDirs) { -logger.info("Application {} removed, cleanupLocalDirs = {}", appId, cleanupLocalDirs); +logger.info("Application {} removed, cleanupLocalDirs = {}", + MDC.of(LogKeys.APP_ID$.MODULE$, appId), + MDC.of(LogKeys.LOCAL_DIRS$.MODULE$, cleanupLocalDirs)); Review Comment: cleanupLocalDirs is a boolean. let's just use `CLEANUP_LOCAL_DIRS` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
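The migration pattern under review, attaching a typed key such as `LogKeys.APP_ID` to each interpolated value via `MDC.of(...)`, can be sketched outside Spark. The `LogKeys`, `mdc`, and `log_info` names below are stand-ins for illustration, not Spark's actual logging API:

```python
# Minimal sketch of keyed (structured) log interpolation: each value carries
# an explicit key, so one call yields both the readable line and a
# machine-searchable key/value map.
from enum import Enum

class LogKeys(Enum):
    APP_ID = "app_id"
    CLEANUP_LOCAL_DIRS = "cleanup_local_dirs"

def mdc(key, value):
    # Analogous in spirit to MDC.of(LogKeys.X$.MODULE$, value).
    return (key, value)

def log_info(template, *mdcs):
    text = template.format(*(value for _, value in mdcs))
    fields = {key.value: value for key, value in mdcs}
    return text, fields

text, fields = log_info(
    "Application {} removed, cleanupLocalDirs = {}",
    mdc(LogKeys.APP_ID, "app-20240510-0001"),
    mdc(LogKeys.CLEANUP_LOCAL_DIRS, True),
)
# text   == "Application app-20240510-0001 removed, cleanupLocalDirs = True"
# fields == {"app_id": "app-20240510-0001", "cleanup_local_dirs": True}
```

This also shows why the key name matters: `fields` is what log-query tooling sees, so a boolean flag reads better under `cleanup_local_dirs` than under a generic `local_dirs` key.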
Re: [PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun commented on PR #46528: URL: https://github.com/apache/spark/pull/46528#issuecomment-2105298144 WDYT, cc @pan3793 , @sunchao , @LuciferYang , @yaooqinn , @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars [spark]
dongjoon-hyun opened a new pull request, #46528: URL: https://github.com/apache/spark/pull/46528 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48209][CORE] Common (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46493: URL: https://github.com/apache/spark/pull/46493#discussion_r1597243236 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java: ## @@ -368,7 +382,8 @@ public int removeBlocks(String appId, String execId, String[] blockIds) { if (file.delete()) { numRemovedBlocks++; } else { -logger.warn("Failed to delete block: " + file.getAbsolutePath()); +logger.warn("Failed to delete block: {}", Review Comment: yes, path is fine -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2105258785 It turns out that Apache Spark is unable to support all legacy Hive UDF jar files. Let me make a follow-up. - https://github.com/apache/hive/pull/4892 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on PR #46468: URL: https://github.com/apache/spark/pull/46468#issuecomment-2105239656 I locally verified that the failure of `HiveUDFDynamicLoadSuite` is consistent. ``` $ build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite test - Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF 13:33:42.810 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55031/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version. *** RUN ABORTED *** A needed class was not found. This could be due to an error in your runpath. Missing class: org/apache/commons/lang/StringUtils java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117) at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:184) at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:164) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185) ... 
Cause: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43) at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75) at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118) at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117) at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132) at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132) ... ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression [spark]
kelvinjian-db commented on PR #46492: URL: https://github.com/apache/spark/pull/46492#issuecomment-2105225037 fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48229][SQL] Add collation support for inputFile expressions [spark]
amaliujia commented on code in PR #46503: URL: https://github.com/apache/spark/pull/46503#discussion_r1597190775 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1151,6 +1151,23 @@ class CollationSQLExpressionsSuite }) } + test("Support InputFileName expression with collation") { +// Supported collations +Seq("UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI").foreach(collationName => { Review Comment: Stopping by this PR; maybe this was discussed a while ago. Just curious: why are these possible `collationName` values not listed in an enum? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
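The suggestion in the review comment above can be sketched as follows. This is hypothetical; the Spark test under review keeps the collation names as plain string literals in a `Seq`:

```python
# Sketch: modeling the supported collation names as an enum, so test suites
# iterate the members instead of repeating string literals in each test.
from enum import Enum

class Collation(Enum):
    UTF8_BINARY = "UTF8_BINARY"
    UTF8_BINARY_LCASE = "UTF8_BINARY_LCASE"
    UNICODE = "UNICODE"
    UNICODE_CI = "UNICODE_CI"

# Enum members iterate in definition order, so a test loop stays deterministic.
names = [c.value for c in Collation]
# names == ["UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI"]
```

The trade-off is the usual one: an enum gives a single source of truth and typo safety, while plain strings keep each test self-describing.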
Re: [PR] [SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10 [spark]
dongjoon-hyun commented on code in PR #46468: URL: https://github.com/apache/spark/pull/46468#discussion_r1597143409 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -46,7 +46,6 @@ commons-compress/1.26.1//commons-compress-1.26.1.jar commons-crypto/1.1.0//commons-crypto-1.1.0.jar commons-dbcp/1.4//commons-dbcp-1.4.jar commons-io/2.16.1//commons-io-2.16.1.jar -commons-lang/2.6//commons-lang-2.6.jar Review Comment: This seems to be too hasty or to have a bug. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]
dongjoon-hyun commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105092801 Please file a proper JIRA issue, @pan3793 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105051891 also cc the 4.0.0-preview1 release manager @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]
dongjoon-hyun commented on PR #46401: URL: https://github.com/apache/spark/pull/46401#issuecomment-2105050202 Merged to master for Apache Spark 4.0.0. Thank you, @fred-db , @agubichev, @cloud-fan . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]
dongjoon-hyun closed pull request #46401: [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints URL: https://github.com/apache/spark/pull/46401 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 [spark]
pan3793 commented on PR #45154: URL: https://github.com/apache/spark/pull/45154#issuecomment-2105048194 @HiuKwok @dongjoon-hyun I meet an issue on the latest master branch that seems related to this change. The brief reproduction steps are: 1. make dist `dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn` 2. setup yarn/hadoop conf, and perform `spark-sql --master=yarn` ``` 2024-05-10 17:36:32 ERROR SparkContext: Error initializing SparkContext. org.sparkproject.jetty.util.MultiException: Multiple exceptions at org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) ~[scala-library-2.13.13.jar:?] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) ~[scala-library-2.13.13.jar:?] 
at scala.collection.AbstractIterable.foreach(Iterable.scala:935) ~[scala-library-2.13.13.jar:?] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] at org.apache.spark.SparkContext.(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] 
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:405) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:162) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[?:?] at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at
Re: [PR] [SPARK-48146][SQL] Fix aggregate function in With expression child assertion [spark]
kelvinjian-db commented on code in PR #46443: URL: https://github.com/apache/spark/pull/46443#discussion_r1597045459 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/With.scala: ## @@ -92,6 +95,21 @@ object With { val commonExprRefs = commonExprDefs.map(new CommonExpressionRef(_)) With(replaced(commonExprRefs), commonExprDefs) } + + private[sql] def childContainsUnsupportedAggExpr(withExpr: With): Boolean = { +lazy val commonExprIds = withExpr.defs.map(_.id).toSet +withExpr.child.exists { + case agg: AggregateExpression => Review Comment: i agree, or we could add some pruning-based tree traversal functions like `existsWithPruning` (similar to `transformWithPruning`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48206][SQL][TESTS] Add tests for window rewrites with RewriteWithExpression [spark]
cloud-fan commented on PR #46492: URL: https://github.com/apache/spark/pull/46492#issuecomment-2105006573 sorry it has conflicts now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1597016630 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: Please open a new JIRA and PR to discuss this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1597015882 ## dev/test-dependencies.sh: ## @@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \ --non-recursive \ org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+') # dependency:get for guava and jetty-io are workaround for SPARK-37302. -GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9.]+$") +GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9\.]+") Review Comment: Please open a new PR for this, @pan3793. It's worth having an independent JIRA issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
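The regex change in the `dev/test-dependencies.sh` diff above can be demonstrated in isolation. The sketch below uses Python's `re` in place of `grep -E` (an assumption for illustration): `grep -E` prints the entire matching line, so dropping the trailing `$` anchor is what lets a `-jre`-suffixed Guava version like `33.1.0-jre` survive the filter:

```python
# Why the anchored pattern had to change once guava.version became "33.1.0-jre".
import re

# Old pattern: the whole line must be digits and dots, so "33.1.0-jre" is rejected.
old = re.compile(r"^[0-9.]+$")
# New pattern: only the line's prefix must be digits and dots, so the full
# "33.1.0-jre" line passes (grep then prints the whole line, suffix included).
new = re.compile(r"^[0-9.]+")

versions = ("14.0.1", "33.1.0-jre")
matches_old = [v for v in versions if old.search(v)]
matches_new = [v for v in versions if new.search(v)]
# matches_old == ["14.0.1"]
# matches_new == ["14.0.1", "33.1.0-jre"]
```

Note that inside a character class `[0-9\.]` and `[0-9.]` are equivalent; the behavioral change comes entirely from removing the `$`.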
Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]
dongjoon-hyun commented on PR #43579: URL: https://github.com/apache/spark/pull/43579#issuecomment-2104947313 cc @mridulm , too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596997363 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -106,7 +106,7 @@ class ClientSuite extends SparkFunSuite val conf = new Configuration() val sparkConf = new SparkConf() .set(SPARK_JARS, Seq(SPARK)) - .set(USER_CLASS_PATH_FIRST, true) + .set(DRIVER_USER_CLASS_PATH_FIRST, true) Review Comment: Please make a new test case for `DRIVER_USER_CLASS_PATH_FIRST`.
Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596992913 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1490,7 +1490,7 @@ private[spark] object Client extends Logging { * Populate the classpath entry in the given environment map. * * User jars are generally not added to the JVM's system classpath; those are handled by the AM - * and executor backend. When the deprecated `spark.yarn.user.classpath.first` is used, user jars + * and executor backend. When the `spark.driver.userClassPathFirst` is used, user jars Review Comment: Is the new comment correct? It seems we miss the `spark.executor.userClassPathFirst` case.
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596994966 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: One additional question: should we remove `${hadoop.deps.scope}`?
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596992306 ## assembly/pom.xml: ## @@ -75,11 +75,7 @@ ${project.version} - + Review Comment: The existing comments are outdated; Hadoop has already switched to `hadoop-shaded-guava-1.2.0.jar`.
Re: [PR] [SPARK-45717][YARN] Avoid use `spark.yarn.user.classpath.first` [spark]
dongjoon-hyun commented on code in PR #43579: URL: https://github.com/apache/spark/pull/43579#discussion_r1596991319 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -106,7 +106,7 @@ class ClientSuite extends SparkFunSuite val conf = new Configuration() val sparkConf = new SparkConf() .set(SPARK_JARS, Seq(SPARK)) - .set(USER_CLASS_PATH_FIRST, true) + .set(DRIVER_USER_CLASS_PATH_FIRST, true) Review Comment: To @cxzl25, FYI: usually, Apache Spark is reluctant to change test coverage, because the deprecated config is still used in production.
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596990662 ## pom.xml: ## @@ -3446,6 +3446,7 @@ org.spark-project.spark:unused com.google.guava:guava + com.google.guava:failureaccess Review Comment: I referred to the shading rules in the connect module to handle Guava dependencies. cc @LuciferYang, it would be great if you have an opportunity to take a look.
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596988859 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala: ## @@ -341,7 +341,7 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { Seq( (Period.ofMonths(2), Int.MaxValue) -> "overflow", - (Period.ofMonths(Int.MinValue), 10d) -> "not in range", + (Period.ofMonths(Int.MinValue), 10d) -> "out of range", Review Comment: Yes
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 33.1.0-jre [spark]
pan3793 commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596986722 ## dev/test-dependencies.sh: ## @@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \ --non-recursive \ org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+') # dependency:get for guava and jetty-io are workaround for SPARK-37302. -GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9.]+$") +GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9\.]+") Review Comment: Updated because the new Guava versions have a `-jre` or `-android` suffix.
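For readers following the regex change: here is a minimal sketch of the matching difference, using `java.util.regex` in place of `grep -E` (the version string is an illustrative example):

```java
import java.util.regex.Pattern;

public class GuavaVersionRegex {
    public static void main(String[] args) {
        String version = "33.1.0-jre"; // newer Guava versions carry a -jre suffix

        // Old pattern: fully numeric and anchored at both ends, so "-jre" makes it fail.
        boolean oldMatches = Pattern.compile("^[0-9.]+$").matcher(version).find();

        // New pattern: no trailing anchor, so the numeric prefix "33.1.0" matches.
        boolean newMatches = Pattern.compile("^[0-9\\.]+").matcher(version).find();

        System.out.println(oldMatches + " " + newMatches); // prints "false true"
    }
}
```

Note that `grep -E` without `-o` prints the entire matching line, so with the new pattern `GUAVA_VERSION` would be the full `33.1.0-jre` string, not just the numeric prefix.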
Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]
cloud-fan closed pull request #46500: [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser URL: https://github.com/apache/spark/pull/46500
Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]
cloud-fan commented on PR #46500: URL: https://github.com/apache/spark/pull/46500#issuecomment-2104909079 thanks, merging to master!
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2104900725 I resolved the conflicts for you~
Re: [PR] [SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints [spark]
fred-db commented on PR #46401: URL: https://github.com/apache/spark/pull/46401#issuecomment-2104893008 Thanks @dongjoon-hyun ! :) Rebased the code.
Re: [PR] [SPARK-47716][SQL] Avoid view name conflict in SQLQueryTestSuite semantic sort test case [spark]
dongjoon-hyun commented on PR #45855: URL: https://github.com/apache/spark/pull/45855#issuecomment-2104888508 Hi, @jchen5 and all. How do you want to proceed with this PR? I guess https://github.com/apache/spark/pull/45855#discussion_r1550652914 is the latest direction. Could you update the PR in that direction?
Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]
vladimirg-db commented on code in PR #46500: URL: https://github.com/apache/spark/pull/46500#discussion_r1596925973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala: ## @@ -67,16 +67,30 @@ case class PartialResultArrayException( extends Exception(cause) /** - * Exception thrown when the underlying parser meet a bad record and can't parse it. + * Exception thrown when the underlying parser met a bad record and can't parse it. * @param record a function to return the record that cause the parser to fail * @param partialResults a function that returns an row array, which is the partial results of * parsing this bad record. - * @param cause the actual exception about why the record is bad and can't be parsed. + * @param cause the actual exception about why the record is bad and can't be parsed. It's better + * to use `LazyBadRecordCauseWrapper` here to delay heavy cause construction + * until it's needed. */ case class BadRecordException( @transient record: () => UTF8String, @transient partialResults: () => Array[InternalRow] = () => Array.empty[InternalRow], -cause: Throwable) extends Exception(cause) +cause: Throwable) extends Exception(cause) { + override def getStackTrace(): Array[StackTraceElement] = new Array[StackTraceElement](0) Review Comment: Added a doc for that
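As background on why lightweight exceptions help here: capturing the stack trace is the expensive part of throwing on the JVM, and the four-argument `Throwable` constructor can disable that capture entirely. A minimal sketch (the class name is illustrative, not Spark's actual class):

```java
public class ControlFlowException extends RuntimeException {
    ControlFlowException(String message, Throwable cause) {
        // enableSuppression = false, writableStackTrace = false: the JVM skips
        // the costly stack walk when the exception object is constructed.
        super(message, cause, false, false);
    }

    public static void main(String[] args) {
        ControlFlowException e = new ControlFlowException("bad record", null);
        // With writableStackTrace = false, no frames are ever recorded.
        System.out.println(e.getStackTrace().length); // prints 0
    }
}
```

Overriding `getStackTrace()` to return an empty array, as the diff above does, gives a similar read-side effect; the constructor flags additionally avoid the capture cost at throw time.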
Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596901652 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val USE_LOCAL_SESSION_CALENDAR = buildConf("spark.sql.jdbc.useLocalSessionCalendar") Review Comment: let's turn it into a legacy config ``` spark.sql.legacy.jdbc.useNullCalendar ```
Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]
dongjoon-hyun commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2104807458 Could you resolve the conflicts, @beliefer ?
Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596899527 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -499,8 +500,14 @@ object JdbcUtils extends Logging with SQLConfHelper { } case TimestampType => + val calendar = if (conf.useLocalSessionCalendar) { +dialect.getDatabaseCalendar.get Review Comment: ```suggestion dialect.getDatabaseCalendar.orNull ```
Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596898967 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val USE_LOCAL_SESSION_CALENDAR = buildConf("spark.sql.jdbc.useLocalSessionCalendar") +.doc("When retrieving timestamp NTZ columns from JDBC data source, user will see the " + + "values as they where shown in data source only if the JVM time zone is equal " + + "to the spark local session time zone. Since these two can differ, if this " + + "configuration property is set to true, rs.getTimestamp and rs.setTimestamp " + + "API calls will use calendar with spark local session time zone as the second argument") +.version("4.0.0") +.booleanConf Review Comment: let's add `.internal`. Users should not set this. It's only for unknown bugs.
Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596897736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val USE_LOCAL_SESSION_CALENDAR = buildConf("spark.sql.jdbc.useLocalSessionCalendar") +.doc("When retrieving timestamp NTZ columns from JDBC data source, user will see the " + Review Comment: ```suggestion .doc("When read timestamp NTZ columns from JDBC data source as Spark timestamp LTZ, user will see the " + ```
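For context on what the `Calendar` argument changes: JDBC's `ResultSet.getTimestamp(col, cal)` uses the calendar's time zone to decide which instant a stored wall-clock value denotes. No database connection is needed to see the effect; this sketch (the time zones and the `parse` helper are illustrative) shows the same wall-clock string mapping to different instants:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

public class SessionCalendarDemo {
    // Parse a wall-clock string as if it belonged to the given time zone,
    // mimicking what a JDBC driver does with the Calendar argument.
    static long parse(String s, String zone) throws Exception {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setCalendar(Calendar.getInstance(TimeZone.getTimeZone(zone)));
        return fmt.parse(s).getTime();
    }

    public static void main(String[] args) throws Exception {
        String wallClock = "2024-05-10 12:00:00";
        long asUtc = parse(wallClock, "UTC");
        long asTokyo = parse(wallClock, "Asia/Tokyo");
        // Noon in Tokyo (UTC+9) is 9 hours earlier, as an instant, than noon in UTC.
        System.out.println((asUtc - asTokyo) / 3600000); // prints 9
    }
}
```

This is why passing a calendar built from the Spark session time zone, versus `null` (the JVM default zone), changes the timestamps users read back.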
Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1596897091 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( } } + /** + * Utility class for collation aware Levenshtein function. + */ + public static class Levenshtein{ + +/** + * Implementation of SubstringEquals interface for collation aware comparison of two substrings. + */ +private static class CollationSubstringEquals implements UTF8String.SubstringEquals { + private final int collationId; + private final UTF8String left, right; + + CollationSubstringEquals(int collationId) { +this.collationId = collationId; +this.left = new UTF8String(); +this.right = new UTF8String(); + } + + @Override + public boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight, +int lenLeft, int lenRight) { +this.left.moveAddress(left, posLeft, lenLeft); +this.right.moveAddress(right, posRight, lenRight); +return CollationFactory.fetchCollation(collationId).equalsFunction +.apply(this.left, this.right); + } +} + +public static Integer exec(final UTF8String left, final UTF8String right, final int collationId){ + CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId); + + if (collation.supportsBinaryEquality){ +return left.levenshteinDistance(right); + } + else{ +return left.levenshteinDistance(right, new CollationSubstringEquals(collationId)); + } +} + +public static Integer execWithThreshold(final UTF8String left, final UTF8String right, final int threshold, final int collationId){ + CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId); + + if (collation.supportsBinaryEquality){ +return left.levenshteinDistance(right, threshold); + } + else{ +return left.levenshteinDistance(right, threshold, new CollationSubstringEquals(collationId)); + } Review Comment: This should be done in other places as well 
(both threshold & non-threshold Levenshtein).
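To make the collation-aware idea concrete: the textbook dynamic-programming Levenshtein algorithm only needs an equality test, so a collation can be injected as a predicate. This is a simplified per-character sketch (Spark's actual implementation, quoted above, compares substrings via a `SubstringEquals` interface instead):

```java
import java.util.function.BiPredicate;

public class CollationLevenshtein {
    // Two-row DP Levenshtein with a pluggable equality test: a collation can
    // declare two different code points "equal" (e.g. case-insensitively).
    static int distance(String a, String b, BiPredicate<Character, Character> eq) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (eq.test(a.charAt(i - 1), b.charAt(j - 1)) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] t = prev; prev = cur; cur = t;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Binary equality: 'S' != 's', so the distance is 1.
        System.out.println(distance("Spark", "spark", (x, y) -> x.charValue() == y.charValue()));
        // A case-insensitive "collation" treats them as equal: distance 0.
        System.out.println(distance("Spark", "spark",
            (x, y) -> Character.toLowerCase(x) == Character.toLowerCase(y)));
    }
}
```

The threshold variant discussed in the review follows the same shape, with an early exit once every cell in a row exceeds the threshold.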
Re: [PR] [SPARK-47424][SQL] Add getDatabaseCalendar method to the JdbcDialect [spark]
cloud-fan commented on code in PR #45537: URL: https://github.com/apache/spark/pull/45537#discussion_r1596897400 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3992,6 +3992,16 @@ object SQLConf { .booleanConf .createWithDefault(false) + val USE_LOCAL_SESSION_CALENDAR = buildConf("spark.sql.jdbc.useLocalSessionCalendar") +.doc("When retrieving timestamp NTZ columns from JDBC data source, user will see the " + + "values as they where shown in data source only if the JVM time zone is equal " + Review Comment: ```suggestion "values as they were shown in data source only if the JVM time zone is equal " + ```
Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]
dongjoon-hyun closed pull request #46481: [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. URL: https://github.com/apache/spark/pull/46481
Re: [PR] [SPARK-47415][SQL] Collation support: Levenshtein [spark]
nikolamand-db commented on code in PR #45963: URL: https://github.com/apache/spark/pull/45963#discussion_r1596887611 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( } } + /** + * Utility class for collation aware Levenshtein function. + */ + public static class Levenshtein{ Review Comment: ```suggestion public static class Levenshtein { ``` ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( } } + /** + * Utility class for collation aware Levenshtein function. + */ + public static class Levenshtein{ + +/** + * Implementation of SubstringEquals interface for collation aware comparison of two substrings. + */ +private static class CollationSubstringEquals implements UTF8String.SubstringEquals { + private final int collationId; + private final UTF8String left, right; + + CollationSubstringEquals(int collationId) { +this.collationId = collationId; +this.left = new UTF8String(); +this.right = new UTF8String(); + } + + @Override + public boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight, +int lenLeft, int lenRight) { +this.left.moveAddress(left, posLeft, lenLeft); +this.right.moveAddress(right, posRight, lenRight); +return CollationFactory.fetchCollation(collationId).equalsFunction +.apply(this.left, this.right); + } +} + +public static Integer exec(final UTF8String left, final UTF8String right, final int collationId){ + CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId); + + if (collation.supportsBinaryEquality){ +return left.levenshteinDistance(right); + } + else{ Review Comment: ```suggestion else { ``` ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( } } + /** + * Utility class for collation aware 
Levenshtein function. + */ + public static class Levenshtein{ + +/** + * Implementation of SubstringEquals interface for collation aware comparison of two substrings. + */ +private static class CollationSubstringEquals implements UTF8String.SubstringEquals { + private final int collationId; + private final UTF8String left, right; + + CollationSubstringEquals(int collationId) { +this.collationId = collationId; +this.left = new UTF8String(); +this.right = new UTF8String(); + } + + @Override + public boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight, +int lenLeft, int lenRight) { +this.left.moveAddress(left, posLeft, lenLeft); +this.right.moveAddress(right, posRight, lenRight); +return CollationFactory.fetchCollation(collationId).equalsFunction +.apply(this.left, this.right); + } +} + +public static Integer exec(final UTF8String left, final UTF8String right, final int collationId){ + CollationFactory.Collation collation = CollationFactory.fetchCollation(collationId); + + if (collation.supportsBinaryEquality){ Review Comment: ```suggestion if (collation.supportsBinaryEquality) { ``` ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -722,6 +722,65 @@ public static UTF8String execLowercase( } } + /** + * Utility class for collation aware Levenshtein function. + */ + public static class Levenshtein{ + +/** + * Implementation of SubstringEquals interface for collation aware comparison of two substrings. 
+ */ +private static class CollationSubstringEquals implements UTF8String.SubstringEquals { + private final int collationId; + private final UTF8String left, right; + + CollationSubstringEquals(int collationId) { +this.collationId = collationId; +this.left = new UTF8String(); +this.right = new UTF8String(); + } + + @Override + public boolean equals(UTF8String left, UTF8String right, int posLeft, int posRight, +int lenLeft, int lenRight) { +this.left.moveAddress(left, posLeft, lenLeft); +this.right.moveAddress(right, posRight, lenRight); +return CollationFactory.fetchCollation(collationId).equalsFunction +.apply(this.left, this.right); + } +} + +public static Integer exec(final UTF8String left, final UTF8String right, final int collationId){ + CollationFactory.Collation collation =
Re: [PR] [SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once. [spark]
dongjoon-hyun commented on PR #46481: URL: https://github.com/apache/spark/pull/46481#issuecomment-2104801936 Let me bring this in first for further monitoring. Thank you, @chaoqin-li1123 and @allisonwang-db . Merged to master.
Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]
dongjoon-hyun closed pull request #45565: [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI URL: https://github.com/apache/spark/pull/45565
Re: [PR] [SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI [spark]
dongjoon-hyun commented on PR #45565: URL: https://github.com/apache/spark/pull/45565#issuecomment-2104797350 Thank you, @tgravescs !
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596850227 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala: ## @@ -35,6 +35,56 @@ object EliminateView extends Rule[LogicalPlan] with CastSupport { } } +/** + * ViewBindingMode is used to specify the expected schema binding mode when we want to create or + * replace a view in [[CreateViewStatement]]. + */ +sealed trait ViewSchemaMode { + override val toString: String = getClass.getSimpleName.stripSuffix("$") +} + +/** + * SchemaBinding means the view only tolerates minimal changes to the underlying schema. + * It can tolerate extra columns in SELECT * and upcast to more generic types. + */ +object SchemaBinding extends ViewSchemaMode { + override val toString: String = "BINDING" Review Comment: ```suggestion override def toString: String = "BINDING" ``` since the function body is just a string literal
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596873275 ## sql/core/src/test/resources/sql-tests/results/view-schema-binding-config.sql.out: ## @@ -0,0 +1,794 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +SET spark.sql.viewSchemaBindingMode +-- !query schema +struct +-- !query output +spark.sql.viewSchemaBindingModeCOMPENSATION + + +-- !query +SET spark.sql.viewSchemaBindingMode = EVOLUTION +-- !query schema +struct<> +-- !query output +java.lang.IllegalArgumentException +The value of spark.sql.viewSchemaBindingMode should be one of COMPENSATION, DISABLED, but was EVOLUTION + + +-- !query +SET spark.sql.viewSchemaBindingMode = DISABLED +-- !query schema +struct +-- !query output +spark.sql.viewSchemaBindingModeDISABLED + + +-- !query +CREATE OR REPLACE VIEW v WITH SCHEMA BINDING AS SELECT 1 +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.catalyst.parser.ParseException +{ + "errorClass" : "FEATURE_NOT_ENABLED", + "sqlState" : "56038", + "messageParameters" : { +"configKey" : "spark.sql.viewSchemaBindingMode", +"configValue" : "COMPENSATION", +"featureName" : "VIEW ... WITH SCHEMA ..." + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 26, +"stopIndex" : 44, +"fragment" : "WITH SCHEMA BINDING" + } ] +} + + +-- !query +CREATE OR REPLACE VIEW v WITH SCHEMA COMPENSATION AS SELECT 1 +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.catalyst.parser.ParseException +{ + "errorClass" : "FEATURE_NOT_ENABLED", + "sqlState" : "56038", + "messageParameters" : { +"configKey" : "spark.sql.viewSchemaBindingMode", +"configValue" : "COMPENSATION", +"featureName" : "VIEW ... WITH SCHEMA ..." 
+ }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 26, +"stopIndex" : 49, +"fragment" : "WITH SCHEMA COMPENSATION" + } ] +} + + +-- !query +CREATE OR REPLACE VIEW v WITH SCHEMA TYPE EVOLUTION AS SELECT 1 +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.catalyst.parser.ParseException +{ + "errorClass" : "FEATURE_NOT_ENABLED", + "sqlState" : "56038", + "messageParameters" : { +"configKey" : "spark.sql.viewSchemaBindingMode", +"configValue" : "COMPENSATION", +"featureName" : "VIEW ... WITH SCHEMA ..." + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 26, +"stopIndex" : 51, +"fragment" : "WITH SCHEMA TYPE EVOLUTION" + } ] +} + + +-- !query +CREATE OR REPLACE VIEW v WITH SCHEMA EVOLUTION AS SELECT 1 +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.catalyst.parser.ParseException +{ + "errorClass" : "FEATURE_NOT_ENABLED", + "sqlState" : "56038", + "messageParameters" : { +"configKey" : "spark.sql.viewSchemaBindingMode", +"configValue" : "COMPENSATION", +"featureName" : "VIEW ... WITH SCHEMA ..." + }, + "queryContext" : [ { +"objectType" : "", +"objectName" : "", +"startIndex" : 26, +"stopIndex" : 46, +"fragment" : "WITH SCHEMA EVOLUTION" + } ] +} + + +-- !query +CREATE OR REPLACE VIEW v AS SELECT 1 +-- !query schema +struct<> +-- !query output + + + +-- !query +DESCRIBE EXTENDED v +-- !query schema +struct +-- !query output +1 int + +# Detailed Table Information +Catalogspark_catalog +Database default +Table v +Created Time [not included in comparison] +Last Access [not included in comparison] +Created By [not included in comparison] +Type VIEW +View Text SELECT 1 +View Original Text SELECT 1 +View Catalog and Namespace spark_catalog.default +View Query Output Columns [1] + + +-- !query +SHOW TABLE EXTENDED LIKE 'v' Review Comment: what extra information we want to get from `SHOW TABLE EXTENDED`? -- This is an automated message from the Apache Git Service. 
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596870581 ## sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala: ## @@ -224,6 +224,7 @@ abstract class BaseSessionStateBuilder( TableCapabilityCheck +: CommandCheck +: CollationCheck +: +ViewSyncSchemaToMetaStore +: Review Comment: we should put this rule in `HiveSessionStateBuilder` as well
Re: [PR] [SPARK-48031][SQL] Support view schema evolution [spark]
cloud-fan commented on code in PR #46267: URL: https://github.com/apache/spark/pull/46267#discussion_r1596869407 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -620,3 +621,65 @@ object CollationCheck extends (LogicalPlan => Unit) { private def isCollationExpression(expression: Expression): Boolean = expression.isInstanceOf[Collation] || expression.isInstanceOf[Collate] } + + +/** + * This rule checks for references to views WITH SCHEMA [TYPE] EVOLUTION and synchronizes the + * catalog if evolution was detected. + * It does so by walking the resolved plan looking for View operators for persisted views. + */ +object ViewSyncSchemaToMetaStore extends (LogicalPlan => Unit) { + def apply(plan: LogicalPlan): Unit = { +plan.foreach { + case View(metaData, false, viewQuery) +if (metaData.viewSchemaMode == SchemaTypeEvolution || + metaData.viewSchemaMode == SchemaEvolution) => +val viewSchemaMode = metaData.viewSchemaMode +val viewFields = metaData.schema.fields +val viewQueryFields = viewQuery.schema.fields +val session = SparkSession.getActiveSession.get +val redoSignature = + viewSchemaMode == SchemaEvolution && viewFields.length != viewQueryFields.length +val fieldNames = viewQuery.schema.fieldNames +val newProperties = if (redoSignature) { + generateViewProperties( +metaData.properties, +session, +fieldNames, +fieldNames, +metaData.viewSchemaMode) +} else { + metaData.properties +} + +val redo = redoSignature || viewFields.zipWithIndex.exists { case (field, index) => + val planField = viewQueryFields(index) + (field.dataType != planField.dataType || +field.nullable != planField.nullable || +(viewSchemaMode == SchemaEvolution && ( Review Comment: if we hit this condition, shall we also redo signature?
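The drift check quoted above can be condensed into a standalone sketch (simplified, with a hypothetical `Field` type; Spark's real rule compares `StructField`s and also regenerates the view properties): fields are compared positionally, and under full SCHEMA EVOLUTION an arity change alone forces the view signature to be redone.

```scala
// Simplified sketch of the drift detection, assuming a hypothetical Field
// type instead of Spark's StructField. Not the PR's actual implementation.
case class Field(name: String, dataType: String, nullable: Boolean)

def needsSync(
    viewFields: Seq[Field],
    queryFields: Seq[Field],
    fullEvolution: Boolean): Boolean = {
  // Under full SCHEMA EVOLUTION, an arity change alone triggers a rewrite.
  val arityChanged = fullEvolution && viewFields.length != queryFields.length
  arityChanged || viewFields.zip(queryFields).exists { case (v, q) =>
    v.dataType != q.dataType ||
      v.nullable != q.nullable ||
      (fullEvolution && v.name != q.name) // names may evolve too
  }
}
```

For example, a column widening from `int` to `bigint` in the underlying query would be flagged, while an identical schema would not.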
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 32+ [spark]
dongjoon-hyun commented on code in PR #42493: URL: https://github.com/apache/spark/pull/42493#discussion_r1596869988 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/IntervalExpressionsSuite.scala: ## @@ -341,7 +341,7 @@ class IntervalExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { Seq( (Period.ofMonths(2), Int.MaxValue) -> "overflow", - (Period.ofMonths(Int.MinValue), 10d) -> "not in range", + (Period.ofMonths(Int.MinValue), 10d) -> "out of range", Review Comment: So, does this error message originate from Guava?
Re: [PR] [SPARK-44811][BUILD] Upgrade Guava to 32+ [spark]
dongjoon-hyun commented on PR #42493: URL: https://github.com/apache/spark/pull/42493#issuecomment-2104762964 Thank you for updating this, @pan3793.