[GitHub] [spark] panbingkun opened a new pull request, #41721: [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
panbingkun opened a new pull request, #41721: URL: https://github.com/apache/spark/pull/41721

### What changes were proposed in this pull request?
This PR aims to assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] and to delete some unused error classes.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
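The renaming itself is mechanical: an entry keyed by a temporary name in `error-classes.json` moves to a descriptive key. A minimal Python sketch of that operation, using a hypothetical entry and a hypothetical target name (the actual messages and chosen names live in the PR diff):

```python
import json

# Hypothetical miniature version of error-classes.json; the real file at
# core/src/main/resources/error/error-classes.json is much larger.
error_classes = {
    "_LEGACY_ERROR_TEMP_2279": {
        "message": ["Failed to merge incompatible data types <left> and <right>."]
    }
}

def rename_error_class(classes, old_name, new_name):
    """Move an error class entry to a new, descriptive key."""
    classes = dict(classes)  # leave the input mapping untouched
    classes[new_name] = classes.pop(old_name)
    return classes

# The new name below is illustrative, not necessarily what the PR chose.
renamed = rename_error_class(
    error_classes, "_LEGACY_ERROR_TEMP_2279", "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE"
)
print(json.dumps(renamed, indent=2))
```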
[GitHub] [spark] beliefer commented on pull request #41687: [SPARK-44131][SQL] Add call_function and deprecate call_udf for Scala API
beliefer commented on PR #41687: URL: https://github.com/apache/spark/pull/41687#issuecomment-1605876800

ping @cloud-fan @zhengruifeng cc @HyukjinKwon
[GitHub] [spark] beliefer commented on pull request #41476: [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
beliefer commented on PR #41476: URL: https://github.com/apache/spark/pull/41476#issuecomment-1605875836

ping @MaxGekk Rebased.
[GitHub] [spark] LuciferYang commented on pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
LuciferYang commented on PR #41681: URL: https://github.com/apache/spark/pull/41681#issuecomment-1605873942

https://github.com/apache/arrow/pull/36211/files Arrow has already upgraded Netty to 4.1.94.Final, and this may be released in Arrow 13.0. I'm not sure whether that will be usable here, because it also updates gRPC to 1.56.0, which differs from the gRPC version Spark Connect uses. On the other hand, Netty 4.1.94 fixes a CVE (https://github.com/apache/arrow/pull/36211); I'd like to know whether this CVE affects Spark.
[GitHub] [spark] LuciferYang commented on pull request #41720: [SPARK-43969][SQL][TESTS][FOLLOWUP] Update `numeric.sql.out.java21`
LuciferYang commented on PR #41720: URL: https://github.com/apache/spark/pull/41720#issuecomment-1605868725

cc @dongjoon-hyun FYI
[GitHub] [spark] LuciferYang opened a new pull request, #41720: [SPARK-43969][SQL][TESTS][FOLLOWUP] Update `numeric.sql.out.java21`
LuciferYang opened a new pull request, #41720: URL: https://github.com/apache/spark/pull/41720

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/41458 updated `numeric.sql.out` but did not update `numeric.sql.out.java21`; this PR updates `numeric.sql.out.java21` for Java 21.

### Why are the changes needed?
Fix the golden file for Java 21. https://github.com/apache/spark/actions/runs/5362442727/jobs/9729315685

```
[info] - postgreSQL/numeric.sql *** FAILED *** (1 minute, 4 seconds)
[info]   postgreSQL/numeric.sql
[info]   Expected "...OLUMN_ARITY_MISMATCH[",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "'id', 'id', 'val', 'val', '(val * val)'",
[info]       "reason" : "too many data columns",
[info]       "tableColumns" : "'id1', 'id2', 'result']",
[info]       "tableName" :...", but got "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "`id`, `id`, `val`, `val`, `(val * val)`",
[info]       "tableColumns" : "`id1`, `id2`, `result`]",
[info]       "tableName" :..."
[info]   Result did not match for query #474
[info]   INSERT INTO num_result SELECT t1.id, t2.id, t1.val, t2.val, t1.val * t2.val
[info]   FROM num_data t1, num_data t2 (SQLQueryTestSuite.scala:848)
[info]   org.scalatest.exceptions.TestFailedException:
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked with Java 21
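The golden-file workflow this fix feeds can be sketched roughly as follows; `check_golden` and its `regenerate` flag are illustrative stand-ins (the real suite regenerates expected output when `SPARK_GENERATE_GOLDEN_FILES=1` is set for `SQLQueryTestSuite`):

```python
import os
import pathlib
import tempfile

def check_golden(path, actual, regenerate=False):
    """Compare actual query output against a stored golden file.

    When regenerate is True, rewrite the file instead of comparing,
    mimicking how the real suite refreshes .sql.out files."""
    p = pathlib.Path(path)
    if regenerate or not p.exists():
        p.write_text(actual)
        return True
    return p.read_text() == actual

with tempfile.TemporaryDirectory() as d:
    golden = os.path.join(d, "numeric.sql.out.java21")
    check_golden(golden, "old output", regenerate=True)       # seed a stale file
    stale_matches = check_golden(golden, "new output")        # stale file: mismatch
    check_golden(golden, "new output", regenerate=True)       # regenerate it
    fresh_matches = check_golden(golden, "new output")        # now it matches
```

The PR above is exactly the second step: the `numeric.sql.out` golden file was regenerated, but its Java 21 sibling was not, so the comparison failed on Java 21 runners.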
[GitHub] [spark] beliefer opened a new pull request, #41719: [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
beliefer opened a new pull request, #41719: URL: https://github.com/apache/spark/pull/41719

### What changes were proposed in this pull request?
This PR aims to assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304].

### Why are the changes needed?
Improve the error framework.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Existing test cases updated and new test cases added.
[GitHub] [spark] beliefer commented on a diff in pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Pyth
beliefer commented on code in PR #41718: URL: https://github.com/apache/spark/pull/41718#discussion_r1241015841

## sql/core/src/main/scala/org/apache/spark/sql/functions.scala:

```scala
@@ -6379,6 +6428,32 @@ object functions {

   def to_json(e: Column): Column = to_json(e, Map.empty[String, String])

+  // scalastyle:off line.size.limit
+  /**
+   * Masks the given string value. This can be useful for creating copies of tables with sensitive
+   * information removed.
+   *
+   * @param input string value to mask. Supported types: STRING, VARCHAR, CHAR
+   * @param upperChar character to replace upper-case characters with. Specify NULL to retain original character.
+   * @param lowerChar character to replace lower-case characters with. Specify NULL to retain original character.
+   * @param digitChar character to replace digit characters with. Specify NULL to retain original character.
+   * @param otherChar character to replace all other characters with. Specify NULL to retain original character.
+   *
+   * @group string_funcs
+   * @since 3.5.0
+   */
+  // scalastyle:on line.size.limit
+  def mask(
+      input: Column,
+      upperChar: Column,
+      lowerChar: Column,
+      digitChar: Column,
+      otherChar: Column): Column = {
```

Review Comment: Please supplement the API with the other constructors of `Mask`.
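For reference, the masking behavior the javadoc describes can be sketched in pure Python; the default replacement characters below ('X', 'x', 'n', and keep-as-is for other characters) follow the SQL `mask` function's documented defaults:

```python
def mask(s, upper="X", lower="x", digit="n", other=None):
    """Pure-Python sketch of the mask semantics: replace upper-case,
    lower-case, and digit characters; None keeps the original character."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(ch if upper is None else upper)
        elif ch.islower():
            out.append(ch if lower is None else lower)
        elif ch.isdigit():
            out.append(ch if digit is None else digit)
        else:
            out.append(ch if other is None else other)
    return "".join(out)

print(mask("AbCD123-@$#"))  # XxXXnnn-@$#
```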
[GitHub] [spark] LuciferYang commented on pull request #41673: [SPARK-44091][YARN][TESTS] Introduce `withResourceTypes` to `ResourceRequestTestHelper` to restore `resourceTypes` as default value after
LuciferYang commented on PR #41673: URL: https://github.com/apache/spark/pull/41673#issuecomment-1605849481

this one is fixed https://github.com/apache/spark/pull/40877#issuecomment-1595959697
[GitHub] [spark] LuciferYang commented on a diff in pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
LuciferYang commented on code in PR #41681: URL: https://github.com/apache/spark/pull/41681#discussion_r1241013608

## pom.xml:

```
@@ -212,7 +212,7 @@
 1.5.0
 1.60
 1.9.0
-4.1.92.Final
+4.1.93.Final
```

Review Comment: I think we should add a comment to inform other developers not to try upgrading to 4.1.94; we need to wait for arrow-memory-netty to be upgraded together.
[GitHub] [spark-docker] dongjoon-hyun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
dongjoon-hyun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605835524

Thank you!
[GitHub] [spark-docker] Yikun closed pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun closed pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles URL: https://github.com/apache/spark-docker/pull/46
[GitHub] [spark-docker] Yikun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605835292

@dongjoon-hyun Thanks, merged.
[GitHub] [spark] LuciferYang commented on pull request #41654: [SPARK-44064][CORE][SQL] Add a new `apply` function to `NonFateSharingCache`
LuciferYang commented on PR #41654: URL: https://github.com/apache/spark/pull/41654#issuecomment-1605830267

> Hi @LuciferYang , thanks for the fix! I'm fine with either option. I rebase the code to make GA test this one again.

@HyukjinKwon seems the author approves of this fix. I am planning to merge this one today, do you think it's ok?
[GitHub] [spark] LuciferYang commented on pull request #41654: [SPARK-44064][CORE][SQL] Add a new `apply` function to `NonFateSharingCache`
LuciferYang commented on PR #41654: URL: https://github.com/apache/spark/pull/41654#issuecomment-1605829615

Thanks @liuzqt
[GitHub] [spark] LuciferYang commented on pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
LuciferYang commented on PR #41718: URL: https://github.com/apache/spark/pull/41718#issuecomment-1605828319

also cc @HyukjinKwon @panbingkun @beliefer FYI
[GitHub] [spark] LuciferYang commented on pull request #41678: [SPARK-44110][BUILD] Propagate proxy settings to forked JVMs
LuciferYang commented on PR #41678: URL: https://github.com/apache/spark/pull/41678#issuecomment-1605828159

Late LGTM
[GitHub] [spark-docker] Yikun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605823664

cc @HyukjinKwon @zhengruifeng @dongjoon-hyun
[GitHub] [spark] bersprockets commented on a diff in pull request #41712: [SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure
bersprockets commented on code in PR #41712: URL: https://github.com/apache/spark/pull/41712#discussion_r1240998450

## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:

```scala
@@ -1685,4 +1685,24 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
     checkAnswer(sql(query), expected)
   }
 }
+
+  test("SPARK-44132: FULL OUTER JOIN by streamed column name fails with NPE") {
```

Review Comment:
> Let me know if you would prefer that I also add/submit it.

No, I think the current test is fine. I just wanted to make sure we were testing the original bug.
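The underlying failure mode, a lazily evaluated sequence whose `map` side effects have not run by the time downstream code needs them, can be illustrated outside Spark. This is only an analogy to the Scala `Stream` issue the PR title describes, not the actual codegen path:

```python
# Side effects performed inside a lazy map are deferred until the sequence
# is actually traversed; code that assumes they already happened can fail
# (in the Spark case, with an NPE during codegen). Materializing the
# sequence up front (the PR's fix, e.g. Stream -> List) forces them to run.
registered = []

def register(name):
    registered.append(name)      # side effect downstream code relies on
    return name.upper()

lazy = map(register, ["id", "name"])     # nothing runs yet
deferred_count = len(registered)         # 0: side effects still pending

materialized = list(map(register, ["id", "name"]))  # forced eagerly
```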
[GitHub] [spark] beliefer commented on pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest
beliefer commented on PR #41443: URL: https://github.com/apache/spark/pull/41443#issuecomment-1605814025

@jdesjean Thank you for the explanation.
[GitHub] [spark-docker] Yikun opened a new pull request, #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun opened a new pull request, #46: URL: https://github.com/apache/spark-docker/pull/46

### What changes were proposed in this pull request?
Add Apache Spark 3.4.1 Dockerfiles.
- Add 3.4.1 GPG key
- Add .github/workflows/build_3.4.1.yaml
- ./add-dockerfiles.sh 3.4.1
- Add version and tag info

### Why are the changes needed?
Apache Spark 3.4.1 released: https://spark.apache.org/releases/spark-release-3-4-0.html

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed
[GitHub] [spark] ulysses-you commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default
ulysses-you commented on PR #40390: URL: https://github.com/apache/spark/pull/40390#issuecomment-1605811101

Thank you @dongjoon-hyun for the reminder. There is an issue (https://github.com/apache/spark/pull/41100) that comes before this PR. I hope both of them can be shipped in Spark 3.5.0.
[GitHub] [spark] ivoson commented on pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
ivoson commented on PR #41718: URL: https://github.com/apache/spark/pull/41718#issuecomment-1605803032

cc @zhengruifeng
[GitHub] [spark] ivoson opened a new pull request, #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
ivoson opened a new pull request, #41718: URL: https://github.com/apache/spark/pull/41718

### What changes were proposed in this pull request?
Add the following functions:
- array_agg
- array_size
- cardinality
- count_min_sketch
- named_struct
- json_array_length
- json_object_keys
- mask

To:
- Scala API
- Python API
- Spark Connect Scala Client
- Spark Connect Python Client

### Why are the changes needed?
Add Scala, Python and Connect API for these SQL functions: array_agg, array_size, cardinality, count_min_sketch, named_struct, json_array_length, json_object_keys, mask

### Does this PR introduce _any_ user-facing change?
Yes, added new functions.

### How was this patch tested?
New UT added.
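The JSON helpers among these have simple semantics that can be sketched in plain Python; these are illustrative re-implementations of the SQL behavior, not the Spark expressions themselves:

```python
import json

def json_array_length(s):
    """Sketch of json_array_length: the number of elements of the
    outermost JSON array, or None for non-array or invalid input."""
    try:
        v = json.loads(s)
    except (ValueError, TypeError):
        return None
    return len(v) if isinstance(v, list) else None

def json_object_keys(s):
    """Sketch of json_object_keys: the keys of the outermost JSON
    object, or None for non-object or invalid input."""
    try:
        v = json.loads(s)
    except (ValueError, TypeError):
        return None
    return list(v) if isinstance(v, dict) else None

print(json_array_length('[1, 2, [3, 4]]'))   # nested array counts as one element
print(json_object_keys('{"a": 1, "b": 2}'))
```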
[GitHub] [spark] github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog
github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog URL: https://github.com/apache/spark/pull/38885
[GitHub] [spark] github-actions[bot] commented on pull request #40460: [SPARK-42828][PYTHON][SQL] More explicit Python type annotations for GroupedData
github-actions[bot] commented on PR #40460: URL: https://github.com/apache/spark/pull/40460#issuecomment-1605791480

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] ramon-garcia opened a new pull request, #41717: Support for TIME columns in Parquet files SPARK-44165
ramon-garcia opened a new pull request, #41717: URL: https://github.com/apache/spark/pull/41717

This pull request enables loading of TIME columns, both 32 bit and 64 bit wide. They are converted into DayTimeInterval columns. Test cases are also included. Best regards.
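Assuming the conversion the PR describes, a Parquet TIME value stored as microseconds since midnight maps naturally onto a day-time interval of the same length; a minimal sketch using Python's `timedelta` as a stand-in for `DayTimeIntervalType`:

```python
from datetime import timedelta

def time_micros_to_interval(micros):
    """Sketch: interpret a Parquet TIME(MICROS) value (microseconds
    since midnight) as a day-time interval of the same duration."""
    return timedelta(microseconds=micros)

# 10:30:00 since midnight = 37,800 seconds
print(time_micros_to_interval(37_800_000_000))
```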
[GitHub] [spark] mridulm commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
mridulm commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605618711

It is unclear to me what the purpose of this PR is ...
* Why do we need this? What problem is it solving? Is it common enough to require this?
* Who is going to use this? Is it developers? Reviewers?
* Does it need to be in Spark? Or can it be documented in a wiki instead?
[GitHub] [spark] srowen commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
srowen commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605617426

Do we need this tool? or just need to run it?
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240871247

## core/src/main/scala/org/apache/spark/util/Utils.scala:

```scala
@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)
+    val p = builder.start()
+    val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
```

Review Comment: nit: This reader is not closed and/or we are not doing waitFor on the process.
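The pattern the review comment asks for, drain the output, close the reader, and wait for the child to exit, looks like this in Python (using `subprocess` with a trivial child process standing in for `jmap`):

```python
import subprocess
import sys

def run_and_collect(cmd):
    """Run a command, drain its combined stdout/stderr, and wait for it
    to exit, so neither the pipe nor the process handle leaks."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as p:                              # context manager closes the pipe
        lines = p.stdout.read().splitlines()
        code = p.wait()                  # reap the child process
    return code, lines

# A harmless stand-in for the jmap invocation in the diff above.
code, lines = run_and_collect([sys.executable, "-c", "print('histogram line')"])
```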
[GitHub] [spark] gatorsmile commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
gatorsmile commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605607122

We published the error guideline a few years ago, but not all contributors adhered to it, resulting in variable quality in error messages. Since ChatGPT-4 has demonstrated a solid understanding of Spark from just a few attempts, I believe we should advocate for its use within the community to enhance Spark. This script is designed to simplify the process and to provide an effective prompt, which is crucial for ChatGPT to generate high-quality error messages. Rather than depending on the community to learn how to write the prompt, we should take the initiative and do it for everyone.
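A sketch of the kind of prompt assembly such a script might do; the function name and wording here are hypothetical illustrations, not the actual contents of `error_message_refiner.py`:

```python
def build_refine_prompt(error_class, message_lines):
    """Hypothetical prompt builder: give the model the error-class name,
    the current message template, and the constraint that placeholders
    must be preserved."""
    template = "\n".join(message_lines)
    return (
        "You are improving Apache Spark error messages.\n"
        f"Error class: {error_class}\n"
        f"Current message template:\n{template}\n"
        "Rewrite it to be clear, actionable, and consistent with the Spark "
        "error message guidelines, keeping all <placeholders> intact."
    )

prompt = build_refine_prompt(
    "CANNOT_DECODE_URL", ["The provided URL cannot be decoded: <url>."]
)
print(prompt)
```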
[GitHub] [spark] gatorsmile commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
gatorsmile commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1240869830

## dev/error_message_refiner.py:

@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Utility for refining error messages based on LLM.
+
+Usage:
+    python error_message_refiner.py <error_class> [--gpt_version=<version>]
+
+Arguments:
+    <error_class>   Required.
+                    The name of the error class to refine the messages for.
+                    The list of error classes is located in
+                    `core/src/main/resources/error/error-classes.json`.
+
+Options:
+    --gpt_version=<version>   Optional.
+                              The version of Chat GPT to use for refining the error messages.
+                              If not provided, the default version("gpt-3.5-turbo") will be used.
+
+Example usage:
+    python error_message_refiner.py CANNOT_DECODE_URL --gpt_version=gpt-4
+
+Description:
+    This script refines error messages using the LLM based approach.
+    It takes the name of the error class as a required argument and, optionally,
+    allows specifying the version of Chat GPT to use for refining the messages.
+
+Options:
+    --gpt_version: Specifies the version of Chat GPT.
+                   If not provided, the default version("gpt-3.5-turbo") will be used.
+
+Note:
+- Ensure that the necessary dependencies are installed before running the script.
+- Ensure that the valid API key is entered in the `api-key.txt`.
+- The refined error messages will be displayed in the console output.
+- To use the gpt-4 model, you need to join the waitlist. Please refer to
+  https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4 for more details.
+"""
+
+import argparse
+import json
+import openai
+import re
+import subprocess
+import random
+from typing import Tuple, Optional
+from sparktestsupport import SPARK_HOME
+
+PATH_TO_ERROR_CLASS = f"{SPARK_HOME}/core/src/main/resources/error/error-classes.json"
+PATH_TO_API_KEY = f"{SPARK_HOME}/dev/api_key.txt"
+
+# You can obtain an API key from https://platform.openai.com/account/api-keys
+openai.api_key = open(PATH_TO_API_KEY).read().rstrip("\n")
+
+
+def _git_grep_files(search_string: str, exclude: str = None) -> str:
+    """
+    Executes 'git grep' command to search for files containing the given search string.
+    Returns the file path where the search string is found.
+    """
+    result = subprocess.run(
+        ["git", "grep", "-l", search_string, "--", f"{SPARK_HOME}/*.scala"],
+        capture_output=True,
+        text=True,
+    )
+    output = result.stdout.strip()
+
+    files = output.split("\n")
+    files = [file for file in files if "Suite" not in file]
+    if exclude is not None:
+        files = [file for file in files if exclude not in file]
+    file = random.choice(files)
+    return file
+
+
+def _find_function(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function in the given file containing the specified search string.
+    Returns the name of the function if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
+            rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}",
+            content,
+            re.DOTALL,
+        )
+        if function_content and search_string in function_content.group(0):
+            return function
+
+    return None
+
+
+def _find_func_body(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function body in the given file containing the specified search string.
+    Returns the function body if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
+            rf"def\s+{re.escape(function)}(?:(?!def\s).)*?{re.
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240868705

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)
+    val p = builder.start()
+    val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
+    val rows = ArrayBuffer.empty[String]
+    var line = ""
+    while (line != null) {
+      if (line.nonEmpty) rows += line
+      line = r.readLine()
+    }
+    rows.toArray

Review Comment: Use `IOUtils.readLines` or `Source.getLines` instead?
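The two review comments above (the unclosed reader and the manual read loop) could be addressed together along the lines of the following sketch. This is a hypothetical rewrite, not the patch that was merged; it assumes Scala 2.13's `scala.util.Using` is available, and the helper name `readProcessOutput` is invented for illustration:

```scala
import scala.io.Source
import scala.util.Using

// Read every non-empty line of a child process's stdout, guaranteeing the
// underlying stream is closed even if reading throws -- this replaces the
// manual BufferedReader loop and follows the `Source.getLines` suggestion.
def readProcessOutput(p: Process): Array[String] =
  Using.resource(Source.fromInputStream(p.getInputStream)) { src =>
    src.getLines().filter(_.nonEmpty).toArray
  }
```

`Using.resource` closes the `Source` on all exit paths, which is exactly the property the "reader is not closed" comment asks for.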
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240864330

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)

Review Comment: Log errors from the invocation to the executor logs instead of sending them to the driver as the response?
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240849467

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    val pid = String.valueOf(ProcessHandle.current().pid())
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)

Review Comment: @dongjoon-hyun This is an issue - we should use `$JAVA_HOME/bin/jmap` (more specifically, whatever comes from `System.getProperty("java.home")`), not the first `jmap` that happens to be on the `PATH`. It is common to override `JAVA_HOME` to specify the Java version to be used explicitly (or even not to have the JDK on the PATH at all). Also, there are no compatibility guarantees that I am aware of between different versions of the JDK and jmap (for example, JDK 11 jmap against JDK 17 or vice versa) - if I missed any, please do let me know!
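The suggestion above - resolving `jmap` from the JDK that is actually running the executor - can be sketched as follows. The helper name `jmapPath` is invented for illustration and is not from the PR:

```scala
import java.nio.file.Paths

// Resolve jmap from the running JVM's own installation rather than PATH.
// The `java.home` system property always points at the JVM executing this
// code, so the resolved jmap matches the JVM version exactly -- avoiding
// the JDK/jmap version-mismatch concern raised in the review.
def jmapPath(): String =
  Paths.get(System.getProperty("java.home"), "bin", "jmap").toString

// Usage sketch (Java 9+ for ProcessHandle):
//   val pid = ProcessHandle.current().pid().toString
//   val builder = new ProcessBuilder(jmapPath(), "-histo:live", pid)
```

Note that on Java 9+ `java.home` is the JDK root itself (there is no nested `jre/` directory), so `bin/jmap` under it is the right location when running on a full JDK.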
[GitHub] [spark] mridulm commented on pull request #41676: [SPARK-44109][CORE] Remove duplicate preferred locations of each RDD partition
mridulm commented on PR #41676: URL: https://github.com/apache/spark/pull/41676#issuecomment-1605579525

Not handling this for shuffle?
[GitHub] [spark] srowen commented on pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487
srowen commented on PR #41613: URL: https://github.com/apache/spark/pull/41613#issuecomment-1605573117

Merged to master
[GitHub] [spark] srowen closed pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487
srowen closed pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487 URL: https://github.com/apache/spark/pull/41613
[GitHub] [spark] panbingkun commented on pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
panbingkun commented on PR #41681: URL: https://github.com/apache/spark/pull/41681#issuecomment-1605455942

> What I mean is that we may need to wait for the next arrow version to be compatible with the netty 4.1.94.Final

Let's upgrade to `netty 4.1.93.Final` first. After `arrow-memory-netty` completes the same upgrade, we will consider `netty 4.1.94.Final` again.
[GitHub] [spark] panbingkun commented on pull request #41572: [SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
panbingkun commented on PR #41572: URL: https://github.com/apache/spark/pull/41572#issuecomment-1605442340

> I have another concern, for testing backwards compatibility it might be useful to keep 'orphaned' protos around. This would effectively kill that.

1. Very good suggestion, but the orphan files deleted here, such as `` and ``, are all files that were committed by mistake during the review process.
2. At the same time, I have added additional explanatory notes in the code comments.
3. We should provide an automated function to find orphaned files. Whether to delete them should be weighed between the submitter and the code reviewer; otherwise, orphaned files keep accumulating, and many of them were only produced by mistaken commits.
[GitHub] [spark] panbingkun commented on pull request #41572: [SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
panbingkun commented on PR #41572: URL: https://github.com/apache/spark/pull/41572#issuecomment-1605440164

> ```
> SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite -- -z lpad"
> ...
>
> [info] PlanGenerationTestSuite:
> [info] - function lpad (35 milliseconds)
> [info] - function lpad binary (1 millisecond)
> [info] Run completed in 2 seconds, 58 milliseconds.
> [info] Total number of tests run: 2
> [info] Suites: completed 1, aborted 0
> [info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
> [info] All tests passed.
> [success] Total time: 120 s (02:00), completed Jun 15, 2023, 10:42:52 AM
> ```
>
> will re-generating golden files for a single test or a group of tests still be supported after this PR?

The current logic already supports the above scenario.
[GitHub] [spark] steven-aerts commented on a diff in pull request #41712: [SPARK-44132][SQL] join using Stream of column name fails codegen
steven-aerts commented on code in PR #41712: URL: https://github.com/apache/spark/pull/41712#discussion_r1240706217

## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:

@@ -1685,4 +1685,24 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       checkAnswer(sql(query), expected)
     }
   }
+
+  test("SPARK-44132: FULL OUTER JOIN by streamed column name fails with NPE") {

Review Comment: @bersprockets absolutely. I also have a [unit test lying around](#41688) to validate it, but it feels superfluous to add that one too. Let me know if you would prefer that I also add/submit it.
[GitHub] [spark] itholic commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
itholic commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1240625143

## dev/error_message_refiner.py:

@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Utility for refining error messages based on LLM.

Review Comment: Yeah, I have a separate script to convert temp error classes. I will post a PR right away with the same comments reflected once the review of the current PR is completed.