[GitHub] [spark] panbingkun opened a new pull request, #41721: [SPARK-44171][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] & delete some unused error classes
panbingkun opened a new pull request, #41721: URL: https://github.com/apache/spark/pull/41721

### What changes were proposed in this pull request?
This PR aims to assign names to the error class _LEGACY_ERROR_TEMP_[2279-2282] and to delete some unused error classes.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
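The renaming itself is mechanical: an entry keyed by a temporary name in `error-classes.json` moves to a descriptive key. A minimal Python sketch of that operation, using a hypothetical entry and a hypothetical target name (the actual messages and chosen names live in the PR diff):

```python
import json

# Hypothetical miniature version of error-classes.json; the real file at
# core/src/main/resources/error/error-classes.json is much larger.
error_classes = {
    "_LEGACY_ERROR_TEMP_2279": {
        "message": ["Failed to merge incompatible data types <left> and <right>."]
    }
}

def rename_error_class(classes, old_name, new_name):
    """Move an error class entry to a new, descriptive key."""
    classes = dict(classes)  # leave the input mapping untouched
    classes[new_name] = classes.pop(old_name)
    return classes

# The new name below is illustrative, not necessarily what the PR chose.
renamed = rename_error_class(
    error_classes, "_LEGACY_ERROR_TEMP_2279", "CANNOT_MERGE_INCOMPATIBLE_DATA_TYPE"
)
print(json.dumps(renamed, indent=2))
```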
[GitHub] [spark] beliefer commented on pull request #41687: [SPARK-44131][SQL] Add call_function and deprecate call_udf for Scala API
beliefer commented on PR #41687: URL: https://github.com/apache/spark/pull/41687#issuecomment-1605876800

ping @cloud-fan @zhengruifeng cc @HyukjinKwon
[GitHub] [spark] beliefer commented on pull request #41476: [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]
beliefer commented on PR #41476: URL: https://github.com/apache/spark/pull/41476#issuecomment-1605875836

ping @MaxGekk Rebased.
[GitHub] [spark] LuciferYang commented on pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
LuciferYang commented on PR #41681: URL: https://github.com/apache/spark/pull/41681#issuecomment-1605873942

https://github.com/apache/arrow/pull/36211/files Arrow has already upgraded Netty to 4.1.94.Final, and this may be released in Arrow 13.0. I'm not sure whether that will be usable here, because it also updates gRPC to 1.56.0, which differs from the gRPC version Spark Connect uses. On the other hand, Netty 4.1.94 fixes a CVE (https://github.com/apache/arrow/pull/36211); I'd like to know whether this CVE affects Spark.
[GitHub] [spark] LuciferYang commented on pull request #41720: [SPARK-43969][SQL][TESTS][FOLLOWUP] Update `numeric.sql.out.java21`
LuciferYang commented on PR #41720: URL: https://github.com/apache/spark/pull/41720#issuecomment-1605868725

cc @dongjoon-hyun FYI
[GitHub] [spark] LuciferYang opened a new pull request, #41720: [SPARK-43969][SQL][TESTS][FOLLOWUP] Update `numeric.sql.out.java21`
LuciferYang opened a new pull request, #41720: URL: https://github.com/apache/spark/pull/41720

### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/41458 updated `numeric.sql.out` but did not update `numeric.sql.out.java21`; this PR updates `numeric.sql.out.java21` for Java 21.

### Why are the changes needed?
Fix the golden file for Java 21. https://github.com/apache/spark/actions/runs/5362442727/jobs/9729315685

```
[info] - postgreSQL/numeric.sql *** FAILED *** (1 minute, 4 seconds)
[info]   postgreSQL/numeric.sql
[info]   Expected "...OLUMN_ARITY_MISMATCH[",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "'id', 'id', 'val', 'val', '(val * val)'",
[info]       "reason" : "too many data columns",
[info]       "tableColumns" : "'id1', 'id2', 'result']",
[info]       "tableName" :...", but got "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "`id`, `id`, `val`, `val`, `(val * val)`",
[info]       "tableColumns" : "`id1`, `id2`, `result`]",
[info]       "tableName" :..."
[info]   Result did not match for query #474
[info]   INSERT INTO num_result SELECT t1.id, t2.id, t1.val, t2.val, t1.val * t2.val
[info]   FROM num_data t1, num_data t2 (SQLQueryTestSuite.scala:848)
[info]   org.scalatest.exceptions.TestFailedException:
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manually checked with Java 21
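The golden-file workflow this fix feeds can be sketched roughly as follows; `check_golden` and its `regenerate` flag are illustrative stand-ins (the real suite regenerates expected output when `SPARK_GENERATE_GOLDEN_FILES=1` is set for `SQLQueryTestSuite`):

```python
import os
import pathlib
import tempfile

def check_golden(path, actual, regenerate=False):
    """Compare actual query output against a stored golden file.

    When regenerate is True, rewrite the file instead of comparing,
    mimicking how the real suite refreshes .sql.out files."""
    p = pathlib.Path(path)
    if regenerate or not p.exists():
        p.write_text(actual)
        return True
    return p.read_text() == actual

with tempfile.TemporaryDirectory() as d:
    golden = os.path.join(d, "numeric.sql.out.java21")
    check_golden(golden, "old output", regenerate=True)       # seed a stale file
    stale_matches = check_golden(golden, "new output")        # stale file: mismatch
    check_golden(golden, "new output", regenerate=True)       # regenerate it
    fresh_matches = check_golden(golden, "new output")        # now it matches
```

The PR above is exactly the second step: the `numeric.sql.out` golden file was regenerated, but its Java 21 sibling was not, so the comparison failed on Java 21 runners.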
[GitHub] [spark] beliefer opened a new pull request, #41719: [SPARK-44169][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304]
beliefer opened a new pull request, #41719: URL: https://github.com/apache/spark/pull/41719

### What changes were proposed in this pull request?
This PR aims to assign names to the error class _LEGACY_ERROR_TEMP_[2300-2304].

### Why are the changes needed?
Improve the error framework.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Existing test cases updated and new test cases added.
[GitHub] [spark] beliefer commented on a diff in pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Pyth
beliefer commented on code in PR #41718: URL: https://github.com/apache/spark/pull/41718#discussion_r1241015841

## sql/core/src/main/scala/org/apache/spark/sql/functions.scala:

```scala
@@ -6379,6 +6428,32 @@ object functions {

   def to_json(e: Column): Column = to_json(e, Map.empty[String, String])

+  // scalastyle:off line.size.limit
+  /**
+   * Masks the given string value. This can be useful for creating copies of tables with sensitive
+   * information removed.
+   *
+   * @param input string value to mask. Supported types: STRING, VARCHAR, CHAR
+   * @param upperChar character to replace upper-case characters with. Specify NULL to retain original character.
+   * @param lowerChar character to replace lower-case characters with. Specify NULL to retain original character.
+   * @param digitChar character to replace digit characters with. Specify NULL to retain original character.
+   * @param otherChar character to replace all other characters with. Specify NULL to retain original character.
+   *
+   * @group string_funcs
+   * @since 3.5.0
+   */
+  // scalastyle:on line.size.limit
+  def mask(
+      input: Column,
+      upperChar: Column,
+      lowerChar: Column,
+      digitChar: Column,
+      otherChar: Column): Column = {
```

Review Comment: Please supplement the API with the other constructors of `Mask`.
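For reference, the masking behavior the javadoc describes can be sketched in pure Python; the default replacement characters below ('X', 'x', 'n', and keep-as-is for other characters) follow the SQL `mask` function's documented defaults:

```python
def mask(s, upper="X", lower="x", digit="n", other=None):
    """Pure-Python sketch of the mask semantics: replace upper-case,
    lower-case, and digit characters; None keeps the original character."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(ch if upper is None else upper)
        elif ch.islower():
            out.append(ch if lower is None else lower)
        elif ch.isdigit():
            out.append(ch if digit is None else digit)
        else:
            out.append(ch if other is None else other)
    return "".join(out)

print(mask("AbCD123-@$#"))  # XxXXnnn-@$#
```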
[GitHub] [spark] LuciferYang commented on pull request #41673: [SPARK-44091][YARN][TESTS] Introduce `withResourceTypes` to `ResourceRequestTestHelper` to restore `resourceTypes` as default value after
LuciferYang commented on PR #41673: URL: https://github.com/apache/spark/pull/41673#issuecomment-1605849481

this one is fixed https://github.com/apache/spark/pull/40877#issuecomment-1595959697
[GitHub] [spark] LuciferYang commented on a diff in pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
LuciferYang commented on code in PR #41681: URL: https://github.com/apache/spark/pull/41681#discussion_r1241013608

## pom.xml:

```
@@ -212,7 +212,7 @@
 1.5.0
 1.60
 1.9.0
-4.1.92.Final
+4.1.93.Final
```

Review Comment: I think we should add a comment to inform other developers not to try upgrading to 4.1.94; we need to wait for arrow-memory-netty to be upgraded together.
[GitHub] [spark-docker] dongjoon-hyun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
dongjoon-hyun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605835524

Thank you!
[GitHub] [spark-docker] Yikun closed pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun closed pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles URL: https://github.com/apache/spark-docker/pull/46
[GitHub] [spark-docker] Yikun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605835292

@dongjoon-hyun Thanks, merged.
[GitHub] [spark] LuciferYang commented on pull request #41654: [SPARK-44064][CORE][SQL] Add a new `apply` function to `NonFateSharingCache`
LuciferYang commented on PR #41654: URL: https://github.com/apache/spark/pull/41654#issuecomment-1605830267

> Hi @LuciferYang , thanks for the fix! I'm fine with either option. I rebase the code to make GA test this one again.

@HyukjinKwon seems the author approves of this fix. I am planning to merge this one today, do you think it's ok?
[GitHub] [spark] LuciferYang commented on pull request #41654: [SPARK-44064][CORE][SQL] Add a new `apply` function to `NonFateSharingCache`
LuciferYang commented on PR #41654: URL: https://github.com/apache/spark/pull/41654#issuecomment-1605829615

Thanks @liuzqt
[GitHub] [spark] LuciferYang commented on pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
LuciferYang commented on PR #41718: URL: https://github.com/apache/spark/pull/41718#issuecomment-1605828319

also cc @HyukjinKwon @panbingkun @beliefer FYI
[GitHub] [spark] LuciferYang commented on pull request #41678: [SPARK-44110][BUILD] Propagate proxy settings to forked JVMs
LuciferYang commented on PR #41678: URL: https://github.com/apache/spark/pull/41678#issuecomment-1605828159

Late LGTM
[GitHub] [spark-docker] Yikun commented on pull request #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun commented on PR #46: URL: https://github.com/apache/spark-docker/pull/46#issuecomment-1605823664

cc @HyukjinKwon @zhengruifeng @dongjoon-hyun
[GitHub] [spark] bersprockets commented on a diff in pull request #41712: [SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure
bersprockets commented on code in PR #41712: URL: https://github.com/apache/spark/pull/41712#discussion_r1240998450

## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:

```scala
@@ -1685,4 +1685,24 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
     checkAnswer(sql(query), expected)
   }
 }
+
+  test("SPARK-44132: FULL OUTER JOIN by streamed column name fails with NPE") {
```

Review Comment:
> Let me know if you would prefer that I also add/submit it.

No, I think the current test is fine. I just wanted to make sure we were testing the original bug.
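The underlying failure mode, a lazily evaluated sequence whose `map` side effects have not run by the time downstream code needs them, can be illustrated outside Spark. This is only an analogy to the Scala `Stream` issue the PR title describes, not the actual codegen path:

```python
# Side effects performed inside a lazy map are deferred until the sequence
# is actually traversed; code that assumes they already happened can fail
# (in the Spark case, with an NPE during codegen). Materializing the
# sequence up front (the PR's fix, e.g. Stream -> List) forces them to run.
registered = []

def register(name):
    registered.append(name)      # side effect downstream code relies on
    return name.upper()

lazy = map(register, ["id", "name"])     # nothing runs yet
deferred_count = len(registered)         # 0: side effects still pending

materialized = list(map(register, ["id", "name"]))  # forced eagerly
```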
[GitHub] [spark] beliefer commented on pull request #41443: [SPARK-43923][CONNECT] Post listenerBus events during ExecutePlanRequest
beliefer commented on PR #41443: URL: https://github.com/apache/spark/pull/41443#issuecomment-1605814025

@jdesjean Thank you for the explanation.
[GitHub] [spark-docker] Yikun opened a new pull request, #46: [SPARK-44168] Add Apache Spark 3.4.1 Dockerfiles
Yikun opened a new pull request, #46: URL: https://github.com/apache/spark-docker/pull/46

### What changes were proposed in this pull request?
Add Apache Spark 3.4.1 Dockerfiles.
- Add 3.4.1 GPG key
- Add .github/workflows/build_3.4.1.yaml
- ./add-dockerfiles.sh 3.4.1
- Add version and tag info

### Why are the changes needed?
Apache Spark 3.4.1 released: https://spark.apache.org/releases/spark-release-3-4-0.html

### Does this PR introduce _any_ user-facing change?
Docker image will be published.

### How was this patch tested?
Add workflow and CI passed
[GitHub] [spark] ulysses-you commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default
ulysses-you commented on PR #40390: URL: https://github.com/apache/spark/pull/40390#issuecomment-1605811101

Thank you @dongjoon-hyun for the reminder. There is an issue (https://github.com/apache/spark/pull/41100) that comes before this PR. I hope both of them can be shipped in Spark 3.5.0.
[GitHub] [spark] ivoson commented on pull request #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
ivoson commented on PR #41718: URL: https://github.com/apache/spark/pull/41718#issuecomment-1605803032

cc @zhengruifeng
[GitHub] [spark] ivoson opened a new pull request, #41718: [SPARK-43926][CONNECT][PYTHON] Add array_agg, array_size, cardinality, count_min_sketch,mask,named_struct,json_* to Scala and Python
ivoson opened a new pull request, #41718: URL: https://github.com/apache/spark/pull/41718

### What changes were proposed in this pull request?
Add the following functions:
- array_agg
- array_size
- cardinality
- count_min_sketch
- named_struct
- json_array_length
- json_object_keys
- mask

To:
- Scala API
- Python API
- Spark Connect Scala Client
- Spark Connect Python Client

### Why are the changes needed?
Add Scala, Python and Connect API for these SQL functions: array_agg, array_size, cardinality, count_min_sketch, named_struct, json_array_length, json_object_keys, mask

### Does this PR introduce _any_ user-facing change?
Yes, added new functions.

### How was this patch tested?
New UT added.
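The JSON helpers among these have simple semantics that can be sketched in plain Python; these are illustrative re-implementations of the SQL behavior, not the Spark expressions themselves:

```python
import json

def json_array_length(s):
    """Sketch of json_array_length: the number of elements of the
    outermost JSON array, or None for non-array or invalid input."""
    try:
        v = json.loads(s)
    except (ValueError, TypeError):
        return None
    return len(v) if isinstance(v, list) else None

def json_object_keys(s):
    """Sketch of json_object_keys: the keys of the outermost JSON
    object, or None for non-object or invalid input."""
    try:
        v = json.loads(s)
    except (ValueError, TypeError):
        return None
    return list(v) if isinstance(v, dict) else None

print(json_array_length('[1, 2, [3, 4]]'))   # nested array counts as one element
print(json_object_keys('{"a": 1, "b": 2}'))
```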
[GitHub] [spark] github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog
github-actions[bot] closed pull request #38885: [WIP][SPARK-41367][SQL] Enable V2 file tables in read paths in session catalog URL: https://github.com/apache/spark/pull/38885
[GitHub] [spark] github-actions[bot] commented on pull request #40460: [SPARK-42828][PYTHON][SQL] More explicit Python type annotations for GroupedData
github-actions[bot] commented on PR #40460: URL: https://github.com/apache/spark/pull/40460#issuecomment-1605791480

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] ramon-garcia opened a new pull request, #41717: Support for TIME columns in Parquet files SPARK-44165
ramon-garcia opened a new pull request, #41717: URL: https://github.com/apache/spark/pull/41717

This pull request enables loading of TIME columns, both 32 bit and 64 bit wide. They are converted into DayTimeInterval columns. Test cases are also included. Best regards.
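Assuming the conversion the PR describes, a Parquet TIME value stored as microseconds since midnight maps naturally onto a day-time interval of the same length; a minimal sketch using Python's `timedelta` as a stand-in for `DayTimeIntervalType`:

```python
from datetime import timedelta

def time_micros_to_interval(micros):
    """Sketch: interpret a Parquet TIME(MICROS) value (microseconds
    since midnight) as a day-time interval of the same duration."""
    return timedelta(microseconds=micros)

# 10:30:00 since midnight = 37,800 seconds
print(time_micros_to_interval(37_800_000_000))
```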
[GitHub] [spark] mridulm commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
mridulm commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605618711

It is unclear to me what the purpose of this PR is ...
* Why do we need this? What problem is it solving? Is it common enough to require this?
* Who is going to use this? Is it developers? Reviewers?
* Does it need to be in Spark? Or can it be documented in a wiki instead?
[GitHub] [spark] srowen commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
srowen commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605617426

Do we need this tool? or just need to run it?
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240871247

## core/src/main/scala/org/apache/spark/util/Utils.scala:

```scala
@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)
+    val p = builder.start()
+    val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
```

Review Comment: nit: This reader is not closed and/or we are not doing waitFor on the process.
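The pattern the review comment asks for, drain the output, close the reader, and wait for the child to exit, looks like this in Python (using `subprocess` with a trivial child process standing in for `jmap`):

```python
import subprocess
import sys

def run_and_collect(cmd):
    """Run a command, drain its combined stdout/stderr, and wait for it
    to exit, so neither the pipe nor the process handle leaks."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as p:                              # context manager closes the pipe
        lines = p.stdout.read().splitlines()
        code = p.wait()                  # reap the child process
    return code, lines

# A harmless stand-in for the jmap invocation in the diff above.
code, lines = run_and_collect([sys.executable, "-c", "print('histogram line')"])
```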
[GitHub] [spark] gatorsmile commented on pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
gatorsmile commented on PR #41711: URL: https://github.com/apache/spark/pull/41711#issuecomment-1605607122

We published the error guideline a few years ago, but not all contributors adhered to it, resulting in variable quality in error messages. Since ChatGPT-4 has demonstrated a solid understanding of Spark from just a few attempts, I believe we should advocate for its use within the community to enhance Spark. This script is designed to simplify the process and to provide an effective prompt, which is crucial for ChatGPT to generate high-quality error messages. Rather than depending on the community to learn how to write the prompt, we should take the initiative and do it for everyone.
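A sketch of the kind of prompt assembly such a script might do; the function name and wording here are hypothetical illustrations, not the actual contents of `error_message_refiner.py`:

```python
def build_refine_prompt(error_class, message_lines):
    """Hypothetical prompt builder: give the model the error-class name,
    the current message template, and the constraint that placeholders
    must be preserved."""
    template = "\n".join(message_lines)
    return (
        "You are improving Apache Spark error messages.\n"
        f"Error class: {error_class}\n"
        f"Current message template:\n{template}\n"
        "Rewrite it to be clear, actionable, and consistent with the Spark "
        "error message guidelines, keeping all <placeholders> intact."
    )

prompt = build_refine_prompt(
    "CANNOT_DECODE_URL", ["The provided URL cannot be decoded: <url>."]
)
print(prompt)
```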
[GitHub] [spark] gatorsmile commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
gatorsmile commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1240869830

## dev/error_message_refiner.py:

@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Utility for refining error messages based on LLM.
+
+Usage:
+    python error_message_refiner.py <error_class> [--gpt_version=<version>]
+
+Arguments:
+    <error_class>   Required.
+                    The name of the error class to refine the messages for.
+                    The list of error classes is located in
+                    `core/src/main/resources/error/error-classes.json`.
+
+Options:
+    --gpt_version=<version>   Optional.
+                              The version of Chat GPT to use for refining the error messages.
+                              If not provided, the default version("gpt-3.5-turbo") will be used.
+
+Example usage:
+    python error_message_refiner.py CANNOT_DECODE_URL --gpt_version=gpt-4
+
+Description:
+    This script refines error messages using the LLM based approach.
+    It takes the name of the error class as a required argument and, optionally,
+    allows specifying the version of Chat GPT to use for refining the messages.
+
+Options:
+    --gpt_version: Specifies the version of Chat GPT.
+                   If not provided, the default version("gpt-3.5-turbo") will be used.
+
+Note:
+- Ensure that the necessary dependencies are installed before running the script.
+- Ensure that the valid API key is entered in the `api-key.txt`.
+- The refined error messages will be displayed in the console output.
+- To use the gpt-4 model, you need to join the waitlist. Please refer to
+  https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4 for more details.
+"""
+
+import argparse
+import json
+import openai
+import re
+import subprocess
+import random
+from typing import Tuple, Optional
+from sparktestsupport import SPARK_HOME
+
+PATH_TO_ERROR_CLASS = f"{SPARK_HOME}/core/src/main/resources/error/error-classes.json"
+PATH_TO_API_KEY = f"{SPARK_HOME}/dev/api_key.txt"
+
+# You can obtain an API key from https://platform.openai.com/account/api-keys
+openai.api_key = open(PATH_TO_API_KEY).read().rstrip("\n")
+
+
+def _git_grep_files(search_string: str, exclude: str = None) -> str:
+    """
+    Executes 'git grep' command to search for files containing the given search string.
+    Returns the file path where the search string is found.
+    """
+    result = subprocess.run(
+        ["git", "grep", "-l", search_string, "--", f"{SPARK_HOME}/*.scala"],
+        capture_output=True,
+        text=True,
+    )
+    output = result.stdout.strip()
+
+    files = output.split("\n")
+    files = [file for file in files if "Suite" not in file]
+    if exclude is not None:
+        files = [file for file in files if exclude not in file]
+    file = random.choice(files)
+    return file
+
+
+def _find_function(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function in the given file containing the specified search string.
+    Returns the name of the function if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
+            rf"def\s+{re.escape(function)}(?:(?!def).)*?{re.escape(search_string)}",
+            content,
+            re.DOTALL,
+        )
+        if function_content and search_string in function_content.group(0):
+            return function
+
+    return None
+
+
+def _find_func_body(file_name: str, search_string: str) -> Optional[str]:
+    """
+    Searches for a function body in the given file containing the specified search string.
+    Returns the function body if found, otherwise None.
+    """
+    with open(file_name, "r") as file:
+        content = file.read()
+    functions = re.findall(r"def\s+(\w+)\s*\(", content)
+
+    for function in functions:
+        function_content = re.search(
+            rf"def\s+{re.escape(function)}(?:(?!def\s).)*?{re.
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240868705

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)
+    val p = builder.start()
+    val r = new BufferedReader(new InputStreamReader(p.getInputStream()))
+    val rows = ArrayBuffer.empty[String]
+    var line = ""
+    while (line != null) {
+      if (line.nonEmpty) rows += line
+      line = r.readLine()
+    }
+    rows.toArray

Review Comment: Use `IOUtils.readLines` or `Source.getLines` instead?
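The two review comments above (the unclosed reader and the manual read loop) could be addressed together along the lines of the following sketch. This is a hypothetical rewrite, not the patch that was merged; it assumes Scala 2.13's `scala.util.Using` is available, and the helper name `readProcessOutput` is invented for illustration:

```scala
import scala.io.Source
import scala.util.Using

// Read every non-empty line of a child process's stdout, guaranteeing the
// underlying stream is closed even if reading throws -- this replaces the
// manual BufferedReader loop and follows the `Source.getLines` suggestion.
def readProcessOutput(p: Process): Array[String] =
  Using.resource(Source.fromInputStream(p.getInputStream)) { src =>
    src.getLines().filter(_.nonEmpty).toArray
  }
```

`Using.resource` closes the `Source` on all exit paths, which is exactly the property the "reader is not closed" comment asks for.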
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240864330

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,23 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    // From Java 9+, we can use 'ProcessHandle.current().pid()'
+    val pid = getProcessName().split("@").head
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)
+    builder.redirectErrorStream(true)

Review Comment: Log errors from the invocation to the executor logs instead of sending them to the driver as the response?
[GitHub] [spark] mridulm commented on a diff in pull request #41709: [SPARK-44153][CORE][UI] Support `Heap Histogram` column in `Executors` tab
mridulm commented on code in PR #41709: URL: https://github.com/apache/spark/pull/41709#discussion_r1240849467

## core/src/main/scala/org/apache/spark/util/Utils.scala:

@@ -2287,6 +2287,22 @@ private[spark] object Utils extends Logging with SparkClassUtils {
     }.map(threadInfoToThreadStackTrace)
   }

+  /** Return a heap dump. Used to capture dumps for the web UI */
+  def getHeapHistogram(): Array[String] = {
+    val pid = String.valueOf(ProcessHandle.current().pid())
+    val builder = new ProcessBuilder("jmap", "-histo:live", pid)

Review Comment: @dongjoon-hyun This is an issue - we should use `$JAVA_HOME/bin/jmap` (more specifically, whatever comes from `System.getProperty("java.home")`), not the first `jmap` that happens to be on the `PATH`. It is common to override `JAVA_HOME` to specify the Java version to be used explicitly (or even not to have the JDK on the PATH at all). Also, there are no compatibility guarantees that I am aware of between different versions of the JDK and jmap (for example, JDK 11 jmap against JDK 17 or vice versa) - if I missed any, please do let me know!
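The suggestion above - resolving `jmap` from the JDK that is actually running the executor - can be sketched as follows. The helper name `jmapPath` is invented for illustration and is not from the PR:

```scala
import java.nio.file.Paths

// Resolve jmap from the running JVM's own installation rather than PATH.
// The `java.home` system property always points at the JVM executing this
// code, so the resolved jmap matches the JVM version exactly -- avoiding
// the JDK/jmap version-mismatch concern raised in the review.
def jmapPath(): String =
  Paths.get(System.getProperty("java.home"), "bin", "jmap").toString

// Usage sketch (Java 9+ for ProcessHandle):
//   val pid = ProcessHandle.current().pid().toString
//   val builder = new ProcessBuilder(jmapPath(), "-histo:live", pid)
```

Note that on Java 9+ `java.home` is the JDK root itself (there is no nested `jre/` directory), so `bin/jmap` under it is the right location when running on a full JDK.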
[GitHub] [spark] mridulm commented on pull request #41676: [SPARK-44109][CORE] Remove duplicate preferred locations of each RDD partition
mridulm commented on PR #41676: URL: https://github.com/apache/spark/pull/41676#issuecomment-1605579525

Not handling this for shuffle?
[GitHub] [spark] srowen commented on pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487
srowen commented on PR #41613: URL: https://github.com/apache/spark/pull/41613#issuecomment-1605573117

Merged to master
[GitHub] [spark] srowen closed pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487
srowen closed pull request #41613: [SPARK-39740][UI]: Upgrade vis timeline to 7.7.2 to fix CVE-2020-28487 URL: https://github.com/apache/spark/pull/41613
[GitHub] [spark] panbingkun commented on pull request #41681: [SPARK-44128][BUILD] Upgrade netty from 4.1.92 to 4.1.93
panbingkun commented on PR #41681: URL: https://github.com/apache/spark/pull/41681#issuecomment-1605455942

> What I mean is that we may need to wait for the next arrow version to be compatible with the netty 4.1.94.Final

Let's upgrade to `netty 4.1.93.Final` first. After `arrow-memory-netty` completes the same upgrade, we will consider `netty 4.1.94.Final` again.
[GitHub] [spark] panbingkun commented on pull request #41572: [SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
panbingkun commented on PR #41572: URL: https://github.com/apache/spark/pull/41572#issuecomment-1605442340

> I have another concern, for testing backwards compatibility it might be useful to keep 'orphaned' protos around. This would effectively kill that.

1. Very good suggestion, but the orphan files deleted here, such as `` and ``, are all files that were committed by mistake during the review process.
2. At the same time, I have added additional explanatory notes in the code comments.
3. We should provide an automated function to find orphaned files. Whether to delete them should be weighed between the submitter and the code reviewer; otherwise, orphaned files keep accumulating, and many of them were only produced by mistaken commits.
[GitHub] [spark] panbingkun commented on pull request #41572: [SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
panbingkun commented on PR #41572: URL: https://github.com/apache/spark/pull/41572#issuecomment-1605440164

> ```
> SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "connect-client-jvm/testOnly org.apache.spark.sql.PlanGenerationTestSuite -- -z lpad"
> ...
>
> [info] PlanGenerationTestSuite:
> [info] - function lpad (35 milliseconds)
> [info] - function lpad binary (1 millisecond)
> [info] Run completed in 2 seconds, 58 milliseconds.
> [info] Total number of tests run: 2
> [info] Suites: completed 1, aborted 0
> [info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
> [info] All tests passed.
> [success] Total time: 120 s (02:00), completed Jun 15, 2023, 10:42:52 AM
> ```
>
> will re-generating golden files for a single test or a group of tests still be supported after this PR?

The current logic already supports the above scenario.
[GitHub] [spark] steven-aerts commented on a diff in pull request #41712: [SPARK-44132][SQL] join using Stream of column name fails codegen
steven-aerts commented on code in PR #41712: URL: https://github.com/apache/spark/pull/41712#discussion_r1240706217

## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:

@@ -1685,4 +1685,24 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       checkAnswer(sql(query), expected)
     }
   }
+
+  test("SPARK-44132: FULL OUTER JOIN by streamed column name fails with NPE") {

Review Comment: @bersprockets absolutely. I also have a [unit test lying around](#41688) to validate it, but it feels superfluous to add that one too. Let me know if you would prefer that I also add/submit it.
[GitHub] [spark] itholic commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM
itholic commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1240625143

## dev/error_message_refiner.py:

@@ -0,0 +1,235 @@
+#!/usr/bin/env python3
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+Utility for refining error messages based on LLM.

Review Comment: Yeah, I have a separate script to convert temp error classes. I will post a PR right away with the same comments reflected once the review of the current PR is completed.