Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun closed pull request #43303: [SPARK-45474][CORE][WEBUI] Support 
top-level filtering in MasterPage JSON API
URL: https://github.com/apache/spark/pull/43303





Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]

2023-10-09 Thread via GitHub


LuciferYang closed pull request #43282: [SPARK-45389][SQL][3.3] Correct 
MetaException matching rule on getting partition metadata
URL: https://github.com/apache/spark/pull/43282





Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]

2023-10-09 Thread via GitHub


LuciferYang commented on PR #43282:
URL: https://github.com/apache/spark/pull/43282#issuecomment-1754510389

   Merged into branch-3.3, thanks @pan3793 ~





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754509757

   The `core` module tests passed, and this new feature is not invoked by the old code. Let me merge this. Thank you again!
   
   ![Screenshot 2023-10-09 at 11 43 51 PM](https://github.com/apache/spark/assets/9700541/56ccc35f-756a-40a3-aad8-b20e32cd0950)
   





Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]

2023-10-09 Thread via GitHub


pan3793 commented on PR #43282:
URL: https://github.com/apache/spark/pull/43282#issuecomment-1754504959

   CI passed; it seems there are some issues with GitHub Actions state sync.
   
   https://github.com/apache/spark/assets/26535726/a8030635-8474-4b05-82ac-6e9d71e12c62
   





Re: [PR] [SPARK-45389][SQL][3.4] Correct MetaException matching rule on getting partition metadata [spark]

2023-10-09 Thread via GitHub


LuciferYang closed pull request #43281: [SPARK-45389][SQL][3.4] Correct 
MetaException matching rule on getting partition metadata
URL: https://github.com/apache/spark/pull/43281





Re: [PR] [SPARK-45389][SQL][3.4] Correct MetaException matching rule on getting partition metadata [spark]

2023-10-09 Thread via GitHub


LuciferYang commented on PR #43281:
URL: https://github.com/apache/spark/pull/43281#issuecomment-1754501342

   Merged into branch-3.4, thanks @pan3793 





Re: [PR] [SPARK-45449][SQL] Cache Invalidation Issue with JDBC Table [spark]

2023-10-09 Thread via GitHub


cloud-fan closed pull request #43258: [SPARK-45449][SQL] Cache Invalidation 
Issue with JDBC Table
URL: https://github.com/apache/spark/pull/43258





Re: [PR] [SPARK-45449][SQL] Cache Invalidation Issue with JDBC Table [spark]

2023-10-09 Thread via GitHub


cloud-fan commented on PR #43258:
URL: https://github.com/apache/spark/pull/43258#issuecomment-1754497294

   thanks, merging to master/3.5!





Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]

2023-10-09 Thread via GitHub


cloud-fan closed pull request #43270: [SPARK-45205][SQL] CommandResultExec to 
override iterator methods to avoid triggering multiple jobs.
URL: https://github.com/apache/spark/pull/43270





Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]

2023-10-09 Thread via GitHub


cloud-fan commented on PR #43270:
URL: https://github.com/apache/spark/pull/43270#issuecomment-1754493893

   thanks, merging to master/3.5!





Re: [PR] [SPARK-43664][CONNECT][PS][DOCS] Add note for `ps.sql` [spark]

2023-10-09 Thread via GitHub


itholic commented on PR #43237:
URL: https://github.com/apache/spark/pull/43237#issuecomment-1754486647

   Applied suggestions & updated PR title/description.





Re: [PR] [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions [spark]

2023-10-09 Thread via GitHub


MaxGekk closed pull request #43271: [SPARK-45458][SQL] Convert 
IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions
URL: https://github.com/apache/spark/pull/43271





Re: [PR] [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions [spark]

2023-10-09 Thread via GitHub


MaxGekk commented on PR #43271:
URL: https://github.com/apache/spark/pull/43271#issuecomment-1754473107

   +1, LGTM. Merging to master.
   Thank you, @panbingkun and @cloud-fan for review.





Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


viirya commented on PR #43302:
URL: https://github.com/apache/spark/pull/43302#issuecomment-1754467148

   Thank you @holdenk @beliefer @dongjoon-hyun @MaxGekk 





[PR] [SPARK-45477][INFRA] Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming [spark]

2023-10-09 Thread via GitHub


LuciferYang opened a new pull request, #43305:
URL: https://github.com/apache/spark/pull/43305

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   





Re: [PR] [SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #43304:
URL: https://github.com/apache/spark/pull/43304#issuecomment-1754455150

   > QQ: is there an E2E test for this change? 
   
   Added a simple test.
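   
   For context, a minimal sketch of the API swap named in the PR title (an assumed shape, not the actual JdbcUtils code): the write acts on the DataFrame directly instead of dropping to `df.rdd`. `savePartition` here is a hypothetical helper standing in for the real JDBC write logic.
   
   ```scala
   import org.apache.spark.sql.{DataFrame, Row}
   
   // Hedged sketch: Dataset.foreachPartition keeps the action on the Dataset API
   // instead of going through df.rdd.foreachPartition.
   def writeViaDataFrame(df: DataFrame)(savePartition: Iterator[Row] => Unit): Unit =
     df.foreachPartition(savePartition) // previously: df.rdd.foreachPartition(...)
   ```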





Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754457307

   @dongjoon-hyun @LuciferYang @MaxGekk Thank you!





Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754448693

   Merged to master for Apache Spark 4.0.0~





Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun closed pull request #43296: [SPARK-45470][SQL] Avoid paste string 
value of hive orc compression kind
URL: https://github.com/apache/spark/pull/43296





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754445286

   Thank you, @beliefer ~





Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754437003

   > Got it. Are we going to do the same clean-up for the other data sources 
like Parquet? Or Avro in `avro` module?
   
   Yes. I will do it.
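   
   As a minimal sketch of that clean-up (assuming the usual Hive enum; the exact call sites touched by this PR may differ), the idea is to derive the string from `CompressionKind` instead of pasting it:
   
   ```scala
   import org.apache.hadoop.hive.ql.io.orc.CompressionKind
   
   object OrcCompressionSketch {
     // Hedged example: refer to the enum constant rather than a pasted string.
     val pasted: String  = "ZLIB"                      // the magic string being avoided
     val derived: String = CompressionKind.ZLIB.name() // "ZLIB", derived from the enum
   }
   ```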





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351528465


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -208,24 +208,63 @@ private[deploy] object JsonProtocol {
* `completeddrivers` a list of Json objects of [[DriverInfo]] of 
the completed drivers
* of the master
* `status` status of the master,
-   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
+   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]].
+   * If `field` is not None, the Json object will contain the matched 
field.

Review Comment:
   Got it.






Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub


LuciferYang commented on PR #43289:
URL: https://github.com/apache/spark/pull/43289#issuecomment-1754435400

   Thank you for your confirmation @dongjoon-hyun  ~
   





Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43289:
URL: https://github.com/apache/spark/pull/43289#issuecomment-1754434466

   Yes, a `re-trigger` will help. I saw the same situation after upgrading to the recent Oracle image, but haven't checked the root cause yet, @LuciferYang.





Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub


LuciferYang commented on PR #43289:
URL: https://github.com/apache/spark/pull/43289#issuecomment-1754433043

   > @LuciferYang the docker integration tests are consistently failing with
   > 
   > ```
   > [info] *** 2 SUITES ABORTED ***
   > [error] Error during tests:
   > [error]org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
   > [error]org.apache.spark.sql.jdbc.OracleIntegrationSuite
   > ```
   > 
   > I don't know if they are flaky but it seems to be unrelated to this PR
   
   https://github.com/apache/spark/pull/43123#issuecomment-1752118564
   
   Friendly ping @dongjoon-hyun, it seems like this is a flaky test case? I noticed that you have also encountered similar failures.
   
   @hasnain-db Can you manually re-trigger `Run Docker integration tests`?





Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


viirya commented on code in PR #43302:
URL: https://github.com/apache/spark/pull/43302#discussion_r1351517603


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:
##
@@ -963,4 +964,18 @@ class MathExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), 
null)
 checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), 
null)
   }
+
+  test("SPARK-45473: Non-constant scale to Round") {

Review Comment:
   Removed. Thanks.






Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub


itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351516059


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -110,10 +110,6 @@ object TableOutputResolver {
   resolveColumnsByPosition(tableName, query.output, actualExpectedCols, 
conf, errors += _)
 }
 
-if (errors.nonEmpty) {

Review Comment:
   Makes sense to me. Created a new JIRA, SPARK-45476, and updated the PR title & description.






Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754419226

   Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in the `avro` module?





Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub


HyukjinKwon closed pull request #43257: [SPARK-45450][PYTHON] Fix imports 
according to PEP8: pyspark.pandas and pyspark (core)
URL: https://github.com/apache/spark/pull/43257





Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #43257:
URL: https://github.com/apache/spark/pull/43257#issuecomment-1754413902

   Merged to master.





Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


MaxGekk commented on code in PR #43302:
URL: https://github.com/apache/spark/pull/43302#discussion_r1351508855


##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:
##
@@ -963,4 +964,18 @@ class MathExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), 
null)
 checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), 
null)
   }
+
+  test("SPARK-45473: Non-constant scale to Round") {

Review Comment:
   This is not needed since there are tests in 
ExpressionTypeCheckingSuite.scala.






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754410308

   Thank you, @viirya !





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351506716


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -208,24 +208,63 @@ private[deploy] object JsonProtocol {
* `completeddrivers` a list of Json objects of [[DriverInfo]] of 
the completed drivers
* of the master
* `status` status of the master,
-   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
+   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]].
+   * If `field` is not None, the Json object will contain the matched 
field.

Review Comment:
   Thank you. I re-read this, @beliefer.
   
   In this paragraph, it's a logical continuation of three sentences: (1) if `field` is None, (2) if it is not None, and (2-1) if `field` doesn't match. Since (2) already mentions `will contain the matched field` and the very next (and final) sentence covers the `unmatched` case, I believe the as-is wording is better. These three follow the end of the previous sentences.
   ```
   a Json object containing the following fields if `field` is None: ...
   ```
   ```
   If `field` is not None, the Json object will contain the matched field.
   ```
   ```
   If `field` doesn't match, the Json object `(field -> "")` is returned.
   ```
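   
   Put together, a minimal illustration of the three cases, using the `writeMasterState(field, obj)` signature from the diff quoted further down; `state` is an assumed `MasterStateResponse`, and `JsonProtocol` is `private[deploy]`, so this only compiles inside that package:
   
   ```scala
   val full      = JsonProtocol.writeMasterState(None, state)           // every field
   val matched   = JsonProtocol.writeMasterState(Some("status"), state) // ("status" -> ...)
   val unmatched = JsonProtocol.writeMasterState(Some("bogus"), state)  // ("bogus" -> "")
   ```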









Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


viirya commented on PR #43302:
URL: https://github.com/apache/spark/pull/43302#issuecomment-1754393924

   Thanks, @dongjoon-hyun. I've fixed `ExpressionTypeCheckingSuite`.





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351472659


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -208,24 +208,63 @@ private[deploy] object JsonProtocol {
* `completeddrivers` a list of Json objects of [[DriverInfo]] of 
the completed drivers
* of the master
* `status` status of the master,
-   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
+   * see [[org.apache.spark.deploy.master.RecoveryState.MasterState]].
+   * If `field` is not None, the Json object will contain the matched 
field.

Review Comment:
   `is not None` -> `is matched`?






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754380347

   Thank you for review and approval, @yaooqinn !





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754367501

   Thank you again. The arguments are now switched and the default value is given, @beliefer.





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351440730


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   Sure. Thanks.






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351432373


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   Uh, I missed the signature. Could we put `field: Option[String]` as the last parameter? It seems giving it a default of `None` is better.
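   
   A sketch of the suggested shape (body elided; the real logic stays in `JsonProtocol.scala`, and `MasterStateResponse` is `private[deploy]`, so this is for illustration only): moving `field` to the last position with a `None` default keeps existing `writeMasterState(obj)` callers source-compatible.
   
   ```scala
   import org.apache.spark.deploy.DeployMessages.MasterStateResponse
   import org.json4s.JsonAST.JObject
   
   object JsonProtocolSketch {
     // Hedged sketch of the proposed signature only, not the merged code.
     def writeMasterState(obj: MasterStateResponse, field: Option[String] = None): JObject = ???
   }
   ```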






Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]

2023-10-09 Thread via GitHub


yorksity commented on PR #43270:
URL: https://github.com/apache/spark/pull/43270#issuecomment-1754356006

   
![image](https://github.com/apache/spark/assets/38931534/a28c65ff-fc71-4eff-948f-977efb505c4e)
   All tests passed.





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754340435

   Thank you for review. The document is updated too, @yaooqinn .





Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


yaooqinn commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351374027


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {
 val aliveWorkers = obj.workers.filter(_.isAlive())
-("url" -> obj.uri) ~
-("workers" -> obj.workers.toList.map(writeWorkerInfo)) ~
-("aliveworkers" -> aliveWorkers.length) ~
-("cores" -> aliveWorkers.map(_.cores).sum) ~
-("coresused" -> aliveWorkers.map(_.coresUsed).sum) ~
-("memory" -> aliveWorkers.map(_.memory).sum) ~
-("memoryused" -> aliveWorkers.map(_.memoryUsed).sum) ~
-("resources" -> 
aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo)) ~
-("resourcesused" -> 
aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo)) ~
-("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo)) ~
-("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo)) ~
-("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo)) ~
-("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo)) ~
-("status" -> obj.status.toString)
+field match {
+  case None =>
+("url" -> obj.uri) ~
+("workers" -> obj.workers.toList.map (writeWorkerInfo) ) ~
+("aliveworkers" -> aliveWorkers.length) ~
+("cores" -> aliveWorkers.map (_.cores).sum) ~
+("coresused" -> aliveWorkers.map (_.coresUsed).sum) ~
+("memory" -> aliveWorkers.map (_.memory).sum) ~
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum) ~
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) ) ~
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) ) ~
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) ) ~
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) ) ~
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) ) ~
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) ) ~
+("status" -> obj.status.toString)
+  case Some(field) =>
+field match {
+  case "url" =>
+("url" -> obj.uri)
+  case "workers" =>
+("workers" -> obj.workers.toList.map (writeWorkerInfo) )
+  case "aliveworkers" =>
+("aliveworkers" -> aliveWorkers.length)
+  case "cores" =>
+("cores" -> aliveWorkers.map (_.cores).sum)
+  case "coresused" =>
+("coresused" -> aliveWorkers.map (_.coresUsed).sum)
+  case "memory" =>
+("memory" -> aliveWorkers.map (_.memory).sum)
+  case "memoryused" =>
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum)
+  case "resources" =>
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) )
+  case "resourcesused" =>
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) )
+  case "activeapps" =>
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) )
+  case "completedapps" =>
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) )
+  case "activedrivers" =>
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) 
)
+  case "completeddrivers" =>
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) )
+  case "status" =>
+("status" -> obj.status.toString)
+  case field => (field -> "")

Review Comment:
   OK






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351373013


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   Sure~






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


yaooqinn commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351371713


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   I mean the `@return` part above 






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351369480


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {
 val aliveWorkers = obj.workers.filter(_.isAlive())
-("url" -> obj.uri) ~
-("workers" -> obj.workers.toList.map(writeWorkerInfo)) ~
-("aliveworkers" -> aliveWorkers.length) ~
-("cores" -> aliveWorkers.map(_.cores).sum) ~
-("coresused" -> aliveWorkers.map(_.coresUsed).sum) ~
-("memory" -> aliveWorkers.map(_.memory).sum) ~
-("memoryused" -> aliveWorkers.map(_.memoryUsed).sum) ~
-("resources" -> 
aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo)) ~
-("resourcesused" -> 
aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo)) ~
-("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo)) ~
-("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo)) ~
-("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo)) ~
-("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo)) ~
-("status" -> obj.status.toString)
+field match {
+  case None =>
+("url" -> obj.uri) ~
+("workers" -> obj.workers.toList.map (writeWorkerInfo) ) ~
+("aliveworkers" -> aliveWorkers.length) ~
+("cores" -> aliveWorkers.map (_.cores).sum) ~
+("coresused" -> aliveWorkers.map (_.coresUsed).sum) ~
+("memory" -> aliveWorkers.map (_.memory).sum) ~
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum) ~
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) ) ~
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) ) ~
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) ) ~
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) ) ~
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) ) ~
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) ) ~
+("status" -> obj.status.toString)
+  case Some(field) =>
+field match {
+  case "url" =>
+("url" -> obj.uri)
+  case "workers" =>
+("workers" -> obj.workers.toList.map (writeWorkerInfo) )
+  case "aliveworkers" =>
+("aliveworkers" -> aliveWorkers.length)
+  case "cores" =>
+("cores" -> aliveWorkers.map (_.cores).sum)
+  case "coresused" =>
+("coresused" -> aliveWorkers.map (_.coresUsed).sum)
+  case "memory" =>
+("memory" -> aliveWorkers.map (_.memory).sum)
+  case "memoryused" =>
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum)
+  case "resources" =>
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) )
+  case "resourcesused" =>
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) )
+  case "activeapps" =>
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) )
+  case "completedapps" =>
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) )
+  case "activedrivers" =>
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) 
)
+  case "completeddrivers" =>
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) )
+  case "status" =>
+("status" -> obj.status.toString)
+  case field => (field -> "")

Review Comment:
   This is intentional instead of 404.






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351369382


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   Do you mean the documentation?






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


yaooqinn commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351369271


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {
 val aliveWorkers = obj.workers.filter(_.isAlive())
-("url" -> obj.uri) ~
-("workers" -> obj.workers.toList.map(writeWorkerInfo)) ~
-("aliveworkers" -> aliveWorkers.length) ~
-("cores" -> aliveWorkers.map(_.cores).sum) ~
-("coresused" -> aliveWorkers.map(_.coresUsed).sum) ~
-("memory" -> aliveWorkers.map(_.memory).sum) ~
-("memoryused" -> aliveWorkers.map(_.memoryUsed).sum) ~
-("resources" -> 
aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo)) ~
-("resourcesused" -> 
aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo)) ~
-("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo)) ~
-("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo)) ~
-("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo)) ~
-("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo)) ~
-("status" -> obj.status.toString)
+field match {
+  case None =>
+("url" -> obj.uri) ~
+("workers" -> obj.workers.toList.map (writeWorkerInfo) ) ~
+("aliveworkers" -> aliveWorkers.length) ~
+("cores" -> aliveWorkers.map (_.cores).sum) ~
+("coresused" -> aliveWorkers.map (_.coresUsed).sum) ~
+("memory" -> aliveWorkers.map (_.memory).sum) ~
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum) ~
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) ) ~
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) ) ~
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) ) ~
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) ) ~
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) ) ~
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) ) ~
+("status" -> obj.status.toString)
+  case Some(field) =>
+field match {
+  case "url" =>
+("url" -> obj.uri)
+  case "workers" =>
+("workers" -> obj.workers.toList.map (writeWorkerInfo) )
+  case "aliveworkers" =>
+("aliveworkers" -> aliveWorkers.length)
+  case "cores" =>
+("cores" -> aliveWorkers.map (_.cores).sum)
+  case "coresused" =>
+("coresused" -> aliveWorkers.map (_.coresUsed).sum)
+  case "memory" =>
+("memory" -> aliveWorkers.map (_.memory).sum)
+  case "memoryused" =>
+("memoryused" -> aliveWorkers.map (_.memoryUsed).sum)
+  case "resources" =>
+("resources" -> aliveWorkers.map (_.resourcesInfo).toList.map 
(writeResourcesInfo) )
+  case "resourcesused" =>
+("resourcesused" ->
+  aliveWorkers.map (_.resourcesInfoUsed).toList.map 
(writeResourcesInfo) )
+  case "activeapps" =>
+("activeapps" -> obj.activeApps.toList.map (writeApplicationInfo) )
+  case "completedapps" =>
+("completedapps" -> obj.completedApps.toList.map 
(writeApplicationInfo) )
+  case "activedrivers" =>
+("activedrivers" -> obj.activeDrivers.toList.map (writeDriverInfo) 
)
+  case "completeddrivers" =>
+("completeddrivers" -> obj.completedDrivers.toList.map 
(writeDriverInfo) )
+  case "status" =>
+("status" -> obj.status.toString)
+  case field => (field -> "")

Review Comment:
   Use 404?






Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


yaooqinn commented on code in PR #43303:
URL: https://github.com/apache/spark/pull/43303#discussion_r1351369115


##
core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:
##
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
* `status` status of the master,
* see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
*/
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): 
JObject = {

Review Comment:
   Shall we update the annotation, too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754322080

   Just a question: is this required for SPARK-44114? Or do you simply want to remove the magic strings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754308194

   Thank you for review, @beliefer !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]

2023-10-09 Thread via GitHub


hasnain-db commented on PR #43289:
URL: https://github.com/apache/spark/pull/43289#issuecomment-1754299341

   @LuciferYang the docker integration tests are consistently failing with
   
   ```
   [info] *** 2 SUITES ABORTED ***
   [error] Error during tests:
   [error]  org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
   [error]  org.apache.spark.sql.jdbc.OracleIntegrationSuite
   ```
   
   I don't know if they are flaky, but they seem to be unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector elements [spark]

2023-10-09 Thread via GitHub


srowen commented on PR #43288:
URL: https://github.com/apache/spark/pull/43288#issuecomment-1754292488

   Is that really an error? It's the only way to represent a missing value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action. [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #39976:
URL: https://github.com/apache/spark/pull/39976#issuecomment-1754286405

   Thanks. Made a followup: https://github.com/apache/spark/pull/43304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils [spark]

2023-10-09 Thread via GitHub


HyukjinKwon opened a new pull request, #43304:
URL: https://github.com/apache/spark/pull/43304

   ### What changes were proposed in this pull request?
   
   This PR is kind of a followup for https://github.com/apache/spark/pull/39976; it addresses the comment at https://github.com/apache/spark/pull/39976#issuecomment-1752930380.
   
   ### Why are the changes needed?
   
   In order to properly assign the SQL execution ID so that `df.observe` works with this.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `df.observe` will work with JDBC connectors.
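
   For illustration, a hedged sketch of what this enables (assuming `df` is an existing DataFrame; the metric name is made up):

   ```
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.functions._

   // Dataset.foreachPartition runs under a SQL execution ID, so the observed
   // metrics below are actually reported to listeners.
   val observed = df.observe("jdbc_write", count(lit(1)).as("rows"))
   observed.foreachPartition { rows: Iterator[Row] =>
     rows.foreach(_ => ()) // stand-in for the per-partition JDBC write loop
   }
   ```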
   
   ### How was this patch tested?
   
   Manually tested.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351289594


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   I would just throw an exception here since this is a followup of SPARK-42309. If we need to fix a different issue, we should file another JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun commented on PR #43303:
URL: https://github.com/apache/spark/pull/43303#issuecomment-1754272348

   Could you review this when you have some time, @viirya ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45417][PYTHON] Make InheritableThread inherit the active session [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #43231:
URL: https://github.com/apache/spark/pull/43231#issuecomment-1754268341

   Mind taking a look at the linter failure? 
https://github.com/clee704/spark/actions/runs/6462049944/job/17543011013


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-45474][CORE] Support top-level filtering in MasterPage JSON API [spark]

2023-10-09 Thread via GitHub


dongjoon-hyun opened a new pull request, #43303:
URL: https://github.com/apache/spark/pull/43303

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #43257:
URL: https://github.com/apache/spark/pull/43257#issuecomment-1754265070

   Yeah, I think we should add this to a linter ..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector values [spark]

2023-10-09 Thread via GitHub


zhengruifeng commented on PR #43288:
URL: https://github.com/apache/spark/pull/43288#issuecomment-1754259142

   > Does this change behavior or just refactor code? I'm actually surprised 
that NaN is treated as an error but looks like that is existing behavior?
   
   It is a behavior change.
   Before this PR, NaN/null doubles and null vectors were treated as errors, but NaN values inside a vector were ignored.
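
   For illustration, a minimal sketch of the asymmetry described above (data is made up; `spark` is an active SparkSession):

   ```
   import org.apache.spark.ml.feature.VectorAssembler
   import org.apache.spark.ml.linalg.Vectors

   // Per the description above, a NaN double or a null vector was already an
   // error, while a NaN *inside* a vector slipped through before this PR.
   val df = spark.createDataFrame(Seq(
     (1.0, Vectors.dense(1.0, Double.NaN)), // NaN inside a vector
     (2.0, Vectors.dense(1.0, 2.0))
   )).toDF("d", "v")

   new VectorAssembler()
     .setInputCols(Array("d", "v"))
     .setOutputCol("features")
     .setHandleInvalid("error") // the default
     .transform(df)
     .show(false)
   ```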


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]

2023-10-09 Thread via GitHub


HeartSaVioR closed pull request #43299: [SPARK-45472][SS] RocksDB State Store 
Doesn't Need to Recheck checkpoint path existence
URL: https://github.com/apache/spark/pull/43299


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]

2023-10-09 Thread via GitHub


HeartSaVioR commented on PR #43299:
URL: https://github.com/apache/spark/pull/43299#issuecomment-1754222412

   Thanks! Merging to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub


HeartSaVioR commented on PR #43174:
URL: https://github.com/apache/spark/pull/43174#issuecomment-1754205290

   Just ported the fix back to 3.5 as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub


HeartSaVioR closed pull request #43174: [SPARK-45419][SS] Avoid reusing rocksdb 
sst files in a different rocksdb instance
URL: https://github.com/apache/spark/pull/43174


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43280:
URL: https://github.com/apache/spark/pull/43280#discussion_r1351195382


##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -2580,7 +2580,7 @@ private[spark] class DAGScheduler(
     // if the cluster manager explicitly tells us that the entire worker was lost, then
     // we know to unregister shuffle output.  (Note that "worker" specifically refers to the process
     // from a Standalone cluster, where the shuffle service lives in the Worker.)
-    val fileLost = !sc.shuffleDriverComponents.supportsReliableStorage() &&
+    val fileLost = !sc.shuffleDriverComponents.supportsReliableStorage(execId) &&

Review Comment:
   I have the same issue. 
https://github.com/apache/spark/pull/43280#discussion_r134998



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a different rocksdb instance [spark]

2023-10-09 Thread via GitHub


HeartSaVioR commented on PR #43174:
URL: https://github.com/apache/spark/pull/43174#issuecomment-1754204031

   Thanks! Merging to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43039:
URL: https://github.com/apache/spark/pull/43039#discussion_r1351192772


##
python/pyspark/sql/dataframe.py:
##
@@ -2646,67 +2647,147 @@ def join(
 
 Examples
 
-        The following performs a full outer join between ``df1`` and ``df2``.
+        The following examples demonstrate various join types between ``df1`` and ``df2``.

Review Comment:
   I am neutral on this change here.
   
   Overall, LGTM.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #42763:
URL: https://github.com/apache/spark/pull/42763#discussion_r1351177061


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -455,6 +457,19 @@ object TableOutputResolver {
 }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")

Review Comment:
   I know that. We can extend the example above to satisfy the condition you 
mentioned.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #42398:
URL: https://github.com/apache/spark/pull/42398#discussion_r1351167115


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(
+    child: Expression,
+    delimiter: Expression = Literal.create(",", StringType),
+    reverse: Boolean = false,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0) extends TypedAggregateWithHashMapAsBuffer
+  with UnaryLike[Expression] {
+
+  def this(child: Expression) = this(child, Literal.create(",", StringType), false, 0, 0)
+  def this(child: Expression, delimiter: Expression) = this(child, delimiter, false, 0, 0)
+
+  override def update(
+      buffer: OpenHashMap[AnyRef, Long],

Review Comment:
   I think this issue would be avoided if we refactor with `CollectList` or by extending `Collect`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub


panbingkun commented on PR #43286:
URL: https://github.com/apache/spark/pull/43286#issuecomment-1754193939

   > Thank you @panbingkun! Can we regenerate the docs and add the canonical 
links to the released docs HTML?
   
   Regarding the published documentation HTML, the following PR is in progress:
   https://github.com/apache/spark-website/pull/482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


beliefer commented on PR #43296:
URL: https://github.com/apache/spark/pull/43296#issuecomment-1754189027

   > Ur, very sorry, but I'd not do this ORC change when we do the same thing still for Parquet, @beliefer. This only increases the compile dependency on `org.apache.hadoop.hive.ql.io.orc.CompressionKind`. We prefer the existing loose-coupled way.
   
   All the changes in the hive module already depend on `hive-exec`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub


Hisoka-X commented on code in PR #42763:
URL: https://github.com/apache/spark/pull/42763#discussion_r1351138719


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -455,6 +457,19 @@ object TableOutputResolver {
 }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")

Review Comment:
   In the situation you mentioned, the warning message will not show up. It only works when all the field names match but their order differs, as in the sketch below.
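
   A small sketch of the triggering case (table names are hypothetical):

   ```
   spark.sql("CREATE TABLE t (a INT, b INT)")
   spark.sql("CREATE TABLE s (b INT, a INT)")
   // Both sides have exactly the columns {a, b}, but the positional order
   // differs, so this positional INSERT would log the proposed warning.
   spark.sql("INSERT INTO t SELECT * FROM s")
   ```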



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43296:
URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:
##
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
   val path = s"${dir.getCanonicalPath}/table1"
   val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
   df.write
-.option("compression", "ZlIb")
+.option("compression", CompressionKind.ZLIB.name())

Review Comment:
   I will adjust it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #43296:
URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985


##
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:
##
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
   val path = s"${dir.getCanonicalPath}/table1"
   val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
   df.write
-.option("compression", "ZlIb")
+.option("compression", CompressionKind.ZLIB.name())

Review Comment:
   I will revert this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]

2023-10-09 Thread via GitHub


beliefer commented on code in PR #42763:
URL: https://github.com/apache/spark/pull/42763#discussion_r1351135259


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -455,6 +457,19 @@ object TableOutputResolver {
 }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")

Review Comment:
   Users often write inserts where the table columns don't match the query columns by name. For example, take table `student(name: STRING, address: STRING)` and table `location(code: STRING, name: STRING)`, where `student.address` references `location.name`.
   I am neutral on this warning message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat [spark]

2023-10-09 Thread via GitHub


Hisoka-X commented on code in PR #43243:
URL: https://github.com/apache/spark/pull/43243#discussion_r1351132582


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala:
##
@@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends 
Serializable {
 // We can only parse the value as TimestampNTZType if it does not have 
zone-offset or
 // time-zone component and can be parsed with the timestamp formatter.
 // Otherwise, it is likely to be a timestamp with timezone.
-if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, 
false).isDefined) {
-  SQLConf.get.timestampType
+val timestampType = SQLConf.get.timestampType
+if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY ||
+timestampType == TimestampNTZType) &&
+timestampNTZFormatter.parseWithoutTimeZoneOptional(field, 
false).isDefined) {

Review Comment:
   1. Because the `LEGACY` behavior uses `timestampNTZFormatter` to parse timestamps, I don't change it in `LEGACY` mode. Without this, some test cases like `CSVLegacyTimeParserSuite.SPARK-37326: Timestamp type inference for a column with TIMESTAMP_NTZ values` can't pass. https://github.com/Hisoka-X/spark/runs/17462554632
   2. It should be `(legacyTimeParserPolicy = LEGACY || timestampType == TimestampNTZType)`, not `(legacyTimeParserPolicy = LEGACY || timestampType == TimestampLTZType)`, if I understand correctly. A sketch follows below.
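
   For illustration, a hedged sketch of the inference path under discussion (data and format are made up; `spark` is an active session):

   ```
   import spark.implicits._

   val data = Seq("ts", "2023-10-09 12:00:00").toDS() // header plus one value
   val df = spark.read
     .option("header", "true")
     .option("inferSchema", "true")
     .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss") // does not match the value
     .csv(data)
   // With the default TIMESTAMP_LTZ type, the column should now fall back to
   // string instead of being inferred as timestamp_ntz.
   df.printSchema()
   ```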



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub


Hisoka-X commented on PR #42762:
URL: https://github.com/apache/spark/pull/42762#issuecomment-1754157169

   cc @HyukjinKwon @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]

2023-10-09 Thread via GitHub


itholic commented on code in PR #42762:
URL: https://github.com/apache/spark/pull/42762#discussion_r1351085843


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:
##
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {

Review Comment:
   Yeah, I think we can just remove it. Sorry for the delay; I've been stuck on Pandas 2.x support on the PySpark side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]

2023-10-09 Thread via GitHub


holdenk commented on PR #43257:
URL: https://github.com/apache/spark/pull/43257#issuecomment-1754124296

   Also could we add automated checking for the import order change so that we 
don't have to do this again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub


holdenk commented on code in PR #42398:
URL: https://github.com/apache/spark/pull/42398#discussion_r1351071287


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(
+    child: Expression,
+    delimiter: Expression = Literal.create(",", StringType),
+    reverse: Boolean = false,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0) extends TypedAggregateWithHashMapAsBuffer
+  with UnaryLike[Expression] {
+
+  def this(child: Expression) = this(child, Literal.create(",", StringType), false, 0, 0)
+  def this(child: Expression, delimiter: Expression) = this(child, delimiter, false, 0, 0)
+
+  override def update(
+      buffer: OpenHashMap[AnyRef, Long],

Review Comment:
   So the idea with that is to bubble up the distinct to avoid a shuffle?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2023-10-09 Thread via GitHub


holdenk commented on code in PR #42398:
URL: https://github.com/apache/spark/pull/42398#discussion_r1351070023


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:
##
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(

Review Comment:
   Similarly, let's put this in collect.scala with the other collect friends.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]

2023-10-09 Thread via GitHub


holdenk commented on code in PR #43039:
URL: https://github.com/apache/spark/pull/43039#discussion_r1351066467


##
python/pyspark/sql/dataframe.py:
##
@@ -2646,67 +2647,147 @@ def join(
 
 Examples
 
-        The following performs a full outer join between ``df1`` and ``df2``.
+        The following examples demonstrate various join types between ``df1`` and ``df2``.

Review Comment:
   nit: it's not just df1 and df2 (no need to fix unless other changes)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44837][SQL] Improve ALTER TABLE ALTER PARTITION column error message [spark]

2023-10-09 Thread via GitHub


holdenk commented on code in PR #42524:
URL: https://github.com/apache/spark/pull/42524#discussion_r1351065858


##
common/utils/src/main/resources/error/error-classes.json:
##
@@ -97,6 +97,12 @@
   "The method  can not be called on streaming 
Dataset/DataFrame."
 ]
   },
+  "CANNOT_ALTER_PARTITION_COLUMN" : {
+"message" : [
+  "ALTER TABLE (ALTER|CHANGE) COLUMN is not supported for partition 
columns, but found the partition column  in the table "
+],

Review Comment:
   Would it be good to clarify that it is not supported with the catalog of the 
specified table? e.g. sometimes it is supported with the right catalog.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


holdenk commented on PR #43302:
URL: https://github.com/apache/spark/pull/43302#issuecomment-1754111612

   LGTM pending CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor [spark]

2023-10-09 Thread via GitHub


holdenk commented on PR #38852:
URL: https://github.com/apache/spark/pull/38852#issuecomment-1754108618

   Maybe the `NettyBlockTransferServiceSuite` would be enough, so we can assert that we are keeping track of active streams and also, once done, that the count is zero (so we know this is not blocking shutdown indefinitely). WDYT? A rough sketch is below.
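
   A rough sketch of such an assertion (the suite setup and the `numActiveStreams` accessor are hypothetical, not existing APIs):

   ```
   test("decommission waits for active shuffle fetch streams") {
     val service = createService()         // hypothetical helper building the service
     openFetchStream(service)              // hypothetical: start an in-flight fetch
     assert(service.numActiveStreams > 0)  // bookkeeping tracks active streams
     service.close()
     assert(service.numActiveStreams == 0) // shutdown is not blocked indefinitely
   }
   ```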


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey [spark]

2023-10-09 Thread via GitHub


github-actions[bot] closed pull request #40334: [SPARK-42716][SQL] DataSourceV2 
supports reporting key-grouped partitioning without HasPartitionKey
URL: https://github.com/apache/spark/pull/40334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]

2023-10-09 Thread via GitHub


HyukjinKwon closed pull request #42859: [SPARK-43299][SS][CONNECT] Convert 
StreamingQueryException in Scala Client
URL: https://github.com/apache/spark/pull/42859


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]

2023-10-09 Thread via GitHub


HyukjinKwon commented on PR #42859:
URL: https://github.com/apache/spark/pull/42859#issuecomment-1754064070

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub


ueshin commented on code in PR #43204:
URL: https://github.com/apache/spark/pull/43204#discussion_r1350999129


##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##
@@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {
       val schema = DataType.fromJson(
         PythonWorkerUtils.readUTF(length, dataIn)).asInstanceOf[StructType]
 
+      // Receive the pickled AnalyzeResult buffer, if any.
+      val pickledAnalyzeResult: Option[Array[Byte]] = dataIn.readInt() match {
+        case 0 =>
+          None

Review Comment:
   So it won't happen here, as this is running the `analyze` method.
   I think `PythonUDTFAnalyzeResult.pickledAnalyzeResult` should be `Array[Byte]`, and it should be wrapped with `Some` in `Analyzer`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub


dtenedor commented on code in PR #43204:
URL: https://github.com/apache/spark/pull/43204#discussion_r1350991334


##
python/pyspark/worker.py:
##
@@ -693,6 +698,21 @@ def read_udtf(pickleSer, infile, eval_type):
 f"The return type of a UDTF must be a struct type, but got 
{type(return_type)}."
 )
 
+# Update the handler that creates a new UDTF instance to first try calling 
the UDTF constructor
+# with one argument containing the previous AnalyzeResult. If that fails, 
then try a constructor
+# with no arguments. In this way each UDTF class instance can decide if it 
wants to inspect the
+# AnalyzeResult.
+if has_pickled_analyze_result:
+prev_handler = handler
+
+def construct_udtf():
+try:
+return prev_handler(pickled_analyze_result)

Review Comment:
   Sounds good, done.



##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##
@@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {
       val schema = DataType.fromJson(
         PythonWorkerUtils.readUTF(length, dataIn)).asInstanceOf[StructType]
 
+      // Receive the pickled AnalyzeResult buffer, if any.
+      val pickledAnalyzeResult: Option[Array[Byte]] = dataIn.readInt() match {
+        case 0 =>
+          None

Review Comment:
   This should not happen unless the UDTF does not include an `analyze` method, in which case there is no pickled `AnalyzeResult` buffer.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala:
##
@@ -241,12 +248,17 @@ case class UnresolvedPolymorphicPythonUDTF(
  * @param orderByExpressions if non-empty, this contains the list of ordering items that the
  *                           'analyze' method explicitly indicated that the UDTF call should consume
  *                           the input table rows by
+ * @param pickledAnalyzeResult this is the pickled 'AnalyzeResult' instance from the UDTF, which
+ *                             contains all metadata returned by the Python UDTF 'analyze' method
+ *                             including the result schema of the function call as well as optional
+ *                             other information
  */
 case class PythonUDTFAnalyzeResult(
     schema: StructType,
     withSinglePartition: Boolean,
     partitionByExpressions: Seq[Expression],
-    orderByExpressions: Seq[SortOrder]) {
+    orderByExpressions: Seq[SortOrder],
+    pickledAnalyzeResult: Option[Array[Byte]]) {

Review Comment:
   I find we need this: if the UDTF does not include an `analyze` method, this buffer is empty, and the corresponding tests using such functions fail unless we check that the buffer is non-empty before using it.



##
python/pyspark/sql/udtf.py:
##
@@ -85,7 +85,7 @@ class OrderingColumn:
 overrideNullsFirst: Optional[bool] = None
 
 
-@dataclass(frozen=True)
+@dataclass

Review Comment:
   I found it was necessary to allow UDTFs to create subclasses of 
`AnalyzeResult`. This is the way that we plan for these functions to compute 
custom state in the `analyze` method just once per function call and then 
propagate this information to future class instances to be consumed by the 
`eval` and `terminate` methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] Minor: Fix incorrect error message for RoundBase [spark]

2023-10-09 Thread via GitHub


viirya opened a new pull request, #43302:
URL: https://github.com/apache/spark/pull/43302

   
   
   ### What changes were proposed in this pull request?
   
   
   This minor patch fixes incorrect error message of `RoundBase`.
   
   ### Why are the changes needed?
   
   
   Fix incorrect error message.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Unit test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub


allisonwang-db commented on PR #42425:
URL: https://github.com/apache/spark/pull/42425#issuecomment-1753943908

   @panbingkun thank you so much for working on this. It will be super helpful in improving the ranking of the PySpark documentation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]

2023-10-09 Thread via GitHub


allisonwang-db commented on PR #43286:
URL: https://github.com/apache/spark/pull/43286#issuecomment-1753939826

   Thank you @panbingkun! Can we regenerate the docs and add the canonical 
links to the released docs HTML?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub


mridulm commented on PR #43280:
URL: https://github.com/apache/spark/pull/43280#issuecomment-1753930980

   Btw, it is not very clear to me how this functionality is going to be leveraged - unless you are relying on resource profiles to tag which executors should leverage reliable shuffle and which should not.
   Spark will run tasks across stages on an executor - so there is no expectation of an executor id to stage mapping (except via resource profiles).
   
   Please add more details in the PR description on how this can be leveraged reliably.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]

2023-10-09 Thread via GitHub


ueshin commented on code in PR #43204:
URL: https://github.com/apache/spark/pull/43204#discussion_r1350811414


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala:
##
@@ -241,12 +248,17 @@ case class UnresolvedPolymorphicPythonUDTF(
  * @param orderByExpressions if non-empty, this contains the list of ordering items that the
  *                           'analyze' method explicitly indicated that the UDTF call should consume
  *                           the input table rows by
+ * @param pickledAnalyzeResult this is the pickled 'AnalyzeResult' instance from the UDTF, which
+ *                             contains all metadata returned by the Python UDTF 'analyze' method
+ *                             including the result schema of the function call as well as optional
+ *                             other information
  */
 case class PythonUDTFAnalyzeResult(
     schema: StructType,
     withSinglePartition: Boolean,
     partitionByExpressions: Seq[Expression],
-    orderByExpressions: Seq[SortOrder]) {
+    orderByExpressions: Seq[SortOrder],
+    pickledAnalyzeResult: Option[Array[Byte]]) {

Review Comment:
   We don't need `Option` here as it will always be set?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub


mridulm commented on code in PR #43280:
URL: https://github.com/apache/spark/pull/43280#discussion_r1350810420


##
core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java:
##
@@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}
    * Does this shuffle component support reliable storage - external to the lifecycle of the
    * executor host ? For example, writing shuffle data to a distributed filesystem or
    * persisting it in a remote shuffle service.
+   *
+   * Note: This method is for compatibility with older implementations,
+   * the newer implementation should not use this method.
    */
   default boolean supportsReliableStorage() {
     return false;
   }
+
+  /**
+   * Does this executor support reliable storage for all shuffle data.
+   * @param execId The executor id, null for validation use only.

Review Comment:
   The executor id should be valid - and Spark will pass a valid id when using this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]

2023-10-09 Thread via GitHub


mridulm commented on code in PR #43280:
URL: https://github.com/apache/spark/pull/43280#discussion_r1350809807


##
core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java:
##
@@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}
    * Does this shuffle component support reliable storage - external to the lifecycle of the
    * executor host ? For example, writing shuffle data to a distributed filesystem or
    * persisting it in a remote shuffle service.
+   *
+   * Note: This method is for compatibility with older implementations,
+   * the newer implementation should not use this method.

Review Comment:
   This comment addition is incorrect.
   A shuffle implementation can choose to be reliable irrespective of executor id, or to be reliable only for a specific subset of executor ids - that is a choice of the implementation, not a function of newer vs older. A sketch follows below.
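
   For illustration, a sketch of that per-executor choice (the class and the selection rule are illustrative; the `supportsReliableStorage(execId)` overload is the one proposed in this PR):

   ```
   import java.util.{Collections, Map => JMap}
   import org.apache.spark.shuffle.api.ShuffleDriverComponents

   // Reliable only for a chosen subset of executors, regardless of "newer vs older".
   class PartiallyReliableComponents(reliable: Set[String]) extends ShuffleDriverComponents {
     override def initializeApplication(): JMap[String, String] = Collections.emptyMap()
     override def cleanupApplication(): Unit = {}
     override def supportsReliableStorage(execId: String): Boolean = reliable.contains(execId)
   }
   ```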



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


