Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun closed pull request #43303: [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API URL: https://github.com/apache/spark/pull/43303
Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]
LuciferYang closed pull request #43282: [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata URL: https://github.com/apache/spark/pull/43282
Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]
LuciferYang commented on PR #43282: URL: https://github.com/apache/spark/pull/43282#issuecomment-1754510389 Merged into branch-3.3, thanks @pan3793 ~
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754509757 The `core` module passed, and this new feature is not invoked by the old code. Let me merge this. Thank you again! ![Screenshot 2023-10-09 at 11 43 51 PM](https://github.com/apache/spark/assets/9700541/56ccc35f-756a-40a3-aad8-b20e32cd0950)
Re: [PR] [SPARK-45389][SQL][3.3] Correct MetaException matching rule on getting partition metadata [spark]
pan3793 commented on PR #43282: URL: https://github.com/apache/spark/pull/43282#issuecomment-1754504959 CI passed; it seems there are some issues with GitHub Actions state sync: https://github.com/apache/spark/assets/26535726/a8030635-8474-4b05-82ac-6e9d71e12c62
Re: [PR] [SPARK-45389][SQL][3.4] Correct MetaException matching rule on getting partition metadata [spark]
LuciferYang closed pull request #43281: [SPARK-45389][SQL][3.4] Correct MetaException matching rule on getting partition metadata URL: https://github.com/apache/spark/pull/43281
Re: [PR] [SPARK-45389][SQL][3.4] Correct MetaException matching rule on getting partition metadata [spark]
LuciferYang commented on PR #43281: URL: https://github.com/apache/spark/pull/43281#issuecomment-1754501342 Merged into branch-3.4, thanks @pan3793
Re: [PR] [SPARK-45449][SQL] Cache Invalidation Issue with JDBC Table [spark]
cloud-fan closed pull request #43258: [SPARK-45449][SQL] Cache Invalidation Issue with JDBC Table URL: https://github.com/apache/spark/pull/43258
Re: [PR] [SPARK-45449][SQL] Cache Invalidation Issue with JDBC Table [spark]
cloud-fan commented on PR #43258: URL: https://github.com/apache/spark/pull/43258#issuecomment-1754497294 thanks, merging to master/3.5!
Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]
cloud-fan closed pull request #43270: [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. URL: https://github.com/apache/spark/pull/43270
Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]
cloud-fan commented on PR #43270: URL: https://github.com/apache/spark/pull/43270#issuecomment-1754493893 thanks, merging to master/3.5!
Re: [PR] [SPARK-43664][CONNECT][PS][DOCS] Add note for `ps.sql` [spark]
itholic commented on PR #43237: URL: https://github.com/apache/spark/pull/43237#issuecomment-1754486647 Applied suggestions & updated PR title/description.
Re: [PR] [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions [spark]
MaxGekk closed pull request #43271: [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions URL: https://github.com/apache/spark/pull/43271
Re: [PR] [SPARK-45458][SQL] Convert IllegalArgumentException to SparkIllegalArgumentException in bitwiseExpressions [spark]
MaxGekk commented on PR #43271: URL: https://github.com/apache/spark/pull/43271#issuecomment-1754473107 +1, LGTM. Merging to master. Thank you, @panbingkun and @cloud-fan for review.
Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]
viirya commented on PR #43302: URL: https://github.com/apache/spark/pull/43302#issuecomment-1754467148 Thank you @holdenk @beliefer @dongjoon-hyun @MaxGekk
[PR] [SPARK-45477][INFRA] Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming [spark]
LuciferYang opened a new pull request, #43305: URL: https://github.com/apache/spark/pull/43305

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils [spark]
HyukjinKwon commented on PR #43304: URL: https://github.com/apache/spark/pull/43304#issuecomment-1754455150

> QQ: is there an E2E test for this change?

Added a simple test.
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
beliefer commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754457307 @dongjoon-hyun @LuciferYang @MaxGekk Thank you!
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
dongjoon-hyun commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754448693 Merged to master for Apache Spark 4.0.0~
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
dongjoon-hyun closed pull request #43296: [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind URL: https://github.com/apache/spark/pull/43296
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754445286 Thank you, @beliefer ~
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
beliefer commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754437003

> Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in `avro` module?

Yes. I will do it.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351528465

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
@@ -208,24 +208,63 @@ private[deploy] object JsonProtocol {
    * `completeddrivers` a list of Json objects of [[DriverInfo]] of the completed drivers
    *          of the master
    * `status` status of the master,
-   *          see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
+   *          see [[org.apache.spark.deploy.master.RecoveryState.MasterState]].
+   * If `field` is not None, the Json object will contain the matched field.
```

Review Comment: Got it.
Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]
LuciferYang commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754435400 Thank you for your confirmation @dongjoon-hyun ~
Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]
dongjoon-hyun commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754434466 Yes, a re-trigger will help. I saw the same situation after upgrading to the recent Oracle image, but haven't checked the root cause yet, @LuciferYang.
Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]
LuciferYang commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754433043

> @LuciferYang the docker integration tests are consistently failing with
>
> ```
> [info] *** 2 SUITES ABORTED ***
> [error] Error during tests:
> [error]   org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
> [error]   org.apache.spark.sql.jdbc.OracleIntegrationSuite
> ```
>
> I don't know if they are flaky but it seems to be unrelated to this PR

https://github.com/apache/spark/pull/43123#issuecomment-1752118564

friendly ping @dongjoon-hyun, it seems like this is an unstable case? I noticed that you have also encountered similar failures. @hasnain-db Can you manually re-trigger `Run Docker integration tests`?
Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]
viirya commented on code in PR #43302: URL: https://github.com/apache/spark/pull/43302#discussion_r1351517603

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:

```
@@ -963,4 +964,18 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(WidthBucket(5.35, 0.024, Double.NegativeInfinity, 5L), null)
     checkEvaluation(WidthBucket(5.35, 0.024, Double.PositiveInfinity, 5L), null)
   }
+
+  test("SPARK-45473: Non-constant scale to Round") {
```

Review Comment: Removed. Thanks.
Re: [PR] [SPARK-45476][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]
itholic commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351516059

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
-    if (errors.nonEmpty) {
```

Review Comment: Makes sense to me. Created a new JIRA, SPARK-45476, and updated the PR title & description.
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
dongjoon-hyun commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754419226 Got it. Are we going to do the same clean-up for the other data sources like Parquet? Or Avro in the `avro` module?
Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]
HyukjinKwon closed pull request #43257: [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) URL: https://github.com/apache/spark/pull/43257
Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]
HyukjinKwon commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754413902 Merged to master.
Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]
MaxGekk commented on code in PR #43302: URL: https://github.com/apache/spark/pull/43302#discussion_r1351508855

## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:

```
+  test("SPARK-45473: Non-constant scale to Round") {
```

Review Comment: This is not needed since there are tests in ExpressionTypeCheckingSuite.scala.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754410308 Thank you, @viirya !
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351506716

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
+   * If `field` is not None, the Json object will contain the matched field.
```

Review Comment: Thank you. I re-read this, @beliefer. This paragraph is a logical continuation of three sentences: (1) if `field` is None, (2) if `field` is not None, and (2-1) if `field` doesn't match. Since (2) starts by mentioning `will contain the matched field`, and the very next (and final) sentence covers the unmatched case, I believe the AS-IS is better. These three follow the ends of the previous sentences:

```
a Json object containing the following fields if `field` is None:
...
```

```
If `field` is not None, the Json object will contain the matched field.
```

```
If `field` doesn't match, the Json object `(field -> "")` is returned.
```
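Taken together, the contract discussed above is easiest to see in miniature. The following is a hypothetical, self-contained sketch — the real `MasterStateResponse` is `private[deploy]` and carries many more fields than this stub — with only the `(field -> "")` fallback taken verbatim from the diff:

```scala
import org.json4s.JsonAST.{JInt, JObject, JString, JValue}

// Miniature model of the documented behavior; the field list is a stub.
def masterStateSketch(field: Option[String]): JObject = {
  val all: List[(String, JValue)] = List(
    "url"    -> JString("spark://master:7077"),
    "cores"  -> JInt(32),
    "status" -> JString("ALIVE"))
  field match {
    // Case (1): no filter, return the full object.
    case None => JObject(all)
    // Case (2): return only the matched field; case (2-1): an unknown
    // field name yields (field -> "") rather than an error.
    case Some(f) => JObject(List(all.find(_._1 == f).getOrElse(f -> JString(""))))
  }
}

// masterStateSketch(None)          => {"url":"spark://master:7077","cores":32,"status":"ALIVE"}
// masterStateSketch(Some("cores")) => {"cores":32}
// masterStateSketch(Some("bogus")) => {"bogus":""}
```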
Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]
viirya commented on PR #43302: URL: https://github.com/apache/spark/pull/43302#issuecomment-1754393924 Thanks, @dongjoon-hyun. I've fixed `ExpressionTypeCheckingSuite`.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351472659

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
+   * If `field` is not None, the Json object will contain the matched field.
```

Review Comment: `is not None` -> `is matched`?
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754380347 Thank you for review and approval, @yaooqinn !
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754367501 Thank you again. The arguments are switched and the default value is given, @beliefer.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351440730

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: Sure. Thanks.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
beliefer commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351432373

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: Uh, I missed the signature. Could we put `field: Option[String]` as the last parameter? It seems giving it a default of `None` is better.
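A minimal sketch of the ergonomics being suggested here, using a stub type in place of the real `MasterStateResponse` (names are illustrative, not the merged code): with the `Option` parameter last and defaulted, existing call sites keep compiling unchanged.

```scala
// Stub type for illustration only; the real response lives in the deploy module.
case class MasterStateStub(uri: String)

// Trailing Option parameter with a default, as suggested above.
def writeMasterStateSketch(obj: MasterStateStub, field: Option[String] = None): String =
  field.fold(s"full state of ${obj.uri}")(f => s"only `$f` of ${obj.uri}")

writeMasterStateSketch(MasterStateStub("spark://m:7077"))              // old call sites: unchanged
writeMasterStateSketch(MasterStateStub("spark://m:7077"), Some("url")) // new filtered call
```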
Re: [PR] [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs. [spark]
yorksity commented on PR #43270: URL: https://github.com/apache/spark/pull/43270#issuecomment-1754356006 ![image](https://github.com/apache/spark/assets/38931534/a28c65ff-fc71-4eff-948f-977efb505c4e) All tests passed.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754340435 Thank you for the review. The document is updated too, @yaooqinn.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351374027

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
@@ -210,22 +210,59 @@ private[deploy] object JsonProtocol {
    * `status` status of the master,
    *          see [[org.apache.spark.deploy.master.RecoveryState.MasterState]]
    */
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
     val aliveWorkers = obj.workers.filter(_.isAlive())
-    ("url" -> obj.uri) ~
-    ("workers" -> obj.workers.toList.map(writeWorkerInfo)) ~
-    ("aliveworkers" -> aliveWorkers.length) ~
-    ("cores" -> aliveWorkers.map(_.cores).sum) ~
-    ("coresused" -> aliveWorkers.map(_.coresUsed).sum) ~
-    ("memory" -> aliveWorkers.map(_.memory).sum) ~
-    ("memoryused" -> aliveWorkers.map(_.memoryUsed).sum) ~
-    ("resources" -> aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo)) ~
-    ("resourcesused" -> aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo)) ~
-    ("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo)) ~
-    ("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo)) ~
-    ("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo)) ~
-    ("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo)) ~
-    ("status" -> obj.status.toString)
+    field match {
+      case None =>
+        ("url" -> obj.uri) ~
+        ("workers" -> obj.workers.toList.map(writeWorkerInfo)) ~
+        ("aliveworkers" -> aliveWorkers.length) ~
+        ("cores" -> aliveWorkers.map(_.cores).sum) ~
+        ("coresused" -> aliveWorkers.map(_.coresUsed).sum) ~
+        ("memory" -> aliveWorkers.map(_.memory).sum) ~
+        ("memoryused" -> aliveWorkers.map(_.memoryUsed).sum) ~
+        ("resources" -> aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo)) ~
+        ("resourcesused" ->
+          aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo)) ~
+        ("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo)) ~
+        ("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo)) ~
+        ("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo)) ~
+        ("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo)) ~
+        ("status" -> obj.status.toString)
+      case Some(field) =>
+        field match {
+          case "url" =>
+            ("url" -> obj.uri)
+          case "workers" =>
+            ("workers" -> obj.workers.toList.map(writeWorkerInfo))
+          case "aliveworkers" =>
+            ("aliveworkers" -> aliveWorkers.length)
+          case "cores" =>
+            ("cores" -> aliveWorkers.map(_.cores).sum)
+          case "coresused" =>
+            ("coresused" -> aliveWorkers.map(_.coresUsed).sum)
+          case "memory" =>
+            ("memory" -> aliveWorkers.map(_.memory).sum)
+          case "memoryused" =>
+            ("memoryused" -> aliveWorkers.map(_.memoryUsed).sum)
+          case "resources" =>
+            ("resources" -> aliveWorkers.map(_.resourcesInfo).toList.map(writeResourcesInfo))
+          case "resourcesused" =>
+            ("resourcesused" ->
+              aliveWorkers.map(_.resourcesInfoUsed).toList.map(writeResourcesInfo))
+          case "activeapps" =>
+            ("activeapps" -> obj.activeApps.toList.map(writeApplicationInfo))
+          case "completedapps" =>
+            ("completedapps" -> obj.completedApps.toList.map(writeApplicationInfo))
+          case "activedrivers" =>
+            ("activedrivers" -> obj.activeDrivers.toList.map(writeDriverInfo))
+          case "completeddrivers" =>
+            ("completeddrivers" -> obj.completedDrivers.toList.map(writeDriverInfo))
+          case "status" =>
+            ("status" -> obj.status.toString)
+          case field => (field -> "")
+        }
+    }
```

Review Comment: OK
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351373013

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: Sure~
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351371713

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: I mean the `@return` part above
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369480

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
+          case field => (field -> "")
```

Review Comment: This is intentional instead of 404.
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369382

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: Do you mean the documentation?
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369271

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
+          case field => (field -> "")
```

Review Comment: Use 404?
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
yaooqinn commented on code in PR #43303: URL: https://github.com/apache/spark/pull/43303#discussion_r1351369115

## core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala:

```
-  def writeMasterState(obj: MasterStateResponse): JObject = {
+  def writeMasterState(field: Option[String], obj: MasterStateResponse): JObject = {
```

Review Comment: Shall we update the annotation, too?
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
dongjoon-hyun commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754322080 Just a question: is this required for SPARK-44114? Or do you simply want to remove the magic strings?
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754308194 Thank you for review, @beliefer !
Re: [PR] [SPARK-45464][CORE] Fix network-yarn distribution build [spark]
hasnain-db commented on PR #43289: URL: https://github.com/apache/spark/pull/43289#issuecomment-1754299341

@LuciferYang the docker integration tests are consistently failing with

```
[info] *** 2 SUITES ABORTED ***
[error] Error during tests:
[error]   org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
[error]   org.apache.spark.sql.jdbc.OracleIntegrationSuite
```

I don't know if they are flaky but it seems to be unrelated to this PR
Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector elements [spark]
srowen commented on PR #43288: URL: https://github.com/apache/spark/pull/43288#issuecomment-1754292488 Is that really an error? It's the only way to represent a missing value -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-42034] QueryExecutionListener and Observation API do not work with `foreach` / `reduce` / `foreachPartition` action. [spark]
HyukjinKwon commented on PR #39976: URL: https://github.com/apache/spark/pull/39976#issuecomment-1754286405 Thanks. Made a followup: https://github.com/apache/spark/pull/43304 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils [spark]
HyukjinKwon opened a new pull request, #43304: URL: https://github.com/apache/spark/pull/43304

### What changes were proposed in this pull request?

This PR is kind of a followup for https://github.com/apache/spark/pull/39976 that addresses the comment at https://github.com/apache/spark/pull/39976#issuecomment-1752930380.

### Why are the changes needed?

In order to properly assign the SQL execution ID so `df.observe` works with this.

### Does this PR introduce _any_ user-facing change?

Yes. `df.observe` will work with JDBC connectors.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
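For context, `Observation` metrics are delivered through listeners that only fire for actions running under a SQL execution ID, which `DataFrame.foreachPartition` provides and `RDD.foreachPartition` does not. A hedged sketch of the user-facing behavior this enables; the JDBC URL, table name, and driver below are placeholders, and the JDBC driver is assumed to be on the classpath:

```scala
import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val jdbcUrl = "jdbc:h2:mem:testdb"  // placeholder connection string

val observation = Observation("rows_written")
val df = spark.range(100).toDF("id").observe(observation, count("*").as("n"))

// With JdbcUtils using DataFrame.foreachPartition, this write runs under a
// SQL execution ID, so the observation below can actually complete.
df.write.format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "t")
  .option("driver", "org.h2.Driver")  // assumed driver on the classpath
  .save()

println(observation.get)  // e.g. Map(n -> 100)
```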
Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]
HyukjinKwon commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351289594

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```diff
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {
```

Review Comment: I would just throw an exception here since this is a followup of SPARK-42309. If we need to fix a different issue, we should have another JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45474][CORE][WEBUI] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun commented on PR #43303: URL: https://github.com/apache/spark/pull/43303#issuecomment-1754272348 Could you review this when you have some time, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45417][PYTHON] Make InheritableThread inherit the active session [spark]
HyukjinKwon commented on PR #43231: URL: https://github.com/apache/spark/pull/43231#issuecomment-1754268341 Mind taking a look at the linter failure? https://github.com/clee704/spark/actions/runs/6462049944/job/17543011013 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-45474][CORE] Support top-level filtering in MasterPage JSON API [spark]
dongjoon-hyun opened a new pull request, #43303: URL: https://github.com/apache/spark/pull/43303

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]
HyukjinKwon commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754265070 Yeah, I think we should add this to a linter... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45466][ML] `VectorAssembler` should validate the vector values [spark]
zhengruifeng commented on PR #43288: URL: https://github.com/apache/spark/pull/43288#issuecomment-1754259142

> Does this change behavior or just refactor code? I'm actually surprised that NaN is treated as an error but looks like that is existing behavior?

It is a behavior change. Before this PR, NaN/null doubles and null vectors are treated as errors, but NaN values inside a vector are ignored. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
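For readers following the thread, the inconsistency under discussion is observable through `handleInvalid`. A sketch, assuming a `spark` session is in scope (as in `spark-shell`); the exact pre/post-change semantics are what the PR is debating:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

val df = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(1.0, 2.0)),
  (Double.NaN, Vectors.dense(3.0, Double.NaN))  // NaN as a double and inside a vector
)).toDF("x", "v")

val assembler = new VectorAssembler()
  .setInputCols(Array("x", "v"))
  .setOutputCol("features")
  .setHandleInvalid("keep")  // also accepts "error" (the default) and "skip"

// Per the discussion: the NaN double is caught by handleInvalid, while a NaN
// inside the vector previously slipped through; the PR makes both consistent.
assembler.transform(df).show(truncate = false)
```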
Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]
HeartSaVioR closed pull request #43299: [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence URL: https://github.com/apache/spark/pull/43299 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45472][SS] RocksDB State Store Doesn't Need to Recheck checkpoint path existence [spark]
HeartSaVioR commented on PR #43299: URL: https://github.com/apache/spark/pull/43299#issuecomment-1754222412 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a dfferent rocksdb instance [spark]
HeartSaVioR commented on PR #43174: URL: https://github.com/apache/spark/pull/43174#issuecomment-1754205290 Just ported the fix back to 3.5 as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a dfferent rocksdb instance [spark]
HeartSaVioR closed pull request #43174: [SPARK-45419][SS] Avoid reusing rocksdb sst files in a dfferent rocksdb instance URL: https://github.com/apache/spark/pull/43174 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]
beliefer commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1351195382

## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:

```diff
@@ -2580,7 +2580,7 @@ private[spark] class DAGScheduler(
     // if the cluster manager explicitly tells us that the entire worker was lost, then
     // we know to unregister shuffle output. (Note that "worker" specifically refers to the process
     // from a Standalone cluster, where the shuffle service lives in the Worker.)
-    val fileLost = !sc.shuffleDriverComponents.supportsReliableStorage() &&
+    val fileLost = !sc.shuffleDriverComponents.supportsReliableStorage(execId) &&
```

Review Comment: I have the same issue. https://github.com/apache/spark/pull/43280#discussion_r134998 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45419][SS] Avoid reusing rocksdb sst files in a dfferent rocksdb instance [spark]
HeartSaVioR commented on PR #43174: URL: https://github.com/apache/spark/pull/43174#issuecomment-1754204031 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]
beliefer commented on code in PR #43039: URL: https://github.com/apache/spark/pull/43039#discussion_r1351192772

## python/pyspark/sql/dataframe.py:

```diff
@@ -2646,67 +2647,147 @@ def join(
         Examples
-        The following performs a full outer join between ``df1`` and ``df2``.
+        The following examples demonstrate various join types between ``df1`` and ``df2``.
```

Review Comment: I am neutral on this change here. Overall, LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]
beliefer commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351177061

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```diff
@@ -455,6 +457,19 @@ object TableOutputResolver {
     }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")
```

Review Comment: I know that. We can extend the example above to satisfy the condition you mentioned. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]
beliefer commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351167115

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:

```diff
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(
+    child: Expression,
+    delimiter: Expression = Literal.create(",", StringType),
+    reverse: Boolean = false,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0) extends TypedAggregateWithHashMapAsBuffer
+  with UnaryLike[Expression] {
+
+  def this(child: Expression) = this(child, Literal.create(",", StringType), false, 0, 0)
+  def this(child: Expression, delimiter: Expression) = this(child, delimiter, false, 0, 0)
+
+  override def update(
+      buffer: OpenHashMap[AnyRef, Long],
```

Review Comment: I think this issue can be avoided if we refactor with `CollectList` or `extends Collect`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
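Based on the examples documented in the `@ExpressionDescription` above, usage of the proposed function would look like the sketch below. This targets the in-review 4.0.0 API, so names and semantics may change before merge:

```scala
// Concatenate string values with the default "," and a custom "|" delimiter.
spark.sql("SELECT listagg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col)").show()
// a,b,c
spark.sql("SELECT listagg(col, '|') FROM VALUES ('a'), ('b') AS tab(col)").show()
// a|b
spark.sql("SELECT listagg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col)").show()
// a,b
```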
Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]
panbingkun commented on PR #43286: URL: https://github.com/apache/spark/pull/43286#issuecomment-1754193939

> Thank you @panbingkun! Can we regenerate the docs and add the canonical links to the released docs HTML?

Regarding the published document HTML, the following PR is in progress: https://github.com/apache/spark-website/pull/482 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
beliefer commented on PR #43296: URL: https://github.com/apache/spark/pull/43296#issuecomment-1754189027

> Ur, very sorry, but I'd not do this ORC change when we do the same thing still for Parquet, @beliefer. This only increases the compile dependency on `org.apache.hadoop.hive.ql.io.orc.CompressionKind`. We prefer the existing loose-coupled way.

All the changes in the hive module already depend on `hive-exec`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]
Hisoka-X commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351138719

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```diff
@@ -455,6 +457,19 @@ object TableOutputResolver {
     }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")
```

Review Comment: In the situation you mentioned, the warning message will not show up. It only fires when all field names match but the field order does not. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
beliefer commented on code in PR #43296: URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985

## sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:

```diff
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       val path = s"${dir.getCanonicalPath}/table1"
       val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
       df.write
-        .option("compression", "ZlIb")
+        .option("compression", CompressionKind.ZLIB.name())
```

Review Comment: I will adjust it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45470][SQL] Avoid paste string value of hive orc compression kind [spark]
beliefer commented on code in PR #43296: URL: https://github.com/apache/spark/pull/43296#discussion_r1351137985

## sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcHadoopFsRelationSuite.scala:

```diff
@@ -87,7 +88,7 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       val path = s"${dir.getCanonicalPath}/table1"
       val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
       df.write
-        .option("compression", "ZlIb")
+        .option("compression", CompressionKind.ZLIB.name())
```

Review Comment: I will revert this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
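The trade-off being discussed, in miniature: the enum removes the magic string but pins the code to `hive-exec` at compile time. A sketch, assuming a `spark` session is in scope and the output path is a placeholder:

```scala
import org.apache.hadoop.hive.ql.io.orc.CompressionKind

val df = spark.range(5).toDF("a")           // assumes a spark-shell session

val byString = "ZlIb"                       // magic string; matched case-insensitively
val byEnum   = CompressionKind.ZLIB.name()  // "ZLIB"; breaks at compile time if the enum changes

// Both spellings configure the same codec:
df.write.option("compression", byEnum).orc("/tmp/orc_table")  // placeholder path
```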
Re: [PR] [SPARK-44735][SQL] Add warning msg when inserting columns with the same name by row that don't match up [spark]
beliefer commented on code in PR #42763: URL: https://github.com/apache/spark/pull/42763#discussion_r1351135259

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```diff
@@ -455,6 +457,19 @@ object TableOutputResolver {
     }
   }
 
+  def suitableForByNameCheck(
+      byName: Boolean,
+      expected: Seq[Attribute],
+      queryOutput: Seq[Attribute]): Unit = {
+    if (!byName && expected.size == queryOutput.size &&
+      expected.forall(e => queryOutput.exists(p => conf.resolver(p.name, e.name))) &&
+      expected.zip(queryOutput).exists(e => !conf.resolver(e._1.name, e._2.name))) {
+      logWarning("The query columns and the table columns have same names but different " +
+        "orders. You can use INSERT [INTO | OVERWRITE] BY NAME to reorder the query columns to " +
+        "align with the table columns.")
```

Review Comment: Users often insert into a table whose columns don't align with the query columns by name. For example, take table `student(name STRING, address STRING)` and table `location(code STRING, name STRING)`, where `student.address` is filled from `location.name`. I am neutral on this warning message. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
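To make the condition concrete, the warning targets this shape of statement. A sketch with illustrative table names; `INSERT INTO ... BY NAME` is available in recent Spark versions (3.5+):

```scala
spark.sql("CREATE TABLE t (a INT, b INT) USING parquet")
spark.sql("CREATE TABLE s (b INT, a INT) USING parquet")

// Same column names, different order, matched by position: s.b fills t.a.
// This is the case where the proposed warning fires.
spark.sql("INSERT INTO t SELECT b, a FROM s")

// The remedy the warning suggests: match columns by name instead.
spark.sql("INSERT INTO t BY NAME SELECT b, a FROM s")
```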
Re: [PR] [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat [spark]
Hisoka-X commented on code in PR #43243: URL: https://github.com/apache/spark/pull/43243#discussion_r1351132582

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala:

```diff
@@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends Serializable {
       // We can only parse the value as TimestampNTZType if it does not have zone-offset or
       // time-zone component and can be parsed with the timestamp formatter.
       // Otherwise, it is likely to be a timestamp with timezone.
-      if (timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) {
-        SQLConf.get.timestampType
+      val timestampType = SQLConf.get.timestampType
+      if ((SQLConf.get.legacyTimeParserPolicy == LegacyBehaviorPolicy.LEGACY ||
+          timestampType == TimestampNTZType) &&
+          timestampNTZFormatter.parseWithoutTimeZoneOptional(field, false).isDefined) {
```

Review Comment:
1. The `LEGACY` behavior uses `timestampNTZFormatter` to parse timestamps, so I didn't change it for `LEGACY` mode. Without this, some test cases like `CSVLegacyTimeParserSuite.SPARK-37326: Timestamp type inference for a column with TIMESTAMP_NTZ values` can't pass. https://github.com/Hisoka-X/spark/runs/17462554632
2. It should be `(legacyTimeParserPolicy = LEGACY || timestampType == TimestampNTZType)`, not `(legacyTimeParserPolicy = LEGACY || timestampType == TimestampLTZType)`, if I understand correctly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
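A hedged repro of the inference question above; the path and the sample value are placeholders. Before the fix, a value that fails the user-specified `timestampFormat` could still be inferred as a timestamp through the NTZ-formatter fallback; afterwards the column falls back to string:

```scala
val path = "/tmp/ts_infer_demo"  // placeholder path

spark.range(1).selectExpr("'2023-10-09T12:00:00.138' AS ts")
  .write.mode("overwrite").csv(path)

val inferred = spark.read
  .option("inferSchema", "true")
  .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")  // does not match the fractional part
  .csv(path)

inferred.printSchema()  // the question under review: timestamp or string?
```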
Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]
Hisoka-X commented on PR #42762: URL: https://github.com/apache/spark/pull/42762#issuecomment-1754157169 cc @HyukjinKwon @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-42309][SQL][FOLLOWUP] Raise exception directly instead of calling `resolveColumnsByPosition` [spark]
itholic commented on code in PR #42762: URL: https://github.com/apache/spark/pull/42762#discussion_r1351085843

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala:

```diff
@@ -110,10 +110,6 @@ object TableOutputResolver {
       resolveColumnsByPosition(tableName, query.output, actualExpectedCols, conf, errors += _)
     }
 
-    if (errors.nonEmpty) {
```

Review Comment: Yeah, I think we can just remove it. Sorry for the delay, I've been stuck on Pandas 2.x support on the PySpark side. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45450][PYTHON] Fix imports according to PEP8: pyspark.pandas and pyspark (core) [spark]
holdenk commented on PR #43257: URL: https://github.com/apache/spark/pull/43257#issuecomment-1754124296 Also, could we add automated checking for the import order so that we don't have to do this again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]
holdenk commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351071287

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:

```diff
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(
+    child: Expression,
+    delimiter: Expression = Literal.create(",", StringType),
+    reverse: Boolean = false,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0) extends TypedAggregateWithHashMapAsBuffer
+  with UnaryLike[Expression] {
+
+  def this(child: Expression) = this(child, Literal.create(",", StringType), false, 0, 0)
+  def this(child: Expression, delimiter: Expression) = this(child, delimiter, false, 0, 0)
+
+  override def update(
+      buffer: OpenHashMap[AnyRef, Long],
```

Review Comment: So the idea with that is to bubble up the distinct to avoid a shuffle? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]
holdenk commented on code in PR #42398: URL: https://github.com/apache/spark/pull/42398#discussion_r1351070023

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ListAgg.scala:

```diff
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.trees.UnaryLike
+import org.apache.spark.sql.catalyst.types.PhysicalDataType
+import org.apache.spark.sql.types.{DataType, StringType}
+import org.apache.spark.unsafe.types.UTF8String
+import org.apache.spark.util.collection.OpenHashMap
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the concatenated input values," +
+    " separated by the delimiter string.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
+       a,b,c
+      > SELECT _FUNC_(col) FROM VALUES (NULL), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col) FROM VALUES ('a'), ('a') AS tab(col);
+       a,a
+      > SELECT _FUNC_(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
+       a,b
+      > SELECT _FUNC_(col, '|') FROM VALUES ('a'), ('b') AS tab(col);
+       a|b
+      > SELECT _FUNC_(col) FROM VALUES (NULL), (NULL) AS tab(col);
+       NULL
+  """,
+  group = "agg_funcs",
+  since = "4.0.0")
+case class ListAgg(
```

Review Comment: Similarly, let's put this in collect.scala with the other collect friends. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45220][PYTHON][DOCS] Refine docstring of DataFrame.join [spark]
holdenk commented on code in PR #43039: URL: https://github.com/apache/spark/pull/43039#discussion_r1351066467

## python/pyspark/sql/dataframe.py:

```diff
@@ -2646,67 +2647,147 @@ def join(
         Examples
-        The following performs a full outer join between ``df1`` and ``df2``.
+        The following examples demonstrate various join types between ``df1`` and ``df2``.
```

Review Comment: nit: it's not just df1 and df2 (no need to fix unless other changes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44837][SQL] Improve ALTER TABLE ALTER PARTITION column error message [spark]
holdenk commented on code in PR #42524: URL: https://github.com/apache/spark/pull/42524#discussion_r1351065858

## common/utils/src/main/resources/error/error-classes.json:

```diff
@@ -97,6 +97,12 @@
       "The method <methodName> can not be called on streaming Dataset/DataFrame."
     ]
   },
+  "CANNOT_ALTER_PARTITION_COLUMN" : {
+    "message" : [
+      "ALTER TABLE (ALTER|CHANGE) COLUMN is not supported for partition columns, but found the partition column <columnName> in the table <tableName>"
+    ],
```

Review Comment: Would it be good to clarify that it is not supported by the catalog of the specified table? E.g., sometimes it is supported with the right catalog. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
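For reference, the message would surface in a flow like the sketch below, assuming the error class lands as proposed and a v1 session catalog (as holdenk notes, some catalogs do allow altering partition columns):

```scala
spark.sql("CREATE TABLE p (id INT, part STRING) USING parquet PARTITIONED BY (part)")

// Expected to fail with CANNOT_ALTER_PARTITION_COLUMN under this PR:
spark.sql("ALTER TABLE p ALTER COLUMN part COMMENT 'partition column'")
```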
Re: [PR] [SPARK-45473][SQL] Fix incorrect error message for RoundBase [spark]
holdenk commented on PR #43302: URL: https://github.com/apache/spark/pull/43302#issuecomment-1754111612 LGTM pending CI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor [spark]
holdenk commented on PR #38852: URL: https://github.com/apache/spark/pull/38852#issuecomment-1754108618 Maybe the `NettyBlockTransferServiceSuite` would be enough, so we can assert that we are keeping track of active streams and also, once done, that the count is zero (so we know this is not blocking shutdown indefinitely). WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey [spark]
github-actions[bot] closed pull request #40334: [SPARK-42716][SQL] DataSourceV2 supports reporting key-grouped partitioning without HasPartitionKey URL: https://github.com/apache/spark/pull/40334 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]
HyukjinKwon closed pull request #42859: [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client URL: https://github.com/apache/spark/pull/42859 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-43299][SS][CONNECT] Convert StreamingQueryException in Scala Client [spark]
HyukjinKwon commented on PR #42859: URL: https://github.com/apache/spark/pull/42859#issuecomment-1754064070 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350999129

## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:

```diff
@@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {
       val schema = DataType.fromJson(
         PythonWorkerUtils.readUTF(length, dataIn)).asInstanceOf[StructType]
 
+      // Receive the pickled AnalyzeResult buffer, if any.
+      val pickledAnalyzeResult: Option[Array[Byte]] = dataIn.readInt() match {
+        case 0 =>
+          None
```

Review Comment: So it won't happen here, as this is running the `analyze` method. I think `PythonUDTFAnalyzeResult.pickledAnalyzeResult` should be `Array[Byte]`, and it should be wrapped with `Some` in `Analyzer`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
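The wire pattern in the hunk above is a length-prefixed optional buffer. Isolated, it amounts to the following sketch (not the actual PR code; the method name is illustrative):

```scala
import java.io.DataInputStream

// Reads an optional byte buffer where a length prefix of 0 means "absent".
def readOptionalBytes(dataIn: DataInputStream): Option[Array[Byte]] = {
  dataIn.readInt() match {
    case 0 => None  // no pickled AnalyzeResult was sent
    case length =>
      val bytes = new Array[Byte](length)
      dataIn.readFully(bytes)
      Some(bytes)
  }
}
```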
Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]
dtenedor commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350991334

## python/pyspark/worker.py:

```diff
@@ -693,6 +698,21 @@ def read_udtf(pickleSer, infile, eval_type):
             f"The return type of a UDTF must be a struct type, but got {type(return_type)}."
         )
 
+    # Update the handler that creates a new UDTF instance to first try calling the UDTF constructor
+    # with one argument containing the previous AnalyzeResult. If that fails, then try a constructor
+    # with no arguments. In this way each UDTF class instance can decide if it wants to inspect the
+    # AnalyzeResult.
+    if has_pickled_analyze_result:
+        prev_handler = handler
+
+        def construct_udtf():
+            try:
+                return prev_handler(pickled_analyze_result)
```

Review Comment: Sounds good, done.

## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:

```diff
@@ -284,6 +285,16 @@ object UserDefinedPythonTableFunction {
       val schema = DataType.fromJson(
         PythonWorkerUtils.readUTF(length, dataIn)).asInstanceOf[StructType]
 
+      // Receive the pickled AnalyzeResult buffer, if any.
+      val pickledAnalyzeResult: Option[Array[Byte]] = dataIn.readInt() match {
+        case 0 =>
+          None
```

Review Comment: This should not happen unless the UDTF does not include an `analyze` method, in which case there is no pickled `AnalyzeResult` buffer.

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala:

```diff
@@ -241,12 +248,17 @@ case class UnresolvedPolymorphicPythonUDTF(
  * @param orderByExpressions if non-empty, this contains the list of ordering items that the
  *                           'analyze' method explicitly indicated that the UDTF call should consume
  *                           the input table rows by
+ * @param pickledAnalyzeResult this is the pickled 'AnalyzeResult' instance from the UDTF, which
+ *                             contains all metadata returned by the Python UDTF 'analyze' method
+ *                             including the result schema of the function call as well as optional
+ *                             other information
  */
 case class PythonUDTFAnalyzeResult(
     schema: StructType,
     withSinglePartition: Boolean,
     partitionByExpressions: Seq[Expression],
-    orderByExpressions: Seq[SortOrder]) {
+    orderByExpressions: Seq[SortOrder],
+    pickledAnalyzeResult: Option[Array[Byte]]) {
```

Review Comment: I find we need this since if the UDTF does not include an `analyze` method, this buffer is empty, and corresponding tests using these functions fail unless we check that this buffer is non-empty before using it.

## python/pyspark/sql/udtf.py:

```diff
@@ -85,7 +85,7 @@ class OrderingColumn:
     overrideNullsFirst: Optional[bool] = None
 
 
-@dataclass(frozen=True)
+@dataclass
```

Review Comment: I found it was necessary to allow UDTFs to create subclasses of `AnalyzeResult`. This is the way that we plan for these functions to compute custom state in the `analyze` method just once per function call and then propagate this information to future class instances to be consumed by the `eval` and `terminate` methods. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] Minor: Fix incorrect error message for RoundBase [spark]
viirya opened a new pull request, #43302: URL: https://github.com/apache/spark/pull/43302

### What changes were proposed in this pull request?

This minor patch fixes the incorrect error message of `RoundBase`.

### Why are the changes needed?

Fix an incorrect error message.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

### Was this patch authored or co-authored using generative AI tooling?

No

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page [spark]
allisonwang-db commented on PR #42425: URL: https://github.com/apache/spark/pull/42425#issuecomment-1753943908 @panbingkun thank you so much for working on this. It will be super helpful in improving the ranking of the PySpark documentation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44729][PYTHON][DOCS][3.3] Add canonical links to the PySpark docs page [spark]
allisonwang-db commented on PR #43286: URL: https://github.com/apache/spark/pull/43286#issuecomment-1753939826 Thank you @panbingkun! Can we regenerate the docs and add the canonical links to the released docs HTML? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]
mridulm commented on PR #43280: URL: https://github.com/apache/spark/pull/43280#issuecomment-1753930980 Btw, it is not very clear to me how this functionality is going to be leveraged - unless you are relying on resource profiles to tag which executors should leverage reliable shuffle and which should not. Spark will run tasks across stages on an executor, so there is no expectation of an executor-id-to-stage mapping (except via resource profiles). Please add more details in the PR description on how this can be leveraged reliably. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45402][SQL][PYTHON] Add UDTF API for 'eval' and 'terminate' methods to consume previous 'analyze' result [spark]
ueshin commented on code in PR #43204: URL: https://github.com/apache/spark/pull/43204#discussion_r1350811414

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala:

```diff
@@ -241,12 +248,17 @@ case class UnresolvedPolymorphicPythonUDTF(
  * @param orderByExpressions if non-empty, this contains the list of ordering items that the
  *                           'analyze' method explicitly indicated that the UDTF call should consume
  *                           the input table rows by
+ * @param pickledAnalyzeResult this is the pickled 'AnalyzeResult' instance from the UDTF, which
+ *                             contains all metadata returned by the Python UDTF 'analyze' method
+ *                             including the result schema of the function call as well as optional
+ *                             other information
  */
 case class PythonUDTFAnalyzeResult(
     schema: StructType,
     withSinglePartition: Boolean,
     partitionByExpressions: Seq[Expression],
-    orderByExpressions: Seq[SortOrder]) {
+    orderByExpressions: Seq[SortOrder],
+    pickledAnalyzeResult: Option[Array[Byte]]) {
```

Review Comment: We don't need `Option` here as it will always be set? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]
mridulm commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1350810420

## core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java:

```diff
@@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}
    * Does this shuffle component support reliable storage - external to the lifecycle of the
    * executor host ? For example, writing shuffle data to a distributed filesystem or
    * persisting it in a remote shuffle service.
+   *
+   * Note: This method is for compatibility with older implementations,
+   * the newer implementation should not use this method.
    */
   default boolean supportsReliableStorage() {
     return false;
   }
+
+  /**
+   * Does this executor support reliable storage for all shuffle data.
+   * @param execId The executor id, null for validation use only.
```

Review Comment: The executor id should be valid - Spark will pass a valid id when using this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-45463][CORE][SHUFFLE] Support reliable store with specified executorId [spark]
mridulm commented on code in PR #43280: URL: https://github.com/apache/spark/pull/43280#discussion_r1350809807

## core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java:

```diff
@@ -66,8 +66,19 @@ default void removeShuffle(int shuffleId, boolean blocking) {}
    * Does this shuffle component support reliable storage - external to the lifecycle of the
    * executor host ? For example, writing shuffle data to a distributed filesystem or
    * persisting it in a remote shuffle service.
+   *
+   * Note: This method is for compatibility with older implementations,
+   * the newer implementation should not use this method.
```

Review Comment: This comment addition is incorrect. A shuffle implementation can either choose to be reliable irrespective of executor id, or choose to be reliable only for a specific subset of executor ids - that is a choice of the implementation, not a function of newer vs. older. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
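Under mridulm's framing, an implementation choosing per-executor reliability might look like the sketch below. The class name and the executor-id set are hypothetical, and `supportsReliableStorage(execId)` is the overload this PR proposes, not a released interface method, so it is written here as a plain method rather than an override:

```scala
import java.util.{Collections, Map => JMap}

import org.apache.spark.shuffle.api.ShuffleDriverComponents

// Hypothetical driver components that push shuffle data of selected executors
// to remote storage, and therefore report reliability only for those ids.
class PartiallyReliableDriverComponents(remoteExecIds: Set[String])
    extends ShuffleDriverComponents {

  override def initializeApplication(): JMap[String, String] =
    Collections.emptyMap()

  override def cleanupApplication(): Unit = {}

  // Proposed overload from SPARK-45463: reliability decided per executor.
  def supportsReliableStorage(execId: String): Boolean =
    remoteExecIds.contains(execId)
}
```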