[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
HeartSaVioR commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1023563189 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -42,40 +43,101 @@ object UnsupportedOperationChecker

[GitHub] [spark] dongjoon-hyun commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-15 Thread GitBox
dongjoon-hyun commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1316546164 +1 for @sunchao 's comment. To @bsikander , it would be great if you can participate [[VOTE] Release Spark 3.2.3

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
dongjoon-hyun commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023621416 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala: ## @@ -103,7 +103,7 @@ private[spark]

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
dongjoon-hyun commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023614721 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,10 +60,22 @@ import

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-15 Thread GitBox
dongjoon-hyun commented on code in PR #38669: URL: https://github.com/apache/spark/pull/38669#discussion_r1023607778 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/SchemaColumnConvertNotSupportedException.java: ## @@ -54,7 +54,8 @@ public

[GitHub] [spark] LuciferYang commented on a diff in pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38668: URL: https://github.com/apache/spark/pull/38668#discussion_r1023595740 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -125,7 +127,11 @@ private[storage] class BlockManagerDecommissioner(

[GitHub] [spark] itholic commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-15 Thread GitBox
itholic commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1021424319 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -242,9 +242,13 @@ class CastWithAnsiOnSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-15 Thread GitBox
cloud-fan commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1023592944 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -609,6 +609,20 @@ private[hive] class HiveClientImpl(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38668: URL: https://github.com/apache/spark/pull/38668#discussion_r1023590958 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -125,7 +127,11 @@ private[storage] class BlockManagerDecommissioner(

[GitHub] [spark] cloud-fan commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-15 Thread GitBox
cloud-fan commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316496874 I'm OK to reuse the usage of `TypeCheckFailure`, but many advanced users use catalyst plans/expressions directly. It's frustrating to remove it and break third party Spark extensions.

[GitHub] [spark] dongjoon-hyun commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-15 Thread GitBox
dongjoon-hyun commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1316472970 @Stycos SPARK-40801 arrived after the 3.3.1 release. ![Screenshot 2022-11-15 at 11 01 06

[GitHub] [spark] zhengruifeng opened a new pull request, #38670: [SPARK-41157][CONNECT][PYTHON][TEST] Show detailed differences in test

2022-11-15 Thread GitBox
zhengruifeng opened a new pull request, #38670: URL: https://github.com/apache/spark/pull/38670 ### What changes were proposed in this pull request? use `assert_eq` in `PandasOnSparkTestCase` to compare dataframes ### Why are the changes needed? show detailed error message

[GitHub] [spark] LuciferYang commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1023569619 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] beliefer commented on pull request #34367: [SPARK-37099][SQL] Introduce a rank-based filter to optimize top-k computation

2022-11-15 Thread GitBox
beliefer commented on PR #34367: URL: https://github.com/apache/spark/pull/34367#issuecomment-1316464454 > It is a long time since I initially sent this PR, and I don't have time to work on it, if any guys are interested in this optimization, feel free to take over it. cc @beliefer

[GitHub] [spark] wangyum commented on a diff in pull request #38649: [SPARK-41132][SQL] Convert LikeAny and NotLikeAny to InSet if no pattern contains wildcards

2022-11-15 Thread GitBox
wangyum commented on code in PR #38649: URL: https://github.com/apache/spark/pull/38649#discussion_r1023546094 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -780,6 +780,13 @@ object LikeSimplification extends Rule[LogicalPlan] {

[GitHub] [spark] MaxGekk commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-15 Thread GitBox
MaxGekk commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1316427566 @LuciferYang @panbingkun @itholic @cloud-fan @srielau @anchovYu Could you review this PR, please. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] sunchao commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-15 Thread GitBox
sunchao commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1316416703 @bsikander again, pls check [d...@spark.apache.org](mailto:d...@spark.apache.org) - it's being voted. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] bsikander commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-11-15 Thread GitBox
bsikander commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1316415618 @sunchao @bjornjorgensen any update on this release? As internal alarms are going off continuously, I am desperately looking for the release. -- This is an automated message from

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-15 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1023536354 ## core/src/main/resources/error/error-classes.json: ## @@ -1277,6 +1277,11 @@ "A correlated outer name reference within a subquery expression body was not

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-15 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1316408322 Thank you @sunchao ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] yabola commented on pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-15 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-1316389795 @mridulm as your comment at https://github.com/apache/spark/pull/37922#discussion_r990763769 said, I want to improve this part of the deletion logic -- This is an automated message from

[GitHub] [spark] zhengruifeng commented on pull request #34367: [SPARK-37099][SQL] Introduce a rank-based filter to optimize top-k computation

2022-11-15 Thread GitBox
zhengruifeng commented on PR #34367: URL: https://github.com/apache/spark/pull/34367#issuecomment-1316354088 It is a long time since I initially sent this PR, and I don't have time to work on it, if any guys are interested in this optimization, feel free to take over it. cc @beliefer

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-15 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1316351255 Thank you @dongjoon-hyun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] viirya commented on pull request #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-15 Thread GitBox
viirya commented on PR #38669: URL: https://github.com/apache/spark/pull/38669#issuecomment-1316348790 cc @dongjoon-hyun @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-15 Thread GitBox
zhengruifeng commented on code in PR #38666: URL: https://github.com/apache/spark/pull/38666#discussion_r1023503919 ## connector/connect/README.md: ## @@ -52,9 +52,15 @@ To use the release version of Spark Connect: ### Run Tests ```bash +# Run a single Python test.

[GitHub] [spark] viirya opened a new pull request, #38669: [SPARK-41155][SQL] Add error message to SchemaColumnConvertNotSupportedException

2022-11-15 Thread GitBox
viirya opened a new pull request, #38669: URL: https://github.com/apache/spark/pull/38669 ### What changes were proposed in this pull request? This patch adds error message to `SchemaColumnConvertNotSupportedException`. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon closed pull request #38667: [SPARK-40798][DOCS][FOLLOW-UP] Fix a typo in the configuration name at migration guide

2022-11-15 Thread GitBox
HyukjinKwon closed pull request #38667: [SPARK-40798][DOCS][FOLLOW-UP] Fix a typo in the configuration name at migration guide URL: https://github.com/apache/spark/pull/38667 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on pull request #38667: [SPARK-40798][DOCS][FOLLOW-UP] Fix a typo in the configuration name at migration guide

2022-11-15 Thread GitBox
HyukjinKwon commented on PR #38667: URL: https://github.com/apache/spark/pull/38667#issuecomment-1316341751 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] grundprinzip commented on pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request

2022-11-15 Thread GitBox
grundprinzip commented on PR #38630: URL: https://github.com/apache/spark/pull/38630#issuecomment-1316339424 @amaliujia can you please update the pr description to remove the enum part. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] wankunde commented on a diff in pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-15 Thread GitBox
wankunde commented on code in PR #38495: URL: https://github.com/apache/spark/pull/38495#discussion_r1023487514 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -722,18 +722,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,

[GitHub] [spark] warrenzhu25 opened a new pull request, #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-15 Thread GitBox
warrenzhu25 opened a new pull request, #38668: URL: https://github.com/apache/spark/pull/38668 ### What changes were proposed in this pull request? Log migrated shuffle data size and migration time ### Why are the changes needed? Get info about migrated shuffle data size and

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023478697 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,6 +60,7 @@ import

[GitHub] [spark] pan3793 commented on pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
pan3793 commented on PR #38651: URL: https://github.com/apache/spark/pull/38651#issuecomment-1316286965 @dongjoon-hyun thanks for review, I addressed your comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023477562 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,10 +60,22 @@ import

[GitHub] [spark] ulysses-you commented on pull request #38667: [SPARK-40798][DOCS][FOLLOW-UP] Fix a typo in the configuration name at migration guide

2022-11-15 Thread GitBox
ulysses-you commented on PR #38667: URL: https://github.com/apache/spark/pull/38667#issuecomment-1316283965 thank you @HyukjinKwon @anchovYu -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-15 Thread GitBox
LuciferYang commented on PR #38620: URL: https://github.com/apache/spark/pull/38620#issuecomment-1316264566 Thanks @dongjoon-hyun @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-15 Thread GitBox
wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1316254348 Retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zhengruifeng opened a new pull request, #34367: [SPARK-37099][SQL] Introduce a rank-based filter to optimize top-k computation

2022-11-15 Thread GitBox
zhengruifeng opened a new pull request, #34367: URL: https://github.com/apache/spark/pull/34367 ### What changes were proposed in this pull request? introduce a new node `RankLimit` to filter out unnecessary rows based on rank computed on partial dataset. it supports following

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38257: [SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy

2022-11-15 Thread GitBox
HyukjinKwon commented on code in PR #38257: URL: https://github.com/apache/spark/pull/38257#discussion_r1023455115 ## docs/sql-migration-guide.md: ## @@ -34,6 +34,7 @@ license: | - Valid hexadecimal strings should include only allowed symbols (0-9A-Fa-f). - Valid

[GitHub] [spark] HyukjinKwon opened a new pull request, #38667: [SPARK-40798][DOCS] Fix a typo in the configuration name at migration guide

2022-11-15 Thread GitBox
HyukjinKwon opened a new pull request, #38667: URL: https://github.com/apache/spark/pull/38667 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/38257 to fix a typo from

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-15 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1316229029 Merged to master. Thanks for fixing this @liuzqt ! Thanks for the reviews @Ngone51, @sadikovi, @jiangxb1987 :-) And thanks for help with GA @HyukjinKwon and @Yikun ! -- This is

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38257: [SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy

2022-11-15 Thread GitBox
HyukjinKwon commented on code in PR #38257: URL: https://github.com/apache/spark/pull/38257#discussion_r1023453997 ## docs/sql-migration-guide.md: ## @@ -34,6 +34,7 @@ license: | - Valid hexadecimal strings should include only allowed symbols (0-9A-Fa-f). - Valid

[GitHub] [spark] asfgit closed pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-15 Thread GitBox
asfgit closed pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB URL: https://github.com/apache/spark/pull/38064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] LuciferYang commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1023446576 ## project/SparkBuild.scala: ## @@ -109,6 +109,14 @@ object SparkBuild extends PomBuild { if (profiles.contains("jdwp-test-debug")) {

[GitHub] [spark] LuciferYang commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1023446458 ## connector/connect/pom.xml: ## @@ -371,4 +350,68 @@ + + + official-pb Review Comment: done ##

[GitHub] [spark] LuciferYang commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-15 Thread GitBox
LuciferYang commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1023446300 ## connector/connect/README.md: ## @@ -24,7 +24,31 @@ or ```bash ./build/sbt -Phive clean package ``` - + +### Build with user-defined `protoc` and

[GitHub] [spark] Yaohua628 commented on a diff in pull request #38663: [SPARK-41143][SQL] Add named argument function syntax support

2022-11-15 Thread GitBox
Yaohua628 commented on code in PR #38663: URL: https://github.com/apache/spark/pull/38663#discussion_r1023434857 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3380,4 +3380,20 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] Yaohua628 commented on a diff in pull request #38663: [SPARK-41143][SQL] Add named argument function syntax support

2022-11-15 Thread GitBox
Yaohua628 commented on code in PR #38663: URL: https://github.com/apache/spark/pull/38663#discussion_r1023431618 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala: ## @@ -852,6 +852,31 @@ class PlanParserSuite extends AnalysisTest {

[GitHub] [spark] Yaohua628 commented on a diff in pull request #38663: [SPARK-41143][SQL] Add named argument function syntax support

2022-11-15 Thread GitBox
Yaohua628 commented on code in PR #38663: URL: https://github.com/apache/spark/pull/38663#discussion_r1023428966 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/NamedArgumentFunction.scala: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] beliefer commented on pull request #37630: [SPARK-40193][SQL] Merge subquery plans with different filters

2022-11-15 Thread GitBox
beliefer commented on PR #37630: URL: https://github.com/apache/spark/pull/37630#issuecomment-1316166252 @peter-toth Could you fix the conflicts again? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] beliefer commented on pull request #37630: [SPARK-40193][SQL] Merge subquery plans with different filters

2022-11-15 Thread GitBox
beliefer commented on PR #37630: URL: https://github.com/apache/spark/pull/37630#issuecomment-1316165732 We tested this PR and the result is: ![image](https://user-images.githubusercontent.com/8486025/202063426-42a3b8bb-fac8-431e-8477-ad908644ab71.png) cc @sigmod too. -- This

[GitHub] [spark] amaliujia opened a new pull request, #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-15 Thread GitBox
amaliujia opened a new pull request, #38666: URL: https://github.com/apache/spark/pull/38666 ### What changes were proposed in this pull request? Improve developer documentation for Connect project for how to run `pyspark-connect` module which runs all existing Connect Python

[GitHub] [spark] amaliujia commented on pull request #38666: [CONENCT][PYTHON][DOC] Document how to run the module of tests for Spark Connect Python tests

2022-11-15 Thread GitBox
amaliujia commented on PR #38666: URL: https://github.com/apache/spark/pull/38666#issuecomment-1316126191 R: @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] bersprockets commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-15 Thread GitBox
bersprockets commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1020798321 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] Stycos commented on pull request #38262: [SPARK-40801][BUILD] Upgrade `Apache commons-text` to 1.10

2022-11-15 Thread GitBox
Stycos commented on PR #38262: URL: https://github.com/apache/spark/pull/38262#issuecomment-1316090923 When I execute `pip install pyspark` I still get commons-text-1.9.jar in the jars folder. Shouldn't I get 1.10 now? -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] github-actions[bot] closed pull request #37409: [SPARK-39970][CORE] Introduce ThrottledLogger to prevent log message flooding caused by network issues

2022-11-15 Thread GitBox
github-actions[bot] closed pull request #37409: [SPARK-39970][CORE] Introduce ThrottledLogger to prevent log message flooding caused by network issues URL: https://github.com/apache/spark/pull/37409 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] rangadi commented on a diff in pull request #38384: [SPARK-40657][PROTOBUF] Require shading for Java class jar, improve error handling

2022-11-15 Thread GitBox
rangadi commented on code in PR #38384: URL: https://github.com/apache/spark/pull/38384#discussion_r1023371786 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala: ## @@ -155,21 +155,52 @@ private[sql] object ProtobufUtils extends

[GitHub] [spark] warrenzhu25 commented on pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-15 Thread GitBox
warrenzhu25 commented on PR #38441: URL: https://github.com/apache/spark/pull/38441#issuecomment-1315971701 > Can you move `SCHEDULER_MAX_RETAINED_REMOVED_EXECUTORS` to below `STAGE_IGNORE_DECOMMISSION_FETCH_FAILURE` ? This is causing the build failure. Updated. -- This is an

[GitHub] [spark] dongjoon-hyun commented on pull request #38539: [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1

2022-11-15 Thread GitBox
dongjoon-hyun commented on PR #38539: URL: https://github.com/apache/spark/pull/38539#issuecomment-1315901316 We need to validate this dependency change in `master` (for Apache Spark 3.4.0) first. Did you use this in your production environment? -- This is an automated message from the

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
dongjoon-hyun commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023278795 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala: ## @@ -723,6 +723,18 @@ private[spark] object Config extends

[GitHub] [spark] MaxGekk opened a new pull request, #38665: [WIP][SQL] Remove the class `TypeCheckFailure`

2022-11-15 Thread GitBox
MaxGekk opened a new pull request, #38665: URL: https://github.com/apache/spark/pull/38665 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-15 Thread GitBox
dongjoon-hyun commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1023277805 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -57,6 +60,7 @@ import

[GitHub] [spark] dongjoon-hyun closed pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-15 Thread GitBox
dongjoon-hyun closed pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0 URL: https://github.com/apache/spark/pull/38620 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-15 Thread GitBox
dongjoon-hyun commented on PR #38620: URL: https://github.com/apache/spark/pull/38620#issuecomment-1315885828 That will be enough, @LuciferYang . Thank you, @LuciferYang and @HyukjinKwon . Merged to master. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] amaliujia commented on a diff in pull request #38595: [SPARK-41090][SQL] Throw Exception for `db_name.view_name` when creating temp view by Dataset API

2022-11-15 Thread GitBox
amaliujia commented on code in PR #38595: URL: https://github.com/apache/spark/pull/38595#discussion_r1023261314 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -542,11 +542,11 @@ private[sql] object QueryParsingErrors extends

[GitHub] [spark] grundprinzip commented on a diff in pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request

2022-11-15 Thread GitBox
grundprinzip commented on code in PR #38630: URL: https://github.com/apache/spark/pull/38630#discussion_r1023255565 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -48,6 +48,11 @@ message Request { // The logical plan to be executed / analyzed. Plan

[GitHub] [spark] grundprinzip commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `proto

2022-11-15 Thread GitBox
grundprinzip commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1023252592 ## connector/connect/README.md: ## @@ -24,7 +24,31 @@ or ```bash ./build/sbt -Phive clean package ``` - + +### Build with user-defined `protoc` and

[GitHub] [spark] gengliangwang commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-15 Thread GitBox
gengliangwang commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023241331 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -29,26 +29,13 @@ import

[GitHub] [spark] gengliangwang commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-15 Thread GitBox
gengliangwang commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023241088 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -146,8 +146,12 @@ object FileSourceStrategy extends

[GitHub] [spark] MaxGekk commented on a diff in pull request #38647: [SPARK-41133][SQL] Integrate `UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION` into `NUMERIC_VALUE_OUT_OF_RANGE`

2022-11-15 Thread GitBox
MaxGekk commented on code in PR #38647: URL: https://github.com/apache/spark/pull/38647#discussion_r1023238958 ## connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeSuite.scala: ## @@ -436,7 +436,7 @@ abstract class AvroLogicalTypeSuite extends QueryTest

[GitHub] [spark] MaxGekk closed pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox
MaxGekk closed pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes URL: https://github.com/apache/spark/pull/38531

[GitHub] [spark] MaxGekk commented on pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox
MaxGekk commented on PR #38531: URL: https://github.com/apache/spark/pull/38531#issuecomment-1315834870 +1, LGTM. Merging to master. Thank you, @panbingkun and @cloud-fan @srielau for review.

[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-15 Thread GitBox
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1023233663 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, +

[GitHub] [spark] kyle-ai2 commented on pull request #38539: [SPARK-41030][BUILD] Upgrade `Apache Ivy` to 2.5.1

2022-11-15 Thread GitBox
kyle-ai2 commented on PR #38539: URL: https://github.com/apache/spark/pull/38539#issuecomment-1315814385 Hello @dongjoon-hyun, Will this fix be backported for Spark 3.2 as well?

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1021870148 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #38511: [SPARK-41017][SQL] Support column pruning with multiple nondeterministic Filters

2022-11-15 Thread GitBox
gengliangwang commented on code in PR #38511: URL: https://github.com/apache/spark/pull/38511#discussion_r1023204106 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala: ## @@ -85,15 +72,25 @@ object PhysicalOperation extends AliasHelper with

[GitHub] [spark] anchovYu commented on a diff in pull request #38257: [SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy

2022-11-15 Thread GitBox
anchovYu commented on code in PR #38257: URL: https://github.com/apache/spark/pull/38257#discussion_r1023195030 ## docs/sql-migration-guide.md: ## @@ -34,6 +34,7 @@ license: | - Valid hexadecimal strings should include only allowed symbols (0-9A-Fa-f). - Valid values

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1023192849 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,41 +42,70 @@ object UnsupportedOperationChecker

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1023155184 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -940,22 +1056,22 @@ class UnsupportedOperationsSuite

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38611: [SPARK-41107][PYTHON][INFRA][TESTS] Install memory-profiler in the CI

2022-11-15 Thread GitBox
xinrong-meng commented on code in PR #38611: URL: https://github.com/apache/spark/pull/38611#discussion_r1023163181 ## dev/infra/Dockerfile: ## @@ -32,7 +32,7 @@ RUN $APT_INSTALL software-properties-common git libxml2-dev pkg-config curl wget RUN update-alternatives --set

[GitHub] [spark] grundprinzip commented on a diff in pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-15 Thread GitBox
grundprinzip commented on code in PR #38605: URL: https://github.com/apache/spark/pull/38605#discussion_r1023155908 ## connector/connect/README.md: ## @@ -70,3 +70,4 @@ When contributing a new client please be aware that we strive to have a common user experience across all

[GitHub] [spark] WweiL commented on pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
WweiL commented on PR #38503: URL: https://github.com/apache/spark/pull/38503#issuecomment-1315727807 > Shouldn't you also fix https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala to remove the flag->false

[GitHub] [spark] alex-balikov commented on pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
alex-balikov commented on PR #38503: URL: https://github.com/apache/spark/pull/38503#issuecomment-1315726141 Shouldn't you also fix https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala to remove the flag->false

[GitHub] [spark] amaliujia commented on pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-15 Thread GitBox
amaliujia commented on PR #38605: URL: https://github.com/apache/spark/pull/38605#issuecomment-1315721838 @grundprinzip suggestions applied. The doc looks much better now with some more details filled in. Mind taking another look?

[GitHub] [spark] vinodkc commented on pull request #38608: [SPARK-41080][SQL] Support Bit manipulation function SETBIT

2022-11-15 Thread GitBox
vinodkc commented on PR #38608: URL: https://github.com/apache/spark/pull/38608#issuecomment-1315718307 CC @cloud-fan , @HyukjinKwon , @dongjoon-hyun Can you please review this PR?

[GitHub] [spark] vinodkc commented on pull request #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET

2022-11-15 Thread GitBox
vinodkc commented on PR #38661: URL: https://github.com/apache/spark/pull/38661#issuecomment-1315717202 CC @cloud-fan , @HyukjinKwon Can you please review this PR?

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-15 Thread GitBox
warrenzhu25 commented on code in PR #38441: URL: https://github.com/apache/spark/pull/38441#discussion_r1023138870 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2193,9 +2193,11 @@ private[spark] class DAGScheduler( * Return true when: *

[GitHub] [spark] vinodkc commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-11-15 Thread GitBox
vinodkc commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1315715844 @cloud-fan , yes we could share common code among 3 functions (trunc, floor, ceil). Updated the PR

[GitHub] [spark] amaliujia commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-15 Thread GitBox
amaliujia commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1315710131 @dengziming thanks! BTW you can try to convert this PR to `draft`, then re-open it when you think it is ready for review again.

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-15 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1315695999 Also, can you please update to latest master @gaoyajun02 ? Not sure why we are seeing the linter failure in build

[GitHub] [spark] mridulm commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-15 Thread GitBox
mridulm commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1315692121 There is a pending [comment](https://github.com/apache/spark/pull/38333/files#r1019735633), can you take a look at it @gaoyajun02 ? Thx

[GitHub] [spark] leewyang commented on pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-15 Thread GitBox
leewyang commented on PR #37734: URL: https://github.com/apache/spark/pull/37734#issuecomment-1315678614 BTW, I'm seeing a change in behavior in the `pandas_udf` when used with `limit` in the latest master branch of spark (vs. 3.3.1), per this example code: ``` import numpy as np

[GitHub] [spark] WweiL closed pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-15 Thread GitBox
WweiL closed pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries. URL: https://github.com/apache/spark/pull/38503

[GitHub] [spark] dtenedor commented on a diff in pull request #38663: [SPARK-41143][SQL] Add named argument function syntax support

2022-11-15 Thread GitBox
dtenedor commented on code in PR #38663: URL: https://github.com/apache/spark/pull/38663#discussion_r1023072531 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -769,7 +769,7 @@ inlineTable ; functionTable -:

[GitHub] [spark] awdavidson commented on pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression

2022-11-15 Thread GitBox
awdavidson commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1315558808 > @awdavidson I would like to understand the use case a bit better. Was the parquet file written by an earlier Spark (version < 3.2), and does the error come when that parquet file

[GitHub] [spark] AmplabJenkins commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-15 Thread GitBox
AmplabJenkins commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1315548580 Can one of the admins verify this patch?
