[spark] branch master updated (4d9e577 -> 9054a6a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4d9e577 [SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout exception add 9054a6a [SPARK-36652][SQL] AQE dynamic join selection should not apply to non-equi join No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/optimizer/joins.scala| 4 .../org/apache/spark/sql/execution/SparkStrategies.scala | 3 ++- .../sql/execution/adaptive/DynamicJoinSelection.scala | 5 +++-- .../test/scala/org/apache/spark/sql/JoinHintSuite.scala| 14 ++ 4 files changed, 23 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout exception
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4d9e577 [SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout exception 4d9e577 is described below commit 4d9e577694f5232d808adf8b1ca35681216bd3d4 Author: Angerszh AuthorDate: Fri Sep 3 11:42:53 2021 +0900 [SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout exception ### What changes were proposed in this pull request? We hit a case in yarn-cluster mode where SparkContext was stopped and ApplicationMaster's shutdown hook was then called. The hook threw a timeout exception and the program exited with code 1, even though the job had actually succeeded.
```
21/09/02 12:36:55 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
21/09/02 12:36:55 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.io.InterruptedIOException: Call interrupted
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1569)
    at org.apache.hadoop.ipc.Client.call(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1418)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:251)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:130)
    at com.sun.proxy.$Proxy21.finishApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:92)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy22.finishApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.unregisterApplicationMaster(AMRMClientImpl.java:479)
    at org.apache.spark.deploy.yarn.YarnRMClient.unregister(YarnRMClient.scala:90)
    at org.apache.spark.deploy.yarn.ApplicationMaster.unregister(ApplicationMaster.scala:384)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl$1.apply$mcV$sp(ApplicationMaster.scala:313)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
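The failure described here comes from a shutdown step, unregistering from the YARN ResourceManager, that overruns its time budget. A timeout in that step then turns an otherwise successful job into exit code 1. The Python sketch below shows the general idea of bounding such a step and treating a timeout as non-fatal. It is only a hedged illustration. `unregister_with_timeout`, `shutdown_exit_code`, `slow_unregister` and the timeout values are hypothetical names chosen for this sketch. They are not Spark, Hadoop or YARN APIs, and the sketch does not reproduce how Spark's `ApplicationMaster` implements the fix.

```python
import concurrent.futures
import logging
import time
from typing import Callable

log = logging.getLogger("shutdown-sketch")


def slow_unregister() -> None:
    """Hypothetical stand-in for an unregister RPC that overruns the deadline."""
    time.sleep(0.3)


def unregister_with_timeout(unregister: Callable[[], None], timeout_s: float) -> bool:
    """Run a (hypothetical) unregister callable with a time limit.

    Returns True if it finished in time. A timeout is logged and swallowed
    instead of propagating out of the shutdown path.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(unregister)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        log.warning("unregister did not finish within %.2fs; ignoring", timeout_s)
        return False
    finally:
        # Do not block on a worker that is still running past the deadline.
        pool.shutdown(wait=False)


def shutdown_exit_code(job_succeeded: bool, unregister: Callable[[], None],
                       timeout_s: float = 1.0) -> int:
    """Derive the exit code from the job result only, not from unregister timing."""
    unregister_with_timeout(unregister, timeout_s)
    return 0 if job_succeeded else 1
```

For example, `shutdown_exit_code(True, slow_unregister, timeout_s=0.05)` logs a warning and still returns `0`. The design choice is that the exit code depends only on `job_succeeded`, so a slow cleanup step can no longer mask a successful job.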
[spark] branch master updated (b72fa5e -> 38b6fbd)
This is an automated email from the ASF dual-hosted git repository. viirya pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b72fa5e [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py' add 38b6fbd [SPARK-36351][SQL] Refactor filter push down in file source v2 No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/v2/avro/AvroScan.scala| 4 -- .../apache/spark/sql/v2/avro/AvroScanBuilder.scala | 19 .../SupportsPushDownCatalystFilters.scala | 41 .../execution/datasources/DataSourceUtils.scala| 21 +++- .../datasources/PruneFileSourcePartitions.scala| 56 +++--- .../sql/execution/datasources/v2/FileScan.scala| 6 --- .../execution/datasources/v2/FileScanBuilder.scala | 44 +++-- .../execution/datasources/v2/PushDownUtils.scala | 7 +-- .../sql/execution/datasources/v2/csv/CSVScan.scala | 6 +-- .../datasources/v2/csv/CSVScanBuilder.scala| 19 .../execution/datasources/v2/json/JsonScan.scala | 6 +-- .../datasources/v2/json/JsonScanBuilder.scala | 19 .../sql/execution/datasources/v2/orc/OrcScan.scala | 4 -- .../datasources/v2/orc/OrcScanBuilder.scala| 19 .../datasources/v2/parquet/ParquetScan.scala | 4 -- .../v2/parquet/ParquetScanBuilder.scala| 15 ++ .../execution/datasources/v2/text/TextScan.scala | 6 +-- .../datasources/v2/text/TextScanBuilder.scala | 3 +- .../sql/execution/datasources/json/JsonSuite.scala | 6 +-- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 20 +++- 20 files changed, 177 insertions(+), 148 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
[spark] branch branch-3.2 updated: [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 99f6f7f [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py' 99f6f7f is described below commit 99f6f7f8f8f87677c058f2428e8c82c5ea47e3ea Author: William Hyun AuthorDate: Thu Sep 2 18:50:59 2021 -0700 [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py' ### What changes were proposed in this pull request? This PR updates the comments in `gen-sql-config-docs.py`. ### Why are the changes needed? To bring them up to date with the Spark 3.2.0 release. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A. Closes #33902 from williamhyun/fixtool. Authored-by: William Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit b72fa5ef1c06b128011cc72d36f7bc02450ee675) Signed-off-by: Dongjoon Hyun
---
 sql/gen-sql-config-docs.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/gen-sql-config-docs.py b/sql/gen-sql-config-docs.py
index 1ce3a61..83334b6 100644
--- a/sql/gen-sql-config-docs.py
+++ b/sql/gen-sql-config-docs.py
@@ -61,9 +61,9 @@ def generate_sql_configs_table_html(sql_configs, path):
 spark.sql.adaptive.enabled
-false
+true
 When true, enable adaptive query execution.
-2.1.0
+1.6.0
 ...
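The diff changes the sample row in a comment of `generate_sql_configs_table_html(sql_configs, path)`. After the change, `spark.sql.adaptive.enabled` shows the 3.2.0 default (`true`) and the since-version `1.6.0`. The sketch below is a rough illustration of what one generated row carries: it renders a single config as an HTML table row. `ConfEntry` and `config_row_html` are hypothetical names, and the exact markup the real script emits may differ.

```python
import html
from collections import namedtuple

# Hypothetical record for one SQL config; the field names are assumptions for this sketch.
ConfEntry = namedtuple("ConfEntry", ["name", "default", "description", "version"])


def config_row_html(entry: ConfEntry) -> str:
    """Render one config as an HTML table row: name, default, meaning, since-version."""
    cells = [
        "<code>{}</code>".format(html.escape(entry.name)),
        html.escape(entry.default),
        html.escape(entry.description),
        html.escape(entry.version),
    ]
    return "<tr>" + "".join("<td>{}</td>".format(c) for c in cells) + "</tr>"
```

For the entry in the diff, `config_row_html(ConfEntry("spark.sql.adaptive.enabled", "true", "When true, enable adaptive query execution.", "1.6.0"))` yields a row whose cells read `spark.sql.adaptive.enabled`, `true`, the description, and `1.6.0`. Every value is HTML-escaped, so a default such as `<undefined>` cannot break the table markup.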
[spark] branch master updated (568ad6a -> b72fa5e)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 568ad6a [SPARK-36637][SQL] Provide proper error message when use undefined window frame add b72fa5e [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py' No new revisions were added by this update. Summary of changes: sql/gen-sql-config-docs.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-36637][SQL] Provide proper error message when use undefined window frame
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 568ad6a [SPARK-36637][SQL] Provide proper error message when use undefined window frame 568ad6a is described below commit 568ad6aa4435ce76ca3b5d9966e64259ea1f9b38 Author: Angerszh AuthorDate: Thu Sep 2 22:32:31 2021 +0800 [SPARK-36637][SQL] Provide proper error message when use undefined window frame ### What changes were proposed in this pull request? The two cases below both use an undefined window frame and should produce a proper error message. 1. Using an undefined window frame with a window function: ``` SELECT nth_value(employee_name, 2) OVER w second_highest_salary FROM basic_pays; ``` The original error message is ``` Window function nth_value(employee_name#x, 2, false) requires an OVER clause. ``` This is confusing: the query uses a window frame `w`, but `w` is not defined. Now the error message is ``` Window specification w is not defined in the WINDOW clause. ``` 2. Using an undefined window frame with an aggregate function: ``` SELECT SUM(salary) OVER w sum_salary FROM basic_pays; ``` The original error message is ``` Error in query: unresolved operator 'Aggregate [unresolvedwindowexpression(sum(salary#2), WindowSpecReference(w)) AS sum_salary#34] +- SubqueryAlias spark_catalog.default.basic_pays +- HiveTableRelation [`default`.`employees`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#0, dept#1, salary#2, age#3], Partition Cols: []] ``` In this case, the conversion to a global aggregate should skip UnresolvedWindowExpression. Now the error message is ``` Window specification w is not defined in the WINDOW clause. ``` ### Why are the changes needed? To provide a proper error message. ### Does this PR introduce _any_ user-facing change? Yes, the error messages are improved as described above. ### How was this patch tested?
Added UT Closes #33892 from AngersZh/SPARK-36637. Authored-by: Angerszh Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 15 +++--
 .../catalyst/expressions/windowExpressions.scala | 4 +++-
 .../spark/sql/catalyst/trees/TreePatterns.scala | 1 +
 .../src/test/resources/sql-tests/inputs/window.sql | 12 +-
 .../resources/sql-tests/results/window.sql.out | 26 +-
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index a26f6b6..340b859 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -437,8 +437,8 @@ class Analyzer(override val catalogManager: CatalogManager)
    * Substitute child plan with WindowSpecDefinitions.
    */
   object WindowsSubstitution extends Rule[LogicalPlan] {
-    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
-      _.containsPattern(WITH_WINDOW_DEFINITION), ruleId) {
+    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDownWithPruning(
+      _.containsAnyPattern(WITH_WINDOW_DEFINITION, UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
       // Lookup WindowSpecDefinitions. This rule works with unresolved children.
       case WithWindowDefinition(windowDefinitions, child) => child.resolveExpressions {
         case UnresolvedWindowExpression(c, WindowSpecReference(windowName)) =>
@@ -446,6 +446,14 @@ class Analyzer(override val catalogManager: CatalogManager)
             throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowName))
           WindowExpression(c, windowSpecDefinition)
       }
+
+      case p @ Project(projectList, _) =>
+        projectList.foreach(_.transformDownWithPruning(
+          _.containsPattern(UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
+          case UnresolvedWindowExpression(_, windowSpec) =>
+            throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name)
+        })
+        p
     }
   }
@@ -2494,6 +2502,9 @@ class Analyzer(override val catalogManager: CatalogManager)
         expr.collect {
           case WindowExpression(ae: AggregateExpression, _) => ae
           case WindowExpression(e: PythonUDF, _) if PythonUDF.isGroupedAggPandasUDF(e) => e
+          case UnresolvedWindowExpression(ae: AggregateExpression, _) => ae
+          case UnresolvedWindowExpression(e: PythonUDF, _)
+            if PythonUDF.isGroupedAggPandasUDF(e) => e
         }
       }.toSet
diff --git
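Both cases come down to the same rule. Every `OVER w` reference must resolve to a definition in the query's WINDOW clause, and a reference that does not resolve should fail with a message naming the missing specification. The toy Python model below shows only that lookup. It is not Catalyst's `WindowsSubstitution` rule, and `resolve_window_refs`, `AnalysisError` and the (expression, window name) pair representation are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class AnalysisError(Exception):
    """Hypothetical stand-in for a query compilation error."""


def resolve_window_refs(
    select_items: List[Tuple[str, Optional[str]]],
    window_clause: Dict[str, str],
) -> List[str]:
    """Expand `expr OVER name` references using the WINDOW clause definitions.

    select_items pairs each select expression with the window name it references
    (None if it has no OVER clause); window_clause maps window names to specs.
    """
    resolved = []
    for expr, window_name in select_items:
        if window_name is None:
            resolved.append(expr)
        elif window_name in window_clause:
            resolved.append(f"{expr} OVER ({window_clause[window_name]})")
        else:
            # Fail with a message that names the missing window specification,
            # instead of a generic "requires an OVER clause" or "unresolved operator" error.
            raise AnalysisError(
                f"Window specification {window_name} is not defined in the WINDOW clause."
            )
    return resolved
```

With `{"w": "ORDER BY salary"}` as the WINDOW clause, `("sum(salary)", "w")` expands to `sum(salary) OVER (ORDER BY salary)`. With an empty clause, as in the `basic_pays` queries above, the call raises the specific "Window specification w is not defined" message.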
[spark] branch branch-3.2 updated: [SPARK-36637][SQL] Provide proper error message when use undefined window frame
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 8b4cc90 [SPARK-36637][SQL] Provide proper error message when use undefined window frame 8b4cc90 is described below commit 8b4cc90c44d561b59bcb042025eae337657f10f9 Author: Angerszh AuthorDate: Thu Sep 2 22:32:31 2021 +0800 [SPARK-36637][SQL] Provide proper error message when use undefined window frame ### What changes were proposed in this pull request? The two cases below both use an undefined window frame and should produce a proper error message. 1. Using an undefined window frame with a window function: ``` SELECT nth_value(employee_name, 2) OVER w second_highest_salary FROM basic_pays; ``` The original error message is ``` Window function nth_value(employee_name#x, 2, false) requires an OVER clause. ``` This is confusing: the query uses a window frame `w`, but `w` is not defined. Now the error message is ``` Window specification w is not defined in the WINDOW clause. ``` 2. Using an undefined window frame with an aggregate function: ``` SELECT SUM(salary) OVER w sum_salary FROM basic_pays; ``` The original error message is ``` Error in query: unresolved operator 'Aggregate [unresolvedwindowexpression(sum(salary#2), WindowSpecReference(w)) AS sum_salary#34] +- SubqueryAlias spark_catalog.default.basic_pays +- HiveTableRelation [`default`.`employees`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#0, dept#1, salary#2, age#3], Partition Cols: []] ``` In this case, the conversion to a global aggregate should skip UnresolvedWindowExpression. Now the error message is ``` Window specification w is not defined in the WINDOW clause. ``` ### Why are the changes needed? To provide a proper error message. ### Does this PR introduce _any_ user-facing change? Yes, the error messages are improved as described above. ### How was this patch tested?
Added UT Closes #33892 from AngersZh/SPARK-36637. Authored-by: Angerszh Signed-off-by: Wenchen Fan (cherry picked from commit 568ad6aa4435ce76ca3b5d9966e64259ea1f9b38) Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 15 +++--
 .../catalyst/expressions/windowExpressions.scala | 4 +++-
 .../spark/sql/catalyst/trees/TreePatterns.scala | 1 +
 .../src/test/resources/sql-tests/inputs/window.sql | 12 +-
 .../resources/sql-tests/results/window.sql.out | 26 +-
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 92018eb..fa6b247 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -437,8 +437,8 @@ class Analyzer(override val catalogManager: CatalogManager)
    * Substitute child plan with WindowSpecDefinitions.
    */
   object WindowsSubstitution extends Rule[LogicalPlan] {
-    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
-      _.containsPattern(WITH_WINDOW_DEFINITION), ruleId) {
+    def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDownWithPruning(
+      _.containsAnyPattern(WITH_WINDOW_DEFINITION, UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
       // Lookup WindowSpecDefinitions. This rule works with unresolved children.
       case WithWindowDefinition(windowDefinitions, child) => child.resolveExpressions {
         case UnresolvedWindowExpression(c, WindowSpecReference(windowName)) =>
@@ -446,6 +446,14 @@ class Analyzer(override val catalogManager: CatalogManager)
             throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowName))
           WindowExpression(c, windowSpecDefinition)
       }
+
+      case p @ Project(projectList, _) =>
+        projectList.foreach(_.transformDownWithPruning(
+          _.containsPattern(UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
+          case UnresolvedWindowExpression(_, windowSpec) =>
+            throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name)
+        })
+        p
     }
   }
@@ -2492,6 +2500,9 @@ class Analyzer(override val catalogManager: CatalogManager)
         expr.collect {
           case WindowExpression(ae: AggregateExpression, _) => ae
           case WindowExpression(e: PythonUDF, _) if PythonUDF.isGroupedAggPandasUDF(e) => e
+          case UnresolvedWindowExpression(ae: AggregateExpression, _) => ae
+          case
[spark] branch branch-3.1 updated: [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version
This is an automated email from the ASF dual-hosted git repository. zero323 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 6352085 [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version 6352085 is described below commit 6352085d538c69c90d5ebfd41efa92ddc1c64e7f Author: Cary Lee AuthorDate: Thu Sep 2 15:02:40 2021 +0200 [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version ### What changes were proposed in this pull request? Update both `DataFrame.approxQuantile` and `DataFrameStatFunctions.approxQuantile` to support overloaded definitions when multiple columns are supplied. ### Why are the changes needed? The current type hints don't support the multi-column signature, a form that was added in Spark 2.2 (see [the approxQuantile docs](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.approxQuantile.html).) This change was also introduced to pyspark-stubs (https://github.com/zero323/pyspark-stubs/pull/552). zero323 asked me to open a PR for the upstream change. ### Does this PR introduce _any_ user-facing change? This change only affects type hints - it brings the `approxQuantile` type hints up to date with the actual code. ### How was this patch tested? Ran `./dev/lint-python`. Closes #33880 from carylee/master. 
Authored-by: Cary Lee Signed-off-by: zero323 (cherry picked from commit 37f5ab07fa2343e77ae16b6460898ecbee4b3faf) Signed-off-by: zero323
---
 python/pyspark/sql/dataframe.pyi | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.pyi b/python/pyspark/sql/dataframe.pyi
index af1bac6..062a8a5 100644
--- a/python/pyspark/sql/dataframe.pyi
+++ b/python/pyspark/sql/dataframe.pyi
@@ -237,12 +237,20 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         value: OptionalPrimitiveType,
         subset: Optional[List[str]] = ...,
     ) -> DataFrame: ...
+    @overload
     def approxQuantile(
         self,
-        col: Union[str, Tuple[str, ...], List[str]],
-        probabilities: Union[List[float], Tuple[float, ...]],
-        relativeError: float
+        col: str,
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
     ) -> List[float]: ...
+    @overload
+    def approxQuantile(
+        self,
+        col: Union[List[str], Tuple[str]],
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
+    ) -> List[List[float]]: ...
     def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
     def cov(self, col1: str, col2: str) -> float: ...
     def crosstab(self, col1: str, col2: str) -> DataFrame: ...
@@ -314,9 +322,20 @@ class DataFrameNaFunctions:
 class DataFrameStatFunctions:
     df: DataFrame
     def __init__(self, df: DataFrame) -> None: ...
+    @overload
     def approxQuantile(
-        self, col: str, probabilities: List[float], relativeError: float
+        self,
+        col: str,
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
     ) -> List[float]: ...
+    @overload
+    def approxQuantile(
+        self,
+        col: Union[List[str], Tuple[str]],
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
+    ) -> List[List[float]]: ...
     def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
     def cov(self, col1: str, col2: str) -> float: ...
     def crosstab(self, col1: str, col2: str) -> DataFrame: ...
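The two `@overload` signatures let a type checker choose the return type from the type of `col`. A single column name yields `List[float]`, and a list or tuple of names yields `List[List[float]]`. The self-contained sketch below applies the same pattern to a hypothetical `ToyStatFunctions` class, which is not the PySpark API. Its runtime body is a toy so that the example can execute: it computes exact quantiles by rounding `p * (n - 1)` to an index and ignores `relativeError`.

```python
from typing import Dict, List, Tuple, Union, overload


class ToyStatFunctions:
    """Hypothetical stand-in for a stats API whose return type depends on `col`."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[float]: ...

    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str, ...]],
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[List[float]]: ...

    def approxQuantile(self, col, probabilities, relativeError):
        # Toy runtime: exact quantiles via a rounded index; relativeError is ignored.
        def quantiles(name: str) -> List[float]:
            values = sorted(self._data[name])
            n = len(values)
            return [float(values[round(p * (n - 1))]) for p in probabilities]

        if isinstance(col, str):
            return quantiles(col)  # one column -> List[float]
        return [quantiles(c) for c in col]  # several columns -> List[List[float]]
```

With these overloads, a checker such as mypy infers `List[float]` for `s.approxQuantile("a", [0.5], 0.0)` and `List[List[float]]` for `s.approxQuantile(["a", "b"], [0.5], 0.0)`. The sketch writes `Tuple[float, ...]`, which `typing` reads as a variable-length tuple of floats. By contrast, `Tuple[float]` denotes a tuple of exactly one float.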
[spark] branch branch-3.2 updated: [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version
This is an automated email from the ASF dual-hosted git repository. zero323 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 11d10fc [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version 11d10fc is described below commit 11d10fc994bb7e801e527ea765e1674a5a35d446 Author: Cary Lee AuthorDate: Thu Sep 2 15:02:40 2021 +0200 [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version ### What changes were proposed in this pull request? Update both `DataFrame.approxQuantile` and `DataFrameStatFunctions.approxQuantile` to support overloaded definitions when multiple columns are supplied. ### Why are the changes needed? The current type hints don't support the multi-column signature, a form that was added in Spark 2.2 (see [the approxQuantile docs](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.approxQuantile.html).) This change was also introduced to pyspark-stubs (https://github.com/zero323/pyspark-stubs/pull/552). zero323 asked me to open a PR for the upstream change. ### Does this PR introduce _any_ user-facing change? This change only affects type hints - it brings the `approxQuantile` type hints up to date with the actual code. ### How was this patch tested? Ran `./dev/lint-python`. Closes #33880 from carylee/master. 
Authored-by: Cary Lee Signed-off-by: zero323 (cherry picked from commit 37f5ab07fa2343e77ae16b6460898ecbee4b3faf) Signed-off-by: zero323
---
 python/pyspark/sql/dataframe.pyi | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.pyi b/python/pyspark/sql/dataframe.pyi
index 9e762bf..d43c311 100644
--- a/python/pyspark/sql/dataframe.pyi
+++ b/python/pyspark/sql/dataframe.pyi
@@ -238,12 +238,20 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         value: OptionalPrimitiveType,
         subset: Optional[List[str]] = ...,
     ) -> DataFrame: ...
+    @overload
     def approxQuantile(
         self,
-        col: Union[str, Tuple[str, ...], List[str]],
-        probabilities: Union[List[float], Tuple[float, ...]],
-        relativeError: float
+        col: str,
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
     ) -> List[float]: ...
+    @overload
+    def approxQuantile(
+        self,
+        col: Union[List[str], Tuple[str]],
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
+    ) -> List[List[float]]: ...
     def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
     def cov(self, col1: str, col2: str) -> float: ...
     def crosstab(self, col1: str, col2: str) -> DataFrame: ...
@@ -316,9 +324,20 @@ class DataFrameNaFunctions:
 class DataFrameStatFunctions:
     df: DataFrame
     def __init__(self, df: DataFrame) -> None: ...
+    @overload
     def approxQuantile(
-        self, col: str, probabilities: List[float], relativeError: float
+        self,
+        col: str,
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
     ) -> List[float]: ...
+    @overload
+    def approxQuantile(
+        self,
+        col: Union[List[str], Tuple[str]],
+        probabilities: Union[List[float], Tuple[float]],
+        relativeError: float,
+    ) -> List[List[float]]: ...
     def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
     def cov(self, col1: str, col2: str) -> float: ...
     def crosstab(self, col1: str, col2: str) -> DataFrame: ...
[spark] branch master updated (94c3062 -> 37f5ab0)
This is an automated email from the ASF dual-hosted git repository. zero323 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 94c3062 [SPARK-36400][TEST][FOLLOWUP] Add test for redacting sensitive information in UI by config add 37f5ab0 [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version No new revisions were added by this update. Summary of changes: python/pyspark/sql/dataframe.pyi | 27 +++ 1 file changed, 23 insertions(+), 4 deletions(-)
[spark] branch master updated (9c5bcac -> 94c3062)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9c5bcac [SPARK-36626][PYTHON] Support TimestampNTZ in createDataFrame/toPandas and Python UDFs add 94c3062 [SPARK-36400][TEST][FOLLOWUP] Add test for redacting sensitive information in UI by config No new revisions were added by this update. Summary of changes: .../sql/hive/thriftserver/UISeleniumSuite.scala| 45 ++ 1 file changed, 45 insertions(+)