date:20210902

[spark] branch master updated (4d9e577 -> 9054a6a)

2021-09-02 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4d9e577  [SPARK-36650][YARN] ApplicationMaster shutdown hook should 
catch timeout exception
 add 9054a6a  [SPARK-36652][SQL] AQE dynamic join selection should not 
apply to non-equi join

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/optimizer/joins.scala|  4 
 .../org/apache/spark/sql/execution/SparkStrategies.scala   |  3 ++-
 .../sql/execution/adaptive/DynamicJoinSelection.scala  |  5 +++--
 .../test/scala/org/apache/spark/sql/JoinHintSuite.scala| 14 ++
 4 files changed, 23 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout exception

2021-09-02 Thread gurwls223

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4d9e577  [SPARK-36650][YARN] ApplicationMaster shutdown hook should 
catch timeout exception
4d9e577 is described below

commit 4d9e577694f5232d808adf8b1ca35681216bd3d4
Author: Angerszh 
AuthorDate: Fri Sep 3 11:42:53 2021 +0900

[SPARK-36650][YARN] ApplicationMaster shutdown hook should catch timeout 
exception

### What changes were proposed in this pull request?
In yarn-cluster mode, the ApplicationMaster's shutdown hook is called after 
SparkContext stops. If the hook throws a timeout exception, the program exits 
with code 1 even though the job actually succeeded.
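The failure mode above can be illustrated with a minimal sketch. This is not Spark's Scala change: `run_shutdown_hook` and `unregister` are hypothetical names standing in for the shutdown hook and the unregister call, and the sketch only assumes that the hook should log and swallow the exception so the job's own exit code survives.

```python
def run_shutdown_hook(unregister, final_exit_code):
    """Run a cleanup callback; never let its failure change the exit code.

    `unregister` is a hypothetical stand-in for the call that tells the
    resource manager the application has finished.
    """
    try:
        unregister()
    except Exception as exc:  # includes TimeoutError
        # Log and continue: the job result was decided before the hook ran.
        print(f"WARN: ignoring exception in shutdown hook: {exc!r}")
    return final_exit_code


def slow_unregister():
    # Simulates the RPC being interrupted by the hook's time limit.
    raise TimeoutError("unregister timed out")


exit_code = run_shutdown_hook(slow_unregister, final_exit_code=0)
```

With the `try`/`except` in place, a successful job still reports exit code 0 when the cleanup call times out.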
```
21/09/02 12:36:55 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException
java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95)
21/09/02 12:36:55 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.io.InterruptedIOException: Call interrupted
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1569)
    at org.apache.hadoop.ipc.Client.call(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1418)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:251)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:130)
    at com.sun.proxy.$Proxy21.finishApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:92)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy22.finishApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.unregisterApplicationMaster(AMRMClientImpl.java:479)
    at org.apache.spark.deploy.yarn.YarnRMClient.unregister(YarnRMClient.scala:90)
    at org.apache.spark.deploy.yarn.ApplicationMaster.unregister(ApplicationMaster.scala:384)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl$1.apply$mcV$sp(ApplicationMaster.scala:313)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1992)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

[spark] branch master updated (b72fa5e -> 38b6fbd)

2021-09-02 Thread viirya

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b72fa5e  [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'
 add 38b6fbd  [SPARK-36351][SQL] Refactor filter push down in file source v2

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/v2/avro/AvroScan.scala|  4 --
 .../apache/spark/sql/v2/avro/AvroScanBuilder.scala | 19 
 .../SupportsPushDownCatalystFilters.scala  | 41 
 .../execution/datasources/DataSourceUtils.scala| 21 +++-
 .../datasources/PruneFileSourcePartitions.scala| 56 +++---
 .../sql/execution/datasources/v2/FileScan.scala|  6 ---
 .../execution/datasources/v2/FileScanBuilder.scala | 44 +++--
 .../execution/datasources/v2/PushDownUtils.scala   |  7 +--
 .../sql/execution/datasources/v2/csv/CSVScan.scala |  6 +--
 .../datasources/v2/csv/CSVScanBuilder.scala| 19 
 .../execution/datasources/v2/json/JsonScan.scala   |  6 +--
 .../datasources/v2/json/JsonScanBuilder.scala  | 19 
 .../sql/execution/datasources/v2/orc/OrcScan.scala |  4 --
 .../datasources/v2/orc/OrcScanBuilder.scala| 19 
 .../datasources/v2/parquet/ParquetScan.scala   |  4 --
 .../v2/parquet/ParquetScanBuilder.scala| 15 ++
 .../execution/datasources/v2/text/TextScan.scala   |  6 +--
 .../datasources/v2/text/TextScanBuilder.scala  |  3 +-
 .../sql/execution/datasources/json/JsonSuite.scala |  6 +--
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 20 +++-
 20 files changed, 177 insertions(+), 148 deletions(-)
 create mode 100644 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala

[spark] branch branch-3.2 updated: [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'

2021-09-02 Thread dongjoon

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 99f6f7f  [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'
99f6f7f is described below

commit 99f6f7f8f8f87677c058f2428e8c82c5ea47e3ea
Author: William Hyun 
AuthorDate: Thu Sep 2 18:50:59 2021 -0700

[SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'

### What changes were proposed in this pull request?
This PR aims to update comments in `gen-sql-config-docs.py`.

### Why are the changes needed?
To bring it up to date with the Spark 3.2.0 release.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A.

Closes #33902 from williamhyun/fixtool.

Authored-by: William Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit b72fa5ef1c06b128011cc72d36f7bc02450ee675)
Signed-off-by: Dongjoon Hyun 
---
 sql/gen-sql-config-docs.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/gen-sql-config-docs.py b/sql/gen-sql-config-docs.py
index 1ce3a61..83334b6 100644
--- a/sql/gen-sql-config-docs.py
+++ b/sql/gen-sql-config-docs.py
@@ -61,9 +61,9 @@ def generate_sql_configs_table_html(sql_configs, path):
 
 
 spark.sql.adaptive.enabled
-false
+true
 When true, enable adaptive query execution.
-2.1.0
+1.6.0
 
 
 ...

[spark] branch master updated (568ad6a -> b72fa5e)

2021-09-02 Thread dongjoon

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 568ad6a  [SPARK-36637][SQL] Provide proper error message when use 
undefined window frame
 add b72fa5e  [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'

No new revisions were added by this update.

Summary of changes:
 sql/gen-sql-config-docs.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

[spark] branch master updated: [SPARK-36637][SQL] Provide proper error message when use undefined window frame

2021-09-02 Thread wenchen

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 568ad6a  [SPARK-36637][SQL] Provide proper error message when use 
undefined window frame
568ad6a is described below

commit 568ad6aa4435ce76ca3b5d9966e64259ea1f9b38
Author: Angerszh 
AuthorDate: Thu Sep 2 22:32:31 2021 +0800

[SPARK-36637][SQL] Provide proper error message when use undefined window 
frame

### What changes were proposed in this pull request?
Two cases of using an undefined window frame, shown below, should produce a 
proper error message.

1. For the case of using an undefined window frame with a window function
```
SELECT nth_value(employee_name, 2) OVER w second_highest_salary
FROM basic_pays;
```
The original error message is
```
Window function nth_value(employee_name#x, 2, false) requires an OVER 
clause.
```
This is confusing: the query uses a window frame `w`, but `w` is not defined.
Now the error message is
```
Window specification w is not defined in the WINDOW clause.
```

2. For the case of using an undefined window frame with an aggregate function
```
SELECT SUM(salary) OVER w sum_salary
FROM basic_pays;
```
The original error message is
```
Error in query: unresolved operator 'Aggregate 
[unresolvedwindowexpression(sum(salary#2), WindowSpecReference(w)) AS 
sum_salary#34]
+- SubqueryAlias spark_catalog.default.basic_pays
+- HiveTableRelation [`default`.`employees`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#0, dept#1, 
salary#2, age#3], Partition Cols: []]
```
In this case, the GlobalAggregate conversion should skip 
UnresolvedWindowExpression.
Now the error message is
```
Window specification w is not defined in the WINDOW clause.
```
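The intended behaviour in both cases can be modelled with a small sketch. It is a toy, not the Catalyst rule: `AnalysisError`, `resolve_window_refs`, and the plain-string window definitions are hypothetical, and only the error text is taken from the message above.

```python
from typing import Dict, List


class AnalysisError(Exception):
    """Hypothetical stand-in for an analyzer error; not a Spark class."""


def resolve_window_refs(references: List[str],
                        definitions: Dict[str, str]) -> List[str]:
    """Replace each named window reference with its definition.

    Raises AnalysisError for a name missing from the WINDOW clause,
    whether the reference came from a window or an aggregate function.
    """
    resolved = []
    for name in references:
        if name not in definitions:
            raise AnalysisError(
                f"Window specification {name} is not defined in the WINDOW clause."
            )
        resolved.append(definitions[name])
    return resolved


# `OVER w` with no WINDOW clause at all: the reference cannot be resolved.
try:
    resolve_window_refs(["w"], {})
    message = ""
except AnalysisError as err:
    message = str(err)
```

The point of the sketch is that the check runs on the reference name itself, so the user sees the missing name rather than an unrelated downstream failure.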

### Why are the changes needed?
To provide a proper error message.

### Does this PR introduce _any_ user-facing change?
Yes, error messages are improved as described above.

### How was this patch tested?
Added UT

Closes #33892 from AngersZh/SPARK-36637.

Authored-by: Angerszh 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 15 +++--
 .../catalyst/expressions/windowExpressions.scala   |  4 +++-
 .../spark/sql/catalyst/trees/TreePatterns.scala|  1 +
 .../src/test/resources/sql-tests/inputs/window.sql | 12 +-
 .../resources/sql-tests/results/window.sql.out | 26 +-
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index a26f6b6..340b859 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -437,8 +437,8 @@ class Analyzer(override val catalogManager: CatalogManager)
* Substitute child plan with WindowSpecDefinitions.
*/
   object WindowsSubstitution extends Rule[LogicalPlan] {
-def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
-  _.containsPattern(WITH_WINDOW_DEFINITION), ruleId) {
+def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDownWithPruning(
+  _.containsAnyPattern(WITH_WINDOW_DEFINITION, UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
   // Lookup WindowSpecDefinitions. This rule works with unresolved children.
   case WithWindowDefinition(windowDefinitions, child) => child.resolveExpressions {
 case UnresolvedWindowExpression(c, WindowSpecReference(windowName)) =>
@@ -446,6 +446,14 @@ class Analyzer(override val catalogManager: CatalogManager)
 throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowName))
   WindowExpression(c, windowSpecDefinition)
   }
+
+  case p @ Project(projectList, _) =>
+projectList.foreach(_.transformDownWithPruning(
+  _.containsPattern(UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
+  case UnresolvedWindowExpression(_, windowSpec) =>
+throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name)
+})
+p
 }
   }
 
@@ -2494,6 +2502,9 @@ class Analyzer(override val catalogManager: CatalogManager)
 expr.collect {
   case WindowExpression(ae: AggregateExpression, _) => ae
   case WindowExpression(e: PythonUDF, _) if PythonUDF.isGroupedAggPandasUDF(e) => e
+  case UnresolvedWindowExpression(ae: AggregateExpression, _) => ae
+  case UnresolvedWindowExpression(e: PythonUDF, _)
+if PythonUDF.isGroupedAggPandasUDF(e) => e
 }
   }.toSet
 
diff --git

[spark] branch branch-3.2 updated: [SPARK-36637][SQL] Provide proper error message when use undefined window frame

2021-09-02 Thread wenchen

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 8b4cc90  [SPARK-36637][SQL] Provide proper error message when use 
undefined window frame
8b4cc90 is described below

commit 8b4cc90c44d561b59bcb042025eae337657f10f9
Author: Angerszh 
AuthorDate: Thu Sep 2 22:32:31 2021 +0800

[SPARK-36637][SQL] Provide proper error message when use undefined window 
frame

### What changes were proposed in this pull request?
Two cases of using an undefined window frame, shown below, should produce a 
proper error message.

1. For the case of using an undefined window frame with a window function
```
SELECT nth_value(employee_name, 2) OVER w second_highest_salary
FROM basic_pays;
```
The original error message is
```
Window function nth_value(employee_name#x, 2, false) requires an OVER 
clause.
```
This is confusing: the query uses a window frame `w`, but `w` is not defined.
Now the error message is
```
Window specification w is not defined in the WINDOW clause.
```

2. For the case of using an undefined window frame with an aggregate function
```
SELECT SUM(salary) OVER w sum_salary
FROM basic_pays;
```
The original error message is
```
Error in query: unresolved operator 'Aggregate 
[unresolvedwindowexpression(sum(salary#2), WindowSpecReference(w)) AS 
sum_salary#34]
+- SubqueryAlias spark_catalog.default.basic_pays
+- HiveTableRelation [`default`.`employees`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#0, dept#1, 
salary#2, age#3], Partition Cols: []]
```
In this case, the GlobalAggregate conversion should skip 
UnresolvedWindowExpression.
Now the error message is
```
Window specification w is not defined in the WINDOW clause.
```

### Why are the changes needed?
To provide a proper error message.

### Does this PR introduce _any_ user-facing change?
Yes, error messages are improved as described above.

### How was this patch tested?
Added UT

Closes #33892 from AngersZh/SPARK-36637.

Authored-by: Angerszh 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 568ad6aa4435ce76ca3b5d9966e64259ea1f9b38)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala | 15 +++--
 .../catalyst/expressions/windowExpressions.scala   |  4 +++-
 .../spark/sql/catalyst/trees/TreePatterns.scala|  1 +
 .../src/test/resources/sql-tests/inputs/window.sql | 12 +-
 .../resources/sql-tests/results/window.sql.out | 26 +-
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 92018eb..fa6b247 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -437,8 +437,8 @@ class Analyzer(override val catalogManager: CatalogManager)
* Substitute child plan with WindowSpecDefinitions.
*/
   object WindowsSubstitution extends Rule[LogicalPlan] {
-def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning(
-  _.containsPattern(WITH_WINDOW_DEFINITION), ruleId) {
+def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsDownWithPruning(
+  _.containsAnyPattern(WITH_WINDOW_DEFINITION, UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
   // Lookup WindowSpecDefinitions. This rule works with unresolved children.
   case WithWindowDefinition(windowDefinitions, child) => child.resolveExpressions {
 case UnresolvedWindowExpression(c, WindowSpecReference(windowName)) =>
@@ -446,6 +446,14 @@ class Analyzer(override val catalogManager: CatalogManager)
 throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowName))
   WindowExpression(c, windowSpecDefinition)
   }
+
+  case p @ Project(projectList, _) =>
+projectList.foreach(_.transformDownWithPruning(
+  _.containsPattern(UNRESOLVED_WINDOW_EXPRESSION), ruleId) {
+  case UnresolvedWindowExpression(_, windowSpec) =>
+throw QueryCompilationErrors.windowSpecificationNotDefinedError(windowSpec.name)
+})
+p
 }
   }
 
@@ -2492,6 +2500,9 @@ class Analyzer(override val catalogManager: CatalogManager)
 expr.collect {
   case WindowExpression(ae: AggregateExpression, _) => ae
   case WindowExpression(e: PythonUDF, _) if PythonUDF.isGroupedAggPandasUDF(e) => e
+  case UnresolvedWindowExpression(ae: AggregateExpression, _) => ae
+  case

[spark] branch branch-3.1 updated: [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version

2021-09-02 Thread zero323

zero323 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 6352085  [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to 
support multi-column version
6352085 is described below

commit 6352085d538c69c90d5ebfd41efa92ddc1c64e7f
Author: Cary Lee 
AuthorDate: Thu Sep 2 15:02:40 2021 +0200

[SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support 
multi-column version

### What changes were proposed in this pull request?
Update both `DataFrame.approxQuantile` and 
`DataFrameStatFunctions.approxQuantile` to support overloaded definitions when 
multiple columns are supplied.
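As a rough illustration of what overloaded definitions buy a type checker, here is a self-contained sketch. `approx_quantile` is a hypothetical free function, not the PySpark method, and its body returns placeholder values instead of computing quantiles; only the shape of the two signatures mirrors the change.

```python
from typing import List, Sequence, overload


@overload
def approx_quantile(col: str, probabilities: Sequence[float]) -> List[float]: ...
@overload
def approx_quantile(col: List[str], probabilities: Sequence[float]) -> List[List[float]]: ...
def approx_quantile(col, probabilities):
    """Toy implementation: echoes the probabilities as placeholder results."""
    if isinstance(col, str):
        # Single column: one flat list, one entry per probability.
        return [float(p) for p in probabilities]
    # Several columns: one inner list per column.
    return [[float(p) for p in probabilities] for _ in col]


single = approx_quantile("age", [0.5])
multi = approx_quantile(["age", "salary"], [0.25, 0.75])
```

With a single signature returning `List[float]`, a checker would flag indexing into `multi[0][1]`; with the two overloads it can infer the nested list from the argument type.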

### Why are the changes needed?
The current type hints don't support the multi-column signature, a form 
that was added in Spark 2.2 (see [the approxQuantile 
docs](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.approxQuantile.html).)
 This change was also introduced to pyspark-stubs 
(https://github.com/zero323/pyspark-stubs/pull/552). zero323 asked me to open a 
PR for the upstream change.

### Does this PR introduce _any_ user-facing change?
This change only affects type hints - it brings the `approxQuantile` type 
hints up to date with the actual code.

### How was this patch tested?
Ran `./dev/lint-python`.

Closes #33880 from carylee/master.

Authored-by: Cary Lee 
Signed-off-by: zero323 
(cherry picked from commit 37f5ab07fa2343e77ae16b6460898ecbee4b3faf)
Signed-off-by: zero323 
---
 python/pyspark/sql/dataframe.pyi | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.pyi b/python/pyspark/sql/dataframe.pyi
index af1bac6..062a8a5 100644
--- a/python/pyspark/sql/dataframe.pyi
+++ b/python/pyspark/sql/dataframe.pyi
@@ -237,12 +237,20 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 value: OptionalPrimitiveType,
 subset: Optional[List[str]] = ...,
 ) -> DataFrame: ...
+@overload
 def approxQuantile(
 self,
-col: Union[str, Tuple[str, ...], List[str]],
-probabilities: Union[List[float], Tuple[float, ...]],
-relativeError: float
+col: str,
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
 ) -> List[float]: ...
+@overload
+def approxQuantile(
+self,
+col: Union[List[str], Tuple[str]],
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
+) -> List[List[float]]: ...
 def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
 def cov(self, col1: str, col2: str) -> float: ...
 def crosstab(self, col1: str, col2: str) -> DataFrame: ...
@@ -314,9 +322,20 @@ class DataFrameNaFunctions:
 class DataFrameStatFunctions:
 df: DataFrame
 def __init__(self, df: DataFrame) -> None: ...
+@overload
 def approxQuantile(
-self, col: str, probabilities: List[float], relativeError: float
+self,
+col: str,
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
 ) -> List[float]: ...
+@overload
+def approxQuantile(
+self,
+col: Union[List[str], Tuple[str]],
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
+) -> List[List[float]]: ...
 def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
 def cov(self, col1: str, col2: str) -> float: ...
 def crosstab(self, col1: str, col2: str) -> DataFrame: ...

[spark] branch branch-3.2 updated: [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support multi-column version

2021-09-02 Thread zero323

zero323 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 11d10fc  [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to 
support multi-column version
11d10fc is described below

commit 11d10fc994bb7e801e527ea765e1674a5a35d446
Author: Cary Lee 
AuthorDate: Thu Sep 2 15:02:40 2021 +0200

[SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to support 
multi-column version

### What changes were proposed in this pull request?
Update both `DataFrame.approxQuantile` and 
`DataFrameStatFunctions.approxQuantile` to support overloaded definitions when 
multiple columns are supplied.

### Why are the changes needed?
The current type hints don't support the multi-column signature, a form 
that was added in Spark 2.2 (see [the approxQuantile 
docs](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.approxQuantile.html).)
 This change was also introduced to pyspark-stubs 
(https://github.com/zero323/pyspark-stubs/pull/552). zero323 asked me to open a 
PR for the upstream change.

### Does this PR introduce _any_ user-facing change?
This change only affects type hints - it brings the `approxQuantile` type 
hints up to date with the actual code.

### How was this patch tested?
Ran `./dev/lint-python`.

Closes #33880 from carylee/master.

Authored-by: Cary Lee 
Signed-off-by: zero323 
(cherry picked from commit 37f5ab07fa2343e77ae16b6460898ecbee4b3faf)
Signed-off-by: zero323 
---
 python/pyspark/sql/dataframe.pyi | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.pyi b/python/pyspark/sql/dataframe.pyi
index 9e762bf..d43c311 100644
--- a/python/pyspark/sql/dataframe.pyi
+++ b/python/pyspark/sql/dataframe.pyi
@@ -238,12 +238,20 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 value: OptionalPrimitiveType,
 subset: Optional[List[str]] = ...,
 ) -> DataFrame: ...
+@overload
 def approxQuantile(
 self,
-col: Union[str, Tuple[str, ...], List[str]],
-probabilities: Union[List[float], Tuple[float, ...]],
-relativeError: float
+col: str,
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
 ) -> List[float]: ...
+@overload
+def approxQuantile(
+self,
+col: Union[List[str], Tuple[str]],
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
+) -> List[List[float]]: ...
 def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
 def cov(self, col1: str, col2: str) -> float: ...
 def crosstab(self, col1: str, col2: str) -> DataFrame: ...
@@ -316,9 +324,20 @@ class DataFrameNaFunctions:
 class DataFrameStatFunctions:
 df: DataFrame
 def __init__(self, df: DataFrame) -> None: ...
+@overload
 def approxQuantile(
-self, col: str, probabilities: List[float], relativeError: float
+self,
+col: str,
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
 ) -> List[float]: ...
+@overload
+def approxQuantile(
+self,
+col: Union[List[str], Tuple[str]],
+probabilities: Union[List[float], Tuple[float]],
+relativeError: float,
+) -> List[List[float]]: ...
 def corr(self, col1: str, col2: str, method: Optional[str] = ...) -> float: ...
 def cov(self, col1: str, col2: str) -> float: ...
 def crosstab(self, col1: str, col2: str) -> DataFrame: ...

[spark] branch master updated (94c3062 -> 37f5ab0)

2021-09-02 Thread zero323

zero323 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 94c3062  [SPARK-36400][TEST][FOLLOWUP] Add test for redacting 
sensitive information in UI by config
 add 37f5ab0  [SPARK-36617][PYTHON] Fix type hints for `approxQuantile` to 
support multi-column version

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.pyi | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

[spark] branch master updated (9c5bcac -> 94c3062)

2021-09-02 Thread sarutak

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 9c5bcac  [SPARK-36626][PYTHON] Support TimestampNTZ in 
createDataFrame/toPandas and Python UDFs
 add 94c3062  [SPARK-36400][TEST][FOLLOWUP] Add test for redacting 
sensitive information in UI by config

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/UISeleniumSuite.scala| 45 ++
 1 file changed, 45 insertions(+)
