[spark] branch master updated: [SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f678c75  [SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase
f678c75 is described below

commit f678c75d3940b2887fdb2621691b791b95d79469
Author: dch nguyen 
AuthorDate: Wed Oct 13 15:56:09 2021 +0900

[SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase

### What changes were proposed in this pull request?
Deduplicate prepare data method for HistogramPlotBase and KdePlotBase

### Why are the changes needed?
Deduplicate code
Remove two ```TODO``` comments

### Does this PR introduce _any_ user-facing change?
No, this is a dev-only change.

### How was this patch tested?
Existing tests
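
For context, a minimal usage sketch of the two plot paths that now share the consolidated `NumericPlotBase.prepare_numeric_data` helper (assumes a pandas-on-Spark session and a plotting backend such as plotly installed):

```python
import pyspark.pandas as ps

psdf = ps.DataFrame({"x": [1, 2, 2, 3, 3, 3]})
# Both calls now funnel through NumericPlotBase.prepare_numeric_data:
# hist via HistogramPlotBase.prepare_hist_data, kde via KdePlotBase.prepare_kde_data.
psdf.plot.hist(bins=3)
psdf.plot.kde(bw_method=0.3)
```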

Closes #34251 from dchvn/SPARK-36973.

Lead-authored-by: dch nguyen 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/plot/core.py | 31 +++
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py
index dc95eac..89b8320 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -98,10 +98,9 @@ class SampledPlotBase:
         )
 
 
-class HistogramPlotBase:
+class NumericPlotBase:
     @staticmethod
-    def prepare_hist_data(data, bins):
-        # TODO: this logic is similar with KdePlotBase. Might have to deduplicate it.
+    def prepare_numeric_data(data):
         from pyspark.pandas.series import Series
 
         if isinstance(data, Series):
@@ -117,6 +116,13 @@ class HistogramPlotBase:
                 "Empty {0!r}: no numeric data to " "plot".format(numeric_data.__class__.__name__)
             )
 
+        return data, numeric_data
+
+
+class HistogramPlotBase(NumericPlotBase):
+    @staticmethod
+    def prepare_hist_data(data, bins):
+        data, numeric_data = NumericPlotBase.prepare_numeric_data(data)
         if is_integer(bins):
             # computes boundaries for the column
             bins = HistogramPlotBase.get_bins(data.to_spark(), bins)
@@ -340,25 +346,10 @@ class BoxPlotBase:
         return fliers
 
 
-class KdePlotBase:
+class KdePlotBase(NumericPlotBase):
     @staticmethod
     def prepare_kde_data(data):
-        # TODO: this logic is similar with HistogramPlotBase. Might have to deduplicate it.
-        from pyspark.pandas.series import Series
-
-        if isinstance(data, Series):
-            data = data.to_frame()
-
-        numeric_data = data.select_dtypes(
-            include=["byte", "decimal", "integer", "float", "long", "double", np.datetime64]
-        )
-
-        # no empty frames or series allowed
-        if len(numeric_data.columns) == 0:
-            raise TypeError(
-                "Empty {0!r}: no numeric data to " "plot".format(numeric_data.__class__.__name__)
-            )
-
+        _, numeric_data = NumericPlotBase.prepare_numeric_data(data)
         return numeric_data
 
     @staticmethod




[spark] branch master updated (bc7e4f5 -> 5982162)

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bc7e4f5  [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions
 add 5982162  [SPARK-36976][R] Add max_by/min_by API to SparkR

No new revisions were added by this update.

Summary of changes:
 R/pkg/NAMESPACE   |  2 ++
 R/pkg/R/functions.R   | 46 +++
 R/pkg/R/generics.R|  8 ++
 R/pkg/tests/fulltests/test_sparkSQL.R | 16 
 4 files changed, 72 insertions(+)




[spark] branch master updated: [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bc7e4f5  [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions
bc7e4f5 is described below

commit bc7e4f54a993cd6d97a2bd28d9d62578dee130e8
Author: Hyukjin Kwon 
AuthorDate: Wed Oct 13 13:28:09 2021 +0900

[SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions

### What changes were proposed in this pull request?

This PR proposes to leverage the error message framework by exposing the methods below:

- `getErrorClass`
- `getSqlState`

at captured PySpark SQL exceptions (from `SparkThrowable`).

In addition, this PR adds a bit of refactoring. Previously, the exception capture was done by string comparison, which was flaky; now the logic leverages `isInstanceOf` on the JVM.

### Why are the changes needed?

Users can leverage the error class and SQL state codes by:

```python
try:
    ...
except AnalysisException as e:
    if e.getSqlState().startswith("4"):
        ...
```
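
Concretely, mirroring the unit test added in this diff (the error class and SQLSTATE values are the ones that test expects; they may differ across versions):

```python
from pyspark.sql.utils import AnalysisException

try:
    spark.sql("SELECT a")  # unresolvable column
except AnalysisException as e:
    print(e.getErrorClass())  # "MISSING_COLUMN"
    print(e.getSqlState())    # "42000"
```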

### Does this PR introduce _any_ user-facing change?

Yes, users now can get `getErrorClass` and `getSqlState` from SQL exceptions.

### How was this patch tested?

Manually tested, and unittests were added.

Closes #34219 from HyukjinKwon/SPARK-36953.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_utils.py |  8 
 python/pyspark/sql/utils.py| 83 +-
 2 files changed, 70 insertions(+), 21 deletions(-)

diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py
index 6d23736..10b579d 100644
--- a/python/pyspark/sql/tests/test_utils.py
+++ b/python/pyspark/sql/tests/test_utils.py
@@ -48,6 +48,14 @@ class UtilsTests(ReusedSQLTestCase):
         self.assertRegex(e.desc, "1024 is not in the permitted values")
         self.assertRegex(e.stackTrace, "org.apache.spark.sql.functions")
 
+    def test_get_error_class_state(self):
+        # SPARK-36953: test CapturedException.getErrorClass and getSqlState (from SparkThrowable)
+        try:
+            self.spark.sql("""SELECT a""")
+        except AnalysisException as e:
+            self.assertEquals(e.getErrorClass(), "MISSING_COLUMN")
+            self.assertEquals(e.getSqlState(), "42000")
+
 
 if __name__ == "__main__":
     import unittest
diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py
index ced587ca..578cf71 100644
--- a/python/pyspark/sql/utils.py
+++ b/python/pyspark/sql/utils.py
@@ -16,24 +16,59 @@
 #
 
 import py4j
+from py4j.java_gateway import is_instance_of
 
 from pyspark import SparkContext
 
 
 class CapturedException(Exception):
-    def __init__(self, desc, stackTrace, cause=None):
-        self.desc = desc
-        self.stackTrace = stackTrace
+    def __init__(self, desc=None, stackTrace=None, cause=None, origin=None):
+        # desc & stackTrace vs origin are mutually exclusive.
+        # cause is optional.
+        assert ((origin is not None and desc is None and stackTrace is None)
+                or (origin is None and desc is not None and stackTrace is not None))
+
+        self.desc = desc if desc is not None else origin.getMessage()
+        self.stackTrace = (
+            stackTrace if stackTrace is not None
+            else SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(origin)
+        )
         self.cause = convert_exception(cause) if cause is not None else None
+        if self.cause is None and origin is not None and origin.getCause() is not None:
+            self.cause = convert_exception(origin.getCause())
+        self._origin = origin
 
     def __str__(self):
-        sql_conf = SparkContext._jvm.org.apache.spark.sql.internal.SQLConf.get()
+        assert SparkContext._jvm is not None
+
+        jvm = SparkContext._jvm
+        sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get()
         debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
         desc = self.desc
         if debug_enabled:
             desc = desc + "\n\nJVM stacktrace:\n%s" % self.stackTrace
         return str(desc)
 
+    def getErrorClass(self):
+        assert SparkContext._gateway is not None
+
+        gw = SparkContext._gateway
+        if self._origin is not None and is_instance_of(
+                gw, self._origin, "org.apache.spark.SparkThrowable"):
+            return self._origin.getErrorClass()
+        else:
+            return None
+
+    def getSqlState(self):
+        assert SparkContext._gateway is not None
+
+        gw = SparkContext._gateway
+        if self._origin is not None and is_instance_of(
+                gw, self._origin, "org.a

[spark] branch master updated (92caa75 -> e861b0d)

2021-10-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 92caa75  Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"
 add e861b0d  [SPARK-36794][SQL] Ignore duplicated join keys when building relation for SEMI/ANTI shuffled hash join

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/joins/HashedRelation.scala | 30 ++---
 .../sql/execution/joins/ShuffledHashJoinExec.scala | 14 +++-
 .../scala/org/apache/spark/sql/JoinSuite.scala | 38 ++
 3 files changed, 70 insertions(+), 12 deletions(-)




[GitHub] [spark-website] HeartSaVioR commented on pull request #359: New home page and layout for Spark website

2021-10-12 Thread GitBox


HeartSaVioR commented on pull request #359:
URL: https://github.com/apache/spark-website/pull/359#issuecomment-941878858


Late LGTM. Looks nice!





[spark] branch branch-3.2 updated: Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 4b86fe4  Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"
4b86fe4 is described below

commit 4b86fe4c71559df12ab8a1ebcf5662c4cf87ca7f
Author: Hyukjin Kwon 
AuthorDate: Wed Oct 13 12:09:57 2021 +0900

Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"

This reverts commit 29ebfdcdff74af72c6900fa0856ada3ab07f8de1.
---
 pom.xml| 6 +++---
 project/SparkBuild.scala   | 4 ++--
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive/pom.xml   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/pom.xml b/pom.xml
index bd8ede6..d9c10ee 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2640,7 +2640,7 @@
 
   -Xss128m
   -Xms4g
-  -Xmx6g
+  -Xmx4g
   -XX:MaxMetaspaceSize=2g
   -XX:ReservedCodeCacheSize=${CodeCacheSize}
 
@@ -2690,7 +2690,7 @@
   **/*Suite.java
 
 ${project.build.directory}/surefire-reports
--ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
-  -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+  -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
   




[spark] 02/02: Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 92caa751257b894887d34e6abf02307931c090cd
Author: Hyukjin Kwon 
AuthorDate: Wed Oct 13 12:08:44 2021 +0900

Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"

This reverts commit 6ed13147c99b2f652748b716c70dd1937230cafd.
---
 pom.xml| 6 +++---
 project/SparkBuild.scala   | 4 ++--
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive/pom.xml   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/pom.xml b/pom.xml
index f5a0c3e..e36495f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2657,7 +2657,7 @@
 
   -Xss128m
   -Xms4g
-  -Xmx6g
+  -Xmx4g
   -XX:MaxMetaspaceSize=2g
   -XX:ReservedCodeCacheSize=${CodeCacheSize}
 
@@ -2707,7 +2707,7 @@
   **/*Suite.java
 
 ${project.build.directory}/surefire-reports
--ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
-  -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+  -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
   




[spark] branch master updated (0144818 -> 92caa75)

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0144818  [SPARK-36985][PYTHON] Fix future typing errors in pyspark.pandas
 new 521a2f6  Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem limit"
 new 92caa75  Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 pom.xml| 6 +++---
 project/SparkBuild.scala   | 4 ++--
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive/pom.xml   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)




[spark] 01/02: Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem limit"

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 521a2f6ee269c25b2a28064b29f5810eea4fe30c
Author: Hyukjin Kwon 
AuthorDate: Wed Oct 13 12:08:38 2021 +0900

Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem 
limit"

This reverts commit 6e8cd3b1a7489c9b0c5779559e45b3cd5decc1ea.
---
 pom.xml| 6 +++---
 project/SparkBuild.scala   | 4 ++--
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive/pom.xml   | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/pom.xml b/pom.xml
index 2c46c52..f5a0c3e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2657,7 +2657,7 @@
 
   -Xss128m
   -Xms4g
-  -Xmx5g
+  -Xmx6g
   -XX:MaxMetaspaceSize=2g
   -XX:ReservedCodeCacheSize=${CodeCacheSize}
 
@@ -2707,7 +2707,7 @@
   **/*Suite.java
 
 ${project.build.directory}/surefire-reports
--ea -Xmx5g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+-ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
-  -da -Xmx5g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
+  -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true
 
   
   




[spark] branch master updated (973f04e -> 0144818)

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 973f04e  [SPARK-36961][PYTHON] Use PEP526 style variable type hints
 add 0144818  [SPARK-36985][PYTHON] Fix future typing errors in pyspark.pandas

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/indexes/base.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)




[spark] branch master updated: [SPARK-36961][PYTHON] Use PEP526 style variable type hints

2021-10-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 973f04e  [SPARK-36961][PYTHON] Use PEP526 style variable type hints
973f04e is described below

commit 973f04eea7140dc61457cc12e74d5e7e333013db
Author: Takuya UESHIN 
AuthorDate: Wed Oct 13 09:35:45 2021 +0900

[SPARK-36961][PYTHON] Use PEP526 style variable type hints

### What changes were proposed in this pull request?

Uses PEP526 style variable type hints.

### Why are the changes needed?

Now that we have started using newer Python syntax in the code base, we should use PEP 526 style variable type hints.

- https://www.python.org/dev/peps/pep-0526/
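
For illustration, the two styles side by side (a generic sketch, not a line from this diff):

```python
from typing import List

# PEP 484 comment-style hint (the old style being replaced):
old_names = []  # type: List[str]

# PEP 526 variable annotation (the style adopted here):
new_names: List[str] = []
```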

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #34227 from ueshin/issues/SPARK-36961/pep526.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/accessors.py |  8 ++--
 python/pyspark/pandas/categorical.py   |  6 ++-
 python/pyspark/pandas/config.py|  6 +--
 python/pyspark/pandas/frame.py | 73 +++---
 python/pyspark/pandas/generic.py   |  3 +-
 python/pyspark/pandas/groupby.py   | 16 +++
 python/pyspark/pandas/indexes/base.py  | 28 ++--
 python/pyspark/pandas/indexes/multi.py | 10 ++--
 python/pyspark/pandas/indexing.py  | 12 ++---
 python/pyspark/pandas/internal.py  | 40 
 python/pyspark/pandas/mlflow.py|  4 +-
 python/pyspark/pandas/namespace.py | 55 +-
 python/pyspark/pandas/series.py| 20 
 python/pyspark/pandas/sql_processor.py |  6 +--
 python/pyspark/pandas/typedef/typehints.py | 16 +++
 python/pyspark/pandas/utils.py | 23 ++
 python/pyspark/pandas/window.py| 12 +++--
 python/pyspark/sql/pandas/conversion.py|  2 +-
 python/pyspark/sql/pandas/types.py |  3 +-
 19 files changed, 186 insertions(+), 157 deletions(-)

diff --git a/python/pyspark/pandas/accessors.py b/python/pyspark/pandas/accessors.py
index e69a86e..c54f21d 100644
--- a/python/pyspark/pandas/accessors.py
+++ b/python/pyspark/pandas/accessors.py
@@ -343,7 +343,7 @@ class PandasOnSparkFrameMethods(object):
             original_func = func
             func = lambda o: original_func(o, *args, **kwds)
 
-        self_applied = DataFrame(self._psdf._internal.resolved_copy)  # type: DataFrame
+        self_applied: DataFrame = DataFrame(self._psdf._internal.resolved_copy)
 
         if should_infer_schema:
             # Here we execute with the first 1000 to get the return type.
@@ -356,7 +356,7 @@ class PandasOnSparkFrameMethods(object):
                     "The given function should return a frame; however, "
                     "the return type was %s." % type(applied)
                 )
-            psdf = ps.DataFrame(applied)  # type: DataFrame
+            psdf: DataFrame = DataFrame(applied)
             if len(pdf) <= limit:
                 return psdf
 
@@ -632,7 +632,7 @@ class PandasOnSparkFrameMethods(object):
                 [field.struct_field for field in index_fields + data_fields]
             )
 
-            self_applied = DataFrame(self._psdf._internal.resolved_copy)  # type: DataFrame
+            self_applied: DataFrame = DataFrame(self._psdf._internal.resolved_copy)
 
             output_func = GroupBy._make_pandas_df_builder_func(
                 self_applied, func, return_schema, retain_index=True
@@ -893,7 +893,7 @@ class PandasOnSparkSeriesMethods(object):
             limit = ps.get_option("compute.shortcut_limit")
             pser = self._psser.head(limit + 1)._to_internal_pandas()
             transformed = pser.transform(func)
-            psser = Series(transformed)  # type: Series
+            psser: Series = Series(transformed)
 
             field = psser._internal.data_fields[0].normalize_spark_type()
         else:
diff --git a/python/pyspark/pandas/categorical.py b/python/pyspark/pandas/categorical.py
index fa11228..d580253 100644
--- a/python/pyspark/pandas/categorical.py
+++ b/python/pyspark/pandas/categorical.py
@@ -239,8 +239,9 @@ class CategoricalAccessor(object):
             FutureWarning,
         )
 
+        categories: List[Any]
         if is_list_like(new_categories):
-            categories = list(new_categories)  # type: List
+            categories = list(new_categories)
         else:
             categories = [new_categories]
 
@@ -433,8 +434,9 @@ class CategoricalAccessor(object):
             FutureWarning,
         )
 
+        categories: List[Any]
         if is_list_like(removals):
-            categories =

[spark] branch master updated: [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12

2021-10-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3a91b9a  [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12
3a91b9a is described below

commit 3a91b9ac598abcb69703d2cd0247b5e378be58c0
Author: Kousuke Saruta 
AuthorDate: Wed Oct 13 09:18:22 2021 +0900

[SPARK-36981][BUILD] Upgrade joda-time to 2.10.12

### What changes were proposed in this pull request?

This PR upgrades `joda-time` from `2.10.10` to `2.10.12`.

### Why are the changes needed?

`2.10.12` supports an updated TZDB.

[diff](https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R1037)
https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CIs.

Closes #34253 from sarutak/upgrade-joda-2.10.12.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 94a4758..d37b38b 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -148,7 +148,7 @@ jetty-util/6.1.26//jetty-util-6.1.26.jar
 jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar
 jetty/6.1.26//jetty-6.1.26.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.10.10//joda-time-2.10.10.jar
+joda-time/2.10.12//joda-time-2.10.12.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 091f399..3040ffe 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -136,7 +136,7 @@ jettison/1.1//jettison-1.1.jar
 jetty-util-ajax/9.4.43.v20210629//jetty-util-ajax-9.4.43.v20210629.jar
 jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar
 jline/2.14.6//jline-2.14.6.jar
-joda-time/2.10.10//joda-time-2.10.10.jar
+joda-time/2.10.12//joda-time-2.10.12.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
 json/1.8//json-1.8.jar
diff --git a/pom.xml b/pom.xml
index 6225fc0..2c46c52 100644
--- a/pom.xml
+++ b/pom.xml
@@ -184,7 +184,7 @@
 14.0.1
 3.0.16
 2.34
-2.10.10
+2.10.12
 3.5.2
 3.0.0
 0.12.0




[spark] branch branch-3.2 updated (29ebfdc -> c42c8a3)

2021-10-12 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 29ebfdc  [SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11
 add c42c8a3  [SPARK-36979][SQL][3.2] Add RewriteLateralSubquery rule into nonExcludableRules

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala  | 3 ++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala   | 7 +++
 2 files changed, 9 insertions(+), 1 deletion(-)




[spark] branch master updated: [SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py

2021-10-12 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ba57f5  [SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py
3ba57f5 is described below

commit 3ba57f5edc5594ee676249cd309b8f0d8248462e
Author: Xinrong Meng 
AuthorDate: Tue Oct 12 13:36:22 2021 -0700

[SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py

### What changes were proposed in this pull request?
Inline type hints for python/pyspark/sql/column.py

### Why are the changes needed?
Currently, the type hints for python/pyspark/sql/column.py live in the stub file column.pyi, which doesn't support type checking within function bodies. So we inline the type hints to support that.
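
Schematically, using a hypothetical function (the real diff below does this for `_to_java_column`, `_to_seq`, and friends):

```python
from typing import Optional

# Stub style (before): the hint lived in column.pyi, so type checkers
# skipped the .py function body:
#   column.pyi:  def describe(name: str) -> Optional[str]: ...
#   column.py:   def describe(name): ...

# Inline style (this PR): the hint sits in the .py file itself, so the
# function body is type-checked too.
def describe(name: str) -> Optional[str]:
    return name.upper() if name else None
```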

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing test.

Closes #34226 from xinrong-databricks/inline_column.

Authored-by: Xinrong Meng 
Signed-off-by: Takuya UESHIN 
---
 python/pyspark/sql/column.py  | 236 --
 python/pyspark/sql/column.pyi | 118 ---
 python/pyspark/sql/dataframe.py   |  12 +-
 python/pyspark/sql/functions.py   |   3 +-
 python/pyspark/sql/observation.py |   5 +-
 python/pyspark/sql/window.py  |   4 +-
 6 files changed, 190 insertions(+), 188 deletions(-)

diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index c46b0eb..a3e3e9e 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -18,25 +18,43 @@
 import sys
 import json
 import warnings
+from typing import (
+    cast,
+    overload,
+    Any,
+    Callable,
+    Iterable,
+    List,
+    Optional,
+    Tuple,
+    TYPE_CHECKING,
+    Union
+)
+
+from py4j.java_gateway import JavaObject
 
 from pyspark import copy_func
 from pyspark.context import SparkContext
 from pyspark.sql.types import DataType, StructField, StructType, IntegerType, StringType
 
+if TYPE_CHECKING:
+    from pyspark.sql._typing import ColumnOrName, LiteralType, DecimalLiteral, DateTimeLiteral
+    from pyspark.sql.window import WindowSpec
+
 __all__ = ["Column"]
 
 
-def _create_column_from_literal(literal):
-    sc = SparkContext._active_spark_context
+def _create_column_from_literal(literal: Union["LiteralType", "DecimalLiteral"]) -> "Column":
+    sc = SparkContext._active_spark_context  # type: ignore[attr-defined]
     return sc._jvm.functions.lit(literal)
 
 
-def _create_column_from_name(name):
-    sc = SparkContext._active_spark_context
+def _create_column_from_name(name: str) -> "Column":
+    sc = SparkContext._active_spark_context  # type: ignore[attr-defined]
     return sc._jvm.functions.col(name)
 
 
-def _to_java_column(col):
+def _to_java_column(col: "ColumnOrName") -> JavaObject:
     if isinstance(col, Column):
         jcol = col._jc
     elif isinstance(col, str):
@@ -50,7 +68,11 @@ def _to_java_column(col):
     return jcol
 
 
-def _to_seq(sc, cols, converter=None):
+def _to_seq(
+    sc: SparkContext,
+    cols: Iterable["ColumnOrName"],
+    converter: Optional[Callable[["ColumnOrName"], JavaObject]] = None,
+) -> JavaObject:
     """
     Convert a list of Column (or names) into a JVM Seq of Column.
 
@@ -59,10 +81,14 @@ def _to_seq(sc, cols, converter=None):
     """
     if converter:
         cols = [converter(c) for c in cols]
-    return sc._jvm.PythonUtils.toSeq(cols)
+    return sc._jvm.PythonUtils.toSeq(cols)  # type: ignore[attr-defined]
 
 
-def _to_list(sc, cols, converter=None):
+def _to_list(
+    sc: SparkContext,
+    cols: List["ColumnOrName"],
+    converter: Optional[Callable[["ColumnOrName"], JavaObject]] = None,
+) -> JavaObject:
     """
     Convert a list of Column (or names) into a JVM (Scala) List of Column.
 
@@ -71,30 +97,37 @@ def _to_list(sc, cols, converter=None):
     """
     if converter:
         cols = [converter(c) for c in cols]
-    return sc._jvm.PythonUtils.toList(cols)
+    return sc._jvm.PythonUtils.toList(cols)  # type: ignore[attr-defined]
 
 
-def _unary_op(name, doc="unary operator"):
+def _unary_op(
+    name: str,
+    doc: str = "unary operator",
+) -> Callable[["Column"], "Column"]:
     """ Create a method for given unary operator """
-    def _(self):
+    def _(self: "Column") -> "Column":
         jc = getattr(self._jc, name)()
         return Column(jc)
     _.__doc__ = doc
     return _
 
 
-def _func_op(name, doc=''):
-    def _(self):
-        sc = SparkContext._active_spark_context
+def _func_op(name: str, doc: str = '') -> Callable[["Column"], "Column"]:
+    def _(self: "Column") -> "Column":
+        sc = SparkContext._active_spark_context  # type: ignore[attr-defined]
         jc = getattr(sc._jvm.functions, name)(self._jc)
         return Column(jc)
     _.__doc__ = doc
     return _
 
 
-def _bin_func_op(name, rev

[GitHub] [spark-website] gengliangwang merged pull request #360: Fix URL in twitter:image of home page

2021-10-12 Thread GitBox


gengliangwang merged pull request #360:
URL: https://github.com/apache/spark-website/pull/360


   





[spark-website] branch asf-site updated: Fix URL in twitter:image of home page (#360)

2021-10-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 0a6505e  Fix URL in twitter:image of home page (#360)
0a6505e is described below

commit 0a6505e4f7862290a2cf0326df16762887bfa1ef
Author: Gengliang Wang 
AuthorDate: Wed Oct 13 02:51:22 2021 +0800

Fix URL in twitter:image of home page (#360)

The URL of twitter:image misses one slash. This PR is to fix it.
---
 _layouts/home.html | 2 +-
 site/index.html| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/_layouts/home.html b/_layouts/home.html
index 6bde86c..83310da 100644
--- a/_layouts/home.html
+++ b/_layouts/home.html
@@ -22,7 +22,7 @@
   
   
   
-  
+  
 
 
   https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css"; 
rel="stylesheet"
diff --git a/site/index.html b/site/index.html
index 7021702..de9de34 100644
--- a/site/index.html
+++ b/site/index.html
@@ -18,7 +18,7 @@
   
   
   
-  https://spark.apache.orgimages/spark-twitter-card-large.jpg";>
+  https://spark.apache.org/images/spark-twitter-card-large.jpg";>
 
 
   https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css"; 
rel="stylesheet"




[GitHub] [spark-website] gengliangwang commented on pull request #360: Fix URL in twitter:image of home page

2021-10-12 Thread GitBox


gengliangwang commented on pull request #360:
URL: https://github.com/apache/spark-website/pull/360#issuecomment-941284401


> Any other occurrences? Might look for this variable followed by no slash, just to make sure there aren't others.

@srowen I checked. This is the only bug I can find.





[GitHub] [spark-website] srowen commented on pull request #360: Fix URL in twitter:image of home page

2021-10-12 Thread GitBox


srowen commented on pull request #360:
URL: https://github.com/apache/spark-website/pull/360#issuecomment-941279889


(There's no markdown template for layouts, right?)
Any other occurrences? Might look for this variable followed by no slash, just to make sure there aren't others.





[GitHub] [spark-website] gengliangwang commented on pull request #360: Fix URL in twitter:image of home page

2021-10-12 Thread GitBox


gengliangwang commented on pull request #360:
URL: https://github.com/apache/spark-website/pull/360#issuecomment-941261626


cc @gatorsmile





[GitHub] [spark-website] gengliangwang opened a new pull request #360: Fix URL in twitter:image of home page

2021-10-12 Thread GitBox


gengliangwang opened a new pull request #360:
URL: https://github.com/apache/spark-website/pull/360


   
The URL of `twitter:image` misses one slash. This PR is to fix it.





[spark] branch master updated (dc1db95 -> 1af7072)

2021-10-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from dc1db95  [SPARK-36867][SQL] Fix error message with GROUP BY alias
 add 1af7072  [SPARK-36970][SQL] Manual disabled format `B` of `date_format` function to make Java 17 compatible with Java 8

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala| 6 +-
 .../org/apache/spark/sql/catalyst/util/DatetimeFormatterSuite.scala | 2 +-
 .../resources/sql-tests/results/datetime-formatting-invalid.sql.out | 2 +-
 3 files changed, 7 insertions(+), 3 deletions(-)
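
For context, a hedged sketch of the behavior this change presumably enforces (assumes a running SparkSession; the exact error text is recorded in datetime-formatting-invalid.sql.out):

```python
# Pattern letter 'B' (period of day) is recognized by Java 17's formatter but
# not by Java 8's, so Spark now rejects it up front on both:
spark.sql("SELECT date_format(current_timestamp(), 'B')")
# -> expected to raise an error flagging 'B' as an invalid datetime pattern
```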




[GitHub] [spark-website] gengliangwang commented on pull request #359: New home page and layout for Spark website

2021-10-12 Thread GitBox


gengliangwang commented on pull request #359:
URL: https://github.com/apache/spark-website/pull/359#issuecomment-941194036


Merged. Thanks all for the review!





[GitHub] [spark-website] gengliangwang merged pull request #359: New home page and layout for Spark website

2021-10-12 Thread GitBox


gengliangwang merged pull request #359:
URL: https://github.com/apache/spark-website/pull/359


   





[spark] branch master updated: [SPARK-36867][SQL] Fix error message with GROUP BY alias

2021-10-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dc1db95  [SPARK-36867][SQL] Fix error message with GROUP BY alias
dc1db95 is described below

commit dc1db950adb9a210acfe4a0a77988955a5f35e5e
Author: Wenchen Fan 
AuthorDate: Tue Oct 12 22:47:31 2021 +0800

[SPARK-36867][SQL] Fix error message with GROUP BY alias

### What changes were proposed in this pull request?

When checking unresolved attributes, we should check `Aggregate.aggregateExpressions` before `Aggregate.groupingExpressions`, because the latter may rely on the former, due to the GROUP BY alias feature.

### Why are the changes needed?

improve error message
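
The shape of the problem, as a sketch mirroring the SQL test added below (`testData` is the suite's test table):

```python
# `k` in GROUP BY is an alias defined in the SELECT list, so unresolved
# attributes in the aggregate expressions must be checked first; otherwise
# the error would blame the alias `k` instead of `non_existing`.
spark.sql("SELECT a AS k, COUNT(non_existing) FROM testData GROUP BY k")
# -> AnalysisException pointing at `non_existing`
```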

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new test

Closes #34244 from cloud-fan/bug.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/analysis/CheckAnalysis.scala  | 28 +-
 .../test/resources/sql-tests/inputs/group-by.sql   |  3 +++
 .../resources/sql-tests/results/group-by.sql.out   | 11 -
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index bdd7ffb..5bf37a2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -165,7 +165,14 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
           }
         }
 
-        operator transformExpressionsUp {
+        val exprs = operator match {
+          // `groupingExpressions` may rely on `aggregateExpressions`, due to the GROUP BY alias
+          // feature. We should check errors in `aggregateExpressions` first.
+          case a: Aggregate => a.aggregateExpressions ++ a.groupingExpressions
+          case _ => operator.expressions
+        }
+
+        exprs.foreach(_.foreachUp {
           case a: Attribute if !a.resolved =>
             val missingCol = a.sql
             val candidates = operator.inputSet.toSeq.map(_.qualifiedName)
@@ -209,27 +216,26 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
             failAnalysis(s"${wf.prettyName} function can only be evaluated in an ordered " +
               s"row-based window frame with a single offset: $w")
 
-          case w @ WindowExpression(e, s) =>
+          case w: WindowExpression =>
             // Only allow window functions with an aggregate expression or an offset window
             // function or a Pandas window UDF.
-            e match {
+            w.windowFunction match {
               case _: AggregateExpression | _: FrameLessOffsetWindowFunction |
-                  _: AggregateWindowFunction =>
-                w
-              case f: PythonUDF if PythonUDF.isWindowPandasUDF(f) =>
-                w
-              case _ =>
-                failAnalysis(s"Expression '$e' not supported within a window function.")
+                  _: AggregateWindowFunction => // OK
+              case f: PythonUDF if PythonUDF.isWindowPandasUDF(f) => // OK
+              case other =>
+                failAnalysis(s"Expression '$other' not supported within a window function.")
             }
 
           case s: SubqueryExpression =>
             checkSubqueryExpression(operator, s)
-            s
 
           case e: ExpressionWithRandomSeed if !e.seedExpression.foldable =>
             failAnalysis(
               s"Input argument to ${e.prettyName} must be a constant.")
-        }
+
+          case _ =>
+        })
 
         operator match {
           case etw: EventTimeWatermark =>
diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
index e2c3672..039373b 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
@@ -45,6 +45,9 @@ SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS b, 3 AS
 SELECT a AS k, COUNT(b) FROM testData GROUP BY k;
 SELECT a AS k, COUNT(b) FROM testData GROUP BY k HAVING k > 1;
 
+-- GROUP BY alias with invalid col in SELECT list
+SELECT a AS k, COUNT(non_existing) FROM testData GROUP BY k;
+
 -- Aggregate functions cannot be used in GROUP BY
 SELECT COUNT(b) AS k FROM testData GROUP BY k;
 
diff --git a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out
index 37deb87..f598f49 100644
--- a/sql/core/src/test/resources/sql-tests/res

[spark] tag v3.2.0 created (now 5d45a41)

2021-10-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to tag v3.2.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 5d45a41  (commit)
No new revisions were added by this update.




[spark] branch master updated: [SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect)

2021-10-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a453fd5  [SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect)
a453fd5 is described below

commit a453fd55dd37516fbfb9332cf43e360796dfb955
Author: Huaxin Gao 
AuthorDate: Tue Oct 12 22:36:47 2021 +0800

[SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect)

### What changes were proposed in this pull request?
This PR implements `dropIndex` and `listIndexes` in MySQL dialect

### Why are the changes needed?
As a subtask of the V2 Index support, this PR completes the implementation for JDBC V2 index support.

### Does this PR introduce _any_ user-facing change?
Yes, `dropIndex/listIndexes` in DS V2 JDBC

### How was this patch tested?
new tests

Closes #34236 from huaxingao/listIndexJDBC.

Authored-by: Huaxin Gao 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala  | 33 +++-
 .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala  | 95 +-
 .../sql/connector/catalog/index/SupportsIndex.java |  7 +-
 .../sql/connector/catalog/index/TableIndex.java| 12 ++-
 .../catalyst/analysis/NoSuchItemException.scala|  4 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 24 ++
 .../execution/datasources/v2/jdbc/JDBCTable.scala  | 13 ++-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 25 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 84 +++
 9 files changed, 239 insertions(+), 58 deletions(-)

diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
index 3cb8787..67e8108 100644
--- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
+++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala
@@ -24,8 +24,6 @@ import org.scalatest.time.SpanSugar._
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.AnalysisException
-import org.apache.spark.sql.catalyst.analysis.IndexAlreadyExistsException
-import org.apache.spark.sql.connector.catalog.{Catalogs, Identifier, TableCatalog}
 import org.apache.spark.sql.connector.catalog.index.SupportsIndex
 import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference}
 import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog
@@ -121,31 +119,22 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite with V2JDBCTest {
     assert(t.schema === expectedSchema)
   }
 
-  override def testIndex(tbl: String): Unit = {
-    val loaded = Catalogs.load("mysql", conf)
-    val jdbcTable = loaded.asInstanceOf[TableCatalog]
-      .loadTable(Identifier.of(Array.empty[String], "new_table"))
-      .asInstanceOf[SupportsIndex]
-    assert(jdbcTable.indexExists("i1") == false)
-    assert(jdbcTable.indexExists("i2") == false)
+  override def supportsIndex: Boolean = true
 
+  override def testIndexProperties(jdbcTable: SupportsIndex): Unit = {
     val properties = new util.Properties();
     properties.put("KEY_BLOCK_SIZE", "10")
     properties.put("COMMENT", "'this is a comment'")
-    jdbcTable.createIndex("i1", "", Array(FieldReference("col1")),
+    // MySQL doesn't allow property set on individual column, so use empty Array for
+    // column properties
+    jdbcTable.createIndex("i1", "BTREE", Array(FieldReference("col1")),
       Array.empty[util.Map[NamedReference, util.Properties]], properties)
 
-    jdbcTable.createIndex("i2", "",
-      Array(FieldReference("col2"), FieldReference("col3"), FieldReference("col5")),
-      Array.empty[util.Map[NamedReference, util.Properties]], new util.Properties)
-
-    assert(jdbcTable.indexExists("i1") == true)
-    assert(jdbcTable.indexExists("i2") == true)
-
-    val m = intercept[IndexAlreadyExistsException] {
-      jdbcTable.createIndex("i1", "", Array(FieldReference("col1")),
-        Array.empty[util.Map[NamedReference, util.Properties]], properties)
-    }.getMessage
-    assert(m.contains("Failed to create index: i1 in new_table"))
+    var index = jdbcTable.listIndexes()
+    // The index property size is actually 1. Even though the index is created
+    // with properties "KEY_BLOCK_SIZE", "10" and "COMMENT", "'this is a comment'", when
+    // retrieving index using `SHOW INDEXES`, MySQL only returns `COMMENT`.
+    assert(index(0).properties.size == 1)
+    assert(index(0).properties.get("COMMENT").equals("this is a comment"))
   }
 }
diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2J

[spark] branch master updated (36b3bbc0 -> b9a8165)

2021-10-12 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 36b3bbc0 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules
 add b9a8165  [SPARK-36972][PYTHON] Add max_by/min_by API to PySpark

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  2 +
 python/pyspark/sql/functions.py  | 72 
 2 files changed, 74 insertions(+)
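
A minimal usage sketch of the new functions (assumes a PySpark build containing this commit, where `pyspark.sql.functions.max_by`/`min_by` exist):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("dept1", "alice", 100), ("dept1", "bob", 200), ("dept2", "carol", 150)],
    ["dept", "name", "salary"],
)
# max_by/min_by return the value of the first column at the row where the
# second column reaches the group's max/min.
df.groupBy("dept").agg(
    F.max_by("name", "salary").alias("top_earner"),
    F.min_by("name", "salary").alias("bottom_earner"),
).show()
```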




[spark] branch master updated: [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules

2021-10-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 36b3bbc0 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules
36b3bbc0 is described below

commit 36b3bbc0aa9f9c39677960cd93f32988c7d7aaca
Author: ulysses-you 
AuthorDate: Tue Oct 12 16:21:53 2021 +0800

[SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules

### What changes were proposed in this pull request?

Add RewriteLateralSubquery rule into nonExcludableRules.

### Why are the changes needed?

Lateral Join has no meaning without rule `RewriteLateralSubquery`. So now if we set `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery`, the lateral join query will fail with:
```
java.lang.AssertionError: assertion failed: No plan for LateralJoin lateral-subquery#218
```
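
The same failure mode as a sketch in PySpark (mirrors the Scala test added below; assumes a `testData` temp view exists):

```python
spark.conf.set(
    "spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery",
)
# Before this fix, planning the lateral join failed with the assertion above;
# with the rule made non-excludable, the exclusion is ignored and this succeeds.
spark.sql("SELECT * FROM testData, LATERAL (SELECT * FROM testData)").collect()
```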

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

add test

Closes #34249 from ulysses-you/SPARK-36979.

Authored-by: ulysses-you 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala  | 3 ++-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala   | 7 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index b8c7fe7..73be790 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -284,7 +284,8 @@ abstract class Optimizer(catalogManager: CatalogManager)
       NormalizeFloatingNumbers.ruleName ::
       ReplaceUpdateFieldsExpression.ruleName ::
       PullOutGroupingExpressions.ruleName ::
-      RewriteAsOfJoin.ruleName :: Nil
+      RewriteAsOfJoin.ruleName ::
+      RewriteLateralSubquery.ruleName :: Nil
 
   /**
    * Optimize all the subqueries inside expression.
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 3d5b911..11b7ee6 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -4204,6 +4204,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     checkAnswer(sql("""SELECT from_json(r'{"a": "\\"}', 'a string')"""), Row(Row("\\")))
     checkAnswer(sql("""SELECT from_json(R'{"a": "\\"}', 'a string')"""), Row(Row("\\")))
   }
+
+  test("SPARK-36979: Add RewriteLateralSubquery rule into nonExcludableRules") {
+    withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
+      "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery") {
+      sql("SELECT * FROM testData, LATERAL (SELECT * FROM testData)").collect()
+    }
+  }
 }
 
 case class Foo(bar: Option[String])
 
 case class Foo(bar: Option[String])




[GitHub] [spark-website] yaooqinn commented on pull request #359: New home page and layout for Spark website

2021-10-12 Thread GitBox


yaooqinn commented on pull request #359:
URL: https://github.com/apache/spark-website/pull/359#issuecomment-940778866


LGTM

