[spark] branch master updated: [SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f678c75 [SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase f678c75 is described below commit f678c75d3940b2887fdb2621691b791b95d79469 Author: dch nguyen AuthorDate: Wed Oct 13 15:56:09 2021 +0900 [SPARK-36973][PYTHON] Deduplicate prepare data method for HistogramPlotBase and KdePlotBase ### What changes were proposed in this pull request? Deduplicate prepare data method for HistogramPlotBase and KdePlotBase ### Why are the changes needed? Deduplicate code Remove 2 ```TODO``` comment ### Does this PR introduce _any_ user-facing change? No, only for Dev ### How was this patch tested? Existing tests Closes #34251 from dchvn/SPARK-36973. Lead-authored-by: dch nguyen Co-authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/plot/core.py | 31 +++ 1 file changed, 11 insertions(+), 20 deletions(-) diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py index dc95eac..89b8320 100644 --- a/python/pyspark/pandas/plot/core.py +++ b/python/pyspark/pandas/plot/core.py @@ -98,10 +98,9 @@ class SampledPlotBase: ) -class HistogramPlotBase: +class NumericPlotBase: @staticmethod -def prepare_hist_data(data, bins): -# TODO: this logic is similar with KdePlotBase. Might have to deduplicate it. 
+def prepare_numeric_data(data): from pyspark.pandas.series import Series if isinstance(data, Series): @@ -117,6 +116,13 @@ class HistogramPlotBase: "Empty {0!r}: no numeric data to " "plot".format(numeric_data.__class__.__name__) ) +return data, numeric_data + + +class HistogramPlotBase(NumericPlotBase): +@staticmethod +def prepare_hist_data(data, bins): +data, numeric_data = NumericPlotBase.prepare_numeric_data(data) if is_integer(bins): # computes boundaries for the column bins = HistogramPlotBase.get_bins(data.to_spark(), bins) @@ -340,25 +346,10 @@ class BoxPlotBase: return fliers -class KdePlotBase: +class KdePlotBase(NumericPlotBase): @staticmethod def prepare_kde_data(data): -# TODO: this logic is similar with HistogramPlotBase. Might have to deduplicate it. -from pyspark.pandas.series import Series - -if isinstance(data, Series): -data = data.to_frame() - -numeric_data = data.select_dtypes( -include=["byte", "decimal", "integer", "float", "long", "double", np.datetime64] -) - -# no empty frames or series allowed -if len(numeric_data.columns) == 0: -raise TypeError( -"Empty {0!r}: no numeric data to " "plot".format(numeric_data.__class__.__name__) -) - +_, numeric_data = NumericPlotBase.prepare_numeric_data(data) return numeric_data @staticmethod - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
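The refactoring above can be sketched in plain Python. This is an illustrative stand-in for the commit's pattern, not the real pyspark.pandas code: the shared data-preparation logic moves into a `NumericPlotBase` base class, and `HistogramPlotBase` / `KdePlotBase` reuse it instead of each carrying a duplicated copy guarded by a TODO. The class and method names mirror the diff; the numeric filtering is simplified to plain dicts instead of pandas-on-Spark frames.

```python
class NumericPlotBase:
    @staticmethod
    def prepare_numeric_data(data):
        # Keep only columns whose values are all numeric; raise on empty,
        # matching the "no numeric data to plot" check in the real code.
        numeric_data = {
            col: vals for col, vals in data.items()
            if all(isinstance(v, (int, float)) for v in vals)
        }
        if not numeric_data:
            raise TypeError(
                "Empty {0!r}: no numeric data to plot".format(type(data).__name__)
            )
        return data, numeric_data


class HistogramPlotBase(NumericPlotBase):
    @staticmethod
    def prepare_hist_data(data, bins):
        # Delegates the shared validation, then does histogram-specific work
        # (the real code computes bin boundaries here).
        data, numeric_data = NumericPlotBase.prepare_numeric_data(data)
        return numeric_data, bins


class KdePlotBase(NumericPlotBase):
    @staticmethod
    def prepare_kde_data(data):
        # KDE only needs the numeric columns, so the frame itself is dropped.
        _, numeric_data = NumericPlotBase.prepare_numeric_data(data)
        return numeric_data
```

The payoff is the one shown in the diff stat: each subclass shrinks to a two-line delegation, and the empty-frame check lives in exactly one place.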
[spark] branch master updated (bc7e4f5 -> 5982162)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bc7e4f5 [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions add 5982162 [SPARK-36976][R] Add max_by/min_by API to SparkR No new revisions were added by this update. Summary of changes: R/pkg/NAMESPACE | 2 ++ R/pkg/R/functions.R | 46 +++ R/pkg/R/generics.R| 8 ++ R/pkg/tests/fulltests/test_sparkSQL.R | 16 4 files changed, 72 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
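The semantics of the `max_by`/`min_by` aggregates added to SparkR above can be shown in a few lines of plain Python: return the value of column `x` for the row where column `y` is largest (or smallest). The function names here only illustrate the aggregate's behavior; they are not the SparkR API.

```python
def max_by(rows, x, y):
    # Value of column x at the row with the maximum value of column y.
    return max(rows, key=lambda r: r[y])[x]


def min_by(rows, x, y):
    # Value of column x at the row with the minimum value of column y.
    return min(rows, key=lambda r: r[y])[x]
```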
[spark] branch master updated: [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bc7e4f5 [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions bc7e4f5 is described below commit bc7e4f54a993cd6d97a2bd28d9d62578dee130e8 Author: Hyukjin Kwon AuthorDate: Wed Oct 13 13:28:09 2021 +0900 [SPARK-36953][PYTHON] Expose SQL state and error class in PySpark exceptions ### What changes were proposed in this pull request? This PR proposes to leverage the error message framework by exposing the methods below: - `getErrorClass` - `getSqlState` at captured PySpark SQL exceptions (from `SparkThrowable`). In addition, this PR adds a bit of refactoring. Previously the exception capture was done by string comparison which is flaky. Now, the logic leverages `isInstanceOf` on JVM. ### Why are the changes needed? Users can leverage the error class and SQL state codes by: ```python try: ... except AnalysisException as e: if e.getSqlState().startswith("4"): ... ``` ### Does this PR introduce _any_ user-facing change? Yes, users now can get `getErrorClass` and `getSqlState` from SQL exceptions. ### How was this patch tested? Manually tested, and unittests were added. Closes #34219 from HyukjinKwon/SPARK-36953. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- python/pyspark/sql/tests/test_utils.py | 8 python/pyspark/sql/utils.py| 83 +- 2 files changed, 70 insertions(+), 21 deletions(-) diff --git a/python/pyspark/sql/tests/test_utils.py b/python/pyspark/sql/tests/test_utils.py index 6d23736..10b579d 100644 --- a/python/pyspark/sql/tests/test_utils.py +++ b/python/pyspark/sql/tests/test_utils.py @@ -48,6 +48,14 @@ class UtilsTests(ReusedSQLTestCase): self.assertRegex(e.desc, "1024 is not in the permitted values") self.assertRegex(e.stackTrace, "org.apache.spark.sql.functions") +def test_get_error_class_state(self): +# SPARK-36953: test CapturedException.getErrorClass and getSqlState (from SparkThrowable) +try: +self.spark.sql("""SELECT a""") +except AnalysisException as e: +self.assertEquals(e.getErrorClass(), "MISSING_COLUMN") +self.assertEquals(e.getSqlState(), "42000") + if __name__ == "__main__": import unittest diff --git a/python/pyspark/sql/utils.py b/python/pyspark/sql/utils.py index ced587ca..578cf71 100644 --- a/python/pyspark/sql/utils.py +++ b/python/pyspark/sql/utils.py @@ -16,24 +16,59 @@ # import py4j +from py4j.java_gateway import is_instance_of from pyspark import SparkContext class CapturedException(Exception): -def __init__(self, desc, stackTrace, cause=None): -self.desc = desc -self.stackTrace = stackTrace +def __init__(self, desc=None, stackTrace=None, cause=None, origin=None): +# desc & stackTrace vs origin are mutually exclusive. +# cause is optional. 
+assert ((origin is not None and desc is None and stackTrace is None) +or (origin is None and desc is not None and stackTrace is not None)) + +self.desc = desc if desc is not None else origin.getMessage() +self.stackTrace = ( +stackTrace if stackTrace is not None +else SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(origin) +) self.cause = convert_exception(cause) if cause is not None else None +if self.cause is None and origin is not None and origin.getCause() is not None: +self.cause = convert_exception(origin.getCause()) +self._origin = origin def __str__(self): -sql_conf = SparkContext._jvm.org.apache.spark.sql.internal.SQLConf.get() +assert SparkContext._jvm is not None + +jvm = SparkContext._jvm +sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get() debug_enabled = sql_conf.pysparkJVMStacktraceEnabled() desc = self.desc if debug_enabled: desc = desc + "\n\nJVM stacktrace:\n%s" % self.stackTrace return str(desc) +def getErrorClass(self): +assert SparkContext._gateway is not None + +gw = SparkContext._gateway +if self._origin is not None and is_instance_of( +gw, self._origin, "org.apache.spark.SparkThrowable"): +return self._origin.getErrorClass() +else: +return None + +def getSqlState(self): +assert SparkContext._gateway is not None + +gw = SparkContext._gateway +if self._origin is not None and is_instance_of( +gw, self._origin, "org.a
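The usage pattern this PR enables — branching on the SQLSTATE or error class of a captured exception — can be sketched without a JVM. `AnalysisException` below is a minimal stand-in for `pyspark.sql.utils.AnalysisException` so the example runs standalone; only the `getErrorClass()`/`getSqlState()` method names are taken from the commit.

```python
class AnalysisException(Exception):
    """Stand-in for pyspark.sql.utils.AnalysisException (illustrative only)."""

    def __init__(self, error_class, sql_state):
        self._error_class = error_class
        self._sql_state = sql_state

    def getErrorClass(self):
        return self._error_class

    def getSqlState(self):
        return self._sql_state


def classify(exc):
    # SQLSTATE class "42" covers syntax errors and access-rule violations,
    # so a caller can route those to user-facing handling.
    if exc.getSqlState().startswith("42"):
        return "user error: " + exc.getErrorClass()
    return "other"
```

In real PySpark code the same check sits in an `except AnalysisException as e:` block, as the PR description shows.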
[spark] branch master updated (92caa75 -> e861b0d)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 92caa75 Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11" add e861b0d [SPARK-36794][SQL] Ignore duplicated join keys when building relation for SEMI/ANTI shuffled hash join No new revisions were added by this update. Summary of changes: .../spark/sql/execution/joins/HashedRelation.scala | 30 ++--- .../sql/execution/joins/ShuffledHashJoinExec.scala | 14 +++- .../scala/org/apache/spark/sql/JoinSuite.scala | 38 ++ 3 files changed, 70 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
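Why duplicated build-side keys can be ignored for SEMI/ANTI shuffled hash joins, in miniature: the probe side only asks "does this key exist?", so one entry per key suffices and the hashed relation can be smaller, whereas an inner join must retain every matching build row. A simplified sketch (sets standing in for Spark's HashedRelation):

```python
def left_semi_join(stream, build):
    # Duplicates on the build side collapse into the set; existence is all
    # a semi join needs.
    keys = set(build)
    return [row for row in stream if row in keys]


def left_anti_join(stream, build):
    # Anti join keeps stream rows whose key never appears on the build side.
    keys = set(build)
    return [row for row in stream if row not in keys]
```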
[GitHub] [spark-website] HeartSaVioR commented on pull request #359: New home page and layout for Spark website
HeartSaVioR commented on pull request #359: URL: https://github.com/apache/spark-website/pull/359#issuecomment-941878858 Late LGTM. Looks nice!
[spark] branch branch-3.2 updated: Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 4b86fe4 Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11" 4b86fe4 is described below commit 4b86fe4c71559df12ab8a1ebcf5662c4cf87ca7f Author: Hyukjin Kwon AuthorDate: Wed Oct 13 12:09:57 2021 +0900 Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11" This reverts commit 29ebfdcdff74af72c6900fa0856ada3ab07f8de1. --- pom.xml| 6 +++--- project/SparkBuild.scala | 4 ++-- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive/pom.xml | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index bd8ede6..d9c10ee 100644 --- a/pom.xml +++ b/pom.xml @@ -2640,7 +2640,7 @@ -Xss128m -Xms4g - -Xmx6g + -Xmx4g -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2690,7 +2690,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true + -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git commit 92caa751257b894887d34e6abf02307931c090cd Author: Hyukjin Kwon AuthorDate: Wed Oct 13 12:08:44 2021 +0900 Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11" This reverts commit 6ed13147c99b2f652748b716c70dd1937230cafd. --- pom.xml| 6 +++--- project/SparkBuild.scala | 4 ++-- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive/pom.xml | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index f5a0c3e..e36495f 100644 --- a/pom.xml +++ b/pom.xml @@ -2657,7 +2657,7 @@ -Xss128m -Xms4g - -Xmx6g + -Xmx4g -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2707,7 +2707,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true + -da -Xmx4g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (0144818 -> 92caa75)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 0144818 [SPARK-36985][PYTHON] Fix future typing errors in pyspark.pandas new 521a2f6 Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem limit" new 92caa75 Revert "[SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: pom.xml| 6 +++--- project/SparkBuild.scala | 4 ++-- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive/pom.xml | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem limit"
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git commit 521a2f6ee269c25b2a28064b29f5810eea4fe30c Author: Hyukjin Kwon AuthorDate: Wed Oct 13 12:08:38 2021 +0900 Revert "[SPARK-36900][BUILD][TESTS][FOLLOWUP] Try 5g, not 6g, for test mem limit" This reverts commit 6e8cd3b1a7489c9b0c5779559e45b3cd5decc1ea. --- pom.xml| 6 +++--- project/SparkBuild.scala | 4 ++-- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive/pom.xml | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index 2c46c52..f5a0c3e 100644 --- a/pom.xml +++ b/pom.xml @@ -2657,7 +2657,7 @@ -Xss128m -Xms4g - -Xmx5g + -Xmx6g -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} @@ -2707,7 +2707,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports --ea -Xmx5g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true +-ea -Xmx6g -Xss4m -XX:MaxMetaspaceSize=2g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - -da -Xmx5g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true + -da -Xmx6g -XX:ReservedCodeCacheSize=${CodeCacheSize} -Dio.netty.tryReflectionSetAccessible=true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (973f04e -> 0144818)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 973f04e [SPARK-36961][PYTHON] Use PEP526 style variable type hints add 0144818 [SPARK-36985][PYTHON] Fix future typing errors in pyspark.pandas No new revisions were added by this update. Summary of changes: python/pyspark/pandas/indexes/base.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36961][PYTHON] Use PEP526 style variable type hints
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 973f04e [SPARK-36961][PYTHON] Use PEP526 style variable type hints 973f04e is described below commit 973f04eea7140dc61457cc12e74d5e7e333013db Author: Takuya UESHIN AuthorDate: Wed Oct 13 09:35:45 2021 +0900 [SPARK-36961][PYTHON] Use PEP526 style variable type hints ### What changes were proposed in this pull request? Uses PEP526 style variable type hints. ### Why are the changes needed? Now that we have started using newer Python syntax in the code base. We should use PEP526 style variable type hints. - https://www.python.org/dev/peps/pep-0526/ ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #34227 from ueshin/issues/SPARK-36961/pep526. Authored-by: Takuya UESHIN Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/accessors.py | 8 ++-- python/pyspark/pandas/categorical.py | 6 ++- python/pyspark/pandas/config.py| 6 +-- python/pyspark/pandas/frame.py | 73 +++--- python/pyspark/pandas/generic.py | 3 +- python/pyspark/pandas/groupby.py | 16 +++ python/pyspark/pandas/indexes/base.py | 28 ++-- python/pyspark/pandas/indexes/multi.py | 10 ++-- python/pyspark/pandas/indexing.py | 12 ++--- python/pyspark/pandas/internal.py | 40 python/pyspark/pandas/mlflow.py| 4 +- python/pyspark/pandas/namespace.py | 55 +- python/pyspark/pandas/series.py| 20 python/pyspark/pandas/sql_processor.py | 6 +-- python/pyspark/pandas/typedef/typehints.py | 16 +++ python/pyspark/pandas/utils.py | 23 ++ python/pyspark/pandas/window.py| 12 +++-- python/pyspark/sql/pandas/conversion.py| 2 +- python/pyspark/sql/pandas/types.py | 3 +- 19 files changed, 186 insertions(+), 157 deletions(-) diff --git a/python/pyspark/pandas/accessors.py b/python/pyspark/pandas/accessors.py index e69a86e..c54f21d 100644 
--- a/python/pyspark/pandas/accessors.py +++ b/python/pyspark/pandas/accessors.py @@ -343,7 +343,7 @@ class PandasOnSparkFrameMethods(object): original_func = func func = lambda o: original_func(o, *args, **kwds) -self_applied = DataFrame(self._psdf._internal.resolved_copy) # type: DataFrame +self_applied: DataFrame = DataFrame(self._psdf._internal.resolved_copy) if should_infer_schema: # Here we execute with the first 1000 to get the return type. @@ -356,7 +356,7 @@ class PandasOnSparkFrameMethods(object): "The given function should return a frame; however, " "the return type was %s." % type(applied) ) -psdf = ps.DataFrame(applied) # type: DataFrame +psdf: DataFrame = DataFrame(applied) if len(pdf) <= limit: return psdf @@ -632,7 +632,7 @@ class PandasOnSparkFrameMethods(object): [field.struct_field for field in index_fields + data_fields] ) -self_applied = DataFrame(self._psdf._internal.resolved_copy) # type: DataFrame +self_applied: DataFrame = DataFrame(self._psdf._internal.resolved_copy) output_func = GroupBy._make_pandas_df_builder_func( self_applied, func, return_schema, retain_index=True @@ -893,7 +893,7 @@ class PandasOnSparkSeriesMethods(object): limit = ps.get_option("compute.shortcut_limit") pser = self._psser.head(limit + 1)._to_internal_pandas() transformed = pser.transform(func) -psser = Series(transformed) # type: Series +psser: Series = Series(transformed) field = psser._internal.data_fields[0].normalize_spark_type() else: diff --git a/python/pyspark/pandas/categorical.py b/python/pyspark/pandas/categorical.py index fa11228..d580253 100644 --- a/python/pyspark/pandas/categorical.py +++ b/python/pyspark/pandas/categorical.py @@ -239,8 +239,9 @@ class CategoricalAccessor(object): FutureWarning, ) +categories: List[Any] if is_list_like(new_categories): -categories = list(new_categories) # type: List +categories = list(new_categories) else: categories = [new_categories] @@ -433,8 +434,9 @@ class CategoricalAccessor(object): FutureWarning, ) 
+categories: List[Any] if is_list_like(removals): -categories =
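The PEP 526 change applied throughout the diff above is mechanical: the variable annotation moves from a trailing `# type:` comment onto the assignment itself. A minimal before/after, with made-up function names for illustration:

```python
from typing import Any, List


def as_list_old(new_categories):
    # Pre-PEP 526: the annotation lives in a type comment.
    categories = list(new_categories)  # type: List[Any]
    return categories


def as_list_new(new_categories) -> List[Any]:
    # PEP 526: the annotation is part of the assignment statement, which
    # also lets it be declared before a conditional assignment, as in the
    # CategoricalAccessor hunks above.
    categories: List[Any] = list(new_categories)
    return categories
```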
[spark] branch master updated: [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3a91b9a [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12 3a91b9a is described below commit 3a91b9ac598abcb69703d2cd0247b5e378be58c0 Author: Kousuke Saruta AuthorDate: Wed Oct 13 09:18:22 2021 +0900 [SPARK-36981][BUILD] Upgrade joda-time to 2.10.12 ### What changes were proposed in this pull request? This PR upgrades `joda-time` from `2.10.10` to `2.10.12`. ### Why are the changes needed? `2.10.12` supports an updated TZDB. [diff](https://github.com/JodaOrg/joda-time/compare/v2.10.10...v2.10.12#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R1037) https://github.com/JodaOrg/joda-time/issues/566#issuecomment-930207547 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CIs. Closes #34253 from sarutak/upgrade-joda-2.10.12. 
Authored-by: Kousuke Saruta Signed-off-by: Kousuke Saruta --- dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 index 94a4758..d37b38b 100644 --- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 @@ -148,7 +148,7 @@ jetty-util/6.1.26//jetty-util-6.1.26.jar jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar jetty/6.1.26//jetty-6.1.26.jar jline/2.14.6//jline-2.14.6.jar -joda-time/2.10.10//joda-time-2.10.10.jar +joda-time/2.10.12//joda-time-2.10.12.jar jodd-core/3.5.2//jodd-core-3.5.2.jar jpam/1.1//jpam-1.1.jar json/1.8//json-1.8.jar diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index 091f399..3040ffe 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -136,7 +136,7 @@ jettison/1.1//jettison-1.1.jar jetty-util-ajax/9.4.43.v20210629//jetty-util-ajax-9.4.43.v20210629.jar jetty-util/9.4.43.v20210629//jetty-util-9.4.43.v20210629.jar jline/2.14.6//jline-2.14.6.jar -joda-time/2.10.10//joda-time-2.10.10.jar +joda-time/2.10.12//joda-time-2.10.12.jar jodd-core/3.5.2//jodd-core-3.5.2.jar jpam/1.1//jpam-1.1.jar json/1.8//json-1.8.jar diff --git a/pom.xml b/pom.xml index 6225fc0..2c46c52 100644 --- a/pom.xml +++ b/pom.xml @@ -184,7 +184,7 @@ 14.0.1 3.0.16 2.34 -2.10.10 +2.10.12 3.5.2 3.0.0 0.12.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated (29ebfdc -> c42c8a3)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git. from 29ebfdc [SPARK-36900][TESTS][BUILD] Increase test memory to 6g for Java 11 add c42c8a3 [SPARK-36979][SQL][3.2] Add RewriteLateralSubquery rule into nonExcludableRules No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 3 ++- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 7 +++ 2 files changed, 9 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3ba57f5 [SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py 3ba57f5 is described below commit 3ba57f5edc5594ee676249cd309b8f0d8248462e Author: Xinrong Meng AuthorDate: Tue Oct 12 13:36:22 2021 -0700 [SPARK-36951][PYTHON] Inline type hints for python/pyspark/sql/column.py ### What changes were proposed in this pull request? Inline type hints for python/pyspark/sql/column.py ### Why are the changes needed? Currently, Inline type hints for python/pyspark/sql/column.pyi doesn't support type checking within function bodies. So we inline type hints to support that. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing test. Closes #34226 from xinrong-databricks/inline_column. Authored-by: Xinrong Meng Signed-off-by: Takuya UESHIN --- python/pyspark/sql/column.py | 236 -- python/pyspark/sql/column.pyi | 118 --- python/pyspark/sql/dataframe.py | 12 +- python/pyspark/sql/functions.py | 3 +- python/pyspark/sql/observation.py | 5 +- python/pyspark/sql/window.py | 4 +- 6 files changed, 190 insertions(+), 188 deletions(-) diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index c46b0eb..a3e3e9e 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -18,25 +18,43 @@ import sys import json import warnings +from typing import ( +cast, +overload, +Any, +Callable, +Iterable, +List, +Optional, +Tuple, +TYPE_CHECKING, +Union +) + +from py4j.java_gateway import JavaObject from pyspark import copy_func from pyspark.context import SparkContext from pyspark.sql.types import DataType, StructField, StructType, IntegerType, StringType +if TYPE_CHECKING: +from pyspark.sql._typing import ColumnOrName, LiteralType, DecimalLiteral, DateTimeLiteral 
+from pyspark.sql.window import WindowSpec + __all__ = ["Column"] -def _create_column_from_literal(literal): -sc = SparkContext._active_spark_context +def _create_column_from_literal(literal: Union["LiteralType", "DecimalLiteral"]) -> "Column": +sc = SparkContext._active_spark_context # type: ignore[attr-defined] return sc._jvm.functions.lit(literal) -def _create_column_from_name(name): -sc = SparkContext._active_spark_context +def _create_column_from_name(name: str) -> "Column": +sc = SparkContext._active_spark_context # type: ignore[attr-defined] return sc._jvm.functions.col(name) -def _to_java_column(col): +def _to_java_column(col: "ColumnOrName") -> JavaObject: if isinstance(col, Column): jcol = col._jc elif isinstance(col, str): @@ -50,7 +68,11 @@ def _to_java_column(col): return jcol -def _to_seq(sc, cols, converter=None): +def _to_seq( +sc: SparkContext, +cols: Iterable["ColumnOrName"], +converter: Optional[Callable[["ColumnOrName"], JavaObject]] = None, +) -> JavaObject: """ Convert a list of Column (or names) into a JVM Seq of Column. @@ -59,10 +81,14 @@ def _to_seq(sc, cols, converter=None): """ if converter: cols = [converter(c) for c in cols] -return sc._jvm.PythonUtils.toSeq(cols) +return sc._jvm.PythonUtils.toSeq(cols) # type: ignore[attr-defined] -def _to_list(sc, cols, converter=None): +def _to_list( +sc: SparkContext, +cols: List["ColumnOrName"], +converter: Optional[Callable[["ColumnOrName"], JavaObject]] = None, +) -> JavaObject: """ Convert a list of Column (or names) into a JVM (Scala) List of Column. 
@@ -71,30 +97,37 @@ def _to_list(sc, cols, converter=None): """ if converter: cols = [converter(c) for c in cols] -return sc._jvm.PythonUtils.toList(cols) +return sc._jvm.PythonUtils.toList(cols) # type: ignore[attr-defined] -def _unary_op(name, doc="unary operator"): +def _unary_op( +name: str, +doc: str = "unary operator", +) -> Callable[["Column"], "Column"]: """ Create a method for given unary operator """ -def _(self): +def _(self: "Column") -> "Column": jc = getattr(self._jc, name)() return Column(jc) _.__doc__ = doc return _ -def _func_op(name, doc=''): -def _(self): -sc = SparkContext._active_spark_context +def _func_op(name: str, doc: str = '') -> Callable[["Column"], "Column"]: +def _(self: "Column") -> "Column": +sc = SparkContext._active_spark_context # type: ignore[attr-defined] jc = getattr(sc._jvm.functions, name)(self._jc) return Column(jc) _.__doc__ = doc return _ -def _bin_func_op(name, rev
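The inline-hint pattern in the `column.py` diff can be isolated: helper factories like `_unary_op` gain full signatures (including `Callable` return types) directly in the `.py` file, replacing the separate `column.pyi` stub so type checking also covers function bodies. Below is a simplified stand-in with no JVM involved — it dispatches to Python dunder methods instead of py4j columns:

```python
from typing import Callable


def _unary_op(name: str, doc: str = "unary operator") -> Callable[[int], int]:
    """Create a method for the given unary operator (inline-hinted factory)."""

    def _(value: int) -> int:
        # Stand-in for getattr(self._jc, name)() in the real Column class.
        return getattr(value, name)()

    _.__doc__ = doc
    return _


# Factory in use: build a negation operator from its dunder name.
negate = _unary_op("__neg__", "arithmetic negation")
```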
[GitHub] [spark-website] gengliangwang merged pull request #360: Fix URL in twitter:image of home page
gengliangwang merged pull request #360: URL: https://github.com/apache/spark-website/pull/360

[spark-website] branch asf-site updated: Fix URL in twitter:image of home page (#360)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 0a6505e Fix URL in twitter:image of home page (#360) 0a6505e is described below commit 0a6505e4f7862290a2cf0326df16762887bfa1ef Author: Gengliang Wang AuthorDate: Wed Oct 13 02:51:22 2021 +0800 Fix URL in twitter:image of home page (#360) The URL of twitter:image misses one slash. This PR is to fix it. --- _layouts/home.html | 2 +- site/index.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_layouts/home.html b/_layouts/home.html index 6bde86c..83310da 100644 --- a/_layouts/home.html +++ b/_layouts/home.html @@ -22,7 +22,7 @@ - + https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css"; rel="stylesheet" diff --git a/site/index.html b/site/index.html index 7021702..de9de34 100644 --- a/site/index.html +++ b/site/index.html @@ -18,7 +18,7 @@ - https://spark.apache.orgimages/spark-twitter-card-large.jpg";> + https://spark.apache.org/images/spark-twitter-card-large.jpg";> https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css"; rel="stylesheet" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
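The bug fixed in this commit is the classic base-plus-path concatenation mistake: joining "https://spark.apache.org" and "images/spark-twitter-card-large.jpg" with no separator yields "spark.apache.orgimages/…". A small helper that normalizes the slash (hypothetical, for illustration — the site template simply hardcodes the slash) avoids the whole class of bug:

```python
def join_url(base: str, path: str) -> str:
    # Exactly one slash between base and path, regardless of how the
    # inputs are written.
    return base.rstrip("/") + "/" + path.lstrip("/")
```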
[GitHub] [spark-website] gengliangwang commented on pull request #360: Fix URL in twitter:image of home page
gengliangwang commented on pull request #360: URL: https://github.com/apache/spark-website/pull/360#issuecomment-941284401 > Any other occurrences? might look for this variable followed by no slash, just to make sure there aren't others @srowen I checked. This is the only bug I can find.
[GitHub] [spark-website] srowen commented on pull request #360: Fix URL in twitter:image of home page
srowen commented on pull request #360: URL: https://github.com/apache/spark-website/pull/360#issuecomment-941279889 (There's no markdown template for layouts, right?) Any other occurrences? Might look for this variable followed by no slash, just to make sure there aren't others.
[GitHub] [spark-website] gengliangwang commented on pull request #360: Fix URL in twitter:image of home page
gengliangwang commented on pull request #360: URL: https://github.com/apache/spark-website/pull/360#issuecomment-941261626 cc @gatorsmile
[GitHub] [spark-website] gengliangwang opened a new pull request #360: Fix URL in twitter:image of home page
gengliangwang opened a new pull request #360: URL: https://github.com/apache/spark-website/pull/360 The URL of `twitter:image` misses one slash. This PR is to fix it.
[spark] branch master updated (dc1db95 -> 1af7072)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from dc1db95 [SPARK-36867][SQL] Fix error message with GROUP BY alias add 1af7072 [SPARK-36970][SQL] Manual disabled format `B` of `date_format` function to make Java 17 compatible with Java 8 No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala| 6 +- .../org/apache/spark/sql/catalyst/util/DatetimeFormatterSuite.scala | 2 +- .../resources/sql-tests/results/datetime-formatting-invalid.sql.out | 2 +- 3 files changed, 7 insertions(+), 3 deletions(-)
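SPARK-36970 above works by rejecting the datetime pattern letter `B` up front, since Java 8 and Java 17 interpret it differently. A toy Python model of that idea — the names and the exact validation logic are illustrative, not Spark's actual `DateTimeFormatterHelper` code:

```python
# Pattern letters Spark refuses regardless of JDK, so results stay stable.
# 'B' (day period) is the letter disabled by SPARK-36970.
DISALLOWED_LETTERS = {"B"}

def validate_pattern(pattern: str) -> str:
    in_quotes = False
    for ch in pattern:
        if ch == "'":
            in_quotes = not in_quotes  # quoted literal text is always allowed
        elif not in_quotes and ch in DISALLOWED_LETTERS:
            raise ValueError(f"Illegal pattern character: {ch}")
    return pattern
```

The point of the up-front check is that the error is raised consistently at pattern-parse time instead of surfacing as divergent formatting output between JDK versions.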
[GitHub] [spark-website] gengliangwang commented on pull request #359: New home page and layout for Spark website
gengliangwang commented on pull request #359: URL: https://github.com/apache/spark-website/pull/359#issuecomment-941194036 Merged. Thanks all for the review!
[GitHub] [spark-website] gengliangwang merged pull request #359: New home page and layout for Spark website
gengliangwang merged pull request #359: URL: https://github.com/apache/spark-website/pull/359
[spark] branch master updated: [SPARK-36867][SQL] Fix error message with GROUP BY alias
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dc1db95 [SPARK-36867][SQL] Fix error message with GROUP BY alias dc1db95 is described below commit dc1db950adb9a210acfe4a0a77988955a5f35e5e Author: Wenchen Fan AuthorDate: Tue Oct 12 22:47:31 2021 +0800 [SPARK-36867][SQL] Fix error message with GROUP BY alias ### What changes were proposed in this pull request? When checking unresolved attributes, we should check `Aggregate.aggregateExpressions` before `Aggregate.groupingExpressions`, because the latter may rely on the former, due to the GROUP BY alias feature. ### Why are the changes needed? improve error message ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test Closes #34244 from cloud-fan/bug. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../sql/catalyst/analysis/CheckAnalysis.scala | 28 +- .../test/resources/sql-tests/inputs/group-by.sql | 3 +++ .../resources/sql-tests/results/group-by.sql.out | 11 - 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index bdd7ffb..5bf37a2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -165,7 +165,14 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { } } -operator transformExpressionsUp { +val exprs = operator match { + // `groupingExpressions` may rely on `aggregateExpressions`, due to the GROUP BY alias + // feature. We should check errors in `aggregateExpressions` first. 
+ case a: Aggregate => a.aggregateExpressions ++ a.groupingExpressions + case _ => operator.expressions +} + +exprs.foreach(_.foreachUp { case a: Attribute if !a.resolved => val missingCol = a.sql val candidates = operator.inputSet.toSeq.map(_.qualifiedName) @@ -209,27 +216,26 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { failAnalysis(s"${wf.prettyName} function can only be evaluated in an ordered " + s"row-based window frame with a single offset: $w") - case w @ WindowExpression(e, s) => + case w: WindowExpression => // Only allow window functions with an aggregate expression or an offset window // function or a Pandas window UDF. -e match { +w.windowFunction match { case _: AggregateExpression | _: FrameLessOffsetWindowFunction | - _: AggregateWindowFunction => -w - case f: PythonUDF if PythonUDF.isWindowPandasUDF(f) => -w - case _ => -failAnalysis(s"Expression '$e' not supported within a window function.") + _: AggregateWindowFunction => // OK + case f: PythonUDF if PythonUDF.isWindowPandasUDF(f) => // OK + case other => +failAnalysis(s"Expression '$other' not supported within a window function.") } case s: SubqueryExpression => checkSubqueryExpression(operator, s) -s case e: ExpressionWithRandomSeed if !e.seedExpression.foldable => failAnalysis( s"Input argument to ${e.prettyName} must be a constant.") -} + + case _ => +}) operator match { case etw: EventTimeWatermark => diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql index e2c3672..039373b 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql @@ -45,6 +45,9 @@ SELECT COUNT(DISTINCT b), COUNT(DISTINCT b, c) FROM (SELECT 1 AS a, 2 AS b, 3 AS SELECT a AS k, COUNT(b) FROM testData GROUP BY k; SELECT a AS k, COUNT(b) FROM testData GROUP BY k HAVING k > 1; +-- GROUP BY alias with invalid col in SELECT list +SELECT a AS k, 
COUNT(non_existing) FROM testData GROUP BY k; + -- Aggregate functions cannot be used in GROUP BY SELECT COUNT(b) AS k FROM testData GROUP BY k; diff --git a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out index 37deb87..f598f49 100644 --- a/sql/core/src/test/resources/sql-tests/res
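The ordering argument in the SPARK-36867 commit message can be illustrated with a toy resolver (all names here are illustrative, not Spark's actual analyzer API): because the GROUP BY alias `k` is defined by the SELECT list, an unresolved column in the SELECT list must be reported first — otherwise the error would confusingly point at `k` instead of `non_existing`.

```python
def first_unresolved(select_exprs, grouping_exprs, resolved_columns):
    # Aliases defined in the SELECT list are visible to GROUP BY, so check
    # SELECT expressions first (mirrors checking aggregateExpressions
    # before groupingExpressions in CheckAnalysis).
    aliases = {alias for alias, _ in select_exprs if alias}
    for _, col in select_exprs:
        if col not in resolved_columns:
            return col
    for g in grouping_exprs:
        if g not in resolved_columns and g not in aliases:
            return g
    return None

# SELECT a AS k, COUNT(non_existing) FROM testData GROUP BY k
select_exprs = [("k", "a"), (None, "non_existing")]
print(first_unresolved(select_exprs, ["k"], {"a", "b"}))
# reports 'non_existing', not the alias 'k'
```

Had the grouping expressions been checked first, `k` would look unresolved only because its defining expression was broken — which is the misleading error message this commit removes.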
[spark] tag v3.2.0 created (now 5d45a41)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to tag v3.2.0 in repository https://gitbox.apache.org/repos/asf/spark.git. at 5d45a41 (commit) No new revisions were added by this update.
[spark] branch master updated: [SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a453fd5 [SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect) a453fd5 is described below commit a453fd55dd37516fbfb9332cf43e360796dfb955 Author: Huaxin Gao AuthorDate: Tue Oct 12 22:36:47 2021 +0800 [SPARK-36914][SQL] Implement dropIndex and listIndexes in JDBC (MySQL dialect) ### What changes were proposed in this pull request? This PR implements `dropIndex` and `listIndexes` in MySQL dialect ### Why are the changes needed? As a subtask of the V2 Index support, this PR completes the implementation for JDBC V2 index support. ### Does this PR introduce _any_ user-facing change? Yes, `dropIndex/listIndexes` in DS V2 JDBC ### How was this patch tested? new tests Closes #34236 from huaxingao/listIndexJDBC. Authored-by: Huaxin Gao Signed-off-by: Wenchen Fan --- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 33 +++- .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 95 +- .../sql/connector/catalog/index/SupportsIndex.java | 7 +- .../sql/connector/catalog/index/TableIndex.java| 12 ++- .../catalyst/analysis/NoSuchItemException.scala| 4 +- .../sql/execution/datasources/jdbc/JdbcUtils.scala | 24 ++ .../execution/datasources/v2/jdbc/JDBCTable.scala | 13 ++- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 25 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 84 +++ 9 files changed, 239 insertions(+), 58 deletions(-) diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala index 3cb8787..67e8108 100644 --- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala +++ 
b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala @@ -24,8 +24,6 @@ import org.scalatest.time.SpanSugar._ import org.apache.spark.SparkConf import org.apache.spark.sql.AnalysisException -import org.apache.spark.sql.catalyst.analysis.IndexAlreadyExistsException -import org.apache.spark.sql.connector.catalog.{Catalogs, Identifier, TableCatalog} import org.apache.spark.sql.connector.catalog.index.SupportsIndex import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference} import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog @@ -121,31 +119,22 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite with V2JDBCTest { assert(t.schema === expectedSchema) } - override def testIndex(tbl: String): Unit = { -val loaded = Catalogs.load("mysql", conf) -val jdbcTable = loaded.asInstanceOf[TableCatalog] - .loadTable(Identifier.of(Array.empty[String], "new_table")) - .asInstanceOf[SupportsIndex] -assert(jdbcTable.indexExists("i1") == false) -assert(jdbcTable.indexExists("i2") == false) + override def supportsIndex: Boolean = true + override def testIndexProperties(jdbcTable: SupportsIndex): Unit = { val properties = new util.Properties(); properties.put("KEY_BLOCK_SIZE", "10") properties.put("COMMENT", "'this is a comment'") -jdbcTable.createIndex("i1", "", Array(FieldReference("col1")), +// MySQL doesn't allow property set on individual column, so use empty Array for +// column properties +jdbcTable.createIndex("i1", "BTREE", Array(FieldReference("col1")), Array.empty[util.Map[NamedReference, util.Properties]], properties) -jdbcTable.createIndex("i2", "", - Array(FieldReference("col2"), FieldReference("col3"), FieldReference("col5")), - Array.empty[util.Map[NamedReference, util.Properties]], new util.Properties) - -assert(jdbcTable.indexExists("i1") == true) -assert(jdbcTable.indexExists("i2") == true) - -val m = intercept[IndexAlreadyExistsException] { - 
jdbcTable.createIndex("i1", "", Array(FieldReference("col1")), -Array.empty[util.Map[NamedReference, util.Properties]], properties) -}.getMessage -assert(m.contains("Failed to create index: i1 in new_table")) +var index = jdbcTable.listIndexes() +// The index property size is actually 1. Even though the index is created +// with properties "KEY_BLOCK_SIZE", "10" and "COMMENT", "'this is a comment'", when +// retrieving index using `SHOW INDEXES`, MySQL only returns `COMMENT`. +assert(index(0).properties.size == 1) +assert(index(0).properties.get("COMMENT").equals("this is a comment")) } } diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2J
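Under the hood, a JDBC dialect implements index support by emitting plain SQL statements (`CREATE INDEX`, `DROP INDEX`, `SHOW INDEXES` for MySQL). A rough Python sketch of that statement generation — the function names and exact clauses are illustrative, not the actual `MySQLDialect` code:

```python
def create_index_sql(index_name, table, columns, index_type=""):
    # MySQL form: CREATE INDEX i1 USING BTREE ON new_table (col1, col2)
    using = f" USING {index_type}" if index_type else ""
    cols = ", ".join(columns)
    return f"CREATE INDEX {index_name}{using} ON {table} ({cols})"

def drop_index_sql(index_name, table):
    return f"DROP INDEX {index_name} ON {table}"

def list_indexes_sql(table):
    # As the test comment above notes, SHOW INDEXES reports only COMMENT
    # among the creation properties, which is why the property map
    # retrieved for i1 has size 1 despite two properties being set.
    return f"SHOW INDEXES FROM {table}"
```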
[spark] branch master updated (36b3bbc0 -> b9a8165)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 36b3bbc0 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules add b9a8165 [SPARK-36972][PYTHON] Add max_by/min_by API to PySpark No new revisions were added by this update. Summary of changes: python/docs/source/reference/pyspark.sql.rst | 2 + python/pyspark/sql/functions.py | 72 2 files changed, 74 insertions(+)
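For readers unfamiliar with the aggregates added by SPARK-36972: `max_by(x, ord)` returns the value of `x` from the row where `ord` is maximal, and `min_by` is the converse. The semantics can be sketched in a few lines of plain Python (names illustrative):

```python
def max_by(rows, x, order_col):
    # value of column x in the row where column order_col is maximal
    return max(rows, key=lambda r: r[order_col])[x]

def min_by(rows, x, order_col):
    return min(rows, key=lambda r: r[order_col])[x]

rows = [
    {"name": "java", "year": 1995},
    {"name": "scala", "year": 2004},
    {"name": "python", "year": 1991},
]
print(max_by(rows, "name", "year"))  # → scala
print(min_by(rows, "name", "year"))  # → python
```

In PySpark itself this would be used through the new functions, e.g. something like `df.groupBy(...).agg(F.max_by("name", "year"))` with `from pyspark.sql import functions as F`.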
[spark] branch master updated: [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 36b3bbc0 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules 36b3bbc0 is described below commit 36b3bbc0aa9f9c39677960cd93f32988c7d7aaca Author: ulysses-you AuthorDate: Tue Oct 12 16:21:53 2021 +0800 [SPARK-36979][SQL] Add RewriteLateralSubquery rule into nonExcludableRules ### What changes were proposed in this pull request? Add RewriteLateralSubquery rule into nonExcludableRules. ### Why are the changes needed? Lateral Join has no meaning without rule `RewriteLateralSubquery`. So now if we set `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery`, the lateral join query will fail with: ``` java.lang.AssertionError: assertion failed: No plan for LateralJoin lateral-subquery#218 ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? add test Closes #34249 from ulysses-you/SPARK-36979. 
Authored-by: ulysses-you Signed-off-by: Wenchen Fan --- .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 3 ++- sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 7 +++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index b8c7fe7..73be790 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -284,7 +284,8 @@ abstract class Optimizer(catalogManager: CatalogManager) NormalizeFloatingNumbers.ruleName :: ReplaceUpdateFieldsExpression.ruleName :: PullOutGroupingExpressions.ruleName :: - RewriteAsOfJoin.ruleName :: Nil + RewriteAsOfJoin.ruleName :: + RewriteLateralSubquery.ruleName :: Nil /** * Optimize all the subqueries inside expression. diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 3d5b911..11b7ee6 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -4204,6 +4204,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark checkAnswer(sql("""SELECT from_json(r'{"a": "\\"}', 'a string')"""), Row(Row("\\"))) checkAnswer(sql("""SELECT from_json(R'{"a": "\\"}', 'a string')"""), Row(Row("\\"))) } + + test("SPARK-36979: Add RewriteLateralSubquery rule into nonExcludableRules") { +withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> + "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery") { + sql("SELECT * FROM testData, LATERAL (SELECT * FROM testData)").collect() +} + } } case class Foo(bar: Option[String])
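The mechanism SPARK-36979 extends: the `spark.sql.optimizer.excludedRules` setting is filtered against a hard-coded non-excludable list before optimizer batches are built, so rules the engine cannot run without — now including `RewriteLateralSubquery` — stay enabled regardless of configuration. A toy Python model of that filtering (illustrative names, not the Optimizer's actual code):

```python
# Subset of Spark's nonExcludableRules, per the diff above.
NON_EXCLUDABLE = {
    "org.apache.spark.sql.catalyst.optimizer.RewriteAsOfJoin",
    "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery",
}

def effective_exclusions(configured: str) -> set:
    # Parse the comma-separated config value, then drop anything
    # non-excludable so those rules silently keep running.
    requested = {r.strip() for r in configured.split(",") if r.strip()}
    return requested - NON_EXCLUDABLE

print(effective_exclusions(
    "org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery"))
# → set() : the rule still runs, so LATERAL joins keep planning
```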
[GitHub] [spark-website] yaooqinn commented on pull request #359: New home page and layout for Spark website
yaooqinn commented on pull request #359: URL: https://github.com/apache/spark-website/pull/359#issuecomment-940778866 LGTM