[spark] branch master updated: [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 281b00a  [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2
281b00a is described below

commit 281b00ab5b3dd3f21dd6af020ad5455f35498b79
Author: Kousuke Saruta
AuthorDate: Wed Aug 18 13:40:52 2021 +0900

    [SPARK-34309][BUILD][FOLLOWUP] Upgrade Caffeine to 2.9.2

    ### What changes were proposed in this pull request?

    This PR upgrades Caffeine to `2.9.2`. Caffeine was introduced in SPARK-34309 (#31517). At the time that PR was opened, the latest version of Caffeine was `2.9.1`, but `2.9.2` is now available.

    ### Why are the changes needed?

    `2.9.2` has the following improvements (https://github.com/ben-manes/caffeine/releases/tag/v2.9.2):

    * Fixed reading an intermittent null weak/soft value during a concurrent write
    * Fixed extraneous eviction when concurrently removing a collected entry after a writer resurrects it with a new mapping
    * Fixed excessive retries of discarding an expired entry when the fixed duration period is extended, thereby resurrecting it

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    CIs.

    Closes #33772 from sarutak/upgrade-caffeine-2.9.2.

Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 1dc01b5..31dd02f 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -30,7 +30,7 @@ blas/2.2.0//blas-2.2.0.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
 breeze_2.12/1.2//breeze_2.12-1.2.jar
-caffeine/2.9.1//caffeine-2.9.1.jar
+caffeine/2.9.2//caffeine-2.9.2.jar
 cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
 checker-qual/3.10.0//checker-qual-3.10.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 698a03c..5b27680 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -25,7 +25,7 @@ blas/2.2.0//blas-2.2.0.jar
 bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
 breeze-macros_2.12/1.2//breeze-macros_2.12-1.2.jar
 breeze_2.12/1.2//breeze_2.12-1.2.jar
-caffeine/2.9.1//caffeine-2.9.1.jar
+caffeine/2.9.2//caffeine-2.9.2.jar
 cats-kernel_2.12/2.1.1//cats-kernel_2.12-2.1.1.jar
 checker-qual/3.10.0//checker-qual-3.10.0.jar
 chill-java/0.10.0//chill-java-0.10.0.jar
diff --git a/pom.xml b/pom.xml
index bd1722f..1452b0b 100644
--- a/pom.xml
+++ b/pom.xml
@@ -182,7 +182,7 @@
 2.6.2
 4.1.17
 14.0.1
-2.9.1
+2.9.2
 3.0.16
 2.34
 2.10.10

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 31d771d  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
31d771d is described below

commit 31d771dcf242cfa477b04f28950526bf87b7e90a
Author: Kousuke Saruta
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

    [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

    ### What changes were proposed in this pull request?

    This PR fixes an issue where ThriftServer doesn't recognize `spark.sql.redaction.string.regex`. The problem is that sensitive information included in queries can be exposed.

    ![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
    ![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran ThriftServer, connected to it, and executed
    `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");`
    with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
    Then confirmed the UI.

    ![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
    ![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

    Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
(cherry picked from commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2)
Signed-off-by: Kousuke Saruta
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f7a4be9..acb00e4 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -220,10 +220,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
     logInfo(s"Submitting query '$statement' with $statementId")
+    val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,
-      statement,
+      redactedStatement,
       statementId,
       parentSession.getUsername)
     setHasResultSet(true) // avoid no resultset for async run
[spark] branch branch-3.2 updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new b749b49  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
b749b49 is described below

commit b749b49a283800d3e12455a00a23da24bf6cd333
Author: Kousuke Saruta
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

    [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

    ### What changes were proposed in this pull request?

    This PR fixes an issue where ThriftServer doesn't recognize `spark.sql.redaction.string.regex`. The problem is that sensitive information included in queries can be exposed.

    ![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
    ![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran ThriftServer, connected to it, and executed
    `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");`
    with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
    Then confirmed the UI.

    ![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
    ![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

    Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
(cherry picked from commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2)
Signed-off-by: Kousuke Saruta
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f43f8e7..0df5885 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
     logInfo(s"Submitting query '$statement' with $statementId")
+    val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,
-      statement,
+      redactedStatement,
       statementId,
       parentSession.getUsername)
     setHasResultSet(true) // avoid no resultset for async run
[spark] branch master updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b914ff7  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
b914ff7 is described below

commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2
Author: Kousuke Saruta
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

    [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

    ### What changes were proposed in this pull request?

    This PR fixes an issue where ThriftServer doesn't recognize `spark.sql.redaction.string.regex`. The problem is that sensitive information included in queries can be exposed.

    ![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
    ![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)

    ### Why are the changes needed?

    Bug fix.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran ThriftServer, connected to it, and executed
    `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");`
    with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
    Then confirmed the UI.

    ![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
    ![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)

    Closes #33743 from sarutak/thrift-redact.

Authored-by: Kousuke Saruta
Signed-off-by: Kousuke Saruta
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f43f8e7..0df5885 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
     logInfo(s"Submitting query '$statement' with $statementId")
+    val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,
-      statement,
+      redactedStatement,
       statementId,
       parentSession.getUsername)
     setHasResultSet(true) // avoid no resultset for async run
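The redaction behavior exercised in the test above can be reproduced outside Spark with an ordinary regex engine. Below is a minimal Python sketch using a simplified, non-greedy variant of the pattern from the test, and a hypothetical replacement marker (Spark's actual `Utils.redact` marker text is not shown in this commit):

```python
import re

# Simplified variant of the pattern from the test above: redact any quoted
# value that follows `password=`, case-insensitively. Non-greedy quantifiers
# keep the match from swallowing everything up to the last quote on the line.
pattern = re.compile(r"(?<=password=)(\".*?\"|'.*?')", re.IGNORECASE)

statement = (
    'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", '
    'driver="com.mysql.jdbc.Driver", dbtable="test_tbl", '
    'user="test_usr", password="abcde");'
)

# Replace the sensitive match with a marker (hypothetical marker text).
redacted = pattern.sub("*********(redacted)", statement)
print(redacted)
```

Only the password value is scrubbed; the other connection options are left intact, which is the behavior the UI screenshots above demonstrate.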
[spark] branch branch-3.2 updated: [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 528fca8  [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version
528fca8 is described below

commit 528fca8944036ebd7ded3be8fbb799de080f663a
Author: Takuya UESHIN
AuthorDate: Wed Aug 18 10:36:09 2021 +0900

    [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version

    ### What changes were proposed in this pull request?

    This is a follow-up of #33687. Use `LooseVersion` instead of `pkg_resources.parse_version`.

    ### Why are the changes needed?

    In the previous PR, `pkg_resources.parse_version` was used, but we should use `LooseVersion` instead to be consistent with the rest of the code base.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #33768 from ueshin/issues/SPARK-36370/LooseVersion.

Authored-by: Takuya UESHIN
Signed-off-by: Hyukjin Kwon
(cherry picked from commit 7fb8ea319e4931f7721ac6f9c12100c95d252cd2)
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/groupby.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index 2daf80f..70ece9c 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -26,7 +26,6 @@ from collections import OrderedDict, namedtuple
 from distutils.version import LooseVersion
 from functools import partial
 from itertools import product
-from pkg_resources import parse_version  # type: ignore
 from typing import (
     Any,
     Callable,
@@ -47,7 +46,7 @@ from typing import (
 import pandas as pd
 from pandas.api.types import is_hashable, is_list_like

-if parse_version(pd.__version__) >= parse_version("1.3.0"):
+if LooseVersion(pd.__version__) >= LooseVersion("1.3.0"):
     from pandas.core.common import _builtin_table
 else:
     from pandas.core.base import SelectionMixin
[spark] branch master updated: [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7fb8ea3  [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version
7fb8ea3 is described below

commit 7fb8ea319e4931f7721ac6f9c12100c95d252cd2
Author: Takuya UESHIN
AuthorDate: Wed Aug 18 10:36:09 2021 +0900

    [SPARK-36370][PYTHON][FOLLOWUP] Use LooseVersion instead of pkg_resources.parse_version

    ### What changes were proposed in this pull request?

    This is a follow-up of #33687. Use `LooseVersion` instead of `pkg_resources.parse_version`.

    ### Why are the changes needed?

    In the previous PR, `pkg_resources.parse_version` was used, but we should use `LooseVersion` instead to be consistent with the rest of the code base.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #33768 from ueshin/issues/SPARK-36370/LooseVersion.

Authored-by: Takuya UESHIN
Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/groupby.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index 1ced2ce..beb36e6 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -26,7 +26,6 @@ from collections import OrderedDict, namedtuple
 from distutils.version import LooseVersion
 from functools import partial
 from itertools import product
-from pkg_resources import parse_version  # type: ignore
 from typing import (
     Any,
     Callable,
@@ -47,7 +46,7 @@ from typing import (
 import pandas as pd
 from pandas.api.types import is_hashable, is_list_like

-if parse_version(pd.__version__) >= parse_version("1.3.0"):
+if LooseVersion(pd.__version__) >= LooseVersion("1.3.0"):
     from pandas.core.common import _builtin_table
 else:
     from pandas.core.base import SelectionMixin
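Both `LooseVersion` and `pkg_resources.parse_version` compare release strings component-wise rather than lexicographically, which is what makes the `pd.__version__ >= "1.3.0"` gate above safe. The comparison semantics can be sketched with a plain tuple comparison (a simplified stand-in for purely numeric versions; the real `LooseVersion` also handles non-numeric components):

```python
def version_key(version: str) -> tuple:
    """Split a purely numeric version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

# Component-wise comparison gets ordering right where plain string
# comparison fails: "1.10.1" is newer than "1.3.0".
assert version_key("1.10.1") > version_key("1.3.0")
assert "1.10.1" < "1.3.0"  # lexicographic comparison gets this wrong

# The gate used in the patch, with a stand-in for pd.__version__:
pd_version = "1.2.5"
use_new_import_path = version_key(pd_version) >= version_key("1.3.0")
print(use_new_import_path)  # False
```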
[spark] branch branch-3.2 updated: [SPARK-36535][SQL] Refine the sql reference doc
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 5107ad3  [SPARK-36535][SQL] Refine the sql reference doc
5107ad3 is described below

commit 5107ad3157c07c91fec2e30fc97e72684b84cf14
Author: Wenchen Fan
AuthorDate: Tue Aug 17 12:46:38 2021 -0700

    [SPARK-36535][SQL] Refine the sql reference doc

    ### What changes were proposed in this pull request?

    Refine the SQL reference doc:
    - remove useless subitems in the sidebar
    - remove useless sub-menu-pages (e.g. `sql-ref-syntax-aux.md`)
    - avoid using `#` in `sql-ref-literals.md`

    ### Why are the changes needed?

    The subitems in the sidebar are quite useless, as the menu page serves the same purpose:
    https://user-images.githubusercontent.com/3182036/129765924-d7e69bc1-e351-4581-a6de-f2468022f372.png
    It's also extra work to keep the menu page and sidebar subitems in sync (the ANSI compliance page is already out of sync).

    The sub-menu-pages are only referenced by the sidebar, and duplicate the content of the menu page. As a result, `sql-ref-syntax-aux.md` is already outdated compared to the menu page. It's easier to just look at the menu page.

    The `#` is not rendered properly:
    https://user-images.githubusercontent.com/3182036/129766760-6f385443-e597-44aa-888d-14d128d45f84.png
    It's better to avoid using it.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #33767 from cloud-fan/doc.

Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 4b015e8d7d6f5972341104f2a359bb9d09c4385b)
Signed-off-by: Dongjoon Hyun
---
 docs/_data/menu-sql.yaml          | 187 +-
 docs/sql-ref-literals.md          |  42 -
 docs/sql-ref-syntax-aux.md        |  29 --
 docs/sql-ref-syntax-ddl.md        |  37 ----
 docs/sql-ref-syntax-dml-insert.md |  27 --
 docs/sql-ref-syntax-dml.md        |  25 -
 docs/sql-ref-syntax-qry.md        |  53 ---
 docs/sql-ref-syntax.md            |  12 +++
 8 files changed, 34 insertions(+), 378 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index e7b22c4..22e01df 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -75,28 +75,12 @@
   subitems:
     - text: ANSI Compliance
       url: sql-ref-ansi-compliance.html
-      subitems:
-        - text: Arithmetic Operations
-          url: sql-ref-ansi-compliance.html#arithmetic-operations
-        - text: Type Conversion
-          url: sql-ref-ansi-compliance.html#type-conversion
-        - text: SQL Keywords
-          url: sql-ref-ansi-compliance.html#sql-keywords
     - text: Data Types
       url: sql-ref-datatypes.html
     - text: Datetime Pattern
      url: sql-ref-datetime-pattern.html
    - text: Functions
      url: sql-ref-functions.html
-      subitems:
-        - text: Built-in Functions
-          url: sql-ref-functions-builtin.html
-        - text: Scalar UDFs (User-Defined Functions)
-          url: sql-ref-functions-udf-scalar.html
-        - text: UDAFs (User-Defined Aggregate Functions)
-          url: sql-ref-functions-udf-aggregate.html
-        - text: Integration with Hive UDFs/UDAFs/UDTFs
-          url: sql-ref-functions-udf-hive.html
    - text: Identifiers
      url: sql-ref-identifier.html
    - text: Literals
@@ -107,173 +91,10 @@
   url: sql-ref-syntax.html
   subitems:
     - text: Data Definition Statements
-      url: sql-ref-syntax-ddl.html
-      subitems:
-        - text: ALTER DATABASE
-          url: sql-ref-syntax-ddl-alter-database.html
-        - text: ALTER TABLE
-          url: sql-ref-syntax-ddl-alter-table.html
-        - text: ALTER VIEW
-          url: sql-ref-syntax-ddl-alter-view.html
-        - text: CREATE DATABASE
-          url: sql-ref-syntax-ddl-create-database.html
-        - text: CREATE FUNCTION
-          url: sql-ref-syntax-ddl-create-function.html
-        - text: CREATE TABLE
-          url: sql-ref-syntax-ddl-create-table.html
-        - text: CREATE VIEW
-          url: sql-ref-syntax-ddl-create-view.html
-        - text: DROP DATABASE
-          url: sql-ref-syntax-ddl-drop-database.html
-        - text: DROP FUNCTION
-          url: sql-ref-syntax-ddl-drop-function.html
-        - text: DROP TABLE
-          url: sql-ref-syntax-ddl-drop-table.html
-        - text: DROP VIEW
-          url: sql-ref-syntax-ddl-drop-view.html
-        - text: TRUNCATE TABLE
-          url: sql-ref-syntax-ddl-truncate-table.html
-        - text: REPAIR TABLE
-          url: sql-ref-syntax-
[spark] branch master updated: [SPARK-36535][SQL] Refine the sql reference doc
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4b015e8  [SPARK-36535][SQL] Refine the sql reference doc
4b015e8 is described below

commit 4b015e8d7d6f5972341104f2a359bb9d09c4385b
Author: Wenchen Fan
AuthorDate: Tue Aug 17 12:46:38 2021 -0700

    [SPARK-36535][SQL] Refine the sql reference doc

    ### What changes were proposed in this pull request?

    Refine the SQL reference doc:
    - remove useless subitems in the sidebar
    - remove useless sub-menu-pages (e.g. `sql-ref-syntax-aux.md`)
    - avoid using `#` in `sql-ref-literals.md`

    ### Why are the changes needed?

    The subitems in the sidebar are quite useless, as the menu page serves the same purpose:
    https://user-images.githubusercontent.com/3182036/129765924-d7e69bc1-e351-4581-a6de-f2468022f372.png
    It's also extra work to keep the menu page and sidebar subitems in sync (the ANSI compliance page is already out of sync).

    The sub-menu-pages are only referenced by the sidebar, and duplicate the content of the menu page. As a result, `sql-ref-syntax-aux.md` is already outdated compared to the menu page. It's easier to just look at the menu page.

    The `#` is not rendered properly:
    https://user-images.githubusercontent.com/3182036/129766760-6f385443-e597-44aa-888d-14d128d45f84.png
    It's better to avoid using it.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A

    Closes #33767 from cloud-fan/doc.

Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
---
 docs/_data/menu-sql.yaml          | 187 +-
 docs/sql-ref-literals.md          |  42 -
 docs/sql-ref-syntax-aux.md        |  29 --
 docs/sql-ref-syntax-ddl.md        |  37 ----
 docs/sql-ref-syntax-dml-insert.md |  27 --
 docs/sql-ref-syntax-dml.md        |  25 -
 docs/sql-ref-syntax-qry.md        |  53 ---
 docs/sql-ref-syntax.md            |  12 +++
 8 files changed, 34 insertions(+), 378 deletions(-)

diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
index e7b22c4..22e01df 100644
--- a/docs/_data/menu-sql.yaml
+++ b/docs/_data/menu-sql.yaml
@@ -75,28 +75,12 @@
   subitems:
     - text: ANSI Compliance
       url: sql-ref-ansi-compliance.html
-      subitems:
-        - text: Arithmetic Operations
-          url: sql-ref-ansi-compliance.html#arithmetic-operations
-        - text: Type Conversion
-          url: sql-ref-ansi-compliance.html#type-conversion
-        - text: SQL Keywords
-          url: sql-ref-ansi-compliance.html#sql-keywords
    - text: Data Types
      url: sql-ref-datatypes.html
    - text: Datetime Pattern
      url: sql-ref-datetime-pattern.html
    - text: Functions
      url: sql-ref-functions.html
-      subitems:
-        - text: Built-in Functions
-          url: sql-ref-functions-builtin.html
-        - text: Scalar UDFs (User-Defined Functions)
-          url: sql-ref-functions-udf-scalar.html
-        - text: UDAFs (User-Defined Aggregate Functions)
-          url: sql-ref-functions-udf-aggregate.html
-        - text: Integration with Hive UDFs/UDAFs/UDTFs
-          url: sql-ref-functions-udf-hive.html
    - text: Identifiers
      url: sql-ref-identifier.html
    - text: Literals
@@ -107,173 +91,10 @@
   url: sql-ref-syntax.html
   subitems:
     - text: Data Definition Statements
-      url: sql-ref-syntax-ddl.html
-      subitems:
-        - text: ALTER DATABASE
-          url: sql-ref-syntax-ddl-alter-database.html
-        - text: ALTER TABLE
-          url: sql-ref-syntax-ddl-alter-table.html
-        - text: ALTER VIEW
-          url: sql-ref-syntax-ddl-alter-view.html
-        - text: CREATE DATABASE
-          url: sql-ref-syntax-ddl-create-database.html
-        - text: CREATE FUNCTION
-          url: sql-ref-syntax-ddl-create-function.html
-        - text: CREATE TABLE
-          url: sql-ref-syntax-ddl-create-table.html
-        - text: CREATE VIEW
-          url: sql-ref-syntax-ddl-create-view.html
-        - text: DROP DATABASE
-          url: sql-ref-syntax-ddl-drop-database.html
-        - text: DROP FUNCTION
-          url: sql-ref-syntax-ddl-drop-function.html
-        - text: DROP TABLE
-          url: sql-ref-syntax-ddl-drop-table.html
-        - text: DROP VIEW
-          url: sql-ref-syntax-ddl-drop-view.html
-        - text: TRUNCATE TABLE
-          url: sql-ref-syntax-ddl-truncate-table.html
-        - text: REPAIR TABLE
-          url: sql-ref-syntax-ddl-repair-table.html
-        - text: USE DATABASE
-          url: sql-ref-syntax-ddl-usedb.html
+
[spark] branch branch-3.2 updated: [SPARK-36370][PYTHON] _builtin_table directly imported from pandas instead of being redefined
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new e15daa3  [SPARK-36370][PYTHON] _builtin_table directly imported from pandas instead of being redefined
e15daa3 is described below

commit e15daa31b36669a7e29367e385f28b6ba25acf09
Author: Cedric-Magnan
AuthorDate: Tue Aug 17 10:46:49 2021 -0700

    [SPARK-36370][PYTHON] _builtin_table directly imported from pandas instead of being redefined

    ### What changes were proposed in this pull request?

    Suggesting to refactor the way the _builtin_table is defined in the `python/pyspark/pandas/groupby.py` module. Pandas has recently refactored the way the _builtin_table is imported: it is now part of the pandas.core.common module instead of being an attribute of the pandas.core.base.SelectionMixin class.

    ### Why are the changes needed?

    This change is not strictly needed, but the current implementation redefines this table within pyspark, so any change to this table in the pandas library would need to be mirrored in the pyspark repository as well.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran the following command successfully:
    ```sh
    python/run-tests --testnames 'pyspark.pandas.tests.test_groupby'
    ```
    Tests passed in 327 seconds.

    Closes #33687 from Cedric-Magnan/_builtin_table_from_pandas.

    Authored-by: Cedric-Magnan
    Signed-off-by: Takuya UESHIN
    (cherry picked from commit 964dfe254ff8ebf9d7f5c7115ff8f79da3f28261)
    Signed-off-by: Takuya UESHIN
---
 python/pyspark/pandas/groupby.py | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/pandas/groupby.py b/python/pyspark/pandas/groupby.py
index 376592d..2daf80f 100644
--- a/python/pyspark/pandas/groupby.py
+++ b/python/pyspark/pandas/groupby.py
@@ -20,13 +20,13 @@ A wrapper for GroupedData to behave similar to pandas GroupBy.
 """
 from abc import ABCMeta, abstractmethod
-import builtins
 import sys
 import inspect
 from collections import OrderedDict, namedtuple
 from distutils.version import LooseVersion
 from functools import partial
 from itertools import product
+from pkg_resources import parse_version  # type: ignore
 from typing import (
     Any,
     Callable,
@@ -44,10 +44,16 @@ from typing import (
     TYPE_CHECKING,
 )

-import numpy as np
 import pandas as pd
 from pandas.api.types import is_hashable, is_list_like

+if parse_version(pd.__version__) >= parse_version("1.3.0"):
+    from pandas.core.common import _builtin_table
+else:
+    from pandas.core.base import SelectionMixin
+
+    _builtin_table = SelectionMixin._builtin_table
+
 from pyspark.sql import Column, DataFrame as SparkDataFrame, Window, functions as F
 from pyspark.sql.types import (  # noqa: F401
     DataType,
@@ -97,12 +103,6 @@ if TYPE_CHECKING:
 # to keep it the same as pandas
 NamedAgg = namedtuple("NamedAgg", ["column", "aggfunc"])

-_builtin_table = {
-    builtins.sum: np.sum,
-    builtins.max: np.max,
-    builtins.min: np.min,
-}  # type: Dict[Callable, Callable]
-

 class GroupBy(Generic[FrameLike], metaclass=ABCMeta):
     """
[spark] branch master updated (c0441bb -> 964dfe2)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from c0441bb  [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
   add 964dfe2  [SPARK-36370][PYTHON] _builtin_table directly imported from pandas instead of being redefined

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/groupby.py | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)
[spark] branch master updated: [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string
This is an automated email from the ASF dual-hosted git repository. ueshin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c0441bb [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string c0441bb is described below commit c0441bb7e83e83e3240bf7e2991de34b01a182f5 Author: itholic AuthorDate: Tue Aug 17 10:29:16 2021 -0700 [SPARK-36387][PYTHON] Fix Series.astype from datetime to nullable string ### What changes were proposed in this pull request? This PR proposes to fix `Series.astype` when converting datetime type to StringDtype, to match the behavior of pandas 1.3. In pandas < 1.3, ```python >>> pd.Series(["2020-10-27 00:00:01", None], name="datetime").astype("string") 02020-10-27 00:00:01 1NaT Name: datetime, dtype: string ``` This is changed to ```python >>> pd.Series(["2020-10-27 00:00:01", None], name="datetime").astype("string") 02020-10-27 00:00:01 1 Name: datetime, dtype: string ``` in pandas >= 1.3, so we follow the behavior of latest pandas. ### Why are the changes needed? Because pandas-on-Spark always follow the behavior of latest pandas. ### Does this PR introduce _any_ user-facing change? Yes, the behavior is changed to latest pandas when converting datetime to nullable string (StringDtype) ### How was this patch tested? Unittest passed Closes #33735 from itholic/SPARK-36387. 
Authored-by: itholic
Signed-off-by: Takuya UESHIN
---
 python/pyspark/pandas/data_type_ops/base.py         |  2 +-
 python/pyspark/pandas/data_type_ops/datetime_ops.py | 19 ---
 python/pyspark/pandas/tests/test_series.py          |  8 +---
 3 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/python/pyspark/pandas/data_type_ops/base.py b/python/pyspark/pandas/data_type_ops/base.py
index c69715f..b4c8c3e 100644
--- a/python/pyspark/pandas/data_type_ops/base.py
+++ b/python/pyspark/pandas/data_type_ops/base.py
@@ -155,7 +155,7 @@ def _as_string_type(
     index_ops: IndexOpsLike, dtype: Union[str, type, Dtype], *, null_str: str = str(None)
 ) -> IndexOpsLike:
     """Cast `index_ops` to StringType Spark type, given `dtype` and `null_str`,
-    representing null Spark column.
+    representing null Spark column. Note that `null_str` is for non-extension dtypes only.
     """
     spark_type = StringType()
     if isinstance(dtype, extension_dtypes):
diff --git a/python/pyspark/pandas/data_type_ops/datetime_ops.py b/python/pyspark/pandas/data_type_ops/datetime_ops.py
index 071c22e..63d817b 100644
--- a/python/pyspark/pandas/data_type_ops/datetime_ops.py
+++ b/python/pyspark/pandas/data_type_ops/datetime_ops.py
@@ -23,7 +23,7 @@ import numpy as np
 import pandas as pd
 from pandas.api.types import CategoricalDtype

-from pyspark.sql import functions as F, Column
+from pyspark.sql import Column
 from pyspark.sql.types import BooleanType, LongType, StringType, TimestampType

 from pyspark.pandas._typing import Dtype, IndexOpsLike, SeriesOrIndex
@@ -33,10 +33,11 @@ from pyspark.pandas.data_type_ops.base import (
     _as_bool_type,
     _as_categorical_type,
     _as_other_type,
+    _as_string_type,
     _sanitize_list_like,
 )
 from pyspark.pandas.spark import functions as SF
-from pyspark.pandas.typedef import extension_dtypes, pandas_on_spark_type
+from pyspark.pandas.typedef import pandas_on_spark_type


 class DatetimeOps(DataTypeOps):
@@ -133,18 +134,6 @@ class DatetimeOps(DataTypeOps):
         elif isinstance(spark_type, BooleanType):
             return _as_bool_type(index_ops, dtype)
         elif isinstance(spark_type, StringType):
-            if isinstance(dtype, extension_dtypes):
-                # seems like a pandas' bug?
-                scol = F.when(index_ops.spark.column.isNull(), str(pd.NaT)).otherwise(
-                    index_ops.spark.column.cast(spark_type)
-                )
-            else:
-                null_str = str(pd.NaT)
-                casted = index_ops.spark.column.cast(spark_type)
-                scol = F.when(index_ops.spark.column.isNull(), null_str).otherwise(casted)
-            return index_ops._with_new_scol(
-                scol.alias(index_ops._internal.data_spark_column_names[0]),
-                field=index_ops._internal.data_fields[0].copy(dtype=dtype, spark_type=spark_type),
-            )
+            return _as_string_type(index_ops, dtype, null_str=str(pd.NaT))
         else:
             return _as_other_type(index_ops, dtype, spark_type)
diff --git a/python/pyspark/pandas/tests/test_series.py b/python/pyspark/pandas/tests/test_series.py
index d9ba3c76..58c87ed 100644
--- a/python/pyspark/pandas/tests/test_series.py
+++ b/python/pyspark/pandas/tests/t
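The diff above collapses both `StringType` branches into one `_as_string_type` call. A minimal pure-Python sketch of that helper's contract (hypothetical names, no Spark involved — for the nullable extension dtype, nulls stay missing; for a plain string dtype, they become the `null_str` placeholder):

```python
def as_string_type(values, *, extension_dtype, null_str="NaT"):
    """Sketch of the consolidated string-cast behavior (hypothetical
    stand-in for pyspark.pandas' _as_string_type; not Spark code).

    extension_dtype=True mirrors pandas >= 1.3 StringDtype: nulls stay
    missing. extension_dtype=False mirrors a plain string cast: nulls
    are replaced by the literal null_str.
    """
    if extension_dtype:
        return [None if v is None else str(v) for v in values]
    return [null_str if v is None else str(v) for v in values]
```

For example, `as_string_type(["2020-10-27 00:00:01", None], extension_dtype=True)` keeps the `None`, while `extension_dtype=False` yields the string `"NaT"` in its place.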
[spark] branch branch-3.2 updated: Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 70635b4  Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"
70635b4 is described below

commit 70635b4b2633be544563c1cb00e6333fdb1f3782
Author: Gengliang Wang
AuthorDate: Tue Aug 17 20:23:49 2021 +0800

    Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

    ### What changes were proposed in this pull request?

    Revert [[SPARK-35028][SQL] ANSI mode: disallow group by aliases](https://github.com/apache/spark/pull/32129)

    ### Why are the changes needed?

    It turns out that many users rely on the group-by-alias feature. Spark has its own precedence rule when an alias name conflicts with a column name in a GROUP BY clause: it always uses the table column. This should be reasonable and acceptable. Also, external DBMSs such as PostgreSQL and MySQL allow grouping by aliases, too. As we are going to announce ANSI mode GA in Spark 3.2, I suggest allowing group by alias in ANSI mode.

    ### Does this PR introduce _any_ user-facing change?

    No, the feature is not released yet.

    ### How was this patch tested?

    Unit tests

    Closes #33758 from gengliangwang/revertGroupByAlias.
Authored-by: Gengliang Wang
Signed-off-by: Gengliang Wang
(cherry picked from commit 8bfb4f1e72f33205b94957f7dacf298b0c8bde17)
Signed-off-by: Gengliang Wang
---
 docs/sql-ref-ansi-compliance.md                    |    1 -
 .../spark/sql/catalyst/analysis/Analyzer.scala     |    2 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |   27 +-
 .../sql-tests/inputs/ansi/group-analytics.sql      |    1 -
 .../sql-tests/results/ansi/group-analytics.sql.out | 1293 
 5 files changed, 14 insertions(+), 1310 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index a647abc..f0e1066 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -255,7 +255,6 @@ The behavior of some SQL functions can be different under ANSI mode (`spark.sql.
 The behavior of some SQL operators can be different under ANSI mode (`spark.sql.ansi.enabled=true`).
   - `array_col[index]`: This operator throws `ArrayIndexOutOfBoundsException` if using invalid indices.
   - `map_col[key]`: This operator throws `NoSuchElementException` if key does not exist in map.
-  - `GROUP BY`: aliases in a select list can not be used in GROUP BY clauses. Each column referenced in a GROUP BY clause shall unambiguously reference a column of the table resulting from the FROM clause.

 ### Useful Functions for ANSI Mode
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 2f0a709..92018eb 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1951,7 +1951,7 @@ class Analyzer(override val catalogManager: CatalogManager)
       // mayResolveAttrByAggregateExprs requires the TreePattern UNRESOLVED_ATTRIBUTE.
       _.containsAllPatterns(AGGREGATE, UNRESOLVED_ATTRIBUTE), ruleId) {
       case agg @ Aggregate(groups, aggs, child)
-          if allowGroupByAlias && child.resolved && aggs.forall(_.resolved) &&
+          if conf.groupByAliases && child.resolved && aggs.forall(_.resolved) &&
             groups.exists(!_.resolved) =>
         agg.copy(groupingExpressions = mayResolveAttrByAggregateExprs(groups, aggs, child))
     }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 555242f..6869977 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -240,17 +240,6 @@ object SQLConf {
     .intConf
     .createWithDefault(100)

-  val ANSI_ENABLED = buildConf("spark.sql.ansi.enabled")
-    .doc("When true, Spark SQL uses an ANSI compliant dialect instead of being Hive compliant. " +
-      "For example, Spark will throw an exception at runtime instead of returning null results " +
-      "when the inputs to a SQL operator/function are invalid." +
-      "For full details of this dialect, you can find them in the section \"ANSI Compliance\" of " +
-      "Spark's documentation. Some ANSI dialect features may be not from the ANSI SQL " +
-      "standard directly, but their behaviors align with ANSI SQL's style")
-    .version("3.0.0")
-    .booleanConf
-    .createWithDefault(false)
-
   val OPTIMIZER_EXCLUDED_RULES = buildConf("spar
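The precedence rule described above — a real table column always beats a select-list alias in a GROUP BY clause — can be sketched in a few lines of plain Python (an illustrative helper with hypothetical names, not Spark's analyzer):

```python
def resolve_group_by(name, table_columns, select_aliases):
    """Resolve a GROUP BY reference the way the revert description says
    Spark does: a table column of that name always wins; the select-list
    alias is only a fallback. (Behavioral sketch, hypothetical names.)"""
    if name in table_columns:
        return ("column", name)
    if name in select_aliases:
        return ("alias", select_aliases[name])
    raise ValueError(f"cannot resolve {name!r}")
```

For example, with `SELECT k + 1 AS k FROM t GROUP BY k`, `resolve_group_by("k", {"k"}, {"k": "k + 1"})` picks the table column `k`, not the alias expression.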
[spark] branch master updated (82a3150 -> 8bfb4f1)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 82a3150  [SPARK-36524][SQL] Common class for ANSI interval types
 add 8bfb4f1  Revert "[SPARK-35028][SQL] ANSI mode: disallow group by aliases"

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                    |    1 -
 .../spark/sql/catalyst/analysis/Analyzer.scala     |    2 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |   27 +-
 .../sql-tests/inputs/ansi/group-analytics.sql      |    1 -
 .../sql-tests/results/ansi/group-analytics.sql.out | 1293 
 5 files changed, 14 insertions(+), 1310 deletions(-)
 delete mode 100644 sql/core/src/test/resources/sql-tests/inputs/ansi/group-analytics.sql
 delete mode 100644 sql/core/src/test/resources/sql-tests/results/ansi/group-analytics.sql.out

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-36379][SQL][3.1] Null at root level of a JSON array should not fail w/ permissive mode
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 32d127d  [SPARK-36379][SQL][3.1] Null at root level of a JSON array should not fail w/ permissive mode
32d127d is described below

commit 32d127de4a4a628276e659bd6a5d572c625ed565
Author: Hyukjin Kwon
AuthorDate: Tue Aug 17 21:10:44 2021 +0900

    [SPARK-36379][SQL][3.1] Null at root level of a JSON array should not fail w/ permissive mode

    This PR backports https://github.com/apache/spark/pull/33608 to branch-3.1

    ### What changes were proposed in this pull request?

    This PR proposes to fail properly so the JSON parser can proceed and parse the input with the permissive mode. Previously, we passed `null`s through as-is, the root `InternalRow`s became `null`s, and that caused the query to fail even with permissive mode on. Now, we fail explicitly if `null` is passed when the input array contains `null`.

    Note that this is consistent with non-array JSON input:

    **Permissive mode:**
    ```scala
    spark.read.json(Seq("""{"a": "str"}""", """null""").toDS).collect()
    ```
    ```
    res0: Array[org.apache.spark.sql.Row] = Array([str], [null])
    ```

    **Failfast mode:**
    ```scala
    spark.read.option("mode", "failfast").json(Seq("""{"a": "str"}""", """null""").toDS).collect()
    ```
    ```
    org.apache.spark.SparkException: Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
      at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:70)
      at org.apache.spark.sql.DataFrameReader.$anonfun$json$7(DataFrameReader.scala:540)
      at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    ```

    ### Why are the changes needed?

    To make the permissive mode proceed and parse without throwing an exception.
    ### Does this PR introduce _any_ user-facing change?

    **Permissive mode:**
    ```scala
    spark.read.json(Seq("""[{"a": "str"}, null]""").toDS).collect()
    ```
    Before:
    ```
    java.lang.NullPointerException
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    ```
    After:
    ```
    res0: Array[org.apache.spark.sql.Row] = Array([null])
    ```

    NOTE that this behaviour is consistent when the JSON object is malformed:
    ```scala
    spark.read.schema("a int").json(Seq("""[{"a": 123}, {123123}, {"a": 123}]""").toDS).collect()
    ```
    ```
    res0: Array[org.apache.spark.sql.Row] = Array([null])
    ```
    Since we're parsing _one_ JSON array, related records all fail together.

    **Failfast mode:**
    ```scala
    spark.read.option("mode", "failfast").json(Seq("""[{"a": "str"}, null]""").toDS).collect()
    ```
    Before:
    ```
    java.lang.NullPointerException
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
    ```
    After:
    ```
    org.apache.spark.SparkException: Malformed records are detected in record parsing. Parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
      at org.apache.spark.sql.catalyst.util.FailureSafeParser.parse(FailureSafeParser.scala:70)
      at org.apache.spark.sql.DataFrameReader.$anonfun$json$7(DataFrameReader.scala:540)
      at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
    ```

    ### How was this patch tested?

    Manually tested, and a unit test was added.

    Closes #33762 from HyukjinKwon/cherry-pick-SPARK-36379.
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 .../org/apache/spark/sql/catalyst/json/JacksonParser.scala |  9 ++---
 .../spark/sql/execution/datasources/json/JsonSuite.scala   | 14 ++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
index bbcf
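The two parse modes contrasted above can be sketched as a tiny failure-safe wrapper in plain Python (a behavioral illustration only, not Spark's `FailureSafeParser`):

```python
def parse_records(records, mode="PERMISSIVE"):
    """Treat a null root-level record as malformed: PERMISSIVE keeps a
    null row and moves on; FAILFAST raises. (Illustrative sketch.)"""
    rows = []
    for rec in records:
        if rec is None:  # null at the root level of the input
            if mode == "FAILFAST":
                raise RuntimeError(
                    "Malformed records are detected in record parsing. "
                    "Parse Mode: FAILFAST."
                )
            rows.append(None)  # permissive: a null row instead of a crash
        else:
            rows.append(rec)
    return rows
```

The point of the fix is exactly this shape: the null is routed through the malformed-record path, so permissive mode yields a null row rather than letting a `NullPointerException` escape from generated code.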
[spark] branch branch-3.2 updated: [SPARK-36524][SQL] Common class for ANSI interval types
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 07c6976  [SPARK-36524][SQL] Common class for ANSI interval types
07c6976 is described below

commit 07c6976f79e418be8aed9bed8e7b396231a27c25
Author: Max Gekk
AuthorDate: Tue Aug 17 12:27:56 2021 +0300

    [SPARK-36524][SQL] Common class for ANSI interval types

    ### What changes were proposed in this pull request?

    Add a new type `AnsiIntervalType` to `AbstractDataType.scala`, and extend it by `YearMonthIntervalType` and by `DayTimeIntervalType`.

    ### Why are the changes needed?

    To improve code maintenance. The change will allow replacing checks of both `YearMonthIntervalType` and `DayTimeIntervalType` by a single check of `AnsiIntervalType`, for instance:
    ```scala
    case _: YearMonthIntervalType | _: DayTimeIntervalType => false
    ```
    by
    ```scala
    case _: AnsiIntervalType => false
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By existing test suites.

    Closes #33753 from MaxGekk/ansi-interval-type-trait.
Authored-by: Max Gekk
Signed-off-by: Max Gekk
(cherry picked from commit 82a31508afffd089048e28276c75b5deb1ada47f)
Signed-off-by: Max Gekk
---
 .../avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala | 2 +-
 .../scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala   | 8 
 .../org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 2 +-
 .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala    | 2 +-
 .../org/apache/spark/sql/catalyst/expressions/arithmetic.scala    | 4 ++--
 .../spark/sql/catalyst/expressions/collectionOperations.scala     | 2 +-
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala      | 2 +-
 .../main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala | 4 ++--
 .../main/scala/org/apache/spark/sql/types/AbstractDataType.scala  | 5 +
 .../scala/org/apache/spark/sql/types/DayTimeIntervalType.scala    | 2 +-
 .../scala/org/apache/spark/sql/types/YearMonthIntervalType.scala  | 2 +-
 .../spark/sql/execution/datasources/csv/CSVFileFormat.scala       | 2 +-
 .../spark/sql/execution/datasources/json/JsonFileFormat.scala     | 2 +-
 .../spark/sql/execution/datasources/orc/OrcFileFormat.scala       | 2 +-
 .../sql/execution/datasources/parquet/ParquetFileFormat.scala     | 2 +-
 .../apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala  | 4 ++--
 .../spark/sql/execution/datasources/v2/json/JsonTable.scala       | 2 +-
 .../apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala  | 2 +-
 .../spark/sql/execution/datasources/v2/parquet/ParquetTable.scala | 2 +-
 .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala    | 2 +-
 .../spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala    | 5 ++---
 .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala  | 2 +-
 22 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
index 68b393e..5b8afe8 100644
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
@@ -71,7 +71,7 @@ private[sql] object AvroUtils extends Logging {
   }

   def supportsDataType(dataType: DataType): Boolean = dataType match {
-    case _: DayTimeIntervalType | _: YearMonthIntervalType => false
+    case _: AnsiIntervalType => false
     case _: AtomicType => true
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 468986d..2f0a709 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -377,9 +377,9 @@ class Analyzer(override val catalogManager: CatalogManager)
           TimestampAddYMInterval(r, l)
         case (CalendarIntervalType, CalendarIntervalType) |
             (_: DayTimeIntervalType, _: DayTimeIntervalType) => a
-        case (_: NullType, _: DayTimeIntervalType | _: YearMonthIntervalType) =>
+        case (_: NullType, _: AnsiIntervalType) =>
          a.copy(left = Cast(a.left, a.right.dataType))
-        case (_: DayTimeIntervalType | _: YearMonthIntervalType, _: NullType) =>
+        case (_: AnsiIntervalType, _: NullType) =>
          a.copy(right = Cast(a.right, a.left.dataType))
        case (DateType, CalendarIntervalType) => DateAddInterval(l, r, ansiEnabled = f)
        case (_, CalendarIntervalType | _: DayTimeInter
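The maintenance win described above — one match arm instead of two — is the classic marker-supertype pattern. A minimal Python analogue (class names mirror the Scala types, but this is an illustration, not Spark code):

```python
class AnsiIntervalType:
    """Common marker class for the two ANSI interval types."""

class YearMonthIntervalType(AnsiIntervalType):
    pass

class DayTimeIntervalType(AnsiIntervalType):
    pass

def supports_data_type(data_type):
    # One isinstance check on the marker class replaces matching both
    # interval types separately, like `case _: AnsiIntervalType => false`
    # in the AvroUtils.supportsDataType diff above.
    return not isinstance(data_type, AnsiIntervalType)
```

Adding a third ANSI interval type would then require no change at any of the 22 call sites, only a new subclass of the marker.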
[spark] branch master updated (ea13c5a -> 82a3150)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ea13c5a  [SPARK-36052][K8S][FOLLOWUP] Update config version to 3.2.0
 add 82a3150  [SPARK-36524][SQL] Common class for ANSI interval types

No new revisions were added by this update.

Summary of changes:
 .../avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala | 2 +-
 .../scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala   | 8 
 .../org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala | 2 +-
 .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala    | 2 +-
 .../org/apache/spark/sql/catalyst/expressions/arithmetic.scala    | 4 ++--
 .../spark/sql/catalyst/expressions/collectionOperations.scala     | 2 +-
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala      | 2 +-
 .../main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala | 4 ++--
 .../main/scala/org/apache/spark/sql/types/AbstractDataType.scala  | 5 +
 .../scala/org/apache/spark/sql/types/DayTimeIntervalType.scala    | 2 +-
 .../scala/org/apache/spark/sql/types/YearMonthIntervalType.scala  | 2 +-
 .../spark/sql/execution/datasources/csv/CSVFileFormat.scala       | 2 +-
 .../spark/sql/execution/datasources/json/JsonFileFormat.scala     | 2 +-
 .../spark/sql/execution/datasources/orc/OrcFileFormat.scala       | 2 +-
 .../sql/execution/datasources/parquet/ParquetFileFormat.scala     | 2 +-
 .../apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala  | 4 ++--
 .../spark/sql/execution/datasources/v2/json/JsonTable.scala       | 2 +-
 .../apache/spark/sql/execution/datasources/v2/orc/OrcTable.scala  | 2 +-
 .../spark/sql/execution/datasources/v2/parquet/ParquetTable.scala | 2 +-
 .../sql/hive/thriftserver/SparkExecuteStatementOperation.scala    | 2 +-
 .../spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala    | 5 ++---
 .../main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala  | 2 +-
 22 files changed, 33 insertions(+), 29 deletions(-)