[spark] branch master updated (d50d464 -> cd4476f)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from d50d464 [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend
add cd4476f [SPARK-37469][WEBUI] unified shuffle read block time to shuffle read fetch wait time in StagePage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ui/static/stagepage.js | 16
 .../spark/ui/static/stagespage-template.html | 2 +-
 .../resources/org/apache/spark/ui/static/webui.css | 4 ++--
 .../org/apache/spark/status/AppStatusStore.scala | 2 +-
 .../scala/org/apache/spark/status/storeTypes.scala | 5 +++--
 .../main/scala/org/apache/spark/ui/ToolTips.scala | 2 +-
 .../scala/org/apache/spark/ui/jobs/StagePage.scala | 9 +
 .../spark/ui/jobs/TaskDetailsClassNames.scala | 2 +-
 docs/img/AllStagesPageDetail6.png | Bin 106909 -> 163423 bytes
 docs/web-ui.md | 2 +-
 10 files changed, 23 insertions(+), 21 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d50d464 [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend d50d464 is described below

commit d50d464357847a2d858259926f6ff48cf5ad25a6
Author: Angerszh
AuthorDate: Tue Dec 7 12:41:35 2021 +0800

[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend

### What changes were proposed in this pull request?
In the current spark-sql CLI, if the input ends inside an unclosed comment, the SQL is not passed to the backend engine and is silently ignored. As a result, a user who writes SQL with a malformed comment gets no exception at all — the statement is simply dropped. For example:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>
```
After this PR:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query: Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* This is a comment without end symbol SELECT 1;
^^^
```
This behavior was introduced by SPARK-33100 in https://github.com/apache/spark/pull/29982. Related Hive code: https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

### Why are the changes needed?
Exact exceptions are now thrown for malformed statements, which makes troubleshooting easier for users.

### Does this PR introduce _any_ user-facing change?
Yes. If a user writes a malformed comment at the end of a SQL file or query, it was previously ignored because it was not recognized as a statement. Now it is passed to the backend engine, and if the statement is invalid, a SQL exception is thrown.

### How was this patch tested?
Added unit tests and tested by hand.
``` spark-sql> /* SELECT /*+ HINT() 4; */; Error in query: mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26) == SQL == /* SELECT /*+ HINT() 4; */; --^^^ spark-sql> /* SELECT /*+ HINT() 4; */ > SELECT 1; 1 Time taken: 3.16 seconds, Fetched 1 row(s) spark-sql> /* SELECT /*+ HINT() */ 4; */; spark-sql> > ; spark-sql> > /* SELECT /*+ HINT() 4\\; > SELECT 1; Error in query: Unclosed bracketed comment(line 1, pos 0) == SQL == /* SELECT /*+ HINT() 4\\; ^^^ SELECT 1; spark-sql> ``` Closes #34815 from AngersZh/SPARK-37555. Authored-by: Angerszh Signed-off-by: Wenchen Fan --- .../spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 2 +- .../org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 13 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala index 6b5b412..3c4c4dd 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala @@ -613,7 +613,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { isStatement = statementInProgress(index) } -if (isStatement) { +if (beginIndex < line.length()) { ret.add(line.substring(beginIndex)) } ret diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala index b404d77..11e6578 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala @@ -620,4 +620,17 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { |""".stripMargin -> "SELECT 1" ) } + + test("SPARK-37555: spark-sql should pass last unclosed comment to backend") { +runCliWithin(2.minute)( + // Only unclosed comment. + "/* SELECT /*+ HINT() 4; */;".stripMargin -> "mismatched input ';'", + // Unclosed nested
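The fix above hinges on detecting that input ends inside an unclosed bracketed comment. A minimal Python sketch of that idea — not Spark's actual Scala implementation in `SparkSQLCLIDriver`, and deliberately ignoring string literals and `--` line comments — assuming nested `/* */` comments, which Spark's parser allows:

```python
def ends_in_open_comment(sql: str) -> bool:
    """Return True if `sql` ends inside an unclosed /* ... */ comment.

    Simplified sketch: tracks the nesting depth of bracketed comments
    while scanning two characters at a time; a positive depth at the
    end of the input means the last comment was never closed.
    """
    depth = 0
    i = 0
    while i < len(sql) - 1:
        pair = sql[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return depth > 0

print(ends_in_open_comment("/* This is a comment without end symbol SELECT 1;"))  # True
print(ends_in_open_comment("/* a /* nested */ comment */ SELECT 1;"))             # False
```

Before the patch, trailing text flagged this way was dropped because it was not counted as a statement; after the patch the CLI forwards it to the backend, where the parser reports `Unclosed bracketed comment`.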
[spark] branch master updated: [SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 41a940f [SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted 41a940f is described below

commit 41a940f0713b3ecc2c4ca8be7630331fb4f5
Author: Cheng Su
AuthorDate: Tue Dec 7 12:37:19 2021 +0800

[SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted

### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/34702#discussion_r762743589, where it was noted that object hash aggregate can be replaced with sort aggregate as well. This PR handles object hash aggregate.

### Why are the changes needed?
Increases the coverage of the rule by handling object hash aggregate as well.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Modified the unit tests in `ReplaceHashWithSortAggSuite.scala` to cover object hash aggregate (using the aggregate expression `COLLECT_LIST`).

Closes #34824 from c21/agg-rule-followup.
Authored-by: Cheng Su Signed-off-by: Wenchen Fan --- .../sql/execution/ReplaceHashWithSortAgg.scala | 40 +--- .../execution/aggregate/BaseAggregateExec.scala| 10 ++ .../execution/aggregate/HashAggregateExec.scala| 9 -- .../execution/ReplaceHashWithSortAggSuite.scala| 104 +++-- 4 files changed, 94 insertions(+), 69 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala index 63ad2d0..4495bc9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala @@ -20,17 +20,18 @@ package org.apache.spark.sql.execution import org.apache.spark.sql.catalyst.expressions.SortOrder import org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Final, Partial} import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.execution.aggregate.HashAggregateExec +import org.apache.spark.sql.execution.aggregate.{BaseAggregateExec, HashAggregateExec, ObjectHashAggregateExec} import org.apache.spark.sql.internal.SQLConf /** - * Replace [[HashAggregateExec]] with [[SortAggregateExec]] in the spark plan if: + * Replace hash-based aggregate with sort aggregate in the spark plan if: * - * 1. The plan is a pair of partial and final [[HashAggregateExec]], and the child of partial - *aggregate satisfies the sort order of corresponding [[SortAggregateExec]]. + * 1. The plan is a pair of partial and final [[HashAggregateExec]] or [[ObjectHashAggregateExec]], + *and the child of partial aggregate satisfies the sort order of corresponding + *[[SortAggregateExec]]. * or - * 2. The plan is a [[HashAggregateExec]], and the child satisfies the sort order of - *corresponding [[SortAggregateExec]]. + * 2. 
The plan is a [[HashAggregateExec]] or [[ObjectHashAggregateExec]], and the child satisfies + *the sort order of corresponding [[SortAggregateExec]]. * * Examples: * 1. aggregate after join: @@ -47,9 +48,9 @@ import org.apache.spark.sql.internal.SQLConf * | => | * Sort(t1.i)Sort(t1.i) * - * [[HashAggregateExec]] can be replaced when its child satisfies the sort order of - * corresponding [[SortAggregateExec]]. [[SortAggregateExec]] is faster in the sense that - * it does not have hashing overhead of [[HashAggregateExec]]. + * Hash-based aggregate can be replaced when its child satisfies the sort order of + * corresponding sort aggregate. Sort aggregate is faster in the sense that + * it does not have hashing overhead of hash aggregate. */ object ReplaceHashWithSortAgg extends Rule[SparkPlan] { def apply(plan: SparkPlan): SparkPlan = { @@ -61,14 +62,15 @@ object ReplaceHashWithSortAgg extends Rule[SparkPlan] { } /** - * Replace [[HashAggregateExec]] with [[SortAggregateExec]]. + * Replace [[HashAggregateExec]] and [[ObjectHashAggregateExec]] with [[SortAggregateExec]]. */ private def replaceHashAgg(plan: SparkPlan): SparkPlan = { plan.transformDown { - case hashAgg: HashAggregateExec if hashAgg.groupingExpressions.nonEmpty => + case hashAgg: BaseAggregateExec if isHashBasedAggWithKeys(hashAgg) => val sortAgg = hashAgg.toSortAggregate hashAgg.child match { - case partialAgg: HashAggregateExec if isPartialAgg(partialAgg, hashAgg) => + case partialAgg: BaseAggregateExec +if
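At the heart of the rule is a check that the child's existing output ordering already satisfies the sort order the corresponding sort aggregate would require, so the hash-based aggregate can be swapped without inserting an extra sort. A simplified Python sketch of that prefix check — a stand-in for Spark's `SortOrder.orderingSatisfies`, with `(column, direction)` tuples as an illustrative representation rather than Catalyst's:

```python
def ordering_satisfies(child_ordering, required_ordering):
    """True if the required sort order is a prefix of the child's
    existing output ordering (same keys, same direction).

    Each ordering is a list of (column, direction) tuples; this is a
    deliberate simplification of Spark's semantic-equality check on
    SortOrder expressions.
    """
    if not required_ordering:
        return True
    if len(required_ordering) > len(child_ordering):
        return False
    # zip truncates to the required prefix of the child's ordering.
    return all(c == r for c, r in zip(child_ordering, required_ordering))

# If the child of an aggregate grouped on t1.i is already Sort(t1.i)
# (e.g. the output of a sort-merge join), the rule may swap in a sort
# aggregate; an unsorted or differently-sorted child keeps the hash agg.
print(ordering_satisfies([("t1.i", "ASC")], [("t1.i", "ASC")]))  # True
print(ordering_satisfies([("t1.j", "ASC")], [("t1.i", "ASC")]))  # False
```

Sort aggregate wins here precisely because it avoids the hashing overhead while the required sort is already paid for by the child.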
[spark] branch master updated (bde47c8 -> 116255d)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from bde47c8 [SPARK-37546][SQL] V2 ReplaceTableAsSelect command should qualify location
add 116255d [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/deploy/ClientArguments.scala | 2 +-
 .../main/scala/org/apache/spark/rdd/CoalescedRDD.scala | 2 +-
 .../main/scala/org/apache/spark/status/LiveEntity.scala | 3 +--
 .../spark/storage/ShuffleBlockFetcherIterator.scala | 2 +-
 .../scala/org/apache/spark/deploy/SparkSubmitSuite.scala | 2 +-
 .../org/apache/spark/deploy/SparkSubmitUtilsSuite.scala | 2 +-
 .../org/apache/spark/scheduler/TaskSetManagerSuite.scala | 2 +-
 .../org/apache/spark/examples/MiniReadWriteTest.scala | 8
 .../spark/sql/kafka010/KafkaOffsetReaderAdmin.scala | 2 +-
 .../spark/sql/kafka010/KafkaOffsetReaderConsumer.scala | 2 +-
 .../org/apache/spark/graphx/util/GraphGenerators.scala | 2 +-
 .../scala/org/apache/spark/ml/linalg/BLASBenchmark.scala | 16
 .../scala/org/apache/spark/mllib/feature/Word2Vec.scala | 2 +-
 .../org/apache/spark/ml/feature/InteractionSuite.scala | 4 ++--
 .../org/apache/spark/ml/recommendation/ALSSuite.scala | 2 +-
 .../src/main/scala/org/apache/spark/sql/Row.scala | 2 +-
 .../spark/sql/catalyst/expressions/jsonExpressions.scala | 2 +-
 .../apache/spark/sql/catalyst/util/DateTimeUtils.scala | 2 +-
 .../columnar/compression/compressionSchemes.scala | 2 +-
 .../execution/datasources/BasicWriteStatsTracker.scala | 2 +-
 .../test/scala/org/apache/spark/sql/SubquerySuite.scala | 2 +-
 .../datasources/parquet/ParquetColumnIndexSuite.scala | 2 +-
 .../sql/streaming/test/DataStreamReaderWriterSuite.scala | 4 ++--
 .../apache/spark/sql/hive/client/HiveClientImpl.scala | 2 +-
 .../spark/streaming/ReceivedBlockTrackerSuite.scala | 4 ++--
 25 files changed, 38 insertions(+), 39 deletions(-)
[spark] branch master updated (0b959b5 -> bde47c8)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function
add bde47c8 [SPARK-37546][SQL] V2 ReplaceTableAsSelect command should qualify location

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/DataSourceV2Strategy.scala | 25 +++---
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 17 +++
 2 files changed, 29 insertions(+), 13 deletions(-)
[spark] branch master updated (72669b5 -> 0b959b5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3
add 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala | 1 +
 .../catalyst/expressions/datetimeExpressions.scala | 53 ++
 .../spark/sql/catalyst/util/DateTimeUtils.scala | 17 +++
 .../expressions/DateExpressionsSuite.scala | 40
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 24 ++
 .../sql-functions/sql-expression-schema.md | 3 +-
 .../resources/sql-tests/inputs/timestamp-ntz.sql | 2 +
 .../sql-tests/results/timestamp-ntz.sql.out | 10 +++-
 8 files changed, 148 insertions(+), 2 deletions(-)
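The `convert_timezone()` function added above rebases a timestamp without time zone from a source zone to a target zone. Its intended semantics can be sketched with Python's standard `zoneinfo` module — an illustration of the behavior, not Spark's implementation, and the `(source_tz, target_tz, timestamp)` argument order here is an assumption inferred from the function name:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_timezone(source_tz: str, target_tz: str, ts: datetime) -> datetime:
    """Interpret the naive timestamp `ts` as wall-clock time in `source_tz`,
    convert it to `target_tz`, and return it naive again — conceptually
    TIMESTAMP_NTZ in, TIMESTAMP_NTZ out."""
    aware = ts.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(ZoneInfo(target_tz)).replace(tzinfo=None)

# Midnight in Los Angeles is 08:00 in UTC (PST is UTC-8 in December).
print(convert_timezone("America/Los_Angeles", "UTC", datetime(2021, 12, 6, 0, 0)))
# 2021-12-06 08:00:00
```

The key design point is that the value carries no zone of its own, so the source zone must be supplied explicitly rather than taken from the session time zone.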
[GitHub] [spark-website] HyukjinKwon closed pull request #372: Regenerate PySpark documentation for Spark 3.2.0
HyukjinKwon closed pull request #372: URL: https://github.com/apache/spark-website/pull/372 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] HyukjinKwon commented on pull request #372: Regenerate PySpark documentation for Spark 3.2.0
HyukjinKwon commented on pull request #372: URL: https://github.com/apache/spark-website/pull/372#issuecomment-986556653 Thanks. Merged.
[spark] branch branch-3.2 updated: [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 9a17d8b [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 9a17d8b is described below commit 9a17d8b8657a7bb9eadb8e297ea75c8ca19ed988 Author: Hyukjin Kwon AuthorDate: Mon Dec 6 17:33:43 2021 +0900 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 This PR upgrades Py4J from 0.10.9.2 to 0.10.9.3, which contains the bug fix (https://github.com/bartdag/py4j/pull/440) that directly affected us. For example, once you cancel a cell in Jupyter, all following cells simply fail. This PR fixes the bug by upgrading Py4J. To fix a regression in Spark 3.2.0 in notebooks like Jupyter. Fixes a regression described in SPARK-37004. Manually tested the fix when I landed https://github.com/bartdag/py4j/pull/440 to Py4J. Closes #34814 from HyukjinKwon/SPARK-37004. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 72669b574ecbcfd35873aaf751807c90bb415c8f) Signed-off-by: Hyukjin Kwon --- bin/pyspark | 2 +- bin/pyspark2.cmd| 2 +- core/pom.xml| 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +- python/docs/Makefile| 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- ...{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} | Bin 41839 -> 42021 bytes python/setup.py | 2 +- sbin/spark-config.sh| 2 +- 12 files changed, 11 insertions(+), 11 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 38ebe51..4840589 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.2-src.zip:$PYTHONPATH" +export
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index f5f9fad..a19627a 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.2-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 2229a95..936ab7f 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -433,7 +433,7 @@ net.sf.py4j py4j - 0.10.9.2 + 0.10.9.3 org.apache.spark diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index 549edc4..8daba86 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.2-src.zip" + val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ def sparkPythonPath: String = { diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 index ae774b3..909a77c 100644 --- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 @@ -208,7 +208,7 @@ parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar protobuf-java/2.5.0//protobuf-java-2.5.0.jar 
-py4j/0.10.9.2//py4j-0.10.9.2.jar +py4j/0.10.9.3//py4j-0.10.9.3.jar pyrolite/4.30//pyrolite-4.30.jar rocksdbjni/6.20.3//rocksdbjni-6.20.3.jar scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index a02f318..79d730c 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -179,7 +179,7 @@
[spark] branch master updated (4f36978 -> 72669b5)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
add 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3

No new revisions were added by this update.

Summary of changes:
 bin/pyspark | 2 +-
 bin/pyspark2.cmd | 2 +-
 core/pom.xml | 2 +-
 .../org/apache/spark/api/python/PythonUtils.scala | 2 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 python/docs/Makefile | 2 +-
 python/docs/make2.bat | 2 +-
 python/docs/source/getting_started/install.rst | 2 +-
 ...{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} | Bin 41839 -> 42021 bytes
 python/setup.py | 2 +-
 sbin/spark-config.sh | 2 +-
 12 files changed, 11 insertions(+), 11 deletions(-)
 rename python/lib/{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} (55%)
[spark] branch master updated: [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source 4f36978 is described below

commit 4f369789bd5d6cc81a85fe01a37e0ae90cbdeb6c
Author: Ivan Sadikov
AuthorDate: Mon Dec 6 13:24:46 2021 +0500

[SPARK-37360][SQL] Support TimestampNTZ in JSON data source

### What changes were proposed in this pull request?
This PR adds support for the TimestampNTZ type in the JSON data source. Most of the functionality had already been added; this patch verifies that writes and reads work for the TimestampNTZ type and adds schema inference based on the format of the timestamp values written. The following rules apply:
- If there is a mixture of `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values, use `TIMESTAMP_LTZ`.
- If there are only `TIMESTAMP_NTZ` values, resolve using the default timestamp type configured with `spark.sql.timestampType`.

In addition, I introduced a new JSON option `timestampNTZFormat`, which is similar to `timestampFormat` but allows configuring the read/write pattern for `TIMESTAMP_NTZ` types. It is essentially the timestamp pattern without the time zone component.

### Why are the changes needed?
The PR fixes issues when writing and reading TimestampNTZ to and from JSON.

### Does this PR introduce _any_ user-facing change?
Previously, the JSON data source would infer timestamp values as `TimestampType` when reading a JSON file. Now, the data source infers the timestamp type based on the value format (with or without a time zone) and the default timestamp type configured by `spark.sql.timestampType`. A new JSON option `timestampNTZFormat` is added to control how values are formatted during writes and parsed during reads.

### How was this patch tested?
I extended `JsonSuite` with a few unit tests to verify that write-read roundtrip works for `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values. Closes #34638 from sadikovi/timestamp-ntz-support-json. Authored-by: Ivan Sadikov Signed-off-by: Max Gekk --- docs/sql-data-sources-json.md | 10 +- .../spark/sql/catalyst/json/JSONOptions.scala | 9 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 2 +- .../spark/sql/catalyst/json/JacksonParser.scala| 4 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 12 ++ .../sql/execution/datasources/json/JsonSuite.scala | 194 - 6 files changed, 216 insertions(+), 15 deletions(-) diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md index 5e3bd2b..b5f27aa 100644 --- a/docs/sql-data-sources-json.md +++ b/docs/sql-data-sources-json.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -197,6 +197,12 @@ Data source options of JSON can be set via: read/write +timestampNTZFormat +-MM-dd'T'HH:mm:ss[.SSS] +Sets the string that indicates a timestamp without timezone format. Custom date formats follow the formats at https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html;>Datetime Patterns. This applies to timestamp without timezone type, note that zone-offset and time-zone components are not supported when writing or reading this data type. +read/write + + multiLine false Parse one record, which may span multiple lines, per file. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala index 029c014..e801912 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala @@ -106,6 +106,10 @@ private[sql] class JSONOptions( s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS][XXX]" }) + val timestampNTZFormatInRead: Option[String] = parameters.get("timestampNTZFormat") + val timestampNTZFormatInWrite: String = +parameters.getOrElse("timestampNTZFormat", s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS]") + val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false) /** @@ -138,8 +142,9 @@ private[sql] class JSONOptions( val pretty: Boolean =
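The schema-inference rules described in the commit message (any `TIMESTAMP_LTZ` value forces `TIMESTAMP_LTZ`; all-`TIMESTAMP_NTZ` columns fall back to the session default) amount to a simple least-upper-bound over per-value types. A minimal Python sketch of that merge, using plain strings as hypothetical stand-ins for Spark's Catalyst types:

```python
def merge_timestamp_types(inferred, default_type="TIMESTAMP_LTZ"):
    """Combine timestamp types inferred from individual JSON values
    into one column type.

    - Any TIMESTAMP_LTZ value (a zone offset was present in the text)
      forces the whole column to TIMESTAMP_LTZ.
    - Otherwise the column resolves to the session default, i.e. the
      type configured by spark.sql.timestampType, passed here as
      `default_type`.
    """
    if any(t == "TIMESTAMP_LTZ" for t in inferred):
        return "TIMESTAMP_LTZ"
    return default_type

print(merge_timestamp_types(["TIMESTAMP_NTZ", "TIMESTAMP_LTZ"]))  # TIMESTAMP_LTZ
print(merge_timestamp_types(["TIMESTAMP_NTZ"], default_type="TIMESTAMP_NTZ"))  # TIMESTAMP_NTZ
```

This mirrors why mixed columns stay `TIMESTAMP_LTZ`: a value with an explicit offset cannot be represented losslessly as a timestamp without time zone, while the reverse widening is safe.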
[spark] branch master updated (eb2eb9e -> 66b256e)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from eb2eb9e [SPARK-37550][SQL][DOCS] Add an example of parsing jsonStr with complex types for from_json
add 66b256e [SPARK-37540][SQL] Detect more unsupported time travel

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 17 +
 .../sql/catalyst/analysis/CTESubstitution.scala | 9 +++--
 .../sql/catalyst/analysis/RelationTimeTravel.scala | 2 ++
 .../spark/sql/catalyst/trees/TreePatterns.scala | 1 +
 .../spark/sql/errors/QueryCompilationErrors.scala | 4 +--
 .../spark/sql/execution/datasources/rules.scala | 42 ++
 .../datasources/v2/V2SessionCatalog.scala | 3 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 10 ++
 .../spark/sql/execution/SQLViewTestSuite.scala | 4 +--
 9 files changed, 62 insertions(+), 30 deletions(-)