[spark] branch master updated (d50d464 -> cd4476f)
This is an automated email from the ASF dual-hosted git repository. sarutak pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from d50d464 [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend
add cd4476f [SPARK-37469][WEBUI] unified shuffle read block time to shuffle read fetch wait time in StagePage

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/ui/static/stagepage.js | 16
 .../spark/ui/static/stagespage-template.html | 2 +-
 .../resources/org/apache/spark/ui/static/webui.css | 4 ++--
 .../org/apache/spark/status/AppStatusStore.scala | 2 +-
 .../scala/org/apache/spark/status/storeTypes.scala | 5 +++--
 .../main/scala/org/apache/spark/ui/ToolTips.scala | 2 +-
 .../scala/org/apache/spark/ui/jobs/StagePage.scala | 9 +
 .../spark/ui/jobs/TaskDetailsClassNames.scala | 2 +-
 docs/img/AllStagesPageDetail6.png | Bin 106909 -> 163423 bytes
 docs/web-ui.md | 2 +-
 10 files changed, 23 insertions(+), 21 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d50d464 [SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend d50d464 is described below

commit d50d464357847a2d858259926f6ff48cf5ad25a6
Author: Angerszh
AuthorDate: Tue Dec 7 12:41:35 2021 +0800

[SPARK-37555][SQL] spark-sql should pass last unclosed comment to backend

### What changes were proposed in this pull request?
In the current spark-sql CLI, if the input ends inside an unclosed comment, the SQL is not passed to the backend engine and is silently ignored. As a result, a user who writes SQL with a malformed comment gets no exception at all — the statement is simply dropped. For example:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>
```
After this PR:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query: Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* This is a comment without end symbol SELECT 1;
^^^
```
This behavior was introduced by SPARK-33100 in https://github.com/apache/spark/pull/29982. Related Hive code: https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

### Why are the changes needed?
Exact exceptions are now thrown for malformed statements, which makes troubleshooting easier for users.

### Does this PR introduce _any_ user-facing change?
Yes. If a user writes a malformed comment at the end of a SQL file or query, it was previously ignored because it was not recognized as a statement. Now it is passed to the backend engine, and if the statement is invalid, a SQL exception is thrown.

### How was this patch tested?
Added unit tests and tested by hand.
``` spark-sql> /* SELECT /*+ HINT() 4; */; Error in query: mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26) == SQL == /* SELECT /*+ HINT() 4; */; --^^^ spark-sql> /* SELECT /*+ HINT() 4; */ > SELECT 1; 1 Time taken: 3.16 seconds, Fetched 1 row(s) spark-sql> /* SELECT /*+ HINT() */ 4; */; spark-sql> > ; spark-sql> > /* SELECT /*+ HINT() 4\\; > SELECT 1; Error in query: Unclosed bracketed comment(line 1, pos 0) == SQL == /* SELECT /*+ HINT() 4\\; ^^^ SELECT 1; spark-sql> ``` Closes #34815 from AngersZh/SPARK-37555. Authored-by: Angerszh Signed-off-by: Wenchen Fan --- .../spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 2 +- .../org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 13 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala index 6b5b412..3c4c4dd 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala @@ -613,7 +613,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging { isStatement = statementInProgress(index) } -if (isStatement) { +if (beginIndex < line.length()) { ret.add(line.substring(beginIndex)) } ret diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala 
b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala index b404d77..11e6578 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala @@ -620,4 +620,17 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { |""".stripMargin -> "SELECT 1" ) } + + test("SPARK-37555: spark-sql should pass last unclosed comment to backend") { +runCliWithin(2.minute)( + // Only unclosed comment. + "/* SELECT /*+ HINT() 4; */;".stripMargin -> "mismatched input ';'", + // Unclosed nested
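The fix above hinges on detecting that input ends inside an unclosed bracketed comment. A minimal Python sketch of that idea — not Spark's actual Scala implementation in `SparkSQLCLIDriver`, and deliberately ignoring string literals and `--` line comments — assuming nested `/* */` comments, which Spark's parser allows:

```python
def ends_in_open_comment(sql: str) -> bool:
    """Return True if `sql` ends inside an unclosed /* ... */ comment.

    Simplified sketch: tracks the nesting depth of bracketed comments
    while scanning two characters at a time; a positive depth at the
    end of the input means the last comment was never closed.
    """
    depth = 0
    i = 0
    while i < len(sql) - 1:
        pair = sql[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            i += 1
    return depth > 0

print(ends_in_open_comment("/* This is a comment without end symbol SELECT 1;"))  # True
print(ends_in_open_comment("/* a /* nested */ comment */ SELECT 1;"))             # False
```

Before the patch, trailing text flagged this way was dropped because it was not counted as a statement; after the patch the CLI forwards it to the backend, where the parser reports `Unclosed bracketed comment`.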
[spark] branch master updated: [SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 41a940f [SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted 41a940f is described below

commit 41a940f0713b3ecc2c4ca8be7630331fb4f5
Author: Cheng Su
AuthorDate: Tue Dec 7 12:37:19 2021 +0800

[SPARK-37557][SQL] Replace object hash with sort aggregate if child is already sorted

### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/34702#discussion_r762743589, where it was noted that object hash aggregate can be replaced with sort aggregate as well. This PR handles object hash aggregate.

### Why are the changes needed?
Increases the coverage of the rule by handling object hash aggregate as well.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Modified the unit tests in `ReplaceHashWithSortAggSuite.scala` to cover object hash aggregate (using the aggregate expression `COLLECT_LIST`).

Closes #34824 from c21/agg-rule-followup.
Authored-by: Cheng Su Signed-off-by: Wenchen Fan --- .../sql/execution/ReplaceHashWithSortAgg.scala | 40 +--- .../execution/aggregate/BaseAggregateExec.scala| 10 ++ .../execution/aggregate/HashAggregateExec.scala| 9 -- .../execution/ReplaceHashWithSortAggSuite.scala| 104 +++-- 4 files changed, 94 insertions(+), 69 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala index 63ad2d0..4495bc9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala @@ -20,17 +20,18 @@ package org.apache.spark.sql.execution import org.apache.spark.sql.catalyst.expressions.SortOrder import org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Final, Partial} import org.apache.spark.sql.catalyst.rules.Rule -import org.apache.spark.sql.execution.aggregate.HashAggregateExec +import org.apache.spark.sql.execution.aggregate.{BaseAggregateExec, HashAggregateExec, ObjectHashAggregateExec} import org.apache.spark.sql.internal.SQLConf /** - * Replace [[HashAggregateExec]] with [[SortAggregateExec]] in the spark plan if: + * Replace hash-based aggregate with sort aggregate in the spark plan if: * - * 1. The plan is a pair of partial and final [[HashAggregateExec]], and the child of partial - *aggregate satisfies the sort order of corresponding [[SortAggregateExec]]. + * 1. The plan is a pair of partial and final [[HashAggregateExec]] or [[ObjectHashAggregateExec]], + *and the child of partial aggregate satisfies the sort order of corresponding + *[[SortAggregateExec]]. * or - * 2. The plan is a [[HashAggregateExec]], and the child satisfies the sort order of - *corresponding [[SortAggregateExec]]. + * 2. 
The plan is a [[HashAggregateExec]] or [[ObjectHashAggregateExec]], and the child satisfies + *the sort order of corresponding [[SortAggregateExec]]. * * Examples: * 1. aggregate after join: @@ -47,9 +48,9 @@ import org.apache.spark.sql.internal.SQLConf * | => | * Sort(t1.i)Sort(t1.i) * - * [[HashAggregateExec]] can be replaced when its child satisfies the sort order of - * corresponding [[SortAggregateExec]]. [[SortAggregateExec]] is faster in the sense that - * it does not have hashing overhead of [[HashAggregateExec]]. + * Hash-based aggregate can be replaced when its child satisfies the sort order of + * corresponding sort aggregate. Sort aggregate is faster in the sense that + * it does not have hashing overhead of hash aggregate. */ object ReplaceHashWithSortAgg extends Rule[SparkPlan] { def apply(plan: SparkPlan): SparkPlan = { @@ -61,14 +62,15 @@ object ReplaceHashWithSortAgg extends Rule[SparkPlan] { } /** - * Replace [[HashAggregateExec]] with [[SortAggregateExec]]. + * Replace [[HashAggregateExec]] and [[ObjectHashAggregateExec]] with [[SortAggregateExec]]. */ private def replaceHashAgg(plan: SparkPlan): SparkPlan = { plan.transformDown { - case hashAgg: HashAggregateExec if hashAgg.groupingExpressions.nonEmpty => + case hashAgg: BaseAggregateExec if isHashBasedAggWithKeys(hashAgg) => val sortAgg = hashAgg.toSortAggregate hashAgg.child match { - case partialAgg: HashAggregateExec if isPartialAgg(partialAgg, hashAgg) => + case partialAgg: BaseAggregateExec +if
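At the heart of the rule is a check that the child's existing output ordering already satisfies the sort order the corresponding sort aggregate would require, so the hash-based aggregate can be swapped without inserting an extra sort. A simplified Python sketch of that prefix check — a stand-in for Spark's `SortOrder.orderingSatisfies`, with `(column, direction)` tuples as an illustrative representation rather than Catalyst's:

```python
def ordering_satisfies(child_ordering, required_ordering):
    """True if the required sort order is a prefix of the child's
    existing output ordering (same keys, same direction).

    Each ordering is a list of (column, direction) tuples; this is a
    deliberate simplification of Spark's semantic-equality check on
    SortOrder expressions.
    """
    if not required_ordering:
        return True
    if len(required_ordering) > len(child_ordering):
        return False
    # zip truncates to the required prefix of the child's ordering.
    return all(c == r for c, r in zip(child_ordering, required_ordering))

# If the child of an aggregate grouped on t1.i is already Sort(t1.i)
# (e.g. the output of a sort-merge join), the rule may swap in a sort
# aggregate; an unsorted or differently-sorted child keeps the hash agg.
print(ordering_satisfies([("t1.i", "ASC")], [("t1.i", "ASC")]))  # True
print(ordering_satisfies([("t1.j", "ASC")], [("t1.i", "ASC")]))  # False
```

Sort aggregate wins here precisely because it avoids the hashing overhead while the required sort is already paid for by the child.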
[spark] branch master updated (bde47c8 -> 116255d)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from bde47c8 [SPARK-37546][SQL] V2 ReplaceTableAsSelect command should qualify location
add 116255d [SPARK-37506][CORE][SQL][DSTREAM][GRAPHX][ML][MLLIB][SS][EXAMPLES] Change the never changed 'var' to 'val'

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/deploy/ClientArguments.scala | 2 +-
 .../main/scala/org/apache/spark/rdd/CoalescedRDD.scala | 2 +-
 .../main/scala/org/apache/spark/status/LiveEntity.scala | 3 +--
 .../spark/storage/ShuffleBlockFetcherIterator.scala | 2 +-
 .../scala/org/apache/spark/deploy/SparkSubmitSuite.scala | 2 +-
 .../org/apache/spark/deploy/SparkSubmitUtilsSuite.scala | 2 +-
 .../org/apache/spark/scheduler/TaskSetManagerSuite.scala | 2 +-
 .../org/apache/spark/examples/MiniReadWriteTest.scala | 8
 .../spark/sql/kafka010/KafkaOffsetReaderAdmin.scala | 2 +-
 .../spark/sql/kafka010/KafkaOffsetReaderConsumer.scala | 2 +-
 .../org/apache/spark/graphx/util/GraphGenerators.scala | 2 +-
 .../scala/org/apache/spark/ml/linalg/BLASBenchmark.scala | 16
 .../scala/org/apache/spark/mllib/feature/Word2Vec.scala | 2 +-
 .../org/apache/spark/ml/feature/InteractionSuite.scala | 4 ++--
 .../org/apache/spark/ml/recommendation/ALSSuite.scala | 2 +-
 .../src/main/scala/org/apache/spark/sql/Row.scala | 2 +-
 .../spark/sql/catalyst/expressions/jsonExpressions.scala | 2 +-
 .../apache/spark/sql/catalyst/util/DateTimeUtils.scala | 2 +-
 .../columnar/compression/compressionSchemes.scala | 2 +-
 .../execution/datasources/BasicWriteStatsTracker.scala | 2 +-
 .../test/scala/org/apache/spark/sql/SubquerySuite.scala | 2 +-
 .../datasources/parquet/ParquetColumnIndexSuite.scala | 2 +-
 .../sql/streaming/test/DataStreamReaderWriterSuite.scala | 4 ++--
 .../apache/spark/sql/hive/client/HiveClientImpl.scala | 2 +-
 .../spark/streaming/ReceivedBlockTrackerSuite.scala | 4 ++--
 25 files changed, 38 insertions(+), 39 deletions(-)
[spark] branch master updated (0b959b5 -> bde47c8)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function
add bde47c8 [SPARK-37546][SQL] V2 ReplaceTableAsSelect command should qualify location

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/DataSourceV2Strategy.scala | 25 +++---
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 17 +++
 2 files changed, 29 insertions(+), 13 deletions(-)
[spark] branch master updated (72669b5 -> 0b959b5)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3
add 0b959b5 [SPARK-37552][SQL] Add the `convert_timezone()` function

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala | 1 +
 .../catalyst/expressions/datetimeExpressions.scala | 53 ++
 .../spark/sql/catalyst/util/DateTimeUtils.scala | 17 +++
 .../expressions/DateExpressionsSuite.scala | 40
 .../sql/catalyst/util/DateTimeUtilsSuite.scala | 24 ++
 .../sql-functions/sql-expression-schema.md | 3 +-
 .../resources/sql-tests/inputs/timestamp-ntz.sql | 2 +
 .../sql-tests/results/timestamp-ntz.sql.out | 10 +++-
 8 files changed, 148 insertions(+), 2 deletions(-)
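The `convert_timezone()` function added above rebases a timestamp without time zone from a source zone to a target zone. Its intended semantics can be sketched with Python's standard `zoneinfo` module — an illustration of the behavior, not Spark's implementation, and the `(source_tz, target_tz, timestamp)` argument order here is an assumption inferred from the function name:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def convert_timezone(source_tz: str, target_tz: str, ts: datetime) -> datetime:
    """Interpret the naive timestamp `ts` as wall-clock time in `source_tz`,
    convert it to `target_tz`, and return it naive again — conceptually
    TIMESTAMP_NTZ in, TIMESTAMP_NTZ out."""
    aware = ts.replace(tzinfo=ZoneInfo(source_tz))
    return aware.astimezone(ZoneInfo(target_tz)).replace(tzinfo=None)

# Midnight in Los Angeles is 08:00 in UTC (PST is UTC-8 in December).
print(convert_timezone("America/Los_Angeles", "UTC", datetime(2021, 12, 6, 0, 0)))
# 2021-12-06 08:00:00
```

The key design point is that the value carries no zone of its own, so the source zone must be supplied explicitly rather than taken from the session time zone.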
[GitHub] [spark-website] HyukjinKwon closed pull request #372: Regenerate PySpark documentation for Spark 3.2.0
HyukjinKwon closed pull request #372: URL: https://github.com/apache/spark-website/pull/372 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark-website] HyukjinKwon commented on pull request #372: Regenerate PySpark documentation for Spark 3.2.0
HyukjinKwon commented on pull request #372: URL: https://github.com/apache/spark-website/pull/372#issuecomment-986556653 Thanks. Merged.
[spark] branch branch-3.2 updated: [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 9a17d8b [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 9a17d8b is described below commit 9a17d8b8657a7bb9eadb8e297ea75c8ca19ed988 Author: Hyukjin Kwon AuthorDate: Mon Dec 6 17:33:43 2021 +0900 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3 This PR upgrades Py4J from 0.10.9.2 to 0.10.9.3, which contains the bug fix (https://github.com/bartdag/py4j/pull/440) that directly affected us. For example, once you cancel a cell in Jupyter, all following cells simply fail. This PR fixes the bug by upgrading Py4J. To fix a regression in Spark 3.2.0 in notebooks like Jupyter. Fixes a regression described in SPARK-37004. Manually tested the fix when I landed https://github.com/bartdag/py4j/pull/440 to Py4J. Closes #34814 from HyukjinKwon/SPARK-37004. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon (cherry picked from commit 72669b574ecbcfd35873aaf751807c90bb415c8f) Signed-off-by: Hyukjin Kwon --- bin/pyspark | 2 +- bin/pyspark2.cmd| 2 +- core/pom.xml| 2 +- .../org/apache/spark/api/python/PythonUtils.scala | 2 +- dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +- dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +- python/docs/Makefile| 2 +- python/docs/make2.bat | 2 +- python/docs/source/getting_started/install.rst | 2 +- ...{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} | Bin 41839 -> 42021 bytes python/setup.py | 2 +- sbin/spark-config.sh| 2 +- 12 files changed, 11 insertions(+), 11 deletions(-) diff --git a/bin/pyspark b/bin/pyspark index 38ebe51..4840589 100755 --- a/bin/pyspark +++ b/bin/pyspark @@ -50,7 +50,7 @@ export PYSPARK_DRIVER_PYTHON_OPTS # Add the PySpark classes to the Python path: export PYTHONPATH="${SPARK_HOME}/python/:$PYTHONPATH" -export PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.2-src.zip:$PYTHONPATH" +export
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.9.3-src.zip:$PYTHONPATH" # Load the PySpark shell.py script when ./pyspark is used interactively: export OLD_PYTHONSTARTUP="$PYTHONSTARTUP" diff --git a/bin/pyspark2.cmd b/bin/pyspark2.cmd index f5f9fad..a19627a 100644 --- a/bin/pyspark2.cmd +++ b/bin/pyspark2.cmd @@ -30,7 +30,7 @@ if "x%PYSPARK_DRIVER_PYTHON%"=="x" ( ) set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% -set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.2-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.10.9.3-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py diff --git a/core/pom.xml b/core/pom.xml index 2229a95..936ab7f 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -433,7 +433,7 @@ net.sf.py4j py4j - 0.10.9.2 + 0.10.9.3 org.apache.spark diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index 549edc4..8daba86 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -27,7 +27,7 @@ import org.apache.spark.SparkContext import org.apache.spark.api.java.{JavaRDD, JavaSparkContext} private[spark] object PythonUtils { - val PY4J_ZIP_NAME = "py4j-0.10.9.2-src.zip" + val PY4J_ZIP_NAME = "py4j-0.10.9.3-src.zip" /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */ def sparkPythonPath: String = { diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 index ae774b3..909a77c 100644 --- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3 @@ -208,7 +208,7 @@ parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar protobuf-java/2.5.0//protobuf-java-2.5.0.jar 
-py4j/0.10.9.2//py4j-0.10.9.2.jar +py4j/0.10.9.3//py4j-0.10.9.3.jar pyrolite/4.30//pyrolite-4.30.jar rocksdbjni/6.20.3//rocksdbjni-6.20.3.jar scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 index a02f318..79d730c 100644 --- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3 @@ -179,7 +179,7 @@
[spark] branch master updated (4f36978 -> 72669b5)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
add 72669b5 [SPARK-37004][PYTHON] Upgrade to Py4J 0.10.9.3

No new revisions were added by this update.

Summary of changes:
 bin/pyspark | 2 +-
 bin/pyspark2.cmd | 2 +-
 core/pom.xml | 2 +-
 .../org/apache/spark/api/python/PythonUtils.scala | 2 +-
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 python/docs/Makefile | 2 +-
 python/docs/make2.bat | 2 +-
 python/docs/source/getting_started/install.rst | 2 +-
 ...{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} | Bin 41839 -> 42021 bytes
 python/setup.py | 2 +-
 sbin/spark-config.sh | 2 +-
 12 files changed, 11 insertions(+), 11 deletions(-)
 rename python/lib/{py4j-0.10.9.2-src.zip => py4j-0.10.9.3-src.zip} (55%)
[spark] branch master updated: [SPARK-37360][SQL] Support TimestampNTZ in JSON data source
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4f36978 [SPARK-37360][SQL] Support TimestampNTZ in JSON data source 4f36978 is described below

commit 4f369789bd5d6cc81a85fe01a37e0ae90cbdeb6c
Author: Ivan Sadikov
AuthorDate: Mon Dec 6 13:24:46 2021 +0500

[SPARK-37360][SQL] Support TimestampNTZ in JSON data source

### What changes were proposed in this pull request?
This PR adds support for the TimestampNTZ type in the JSON data source. Most of the functionality had already been added; this patch verifies that writes and reads work for the TimestampNTZ type and adds schema inference based on the format of the timestamp values written. The following rules apply:
- If there is a mixture of `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values, use `TIMESTAMP_LTZ`.
- If there are only `TIMESTAMP_NTZ` values, resolve using the default timestamp type configured with `spark.sql.timestampType`.

In addition, I introduced a new JSON option `timestampNTZFormat`, which is similar to `timestampFormat` but allows configuring the read/write pattern for `TIMESTAMP_NTZ` types. It is essentially the timestamp pattern without the time zone component.

### Why are the changes needed?
The PR fixes issues when writing and reading TimestampNTZ to and from JSON.

### Does this PR introduce _any_ user-facing change?
Previously, the JSON data source would infer timestamp values as `TimestampType` when reading a JSON file. Now, the data source infers the timestamp type based on the value format (with or without a time zone) and the default timestamp type configured by `spark.sql.timestampType`. A new JSON option `timestampNTZFormat` is added to control how values are formatted during writes and parsed during reads.

### How was this patch tested?
I extended `JsonSuite` with a few unit tests to verify that write-read roundtrip works for `TIMESTAMP_NTZ` and `TIMESTAMP_LTZ` values. Closes #34638 from sadikovi/timestamp-ntz-support-json. Authored-by: Ivan Sadikov Signed-off-by: Max Gekk --- docs/sql-data-sources-json.md | 10 +- .../spark/sql/catalyst/json/JSONOptions.scala | 9 +- .../spark/sql/catalyst/json/JacksonGenerator.scala | 2 +- .../spark/sql/catalyst/json/JacksonParser.scala| 4 +- .../spark/sql/catalyst/json/JsonInferSchema.scala | 12 ++ .../sql/execution/datasources/json/JsonSuite.scala | 194 - 6 files changed, 216 insertions(+), 15 deletions(-) diff --git a/docs/sql-data-sources-json.md b/docs/sql-data-sources-json.md index 5e3bd2b..b5f27aa 100644 --- a/docs/sql-data-sources-json.md +++ b/docs/sql-data-sources-json.md @@ -9,9 +9,9 @@ license: | The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -197,6 +197,12 @@ Data source options of JSON can be set via: read/write +timestampNTZFormat +-MM-dd'T'HH:mm:ss[.SSS] +Sets the string that indicates a timestamp without timezone format. Custom date formats follow the formats at https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html;>Datetime Patterns. This applies to timestamp without timezone type, note that zone-offset and time-zone components are not supported when writing or reading this data type. +read/write + + multiLine false Parse one record, which may span multiple lines, per file. 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala index 029c014..e801912 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala @@ -106,6 +106,10 @@ private[sql] class JSONOptions( s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS][XXX]" }) + val timestampNTZFormatInRead: Option[String] = parameters.get("timestampNTZFormat") + val timestampNTZFormatInWrite: String = +parameters.getOrElse("timestampNTZFormat", s"${DateFormatter.defaultPattern}'T'HH:mm:ss[.SSS]") + val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false) /** @@ -138,8 +142,9 @@ private[sql] class JSONOptions( val pretty: Boolean =
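The schema-inference rules described in the commit message (any `TIMESTAMP_LTZ` value forces `TIMESTAMP_LTZ`; all-`TIMESTAMP_NTZ` columns fall back to the session default) amount to a simple least-upper-bound over per-value types. A minimal Python sketch of that merge, using plain strings as hypothetical stand-ins for Spark's Catalyst types:

```python
def merge_timestamp_types(inferred, default_type="TIMESTAMP_LTZ"):
    """Combine timestamp types inferred from individual JSON values
    into one column type.

    - Any TIMESTAMP_LTZ value (a zone offset was present in the text)
      forces the whole column to TIMESTAMP_LTZ.
    - Otherwise the column resolves to the session default, i.e. the
      type configured by spark.sql.timestampType, passed here as
      `default_type`.
    """
    if any(t == "TIMESTAMP_LTZ" for t in inferred):
        return "TIMESTAMP_LTZ"
    return default_type

print(merge_timestamp_types(["TIMESTAMP_NTZ", "TIMESTAMP_LTZ"]))  # TIMESTAMP_LTZ
print(merge_timestamp_types(["TIMESTAMP_NTZ"], default_type="TIMESTAMP_NTZ"))  # TIMESTAMP_NTZ
```

This mirrors why mixed columns stay `TIMESTAMP_LTZ`: a value with an explicit offset cannot be represented losslessly as a timestamp without time zone, while the reverse widening is safe.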
[spark] branch master updated (eb2eb9e -> 66b256e)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

from eb2eb9e [SPARK-37550][SQL][DOCS] Add an example of parsing jsonStr with complex types for from_json
add 66b256e [SPARK-37540][SQL] Detect more unsupported time travel

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala | 17 +
 .../sql/catalyst/analysis/CTESubstitution.scala | 9 +++--
 .../sql/catalyst/analysis/RelationTimeTravel.scala | 2 ++
 .../spark/sql/catalyst/trees/TreePatterns.scala | 1 +
 .../spark/sql/errors/QueryCompilationErrors.scala | 4 +--
 .../spark/sql/execution/datasources/rules.scala | 42 ++
 .../datasources/v2/V2SessionCatalog.scala | 3 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 10 ++
 .../spark/sql/execution/SQLViewTestSuite.scala | 4 +--
 9 files changed, 62 insertions(+), 30 deletions(-)