[spark] branch master updated (4aeddb8 -> 13ddc91)

2021-09-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4aeddb8  [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class
 add 13ddc91  [SPARK-36435][PYTHON] Implement MultiIndex.equal_levels

No new revisions were added by this update.

Summary of changes:
 .../source/reference/pyspark.pandas/indexing.rst   |  1 +
 python/pyspark/pandas/indexes/multi.py | 37 +-
 python/pyspark/pandas/missing/indexes.py   |  1 -
 python/pyspark/pandas/tests/indexes/test_base.py   | 35 
 4 files changed, 72 insertions(+), 2 deletions(-)




[spark] branch master updated: [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class

2021-09-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4aeddb8  [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class
4aeddb8 is described below

commit 4aeddb81d1cc9c464d5dddbd4d54dda4836a6078
Author: Karen Feng 
AuthorDate: Fri Oct 1 13:46:45 2021 +0900

[SPARK-36870][SQL] Introduce INTERNAL_ERROR error class

### What changes were proposed in this pull request?

Adds the `INTERNAL_ERROR` error class and the `isInternalError` API to 
`SparkThrowable`.
Removes existing error classes that are internal-only and replaces them 
with `INTERNAL_ERROR`.

### Why are the changes needed?

Makes it easy for end-users to diagnose whether an error is an internal 
error. If an end-user encounters an internal error, it should be reported 
immediately. This also limits the number of error classes, making it easy to 
audit. We do not need high-quality error messages for internal errors, as they 
should not be exposed to the end-user.
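
    For illustration, a caller-side sketch of the new API (the `runAndReport` helper and its handling policy are hypothetical, not part of this patch; only the `SparkThrowable` methods described above are assumed):
    ```
    import org.apache.spark.SparkThrowable

    // Illustrative helper: run an action and report any Spark error by its class.
    def runAndReport(action: => Unit): Unit =
      try action catch {
        case e: SparkThrowable if e.isInternalError =>
          // Internal errors are bugs in Spark itself and should be reported upstream.
          System.err.println(s"Internal Spark error, please report it: $e")
        case e: SparkThrowable =>
          // Other errors carry a stable error class and, where defined, a SQLSTATE.
          System.err.println(s"Error class: ${e.getErrorClass}, SQLSTATE: ${e.getSqlState}")
      }
    ```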

### Does this PR introduce _any_ user-facing change?

Yes; this changes the error class in master.

### How was this patch tested?

Unit tests

Closes #34123 from karenfeng/internal-error-class.

Authored-by: Karen Feng 
Signed-off-by: Hyukjin Kwon 
---
 .../main/java/org/apache/spark/SparkThrowable.java |  9 +++-
 .../apache/spark/memory/SparkOutOfMemoryError.java |  4 
 core/src/main/resources/error/README.md| 19 
 core/src/main/resources/error/error-classes.json   | 19 +++-
 .../main/scala/org/apache/spark/ErrorInfo.scala|  4 
 .../scala/org/apache/spark/SparkException.scala| 20 -
 .../org/apache/spark/SparkThrowableSuite.scala | 17 +++
 .../org/apache/spark/sql/AnalysisException.scala   |  1 -
 .../spark/sql/errors/QueryExecutionErrors.scala| 25 --
 9 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/SparkThrowable.java b/core/src/main/java/org/apache/spark/SparkThrowable.java
index 31a9ab0..2be0c3c 100644
--- a/core/src/main/java/org/apache/spark/SparkThrowable.java
+++ b/core/src/main/java/org/apache/spark/SparkThrowable.java
@@ -38,5 +38,12 @@ public interface SparkThrowable {
 
   // Portable error identifier across SQL engines
   // If null, error class or SQLSTATE is not set
-  String getSqlState();
+  default String getSqlState() {
+return SparkThrowableHelper.getSqlState(this.getErrorClass());
+  }
+
+  // True if this error is an internal error.
+  default boolean isInternalError() {
+return SparkThrowableHelper.isInternalError(this.getErrorClass());
+  }
 }
diff --git a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
index bf7984b..88eada3 100644
--- a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
+++ b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
@@ -47,8 +47,4 @@ public final class SparkOutOfMemoryError extends OutOfMemoryError implements SparkThrowable {
 public String getErrorClass() {
 return errorClass;
 }
-
-public String getSqlState() {
-return SparkThrowableHelper.getSqlState(errorClass);
-}
 }
diff --git a/core/src/main/resources/error/README.md b/core/src/main/resources/error/README.md
index c4ab6bb..f58eb6d 100644
--- a/core/src/main/resources/error/README.md
+++ b/core/src/main/resources/error/README.md
@@ -5,13 +5,16 @@ and message parameters rather than an arbitrary error message.
 
 ## Usage
 
-1. Check if an appropriate error class already exists in `error-class.json`.
-   If true, skip to step 3. Otherwise, continue to step 2.
-2. Add a new class to `error-class.json`; keep in mind the invariants below.
-3. Check if the exception type already extends `SparkThrowable`.
-   If true, skip to step 5. Otherwise, continue to step 4.
-4. Mix `SparkThrowable` into the exception.
-5. Throw the exception with the error class and message parameters.
+1. Check if the error is an internal error.
+   Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations.
+   If true, use the error class `INTERNAL_ERROR` and skip to step 4.
+2. Check if an appropriate error class already exists in `error-class.json`.
+   If true, use the error class and skip to step 4.
+3. Add a new class to `error-class.json`; keep in mind the invariants below.
+4. Check if the exception type already extends `SparkThrowable`.
+   If true, skip to step 6.
+5. Mix `SparkThrowable` into the exception.
+6. Throw the exception with the error class and message parameters.
 
 ### Before
 
@@ -37,8 +40,6 @@ Throw with 

[spark] branch master updated: [SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources

2021-09-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c15580  [SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources
7c15580 is described below

commit 7c155806ed6e1a2488d4ec2fa8da620318fea8dd
Author: Kousuke Saruta 
AuthorDate: Thu Sep 30 20:22:06 2021 +0300

[SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources

### What changes were proposed in this pull request?

This PR aims to support reading and writing ANSI intervals from/to JSON datasources.
With this change, an interval value is written in literal form, for example `{"col":"INTERVAL '1-2' YEAR TO MONTH"}`.
For the reading part, we need to specify the schema explicitly like:
```
val readDF = spark.read.schema("col INTERVAL YEAR TO MONTH").json(...)
```
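
For illustration, a complete round trip might look like the following sketch (assuming a running SparkSession `spark`; the output path is arbitrary):
```
// Write: the year-month interval column is stored in literal form,
// e.g. {"col":"INTERVAL '1-2' YEAR TO MONTH"}.
val df = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS col")
df.write.mode("overwrite").json("/tmp/interval-json")

// Read: interval types are not inferred from the JSON text,
// so the schema must be supplied explicitly.
val readDF = spark.read.schema("col INTERVAL YEAR TO MONTH").json("/tmp/interval-json")
```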

### Why are the changes needed?

For better usability. There should be no reason to prohibit reading/writing ANSI intervals from/to JSON datasources.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test. It covers both V1 and V2 sources.

Closes #34155 from sarutak/ansi-interval-json-source.

Authored-by: Kousuke Saruta 
Signed-off-by: Max Gekk 
---
 .../sql/execution/datasources/DataSource.scala  |  2 +-
 .../execution/datasources/json/JsonFileFormat.scala |  2 --
 .../execution/datasources/v2/json/JsonTable.scala   |  2 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala  | 19 ---
 .../datasources/CommonFileDataSourceSuite.scala |  2 +-
 .../sql/execution/datasources/json/JsonSuite.scala  | 21 -
 6 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index be9a912..32913c6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -581,7 +581,7 @@ case class DataSource(
 
  // TODO: Remove the Set below once all the built-in datasources support ANSI interval types
  private val writeAllowedSources: Set[Class[_]] =
-Set(classOf[ParquetFileFormat], classOf[CSVFileFormat])
+Set(classOf[ParquetFileFormat], classOf[CSVFileFormat], classOf[JsonFileFormat])
 
   private def disallowWritingIntervals(
   dataTypes: Seq[DataType],
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
index 8357a41..9c6c77a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
@@ -134,8 +134,6 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
   override def equals(other: Any): Boolean = other.isInstanceOf[JsonFileFormat]
 
   override def supportDataType(dataType: DataType): Boolean = dataType match {
-case _: AnsiIntervalType => false
-
 case _: AtomicType => true
 
 case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
index 244dd0f..5216800 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
@@ -55,8 +55,6 @@ case class JsonTable(
 }
 
   override def supportsDataType(dataType: DataType): Boolean = dataType match {
-case _: AnsiIntervalType => false
-
 case _: AtomicType => true
 
 case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 22e3e33..7b2c0bb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1445,32 +1445,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlanHelper {
 
 val ymDF = sql("select interval 3 years -3 month")
 checkAnswer(ymDF, Row(Period.of(2, 9, 0)))
-withTempPath(f => {
-  val e = intercept[AnalysisException] {
-ymDF.write.json(f.getCanonicalPath)
-  }
-  

[spark] branch branch-3.2 updated: [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

2021-09-30 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 8b2b6bb  [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window
8b2b6bb is described below

commit 8b2b6bb0d3d26c5d1d121136b5916e5aeac1ade9
Author: Kousuke Saruta 
AuthorDate: Thu Sep 30 16:51:12 2021 +0900

[SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

### What changes were proposed in this pull request?

This PR adds the PySpark API document of `session_window`.
The docstring of the function doesn't comply with the numpydoc format, so this PR also fixes it.
Further, the API document of `window` doesn't have a `Parameters` section, so one is added in this PR as well.
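
    For reference, a minimal sketch of the API being documented, written against the equivalent Scala `session_window` function (the sample data and column names are illustrative; a SparkSession `spark` is assumed to be in scope):
    ```
    import org.apache.spark.sql.functions.{session_window, sum}
    import spark.implicits._

    // Toy input: one event per row, with a string timestamp cast to TimestampType.
    val events = Seq(("2016-03-11 09:00:07", 1)).toDF("date", "val")
      .selectExpr("CAST(date AS timestamp) AS date", "val")

    // Rows whose timestamps fall within a 5-second gap of each other share a session;
    // the resulting 'session_window' column is a struct with 'start' and 'end'.
    val sessions = events
      .groupBy(session_window($"date", "5 seconds"))
      .agg(sum("val").alias("sum"))
    ```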

### Why are the changes needed?

To provide PySpark users with the API document of the newly added function.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran `make html` in `python/docs` and got the following docs.

[window]

![time-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963797-ce25b268-20ca-48e3-ac8d-cbcbd85ebb3e.png)

[session_window]

![session-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963853-dd9d8417-139b-41ee-9924-14544b1a91af.png)

Closes #34118 from sarutak/python-session-window-doc.

Authored-by: Kousuke Saruta 
Signed-off-by: Jungtaek Lim 
(cherry picked from commit 5a32e41e9c992c6f08a48454d783e7cd97c971fc)
Signed-off-by: Jungtaek Lim 
---
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/functions.py  | 35 
 2 files changed, 36 insertions(+)

diff --git a/python/docs/source/reference/pyspark.sql.rst b/python/docs/source/reference/pyspark.sql.rst
index 13c489b..ca4a95a 100644
--- a/python/docs/source/reference/pyspark.sql.rst
+++ b/python/docs/source/reference/pyspark.sql.rst
@@ -497,6 +497,7 @@ Functions
 second
 sentences
 sequence
+session_window
 sha1
 sha2
 shiftleft
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index fa96ea6..c7bc581 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2300,6 +2300,29 @@ def window(timeColumn, windowDuration, slideDuration=None, startTime=None):
 
 .. versionadded:: 2.0.0
 
+Parameters
+--
+timeColumn : :class:`~pyspark.sql.Column`
+The column or the expression to use as the timestamp for windowing by time.
+The time column must be of TimestampType.
+windowDuration : str
+A string specifying the width of the window, e.g. `10 minutes`,
+`1 second`. Check `org.apache.spark.unsafe.types.CalendarInterval` for
+valid duration identifiers. Note that the duration is a fixed length of
+time, and does not vary over time according to a calendar. For example,
+`1 day` always means 86,400,000 milliseconds, not a calendar day.
+slideDuration : str, optional
+A new window will be generated every `slideDuration`. Must be less than
+or equal to the `windowDuration`. Check
+`org.apache.spark.unsafe.types.CalendarInterval` for valid duration
+identifiers. This duration is likewise absolute, and does not vary
+according to a calendar.
+startTime : str, optional
+The offset with respect to 1970-01-01 00:00:00 UTC with which to start
+window intervals. For example, in order to have hourly tumbling windows that
+start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
+`startTime` as `15 minutes`.
+
 Examples
 
 >>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")
@@ -2347,7 +2370,19 @@ def session_window(timeColumn, gapDuration):
 input row.
 The output column will be a struct called 'session_window' by default with the nested columns
 'start' and 'end', where 'start' and 'end' will be of :class:`pyspark.sql.types.TimestampType`.
+
 .. versionadded:: 3.2.0
+
+Parameters
+--
+timeColumn : :class:`~pyspark.sql.Column`
+The column or the expression to use as the timestamp for windowing by time.
+The time column must be of TimestampType.
+gapDuration : :class:`~pyspark.sql.Column` or str
+A column or string specifying the timeout of the session. It could be static value,
+e.g. `10 minutes`, `1 second`, or an expression/UDF that specifies gap
+duration dynamically based on the input row.
+
 Examples
 
 >>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")


[spark] branch master updated (17e3ca6 -> 5a32e41)

2021-09-30 Thread kabhwan
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 17e3ca6  [SPARK-36899][R] Support ILIKE API on R
 add 5a32e41  [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/functions.py  | 35 
 2 files changed, 36 insertions(+)
