[spark] branch master updated (4aeddb8 -> 13ddc91)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4aeddb8  [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class
     add 13ddc91  [SPARK-36435][PYTHON] Implement MultiIndex.equal_levels

No new revisions were added by this update.

Summary of changes:
 .../source/reference/pyspark.pandas/indexing.rst |  1 +
 python/pyspark/pandas/indexes/multi.py           | 37 +-
 python/pyspark/pandas/missing/indexes.py         |  1 -
 python/pyspark/pandas/tests/indexes/test_base.py | 35
 4 files changed, 72 insertions(+), 2 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
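For context, `MultiIndex.equal_levels` mirrors the pandas API of the same name: it reports whether two MultiIndexes are built from the same set of levels, regardless of how the individual label pairs are ordered. A minimal pandas-on-Spark sketch (illustrative only, not taken from the commit):

```python
# Illustrative sketch of the newly added MultiIndex.equal_levels
# (assumes a running SparkSession; values are made up for the example).
import pyspark.pandas as ps

midx1 = ps.MultiIndex.from_tuples([("a", 1), ("b", 2)])
midx2 = ps.MultiIndex.from_tuples([("b", 2), ("a", 1)])

# True: both indexes share the levels [['a', 'b'], [1, 2]],
# even though the label pairs appear in a different order.
print(midx1.equal_levels(midx2))
```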
[spark] branch master updated: [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4aeddb8  [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class
4aeddb8 is described below

commit 4aeddb81d1cc9c464d5dddbd4d54dda4836a6078
Author: Karen Feng
AuthorDate: Fri Oct 1 13:46:45 2021 +0900

    [SPARK-36870][SQL] Introduce INTERNAL_ERROR error class

    ### What changes were proposed in this pull request?

    Adds the `INTERNAL_ERROR` error class and the `isInternalError` API to `SparkThrowable`.
    Removes existing error classes that are internal-only and replaces them with `INTERNAL_ERROR`.

    ### Why are the changes needed?

    Makes it easy for end-users to diagnose whether an error is an internal error. If an end-user encounters an internal error, it should be reported immediately. This also limits the number of error classes, making it easy to audit. We do not need high-quality error messages for internal errors, as they should not be exposed to the end-user.

    ### Does this PR introduce _any_ user-facing change?

    Yes; this changes the error class in master.

    ### How was this patch tested?

    Unit tests

    Closes #34123 from karenfeng/internal-error-class.

    Authored-by: Karen Feng
    Signed-off-by: Hyukjin Kwon
---
 .../main/java/org/apache/spark/SparkThrowable.java |  9 +++-
 .../apache/spark/memory/SparkOutOfMemoryError.java |  4
 core/src/main/resources/error/README.md            | 19
 core/src/main/resources/error/error-classes.json   | 19 +++-
 .../main/scala/org/apache/spark/ErrorInfo.scala    |  4
 .../scala/org/apache/spark/SparkException.scala    | 20 -
 .../org/apache/spark/SparkThrowableSuite.scala     | 17 +++
 .../org/apache/spark/sql/AnalysisException.scala   |  1 -
 .../spark/sql/errors/QueryExecutionErrors.scala    | 25 --
 9 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/SparkThrowable.java b/core/src/main/java/org/apache/spark/SparkThrowable.java
index 31a9ab0..2be0c3c 100644
--- a/core/src/main/java/org/apache/spark/SparkThrowable.java
+++ b/core/src/main/java/org/apache/spark/SparkThrowable.java
@@ -38,5 +38,12 @@ public interface SparkThrowable {
   // Portable error identifier across SQL engines
   // If null, error class or SQLSTATE is not set
-  String getSqlState();
+  default String getSqlState() {
+    return SparkThrowableHelper.getSqlState(this.getErrorClass());
+  }
+
+  // True if this error is an internal error.
+  default boolean isInternalError() {
+    return SparkThrowableHelper.isInternalError(this.getErrorClass());
+  }
 }
diff --git a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
index bf7984b..88eada3 100644
--- a/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
+++ b/core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java
@@ -47,8 +47,4 @@ public final class SparkOutOfMemoryError extends OutOfMemoryError implements Spa
   public String getErrorClass() {
     return errorClass;
   }
-
-  public String getSqlState() {
-    return SparkThrowableHelper.getSqlState(errorClass);
-  }
 }
diff --git a/core/src/main/resources/error/README.md b/core/src/main/resources/error/README.md
index c4ab6bb..f58eb6d 100644
--- a/core/src/main/resources/error/README.md
+++ b/core/src/main/resources/error/README.md
@@ -5,13 +5,16 @@ and message parameters rather than an arbitrary error message.

 ## Usage

-1. Check if an appropriate error class already exists in `error-class.json`.
-   If true, skip to step 3. Otherwise, continue to step 2.
-2. Add a new class to `error-class.json`; keep in mind the invariants below.
-3. Check if the exception type already extends `SparkThrowable`.
-   If true, skip to step 5. Otherwise, continue to step 4.
-4. Mix `SparkThrowable` into the exception.
-5. Throw the exception with the error class and message parameters.
+1. Check if the error is an internal error.
+   Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations.
+   If true, use the error class `INTERNAL_ERROR` and skip to step 4.
+2. Check if an appropriate error class already exists in `error-class.json`.
+   If true, use the error class and skip to step 4.
+3. Add a new class to `error-class.json`; keep in mind the invariants below.
+4. Check if the exception type already extends `SparkThrowable`.
+   If true, skip to step 6.
+5. Mix `SparkThrowable` into the exception.
+6. Throw the exception with the error class and message parameters.

 ### Before

@@ -37,8 +40,6 @@ Throw with
[spark] branch master updated: [SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7c15580  [SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources
7c15580 is described below

commit 7c155806ed6e1a2488d4ec2fa8da620318fea8dd
Author: Kousuke Saruta
AuthorDate: Thu Sep 30 20:22:06 2021 +0300

    [SPARK-36830][SQL] Support reading and writing ANSI intervals from/to JSON datasources

    ### What changes were proposed in this pull request?

    This PR aims to support reading and writing ANSI intervals from/to JSON datasources. With this change, interval data is written in a literal form like `{"col":"INTERVAL '1-2' YEAR TO MONTH"}`. For the reading part, we need to specify the schema explicitly like:
    ```
    val readDF = spark.read.schema("col INTERVAL YEAR TO MONTH").json(...)
    ```

    ### Why are the changes needed?

    For better usability. There should be no reason to prohibit reading/writing ANSI intervals from/to JSON datasources.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New test. It covers both V1 and V2 sources.

    Closes #34155 from sarutak/ansi-interval-json-source.

    Authored-by: Kousuke Saruta
    Signed-off-by: Max Gekk
---
 .../sql/execution/datasources/DataSource.scala      |  2 +-
 .../execution/datasources/json/JsonFileFormat.scala |  2 --
 .../execution/datasources/v2/json/JsonTable.scala   |  2 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala  | 19 ---
 .../datasources/CommonFileDataSourceSuite.scala     |  2 +-
 .../sql/execution/datasources/json/JsonSuite.scala  | 21 -
 6 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index be9a912..32913c6 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -581,7 +581,7 @@ case class DataSource(
   // TODO: Remove the Set below once all the built-in datasources support ANSI interval types
   private val writeAllowedSources: Set[Class[_]] =
-    Set(classOf[ParquetFileFormat], classOf[CSVFileFormat])
+    Set(classOf[ParquetFileFormat], classOf[CSVFileFormat], classOf[JsonFileFormat])

   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
index 8357a41..9c6c77a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
@@ -134,8 +134,6 @@ class JsonFileFormat extends TextBasedFileFormat with DataSourceRegister {
   override def equals(other: Any): Boolean = other.isInstanceOf[JsonFileFormat]

   override def supportDataType(dataType: DataType): Boolean = dataType match {
-    case _: AnsiIntervalType => false
-
     case _: AtomicType => true

     case st: StructType => st.forall { f => supportDataType(f.dataType) }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
index 244dd0f..5216800 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonTable.scala
@@ -55,8 +55,6 @@ case class JsonTable(
   }

   override def supportsDataType(dataType: DataType): Boolean = dataType match {
-    case _: AnsiIntervalType => false
-
     case _: AtomicType => true

     case st: StructType => st.forall { f => supportsDataType(f.dataType) }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 22e3e33..7b2c0bb 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1445,32 +1445,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
     val ymDF = sql("select interval 3 years -3 month")
     checkAnswer(ymDF, Row(Period.of(2, 9, 0)))
-    withTempPath(f => {
-      val e = intercept[AnalysisException] {
-        ymDF.write.json(f.getCanonicalPath)
-      }
-
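A rough PySpark equivalent of the Scala snippet in the commit message, for readers who want to try the new behavior; the path is illustrative and this code is not part of the patch (it assumes a Spark build that includes this change):

```python
# Write a year-month interval through the JSON datasource, then read it back.
# Illustrative sketch only; requires a Spark version containing SPARK-36830.
path = "/tmp/ansi-interval-json"  # hypothetical scratch location

df = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS col")
df.write.mode("overwrite").json(path)  # stored as {"col":"INTERVAL '1-2' YEAR TO MONTH"}

# JSON carries no native interval type, so the schema must be supplied on read.
read_df = spark.read.schema("col INTERVAL YEAR TO MONTH").json(path)
read_df.show(truncate=False)
```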
[spark] branch branch-3.2 updated: [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 8b2b6bb  [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window
8b2b6bb is described below

commit 8b2b6bb0d3d26c5d1d121136b5916e5aeac1ade9
Author: Kousuke Saruta
AuthorDate: Thu Sep 30 16:51:12 2021 +0900

    [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

    ### What changes were proposed in this pull request?

    This PR adds the PySpark API document of `session_window`. The docstring of the function doesn't comply with the numpydoc format, so this PR also fixes it. Further, the API document of `window` doesn't have a `Parameters` section, so it is also added in this PR.

    ### Why are the changes needed?

    To provide PySpark users with the API document of the newly added function.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    `make html` in `python/docs` and got the following docs.

    [window]
    ![time-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963797-ce25b268-20ca-48e3-ac8d-cbcbd85ebb3e.png)

    [session_window]
    ![session-window-python-doc-after](https://user-images.githubusercontent.com/4736016/134963853-dd9d8417-139b-41ee-9924-14544b1a91af.png)

    Closes #34118 from sarutak/python-session-window-doc.

    Authored-by: Kousuke Saruta
    Signed-off-by: Jungtaek Lim
    (cherry picked from commit 5a32e41e9c992c6f08a48454d783e7cd97c971fc)
    Signed-off-by: Jungtaek Lim
---
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/functions.py              | 35
 2 files changed, 36 insertions(+)

diff --git a/python/docs/source/reference/pyspark.sql.rst b/python/docs/source/reference/pyspark.sql.rst
index 13c489b..ca4a95a 100644
--- a/python/docs/source/reference/pyspark.sql.rst
+++ b/python/docs/source/reference/pyspark.sql.rst
@@ -497,6 +497,7 @@ Functions
     second
     sentences
     sequence
+    session_window
     sha1
     sha2
     shiftleft
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index fa96ea6..c7bc581 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2300,6 +2300,29 @@ def window(timeColumn, windowDuration, slideDuration=None, startTime=None):

     .. versionadded:: 2.0.0

+    Parameters
+    ----------
+    timeColumn : :class:`~pyspark.sql.Column`
+        The column or the expression to use as the timestamp for windowing by time.
+        The time column must be of TimestampType.
+    windowDuration : str
+        A string specifying the width of the window, e.g. `10 minutes`,
+        `1 second`. Check `org.apache.spark.unsafe.types.CalendarInterval` for
+        valid duration identifiers. Note that the duration is a fixed length of
+        time, and does not vary over time according to a calendar. For example,
+        `1 day` always means 86,400,000 milliseconds, not a calendar day.
+    slideDuration : str, optional
+        A new window will be generated every `slideDuration`. Must be less than
+        or equal to the `windowDuration`. Check
+        `org.apache.spark.unsafe.types.CalendarInterval` for valid duration
+        identifiers. This duration is likewise absolute, and does not vary
+        according to a calendar.
+    startTime : str, optional
+        The offset with respect to 1970-01-01 00:00:00 UTC with which to start
+        window intervals. For example, in order to have hourly tumbling windows that
+        start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
+        `startTime` as `15 minutes`.
+
     Examples
     --------
     >>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")
@@ -2347,7 +2370,19 @@ def session_window(timeColumn, gapDuration):
     input row. The output column will be a struct called 'session_window' by default
     with the nested columns 'start' and 'end', where 'start' and 'end' will be of
     :class:`pyspark.sql.types.TimestampType`.
+
     .. versionadded:: 3.2.0
+
+    Parameters
+    ----------
+    timeColumn : :class:`~pyspark.sql.Column`
+        The column or the expression to use as the timestamp for windowing by time.
+        The time column must be of TimestampType.
+    gapDuration : :class:`~pyspark.sql.Column` or str
+        A column or string specifying the timeout of the session. It could be static value,
+        e.g. `10 minutes`, `1 second`, or an expression/UDF that specifies gap
+        duration dynamically based on the input row.
+
     Examples
     --------
     >>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date", "val")
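For readers who want to try the functions documented above, a minimal PySpark sketch follows (data and durations are illustrative; the patch itself only changes documentation):

```python
# Tumbling windows vs. session windows over event-time data (Spark 3.2+).
# Illustrative example; not part of the documentation patch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("2016-03-11 09:00:07", 1), ("2016-03-11 09:00:20", 1)], ["date", "val"])

# Fixed-length 5-second tumbling windows.
tumbling = df.groupBy(F.window("date", "5 seconds")).agg(F.sum("val").alias("sum"))

# Session windows: a new session starts once events are more than 10 seconds apart.
sessions = df.groupBy(F.session_window("date", "10 seconds")).agg(F.sum("val").alias("sum"))

tumbling.show(truncate=False)
sessions.show(truncate=False)
```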
[spark] branch master updated (17e3ca6 -> 5a32e41)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 17e3ca6  [SPARK-36899][R] Support ILIKE API on R
     add 5a32e41  [SPARK-36865][PYTHON][DOCS] Add PySpark API document of session_window

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/functions.py              | 35
 2 files changed, 36 insertions(+)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org