[spark] branch master updated (b5297c4 -> 65286ae)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b5297c4  [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype
     add 65286ae  [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 65286ae  [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment
65286ae is described below

commit 65286aec4b3c4e93d8beac6dd1b097ce97d53fd8
Author: ulysses
AuthorDate: Wed Jul 8 11:30:47 2020 +0900

    [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment

    ### What changes were proposed in this pull request?

    Modify the comment of `SqlBase.g4`.

    ### Why are the changes needed?

    `docs/sql-keywords.md` has already moved to `docs/sql-ref-ansi-compliance.md#sql-keywords`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    No need.

    Closes #29033 from ulysses-you/SPARK-30703-FOLLOWUP.

    Authored-by: ulysses
    Signed-off-by: Takeshi Yamamuro
---
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 691fde8..b383e03 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -1461,8 +1461,7 @@ nonReserved
 ;

 // NOTE: If you add a new token in the list below, you should update the list of keywords
-// in `docs/sql-keywords.md`. If the token is a non-reserved keyword,
-// please update `ansiNonReserved` and `nonReserved` as well.
+// and reserved tag in `docs/sql-ref-ansi-compliance.md#sql-keywords`.
 //
 // Start of the keywords list
[spark] branch master updated (a9247c3 -> 7b86838)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a9247c3  [SPARK-32033][SS][DSTREAMS] Use new poll API in Kafka connector executor side to avoid infinite wait
     add 7b86838  [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  20 +++
 .../spark/sql/execution/DataSourceScanExec.scala   |  29 ++-
 .../spark/sql/execution/QueryExecution.scala       |   2 +
 .../bucketing/CoalesceBucketsInSortMergeJoin.scala | 132 ++
 .../execution/datasources/FileSourceStrategy.scala |   1 +
 .../org/apache/spark/sql/DataFrameJoinSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  |  17 ++
 .../scala/org/apache/spark/sql/SubquerySuite.scala |   2 +-
 .../CoalesceBucketsInSortMergeJoinSuite.scala      | 194 +
 .../spark/sql/sources/BucketedReadSuite.scala      | 137 ++-
 10 files changed, 523 insertions(+), 13 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoin.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoinSuite.scala
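For context on SPARK-31350: coalescing bucketed tables for a sort merge join is only safe when one side's bucket count evenly divides the other's, since Spark's bucket id is `hash(key) % numBuckets`. A rough Python sketch of that divisibility check follows; the function names are illustrative, not Spark's API.

```python
def coalesced_bucket_counts(left_buckets, right_buckets):
    """Return the (left, right) bucket counts after coalescing, or None
    when coalescing does not apply. The side with more buckets is read as
    if it had the smaller count, so no shuffle is needed for the join."""
    if left_buckets == right_buckets:
        return None  # already aligned, nothing to coalesce
    big, small = max(left_buckets, right_buckets), min(left_buckets, right_buckets)
    if big % small != 0:
        return None  # bucket ids would not line up; coalescing is unsafe
    return (small, small)

def coalesced_bucket_id(original_bucket_id, small):
    # When small divides big, (hash % big) % small == hash % small,
    # so bucket i of the larger table folds into bucket i % small.
    return original_bucket_id % small
```

Usage-wise, a rule like this would fold each group of `big // small` buckets on the larger side into one coalesced bucket before the merge join.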
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit ed69190ce0762f3b741b8d175ef8d02da45f3183
Author: Takeshi Yamamuro
AuthorDate: Tue Jun 16 00:27:45 2020 +0900

    [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

    ### What changes were proposed in this pull request?

    This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved.

    ### Why are the changes needed?

    To comply with the ANSI/SQL standard.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added tests.

    Closes #28807 from maropu/SPARK-26905-2.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-ansi-compliance.md                    |   6 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   3 +
 .../resources/ansi-sql-2016-reserved-keywords.txt  | 401 +
 .../parser/TableIdentifierParserSuite.scala        |  24 +-
 4 files changed, 429 insertions(+), 5 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index eab194c..e5ca7e9d 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL.
 |ALTER|non-reserved|non-reserved|reserved|
 |ANALYZE|non-reserved|non-reserved|non-reserved|
 |AND|reserved|non-reserved|reserved|
-|ANTI|reserved|strict-non-reserved|non-reserved|
+|ANTI|non-reserved|strict-non-reserved|non-reserved|
 |ANY|reserved|non-reserved|reserved|
 |ARCHIVE|non-reserved|non-reserved|non-reserved|
 |ARRAY|non-reserved|non-reserved|reserved|
@@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL.
 |MAP|non-reserved|non-reserved|non-reserved|
 |MATCHED|non-reserved|non-reserved|non-reserved|
 |MERGE|non-reserved|non-reserved|non-reserved|
-|MINUS|reserved|strict-non-reserved|non-reserved|
+|MINUS|non-reserved|strict-non-reserved|non-reserved|
 |MINUTE|reserved|non-reserved|reserved|
 |MONTH|reserved|non-reserved|reserved|
 |MSCK|non-reserved|non-reserved|non-reserved|
@@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL.
 |SCHEMA|non-reserved|non-reserved|non-reserved|
 |SECOND|reserved|non-reserved|reserved|
 |SELECT|reserved|non-reserved|reserved|
-|SEMI|reserved|strict-non-reserved|non-reserved|
+|SEMI|non-reserved|strict-non-reserved|non-reserved|
 |SEPARATED|non-reserved|non-reserved|non-reserved|
 |SERDE|non-reserved|non-reserved|non-reserved|
 |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved|
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 14a6687..5821a74 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -994,6 +994,7 @@ ansiNonReserved
 | AFTER
 | ALTER
 | ANALYZE
+| ANTI
 | ARCHIVE
 | ARRAY
 | ASC
@@ -1126,10 +1127,12 @@ ansiNonReserved
 | ROW
 | ROWS
 | SCHEMA
+| SEMI
 | SEPARATED
 | SERDE
 | SERDEPROPERTIES
 | SET
+| SETMINUS
 | SETS
 | SHOW
 | SKEWED
diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
new file mode 100644
index 000..921491a
--- /dev/null
+++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
@@ -0,0 +1,401 @@
+-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords
+ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
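The keyword table patched above has one row per keyword, with a classification per parser mode. As a hedged illustration (the column names below are assumptions for the sketch, not taken from the Spark docs), a row like `|ANTI|non-reserved|strict-non-reserved|non-reserved|` can be parsed like this:

```python
import re

# One table row: |KEYWORD|<ansi mode>|<default mode>|<SQL:2016>|
ROW = re.compile(r"^\|([A-Z_]+)\|([a-z-]+)\|([a-z-]+)\|([a-z-]+)\|$")

def parse_keyword_row(line):
    """Parse one markdown keyword-table row into a dict, or return None
    for lines that are not keyword rows (headers, prose, hunk markers)."""
    m = ROW.match(line.strip())
    if not m:
        return None
    keyword, ansi, default, sql2016 = m.groups()
    return {"keyword": keyword, "ansi": ansi,
            "default": default, "sql2016": sql2016}
```

A check built on this could, for instance, verify that every row's third column is consistent with the grammar file, which is essentially what the diff above is keeping in sync by hand.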
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 764da2f  [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates
     new b70c68a  [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
     new ed69190  [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

The 2 revisions listed above as "new" are entirely new to this repository and will be
described in separate emails. The revisions listed as "add" were already present in
the repository and have only been added to this reference.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                    |   6 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   7 +
 .../resources/ansi-sql-2016-reserved-keywords.txt  | 401 ++
 .../parser/TableIdentifierParserSuite.scala        | 452 ++---
 4 files changed, 537 insertions(+), 329 deletions(-)
 create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b70c68ae458d929cbf28a084cecf8252b4a3849f
Author: Takeshi Yamamuro
AuthorDate: Sat Jun 13 07:12:27 2020 +0900

    [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file

    ### What changes were proposed in this pull request?

    This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217

    ### Why are the changes needed?

    It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #28802 from maropu/SPARK-31950-2.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/catalyst/parser/SqlBase.g4 |   4 +
 .../parser/TableIdentifierParserSuite.scala     | 432 +
 2 files changed, 110 insertions(+), 326 deletions(-)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 208a503..14a6687 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -989,6 +989,7 @@ alterColumnAction

 // You can find the full keywords list by searching "Start of the keywords list" in this file.
 // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords.
 ansiNonReserved
+//--ANSI-NON-RESERVED-START
 : ADD
 | AFTER
 | ALTER
@@ -1165,6 +1166,7 @@ ansiNonReserved
 | VIEW
 | VIEWS
 | WINDOW
+//--ANSI-NON-RESERVED-END
 ;

 // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL.
@@ -1442,6 +1444,7 @@ nonReserved
 //
 // Start of the keywords list
 //
+//--SPARK-KEYWORD-LIST-START
 ADD: 'ADD';
 AFTER: 'AFTER';
 ALL: 'ALL';
@@ -1694,6 +1697,7 @@ WHERE: 'WHERE';
 WINDOW: 'WINDOW';
 WITH: 'WITH';
 YEAR: 'YEAR';
+//--SPARK-KEYWORD-LIST-END
 //
 // End of the keywords list
 //
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
index bd617bf..04969e3 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
@@ -16,9 +16,14 @@
  */
 package org.apache.spark.sql.catalyst.parser

+import java.util.Locale
+
+import scala.collection.mutable
+
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.catalyst.util.fileToString
 import org.apache.spark.sql.internal.SQLConf

 class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
@@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
     "where",
     "with")

-  // All the keywords in `docs/sql-keywords.md` are listed below:
-  val allCandidateKeywords = Set(
-"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment",
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross",
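The `//--SPARK-KEYWORD-LIST-START`/`END` markers added above let the test suite scrape the keyword list straight out of the grammar instead of maintaining it by hand. A simplified sketch of that extraction in Python (the real suite does this in Scala against the actual `SqlBase.g4` file):

```python
import re

def extract_spark_keywords(grammar_text):
    """Collect token names from definitions like ``ADD: 'ADD';`` that sit
    between the //--SPARK-KEYWORD-LIST-START and END markers, lowercased
    the way the parser suite compares identifiers."""
    in_list = False
    keywords = []
    for raw in grammar_text.splitlines():
        line = raw.strip()
        if line == "//--SPARK-KEYWORD-LIST-START":
            in_list = True
        elif line == "//--SPARK-KEYWORD-LIST-END":
            in_list = False
        elif in_list:
            m = re.match(r"([A-Z_]+):", line)
            if m:
                keywords.append(m.group(1).lower())
    return keywords
```

The same marker idea covers the `//--ANSI-NON-RESERVED-START`/`END` span, so the suite can diff the extracted sets against the documented reserved/non-reserved classification.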
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit ed69190ce0762f3b741b8d175ef8d02da45f3183 Author: Takeshi Yamamuro AuthorDate: Tue Jun 16 00:27:45 2020 +0900 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords ### What changes were proposed in this pull request? This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved. ### Why are the changes needed? To comply with the ANSI/SQL standard. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28807 from maropu/SPARK-26905-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index eab194c..e5ca7e9d 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL. |ALTER|non-reserved|non-reserved|reserved| |ANALYZE|non-reserved|non-reserved|non-reserved| |AND|reserved|non-reserved|reserved| -|ANTI|reserved|strict-non-reserved|non-reserved| +|ANTI|non-reserved|strict-non-reserved|non-reserved| |ANY|reserved|non-reserved|reserved| |ARCHIVE|non-reserved|non-reserved|non-reserved| |ARRAY|non-reserved|non-reserved|reserved| @@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL. 
|MAP|non-reserved|non-reserved|non-reserved| |MATCHED|non-reserved|non-reserved|non-reserved| |MERGE|non-reserved|non-reserved|non-reserved| -|MINUS|reserved|strict-non-reserved|non-reserved| +|MINUS|not-reserved|strict-non-reserved|non-reserved| |MINUTE|reserved|non-reserved|reserved| |MONTH|reserved|non-reserved|reserved| |MSCK|non-reserved|non-reserved|non-reserved| @@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL. |SCHEMA|non-reserved|non-reserved|non-reserved| |SECOND|reserved|non-reserved|reserved| |SELECT|reserved|non-reserved|reserved| -|SEMI|reserved|strict-non-reserved|non-reserved| +|SEMI|non-reserved|strict-non-reserved|non-reserved| |SEPARATED|non-reserved|non-reserved|non-reserved| |SERDE|non-reserved|non-reserved|non-reserved| |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 14a6687..5821a74 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -994,6 +994,7 @@ ansiNonReserved | AFTER | ALTER | ANALYZE +| ANTI | ARCHIVE | ARRAY | ASC @@ -1126,10 +1127,12 @@ ansiNonReserved | ROW | ROWS | SCHEMA +| SEMI | SEPARATED | SERDE | SERDEPROPERTIES | SET +| SETMINUS | SETS | SHOW | SKEWED diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt new file mode 100644 index 000..921491a --- /dev/null +++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt @@ -0,0 +1,401 @@ +-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords +ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN 
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 764da2f [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates new b70c68a [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file new ed69190 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 7 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 ++ .../parser/TableIdentifierParserSuite.scala| 452 ++--- 4 files changed, 537 insertions(+), 329 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 764da2f [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates new b70c68a [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file new ed69190 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 7 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 ++ .../parser/TableIdentifierParserSuite.scala| 452 ++--- 4 files changed, 537 insertions(+), 329 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit ed69190ce0762f3b741b8d175ef8d02da45f3183 Author: Takeshi Yamamuro AuthorDate: Tue Jun 16 00:27:45 2020 +0900 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords ### What changes were proposed in this pull request? This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved. ### Why are the changes needed? To comply with the ANSI/SQL standard. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28807 from maropu/SPARK-26905-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index eab194c..e5ca7e9d 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL. |ALTER|non-reserved|non-reserved|reserved| |ANALYZE|non-reserved|non-reserved|non-reserved| |AND|reserved|non-reserved|reserved| -|ANTI|reserved|strict-non-reserved|non-reserved| +|ANTI|non-reserved|strict-non-reserved|non-reserved| |ANY|reserved|non-reserved|reserved| |ARCHIVE|non-reserved|non-reserved|non-reserved| |ARRAY|non-reserved|non-reserved|reserved| @@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL. 
|MAP|non-reserved|non-reserved|non-reserved| |MATCHED|non-reserved|non-reserved|non-reserved| |MERGE|non-reserved|non-reserved|non-reserved| -|MINUS|reserved|strict-non-reserved|non-reserved| +|MINUS|non-reserved|strict-non-reserved|non-reserved| |MINUTE|reserved|non-reserved|reserved| |MONTH|reserved|non-reserved|reserved| |MSCK|non-reserved|non-reserved|non-reserved| @@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL. |SCHEMA|non-reserved|non-reserved|non-reserved| |SECOND|reserved|non-reserved|reserved| |SELECT|reserved|non-reserved|reserved| -|SEMI|reserved|strict-non-reserved|non-reserved| +|SEMI|non-reserved|strict-non-reserved|non-reserved| |SEPARATED|non-reserved|non-reserved|non-reserved| |SERDE|non-reserved|non-reserved|non-reserved| |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 14a6687..5821a74 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -994,6 +994,7 @@ ansiNonReserved | AFTER | ALTER | ANALYZE +| ANTI | ARCHIVE | ARRAY | ASC @@ -1126,10 +1127,12 @@ ansiNonReserved | ROW | ROWS | SCHEMA +| SEMI | SEPARATED | SERDE | SERDEPROPERTIES | SET +| SETMINUS | SETS | SHOW | SKEWED diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt new file mode 100644 index 000..921491a --- /dev/null +++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt @@ -0,0 +1,401 @@ +-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords +ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
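The new `ansi-sql-2016-reserved-keywords.txt` resource above is a plain list: one keyword per line, preceded by a `--` comment crediting its PostgreSQL origin. A minimal sketch of how a test could load such a file — the helper name `loadAnsiReservedKeywords` is illustrative, not the exact code in `TableIdentifierParserSuite`:

```scala
// Sketch only: parse the SQL:2016 reserved-keyword resource added by this
// commit. Skips blank lines and the leading `--` SQL-style comment.
def loadAnsiReservedKeywords(fileText: String): Set[String] =
  fileText.split("\n")
    .map(_.trim)
    .filter(line => line.nonEmpty && !line.startsWith("--")) // drop comments
    .toSet
```

With such a set in hand, a suite can assert that keywords Spark treats as reserved in ANSI mode (e.g. `AND`, `SELECT`) appear in the standard list, while `ANTI`, `SEMI`, and `MINUS` — now non-reserved — need not.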
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
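The `//--SPARK-KEYWORD-LIST-START`/`END` markers added to `SqlBase.g4` let the test suite scan the grammar file instead of hand-maintaining the keyword set it previously hard-coded. A minimal sketch of that marker-based extraction, under the assumption that the markers bracket every ANTLR token definition of the form `ADD: 'ADD';`; `extractKeywords` is an illustrative name, not the exact helper in `TableIdentifierParserSuite`:

```scala
import java.util.Locale

object KeywordExtractor {
  // Matches ANTLR lexer token definitions such as `ADD: 'ADD';`
  private val keywordDef = """([A-Z_]+):\s*'.*';""".r

  def extractKeywords(grammarText: String): Set[String] = {
    val lines = grammarText.split("\n").map(_.trim).toSeq
    val start = lines.indexWhere(_.startsWith("//--SPARK-KEYWORD-LIST-START"))
    val end   = lines.indexWhere(_.startsWith("//--SPARK-KEYWORD-LIST-END"))
    // Collect only lines between the markers that define a keyword token.
    lines.slice(start + 1, end).collect {
      case keywordDef(name) => name.toLowerCase(Locale.ROOT)
    }.toSet
  }
}
```

Because the keywords are recovered from the grammar itself, adding a token to `SqlBase.g4` inside the markers is automatically reflected in the test, which is the maintenance burden this commit removes.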
[spark] branch master updated (eae1747 -> 3698a14)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from eae1747 [SPARK-31959][SQL][TESTS][FOLLOWUP] Adopt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb add 3698a14 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (78d08a8 -> a620a2a)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 78d08a8 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file add a620a2a [SPARK-31977][SQL] Returns the plan directly from NestedColumnAliasing No new revisions were added by this update. Summary of changes: .../sql/catalyst/optimizer/NestedColumnAliasing.scala | 19 --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 3 +-- 2 files changed, 13 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (28f131f -> 78d08a8)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 28f131f [SPARK-31979] Release script should not fail when remove non-existing files add 78d08a8 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f61b31a  [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
f61b31a is described below

commit f61b31a5a484c7e90920ec36c456594ce92cdf73
Author: Dilip Biswal
AuthorDate: Fri Jun 12 09:19:29 2020 +0900

    [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException

    ### What changes were proposed in this pull request?

    A minor fix to the `append` method of `StringConcat` that caps the length at
    `MAX_ROUNDED_ARRAY_LENGTH`, making sure it does not overflow and cause a
    StringIndexOutOfBoundsException.

    Thanks to **Jeffrey Stokes** for reporting the issue and explaining the
    underlying problem in detail in the JIRA.

    ### Why are the changes needed?

    This fixes a StringIndexOutOfBoundsException on overflow.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Added a test in StringUtilsSuite.

    Closes #28750 from dilipbiswal/SPARK-31916.

Authored-by: Dilip Biswal
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit b87a342c7dd51046fcbe323db640c825646fb8d4)
Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/util/StringUtils.scala      |  6 +++-
 .../spark/sql/catalyst/util/StringUtilsSuite.scala | 32 +-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
index b42ae4e..2a416d6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
@@ -123,7 +123,11 @@ object StringUtils extends Logging {
       val stringToAppend = if (available >= sLen) s else s.substring(0, available)
       strings.append(stringToAppend)
     }
-    length += sLen
+
+    // Keeps the total length of appended strings. Note that we need to cap the length at
+    // `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`; otherwise, we will overflow
+    // `length`, causing StringIndexOutOfBoundsException in the substring call above.
+    length = Math.min(length.toLong + sLen, ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH).toInt
   }
 }

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
index 67bc4bc..c68e89fc 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
@@ -18,9 +18,11 @@
 package org.apache.spark.sql.catalyst.util

 import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.plans.SQLHelper
 import org.apache.spark.sql.catalyst.util.StringUtils._
+import org.apache.spark.sql.internal.SQLConf

-class StringUtilsSuite extends SparkFunSuite {
+class StringUtilsSuite extends SparkFunSuite with SQLHelper {

   test("escapeLikeRegex") {
     val expectedEscapedStrOne = "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E"
@@ -98,4 +100,32 @@ class StringUtilsSuite extends SparkFunSuite {
     assert(checkLimit("1234567"))
     assert(checkLimit("1234567890"))
   }
+
+  test("SPARK-31916: StringConcat doesn't overflow on many inputs") {
+    val concat = new StringConcat(maxLength = 100)
+    val stringToAppend = "Test internal index of StringConcat does not overflow with many " +
+      "append calls"
+    0.to((Integer.MAX_VALUE / stringToAppend.length) + 1).foreach { _ =>
+      concat.append(stringToAppend)
+    }
+    assert(concat.toString.length === 100)
+  }
+
+  test("SPARK-31916: verify that PlanStringConcat's output shows the actual length of the plan") {
+    withSQLConf(SQLConf.MAX_PLAN_STRING_LENGTH.key -> "0") {
+      val concat = new PlanStringConcat()
+      0.to(3).foreach { i =>
+        concat.append(s"plan fragment $i")
+      }
+      assert(concat.toString === "Truncated plan of 60 characters")
+    }
+
+    withSQLConf(SQLConf.MAX_PLAN_STRING_LENGTH.key -> "60") {
+      val concat = new PlanStringConcat()
+      0.to(2).foreach { i =>
+        concat.append(s"plan fragment $i")
+      }
+      assert(concat.toString === "plan fragment 0plan fragment 1... 15 more characters")
+    }
+  }
 }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
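The fix above hinges on plain integer-overflow arithmetic: `StringConcat` tracks the total appended length in an `Int`, so once it exceeds `Integer.MAX_VALUE` it wraps negative, and the "available" budget computed from it hands `substring` an out-of-range index. A minimal sketch of that arithmetic (in Java rather than Scala for a self-contained snippet; `cappedAdd` is a hypothetical helper, and `Integer.MAX_VALUE` stands in for the actual `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` cap):

```java
// Illustrative sketch only, not Spark's code: why the uncapped counter
// breaks, and how widening to long before capping avoids the wrap.
public class OverflowSketch {
    // Hypothetical stand-in for the capped update in StringConcat.append.
    static int cappedAdd(int length, int sLen) {
        return (int) Math.min((long) length + sLen, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int length = Integer.MAX_VALUE - 10; // simulate many prior appends
        int sLen = 20;

        int uncapped = length + sLen;        // the old `length += sLen`: wraps negative
        int capped = cappedAdd(length, sLen);

        System.out.println(uncapped < 0);    // true: wrapped, so any budget
                                             // derived from it is garbage
        System.out.println(capped);          // 2147483647: pinned at the ceiling
    }
}
```

Once wrapped, `maxLength - length` becomes a huge positive "available" count, which is exactly the out-of-bounds end index the `substring` call in `append` then receives.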
[spark] branch master updated (88a4e55 -> b87a342)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 88a4e55 [SPARK-31765][WEBUI][TEST-MAVEN] Upgrade HtmlUnit >= 2.37.0 add b87a342 [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/StringUtils.scala | 6 +++- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 32 +- 2 files changed, 36 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (4b625bd -> 89b1d46)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 4b625bd [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber add 89b1d46 [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list No new revisions were added by this update. Summary of changes: .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala| 1 + 2 files changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f3771c6 -> e14029b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f3771c6 [SPARK-31935][SQL] Hadoop file system config should be effective in data source options add e14029b [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list No new revisions were added by this update. Summary of changes: .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala| 1 + 2 files changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 89b1d46  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
89b1d46 is described below

commit 89b1d4614ef1a3d15ff0f1e745c770ebd8f5cddb
Author: Takeshi Yamamuro
AuthorDate: Wed Jun 10 16:29:43 2020 +0900

    [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

    ### What changes were proposed in this pull request?

    This PR adds `TYPE` to the ANSI non-reserved list because it is not reserved
    in the standard. See SPARK-26905 for the full set of reserved/non-reserved
    keywords of `SQL:2016`.

    Note: the current master behaviour is as follows;
    ```
    scala> sql("SET spark.sql.ansi.enabled=false")
    scala> sql("create table t1 (type int)")
    res4: org.apache.spark.sql.DataFrame = []

    scala> sql("SET spark.sql.ansi.enabled=true")
    scala> sql("create table t2 (type int)")
    org.apache.spark.sql.catalyst.parser.ParseException:
    no viable alternative at input 'type'(line 1, pos 17)

    == SQL ==
    create table t2 (type int)
    -^^^
    ```

    ### Why are the changes needed?

    To follow the ANSI/SQL standard.

    ### Does this PR introduce _any_ user-facing change?

    Yes; users can now use `TYPE` as an identifier.

    ### How was this patch tested?

    Updated the keyword lists in `TableIdentifierParserSuite`.

    Closes #28773 from maropu/SPARK-26905.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit e14029b18df10db5094f8abf8b9874dbc9186b4e)
    Signed-off-by: Takeshi Yamamuro
---
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 1 +
 .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 2adaa9f..208a503 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -1153,6 +1153,7 @@ ansiNonReserved
 | TRIM
 | TRUE
 | TRUNCATE
+| TYPE
 | UNARCHIVE
 | UNBOUNDED
 | UNCACHE
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
index d5b0885..bd617bf 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
@@ -513,6 +513,7 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
     "transform",
     "true",
     "truncate",
+    "type",
     "unarchive",
     "unbounded",
     "uncache",
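The distinction the grammar change relies on can be illustrated outside ANTLR. The following is a hedged, stand-alone sketch — the keyword sets here are tiny illustrative samples, not Spark's actual lists: a non-reserved keyword remains usable as an identifier, while a reserved one is rejected.

```scala
// Illustrative sketch only: toy keyword sets, not Spark's real ANSI lists.
// In ANSI mode a token is usable as an identifier unless it is reserved;
// moving TYPE into the non-reserved set is what lets
// `create table t (type int)` parse.
val reservedKeywords = Set("SELECT", "FROM", "WHERE", "TABLE")
val ansiNonReserved  = Set("TRIM", "TRUE", "TRUNCATE", "TYPE", "UNARCHIVE")

def usableAsIdentifier(token: String): Boolean =
  !reservedKeywords.contains(token.toUpperCase)

assert(usableAsIdentifier("type"))     // non-reserved: fine as a column name
assert(!usableAsIdentifier("select"))  // reserved: rejected
```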
[spark] branch branch-3.0 updated: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
fa608b9 is described below

commit fa608b949b854d716904f4e43a4a10c71742b3c6
Author: LantaoJin
AuthorDate: Sat Jun 6 07:35:25 2020 +0900

    [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns

    ### What changes were proposed in this pull request?

    ```sql
    CREATE TABLE t1(a STRING, B VARCHAR(10), C CHAR(10)) STORED AS parquet;
    CREATE TABLE t2 USING parquet PARTITIONED BY (b, c) AS SELECT * FROM t1;
    SELECT * FROM t2 WHERE b = 'A';
    ```

    The SQL above throws a MetaException:

    > Caused by: java.lang.reflect.InvocationTargetException
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:810)
    >   ... 114 more
    > Caused by: MetaException(message:Filtering is supported only on partition keys of type string, or integral types)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:184)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:439)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:356)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:278)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:583)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:3315)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2768)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:182)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$7.getJdoResult(ObjectStore.java:3248)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$7.getJdoResult(ObjectStore.java:3232)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2974)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3250)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:2906)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
    >   at com.sun.proxy.$Proxy25.getPartitionsByFilter(Unknown Source)
    >   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5093)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
    >   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    >   at com.sun.proxy.$Proxy26.get_partitions_by_filter(Unknown Source)
    >   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1232)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
    >   at com.sun.proxy.$Proxy27.listPartitionsByFilter(Unknown Source)
    >   at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2679)
    >   ... 119 more

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a unit test.
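The underlying issue is a case mismatch between the column name written in the filter and the name stored in the catalog's partition schema. The following is a hedged sketch of the kind of resolution involved — the names and shapes here are illustrative, not the actual `HiveShim` code:

```scala
// Illustrative only, not the actual Spark fix: resolve the filter's column
// name against the partition schema case-insensitively, so that a filter on
// `b` still finds the partition column declared as `B`.
case class PartitionColumn(name: String, dataType: String)

val partitionSchema = Seq(
  PartitionColumn("B", "varchar(10)"),
  PartitionColumn("C", "char(10)"))

def resolve(filterColumn: String): Option[PartitionColumn] =
  partitionSchema.find(_.name.equalsIgnoreCase(filterColumn))

assert(resolve("b").exists(_.dataType == "varchar(10)"))  // matches despite case
assert(resolve("d").isEmpty)                              // unknown column
```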
[spark] branch master updated (fc6af9d -> 5079831)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fc6af9d [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting add 5079831 [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/hive/client/HiveShim.scala | 3 ++- .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 10 ++ 2 files changed, 12 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 72c466e  [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
72c466e is described below

commit 72c466e0c37e4cc639040161699b6c0bffde70d5
Author: sandeep katta
AuthorDate: Sun May 24 21:39:16 2020 +0900

    [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator

    ### What changes were proposed in this pull request?

    The `IntegralDivide` operator returns the Long data type, so the integer-overflow
    case should be handled. If the operands are of type Int, they are cast to Long.

    ### Why are the changes needed?

    As `IntegralDivide` returns the Long data type, integer overflow should not happen.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a UT and also tested in the local cluster.

    After fix:
    ![image](https://user-images.githubusercontent.com/35216143/82603361-25eccc00-9bd0-11ea-9ca7-001c539e628b.png)

    SQL test after fix:
    ![image](https://user-images.githubusercontent.com/35216143/82637689-f0250300-9c22-11ea-85c3-886ab2c23471.png)

    Before fix:
    ![image](https://user-images.githubusercontent.com/35216143/82637984-878a5600-9c23-11ea-9e47-5ce2fb923c01.png)

    Closes #28628 from sandeep-katta/branch3Backport.

    Authored-by: sandeep katta
    Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18
 .../sql/catalyst/expressions/arithmetic.scala      |  2 +-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  | 24 ++
 .../expressions/ArithmeticExpressionSuite.scala    |  7 +--
 .../sql-functions/sql-expression-schema.md         |  2 +-
 .../resources/sql-tests/results/operators.sql.out  |  8
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8
 7 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
index c6e3f56..a6f8e12 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -61,6 +61,7 @@ object TypeCoercion {
       IfCoercion ::
       StackCoercion ::
       Division ::
+      IntegralDivision ::
       ImplicitTypeCasts ::
       DateTimeOperations ::
       WindowFrameCoercion ::
@@ -685,6 +686,23 @@ object TypeCoercion {
   }

   /**
+   * The DIV operator always returns long-type value.
+   * This rule cast the integral inputs to long type, to avoid overflow during calculation.
+   */
+  object IntegralDivision extends TypeCoercionRule {
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case e if !e.childrenResolved => e
+      case d @ IntegralDivide(left, right) =>
+        IntegralDivide(mayCastToLong(left), mayCastToLong(right))
+    }
+
+    private def mayCastToLong(expr: Expression): Expression = expr.dataType match {
+      case _: ByteType | _: ShortType | _: IntegerType => Cast(expr, LongType)
+      case _ => expr
+    }
+  }
+
+  /**
    * Coerces the type of different branches of a CASE WHEN statement to a common type.
    */
   object CaseWhenCoercion extends TypeCoercionRule {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 354845d..7c52183 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -412,7 +412,7 @@ case class IntegralDivide(
     left: Expression,
     right: Expression) extends DivModLike {

-  override def inputType: AbstractDataType = TypeCollection(IntegralType, DecimalType)
+  override def inputType: AbstractDataType = TypeCollection(LongType, DecimalType)

   override def dataType: DataType = LongType
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
index e37555f..1ea1ddb 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
@@ -1559,6 +1559,30 @@ class TypeCoercionSuite extends AnalysisTest {
 Li
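The overflow this rule guards against is easy to reproduce with plain JVM arithmetic (this is standalone Scala, not Spark code): since `div` must return a Long, the only Int input without a representable quotient is `Int.MinValue / -1`, and widening the operands first avoids the wrap-around.

```scala
// Int.MinValue has no positive counterpart in 32-bit two's complement,
// so Int division wraps around instead of producing 2147483648.
val a: Int = Int.MinValue            // -2147483648
val wrapped: Int = a / -1            // overflows back to -2147483648
val widened: Long = a.toLong / -1L   // 2147483648, correct for a Long result

assert(wrapped == Int.MinValue)
assert(widened == 2147483648L)
```

This is exactly why the coercion rule casts Byte/Short/Int operands to Long before `IntegralDivide` evaluates.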
[spark] branch branch-3.0 updated (576c224 -> 72c466e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 576c224 [SPARK-31755][SQL][3.0] allow missing year/hour when parsing date/timestamp string add 72c466e [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18 .../sql/catalyst/expressions/arithmetic.scala | 2 +- .../sql/catalyst/analysis/TypeCoercionSuite.scala | 24 ++ .../expressions/ArithmeticExpressionSuite.scala| 7 +-- .../sql-functions/sql-expression-schema.md | 2 +- .../resources/sql-tests/results/operators.sql.out | 8 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 8 7 files changed, 57 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Fix typos: Github to GitHub
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 9879569 Fix typos: Github to GitHub 9879569 is described below commit 9879569826e89be4addf3d1f977924cc28062e2c Author: John Bampton AuthorDate: Sun May 24 19:03:46 2020 +0900 Fix typos: Github to GitHub Author: John Bampton Closes #264 from jbampton/fix-word-case. --- contributing.md | 10 +- release-process.md| 4 ++-- site/contributing.html| 10 +- site/downloads.html | 2 +- site/release-process.html | 6 +++--- 5 files changed, 16 insertions(+), 16 deletions(-) diff --git a/contributing.md b/contributing.md index 3016a26..5c68d98 100644 --- a/contributing.md +++ b/contributing.md @@ -43,7 +43,7 @@ feedback on any performance or correctness issues found in the newer release. Contributing by Reviewing Changes Changes to Spark source code are proposed, reviewed and committed via -https://github.com/apache/spark/pulls";>Github pull requests (described later). +https://github.com/apache/spark/pulls";>GitHub pull requests (described later). Anyone can view and comment on active changes here. Reviewing others' changes is a good way to learn how the change process works and gain exposure to activity in various parts of the code. You can help by reviewing the changes and asking @@ -243,7 +243,7 @@ Once you've downloaded Spark, you can find instructions for installing and build JIRA Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses -Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are +GitHub pull requests to manage the review and merge of specific code changes. 
That is, JIRAs are used to describe _what_ should be fixed or changed, and high-level approaches, and pull requests describe _how_ to implement that change in the project's source code. For example, major design decisions are discussed in JIRA. @@ -300,7 +300,7 @@ Example: `Fix typos in Foo scaladoc` Pull Request -1. https://help.github.com/articles/fork-a-repo/";>Fork the Github repository at +1. https://help.github.com/articles/fork-a-repo/";>Fork the GitHub repository at https://github.com/apache/spark";>https://github.com/apache/spark if you haven't already 1. Clone your fork, create a new branch, push commits to the branch. 1. Consider whether documentation or tests need to be added or updated as part of the change, @@ -341,9 +341,9 @@ the `master` branch of `apache/spark`. (Only in special cases would the PR be op https://spark-prs.appspot.com/";>spark-prs.appspot.com and Title may be the JIRA's title or a more specific title describing the PR itself. 1. If the pull request is still a work in progress, and so is not ready to be merged, - but needs to be pushed to Github to facilitate review, then add `[WIP]` after the component. + but needs to be pushed to GitHub to facilitate review, then add `[WIP]` after the component. 1. Consider identifying committers or other contributors who have worked on the code being - changed. Find the file(s) in Github and click "Blame" to see a line-by-line annotation of + changed. Find the file(s) in GitHub and click "Blame" to see a line-by-line annotation of who changed the code last. You can add `@username` in the PR description to ping them immediately. 1. Please state that the contribution is your original work and that you license the work diff --git a/release-process.md b/release-process.md index d3a9f9f..8165d24 100644 --- a/release-process.md +++ b/release-process.md @@ -264,7 +264,7 @@ pick the release version from the list, then click on "Release Notes". 
Copy this Then run `jekyll build` to update the `site` directory. After merging the change into the `asf-site` branch, you may need to create a follow-up empty -commit to force synchronization between ASF's git and the web site, and also the github mirror. +commit to force synchronization between ASF's git and the web site, and also the GitHub mirror. For some reason synchronization seems to not be reliable for this repository. On a related note, make sure the version is marked as released on JIRA. Go find the release page as above, eg., @@ -278,7 +278,7 @@ releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to Once you have generated the initial contributors list, it is highly likely that there will be warnings about author names not being properly translated. To fix this, run https://github.com/apache/spark/blob/branch-1.1/dev/create-release/transla
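The "follow-up empty commit" mentioned above is created with `git commit --allow-empty`. A minimal, hypothetical illustration in a throwaway repository (paths and messages are illustrative, not part of the actual release process):

```python
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args):
    # Helper: run a git command inside the throwaway repository.
    return subprocess.run(("git", "-C", repo) + args, check=True,
                          capture_output=True, text=True).stdout.strip()

git("init", "-q")
git("-c", "user.email=dev@example.org", "-c", "user.name=dev",
    "commit", "-q", "--allow-empty", "-m", "initial")
# The follow-up empty commit used to nudge ASF <-> GitHub synchronization:
git("-c", "user.email=dev@example.org", "-c", "user.name=dev",
    "commit", "-q", "--allow-empty", "-m", "Force site sync")
print(git("rev-list", "--count", "HEAD"))  # 2
```

The empty commit changes no files but still advances `asf-site`, which is enough to trigger a fresh push to the mirrors.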
[spark] branch branch-3.0 updated: [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2183345 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input 2183345 is described below commit 218334523dacd116a03f2340ad89e33abe93e452 Author: Takeshi Yamamuro AuthorDate: Sat May 23 08:48:29 2020 +0900 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input ### What changes were proposed in this pull request? This PR intends to add trivial tests to check https://github.com/apache/spark/pull/27024 has already been fixed in the master. Closes #27024 ### Why are the changes needed? For test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28604 from maropu/SPARK-29854. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 7ca73f03fbc6e213c30e725bf480709ed036a376) Signed-off-by: Takeshi Yamamuro --- .../sql-tests/inputs/ansi/string-functions.sql | 1 + .../sql-tests/inputs/string-functions.sql | 6 +++- .../results/{ => ansi}/string-functions.sql.out| 34 +- .../sql-tests/results/string-functions.sql.out | 18 +++- 4 files changed, 50 insertions(+), 9 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql new file mode 100644 index 000..dd28e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql @@ -0,0 +1 @@ +--IMPORT string-functions.sql diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql index 8e33471..f5ed203 100644 --- 
a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -48,4 +48,8 @@ SELECT trim(LEADING 'xyz' FROM 'zzzytestxyz'); SELECT trim(LEADING 'xy' FROM 'xyxXxyLAST WORD'); SELECT trim(TRAILING 'xyz' FROM 'testxxzx'); SELECT trim(TRAILING 'xyz' FROM 'xyztestxxzx'); -SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy'); \ No newline at end of file +SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy'); + +-- Check lpad/rpad with invalid length parameter +SELECT lpad('hi', 'invalid_length'); +SELECT rpad('hi', 'invalid_length'); diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out similarity index 87% copy from sql/core/src/test/resources/sql-tests/results/string-functions.sql.out copy to sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out index 43c18f5..b507713 100644 --- a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 34 +-- Number of queries: 36 -- !query @@ -63,7 +63,7 @@ struct +struct -- !query output ab abcdab NULL @@ -71,15 +71,16 @@ ab abcdab NULL -- !query select left(null, -2), left("abcd", -2), left("abcd", 0), left("abcd", 'a') -- !query schema -struct +struct<> -- !query output -NULL NULL +java.lang.NumberFormatException +invalid input syntax for type numeric: a -- !query select right("abcd", 2), right("abcd", 5), right("abcd", '2'), right("abcd", null) -- !query schema -struct +struct -- !query output cd abcdcd NULL @@ -87,9 +88,10 @@ cd abcdcd NULL -- !query select right(null, -2), right("abcd", -2), right("abcd", 0), right("abcd", 'a') -- !query schema -struct +struct<> -- !query output -NULL NULL +java.lang.NumberFormatException +invalid input syntax for type 
numeric: a -- !query @@ -274,3 +276,21 @@ SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy') struct -- !query output TURNERyxX + + +-- !query +SELECT lpad('hi', 'invalid_length') +-- !query schema +struct<> +-- !query output +java.lang.NumberFormatException +invalid input syntax for type numeric: invalid_length + + +-- !query +SELECT rpad('hi', 'invalid_length') +-- !query
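The behavior exercised by these tests — a malformed length argument raises in ANSI mode but yields NULL otherwise — can be sketched in Python. This is a hypothetical analogue of the two `.sql.out` results above, not Spark's actual implementation:

```python
def to_int(value, ansi=False):
    # ANSI mode raises on a malformed numeric string; the default
    # (non-ANSI) mode silently yields NULL (None here).
    try:
        return int(value)
    except (TypeError, ValueError):
        if ansi:
            raise ValueError(f"invalid input syntax for type numeric: {value}")
        return None

def lpad(s, length, pad=" ", ansi=False):
    # Minimal lpad: pad on the left up to `length`, truncate if longer.
    n = to_int(length, ansi=ansi)
    if n is None or s is None:
        return None
    if len(s) >= n:
        return s[:n]
    return (pad * n)[: n - len(s)] + s

print(lpad("hi", 5, "?"))            # ???hi
print(lpad("hi", "bad_length"))      # None  (non-ANSI: NULL)
try:
    lpad("hi", "bad_length", ansi=True)
except ValueError as e:
    print(e)                         # invalid input syntax for type numeric: bad_length
```

The point of the added tests is exactly this fork in behavior: the same input file is imported under `inputs/ansi/`, so the suite checks both the NULL result and the `NumberFormatException`.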
[spark] branch master updated (5a258b0 -> 7ca73f0)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5a258b0 [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID add 7ca73f0 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input No new revisions were added by this update. Summary of changes: .../sql-tests/inputs/ansi/string-functions.sql | 1 + .../sql-tests/inputs/string-functions.sql | 6 - .../results/{ => ansi}/string-functions.sql.out| 30 ++ .../sql-tests/results/string-functions.sql.out | 18 - 4 files changed, 48 insertions(+), 7 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql copy sql/core/src/test/resources/sql-tests/results/{ => ansi}/string-functions.sql.out (90%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 23019aa [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref 23019aa is described below commit 23019aa429d8f0db52b1ed5e9e6dc00ea7b94740 Author: Huaxin Gao AuthorDate: Sat May 23 08:43:16 2020 +0900 [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref ### What changes were proposed in this pull request? Fix a few issues in SQL Reference ### Why are the changes needed? To make SQL Reference look better ### Does this PR introduce _any_ user-facing change? Yes. before: https://user-images.githubusercontent.com/13592258/82639052-d0f38a80-9bbc-11ea-81a4-22def4ca5cc0.png";> after: https://user-images.githubusercontent.com/13592258/82639063-d5b83e80-9bbc-11ea-84d1-8361e6bee949.png";> before: https://user-images.githubusercontent.com/13592258/82639252-3e9fb680-9bbd-11ea-863c-e6a6c2f83a06.png";> after: https://user-images.githubusercontent.com/13592258/82639265-42cbd400-9bbd-11ea-8df2-fc5c255b84d3.png";> before: https://user-images.githubusercontent.com/13592258/82639072-db158900-9bbc-11ea-9963-731881cda4fd.png";> after https://user-images.githubusercontent.com/13592258/82639082-dfda3d00-9bbc-11ea-9bd2-f922cc91f175.png";> ### How was this patch tested? Manually build and check Closes #28608 from huaxingao/doc_fix. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit ad9532a09c70bf6acc8b79b4fdbfcd6afadcbc91) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 42 ++-- docs/sql-ref-syntax-aux-conf-mgmt.md | 2 +- docs/sql-ref-syntax-qry.md | 35 +++--- docs/sql-ref-syntax.md | 28 docs/sql-ref.md | 16 +++--- 5 files changed, 67 insertions(+), 56 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 57fc493..289a9d3 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -76,14 +76,6 @@ - text: SQL Reference url: sql-ref.html subitems: -- text: Data Types - url: sql-ref-datatypes.html -- text: Identifiers - url: sql-ref-identifier.html -- text: Literals - url: sql-ref-literals.html -- text: Null Semantics - url: sql-ref-null-semantics.html - text: ANSI Compliance url: sql-ref-ansi-compliance.html subitems: @@ -93,6 +85,27 @@ url: sql-ref-ansi-compliance.html#type-conversion - text: SQL Keywords url: sql-ref-ansi-compliance.html#sql-keywords +- text: Data Types + url: sql-ref-datatypes.html +- text: Datetime Pattern + url: sql-ref-datetime-pattern.html +- text: Functions + url: sql-ref-functions.html + subitems: + - text: Built-in Functions +url: sql-ref-functions-builtin.html + - text: Scalar UDFs (User-Defined Functions) +url: sql-ref-functions-udf-scalar.html + - text: UDAFs (User-Defined Aggregate Functions) +url: sql-ref-functions-udf-aggregate.html + - text: Integration with Hive UDFs/UDAFs/UDTFs +url: sql-ref-functions-udf-hive.html +- text: Identifiers + url: sql-ref-identifier.html +- text: Literals + url: sql-ref-literals.html +- text: Null Semantics + url: sql-ref-null-semantics.html - text: SQL Syntax url: sql-ref-syntax.html subitems: @@ -247,16 +260,3 @@ url: sql-ref-syntax-aux-resource-mgmt-list-file.html - text: LIST JAR url: sql-ref-syntax-aux-resource-mgmt-list-jar.html -- text: Functions - url: sql-ref-functions.html - subitems: - - text: Built-in Functions -url: 
sql-ref-functions-builtin.html - - text: Scalar UDFs (User-Defined Functions) -url: sql-ref-functions-udf-scalar.html - - text: UDAFs (User-Defined Aggregate Functions) -url: sql-ref-functions-udf-aggregate.html - - text: Integration with Hive UDFs/UDAFs/UDTFs -url: sql-ref-functions-udf-hive.html -- text: Datetime Pattern - url: sql-ref-datetime-pattern.html diff --git a/docs/sql-ref-syntax-aux-conf-mgmt.md b/docs/sql-ref-syntax-aux-conf-mgmt.md index f5e48ef2..1900fb7 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt.md @@ -20,4 +20,4 @@ license: | --- * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) - * [UNSET](sql-ref-syntax-aux-conf-mgmt-reset.html) + * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) diff -
[spark] branch master updated (2115c55 -> ad9532a)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2115c55 [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions add ad9532a [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 42 ++-- docs/sql-ref-syntax-aux-conf-mgmt.md | 2 +- docs/sql-ref-syntax-qry.md | 35 +++--- docs/sql-ref-syntax.md | 28 docs/sql-ref.md | 16 +++--- 5 files changed, 67 insertions(+), 56 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (f6053b9 -> 847d6d4)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from f6053b9 Preparing development version 3.0.1-SNAPSHOT add 847d6d4 [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 2 +- .../spark/sql/catalyst/parser/PlanParserSuite.scala | 10 ++ .../sql/hive/thriftserver/SparkSQLCLIDriver.scala| 12 ++-- .../spark/sql/hive/thriftserver/CliSuite.scala | 20 +++- 4 files changed, 28 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new fafe0f3 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry fafe0f3 is described below commit fafe0f311cc1c48002b68f26ab9b274ffd565665 Author: Kent Yao AuthorDate: Thu May 7 14:37:03 2020 +0900 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry ### What changes were proposed in this pull request? The `Kafka*Suite`s are flaky because of the Hadoop MiniKdc issue - https://issues.apache.org/jira/browse/HADOOP-12656 > Looking at the MiniKdc implementation, if port is 0, the constructor uses a ServerSocket to find an unused port, assigns the port number to the member variable port, and closes the ServerSocket object; later, in initKDCServer(), it instantiates a TcpTransport object and binds at that port. > It appears that the port may be taken in between, and the later bind then throws the exception.
Related test failures are suspected, such as https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ ```scala [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** (15 seconds, 426 milliseconds) [info] java.net.BindException: Address already in use [info] at sun.nio.ch.Net.bind0(Native Method) [info] at sun.nio.ch.Net.bind(Net.java:433) [info] at sun.nio.ch.Net.bind(Net.java:425) [info] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) [info] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422) [info] at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) ``` After comparing the error stack trace with similar issues reported in different projects, such as https://issues.apache.org/jira/browse/KAFKA-3453 https://issues.apache.org/jira/browse/HBASE-14734 We can be sure that they are caused by the same problem issued in HADOOP-12656. In the PR, We apply the approach from HBASE first before we finally drop Hadoop 2.7.x ### Why are the changes needed? 
fix test flakiness ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? the test itself passing Jenkins Closes #28442 from yaooqinn/SPARK-31631. Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro --- .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala index 275bca3..fc28968 100644 --- a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala @@ -19,10 +19,14 @@ package org.apache.spark.deploy.security import java.security.PrivilegedExceptionAction +import scala.util.control.NonFatal + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION import org.apache.hadoop.minikdc.MiniKdc import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.scalatest.concurrent.Eventually._ +import org.scalatest.time.SpanSugar._ import org.apache.spark.{SparkConf, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil @@ -88,8 +92,30 @@ class HadoopDelegationTokenManagerSuite extends SparkFunSuite {
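The race quoted from HADOOP-12656 — grab an ephemeral port with a throwaway socket, close it, then bind the real service to that port later — and the HBase-style retry workaround can be sketched generically in Python. This is illustrative only; the actual patch retries the MiniKdc setup in Scala:

```python
import errno
import socket

def find_free_port():
    # Ask the OS for an ephemeral port, then release it. Another process
    # may grab the port between close() and the later bind() -- this is
    # exactly the race described in HADOOP-12656.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def bind_server(port):
    # Stand-in for "starting the KDC": bind a listening socket on the port.
    s = socket.socket()
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

def start_with_retry(start_fn, attempts=3):
    # Retry the whole pick-port-then-bind sequence when the bind loses
    # the race, mirroring the workaround applied in this patch.
    last_exc = None
    for _ in range(attempts):
        port = find_free_port()
        try:
            return start_fn(port)
        except OSError as e:
            if e.errno != errno.EADDRINUSE:
                raise
            last_exc = e
    raise last_exc

server = start_with_retry(bind_server)
print(server.getsockname()[1] > 0)  # True
server.close()
```

Retrying does not eliminate the time-of-check/time-of-use window; it only makes losing the race several times in a row vanishingly unlikely, which is sufficient for test stability.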
[spark] branch master updated (bd6b53c -> b31ae7b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry add b31ae7b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/test_context.py | 2 +- .../sql/catalyst/expressions/Expression.scala | 14 ++ .../catalyst/expressions/datetimeExpressions.scala | 31 ++- .../sql/catalyst/expressions/nullExpressions.scala | 10 +- .../catalyst/expressions/stringExpressions.scala | 4 +- .../apache/spark/sql/catalyst/util/package.scala | 2 + .../sql-functions/sql-expression-schema.md | 16 +- .../test/resources/sql-tests/inputs/extract.sql| 5 + .../sql-tests/results/ansi/datetime.sql.out| 64 +++--- .../sql-tests/results/ansi/interval.sql.out| 8 +- .../sql-tests/results/csv-functions.sql.out| 2 +- .../resources/sql-tests/results/datetime.sql.out | 64 +++--- .../resources/sql-tests/results/extract.sql.out| 214 - .../sql-tests/results/group-by-filter.sql.out | 12 +- .../resources/sql-tests/results/interval.sql.out | 10 +- .../sql-tests/results/json-functions.sql.out | 2 +- .../sql-tests/results/postgreSQL/text.sql.out | 6 +- .../sql-tests/results/predicate-functions.sql.out | 26 +-- .../results/sql-compatibility-functions.sql.out| 18 +- .../sql-tests/results/string-functions.sql.out | 8 +- .../typeCoercion/native/dateTimeOperations.sql.out | 8 +- .../native/stringCastAndExpressions.sql.out| 4 +- .../scala/org/apache/spark/sql/ExplainSuite.scala | 6 +- 23 files changed, 292 insertions(+), 244 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (052ff49 -> bd6b53c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 052ff49 [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors add bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry No new revisions were added by this update. Summary of changes: .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry bd6b53c is described below commit bd6b53cc0ba93f7f1ff8e00ccc366cd02a24d72a Author: Kent Yao AuthorDate: Thu May 7 14:37:03 2020 +0900 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry ### What changes were proposed in this pull request? The `Kafka*Suite`s are flaky because of the Hadoop MiniKdc issue - https://issues.apache.org/jira/browse/HADOOP-12656 > Looking at the MiniKdc implementation, if port is 0, the constructor uses a ServerSocket to find an unused port, assigns the port number to the member variable port, and closes the ServerSocket object; later, in initKDCServer(), it instantiates a TcpTransport object and binds at that port. > It appears that the port may be taken in between, and the later bind then throws the exception.
Related test failures are suspected, such as https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ ```scala [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** (15 seconds, 426 milliseconds) [info] java.net.BindException: Address already in use [info] at sun.nio.ch.Net.bind0(Native Method) [info] at sun.nio.ch.Net.bind(Net.java:433) [info] at sun.nio.ch.Net.bind(Net.java:425) [info] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) [info] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422) [info] at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) ``` After comparing the error stack trace with similar issues reported in different projects, such as https://issues.apache.org/jira/browse/KAFKA-3453 https://issues.apache.org/jira/browse/HBASE-14734 We can be sure that they are caused by the same problem issued in HADOOP-12656. In the PR, We apply the approach from HBASE first before we finally drop Hadoop 2.7.x ### Why are the changes needed? 
fix test flakiness ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? the test itself passing Jenkins Closes #28442 from yaooqinn/SPARK-31631. Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro --- .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala index 275bca3..fc28968 100644 --- a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala @@ -19,10 +19,14 @@ package org.apache.spark.deploy.security import java.security.PrivilegedExceptionAction +import scala.util.control.NonFatal + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION import org.apache.hadoop.minikdc.MiniKdc import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.scalatest.concurrent.Eventually._ +import org.scalatest.time.SpanSugar._ import org.apache.spark.{SparkConf, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil @@ -88,8 +92,30 @@ class HadoopDelegationTokenManagerSuite extends SparkFunSuite { // krb5.conf. MiniKdc set
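The race behind this flakiness — MiniKdc binds an ephemeral port with a `ServerSocket`, closes it, and only later rebinds the same port for the KDC — is easy to illustrate outside the JVM, as is the retry approach adopted from HBase. A minimal Python sketch (the function names and retry count are illustrative assumptions, not MiniKdc's actual API):

```python
import socket

def find_free_port():
    # Mimic MiniKdc's discovery step: bind port 0, record the assigned port,
    # then close the socket. Another process can grab the port before the KDC
    # rebinds it -- that window is the source of the 'address in use' error.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port

def start_with_retry(start_fn, attempts=3):
    # Retry-on-BindException pattern analogous to the HBase/Spark fix:
    # pick a fresh port and redo the whole startup when binding fails.
    last_err = None
    for _ in range(attempts):
        try:
            return start_fn(find_free_port())
        except OSError as err:  # java.net.BindException in the JVM world
            last_err = err
    raise last_err

def bind_server(port):
    # Stands in for initKDCServer(): actually bind and listen on the port.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    return srv

server = start_with_retry(bind_server)
print(server.getsockname()[1] > 0)  # → True
```

The imports the diff adds (`Eventually`, `SpanSugar`) suggest the Scala side expresses the same idea with ScalaTest's `eventually`: retry the whole KDC setup rather than trying to eliminate the race window.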
[spark] branch branch-3.0 updated: [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new dc7324e [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown dc7324e is described below commit dc7324e5e39783995b90e64d4737127c10a210cf Author: Liang-Chi Hsieh AuthorDate: Thu May 7 09:57:08 2020 +0900 [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown ### What changes were proposed in this pull request? This is a followup to address https://github.com/apache/spark/pull/28366#discussion_r420611872 by refining the SQL config document. ### Why are the changes needed? Make the config document less confusing for developers. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Only doc change. Closes #28468 from viirya/SPARK-31365-followup. Authored-by: Liang-Chi Hsieh Signed-off-by: Takeshi Yamamuro (cherry picked from commit 9bf738724a3895551464d8ba0d455bc90868983f) Signed-off-by: Takeshi Yamamuro --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8d673c5..6c18280 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2070,7 +2070,8 @@ object SQLConf { .internal() .doc("A comma-separated list of data source short names or fully qualified data source " + "implementation class names for which Spark tries to push down predicates for nested " + -"columns and/or names containing `dots` to data sources. Currently, Parquet implements " + +"columns and/or names containing `dots` to data sources. 
This configuration is only " + +"effective with file-based data source in DSv1. Currently, Parquet implements " + "both optimizations while ORC only supports predicates for names containing `dots`. The " + "other data sources don't support this feature yet. So the default value is 'parquet,orc'.") .version("3.0.0") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
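The comma-separated list described in this doc string is typically consulted as a simple membership test against the configured source names. A hedged Python sketch of that pattern (`pushdown_enabled` and its default are illustrative assumptions, not Spark's actual code):

```python
def pushdown_enabled(source_short_name, conf_value="parquet,orc"):
    # Split the configured list, normalize, and test membership. The default
    # mirrors the doc string above: only Parquet and ORC are enabled.
    enabled = {s.strip().lower() for s in conf_value.split(",") if s.strip()}
    return source_short_name.lower() in enabled

print(pushdown_enabled("parquet"))  # → True
print(pushdown_enabled("csv"))      # → False
```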
[spark] branch master updated (3d38bc2 -> 9bf7387)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3d38bc2 [SPARK-31361][SQL][FOLLOWUP] Use LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite add 9bf7387 [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f8a20c4 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration f8a20c4 is described below commit f8a20c470bf115b0834970ce02eb2ec103e0f6df Author: HyukjinKwon AuthorDate: Thu May 7 09:00:59 2020 +0900 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration ### What changes were proposed in this pull request? This PR proposes to deprecate 'spark.sql.optimizer.metadataOnly' configuration and remove it in the future release. ### Why are the changes needed? This optimization can cause a potential correctness issue, see also SPARK-26709. Also, it seems difficult to extend the optimization. Basically you should whitelist all available functions. It costs some maintenance overhead, see also SPARK-31590. Looks we should just better let users use `SparkSessionExtensions` instead if they must use, and remove it in Spark side. ### Does this PR introduce _any_ user-facing change? Yes, setting `spark.sql.optimizer.metadataOnly` will show a deprecation warning: ```scala scala> spark.conf.unset("spark.sql.optimizer.metadataOnly") ``` ``` 20/05/06 12:57:23 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule. ``` ```scala scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true") ``` ``` 20/05/06 12:57:44 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. 
If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule. ``` ### How was this patch tested? Manually tested. Closes #28459 from HyukjinKwon/SPARK-31647. Authored-by: HyukjinKwon Signed-off-by: Takeshi Yamamuro (cherry picked from commit 5c5dd77d6a29b014b3fe4b4015f5c7199650a378) Signed-off-by: Takeshi Yamamuro --- .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 51404a2..8d673c5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -844,8 +844,10 @@ object SQLConf { .doc("When true, enable the metadata-only query optimization that use the table's metadata " + "to produce the partition columns instead of table scans. It applies when all the columns " + "scanned are partition columns and the query has an aggregate operator that satisfies " + - "distinct semantics. By default the optimization is disabled, since it may return " + - "incorrect results when the files are empty.") + "distinct semantics. By default the optimization is disabled, and deprecated as of Spark " + + "3.0 since it may return incorrect results when the files are empty, see also SPARK-26709." + + "It will be removed in the future releases. 
If you must use, use 'SparkSessionExtensions' " + + "instead to inject it as a custom rule.") .version("2.1.1") .booleanConf .createWithDefault(false) @@ -2587,7 +2589,10 @@ object SQLConf { DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."), DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0", -s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.") +s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."), + DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0", +"Avoid to depend on this optimization to prevent a potential correctness issue. " + + "If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
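The deprecation warning shown in the shell transcript above is driven by the `DeprecatedConfig(key, version, comment)` entries this diff adds: touching a key found in that map logs a warning but still applies the change. A simplified Python sketch of the mechanism (not Spark's actual `SQLConf` implementation):

```python
import warnings

# Key -> (version deprecated in, extra guidance), mirroring DeprecatedConfig.
DEPRECATED = {
    "spark.sql.optimizer.metadataOnly": (
        "3.0",
        "Avoid to depend on this optimization to prevent a potential "
        "correctness issue. If you must use, use 'SparkSessionExtensions' "
        "instead to inject it as a custom rule."),
}

def set_conf(conf, key, value):
    # Warn when a deprecated key is set, then apply the setting anyway.
    if key in DEPRECATED:
        version, comment = DEPRECATED[key]
        warnings.warn(
            f"The SQL config '{key}' has been deprecated in Spark v{version} "
            f"and may be removed in the future. {comment}")
    conf[key] = value

conf = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_conf(conf, "spark.sql.optimizer.metadataOnly", "true")
print(conf["spark.sql.optimizer.metadataOnly"], len(caught))  # → true 1
```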
[spark] branch master updated (09ece50 -> 5c5dd77)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 09ece50 [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark add 5c5dd77 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ccde0a1 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ccde0a1 is described below commit ccde0a1ae2d880585cb554cc67f75ef972a78c67 Author: Dilip Biswal AuthorDate: Tue May 5 15:21:14 2020 +0900 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png - **sql-ref-datatypes.md (Scala)** https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png - **sql-ref-datatypes.md (Java)** https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png - **sql-ref-datatypes.md (Python)** https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png - **sql-ref-datatypes.md (R)** https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png - **sql-ref-datatypes.md (SQL)** 
https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png - **sql-ref-syntax-qry-select-tvf.md** https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal Signed-off-by: Takeshi Yamamuro (cherry picked from commit 5052d9557d964c07d0b8bd2e2b08ede7c6958118) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md | 542 +- docs/sql-ref-datatypes.md | 695 +- docs/sql-ref-datetime-pattern.md | 8 +- docs/sql-ref-null-semantics.md| 131 ++- docs/sql-ref-syntax-qry-select-tvf.md | 33 +- 5 files changed, 388 insertions(+), 1021 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 6cf1653..93fb10b 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -27,35 +27,10 @@ The casting behaviours are defined as store assignment rules in the standard. When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default. - -Property NameDefaultMeaningSince Version - - spark.sql.ansi.enabled - false - -(Experimental) When true, Spark tries to conform to the ANSI SQL specification: -1. Spark will throw a runtime exception if an overflow occurs in any operation on integral/decimal field. -2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser. - - 3.0.0 - - - spark.sql.storeAssignmentPolicy - ANSI - -(Experimental) When inserting a value into a column with different data type, Spark will perform type coercion. 
-Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. With ANSI policy, -Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. -It disallows certain unreasonable type conversions such as converting string to int or double to boolean. -With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose. -e.g. converting string to int or double to boolean is allowed. -It is also the only behavior in Spark 2.x and it is compatible with Hive. -With strict policy, Spark doesn't allow any possib
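The three store assignment policies quoted in the diff above (ANSI, legacy, strict) differ only in which source-to-destination conversions they accept on insert. A toy sketch of the policy semantics, using a deliberately tiny, assumed type lattice (illustrative only, not Spark's real coercion rules):

```python
# Illustrative-only rule sets; the real rules cover the full Spark type system.
POLICY_RULES = {
    "ANSI": {("int", "long"), ("int", "double"), ("int", "string")},
    "STRICT": {("int", "long")},  # only conversions that can never lose data
}

def can_store(src, dst, policy="ANSI"):
    if src == dst:
        return True
    if policy == "LEGACY":
        # Any valid Cast is accepted, e.g. string -> int or double -> boolean.
        return True
    return (src, dst) in POLICY_RULES[policy]

print(can_store("string", "int", "ANSI"))    # → False (unreasonable cast)
print(can_store("string", "int", "LEGACY"))  # → True
```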
[spark] branch master updated (8d1f7d2 -> 5052d95)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8d1f7d2 [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException add 5052d95 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 542 +- docs/sql-ref-datatypes.md | 695 +- docs/sql-ref-datetime-pattern.md | 8 +- docs/sql-ref-null-semantics.md| 131 ++- docs/sql-ref-syntax-qry-select-tvf.md | 33 +- 5 files changed, 388 insertions(+), 1021 deletions(-)
[spark] branch master updated (735771e -> 8d1f7d2)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 735771e [SPARK-31623][SQL][TESTS] Benchmark rebasing of INT96 and TIMESTAMP_MILLIS timestamps in read/write add 8d1f7d2 [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/CachedTableSuite.scala| 238 ++-- .../apache/spark/sql/ColumnExpressionSuite.scala | 40 +- .../org/apache/spark/sql/DataFrameSuite.scala | 87 +- .../spark/sql/DataFrameWindowFunctionsSuite.scala | 122 +- .../scala/org/apache/spark/sql/JoinSuite.scala | 270 ++-- .../org/apache/spark/sql/JsonFunctionsSuite.scala | 72 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 895 +++-- .../spark/sql/ScalaReflectionRelationSuite.scala | 96 +- .../scala/org/apache/spark/sql/SubquerySuite.scala | 40 +- .../apache/spark/sql/UserDefinedTypeSuite.scala| 14 +- .../sql/execution/SQLWindowFunctionSuite.scala | 433 --- .../columnar/InMemoryColumnarQuerySuite.scala | 46 +- .../sql/execution/datasources/json/JsonSuite.scala | 1367 ++-- .../sql/execution/joins/BroadcastJoinSuite.scala | 37 +- .../sql/execution/metric/SQLMetricsSuite.scala | 45 +- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 14 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 16 +- .../apache/spark/sql/sources/SaveLoadSuite.scala | 36 +- .../apache/spark/sql/streaming/StreamSuite.scala | 16 +- .../sql/streaming/continuous/ContinuousSuite.scala | 22 +- .../sql/hive/execution/AggregationQuerySuite.scala | 78 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 54 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 72 +- .../spark/sql/hive/execution/SQLQuerySuite.scala | 535 24 files changed, 2445 insertions(+), 2200 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: 
commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark`
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d400880 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` d400880 is described below commit d4008804f987fa3d3405335e2469886a0d61dd67 Author: Max Gekk AuthorDate: Mon May 4 09:39:50 2020 +0900 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` ### What changes were proposed in this pull request? - Changed to the number of rows in benchmark cases from 3 to the actual number `N`. - Regenerated benchmark results in the environment: | Item | Description | | | | | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 | ### Why are the changes needed? The changes are needed to have: - Correct benchmark results - Base line for other perf improvements that can be checked in the same environment. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the benchmark and checking its output. Closes #28440 from MaxGekk/SPARK-31527-DateTimeBenchmark-followup. 
Authored-by: Max Gekk Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2fb85f6b684843f337b6e73ba57ee9e57a53496d) Signed-off-by: Takeshi Yamamuro --- .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 474 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 474 ++--- .../execution/benchmark/DateTimeBenchmark.scala| 2 +- 3 files changed, 475 insertions(+), 475 deletions(-) diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt index 1004bcf..61b4c76 100644 --- a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt +++ b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt @@ -2,456 +2,456 @@ datetime +/- interval -Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.4 -Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz +OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz datetime +/- interval:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -date + interval(m) 919933 22 0.0 306237514.3 1.0X -date + interval(m, d) 910916 9 0.0 303338619.0 1.0X -date + interval(m, d, ms) 3912 3923 16 0.0 1303942791.7 0.2X -date - interval(m) 883887 6 0.0 294268789.3 1.0X -date - interval(m, d) 898911 18 0.0 299453403.0 1.0X -date - interval(m, d, ms) 3937 3944 11 0.0 1312269472.0 0.2X -timestamp + interval(m)2226 2236 14 0.0 741972014.3 0.4X -timestamp + interval(m, d) 2264 2274 13 0.0 754709121.0 0.4X -timestamp + interval(m, d, ms) 2202 2223 30 0.0 734001075.0 0.4X -timestamp - interval(m)1992 2005 17 0.0 664152744.7 0.5X -timestamp - interval(m, d) 2069 2075 9 0.0 689631159.0 0.4X -timestamp - interval(m, d, ms) 2240 2244 6 0.0 746538728.0 0.4X +date + interval(m) 1485 1567 116 6.7 148.5 1.0X +date + interval(m, d) 1504 1510 9 6.6 150.4 1.0X +date + interval(m, d, ms) 7000 7013 18 1.4 700.0 0.2X +date - interval(m) 1466 1478 17 6.8 146.6 1.0X +date - interval(m, d) 1533 1534 1 6.5 153.3 1.0X
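The columns in the regenerated results above are all derived from the best time and the row count N — which is exactly why reporting N = 3 instead of the real N skewed Rate(M/s) and Per Row(ns). A small sketch of the derivation (assuming N = 10,000,000, which reproduces the 148.5 ns/row and ~6.7 M rows/s on the first result line; the actual N used by the benchmark is not shown in this diff):

```python
def benchmark_columns(best_ms, n_rows, baseline_ms=None):
    # Per Row(ns) = best time / N; Rate(M/s) = N / best time; Relative is the
    # baseline case's time divided by this case's time.
    per_row_ns = best_ms * 1e6 / n_rows
    rate_m_per_s = n_rows / (best_ms / 1000.0) / 1e6
    relative = (baseline_ms / best_ms) if baseline_ms is not None else 1.0
    return per_row_ns, rate_m_per_s, relative

# The 1485 ms "date + interval(m)" case under the assumed N:
per_row, rate, rel = benchmark_columns(1485, 10_000_000)
print(round(per_row, 1), round(rate, 1))  # → 148.5 6.7
```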
[spark] branch master updated (f53d8c6 -> 2fb85f6)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f53d8c6 [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical add 2fb85f6 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` No new revisions were added by this update. Summary of changes: .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 474 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 474 ++--- .../execution/benchmark/DateTimeBenchmark.scala| 2 +- 3 files changed, 475 insertions(+), 475 deletions(-)
[spark] branch branch-3.0 updated: [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 3aa659c  [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

3aa659c is described below

commit 3aa659ce29877f386a24da9d04e66069d04afaa8
Author: Max Gekk
AuthorDate: Sat May 2 17:54:36 2020 +0900

    [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

    ### What changes were proposed in this pull request?
    Set `spark.ui.enabled` to `false` in `SqlBasedBenchmark.getSparkSession`. This disables the UI in all SQL benchmarks by default.

    ### Why are the changes needed?
    UI overhead lowers numbers in the `Relative` column and impacts `Stdev` in benchmark results.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Checked by running `DateTimeRebaseBenchmark`.

    Closes #28432 from MaxGekk/ui-off-in-benchmarks.

    Authored-by: Max Gekk
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 13dddee9a8490ead00ff00bd741db4a170dfd759)
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala | 2 --
 .../apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala | 2 --
 .../org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala   | 2 ++
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
index d29c5e3..0fc43c7 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
@@ -23,7 +23,6 @@ import scala.util.Random

 import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.internal.config.UI._
 import org.apache.spark.sql.{DataFrame, DataFrameWriter, Row, SparkSession}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.datasources.parquet.{SpecificParquetRecordReaderBase, VectorizedParquetRecordReader}
@@ -52,7 +51,6 @@ object DataSourceReadBenchmark extends SqlBasedBenchmark {
       .set("spark.master", "local[1]")
       .setIfMissing("spark.driver.memory", "3g")
       .setIfMissing("spark.executor.memory", "3g")
-      .setIfMissing(UI_ENABLED, false)

     val sparkSession = SparkSession.builder.config(conf).getOrCreate()

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
index 444ffa4..b3f65d4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
@@ -23,7 +23,6 @@ import scala.util.Random

 import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.internal.config.UI._
 import org.apache.spark.sql.{DataFrame, SparkSession}
 import org.apache.spark.sql.functions.monotonically_increasing_id
 import org.apache.spark.sql.internal.SQLConf
@@ -49,7 +48,6 @@ object FilterPushdownBenchmark extends SqlBasedBenchmark {
       .set("spark.master", "local[1]")
       .setIfMissing("spark.driver.memory", "3g")
       .setIfMissing("spark.executor.memory", "3g")
-      .setIfMissing(UI_ENABLED, false)
       .setIfMissing("orc.compression", "snappy")
       .setIfMissing("spark.sql.parquet.compression.codec", "snappy")

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
index ee7a03e..28387dc 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql.execution.benchmark

 import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.internal.config.UI.UI_ENABLED
 import org.apache.spark.sql.{Dataset, SparkSession}
 import org.apache.spark.sql.SaveMode.Overwrite
 import org.apache.spark.sql.catalyst.plans.SQLHelper
@@ -37,6 +38,7 @@ trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
       .appName(this.getClass.getCanonicalName)
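The direction of this change can be sketched outside Spark's source tree: move the UI switch into the one shared session builder instead of repeating `.setIfMissing(UI_ENABLED, false)` in every benchmark. A minimal sketch, assuming only that benchmarks obtain their session from a single `getSparkSession`-style helper as `SqlBasedBenchmark` does (the trait name `UiOffBenchmark` below is illustrative, not Spark code):

```scala
import org.apache.spark.internal.config.UI.UI_ENABLED
import org.apache.spark.sql.SparkSession

trait UiOffBenchmark {
  // One shared place to turn the UI off: every benchmark that mixes this in
  // inherits the setting, so individual benchmarks no longer need to disable
  // the UI themselves.
  def getSparkSession: SparkSession = {
    SparkSession.builder
      .master("local[1]")
      .appName(this.getClass.getCanonicalName)
      // Avoids UI overhead skewing the Relative column and Stdev numbers.
      .config(UI_ENABLED.key, false)
      .getOrCreate()
  }
}
```

Using `config(UI_ENABLED.key, false)` in the shared builder (rather than `setIfMissing` in each benchmark's `SparkConf`) matches the diff's design choice: the default lives in one place, and a specific benchmark can still override it explicitly if it wants the UI.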
[spark] branch master updated (75da050 -> 13dddee)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 75da050  [MINOR][SQL][DOCS] Remove two leading spaces from sql tables
     add 13dddee  [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala | 2 --
 .../apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala | 2 --
 .../org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala   | 2 ++
 3 files changed, 2 insertions(+), 4 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 1222ce0  [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

1222ce0 is described below

commit 1222ce064f97ed9ad34e2fca4d270762592a1854
Author: Pablo Langa
AuthorDate: Fri May 1 22:09:04 2020 +0900

    [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

    ### What changes were proposed in this pull request?
    The collect_set() aggregate function should produce a set of distinct elements. When the column argument's type is BinaryType, this is not the case. Example:

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions.Window

    case class R(id: String, value: String, bytes: Array[Byte])
    def makeR(id: String, value: String) = R(id, value, value.getBytes)
    val df = Seq(makeR("a", "dog"), makeR("a", "cat"), makeR("a", "cat"), makeR("b", "fish")).toDF()

    // In the example below "bytesSet" erroneously has duplicates but "stringSet" does not (as expected).
    df.agg(collect_set('value) as "stringSet", collect_set('bytes) as "byteSet").show(truncate=false)

    // The same problem is displayed when using window functions.
    val win = Window.partitionBy('id).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    val result = df.select(
        collect_set('value).over(win) as "stringSet",
        collect_set('bytes).over(win) as "bytesSet")
      .select('stringSet, 'bytesSet,
        size('stringSet) as "stringSetSize",
        size('bytesSet) as "bytesSetSize")
      .show()
    ```

    We use a HashSet buffer to accumulate the results. The problem is that array equality in Scala does not behave as expected: arrays are plain Java arrays, and equality does not compare their contents, so `Array(1, 2, 3) == Array(1, 2, 3)` evaluates to `false` and duplicates are not removed from the HashSet. The proposed solution is that in the last stage, once all the data is in the HashSet buffer, we remove duplicates by changing the type of the elements and then transform the result back to the original type. This transformation is only applied for BinaryType.

    ### Why are the changes needed?
    Fix the bug explained above.

    ### Does this PR introduce any user-facing change?
    Yes. Now `collect_set()` correctly deduplicates arrays of bytes.

    ### How was this patch tested?
    Unit testing

    Closes #28351 from planga82/feature/SPARK-31500_COLLECT_SET_bug.

    Authored-by: Pablo Langa
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 4fecc20f6ecdfe642890cf0a368a85558c40a47c)
    Signed-off-by: Takeshi Yamamuro
---
 .../catalyst/expressions/aggregate/collect.scala   | 45 +++---
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 16
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index be972f0..8dc3171 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable

 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.ArrayData
 import org.apache.spark.sql.catalyst.util.GenericArrayData
 import org.apache.spark.sql.types._

@@ -46,13 +47,15 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
   // actual order of input rows.
   override lazy val deterministic: Boolean = false

+  protected def convertToBufferElement(value: Any): Any
+
   override def update(buffer: T, input: InternalRow): T = {
     val value = child.eval(input)

     // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
     // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
     if (value != null) {
-      buffer += InternalRow.copyValue(value)
+      buffer += convertToBufferElement(value)
     }
     buffer
   }
@@ -61,12 +64,10 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
     buffer ++= other
   }

-  override def eval(buffer: T):
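The root cause described above can be reproduced with plain Scala collections, with no Spark dependency. A minimal sketch; the `dedupBinary` helper is illustrative of the fix's idea (compare byte arrays by content, then convert back), not the patch's actual code:

```scala
import scala.collection.mutable

object BinarySetDemo {
  // Hypothetical helper mirroring the fix's approach: convert each array to a
  // Seq (which has structural equality and a content-based hashCode), dedupe
  // via a Set, then convert back to the original Array[Byte] representation.
  def dedupBinary(buffer: Iterable[Array[Byte]]): Seq[Array[Byte]] =
    buffer.map(_.toSeq).toSet.map((s: Seq[Byte]) => s.toArray).toSeq

  def main(args: Array[String]): Unit = {
    // Java arrays use reference equality and identity hashCodes...
    println(Array(1, 2, 3) == Array(1, 2, 3)) // false

    // ...so a HashSet of Array[Byte] keeps "duplicate" contents.
    val set = mutable.HashSet[Array[Byte]]()
    set += "cat".getBytes
    set += "cat".getBytes
    println(set.size)                 // 2: not deduplicated
    println(dedupBinary(set).length)  // 1: deduplicated by content
  }
}
```

This also shows why the patch applies the conversion only at `eval` time, once the buffer is complete: converting every element on `update` would pay the wrapping cost on each input row, while a single pass at the end dedupes the finished buffer.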
[spark] branch branch-3.0 updated: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 1795a70  [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

1795a70 is described below

commit 1795a70bb04fad1b8cf76271443a448f8d72fc8a
Author: Pablo Langa
AuthorDate: Fri May 1 22:09:04 2020 +0900

    [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

    ### What changes were proposed in this pull request?
    The collect_set() aggregate function should produce a set of distinct elements. When the column argument's type is BinaryType, this is not the case. Example:

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions.Window

    case class R(id: String, value: String, bytes: Array[Byte])
    def makeR(id: String, value: String) = R(id, value, value.getBytes)
    val df = Seq(makeR("a", "dog"), makeR("a", "cat"), makeR("a", "cat"), makeR("b", "fish")).toDF()

    // In the example below "bytesSet" erroneously has duplicates but "stringSet" does not (as expected).
    df.agg(collect_set('value) as "stringSet", collect_set('bytes) as "byteSet").show(truncate=false)

    // The same problem is displayed when using window functions.
    val win = Window.partitionBy('id).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    val result = df.select(
        collect_set('value).over(win) as "stringSet",
        collect_set('bytes).over(win) as "bytesSet")
      .select('stringSet, 'bytesSet,
        size('stringSet) as "stringSetSize",
        size('bytesSet) as "bytesSetSize")
      .show()
    ```

    We use a HashSet buffer to accumulate the results. The problem is that array equality in Scala does not behave as expected: arrays are plain Java arrays, and equality does not compare their contents, so `Array(1, 2, 3) == Array(1, 2, 3)` evaluates to `false` and duplicates are not removed from the HashSet. The proposed solution is that in the last stage, once all the data is in the HashSet buffer, we remove duplicates by changing the type of the elements and then transform the result back to the original type. This transformation is only applied for BinaryType.

    ### Why are the changes needed?
    Fix the bug explained above.

    ### Does this PR introduce any user-facing change?
    Yes. Now `collect_set()` correctly deduplicates arrays of bytes.

    ### How was this patch tested?
    Unit testing

    Closes #28351 from planga82/feature/SPARK-31500_COLLECT_SET_bug.

    Authored-by: Pablo Langa
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 4fecc20f6ecdfe642890cf0a368a85558c40a47c)
    Signed-off-by: Takeshi Yamamuro
---
 .../catalyst/expressions/aggregate/collect.scala   | 45 +++---
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 16
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index 5848aa3..0a3d876 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable

 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.ArrayData
 import org.apache.spark.sql.catalyst.util.GenericArrayData
 import org.apache.spark.sql.types._

@@ -46,13 +47,15 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
   // actual order of input rows.
   override lazy val deterministic: Boolean = false

+  protected def convertToBufferElement(value: Any): Any
+
   override def update(buffer: T, input: InternalRow): T = {
     val value = child.eval(input)

     // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
     // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
     if (value != null) {
-      buffer += InternalRow.copyValue(value)
+      buffer += convertToBufferElement(value)
     }
     buffer
   }
@@ -61,12 +64,10 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
     buffer ++= other
   }

-  override def eval(buffer: T):
[spark] branch master updated (b7cde42 -> 4fecc20)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7cde42 [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout" add 4fecc20 [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements No new revisions were added by this update. Summary of changes: .../catalyst/expressions/aggregate/collect.scala | 45 +++--- .../apache/spark/sql/DataFrameAggregateSuite.scala | 16 2 files changed, 55 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 7c6b970 [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite 7c6b970 is described below commit 7c6b9708b6fbc81d583081a7b027fe1cce493b6c Author: Takeshi Yamamuro AuthorDate: Fri May 1 18:37:41 2020 +0900 [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite ### What changes were proposed in this pull request? This PR is a follow-up PR to update the golden file of `ExpressionsSchemaSuite`. ### Why are the changes needed? To recover tests in branch-3.0. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28427 from maropu/SPARK-31372-FOLLOWUP. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../src/test/resources/sql-functions/sql-expression-schema.md| 9 ++--- .../test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala | 7 ++- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md index 1e22ae2..2091de2 100644 --- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md +++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md @@ -1,6 +1,6 @@ ## Summary - - Number of queries: 333 + - Number of queries: 328 - Number of expressions that missing example: 34 - Expressions missing examples: and,string,tinyint,double,smallint,date,decimal,boolean,float,binary,bigint,int,timestamp,cume_dist,dense_rank,input_file_block_length,input_file_block_start,input_file_name,lag,lead,monotonically_increasing_id,ntile,struct,!,not,or,percent_rank,rank,row_number,spark_partition_id,version,window,positive,count_min_sketch 
## Schema of Built-in Functions @@ -123,7 +123,7 @@ | org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual | >= | SELECT 2 >= 1 | struct<(2 >= 1):boolean> | | org.apache.spark.sql.catalyst.expressions.Greatest | greatest | SELECT greatest(10, 9, 2, 4, 3) | struct | | org.apache.spark.sql.catalyst.expressions.Grouping | grouping | SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name) | struct | -| org.apache.spark.sql.catalyst.expressions.GroupingID | grouping_id | SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height) | struct | +| org.apache.spark.sql.catalyst.expressions.GroupingID | grouping_id | SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height) | struct | | org.apache.spark.sql.catalyst.expressions.Hex | hex | SELECT hex(17) | struct | | org.apache.spark.sql.catalyst.expressions.Hour | hour | SELECT hour('2009-07-30 12:58:59') | struct | | org.apache.spark.sql.catalyst.expressions.Hypot | hypot | SELECT hypot(3, 4) | struct | @@ -140,7 +140,6 @@ | org.apache.spark.sql.catalyst.expressions.IsNaN | isnan | SELECT isnan(cast('NaN' as double)) | struct | | org.apache.spark.sql.catalyst.expressions.IsNotNull | isnotnull | SELECT isnotnull(1) | struct<(1 IS NOT NULL):boolean> | | org.apache.spark.sql.catalyst.expressions.IsNull | isnull | SELECT isnull(1) | struct<(1 IS NULL):boolean> | -| org.apache.spark.sql.catalyst.expressions.JsonObjectKeys | json_object_keys | SELECT json_object_keys('{}') | struct> | | org.apache.spark.sql.catalyst.expressions.JsonToStructs | from_json | SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE') | struct> | | org.apache.spark.sql.catalyst.expressions.JsonTuple | json_tuple | SELECT json_tuple('{"a":1, "b":2}', 'a', 'b') | struct | | 
org.apache.spark.sql.catalyst.expressions.Lag | lag | N/A | N/A | @@ -151,7 +150,6 @@ | org.apache.spark.sql.catalyst.expressions.Length | character_length | SELECT character_length('Spark SQL ') | struct | | org.apache.spark.sql.catalyst.expressions.Length | char_length | SELECT char_length('Spark SQL ') | struct | | org.apache.spark.sql.catalyst.expressions.Length | length | SELECT length('Spark SQL ') | struct | -| org.apache.spark.sql.catalyst.expressions.LengthOfJsonArray | json_array_length | SELECT json_array_length('[1,2,3,4]') | struct | | org.apache.spark.sql.catalyst.express
[spark] branch branch-3.0 updated: [SPARK-31612][SQL][DOCS] SQL Reference clean up
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f7c1feb [SPARK-31612][SQL][DOCS] SQL Reference clean up f7c1feb is described below commit f7c1feba123534bf9a64e7c381464c64c4572308 Author: Huaxin Gao AuthorDate: Fri May 1 06:30:35 2020 +0900 [SPARK-31612][SQL][DOCS] SQL Reference clean up ### What changes were proposed in this pull request? SQL Reference cleanup ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce _any_ user-facing change? Updated sql-ref-syntax-qry.html (before: https://user-images.githubusercontent.com/13592258/80677799-70b27280-8a6e-11ea-8e3f-a768f29d0377.png, after: https://user-images.githubusercontent.com/13592258/80677803-74de9000-8a6e-11ea-880c-aa05c53254a6.png) ### How was this patch tested? Manually build and check Closes #28417 from huaxingao/cleanup.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2410a45703b829391211caaf1a745511f95298ad) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-aux-describe-database.md | 2 +- docs/sql-ref-syntax-aux-show-tables.md | 2 +- docs/sql-ref-syntax-aux-show-views.md | 2 +- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 4 ++-- docs/sql-ref-syntax-ddl-alter-view.md | 6 +++--- docs/sql-ref-syntax-ddl-create-function.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 2 +- docs/sql-ref-syntax-ddl-create-view.md | 4 ++-- docs/sql-ref-syntax-ddl-drop-database.md | 6 +++--- docs/sql-ref-syntax-ddl-drop-function.md | 18 +- docs/sql-ref-syntax-ddl-drop-table.md | 2 +- docs/sql-ref-syntax-ddl-drop-view.md | 2 +- docs/sql-ref-syntax-ddl-truncate-table.md | 6 +++--- docs/sql-ref-syntax-dml-insert-into.md | 4 ++-- ...l-ref-syntax-dml-insert-overwrite-directory-hive.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-directory.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-table.md | 2 +- docs/sql-ref-syntax-qry-select-usedb.md| 2 +- docs/sql-ref-syntax-qry.md | 11 ++- 20 files changed, 51 insertions(+), 34 deletions(-) diff --git a/docs/sql-ref-syntax-aux-describe-database.md b/docs/sql-ref-syntax-aux-describe-database.md index 2f7b1ce..590438b 100644 --- a/docs/sql-ref-syntax-aux-describe-database.md +++ b/docs/sql-ref-syntax-aux-describe-database.md @@ -42,7 +42,7 @@ interchangeable. 
-### Example +### Examples {% highlight sql %} -- Create employees DATABASE diff --git a/docs/sql-ref-syntax-aux-show-tables.md b/docs/sql-ref-syntax-aux-show-tables.md index f4b3dff..cd54d45 100644 --- a/docs/sql-ref-syntax-aux-show-tables.md +++ b/docs/sql-ref-syntax-aux-show-tables.md @@ -52,7 +52,7 @@ SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ] -### Example +### Examples {% highlight sql %} -- List all tables in default database diff --git a/docs/sql-ref-syntax-aux-show-views.md b/docs/sql-ref-syntax-aux-show-views.md index 0d9210b..b1a8d3b 100644 --- a/docs/sql-ref-syntax-aux-show-views.md +++ b/docs/sql-ref-syntax-aux-show-views.md @@ -51,7 +51,7 @@ SHOW VIEWS [ { FROM | IN } database_name ] [ LIKE regex_pattern ] -### Example +### Examples {% highlight sql %} -- Create views in different databases, also create global/local temp views. CREATE VIEW sam AS SELECT id, salary FROM employee WHERE name = 'sam'; diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index 520aba3..65b85dc 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. {% highlight sql %} ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value, ... ) +SET DBPROPERTIES ( property_name = property_value [ , ... ] ) {% endhighlight %} ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index edb081b..0a74aa0 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -66,7 +66,7 @@ ALTER TABLE table_identifier partition_spec RENAME TO partition_spec Syntax {% highlight sql %} -ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , col_spec ... ] ) +ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , ... ] ) {% endhighl