[spark] branch master updated (b5297c4 -> 65286ae)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b5297c4  [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype
     add 65286ae  [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment

No new revisions were added by this update.

Summary of changes:
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

---
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 65286ae  [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment
65286ae is described below

commit 65286aec4b3c4e93d8beac6dd1b097ce97d53fd8
Author: ulysses
AuthorDate: Wed Jul 8 11:30:47 2020 +0900

    [SPARK-30703][SQL][FOLLOWUP] Update SqlBase.g4 invalid comment

    ### What changes were proposed in this pull request?

    Modify the comment of `SqlBase.g4`.

    ### Why are the changes needed?

    `docs/sql-keywords.md` has already moved to `docs/sql-ref-ansi-compliance.md#sql-keywords`.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    No need.

    Closes #29033 from ulysses-you/SPARK-30703-FOLLOWUP.

    Authored-by: ulysses
    Signed-off-by: Takeshi Yamamuro
---
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 691fde8..b383e03 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -1461,8 +1461,7 @@ nonReserved
 ;

 // NOTE: If you add a new token in the list below, you should update the list of keywords
-// in `docs/sql-keywords.md`. If the token is a non-reserved keyword,
-// please update `ansiNonReserved` and `nonReserved` as well.
+// and reserved tag in `docs/sql-ref-ansi-compliance.md#sql-keywords`.
 //
 // Start of the keywords list
[spark] branch master updated (a9247c3 -> 7b86838)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a9247c3  [SPARK-32033][SS][DSTREAMS] Use new poll API in Kafka connector executor side to avoid infinite wait
     add 7b86838  [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  20 +++
 .../spark/sql/execution/DataSourceScanExec.scala   |  29 ++-
 .../spark/sql/execution/QueryExecution.scala       |   2 +
 .../bucketing/CoalesceBucketsInSortMergeJoin.scala | 132 ++
 .../execution/datasources/FileSourceStrategy.scala |   1 +
 .../org/apache/spark/sql/DataFrameJoinSuite.scala  |   2 +-
 .../scala/org/apache/spark/sql/ExplainSuite.scala  |  17 ++
 .../scala/org/apache/spark/sql/SubquerySuite.scala |   2 +-
 .../CoalesceBucketsInSortMergeJoinSuite.scala      | 194 +
 .../spark/sql/sources/BucketedReadSuite.scala      | 137 ++-
 10 files changed, 523 insertions(+), 13 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoin.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInSortMergeJoinSuite.scala
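For context on SPARK-31350: coalescing bucketed tables for a sort merge join is only safe when one side's bucket count evenly divides the other's, since Spark's bucket id is `hash(key) % numBuckets`. A rough Python sketch of that divisibility check follows; the function names are illustrative, not Spark's API.

```python
def coalesced_bucket_counts(left_buckets, right_buckets):
    """Return the (left, right) bucket counts after coalescing, or None
    when coalescing does not apply. The side with more buckets is read as
    if it had the smaller count, so no shuffle is needed for the join."""
    if left_buckets == right_buckets:
        return None  # already aligned, nothing to coalesce
    big, small = max(left_buckets, right_buckets), min(left_buckets, right_buckets)
    if big % small != 0:
        return None  # bucket ids would not line up; coalescing is unsafe
    return (small, small)

def coalesced_bucket_id(original_bucket_id, small):
    # When small divides big, (hash % big) % small == hash % small,
    # so bucket i of the larger table folds into bucket i % small.
    return original_bucket_id % small
```

Usage-wise, a rule like this would fold each group of `big // small` buckets on the larger side into one coalesced bucket before the merge join.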
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit ed69190ce0762f3b741b8d175ef8d02da45f3183
Author: Takeshi Yamamuro
AuthorDate: Tue Jun 16 00:27:45 2020 +0900

    [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

    ### What changes were proposed in this pull request?

    This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved.

    ### Why are the changes needed?

    To comply with the ANSI/SQL standard.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added tests.

    Closes #28807 from maropu/SPARK-26905-2.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
---
 docs/sql-ref-ansi-compliance.md                    |   6 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   3 +
 .../resources/ansi-sql-2016-reserved-keywords.txt  | 401 +
 .../parser/TableIdentifierParserSuite.scala        |  24 +-
 4 files changed, 429 insertions(+), 5 deletions(-)

diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index eab194c..e5ca7e9d 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL.
 |ALTER|non-reserved|non-reserved|reserved|
 |ANALYZE|non-reserved|non-reserved|non-reserved|
 |AND|reserved|non-reserved|reserved|
-|ANTI|reserved|strict-non-reserved|non-reserved|
+|ANTI|non-reserved|strict-non-reserved|non-reserved|
 |ANY|reserved|non-reserved|reserved|
 |ARCHIVE|non-reserved|non-reserved|non-reserved|
 |ARRAY|non-reserved|non-reserved|reserved|
@@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL.
 |MAP|non-reserved|non-reserved|non-reserved|
 |MATCHED|non-reserved|non-reserved|non-reserved|
 |MERGE|non-reserved|non-reserved|non-reserved|
-|MINUS|reserved|strict-non-reserved|non-reserved|
+|MINUS|non-reserved|strict-non-reserved|non-reserved|
 |MINUTE|reserved|non-reserved|reserved|
 |MONTH|reserved|non-reserved|reserved|
 |MSCK|non-reserved|non-reserved|non-reserved|
@@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL.
 |SCHEMA|non-reserved|non-reserved|non-reserved|
 |SECOND|reserved|non-reserved|reserved|
 |SELECT|reserved|non-reserved|reserved|
-|SEMI|reserved|strict-non-reserved|non-reserved|
+|SEMI|non-reserved|strict-non-reserved|non-reserved|
 |SEPARATED|non-reserved|non-reserved|non-reserved|
 |SERDE|non-reserved|non-reserved|non-reserved|
 |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved|
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 14a6687..5821a74 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -994,6 +994,7 @@ ansiNonReserved
 | AFTER
 | ALTER
 | ANALYZE
+| ANTI
 | ARCHIVE
 | ARRAY
 | ASC
@@ -1126,10 +1127,12 @@ ansiNonReserved
 | ROW
 | ROWS
 | SCHEMA
+| SEMI
 | SEPARATED
 | SERDE
 | SERDEPROPERTIES
 | SET
+| SETMINUS
 | SETS
 | SHOW
 | SKEWED
diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
new file mode 100644
index 000..921491a
--- /dev/null
+++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
@@ -0,0 +1,401 @@
+-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords
+ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
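The keyword table patched above has one row per keyword, with a classification per parser mode. As a hedged illustration (the column names below are assumptions for the sketch, not taken from the Spark docs), a row like `|ANTI|non-reserved|strict-non-reserved|non-reserved|` can be parsed like this:

```python
import re

# One table row: |KEYWORD|<ansi mode>|<default mode>|<SQL:2016>|
ROW = re.compile(r"^\|([A-Z_]+)\|([a-z-]+)\|([a-z-]+)\|([a-z-]+)\|$")

def parse_keyword_row(line):
    """Parse one markdown keyword-table row into a dict, or return None
    for lines that are not keyword rows (headers, prose, hunk markers)."""
    m = ROW.match(line.strip())
    if not m:
        return None
    keyword, ansi, default, sql2016 = m.groups()
    return {"keyword": keyword, "ansi": ansi,
            "default": default, "sql2016": sql2016}
```

A check built on this could, for instance, verify that every row's third column is consistent with the grammar file, which is essentially what the diff above is keeping in sync by hand.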
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 764da2f  [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates
     new b70c68a  [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
     new ed69190  [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords

The 2 revisions listed above as "new" are entirely new to this repository and will be
described in separate emails. The revisions listed as "add" were already present in
the repository and have only been added to this reference.

Summary of changes:
 docs/sql-ref-ansi-compliance.md                    |   6 +-
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |   7 +
 .../resources/ansi-sql-2016-reserved-keywords.txt  | 401 ++
 .../parser/TableIdentifierParserSuite.scala        | 452 ++---
 4 files changed, 537 insertions(+), 329 deletions(-)
 create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b70c68ae458d929cbf28a084cecf8252b4a3849f
Author: Takeshi Yamamuro
AuthorDate: Sat Jun 13 07:12:27 2020 +0900

    [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file

    ### What changes were proposed in this pull request?

    This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217

    ### Why are the changes needed?

    It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #28802 from maropu/SPARK-31950-2.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/catalyst/parser/SqlBase.g4 |   4 +
 .../parser/TableIdentifierParserSuite.scala     | 432 +
 2 files changed, 110 insertions(+), 326 deletions(-)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 208a503..14a6687 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -989,6 +989,7 @@ alterColumnAction

 // You can find the full keywords list by searching "Start of the keywords list" in this file.
 // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords.
 ansiNonReserved
+//--ANSI-NON-RESERVED-START
 : ADD
 | AFTER
 | ALTER
@@ -1165,6 +1166,7 @@ ansiNonReserved
 | VIEW
 | VIEWS
 | WINDOW
+//--ANSI-NON-RESERVED-END
 ;

 // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL.
@@ -1442,6 +1444,7 @@ nonReserved
 //
 // Start of the keywords list
 //
+//--SPARK-KEYWORD-LIST-START
 ADD: 'ADD';
 AFTER: 'AFTER';
 ALL: 'ALL';
@@ -1694,6 +1697,7 @@ WHERE: 'WHERE';
 WINDOW: 'WINDOW';
 WITH: 'WITH';
 YEAR: 'YEAR';
+//--SPARK-KEYWORD-LIST-END
 //
 // End of the keywords list
 //
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
index bd617bf..04969e3 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
@@ -16,9 +16,14 @@
  */
 package org.apache.spark.sql.catalyst.parser

+import java.util.Locale
+
+import scala.collection.mutable
+
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.plans.SQLHelper
+import org.apache.spark.sql.catalyst.util.fileToString
 import org.apache.spark.sql.internal.SQLConf

 class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
@@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
     "where",
     "with")

-  // All the keywords in `docs/sql-keywords.md` are listed below:
-  val allCandidateKeywords = Set(
-"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment",
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross",
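The `//--SPARK-KEYWORD-LIST-START`/`END` markers added above let the test suite scrape the keyword list straight out of the grammar instead of maintaining it by hand. A simplified sketch of that extraction in Python (the real suite does this in Scala against the actual `SqlBase.g4` file):

```python
import re

def extract_spark_keywords(grammar_text):
    """Collect token names from definitions like ``ADD: 'ADD';`` that sit
    between the //--SPARK-KEYWORD-LIST-START and END markers, lowercased
    the way the parser suite compares identifiers."""
    in_list = False
    keywords = []
    for raw in grammar_text.splitlines():
        line = raw.strip()
        if line == "//--SPARK-KEYWORD-LIST-START":
            in_list = True
        elif line == "//--SPARK-KEYWORD-LIST-END":
            in_list = False
        elif in_list:
            m = re.match(r"([A-Z_]+):", line)
            if m:
                keywords.append(m.group(1).lower())
    return keywords
```

The same marker idea covers the `//--ANSI-NON-RESERVED-START`/`END` span, so the suite can diff the extracted sets against the documented reserved/non-reserved classification.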
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit ed69190ce0762f3b741b8d175ef8d02da45f3183 Author: Takeshi Yamamuro AuthorDate: Tue Jun 16 00:27:45 2020 +0900 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords ### What changes were proposed in this pull request? This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved. ### Why are the changes needed? To comply with the ANSI/SQL standard. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28807 from maropu/SPARK-26905-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index eab194c..e5ca7e9d 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL. |ALTER|non-reserved|non-reserved|reserved| |ANALYZE|non-reserved|non-reserved|non-reserved| |AND|reserved|non-reserved|reserved| -|ANTI|reserved|strict-non-reserved|non-reserved| +|ANTI|non-reserved|strict-non-reserved|non-reserved| |ANY|reserved|non-reserved|reserved| |ARCHIVE|non-reserved|non-reserved|non-reserved| |ARRAY|non-reserved|non-reserved|reserved| @@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL. 
|MAP|non-reserved|non-reserved|non-reserved| |MATCHED|non-reserved|non-reserved|non-reserved| |MERGE|non-reserved|non-reserved|non-reserved| -|MINUS|reserved|strict-non-reserved|non-reserved| +|MINUS|not-reserved|strict-non-reserved|non-reserved| |MINUTE|reserved|non-reserved|reserved| |MONTH|reserved|non-reserved|reserved| |MSCK|non-reserved|non-reserved|non-reserved| @@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL. |SCHEMA|non-reserved|non-reserved|non-reserved| |SECOND|reserved|non-reserved|reserved| |SELECT|reserved|non-reserved|reserved| -|SEMI|reserved|strict-non-reserved|non-reserved| +|SEMI|non-reserved|strict-non-reserved|non-reserved| |SEPARATED|non-reserved|non-reserved|non-reserved| |SERDE|non-reserved|non-reserved|non-reserved| |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 14a6687..5821a74 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -994,6 +994,7 @@ ansiNonReserved | AFTER | ALTER | ANALYZE +| ANTI | ARCHIVE | ARRAY | ASC @@ -1126,10 +1127,12 @@ ansiNonReserved | ROW | ROWS | SCHEMA +| SEMI | SEPARATED | SERDE | SERDEPROPERTIES | SET +| SETMINUS | SETS | SHOW | SKEWED diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt new file mode 100644 index 000..921491a --- /dev/null +++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt @@ -0,0 +1,401 @@ +-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords +ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN 
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 764da2f [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates new b70c68a [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file new ed69190 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 7 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 ++ .../parser/TableIdentifierParserSuite.scala| 452 ++--- 4 files changed, 537 insertions(+), 329 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
[spark] branch branch-3.0 updated (764da2f -> ed69190)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 764da2f [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates new b70c68a [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file new ed69190 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 7 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 ++ .../parser/TableIdentifierParserSuite.scala| 452 ++--- 4 files changed, 537 insertions(+), 329 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit ed69190ce0762f3b741b8d175ef8d02da45f3183 Author: Takeshi Yamamuro AuthorDate: Tue Jun 16 00:27:45 2020 +0900 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords ### What changes were proposed in this pull request? This PR intends to move keywords `ANTI`, `SEMI`, and `MINUS` from reserved to non-reserved. ### Why are the changes needed? To comply with the ANSI/SQL standard. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28807 from maropu/SPARK-26905-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index eab194c..e5ca7e9d 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -135,7 +135,7 @@ Below is a list of all the keywords in Spark SQL. |ALTER|non-reserved|non-reserved|reserved| |ANALYZE|non-reserved|non-reserved|non-reserved| |AND|reserved|non-reserved|reserved| -|ANTI|reserved|strict-non-reserved|non-reserved| +|ANTI|non-reserved|strict-non-reserved|non-reserved| |ANY|reserved|non-reserved|reserved| |ARCHIVE|non-reserved|non-reserved|non-reserved| |ARRAY|non-reserved|non-reserved|reserved| @@ -264,7 +264,7 @@ Below is a list of all the keywords in Spark SQL. 
|MAP|non-reserved|non-reserved|non-reserved| |MATCHED|non-reserved|non-reserved|non-reserved| |MERGE|non-reserved|non-reserved|non-reserved| -|MINUS|reserved|strict-non-reserved|non-reserved| +|MINUS|non-reserved|strict-non-reserved|non-reserved| |MINUTE|reserved|non-reserved|reserved| |MONTH|reserved|non-reserved|reserved| |MSCK|non-reserved|non-reserved|non-reserved| @@ -325,7 +325,7 @@ Below is a list of all the keywords in Spark SQL. |SCHEMA|non-reserved|non-reserved|non-reserved| |SECOND|reserved|non-reserved|reserved| |SELECT|reserved|non-reserved|reserved| -|SEMI|reserved|strict-non-reserved|non-reserved| +|SEMI|non-reserved|strict-non-reserved|non-reserved| |SEPARATED|non-reserved|non-reserved|non-reserved| |SERDE|non-reserved|non-reserved|non-reserved| |SERDEPROPERTIES|non-reserved|non-reserved|non-reserved| diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 14a6687..5821a74 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -994,6 +994,7 @@ ansiNonReserved | AFTER | ALTER | ANALYZE +| ANTI | ARCHIVE | ARRAY | ASC @@ -1126,10 +1127,12 @@ ansiNonReserved | ROW | ROWS | SCHEMA +| SEMI | SEPARATED | SERDE | SERDEPROPERTIES | SET +| SETMINUS | SETS | SHOW | SKEWED diff --git a/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt new file mode 100644 index 000..921491a --- /dev/null +++ b/sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt @@ -0,0 +1,401 @@ +-- This file comes from: https://github.com/postgres/postgres/tree/master/doc/src/sgml/keywords +ABS +ACOS +ALL +ALLOCATE +ALTER +AND +ANY +ARE +ARRAY +ARRAY_AGG +ARRAY_MAX_CARDINALITY +AS +ASENSITIVE +ASIN +ASYMMETRIC +AT +ATAN +ATOMIC +AUTHORIZATION +AVG +BEGIN
+BEGIN_FRAME +BEGIN_PARTITION +BETWEEN +BIGINT +BINARY +BLOB +BOOLEAN +BOTH +BY +CALL +CALLED +CARDINALITY +CASCADED +CASE +CAST +CEIL +CEILING +CHAR +CHAR_LENGTH +CHARACTER +CHARACTER_LENGTH +CHECK +CLASSIFIER +CLOB +CLOSE +COALESCE +COLLATE +COLLECT +COLUMN +COMMIT +CONDITION +CONNECT +CONSTRAINT +CONTAINS +CONVERT +COPY +CORR +CORRESPONDING +COS +COSH +COUNT +COVAR_POP +COVAR_SAMP +CREATE +CROSS +CUBE +CUME_DIST +CURRENT +CURRENT_CATALOG +CURRENT_DATE +CURRENT_DEFAULT_TRANSFORM_GROUP +CURRENT_PATH +CURRENT_ROLE +CURRENT_ROW +CURRENT_SCHEMA +CURRENT_TIME +CURRENT_TIMESTAMP +CURRENT_TRANSFORM_GROUP_FOR_TYPE +CURRENT_USER +CURSOR +CYCLE +DATE +DAY +DEALLOCATE +DEC +DECIMAL +DECFLOAT +DECLARE +DEFAULT +DEFINE +DELETE +DENSE_RANK +DEREF +DESCRIBE +DETERMINISTIC +DISCONNECT +DISTINCT +DOUBLE +DROP +DYNAMIC +EACH +ELEMENT +ELSE +EMPTY +END +END_FRAME +END_PARTITION +END-EXEC +EQUALS +ESCAPE +EVERY +EXCEPT +EXEC +EXECUTE +EXISTS +EXP +EXTERNAL +EXTRACT +FALSE +FETCH +FILTER +FIRST_VALUE
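The new `ansi-sql-2016-reserved-keywords.txt` resource above is a plain list: one keyword per line, preceded by a `--` comment crediting its PostgreSQL origin. A minimal sketch of how a test could load such a file — the helper name `loadAnsiReservedKeywords` is illustrative, not the exact code in `TableIdentifierParserSuite`:

```scala
// Sketch only: parse the SQL:2016 reserved-keyword resource added by this
// commit. Skips blank lines and the leading `--` SQL-style comment.
def loadAnsiReservedKeywords(fileText: String): Set[String] =
  fileText.split("\n")
    .map(_.trim)
    .filter(line => line.nonEmpty && !line.startsWith("--")) // drop comments
    .toSet
```

With such a set in hand, a suite can assert that keywords Spark treats as reserved in ANSI mode (e.g. `AND`, `SELECT`) appear in the standard list, while `ANTI`, `SEMI`, and `MINUS` — now non-reserved — need not.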
[spark] 01/02: [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit b70c68ae458d929cbf28a084cecf8252b4a3849f Author: Takeshi Yamamuro AuthorDate: Sat Jun 13 07:12:27 2020 +0900 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file ### What changes were proposed in this pull request? This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly. This approach is based on the cloud-fan suggestion: https://github.com/apache/spark/pull/28779#issuecomment-642033217 ### Why are the changes needed? It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28802 from maropu/SPARK-31950-2. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index 208a503..14a6687 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -989,6 +989,7 @@ alterColumnAction // You can find the full keywords list by searching "Start of the keywords list" in this file. // The non-reserved keywords are listed below. Keywords not in this list are reserved keywords. 
ansiNonReserved +//--ANSI-NON-RESERVED-START : ADD | AFTER | ALTER @@ -1165,6 +1166,7 @@ ansiNonReserved | VIEW | VIEWS | WINDOW +//--ANSI-NON-RESERVED-END ; // When `SQL_standard_keyword_behavior=false`, there are 2 kinds of keywords in Spark SQL. @@ -1442,6 +1444,7 @@ nonReserved // // Start of the keywords list // +//--SPARK-KEYWORD-LIST-START ADD: 'ADD'; AFTER: 'AFTER'; ALL: 'ALL'; @@ -1694,6 +1697,7 @@ WHERE: 'WHERE'; WINDOW: 'WINDOW'; WITH: 'WITH'; YEAR: 'YEAR'; +//--SPARK-KEYWORD-LIST-END // // End of the keywords list // diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala index bd617bf..04969e3 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala @@ -16,9 +16,14 @@ */ package org.apache.spark.sql.catalyst.parser +import java.util.Locale + +import scala.collection.mutable + import org.apache.spark.SparkFunSuite import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.plans.SQLHelper +import org.apache.spark.sql.catalyst.util.fileToString import org.apache.spark.sql.internal.SQLConf class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { @@ -285,334 +290,109 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper { "where", "with") - // All the keywords in `docs/sql-keywords.md` are listed below: - val allCandidateKeywords = Set( -"add", -"after", -"all", -"alter", -"analyze", -"and", -"anti", -"any", -"archive", -"array", -"as", -"asc", -"at", -"authorization", -"between", -"both", -"bucket", -"buckets", -"by", -"cache", -"cascade", -"case", -"cast", -"change", -"check", -"clear", -"cluster", -"clustered", -"codegen", -"collate", -"collection", -"column", -"columns", -"comment", 
-"commit", -"compact", -"compactions", -"compute", -"concatenate", -"constraint", -"cost", -"create", -"cross", -&q
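The `//--SPARK-KEYWORD-LIST-START`/`END` markers added to `SqlBase.g4` let the test suite scan the grammar file instead of hand-maintaining the keyword set it previously hard-coded. A minimal sketch of that marker-based extraction, under the assumption that the markers bracket every ANTLR token definition of the form `ADD: 'ADD';`; `extractKeywords` is an illustrative name, not the exact helper in `TableIdentifierParserSuite`:

```scala
import java.util.Locale

object KeywordExtractor {
  // Matches ANTLR lexer token definitions such as `ADD: 'ADD';`
  private val keywordDef = """([A-Z_]+):\s*'.*';""".r

  def extractKeywords(grammarText: String): Set[String] = {
    val lines = grammarText.split("\n").map(_.trim).toSeq
    val start = lines.indexWhere(_.startsWith("//--SPARK-KEYWORD-LIST-START"))
    val end   = lines.indexWhere(_.startsWith("//--SPARK-KEYWORD-LIST-END"))
    // Collect only lines between the markers that define a keyword token.
    lines.slice(start + 1, end).collect {
      case keywordDef(name) => name.toLowerCase(Locale.ROOT)
    }.toSet
  }
}
```

Because the keywords are recovered from the grammar itself, adding a token to `SqlBase.g4` inside the markers is automatically reflected in the test, which is the maintenance burden this commit removes.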
[spark] branch master updated (eae1747 -> 3698a14)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from eae1747 [SPARK-31959][SQL][TESTS][FOLLOWUP] Adopt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to outdated tzdb add 3698a14 [SPARK-26905][SQL] Follow the SQL:2016 reserved keywords No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md| 6 +- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 3 + .../resources/ansi-sql-2016-reserved-keywords.txt | 401 + .../parser/TableIdentifierParserSuite.scala| 24 +- 4 files changed, 429 insertions(+), 5 deletions(-) create mode 100644 sql/catalyst/src/test/resources/ansi-sql-2016-reserved-keywords.txt - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (78d08a8 -> a620a2a)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 78d08a8 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file add a620a2a [SPARK-31977][SQL] Returns the plan directly from NestedColumnAliasing No new revisions were added by this update. Summary of changes: .../sql/catalyst/optimizer/NestedColumnAliasing.scala | 19 --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 3 +-- 2 files changed, 13 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (28f131f -> 78d08a8)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 28f131f [SPARK-31979] Release script should not fail when remove non-existing files add 78d08a8 [SPARK-31950][SQL][TESTS] Extract SQL keywords from the SqlBase.g4 file No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 4 + .../parser/TableIdentifierParserSuite.scala| 432 + 2 files changed, 110 insertions(+), 326 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f61b31a  [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException
f61b31a is described below

commit f61b31a5a484c7e90920ec36c456594ce92cdf73
Author: Dilip Biswal
AuthorDate: Fri Jun 12 09:19:29 2020 +0900

    [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException

    ### What changes were proposed in this pull request?

    A minor fix to the `append` method of `StringConcat` that caps the length at
    `MAX_ROUNDED_ARRAY_LENGTH`, making sure it does not overflow and cause a
    StringIndexOutOfBoundsException.

    Thanks to **Jeffrey Stokes** for reporting the issue and explaining the
    underlying problem in detail in the JIRA.

    ### Why are the changes needed?

    This fixes a StringIndexOutOfBoundsException on overflow.

    ### Does this PR introduce any user-facing change?

    No.

    ### How was this patch tested?

    Added a test in StringUtilsSuite.

    Closes #28750 from dilipbiswal/SPARK-31916.

Authored-by: Dilip Biswal
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit b87a342c7dd51046fcbe323db640c825646fb8d4)
Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/util/StringUtils.scala      |  6 +++-
 .../spark/sql/catalyst/util/StringUtilsSuite.scala | 32 +-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
index b42ae4e..2a416d6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
@@ -123,7 +123,11 @@ object StringUtils extends Logging {
       val stringToAppend = if (available >= sLen) s else s.substring(0, available)
       strings.append(stringToAppend)
     }
-    length += sLen
+
+    // Keeps the total length of appended strings. Note that we need to cap the length at
+    // `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`; otherwise, we will overflow
+    // `length`, causing StringIndexOutOfBoundsException in the substring call above.
+    length = Math.min(length.toLong + sLen, ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH).toInt
   }
 }

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
index 67bc4bc..c68e89fc 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala
@@ -18,9 +18,11 @@
 package org.apache.spark.sql.catalyst.util

 import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.plans.SQLHelper
 import org.apache.spark.sql.catalyst.util.StringUtils._
+import org.apache.spark.sql.internal.SQLConf

-class StringUtilsSuite extends SparkFunSuite {
+class StringUtilsSuite extends SparkFunSuite with SQLHelper {

   test("escapeLikeRegex") {
     val expectedEscapedStrOne = "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E"
@@ -98,4 +100,32 @@ class StringUtilsSuite extends SparkFunSuite {
     assert(checkLimit("1234567"))
     assert(checkLimit("1234567890"))
   }
+
+  test("SPARK-31916: StringConcat doesn't overflow on many inputs") {
+    val concat = new StringConcat(maxLength = 100)
+    val stringToAppend = "Test internal index of StringConcat does not overflow with many " +
+      "append calls"
+    0.to((Integer.MAX_VALUE / stringToAppend.length) + 1).foreach { _ =>
+      concat.append(stringToAppend)
+    }
+    assert(concat.toString.length === 100)
+  }
+
+  test("SPARK-31916: verify that PlanStringConcat's output shows the actual length of the plan") {
+    withSQLConf(SQLConf.MAX_PLAN_STRING_LENGTH.key -> "0") {
+      val concat = new PlanStringConcat()
+      0.to(3).foreach { i =>
+        concat.append(s"plan fragment $i")
+      }
+      assert(concat.toString === "Truncated plan of 60 characters")
+    }
+
+    withSQLConf(SQLConf.MAX_PLAN_STRING_LENGTH.key -> "60") {
+      val concat = new PlanStringConcat()
+      0.to(2).foreach { i =>
+        concat.append(s"plan fragment $i")
+      }
+      assert(concat.toString === "plan fragment 0plan fragment 1... 15 more characters")
+    }
+  }
 }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
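The fix above hinges on plain integer-overflow arithmetic: `StringConcat` tracks the total appended length in an `Int`, so once it exceeds `Integer.MAX_VALUE` it wraps negative, and the "available" budget computed from it hands `substring` an out-of-range index. A minimal sketch of that arithmetic (in Java rather than Scala for a self-contained snippet; `cappedAdd` is a hypothetical helper, and `Integer.MAX_VALUE` stands in for the actual `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` cap):

```java
// Illustrative sketch only, not Spark's code: why the uncapped counter
// breaks, and how widening to long before capping avoids the wrap.
public class OverflowSketch {
    // Hypothetical stand-in for the capped update in StringConcat.append.
    static int cappedAdd(int length, int sLen) {
        return (int) Math.min((long) length + sLen, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        int length = Integer.MAX_VALUE - 10; // simulate many prior appends
        int sLen = 20;

        int uncapped = length + sLen;        // the old `length += sLen`: wraps negative
        int capped = cappedAdd(length, sLen);

        System.out.println(uncapped < 0);    // true: wrapped, so any budget
                                             // derived from it is garbage
        System.out.println(capped);          // 2147483647: pinned at the ceiling
    }
}
```

Once wrapped, `maxLength - length` becomes a huge positive "available" count, which is exactly the out-of-bounds end index the `substring` call in `append` then receives.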
[spark] branch master updated (88a4e55 -> b87a342)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 88a4e55 [SPARK-31765][WEBUI][TEST-MAVEN] Upgrade HtmlUnit >= 2.37.0 add b87a342 [SPARK-31916][SQL] StringConcat can lead to StringIndexOutOfBoundsException No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/util/StringUtils.scala | 6 +++- .../spark/sql/catalyst/util/StringUtilsSuite.scala | 32 +- 2 files changed, 36 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (4b625bd -> 89b1d46)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 4b625bd [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber add 89b1d46 [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list No new revisions were added by this update. Summary of changes: .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala| 1 + 2 files changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f3771c6 -> e14029b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f3771c6 [SPARK-31935][SQL] Hadoop file system config should be effective in data source options add e14029b [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list No new revisions were added by this update. Summary of changes: .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 1 + .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala| 1 + 2 files changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 89b1d46  [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list
89b1d46 is described below

commit 89b1d4614ef1a3d15ff0f1e745c770ebd8f5cddb
Author: Takeshi Yamamuro
AuthorDate: Wed Jun 10 16:29:43 2020 +0900

    [SPARK-26905][SQL] Add `TYPE` in the ANSI non-reserved list

    ### What changes were proposed in this pull request?

    This PR adds `TYPE` to the ANSI non-reserved list because it is not reserved
    in the standard. See SPARK-26905 for the full set of reserved/non-reserved
    keywords of `SQL:2016`.

    Note: the current master behaviour is as follows;
    ```
    scala> sql("SET spark.sql.ansi.enabled=false")
    scala> sql("create table t1 (type int)")
    res4: org.apache.spark.sql.DataFrame = []

    scala> sql("SET spark.sql.ansi.enabled=true")
    scala> sql("create table t2 (type int)")
    org.apache.spark.sql.catalyst.parser.ParseException:
    no viable alternative at input 'type'(line 1, pos 17)

    == SQL ==
    create table t2 (type int)
    -^^^
    ```

    ### Why are the changes needed?

    To follow the ANSI/SQL standard.

    ### Does this PR introduce _any_ user-facing change?

    Yes; users can now use `TYPE` as an identifier.

    ### How was this patch tested?

    Updated the keyword lists in `TableIdentifierParserSuite`.

    Closes #28773 from maropu/SPARK-26905.

    Authored-by: Takeshi Yamamuro
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit e14029b18df10db5094f8abf8b9874dbc9186b4e)
    Signed-off-by: Takeshi Yamamuro
---
 .../src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4   | 1 +
 .../apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala | 1 +
 2 files changed, 2 insertions(+)

diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index 2adaa9f..208a503 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -1153,6 +1153,7 @@ ansiNonReserved
 | TRIM
 | TRUE
 | TRUNCATE
+| TYPE
 | UNARCHIVE
 | UNBOUNDED
 | UNCACHE
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
index d5b0885..bd617bf 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala
@@ -513,6 +513,7 @@ class TableIdentifierParserSuite extends SparkFunSuite with SQLHelper {
     "transform",
     "true",
     "truncate",
+    "type",
     "unarchive",
     "unbounded",
     "uncache",
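The distinction the grammar change relies on can be illustrated outside ANTLR. The following is a hedged, stand-alone sketch — the keyword sets here are tiny illustrative samples, not Spark's actual lists: a non-reserved keyword remains usable as an identifier, while a reserved one is rejected.

```scala
// Illustrative sketch only: toy keyword sets, not Spark's real ANSI lists.
// In ANSI mode a token is usable as an identifier unless it is reserved;
// moving TYPE into the non-reserved set is what lets
// `create table t (type int)` parse.
val reservedKeywords = Set("SELECT", "FROM", "WHERE", "TABLE")
val ansiNonReserved  = Set("TRIM", "TRUE", "TRUNCATE", "TYPE", "UNARCHIVE")

def usableAsIdentifier(token: String): Boolean =
  !reservedKeywords.contains(token.toUpperCase)

assert(usableAsIdentifier("type"))     // non-reserved: fine as a column name
assert(!usableAsIdentifier("select"))  // reserved: rejected
```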
[spark] branch branch-3.0 updated: [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new fa608b9  [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns
fa608b9 is described below

commit fa608b949b854d716904f4e43a4a10c71742b3c6
Author: LantaoJin
AuthorDate: Sat Jun 6 07:35:25 2020 +0900

    [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns

    ### What changes were proposed in this pull request?

    ```sql
    CREATE TABLE t1(a STRING, B VARCHAR(10), C CHAR(10)) STORED AS parquet;
    CREATE TABLE t2 USING parquet PARTITIONED BY (b, c) AS SELECT * FROM t1;
    SELECT * FROM t2 WHERE b = 'A';
    ```

    The SQL above throws a MetaException:

    > Caused by: java.lang.reflect.InvocationTargetException
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:810)
    >   ... 114 more
    > Caused by: MetaException(message:Filtering is supported only on partition keys of type string, or integral types)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:184)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:439)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:356)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:278)
    >   at org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:583)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:3315)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2768)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:182)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$7.getJdoResult(ObjectStore.java:3248)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$7.getJdoResult(ObjectStore.java:3232)
    >   at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2974)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3250)
    >   at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:2906)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
    >   at com.sun.proxy.$Proxy25.getPartitionsByFilter(Unknown Source)
    >   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5093)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
    >   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
    >   at com.sun.proxy.$Proxy26.get_partitions_by_filter(Unknown Source)
    >   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:1232)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    >   at java.lang.reflect.Method.invoke(Method.java:498)
    >   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
    >   at com.sun.proxy.$Proxy27.listPartitionsByFilter(Unknown Source)
    >   at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByFilter(Hive.java:2679)
    >   ... 119 more

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a unit test.
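The underlying issue is a case mismatch between the column name written in the filter and the name stored in the catalog's partition schema. The following is a hedged sketch of the kind of resolution involved — the names and shapes here are illustrative, not the actual `HiveShim` code:

```scala
// Illustrative only, not the actual Spark fix: resolve the filter's column
// name against the partition schema case-insensitively, so that a filter on
// `b` still finds the partition column declared as `B`.
case class PartitionColumn(name: String, dataType: String)

val partitionSchema = Seq(
  PartitionColumn("B", "varchar(10)"),
  PartitionColumn("C", "char(10)"))

def resolve(filterColumn: String): Option[PartitionColumn] =
  partitionSchema.find(_.name.equalsIgnoreCase(filterColumn))

assert(resolve("b").exists(_.dataType == "varchar(10)"))  // matches despite case
assert(resolve("d").isEmpty)                              // unknown column
```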
[spark] branch master updated (fc6af9d -> 5079831)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fc6af9d [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting add 5079831 [SPARK-31904][SQL] Fix case sensitive problem of char and varchar partition columns No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/hive/client/HiveShim.scala | 3 ++- .../org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 10 ++ 2 files changed, 12 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 72c466e  [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator
72c466e is described below

commit 72c466e0c37e4cc639040161699b6c0bffde70d5
Author: sandeep katta
AuthorDate: Sun May 24 21:39:16 2020 +0900

    [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator

    ### What changes were proposed in this pull request?

    The `IntegralDivide` operator returns the Long data type, so the integer-overflow
    case should be handled. If the operands are of type Int, they are cast to Long.

    ### Why are the changes needed?

    As `IntegralDivide` returns the Long data type, integer overflow should not happen.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a UT and also tested in the local cluster.

    After fix:
    ![image](https://user-images.githubusercontent.com/35216143/82603361-25eccc00-9bd0-11ea-9ca7-001c539e628b.png)

    SQL test after fix:
    ![image](https://user-images.githubusercontent.com/35216143/82637689-f0250300-9c22-11ea-85c3-886ab2c23471.png)

    Before fix:
    ![image](https://user-images.githubusercontent.com/35216143/82637984-878a5600-9c23-11ea-9e47-5ce2fb923c01.png)

    Closes #28628 from sandeep-katta/branch3Backport.

    Authored-by: sandeep katta
    Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18
 .../sql/catalyst/expressions/arithmetic.scala      |  2 +-
 .../sql/catalyst/analysis/TypeCoercionSuite.scala  | 24 ++
 .../expressions/ArithmeticExpressionSuite.scala    |  7 +--
 .../sql-functions/sql-expression-schema.md         |  2 +-
 .../resources/sql-tests/results/operators.sql.out  |  8
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  8
 7 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
index c6e3f56..a6f8e12 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
@@ -61,6 +61,7 @@ object TypeCoercion {
       IfCoercion ::
       StackCoercion ::
       Division ::
+      IntegralDivision ::
       ImplicitTypeCasts ::
       DateTimeOperations ::
       WindowFrameCoercion ::
@@ -685,6 +686,23 @@ object TypeCoercion {
   }

   /**
+   * The DIV operator always returns long-type value.
+   * This rule cast the integral inputs to long type, to avoid overflow during calculation.
+   */
+  object IntegralDivision extends TypeCoercionRule {
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case e if !e.childrenResolved => e
+      case d @ IntegralDivide(left, right) =>
+        IntegralDivide(mayCastToLong(left), mayCastToLong(right))
+    }
+
+    private def mayCastToLong(expr: Expression): Expression = expr.dataType match {
+      case _: ByteType | _: ShortType | _: IntegerType => Cast(expr, LongType)
+      case _ => expr
+    }
+  }
+
+  /**
    * Coerces the type of different branches of a CASE WHEN statement to a common type.
    */
   object CaseWhenCoercion extends TypeCoercionRule {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 354845d..7c52183 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -412,7 +412,7 @@ case class IntegralDivide(
     left: Expression,
     right: Expression) extends DivModLike {

-  override def inputType: AbstractDataType = TypeCollection(IntegralType, DecimalType)
+  override def inputType: AbstractDataType = TypeCollection(LongType, DecimalType)

   override def dataType: DataType = LongType
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
index e37555f..1ea1ddb 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
@@ -1559,6 +1559,30 @@ class TypeCoercionSuite extends AnalysisTest {
 Li
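The overflow this rule guards against is easy to reproduce with plain JVM arithmetic (this is standalone Scala, not Spark code): since `div` must return a Long, the only Int input without a representable quotient is `Int.MinValue / -1`, and widening the operands first avoids the wrap-around.

```scala
// Int.MinValue has no positive counterpart in 32-bit two's complement,
// so Int division wraps around instead of producing 2147483648.
val a: Int = Int.MinValue            // -2147483648
val wrapped: Int = a / -1            // overflows back to -2147483648
val widened: Long = a.toLong / -1L   // 2147483648, correct for a Long result

assert(wrapped == Int.MinValue)
assert(widened == 2147483648L)
```

This is exactly why the coercion rule casts Byte/Short/Int operands to Long before `IntegralDivide` evaluates.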
[spark] branch branch-3.0 updated (576c224 -> 72c466e)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 576c224 [SPARK-31755][SQL][3.0] allow missing year/hour when parsing date/timestamp string add 72c466e [SPARK-31761][SQL][3.0] cast integer to Long to avoid IntegerOverflow for IntegralDivide operator No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/TypeCoercion.scala | 18 .../sql/catalyst/expressions/arithmetic.scala | 2 +- .../sql/catalyst/analysis/TypeCoercionSuite.scala | 24 ++ .../expressions/ArithmeticExpressionSuite.scala| 7 +-- .../sql-functions/sql-expression-schema.md | 2 +- .../resources/sql-tests/results/operators.sql.out | 8 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 8 7 files changed, 57 insertions(+), 12 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Fix typos: Github to GitHub
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new 9879569 Fix typos: Github to GitHub 9879569 is described below commit 9879569826e89be4addf3d1f977924cc28062e2c Author: John Bampton AuthorDate: Sun May 24 19:03:46 2020 +0900 Fix typos: Github to GitHub Author: John Bampton Closes #264 from jbampton/fix-word-case. --- contributing.md | 10 +- release-process.md| 4 ++-- site/contributing.html| 10 +- site/downloads.html | 2 +- site/release-process.html | 6 +++--- 5 files changed, 16 insertions(+), 16 deletions(-) diff --git a/contributing.md b/contributing.md index 3016a26..5c68d98 100644 --- a/contributing.md +++ b/contributing.md @@ -43,7 +43,7 @@ feedback on any performance or correctness issues found in the newer release. Contributing by Reviewing Changes Changes to Spark source code are proposed, reviewed and committed via -https://github.com/apache/spark/pulls";>Github pull requests (described later). +https://github.com/apache/spark/pulls";>GitHub pull requests (described later). Anyone can view and comment on active changes here. Reviewing others' changes is a good way to learn how the change process works and gain exposure to activity in various parts of the code. You can help by reviewing the changes and asking @@ -243,7 +243,7 @@ Once you've downloaded Spark, you can find instructions for installing and build JIRA Generally, Spark uses JIRA to track logical issues, including bugs and improvements, and uses -Github pull requests to manage the review and merge of specific code changes. That is, JIRAs are +GitHub pull requests to manage the review and merge of specific code changes. 
That is, JIRAs are used to describe _what_ should be fixed or changed, and high-level approaches, and pull requests describe _how_ to implement that change in the project's source code. For example, major design decisions are discussed in JIRA. @@ -300,7 +300,7 @@ Example: `Fix typos in Foo scaladoc` Pull Request -1. https://help.github.com/articles/fork-a-repo/";>Fork the Github repository at +1. https://help.github.com/articles/fork-a-repo/";>Fork the GitHub repository at https://github.com/apache/spark";>https://github.com/apache/spark if you haven't already 1. Clone your fork, create a new branch, push commits to the branch. 1. Consider whether documentation or tests need to be added or updated as part of the change, @@ -341,9 +341,9 @@ the `master` branch of `apache/spark`. (Only in special cases would the PR be op https://spark-prs.appspot.com/";>spark-prs.appspot.com and Title may be the JIRA's title or a more specific title describing the PR itself. 1. If the pull request is still a work in progress, and so is not ready to be merged, - but needs to be pushed to Github to facilitate review, then add `[WIP]` after the component. + but needs to be pushed to GitHub to facilitate review, then add `[WIP]` after the component. 1. Consider identifying committers or other contributors who have worked on the code being - changed. Find the file(s) in Github and click "Blame" to see a line-by-line annotation of + changed. Find the file(s) in GitHub and click "Blame" to see a line-by-line annotation of who changed the code last. You can add `@username` in the PR description to ping them immediately. 1. Please state that the contribution is your original work and that you license the work diff --git a/release-process.md b/release-process.md index d3a9f9f..8165d24 100644 --- a/release-process.md +++ b/release-process.md @@ -264,7 +264,7 @@ pick the release version from the list, then click on "Release Notes". 
Copy this Then run `jekyll build` to update the `site` directory. After merging the change into the `asf-site` branch, you may need to create a follow-up empty -commit to force synchronization between ASF's git and the web site, and also the github mirror. +commit to force synchronization between ASF's git and the web site, and also the GitHub mirror. For some reason synchronization seems to not be reliable for this repository. On a related note, make sure the version is marked as released on JIRA. Go find the release page as above, eg., @@ -278,7 +278,7 @@ releasing Spark 1.2.0, set the current tag to v1.2.0-rc2 and the previous tag to Once you have generated the initial contributors list, it is highly likely that there will be warnings about author names not being properly translated. To fix this, run https://github.com/apache/spark/blob/branch-1.1/dev/create-release/transla
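The "follow-up empty commit" mentioned above is created with `git commit --allow-empty`. A minimal, hypothetical illustration in a throwaway repository (paths and messages are illustrative, not part of the actual release process):

```python
import subprocess
import tempfile

repo = tempfile.mkdtemp()

def git(*args):
    # Helper: run a git command inside the throwaway repository.
    return subprocess.run(("git", "-C", repo) + args, check=True,
                          capture_output=True, text=True).stdout.strip()

git("init", "-q")
git("-c", "user.email=dev@example.org", "-c", "user.name=dev",
    "commit", "-q", "--allow-empty", "-m", "initial")
# The follow-up empty commit used to nudge ASF <-> GitHub synchronization:
git("-c", "user.email=dev@example.org", "-c", "user.name=dev",
    "commit", "-q", "--allow-empty", "-m", "Force site sync")
print(git("rev-list", "--count", "HEAD"))  # 2
```

The empty commit changes no files but still advances `asf-site`, which is enough to trigger a fresh push to the mirrors.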
[spark] branch branch-3.0 updated: [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 2183345 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input 2183345 is described below commit 218334523dacd116a03f2340ad89e33abe93e452 Author: Takeshi Yamamuro AuthorDate: Sat May 23 08:48:29 2020 +0900 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input ### What changes were proposed in this pull request? This PR intends to add trivial tests to check https://github.com/apache/spark/pull/27024 has already been fixed in the master. Closes #27024 ### Why are the changes needed? For test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests. Closes #28604 from maropu/SPARK-29854. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro (cherry picked from commit 7ca73f03fbc6e213c30e725bf480709ed036a376) Signed-off-by: Takeshi Yamamuro --- .../sql-tests/inputs/ansi/string-functions.sql | 1 + .../sql-tests/inputs/string-functions.sql | 6 +++- .../results/{ => ansi}/string-functions.sql.out| 34 +- .../sql-tests/results/string-functions.sql.out | 18 +++- 4 files changed, 50 insertions(+), 9 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql new file mode 100644 index 000..dd28e9b --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql @@ -0,0 +1 @@ +--IMPORT string-functions.sql diff --git a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql index 8e33471..f5ed203 100644 --- 
a/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/string-functions.sql @@ -48,4 +48,8 @@ SELECT trim(LEADING 'xyz' FROM 'zzzytestxyz'); SELECT trim(LEADING 'xy' FROM 'xyxXxyLAST WORD'); SELECT trim(TRAILING 'xyz' FROM 'testxxzx'); SELECT trim(TRAILING 'xyz' FROM 'xyztestxxzx'); -SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy'); \ No newline at end of file +SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy'); + +-- Check lpad/rpad with invalid length parameter +SELECT lpad('hi', 'invalid_length'); +SELECT rpad('hi', 'invalid_length'); diff --git a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out similarity index 87% copy from sql/core/src/test/resources/sql-tests/results/string-functions.sql.out copy to sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out index 43c18f5..b507713 100644 --- a/sql/core/src/test/resources/sql-tests/results/string-functions.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 34 +-- Number of queries: 36 -- !query @@ -63,7 +63,7 @@ struct +struct -- !query output ab abcdab NULL @@ -71,15 +71,16 @@ ab abcdab NULL -- !query select left(null, -2), left("abcd", -2), left("abcd", 0), left("abcd", 'a') -- !query schema -struct +struct<> -- !query output -NULL NULL +java.lang.NumberFormatException +invalid input syntax for type numeric: a -- !query select right("abcd", 2), right("abcd", 5), right("abcd", '2'), right("abcd", null) -- !query schema -struct +struct -- !query output cd abcdcd NULL @@ -87,9 +88,10 @@ cd abcdcd NULL -- !query select right(null, -2), right("abcd", -2), right("abcd", 0), right("abcd", 'a') -- !query schema -struct +struct<> -- !query output -NULL NULL +java.lang.NumberFormatException +invalid input syntax for type 
numeric: a -- !query @@ -274,3 +276,21 @@ SELECT trim(TRAILING 'xy' FROM 'TURNERyxXxy') struct -- !query output TURNERyxX + + +-- !query +SELECT lpad('hi', 'invalid_length') +-- !query schema +struct<> +-- !query output +java.lang.NumberFormatException +invalid input syntax for type numeric: invalid_length + + +-- !query +SELECT rpad('hi', 'invalid_length') +-- !query
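The behavior exercised by these tests — a malformed length argument raises in ANSI mode but yields NULL otherwise — can be sketched in Python. This is a hypothetical analogue of the two `.sql.out` results above, not Spark's actual implementation:

```python
def to_int(value, ansi=False):
    # ANSI mode raises on a malformed numeric string; the default
    # (non-ANSI) mode silently yields NULL (None here).
    try:
        return int(value)
    except (TypeError, ValueError):
        if ansi:
            raise ValueError(f"invalid input syntax for type numeric: {value}")
        return None

def lpad(s, length, pad=" ", ansi=False):
    # Minimal lpad: pad on the left up to `length`, truncate if longer.
    n = to_int(length, ansi=ansi)
    if n is None or s is None:
        return None
    if len(s) >= n:
        return s[:n]
    return (pad * n)[: n - len(s)] + s

print(lpad("hi", 5, "?"))            # ???hi
print(lpad("hi", "bad_length"))      # None  (non-ANSI: NULL)
try:
    lpad("hi", "bad_length", ansi=True)
except ValueError as e:
    print(e)                         # invalid input syntax for type numeric: bad_length
```

The point of the added tests is exactly this fork in behavior: the same input file is imported under `inputs/ansi/`, so the suite checks both the NULL result and the `NumberFormatException`.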
[spark] branch master updated (5a258b0 -> 7ca73f0)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5a258b0 [SPARK-30915][SS] CompactibleFileStreamLog: Avoid reading the metadata log file when finding the latest batch ID add 7ca73f0 [SPARK-29854][SQL][TESTS] Add tests to check lpad/rpad throw an exception for invalid length input No new revisions were added by this update. Summary of changes: .../sql-tests/inputs/ansi/string-functions.sql | 1 + .../sql-tests/inputs/string-functions.sql | 6 - .../results/{ => ansi}/string-functions.sql.out| 30 ++ .../sql-tests/results/string-functions.sql.out | 18 - 4 files changed, 48 insertions(+), 7 deletions(-) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/ansi/string-functions.sql copy sql/core/src/test/resources/sql-tests/results/{ => ansi}/string-functions.sql.out (90%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 23019aa [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref 23019aa is described below commit 23019aa429d8f0db52b1ed5e9e6dc00ea7b94740 Author: Huaxin Gao AuthorDate: Sat May 23 08:43:16 2020 +0900 [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref ### What changes were proposed in this pull request? Fix a few issues in SQL Reference ### Why are the changes needed? To make SQL Reference look better ### Does this PR introduce _any_ user-facing change? Yes. before: https://user-images.githubusercontent.com/13592258/82639052-d0f38a80-9bbc-11ea-81a4-22def4ca5cc0.png";> after: https://user-images.githubusercontent.com/13592258/82639063-d5b83e80-9bbc-11ea-84d1-8361e6bee949.png";> before: https://user-images.githubusercontent.com/13592258/82639252-3e9fb680-9bbd-11ea-863c-e6a6c2f83a06.png";> after: https://user-images.githubusercontent.com/13592258/82639265-42cbd400-9bbd-11ea-8df2-fc5c255b84d3.png";> before: https://user-images.githubusercontent.com/13592258/82639072-db158900-9bbc-11ea-9963-731881cda4fd.png";> after https://user-images.githubusercontent.com/13592258/82639082-dfda3d00-9bbc-11ea-9bd2-f922cc91f175.png";> ### How was this patch tested? Manually build and check Closes #28608 from huaxingao/doc_fix. 
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit ad9532a09c70bf6acc8b79b4fdbfcd6afadcbc91) Signed-off-by: Takeshi Yamamuro --- docs/_data/menu-sql.yaml | 42 ++-- docs/sql-ref-syntax-aux-conf-mgmt.md | 2 +- docs/sql-ref-syntax-qry.md | 35 +++--- docs/sql-ref-syntax.md | 28 docs/sql-ref.md | 16 +++--- 5 files changed, 67 insertions(+), 56 deletions(-) diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml index 57fc493..289a9d3 100644 --- a/docs/_data/menu-sql.yaml +++ b/docs/_data/menu-sql.yaml @@ -76,14 +76,6 @@ - text: SQL Reference url: sql-ref.html subitems: -- text: Data Types - url: sql-ref-datatypes.html -- text: Identifiers - url: sql-ref-identifier.html -- text: Literals - url: sql-ref-literals.html -- text: Null Semantics - url: sql-ref-null-semantics.html - text: ANSI Compliance url: sql-ref-ansi-compliance.html subitems: @@ -93,6 +85,27 @@ url: sql-ref-ansi-compliance.html#type-conversion - text: SQL Keywords url: sql-ref-ansi-compliance.html#sql-keywords +- text: Data Types + url: sql-ref-datatypes.html +- text: Datetime Pattern + url: sql-ref-datetime-pattern.html +- text: Functions + url: sql-ref-functions.html + subitems: + - text: Built-in Functions +url: sql-ref-functions-builtin.html + - text: Scalar UDFs (User-Defined Functions) +url: sql-ref-functions-udf-scalar.html + - text: UDAFs (User-Defined Aggregate Functions) +url: sql-ref-functions-udf-aggregate.html + - text: Integration with Hive UDFs/UDAFs/UDTFs +url: sql-ref-functions-udf-hive.html +- text: Identifiers + url: sql-ref-identifier.html +- text: Literals + url: sql-ref-literals.html +- text: Null Semantics + url: sql-ref-null-semantics.html - text: SQL Syntax url: sql-ref-syntax.html subitems: @@ -247,16 +260,3 @@ url: sql-ref-syntax-aux-resource-mgmt-list-file.html - text: LIST JAR url: sql-ref-syntax-aux-resource-mgmt-list-jar.html -- text: Functions - url: sql-ref-functions.html - subitems: - - text: Built-in Functions -url: 
sql-ref-functions-builtin.html - - text: Scalar UDFs (User-Defined Functions) -url: sql-ref-functions-udf-scalar.html - - text: UDAFs (User-Defined Aggregate Functions) -url: sql-ref-functions-udf-aggregate.html - - text: Integration with Hive UDFs/UDAFs/UDTFs -url: sql-ref-functions-udf-hive.html -- text: Datetime Pattern - url: sql-ref-datetime-pattern.html diff --git a/docs/sql-ref-syntax-aux-conf-mgmt.md b/docs/sql-ref-syntax-aux-conf-mgmt.md index f5e48ef2..1900fb7 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt.md @@ -20,4 +20,4 @@ license: | --- * [SET](sql-ref-syntax-aux-conf-mgmt-set.html) - * [UNSET](sql-ref-syntax-aux-conf-mgmt-reset.html) + * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) diff -
[spark] branch master updated (2115c55 -> ad9532a)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2115c55 [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions add ad9532a [SPARK-31612][SQL][DOCS][FOLLOW-UP] Fix a few issues in SQL ref No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 42 ++-- docs/sql-ref-syntax-aux-conf-mgmt.md | 2 +- docs/sql-ref-syntax-qry.md | 35 +++--- docs/sql-ref-syntax.md | 28 docs/sql-ref.md | 16 +++--- 5 files changed, 67 insertions(+), 56 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (f6053b9 -> 847d6d4)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from f6053b9 Preparing development version 3.0.1-SNAPSHOT add 847d6d4 [SPARK-31102][SQL][3.0] Spark-sql fails to parse when contains comment No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 2 +- .../spark/sql/catalyst/parser/PlanParserSuite.scala | 10 ++ .../sql/hive/thriftserver/SparkSQLCLIDriver.scala| 12 ++-- .../spark/sql/hive/thriftserver/CliSuite.scala | 20 +++- 4 files changed, 28 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new fafe0f3 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry fafe0f3 is described below commit fafe0f311cc1c48002b68f26ab9b274ffd565665 Author: Kent Yao AuthorDate: Thu May 7 14:37:03 2020 +0900 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry ### What changes were proposed in this pull request? The `Kafka*Suite`s are flaky because of the Hadoop MiniKdc issue - https://issues.apache.org/jira/browse/HADOOP-12656 > Looking at the MiniKdc implementation, if port is 0, the constructor uses a ServerSocket to find an unused port, assigns the port number to the member variable port, and closes the ServerSocket object; later, in initKDCServer(), it instantiates a TcpTransport object and binds at that port. > It appears that the port may be taken in between, and the later bind then throws the exception.
Related test failures are suspected, such as https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ ```scala [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** (15 seconds, 426 milliseconds) [info] java.net.BindException: Address already in use [info] at sun.nio.ch.Net.bind0(Native Method) [info] at sun.nio.ch.Net.bind(Net.java:433) [info] at sun.nio.ch.Net.bind(Net.java:425) [info] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) [info] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422) [info] at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) ``` After comparing the error stack trace with similar issues reported in different projects, such as https://issues.apache.org/jira/browse/KAFKA-3453 https://issues.apache.org/jira/browse/HBASE-14734 We can be sure that they are caused by the same problem issued in HADOOP-12656. In the PR, We apply the approach from HBASE first before we finally drop Hadoop 2.7.x ### Why are the changes needed? 
fix test flakiness ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? the test itself passing Jenkins Closes #28442 from yaooqinn/SPARK-31631. Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro --- .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala index 275bca3..fc28968 100644 --- a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala @@ -19,10 +19,14 @@ package org.apache.spark.deploy.security import java.security.PrivilegedExceptionAction +import scala.util.control.NonFatal + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION import org.apache.hadoop.minikdc.MiniKdc import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.scalatest.concurrent.Eventually._ +import org.scalatest.time.SpanSugar._ import org.apache.spark.{SparkConf, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil @@ -88,8 +92,30 @@ class HadoopDelegationTokenManagerSuite extends SparkFunSuite {
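The race quoted from HADOOP-12656 — grab an ephemeral port with a throwaway socket, close it, then bind the real service to that port later — and the HBase-style retry workaround can be sketched generically in Python. This is illustrative only; the actual patch retries the MiniKdc setup in Scala:

```python
import errno
import socket

def find_free_port():
    # Ask the OS for an ephemeral port, then release it. Another process
    # may grab the port between close() and the later bind() -- this is
    # exactly the race described in HADOOP-12656.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def bind_server(port):
    # Stand-in for "starting the KDC": bind a listening socket on the port.
    s = socket.socket()
    s.bind(("127.0.0.1", port))
    s.listen(1)
    return s

def start_with_retry(start_fn, attempts=3):
    # Retry the whole pick-port-then-bind sequence when the bind loses
    # the race, mirroring the workaround applied in this patch.
    last_exc = None
    for _ in range(attempts):
        port = find_free_port()
        try:
            return start_fn(port)
        except OSError as e:
            if e.errno != errno.EADDRINUSE:
                raise
            last_exc = e
    raise last_exc

server = start_with_retry(bind_server)
print(server.getsockname()[1] > 0)  # True
server.close()
```

Retrying does not eliminate the time-of-check/time-of-use window; it only makes losing the race several times in a row vanishingly unlikely, which is sufficient for test stability.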
[spark] branch master updated (bd6b53c -> b31ae7b)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry add b31ae7b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions No new revisions were added by this update. Summary of changes: python/pyspark/sql/tests/test_context.py | 2 +- .../sql/catalyst/expressions/Expression.scala | 14 ++ .../catalyst/expressions/datetimeExpressions.scala | 31 ++- .../sql/catalyst/expressions/nullExpressions.scala | 10 +- .../catalyst/expressions/stringExpressions.scala | 4 +- .../apache/spark/sql/catalyst/util/package.scala | 2 + .../sql-functions/sql-expression-schema.md | 16 +- .../test/resources/sql-tests/inputs/extract.sql| 5 + .../sql-tests/results/ansi/datetime.sql.out| 64 +++--- .../sql-tests/results/ansi/interval.sql.out| 8 +- .../sql-tests/results/csv-functions.sql.out| 2 +- .../resources/sql-tests/results/datetime.sql.out | 64 +++--- .../resources/sql-tests/results/extract.sql.out| 214 - .../sql-tests/results/group-by-filter.sql.out | 12 +- .../resources/sql-tests/results/interval.sql.out | 10 +- .../sql-tests/results/json-functions.sql.out | 2 +- .../sql-tests/results/postgreSQL/text.sql.out | 6 +- .../sql-tests/results/predicate-functions.sql.out | 26 +-- .../results/sql-compatibility-functions.sql.out| 18 +- .../sql-tests/results/string-functions.sql.out | 8 +- .../typeCoercion/native/dateTimeOperations.sql.out | 8 +- .../native/stringCastAndExpressions.sql.out| 4 +- .../scala/org/apache/spark/sql/ExplainSuite.scala | 6 +- 23 files changed, 292 insertions(+), 244 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (052ff49 -> bd6b53c)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 052ff49 [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors add bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry No new revisions were added by this update. Summary of changes: .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bd6b53c [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry bd6b53c is described below commit bd6b53cc0ba93f7f1ff8e00ccc366cd02a24d72a Author: Kent Yao AuthorDate: Thu May 7 14:37:03 2020 +0900 [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry ### What changes were proposed in this pull request? The `Kafka*Suite`s are flaky because of the Hadoop MiniKdc issue - https://issues.apache.org/jira/browse/HADOOP-12656 > Looking at the MiniKdc implementation, if port is 0, the constructor uses a ServerSocket to find an unused port, assigns the port number to the member variable port, and closes the ServerSocket object; later, in initKDCServer(), it instantiates a TcpTransport object and binds at that port. > It appears that the port may be taken in between, and the later bind then throws the exception.
Related test failures are suspected, such as https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ ```scala [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** (15 seconds, 426 milliseconds) [info] java.net.BindException: Address already in use [info] at sun.nio.ch.Net.bind0(Native Method) [info] at sun.nio.ch.Net.bind(Net.java:433) [info] at sun.nio.ch.Net.bind(Net.java:425) [info] at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) [info] at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198) [info] at org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68) [info] at org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422) [info] at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64) [info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) ``` After comparing the error stack trace with similar issues reported in different projects, such as https://issues.apache.org/jira/browse/KAFKA-3453 https://issues.apache.org/jira/browse/HBASE-14734 We can be sure that they are caused by the same problem issued in HADOOP-12656. In the PR, We apply the approach from HBASE first before we finally drop Hadoop 2.7.x ### Why are the changes needed? 
fix test flakiness ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? the test itself passing Jenkins Closes #28442 from yaooqinn/SPARK-31631. Authored-by: Kent Yao Signed-off-by: Takeshi Yamamuro --- .../HadoopDelegationTokenManagerSuite.scala| 30 -- .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 29 ++--- 2 files changed, 54 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala index 275bca3..fc28968 100644 --- a/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManagerSuite.scala @@ -19,10 +19,14 @@ package org.apache.spark.deploy.security import java.security.PrivilegedExceptionAction +import scala.util.control.NonFatal + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION import org.apache.hadoop.minikdc.MiniKdc import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.scalatest.concurrent.Eventually._ +import org.scalatest.time.SpanSugar._ import org.apache.spark.{SparkConf, SparkFunSuite} import org.apache.spark.deploy.SparkHadoopUtil @@ -88,8 +92,30 @@ class HadoopDelegationTokenManagerSuite extends SparkFunSuite { // krb5.conf. MiniKdc set
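The race behind this flakiness — MiniKdc binds an ephemeral port with a `ServerSocket`, closes it, and only later rebinds the same port for the KDC — is easy to illustrate outside the JVM, as is the retry approach adopted from HBase. A minimal Python sketch (the function names and retry count are illustrative assumptions, not MiniKdc's actual API):

```python
import socket

def find_free_port():
    # Mimic MiniKdc's discovery step: bind port 0, record the assigned port,
    # then close the socket. Another process can grab the port before the KDC
    # rebinds it -- that window is the source of the 'address in use' error.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port

def start_with_retry(start_fn, attempts=3):
    # Retry-on-BindException pattern analogous to the HBase/Spark fix:
    # pick a fresh port and redo the whole startup when binding fails.
    last_err = None
    for _ in range(attempts):
        try:
            return start_fn(find_free_port())
        except OSError as err:  # java.net.BindException in the JVM world
            last_err = err
    raise last_err

def bind_server(port):
    # Stands in for initKDCServer(): actually bind and listen on the port.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    return srv

server = start_with_retry(bind_server)
print(server.getsockname()[1] > 0)  # → True
```

The imports the diff adds (`Eventually`, `SpanSugar`) suggest the Scala side expresses the same idea with ScalaTest's `eventually`: retry the whole KDC setup rather than trying to eliminate the race window.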
[spark] branch branch-3.0 updated: [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new dc7324e [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown dc7324e is described below commit dc7324e5e39783995b90e64d4737127c10a210cf Author: Liang-Chi Hsieh AuthorDate: Thu May 7 09:57:08 2020 +0900 [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown ### What changes were proposed in this pull request? This is a followup to address https://github.com/apache/spark/pull/28366#discussion_r420611872 by refining the SQL config document. ### Why are the changes needed? Make the config document less confusing for developers. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Only doc change. Closes #28468 from viirya/SPARK-31365-followup. Authored-by: Liang-Chi Hsieh Signed-off-by: Takeshi Yamamuro (cherry picked from commit 9bf738724a3895551464d8ba0d455bc90868983f) Signed-off-by: Takeshi Yamamuro --- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 8d673c5..6c18280 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2070,7 +2070,8 @@ object SQLConf { .internal() .doc("A comma-separated list of data source short names or fully qualified data source " + "implementation class names for which Spark tries to push down predicates for nested " + -"columns and/or names containing `dots` to data sources. Currently, Parquet implements " + +"columns and/or names containing `dots` to data sources. 
This configuration is only " + +"effective with file-based data source in DSv1. Currently, Parquet implements " + "both optimizations while ORC only supports predicates for names containing `dots`. The " + "other data sources don't support this feature yet. So the default value is 'parquet,orc'.") .version("3.0.0") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
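The comma-separated list described in this doc string is typically consulted as a simple membership test against the configured source names. A hedged Python sketch of that pattern (`pushdown_enabled` and its default are illustrative assumptions, not Spark's actual code):

```python
def pushdown_enabled(source_short_name, conf_value="parquet,orc"):
    # Split the configured list, normalize, and test membership. The default
    # mirrors the doc string above: only Parquet and ORC are enabled.
    enabled = {s.strip().lower() for s in conf_value.split(",") if s.strip()}
    return source_short_name.lower() in enabled

print(pushdown_enabled("parquet"))  # → True
print(pushdown_enabled("csv"))      # → False
```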
[spark] branch master updated (3d38bc2 -> 9bf7387)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3d38bc2 [SPARK-31361][SQL][FOLLOWUP] Use LEGACY_PARQUET_REBASE_DATETIME_IN_READ instead of avro config in ParquetIOSuite add 9bf7387 [SPARK-31365][SQL][FOLLOWUP] Refine config document for nested predicate pushdown No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f8a20c4 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration f8a20c4 is described below commit f8a20c470bf115b0834970ce02eb2ec103e0f6df Author: HyukjinKwon AuthorDate: Thu May 7 09:00:59 2020 +0900 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration ### What changes were proposed in this pull request? This PR proposes to deprecate 'spark.sql.optimizer.metadataOnly' configuration and remove it in the future release. ### Why are the changes needed? This optimization can cause a potential correctness issue, see also SPARK-26709. Also, it seems difficult to extend the optimization. Basically you should whitelist all available functions. It costs some maintenance overhead, see also SPARK-31590. Looks we should just better let users use `SparkSessionExtensions` instead if they must use, and remove it in Spark side. ### Does this PR introduce _any_ user-facing change? Yes, setting `spark.sql.optimizer.metadataOnly` will show a deprecation warning: ```scala scala> spark.conf.unset("spark.sql.optimizer.metadataOnly") ``` ``` 20/05/06 12:57:23 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule. ``` ```scala scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true") ``` ``` 20/05/06 12:57:44 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. 
If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule. ``` ### How was this patch tested? Manually tested. Closes #28459 from HyukjinKwon/SPARK-31647. Authored-by: HyukjinKwon Signed-off-by: Takeshi Yamamuro (cherry picked from commit 5c5dd77d6a29b014b3fe4b4015f5c7199650a378) Signed-off-by: Takeshi Yamamuro --- .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 51404a2..8d673c5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -844,8 +844,10 @@ object SQLConf { .doc("When true, enable the metadata-only query optimization that use the table's metadata " + "to produce the partition columns instead of table scans. It applies when all the columns " + "scanned are partition columns and the query has an aggregate operator that satisfies " + - "distinct semantics. By default the optimization is disabled, since it may return " + - "incorrect results when the files are empty.") + "distinct semantics. By default the optimization is disabled, and deprecated as of Spark " + + "3.0 since it may return incorrect results when the files are empty, see also SPARK-26709." + + "It will be removed in the future releases. 
If you must use, use 'SparkSessionExtensions' " + + "instead to inject it as a custom rule.") .version("2.1.1") .booleanConf .createWithDefault(false) @@ -2587,7 +2589,10 @@ object SQLConf { DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."), DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0", -s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.") +s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."), + DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0", +"Avoid to depend on this optimization to prevent a potential correctness issue. " + + "If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
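The deprecation warning shown in the shell transcript above is driven by the `DeprecatedConfig(key, version, comment)` entries this diff adds: touching a key found in that map logs a warning but still applies the change. A simplified Python sketch of the mechanism (not Spark's actual `SQLConf` implementation):

```python
import warnings

# Key -> (version deprecated in, extra guidance), mirroring DeprecatedConfig.
DEPRECATED = {
    "spark.sql.optimizer.metadataOnly": (
        "3.0",
        "Avoid to depend on this optimization to prevent a potential "
        "correctness issue. If you must use, use 'SparkSessionExtensions' "
        "instead to inject it as a custom rule."),
}

def set_conf(conf, key, value):
    # Warn when a deprecated key is set, then apply the setting anyway.
    if key in DEPRECATED:
        version, comment = DEPRECATED[key]
        warnings.warn(
            f"The SQL config '{key}' has been deprecated in Spark v{version} "
            f"and may be removed in the future. {comment}")
    conf[key] = value

conf = {}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_conf(conf, "spark.sql.optimizer.metadataOnly", "true")
print(conf["spark.sql.optimizer.metadataOnly"], len(caught))  # → true 1
```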
[spark] branch master updated (09ece50 -> 5c5dd77)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 09ece50 [SPARK-31609][ML][PYSPARK] Add VarianceThresholdSelector to PySpark add 5c5dd77 [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration No new revisions were added by this update. Summary of changes: .../main/scala/org/apache/spark/sql/internal/SQLConf.scala| 11 --- 1 file changed, 8 insertions(+), 3 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ccde0a1 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ccde0a1 is described below commit ccde0a1ae2d880585cb554cc67f75ef972a78c67 Author: Dilip Biswal AuthorDate: Tue May 5 15:21:14 2020 +0900 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table ### What changes were proposed in this pull request? This PR is to clean up the markdown file in remaining pages in sql reference. The first one was done by gatorsmile in [28415](https://github.com/apache/spark/pull/28415) - Replace HTML table by MD table - **sql-ref-ansi-compliance.md** https://user-images.githubusercontent.com/14225158/80848981-1cbca080-8bca-11ea-8a5d-63174b31c800.png - **sql-ref-datatypes.md (Scala)** https://user-images.githubusercontent.com/14225158/80849057-6a390d80-8bca-11ea-8866-ab08bab31432.png https://user-images.githubusercontent.com/14225158/80849061-6c9b6780-8bca-11ea-834c-eb93d3ab47ae.png - **sql-ref-datatypes.md (Java)** https://user-images.githubusercontent.com/14225158/80849138-b3895d00-8bca-11ea-9d3b-555acad2086c.png https://user-images.githubusercontent.com/14225158/80849140-b6844d80-8bca-11ea-9ca9-1812b6a76c02.png - **sql-ref-datatypes.md (Python)** https://user-images.githubusercontent.com/14225158/80849202-0400ba80-8bcb-11ea-96a5-7caecbf9dbbf.png https://user-images.githubusercontent.com/14225158/80849205-06fbab00-8bcb-11ea-8f00-6df52b151684.png - **sql-ref-datatypes.md (R)** https://user-images.githubusercontent.com/14225158/80849288-5fcb4380-8bcb-11ea-8277-8589b5bb31bc.png https://user-images.githubusercontent.com/14225158/80849294-62c63400-8bcb-11ea-9438-b4f1193bc757.png - **sql-ref-datatypes.md (SQL)** 
https://user-images.githubusercontent.com/14225158/80849336-986b1d00-8bcb-11ea-9736-5fb40496b681.png - **sql-ref-syntax-qry-select-tvf.md** https://user-images.githubusercontent.com/14225158/80849399-d10af680-8bcb-11ea-8dc2-e3e750e21a59.png ### Why are the changes needed? Make the doc cleaner and easily editable by MD editors ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually using jekyll serve Closes #28433 from dilipbiswal/sql-doc-table-cleanup. Authored-by: Dilip Biswal Signed-off-by: Takeshi Yamamuro (cherry picked from commit 5052d9557d964c07d0b8bd2e2b08ede7c6958118) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-ansi-compliance.md | 542 +- docs/sql-ref-datatypes.md | 695 +- docs/sql-ref-datetime-pattern.md | 8 +- docs/sql-ref-null-semantics.md| 131 ++- docs/sql-ref-syntax-qry-select-tvf.md | 33 +- 5 files changed, 388 insertions(+), 1021 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 6cf1653..93fb10b 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -27,35 +27,10 @@ The casting behaviours are defined as store assignment rules in the standard. When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default. - -Property NameDefaultMeaningSince Version - - spark.sql.ansi.enabled - false - -(Experimental) When true, Spark tries to conform to the ANSI SQL specification: -1. Spark will throw a runtime exception if an overflow occurs in any operation on integral/decimal field. -2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser. - - 3.0.0 - - - spark.sql.storeAssignmentPolicy - ANSI - -(Experimental) When inserting a value into a column with different data type, Spark will perform type coercion. 
-Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. With ANSI policy, -Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. -It disallows certain unreasonable type conversions such as converting string to int or double to boolean. -With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose. -e.g. converting string to int or double to boolean is allowed. -It is also the only behavior in Spark 2.x and it is compatible with Hive. -With strict policy, Spark doesn't allow any possib
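The three store assignment policies quoted in the diff above (ANSI, legacy, strict) differ only in which source-to-destination conversions they accept on insert. A toy sketch of the policy semantics, using a deliberately tiny, assumed type lattice (illustrative only, not Spark's real coercion rules):

```python
# Illustrative-only rule sets; the real rules cover the full Spark type system.
POLICY_RULES = {
    "ANSI": {("int", "long"), ("int", "double"), ("int", "string")},
    "STRICT": {("int", "long")},  # only conversions that can never lose data
}

def can_store(src, dst, policy="ANSI"):
    if src == dst:
        return True
    if policy == "LEGACY":
        # Any valid Cast is accepted, e.g. string -> int or double -> boolean.
        return True
    return (src, dst) in POLICY_RULES[policy]

print(can_store("string", "int", "ANSI"))    # → False (unreasonable cast)
print(can_store("string", "int", "LEGACY"))  # → True
```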
[spark] branch master updated (8d1f7d2 -> 5052d95)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8d1f7d2 [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException add 5052d95 [SPARK-31030][DOCS][FOLLOWUP] Replace HTML Table by Markdown Table No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 542 +- docs/sql-ref-datatypes.md | 695 +- docs/sql-ref-datetime-pattern.md | 8 +- docs/sql-ref-null-semantics.md| 131 ++- docs/sql-ref-syntax-qry-select-tvf.md | 33 +- 5 files changed, 388 insertions(+), 1021 deletions(-)
[spark] branch master updated (735771e -> 8d1f7d2)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 735771e [SPARK-31623][SQL][TESTS] Benchmark rebasing of INT96 and TIMESTAMP_MILLIS timestamps in read/write add 8d1f7d2 [SPARK-31467][SQL][TEST] Refactor the sql tests to prevent TableAlreadyExistsException No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/CachedTableSuite.scala| 238 ++-- .../apache/spark/sql/ColumnExpressionSuite.scala | 40 +- .../org/apache/spark/sql/DataFrameSuite.scala | 87 +- .../spark/sql/DataFrameWindowFunctionsSuite.scala | 122 +- .../scala/org/apache/spark/sql/JoinSuite.scala | 270 ++-- .../org/apache/spark/sql/JsonFunctionsSuite.scala | 72 +- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 895 +++-- .../spark/sql/ScalaReflectionRelationSuite.scala | 96 +- .../scala/org/apache/spark/sql/SubquerySuite.scala | 40 +- .../apache/spark/sql/UserDefinedTypeSuite.scala| 14 +- .../sql/execution/SQLWindowFunctionSuite.scala | 433 --- .../columnar/InMemoryColumnarQuerySuite.scala | 46 +- .../sql/execution/datasources/json/JsonSuite.scala | 1367 ++-- .../sql/execution/joins/BroadcastJoinSuite.scala | 37 +- .../sql/execution/metric/SQLMetricsSuite.scala | 45 +- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 14 +- .../org/apache/spark/sql/sources/InsertSuite.scala | 16 +- .../apache/spark/sql/sources/SaveLoadSuite.scala | 36 +- .../apache/spark/sql/streaming/StreamSuite.scala | 16 +- .../sql/streaming/continuous/ContinuousSuite.scala | 22 +- .../sql/hive/execution/AggregationQuerySuite.scala | 78 +- .../spark/sql/hive/execution/HiveDDLSuite.scala| 54 +- .../spark/sql/hive/execution/HiveQuerySuite.scala | 72 +- .../spark/sql/hive/execution/SQLQuerySuite.scala | 535 24 files changed, 2445 insertions(+), 2200 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: 
commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark`
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d400880 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` d400880 is described below commit d4008804f987fa3d3405335e2469886a0d61dd67 Author: Max Gekk AuthorDate: Mon May 4 09:39:50 2020 +0900 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` ### What changes were proposed in this pull request? - Changed to the number of rows in benchmark cases from 3 to the actual number `N`. - Regenerated benchmark results in the environment: | Item | Description | | | | | Region | us-west-2 (Oregon) | | Instance | r3.xlarge | | AMI | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 (ami-06f2f779464715dc5) | | Java | OpenJDK 64-Bit Server VM 1.8.0_242 and OpenJDK 64-Bit Server VM 11.0.6+10 | ### Why are the changes needed? The changes are needed to have: - Correct benchmark results - Base line for other perf improvements that can be checked in the same environment. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the benchmark and checking its output. Closes #28440 from MaxGekk/SPARK-31527-DateTimeBenchmark-followup. 
Authored-by: Max Gekk Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2fb85f6b684843f337b6e73ba57ee9e57a53496d) Signed-off-by: Takeshi Yamamuro --- .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 474 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 474 ++--- .../execution/benchmark/DateTimeBenchmark.scala| 2 +- 3 files changed, 475 insertions(+), 475 deletions(-) diff --git a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt index 1004bcf..61b4c76 100644 --- a/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt +++ b/sql/core/benchmarks/DateTimeBenchmark-jdk11-results.txt @@ -2,456 +2,456 @@ datetime +/- interval -Java HotSpot(TM) 64-Bit Server VM 11.0.5+10-LTS on Mac OS X 10.15.4 -Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz +OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws +Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz datetime +/- interval:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -date + interval(m) 919933 22 0.0 306237514.3 1.0X -date + interval(m, d) 910916 9 0.0 303338619.0 1.0X -date + interval(m, d, ms) 3912 3923 16 0.0 1303942791.7 0.2X -date - interval(m) 883887 6 0.0 294268789.3 1.0X -date - interval(m, d) 898911 18 0.0 299453403.0 1.0X -date - interval(m, d, ms) 3937 3944 11 0.0 1312269472.0 0.2X -timestamp + interval(m)2226 2236 14 0.0 741972014.3 0.4X -timestamp + interval(m, d) 2264 2274 13 0.0 754709121.0 0.4X -timestamp + interval(m, d, ms) 2202 2223 30 0.0 734001075.0 0.4X -timestamp - interval(m)1992 2005 17 0.0 664152744.7 0.5X -timestamp - interval(m, d) 2069 2075 9 0.0 689631159.0 0.4X -timestamp - interval(m, d, ms) 2240 2244 6 0.0 746538728.0 0.4X +date + interval(m) 1485 1567 116 6.7 148.5 1.0X +date + interval(m, d) 1504 1510 9 6.6 150.4 1.0X +date + interval(m, d, ms) 7000 7013 18 1.4 700.0 0.2X +date - interval(m) 1466 1478 17 6.8 146.6 1.0X +date - interval(m, d) 1533 1534 1 6.5 153.3 1.0X
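The columns in the regenerated results above are all derived from the best time and the row count N — which is exactly why reporting N = 3 instead of the real N skewed Rate(M/s) and Per Row(ns). A small sketch of the derivation (assuming N = 10,000,000, which reproduces the 148.5 ns/row and ~6.7 M rows/s on the first result line; the actual N used by the benchmark is not shown in this diff):

```python
def benchmark_columns(best_ms, n_rows, baseline_ms=None):
    # Per Row(ns) = best time / N; Rate(M/s) = N / best time; Relative is the
    # baseline case's time divided by this case's time.
    per_row_ns = best_ms * 1e6 / n_rows
    rate_m_per_s = n_rows / (best_ms / 1000.0) / 1e6
    relative = (baseline_ms / best_ms) if baseline_ms is not None else 1.0
    return per_row_ns, rate_m_per_s, relative

# The 1485 ms "date + interval(m)" case under the assumed N:
per_row, rate, rel = benchmark_columns(1485, 10_000_000)
print(round(per_row, 1), round(rate, 1))  # → 148.5 6.7
```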
[spark] branch master updated (f53d8c6 -> 2fb85f6)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from f53d8c6 [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical add 2fb85f6 [SPARK-31527][SQL][TESTS][FOLLOWUP] Fix the number of rows in `DateTimeBenchmark` No new revisions were added by this update. Summary of changes: .../benchmarks/DateTimeBenchmark-jdk11-results.txt | 474 ++--- sql/core/benchmarks/DateTimeBenchmark-results.txt | 474 ++--- .../execution/benchmark/DateTimeBenchmark.scala| 2 +- 3 files changed, 475 insertions(+), 475 deletions(-)
[spark] branch branch-3.0 updated: [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 3aa659c  [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

3aa659c is described below

commit 3aa659ce29877f386a24da9d04e66069d04afaa8
Author: Max Gekk
AuthorDate: Sat May 2 17:54:36 2020 +0900

    [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

    ### What changes were proposed in this pull request?
    Set `spark.ui.enabled` to `false` in `SqlBasedBenchmark.getSparkSession`. This disables the UI in all SQL benchmarks by default.

    ### Why are the changes needed?
    UI overhead lowers numbers in the `Relative` column and impacts `Stdev` in benchmark results.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Checked by running `DateTimeRebaseBenchmark`.

    Closes #28432 from MaxGekk/ui-off-in-benchmarks.

    Authored-by: Max Gekk
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 13dddee9a8490ead00ff00bd741db4a170dfd759)
    Signed-off-by: Takeshi Yamamuro
---
 .../apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala | 2 --
 .../apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala | 2 --
 .../org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala   | 2 ++
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
index d29c5e3..0fc43c7 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
@@ -23,7 +23,6 @@ import scala.util.Random

 import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.internal.config.UI._
 import org.apache.spark.sql.{DataFrame, DataFrameWriter, Row, SparkSession}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.datasources.parquet.{SpecificParquetRecordReaderBase, VectorizedParquetRecordReader}
@@ -52,7 +51,6 @@ object DataSourceReadBenchmark extends SqlBasedBenchmark {
       .set("spark.master", "local[1]")
       .setIfMissing("spark.driver.memory", "3g")
       .setIfMissing("spark.executor.memory", "3g")
-      .setIfMissing(UI_ENABLED, false)

     val sparkSession = SparkSession.builder.config(conf).getOrCreate()

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
index 444ffa4..b3f65d4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala
@@ -23,7 +23,6 @@ import scala.util.Random

 import org.apache.spark.SparkConf
 import org.apache.spark.benchmark.Benchmark
-import org.apache.spark.internal.config.UI._
 import org.apache.spark.sql.{DataFrame, SparkSession}
 import org.apache.spark.sql.functions.monotonically_increasing_id
 import org.apache.spark.sql.internal.SQLConf
@@ -49,7 +48,6 @@ object FilterPushdownBenchmark extends SqlBasedBenchmark {
       .set("spark.master", "local[1]")
       .setIfMissing("spark.driver.memory", "3g")
       .setIfMissing("spark.executor.memory", "3g")
-      .setIfMissing(UI_ENABLED, false)
       .setIfMissing("orc.compression", "snappy")
       .setIfMissing("spark.sql.parquet.compression.codec", "snappy")

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
index ee7a03e..28387dc 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala
@@ -18,6 +18,7 @@ package org.apache.spark.sql.execution.benchmark

 import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.internal.config.UI.UI_ENABLED
 import org.apache.spark.sql.{Dataset, SparkSession}
 import org.apache.spark.sql.SaveMode.Overwrite
 import org.apache.spark.sql.catalyst.plans.SQLHelper
@@ -37,6 +38,7 @@ trait SqlBasedBenchmark extends BenchmarkBase with SQLHelper {
       .appName(this.getClass.getCanonicalName)
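The direction of this change can be sketched outside Spark's source tree: move the UI switch into the one shared session builder instead of repeating `.setIfMissing(UI_ENABLED, false)` in every benchmark. A minimal sketch, assuming only that benchmarks obtain their session from a single `getSparkSession`-style helper as `SqlBasedBenchmark` does (the trait name `UiOffBenchmark` below is illustrative, not Spark code):

```scala
import org.apache.spark.internal.config.UI.UI_ENABLED
import org.apache.spark.sql.SparkSession

trait UiOffBenchmark {
  // One shared place to turn the UI off: every benchmark that mixes this in
  // inherits the setting, so individual benchmarks no longer need to disable
  // the UI themselves.
  def getSparkSession: SparkSession = {
    SparkSession.builder
      .master("local[1]")
      .appName(this.getClass.getCanonicalName)
      // Avoids UI overhead skewing the Relative column and Stdev numbers.
      .config(UI_ENABLED.key, false)
      .getOrCreate()
  }
}
```

Using `config(UI_ENABLED.key, false)` in the shared builder (rather than `setIfMissing` in each benchmark's `SparkConf`) matches the diff's design choice: the default lives in one place, and a specific benchmark can still override it explicitly if it wants the UI.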
[spark] branch master updated (75da050 -> 13dddee)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 75da050  [MINOR][SQL][DOCS] Remove two leading spaces from sql tables
     add 13dddee  [MINOR][SQL][TESTS] Disable UI in SQL benchmarks by default

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala | 2 --
 .../apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala | 2 --
 .../org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala   | 2 ++
 3 files changed, 2 insertions(+), 4 deletions(-)
[spark] branch branch-2.4 updated: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 1222ce0  [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

1222ce0 is described below

commit 1222ce064f97ed9ad34e2fca4d270762592a1854
Author: Pablo Langa
AuthorDate: Fri May 1 22:09:04 2020 +0900

    [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

    ### What changes were proposed in this pull request?
    The collect_set() aggregate function should produce a set of distinct elements. When the column argument's type is BinaryType, this is not the case. Example:

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions.Window

    case class R(id: String, value: String, bytes: Array[Byte])
    def makeR(id: String, value: String) = R(id, value, value.getBytes)
    val df = Seq(makeR("a", "dog"), makeR("a", "cat"), makeR("a", "cat"), makeR("b", "fish")).toDF()

    // In the example below "bytesSet" erroneously has duplicates but "stringSet" does not (as expected).
    df.agg(collect_set('value) as "stringSet", collect_set('bytes) as "byteSet").show(truncate=false)

    // The same problem is displayed when using window functions.
    val win = Window.partitionBy('id).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    val result = df.select(
        collect_set('value).over(win) as "stringSet",
        collect_set('bytes).over(win) as "bytesSet")
      .select('stringSet, 'bytesSet,
        size('stringSet) as "stringSetSize",
        size('bytesSet) as "bytesSetSize")
      .show()
    ```

    We use a HashSet buffer to accumulate the results. The problem is that array equality in Scala does not behave as expected: arrays are plain Java arrays, and equality does not compare their contents, so `Array(1, 2, 3) == Array(1, 2, 3)` evaluates to `false` and duplicates are not removed from the HashSet. The proposed solution is that in the last stage, once all the data is in the HashSet buffer, we remove duplicates by changing the type of the elements and then transform the result back to the original type. This transformation is only applied for BinaryType.

    ### Why are the changes needed?
    Fix the bug explained above.

    ### Does this PR introduce any user-facing change?
    Yes. Now `collect_set()` correctly deduplicates arrays of bytes.

    ### How was this patch tested?
    Unit testing

    Closes #28351 from planga82/feature/SPARK-31500_COLLECT_SET_bug.

    Authored-by: Pablo Langa
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 4fecc20f6ecdfe642890cf0a368a85558c40a47c)
    Signed-off-by: Takeshi Yamamuro
---
 .../catalyst/expressions/aggregate/collect.scala   | 45 +++---
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 16
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index be972f0..8dc3171 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable

 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.ArrayData
 import org.apache.spark.sql.catalyst.util.GenericArrayData
 import org.apache.spark.sql.types._

@@ -46,13 +47,15 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
   // actual order of input rows.
   override lazy val deterministic: Boolean = false

+  protected def convertToBufferElement(value: Any): Any
+
   override def update(buffer: T, input: InternalRow): T = {
     val value = child.eval(input)

     // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
     // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
     if (value != null) {
-      buffer += InternalRow.copyValue(value)
+      buffer += convertToBufferElement(value)
     }
     buffer
   }
@@ -61,12 +64,10 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
     buffer ++= other
   }

-  override def eval(buffer: T):
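The root cause described above can be reproduced with plain Scala collections, with no Spark dependency. A minimal sketch; the `dedupBinary` helper is illustrative of the fix's idea (compare byte arrays by content, then convert back), not the patch's actual code:

```scala
import scala.collection.mutable

object BinarySetDemo {
  // Hypothetical helper mirroring the fix's approach: convert each array to a
  // Seq (which has structural equality and a content-based hashCode), dedupe
  // via a Set, then convert back to the original Array[Byte] representation.
  def dedupBinary(buffer: Iterable[Array[Byte]]): Seq[Array[Byte]] =
    buffer.map(_.toSeq).toSet.map((s: Seq[Byte]) => s.toArray).toSeq

  def main(args: Array[String]): Unit = {
    // Java arrays use reference equality and identity hashCodes...
    println(Array(1, 2, 3) == Array(1, 2, 3)) // false

    // ...so a HashSet of Array[Byte] keeps "duplicate" contents.
    val set = mutable.HashSet[Array[Byte]]()
    set += "cat".getBytes
    set += "cat".getBytes
    println(set.size)                 // 2: not deduplicated
    println(dedupBinary(set).length)  // 1: deduplicated by content
  }
}
```

This also shows why the patch applies the conversion only at `eval` time, once the buffer is complete: converting every element on `update` would pay the wrapping cost on each input row, while a single pass at the end dedupes the finished buffer.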
[spark] branch branch-3.0 updated: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 1795a70  [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

1795a70 is described below

commit 1795a70bb04fad1b8cf76271443a448f8d72fc8a
Author: Pablo Langa
AuthorDate: Fri May 1 22:09:04 2020 +0900

    [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

    ### What changes were proposed in this pull request?
    The collect_set() aggregate function should produce a set of distinct elements. When the column argument's type is BinaryType, this is not the case. Example:

    ```scala
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.expressions.Window

    case class R(id: String, value: String, bytes: Array[Byte])
    def makeR(id: String, value: String) = R(id, value, value.getBytes)
    val df = Seq(makeR("a", "dog"), makeR("a", "cat"), makeR("a", "cat"), makeR("b", "fish")).toDF()

    // In the example below "bytesSet" erroneously has duplicates but "stringSet" does not (as expected).
    df.agg(collect_set('value) as "stringSet", collect_set('bytes) as "byteSet").show(truncate=false)

    // The same problem is displayed when using window functions.
    val win = Window.partitionBy('id).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
    val result = df.select(
        collect_set('value).over(win) as "stringSet",
        collect_set('bytes).over(win) as "bytesSet")
      .select('stringSet, 'bytesSet,
        size('stringSet) as "stringSetSize",
        size('bytesSet) as "bytesSetSize")
      .show()
    ```

    We use a HashSet buffer to accumulate the results. The problem is that array equality in Scala does not behave as expected: arrays are plain Java arrays, and equality does not compare their contents, so `Array(1, 2, 3) == Array(1, 2, 3)` evaluates to `false` and duplicates are not removed from the HashSet. The proposed solution is that in the last stage, once all the data is in the HashSet buffer, we remove duplicates by changing the type of the elements and then transform the result back to the original type. This transformation is only applied for BinaryType.

    ### Why are the changes needed?
    Fix the bug explained above.

    ### Does this PR introduce any user-facing change?
    Yes. Now `collect_set()` correctly deduplicates arrays of bytes.

    ### How was this patch tested?
    Unit testing

    Closes #28351 from planga82/feature/SPARK-31500_COLLECT_SET_bug.

    Authored-by: Pablo Langa
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 4fecc20f6ecdfe642890cf0a368a85558c40a47c)
    Signed-off-by: Takeshi Yamamuro
---
 .../catalyst/expressions/aggregate/collect.scala   | 45 +++---
 .../apache/spark/sql/DataFrameAggregateSuite.scala | 16
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
index 5848aa3..0a3d876 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
@@ -23,6 +23,7 @@ import scala.collection.mutable

 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.ArrayData
 import org.apache.spark.sql.catalyst.util.GenericArrayData
 import org.apache.spark.sql.types._

@@ -46,13 +47,15 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
   // actual order of input rows.
   override lazy val deterministic: Boolean = false

+  protected def convertToBufferElement(value: Any): Any
+
   override def update(buffer: T, input: InternalRow): T = {
     val value = child.eval(input)

     // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
     // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
     if (value != null) {
-      buffer += InternalRow.copyValue(value)
+      buffer += convertToBufferElement(value)
     }
     buffer
   }
@@ -61,12 +64,10 @@ abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends TypedImper
     buffer ++= other
   }

-  override def eval(buffer: T):
[spark] branch master updated (b7cde42 -> 4fecc20)
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b7cde42 [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout" add 4fecc20 [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements No new revisions were added by this update. Summary of changes: .../catalyst/expressions/aggregate/collect.scala | 45 +++--- .../apache/spark/sql/DataFrameAggregateSuite.scala | 16 2 files changed, 55 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 7c6b970 [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite 7c6b970 is described below commit 7c6b9708b6fbc81d583081a7b027fe1cce493b6c Author: Takeshi Yamamuro AuthorDate: Fri May 1 18:37:41 2020 +0900 [SPARK-31372][SQL][TEST][FOLLOWUP][3.0] Update the golden file of ExpressionsSchemaSuite ### What changes were proposed in this pull request? This PR is a follow-up PR to update the golden file of `ExpressionsSchemaSuite`. ### Why are the changes needed? To recover tests in branch-3.0. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. Closes #28427 from maropu/SPARK-31372-FOLLOWUP. Authored-by: Takeshi Yamamuro Signed-off-by: Takeshi Yamamuro --- .../src/test/resources/sql-functions/sql-expression-schema.md| 9 ++--- .../test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala | 7 ++- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md index 1e22ae2..2091de2 100644 --- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md +++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md @@ -1,6 +1,6 @@ ## Summary - - Number of queries: 333 + - Number of queries: 328 - Number of expressions that missing example: 34 - Expressions missing examples: and,string,tinyint,double,smallint,date,decimal,boolean,float,binary,bigint,int,timestamp,cume_dist,dense_rank,input_file_block_length,input_file_block_start,input_file_name,lag,lead,monotonically_increasing_id,ntile,struct,!,not,or,percent_rank,rank,row_number,spark_partition_id,version,window,positive,count_min_sketch 
## Schema of Built-in Functions @@ -123,7 +123,7 @@ | org.apache.spark.sql.catalyst.expressions.GreaterThanOrEqual | >= | SELECT 2 >= 1 | struct<(2 >= 1):boolean> | | org.apache.spark.sql.catalyst.expressions.Greatest | greatest | SELECT greatest(10, 9, 2, 4, 3) | struct | | org.apache.spark.sql.catalyst.expressions.Grouping | grouping | SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name) | struct | -| org.apache.spark.sql.catalyst.expressions.GroupingID | grouping_id | SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height) | struct | +| org.apache.spark.sql.catalyst.expressions.GroupingID | grouping_id | SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height) | struct | | org.apache.spark.sql.catalyst.expressions.Hex | hex | SELECT hex(17) | struct | | org.apache.spark.sql.catalyst.expressions.Hour | hour | SELECT hour('2009-07-30 12:58:59') | struct | | org.apache.spark.sql.catalyst.expressions.Hypot | hypot | SELECT hypot(3, 4) | struct | @@ -140,7 +140,6 @@ | org.apache.spark.sql.catalyst.expressions.IsNaN | isnan | SELECT isnan(cast('NaN' as double)) | struct | | org.apache.spark.sql.catalyst.expressions.IsNotNull | isnotnull | SELECT isnotnull(1) | struct<(1 IS NOT NULL):boolean> | | org.apache.spark.sql.catalyst.expressions.IsNull | isnull | SELECT isnull(1) | struct<(1 IS NULL):boolean> | -| org.apache.spark.sql.catalyst.expressions.JsonObjectKeys | json_object_keys | SELECT json_object_keys('{}') | struct> | | org.apache.spark.sql.catalyst.expressions.JsonToStructs | from_json | SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE') | struct> | | org.apache.spark.sql.catalyst.expressions.JsonTuple | json_tuple | SELECT json_tuple('{"a":1, "b":2}', 'a', 'b') | struct | | 
org.apache.spark.sql.catalyst.expressions.Lag | lag | N/A | N/A | @@ -151,7 +150,6 @@ | org.apache.spark.sql.catalyst.expressions.Length | character_length | SELECT character_length('Spark SQL ') | struct | | org.apache.spark.sql.catalyst.expressions.Length | char_length | SELECT char_length('Spark SQL ') | struct | | org.apache.spark.sql.catalyst.expressions.Length | length | SELECT length('Spark SQL ') | struct | -| org.apache.spark.sql.catalyst.expressions.LengthOfJsonArray | json_array_length | SELECT json_array_length('[1,2,3,4]') | struct | | org.apache.spark.sql.catalyst.express
[spark] branch branch-3.0 updated: [SPARK-31612][SQL][DOCS] SQL Reference clean up
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new f7c1feb [SPARK-31612][SQL][DOCS] SQL Reference clean up f7c1feb is described below commit f7c1feba123534bf9a64e7c381464c64c4572308 Author: Huaxin Gao AuthorDate: Fri May 1 06:30:35 2020 +0900 [SPARK-31612][SQL][DOCS] SQL Reference clean up ### What changes were proposed in this pull request? SQL Reference cleanup ### Why are the changes needed? To complete SQL Reference ### Does this PR introduce _any_ user-facing change? Updated sql-ref-syntax-qry.html (before: https://user-images.githubusercontent.com/13592258/80677799-70b27280-8a6e-11ea-8e3f-a768f29d0377.png, after: https://user-images.githubusercontent.com/13592258/80677803-74de9000-8a6e-11ea-880c-aa05c53254a6.png) ### How was this patch tested? Manually build and check Closes #28417 from huaxingao/cleanup.
Authored-by: Huaxin Gao Signed-off-by: Takeshi Yamamuro (cherry picked from commit 2410a45703b829391211caaf1a745511f95298ad) Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-aux-describe-database.md | 2 +- docs/sql-ref-syntax-aux-show-tables.md | 2 +- docs/sql-ref-syntax-aux-show-views.md | 2 +- docs/sql-ref-syntax-ddl-alter-database.md | 2 +- docs/sql-ref-syntax-ddl-alter-table.md | 4 ++-- docs/sql-ref-syntax-ddl-alter-view.md | 6 +++--- docs/sql-ref-syntax-ddl-create-function.md | 4 ++-- docs/sql-ref-syntax-ddl-create-table-datasource.md | 2 +- docs/sql-ref-syntax-ddl-create-view.md | 4 ++-- docs/sql-ref-syntax-ddl-drop-database.md | 6 +++--- docs/sql-ref-syntax-ddl-drop-function.md | 18 +- docs/sql-ref-syntax-ddl-drop-table.md | 2 +- docs/sql-ref-syntax-ddl-drop-view.md | 2 +- docs/sql-ref-syntax-ddl-truncate-table.md | 6 +++--- docs/sql-ref-syntax-dml-insert-into.md | 4 ++-- ...l-ref-syntax-dml-insert-overwrite-directory-hive.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-directory.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-table.md | 2 +- docs/sql-ref-syntax-qry-select-usedb.md| 2 +- docs/sql-ref-syntax-qry.md | 11 ++- 20 files changed, 51 insertions(+), 34 deletions(-) diff --git a/docs/sql-ref-syntax-aux-describe-database.md b/docs/sql-ref-syntax-aux-describe-database.md index 2f7b1ce..590438b 100644 --- a/docs/sql-ref-syntax-aux-describe-database.md +++ b/docs/sql-ref-syntax-aux-describe-database.md @@ -42,7 +42,7 @@ interchangeable. 
-### Example +### Examples {% highlight sql %} -- Create employees DATABASE diff --git a/docs/sql-ref-syntax-aux-show-tables.md b/docs/sql-ref-syntax-aux-show-tables.md index f4b3dff..cd54d45 100644 --- a/docs/sql-ref-syntax-aux-show-tables.md +++ b/docs/sql-ref-syntax-aux-show-tables.md @@ -52,7 +52,7 @@ SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ] -### Example +### Examples {% highlight sql %} -- List all tables in default database diff --git a/docs/sql-ref-syntax-aux-show-views.md b/docs/sql-ref-syntax-aux-show-views.md index 0d9210b..b1a8d3b 100644 --- a/docs/sql-ref-syntax-aux-show-views.md +++ b/docs/sql-ref-syntax-aux-show-views.md @@ -51,7 +51,7 @@ SHOW VIEWS [ { FROM | IN } database_name ] [ LIKE regex_pattern ] -### Example +### Examples {% highlight sql %} -- Create views in different databases, also create global/local temp views. CREATE VIEW sam AS SELECT id, salary FROM employee WHERE name = 'sam'; diff --git a/docs/sql-ref-syntax-ddl-alter-database.md b/docs/sql-ref-syntax-ddl-alter-database.md index 520aba3..65b85dc 100644 --- a/docs/sql-ref-syntax-ddl-alter-database.md +++ b/docs/sql-ref-syntax-ddl-alter-database.md @@ -31,7 +31,7 @@ for a database and may be used for auditing purposes. {% highlight sql %} ALTER { DATABASE | SCHEMA } database_name -SET DBPROPERTIES ( property_name = property_value, ... ) +SET DBPROPERTIES ( property_name = property_value [ , ... ] ) {% endhighlight %} ### Parameters diff --git a/docs/sql-ref-syntax-ddl-alter-table.md b/docs/sql-ref-syntax-ddl-alter-table.md index edb081b..0a74aa0 100644 --- a/docs/sql-ref-syntax-ddl-alter-table.md +++ b/docs/sql-ref-syntax-ddl-alter-table.md @@ -66,7 +66,7 @@ ALTER TABLE table_identifier partition_spec RENAME TO partition_spec Syntax {% highlight sql %} -ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , col_spec ... ] ) +ALTER TABLE table_identifier ADD COLUMNS ( col_spec [ , ... ] ) {% endhighl