[spark] branch master updated (aa509c1 -> 194edc8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from aa509c1  [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled
     add 194edc8  Revert "[SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider"

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala                            |   1 -
 .../execution/datasources/orc/FakeKeyProvider.java  | 144 -
 ...org.apache.hadoop.crypto.key.KeyProviderFactory  |  16 ---
 .../datasources/orc/OrcEncryptionSuite.scala        |  98 --
 4 files changed, 259 deletions(-)
 delete mode 100644 sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/orc/FakeKeyProvider.java
 delete mode 100644 sql/core/src/test/resources/META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory
 delete mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3aa4e11 -> aa509c1)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3aa4e11  [SPARK-33861][SQL][FOLLOWUP] Simplify conditional in predicate should consider deterministic
     add aa509c1  [SPARK-34031][SQL] Union operator missing rowCount when CBO enabled

No new revisions were added by this update.

Summary of changes:
 .../plans/logical/basicLogicalOperators.scala    |   2 +-
 .../statsEstimation/BasicStatsPlanVisitor.scala  |  10 +-
 .../BasicStatsEstimationSuite.scala              |  11 +
 .../approved-plans-v1_4/q2.sf100/explain.txt     | 128 ++--
 .../approved-plans-v1_4/q2.sf100/simplified.txt  |  98 ++-
 .../approved-plans-v1_4/q5.sf100/explain.txt     | 220 +++
 .../approved-plans-v1_4/q5.sf100/simplified.txt  |  64 +-
 .../approved-plans-v1_4/q54.sf100/explain.txt    | 726 ++---
 .../approved-plans-v1_4/q54.sf100/simplified.txt | 244 ---
 .../approved-plans-v2_7/q5a.sf100/explain.txt    | 210 +++---
 .../approved-plans-v2_7/q5a.sf100/simplified.txt |  64 +-
 11 files changed, 874 insertions(+), 903 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
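[SPARK-34031] makes the CBO statistics visitor keep a rowCount estimate for Union nodes. Below is a minimal, hypothetical PySpark sketch of how the effect can be observed; the table names are made up, and it assumes statistics have been collected with ANALYZE TABLE while spark.sql.cbo.enabled is on.

```python
from pyspark.sql import SparkSession

# Sketch only: t1/t2 are illustrative tables. CBO has to be enabled and table
# statistics collected before the optimizer can propagate a rowCount.
spark = (SparkSession.builder
         .config("spark.sql.cbo.enabled", "true")
         .getOrCreate())

spark.range(100).write.mode("overwrite").saveAsTable("t1")
spark.range(200).write.mode("overwrite").saveAsTable("t2")
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS")

# With the fix, the cost-mode plan for the union should report a rowCount
# (300 here) rather than only a sizeInBytes estimate.
spark.sql("SELECT id FROM t1 UNION ALL SELECT id FROM t2").explain(mode="cost")
```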
[spark] branch master updated (26b6039 -> 3aa4e11)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 26b6039  [SPARK-34028][SQL] Cleanup "unreachable code" compilation warning
     add 3aa4e11  [SPARK-33861][SQL][FOLLOWUP] Simplify conditional in predicate should consider deterministic

No new revisions were added by this update.

Summary of changes:
 .../catalyst/optimizer/SimplifyConditionalsInPredicate.scala |  6 --
 .../optimizer/SimplifyConditionalsInPredicateSuite.scala     | 11 ++-
 2 files changed, 10 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
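The follow-up restricts the SimplifyConditionalsInPredicate rewrite to deterministic expressions. A rough PySpark illustration of the distinction is below; the view and predicates are made up, and the comments describe the intended behaviour rather than the exact rule internals.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(10).createOrReplaceTempView("t")

# Deterministic conditional: a rewrite into a plain predicate is safe,
# e.g. IF(id > 5, true, false) can collapse to id > 5.
spark.sql("SELECT * FROM t WHERE IF(id > 5, true, false)").explain(True)

# Non-deterministic conditional: rand() should not be collapsed the same way,
# since rewriting could change how the expression is evaluated.
spark.sql("SELECT * FROM t WHERE IF(rand() > 0.5, true, false)").explain(True)
```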
[spark] branch master updated (0ba3ab4 -> 26b6039)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0ba3ab4  [SPARK-34021][R] Fix hyper links in SparkR documentation for CRAN submission
     add 26b6039  [SPARK-34028][SQL] Cleanup "unreachable code" compilation warning

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala | 1 -
 1 file changed, 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34021][R] Fix hyper links in SparkR documentation for CRAN submission
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new ddaa677 [SPARK-34021][R] Fix hyper links in SparkR documentation for CRAN submission ddaa677 is described below commit ddaa677a9e4b19d3b61ca157ec236e27e298133a Author: HyukjinKwon AuthorDate: Thu Jan 7 13:58:13 2021 +0900 [SPARK-34021][R] Fix hyper links in SparkR documentation for CRAN submission ### What changes were proposed in this pull request? 3.0.1 CRAN submission was failed as the reason below: ``` Found the following (possibly) invalid URLs: URL: http://jsonlines.org/ (moved to https://jsonlines.org/) From: man/read.json.Rd man/write.json.Rd Status: 200 Message: OK URL: https://dl.acm.org/citation.cfm?id=1608614 (moved to https://dl.acm.org/doi/10.1109/MC.2009.263) From: inst/doc/sparkr-vignettes.html Status: 200 Message: OK ``` The links were being redirected now. This PR checked all hyperlinks in the docs such as `href{...}` and `url{...}`, and fixed all in SparkR: - Fix two problems above. - Fix http to https - Fix `https://www.apache.org/ https://spark.apache.org/` -> `https://www.apache.org https://spark.apache.org`. ### Why are the changes needed? For CRAN submission. ### Does this PR introduce _any_ user-facing change? Virtually no because it's just cleanup that CRAN requires. ### How was this patch tested? Manually tested by clicking the links Closes #31058 from HyukjinKwon/SPARK-34021. Authored-by: HyukjinKwon Signed-off-by: HyukjinKwon (cherry picked from commit 0ba3ab4c23ee1cd3785caa0fde76862dce478530) Signed-off-by: HyukjinKwon --- R/pkg/DESCRIPTION| 2 +- R/pkg/R/DataFrame.R | 2 +- R/pkg/R/SQLContext.R | 2 +- R/pkg/R/install.R| 6 +++--- R/pkg/R/mllib_classification.R | 4 ++-- R/pkg/R/mllib_clustering.R | 4 ++-- R/pkg/R/mllib_recommendation.R | 2 +- R/pkg/R/mllib_regression.R | 2 +- R/pkg/R/mllib_stat.R | 2 +- R/pkg/R/mllib_tree.R | 12 ++-- R/pkg/R/stats.R | 3 ++- R/pkg/vignettes/sparkr-vignettes.Rmd | 2 +- 12 files changed, 22 insertions(+), 21 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index ae32d59..2a8b8a5 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), email = "felixche...@apache.org"), person(family = "The Apache Software Foundation", role = c("aut", "cph"))) License: Apache License (== 2.0) -URL: https://www.apache.org/ https://spark.apache.org/ +URL: https://www.apache.org https://spark.apache.org BugReports: https://spark.apache.org/contributing.html SystemRequirements: Java (>= 8, < 12) Depends: diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R index 8ca338f..72d9615 100644 --- a/R/pkg/R/DataFrame.R +++ b/R/pkg/R/DataFrame.R @@ -880,7 +880,7 @@ setMethod("toJSON", #' Save the contents of SparkDataFrame as a JSON file #' -#' Save the contents of a SparkDataFrame as a JSON file (\href{http://jsonlines.org/}{ +#' Save the contents of a SparkDataFrame as a JSON file (\href{https://jsonlines.org/}{ #' JSON Lines text format or newline-delimited JSON}). Files written out #' with this method can be read back in as a SparkDataFrame using read.json(). #' diff --git a/R/pkg/R/SQLContext.R b/R/pkg/R/SQLContext.R index 5ed0481..14262e1 100644 --- a/R/pkg/R/SQLContext.R +++ b/R/pkg/R/SQLContext.R @@ -374,7 +374,7 @@ setMethod("toDF", signature(x = "RDD"), #' Create a SparkDataFrame from a JSON file. 
#' #' Loads a JSON file, returning the result as a SparkDataFrame -#' By default, (\href{http://jsonlines.org/}{JSON Lines text format or newline-delimited JSON} +#' By default, (\href{https://jsonlines.org/}{JSON Lines text format or newline-delimited JSON} #' ) is supported. For JSON (one record per file), set a named property \code{multiLine} to #' \code{TRUE}. #' It goes through the entire dataset once to determine the schema. diff --git a/R/pkg/R/install.R b/R/pkg/R/install.R index 5bc5ae0..bbb9188 100644 --- a/R/pkg/R/install.R +++ b/R/pkg/R/install.R @@ -39,11 +39,11 @@ #' version number in the format of "x.y" where x and y are integer. #' If \code{hadoopVersion = "without"}, "Hadoop free" build is installed. #' See -#' \href{http://spark.apache.org/docs/latest/hadoop-provided.html}{ +#' \href
[spark] branch master updated (9b5df2a -> 0ba3ab4)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9b5df2a  [SPARK-34036][DOCS] Update ORC data source documentation
     add 0ba3ab4  [SPARK-34021][R] Fix hyper links in SparkR documentation for CRAN submission

No new revisions were added by this update.

Summary of changes:
 R/pkg/DESCRIPTION                    |  2 +-
 R/pkg/R/DataFrame.R                  |  2 +-
 R/pkg/R/SQLContext.R                 |  2 +-
 R/pkg/R/install.R                    |  6 +++---
 R/pkg/R/mllib_classification.R       |  4 ++--
 R/pkg/R/mllib_clustering.R           |  4 ++--
 R/pkg/R/mllib_recommendation.R       |  2 +-
 R/pkg/R/mllib_regression.R           |  2 +-
 R/pkg/R/mllib_stat.R                 |  2 +-
 R/pkg/R/mllib_tree.R                 | 12 ++--
 R/pkg/R/stats.R                      |  3 ++-
 R/pkg/vignettes/sparkr-vignettes.Rmd |  2 +-
 12 files changed, 22 insertions(+), 21 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f9daf03 -> 9b5df2a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f9daf03  [SPARK-33806][SQL][FOLLOWUP] Fold RepartitionExpression num partition should check if partition expression is empty
     add 9b5df2a  [SPARK-34036][DOCS] Update ORC data source documentation

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-orc.md | 135 +--
 1 file changed, 129 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
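Among other things, the expanded ORC page covers columnar encryption. A rough PySpark sketch of the feature is below; `orc.key.provider`, `orc.encrypt` and `orc.mask` are ORC writer options, and the example assumes a Hadoop KMS at a placeholder address with a master key named `pii` already created there. Without that setup the write will not succeed as shown.

```python
from pyspark.sql import SparkSession

# Assumption: a Hadoop KMS is reachable at the placeholder URI below and a
# master key named "pii" has been created in it.
spark = (SparkSession.builder
         .config("spark.hadoop.hadoop.security.key.provider.path",
                 "kms://http@localhost:9600/kms")
         .getOrCreate())

df = spark.createDataFrame([(1, "123-45-6789", "10001")], ["id", "ssn", "zipcode"])

# Encrypt the ssn column with the "pii" key and mask zipcode (nullify replaces
# values with NULL) for readers that cannot access the key.
(df.write
   .format("orc")
   .option("orc.key.provider", "hadoop")
   .option("orc.encrypt", "pii:ssn")
   .option("orc.mask", "nullify:zipcode")
   .mode("overwrite")
   .save("/tmp/orc_encrypted"))
```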
[spark] branch master updated (8bb70bf -> f9daf03)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8bb70bf  [SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider
     add f9daf03  [SPARK-33806][SQL][FOLLOWUP] Fold RepartitionExpression num partition should check if partition expression is empty

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/plans/logical/basicLogicalOperators.scala   |  2 +-
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala  | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a0269bb -> 8bb70bf)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a0269bb  [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs
     add 8bb70bf  [SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala                            |   1 +
 .../execution/datasources/orc/FakeKeyProvider.java  | 144 +
 ...org.apache.hadoop.crypto.key.KeyProviderFactory  |   6 +-
 .../datasources/orc/OrcEncryptionSuite.scala        |  98 ++
 4 files changed, 245 insertions(+), 4 deletions(-)
 create mode 100644 sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/orc/FakeKeyProvider.java
 copy python/pyspark/sql/avro/__init__.py => sql/core/src/test/resources/META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory (85%)
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 672862f  [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs

672862f is described below

commit 672862f5ad7d779bfb1d9588f942f92fb1e6ac90
Author: Kazuaki Ishizaki
AuthorDate: Wed Jan 6 09:28:22 2021 -0800

    [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs

    ### What changes were proposed in this pull request?

    This PR is a follow-up of #31061. It fixes a typo in a document: `Finctions` -> `Functions`

    ### Why are the changes needed?

    Make the change better documented.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    N/A

    Closes #31069 from kiszk/SPARK-34022-followup.

    Authored-by: Kazuaki Ishizaki
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit a0269bb419a37c31850e02884385b889cd153133)
    Signed-off-by: Dongjoon Hyun
---
 sql/gen-sql-api-docs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/gen-sql-api-docs.py b/sql/gen-sql-api-docs.py
index 7251850..2f73409 100644
--- a/sql/gen-sql-api-docs.py
+++ b/sql/gen-sql-api-docs.py
@@ -195,7 +195,7 @@ def generate_sql_api_markdown(jvm, path):
     """
     with open(path, 'w') as mdfile:
-        mdfile.write("# Built-in Finctions\n\n")
+        mdfile.write("# Built-in Functions\n\n")
         for info in _list_function_infos(jvm):
             name = info.name
             usage = _make_pretty_usage(info.usage)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3cdc4ef -> a0269bb)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3cdc4ef  [SPARK-32685][SQL][FOLLOW-UP] Update migration guide about change default filed.delim to '\t' when user specifies serde
     add a0269bb  [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs

No new revisions were added by this update.

Summary of changes:
 sql/gen-sql-api-docs.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6788304 -> 3cdc4ef)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6788304  [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators"
     add 3cdc4ef  [SPARK-32685][SQL][FOLLOW-UP] Update migration guide about change default filed.delim to '\t' when user specifies serde

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-34011][SQL][3.1][3.0] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new c9c3d6f [SPARK-34011][SQL][3.1][3.0] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION` c9c3d6f is described below commit c9c3d6faaa3b5f6656caaeb14a152dd77404cbc0 Author: Max Gekk AuthorDate: Wed Jan 6 05:00:57 2021 -0800 [SPARK-34011][SQL][3.1][3.0] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION` ### What changes were proposed in this pull request? 1. Invoke `refreshTable()` from `AlterTableRenamePartitionCommand.run()` after partitions renaming. In particular, this re-creates the cache associated with the modified table. 2. Refresh the cache associated with tables from v2 table catalogs in the `ALTER TABLE .. RENAME TO PARTITION` command. ### Why are the changes needed? This fixes the issues portrayed by the example: ```sql spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0); spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0; spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1; spark-sql> CACHE TABLE tbl1; spark-sql> SELECT * FROM tbl1; 0 0 1 1 spark-sql> ALTER TABLE tbl1 PARTITION (part0=0) RENAME TO PARTITION (part=2); spark-sql> SELECT * FROM tbl1; 0 0 1 1 ``` The last query must not return `0 2` since `0 0` was renamed by previous command. ### Does this PR introduce _any_ user-facing change? Yes. After the changes for the example above: ```sql ... spark-sql> ALTER TABLE tbl1 PARTITION (part=0) RENAME TO PARTITION (part=2); spark-sql> SELECT * FROM tbl1; 0 2 1 1 ``` ### How was this patch tested? By running the affected test suite: ``` $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CachedTableSuite" ``` Closes #31060 from MaxGekk/rename-partition-refresh-cache-3.1. 
Authored-by: Max Gekk Signed-off-by: Dongjoon Hyun (cherry picked from commit f18d68a3d59fc82a3611ce92cccfd9b52df29360) Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/execution/command/ddl.scala | 1 + .../scala/org/apache/spark/sql/CachedTableSuite.scala | 15 +++ .../org/apache/spark/sql/hive/CachedTableSuite.scala | 15 +++ 3 files changed, 31 insertions(+) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index 748bb1b..11e695a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -535,6 +535,7 @@ case class AlterTableRenamePartitionCommand( catalog.renamePartitions( tableName, Seq(normalizedOldPartition), Seq(normalizedNewPartition)) +sparkSession.catalog.refreshTable(table.identifier.quotedString) Seq.empty[Row] } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala index 0e8122e..b535734 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala @@ -1285,4 +1285,19 @@ class CachedTableSuite extends QueryTest with SQLTestUtils checkAnswer(sql("SELECT * FROM t"), Seq(Row(1, 1))) } } + + test("SPARK-34011: refresh cache after partition renaming") { +withTable("t") { + sql("CREATE TABLE t (id int, part int) USING parquet PARTITIONED BY (part)") + sql("INSERT INTO t PARTITION (part=0) SELECT 0") + sql("INSERT INTO t PARTITION (part=1) SELECT 1") + assert(!spark.catalog.isCached("t")) + sql("CACHE TABLE t") + assert(spark.catalog.isCached("t")) + QueryTest.checkAnswer(sql("SELECT * FROM t"), Seq(Row(0, 0), Row(1, 1))) + sql("ALTER TABLE t PARTITION (part=0) RENAME TO PARTITION (part=2)") + assert(spark.catalog.isCached("t")) + QueryTest.checkAnswer(sql("SELECT * FROM t"), Seq(Row(0, 2), Row(1, 1))) +} + } } diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala index ed3ccb4..dc909fd 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala @@ -454,4 +454,19 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with TestHiveSingleto checkAnswer(sql("SELECT * FROM t"), Seq(Row(1, 1))) } } + + test("SPARK-34011: refresh cache after partition renaming") { +withTable("t") { + sql("CREATE TABLE t
[spark] branch branch-3.1 updated (2b88afb -> f18d68a)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2b88afb  [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators"
     add f18d68a  [SPARK-34011][SQL][3.1][3.0] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION`

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/command/ddl.scala   |  1 +
 .../scala/org/apache/spark/sql/CachedTableSuite.scala  | 15 +++
 .../org/apache/spark/sql/hive/CachedTableSuite.scala   | 15 +++
 3 files changed, 31 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators"
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 2b88afb [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators" 2b88afb is described below commit 2b88afb65b23d4d06180ecd402aba9c7b0fc106a Author: gengjiaan AuthorDate: Wed Jan 6 21:14:45 2021 +0900 [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators" ### What changes were proposed in this pull request? Add doc for 'like any' and 'like all' operators in sql-ref-syntx-qry-select-like.cmd ### Why are the changes needed? make the usage of 'like any' and 'like all' known to more users ### Does this PR introduce _any_ user-facing change? Yes. https://user-images.githubusercontent.com/692303/103767385-dc1ffb80-5063-11eb-9529-89479531425f.png";> https://user-images.githubusercontent.com/692303/103767391-dde9bf00-5063-11eb-82ce-63bdd11593a1.png";> https://user-images.githubusercontent.com/692303/103767396-df1aec00-5063-11eb-8e81-a192e6c72431.png";> ### How was this patch tested? No tests Closes #31008 from beliefer/SPARK-33977. Lead-authored-by: gengjiaan Co-authored-by: beliefer Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-qry-select-like.md | 60 +- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-qry-select-like.md b/docs/sql-ref-syntax-qry-select-like.md index 6211faa8..3604a9b 100644 --- a/docs/sql-ref-syntax-qry-select-like.md +++ b/docs/sql-ref-syntax-qry-select-like.md @@ -21,12 +21,14 @@ license: | ### Description -A LIKE predicate is used to search for a specific pattern. +A LIKE predicate is used to search for a specific pattern. This predicate also supports multiple patterns with quantifiers include `ANY`, `SOME` and `ALL`. ### Syntax ```sql [ NOT ] { LIKE search_pattern [ ESCAPE esc_char ] | [ RLIKE | REGEXP ] regex_pattern } + +[ NOT ] { LIKE quantifiers ( search_pattern [ , ... ]) } ``` ### Parameters @@ -45,6 +47,10 @@ A LIKE predicate is used to search for a specific pattern. * **regex_pattern** Specifies a regular expression search pattern to be searched by the `RLIKE` or `REGEXP` clause. + +* **quantifiers** + +Specifies the predicate quantifiers include `ANY`, `SOME` and `ALL`. `ANY` or `SOME` means if one of the patterns matches the input, then return true; `ALL` means if all the patterns matches the input, then return true. 
### Examples @@ -111,6 +117,58 @@ SELECT * FROM person WHERE name LIKE '%$_%' ESCAPE '$'; +---+--+---+ |500|Evan_W| 16| +---+--+---+ + +SELECT * FROM person WHERE name LIKE ALL ('%an%', '%an'); ++---+++ +| id|name| age| ++---+++ +|400| Dan| 50| ++---+++ + +SELECT * FROM person WHERE name LIKE ANY ('%an%', '%an'); ++---+--+---+ +| id| name|age| ++---+--+---+ +|400| Dan| 50| +|500|Evan_W| 16| ++---+--+---+ + +SELECT * FROM person WHERE name LIKE SOME ('%an%', '%an'); ++---+--+---+ +| id| name|age| ++---+--+---+ +|400| Dan| 50| +|500|Evan_W| 16| ++---+--+---+ + +SELECT * FROM person WHERE name NOT LIKE ALL ('%an%', '%an'); ++---+++ +| id|name| age| ++---+++ +|100|John| 30| +|200|Mary|null| +|300|Mike| 80| ++---+++ + +SELECT * FROM person WHERE name NOT LIKE ANY ('%an%', '%an'); ++---+--++ +| id| name| age| ++---+--++ +|100| John| 30| +|200| Mary|null| +|300| Mike| 80| +|500|Evan_W| 16| ++---+--++ + +SELECT * FROM person WHERE name NOT LIKE SOME ('%an%', '%an'); ++---+--++ +| id| name| age| ++---+--++ +|100| John| 30| +|200| Mary|null| +|300| Mike| 80| +|500|Evan_W| 16| ++---+--++ ``` ### Related Statements - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators"
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6788304 [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators" 6788304 is described below commit 6788304240c416d173ebdb3d544f3361c6b9fe8e Author: gengjiaan AuthorDate: Wed Jan 6 21:14:45 2021 +0900 [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators" ### What changes were proposed in this pull request? Add doc for 'like any' and 'like all' operators in sql-ref-syntx-qry-select-like.cmd ### Why are the changes needed? make the usage of 'like any' and 'like all' known to more users ### Does this PR introduce _any_ user-facing change? Yes. https://user-images.githubusercontent.com/692303/103767385-dc1ffb80-5063-11eb-9529-89479531425f.png";> https://user-images.githubusercontent.com/692303/103767391-dde9bf00-5063-11eb-82ce-63bdd11593a1.png";> https://user-images.githubusercontent.com/692303/103767396-df1aec00-5063-11eb-8e81-a192e6c72431.png";> ### How was this patch tested? No tests Closes #31008 from beliefer/SPARK-33977. Lead-authored-by: gengjiaan Co-authored-by: beliefer Signed-off-by: Takeshi Yamamuro --- docs/sql-ref-syntax-qry-select-like.md | 60 +- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-qry-select-like.md b/docs/sql-ref-syntax-qry-select-like.md index 6211faa8..3604a9b 100644 --- a/docs/sql-ref-syntax-qry-select-like.md +++ b/docs/sql-ref-syntax-qry-select-like.md @@ -21,12 +21,14 @@ license: | ### Description -A LIKE predicate is used to search for a specific pattern. +A LIKE predicate is used to search for a specific pattern. This predicate also supports multiple patterns with quantifiers include `ANY`, `SOME` and `ALL`. ### Syntax ```sql [ NOT ] { LIKE search_pattern [ ESCAPE esc_char ] | [ RLIKE | REGEXP ] regex_pattern } + +[ NOT ] { LIKE quantifiers ( search_pattern [ , ... ]) } ``` ### Parameters @@ -45,6 +47,10 @@ A LIKE predicate is used to search for a specific pattern. * **regex_pattern** Specifies a regular expression search pattern to be searched by the `RLIKE` or `REGEXP` clause. + +* **quantifiers** + +Specifies the predicate quantifiers include `ANY`, `SOME` and `ALL`. `ANY` or `SOME` means if one of the patterns matches the input, then return true; `ALL` means if all the patterns matches the input, then return true. 
### Examples @@ -111,6 +117,58 @@ SELECT * FROM person WHERE name LIKE '%$_%' ESCAPE '$'; +---+--+---+ |500|Evan_W| 16| +---+--+---+ + +SELECT * FROM person WHERE name LIKE ALL ('%an%', '%an'); ++---+++ +| id|name| age| ++---+++ +|400| Dan| 50| ++---+++ + +SELECT * FROM person WHERE name LIKE ANY ('%an%', '%an'); ++---+--+---+ +| id| name|age| ++---+--+---+ +|400| Dan| 50| +|500|Evan_W| 16| ++---+--+---+ + +SELECT * FROM person WHERE name LIKE SOME ('%an%', '%an'); ++---+--+---+ +| id| name|age| ++---+--+---+ +|400| Dan| 50| +|500|Evan_W| 16| ++---+--+---+ + +SELECT * FROM person WHERE name NOT LIKE ALL ('%an%', '%an'); ++---+++ +| id|name| age| ++---+++ +|100|John| 30| +|200|Mary|null| +|300|Mike| 80| ++---+++ + +SELECT * FROM person WHERE name NOT LIKE ANY ('%an%', '%an'); ++---+--++ +| id| name| age| ++---+--++ +|100| John| 30| +|200| Mary|null| +|300| Mike| 80| +|500|Evan_W| 16| ++---+--++ + +SELECT * FROM person WHERE name NOT LIKE SOME ('%an%', '%an'); ++---+--++ +| id| name| age| ++---+--++ +|100| John| 30| +|200| Mary|null| +|300| Mike| 80| +|500|Evan_W| 16| ++---+--++ ``` ### Related Statements - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-34012][SQL][3.0] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new aaa3dcc [SPARK-34012][SQL][3.0] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide aaa3dcc is described below commit aaa3dcc2c9effde3dd3b4bbe04f7c06e299294cb Author: angerszhu AuthorDate: Wed Jan 6 20:57:03 2021 +0900 [SPARK-34012][SQL][3.0] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/22696 we support HAVING without GROUP BY means global aggregate But since we treat having as Filter before, in this way will cause a lot of analyze error, after https://github.com/apache/spark/pull/28294 we use `UnresolvedHaving` to instead `Filter` to solve such problem, but break origin logical about treat `SELECT 1 FROM range(10) HAVING true` as `SELECT 1 FROM range(10) WHERE true` . This PR fix this issue and add UT. NOTE: This backport comes from https://github.com/apache/spark/pull/31039 ### Why are the changes needed? Keep consistent behavior of migration guide. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? added UT Closes #31049 from AngersZh/SPARK-34012-3.0. Authored-by: angerszhu Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++- .../test/resources/sql-tests/inputs/group-by.sql | 10 .../resources/sql-tests/results/group-by.sql.out | 63 +- 3 files changed, 77 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 938976e..2fcee5a 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -723,7 +723,11 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging val withProject = if (aggregationClause == null && havingClause != null) { if (conf.getConf(SQLConf.LEGACY_HAVING_WITHOUT_GROUP_BY_AS_WHERE)) { // If the legacy conf is set, treat HAVING without GROUP BY as WHERE. -withHavingClause(havingClause, createProject()) +val predicate = expression(havingClause.booleanExpression) match { + case p: Predicate => p + case e => Cast(e, BooleanType) +} +Filter(predicate, createProject()) } else { // According to SQL standard, HAVING without GROUP BY means global aggregate. 
withHavingClause(havingClause, Aggregate(Nil, namedExpressions, withFilter)) diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql index fedf03d..3f5f556 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql @@ -86,6 +86,16 @@ SELECT 1 FROM range(10) HAVING MAX(id) > 0; SELECT id FROM range(10) HAVING id > 0; +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true; + +SELECT 1 FROM range(10) HAVING true; + +SELECT 1 FROM range(10) HAVING MAX(id) > 0; + +SELECT id FROM range(10) HAVING id > 0; + +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=false; + -- Test data CREATE OR REPLACE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES (1, true), (1, false), diff --git a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out index 50eb2a9..e5b7058 100644 --- a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 56 +-- Number of queries: 61 -- !query @@ -278,6 +278,67 @@ grouping expressions sequence is empty, and '`id`' is not an aggregate function. -- !query +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true +-- !query schema +struct +-- !query output +spark.sql.legacy.parser.havingWithoutGroupByAsWheretrue + + +-- !query +SELECT 1 FROM range(10) HAVING true +-- !query schema +struct<1:int> +-- !query output +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 + + +-- !query +SELECT 1 FROM range(10) HAVING MAX(id) > 0 +-- !query schema +struct<> +-- !query output +org.apache.spark.sql.AnalysisException + +Aggregate/Window/Generate expressions are not valid in where clause of the query. +Expression in where clause: [(max(
[spark] branch branch-2.4 updated: [SPARK-34012][SQL][2.4] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide
This is an automated email from the ASF dual-hosted git repository. yamamuro pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new d442146 [SPARK-34012][SQL][2.4] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide d442146 is described below commit d442146964a981dd7f074c4954f7fed2752124e8 Author: angerszhu AuthorDate: Wed Jan 6 20:54:47 2021 +0900 [SPARK-34012][SQL][2.4] Keep behavior consistent when conf `spark.sqllegacy.parser.havingWithoutGroupByAsWhere` is true with migration guide ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/22696 we support HAVING without GROUP BY means global aggregate But since we treat having as Filter before, in this way will cause a lot of analyze error, after https://github.com/apache/spark/pull/28294 we use `UnresolvedHaving` to instead `Filter` to solve such problem, but break origin logical about treat `SELECT 1 FROM range(10) HAVING true` as `SELECT 1 FROM range(10) WHERE true` . This PR fix this issue and add UT. NOTE: This backport comes from #31039 ### Why are the changes needed? Keep consistent behavior of migration guide. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? added UT Closes #31050 from AngersZh/SPARK-34012-2.4. Authored-by: angerszhu Signed-off-by: Takeshi Yamamuro --- .../spark/sql/catalyst/parser/AstBuilder.scala | 6 ++- .../test/resources/sql-tests/inputs/group-by.sql | 10 .../resources/sql-tests/results/group-by.sql.out | 60 +- 3 files changed, 74 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 90e7d1c..4c4e4f1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala @@ -467,7 +467,11 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging val withProject = if (aggregation == null && having != null) { if (conf.getConf(SQLConf.LEGACY_HAVING_WITHOUT_GROUP_BY_AS_WHERE)) { // If the legacy conf is set, treat HAVING without GROUP BY as WHERE. -withHaving(having, createProject()) +val predicate = expression(having) match { + case p: Predicate => p + case e => Cast(e, BooleanType) +} +Filter(predicate, createProject()) } else { // According to SQL standard, HAVING without GROUP BY means global aggregate. 
withHaving(having, Aggregate(Nil, namedExpressions, withFilter)) diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql index 433db71..0c40a8c 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql @@ -80,3 +80,13 @@ SELECT 1 FROM range(10) HAVING true; SELECT 1 FROM range(10) HAVING MAX(id) > 0; SELECT id FROM range(10) HAVING id > 0; + +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true; + +SELECT 1 FROM range(10) HAVING true; + +SELECT 1 FROM range(10) HAVING MAX(id) > 0; + +SELECT id FROM range(10) HAVING id > 0; + +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=false; diff --git a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out index f9d1ee8..d23a58a 100644 --- a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out @@ -1,5 +1,5 @@ -- Automatically generated by SQLQueryTestSuite --- Number of queries: 30 +-- Number of queries: 35 -- !query 0 @@ -275,3 +275,61 @@ struct<> -- !query 29 output org.apache.spark.sql.AnalysisException grouping expressions sequence is empty, and '`id`' is not an aggregate function. Wrap '()' in windowing function(s) or wrap '`id`' in first() (or first_value) if you don't care which value you get.; + + +-- !query 30 +SET spark.sql.legacy.parser.havingWithoutGroupByAsWhere=true +-- !query 30 schema +struct +-- !query 30 output +spark.sql.legacy.parser.havingWithoutGroupByAsWheretrue + + +-- !query 31 +SELECT 1 FROM range(10) HAVING true +-- !query 31 schema +struct<1:int> +-- !query 31 output +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 + + +-- !query 32 +SELECT 1 FROM range(10) HAVING MAX(id) > 0 +-- !query 32 schema +struct<> +-- !query 32 output +java.lang.UnsupportedOperationException +Cannot evaluate exp
[spark] branch branch-3.1 updated (2a15d05 -> bf0a1c0)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2a15d05  [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions
     add bf0a1c0  [SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs

No new revisions were added by this update.

Summary of changes:
 sql/gen-sql-api-docs.py | 1 +
 1 file changed, 1 insertion(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0d86a02  [SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs

0d86a02 is described below

commit 0d86a02ffbaf53c403a4c68bac0041e84acb0cdd
Author: HyukjinKwon
AuthorDate: Wed Jan 6 20:31:27 2021 +0900

    [SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs

    ### What changes were proposed in this pull request?

    This PR adds the support of the latest mkdocs, and makes the sidebar properly show. It works in lower versions too.

    Before:
    ![Screen Shot 2021-01-06 at 5 11 56 PM](https://user-images.githubusercontent.com/6477701/103745131-4e7fe400-5042-11eb-9c09-84f9f95e9fb9.png)

    After:
    ![Screen Shot 2021-01-06 at 5 10 53 PM](https://user-images.githubusercontent.com/6477701/103745139-5049a780-5042-11eb-8ded-30b6f7ef48aa.png)

    ### Why are the changes needed?

    This is a regression in the documentation.

    ### Does this PR introduce _any_ user-facing change?

    Technically no. It's not related yet. It fixes the list on the sidebar appears properly.

    ### How was this patch tested?

    Manually built the docs via `./sql/create-docs.sh` and `open ./sql/site/index.html`

    Closes #31061 from HyukjinKwon/SPARK-34022.

    Authored-by: HyukjinKwon
    Signed-off-by: HyukjinKwon
---
 sql/gen-sql-api-docs.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/gen-sql-api-docs.py b/sql/gen-sql-api-docs.py
index 6132899..7251850 100644
--- a/sql/gen-sql-api-docs.py
+++ b/sql/gen-sql-api-docs.py
@@ -195,6 +195,7 @@ def generate_sql_api_markdown(jvm, path):
     """
     with open(path, 'w') as mdfile:
+        mdfile.write("# Built-in Finctions\n\n")
         for info in _list_function_infos(jvm):
             name = info.name
             usage = _make_pretty_usage(info.usage)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 2a15d05 [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions 2a15d05 is described below commit 2a15d055e7a5cef6ba27bf964e0c97f19e4c8897 Author: HyukjinKwon AuthorDate: Wed Jan 6 18:46:20 2021 +0900 [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/27406. It fixes the naming to match with Scala side. Note that there are a bit of inconsistency already e.g.) `col`, `e`, `expr` and `column`. This part I did not change but other names like `zero` vs `initialValue` or `col1`/`col2` vs `left`/`right` looks unnecessary. ### Why are the changes needed? To make the usage similar with Scala side, and for consistency. ### Does this PR introduce _any_ user-facing change? No, this is not released yet. ### How was this patch tested? GitHub Actions and Jenkins build will test it out. Closes #31062 from HyukjinKwon/SPARK-30681. Authored-by: HyukjinKwon Signed-off-by: HyukjinKwon (cherry picked from commit ff284fb6ac624b2f38ef12f9b840be3077cd27a6) Signed-off-by: HyukjinKwon --- python/pyspark/sql/functions.py | 16 python/pyspark/sql/functions.pyi | 6 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 4dc3129..90f2a45 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -4353,7 +4353,7 @@ def filter(col, f): return _invoke_higher_order_function("ArrayFilter", [col], [f]) -def aggregate(col, zero, merge, finish=None): +def aggregate(col, initialValue, merge, finish=None): """ Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result @@ -4370,7 +4370,7 @@ def aggregate(col, zero, merge, finish=None): -- col : :class:`Column` or str name of column or expression -zero : :class:`Column` or str +initialValue : :class:`Column` or str initial value. Name of column or expression merge : function a binary function ``(acc: Column, x: Column) -> Column...`` returning expression @@ -4414,19 +4414,19 @@ def aggregate(col, zero, merge, finish=None): if finish is not None: return _invoke_higher_order_function( "ArrayAggregate", -[col, zero], +[col, initialValue], [merge, finish] ) else: return _invoke_higher_order_function( "ArrayAggregate", -[col, zero], +[col, initialValue], [merge] ) -def zip_with(col1, col2, f): +def zip_with(left, right, f): """ Merge two given arrays, element-wise, into a single array using a function. 
If one array is shorter, nulls are appended at the end to match the length of the longer @@ -4436,9 +4436,9 @@ def zip_with(col1, col2, f): Parameters -- -col1 : :class:`Column` or str +left : :class:`Column` or str name of the first column or expression -col2 : :class:`Column` or str +right : :class:`Column` or str name of the second column or expression f : function a binary function ``(x1: Column, x2: Column) -> Column...`` @@ -4469,7 +4469,7 @@ def zip_with(col1, col2, f): |[foo_1, bar_2, 3]| +-+ """ -return _invoke_higher_order_function("ZipWith", [col1, col2], [f]) +return _invoke_higher_order_function("ZipWith", [left, right], [f]) def transform_keys(col, f): diff --git a/python/pyspark/sql/functions.pyi b/python/pyspark/sql/functions.pyi index 50e178d..8df5534 100644 --- a/python/pyspark/sql/functions.pyi +++ b/python/pyspark/sql/functions.pyi @@ -237,13 +237,13 @@ def filter(col: ColumnOrName, f: Callable[[Column], Column]) -> Column: ... def filter(col: ColumnOrName, f: Callable[[Column, Column], Column]) -> Column: ... def aggregate( col: ColumnOrName, -zero: ColumnOrName, +initialValue: ColumnOrName, merge: Callable[[Column, Column], Column], finish: Optional[Callable[[Column], Column]] = ..., ) -> Column: ... def zip_with( -col1: ColumnOrName, -ColumnOrName: ColumnOrName, +left: ColumnOrName, +right: ColumnOrName, f: Callable[[Column, Column], Column], ) -> Column: ... def transform_keys( - To unsu
[spark] branch master updated: [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff284fb [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions ff284fb is described below commit ff284fb6ac624b2f38ef12f9b840be3077cd27a6 Author: HyukjinKwon AuthorDate: Wed Jan 6 18:46:20 2021 +0900 [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/27406. It fixes the naming to match with Scala side. Note that there are a bit of inconsistency already e.g.) `col`, `e`, `expr` and `column`. This part I did not change but other names like `zero` vs `initialValue` or `col1`/`col2` vs `left`/`right` looks unnecessary. ### Why are the changes needed? To make the usage similar with Scala side, and for consistency. ### Does this PR introduce _any_ user-facing change? No, this is not released yet. ### How was this patch tested? GitHub Actions and Jenkins build will test it out. Closes #31062 from HyukjinKwon/SPARK-30681. Authored-by: HyukjinKwon Signed-off-by: HyukjinKwon --- python/pyspark/sql/functions.py | 16 python/pyspark/sql/functions.pyi | 6 +++--- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index f612d2d..c9d24dc 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -4355,7 +4355,7 @@ def filter(col, f): return _invoke_higher_order_function("ArrayFilter", [col], [f]) -def aggregate(col, zero, merge, finish=None): +def aggregate(col, initialValue, merge, finish=None): """ Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result @@ -4372,7 +4372,7 @@ def aggregate(col, zero, merge, finish=None): -- col : :class:`Column` or str name of column or expression -zero : :class:`Column` or str +initialValue : :class:`Column` or str initial value. Name of column or expression merge : function a binary function ``(acc: Column, x: Column) -> Column...`` returning expression @@ -4416,19 +4416,19 @@ def aggregate(col, zero, merge, finish=None): if finish is not None: return _invoke_higher_order_function( "ArrayAggregate", -[col, zero], +[col, initialValue], [merge, finish] ) else: return _invoke_higher_order_function( "ArrayAggregate", -[col, zero], +[col, initialValue], [merge] ) -def zip_with(col1, col2, f): +def zip_with(left, right, f): """ Merge two given arrays, element-wise, into a single array using a function. 
If one array is shorter, nulls are appended at the end to match the length of the longer @@ -4438,9 +4438,9 @@ def zip_with(col1, col2, f): Parameters -- -col1 : :class:`Column` or str +left : :class:`Column` or str name of the first column or expression -col2 : :class:`Column` or str +right : :class:`Column` or str name of the second column or expression f : function a binary function ``(x1: Column, x2: Column) -> Column...`` @@ -4471,7 +4471,7 @@ def zip_with(col1, col2, f): |[foo_1, bar_2, 3]| +-+ """ -return _invoke_higher_order_function("ZipWith", [col1, col2], [f]) +return _invoke_higher_order_function("ZipWith", [left, right], [f]) def transform_keys(col, f): diff --git a/python/pyspark/sql/functions.pyi b/python/pyspark/sql/functions.pyi index acb17a2..0cf60c0 100644 --- a/python/pyspark/sql/functions.pyi +++ b/python/pyspark/sql/functions.pyi @@ -237,13 +237,13 @@ def filter(col: ColumnOrName, f: Callable[[Column], Column]) -> Column: ... def filter(col: ColumnOrName, f: Callable[[Column, Column], Column]) -> Column: ... def aggregate( col: ColumnOrName, -zero: ColumnOrName, +initialValue: ColumnOrName, merge: Callable[[Column, Column], Column], finish: Optional[Callable[[Column], Column]] = ..., ) -> Column: ... def zip_with( -col1: ColumnOrName, -ColumnOrName: ColumnOrName, +left: ColumnOrName, +right: ColumnOrName, f: Callable[[Column, Column], Column], ) -> Column: ... def transform_keys( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.
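As a quick reference for the renamed signatures above (`aggregate(col, initialValue, merge, finish=None)` and `zip_with(left, right, f)`), here is a small, self-contained usage sketch; the DataFrames are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([1.0, 2.0, 3.0, 4.0],)], ["values"])
# aggregate: fold the array into one value, starting from the renamed
# `initialValue` parameter (previously `zero`).
df.select(
    F.aggregate("values", F.lit(0.0), lambda acc, x: acc + x).alias("total")
).show()

pairs = spark.createDataFrame([([1, 2, 3], [10, 20, 30])], ["a", "b"])
# zip_with: the first two parameters are now `left` and `right`
# (previously `col1`/`col2`), matching the Scala API.
pairs.select(F.zip_with("a", "b", lambda x, y: x + y).alias("sums")).show()
```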
[spark] branch branch-3.1 updated: [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf
This is an automated email from the ASF dual-hosted git repository. prashant pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new f2b80c7 [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf f2b80c7 is described below commit f2b80c78dfaed25bd444d71847c9483fbe4a4115 Author: Prashant Sharma AuthorDate: Wed Jan 6 14:55:40 2021 +0530 [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf ### What changes were proposed in this pull request? Skip files if they are binary or very large to fit the configMap's max size. ### Why are the changes needed? Config map cannot hold binary files and there is also a limit on how much data a configMap can hold. This limit can be configured by the k8s cluster admin. This PR, skips such files (with a warning) instead of failing with weird runtime errors. If such files are not skipped, then it would result in mount errors or encoding errors (if binary files are submitted). ### Does this PR introduce _any_ user-facing change? yes, in simple words avoids possible errors due to negligence (for example, placing a large file or a binary file in SPARK_CONF_DIR) and thus improves user experience. ### How was this patch tested? Added relevant tests and improved existing tests. Closes #30472 from ScrapCodes/SPARK-32221/avoid-conf-propagate-errors. Lead-authored-by: Prashant Sharma Co-authored-by: Prashant Sharma Signed-off-by: Prashant Sharma (cherry picked from commit f64dfa8727b785f333a0c10f5f7175ab51f22764) Signed-off-by: Prashant Sharma --- .../scala/org/apache/spark/deploy/k8s/Config.scala | 8 +++ .../deploy/k8s/submit/KubernetesClientUtils.scala | 80 +- .../spark/deploy/k8s/submit/ClientSuite.scala | 21 -- .../k8s/submit/KubernetesClientUtilsSuite.scala| 79 + 4 files changed, 164 insertions(+), 24 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala index 8232ed3..65ac3c9 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala @@ -91,6 +91,14 @@ private[spark] object Config extends Logging { .toSequence .createWithDefault(Nil) + val CONFIG_MAP_MAXSIZE = +ConfigBuilder("spark.kubernetes.configMap.maxSize") + .doc("Max size limit for a config map. 
This is configurable as per" + +" https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.") + .version("3.1.0") + .longConf + .createWithDefault(1572864) // 1.5 MiB + val KUBERNETES_AUTH_DRIVER_CONF_PREFIX = "spark.kubernetes.authenticate.driver" val KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX = "spark.kubernetes.authenticate.executor" val KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX = "spark.kubernetes.authenticate.driver.mounted" diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala index 32f630f..4207077 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala @@ -18,15 +18,17 @@ package org.apache.spark.deploy.k8s.submit import java.io.{File, StringWriter} +import java.nio.charset.MalformedInputException import java.util.Properties import scala.collection.JavaConverters._ +import scala.collection.mutable import scala.io.{Codec, Source} import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, KeyToPath} import org.apache.spark.SparkConf -import org.apache.spark.deploy.k8s.{Constants, KubernetesUtils} +import org.apache.spark.deploy.k8s.{Config, Constants, KubernetesUtils} import org.apache.spark.deploy.k8s.Constants.ENV_SPARK_CONF_DIR import org.apache.spark.internal.Logging @@ -54,8 +56,10 @@ private[spark] object KubernetesClientUtils extends Logging { /** * Build, file -> 'file's content' map of all the selected files in SPARK_CONF_DIR. */ - def buildSparkConfDirFilesMap(configMapName: String, - sparkConf: SparkConf, resolvedPropertiesMap: Map[String, String]): Map[String, String] = { + def buildSparkConfDirFilesMap( + configMapName: String, + sparkConf: SparkConf, + resolvedPropertiesMap: Map[String, String]): Map[String, String] = synch
[spark] branch master updated: [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf
This is an automated email from the ASF dual-hosted git repository. prashant pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f64dfa8 [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf f64dfa8 is described below commit f64dfa8727b785f333a0c10f5f7175ab51f22764 Author: Prashant Sharma AuthorDate: Wed Jan 6 14:55:40 2021 +0530 [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf ### What changes were proposed in this pull request? Skip files if they are binary or very large to fit the configMap's max size. ### Why are the changes needed? Config map cannot hold binary files and there is also a limit on how much data a configMap can hold. This limit can be configured by the k8s cluster admin. This PR, skips such files (with a warning) instead of failing with weird runtime errors. If such files are not skipped, then it would result in mount errors or encoding errors (if binary files are submitted). ### Does this PR introduce _any_ user-facing change? yes, in simple words avoids possible errors due to negligence (for example, placing a large file or a binary file in SPARK_CONF_DIR) and thus improves user experience. ### How was this patch tested? Added relevant tests and improved existing tests. Closes #30472 from ScrapCodes/SPARK-32221/avoid-conf-propagate-errors. Lead-authored-by: Prashant Sharma Co-authored-by: Prashant Sharma Signed-off-by: Prashant Sharma --- .../scala/org/apache/spark/deploy/k8s/Config.scala | 8 +++ .../deploy/k8s/submit/KubernetesClientUtils.scala | 80 +- .../spark/deploy/k8s/submit/ClientSuite.scala | 21 -- .../k8s/submit/KubernetesClientUtilsSuite.scala| 79 + 4 files changed, 164 insertions(+), 24 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala index 6939de4..8dca875 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala @@ -99,6 +99,14 @@ private[spark] object Config extends Logging { .toSequence .createWithDefault(Nil) + val CONFIG_MAP_MAXSIZE = +ConfigBuilder("spark.kubernetes.configMap.maxSize") + .doc("Max size limit for a config map. 
This is configurable as per" + +" https://etcd.io/docs/v3.4.0/dev-guide/limit/ on k8s server end.") + .version("3.1.0") + .longConf + .createWithDefault(1572864) // 1.5 MiB + val KUBERNETES_AUTH_DRIVER_CONF_PREFIX = "spark.kubernetes.authenticate.driver" val KUBERNETES_AUTH_EXECUTOR_CONF_PREFIX = "spark.kubernetes.authenticate.executor" val KUBERNETES_AUTH_DRIVER_MOUNTED_CONF_PREFIX = "spark.kubernetes.authenticate.driver.mounted" diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala index 32f630f..4207077 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala @@ -18,15 +18,17 @@ package org.apache.spark.deploy.k8s.submit import java.io.{File, StringWriter} +import java.nio.charset.MalformedInputException import java.util.Properties import scala.collection.JavaConverters._ +import scala.collection.mutable import scala.io.{Codec, Source} import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, KeyToPath} import org.apache.spark.SparkConf -import org.apache.spark.deploy.k8s.{Constants, KubernetesUtils} +import org.apache.spark.deploy.k8s.{Config, Constants, KubernetesUtils} import org.apache.spark.deploy.k8s.Constants.ENV_SPARK_CONF_DIR import org.apache.spark.internal.Logging @@ -54,8 +56,10 @@ private[spark] object KubernetesClientUtils extends Logging { /** * Build, file -> 'file's content' map of all the selected files in SPARK_CONF_DIR. */ - def buildSparkConfDirFilesMap(configMapName: String, - sparkConf: SparkConf, resolvedPropertiesMap: Map[String, String]): Map[String, String] = { + def buildSparkConfDirFilesMap( + configMapName: String, + sparkConf: SparkConf, + resolvedPropertiesMap: Map[String, String]): Map[String, String] = synchronized { val loadedConfFilesMap = KubernetesClientUtils.loadSparkConfDirFiles(sparkConf) // Add resolved
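The new `spark.kubernetes.configMap.maxSize` property shown in the diff (default 1572864 bytes, i.e. 1.5 MiB) bounds how much of SPARK_CONF_DIR is propagated through the driver config map, and binary or oversized files are skipped with a warning instead of failing the submission. Below is a hedged sketch of lowering the cap for a cluster whose etcd limit is stricter; the master URL, container image, and chosen value are all placeholders.

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Placeholder cluster endpoint and image; the point of interest is the new
# configMap.maxSize setting, which should stay within the limit the
# Kubernetes/etcd administrator has configured for the cluster.
conf = (SparkConf()
        .setMaster("k8s://https://example-cluster:6443")
        .set("spark.kubernetes.container.image", "apache/spark:3.1.0")
        .set("spark.kubernetes.configMap.maxSize", "524288"))  # 512 KiB

spark = SparkSession.builder.config(conf=conf).getOrCreate()
```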
[spark] branch branch-3.0 updated (98cb0cd -> 403bca4)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 98cb0cd  [SPARK-33635][SS] Adjust the order of check in KafkaTokenUtil.needTokenUpdate to remedy perf regression
     add 403bca4  [SPARK-33029][CORE][WEBUI][3.0] Fix the UI executor page incorrectly marking the driver as blacklisted

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/status/AppStatusListener.scala |  8 +---
 .../executor_memory_usage_expectation.json                     |  2 +-
 .../executor_node_blacklisting_expectation.json                |  2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (45a4ff8 -> 26d8df3)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 45a4ff8  [SPARK-33948][SQL] Fix CodeGen error of MapObjects.doGenCode method in Scala 2.13
     add 26d8df3  [SPARK-33938][SQL] Optimize Like Any/All by LikeSimplification

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/regexpExpressions.scala   |  6 +-
 .../spark/sql/catalyst/optimizer/expressions.scala | 81 +++---
 .../optimizer/LikeSimplificationSuite.scala        | 68 ++
 3 files changed, 128 insertions(+), 27 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
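LikeSimplification rewrites LIKE patterns that have no inner wildcards into cheaper string predicates (prefix, suffix, contains, equality checks), and SPARK-33938 extends that rule to the LIKE ANY / LIKE ALL forms. A small sketch for inspecting the effect in the optimized plan; the view and patterns are made up.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([("abcdef",), ("xyz",)], ["s"]).createOrReplaceTempView("t")

# Simple patterns such as 'abc%' or '%def' can be turned into StartsWith /
# EndsWith style predicates instead of full pattern matching; the optimized
# logical plan printed by explain(True) should show the rewritten condition.
spark.sql("SELECT * FROM t WHERE s LIKE ALL ('abc%', '%def')").explain(True)
spark.sql("SELECT * FROM t WHERE s LIKE ANY ('abc%', '%xyz')").explain(True)
```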