[spark] branch branch-3.2 updated (e55bab5 -> 90b7ee0)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e55bab5  [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
 add 90b7ee0  [SPARK-37238][BUILD][3.2] Upgrade ORC to 1.6.12

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 6 +++---
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 6 +++---
 pom.xml                                 | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7ef6a2e -> e29c4e1)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7ef6a2e  [SPARK-37231][SQL] Dynamic writes/reads of ANSI interval partitions
 add e29c4e1  [SPARK-37211][INFRA] Added descriptions and an image to the guide for enabling GitHub Actions in notify_test_workflow.yml

No new revisions were added by this update.

Summary of changes:
 .github/workflows/images/workflow-enable-button.png | Bin 0 -> 79807 bytes
 .github/workflows/notify_test_workflow.yml          | 10 ++++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)
 create mode 100644 .github/workflows/images/workflow-enable-button.png
[GitHub] [spark-website] MaxGekk commented on pull request #367: Add Chao Sun to committers
MaxGekk commented on pull request #367:
URL: https://github.com/apache/spark-website/pull/367#issuecomment-962850694

   Congratulations @sunchao !

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[spark] branch master updated (8ab9d63 -> 7ef6a2e)
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 8ab9d63  [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
 add 7ef6a2e  [SPARK-37231][SQL] Dynamic writes/reads of ANSI interval partitions

No new revisions were added by this update.

Summary of changes:
 .../execution/datasources/PartitioningUtils.scala |  2 ++
 .../spark/sql/sources/PartitionedWriteSuite.scala | 40 ++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 36 insertions(+), 6 deletions(-)
[spark] branch branch-3.2 updated: [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new e55bab5  [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
e55bab5 is described below

commit e55bab5267b066fb78921ef6828924c32adbc637
Author: Wenchen Fan
AuthorDate: Mon Nov 8 13:33:30 2021 +0800

    [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers

    This is a followup of #31427 , which introduced two issues:
    1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR; now we fail very late in `CheckAnalysis` with `NoSuchTableException`.
    2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

    This PR fixes the 2 issues.

    save analysis time and improve error message

    no

    updated test

    Closes #34490 from cloud-fan/table.

Authored-by: Wenchen Fan
Signed-off-by: Wenchen Fan
(cherry picked from commit 8ab9d6327d7db20a4257f9fe6d0b17919576be5e)
Signed-off-by: Wenchen Fan
---
 .../sql/connector/catalog/LookupCatalog.scala      |  4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  | 10 +---
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  3 +
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  2 +-
 .../datasources/v2/V2SessionCatalog.scala          |  4 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 66 +-
 .../spark/sql/execution/command/DDLSuite.scala     |  3 +-
 7 files changed, 27 insertions(+), 65 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
index 0635859..0362caf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
@@ -191,8 +191,8 @@ private[sql] trait LookupCatalog extends Logging {
     } else {
       ident.namespace match {
         case Array(db) => FunctionIdentifier(ident.name, Some(db))
-        case _ =>
-          throw QueryCompilationErrors.unsupportedFunctionNameError(ident.toString)
+        case other =>
+          throw QueryCompilationErrors.requiresSinglePartNamespaceError(other)
       }
     }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index e7af006..7c2780a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -519,10 +519,6 @@ object QueryCompilationErrors {
       "SHOW VIEWS, only SessionCatalog supports this command.")
   }

-  def unsupportedFunctionNameError(quoted: String): Throwable = {
-    new AnalysisException(s"Unsupported function name '$quoted'")
-  }
-
   def sqlOnlySupportedWithV1TablesError(sql: String): Throwable = {
     new AnalysisException(s"$sql is only supported with v1 tables.")
   }
@@ -850,9 +846,9 @@ object QueryCompilationErrors {
     new TableAlreadyExistsException(ident)
   }

-  def requiresSinglePartNamespaceError(ident: Identifier): Throwable = {
-    new NoSuchTableException(
-      s"V2 session catalog requires a single-part namespace: ${ident.quoted}")
+  def requiresSinglePartNamespaceError(ns: Seq[String]): Throwable = {
+    new AnalysisException(CatalogManager.SESSION_CATALOG_NAME +
+      " requires a single-part namespace, but got " + ns.mkString("[", ", ", "]"))
   }

   def namespaceAlreadyExistsError(namespace: Array[String]): Throwable = {
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
index a1d9f89..886c9a6 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
@@ -2237,6 +2237,9 @@ class DDLParserSuite extends AnalysisTest {
       false, LocalTempView)
     comparePlans(parsed2, expected2)
+
+    val v3 = "CREATE TEMPORARY VIEW a.b AS SELECT 1"
+    intercept(v3, "It is not allowed to add database prefix")
   }

   test("create view - full") {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index 80063cd..b73ccbb 100644
---
[spark] branch master updated: [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8ab9d63  [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers
8ab9d63 is described below

commit 8ab9d6327d7db20a4257f9fe6d0b17919576be5e
Author: Wenchen Fan
AuthorDate: Mon Nov 8 13:33:30 2021 +0800

    [SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers

    ### What changes were proposed in this pull request?

    This is a followup of #31427 , which introduced two issues:
    1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR; now we fail very late in `CheckAnalysis` with `NoSuchTableException`.
    2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

    This PR fixes the 2 issues.

    ### Why are the changes needed?

    save analysis time and improve error message

    ### Does this PR introduce _any_ user-facing change?

    no

    ### How was this patch tested?

    updated test

    Closes #34490 from cloud-fan/table.

Authored-by: Wenchen Fan
Signed-off-by: Wenchen Fan
---
 .../sql/connector/catalog/LookupCatalog.scala      |  4 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  | 10 +---
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  2 +-
 .../datasources/v2/V2SessionCatalog.scala          |  4 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 66 +-
 .../sql/execution/command/DDLParserSuite.scala     |  3 +
 .../spark/sql/execution/command/DDLSuite.scala     |  3 +-
 7 files changed, 27 insertions(+), 65 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
index 0635859..0362caf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
@@ -191,8 +191,8 @@ private[sql] trait LookupCatalog extends Logging {
     } else {
       ident.namespace match {
         case Array(db) => FunctionIdentifier(ident.name, Some(db))
-        case _ =>
-          throw QueryCompilationErrors.unsupportedFunctionNameError(ident.toString)
+        case other =>
+          throw QueryCompilationErrors.requiresSinglePartNamespaceError(other)
       }
     }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 527a2b9..b7f4cce 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -530,10 +530,6 @@ object QueryCompilationErrors {
       "SHOW VIEWS, only SessionCatalog supports this command.")
   }

-  def unsupportedFunctionNameError(quoted: String): Throwable = {
-    new AnalysisException(s"Unsupported function name '$quoted'")
-  }
-
   def sqlOnlySupportedWithV1TablesError(sql: String): Throwable = {
     new AnalysisException(s"$sql is only supported with v1 tables.")
   }
@@ -861,9 +857,9 @@ object QueryCompilationErrors {
     new TableAlreadyExistsException(ident)
   }

-  def requiresSinglePartNamespaceError(ident: Identifier): Throwable = {
-    new NoSuchTableException(
-      s"V2 session catalog requires a single-part namespace: ${ident.quoted}")
+  def requiresSinglePartNamespaceError(ns: Seq[String]): Throwable = {
+    new AnalysisException(CatalogManager.SESSION_CATALOG_NAME +
+      " requires a single-part namespace, but got " + ns.mkString("[", ", ", "]"))
   }

   def namespaceAlreadyExistsError(namespace: Array[String]): Throwable = {
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
index f211054..e5be7f4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
@@ -430,7 +430,7 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager)
         className, resources, ignoreIfExists, replace) =>
       if (isSessionCatalog(catalog)) {
         val database = if (nameParts.length > 2) {
-          throw QueryCompilationErrors.unsupportedFunctionNameError(nameParts.quoted)
+          throw QueryCompilationErrors.requiresSinglePartNamespaceError(nameParts)
        } else if
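The shape of the new check can be sketched outside Spark. The following is a minimal Python illustration (the function name `require_single_part_namespace` is hypothetical, not Spark's API); it mirrors the error-message format that `requiresSinglePartNamespaceError` now produces, so the failure happens at validation time instead of surfacing later as a misleading missing-table error:

```python
SESSION_CATALOG_NAME = "spark_catalog"  # mirrors CatalogManager.SESSION_CATALOG_NAME

def require_single_part_namespace(namespace, name):
    """Reject identifiers whose namespace is not exactly one part.

    namespace: name parts preceding the object name,
    e.g. [] for `spark_catalog.t`, ["db"] for `db.t`.
    """
    if len(namespace) != 1:
        raise ValueError(
            f"{SESSION_CATALOG_NAME} requires a single-part namespace, "
            f"but got [{', '.join(namespace)}]")
    return f"{namespace[0]}.{name}"

print(require_single_part_namespace(["default"], "t"))  # default.t
try:
    require_single_part_namespace([], "t")  # the `spark_catalog.t` case
except ValueError as e:
    print(e)  # spark_catalog requires a single-part namespace, but got []
```

The point of the fix is visible in the exception text: it names the namespace that was actually supplied rather than claiming the table does not exist.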
[spark] branch master updated (5cb0fb3 -> fe41d18)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 5cb0fb3  [SPARK-35437][SQL] Use expressions to filter Hive partitions at client side
 add fe41d18  [SPARK-37199][SQL] Add deterministic field to QueryPlan

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/subquery.scala  |  3 +++
 .../spark/sql/catalyst/optimizer/InlineCTE.scala   |  2 +-
 .../spark/sql/catalyst/plans/QueryPlan.scala       |  7 +++++++
 .../spark/sql/catalyst/plans/QueryPlanSuite.scala  | 30 +++++++++++++++++++++++++++++-
 .../scala/org/apache/spark/sql/SubquerySuite.scala | 11 +++++++++++
 5 files changed, 51 insertions(+), 2 deletions(-)
[spark] branch master updated (e2ea690 -> 5cb0fb3)
This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from e2ea690  [SPARK-37221][SQL] The collect-like API in SparkPlan should support columnar output
 add 5cb0fb3  [SPARK-35437][SQL] Use expressions to filter Hive partitions at client side

No new revisions were added by this update.

Summary of changes:
 .../catalyst/catalog/ExternalCatalogUtils.scala    | 42 ++-
 .../org/apache/spark/sql/internal/SQLConf.scala    | 14
 .../spark/sql/hive/client/HiveClientImpl.scala     |  2 +-
 .../apache/spark/sql/hive/client/HiveShim.scala    | 85 +++---
 .../hive/client/HivePartitionFilteringSuite.scala  | 67 -
 5 files changed, 176 insertions(+), 34 deletions(-)
[spark] branch master updated (442dedb -> e2ea690)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 442dedb  [SPARK-37120][BUILD] Add Daily GitHub Action jobs for Java11/17
 add e2ea690  [SPARK-37221][SQL] The collect-like API in SparkPlan should support columnar output

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/SparkPlan.scala |  7 ++++++-
 .../spark/sql/execution/SparkPlanSuite.scala       | 25 +++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)
[spark] branch branch-3.2 updated: Revert "[SPARK-36998][CORE] Handle concurrent eviction of same application in SHS"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 6ecdde1  Revert "[SPARK-36998][CORE] Handle concurrent eviction of same application in SHS"
6ecdde1 is described below

commit 6ecdde189a61ea07125a3bddca6ec1ddd6a1c866
Author: Dongjoon Hyun
AuthorDate: Sun Nov 7 17:12:51 2021 -0800

    Revert "[SPARK-36998][CORE] Handle concurrent eviction of same application in SHS"

    This reverts commit 248e07b49187bc7082e6cb2b0d9daa4b48ffe3cb.
---
 .../deploy/history/HistoryServerDiskManager.scala  | 10 ++--
 .../history/HistoryServerDiskManagerSuite.scala    | 30 +++---
 2 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
index 8a5b285..31f9d18 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/HistoryServerDiskManager.scala
@@ -17,7 +17,7 @@
 package org.apache.spark.deploy.history

-import java.io.{File, IOException}
+import java.io.File
 import java.util.concurrent.atomic.AtomicLong

 import scala.collection.JavaConverters._
@@ -210,13 +210,7 @@ private class HistoryServerDiskManager(
   def committed(): Long = committedUsage.get()

   private def deleteStore(path: File): Unit = {
-    try {
-      FileUtils.deleteDirectory(path)
-    } catch {
-      // Handle simultaneous eviction of the same app
-      case e: IOException =>
-        if (path.exists()) throw e
-    }
+    FileUtils.deleteDirectory(path)
     listing.delete(classOf[ApplicationStoreInfo], path.getAbsolutePath())
   }

diff --git a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
index fecf905..9004e86 100644
--- a/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/history/HistoryServerDiskManagerSuite.scala
@@ -21,9 +21,8 @@ import java.io.File

 import org.mockito.AdditionalAnswers
 import org.mockito.ArgumentMatchers.{anyBoolean, anyLong, eq => meq}
-import org.mockito.Mockito.{doAnswer, spy, when}
-import org.scalatest.{BeforeAndAfter, PrivateMethodTester}
-import org.scalatestplus.mockito.MockitoSugar.mock
+import org.mockito.Mockito.{doAnswer, spy}
+import org.scalatest.BeforeAndAfter

 import org.apache.spark.{SparkConf, SparkFunSuite}
 import org.apache.spark.internal.config.History._
@@ -31,8 +30,7 @@ import org.apache.spark.status.KVUtils
 import org.apache.spark.util.{ManualClock, Utils}
 import org.apache.spark.util.kvstore.KVStore

-class HistoryServerDiskManagerSuite extends SparkFunSuite
-  with PrivateMethodTester with BeforeAndAfter {
+class HistoryServerDiskManagerSuite extends SparkFunSuite with BeforeAndAfter {

   private def doReturn(value: Any) = org.mockito.Mockito.doReturn(value, Seq.empty: _*)

@@ -160,28 +158,6 @@ class HistoryServerDiskManagerSuite extends SparkFunSuite
     assert(manager.approximateSize(50L, true) > 50L)
   }

-  test("SPARK-36998: Should be able to delete a store") {
-    val manager = mockManager()
-    val tempDir = Utils.createTempDir()
-    tempDir.delete()
-    Seq(true, false).foreach { exists =>
-      val file = mock[File]
-      when(file.exists()).thenReturn(true).thenReturn(true).thenReturn(exists)
-      when(file.isDirectory).thenReturn(true)
-      when(file.toPath).thenReturn(tempDir.toPath)
-      when(file.getAbsolutePath).thenReturn(tempDir.getAbsolutePath)
-      val deleteStore = PrivateMethod[Unit]('deleteStore)
-      if (exists) {
-        val m = intercept[Exception] {
-          manager invokePrivate deleteStore(file)
-        }.getMessage
-        assert(m.contains("Unknown I/O error"))
-      } else {
-        manager invokePrivate deleteStore(file)
-      }
-    }
-  }
-
   test("SPARK-32024: update ApplicationStoreInfo.size during initializing") {
     val manager = mockManager()
     val leaseA = manager.lease(2)
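The pattern being reverted above, tolerating a delete that races with another eviction of the same app, is a general idiom: swallow the I/O error only when the directory is in fact already gone. A rough Python equivalent, using `shutil` in place of Commons IO's `FileUtils.deleteDirectory` (the function name `delete_store` echoes the Scala method but is otherwise hypothetical):

```python
import os
import shutil
import tempfile

def delete_store(path):
    """Delete an application store directory, tolerating the case where a
    concurrent eviction already removed it (the reverted SPARK-36998 idea)."""
    try:
        shutil.rmtree(path)
    except OSError:
        # Re-raise only if the directory still exists; if it is gone,
        # another eviction won the race and there is nothing left to do.
        if os.path.exists(path):
            raise

d = tempfile.mkdtemp()
delete_store(d)           # removes the directory
delete_store(d)           # rmtree raises, but the path is gone, so it is swallowed
print(os.path.exists(d))  # False
```

Note the revert restores the unguarded delete, so in branch-3.2 a racing eviction can again surface the I/O error to the caller.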
[spark] branch master updated: [SPARK-37120][BUILD] Add Daily GitHub Action jobs for Java11/17
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 442dedb  [SPARK-37120][BUILD] Add Daily GitHub Action jobs for Java11/17
442dedb is described below

commit 442dedba835f43532d049adb8b56ba05bf675f3d
Author: Dongjoon Hyun
AuthorDate: Mon Nov 8 09:30:07 2021 +0900

    [SPARK-37120][BUILD] Add Daily GitHub Action jobs for Java11/17

    ### What changes were proposed in this pull request?

    This PR aims to add Daily GitHub Action jobs for Java 11/17.

    ### Why are the changes needed?

    To add a test coverage on Java 11/17.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    N/A.

    Closes #34508 from dongjoon-hyun/SPARK-37120.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Hyukjin Kwon
---
 .github/workflows/build_and_test.yml | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 3f2d500..9f375b3 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -33,12 +33,17 @@ on:
     - cron: '0 7 * * *'
     # PySpark coverage for master branch
     - cron: '0 10 * * *'
+    # Java 11
+    - cron: '0 13 * * *'
+    # Java 17
+    - cron: '0 16 * * *'

 jobs:
   configure-jobs:
     name: Configure jobs
     runs-on: ubuntu-20.04
     outputs:
+      java: ${{ steps.set-outputs.outputs.java }}
       branch: ${{ steps.set-outputs.outputs.branch }}
       hadoop: ${{ steps.set-outputs.outputs.hadoop }}
       type: ${{ steps.set-outputs.outputs.type }}
@@ -48,26 +53,43 @@ jobs:
       id: set-outputs
       run: |
         if [ "${{ github.event.schedule }}" = "0 1 * * *" ]; then
+          echo '::set-output name=java::8'
           echo '::set-output name=branch::master'
           echo '::set-output name=type::scheduled'
           echo '::set-output name=envs::{}'
           echo '::set-output name=hadoop::hadoop2.7'
         elif [ "${{ github.event.schedule }}" = "0 4 * * *" ]; then
+          echo '::set-output name=java::8'
           echo '::set-output name=branch::master'
           echo '::set-output name=type::scheduled'
           echo '::set-output name=envs::{"SCALA_PROFILE": "scala2.13"}'
           echo '::set-output name=hadoop::hadoop3.2'
         elif [ "${{ github.event.schedule }}" = "0 7 * * *" ]; then
+          echo '::set-output name=java::8'
           echo '::set-output name=branch::branch-3.2'
           echo '::set-output name=type::scheduled'
           echo '::set-output name=envs::{"SCALA_PROFILE": "scala2.13"}'
           echo '::set-output name=hadoop::hadoop3.2'
         elif [ "${{ github.event.schedule }}" = "0 10 * * *" ]; then
+          echo '::set-output name=java::8'
           echo '::set-output name=branch::master'
           echo '::set-output name=type::pyspark-coverage-scheduled'
           echo '::set-output name=envs::{"PYSPARK_CODECOV": "true"}'
           echo '::set-output name=hadoop::hadoop3.2'
+        elif [ "${{ github.event.schedule }}" = "0 13 * * *" ]; then
+          echo '::set-output name=java::11'
+          echo '::set-output name=branch::branch-3.2'
+          echo '::set-output name=type::scheduled'
+          echo '::set-output name=envs::{}'
+          echo '::set-output name=hadoop::hadoop3.2'
+        elif [ "${{ github.event.schedule }}" = "0 16 * * *" ]; then
+          echo '::set-output name=java::17'
+          echo '::set-output name=branch::branch-3.2'
+          echo '::set-output name=type::scheduled'
+          echo '::set-output name=envs::{}'
+          echo '::set-output name=hadoop::hadoop3.2'
         else
+          echo '::set-output name=java::8'
           echo '::set-output name=branch::master' # Default branch to run on. CHANGE here when a branch is cut out.
           echo '::set-output name=type::regular'
           echo '::set-output name=envs::{}'
@@ -89,7 +111,7 @@ jobs:
     fail-fast: false
     matrix:
       java:
-        - 8
+        - ${{ needs.configure-jobs.outputs.java }}
       hadoop:
         - ${{ needs.configure-jobs.outputs.hadoop }}
       hive:
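The dispatch in `configure-jobs` above is a chain of string comparisons on the cron expression, each branch emitting a job configuration. The same mapping can be condensed into a lookup table; this is an illustrative Python sketch (only the `java`, `branch`, and `type` columns are shown, with values taken from the diff; the function name `configure` is hypothetical):

```python
# Map each scheduled cron expression to the job configuration it selects,
# mirroring the if/elif chain in the workflow's configure-jobs step.
SCHEDULES = {
    "0 1 * * *":  {"java": "8",  "branch": "master",     "type": "scheduled"},
    "0 4 * * *":  {"java": "8",  "branch": "master",     "type": "scheduled"},
    "0 7 * * *":  {"java": "8",  "branch": "branch-3.2", "type": "scheduled"},
    "0 10 * * *": {"java": "8",  "branch": "master",     "type": "pyspark-coverage-scheduled"},
    "0 13 * * *": {"java": "11", "branch": "branch-3.2", "type": "scheduled"},  # new Java 11 job
    "0 16 * * *": {"java": "17", "branch": "branch-3.2", "type": "scheduled"},  # new Java 17 job
}

# Anything that is not a recognized schedule (e.g. a pull request) falls
# through to the workflow's else branch.
DEFAULT = {"java": "8", "branch": "master", "type": "regular"}

def configure(schedule):
    return SCHEDULES.get(schedule, DEFAULT)

print(configure("0 13 * * *")["java"])  # 11
print(configure("")["type"])            # regular
```

The build matrix then consumes `java` from `configure-jobs` outputs instead of the previously hard-coded `8`, which is what lets one workflow file drive all three Java versions.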
[spark] branch master updated: [SPARK-37232][BUILD] Upgrade ORC to 1.7.1
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a3cb70a  [SPARK-37232][BUILD] Upgrade ORC to 1.7.1
a3cb70a is described below

commit a3cb70aed6a075c580b9f5c4afcb6e2859f636d7
Author: William Hyun
AuthorDate: Sun Nov 7 13:16:42 2021 -0800

    [SPARK-37232][BUILD] Upgrade ORC to 1.7.1

    ### What changes were proposed in this pull request?

    This PR aims to upgrade ORC to 1.7.1.
    - http://orc.apache.org/news/2021/11/07/ORC-1.7.1/

    ### Why are the changes needed?

    This will bring the latest bug fixes.
    - https://github.com/apache/orc/milestone/1?closed=1

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CIs.

    Closes #34507 from williamhyun/orc171.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 6 +++---
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 6 +++---
 pom.xml                                 | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2.7-hive-2.3 b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
index 90e6304..59b2b9b 100644
--- a/dev/deps/spark-deps-hadoop-2.7-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2.7-hive-2.3
@@ -202,9 +202,9 @@ objenesis/2.6//objenesis-2.6.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.14.0//okio-1.14.0.jar
 opencsv/2.3//opencsv-2.3.jar
-orc-core/1.7.0//orc-core-1.7.0.jar
-orc-mapreduce/1.7.0//orc-mapreduce-1.7.0.jar
-orc-shims/1.7.0//orc-shims-1.7.0.jar
+orc-core/1.7.1//orc-core-1.7.1.jar
+orc-mapreduce/1.7.1//orc-mapreduce-1.7.1.jar
+orc-shims/1.7.1//orc-shims-1.7.1.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3.2-hive-2.3 b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
index 7f45c5c..b5b8406 100644
--- a/dev/deps/spark-deps-hadoop-3.2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3.2-hive-2.3
@@ -189,9 +189,9 @@ objenesis/2.6//objenesis-2.6.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.14.0//okio-1.14.0.jar
 opencsv/2.3//opencsv-2.3.jar
-orc-core/1.7.0//orc-core-1.7.0.jar
-orc-mapreduce/1.7.0//orc-mapreduce-1.7.0.jar
-orc-shims/1.7.0//orc-shims-1.7.0.jar
+orc-core/1.7.1//orc-core-1.7.1.jar
+orc-mapreduce/1.7.1//orc-mapreduce-1.7.1.jar
+orc-shims/1.7.1//orc-shims-1.7.1.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
diff --git a/pom.xml b/pom.xml
index 45ff0ee..d7bab23 100644
--- a/pom.xml
+++ b/pom.xml
@@ -137,7 +137,7 @@
     10.14.2.0
     1.12.2
-    1.7.0
+    1.7.1
     9.4.43.v20210629
     4.0.3
     0.10.0
[spark] branch master updated (ddf27bd -> 1047708)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from ddf27bd  [SPARK-37223][SQL][TESTS] Fix unit test check in JoinHintSuite
 add 1047708  [SPARK-37207][SQL][PYTHON] Add isEmpty method for the Python DataFrame API

No new revisions were added by this update.

Summary of changes:
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/dataframe.py              | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)
[spark] branch master updated: [SPARK-37223][SQL][TESTS] Fix unit test check in JoinHintSuite
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ddf27bd  [SPARK-37223][SQL][TESTS] Fix unit test check in JoinHintSuite
ddf27bd is described below

commit ddf27bd3af4cee733b8303c9cde386861e87c449
Author: Cheng Su
AuthorDate: Sun Nov 7 07:48:31 2021 -0600

    [SPARK-37223][SQL][TESTS] Fix unit test check in JoinHintSuite

    ### What changes were proposed in this pull request?

    This is to fix the unit test where we should assert on the content of log in `JoinHintSuite`.

    ### Why are the changes needed?

    Improve test.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Changed test itself.

    Closes #34501 from c21/test-fix.

    Authored-by: Cheng Su
    Signed-off-by: Sean Owen
---
 .../test/scala/org/apache/spark/sql/JoinHintSuite.scala | 14 ++++++-------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
index 91cad85..99bad40 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
@@ -612,8 +612,9 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
         .filter(_.contains("is not supported in the query:"))
-      assert(logs.size == 2)
-      logs.forall(_.contains(s"build left for ${joinType.split("_").mkString(" ")} join."))
+      assert(logs.size === 2)
+      logs.foreach(log =>
+        assert(log.contains(s"build left for ${joinType.split("_").mkString(" ")} join.")))
     }

     Seq("left_outer", "left_semi", "left_anti").foreach { joinType =>
@@ -640,8 +641,9 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       }
       val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
         .filter(_.contains("is not supported in the query:"))
-      assert(logs.size == 2)
-      logs.forall(_.contains(s"build right for ${joinType.split("_").mkString(" ")} join."))
+      assert(logs.size === 2)
+      logs.foreach(log =>
+        assert(log.contains(s"build right for ${joinType.split("_").mkString(" ")} join.")))
     }

     Seq("right_outer").foreach { joinType =>
@@ -689,8 +691,8 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with AdaptiveSparkP
       }
       val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
         .filter(_.contains("is not supported in the query:"))
-      assert(logs.size == 2)
-      logs.forall(_.contains("no equi-join keys"))
+      assert(logs.size === 2)
+      logs.foreach(log => assert(log.contains("no equi-join keys")))
     }

   test("SPARK-36652: AQE dynamic join selection should not apply to non-equi join") {
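The bug fixed here is language-agnostic: `forall` returns a Boolean that the test silently discarded, so a failing predicate never failed the test. Asserting on each element (or on the aggregate result) restores the check. A minimal Python illustration of the same mistake and fix, with the hypothetical helper `check_logs` standing in for the suite's assertion:

```python
logs = ["build left for inner join.", "unrelated message"]

# Before: the Boolean result of the all() check is computed and thrown away,
# so this line never fails even though the second log does not match.
all("build left" in log for log in logs)  # result discarded: no failure

# After: assert on each element, so any mismatch fails loudly.
def check_logs(logs):
    for log in logs:
        assert "build left" in log, f"unexpected log: {log}"

try:
    check_logs(logs)
except AssertionError as e:
    print("caught:", e)  # caught: unexpected log: unrelated message
```

Some linters (and Scala's `-Wvalue-discard`) can flag discarded Boolean expressions like the "before" line automatically.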
[spark] branch master updated: [SPARK-36665][SQL] Add more Not operator simplifications
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 977dd05  [SPARK-36665][SQL] Add more Not operator simplifications
977dd05 is described below

commit 977dd054ed0946b62e62d2d480dbf25598545a5e
Author: Kazuyuki Tanimura
AuthorDate: Sun Nov 7 01:17:58 2021 -0700

    [SPARK-36665][SQL] Add more Not operator simplifications

    ### What changes were proposed in this pull request?

    This PR proposes to add more Not operator simplifications in `BooleanSimplification` by applying the following rules
    - Not(null) == null
      - e.g. IsNull(Not(...)) can be IsNull(...)
    - (Not(a) = b) == (a = Not(b))
      - e.g. Not(...) = true can be (...) = false
    - (a != b) == (a = Not(b))
      - e.g. (...) != true can be (...) = false

    ### Why are the changes needed?

    This PR simplifies SQL statements that includes Not operators. In addition, the following query does not push down the filter in the current implementation
    ```
    SELECT * FROM t WHERE (not boolean_col) <=> null
    ```
    although the following equivalent query pushes down the filter as expected.
    ```
    SELECT * FROM t WHERE boolean_col <=> null
    ```
    That is because the first query creates `IsNull(Not(boolean_col))` in the current implementation, which should be able to get simplified further to `IsNull(boolean_col)`

    This PR helps optimizing such cases.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Added unit tests
    ```
    build/sbt "testOnly *BooleanSimplificationSuite -- -z SPARK-36665"
    ```

    Closes #33930 from kazuyukitanimura/SPARK-36665.

Authored-by: Kazuyuki Tanimura
Signed-off-by: Liang-Chi Hsieh
---
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |   2 +
 .../spark/sql/catalyst/optimizer/expressions.scala |  80 ++++++++++
 .../sql/catalyst/rules/RuleIdCollection.scala      |   2 +
 .../catalyst/optimizer/NotPropagationSuite.scala   | 176 +++++++++++++++++++++
 .../optimizer/NullDownPropagationSuite.scala       |  59 +++++++
 5 files changed, 319 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 298da4f..5386907 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -99,6 +99,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
         OptimizeRepartition,
         TransposeWindow,
         NullPropagation,
+        NullDownPropagation,
         ConstantPropagation,
         FoldablePropagation,
         OptimizeIn,
@@ -106,6 +107,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
         EliminateAggregateFilter,
         ReorderAssociativeOperator,
         LikeSimplification,
+        NotPropagation,
         BooleanSimplification,
         SimplifyConditionals,
         PushFoldableIntoBranches,
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
index 0ec8bad..a32306f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
@@ -447,6 +447,53 @@ object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {

 /**
+ * Move/Push `Not` operator if it's beneficial.
+ */
+object NotPropagation extends Rule[LogicalPlan] {
+  // Given argument x, return true if expression Not(x) can be simplified
+  // E.g. let x == Not(y), then canSimplifyNot(x) == true because Not(x) == Not(Not(y)) == y
+  // For the case of x = EqualTo(a, b), recursively check each child expression
+  // Extra nullable check is required for EqualNullSafe because
+  // Not(EqualNullSafe(e, null)) is different from EqualNullSafe(e, Not(null))
+  private def canSimplifyNot(x: Expression): Boolean = x match {
+    case Literal(_, BooleanType) | Literal(_, NullType) => true
+    case _: Not | _: IsNull | _: IsNotNull | _: And | _: Or => true
+    case _: GreaterThan | _: GreaterThanOrEqual | _: LessThan | _: LessThanOrEqual => true
+    case EqualTo(a, b) if canSimplifyNot(a) || canSimplifyNot(b) => true
+    case EqualNullSafe(a, b)
+      if !a.nullable && !b.nullable && (canSimplifyNot(a) || canSimplifyNot(b)) => true
+    case _ => false
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan.transformWithPruning(
+    _.containsPattern(NOT), ruleId) {
+    case q: LogicalPlan =>
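Two of the rules in the commit description above, `Not(Not(x)) == x` and negation of a Boolean/null literal (note `Not(null) == null` under SQL's three-valued logic), can be sketched on a tiny expression tree. This is a Python stand-in, not Catalyst: expressions are hypothetical tuples `("not", e)`, `("lit", v)`, or `("col", name)`, and the rewrite only fires when the operand is one of the simplifiable shapes, echoing the `canSimplifyNot` guard:

```python
def simplify_not(e):
    """Apply Not(Not(x)) => x and Not(literal) => negated literal;
    otherwise leave the expression unchanged (the rule only fires when
    the operand is known to simplify, like canSimplifyNot above)."""
    if e[0] == "not":
        inner = e[1]
        if inner[0] == "not":          # Not(Not(x)) == x
            return simplify_not(inner[1])
        if inner[0] == "lit":          # Not(true) == false, Not(null) == null
            v = inner[1]
            return ("lit", None if v is None else (not v))
    return e                           # e.g. Not(col) has no cheaper form

assert simplify_not(("not", ("not", ("col", "a")))) == ("col", "a")
assert simplify_not(("not", ("lit", True))) == ("lit", False)
assert simplify_not(("not", ("lit", None))) == ("lit", None)  # three-valued logic
```

The payoff described in the commit follows the same shape: once `Not(boolean_col)` inside `IsNull(...)` is simplified away, the filter matches the pushdown-friendly `IsNull(boolean_col)` form.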