[spark] branch master updated: [SPARK-36899][R] Support ILIKE API on R
sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 17e3ca6  [SPARK-36899][R] Support ILIKE API on R
17e3ca6 is described below

commit 17e3ca6df5eb4b7b74cd8d04868da39eb0137826
Author: Leona Yoda
AuthorDate: Thu Sep 30 14:43:09 2021 +0900

    [SPARK-36899][R] Support ILIKE API on R

    ### What changes were proposed in this pull request?

    Support the ILIKE (case insensitive LIKE) API on R.

    ### Why are the changes needed?

    The ILIKE statement on the SQL interface is supported by SPARK-36674. This PR adds the R API for it.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can call `ilike` from R.

    ### How was this patch tested?

    Unit tests.

    Closes #34152 from yoda-mon/r-ilike.

    Authored-by: Leona Yoda
    Signed-off-by: Kousuke Saruta
---
 R/pkg/NAMESPACE                       | 1 +
 R/pkg/R/column.R                      | 2 +-
 R/pkg/R/generics.R                    | 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 5de7aeb..11403f6 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -316,6 +316,7 @@ exportMethods("%<=>%",
               "hour",
               "hypot",
               "ifelse",
+              "ilike",
               "initcap",
               "input_file_name",
               "instr",
diff --git a/R/pkg/R/column.R b/R/pkg/R/column.R
index 9fa117c..f1fd30e 100644
--- a/R/pkg/R/column.R
+++ b/R/pkg/R/column.R
@@ -72,7 +72,7 @@ column_functions1 <- c(
   "desc", "desc_nulls_first", "desc_nulls_last",
   "isNaN", "isNull", "isNotNull"
 )
-column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")
+column_functions2 <- c("like", "rlike", "ilike", "getField", "getItem", "contains")
 
 createOperator <- function(op) {
   setMethod(op,
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 9da818b..ad29a70 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -725,6 +725,9 @@ setGeneric("like", function(x, ...) { standardGeneric("like") })
 #' @rdname columnfunctions
 setGeneric("rlike", function(x, ...) { standardGeneric("rlike") })
 
+#' @rdname columnfunctions
+setGeneric("ilike", function(x, ...) { standardGeneric("ilike") })
+
 #' @rdname startsWith
 setGeneric("startsWith", function(x, prefix) { standardGeneric("startsWith") })
 
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R b/R/pkg/tests/fulltests/test_sparkSQL.R
index bd5c250..1d8ac2b 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2130,6 +2130,8 @@ test_that("higher order functions", {
     expr("transform(xs, (x, i) -> CASE WHEN ((i % 2.0) = 0.0) THEN x ELSE (- x) END)"),
     array_exists("vs", function(v) rlike(v, "FAILED")) ==
       expr("exists(vs, v -> (v RLIKE 'FAILED'))"),
+    array_exists("vs", function(v) ilike(v, "failed")) ==
+      expr("exists(vs, v -> (v ILIKE 'failed'))"),
     array_forall("xs", function(x) x > 0) ==
       expr("forall(xs, x -> x > 0)"),
     array_filter("xs", function(x, i) x > 0 | i %% 2 == 0) ==
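The new R method mirrors pyspark's `Column.ilike` from the sibling PR (SPARK-36882). As a minimal sketch of the semantics the new R test exercises — the session setup and sample data are illustrative assumptions, and this requires Spark >= 3.3:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import exists

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(["FAILED", "SUCCEEDED"],)], ["vs"])

# Case-insensitive match inside an array, mirroring the R test:
# exists(vs, v -> v ILIKE 'failed')
df.select(exists("vs", lambda v: v.ilike("failed")).alias("any_failed")).show()
```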
[spark] branch master updated: [SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ad5a535  [SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`
ad5a535 is described below

commit ad5a53511e46d4a432f1143cd3d6dee8af1f224a
Author: Xinrong Meng
AuthorDate: Thu Sep 30 14:27:00 2021 +0900

    [SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`

    ### What changes were proposed in this pull request?

    Currently `dropTempView` and `dropGlobalTempView` have no return value, which conflicts with their docstring: `Returns true if this view is dropped successfully, false otherwise.` It is also inconsistent with the same API in other languages. This PR fixes that.

    ### Why are the changes needed?

    To be consistent with the API in other languages.

    ### Does this PR introduce _any_ user-facing change?

    Yes. From

    ```py
    # dropTempView
    >>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
    >>> spark.table("my_table").collect()
    [Row(_1=1, _2=1)]
    >>> spark.catalog.dropTempView("my_table")
    >>> spark.catalog.dropTempView("my_table")

    # dropGlobalTempView
    >>> spark.createDataFrame([(1, 1)]).createGlobalTempView("my_table")
    >>> spark.table("global_temp.my_table").collect()
    [Row(_1=1, _2=1)]
    >>> spark.catalog.dropGlobalTempView("my_table")
    >>> spark.catalog.dropGlobalTempView("my_table")
    ```

    to

    ```py
    # dropTempView
    >>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
    >>> spark.table("my_table").collect()
    [Row(_1=1, _2=1)]
    >>> spark.catalog.dropTempView("my_table")
    True
    >>> spark.catalog.dropTempView("my_table")
    False

    # dropGlobalTempView
    >>> spark.createDataFrame([(1, 1)]).createGlobalTempView("my_table")
    >>> spark.table("global_temp.my_table").collect()
    [Row(_1=1, _2=1)]
    >>> spark.catalog.dropGlobalTempView("my_table")
    True
    >>> spark.catalog.dropGlobalTempView("my_table")
    False
    ```

    ### How was this patch tested?

    Existing tests.

    Closes #34147 from xinrong-databricks/fix_return.

    Authored-by: Xinrong Meng
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/catalog.py   | 8 +++++---
 python/pyspark/sql/dataframe.py | 6 ++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/catalog.py b/python/pyspark/sql/catalog.py
index 4990133..3760f96 100644
--- a/python/pyspark/sql/catalog.py
+++ b/python/pyspark/sql/catalog.py
@@ -50,7 +50,7 @@ class Catalog(object):
     @since(2.0)
     def setCurrentDatabase(self, dbName):
         """Sets the current default database in this session."""
-        return self._jcatalog.setCurrentDatabase(dbName)
+        self._jcatalog.setCurrentDatabase(dbName)
 
     @since(2.0)
     def listDatabases(self):
@@ -323,12 +323,13 @@ class Catalog(object):
         >>> spark.table("my_table").collect()
         [Row(_1=1, _2=1)]
         >>> spark.catalog.dropTempView("my_table")
+        True
         >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
         AnalysisException: ...
         """
-        self._jcatalog.dropTempView(viewName)
+        return self._jcatalog.dropTempView(viewName)
 
     def dropGlobalTempView(self, viewName):
         """Drops the global temporary view with the given view name in the catalog.
@@ -343,12 +344,13 @@ class Catalog(object):
         >>> spark.table("global_temp.my_table").collect()
         [Row(_1=1, _2=1)]
         >>> spark.catalog.dropGlobalTempView("my_table")
+        True
         >>> spark.table("global_temp.my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
         Traceback (most recent call last):
             ...
         AnalysisException: ...
         """
-        self._jcatalog.dropGlobalTempView(viewName)
+        return self._jcatalog.dropGlobalTempView(viewName)
 
     def registerFunction(self, name, f, returnType=None):
         """An alias for :func:`spark.udf.register`.
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index de289e1..8d4c94f 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -136,6 +136,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         >>> sorted(df.collect()) == sorted(df2.collect())
         True
         >>> spark.catalog.dropTempView("people")
+        True
+
         """
         warnings.warn(
             "Deprecated in 2.0, use createOrReplaceTempView instead.",
@@ -164,6 +166,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         ...
         AnalysisException: u"Temporary table 'people' already exists;"
[spark] branch master updated (8357572 -> b60e576)
dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8357572  [SPARK-36888][SQL] add test cases for sha2 function
     add b60e576  [SPARK-36893][BUILD][MESOS] Upgrade mesos to 1.4.3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 resource-managers/mesos/pom.xml         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated: [SPARK-36888][SQL] add test cases for sha2 function
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 8357572  [SPARK-36888][SQL] add test cases for sha2 function
8357572 is described below

commit 8357572af5bea16913a44467400a7f4e2473de59
Author: Richard Chen
AuthorDate: Thu Sep 30 09:52:40 2021 +0900

    [SPARK-36888][SQL] add test cases for sha2 function

    ### What changes were proposed in this pull request?

    Adds tests to `HashExpressionsSuite` exercising the `sha2` function with bit lengths of 0 and 512. Also adds comments to clarify an existing ambiguous comment.

    ### Why are the changes needed?

    Currently, the `sha2` function is not tested with bit lengths of `0` and `512`. This PR adds those tests.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Ran the sha2 test in `HashExpressionsSuite` to ensure the added tests pass.

    Closes #34145 from richardc-db/add_sha_tests.

    Authored-by: Richard Chen
    Signed-off-by: Hyukjin Kwon
---
 .../spark/sql/catalyst/expressions/HashExpressionsSuite.scala | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
index f1c4b355..5897dee 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
@@ -62,13 +62,19 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   test("sha2") {
     checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), Literal(224)),
       "107c5072b799c4771f328304cfe1ebb375eb6ea7f35a3aa753836fad")
+    checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), Literal(0)),
+      DigestUtils.sha256Hex("ABC"))
     checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), Literal(256)),
       DigestUtils.sha256Hex("ABC"))
     checkEvaluation(Sha2(Literal.create(Array[Byte](1, 2, 3, 4, 5, 6), BinaryType), Literal(384)),
       DigestUtils.sha384Hex(Array[Byte](1, 2, 3, 4, 5, 6)))
+    checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), Literal(512)),
+      DigestUtils.sha512Hex("ABC"))
     // unsupported bit length
     checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), null)
+    // null input and valid bit length
     checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
+    // valid input and null bit length
     checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)),
       Literal.create(null, IntegerType)), null)
     checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal.create(null, IntegerType)), null)
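The behavior under test is also observable from the DataFrame API; a minimal sketch with `pyspark.sql.functions.sha2` (session setup and sample data assumed), where bit length 0 is treated as SHA-256:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import sha2

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("ABC",)], ["s"])

# 0 and 256 produce the same digest; 512 selects SHA-512.
df.select(sha2(df.s, 0), sha2(df.s, 256), sha2(df.s, 512)).show(truncate=False)
```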
[spark] branch master updated: [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new aa9064a  [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images
aa9064a is described below

commit aa9064ad96ff7cefaa4381e912608b0b0d39a09c
Author: Dongjoon Hyun
AuthorDate: Wed Sep 29 11:39:01 2021 -0700

    [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images

    ### What changes were proposed in this pull request?

    This PR aims to upgrade the GitHub Action CI image to recover from the CRAN installation failure.

    ### Why are the changes needed?

    Sometimes the GitHub Action linter job fails, e.g. https://github.com/apache/spark/runs/3739748809. The new image has R 4.1.1 and will recover the failure.

    ```
    $ docker run -it --rm dongjoon/apache-spark-github-action-image:20210928 R --version
    R version 4.1.1 (2021-08-10) -- "Kick Things"
    Copyright (C) 2021 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under the terms of the
    GNU General Public License versions 2 or 3.
    For more information about these matters see
    https://www.gnu.org/licenses/.
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass `GitHub Action`.

    Closes #34138 from dongjoon-hyun/SPARK-36883.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 89c5227e..c51afd6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -228,7 +228,7 @@ jobs:
     name: "Build modules (${{ format('{0}, {1} job', needs.configure-jobs.outputs.branch, needs.configure-jobs.outputs.type) }}): ${{ matrix.modules }}"
     runs-on: ubuntu-20.04
     container:
-      image: dongjoon/apache-spark-github-action-image:20210730
+      image: dongjoon/apache-spark-github-action-image:20210930
     strategy:
       fail-fast: false
       matrix:
@@ -326,7 +326,7 @@ jobs:
     name: "Build modules: sparkr"
     runs-on: ubuntu-20.04
     container:
-      image: dongjoon/apache-spark-github-action-image:20210602
+      image: dongjoon/apache-spark-github-action-image:20210930
     env:
       HADOOP_PROFILE: ${{ needs.configure-jobs.outputs.hadoop }}
       HIVE_PROFILE: hive2.3
@@ -391,8 +391,9 @@ jobs:
       LC_ALL: C.UTF-8
       LANG: C.UTF-8
       PYSPARK_DRIVER_PYTHON: python3.9
+      PYSPARK_PYTHON: python3.9
     container:
-      image: dongjoon/apache-spark-github-action-image:20210602
+      image: dongjoon/apache-spark-github-action-image:20210930
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
[spark] branch master updated: [SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources
maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a8734e3  [SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources
a8734e3 is described below

commit a8734e3f1695a3c436f65bbb1d54d1d02b0df33f
Author: Kousuke Saruta
AuthorDate: Wed Sep 29 21:22:34 2021 +0300

    [SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources

    ### What changes were proposed in this pull request?

    This PR aims to support reading and writing ANSI intervals from/to CSV datasources. With this change, interval data is written in a literal form like `INTERVAL '1-2' YEAR TO MONTH`. For the reading part, we need to specify the schema explicitly, like:

    ```
    val readDF = spark.read.schema("col INTERVAL YEAR TO MONTH").csv(...)
    ```

    ### Why are the changes needed?

    For better usability. There is no reason to prohibit reading/writing ANSI intervals from/to CSV datasources.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New test. It covers both V1 and V2 sources.

    Closes #34142 from sarutak/ansi-interval-csv-source.

    Authored-by: Kousuke Saruta
    Signed-off-by: Max Gekk
---
 .../sql/execution/datasources/DataSource.scala      |  9 +++++++--
 .../execution/datasources/csv/CSVFileFormat.scala   |  2 --
 .../sql/execution/datasources/v2/csv/CSVTable.scala |  4 +---
 .../datasources/CommonFileDataSourceSuite.scala     |  2 +-
 .../sql/execution/datasources/csv/CSVSuite.scala    | 21 ++++++++++++++++++++-
 5 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index 0707af4..be9a912 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -579,11 +579,16 @@ case class DataSource(
       checkEmptyGlobPath, checkFilesExist, enableGlobbing = globPaths)
   }
 
+  // TODO: Remove the Set below once all the built-in datasources support ANSI interval types
+  private val writeAllowedSources: Set[Class[_]] =
+    Set(classOf[ParquetFileFormat], classOf[CSVFileFormat])
+
   private def disallowWritingIntervals(
       dataTypes: Seq[DataType],
       forbidAnsiIntervals: Boolean): Unit = {
-    val isParquet = providingClass == classOf[ParquetFileFormat]
-    dataTypes.foreach(TypeUtils.invokeOnceForInterval(_, forbidAnsiIntervals || !isParquet) {
+    val isWriteAllowedSource = writeAllowedSources(providingClass)
+    dataTypes.foreach(
+      TypeUtils.invokeOnceForInterval(_, forbidAnsiIntervals || !isWriteAllowedSource) {
       throw QueryCompilationErrors.cannotSaveIntervalIntoExternalStorageError()
     })
   }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
index 8add63c..d40ad9d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -148,8 +148,6 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
   override def equals(other: Any): Boolean = other.isInstanceOf[CSVFileFormat]
 
   override def supportDataType(dataType: DataType): Boolean = dataType match {
-    case _: AnsiIntervalType => false
-
     case _: AtomicType => true
 
     case udt: UserDefinedType[_] => supportDataType(udt.sqlType)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
index 02601b3..839cd01 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
@@ -26,7 +26,7 @@ import org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write, WriteBuild
 import org.apache.spark.sql.execution.datasources.FileFormat
 import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
 import org.apache.spark.sql.execution.datasources.v2.FileTable
-import org.apache.spark.sql.types.{AnsiIntervalType, AtomicType, DataType, StructType, UserDefinedType}
+import org.apache.spark.sql.types.{AtomicType, DataType, StructType, UserDefinedType}
 import org.apache.spark.sql.util.CaseInsensitiveStringMap
 
 case class CSVTable(
@@ -55,8 +55,6 @@ case
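A runnable sketch of the round trip described above, from Python (the path and column name are illustrative; assumes Spark >= 3.3 with this patch):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Write a year-month interval; it lands in the CSV as INTERVAL '1-2' YEAR TO MONTH.
df = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS col")
df.write.mode("overwrite").csv("/tmp/interval_csv")

# Reading it back requires an explicit schema.
read_df = spark.read.schema("col INTERVAL YEAR TO MONTH").csv("/tmp/interval_csv")
read_df.show()
```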
[spark] branch master updated: [SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with non-zero code
tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dcada3d  [SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with non-zero code
dcada3d is described below

commit dcada3d48c51f4855c600dc254883bd9eb3a0a1c
Author: Angerszh
AuthorDate: Wed Sep 29 11:12:01 2021 -0500

    [SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with non-zero code

    ### What changes were proposed in this pull request?

    In the current code for yarn-client mode, even when the user kills the application with `yarn application -kill`, the driver side still exits with code 0. This behavior prevents job schedulers (and users) from knowing that the job did not succeed. In this case the program should exit with a non-zero code.

    ### Why are the changes needed?

    Makes the application's final status clear to schedulers and users.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Closes #33873 from AngersZh/SPDI-36624.

    Authored-by: Angerszh
    Signed-off-by: Thomas Graves
---
 docs/running-on-yarn.md                                        | 10 ++++++++++
 .../src/main/scala/org/apache/spark/deploy/yarn/config.scala   | 11 +++++++++++
 .../spark/scheduler/cluster/YarnClientSchedulerBackend.scala   | 10 +++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 9930f3e..37ff479 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -442,6 +442,16 @@ To use a custom metrics.properties for the application master and executors, upd
   1.6.0
 
+  spark.yarn.am.clientModeExitOnError
+  false
+
+  In yarn-client mode, when this is true, if driver got application report with final status of KILLED or FAILED,
+  driver will stop corresponding SparkContext and exit program with code 1.
+  Note, if this is true and called from another application, it will terminate the parent application as well.
+
+  3.3.0
+
+
   spark.yarn.executor.failuresValidityInterval
   (none)
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
index 89a4af2..ab2063c 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
@@ -52,6 +52,17 @@ package object config extends Logging {
     .timeConf(TimeUnit.MILLISECONDS)
     .createOptional
 
+  private[spark] val AM_CLIENT_MODE_EXIT_ON_ERROR =
+    ConfigBuilder("spark.yarn.am.clientModeExitOnError")
+      .doc("In yarn-client mode, when this is true, if driver got " +
+        "application report with final status of KILLED or FAILED, " +
+        "driver will stop corresponding SparkContext and exit program with code 1. " +
+        "Note, if this is true and called from another application, it will terminate " +
+        "the parent application as well.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(false)
+
   private[spark] val EXECUTOR_ATTEMPT_FAILURE_VALIDITY_INTERVAL_MS =
     ConfigBuilder("spark.yarn.executor.failuresValidityInterval")
       .doc("Interval after which Executor failures will be considered independent and not " +
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
index 8a55e61..28c8652 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
@@ -21,7 +21,7 @@ import java.io.InterruptedIOException
 
 import scala.collection.mutable.ArrayBuffer
 
-import org.apache.hadoop.yarn.api.records.YarnApplicationState
+import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
 
 import org.apache.spark.{SparkContext, SparkException}
 import org.apache.spark.deploy.yarn.{Client, ClientArguments, YarnAppReport}
@@ -122,6 +122,14 @@ private[spark] class YarnClientSchedulerBackend(
       }
       allowInterrupt = false
       sc.stop()
+      state match {
+        case FinalApplicationStatus.FAILED | FinalApplicationStatus.KILLED
+            if conf.get(AM_CLIENT_MODE_EXIT_ON_ERROR) =>
+          logWarning(s"ApplicationMaster finished with status ${state}, " +
+            s"SparkContext should exit with code 1.")
+          System.exit(1)
+
[spark] branch master updated: [SPARK-36550][SQL] Propagate cause when UDF reflection fails
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d003db3  [SPARK-36550][SQL] Propagate cause when UDF reflection fails
d003db3 is described below

commit d003db34c4c5c52a8c99ceecf7233a9b19a69b81
Author: sychen
AuthorDate: Wed Sep 29 08:30:50 2021 -0500

    [SPARK-36550][SQL] Propagate cause when UDF reflection fails

    ### What changes were proposed in this pull request?

    When the exception is an `InvocationTargetException`, get its cause and stack trace.

    ### Why are the changes needed?

    Currently, when UDF reflection fails, an `InvocationTargetException` is thrown, which hides the specific underlying exception:

    ```
    Error in query: No handler for Hive UDF 'XXX': java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual test.

    Closes #33796 from cxzl25/SPARK-36550.

    Authored-by: sychen
    Signed-off-by: Sean Owen
---
 .../main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
index 7cbaa8a..56818b5 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.hive
 
+import java.lang.reflect.InvocationTargetException
 import java.util.Locale
 
 import scala.util.{Failure, Success, Try}
@@ -87,7 +88,11 @@ private[sql] class HiveSessionCatalog(
           udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema
         }
       } catch {
-        case NonFatal(e) =>
+        case NonFatal(exception) =>
+          val e = exception match {
+            case i: InvocationTargetException => i.getCause
+            case o => o
+          }
           val errorMsg = s"No handler for UDF/UDAF/UDTF '${clazz.getCanonicalName}': $e"
           val analysisException = new AnalysisException(errorMsg)
           analysisException.setStackTrace(e.getStackTrace)
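As a hedged illustration of the user-visible effect (the UDF class name below is hypothetical; any Hive UDF whose constructor throws would do):

```python
# Hypothetical broken Hive UDF class.
spark.sql("CREATE TEMPORARY FUNCTION broken_udf AS 'com.example.BrokenUDF'")

# Before: the AnalysisException reported only
#   java.lang.reflect.InvocationTargetException.
# After this patch: the message and stack trace name the underlying cause.
spark.sql("SELECT broken_udf(1)").show()
```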
[spark] branch master updated (0e23bd7 -> bbd1318)
gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0e23bd7  [SPARK-34980][SQL] Support coalesce partition through union in AQE
     add bbd1318  [SPARK-36889][SQL] Respect `spark.sql.parquet.filterPushdown` by v2 parquet scan builder

No new revisions were added by this update.

Summary of changes:
 .../v2/parquet/ParquetScanBuilder.scala          | 44 +++++++++++++----------
 .../datasources/parquet/ParquetFilterSuite.scala | 13 +++++++
 2 files changed, 37 insertions(+), 20 deletions(-)
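A minimal sketch of the configuration the v2 scan builder now honors (session and path are illustrative assumptions):

```python
# With pushdown disabled, the v2 Parquet scan builder no longer pushes
# filters into the scan; previously the v2 path ignored this flag.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
spark.read.parquet("/tmp/data.parquet").filter("id > 10").explain()
```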
[spark] branch master updated (ca1c09d -> 0e23bd7)
wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from ca1c09d  [SPARK-36882][PYTHON] Support ILIKE API on Python
     add 0e23bd7  [SPARK-34980][SQL] Support coalesce partition through union in AQE

No new revisions were added by this update.

Summary of changes:
 .../adaptive/CoalesceShufflePartitions.scala       | 143 ++++++++++++---------
 .../execution/CoalesceShufflePartitionsSuite.scala |   6 +-
 .../adaptive/AdaptiveQueryExecSuite.scala          |  85 +++++++++++---
 3 files changed, 181 insertions(+), 53 deletions(-)
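A sketch of the settings under which the new coalescing applies (the query is illustrative; per the commit title, AQE can now coalesce shuffle partitions below a union rather than skipping that plan shape):

```python
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# A union over two shuffled children; AQE can now coalesce their partitions.
df = spark.range(1000).repartition(200).union(spark.range(1000).repartition(200))
df.groupBy((df.id % 10).alias("k")).count().collect()
```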
[spark] branch master updated: [SPARK-36882][PYTHON] Support ILIKE API on Python
sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ca1c09d  [SPARK-36882][PYTHON] Support ILIKE API on Python
ca1c09d is described below

commit ca1c09d88c21d0f8664df8e852778f864f130d94
Author: Leona Yoda
AuthorDate: Wed Sep 29 15:04:03 2021 +0900

    [SPARK-36882][PYTHON] Support ILIKE API on Python

    ### What changes were proposed in this pull request?

    Support the ILIKE (case insensitive LIKE) API on Python.

    ### Why are the changes needed?

    The ILIKE statement on the SQL interface is supported by SPARK-36674. This PR adds the Python API for it.

    ### Does this PR introduce _any_ user-facing change?

    Yes. Users can call `ilike` from Python.

    ### How was this patch tested?

    Unit tests.

    Closes #34135 from yoda-mon/python-ilike.

    Authored-by: Leona Yoda
    Signed-off-by: Kousuke Saruta
---
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/column.py                 | 21 +++++++++++++++++++++
 python/pyspark/sql/tests/test_column.py      |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/reference/pyspark.sql.rst b/python/docs/source/reference/pyspark.sql.rst
index f5a8357..0fd2c4d 100644
--- a/python/docs/source/reference/pyspark.sql.rst
+++ b/python/docs/source/reference/pyspark.sql.rst
@@ -259,6 +259,7 @@ Column APIs
     Column.eqNullSafe
     Column.getField
     Column.getItem
+    Column.ilike
     Column.isNotNull
     Column.isNull
     Column.isin
diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index 9046e7f..c46b0eb 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -507,6 +507,26 @@ class Column(object):
     >>> df.filter(df.name.like('Al%')).collect()
     [Row(age=2, name='Alice')]
     """
+    _ilike_doc = """
+    SQL ILIKE expression (case insensitive LIKE). Returns a boolean :class:`Column`
+    based on a case insensitive match.
+
+    .. versionadded:: 3.3.0
+
+    Parameters
+    ----------
+    other : str
+        a SQL LIKE pattern
+
+    See Also
+    --------
+    pyspark.sql.Column.rlike
+
+    Examples
+    --------
+    >>> df.filter(df.name.ilike('%Ice')).collect()
+    [Row(age=2, name='Alice')]
+    """
     _startswith_doc = """
     String starts with. Returns a boolean :class:`Column` based on a string match.
@@ -541,6 +561,7 @@ class Column(object):
     contains = _bin_op("contains", _contains_doc)
     rlike = _bin_op("rlike", _rlike_doc)
     like = _bin_op("like", _like_doc)
+    ilike = _bin_op("ilike", _ilike_doc)
     startswith = _bin_op("startsWith", _startswith_doc)
     endswith = _bin_op("endsWith", _endswith_doc)
diff --git a/python/pyspark/sql/tests/test_column.py b/python/pyspark/sql/tests/test_column.py
index c2530b2..9a918c2 100644
--- a/python/pyspark/sql/tests/test_column.py
+++ b/python/pyspark/sql/tests/test_column.py
@@ -75,7 +75,7 @@ class ColumnTests(ReusedSQLTestCase):
         self.assertTrue(all(isinstance(c, Column) for c in cb))
         cbool = (ci & ci), (ci | ci), (~ci)
         self.assertTrue(all(isinstance(c, Column) for c in cbool))
-        css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.asc(), cs.desc(),\
+        css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.ilike('A'), cs.asc(), cs.desc(),\
             cs.startswith('a'), cs.endswith('a'), ci.eqNullSafe(cs)
         self.assertTrue(all(isinstance(c, Column) for c in css))
         self.assertTrue(isinstance(ci.cast(LongType()), Column))
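A standalone sketch of the new API (the sample data is illustrative; assumes Spark >= 3.3):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

# Case-insensitive LIKE: the lowercase pattern still matches 'Alice'.
df.filter(df.name.ilike("alice%")).show()
```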