[spark] branch master updated: [SPARK-36899][R] Support ILIKE API on R

2021-09-29 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 17e3ca6  [SPARK-36899][R] Support ILIKE API on R
17e3ca6 is described below

commit 17e3ca6df5eb4b7b74cd8d04868da39eb0137826
Author: Leona Yoda 
AuthorDate: Thu Sep 30 14:43:09 2021 +0900

[SPARK-36899][R] Support ILIKE API on R

### What changes were proposed in this pull request?

Support ILIKE (case insensitive LIKE) API on R.

### Why are the changes needed?

The ILIKE statement on the SQL interface is supported by SPARK-36674.
This PR adds the R API for it.

### Does this PR introduce _any_ user-facing change?

Yes. Users can call ilike from R.

### How was this patch tested?

Unit tests.

Closes #34152 from yoda-mon/r-ilike.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 R/pkg/NAMESPACE   | 1 +
 R/pkg/R/column.R  | 2 +-
 R/pkg/R/generics.R| 3 +++
 R/pkg/tests/fulltests/test_sparkSQL.R | 2 ++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 5de7aeb..11403f6 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -316,6 +316,7 @@ exportMethods("%<=>%",
   "hour",
   "hypot",
   "ifelse",
+  "ilike",
   "initcap",
   "input_file_name",
   "instr",
diff --git a/R/pkg/R/column.R b/R/pkg/R/column.R
index 9fa117c..f1fd30e 100644
--- a/R/pkg/R/column.R
+++ b/R/pkg/R/column.R
@@ -72,7 +72,7 @@ column_functions1 <- c(
   "desc", "desc_nulls_first", "desc_nulls_last",
   "isNaN", "isNull", "isNotNull"
 )
-column_functions2 <- c("like", "rlike", "getField", "getItem", "contains")
+column_functions2 <- c("like", "rlike", "ilike", "getField", "getItem", "contains")
 
 createOperator <- function(op) {
   setMethod(op,
diff --git a/R/pkg/R/generics.R b/R/pkg/R/generics.R
index 9da818b..ad29a70 100644
--- a/R/pkg/R/generics.R
+++ b/R/pkg/R/generics.R
@@ -725,6 +725,9 @@ setGeneric("like", function(x, ...) { 
standardGeneric("like") })
 #' @rdname columnfunctions
 setGeneric("rlike", function(x, ...) { standardGeneric("rlike") })
 
+#' @rdname columnfunctions
+setGeneric("ilike", function(x, ...) { standardGeneric("ilike") })
+
 #' @rdname startsWith
 setGeneric("startsWith", function(x, prefix) { standardGeneric("startsWith") })
 
diff --git a/R/pkg/tests/fulltests/test_sparkSQL.R 
b/R/pkg/tests/fulltests/test_sparkSQL.R
index bd5c250..1d8ac2b 100644
--- a/R/pkg/tests/fulltests/test_sparkSQL.R
+++ b/R/pkg/tests/fulltests/test_sparkSQL.R
@@ -2130,6 +2130,8 @@ test_that("higher order functions", {
   expr("transform(xs, (x, i) -> CASE WHEN ((i % 2.0) = 0.0) THEN x ELSE (- 
x) END)"),
 array_exists("vs", function(v) rlike(v, "FAILED")) ==
   expr("exists(vs, v -> (v RLIKE 'FAILED'))"),
+array_exists("vs", function(v) ilike(v, "failed")) ==
+  expr("exists(vs, v -> (v ILIKE 'failed'))"),
 array_forall("xs", function(x) x > 0) ==
   expr("forall(xs, x -> x > 0)"),
 array_filter("xs", function(x, i) x > 0 | i %% 2 == 0) ==

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`

2021-09-29 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ad5a535  [SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`
ad5a535 is described below

commit ad5a53511e46d4a432f1143cd3d6dee8af1f224a
Author: Xinrong Meng 
AuthorDate: Thu Sep 30 14:27:00 2021 +0900

[SPARK-36896][PYTHON] Return boolean for `dropTempView` and `dropGlobalTempView`

### What changes were proposed in this pull request?
Currently `dropTempView` and `dropGlobalTempView` don't have a return value, which conflicts with their docstring:
`Returns true if this view is dropped successfully, false otherwise.`
It is also inconsistent with the same API in other languages.

The PR proposes a fix for that.

### Why are the changes needed?
Be consistent with API in other languages.

### Does this PR introduce _any_ user-facing change?
Yes.
 From
```py
# dropTempView
>>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
>>> spark.table("my_table").collect()
[Row(_1=1, _2=1)]
>>> spark.catalog.dropTempView("my_table")
>>> spark.catalog.dropTempView("my_table")

# dropGlobalTempView
>>> spark.createDataFrame([(1, 1)]).createGlobalTempView("my_table")
>>> spark.table("global_temp.my_table").collect()
[Row(_1=1, _2=1)]
>>> spark.catalog.dropGlobalTempView("my_table")
>>> spark.catalog.dropGlobalTempView("my_table")
```

 To
```py
# dropTempView
>>> spark.createDataFrame([(1, 1)]).createTempView("my_table")
>>> spark.table("my_table").collect()
[Row(_1=1, _2=1)]
>>> spark.catalog.dropTempView("my_table")
True
>>> spark.catalog.dropTempView("my_table")
False

# dropGlobalTempView
>>> spark.createDataFrame([(1, 1)]).createGlobalTempView("my_table")
>>> spark.table("global_temp.my_table").collect()
[Row(_1=1, _2=1)]
>>> spark.catalog.dropGlobalTempView("my_table")
True
>>> spark.catalog.dropGlobalTempView("my_table")
False
```

### How was this patch tested?
Existing tests.

Closes #34147 from xinrong-databricks/fix_return.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/catalog.py   | 8 +---
 python/pyspark/sql/dataframe.py | 6 ++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/catalog.py b/python/pyspark/sql/catalog.py
index 4990133..3760f96 100644
--- a/python/pyspark/sql/catalog.py
+++ b/python/pyspark/sql/catalog.py
@@ -50,7 +50,7 @@ class Catalog(object):
 @since(2.0)
 def setCurrentDatabase(self, dbName):
 """Sets the current default database in this session."""
-return self._jcatalog.setCurrentDatabase(dbName)
+self._jcatalog.setCurrentDatabase(dbName)
 
 @since(2.0)
 def listDatabases(self):
@@ -323,12 +323,13 @@ class Catalog(object):
 >>> spark.table("my_table").collect()
 [Row(_1=1, _2=1)]
 >>> spark.catalog.dropTempView("my_table")
+True
 >>> spark.table("my_table") # doctest: +IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 AnalysisException: ...
 """
-self._jcatalog.dropTempView(viewName)
+return self._jcatalog.dropTempView(viewName)
 
 def dropGlobalTempView(self, viewName):
 """Drops the global temporary view with the given view name in the 
catalog.
@@ -343,12 +344,13 @@ class Catalog(object):
 >>> spark.table("global_temp.my_table").collect()
 [Row(_1=1, _2=1)]
 >>> spark.catalog.dropGlobalTempView("my_table")
+True
 >>> spark.table("global_temp.my_table") # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 AnalysisException: ...
 """
-self._jcatalog.dropGlobalTempView(viewName)
+return self._jcatalog.dropGlobalTempView(viewName)
 
 def registerFunction(self, name, f, returnType=None):
 """An alias for :func:`spark.udf.register`.
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index de289e1..8d4c94f 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -136,6 +136,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 >>> sorted(df.collect()) == sorted(df2.collect())
 True
 >>> spark.catalog.dropTempView("people")
+True
+
 """
 warnings.warn(
 "Deprecated in 2.0, use createOrReplaceTempView instead.",
@@ -164,6 +166,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 ...
 AnalysisException: u"Temporary table 'people' already exists;"
 

[spark] branch master updated (8357572 -> b60e576)

2021-09-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8357572  [SPARK-36888][SQL] add tests cases for sha2 function
 add b60e576  [SPARK-36893][BUILD][MESOS] Upgrade mesos into 1.4.3

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 resource-managers/mesos/pom.xml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36888][SQL] add tests cases for sha2 function

2021-09-29 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8357572  [SPARK-36888][SQL] add tests cases for sha2 function
8357572 is described below

commit 8357572af5bea16913a44467400a7f4e2473de59
Author: Richard Chen 
AuthorDate: Thu Sep 30 09:52:40 2021 +0900

[SPARK-36888][SQL] add tests cases for sha2 function



### What changes were proposed in this pull request?


Add tests to `HashExpressionsSuite` covering the `sha2` function with bit lengths of 0 and 512. Also add comments to clarify an existing ambiguous comment.

### Why are the changes needed?

Currently, the `sha2` function with bit lengths of `0` and `512` is not tested. This PR adds those tests.
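
As a quick illustration of the semantics these new cases pin down, here is a minimal PySpark sketch (the DataFrame and column name are illustrative assumptions): a bit length of 0 falls back to SHA-256, and 512 produces a SHA-512 digest.
```py
# Sketch only: sha2() in PySpark mirrors the Catalyst Sha2 expression under test.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("ABC",)], ["s"])

df.select(
    sha2(col("s"), 0).alias("sha2_0"),      # same digest as sha2(s, 256)
    sha2(col("s"), 256).alias("sha2_256"),
    sha2(col("s"), 512).alias("sha2_512"),
).show(truncate=False)
```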

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?


Ran the sha2 test in `HashExpressionsSuite` to ensure added tests pass.

Closes #34145 from richardc-db/add_sha_tests.

Authored-by: Richard Chen 
Signed-off-by: Hyukjin Kwon 
---
 .../spark/sql/catalyst/expressions/HashExpressionsSuite.scala   | 6 ++
 1 file changed, 6 insertions(+)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
index f1c4b355..5897dee 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala
@@ -62,13 +62,19 @@ class HashExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   test("sha2") {
 checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), 
Literal(224)),
   "107c5072b799c4771f328304cfe1ebb375eb6ea7f35a3aa753836fad")
+checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), 
Literal(0)),
+  DigestUtils.sha256Hex("ABC"))
 checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), 
Literal(256)),
   DigestUtils.sha256Hex("ABC"))
 checkEvaluation(Sha2(Literal.create(Array[Byte](1, 2, 3, 4, 5, 6), 
BinaryType), Literal(384)),
   DigestUtils.sha384Hex(Array[Byte](1, 2, 3, 4, 5, 6)))
+checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)), 
Literal(512)),
+  DigestUtils.sha512Hex("ABC"))
 // unsupported bit length
 checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(1024)), 
null)
+// null input and valid bit length
 checkEvaluation(Sha2(Literal.create(null, BinaryType), Literal(512)), null)
+// valid input and null bit length
 checkEvaluation(Sha2(Literal("ABC".getBytes(StandardCharsets.UTF_8)),
   Literal.create(null, IntegerType)), null)
 checkEvaluation(Sha2(Literal.create(null, BinaryType), 
Literal.create(null, IntegerType)), null)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images

2021-09-29 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aa9064a  [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images
aa9064a is described below

commit aa9064ad96ff7cefaa4381e912608b0b0d39a09c
Author: Dongjoon Hyun 
AuthorDate: Wed Sep 29 11:39:01 2021 -0700

[SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images

### What changes were proposed in this pull request?

This PR aims to upgrade the GitHub Action CI images to recover from the CRAN installation failure.

### Why are the changes needed?

Sometimes, the GitHub Action linter job fails:
- https://github.com/apache/spark/runs/3739748809

The new image has R 4.1.1 and will fix the failure.
```
$ docker run -it --rm dongjoon/apache-spark-github-action-image:20210928 R --version
R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass `GitHub Action`.

Closes #34138 from dongjoon-hyun/SPARK-36883.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 89c5227e..c51afd6 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -228,7 +228,7 @@ jobs:
 name: "Build modules (${{ format('{0}, {1} job', 
needs.configure-jobs.outputs.branch, needs.configure-jobs.outputs.type) }}): 
${{ matrix.modules }}"
 runs-on: ubuntu-20.04
 container:
-  image: dongjoon/apache-spark-github-action-image:20210730
+  image: dongjoon/apache-spark-github-action-image:20210930
 strategy:
   fail-fast: false
   matrix:
@@ -326,7 +326,7 @@ jobs:
 name: "Build modules: sparkr"
 runs-on: ubuntu-20.04
 container:
-  image: dongjoon/apache-spark-github-action-image:20210602
+  image: dongjoon/apache-spark-github-action-image:20210930
 env:
   HADOOP_PROFILE: ${{ needs.configure-jobs.outputs.hadoop }}
   HIVE_PROFILE: hive2.3
@@ -391,8 +391,9 @@ jobs:
   LC_ALL: C.UTF-8
   LANG: C.UTF-8
   PYSPARK_DRIVER_PYTHON: python3.9
+  PYSPARK_PYTHON: python3.9
 container:
-  image: dongjoon/apache-spark-github-action-image:20210602
+  image: dongjoon/apache-spark-github-action-image:20210930
 steps:
 - name: Checkout Spark repository
   uses: actions/checkout@v2

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources

2021-09-29 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a8734e3  [SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources
a8734e3 is described below

commit a8734e3f1695a3c436f65bbb1d54d1d02b0df33f
Author: Kousuke Saruta 
AuthorDate: Wed Sep 29 21:22:34 2021 +0300

[SPARK-36831][SQL] Support reading and writing ANSI intervals from/to CSV datasources

### What changes were proposed in this pull request?

This PR aims to support reading and writing ANSI intervals from/to CSV datasources.
With this change, interval data is written in a literal form like `INTERVAL '1-2' YEAR TO MONTH`.
For the reading part, we need to specify the schema explicitly like:
```
val readDF = spark.read.schema("col INTERVAL YEAR TO MONTH").csv(...)
```
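
A rough PySpark equivalent of the same round trip, as a sketch (the output path and column name are assumptions, and it presumes a build that includes this change):
```py
# Sketch only: write a year-month interval column to CSV, then read it back
# with an explicitly specified schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS col") \
    .write.mode("overwrite").csv("/tmp/interval-csv")

read_df = spark.read.schema("col INTERVAL YEAR TO MONTH").csv("/tmp/interval-csv")
read_df.show(truncate=False)
```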

### Why are the changes needed?

For better usability. There is no reason to prohibit reading/writing ANSI intervals from/to CSV datasources.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test. It covers both V1 and V2 sources.

Closes #34142 from sarutak/ansi-interval-csv-source.

Authored-by: Kousuke Saruta 
Signed-off-by: Max Gekk 
---
 .../sql/execution/datasources/DataSource.scala  |  9 +++--
 .../execution/datasources/csv/CSVFileFormat.scala   |  2 --
 .../sql/execution/datasources/v2/csv/CSVTable.scala |  4 +---
 .../datasources/CommonFileDataSourceSuite.scala |  2 +-
 .../sql/execution/datasources/csv/CSVSuite.scala| 21 -
 5 files changed, 29 insertions(+), 9 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
index 0707af4..be9a912 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
@@ -579,11 +579,16 @@ case class DataSource(
   checkEmptyGlobPath, checkFilesExist, enableGlobbing = globPaths)
   }
 
+  // TODO: Remove the Set below once all the built-in datasources support ANSI interval types
+  private val writeAllowedSources: Set[Class[_]] =
+Set(classOf[ParquetFileFormat], classOf[CSVFileFormat])
+
   private def disallowWritingIntervals(
   dataTypes: Seq[DataType],
   forbidAnsiIntervals: Boolean): Unit = {
-val isParquet = providingClass == classOf[ParquetFileFormat]
-dataTypes.foreach(TypeUtils.invokeOnceForInterval(_, forbidAnsiIntervals || !isParquet) {
+val isWriteAllowedSource = writeAllowedSources(providingClass)
+dataTypes.foreach(
+  TypeUtils.invokeOnceForInterval(_, forbidAnsiIntervals || !isWriteAllowedSource) {
   throw QueryCompilationErrors.cannotSaveIntervalIntoExternalStorageError()
 })
   }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
index 8add63c..d40ad9d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
@@ -148,8 +148,6 @@ class CSVFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
   override def equals(other: Any): Boolean = other.isInstanceOf[CSVFileFormat]
 
   override def supportDataType(dataType: DataType): Boolean = dataType match {
-case _: AnsiIntervalType => false
-
 case _: AtomicType => true
 
 case udt: UserDefinedType[_] => supportDataType(udt.sqlType)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
index 02601b3..839cd01 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVTable.scala
@@ -26,7 +26,7 @@ import 
org.apache.spark.sql.connector.write.{LogicalWriteInfo, Write, WriteBuild
 import org.apache.spark.sql.execution.datasources.FileFormat
 import org.apache.spark.sql.execution.datasources.csv.CSVDataSource
 import org.apache.spark.sql.execution.datasources.v2.FileTable
-import org.apache.spark.sql.types.{AnsiIntervalType, AtomicType, DataType, StructType, UserDefinedType}
+import org.apache.spark.sql.types.{AtomicType, DataType, StructType, UserDefinedType}
 import org.apache.spark.sql.util.CaseInsensitiveStringMap
 
 case class CSVTable(
@@ -55,8 +55,6 @@ case 

[spark] branch master updated: [SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with code not 0

2021-09-29 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dcada3d  [SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with code not 0
dcada3d is described below

commit dcada3d48c51f4855c600dc254883bd9eb3a0a1c
Author: Angerszh 
AuthorDate: Wed Sep 29 11:12:01 2021 -0500

[SPARK-36624][YARN] In yarn client mode, when ApplicationMaster failed with KILLED/FAILED, driver should exit with code not 0

### What changes were proposed in this pull request?
In the current code for yarn client mode, even when the user uses `yarn application -kill` to kill the application, the driver side still exits with code 0. This behavior means the job scheduler can't tell that the job did not succeed, and neither can the user.

In this case we should exit the program with a non-zero code.
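
A minimal sketch of opting in to the behavior added here, via the new flag shown in the config.scala and running-on-yarn.md diffs below (the cluster settings are illustrative assumptions):
```py
# Sketch only: with the flag enabled, a KILLED/FAILED ApplicationMaster makes
# the yarn-client-mode driver stop its SparkContext and exit with code 1.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .setMaster("yarn")                                    # yarn client mode
    .set("spark.submit.deployMode", "client")
    .set("spark.yarn.am.clientModeExitOnError", "true")   # default is false
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
```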

### Why are the changes needed?
Make the scheduler and user aware of the application's final status.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #33873 from AngersZh/SPDI-36624.

Authored-by: Angerszh 
Signed-off-by: Thomas Graves 
---
 docs/running-on-yarn.md   | 10 ++
 .../src/main/scala/org/apache/spark/deploy/yarn/config.scala  | 11 +++
 .../spark/scheduler/cluster/YarnClientSchedulerBackend.scala  | 10 +-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 9930f3e..37ff479 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -442,6 +442,16 @@ To use a custom metrics.properties for the application master and executors, upd
   <td>1.6.0</td>
 </tr>
 <tr>
+  <td><code>spark.yarn.am.clientModeExitOnError</code></td>
+  <td>false</td>
+  <td>
+  In yarn-client mode, when this is true, if driver got application report with final status of KILLED or FAILED,
+  driver will stop corresponding SparkContext and exit program with code 1.
+  Note, if this is true and called from another application, it will terminate the parent application as well.
+  </td>
+  <td>3.3.0</td>
+</tr>
+<tr>
   <td><code>spark.yarn.executor.failuresValidityInterval</code></td>
   <td>(none)</td>
   <td>
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
index 89a4af2..ab2063c 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala
@@ -52,6 +52,17 @@ package object config extends Logging {
   .timeConf(TimeUnit.MILLISECONDS)
   .createOptional
 
+  private[spark] val AM_CLIENT_MODE_EXIT_ON_ERROR =
+ConfigBuilder("spark.yarn.am.clientModeExitOnError")
+  .doc("In yarn-client mode, when this is true, if driver got " +
+"application report with final status of KILLED or FAILED, " +
+"driver will stop corresponding SparkContext and exit program with 
code 1. " +
+"Note, if this is true and called from another application, it will 
terminate " +
+"the parent application as well.")
+  .version("3.3.0")
+  .booleanConf
+  .createWithDefault(false)
+
   private[spark] val EXECUTOR_ATTEMPT_FAILURE_VALIDITY_INTERVAL_MS =
 ConfigBuilder("spark.yarn.executor.failuresValidityInterval")
   .doc("Interval after which Executor failures will be considered 
independent and not " +
diff --git 
a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 
b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
index 8a55e61..28c8652 100644
--- 
a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
+++ 
b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
@@ -21,7 +21,7 @@ import java.io.InterruptedIOException
 
 import scala.collection.mutable.ArrayBuffer
 
-import org.apache.hadoop.yarn.api.records.YarnApplicationState
+import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
 
 import org.apache.spark.{SparkContext, SparkException}
 import org.apache.spark.deploy.yarn.{Client, ClientArguments, YarnAppReport}
@@ -122,6 +122,14 @@ private[spark] class YarnClientSchedulerBackend(
 }
 allowInterrupt = false
 sc.stop()
+state match {
+  case FinalApplicationStatus.FAILED | FinalApplicationStatus.KILLED
+if conf.get(AM_CLIENT_MODE_EXIT_ON_ERROR) =>
+logWarning(s"ApplicationMaster finished with status ${state}, " +
+  s"SparkContext should exit with code 1.")
+System.exit(1)
+ 

[spark] branch master updated: [SPARK-36550][SQL] Propagation cause when UDF reflection fails

2021-09-29 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d003db3  [SPARK-36550][SQL] Propagation cause when UDF reflection fails
d003db3 is described below

commit d003db34c4c5c52a8c99ceecf7233a9b19a69b81
Author: sychen 
AuthorDate: Wed Sep 29 08:30:50 2021 -0500

[SPARK-36550][SQL] Propagation cause when UDF reflection fails

### What changes were proposed in this pull request?
When the exception is an `InvocationTargetException`, get its cause and stack trace.

### Why are the changes needed?
Currently, when UDF reflection fails, an `InvocationTargetException` is thrown, but it is not the specific underlying exception.
```
Error in query: No handler for Hive UDF 'XXX': java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
manual test

Closes #33796 from cxzl25/SPARK-36550.

Authored-by: sychen 
Signed-off-by: Sean Owen 
---
 .../main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala  | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
index 7cbaa8a..56818b5 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.hive
 
+import java.lang.reflect.InvocationTargetException
 import java.util.Locale
 
 import scala.util.{Failure, Success, Try}
@@ -87,7 +88,11 @@ private[sql] class HiveSessionCatalog(
 udfExpr.get.asInstanceOf[HiveGenericUDTF].elementSchema
   }
 } catch {
-  case NonFatal(e) =>
+  case NonFatal(exception) =>
+val e = exception match {
+  case i: InvocationTargetException => i.getCause
+  case o => o
+}
 val errorMsg = s"No handler for UDF/UDAF/UDTF 
'${clazz.getCanonicalName}': $e"
 val analysisException = new AnalysisException(errorMsg)
 analysisException.setStackTrace(e.getStackTrace)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0e23bd7 -> bbd1318)

2021-09-29 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from 0e23bd7  [SPARK-34980][SQL] Support coalesce partition through union in AQE
 add bbd1318  [SPARK-36889][SQL] Respect `spark.sql.parquet.filterPushdown` by v2 parquet scan builder

No new revisions were added by this update.

Summary of changes:
 .../v2/parquet/ParquetScanBuilder.scala| 44 --
 .../datasources/parquet/ParquetFilterSuite.scala   | 13 +++
 2 files changed, 37 insertions(+), 20 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ca1c09d -> 0e23bd7)

2021-09-29 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ca1c09d  [SPARK-36882][PYTHON] Support ILIKE API on Python
 add 0e23bd7  [SPARK-34980][SQL] Support coalesce partition through union in AQE

No new revisions were added by this update.

Summary of changes:
 .../adaptive/CoalesceShufflePartitions.scala   | 143 ++---
 .../execution/CoalesceShufflePartitionsSuite.scala |   6 +-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  85 +++-
 3 files changed, 181 insertions(+), 53 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-36882][PYTHON] Support ILIKE API on Python

2021-09-29 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ca1c09d  [SPARK-36882][PYTHON] Support ILIKE API on Python
ca1c09d is described below

commit ca1c09d88c21d0f8664df8e852778f864f130d94
Author: Leona Yoda 
AuthorDate: Wed Sep 29 15:04:03 2021 +0900

[SPARK-36882][PYTHON] Support ILIKE API on Python

### What changes were proposed in this pull request?

Support ILIKE (case insensitive LIKE) API on Python.

### Why are the changes needed?

The ILIKE statement on the SQL interface is supported by SPARK-36674.
This PR adds the Python API for it.

### Does this PR introduce _any_ user-facing change?

Yes. Users can call `ilike` from Python.
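
A minimal usage sketch (the sample data and column name are assumptions); the match is case insensitive, unlike `Column.like`:
```py
# Sketch only: ilike() matches regardless of case.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])

df.filter(df.name.ilike("%ICE")).show()   # matches "Alice" despite the casing
```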

### How was this patch tested?

Unit tests.

Closes #34135 from yoda-mon/python-ilike.

Authored-by: Leona Yoda 
Signed-off-by: Kousuke Saruta 
---
 python/docs/source/reference/pyspark.sql.rst |  1 +
 python/pyspark/sql/column.py | 21 +
 python/pyspark/sql/tests/test_column.py  |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/python/docs/source/reference/pyspark.sql.rst 
b/python/docs/source/reference/pyspark.sql.rst
index f5a8357..0fd2c4d 100644
--- a/python/docs/source/reference/pyspark.sql.rst
+++ b/python/docs/source/reference/pyspark.sql.rst
@@ -259,6 +259,7 @@ Column APIs
 Column.eqNullSafe
 Column.getField
 Column.getItem
+Column.ilike
 Column.isNotNull
 Column.isNull
 Column.isin
diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py
index 9046e7f..c46b0eb 100644
--- a/python/pyspark/sql/column.py
+++ b/python/pyspark/sql/column.py
@@ -507,6 +507,26 @@ class Column(object):
 >>> df.filter(df.name.like('Al%')).collect()
 [Row(age=2, name='Alice')]
 """
+_ilike_doc = """
+SQL ILIKE expression (case insensitive LIKE). Returns a boolean :class:`Column`
+based on a case insensitive match.
+
+.. versionadded:: 3.3.0
+
+Parameters
+----------
+other : str
+a SQL LIKE pattern
+
+See Also
+--------
+pyspark.sql.Column.rlike
+
+Examples
+--------
+>>> df.filter(df.name.ilike('%Ice')).collect()
+[Row(age=2, name='Alice')]
+"""
 _startswith_doc = """
String starts with. Returns a boolean :class:`Column` based on a string match.
 
@@ -541,6 +561,7 @@ class Column(object):
 contains = _bin_op("contains", _contains_doc)
 rlike = _bin_op("rlike", _rlike_doc)
 like = _bin_op("like", _like_doc)
+ilike = _bin_op("ilike", _ilike_doc)
 startswith = _bin_op("startsWith", _startswith_doc)
 endswith = _bin_op("endsWith", _endswith_doc)
 
diff --git a/python/pyspark/sql/tests/test_column.py 
b/python/pyspark/sql/tests/test_column.py
index c2530b2..9a918c2 100644
--- a/python/pyspark/sql/tests/test_column.py
+++ b/python/pyspark/sql/tests/test_column.py
@@ -75,7 +75,7 @@ class ColumnTests(ReusedSQLTestCase):
 self.assertTrue(all(isinstance(c, Column) for c in cb))
 cbool = (ci & ci), (ci | ci), (~ci)
 self.assertTrue(all(isinstance(c, Column) for c in cbool))
-css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.asc(), cs.desc(),\
+css = cs.contains('a'), cs.like('a'), cs.rlike('a'), cs.ilike('A'), cs.asc(), cs.desc(),\
 cs.startswith('a'), cs.endswith('a'), ci.eqNullSafe(cs)
 self.assertTrue(all(isinstance(c, Column) for c in css))
 self.assertTrue(isinstance(ci.cast(LongType()), Column))

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org