(spark) branch master updated: [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ad63eef20617 [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4 ad63eef20617 is described below commit ad63eef20617db7cdecce465af54e4787d0deeac Author: beliefer AuthorDate: Wed May 1 11:25:54 2024 -0700 [SPARK-46009][SQL][FOLLOWUP] Remove unused PERCENTILE_CONT and PERCENTILE_DISC in g4 ### What changes were proposed in this pull request? This PR propose to remove unused `PERCENTILE_CONT` and `PERCENTILE_DISC` in g4 ### Why are the changes needed? https://github.com/apache/spark/pull/43910 merged the parse rule of `PercentileCont` and `PercentileDisc` into `functionCall`, but forgot to remove unused `PERCENTILE_CONT` and `PERCENTILE_DISC` in g4. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? GA. ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #46272 from beliefer/SPARK-46009_followup2. Authored-by: beliefer Signed-off-by: Dongjoon Hyun --- docs/sql-ref-ansi-compliance.md| 2 - .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 2 - .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 2 - .../sql-tests/analyzer-results/window2.sql.out | 126 + .../sql-tests/results/ansi/keywords.sql.out| 4 - .../resources/sql-tests/results/keywords.sql.out | 2 - .../ThriftServerWithSparkContextSuite.scala| 2 +- 7 files changed, 127 insertions(+), 13 deletions(-) diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md index 011bd671ca1f..84416ffd5f83 100644 --- a/docs/sql-ref-ansi-compliance.md +++ b/docs/sql-ref-ansi-compliance.md @@ -608,8 +608,6 @@ Below is a list of all the keywords in Spark SQL. 
|PARTITIONED|non-reserved|non-reserved|non-reserved| |PARTITIONS|non-reserved|non-reserved|non-reserved| |PERCENT|non-reserved|non-reserved|non-reserved| -|PERCENTILE_CONT|reserved|non-reserved|non-reserved| -|PERCENTILE_DISC|reserved|non-reserved|non-reserved| |PIVOT|non-reserved|non-reserved|non-reserved| |PLACING|non-reserved|non-reserved|non-reserved| |POSITION|non-reserved|non-reserved|reserved| diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index 83e40c4a20a2..86e16af7ff10 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -298,8 +298,6 @@ OVERWRITE: 'OVERWRITE'; PARTITION: 'PARTITION'; PARTITIONED: 'PARTITIONED'; PARTITIONS: 'PARTITIONS'; -PERCENTILE_CONT: 'PERCENTILE_CONT'; -PERCENTILE_DISC: 'PERCENTILE_DISC'; PERCENTLIT: 'PERCENT'; PIVOT: 'PIVOT'; PLACING: 'PLACING'; diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index 71bd75f934ca..653224c5475f 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -1829,8 +1829,6 @@ nonReserved | PARTITION | PARTITIONED | PARTITIONS -| PERCENTILE_CONT -| PERCENTILE_DISC | PERCENTLIT | PIVOT | PLACING diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out new file mode 100644 index ..6fd41286959a --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/analyzer-results/window2.sql.out @@ -0,0 +1,126 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(null, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "a"), +(1, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "a"), +(1, 2L, 2.5D, date("2017-08-02"), timestamp_seconds(150200), "a"), +(2, 2147483650L, 100.001D, date("2020-12-31"), timestamp_seconds(1609372800), "a"), +(1, null, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), "b"), +(2, 3L, 3.3D, date("2017-08-03"), timestamp_seconds(150300), "b"), +(3, 2147483650L, 100.001D, date("2020-12-31"), timestamp_seconds(1609372800), "b"), +(null, null, null, null, null, null), +(3, 1L, 1.0D, date("2017-08-01"), timestamp_seconds(1501545600), null) +AS testData(val, val_long, val_double, val_date, val_times
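With the dedicated tokens removed, `percentile_cont` and `percentile_disc` are handled entirely by the generic `functionCall` parse rule. A minimal spark-shell sketch of the syntax this covers, assuming a build that includes SPARK-46009 (the query itself is illustrative):

```scala
// percentile_cont / percentile_disc still parse as ordinary function calls
// with a WITHIN GROUP clause, now routed through the functionCall rule.
spark.sql("""
  SELECT
    percentile_cont(0.25) WITHIN GROUP (ORDER BY id) AS p25,
    percentile_disc(0.50) WITHIN GROUP (ORDER BY id) AS median
  FROM range(10)
""").show()
```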
(spark) branch branch-3.5 updated: Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals"
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new fc0ef07f2949 Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals" fc0ef07f2949 is described below commit fc0ef07f2949c399537c6d9b5fb7b81f546de212 Author: Dongjoon Hyun AuthorDate: Wed May 1 11:18:29 2024 -0700 Revert "[SPARK-48016][SQL] Fix a bug in try_divide function when with decimals" This reverts commit e78ee2c5770218a521340cb84f57a02dd00f7f3a. --- .../sql/catalyst/analysis/DecimalPrecision.scala | 14 ++--- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 10 ++-- sql/core/src/test/resources/log4j2.properties | 2 +- .../analyzer-results/ansi/try_arithmetic.sql.out | 56 --- .../analyzer-results/try_arithmetic.sql.out| 56 --- .../resources/sql-tests/inputs/try_arithmetic.sql | 8 --- .../sql-tests/results/ansi/try_arithmetic.sql.out | 64 -- .../sql-tests/results/try_arithmetic.sql.out | 64 -- 8 files changed, 13 insertions(+), 261 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala index f51127f53b38..09cf61a77955 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala @@ -83,7 +83,7 @@ object DecimalPrecision extends TypeCoercionRule { val resultType = widerDecimalType(p1, s1, p2, s2) val newE1 = if (e1.dataType == resultType) e1 else Cast(e1, resultType) val newE2 = if (e2.dataType == resultType) e2 else Cast(e2, resultType) - b.withNewChildren(Seq(newE1, newE2)) + b.makeCopy(Array(newE1, newE2)) } /** @@ -202,21 +202,21 @@ object DecimalPrecision extends TypeCoercionRule { case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] && l.dataType.isInstanceOf[IntegralType] && literalPickMinimumPrecision => - b.withNewChildren(Seq(Cast(l, DataTypeUtils.fromLiteral(l)), r)) + b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) case (l, r: Literal) if l.dataType.isInstanceOf[DecimalType] && r.dataType.isInstanceOf[IntegralType] && literalPickMinimumPrecision => - b.withNewChildren(Seq(l, Cast(r, DataTypeUtils.fromLiteral(r + b.makeCopy(Array(l, Cast(r, DataTypeUtils.fromLiteral(r // Promote integers inside a binary expression with fixed-precision decimals to decimals, // and fixed-precision decimals in an expression with floats / doubles to doubles case (l @ IntegralTypeExpression(), r @ DecimalExpression(_, _)) => - b.withNewChildren(Seq(Cast(l, DecimalType.forType(l.dataType)), r)) + b.makeCopy(Array(Cast(l, DecimalType.forType(l.dataType)), r)) case (l @ DecimalExpression(_, _), r @ IntegralTypeExpression()) => - b.withNewChildren(Seq(l, Cast(r, DecimalType.forType(r.dataType + b.makeCopy(Array(l, Cast(r, DecimalType.forType(r.dataType case (l, r @ DecimalExpression(_, _)) if isFloat(l.dataType) => - b.withNewChildren(Seq(l, Cast(r, DoubleType))) + b.makeCopy(Array(l, Cast(r, DoubleType))) case (l @ DecimalExpression(_, _), r) if isFloat(r.dataType) => - b.withNewChildren(Seq(Cast(l, DoubleType), r)) + b.makeCopy(Array(Cast(l, DoubleType), r)) case _ => b } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala index c9a4a2d40246..190e72a8e669 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala @@ -1102,22 +1102,22 @@ object TypeCoercion extends TypeCoercionBase { case a @ BinaryArithmetic(left @ StringTypeExpression(), right) if right.dataType != CalendarIntervalType => -a.withNewChildren(Seq(Cast(left, DoubleType), right)) +a.makeCopy(Array(Cast(left, DoubleType), right)) case a @ BinaryArithmetic(left, right @ StringTypeExpression()) if left.dataType != CalendarIntervalType => -a.withNewChildren(Seq(left, Cast(right, DoubleType))) +a.makeCopy(Array(left, Cast(right, DoubleType))) // For equality between string and timestamp we cast the string to a timestam
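The revert above only touches branch-3.5; on master, where SPARK-48016 is kept, the regenerated golden files show the decimal and string operand cases returning NULL instead of failing. A minimal sketch of those calls, assuming a spark-shell session:

```scala
// Operand types exercised by SPARK-48016; per the master golden files,
// both of these yield NULL rather than raising an error.
spark.sql("SELECT try_divide(1, decimal(0))").show()
spark.sql("SELECT try_divide(1, '0')").show()
```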
(spark) branch branch-3.4 updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 70ce67cc77cc [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 70ce67cc77cc is described below commit 70ce67cc77ccce3a4509bba608dbab69b45cc2b9 Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 26c871f180306fbf86ce65f14f8e7a71f89885ed) Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index b5ee63e38690..9b60ca75eb9b 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -69,6 +69,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -128,6 +129,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new 953d7f90c6db [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 953d7f90c6db is described below commit 953d7f90c6dbee597b0360c551dfac2a1d87d961 Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun (cherry picked from commit 26c871f180306fbf86ce65f14f8e7a71f89885ed) Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index d040493c86c4..7ccd32451acc 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -118,6 +118,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -177,6 +178,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 26c871f18030 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter 26c871f18030 is described below commit 26c871f180306fbf86ce65f14f8e7a71f89885ed Author: Dongjoon Hyun AuthorDate: Wed May 1 10:42:26 2024 -0700 [SPARK-48068][PYTHON] `mypy` should have `--python-executable` parameter ### What changes were proposed in this pull request? This PR aims to fix `mypy` failure by propagating `lint-python`'s `PYTHON_EXECUTABLE` to `mypy`'s parameter correctly. ### Why are the changes needed? We assumed that `PYTHON_EXECUTABLE` is used for `dev/lint-python` like the following. That's not always guaranteed. We need to use `mypy`'s parameter to make it sure. https://github.com/apache/spark/blob/ff401dde50343c9bbc1c49a0294272f2da7d01e2/.github/workflows/build_and_test.yml#L705 This patch is useful whose `python3` chooses one of multiple Python installation like our CI environment. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8905641334 bash WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested root2ef6ce08d2c4:/# python3 --version Python 3.10.12 root2ef6ce08d2c4:/# python3.9 --version Python 3.9.19 ``` For example, the following shows that `PYTHON_EXECUTABLE` is not considered by `mypy`. ``` root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --python-executable=python3.11 --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 3428 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.9 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 root18c8eae5791e:/spark# PYTHON_EXECUTABLE=python3.11 mypy --namespace-packages --config-file python/mypy.ini python/pyspark | wc -l 1 ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46314 from dongjoon-hyun/SPARK-48068. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/lint-python | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dev/lint-python b/dev/lint-python index 6bd843103bd7..b8703310bc4b 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -125,6 +125,7 @@ function mypy_annotation_test { echo "starting mypy annotations test..." MYPY_REPORT=$( ($MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --cache-dir /tmp/.mypy_cache/ \ @@ -184,6 +185,7 @@ function mypy_examples_test { echo "starting mypy examples test..." MYPY_REPORT=$( (MYPYPATH=python $MYPY_BUILD \ + --python-executable $PYTHON_EXECUTABLE \ --namespace-packages \ --config-file python/mypy.ini \ --exclude "mllib/*" \ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff401dde5034 [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12 ff401dde5034 is described below commit ff401dde50343c9bbc1c49a0294272f2da7d01e2 Author: Dongjoon Hyun AuthorDate: Tue Apr 30 23:54:06 2024 -0700 [SPARK-48069][INFRA] Handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools` in Python 3.12 ### What changes were proposed in this pull request? This PR aims to handle `PEP-632` by checking `ModuleNotFoundError` on `setuptools`. - [PEP 632 – Deprecate distutils module](https://peps.python.org/pep-0632/) ### Why are the changes needed? Use `Python 3.12`. ``` $ python3 --version Python 3.12.2 ``` **BEFORE** ``` $ dev/lint-python --mypy | grep ModuleNotFoundError Traceback (most recent call last): File "", line 1, in ModuleNotFoundError: No module named 'setuptools' ``` **AFTER** ``` $ dev/lint-python --mypy | grep ModuleNotFoundError ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs and manual test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46315 from dongjoon-hyun/SPARK-48069. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/lint-python | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/dev/lint-python b/dev/lint-python index 8d587bd52aca..6bd843103bd7 100755 --- a/dev/lint-python +++ b/dev/lint-python @@ -84,7 +84,10 @@ function satisfies_min_version { local expected_version="$2" echo "$( "$PYTHON_EXECUTABLE" << EOM -from setuptools.extern.packaging import version +try: +from setuptools.extern.packaging import version +except ModuleNotFoundError: +from packaging import version print(version.parse('$provided_version') >= version.parse('$expected_version')) EOM )" - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 65cf5b18648a [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file 65cf5b18648a is described below commit 65cf5b18648a81fc9b0787d03f23f7465c20f3ec Author: Dongjoon Hyun AuthorDate: Tue Apr 30 22:42:02 2024 -0700 [SPARK-48016][SQL][TESTS][FOLLOWUP] Update Java 21 golden file ### What changes were proposed in this pull request? This is a follow-up of SPARK-48016 to update the missed Java 21 golden file. - #46286 ### Why are the changes needed? To recover Java 21 CIs: - https://github.com/apache/spark/actions/workflows/build_java21.yml - https://github.com/apache/spark/actions/workflows/build_maven_java21.yml - https://github.com/apache/spark/actions/workflows/build_maven_java21_macos14.yml ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual tests. I regenerated all in Java 21 and this was the only one affected. ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46313 from dongjoon-hyun/SPARK-48016. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../results/try_arithmetic.sql.out.java21 | 64 ++ 1 file changed, 64 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 b/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 index dcdb9d0dcb19..002a0dfcf37e 100644 --- a/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 +++ b/sql/core/src/test/resources/sql-tests/results/try_arithmetic.sql.out.java21 @@ -15,6 +15,22 @@ struct NULL +-- !query +SELECT try_add(2147483647, decimal(1)) +-- !query schema +struct +-- !query output +2147483648 + + +-- !query +SELECT try_add(2147483647, "1") +-- !query schema +struct +-- !query output +2.147483648E9 + + -- !query SELECT try_add(-2147483648, -1) -- !query schema @@ -249,6 +265,22 @@ struct NULL +-- !query +SELECT try_divide(1, decimal(0)) +-- !query schema +struct +-- !query output +NULL + + +-- !query +SELECT try_divide(1, "0") +-- !query schema +struct +-- !query output +NULL + + -- !query SELECT try_divide(interval 2 year, 2) -- !query schema @@ -313,6 +345,22 @@ struct NULL +-- !query +SELECT try_subtract(2147483647, decimal(-1)) +-- !query schema +struct +-- !query output +2147483648 + + +-- !query +SELECT try_subtract(2147483647, "-1") +-- !query schema +struct +-- !query output +2.147483648E9 + + -- !query SELECT try_subtract(-2147483648, 1) -- !query schema @@ -409,6 +457,22 @@ struct NULL +-- !query +SELECT try_multiply(2147483647, decimal(-2)) +-- !query schema +struct +-- !query output +-4294967294 + + +-- !query +SELECT try_multiply(2147483647, "-2") +-- !query schema +struct +-- !query output +-4.294967294E9 + + -- !query SELECT try_multiply(-2147483648, 2) -- !query schema - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 02206cd66dbf [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags 02206cd66dbf is described below commit 02206cd66dbfc8de602a685b032f1805bcf8e36f Author: Nick Young AuthorDate: Tue Apr 30 22:07:20 2024 -0700 [SPARK-48047][SQL] Reduce memory pressure of empty TreeNode tags ### What changes were proposed in this pull request? - Changed the `tags` variable of the `TreeNode` class to initialize lazily. This will reduce unnecessary driver memory pressure. ### Why are the changes needed? - Plans with large expression or operator trees are known to cause driver memory pressure; this is one step in alleviating that issue. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UT covers behavior. Outwards facing behavior does not change. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46285 from n-young-db/treenode-tags. Authored-by: Nick Young Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/catalyst/trees/TreeNode.scala | 24 ++ 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala index 94e893d468b3..dd39f3182bfb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala @@ -78,8 +78,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] /** * A mutable map for holding auxiliary information of this tree node. It will be carried over * when this node is copied via `makeCopy`, or transformed via `transformUp`/`transformDown`. + * We lazily evaluate the `tags` since the default size of a `mutable.Map` is nonzero. This + * will reduce unnecessary memory pressure. */ - private val tags: mutable.Map[TreeNodeTag[_], Any] = mutable.Map.empty + private[this] var _tags: mutable.Map[TreeNodeTag[_], Any] = null + private def tags: mutable.Map[TreeNodeTag[_], Any] = { +if (_tags eq null) { + _tags = mutable.Map.empty +} +_tags + } /** * Default tree pattern [[BitSet] for a [[TreeNode]]. @@ -147,11 +155,13 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] ineffectiveRules.get(ruleId.id) } + def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty + def copyTagsFrom(other: BaseType): Unit = { // SPARK-32753: it only makes sense to copy tags to a new node // but it's too expensive to detect other cases likes node removal // so we make a compromise here to copy tags to node with no tags -if (tags.isEmpty) { +if (isTagsEmpty && !other.isTagsEmpty) { tags ++= other.tags } } @@ -161,11 +171,17 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] } def getTagValue[T](tag: TreeNodeTag[T]): Option[T] = { -tags.get(tag).map(_.asInstanceOf[T]) +if (isTagsEmpty) { + None +} else { + tags.get(tag).map(_.asInstanceOf[T]) +} } def unsetTagValue[T](tag: TreeNodeTag[T]): Unit = { -tags -= tag +if (!isTagsEmpty) { + tags -= tag +} } /** - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
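The same lazy-allocation idea can be shown outside Catalyst. A self-contained sketch, with illustrative (non-Spark) class and method names, of how a node avoids allocating its tag map until the first write:

```scala
import scala.collection.mutable

// The map is only materialized on first write; untagged nodes keep a null
// field and pay no allocation cost, mirroring the TreeNode change above.
class Node {
  private[this] var _tags: mutable.Map[String, Any] = null

  private def tags: mutable.Map[String, Any] = {
    if (_tags eq null) _tags = mutable.Map.empty
    _tags
  }

  def isTagsEmpty: Boolean = (_tags eq null) || _tags.isEmpty

  def setTagValue(key: String, value: Any): Unit = tags(key) = value

  def getTagValue(key: String): Option[Any] =
    if (isTagsEmpty) None else tags.get(key)
}

// Usage: a fresh node reports empty tags without ever allocating the map.
val n = new Node
assert(n.isTagsEmpty)
n.setTagValue("origin", "test")
assert(n.getTagValue("origin").contains("test"))
```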
(spark) branch master updated: [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3cc8f930383 [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default f3cc8f930383 is described below commit f3cc8f930383659b9f99e56b38de4b97d588e20b Author: Dongjoon Hyun AuthorDate: Tue Apr 30 15:19:00 2024 -0700 [SPARK-48063][CORE] Enable `spark.stage.ignoreDecommissionFetchFailure` by default ### What changes were proposed in this pull request? This PR aims to **enable `spark.stage.ignoreDecommissionFetchFailure` by default** while keeping `spark.scheduler.maxRetainedRemovedDecommissionExecutors=0` without any change for Apache Spark 4.0.0 in order to help a user use this feature more easily by setting only one configuration, `spark.scheduler.maxRetainedRemovedDecommissionExecutors`. ### Why are the changes needed? This feature was added at Apache Spark 3.4.0 via SPARK-40481 and SPARK-40979 and has been used for two years to support executor decommissioning features in the production. - #37924 - #38441 ### Does this PR introduce _any_ user-facing change? No because `spark.scheduler.maxRetainedRemovedDecommissionExecutors` is still `0`. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46308 from dongjoon-hyun/SPARK-48063. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- core/src/main/scala/org/apache/spark/internal/config/package.scala | 2 +- docs/configuration.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index b2cbb6f6deb6..2e207422ae06 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -2403,7 +2403,7 @@ package object config { s"count ${STAGE_MAX_CONSECUTIVE_ATTEMPTS.key}") .version("3.4.0") .booleanConf - .createWithDefault(false) + .createWithDefault(true) private[spark] val SCHEDULER_MAX_RETAINED_REMOVED_EXECUTORS = ConfigBuilder("spark.scheduler.maxRetainedRemovedDecommissionExecutors") diff --git a/docs/configuration.md b/docs/configuration.md index d5e2a569fdea..2e612ffd9ab9 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -3072,7 +3072,7 @@ Apart from these, the following properties are also available, and may be useful spark.stage.ignoreDecommissionFetchFailure - false + true Whether ignore stage fetch failure caused by executor decommission when count spark.stage.maxConsecutiveAttempts - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
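With this default flip, opting into the feature only requires the retention setting, and opting out means setting the flag back to false. A hedged SparkConf sketch (the retention value of 10 is an arbitrary example):

```scala
import org.apache.spark.SparkConf

// Opting in now only needs the retention setting; the fetch-failure flag
// itself defaults to true as of this commit.
val conf = new SparkConf()
  .set("spark.scheduler.maxRetainedRemovedDecommissionExecutors", "10")

// To restore the pre-4.0 behavior, disable the flag explicitly.
val legacyConf = new SparkConf()
  .set("spark.stage.ignoreDecommissionFetchFailure", "false")
```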
(spark) branch master updated: [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new faab553cac70 [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly faab553cac70 is described below commit faab553cac70eefeec286b1823b70ad62bed87f8 Author: Dongjoon Hyun AuthorDate: Tue Apr 30 12:50:07 2024 -0700 [SPARK-48060][SS][TESTS] Fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly ### What changes were proposed in this pull request? This PR aims to fix `StreamingQueryHashPartitionVerifySuite` to update golden files correctly. - The documentation is added. - Newly generated files are updated. ### Why are the changes needed? Previously, `SPARK_GENERATE_GOLDEN_FILES` doesn't work as expected because it updates the files under `target` directory. We need to update `src/test` files. **BEFORE** ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" $ git status On branch master Your branch is up to date with 'apache/master'. nothing to commit, working tree clean ``` **AFTER** ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \ -Dspark.sql.test.randomDataGenerator.maxStrLen=100 \ -Dspark.sql.test.randomDataGenerator.maxArraySize=4 $ git status On branch SPARK-48060 Your branch is up to date with 'dongjoon/SPARK-48060'. Changes not staged for commit: (use "git add ..." to update what will be committed) (use "git restore ..." to discard changes in working directory) modified: sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas modified: sql/core/src/test/resources/structured-streaming/partition-tests/rowsAndPartIds no changes added to commit (use "git add" and/or "git commit -a") ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. I regenerate the data like the following. ``` $ SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *StreamingQueryHashPartitionVerifySuite" \ -Dspark.sql.test.randomDataGenerator.maxStrLen=100 \ -Dspark.sql.test.randomDataGenerator.maxArraySize=4 ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46304 from dongjoon-hyun/SPARK-48060. 
Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../partition-tests/randomSchemas | 2 +- .../partition-tests/rowsAndPartIds | Bin 4862115 -> 13341426 bytes .../StreamingQueryHashPartitionVerifySuite.scala | 22 +++-- 3 files changed, 17 insertions(+), 7 deletions(-) diff --git a/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas b/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas index 8d6ff942610c..f6eadd776cc6 100644 --- a/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas +++ b/sql/core/src/test/resources/structured-streaming/partition-tests/randomSchemas @@ -1 +1 @@ -col_0 STRUCT NOT NULL, col_3: FLOAT NOT NULL, col_4: INT NOT NULL>,col_1 STRUCT, col_3: ARRAY NOT NULL, col_4: ARRAY, col_5: TIMESTAMP NOT NULL, col_6: STRUCT, col_1: BIGINT NOT NULL> NOT NULL, col_7: ARRAY NOT NULL, col_8: ARRAY, col_9: BIGINT NOT NULL> NOT NULL,col_2 BIGINT NOT NULL,col_3 STRUCT,col_1 STRUCT NOT NULL,col_2 STRING NOT NULL,col_3 STRUCT, col_2: ARRAY NOT NULL> NOT NULL,col_4 BINARY NOT NULL,col_5 ARRAY NOT NULL,col_6 ARRAY,col_7 DOUBLE NOT NULL,col_8 ARRAY NOT NULL,col_9 ARRAY,col_10 FLOAT NOT NULL,col_11 STRUCT NOT NULL>, col_1: STRUCT NOT NULL, col_1: INT, col_2: STRUCT
(spark) branch master updated: [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dab20b31388b [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition` dab20b31388b is described below commit dab20b31388ba7bcd2ab4d4424cbbd072bf84c30 Author: Ruifeng Zheng AuthorDate: Tue Apr 30 12:19:18 2024 -0700 [SPARK-48057][PYTHON][CONNECT][TESTS] Enable `GroupedApplyInPandasTests.test_grouped_with_empty_partition` ### What changes were proposed in this pull request? Enable `GroupedApplyInPandasTests. test_grouped_with_empty_partition` ### Why are the changes needed? test coverage ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46299 from zhengruifeng/fix_test_grouped_with_empty_partition. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py | 4 python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py | 4 ++-- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py b/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py index 1cc4ce012623..8a1da440c799 100644 --- a/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py +++ b/python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map.py @@ -38,10 +38,6 @@ class GroupedApplyInPandasTests(GroupedApplyInPandasTestsMixin, ReusedConnectTes def test_apply_in_pandas_returning_incompatible_type(self): super().test_apply_in_pandas_returning_incompatible_type() -@unittest.skip("Spark Connect doesn't support RDD but the test depends on it.") -def test_grouped_with_empty_partition(self): -super().test_grouped_with_empty_partition() - if __name__ == "__main__": from pyspark.sql.tests.connect.test_parity_pandas_grouped_map import * # noqa: F401 diff --git a/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py b/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py index f43dafc0a4a1..1e86e12eb74f 100644 --- a/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py +++ b/python/pyspark/sql/tests/pandas/test_pandas_grouped_map.py @@ -680,13 +680,13 @@ class GroupedApplyInPandasTestsMixin: data = [Row(id=1, x=2), Row(id=1, x=3), Row(id=2, x=4)] expected = [Row(id=1, x=5), Row(id=1, x=5), Row(id=2, x=4)] num_parts = len(data) + 1 -df = self.spark.createDataFrame(self.sc.parallelize(data, numSlices=num_parts)) +df = self.spark.createDataFrame(data).repartition(num_parts) f = pandas_udf( lambda pdf: pdf.assign(x=pdf["x"].sum()), "id long, x int", PandasUDFType.GROUPED_MAP ) -result = df.groupBy("id").apply(f).collect() +result = df.groupBy("id").apply(f).sort("id").collect() self.assertEqual(result, expected) def test_grouped_over_window(self): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (0329479acb67 -> 9caa6f7f8b8e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from 0329479acb67 [SPARK-47359][SQL] Support TRANSLATE function to work with collated strings
     add 9caa6f7f8b8e [SPARK-48061][SQL][TESTS] Parameterize max limits of `spark.sql.test.randomDataGenerator`

No new revisions were added by this update.

Summary of changes:
 .../test/scala/org/apache/spark/sql/RandomDataGenerator.scala | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9e8c4aa3f43a [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default 9e8c4aa3f43a is described below commit 9e8c4aa3f43a3d99bff56cca319db623abc473ee Author: Dongjoon Hyun AuthorDate: Tue Apr 30 01:44:37 2024 -0700 [SPARK-46122][SQL] Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default ### What changes were proposed in this pull request? This PR aims to switch `spark.sql.legacy.createHiveTableByDefault` to `false` by default in order to move away from this legacy behavior from `Apache Spark 4.0.0` while the legacy functionality will be preserved during Apache Spark 4.x period by setting `spark.sql.legacy.createHiveTableByDefault=true`. ### Why are the changes needed? Historically, this behavior change was merged at `Apache Spark 3.0.0` activity in SPARK-30098 and reverted officially during the `3.0.0 RC` period. - 2019-12-06: #26736 (58be82a) - 2019-12-06: https://lists.apache.org/thread/g90dz1og1zt4rr5h091rn1zqo50y759j - 2020-05-16: #28517 At `Apache Spark 3.1.0`, we had another discussion and defined it as `Legacy` behavior via a new configuration by reusing the JIRA ID, SPARK-30098. - 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204 - 2020-12-03: #30554 Last year, this was proposed again twice and `Apache Spark 4.0.0` is a good time to make a decision for Apache Spark future direction. - SPARK-42603 on 2023-02-27 as an independent idea. - SPARK-46122 on 2023-11-27 as a part of Apache Spark 4.0.0 idea ### Does this PR introduce _any_ user-facing change? Yes, the migration document is updated. ### How was this patch tested? Pass the CIs with the adjusted test cases. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46207 from dongjoon-hyun/SPARK-46122. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 1 + python/pyspark/sql/tests/test_readwriter.py | 5 ++--- .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala| 2 +- .../apache/spark/sql/execution/command/PlanResolutionSuite.scala | 8 +++- 4 files changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 1e0fdadde1e3..07562babc87d 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -25,6 +25,7 @@ license: | ## Upgrading from Spark SQL 3.5 to 4.0 - Since Spark 4.0, `spark.sql.ansi.enabled` is on by default. To restore the previous behavior, set `spark.sql.ansi.enabled` to `false` or `SPARK_ANSI_SQL_MODE` to `false`. +- Since Spark 4.0, `CREATE TABLE` syntax without `USING` and `STORED AS` will use the value of `spark.sql.sources.default` as the table provider instead of `Hive`. To restore the previous behavior, set `spark.sql.legacy.createHiveTableByDefault` to `true`. - Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys -0.0 to 0.0. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`. - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. 
To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`). - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`. diff --git a/python/pyspark/sql/tests/test_readwriter.py b/python/pyspark/sql/tests/test_readwriter.py index 5784d2c72973..e752856d0316 100644 --- a/python/pyspark/sql/tests/test_readwriter.py +++ b/python/pyspark/sql/tests/test_readwriter.py @@ -247,10 +247,9 @@ class ReadwriterV2TestsMixin: def test_create_without_provider(self): df = self.df -with self.assertRaisesRegex( -AnalysisException, "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" -): +with self.table("test_table"): df.writeTo("test_table").create() +self.assertEqual(100, self.spark.sql("select * from test_table").count()) def test_table_overwrite(self): df = self.df diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.
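A short sketch of the behavior change described in the migration note, assuming a spark-shell session; the legacy path at the end additionally assumes the session was built with Hive support:

```scala
// With the new default, CREATE TABLE without USING / STORED AS picks the
// spark.sql.sources.default provider instead of creating a Hive SerDe table.
spark.sql("CREATE TABLE t_datasource (id INT)")

// Restoring the legacy behavior for the 4.x line:
spark.sql("SET spark.sql.legacy.createHiveTableByDefault=true")
spark.sql("CREATE TABLE t_hive (id INT)")  // Hive SerDe table again (needs Hive support)
```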
(spark) branch master updated: [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c9ed9dfccb72 [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level c9ed9dfccb72 is described below commit c9ed9dfccb72bc8d30557dcd2809c298a75c3f69 Author: Kent Yao AuthorDate: Mon Apr 29 11:13:39 2024 -0700 [SPARK-48042][SQL] Use a timestamp formatter with timezone at class level instead of making copies at method level ### What changes were proposed in this pull request? This PR creates a timestamp formatter with the timezone directly for formatting. Previously, we called `withZone` for every value in the `format` function. Because the original `zoneId` in the formatter is null and never equals the one we pass in, it creates new copies of the formatter over and over. ```java ... * * param zone the new override zone, null if no override * return a formatter based on this formatter with the requested override zone, not null */ public DateTimeFormatter withZone(ZoneId zone) { if (Objects.equals(this.zone, zone)) { return this; } return new DateTimeFormatter(printerParser, locale, decimalStyle, resolverStyle, resolverFields, chrono, zone); } ``` ### Why are the changes needed? improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - Existing tests - I also ran the DateTimeBenchmark result locally, there's no performance gain at least for these cases. ### Was this patch authored or co-authored using generative AI tooling? no Closes #46282 from yaooqinn/SPARK-48042. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/catalyst/util/TimestampFormatter.scala | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala index d59b52a3818a..9f57f8375c54 100644 --- a/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala +++ b/sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala @@ -162,6 +162,9 @@ class Iso8601TimestampFormatter( protected lazy val formatter: DateTimeFormatter = getOrCreateFormatter(pattern, locale, isParsing) + @transient + private lazy val zonedFormatter: DateTimeFormatter = formatter.withZone(zoneId) + @transient protected lazy val legacyFormatter = TimestampFormatter.getLegacyFormatter( pattern, zoneId, locale, legacyFormat) @@ -231,7 +234,7 @@ class Iso8601TimestampFormatter( override def format(instant: Instant): String = { try { - formatter.withZone(zoneId).format(instant) + zonedFormatter.format(instant) } catch checkFormattedDiff(toJavaTimestamp(instantToMicros(instant)), (t: Timestamp) => format(t)) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
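The change amounts to binding the zone to the formatter once rather than per formatted value. A standalone java.time sketch of the before/after shapes (pattern and zone are illustrative):

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

val zoneId = ZoneId.of("America/Los_Angeles")
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Before: withZone() runs per value; since the formatter's own zone is null,
// Objects.equals never matches and a new formatter copy is built every time.
def formatSlow(i: Instant): String = formatter.withZone(zoneId).format(i)

// After: bind the zone once and reuse the resulting formatter for every value.
lazy val zonedFormatter: DateTimeFormatter = formatter.withZone(zoneId)
def formatFast(i: Instant): String = zonedFormatter.format(i)

formatFast(Instant.now())
```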
(spark) branch master updated (f781d153a5e4 -> c35a21e5984f)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from f781d153a5e4 [SPARK-48046][K8S] Remove `clock` parameter from `DriverServiceFeatureStep`
     add c35a21e5984f [SPARK-48044][PYTHON][CONNECT] Cache `DataFrame.isStreaming`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/dataframe.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (d42c10d9411d -> f781d153a5e4)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from d42c10d9411d [SPARK-47693][TESTS][FOLLOWUP] Reduce CollationBenchmarks time
     add f781d153a5e4 [SPARK-48046][K8S] Remove `clock` parameter from `DriverServiceFeatureStep`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala | 4 +---
 .../spark/deploy/k8s/features/DriverServiceFeatureStepSuite.scala   | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (ccb0eb699f7c -> d42c10d9411d)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


    from ccb0eb699f7c [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf
     add d42c10d9411d [SPARK-47693][TESTS][FOLLOWUP] Reduce CollationBenchmarks time

No new revisions were added by this update.

Summary of changes:
 .../execution/benchmark/CollationBenchmark.scala | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ccb0eb699f7c [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf ccb0eb699f7c is described below commit ccb0eb699f7c54aa3902d1ebbb34684693b563de Author: Cheng Pan AuthorDate: Mon Apr 29 08:35:13 2024 -0700 [SPARK-48038][K8S] Promote driverServiceName to KubernetesDriverConf ### What changes were proposed in this pull request? Promote `driverServiceName` from `DriverServiceFeatureStep` to `KubernetesDriverConf`. ### Why are the changes needed? To allow other feature steps, e.g. ingress(proposed in SPARK-47954), to access `driverServiceName`. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? UT has been updated. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46276 from pan3793/SPARK-48038. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- .../apache/spark/deploy/k8s/KubernetesConf.scala | 22 +++--- .../k8s/features/DriverServiceFeatureStep.scala| 14 ++ .../spark/deploy/k8s/KubernetesTestConf.scala | 6 -- .../features/DriverServiceFeatureStepSuite.scala | 17 + 4 files changed, 34 insertions(+), 25 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala index b55f9317d10b..fda772b737fe 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala @@ -24,12 +24,13 @@ import org.apache.commons.lang3.StringUtils import org.apache.spark.{SPARK_VERSION, SparkConf} import org.apache.spark.deploy.k8s.Config._ import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.features.DriverServiceFeatureStep._ import org.apache.spark.deploy.k8s.submit._ import org.apache.spark.internal.{Logging, MDC} import org.apache.spark.internal.LogKeys.{CONFIG, EXECUTOR_ENV_REGEX} import org.apache.spark.internal.config.ConfigEntry import org.apache.spark.resource.ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID -import org.apache.spark.util.Utils +import org.apache.spark.util.{Clock, SystemClock, Utils} /** * Structure containing metadata for Kubernetes logic to build Spark pods. @@ -83,12 +84,27 @@ private[spark] class KubernetesDriverConf( val mainAppResource: MainAppResource, val mainClass: String, val appArgs: Array[String], -val proxyUser: Option[String]) - extends KubernetesConf(sparkConf) { +val proxyUser: Option[String], +clock: Clock = new SystemClock()) + extends KubernetesConf(sparkConf) with Logging { def driverNodeSelector: Map[String, String] = KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_NODE_SELECTOR_PREFIX) + lazy val driverServiceName: String = { +val preferredServiceName = s"$resourceNamePrefix$DRIVER_SVC_POSTFIX" +if (preferredServiceName.length <= MAX_SERVICE_NAME_LENGTH) { + preferredServiceName +} else { + val randomServiceId = KubernetesUtils.uniqueID(clock) + val shorterServiceName = s"spark-$randomServiceId$DRIVER_SVC_POSTFIX" + logWarning(s"Driver's hostname would preferably be $preferredServiceName, but this is " + +s"too long (must be <= $MAX_SERVICE_NAME_LENGTH characters). 
Falling back to use " + +s"$shorterServiceName as the driver service's name.") + shorterServiceName +} + } + override val resourceNamePrefix: String = { val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None custom.getOrElse(KubernetesConf.getResourceNamePrefix(appName)) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala index cba4f442371c..9adfb2b8de49 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala @@ -20,7 +20,7 @@ import scala.jdk.CollectionConverters._ import io.fabric8.kubernetes.api.model.{HasMetadata, ServiceBuilder} -import org.apache.spark.deploy.k8s.{KubernetesDriverConf, KubernetesUtils, SparkPod} +import org
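The service-name fallback that moved into `KubernetesDriverConf` can be read in isolation. A simplified sketch, where the 63-character limit and the `-driver-svc` suffix are assumed stand-ins for `MAX_SERVICE_NAME_LENGTH` and `DRIVER_SVC_POSTFIX`:

```scala
// Simplified view of the lazy driverServiceName computation: prefer the
// prefix-based name, fall back to a random-id name when it is too long.
val MaxServiceNameLength = 63          // assumed value of MAX_SERVICE_NAME_LENGTH
val DriverSvcPostfix = "-driver-svc"   // assumed value of DRIVER_SVC_POSTFIX

def driverServiceName(resourceNamePrefix: String, randomServiceId: String): String = {
  val preferred = s"$resourceNamePrefix$DriverSvcPostfix"
  if (preferred.length <= MaxServiceNameLength) preferred
  else s"spark-$randomServiceId$DriverSvcPostfix"
}

driverServiceName("spark-pi-abc123", "1714400000000")
```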
(spark) branch master updated: [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff0751a56f01 [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page ff0751a56f01 is described below commit ff0751a56f010a6bf8a9ae86ddf0868bee615848 Author: Hyukjin Kwon AuthorDate: Sun Apr 28 22:34:30 2024 -0700 [MINOR][DOCS] Remove space in the middle of configuration name in Arrow-optimized Python UDF page ### What changes were proposed in this pull request? This PR removes a space in the middle of configuration name in Arrow-optimized Python UDF page. ![Screenshot 2024-04-29 at 1 53 42 PM](https://github.com/apache/spark/assets/6477701/46b7c448-fb30-4838-a5ba-c8f1c23398fd) https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html#arrow-python-udfs ### Why are the changes needed? So users can copy and paste the configuration names properly. ### Does this PR introduce _any_ user-facing change? Yes it fixes the doc. ### How was this patch tested? Manually built the docs, and checked. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46274 from HyukjinKwon/fix-minor-typo. Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- python/docs/source/user_guide/sql/arrow_pandas.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/python/docs/source/user_guide/sql/arrow_pandas.rst b/python/docs/source/user_guide/sql/arrow_pandas.rst index a5dfb9aa4e52..1d6a4df60690 100644 --- a/python/docs/source/user_guide/sql/arrow_pandas.rst +++ b/python/docs/source/user_guide/sql/arrow_pandas.rst @@ -339,9 +339,9 @@ Arrow Python UDFs Arrow Python UDFs are user defined functions that are executed row-by-row, utilizing Arrow for efficient batch data transfer and serialization. To define an Arrow Python UDF, you can use the :meth:`udf` decorator or wrap the function with the :meth:`udf` method, ensuring the ``useArrow`` parameter is set to True. Additionally, you can enable Arrow -optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration ``spark.sql -.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes effect only -when ``useArrow`` is either not set or set to None. +optimization for Python UDFs throughout the entire SparkSession by setting the Spark configuration +``spark.sql.execution.pythonUDF.arrow.enabled`` to true. It's important to note that the Spark configuration takes +effect only when ``useArrow`` is either not set or set to None. The type hints for Arrow Python UDFs should be specified in the same way as for default, pickled Python UDFs. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (9a42610d5ad8 -> e1445e3f1cf5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9a42610d5ad8 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image add e1445e3f1cf5 [SPARK-48036][DOCS] Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md` No new revisions were added by this update. Summary of changes: docs/sql-ref-ansi-compliance.md | 14 ++ docs/sql-ref-identifier.md | 2 +- 2 files changed, 7 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a42610d5ad8 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image 9a42610d5ad8 is described below commit 9a42610d5ad8ae0ded92fb68c7617861cfe975e1 Author: panbingkun AuthorDate: Sun Apr 28 21:43:47 2024 -0700 [SPARK-48029][INFRA] Update the packages name removed in building the spark docker image ### What changes were proposed in this pull request? The pr aims to update the packages name removed in building the spark docker image. ### Why are the changes needed? When our default image base was switched from `ubuntu 20.04` to `ubuntu 22.04`, the unused installation package in the base image has changed, in order to eliminate some warnings in building images and free disk space more accurately, we need to correct it. Before: ``` #35 [29/31] RUN apt-get remove --purge -y '^aspnet.*' '^dotnet-.*' '^llvm-.*' 'php.*' '^mongodb-.*' snapd google-chrome-stable microsoft-edge-stable firefox azure-cli google-cloud-sdk mono-devel powershell libgl1-mesa-dri || true #35 0.489 Reading package lists... #35 0.505 Building dependency tree... #35 0.507 Reading state information... #35 0.511 E: Unable to locate package ^aspnet.* #35 0.511 E: Couldn't find any package by glob '^aspnet.*' #35 0.511 E: Couldn't find any package by regex '^aspnet.*' #35 0.511 E: Unable to locate package ^dotnet-.* #35 0.511 E: Couldn't find any package by glob '^dotnet-.*' #35 0.511 E: Couldn't find any package by regex '^dotnet-.*' #35 0.511 E: Unable to locate package ^llvm-.* #35 0.511 E: Couldn't find any package by glob '^llvm-.*' #35 0.511 E: Couldn't find any package by regex '^llvm-.*' #35 0.511 E: Unable to locate package ^mongodb-.* #35 0.511 E: Couldn't find any package by glob '^mongodb-.*' #35 0.511 EPackage 'php-crypt-gpg' is not installed, so not removed #35 0.511 Package 'php' is not installed, so not removed #35 0.511 : Couldn't find any package by regex '^mongodb-.*' #35 0.511 E: Unable to locate package snapd #35 0.511 E: Unable to locate package google-chrome-stable #35 0.511 E: Unable to locate package microsoft-edge-stable #35 0.511 E: Unable to locate package firefox #35 0.511 E: Unable to locate package azure-cli #35 0.511 E: Unable to locate package google-cloud-sdk #35 0.511 E: Unable to locate package mono-devel #35 0.511 E: Unable to locate package powershell #35 DONE 0.5s #36 [30/31] RUN apt-get autoremove --purge -y #36 0.063 Reading package lists... #36 0.079 Building dependency tree... #36 0.082 Reading state information... #36 0.088 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. #36 DONE 0.4s ``` After: ``` #38 [32/36] RUN apt-get remove --purge -y 'gfortran-11' 'humanity-icon-theme' 'nodejs-doc' || true #38 0.066 Reading package lists... #38 0.087 Building dependency tree... #38 0.089 Reading state information... 
#38 0.094 The following packages were automatically installed and are no longer required: #38 0.094 at-spi2-core bzip2-doc dbus-user-session dconf-gsettings-backend #38 0.095 dconf-service gsettings-desktop-schemas gtk-update-icon-cache #38 0.095 hicolor-icon-theme libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data #38 0.095 libatspi2.0-0 libbz2-dev libcairo-gobject2 libcolord2 libdconf1 libepoxy0 #38 0.095 libgfortran-11-dev libgtk-3-common libjs-highlight.js libllvm11 #38 0.095 libncurses-dev libncurses5-dev libphobos2-ldc-shared98 libreadline-dev #38 0.095 librsvg2-2 librsvg2-common libvte-2.91-common libwayland-client0 #38 0.095 libwayland-cursor0 libwayland-egl1 libxdamage1 libxkbcommon0 #38 0.095 session-migration tilix-common xkb-data #38 0.095 Use 'apt autoremove' to remove them. #38 0.096 The following packages will be REMOVED: #38 0.096 adwaita-icon-theme* gfortran* gfortran-11* humanity-icon-theme* libgtk-3-0* #38 0.096 libgtk-3-bin* libgtkd-3-0* libvte-2.91-0* libvted-3-0* nodejs-doc* #38 0.096 r-base-dev* tilix* ubuntu-mono* #38 0.248 0 upgraded, 0 newly installed, 13 to remove and 0 not upgraded. #38 0.248 After this operation, 99.6 MB disk space will be freed. ... (Reading database ... 70597 files and directories currently installed.) #38 0.304 Removing r-base-dev (4.1.2-1ubuntu2) ... #38 0.319 Removing gfortran (4:11.2.0-1ubuntu1) ... #38 0.340 Removing gfortran-11 (11.4.0-1ubuntu1~22.04) ... #38 0.356 Removing tilix (1.9.4-2build1) ... #38 0.377 Removing libvted-3-0:amd64 (3.10.0-1ubuntu1) ... #38 0.392 Removing libvte-2.91-0
(spark) branch master updated (3d62dd72a58f -> 8f1634e833ce)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3d62dd72a58f [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels add 8f1634e833ce [SPARK-48032][BUILD] Upgrade `commons-codec` to 1.17.0 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d62dd72a58f [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels 3d62dd72a58f is described below commit 3d62dd72a58f5a19e9a371acc09604ab9ceb9e68 Author: Xi Chen AuthorDate: Sun Apr 28 18:30:06 2024 -0700 [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels ### What changes were proposed in this pull request? Currently, only the pod annotations supports `APP_ID` and `EXECUTOR_ID` placeholders. This commit aims to add the same function to pod labels. ### Why are the changes needed? The use case is to support using customized labels for availability zone based topology pod affinity. We want to use the Spark application ID as the customized label value, to allow Spark executor pods to run in the same availability zone as Spark driver pod. Although we can use the Spark internal label `spark-app-selector` directly, this is not a good practice when using it along with YuniKorn Gang Scheduling. When Gang Scheduling is enabled, the YuniKorn placeholder pods should use the same affinity as real Spark pods. In this way, we have to add the internal `spark-app-selector` label to the placeholder pods. This is not good because the placeholder pods could be recognized as Spark pods in the monitoring system. Thus we propose supporting the `APP_ID` and `EXECUTOR_ID` placeholders in Spark pod labels as well for flexibility. ### Does this PR introduce _any_ user-facing change? No because the pattern strings are very specific. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46149 from jshmchenxi/SPARK-47730/support-app-placeholder-in-labels. 
Authored-by: Xi Chen Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/k8s/KubernetesConf.scala | 10 ++ .../org/apache/spark/deploy/k8s/KubernetesConfSuite.scala | 13 ++--- .../deploy/k8s/features/BasicDriverFeatureStepSuite.scala | 11 +++ .../spark/deploy/k8s/integrationtest/BasicTestsSuite.scala | 6 -- .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala | 6 -- 5 files changed, 31 insertions(+), 15 deletions(-) diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala index a1ef04f4e311..b55f9317d10b 100644 --- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala +++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala @@ -100,8 +100,9 @@ private[spark] class KubernetesDriverConf( SPARK_APP_ID_LABEL -> appId, SPARK_APP_NAME_LABEL -> KubernetesConf.getAppNameLabel(appName), SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE) -val driverCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( - sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX) +val driverCustomLabels = + KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_DRIVER_LABEL_PREFIX) +.map { case(k, v) => (k, Utils.substituteAppNExecIds(v, appId, "")) } presetLabels.keys.foreach { key => require( @@ -173,8 +174,9 @@ private[spark] class KubernetesExecutorConf( SPARK_ROLE_LABEL -> SPARK_POD_EXECUTOR_ROLE, SPARK_RESOURCE_PROFILE_ID_LABEL -> resourceProfileId.toString) -val executorCustomLabels = KubernetesUtils.parsePrefixedKeyValuePairs( - sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX) +val executorCustomLabels = + KubernetesUtils.parsePrefixedKeyValuePairs(sparkConf, KUBERNETES_EXECUTOR_LABEL_PREFIX) +.map { case(k, v) => (k, Utils.substituteAppNExecIds(v, appId, executorId)) } presetLabels.keys.foreach { key => require( diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala index 9963db016ad9..3c53e9b74f92 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala +++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala @@ -40,7 +40,9 @@ class KubernetesConfSuite extends SparkFunSuite { "execNodeSelectorKey2" -> "execNodeSelectorValue2") private val CUSTOM_LABELS = Map( "customLabel1Key" -> "customLabe
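As a usage illustration of the placeholder support added above, a hedged sketch built on the existing `spark.kubernetes.driver.label.[LabelName]` / `spark.kubernetes.executor.label.[LabelName]` configuration keys; the label names and cluster details are invented for the example, and the `{{APP_ID}}` / `{{EXECUTOR_ID}}` spelling assumes the same placeholder convention already used for pod annotations.

```
from pyspark.sql import SparkSession

# Hedged sketch: Spark is expected to substitute {{APP_ID}} / {{EXECUTOR_ID}}
# in these label values when it creates the driver and executor pods.
spark = (
    SparkSession.builder
    .master("k8s://https://example-apiserver:6443")                   # placeholder API server
    .config("spark.kubernetes.container.image", "example/spark:dev")  # placeholder image
    .config("spark.kubernetes.driver.label.topology-group", "{{APP_ID}}")
    .config("spark.kubernetes.executor.label.topology-group", "{{APP_ID}}")
    .config("spark.kubernetes.executor.label.exec-id", "{{EXECUTOR_ID}}")
    .getOrCreate()
)
```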
(spark) branch master updated: [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 64d321926bbc [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` 64d321926bbc is described below commit 64d321926bbcede05d1c145405d503b3431f185b Author: panbingkun AuthorDate: Sat Apr 27 17:38:55 2024 -0700 [SPARK-48021][ML][BUILD] Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` ### What changes were proposed in this pull request? The pr aims to: - add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions` - remove `jdk.incubator.foreign` and `-Dforeign.restricted=warn` from `SparkBuild.scala` ### Why are the changes needed? 1.`jdk.incubator.vector` First introduction: https://github.com/apache/spark/pull/30810 https://github.com/apache/spark/pull/30810/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e https://github.com/apache/spark/assets/15246973/6ac7919a-5d82-475c-b8a2-7d9de71acacc;> Why should we add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`, Because when we only add `--add-modules=jdk.incubator.vector` to `SparkBuild.scala`, it will only take effect when compiling, as follows: ``` build/sbt "mllib-local/Test/runMain org.apache.spark.ml.linalg.BLASBenchmark" ... ``` https://github.com/apache/spark/assets/15246973/54d5f55f-cefe-4126-b255-69488f8699a6;> However, when we use `spark-submit`, it is as follows: ``` ./bin/spark-submit --class org.apache.spark.ml.linalg.BLASBenchmark /Users/panbingkun/Developer/spark/spark-community/mllib-local/target/scala-2.13/spark-mllib-local_2.13-4.0.0-SNAPSHOT-tests.jar ``` https://github.com/apache/spark/assets/15246973/8e02fa93-fef4-4cdc-96bd-908b3e9baea1;> Obviously, `--add-modules=jdk.incubator.vector` does not take effect in the `Spark runtime`, so I propose adding `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`(`Spark runtime options`) so that we can improve `performance` by using `hardware-accelerated BLAS operations` by default. 
After this patch(add `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`), as follows: https://github.com/apache/spark/assets/15246973/da7aa494-0d3c-4c60-9991-e7cd29a1cec5;> 2.`jdk.incubator.foreign` and `-Dforeign.restricted=warn` A.First introduction: https://github.com/apache/spark/pull/32253 https://github.com/apache/spark/pull/32253/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e https://github.com/apache/spark/assets/15246973/3f526019-c389-4e60-ab2a-f8e99cfb;> Use `dev.ludovic.netlib:blas:1.3.2`, the class `ForeignLinkerBLAS` uses `jdk.incubator.foreign.*` in this version, so we need to add `jdk.incubator.foreign` and `-Dforeign.restricted=warn` to `SparkBuild.scala` https://github.com/apache/spark/pull/32253/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8 https://github.com/apache/spark/assets/15246973/4fd35e96-0da2-4456-a3f6-6b57ad2e9b64;> https://github.com/luhenry/netlib/blob/v1.3.2/blas/src/main/java/dev/ludovic/netlib/blas/ForeignLinkerBLAS.java#L36 https://github.com/apache/spark/assets/15246973/4b7e3bd1-4650-4c7d-bdb4-c1761d48d478;> However, with the iterative development of `dev.ludovic.netlib`, `ForeignLinkerBLAS` has experienced one `major` change, as following: https://github.com/luhenry/netlib/commit/48e923c3e5e84560139eb25b3c9df9873c05e41d https://github.com/apache/spark/assets/15246973/7ba30b19-00c7-4cc4-bea7-a6ab4b326ad8;> From now on (V3.0.0), `jdk.incubator.foreign.*` will not be used in `dev.ludovic.netlib` Currently, Spark has used the `dev.ludovic.netlib` of version `v3.0.3`. In this version, `ForeignLinkerBLAS` has be removed. https://github.com/apache/spark/blob/master/pom.xml#L191 Double check (`jdk.incubator.foreign` cannot be found in the `netlib` source code): https://github.com/apache/spark/assets/15246973/5c6c6d73-6a5d-427a-9fb4-f626f02335ca;> So we can completely remove options `jdk.incubator.foreign` and `-Dforeign.restricted=warn`. B.For JDK 21 (PS: This is to explain the historical reasons for the differences between the current code logic and the initial ones) (Just because `Spark` made changes to support `JDK 21`) https://issues.apache.org/jira/browse/SPARK-44088 https://github.com/apache/spark/assets/15246973/34e7e7e8-4e72-470e-abc0-d79406ad25e5;> ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Manually test - Pass GA. ### Was this patch authored or
(spark) branch master updated: [SPARK-47408][SQL] Fix mathExpressions that use StringType
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b623601910a3 [SPARK-47408][SQL] Fix mathExpressions that use StringType b623601910a3 is described below commit b623601910a37c863edac56d18e79a44b93c5b36 Author: Mihailo Milosevic AuthorDate: Fri Apr 26 19:48:27 2024 -0700 [SPARK-47408][SQL] Fix mathExpressions that use StringType ### What changes were proposed in this pull request? Support more functions that use strings with collations. ### Why are the changes needed? Hex, Unhex, Conv are widely used and need to be enabled wih collations ### Does this PR introduce _any_ user-facing change? Yes, enabled more functions. ### How was this patch tested? With new tests in `CollationSQLExpressionsSuite.scala`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46227 from mihailom-db/SPARK-47408. Lead-authored-by: Mihailo Milosevic Co-authored-by: Uros Bojanic <157381213+uros...@users.noreply.github.com> Signed-off-by: Dongjoon Hyun --- .../sql/catalyst/expressions/mathExpressions.scala | 21 ++-- .../catalyst/expressions/stringExpressions.scala | 2 +- .../spark/sql/CollationSQLExpressionsSuite.scala | 124 + 3 files changed, 138 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala index 0c09e9be12e9..dc50c18f2ebb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala @@ -30,6 +30,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen.Block._ import org.apache.spark.sql.catalyst.util.{MathUtils, NumberConverter, TypeUtils} import org.apache.spark.sql.errors.{QueryCompilationErrors, QueryExecutionErrors} import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.internal.types.StringTypeAnyCollation import org.apache.spark.sql.types._ import org.apache.spark.unsafe.types.UTF8String @@ -450,8 +451,9 @@ case class Conv( override def first: Expression = numExpr override def second: Expression = fromBaseExpr override def third: Expression = toBaseExpr - override def inputTypes: Seq[AbstractDataType] = Seq(StringType, IntegerType, IntegerType) - override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = +Seq(StringTypeAnyCollation, IntegerType, IntegerType) + override def dataType: DataType = first.dataType override def nullable: Boolean = true override def nullSafeEval(num: Any, fromBase: Any, toBase: Any): Any = { @@ -1002,7 +1004,7 @@ case class Bin(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant with Serializable { override def inputTypes: Seq[DataType] = Seq(LongType) - override def dataType: DataType = StringType + override def dataType: DataType = SQLConf.get.defaultStringType protected override def nullSafeEval(input: Any): Any = UTF8String.fromString(jl.Long.toBinaryString(input.asInstanceOf[Long])) @@ -1108,21 +1110,24 @@ case class Hex(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant { override def inputTypes: Seq[AbstractDataType] = -Seq(TypeCollection(LongType, BinaryType, StringType)) 
+Seq(TypeCollection(LongType, BinaryType, StringTypeAnyCollation)) - override def dataType: DataType = StringType + override def dataType: DataType = child.dataType match { +case st: StringType => st +case _ => SQLConf.get.defaultStringType + } protected override def nullSafeEval(num: Any): Any = child.dataType match { case LongType => Hex.hex(num.asInstanceOf[Long]) case BinaryType => Hex.hex(num.asInstanceOf[Array[Byte]]) -case StringType => Hex.hex(num.asInstanceOf[UTF8String].getBytes) +case _: StringType => Hex.hex(num.asInstanceOf[UTF8String].getBytes) } override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, (c) => { val hex = Hex.getClass.getName.stripSuffix("$") s"${ev.value} = " + (child.dataType match { -case StringType => s"""$hex.hex($c.getBytes());""" +case _: StringType => s"""$hex.hex($c.getBytes());""" case _ => s"""$hex.hex($c);""" }) }) @@ -1149,7 +1154,7 @@ case class Unhex(child: Expression, failOnError: Boolean
(spark-kubernetes-operator) branch main updated: [SPARK-48015] Update `build.gradle` to fix deprecation warnings
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 167047a [SPARK-48015] Update `build.gradle` to fix deprecation warnings 167047a is described below commit 167047abed12ea8e6d709dbb3c6c326330d5787e Author: Dongjoon Hyun AuthorDate: Fri Apr 26 14:58:08 2024 -0700 [SPARK-48015] Update `build.gradle` to fix deprecation warnings ### What changes were proposed in this pull request? This PR aims to update `build.gradle` to fix deprecation warnings. ### Why are the changes needed? **AFTER** ``` $ ./gradlew build --warning-mode all > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 331ms 16 actionable tasks: 16 up-to-date ``` **BEFORE** ``` $ ./gradlew build --warning-mode all > Configure project : Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 20 The org.gradle.api.plugins.JavaPluginConvention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_8.html#java_convention_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:20) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr.run(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:16) (Run with --stacktrace to get the full stack trace of this deprecation warning.) Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 21 The org.gradle.api.plugins.JavaPluginConvention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_8.html#java_convention_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:21) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr.run(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:16) (Run with --stacktrace to get the full stack trace of this deprecation warning.) Build file '/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle': line 25 The RepositoryHandler.jcenter() method has been deprecated. This is scheduled to be removed in Gradle 9.0. JFrog announced JCenter's sunset in February 2021. Use mavenCentral() instead. Consult the upgrading guide for further information: https://docs.gradle.org/8.7/userguide/upgrading_version_6.html#jcenter_deprecation at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1$_closure2.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:25) (Run with --stacktrace to get the full stack trace of this deprecation warning.) at build_1ab30mf3g41rlj3ezxkowdftr$_run_closure1.doCall$original(/Users/dongjoon/APACHE/spark-kubernetes-operator/build.gradle:23) (Run with --stacktrace to get the full stack trace of this deprecation warning.) > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 353ms 16 actionable tasks: 16 up-to-date ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? 
Manually build with `--warning-mode all`. ``` $ ./gradlew build --warning-mode all > Configure project :spark-operator-api Updating PrinterColumns for generated CRD BUILD SUCCESSFUL in 331ms 16 actionable tasks: 16 up-to-date ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #9 from dongjoon-hyun/SPARK-48015. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- build.gradle | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/build.gradle b/build.gradle index ed54f7b..a6c1701 100644 --- a/build.gradle +++ b/build.gradle @@ -17,12 +17,14 @@ subprojects { apply plugin: 'idea' apply plugin: 'eclipse' apply plugin: 'java' - sourceCompatibility = 17 - targetCompatibility = 17 + + java { +sourceCompatibility = 17 +targetCompatibility = 17 + } repositories { mavenCentral() -jcenter() } apply plugin:
(spark-kubernetes-operator) branch main updated: [SPARK-47950] Add Java API Module for Spark Operator
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 28ff3e0 [SPARK-47950] Add Java API Module for Spark Operator 28ff3e0 is described below commit 28ff3e069e80bffa2a3be69fc4905ad3a0f76fd5 Author: zhou-jiang AuthorDate: Fri Apr 26 14:18:09 2024 -0700 [SPARK-47950] Add Java API Module for Spark Operator ### What changes were proposed in this pull request? This PR adds Java API library for Spark Operator, with the ability to generate yaml spec. ### Why are the changes needed? Spark Operator API refers to the CustomResourceDefinition(https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) that represents the spec for Spark Application in k8s. This module would be used by operator controller and reconciler. It can also serve external services that access k8s server with Java library. ### Does this PR introduce _any_ user-facing change? No API changes in Apache Spark core API. Spark Operator API is proposed. To view generate SparkApplication spec yaml, use ``` ./gradlew :spark-operator-api:finalizeGeneratedCRD ``` (this requires yq to be installed for patching additional printer columns) Generated yaml file would be located at ``` spark-operator-api/build/classes/java/main/META-INF/fabric8/sparkapplications.org.apache.spark-v1.yml ``` For more details, please also refer `spark-operator-docs/spark_application.md` ### How was this patch tested? This is tested locally. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #8 from jiangzho/api. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml| 1 + build.gradle | 2 + dev/.rat-excludes | 2 + gradle.properties | 16 ++ settings.gradle| 2 + spark-operator-api/build.gradle| 32 .../apache/spark/k8s/operator/BaseResource.java| 36 + .../org/apache/spark/k8s/operator/Constants.java | 82 ++ .../spark/k8s/operator/SparkApplication.java | 57 +++ .../spark/k8s/operator/SparkApplicationList.java | 26 +++ .../k8s/operator/decorators/ResourceDecorator.java | 26 +++ .../apache/spark/k8s/operator/diff/Diffable.java | 22 +++ .../spark/k8s/operator/spec/ApplicationSpec.java | 57 +++ .../operator/spec/ApplicationTimeoutConfig.java| 66 .../k8s/operator/spec/ApplicationTolerations.java | 45 ++ .../operator/spec/BaseApplicationTemplateSpec.java | 38 + .../apache/spark/k8s/operator/spec/BaseSpec.java | 36 + .../spark/k8s/operator/spec/DeploymentMode.java| 25 +++ .../spark/k8s/operator/spec/InstanceConfig.java| 68 .../k8s/operator/spec/ResourceRetainPolicy.java| 39 + .../spark/k8s/operator/spec/RestartConfig.java | 39 + .../spark/k8s/operator/spec/RestartPolicy.java | 39 + .../spark/k8s/operator/spec/RuntimeVersions.java | 40 + .../operator/status/ApplicationAttemptSummary.java | 53 ++ .../k8s/operator/status/ApplicationState.java | 50 ++ .../operator/status/ApplicationStateSummary.java | 151 + .../k8s/operator/status/ApplicationStatus.java | 170 .../spark/k8s/operator/status/AttemptInfo.java | 44 + .../k8s/operator/status/BaseAttemptSummary.java| 37 + .../spark/k8s/operator/status/BaseState.java | 37 + .../k8s/operator/status/BaseStateSummary.java | 29 .../spark/k8s/operator/status/BaseStatus.java | 64 .../spark/k8s/operator/utils/ModelUtils.java | 110 + .../src/main/resources/printer-columns.sh | 14 +- .../k8s/operator/spec/ApplicationSpecTest.java | 42 + 
.../spark/k8s/operator/spec/RestartPolicyTest.java | 62 +++ .../k8s/operator/status/ApplicationStatusTest.java | 178 + .../spark/k8s/operator/utils/ModelUtilsTest.java | 124 ++ 38 files changed, 1956 insertions(+), 5 deletions(-) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index 26ac0c1..d1d65e2 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -16,5 +16,6 @@ header: - '.asf.yaml' - '**/*.gradle' - gradlew +- 'build/**' comment: on-failure diff --git a/build.gradle b/build.gradle index f64212b..ed54f7b 100644 --- a/build.gradle +++ b/build.gradle @@ -72,6 +72,8 @@ subprojects
(spark) branch master updated: [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2b2a33cc35a8 [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances 2b2a33cc35a8 is described below commit 2b2a33cc35a880fafc569c707674313a56c15811 Author: Gengliang Wang AuthorDate: Fri Apr 26 13:25:15 2024 -0700 [SPARK-48011][CORE] Store LogKey name as a value to avoid generating new string instances ### What changes were proposed in this pull request? Store LogKey name as a value to avoid generating new string instances ### Why are the changes needed? To save memory usage on getting the names of `LogKey`s. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #46249 from gengliangwang/addKeyName. Authored-by: Gengliang Wang Signed-off-by: Dongjoon Hyun --- common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala | 6 +- common/utils/src/main/scala/org/apache/spark/internal/Logging.scala | 4 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala index 04990ddc4c9d..2ca80a496ccb 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala @@ -16,10 +16,14 @@ */ package org.apache.spark.internal +import java.util.Locale + /** * All structured logging `keys` used in `MDC` must be extends `LogKey` */ -trait LogKey +trait LogKey { + val name: String = this.toString.toLowerCase(Locale.ROOT) +} /** * Various keys used for mapped diagnostic contexts(MDC) in logging. diff --git a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala index 085b22bee5f3..24a60f88c24a 100644 --- a/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala +++ b/common/utils/src/main/scala/org/apache/spark/internal/Logging.scala @@ -17,8 +17,6 @@ package org.apache.spark.internal -import java.util.Locale - import scala.jdk.CollectionConverters._ import org.apache.logging.log4j.{CloseableThreadContext, Level, LogManager} @@ -110,7 +108,7 @@ trait Logging { val value = if (mdc.value != null) mdc.value.toString else null sb.append(value) if (Logging.isStructuredLoggingEnabled) { - context.put(mdc.key.toString.toLowerCase(Locale.ROOT), value) + context.put(mdc.key.name, value) } if (processedParts.hasNext) { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6098bd944f66 [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression 6098bd944f66 is described below commit 6098bd944f6603546601a9d5b5da5f756ce2257c Author: Nikhil Sheoran <125331115+nikhilsheoran...@users.noreply.github.com> AuthorDate: Fri Apr 26 11:23:12 2024 -0700 [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression ### What changes were proposed in this pull request? - This PR instead of calling `conf.resolver` for each call in `resolveExpression`, reuses the `resolver` obtained once. ### Why are the changes needed? - Consider a view with large number of columns (~1000s). When looking at the RuleExecutor metrics and flamegraph for a query that only does `DESCRIBE SELECT * FROM large_view`, observed that a large fraction of time is spent in `ResolveReferences` and `ResolveRelations`. Of these, the majority of the driver time went in initializing the `conf` to obtain `conf.resolver` for each of the column in the view. - Since, the same `conf` is used in each of these calls, calling the `conf.resolver` again and again can be avoided by initializing it once and reusing the same resolver. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Created a dummy view with 3000 columns. - Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()`. - `RuleExecutor` metrics before this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 8.026801698 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 4060159342 / 4062186814 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 3789405037 / 3809203288 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 207411640 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 17800584 / 19431350 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 15036018 / 15060440 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 149298100 / 7 ``` - `RuleExecutor` metrics after this change (after multiple runs) ``` === Metrics of Analyzer/Optimizer Rules === Total number of runs: 1483 Total time: 2.892630859 seconds Rule Effective Time / Total Time Effective Runs / Total Runs org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 1490357745 / 1492398446 1 / 6 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 1212205822 / 1241729981 2 / 6 org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule 0 / 238571610 / 6 org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 16603250 / 18806065 1 / 6 org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 167493060 / 7 org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 11158299 / 11183593 1 / 6 ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #46248 from nikhilsheoran-db/SPARK-48010. 
Authored-by: Nikhil Sheoran <125331115+nikhilsheoran...@users.noreply.github.com> Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/analysis/ColumnResolutionHelper.scala | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala index 6e27192ead32..c10e000a098c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala +++ b/sql/
(spark) branch master updated: [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 78b19d5af08e [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` 78b19d5af08e is described below commit 78b19d5af08ea772eaea9c13b7b984a13294 Author: Ruifeng Zheng AuthorDate: Fri Apr 26 09:58:54 2024 -0700 [SPARK-48005][PS][CONNECT][TESTS] Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup` ### What changes were proposed in this pull request? Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup` ### Why are the changes needed? this test requires `sc` access, can be enabled in `Spark Connect with JVM` mode ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci, also manually test: ``` python/run-tests -k --python-executables python3 --testnames 'pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup' Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log Will test against the following Python executables: ['python3'] Will test the following Python tests: ['pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup'] python3 python_implementation is CPython python3 version is: Python 3.12.2 Starting test(python3): pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/ccd3da45-f774-4f5f-8283-a91a8ee12212/python3__pyspark.pandas.tests.connect.indexes.test_parity_default_DefaultIndexParityTests.test_index_distributed_sequence_cleanup__p9yved3e.log) Finished test(python3): pyspark.pandas.tests.connect.indexes.test_parity_default DefaultIndexParityTests.test_index_distributed_sequence_cleanup (16s) Tests passed in 16 seconds ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46242 from zhengruifeng/enable_test_index_distributed_sequence_cleanup. 
Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- .../pyspark/pandas/tests/connect/indexes/test_parity_default.py | 3 ++- python/pyspark/pandas/tests/indexes/test_default.py | 8 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py b/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py index d6f0cadbf0cd..4240eb8fdbc8 100644 --- a/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py +++ b/python/pyspark/pandas/tests/connect/indexes/test_parity_default.py @@ -19,6 +19,7 @@ import unittest from pyspark.pandas.tests.indexes.test_default import DefaultIndexTestsMixin from pyspark.testing.connectutils import ReusedConnectTestCase from pyspark.testing.pandasutils import PandasOnSparkTestUtils +from pyspark.util import is_remote_only class DefaultIndexParityTests( @@ -26,7 +27,7 @@ class DefaultIndexParityTests( PandasOnSparkTestUtils, ReusedConnectTestCase, ): -@unittest.skip("Test depends on SparkContext which is not supported from Spark Connect.") +@unittest.skipIf(is_remote_only(), "Requires JVM access") def test_index_distributed_sequence_cleanup(self): super().test_index_distributed_sequence_cleanup() diff --git a/python/pyspark/pandas/tests/indexes/test_default.py b/python/pyspark/pandas/tests/indexes/test_default.py index 3d19eb407b42..5cd9fae76dfb 100644 --- a/python/pyspark/pandas/tests/indexes/test_default.py +++ b/python/pyspark/pandas/tests/indexes/test_default.py @@ -44,7 +44,7 @@ class DefaultIndexTestsMixin: "compute.default_index_type", "distributed-sequence" ), ps.option_context("compute.ops_on_diff_frames", True): with ps.option_context("compute.default_index_cache", "LOCAL_CHECKPOINT"): -cached_rdd_ids = [rdd_id for rdd_id in self.spark._jsc.getPersistentRDDs()] +cached_rdd_ids = [rdd_id for rdd_id in self._legacy_sc._jsc.getPersistentRDDs()] psdf1 = ( self.spark.range(0, 100, 1, 10).withColumn("Key", F.col("id") % 33).pandas_api() @@ -61,13 +61,13 @@ class DefaultIndexTestsMixin: self.assertTrue( any( rdd_id not in cached_rdd_ids -for rdd_id in self.spark._jsc.getPers
(spark) branch master updated: [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4ee528f9b29f [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11` 4ee528f9b29f is described below commit 4ee528f9b29f5cd52b70b27a4b8c250c8ca1a17c Author: Kent Yao AuthorDate: Fri Apr 26 08:08:57 2024 -0700 [SPARK-48007][BUILD][TESTS] Upgrade `mssql.jdbc` to `12.6.1.jre11` ### What changes were proposed in this pull request? This PR upgrades mssql.jdbc.version to 12.6.1.jre11, https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc. ### Why are the changes needed? test dependency management ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46244 from yaooqinn/SPARK-48007. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala | 3 ++- pom.xml| 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala index b351b2ad1ec7..61530f713eb8 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala @@ -28,5 +28,6 @@ class MsSQLServerDatabaseOnDocker extends DatabaseOnDocker { override val jdbcPort: Int = 1433 override def getJdbcUrl(ip: String, port: Int): String = -s"jdbc:sqlserver://$ip:$port;user=sa;password=Sapass123;" +s"jdbc:sqlserver://$ip:$port;user=sa;password=Sapass123;" + + "encrypt=true;trustServerCertificate=true" } diff --git a/pom.xml b/pom.xml index 9c8f8fbb2ab0..b916659fdbfa 100644 --- a/pom.xml +++ b/pom.xml @@ -325,7 +325,7 @@ 8.3.0 42.7.3 11.5.9.0 -9.4.1.jre8 +12.6.1.jre11 23.3.0.23.09 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea4b7a242910 [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions ea4b7a242910 is described below commit ea4b7a2429106067eb30b6b47bf7c42059053d31 Author: beliefer AuthorDate: Thu Apr 25 20:54:27 2024 -0700 [SPARK-47991][SQL][TEST] Arrange the test cases for window frames and window functions ### What changes were proposed in this pull request? This PR propose to arrange the test cases for window frames and window functions. ### Why are the changes needed? Currently, `DataFrameWindowFramesSuite` and `DataFrameWindowFunctionsSuite` have different testing objectives. The comments for the above two classes are as follows: `DataFrameWindowFramesSuite` is `Window frame testing for DataFrame API.` `DataFrameWindowFunctionsSuite` is `Window function testing for DataFrame API.` But there are some test cases for window frame placed into `DataFrameWindowFunctionsSuite`. ### Does this PR introduce _any_ user-facing change? 'No'. Just arrange the test cases for window frames and window functions. ### How was this patch tested? GA ### Was this patch authored or co-authored using generative AI tooling? 'No'. Closes #46226 from beliefer/SPARK-47991. Authored-by: beliefer Signed-off-by: Dongjoon Hyun --- .../spark/sql/DataFrameWindowFramesSuite.scala | 48 ++ .../spark/sql/DataFrameWindowFunctionsSuite.scala | 48 -- 2 files changed, 48 insertions(+), 48 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala index fe1393af8174..95f4cc78d156 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFramesSuite.scala @@ -32,6 +32,28 @@ import org.apache.spark.sql.types.CalendarIntervalType class DataFrameWindowFramesSuite extends QueryTest with SharedSparkSession { import testImplicits._ + test("reuse window partitionBy") { +val df = Seq((1, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") +val w = Window.partitionBy("key").orderBy("value") + +checkAnswer( + df.select( +lead("key", 1).over(w), +lead("value", 1).over(w)), + Row(1, "1") :: Row(2, "2") :: Row(null, null) :: Row(null, null) :: Nil) + } + + test("reuse window orderBy") { +val df = Seq((1, "1"), (2, "2"), (1, "1"), (2, "2")).toDF("key", "value") +val w = Window.orderBy("value").partitionBy("key") + +checkAnswer( + df.select( +lead("key", 1).over(w), +lead("value", 1).over(w)), + Row(1, "1") :: Row(2, "2") :: Row(null, null) :: Row(null, null) :: Nil) + } + test("lead/lag with empty data frame") { val df = Seq.empty[(Int, String)].toDF("key", "value") val window = Window.partitionBy($"key").orderBy($"value") @@ -570,4 +592,30 @@ class DataFrameWindowFramesSuite extends QueryTest with SharedSparkSession { } } } + + test("SPARK-34227: WindowFunctionFrame should clear its states during preparation") { +// This creates a single partition dataframe with 3 records: +// "a", 0, null +// "a", 1, "x" +// "b", 0, null +val df = spark.range(0, 3, 1, 1).select( + when($"id" < 2, lit("a")).otherwise(lit("b")).as("key"), + ($"id" % 2).cast("int").as("order"), + when($"id" % 2 === 0, lit(null)).otherwise(lit("x")).as("value")) + +val window1 = 
Window.partitionBy($"key").orderBy($"order") + .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing) +val window2 = Window.partitionBy($"key").orderBy($"order") + .rowsBetween(Window.unboundedPreceding, Window.currentRow) +checkAnswer( + df.select( +$"key", +$"order", +nth_value($"value", 1, ignoreNulls = true).over(window1), +nth_value($"value", 1, ignoreNulls = true).over(window2)), + Seq( +Row("a", 0, "x", null), +Row("a", 1, "x"
(spark) branch master updated: [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 79357c8ccd22 [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect` 79357c8ccd22 is described below commit 79357c8ccd22729a074c42f700544e7e3f023a8d Author: Hyukjin Kwon AuthorDate: Thu Apr 25 14:49:21 2024 -0700 [SPARK-47933][CONNECT][PYTHON][FOLLOW-UP] Avoid referencing _to_seq in `pyspark-connect` ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46155 that removes the reference of `_to_seq` that `pyspark-connect` package does not have. ### Why are the changes needed? To recover the CI https://github.com/apache/spark/actions/runs/8821919392/job/24218893631 ### Does this PR introduce _any_ user-facing change? No, the main change has not been released out yet. ### How was this patch tested? Manually tested. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46229 from HyukjinKwon/SPARK-47933-followuptmp. Authored-by: Hyukjin Kwon Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/group.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/group.py b/python/pyspark/sql/group.py index d26e23bc7160..34c3531c8302 100644 --- a/python/pyspark/sql/group.py +++ b/python/pyspark/sql/group.py @@ -43,9 +43,9 @@ def dfapi(f: Callable[..., DataFrame]) -> Callable[..., DataFrame]: def df_varargs_api(f: Callable[..., DataFrame]) -> Callable[..., DataFrame]: -from pyspark.sql.classic.column import _to_seq - def _api(self: "GroupedData", *cols: str) -> DataFrame: +from pyspark.sql.classic.column import _to_seq + name = f.__name__ jdf = getattr(self._jgd, name)(_to_seq(self.session._sc, cols)) return DataFrame(jdf, self.session) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e1d021214c61 [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change e1d021214c61 is described below commit e1d021214c6130588e69dfa05e0391d89b463f9d Author: Kent Yao AuthorDate: Thu Apr 25 08:19:40 2024 -0700 [SPARK-45425][DOCS][FOLLOWUP] Add a migration guide for TINYINT type mapping change ### What changes were proposed in this pull request? Followup of SPARK-45425, adding migration guide. ### Why are the changes needed? migration guide ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing build ### Was this patch authored or co-authored using generative AI tooling? no Closes #46224 from yaooqinn/SPARK-45425. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-migration-guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 9b189eee6ad1..024423fb145a 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -47,6 +47,7 @@ license: | - Since Spark 4.0, MySQL JDBC datasource will read BIT(n > 1) as BinaryType, while in Spark 3.5 and previous, read as LongType. To restore the previous behavior, set `spark.sql.legacy.mysql.bitArrayMapping.enabled` to `true`. - Since Spark 4.0, MySQL JDBC datasource will write ShortType as SMALLINT, while in Spark 3.5 and previous, write as INTEGER. To restore the previous behavior, you can replace the column with IntegerType whenever before writing. - Since Spark 4.0, Oracle JDBC datasource will write TimestampType as TIMESTAMP WITH LOCAL TIME ZONE, while in Spark 3.5 and previous, write as TIMESTAMP. To restore the previous behavior, set `spark.sql.legacy.oracle.timestampMapping.enabled` to `true`. +- Since Spark 4.0, MsSQL Server JDBC datasource will read TINYINT as ShortType, while in Spark 3.5 and previous, read as IntegerType. To restore the previous behavior, set `spark.sql.legacy.mssqlserver.numericMapping.enabled` to `true`. - Since Spark 4.0, The default value for `spark.sql.legacy.ctePrecedencePolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an error, inner CTE definitions take precedence over outer definitions. - Since Spark 4.0, The default value for `spark.sql.legacy.timeParserPolicy` has been changed from `EXCEPTION` to `CORRECTED`. Instead of raising an `INCONSISTENT_BEHAVIOR_CROSS_VERSION` error, `CANNOT_PARSE_TIMESTAMP` will be raised if ANSI mode is enable. `NULL` will be returned if ANSI mode is disabled. See [Datetime Patterns for Formatting and Parsing](sql-ref-datetime-pattern.html). - Since Spark 4.0, A bug falsely allowing `!` instead of `NOT` when `!` is not a prefix operator has been fixed. Clauses such as `expr ! IN (...)`, `expr ! BETWEEN ...`, or `col ! NULL` now raise syntax errors. To restore the previous behavior, set `spark.sql.legacy.bangEqualsNot` to `true`. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
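A hedged sketch of the restore path the new migration-guide entry describes; the JDBC URL, table, and credentials are placeholders rather than values from the patch.

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Restore the pre-4.0 mapping of SQL Server TINYINT to IntegerType.
spark.conf.set("spark.sql.legacy.mssqlserver.numericMapping.enabled", "true")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=exampledb")  # placeholder URL
    .option("dbtable", "dbo.example_table")                                      # placeholder table
    .option("user", "example_user")                                              # placeholder credentials
    .option("password", "example_password")
    .load()
)
df.printSchema()  # TINYINT columns should now read back as int instead of smallint
```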
(spark) branch master updated (de5c512e0179 -> 287d02073929)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from de5c512e0179 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` add 287d02073929 [SPARK-47989][SQL] MsSQLServer: Fix the scope of spark.sql.legacy.mssqlserver.numericMapping.enabled No new revisions were added by this update. Summary of changes: .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 177 +++-- .../org/apache/spark/sql/internal/SQLConf.scala| 2 +- .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 29 ++-- 3 files changed, 104 insertions(+), 104 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new de5c512e0179 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` de5c512e0179 is described below commit de5c512e017965b5c726e254f8969fb17d5c17ea Author: Ruifeng Zheng AuthorDate: Thu Apr 25 08:16:56 2024 -0700 [SPARK-47987][PYTHON][CONNECT][TESTS] Enable `ArrowParityTests.test_createDataFrame_empty_partition` ### What changes were proposed in this pull request? Reenable `ArrowParityTests.test_createDataFrame_empty_partition` We actually already had set up Classic SparkContext `_legacy_sc ` for Spark Connect test, so only need to add `_legacy_sc` in Classic PySpark test. ### Why are the changes needed? to improve test coverage ### Does this PR introduce _any_ user-facing change? no, test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46220 from zhengruifeng/enable_test_createDataFrame_empty_partition. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/tests/connect/test_parity_arrow.py | 4 python/pyspark/sql/tests/test_arrow.py| 4 +++- python/pyspark/testing/sqlutils.py| 1 + 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/python/pyspark/sql/tests/connect/test_parity_arrow.py b/python/pyspark/sql/tests/connect/test_parity_arrow.py index 93d0b6cf0f5f..8727cc279641 100644 --- a/python/pyspark/sql/tests/connect/test_parity_arrow.py +++ b/python/pyspark/sql/tests/connect/test_parity_arrow.py @@ -24,10 +24,6 @@ from pyspark.testing.pandasutils import PandasOnSparkTestUtils class ArrowParityTests(ArrowTestsMixin, ReusedConnectTestCase, PandasOnSparkTestUtils): -@unittest.skip("Spark Connect does not support Spark Context but the test depends on that.") -def test_createDataFrame_empty_partition(self): -super().test_createDataFrame_empty_partition() - @unittest.skip("Spark Connect does not support fallback.") def test_createDataFrame_fallback_disabled(self): super().test_createDataFrame_fallback_disabled() diff --git a/python/pyspark/sql/tests/test_arrow.py b/python/pyspark/sql/tests/test_arrow.py index 5235e021bae9..03cb35feb994 100644 --- a/python/pyspark/sql/tests/test_arrow.py +++ b/python/pyspark/sql/tests/test_arrow.py @@ -56,6 +56,7 @@ from pyspark.testing.sqlutils import ( ExamplePointUDT, ) from pyspark.errors import ArithmeticException, PySparkTypeError, UnsupportedOperationException +from pyspark.util import is_remote_only if have_pandas: import pandas as pd @@ -830,7 +831,8 @@ class ArrowTestsMixin: pdf = pd.DataFrame({"c1": [1], "c2": ["string"]}) df = self.spark.createDataFrame(pdf) self.assertEqual([Row(c1=1, c2="string")], df.collect()) -self.assertGreater(self.spark.sparkContext.defaultParallelism, len(pdf)) +if not is_remote_only(): +self.assertGreater(self._legacy_sc.defaultParallelism, len(pdf)) def test_toPandas_error(self): for arrow_enabled in [True, False]: diff --git a/python/pyspark/testing/sqlutils.py b/python/pyspark/testing/sqlutils.py index 690d5c37b22e..a0fdada72972 100644 --- a/python/pyspark/testing/sqlutils.py +++ b/python/pyspark/testing/sqlutils.py @@ -258,6 +258,7 @@ class ReusedSQLTestCase(ReusedPySparkTestCase, SQLTestUtils, PySparkErrorTestUti @classmethod def setUpClass(cls): super(ReusedSQLTestCase, cls).setUpClass() +cls._legacy_sc = cls.sc cls.spark = 
SparkSession(cls.sc) cls.tempdir = tempfile.NamedTemporaryFile(delete=False) os.unlink(cls.tempdir.name) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
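For context, the change boils down to the following pattern — a minimal, illustrative sketch (not the actual suite), assuming a test base class that exposes `self.spark` and sets `self._legacy_sc` to the Classic SparkContext the way `pyspark.testing.sqlutils` now does, and using the same `is_remote_only` helper the patch imports:
```python
import pandas as pd
from pyspark.util import is_remote_only


class CreateDataFrameParallelismMixin:
    # self.spark and self._legacy_sc are assumed to be provided by the
    # concrete test base class (Classic or Connect).
    def check_empty_partition_behavior(self):
        pdf = pd.DataFrame({"c1": [1], "c2": ["string"]})
        df = self.spark.createDataFrame(pdf)
        assert df.count() == 1
        # Only Classic PySpark exposes a SparkContext; skip this assertion
        # when running against Spark Connect.
        if not is_remote_only():
            assert self._legacy_sc.defaultParallelism > len(pdf)
```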
(spark) branch master updated: [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5810554ce0fa [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3 5810554ce0fa is described below commit 5810554ce0faba4cb8e7f3ca3dd5812bd2cf179f Author: panbingkun AuthorDate: Thu Apr 25 08:10:04 2024 -0700 [SPARK-47990][BUILD] Upgrade `zstd-jni` to 1.5.6-3 ### What changes were proposed in this pull request? The pr aims to upgrade `zstd-jni` from `1.5.6-2` to `1.5.6-3`. ### Why are the changes needed? 1.This version fix a potential memory leak problem, as follows: https://github.com/apache/spark/assets/15246973/eeae3e7f-0c44-443d-838b-fa39b9e45d64;> 2.https://github.com/luben/zstd-jni/compare/v1.5.6-2...v1.5.6-3 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46225 from panbingkun/SPARK-47990. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index f6adb6d18b85..005cc7bfb435 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -278,4 +278,4 @@ xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar zookeeper-jute/3.9.2//zookeeper-jute-3.9.2.jar zookeeper/3.9.2//zookeeper-3.9.2.jar -zstd-jni/1.5.6-2//zstd-jni-1.5.6-2.jar +zstd-jni/1.5.6-3//zstd-jni-1.5.6-3.jar diff --git a/pom.xml b/pom.xml index c98514efa356..9c8f8fbb2ab0 100644 --- a/pom.xml +++ b/pom.xml @@ -800,7 +800,7 @@ com.github.luben zstd-jni -1.5.6-2 +1.5.6-3 com.clearspring.analytics - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0fcced63be99 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests 0fcced63be99 is described below commit 0fcced63be99302593591d29370c00e7c0d73cec Author: Dongjoon Hyun AuthorDate: Wed Apr 24 18:57:29 2024 -0700 [SPARK-47979][SQL][TESTS] Use Hive tables explicitly for Hive table capability tests ### What changes were proposed in this pull request? This PR aims to use `Hive` tables explicitly for Hive table capability tests in `hive` and `hive-thriftserver` module. ### Why are the changes needed? To make Hive test coverage robust by making it independent from Apache Spark configuration changes. ### Does this PR introduce _any_ user-facing change? No, this is a test only change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46211 from dongjoon-hyun/SPARK-47979. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala | 2 +- .../scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala | 1 + .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala | 9 +++-- .../org/apache/spark/sql/hive/execution/HiveQuerySuite.scala | 6 +++--- .../spark/sql/hive/execution/command/ShowCreateTableSuite.scala | 4 5 files changed, 12 insertions(+), 10 deletions(-) diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala index b552611b75d1..2b2cbec41d64 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala @@ -108,7 +108,7 @@ class UISeleniumSuite val baseURL = s"http://$localhost:$uiPort; val queries = Seq( -"CREATE TABLE test_map(key INT, value STRING)", +"CREATE TABLE test_map (key INT, value STRING) USING HIVE", s"LOAD DATA LOCAL INPATH '${TestData.smallKv}' OVERWRITE INTO TABLE test_map") queries.foreach(statement.execute) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala index 0bc288501a01..b60adfb6f4cf 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala @@ -686,6 +686,7 @@ class HiveClientSuite(version: String) extends HiveVersionSuite(version) { versionSpark.sql( s""" |CREATE TABLE tab(c1 string) + |USING HIVE |location '${tmpDir.toURI.toString}' """.stripMargin) diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala index 241fdd4b9ec5..965db22b78f1 100644 --- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala +++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala @@ -216,7 +216,7 @@ class HiveDDLSuite test("SPARK-22431: alter table tests with nested types") { withTable("t1", "t2", "t3") { - spark.sql("CREATE TABLE t1 (q STRUCT, i1 INT)") + 
spark.sql("CREATE TABLE t1 (q STRUCT, i1 INT) USING HIVE") spark.sql("ALTER TABLE t1 ADD COLUMNS (newcol1 STRUCT<`col1`:STRING, col2:Int>)") val newcol = spark.sql("SELECT * FROM t1").schema.fields(2).name assert("newcol1".equals(newcol)) @@ -2614,7 +2614,7 @@ class HiveDDLSuite "msg" -> "java.lang.UnsupportedOperationException: Unknown field type: void") ) - sql("CREATE TABLE t3 AS SELECT NULL AS null_col") + sql("CREATE TABLE t3 USING HIVE AS SELECT NULL AS null_col") checkAnswer(sql("SELECT * FROM t3"), Row(null)) } @@ -2642,9 +2642,6 @@ class HiveDDLSuite sql("CREATE TABLE t3 (v VOID) USING hive") checkAnswer(sql("SELECT * FROM t3"), Seq.empty) - - sql("CREATE TABLE t4 (v VOID)") - checkAnswer(sql("SELECT * FROM t4"), Seq.empty) } // Create table with void t
(spark) branch branch-3.5 updated: [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new ce19bfc10682 [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization ce19bfc10682 is described below commit ce19bfc1068229897454c5f5cb78aeb435821bd2 Author: Bruce Robbins AuthorDate: Wed Apr 24 09:48:21 2024 -0700 [SPARK-47633][SQL][3.5] Include right-side plan output in `LateralJoin#allAttributes` for more consistent canonicalization This is a backport of #45763 to branch-3.5. ### What changes were proposed in this pull request? Modify `LateralJoin` to include right-side plan output in `allAttributes`. ### Why are the changes needed? In the following example, the view v1 is cached, but a query of v1 does not use the cache: ``` CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2); CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2); create or replace temp view v1 as select * from t1 join lateral ( select c1 as a, c2 as b from t2) on c1 = a; cache table v1; explain select * from v1; == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false :- LocalTableScan [c1#180, c2#181] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=113] +- LocalTableScan [a#173, b#174] ``` The canonicalized version of the `LateralJoin` node is not consistent when there is a join condition. For example, for the above query, the join condition is canonicalized as follows: ``` Before canonicalization: Some((c1#174 = a#167)) After canonicalization: Some((none#0 = none#167)) ``` You can see that the `exprId` for the second operand of `EqualTo` is not normalized (it remains 167). That's because the attribute `a` from the right-side plan is not included `allAttributes`. This PR adds right-side attributes to `allAttributes` so that references to right-side attributes in the join condition are normalized during canonicalization. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46190 from bersprockets/lj_canonical_issue_35. 
Authored-by: Bruce Robbins Signed-off-by: Dongjoon Hyun --- .../plans/logical/basicLogicalOperators.scala | 2 ++ .../scala/org/apache/spark/sql/CachedTableSuite.scala | 19 +++ 2 files changed, 21 insertions(+) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 58c03ee72d6d..ca2c6a850561 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -2017,6 +2017,8 @@ case class LateralJoin( joinType: JoinType, condition: Option[Expression]) extends UnaryNode { + override lazy val allAttributes: AttributeSeq = left.output ++ right.plan.output + require(Seq(Inner, LeftOuter, Cross).contains(joinType), s"Unsupported lateral join type $joinType") diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala index 8331a3c10fc9..9815cb816c99 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala @@ -1710,4 +1710,23 @@ class CachedTableSuite extends QueryTest with SQLTestUtils } } } + + test("SPARK-47633: Cache hit for lateral join with join condition") { +withTempView("t", "q1") { + sql("create or replace temp view t(c1, c2) as values (0, 1), (1, 2)") + val query = """select * +|from t +|join lateral ( +| select c1 as a, c2 as b +| from t) +|on c1 = a; +|""".stripMargin + sql(s"cache table q1 as $query") + val df = sql(query) + checkAnswer(df, +Row(0, 1, 0, 1) :: Row(1, 2, 1, 2) :: Nil) + assert(getNumInMemoryRelations(df) == 1) +} + + } } - To
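The scenario from the description can also be reproduced end-to-end from PySpark; the sketch below is an illustration (not part of the patch) that caches the lateral-join view and then inspects the plan, which should reuse the cache once the canonicalization fix is in place:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE OR REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2)")
spark.sql("CREATE OR REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2)")
spark.sql("""
    CREATE OR REPLACE TEMP VIEW v1 AS
    SELECT * FROM t1
    JOIN LATERAL (SELECT c1 AS a, c2 AS b FROM t2) ON c1 = a
""")
spark.sql("CACHE TABLE v1")

# With the fix, the physical plan shows an in-memory table scan instead of
# re-planning the lateral join, matching the expectation in the new test.
spark.sql("SELECT * FROM v1").explain()
```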
(spark) branch master updated (09ed09cb18e7 -> 03d4ea6a707c)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 09ed09cb18e7 [SPARK-47958][TESTS] Change LocalSchedulerBackend to notify scheduler of executor on start add 03d4ea6a707c [SPARK-47974][BUILD] Remove `install_scala` from `build/mvn` No new revisions were added by this update. Summary of changes: .github/workflows/benchmark.yml| 6 ++ .github/workflows/build_and_test.yml | 24 .github/workflows/build_python_connect.yml | 3 +-- .github/workflows/maven_test.yml | 3 +-- build/mvn | 24 5 files changed, 12 insertions(+), 48 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cb1e1f5cd49a [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic cb1e1f5cd49a is described below commit cb1e1f5cd49a612c0c081949759c1f931883c263 Author: Ruifeng Zheng AuthorDate: Tue Apr 23 23:09:10 2024 -0700 [SPARK-47969][PYTHON][TESTS] Make `test_creation_index` deterministic ### What changes were proposed in this pull request? Make `test_creation_index` deterministic ### Why are the changes needed? it may fail in some env ``` FAIL [16.261s]: test_creation_index (pyspark.pandas.tests.frame.test_constructor.FrameConstructorTests.test_creation_index) -- Traceback (most recent call last): File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in _assert_pandas_equal assert_frame_equal( File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 1257, in assert_frame_equal assert_index_equal( File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 407, in assert_index_equal raise_assert_detail(obj, msg, left, right) File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail raise AssertionError(msg) AssertionError: DataFrame.index are different DataFrame.index values are different (40.0 %) [left]: Int64Index([2, 3, 4, 6, 5], dtype='int64') [right]: Int64Index([2, 3, 4, 5, 6], dtype='int64') ``` ### Does this PR introduce _any_ user-facing change? no. test only ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46200 from zhengruifeng/fix_test_creation_index. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/pandas/tests/frame/test_constructor.py | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/python/pyspark/pandas/tests/frame/test_constructor.py b/python/pyspark/pandas/tests/frame/test_constructor.py index ee010d8f023d..d7581895c6c9 100644 --- a/python/pyspark/pandas/tests/frame/test_constructor.py +++ b/python/pyspark/pandas/tests/frame/test_constructor.py @@ -195,14 +195,14 @@ class FrameConstructorMixin: with ps.option_context("compute.ops_on_diff_frames", True): # test with ps.DataFrame and pd.Index self.assert_eq( -ps.DataFrame(data=psdf, index=pd.Index([2, 3, 4, 5, 6])), -pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])), +ps.DataFrame(data=psdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), +pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), ) # test with ps.DataFrame and ps.Index self.assert_eq( -ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6])), -pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])), +ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6])).sort_index(), +pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6])).sort_index(), ) # test String Index - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
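A minimal sketch of the pattern used by the fix (illustrative only; the frame and index values here are made up): sort both sides by index before comparing, so the check no longer depends on whatever row order Spark happens to produce:
```python
import pandas as pd
from pandas.testing import assert_frame_equal
import pyspark.pandas as ps

pdf = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=[2, 3, 4, 5, 6])
psdf = ps.from_pandas(pdf)

with ps.option_context("compute.ops_on_diff_frames", True):
    result = ps.DataFrame(data=psdf, index=ps.Index([2, 3, 4, 5, 6]))
    expected = pd.DataFrame(data=pdf, index=pd.Index([2, 3, 4, 5, 6]))
    # sort_index() removes the ordering nondeterminism that made the
    # original assertion flaky across environments.
    assert_frame_equal(result.sort_index().to_pandas(), expected.sort_index())
```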
(spark) branch master updated: [SPARK-47956][SQL] Sanity check for unresolved LCA reference
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 66613ba042c4 [SPARK-47956][SQL] Sanity check for unresolved LCA reference 66613ba042c4 is described below commit 66613ba042c4b73b45b3c71e79ce05c225f527e7 Author: Wenchen Fan AuthorDate: Tue Apr 23 08:44:48 2024 -0700 [SPARK-47956][SQL] Sanity check for unresolved LCA reference ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/40558. The sanity check should apply to all plan nodes, not only Project/Aggregate/Window, as we don't know what bug can happen. Maybe the bug moves LCA references to other plan nodes. ### Why are the changes needed? better error message when bug happens ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #46185 from cloud-fan/small. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../spark/sql/catalyst/analysis/CheckAnalysis.scala | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala index 10bff5e6e59a..d1b336b08955 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala @@ -110,9 +110,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB } /** Check and throw exception when a given resolved plan contains LateralColumnAliasReference. */ - private def checkNotContainingLCA(exprSeq: Seq[NamedExpression], plan: LogicalPlan): Unit = { -if (!plan.resolved) return - exprSeq.foreach(_.transformDownWithPruning(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) { + private def checkNotContainingLCA(exprs: Seq[Expression], plan: LogicalPlan): Unit = { + exprs.foreach(_.transformDownWithPruning(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) { case lcaRef: LateralColumnAliasReference => throw SparkException.internalError("Resolved plan should not contain any " + s"LateralColumnAliasReference.\nDebugging information: plan:\n$plan", @@ -789,17 +788,10 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB msg = s"Found the unresolved operator: ${o.simpleString(SQLConf.get.maxToStringFields)}", context = o.origin.getQueryContext, summary = o.origin.context.summary) - // If the plan is resolved, the resolved Project, Aggregate or Window should have restored or - // resolved all lateral column alias references. Add check for extra safe. - case p @ Project(pList, _) -if pList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(pList, p) - case agg @ Aggregate(_, aggList, _) -if aggList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(aggList, agg) - case w @ Window(pList, _, _, _) -if pList.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => -checkNotContainingLCA(pList, w) + // If the plan is resolved, all lateral column alias references should have been either + // restored or resolved. Add check for extra safe. 
+ case o if o.expressions.exists(_.containsPattern(LATERAL_COLUMN_ALIAS_REFERENCE)) => +checkNotContainingLCA(o.expressions, o) case _ => } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
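For readers unfamiliar with the feature being guarded here, a lateral column alias is simply a select-list item that refers to an alias defined earlier in the same SELECT; by the time analysis finishes, no `LateralColumnAliasReference` should remain anywhere in the plan. A small, illustrative query (not from the patch):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# `adjusted` refers to the lateral column alias `double_salary` defined
# in the same SELECT list; the analyzer rewrites this reference away.
spark.sql("""
    SELECT salary * 2 AS double_salary,
           double_salary + 100 AS adjusted
    FROM VALUES (1000), (2000) AS t(salary)
""").show()
```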
(spark) branch master updated: [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2b01755f2791 [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 2b01755f2791 is described below commit 2b01755f27917b1d391835e6f8b1b2f9a34cc832 Author: Haejoon Lee AuthorDate: Tue Apr 23 07:49:15 2024 -0700 [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 ### What changes were proposed in this pull request? This PR proposes to bump Pandas version up to 2.0.0. ### Why are the changes needed? From Apache Spark 4.0.0, Pandas API on Spark supports Pandas 2.0.0 and above and some of features will be broken from Pandas 1.x, so installing Pandas 2.x is required. See the full list of breaking changes from [Upgrading from PySpark 3.5 to 4.0](https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_upgrade.rst#upgrading-from-pyspark-35-to-40). ### Does this PR introduce _any_ user-facing change? No API changes, but the minimum Pandas version from user-facing documentation will be changed. ### How was this patch tested? The existing CI should pass. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46175 from itholic/bump_pandas_2. Authored-by: Haejoon Lee Signed-off-by: Dongjoon Hyun --- dev/create-release/spark-rm/Dockerfile | 2 +- python/docs/source/getting_started/install.rst | 6 +++--- python/docs/source/migration_guide/pyspark_upgrade.rst | 3 +-- python/docs/source/user_guide/sql/arrow_pandas.rst | 2 +- python/packaging/classic/setup.py | 2 +- python/packaging/connect/setup.py | 2 +- python/pyspark/sql/pandas/utils.py | 2 +- 7 files changed, 9 insertions(+), 10 deletions(-) diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index f51b24d58394..8d5ca38ba88e 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -37,7 +37,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true # These arguments are just for reuse and not really meant to be customized. ARG APT_INSTALL="apt-get install --no-install-recommends -y" -ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==1.5.3 pyarrow==10.0.1 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 grpcio-status==1.62.0 googleapis-common-protos==1.56.4" +ARG PIP_PKGS="sphinx==4.5.0 mkdocs==1.1.2 numpy==1.20.3 pydata_sphinx_theme==0.13.3 ipython==7.19.0 nbsphinx==0.8.0 numpydoc==1.1.0 jinja2==3.1.2 twine==3.4.1 sphinx-plotly-directive==0.1.3 sphinx-copybutton==0.5.2 pandas==2.0.3 pyarrow==10.0.1 plotly==5.4.0 markupsafe==2.0.1 docutils<0.17 grpcio==1.62.0 protobuf==4.21.6 grpcio-status==1.62.0 googleapis-common-protos==1.56.4" ARG GEM_PKGS="bundler:2.3.8" # Install extra needed repos and refresh. diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst index 08b6cc813cba..33a0560764df 100644 --- a/python/docs/source/getting_started/install.rst +++ b/python/docs/source/getting_started/install.rst @@ -205,7 +205,7 @@ Installable with ``pip install "pyspark[connect]"``. 
== = == PackageSupported version Note == = == -`pandas` >=1.4.4 Required for Spark Connect +`pandas` >=2.0.0 Required for Spark Connect `pyarrow` >=10.0.0 Required for Spark Connect `grpcio` >=1.62.0 Required for Spark Connect `grpcio-status`>=1.62.0 Required for Spark Connect @@ -220,7 +220,7 @@ Installable with ``pip install "pyspark[sql]"``. = = == Package Supported version Note = = == -`pandas` >=1.4.4 Required for Spark SQL +`pandas` >=2.0.0 Required for Spark SQL `pyarrow` >=10.0.0 Required for Spark SQL = = == @@ -233,7 +233,7 @@ Installable with ``pip install "pyspark[pandas_on_spark]"``. = = Package Supported version Note = =
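If in doubt about whether an environment already satisfies the new floor, the helper PySpark itself uses can be called directly; a small illustrative check (the helper lives in the module touched by this patch):
```python
from pyspark.sql.pandas.utils import require_minimum_pandas_version

# Raises an error if the installed pandas is older than the minimum
# PySpark requires (2.0.0 as of this change); otherwise it is a no-op.
require_minimum_pandas_version()

import pandas as pd
print(pd.__version__)
```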
(spark) branch master updated (cf5fc0c720ee -> 9c4f12ca04ac)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from cf5fc0c720ee [MINOR][DOCS] Fix type hint of 3 functions add 9c4f12ca04ac [SPARK-47949][SQL][DOCKER][TESTS] MsSQLServer: Bump up mssql docker image version to 2022-CU12-GDR1-ubuntu-22.04 No new revisions were added by this update. Summary of changes: ...OnDocker.scala => MsSQLServerDatabaseOnDocker.scala} | 13 +++-- .../spark/sql/jdbc/MsSqlServerIntegrationSuite.scala| 14 +- .../spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala | 16 ++-- .../spark/sql/jdbc/v2/MsSqlServerNamespaceSuite.scala | 17 ++--- 4 files changed, 12 insertions(+), 48 deletions(-) copy connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/{MySQLDatabaseOnDocker.scala => MsSQLServerDatabaseOnDocker.scala} (72%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][DOCS] Fix type hint of 3 functions
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cf5fc0c720ee [MINOR][DOCS] Fix type hint of 3 functions cf5fc0c720ee is described below commit cf5fc0c720eef01c5fe86a6ce05160adbdbf4678 Author: Ruifeng Zheng AuthorDate: Tue Apr 23 07:42:44 2024 -0700 [MINOR][DOCS] Fix type hint of 3 functions ### What changes were proposed in this pull request? Fix type hint of 3 functions I did a quick scan of the functions, don't find other similar places. ### Why are the changes needed? a string input will be treated as literal instead of column name ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no Closes #46179 from zhengruifeng/correct_con. Authored-by: Ruifeng Zheng Signed-off-by: Dongjoon Hyun --- python/pyspark/sql/connect/functions/builtin.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/python/pyspark/sql/connect/functions/builtin.py b/python/pyspark/sql/connect/functions/builtin.py index 519e53c3a13f..8fffb1831466 100644 --- a/python/pyspark/sql/connect/functions/builtin.py +++ b/python/pyspark/sql/connect/functions/builtin.py @@ -2141,7 +2141,7 @@ def sequence( sequence.__doc__ = pysparkfuncs.sequence.__doc__ -def schema_of_csv(csv: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_csv(csv: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(csv, Column): _csv = csv elif isinstance(csv, str): @@ -2161,7 +2161,7 @@ def schema_of_csv(csv: "ColumnOrName", options: Optional[Dict[str, str]] = None) schema_of_csv.__doc__ = pysparkfuncs.schema_of_csv.__doc__ -def schema_of_json(json: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_json(json: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(json, Column): _json = json elif isinstance(json, str): @@ -2181,7 +2181,7 @@ def schema_of_json(json: "ColumnOrName", options: Optional[Dict[str, str]] = Non schema_of_json.__doc__ = pysparkfuncs.schema_of_json.__doc__ -def schema_of_xml(xml: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Column: +def schema_of_xml(xml: Union[str, Column], options: Optional[Dict[str, str]] = None) -> Column: if isinstance(xml, Column): _xml = xml elif isinstance(xml, str): - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
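The practical consequence of the corrected hints, shown as a small illustrative snippet: a plain Python string passed to these functions is a literal sample, not a column name, and a `Column` argument must be a foldable literal as well:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, schema_of_json

spark = SparkSession.builder.getOrCreate()
df = spark.range(1)

# The string is the JSON sample itself, not the name of a column.
df.select(schema_of_json('{"a": 1, "b": "x"}')).show(truncate=False)

# Equivalent, passing an explicit literal Column.
df.select(schema_of_json(lit('{"a": 1, "b": "x"}'))).show(truncate=False)
```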
(spark) branch master updated (ca916258b991 -> 33fa77cb4868)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from ca916258b991 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server add 33fa77cb4868 [MINOR][DOCS] Add `docs/_generated/` to .gitignore No new revisions were added by this update. Summary of changes: .gitignore | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ca916258b991 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server ca916258b991 is described below commit ca916258b9916452aa2f377608e6be8df65550e5 Author: Kent Yao AuthorDate: Tue Apr 23 07:41:04 2024 -0700 [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server ### What changes were proposed in this pull request? This PR adds Document Mapping Spark SQL Data Types to Microsoft SQL Server ### Why are the changes needed? doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? doc build ![image](https://github.com/apache/spark/assets/8326978/7220d96a-c5ca-4780-9fc5-f93c99f91c10) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46177 from yaooqinn/SPARK-47953. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- docs/sql-data-sources-jdbc.md | 106 ++ 1 file changed, 106 insertions(+) diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index 51c0886430a3..734ed43f912a 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -1630,3 +1630,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Microsoft SQL Server + +The below table describes the data type conversions from Spark SQL Data Types to Microsoft SQL Server data types, +when creating, altering, or writing data to a Microsoft SQL Server table using the built-in jdbc data source with +the mssql-jdbc as the activated JDBC Driver. + + + + + Spark SQL Data Type + SQL Server Data Type + Remarks + + + + + BooleanType + bit + + + + ByteType + smallint + Supported since Spark 4.0.0, previous versions throw errors + + + ShortType + smallint + + + + IntegerType + int + + + + LongType + bigint + + + + FloatType + real + + + + DoubleType + double precision + + + + DecimalType(p, s) + number(p,s) + + + + DateType + date + + + + TimestampType + datetime + + + + TimestampNTZType + datetime + + + + StringType + nvarchar(max) + + + + BinaryType + varbinary(max) + + + + CharType(n) + char(n) + + + + VarcharType(n) + varchar(n) + + + + + +The Spark Catalyst data types below are not supported with suitable SQL Server types. + +- DayTimeIntervalType +- YearMonthIntervalType +- CalendarIntervalType +- ArrayType +- MapType +- StructType +- UserDefinedType +- NullType +- ObjectType +- VariantType - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
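To see these mappings in practice, one can write a small DataFrame through the built-in JDBC source; the sketch below is illustrative only — the server URL, database, table name, and credentials are placeholders, not values from the documentation:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.sql("""
    SELECT CAST(1 AS INT)        AS c_int,     -- maps to int
           CAST(1 AS BIGINT)     AS c_long,    -- maps to bigint
           CAST(1.5 AS DOUBLE)   AS c_double,  -- maps to double precision
           CAST('abc' AS STRING) AS c_string   -- maps to nvarchar(max)
""")

(df.write.format("jdbc")
    .option("url", "jdbc:sqlserver://example-host:1433;databaseName=testdb")
    .option("dbtable", "dbo.type_mapping_demo")
    .option("user", "example_user")
    .option("password", "example_password")
    .save())
```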
(spark-kubernetes-operator) branch main updated: [SPARK-47943] Add `GitHub Action` CI for Java Build and Test
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 4a5febd [SPARK-47943] Add `GitHub Action` CI for Java Build and Test 4a5febd is described below commit 4a5febd8f48716c0506738fc6a5fd58afb95779f Author: zhou-jiang AuthorDate: Mon Apr 22 22:44:17 2024 -0700 [SPARK-47943] Add `GitHub Action` CI for Java Build and Test ### What changes were proposed in this pull request? This PR adds an additional CI build task for operator. ### Why are the changes needed? The additional CI task is needed in order to build and test Java code for upcoming operator pull requests. When Java plugin is enabled and Java source is checked in, `./gradlew build` [task](https://docs.gradle.org/3.3/userguide/java_plugin.html#sec:java_tasks) by default includes a set of tasks to compile and run tests. This can serve as pull request build. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? tested locally. ### Was this patch authored or co-authored using generative AI tooling? no Closes #7 from jiangzho/ci. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/workflows/build_and_test.yml | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml index 6a5a147..887119f 100644 --- a/.github/workflows/build_and_test.yml +++ b/.github/workflows/build_and_test.yml @@ -26,4 +26,20 @@ jobs: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} with: config: .github/.licenserc.yaml - + build-test: +name: "Build Test CI" +runs-on: ubuntu-latest +strategy: + matrix: +java-version: [ 17, 21 ] +steps: + - name: Checkout repository +uses: actions/checkout@v3 + - name: Set up JDK ${{ matrix.java-version }} +uses: actions/setup-java@v2 +with: + java-version: ${{ matrix.java-version }} + distribution: 'adopt' + - name: Build with Gradle +run: | + set -o pipefail; ./gradlew build; set +o pipefail - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47929] Setup Static Analysis for Operator
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 798ca15 [SPARK-47929] Setup Static Analysis for Operator 798ca15 is described below commit 798ca15844c71baf5d7f1f8842e461a73c1009a9 Author: zhou-jiang AuthorDate: Mon Apr 22 22:42:23 2024 -0700 [SPARK-47929] Setup Static Analysis for Operator ### What changes were proposed in this pull request? This is a breakdown PR from #2 - setting up common build Java tasks and corresponding plugins. ### Why are the changes needed? This PR includes checkstyle, pmd, spotbugs. Also includes jacoco for coverage analysis, spotless for formatting. These tasks can help to enhance the quality of future Java contributions. They can also be referred in CI tasks for automation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested manually. ### Was this patch authored or co-authored using generative AI tooling? no Closes #6 from jiangzho/builder_task. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- build.gradle | 76 - config/checkstyle/checkstyle.xml | 208 +++ config/pmd/ruleset.xml | 33 ++ config/spotbugs/spotbugs_exclude.xml | 25 + gradle.properties| 22 5 files changed, 362 insertions(+), 2 deletions(-) diff --git a/build.gradle b/build.gradle index 6732f5a..f64212b 100644 --- a/build.gradle +++ b/build.gradle @@ -1,3 +1,18 @@ +buildscript { + repositories { +maven { + url = uri("https://plugins.gradle.org/m2/;) +} + } + dependencies { +classpath "com.github.spotbugs.snom:spotbugs-gradle-plugin:${spotBugsGradlePluginVersion}" +classpath "com.diffplug.spotless:spotless-plugin-gradle:${spotlessPluginVersion}" + } +} + +assert JavaVersion.current().isCompatibleWith(JavaVersion.VERSION_17): "Java 17 or newer is " + +"required" + subprojects { apply plugin: 'idea' apply plugin: 'eclipse' @@ -6,7 +21,64 @@ subprojects { targetCompatibility = 17 repositories { - mavenCentral() - jcenter() +mavenCentral() +jcenter() + } + + apply plugin: 'checkstyle' + checkstyle { +toolVersion = checkstyleVersion +configFile = file("$rootDir/config/checkstyle/checkstyle.xml") +ignoreFailures = false +showViolations = true + } + + apply plugin: 'pmd' + pmd { +ruleSets = ["java-basic", "java-braces"] +ruleSetFiles = files("$rootDir/config/pmd/ruleset.xml") +toolVersion = pmdVersion +consoleOutput = true +ignoreFailures = false + } + + apply plugin: 'com.github.spotbugs' + spotbugs { +toolVersion = spotBugsVersion +afterEvaluate { + reportsDir = file("${project.reporting.baseDir}/findbugs") +} +excludeFilter = file("$rootDir/config/spotbugs/spotbugs_exclude.xml") +ignoreFailures = false + } + + apply plugin: 'jacoco' + jacoco { +toolVersion = jacocoVersion + } + jacocoTestReport { +dependsOn test + } + + apply plugin: 'com.diffplug.spotless' + spotless { +java { + endWithNewline() + googleJavaFormat('1.17.0') + importOrder( +'java', +'javax', +'scala', +'', +'org.apache.spark', + ) + trimTrailingWhitespace() + removeUnusedImports() +} +format 'misc', { + target '*.md', '*.gradle', '**/*.properties', '**/*.xml', '**/*.yaml', '**/*.yml' + endWithNewline() + trimTrailingWhitespace() +} } } diff --git a/config/checkstyle/checkstyle.xml b/config/checkstyle/checkstyle.xml new file mode 100644 index 000..90161fe --- /dev/null +++ b/config/checkstyle/checkstyle.xml @@ -0,0 +1,208 @@ + + +https://checkstyle.org/dtds/configuration_1_3.dtd;> + 
+ [The XML bodies of config/checkstyle/checkstyle.xml, config/pmd/ruleset.xml, config/spotbugs/spotbugs_exclude.xml, and the gradle.properties entries were stripped to bare "+" markers by the archive's markup handling and are not recoverable here; see the pull request for the full files.]
(spark) branch master updated (9d715ba49171 -> 876c2cf34a35)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 9d715ba49171 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error add 876c2cf34a35 [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions No new revisions were added by this update. Summary of changes: pom.xml | 69 +++-- 1 file changed, 41 insertions(+), 28 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9d715ba49171 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error 9d715ba49171 is described below commit 9d715ba491710969340d9e8a49a21d11f51ef7d3 Author: Kent Yao AuthorDate: Mon Apr 22 22:31:13 2024 -0700 [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error ### What changes were proposed in this pull request? This PR uses SMALLINT (as TINYINT ranges [0, 255]) instead of BYTE to fix the ByteType mapping for MsSQLServer JDBC ```java [info] com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or variable #1: Cannot find data type BYTE. [info] at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1662) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:898) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:793) [info] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7417) [info] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3488) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:262) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:237) [info] at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeUpdate(SQLServerStatement.java:733) [info] at org.apache.spark.sql.jdbc.JdbcDialect.createTable(JdbcDialects.scala:267) ``` ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46164 from yaooqinn/SPARK-47938. 
Lead-authored-by: Kent Yao Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala | 8 .../main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala | 1 + 2 files changed, 9 insertions(+) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala index 8bceb9506e85..273e8c35dd07 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala @@ -437,4 +437,12 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite { .load() assert(df.collect().toSet === expectedResult) } + + test("SPARK-47938: Fix 'Cannot find data type BYTE' in SQL Server") { +spark.sql("select cast(1 as byte) as c0") + .write + .jdbc(jdbcUrl, "test_byte", new Properties) +val df = spark.read.jdbc(jdbcUrl, "test_byte", new Properties) +checkAnswer(df, Row(1.toShort)) + } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala index 862e99adc3b0..1d05c0d7c24e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala @@ -136,6 +136,7 @@ private case class MsSqlServerDialect() extends JdbcDialect { case BinaryType => Some(JdbcType("VARBINARY(MAX)", java.sql.Types.VARBINARY)) case ShortType if !SQLConf.get.legacyMsSqlServerNumericMappingEnabled => Some(JdbcType("SMALLINT", java.sql.Types.SMALLINT)) +case ByteType => Some(JdbcType("SMALLINT", java.sql.Types.TINYINT)) case _ => None } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
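The regression test added above translates naturally to PySpark; in the illustrative sketch below the JDBC URL and connection properties are placeholders, not values from the patch:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

url = "jdbc:sqlserver://example-host:1433;databaseName=testdb"
props = {"user": "example_user", "password": "example_password"}

# Before the fix this failed with "Cannot find data type BYTE", because
# ByteType had no SQL Server mapping; it is now written as SMALLINT.
spark.sql("SELECT CAST(1 AS BYTE) AS c0").write.jdbc(url, "test_byte", properties=props)

spark.read.jdbc(url, "test_byte", properties=props).show()
```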
(spark) branch master updated (e4fb7dd98219 -> a97e72cfa7d4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from e4fb7dd98219 [MINOR] Remove unnecessary `imports` add a97e72cfa7d4 [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/functions/builtin.py | 8 +--- python/pyspark/sql/functions/builtin.py | 12 +++- 2 files changed, 12 insertions(+), 8 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
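For reference, the (now corrected) docstring describes an aggregate that is typically paired with `hll_sketch_estimate`; a short illustrative usage:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([1, 2, 2, 3], "INT")
# Build an HLL sketch of the column, then estimate its distinct count (3 here).
df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt")).show()
```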
(spark) branch master updated (b335dd366fb1 -> e4fb7dd98219)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from b335dd366fb1 [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references add e4fb7dd98219 [MINOR] Remove unnecessary `imports` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/util/Distribution.scala| 2 -- .../scala/org/apache/spark/input/WholeTextFileInputFormatSuite.scala| 2 -- .../scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala | 2 -- sql/api/src/main/scala/org/apache/spark/sql/types/UpCastRule.scala | 2 -- .../src/main/scala/org/apache/spark/sql/execution/CacheManager.scala| 2 -- .../scala/org/apache/spark/sql/CollationRegexpExpressionsSuite.scala| 2 -- .../scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala| 2 -- sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala | 1 - .../test/scala/org/apache/spark/sql/hive/client/HiveClientSuites.scala | 2 -- .../org/apache/spark/sql/hive/client/HiveClientUserNameSuites.scala | 2 -- .../scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala | 2 -- .../org/apache/spark/sql/hive/client/HivePartitionFilteringSuites.scala | 2 -- 12 files changed, 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated: [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d7c3794a0c56 [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType d7c3794a0c56 is described below commit d7c3794a0c567b12e8c8e18132aa362f11acdf5f Author: Ivan Sadikov AuthorDate: Mon Apr 22 15:36:13 2024 -0700 [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/46126 to branch-3.5. When `enableStableIdentifiersForUnionType` is enabled, all of the types are lowercased which creates a problem when field types are case-sensitive: Union type with fields: ``` Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new Schema.Field("F", Schema.create(Type.FLOAT))).asJava) ``` would become ``` struct> ``` but instead should be ``` struct> ``` ### Why are the changes needed? Fixes a bug of lowercasing the field name (the type portion). ### Does this PR introduce _any_ user-facing change? Yes, if a user enables `enableStableIdentifiersForUnionType` and has Union types, all fields will preserve the case. Previously, the field names would be all in lowercase. ### How was this patch tested? I added a test case to verify the new field names. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46169 from sadikovi/SPARK-47904-3.5. Authored-by: Ivan Sadikov Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/avro/SchemaConverters.scala | 10 +++ .../org/apache/spark/sql/avro/AvroSuite.scala | 31 -- 2 files changed, 34 insertions(+), 7 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala index 06abe977e3b0..af358a8d1c96 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala @@ -183,14 +183,14 @@ object SchemaConverters { // Avro's field name may be case sensitive, so field names for two named type // could be "a" and "A" and we need to distinguish them. In this case, we throw // an exception. - val temp_name = s"member_${s.getName.toLowerCase(Locale.ROOT)}" - if (fieldNameSet.contains(temp_name)) { + // Stable id prefix can be empty so the name of the field can be just the type. 
+ val tempFieldName = s"member_${s.getName}" + if (!fieldNameSet.add(tempFieldName.toLowerCase(Locale.ROOT))) { throw new IncompatibleSchemaException( - "Cannot generate stable indentifier for Avro union type due to name " + + "Cannot generate stable identifier for Avro union type due to name " + s"conflict of type name ${s.getName}") } - fieldNameSet.add(temp_name) - temp_name + tempFieldName } else { s"member$i" } diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala index 1df99210a55a..01c9dfb57a19 100644 --- a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala @@ -370,7 +370,7 @@ abstract class AvroSuite "", Seq()) } - assert(e.getMessage.contains("Cannot generate stable indentifier")) + assert(e.getMessage.contains("Cannot generate stable identifier")) } { val e = intercept[Exception] { @@ -381,7 +381,7 @@ abstract class AvroSuite "", Seq()) } - assert(e.getMessage.contains("Cannot generate stable indentifier")) + assert(e.getMessage.contains("Cannot generate stable identifier")) } // Two array types or two map types are not allowed in union. { @@ -434,6 +434,33 @@ abstract class AvroSuite } } + tes
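From the user side, the behavior is controlled by the Avro option named in the title; a hedged sketch of reading such data (the file path is a placeholder, and the field names assume the union example from the description):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("avro")
      .option("enableStableIdentifiersForUnionType", "true")
      .load("/tmp/union_type_data.avro"))  # placeholder path

# With this fix, a union of `myENUM` and `myRecord2` surfaces as fields named
# member_myENUM and member_myRecord2, preserving case instead of lower-casing.
df.printSchema()
```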
(spark) branch master updated: [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ac9a12ef6e06 [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support ac9a12ef6e06 is described below commit ac9a12ef6e062ae07e878e202521b22de9979a17 Author: Dongjoon Hyun AuthorDate: Mon Apr 22 14:46:03 2024 -0700 [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support ### What changes were proposed in this pull request? This PR aims to update K8s docs to recommend K8s v1.27+ for Apache Spark 4.0.0. This is a kind of follow-up of the following previous PR because Apache Spark 4.0.0 schedule is delayed slightly. - #43069 ### Why are the changes needed? **1. K8s community starts to release v1.30.0 from 2024-04-17.** - https://kubernetes.io/releases/#release-v1-30 **2. Default K8s Version in Public Cloud environments** The default K8s versions of public cloud providers are already K8s 1.27+. - EKS: v1.29 (Default) - GKE: v1.29 (Rapid), v1.28 (Regular), v1.27 (Stable) - AKS: v1.27 **3. End Of Support** In addition, K8s 1.26 is going to reach EOL when Apache Spark 4.0.0 arrives because K8s 1.26 is also going to reach EOL on June. | K8s | AKS | GKE | EKS | | | --- | --- | --- | | 1.26 | 2024-03 | 2024-06 | 2024-06 | - [AKS EOL Schedule](https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar) - [GKE EOL Schedule](https://cloud.google.com/kubernetes-engine/docs/release-schedule) - [EKS EOL Schedule](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar) ### Does this PR introduce _any_ user-facing change? - No, this is a documentation-only change about K8s versions. - Apache Spark K8s Integration Test is currently using K8s v1.30.0 on Minikube already. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46168 from dongjoon-hyun/SPARK-47942. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- docs/running-on-kubernetes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md index 778af5f0751a..606b5eb6f900 100644 --- a/docs/running-on-kubernetes.md +++ b/docs/running-on-kubernetes.md @@ -44,7 +44,7 @@ Cluster administrators should use [Pod Security Policies](https://kubernetes.io/ # Prerequisites -* A running Kubernetes cluster at version >= 1.26 with access configured to it using +* A running Kubernetes cluster at version >= 1.27 with access configured to it using [kubectl](https://kubernetes.io/docs/reference/kubectl/). If you do not already have a working Kubernetes cluster, you may set up a test cluster on your local machine using [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (f2d0cf23018f -> fc0c8553ea05)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f2d0cf23018f [SPARK-47907][SQL] Put bang under a config add fc0c8553ea05 [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType No new revisions were added by this update. Summary of changes: .../apache/spark/sql/avro/SchemaConverters.scala | 8 +++--- .../org/apache/spark/sql/avro/AvroSuite.scala | 31 -- 2 files changed, 32 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 86563169eef8 [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT 86563169eef8 is described below commit 86563169eef899040e1ec70dd9963c64311dbaa1 Author: Cheng Pan AuthorDate: Mon Apr 22 13:34:20 2024 -0700 [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT ### What changes were proposed in this pull request? This PR aims to upgrade `guava` dependency to `33.1.0-jre` in Docker Integration tests. ### Why are the changes needed? This is a preparation of the following PR. - #45372 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46167 from dongjoon-hyun/SPARK-47940. Authored-by: Cheng Pan Signed-off-by: Dongjoon Hyun --- connector/docker-integration-tests/pom.xml | 2 +- project/SparkBuild.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/connector/docker-integration-tests/pom.xml b/connector/docker-integration-tests/pom.xml index bb7647c72491..9003c2190be2 100644 --- a/connector/docker-integration-tests/pom.xml +++ b/connector/docker-integration-tests/pom.xml @@ -39,7 +39,7 @@ com.google.guava guava - 33.0.0-jre + 33.1.0-jre test diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index bcaa51ec30ff..1bcc9c893393 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -952,7 +952,7 @@ object Unsafe { object DockerIntegrationTests { // This serves to override the override specified in DependencyOverrides: lazy val settings = Seq( -dependencyOverrides += "com.google.guava" % "guava" % "33.0.0-jre" +dependencyOverrides += "com.google.guava" % "guava" % "33.1.0-jre" ) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (256fc51508e4 -> 676d47ffe091)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 256fc51508e4 [SPARK-47411][SQL] Support StringInstr & FindInSet functions to work with collated strings add 676d47ffe091 [SPARK-47935][INFRA][PYTHON] Pin `pandas==2.0.3` for `pypy3.8` No new revisions were added by this update. Summary of changes: dev/infra/Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2fb31dea1c53 [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 2fb31dea1c53 is described below commit 2fb31dea1c53352a8101bb0ec91f46c7d7ff826e Author: panbingkun AuthorDate: Mon Apr 22 00:44:32 2024 -0700 [SPARK-47930][BUILD] Upgrade RoaringBitmap to 1.0.6 ### What changes were proposed in this pull request? The pr aims to upgrade `RoaringBitmap` from `1.0.5` to `1.0.6`. ### Why are the changes needed? The full release notes: https://github.com/RoaringBitmap/RoaringBitmap/releases/tag/1.0.6 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46152 from panbingkun/SPARK-47930. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 8 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt index 502d10c1c58c..607efde07d1e 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk21-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500708715 8 0.0 707870326.0 1.0X -Num Maps: 5 Fetch partitions:1000 1610 1623 12 0.0 1610312472.0 0.4X -Num Maps: 5 Fetch partitions:1500 2443 2461 23 0.0 2442675908.0 0.3X +Num Maps: 5 Fetch partitions:500686690 4 0.0 686489113.0 1.0X +Num Maps: 5 Fetch partitions:1000 1701 1727 24 0.0 1700658689.0 0.4X +Num Maps: 5 Fetch partitions:1500 2750 2760 13 0.0 2749746755.0 0.2X diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index 9fe4175bb5d9..3efec12b2cb3 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 17.0.10+7-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500775778 5 0.0 774980756.0 1.0X -Num Maps: 5 Fetch partitions:1000 1765 1765 1 0.0 1765011999.0 0.4X -Num Maps: 5 Fetch partitions:1500 2671 2682 15 0.0 2671372452.0 0.3X +Num Maps: 5 Fetch partitions:500736746 12 0.0 736390304.0 1.0X +Num Maps: 5 Fetch partitions:1000 1615 1632 16 0.0 1615129364.0 0.5X +Num Maps: 5 Fetch partitions:1500 2574 2589 14 0.0 2573656222.0 0.3X diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 6420c9df4d16..c1adff73d339 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -1,7 +1,7 @@ HikariCP/2.5.1//HikariCP-2.5.1.jar 
JLargeArrays/1.5//JLargeArrays-1.5.jar
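Editorial note: the MapStatuses benchmark above is sensitive to this dependency bump because Spark's HighlyCompressedMapStatus uses RoaringBitmap to track which shuffle blocks are empty. As a point of reference only (this Scala snippet is not part of the commit), the RoaringBitmap API involved is roughly:

```
// Illustrative only: the core RoaringBitmap calls Spark relies on,
// e.g. in HighlyCompressedMapStatus to record empty shuffle blocks.
import org.roaringbitmap.RoaringBitmap

val emptyBlocks = new RoaringBitmap()   // compressed bitmap of block indices
emptyBlocks.add(3)                      // mark block 3 as empty
emptyBlocks.add(100000)                 // sparse, high indices stay cheap
assert(emptyBlocks.contains(3))
assert(!emptyBlocks.contains(4))
println(s"tracked blocks: ${emptyBlocks.getCardinality}")
```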
(spark) branch master updated: [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new adf02d38061b [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` adf02d38061b is described below commit adf02d38061bd0ef48fd07252bef7706a0e49757 Author: Dongjoon Hyun AuthorDate: Fri Apr 19 20:04:13 2024 -0700 [SPARK-47925][SQL][TESTS] Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` ### What changes were proposed in this pull request? This PR aims to mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest` to run in a different test pipeline. ### Why are the changes needed? This will move this test case from `sql - other tests` to `sql - extended tests` to rebalance test pipelines. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46145 from dongjoon-hyun/SPARK-47925. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala index 4edb51d27190..9b39a2295e7d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/BloomFilterAggregateQuerySuite.scala @@ -26,10 +26,12 @@ import org.apache.spark.sql.execution.aggregate.BaseAggregateExec import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSparkSession import org.apache.spark.sql.types.LongType +import org.apache.spark.tags.ExtendedSQLTest /** * Query tests for the Bloom filter aggregate and filter function. */ +@ExtendedSQLTest class BloomFilterAggregateQuerySuite extends QueryTest with SharedSparkSession { import testImplicits._ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3fcc0f7ac142 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 3fcc0f7ac142 is described below commit 3fcc0f7ac142756b38f66085543ca045abe76a9f Author: Dongjoon Hyun AuthorDate: Fri Apr 19 19:58:15 2024 -0700 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 ### What changes were proposed in this pull request? This PR aims to upgrade the minimum version of `arrow` R package to 10.0.0 like PySpark. ### Why are the changes needed? Apache Spark `master` branch tests only with the latest R package which is `15.0.1` as of now. To avoid any incompatibility issues across R and Python, we had better use the same minimum policy. ``` $ docker run -it --rm ghcr.io/apache/apache-spark-ci-image:master-8755911327 R -e 'installed.packages()' | grep arrow | head -n1 arrow"arrow""/usr/local/lib/R/site-library" "15.0.1" ``` ### Does this PR introduce _any_ user-facing change? Yes, but most SparkR users has been using the latest one which is higher than 10.0.0 because `Arrow R package 10.0.0` was released 2022-10-26 and has been used over one and half years. - https://cran.r-project.org/src/contrib/Archive/arrow/ ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46142 from dongjoon-hyun/SPARK-47923. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- R/pkg/DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 2523104268d3..f7dd261c10fd 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -21,7 +21,7 @@ Suggests: testthat, e1071, survival, -arrow (>= 1.0.0) +arrow (>= 10.0.0) Collate: 'schema.R' 'generics.R' - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (3fcc0f7ac142 -> 2613516110a4)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 3fcc0f7ac142 [SPARK-47923][R] Upgrade the minimum version of `arrow` R package to 10.0.0 add 2613516110a4 [SPARK-47924][CORE] Add a DEBUG log to `DiskStore.moveFileToBlock` No new revisions were added by this update. Summary of changes: core/src/main/scala/org/apache/spark/storage/DiskStore.scala | 1 + 1 file changed, 1 insertion(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated (afd99d19a2b8 -> 6a358ff7d633)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from afd99d19a2b8 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 add 6a358ff7d633 [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated No new revisions were added by this update. Summary of changes: .../org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala | 2 ++ 1 file changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated (bcaf61b975d6 -> e7a2e5a196a8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from bcaf61b975d6 [SPARK-47897][SQL][3.5] Fix ExpressionSet performance regression in scala 2.12 add e7a2e5a196a8 [SPARK-47828][CONNECT][PYTHON][3.4] DataFrameWriterV2.overwrite fails with invalid plan No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/plan.py | 8 python/pyspark/sql/tests/test_readwriter.py | 7 ++- 2 files changed, 10 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8aa8ad6be7b3 [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 8aa8ad6be7b3 is described below commit 8aa8ad6be7b3eeceafa2ad1e9211fb8133bb675c Author: Bjørn Jørgensen AuthorDate: Fri Apr 19 08:20:17 2024 -0700 [SPARK-47915][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.1 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.12.0 to 6.12.1 ### Why are the changes needed? [Release notes](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.12.1) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46137 from bjornjorgensen/kub-client6.12.1. Authored-by: Bjørn Jørgensen Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 770a7522e9f7..6420c9df4d16 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -156,31 +156,31 @@ jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar -kubernetes-client-api/6.12.0//kubernetes-client-api-6.12.0.jar -kubernetes-client/6.12.0//kubernetes-client-6.12.0.jar -kubernetes-httpclient-okhttp/6.12.0//kubernetes-httpclient-okhttp-6.12.0.jar -kubernetes-model-admissionregistration/6.12.0//kubernetes-model-admissionregistration-6.12.0.jar -kubernetes-model-apiextensions/6.12.0//kubernetes-model-apiextensions-6.12.0.jar -kubernetes-model-apps/6.12.0//kubernetes-model-apps-6.12.0.jar -kubernetes-model-autoscaling/6.12.0//kubernetes-model-autoscaling-6.12.0.jar -kubernetes-model-batch/6.12.0//kubernetes-model-batch-6.12.0.jar -kubernetes-model-certificates/6.12.0//kubernetes-model-certificates-6.12.0.jar -kubernetes-model-common/6.12.0//kubernetes-model-common-6.12.0.jar -kubernetes-model-coordination/6.12.0//kubernetes-model-coordination-6.12.0.jar -kubernetes-model-core/6.12.0//kubernetes-model-core-6.12.0.jar -kubernetes-model-discovery/6.12.0//kubernetes-model-discovery-6.12.0.jar -kubernetes-model-events/6.12.0//kubernetes-model-events-6.12.0.jar -kubernetes-model-extensions/6.12.0//kubernetes-model-extensions-6.12.0.jar -kubernetes-model-flowcontrol/6.12.0//kubernetes-model-flowcontrol-6.12.0.jar -kubernetes-model-gatewayapi/6.12.0//kubernetes-model-gatewayapi-6.12.0.jar -kubernetes-model-metrics/6.12.0//kubernetes-model-metrics-6.12.0.jar -kubernetes-model-networking/6.12.0//kubernetes-model-networking-6.12.0.jar -kubernetes-model-node/6.12.0//kubernetes-model-node-6.12.0.jar -kubernetes-model-policy/6.12.0//kubernetes-model-policy-6.12.0.jar -kubernetes-model-rbac/6.12.0//kubernetes-model-rbac-6.12.0.jar -kubernetes-model-resource/6.12.0//kubernetes-model-resource-6.12.0.jar -kubernetes-model-scheduling/6.12.0//kubernetes-model-scheduling-6.12.0.jar -kubernetes-model-storageclass/6.12.0//kubernetes-model-storageclass-6.12.0.jar +kubernetes-client-api/6.12.1//kubernetes-client-api-6.12.1.jar +kubernetes-client/6.12.1//kubernetes-client-6.12.1.jar +kubernetes-httpclient-okhttp/6.12.1//kubernetes-httpclient-okhttp-6.12.1.jar 
+kubernetes-model-admissionregistration/6.12.1//kubernetes-model-admissionregistration-6.12.1.jar +kubernetes-model-apiextensions/6.12.1//kubernetes-model-apiextensions-6.12.1.jar +kubernetes-model-apps/6.12.1//kubernetes-model-apps-6.12.1.jar +kubernetes-model-autoscaling/6.12.1//kubernetes-model-autoscaling-6.12.1.jar +kubernetes-model-batch/6.12.1//kubernetes-model-batch-6.12.1.jar +kubernetes-model-certificates/6.12.1//kubernetes-model-certificates-6.12.1.jar +kubernetes-model-common/6.12.1//kubernetes-model-common-6.12.1.jar +kubernetes-model-coordination/6.12.1//kubernetes-model-coordination-6.12.1.jar +kubernetes-model-core/6.12.1//kubernetes-model-core-6.12.1.jar +kubernetes-model-discovery/6.12.1//kubernetes-model-discovery-6.12.1.jar +kubernetes-model-events/6.12.1//kubernetes-model-events-6.12.1.jar +kubernetes-model-extensions/6.12.1//kubernetes-model-extensions-6.12.1.jar +kubernetes-model-flowcontrol/6.12.1//kubernetes-model-flowcontrol-6.12.1.jar +kubernetes-model-gatewayapi/6.12.1//kubernetes-model-gatewayapi-6.12.1.jar +kubernetes-model-metrics/6.12.1//kubernetes-model-metrics-6.12.1.jar +kubernetes-model-networking/6.12.1//kubernetes-model-networking-6.12.1.jar +kubernetes-model-node/6.12.1//kubernetes-model-node-6.12.1.jar +kubernetes
(spark) branch master updated: [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 074ddc282567 [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token 074ddc282567 is described below commit 074ddc2825674edcea1bb7febf2c6d8b27c2e375 Author: Kent Yao AuthorDate: Thu Apr 18 10:23:11 2024 -0700 [SPARK-47898][SQL] Port HIVE-12270: Add DBTokenStore support to HS2 delegation token ### What changes were proposed in this pull request? This PR ports `HIVE-12270: Add DBTokenStore support to HS2 delegation token`. This is a partial, as tests and other diffs that are already in the upstream artifacts are not necessary. ### Why are the changes needed? This PR can reduce the usage of HMS classes in spark-thriftserver, a small step for reducing blocker for upgrading builtin Hive ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Pass build ### Was this patch authored or co-authored using generative AI tooling? no Closes #46115 from yaooqinn/SPARK-47898. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../java/org/apache/hive/service/auth/HiveAuthFactory.java| 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java index e3316cef241c..c48f4e3ec7b0 100644 --- a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java +++ b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java @@ -27,9 +27,7 @@ import javax.security.sasl.Sasl; import org.apache.hadoop.hive.conf.HiveConf; import org.apache.hadoop.hive.conf.HiveConf.ConfVars; -import org.apache.hadoop.hive.metastore.HiveMetaStore; -import org.apache.hadoop.hive.metastore.HiveMetaStore.HMSHandler; -import org.apache.hadoop.hive.metastore.api.MetaException; +import org.apache.hadoop.hive.ql.metadata.Hive; import org.apache.hadoop.hive.shims.HadoopShims.KerberosNameShim; import org.apache.hadoop.hive.shims.ShimLoader; import org.apache.hadoop.hive.thrift.DBTokenStore; @@ -132,16 +130,15 @@ public class HiveAuthFactory { HiveConf.ConfVars.METASTORE_CLUSTER_DELEGATION_TOKEN_STORE_CLS); if (tokenStoreClass.equals(DBTokenStore.class.getName())) { -HMSHandler baseHandler = new HiveMetaStore.HMSHandler( -"new db based metaserver", conf, true); -rawStore = baseHandler.getMS(); +// Follows https://issues.apache.org/jira/browse/HIVE-12270 +rawStore = Hive.class; } delegationTokenManager.startDelegationTokenSecretManager( conf, rawStore, ServerMode.HIVESERVER2); saslServer.setSecretManager(delegationTokenManager.getSecretManager()); } -catch (MetaException|IOException e) { +catch (IOException e) { throw new TTransportException("Failed to start token manager", e); } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9f9cc87c1a19 [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests 9f9cc87c1a19 is described below commit 9f9cc87c1a19f01b65840cfdbec831867277ee59 Author: Kent Yao AuthorDate: Thu Apr 18 10:20:51 2024 -0700 [MINOR][TESTS] Replace CONFIG_DIM1 with CONFIG_DIM2 in timestamp tests ### What changes were proposed in this pull request? A followup of #33640, it looks like the test purpose has 2 different dimensions ### Why are the changes needed? test fix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no Closes #46119 from yaooqinn/minor. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql | 2 +- sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql index 377b26c67a3e..28fe4539855c 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ltz.sql @@ -1,6 +1,6 @@ -- timestamp_ltz literals and constructors --CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_LTZ ---CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_NTZ +--CONFIG_DIM2 spark.sql.timestampType=TIMESTAMP_NTZ select timestamp_ltz'2016-12-31 00:12:00', timestamp_ltz'2016-12-31'; diff --git a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql index d744c0c19b42..07901093cfba 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql @@ -1,6 +1,6 @@ -- timestamp_ntz literals and constructors --CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_LTZ ---CONFIG_DIM1 spark.sql.timestampType=TIMESTAMP_NTZ +--CONFIG_DIM2 spark.sql.timestampType=TIMESTAMP_NTZ select timestamp_ntz'2016-12-31 00:12:00', timestamp_ntz'2016-12-31'; - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new fe6e74e [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml` fe6e74e is described below commit fe6e74ee9005f6b2a275fd92583713ebca3159a5 Author: Dongjoon Hyun AuthorDate: Thu Apr 18 09:34:14 2024 -0700 [SPARK-47889][FOLLOWUP] Add `gradlew` to `.licenserc.yaml` ### What changes were proposed in this pull request? This PR aims to add `gradlew` to `.licenserc.yaml` as a follow-up of - #4 ### Why are the changes needed? To recover CI. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No, Closes #5 from dongjoon-hyun/SPARK-47889. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index f00689f..26ac0c1 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -15,5 +15,6 @@ header: - 'NOTICE' - '.asf.yaml' - '**/*.gradle' +- gradlew comment: on-failure - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark-kubernetes-operator) branch main updated: [SPARK-47889] Setup gradle as build tool for operator repository
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 5f2c89c [SPARK-47889] Setup gradle as build tool for operator repository 5f2c89c is described below commit 5f2c89cea4aa04c4439a6de651ea4cfcead95015 Author: zhou-jiang AuthorDate: Thu Apr 18 09:27:50 2024 -0700 [SPARK-47889] Setup gradle as build tool for operator repository This is a breakdown from #2 : set up [gradle](https://gradle.org/) as the build-tool for operator Closes #4 from jiangzho/gradle. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- .github/.licenserc.yaml | 1 + .gitignore | 10 ++ LICENSE | 10 ++ build.gradle | 12 ++ gradle/wrapper/gradle-wrapper.properties | 24 +++ gradlew | 253 +++ 6 files changed, 310 insertions(+) diff --git a/.github/.licenserc.yaml b/.github/.licenserc.yaml index e9d1245..f00689f 100644 --- a/.github/.licenserc.yaml +++ b/.github/.licenserc.yaml @@ -14,5 +14,6 @@ header: - 'LICENSE' - 'NOTICE' - '.asf.yaml' +- '**/*.gradle' comment: on-failure diff --git a/.gitignore b/.gitignore index 78213f8..5e0e9b6 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,13 @@ .vscode /lib/ target/ + +# Gradle Files # + +.gradle +.m2 +.out/ +build +dependencies.lock +**/dependencies.lock +gradle/wrapper/gradle-wrapper.jar diff --git a/LICENSE b/LICENSE index 261eeb9..bde9e98 100644 --- a/LICENSE +++ b/LICENSE @@ -199,3 +199,13 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. + + + +This product includes a gradle wrapper. + +* gradlew and gradle/wrapper/gradle-wrapper.properties + +Copyright: 2015-2021 Gradle Authors. +Home page: https://github.com/gradle/gradle +License: https://www.apache.org/licenses/LICENSE-2.0 diff --git a/build.gradle b/build.gradle new file mode 100644 index 000..6732f5a --- /dev/null +++ b/build.gradle @@ -0,0 +1,12 @@ +subprojects { + apply plugin: 'idea' + apply plugin: 'eclipse' + apply plugin: 'java' + sourceCompatibility = 17 + targetCompatibility = 17 + + repositories { + mavenCentral() + jcenter() + } +} diff --git a/gradle/wrapper/gradle-wrapper.properties b/gradle/wrapper/gradle-wrapper.properties new file mode 100644 index 000..9c87f96 --- /dev/null +++ b/gradle/wrapper/gradle-wrapper.properties @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +distributionBase=GRADLE_USER_HOME +distributionPath=wrapper/dists +distributionSha256Sum=194717442575a6f96e1c1befa2c30e9a4fc90f701d7aee33eb879b79e7ff05c0 +distributionUrl=https\://services.gradle.org/distributions/gradle-8.7-all.zip +networkTimeout=1 +zipStoreBase=GRADLE_USER_HOME +zipStorePath=wrapper/dists diff --git a/gradlew b/gradlew new file mode 100755 index 000..369a55f --- /dev/null +++ b/gradlew @@ -0,0 +1,253 @@ +#!/bin/sh + +# +# Copyright © 2015-2021 the original authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +## +# +# Gradle start up script for POSIX generated by Gradle. +# +# Important for running: +# +# (1) You need a
svn commit: r68631 - /release/spark/spark-3.4.2/
Author: dongjoon Date: Thu Apr 18 15:12:12 2024 New Revision: 68631 Log: Remove Apache Spark 3.4.2 after releasing 3.4.3 Removed: release/spark/spark-3.4.2/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6232085227ee [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` 6232085227ee is described below commit 6232085227ee2cc4e831996a1ac84c27868a1595 Author: yangjie01 AuthorDate: Thu Apr 18 07:50:00 2024 -0700 [SPARK-47887][CONNECT] Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto` ### What changes were proposed in this pull request? SPARK-46812 | [https://github.com/apache/spark/pull/45232](https://github.com/apache/spark/pull/45232/files#diff-5b26ee7d224ae355b252d713e570cb03eaecbf7f8adcdb6287dc40c370b71462R26) added an unused import `spark/connect/common.proto` to `spark/connect/relations.proto`, this pr just remove it. ### Why are the changes needed? Fix compilation warning: ``` spark/connect/relations.proto:26:1: warning: Import spark/connect/common.proto is unused. ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #46106 from LuciferYang/SPARK-47887. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../main/protobuf/spark/connect/relations.proto| 1 - python/pyspark/sql/connect/proto/relations_pb2.py | 303 ++--- 2 files changed, 151 insertions(+), 153 deletions(-) diff --git a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto index 5cbe6459d226..3882b2e85396 100644 --- a/connector/connect/common/src/main/protobuf/spark/connect/relations.proto +++ b/connector/connect/common/src/main/protobuf/spark/connect/relations.proto @@ -23,7 +23,6 @@ import "google/protobuf/any.proto"; import "spark/connect/expressions.proto"; import "spark/connect/types.proto"; import "spark/connect/catalog.proto"; -import "spark/connect/common.proto"; option java_multiple_files = true; option java_package = "org.apache.spark.connect.proto"; diff --git a/python/pyspark/sql/connect/proto/relations_pb2.py b/python/pyspark/sql/connect/proto/relations_pb2.py index 467d0610bbc6..5bf3901ee545 100644 --- a/python/pyspark/sql/connect/proto/relations_pb2.py +++ b/python/pyspark/sql/connect/proto/relations_pb2.py @@ -32,11 +32,10 @@ from google.protobuf import any_pb2 as google_dot_protobuf_dot_any__pb2 from pyspark.sql.connect.proto import expressions_pb2 as spark_dot_connect_dot_expressions__pb2 from pyspark.sql.connect.proto import types_pb2 as spark_dot_connect_dot_types__pb2 from pyspark.sql.connect.proto import catalog_pb2 as spark_dot_connect_dot_catalog__pb2 -from pyspark.sql.connect.proto import common_pb2 as spark_dot_connect_dot_common__pb2 DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile( - b'\n\x1dspark/connect/relations.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fspark/connect/expressions.proto\x1a\x19spark/connect/types.proto\x1a\x1bspark/connect/catalog.proto\x1a\x1aspark/connect/common.proto"\xe9\x1a\n\x08Relation\x12\x35\n\x06\x63ommon\x18\x01 \x01(\x0b\x32\x1d.spark.connect.RelationCommonR\x06\x63ommon\x12)\n\x04read\x18\x02 \x01(\x0b\x32\x13.spark.connect.ReadH\x00R\x04read\x12\x32\n\x07project\x18\x03 \x01(\x0b\x32\x16.spark.connect.Project [...] 
+ b'\n\x1dspark/connect/relations.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fspark/connect/expressions.proto\x1a\x19spark/connect/types.proto\x1a\x1bspark/connect/catalog.proto"\xe9\x1a\n\x08Relation\x12\x35\n\x06\x63ommon\x18\x01 \x01(\x0b\x32\x1d.spark.connect.RelationCommonR\x06\x63ommon\x12)\n\x04read\x18\x02 \x01(\x0b\x32\x13.spark.connect.ReadH\x00R\x04read\x12\x32\n\x07project\x18\x03 \x01(\x0b\x32\x16.spark.connect.ProjectH\x00R\x07project\x12/\n\x06\x66il [...] ) _builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals()) @@ -66,154 +65,154 @@ if _descriptor._USE_C_DESCRIPTORS == False: _WITHCOLUMNSRENAMED.fields_by_name["rename_columns_map"]._serialized_options = b"\030\001" _PARSE_OPTIONSENTRY._options = None _PARSE_OPTIONSENTRY._serialized_options = b"8\001" -_RELATION._serialized_start = 193 -_RELATION._serialized_end = 3626 -_UNKNOWN._serialized_start = 3628 -_UNKNOWN._serialized_end = 3637 -_RELATIONCOMMON._serialized_start = 3639 -_RELATIONCOMMON._serialized_end = 3730 -_SQL._serialized_start = 3733 -_SQL._serialized_end = 4211 -_SQL_ARGSENTRY._serialized_start = 40
(spark) branch master updated: [SPARK-47893][BUILD] Upgrade ASM to 9.7
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 51ca47da6c1a [SPARK-47893][BUILD] Upgrade ASM to 9.7 51ca47da6c1a is described below commit 51ca47da6c1ab9da8e68de0a0418a6a59457f7f8 Author: panbingkun AuthorDate: Thu Apr 18 07:48:12 2024 -0700 [SPARK-47893][BUILD] Upgrade ASM to 9.7 ### What changes were proposed in this pull request? This PR aims to upgrade ASM to 9.7. ### Why are the changes needed? xbean-asm9-shaded 4.25 upgrade to use `ASM 9.7` and `ASM 9.7` is for `Java 23`. https://asm.ow2.io/versions.html https://github.com/apache/geronimo-xbean/pull/40 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46110 from panbingkun/SPARK-47893. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 4 ++-- project/plugins.sbt | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 71a87a9f519d..45a4d499e513 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -272,7 +272,7 @@ transaction-api/1.1//transaction-api-1.1.jar txw2/3.0.2//txw2-3.0.2.jar univocity-parsers/2.9.1//univocity-parsers-2.9.1.jar wildfly-openssl/1.1.3.Final//wildfly-openssl-1.1.3.Final.jar -xbean-asm9-shaded/4.24//xbean-asm9-shaded-4.24.jar +xbean-asm9-shaded/4.25//xbean-asm9-shaded-4.25.jar xmlschema-core/2.3.1//xmlschema-core-2.3.1.jar xz/1.9//xz-1.9.jar zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar diff --git a/pom.xml b/pom.xml index e6b37610adb2..682365d9704a 100644 --- a/pom.xml +++ b/pom.xml @@ -118,7 +118,7 @@ 3.9.6 3.2.0 spark -9.6 +9.7 2.0.13 2.22.1 @@ -481,7 +481,7 @@ org.apache.xbean xbean-asm9-shaded -4.24 +4.25
(spark) tag v3.4.3 created (now 1eb558c3a6fb)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to tag v3.4.3 in repository https://gitbox.apache.org/repos/asf/spark.git at 1eb558c3a6fb (commit) No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68618 - /dev/spark/v3.4.3-rc2-bin/ /release/spark/spark-3.4.3/
Author: dongjoon Date: Thu Apr 18 08:09:41 2024 New Revision: 68618 Log: Release Apache Spark 3.4.3 Added: release/spark/spark-3.4.3/ - copied from r68617, dev/spark/v3.4.3-rc2-bin/ Removed: dev/spark/v3.4.3-rc2-bin/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eb8688c2b6ce [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final` eb8688c2b6ce is described below commit eb8688c2b6cebb319511ca3102fc0f933adbafa2 Author: panbingkun AuthorDate: Wed Apr 17 23:09:06 2024 -0700 [SPARK-47896][BUILD] Upgrade netty to `4.1.109.Final` ### What changes were proposed in this pull request? The pr aims to upgrade `netty` from `4.1.108.Final` to `4.1.109.Final`. ### Why are the changes needed? https://netty.io/news/2024/04/15/4-1-109-Final.html This version has brought some bug fixes and improvements, such as: - Fix DefaultChannelId#asLongText NPE ([#13971](https://github.com/netty/netty/pull/13971)) - Rewrite ZstdDecoder to remove the need of allocate a huge byte[] internally ([#13928](https://github.com/netty/netty/pull/13928)) - Don't send a RST frame when closing the stream in a write future while processing inbound frames ([#13973](https://github.com/netty/netty/pull/13973)) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46112 from panbingkun/netty_for_spark4. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 38 +-- pom.xml | 2 +- 2 files changed, 20 insertions(+), 20 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 54e54a108904..71a87a9f519d 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -198,16 +198,16 @@ metrics-jmx/4.2.25//metrics-jmx-4.2.25.jar metrics-json/4.2.25//metrics-json-4.2.25.jar metrics-jvm/4.2.25//metrics-jvm-4.2.25.jar minlog/1.3.0//minlog-1.3.0.jar -netty-all/4.1.108.Final//netty-all-4.1.108.Final.jar -netty-buffer/4.1.108.Final//netty-buffer-4.1.108.Final.jar -netty-codec-http/4.1.108.Final//netty-codec-http-4.1.108.Final.jar -netty-codec-http2/4.1.108.Final//netty-codec-http2-4.1.108.Final.jar -netty-codec-socks/4.1.108.Final//netty-codec-socks-4.1.108.Final.jar -netty-codec/4.1.108.Final//netty-codec-4.1.108.Final.jar -netty-common/4.1.108.Final//netty-common-4.1.108.Final.jar -netty-handler-proxy/4.1.108.Final//netty-handler-proxy-4.1.108.Final.jar -netty-handler/4.1.108.Final//netty-handler-4.1.108.Final.jar -netty-resolver/4.1.108.Final//netty-resolver-4.1.108.Final.jar +netty-all/4.1.109.Final//netty-all-4.1.109.Final.jar +netty-buffer/4.1.109.Final//netty-buffer-4.1.109.Final.jar +netty-codec-http/4.1.109.Final//netty-codec-http-4.1.109.Final.jar +netty-codec-http2/4.1.109.Final//netty-codec-http2-4.1.109.Final.jar +netty-codec-socks/4.1.109.Final//netty-codec-socks-4.1.109.Final.jar +netty-codec/4.1.109.Final//netty-codec-4.1.109.Final.jar +netty-common/4.1.109.Final//netty-common-4.1.109.Final.jar +netty-handler-proxy/4.1.109.Final//netty-handler-proxy-4.1.109.Final.jar +netty-handler/4.1.109.Final//netty-handler-4.1.109.Final.jar +netty-resolver/4.1.109.Final//netty-resolver-4.1.109.Final.jar netty-tcnative-boringssl-static/2.0.61.Final//netty-tcnative-boringssl-static-2.0.61.Final.jar netty-tcnative-boringssl-static/2.0.65.Final/linux-aarch_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-aarch_64.jar 
netty-tcnative-boringssl-static/2.0.65.Final/linux-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-linux-x86_64.jar @@ -215,15 +215,15 @@ netty-tcnative-boringssl-static/2.0.65.Final/osx-aarch_64/netty-tcnative-borings netty-tcnative-boringssl-static/2.0.65.Final/osx-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-osx-x86_64.jar netty-tcnative-boringssl-static/2.0.65.Final/windows-x86_64/netty-tcnative-boringssl-static-2.0.65.Final-windows-x86_64.jar netty-tcnative-classes/2.0.65.Final//netty-tcnative-classes-2.0.65.Final.jar -netty-transport-classes-epoll/4.1.108.Final//netty-transport-classes-epoll-4.1.108.Final.jar -netty-transport-classes-kqueue/4.1.108.Final//netty-transport-classes-kqueue-4.1.108.Final.jar -netty-transport-native-epoll/4.1.108.Final/linux-aarch_64/netty-transport-native-epoll-4.1.108.Final-linux-aarch_64.jar -netty-transport-native-epoll/4.1.108.Final/linux-riscv64/netty-transport-native-epoll-4.1.108.Final-linux-riscv64.jar -netty-transport-native-epoll/4.1.108.Final/linux-x86_64/netty-transport-native-epoll-4.1.108.Final-linux-x86_64.jar -netty-transport-native-kqueue/4.1.108.Final/osx-aarch_64/netty-transport-native-kqueue-4.1.108.Final-osx-aarch_64.jar -netty-transport-native-kqueue/4.1.108.Final/osx-x86_64/netty-transport-native-kqueue-4.1.108.Final-osx-x86_64.jar -netty-transport
(spark) branch master updated: [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 47d783bc6489 [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly 47d783bc6489 is described below commit 47d783bc64897c85294a32d5ea2ca0ec8a655ea7 Author: Kent Yao AuthorDate: Wed Apr 17 20:34:16 2024 -0700 [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly ### What changes were proposed in this pull request? createTableColumnTypes contains Spark SQL data type definitions. The underlying database might not recognize them, boolean for Oracle(v < 23c). ### Why are the changes needed? bugfix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? no Closes #46093 from yaooqinn/SPARK-47882. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/execution/datasources/jdbc/JdbcUtils.scala | 14 -- .../test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 12 ++-- .../scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala | 8 +--- 3 files changed, 23 insertions(+), 11 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index fd7be9d0ea41..c541ec16fc82 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -878,16 +878,15 @@ object JdbcUtils extends Logging with SQLConfHelper { * Compute the schema string for this RDD. */ def schemaString( + dialect: JdbcDialect, schema: StructType, caseSensitive: Boolean, - url: String, createTableColumnTypes: Option[String] = None): String = { val sb = new StringBuilder() -val dialect = JdbcDialects.get(url) val userSpecifiedColTypesMap = createTableColumnTypes - .map(parseUserSpecifiedCreateTableColumnTypes(schema, caseSensitive, _)) + .map(parseUserSpecifiedCreateTableColumnTypes(dialect, schema, caseSensitive, _)) .getOrElse(Map.empty[String, String]) -schema.fields.foreach { field => +schema.foreach { field => val name = dialect.quoteIdentifier(field.name) val typ = userSpecifiedColTypesMap .getOrElse(field.name, getJdbcType(field.dataType, dialect).databaseTypeDefinition) @@ -903,6 +902,7 @@ object JdbcUtils extends Logging with SQLConfHelper { * use in-place of the default data type. 
*/ private def parseUserSpecifiedCreateTableColumnTypes( + dialect: JdbcDialect, schema: StructType, caseSensitive: Boolean, createTableColumnTypes: String): Map[String, String] = { @@ -919,7 +919,9 @@ object JdbcUtils extends Logging with SQLConfHelper { } } -val userSchemaMap = userSchema.fields.map(f => f.name -> f.dataType.catalogString).toMap +val userSchemaMap = userSchema + .map(f => f.name -> getJdbcType(f.dataType, dialect).databaseTypeDefinition) + .toMap if (caseSensitive) userSchemaMap else CaseInsensitiveMap(userSchemaMap) } @@ -988,7 +990,7 @@ object JdbcUtils extends Logging with SQLConfHelper { val statement = conn.createStatement val dialect = JdbcDialects.get(options.url) val strSchema = schemaString( - schema, caseSensitive, options.url, options.createTableColumnTypes) + dialect, schema, caseSensitive, options.createTableColumnTypes) try { statement.setQueryTimeout(options.queryTimeout) dialect.createTable(statement, tableName, strSchema, options) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index 5915a44b7954..34c554f7d37e 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1372,9 +1372,9 @@ class JDBCSuite extends QueryTest with SharedSparkSession { test("SPARK-16387: Reserved SQL words are not escaped by JDBC writer") { val df = spark.createDataset(Seq("a", "b", "c")).toDF("order") val schema = JdbcUtils.schemaString( + JdbcDialects.get("jdbc:mysql://localhost:3306/temp"), df.schema, - df.sparkSession.sessionState.conf.caseSensitiveAnalysis, - "jdbc:mysql://localhost:3306/temp") + df.sparkSes
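Editorial note: for context, here is a minimal sketch of where `createTableColumnTypes` enters the picture; the JDBC URL, table name and credentials below are placeholders invented for illustration and are not from the commit. The user supplies Spark SQL type names in the option, and with this fix they are translated through the target dialect's JDBC type mapping before the CREATE TABLE statement is built, so a type like BOOLEAN no longer reaches an Oracle version that cannot parse it.

```
// Hypothetical usage sketch; connection details are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("createTableColumnTypes-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, true), (2, false)).toDF("id", "flag")

df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//db-host:1521/service")   // placeholder URL
  .option("dbtable", "demo_flags")                             // placeholder table
  .option("user", "demo").option("password", "demo")           // placeholder credentials
  // Spark SQL type names; after SPARK-47882 these are mapped to the dialect's
  // database types (instead of being emitted verbatim) when the table is created.
  .option("createTableColumnTypes", "id INT, flag BOOLEAN")
  .save()
```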
(spark) branch master updated: [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24b4581fa818 [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI 24b4581fa818 is described below commit 24b4581fa818da89a5aff57437addcece707e678 Author: Dongjoon Hyun AuthorDate: Wed Apr 17 20:29:54 2024 -0700 [SPARK-47894][CORE][WEBUI] Add `Environment` page to Master UI ### What changes were proposed in this pull request? This PR aims to add `Environment` page to `Spark Master UI`. ### Why are the changes needed? To improve `Spark Standalone` cluster UX by providing `Spark Master` JVM's information - `Runtime Information` - `Spark Properties` - `Hadoop Properties` - `System Properties` - `Metrics Properties` - `Classpath Entries` https://github.com/apache/spark/assets/9700541/2b02abbd-e08f-4b0f-834a-160ea6fd00c7;> https://github.com/apache/spark/assets/9700541/664d113a-b677-41a7-9e8c-841e087aae1d;> ### Does this PR introduce _any_ user-facing change? Yes, but this is a new UI. ### How was this patch tested? Pass the CIs with the newly added test case. Or manual check the UI after running `Master`. ``` $ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true -Dspark.deploy.maxDrivers=2" sbin/start-master.sh ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46111 from dongjoon-hyun/SPARK-47894. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../spark/deploy/master/ui/EnvironmentPage.scala | 141 + .../apache/spark/deploy/master/ui/MasterPage.scala | 5 +- .../spark/deploy/master/ui/MasterWebUI.scala | 5 + .../master/ui/ReadOnlyMasterWebUISuite.scala | 14 +- 4 files changed, 162 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala new file mode 100644 index ..190e821524ba --- /dev/null +++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/EnvironmentPage.scala @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.deploy.master.ui + +import scala.xml.Node + +import jakarta.servlet.http.HttpServletRequest + +import org.apache.spark.{SparkConf, SparkEnv} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.ui._ +import org.apache.spark.util.Utils + +private[ui] class EnvironmentPage( +parent: MasterWebUI, +conf: SparkConf) extends WebUIPage("Environment") { + + def render(request: HttpServletRequest): Seq[Node] = { +val details = SparkEnv.environmentDetails(conf, SparkHadoopUtil.get.newConfiguration(conf), + "", Seq.empty, Seq.empty, Seq.empty, Map.empty) +val jvmInformation = details("JVM Information").sorted +val sparkProperties = Utils.redact(conf, details("Spark Properties")).sorted +val hadoopProperties = Utils.redact(conf, details("Hadoop Properties")).sorted +val systemProperties = Utils.redact(conf, details("System Properties")).sorted +val metricsProperties = Utils.redact(conf, details("Metrics Properties")).sorted +val classpathEntries = details("Classpath Entries").sorted + +val runtimeInformationTable = UIUtils.listingTable(propertyHeader, propertyRow, + jvmInformation, fixedWidth = true, headerClasses = headerClasses) +val sparkPropertiesTable = UIUtils.listingTable(propertyHeader, propertyRow, + sparkProperties, fixedWidth = true, headerClasses = headerClasses) +val hadoopPropertiesTable = UIUtils.listingTable(propertyHeader, propertyRow, + hadoopProperties, fixedWidth = true, headerClasses = headerClasses) +val systemPropertiesTable = UIUtils.listingTable(propertyHeader, pro
(spark) branch master updated: [SPARK-47726][DOC] Document push-based shuffle metrics
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be6eef97a81c [SPARK-47726][DOC] Document push-based shuffle metrics be6eef97a81c is described below commit be6eef97a81c147272d5bee09afc5d423586762f Author: Luca Canali AuthorDate: Wed Apr 17 09:35:05 2024 -0700 [SPARK-47726][DOC] Document push-based shuffle metrics ### What changes were proposed in this pull request? This adds documentation for the push-based shuffle metrics ### Why are the changes needed? The push-based shuffle metrics are currently not documented ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? No Closes #45872 from LucaCanali/documentPushBasedShuffle. Authored-by: Luca Canali Signed-off-by: Dongjoon Hyun --- docs/monitoring.md | 11 +++ 1 file changed, 11 insertions(+) diff --git a/docs/monitoring.md b/docs/monitoring.md index 5e11d5aef81e..a008b71c3fe9 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -1301,6 +1301,17 @@ These metrics are exposed by Spark executors. - shuffleRemoteBytesReadToDisk.count - shuffleTotalBytesRead.count - shuffleWriteTime.count + - Metrics related to push-based shuffle: +- shuffleCorruptMergedBlockChunks +- shuffleMergedFetchFallbackCount +- shuffleMergedRemoteBlocksFetched +- shuffleMergedLocalBlocksFetched +- shuffleMergedRemoteChunksFetched +- shuffleMergedLocalChunksFetched +- shuffleMergedRemoteBytesRead +- shuffleMergedLocalBytesRead +- shuffleRemoteReqsDuration +- shuffleMergedRemoteReqsDuration - succeededTasks.count - threadpool.activeTasks - threadpool.completeTasks - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
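Editorial note: the executor metrics documented above only report meaningful values when push-based shuffle is actually in use. As a rough, hedged sketch (server-side requirements such as an external shuffle service that supports merged shuffle are omitted here), the client-side switches look like the following Scala snippet, which is illustrative and not part of the commit:

```
// Minimal client-side sketch; push-based shuffle additionally requires an
// external shuffle service configured for merged shuffle (details omitted).
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("push-based-shuffle-demo")
  .set("spark.shuffle.service.enabled", "true")  // external shuffle service required
  .set("spark.shuffle.push.enabled", "true")     // enable block push on the client side
```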
(spark) branch master updated: [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 19833d92f325 [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values 19833d92f325 is described below commit 19833d92f3258ea2b4dcf803217e7a7334ecd927 Author: Kent Yao AuthorDate: Wed Apr 17 07:50:35 2024 -0700 [SPARK-47886][SQL][DOCS][TESTS] Postgres: Add tests and doc for Postgres special numeric values ### What changes were proposed in this pull request? This PR added tests and doc for Postgres special numeric values. Postgres supports special numeric values "NaN", "infinity", "-infinity" for both exact and inexact numbers, while we only support these for inexact ones. ### Why are the changes needed? test coverage and doc improvement ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new test and doc build ![image](https://github.com/apache/spark/assets/8326978/4e46be31-981d-4625-91f2-f81c4d40abed) ### Was this patch authored or co-authored using generative AI tooling? no Closes #46102 from yaooqinn/SPARK-47886. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 30 -- docs/sql-data-sources-jdbc.md | 2 +- 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala index 1cd8a77e8442..8c0a7c0a809f 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala @@ -18,12 +18,13 @@ package org.apache.spark.sql.jdbc import java.math.{BigDecimal => JBigDecimal} -import java.sql.{Connection, Date, Timestamp} +import java.sql.{Connection, Date, SQLException, Timestamp} import java.text.SimpleDateFormat import java.time.LocalDateTime import java.util.Properties -import org.apache.spark.sql.{Column, Row} +import org.apache.spark.SparkException +import org.apache.spark.sql.{Column, DataFrame, Row} import org.apache.spark.sql.catalyst.expressions.Literal import org.apache.spark.sql.types._ import org.apache.spark.tags.DockerTest @@ -554,4 +555,29 @@ class PostgresIntegrationSuite extends DockerJDBCIntegrationSuite { .option("query", "SELECT 1::oid, 'bar'::regclass, 'integer'::regtype").load() checkAnswer(df, Row(1, "bar", "integer")) } + + test("SPARK-47886: special number values") { +def toDF(qry: String): DataFrame = { + spark.read.format("jdbc") +.option("url", jdbcUrl) +.option("query", qry) +.load() +} +checkAnswer( + toDF("SELECT 'NaN'::float8 c1, 'infinity'::float8 c2, '-infinity'::float8 c3"), + Row(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity)) +checkAnswer( + toDF("SELECT 'NaN'::float4 c1, 'infinity'::float4 c2, '-infinity'::float4 c3"), + Row(Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity) +) + +Seq("NaN", "infinity", "-infinity").foreach { v => + val df = toDF(s"SELECT '$v'::numeric c1") + val e = intercept[SparkException](df.collect()) + checkError(e, null) + val cause = e.getCause.asInstanceOf[SQLException] + assert(cause.getMessage.contains("Bad value for type BigDecimal")) + assert(cause.getSQLState === 
"22003") +} + } } diff --git a/docs/sql-data-sources-jdbc.md b/docs/sql-data-sources-jdbc.md index ef7a07a82c5f..637efc24113e 100644 --- a/docs/sql-data-sources-jdbc.md +++ b/docs/sql-data-sources-jdbc.md @@ -845,7 +845,7 @@ as the activated JDBC Driver. Note that, different JDBC drivers, or different ve numeric, decimal DecimalType - Since PostgreSQL 15, 's' can be negative. If 's<0' it'll be adjusted to DecimalType(min(p-s, 38), 0); Otherwise, DecimalType(p, s), and if 'p>38', the fraction part will be truncated if exceeded. And if any value of this column have an actual precision greater 38 will fail with NUMERIC_VALUE_OUT_OF_RANGE.WITHOUT_SUGGESTION error + Since PostgreSQL 15, 's' can be negative. If 's<0' it'll be adjusted to DecimalType(min(p-s, 38), 0); Otherwise, DecimalType(p, s)If 'p>38', the fraction part will be truncated if exceeded. And
(spark) branch master updated (4e754f778fdc -> 6fb2f7c3772a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 4e754f778fdc [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type add 6fb2f7c3772a [SPARK-4][SQL] Use ANSI SQL mode by default No new revisions were added by this update. Summary of changes: docs/sql-migration-guide.md | 1 + sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
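Editorial note: the one-line default change above (`spark.sql.ansi.enabled` flipped to `true`) changes runtime semantics, not just parsing. A small illustrative Scala sketch, assuming a spark-shell session where `spark` is available (not part of the commit):

```
// Legacy (non-ANSI) behavior: invalid casts and overflows silently yield NULL.
spark.sql("SET spark.sql.ansi.enabled=false")
spark.sql("SELECT CAST('not a number' AS INT)").show()      // -> NULL

// New default: the same cast raises a runtime error (a CAST_INVALID_INPUT-style
// error class), so the failure is no longer silent.
spark.sql("SET spark.sql.ansi.enabled=true")
// spark.sql("SELECT CAST('not a number' AS INT)").show()   // would throw

// try_cast keeps NULL-on-error semantics explicitly under ANSI mode.
spark.sql("SELECT TRY_CAST('not a number' AS INT)").show()  // -> NULL
```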
(spark) branch master updated: [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ab6338e09aa0 [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 ab6338e09aa0 is described below commit ab6338e09aa0fe06aef1c753eaaf677f766e9490 Author: Neil Ramaswamy AuthorDate: Tue Apr 16 20:11:16 2024 -0700 [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 ### What changes were proposed in this pull request? Upgrades `rocksdbjni` dependency to 8.11.4. ### Why are the changes needed? 8.11.4 has Java-related RocksDB fixes: https://github.com/facebook/rocksdb/releases/tag/v8.11.4 - Fixed CMake Javadoc build - Fixed Java SstFileMetaData to prevent throwing java.lang.NoSuchMethodError ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - All existing UTs should pass - [In progress] Performance benchmarks ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46065 from neilramaswamy/spark-47838. Authored-by: Neil Ramaswamy Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml| 2 +- ...StoreBasicOperationsBenchmark-jdk21-results.txt | 122 +++-- .../StateStoreBasicOperationsBenchmark-results.txt | 122 +++-- 4 files changed, 126 insertions(+), 122 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 466e8d09d89e..54e54a108904 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -247,7 +247,7 @@ parquet-jackson/1.13.1//parquet-jackson-1.13.1.jar pickle/1.3//pickle-1.3.jar py4j/0.10.9.7//py4j-0.10.9.7.jar remotetea-oncrpc/1.1.2//remotetea-oncrpc-1.1.2.jar -rocksdbjni/8.11.3//rocksdbjni-8.11.3.jar +rocksdbjni/8.11.4//rocksdbjni-8.11.4.jar scala-collection-compat_2.13/2.7.0//scala-collection-compat_2.13-2.7.0.jar scala-compiler/2.13.13//scala-compiler-2.13.13.jar scala-library/2.13.13//scala-library-2.13.13.jar diff --git a/pom.xml b/pom.xml index bf8d4f1b417d..7ded74b9f9df 100644 --- a/pom.xml +++ b/pom.xml @@ -687,7 +687,7 @@ org.rocksdb rocksdbjni -8.11.3 +8.11.4 ${leveldbjni.group} diff --git a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt index 0317e6116375..953031fc1daf 100644 --- a/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt +++ b/sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk21-results.txt @@ -2,141 +2,143 @@ put rows -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core Processor putting 1 rows (1 rows to overwrite - rate 100): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative --- -In-memory9 10 1 1.1 936.2 1.0X -RocksDB (trackTotalNumberOfRows: true) 41 42 1 0.24068.9 0.2X -RocksDB (trackTotalNumberOfRows: false) 15 16 1 0.71500.4 0.6X +In-memory9 10 1 1.1 938.9 1.0X +RocksDB (trackTotalNumberOfRows: true) 42 44 2 0.24215.2 0.2X +RocksDB (trackTotalNumberOfRows: false) 15 16 1 0.71535.3 0.6X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1017-azure AMD EPYC 7763 64-Core Processor putting 1 rows (5000 rows to overwrite - rate 50): Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - -In-memory 9 11 1 1.1 
929.8 1.0X -RocksDB (trackTotalNumberOfRows: true)40
(spark) branch master updated: [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer`
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5321353b24db [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` 5321353b24db is described below commit 5321353b24db247087890c44de06b9ad4e136473 Author: Dongjoon Hyun AuthorDate: Tue Apr 16 16:47:23 2024 -0700 [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` ### What changes were proposed in this pull request? This is a logical revert of SPARK-46205 - #44113 - #44118 ### Why are the changes needed? The initial implementation didn't handle the class initialization logic properly. Until we have a fix, I'd like to revert this from `master` branch. ### Does this PR introduce _any_ user-facing change? No, this is not released yet. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46087 from dongjoon-hyun/SPARK-47875. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../PersistenceEngineBenchmark-jdk21-results.txt | 7 -- .../PersistenceEngineBenchmark-results.txt | 7 -- .../org/apache/spark/deploy/master/Master.scala| 7 ++ .../org/apache/spark/internal/config/Deploy.scala | 14 .../deploy/master/PersistenceEngineBenchmark.scala | 4 ++-- .../deploy/master/PersistenceEngineSuite.scala | 14 +--- .../apache/spark/deploy/master/RecoverySuite.scala | 25 ++ docs/spark-standalone.md | 12 ++- 8 files changed, 9 insertions(+), 81 deletions(-) diff --git a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt index 2a6bd778fc8a..ae4e0071adb0 100644 --- a/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt +++ b/core/benchmarks/PersistenceEngineBenchmark-jdk21-results.txt @@ -7,19 +7,12 @@ AMD EPYC 7763 64-Core Processor 1000 Workers: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative ZooKeeperPersistenceEngine with JavaSerializer 5036 5232 229 0.0 5035730.1 1.0X -ZooKeeperPersistenceEngine with KryoSerializer 4038 4053 16 0.0 4038447.8 1.2X FileSystemPersistenceEngine with JavaSerializer2902 2906 5 0.0 2902453.3 1.7X FileSystemPersistenceEngine with JavaSerializer (lz4) 816 829 19 0.0 816173.1 6.2X FileSystemPersistenceEngine with JavaSerializer (lzf) 755 780 33 0.0 755209.0 6.7X FileSystemPersistenceEngine with JavaSerializer (snappy)814 832 16 0.0 813672.5 6.2X FileSystemPersistenceEngine with JavaSerializer (zstd) 987 1014 45 0.0 986834.7 5.1X -FileSystemPersistenceEngine with KryoSerializer 687 698 14 0.0 687313.5 7.3X -FileSystemPersistenceEngine with KryoSerializer (lz4) 590 599 15 0.0 589867.9 8.5X -FileSystemPersistenceEngine with KryoSerializer (lzf) 915 922 9 0.0 915432.2 5.5X -FileSystemPersistenceEngine with KryoSerializer (snappy)768 795 37 0.0 768494.4 6.6X -FileSystemPersistenceEngine with KryoSerializer (zstd) 898 950 45 0.0 898118.6 5.6X RocksDBPersistenceEngine with JavaSerializer299 299 0 0.0 298800.0 16.9X -RocksDBPersistenceEngine with KryoSerializer112 113 1 0.0 111779.6 45.1X BlackHolePersistenceEngine0 0 0 5.5 180.3 27924.2X diff --git a/core/benchmarks/PersistenceEngineBenchmark-results.txt b/core/benchmarks/PersistenceEngineBenchmark-results.txt index da1838608de1..ec9a6fc1c8cf 100644 --- a/core/benchmarks/PersistenceEngineBenchmark-results.txt +++ b/core/benchmarks/PersistenceEngineBenchmark-results.txt @@ -7,19 +7,12 @@ AMD EPYC 7763 
64-Core Processor 1000 Workers: Best
(spark) branch master updated: [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a1fc112677f [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE 9a1fc112677f is described below commit 9a1fc112677f98089d946b3bf4f52b33ab0a5c23 Author: Kent Yao AuthorDate: Tue Apr 16 08:35:51 2024 -0700 [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE ### What changes were proposed in this pull request? This PR map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE ### Why are the changes needed? We currently map both TimestampType and TimestampNTZType to Oracle's TIMESTAMP which represents a timestamp without time zone. This is ambiguous ### Does this PR introduce _any_ user-facing change? It does not affect spark users to play a TimestampType read-write-read roundtrip, but might affect other systems' reading ### How was this patch tested? existing test with new configuration ```java SPARK-42627: Support ORACLE TIMESTAMP WITH LOCAL TIME ZONE (9 seconds, 536 milliseconds) ``` ### Was this patch authored or co-authored using generative AI tooling? no Closes #46080 from yaooqinn/SPARK-47871. Authored-by: Kent Yao Signed-off-by: Dongjoon Hyun --- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 39 -- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/internal/SQLConf.scala| 12 +++ .../org/apache/spark/sql/jdbc/OracleDialect.scala | 5 ++- .../org/apache/spark/sql/jdbc/JDBCSuite.scala | 5 ++- 5 files changed, 43 insertions(+), 19 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala index 418b86fb6b23..496498e5455b 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala @@ -547,23 +547,28 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSpark } test("SPARK-42627: Support ORACLE TIMESTAMP WITH LOCAL TIME ZONE") { -val reader = spark.read.format("jdbc") - .option("url", jdbcUrl) - .option("dbtable", "test_ltz") -val df = reader.load() -val row1 = df.collect().head.getTimestamp(0) -assert(df.count() === 1) -assert(row1 === Timestamp.valueOf("2018-11-17 13:33:33")) - -df.write.format("jdbc") - .option("url", jdbcUrl) - .option("dbtable", "test_ltz") - .mode("append") - .save() - -val df2 = reader.load() -assert(df.count() === 2) -assert(df2.collect().forall(_.getTimestamp(0) === row1)) +Seq("true", "false").foreach { flag => + withSQLConf((SQLConf.LEGACY_ORACLE_TIMESTAMP_MAPPING_ENABLED.key, flag)) { +val df = spark.read.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz") + .load() +val row1 = df.collect().head.getTimestamp(0) +assert(df.count() === 1) +assert(row1 === Timestamp.valueOf("2018-11-17 13:33:33")) + +df.write.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz" + flag) + .save() + +val df2 = spark.read.format("jdbc") + .option("url", jdbcUrl) + .option("dbtable", "test_ltz" + flag) + .load() +checkAnswer(df2, Row(row1)) + } +} } test("SPARK-47761: Reading ANSI INTERVAL Types") { diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 
c7bd0b55840c..3004008b8ec7 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -45,6 +45,7 @@ license: | - Since Spark 4.0, MySQL JDBC datasource will read FLOAT as FloatType, while in Spark 3.5 and previous, it was read as DoubleType. To restore the previous behavior, you can cast the column to the old type. - Since Spark 4.0, MySQL JDBC datasource will read BIT(n > 1) as BinaryType, while in Spark 3.5 and previous, read as LongType. To restore the previous behavior, set `spark.sql.legacy.mysql.bitArrayMapping.enabled` to `true`. - Since Spark 4.0, MySQL JDBC datasource will write ShortType as SMALLINT, while in Spark 3.5 and previous, write as INTEGER. To restore the pre
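To illustrate the behavior change described in SPARK-47871 above, the following is a minimal round-trip sketch (not from the patch) that writes a `TimestampType` column to Oracle over JDBC. The JDBC URL and table name are placeholders, credentials are omitted, and an Oracle JDBC driver is assumed to be on the classpath; with the new mapping the column is created as `TIMESTAMP WITH LOCAL TIME ZONE`, and the legacy mapping can be restored via the `SQLConf.LEGACY_ORACLE_TIMESTAMP_MAPPING_ENABLED` flag referenced in the test diff.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

object OracleTimestampRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Placeholder connection details; substitute a reachable Oracle instance and credentials.
    val jdbcUrl = "jdbc:oracle:thin:@//localhost:1521/freepdb1"

    val df = Seq(Timestamp.valueOf("2018-11-17 13:33:33")).toDF("ts")

    // With SPARK-47871, the "ts" column is created as TIMESTAMP WITH LOCAL TIME ZONE on the Oracle side.
    df.write.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "ts_ltz_demo")
      .mode("append")
      .save()

    // Reading back should yield the same instant within Spark.
    spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "ts_ltz_demo")
      .load()
      .show(truncate = false)

    spark.stop()
  }
}
```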
(spark-kubernetes-operator) branch main updated: [SPARK-47745] Add License to Spark Operator repository
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new 7a3a7e8 [SPARK-47745] Add License to Spark Operator repository 7a3a7e8 is described below commit 7a3a7e882af2c8e8d463ebed71329212133d229c Author: zhou-jiang AuthorDate: Tue Apr 16 08:08:26 2024 -0700 [SPARK-47745] Add License to Spark Operator repository ### What changes were proposed in this pull request? This PR aims to add ASF license file. ### Why are the changes needed? To receive a code contribution. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #3 from jiangzho/license. Authored-by: zhou-jiang Signed-off-by: Dongjoon Hyun --- LICENSE | 201 1 file changed, 201 insertions(+) diff --git a/LICENSE b/LICENSE new file mode 100644 index 000..261eeb9 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 +http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated with
(spark-kubernetes-operator) branch main updated: Update GITHUB_API_BASE
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git The following commit(s) were added to refs/heads/main by this push: new a8eb690 Update GITHUB_API_BASE a8eb690 is described below commit a8eb690a7a85fd2b580e3756fad8d2bcf306e12c Author: Dongjoon Hyun AuthorDate: Tue Apr 16 08:06:10 2024 -0700 Update GITHUB_API_BASE --- dev/merge_spark_pr.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py index 4647383..24e956d 100755 --- a/dev/merge_spark_pr.py +++ b/dev/merge_spark_pr.py @@ -65,7 +65,7 @@ GITHUB_OAUTH_KEY = os.environ.get("GITHUB_OAUTH_KEY") GITHUB_BASE = "https://github.com/apache/spark-kubernetes-operator/pull" -GITHUB_API_BASE = "https://api.github.com/repos/spark-kubernetes-operator" +GITHUB_API_BASE = "https://api.github.com/repos/apache/spark-kubernetes-operator" JIRA_BASE = "https://issues.apache.org/jira/browse" JIRA_API_BASE = "https://issues.apache.org/jira" # Prefix added to temporary branches - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47739][SQL] Register logical avro type
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fa2e9c7275aa [SPARK-47739][SQL] Register logical avro type fa2e9c7275aa is described below commit fa2e9c7275aa1c09652d0df0992565c32974b2b9 Author: milastdbx AuthorDate: Tue Apr 16 03:38:19 2024 -0700 [SPARK-47739][SQL] Register logical avro type ### What changes were proposed in this pull request? In this pull request I propose that we register logical avro types when we initialize `AvroUtils` and `AvroFileFormat`, otherwise for first schema discovery we might get wrong result on very first execution after spark starts. https://github.com/apache/spark/assets/150366084/3eaba6e3-34ec-4ca9-ae89-d0259ce942ba;> example ```scala val new_schema = """ | { | "type": "record", | "name": "Entry", | "fields": [ | { | "name": "rate", | "type": [ | "null", | { | "type": "long", | "logicalType": "custom-decimal", | "precision": 38, | "scale": 9 | } | ], | "default": null | } | ] | }""".stripMargin spark.read.format("avro").option("avroSchema", new_schema).load().printSchema // maps to long - WRONG spark.read.format("avro").option("avroSchema", new_schema).load().printSchema // maps to Decimal - CORRECT ``` ### Why are the changes needed? To fix issue with resolving avro schema upon spark startup. ### Does this PR introduce _any_ user-facing change? No, its a bugfix ### How was this patch tested? Unit tests ### Was this patch authored or co-authored using generative AI tooling? No Closes #45895 from milastdbx/dev/milast/fixAvroLogicalTypeRegistration. Lead-authored-by: milastdbx Co-authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/sql/avro/AvroFileFormat.scala | 21 -- .../spark/sql/avro/AvroLogicalTypeInitSuite.scala | 76 ++ 2 files changed, 91 insertions(+), 6 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala index 2792edaea284..372f24b54f5c 100755 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala @@ -43,6 +43,8 @@ import org.apache.spark.util.SerializableConfiguration private[sql] class AvroFileFormat extends FileFormat with DataSourceRegister with Logging with Serializable { + AvroFileFormat.registerCustomAvroTypes() + override def equals(other: Any): Boolean = other match { case _: AvroFileFormat => true case _ => false @@ -173,10 +175,17 @@ private[sql] class AvroFileFormat extends FileFormat private[avro] object AvroFileFormat { val IgnoreFilesWithoutExtensionProperty = "avro.mapred.ignore.inputs.without.extension" - // Register the customized decimal type backed by long. - LogicalTypes.register(CustomDecimal.TYPE_NAME, new LogicalTypes.LogicalTypeFactory { -override def fromSchema(schema: Schema): LogicalType = { - new CustomDecimal(schema) -} - }) + /** + * Register Spark defined custom Avro types. + */ + def registerCustomAvroTypes(): Unit = { +// Register the customized decimal type backed by long. 
+LogicalTypes.register(CustomDecimal.TYPE_NAME, new LogicalTypes.LogicalTypeFactory { + override def fromSchema(schema: Schema): LogicalType = { +new CustomDecimal(schema) + } +}) + } + + registerCustomAvroTypes() } diff --git a/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala new file mode 100644 index ..126440ed69b8 --- /dev/null +++ b/connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeInitSuite.scala @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file exc
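The essence of the SPARK-47739 fix above is that a custom Avro logical type must be registered before the first schema is parsed, otherwise the first read silently falls back to the backing primitive type. As a standalone illustration of the registration pattern (using a hypothetical logical type name, not Spark's `CustomDecimal`), the Avro API usage looks roughly like this:

```scala
import org.apache.avro.{LogicalType, LogicalTypes, Schema}

// A hypothetical logical type backed by Avro's long primitive.
class MyLongBackedType(schema: Schema) extends LogicalType("my-long-backed") {
  override def validate(schema: Schema): Unit = {
    super.validate(schema)
    if (schema.getType != Schema.Type.LONG) {
      throw new IllegalArgumentException("my-long-backed must annotate a long schema")
    }
  }
}

object MyLongBackedType {
  // Register eagerly (e.g. from a class initializer), so that the very first
  // schema parse already resolves the logical type instead of returning long.
  def register(): Unit = {
    LogicalTypes.register("my-long-backed", new LogicalTypes.LogicalTypeFactory {
      override def fromSchema(schema: Schema): LogicalType = new MyLongBackedType(schema)
    })
  }
}
```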
(spark) branch master updated: [SPARK-46574][BUILD] Upgrade maven plugin to latest version
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b7a729bfd19c [SPARK-46574][BUILD] Upgrade maven plugin to latest version b7a729bfd19c is described below commit b7a729bfd19cfa7a06d208f3899d329e414d5598 Author: panbingkun AuthorDate: Tue Apr 16 03:30:12 2024 -0700 [SPARK-46574][BUILD] Upgrade maven plugin to latest version ### What changes were proposed in this pull request? ### Why are the changes needed? - `exec-maven-plugin` from `3.1.0` to `3.2.0` https://github.com/mojohaus/exec-maven-plugin/releases/tag/3.2.0 https://github.com/mojohaus/exec-maven-plugin/releases/tag/3.1.1 Bug Fixes: 1.Fix https://github.com/mojohaus/exec-maven-plugin/issues/158 - Fix non ascii character handling (https://github.com/mojohaus/exec-maven-plugin/pull/372) 2.[https://github.com/mojohaus/exec-maven-plugin/issues/323] exec arguments missing (https://github.com/mojohaus/exec-maven-plugin/pull/324) - `build-helper-maven-plugin` from `3.4.0` to `3.5.0` https://github.com/mojohaus/build-helper-maven-plugin/releases/tag/3.5.0 - `maven-compiler-plugin` from `3.12.1` to `3.13.0` https://github.com/apache/maven-compiler-plugin/releases/tag/maven-compiler-plugin-3.13.0 - `maven-jar-plugin` from `3.3.0` to `3.4.0` https://github.com/apache/maven-jar-plugin/releases/tag/maven-jar-plugin-3.4.0 [[MJAR-62]](https://issues.apache.org/jira/browse/MJAR-62) - Set Build-Jdk according to used toolchain (https://github.com/apache/maven-jar-plugin/pull/73) - `maven-source-plugin` from `3.3.0` to `3.3.1` https://github.com/apache/maven-source-plugin/releases/tag/maven-source-plugin-3.3.1 - `maven-assembly-plugin` from `3.6.0` to `3.7.1` https://github.com/apache/maven-assembly-plugin/releases/tag/maven-assembly-plugin-3.7.1 https://github.com/apache/maven-assembly-plugin/releases/tag/maven-assembly-plugin-3.7.0 Bug Fixes: 1.[[MASSEMBLY-967](https://issues.apache.org/jira/browse/MASSEMBLY-967)] - maven-assembly-plugin doesn't add target/class artifacts in generated jarfat but META-INF/MANIFEST.MF seems to be correct 2.[[MASSEMBLY-994](https://issues.apache.org/jira/browse/MASSEMBLY-994)] - Items from unpacked dependency are not refreshed 3.[[MASSEMBLY-998](https://issues.apache.org/jira/browse/MASSEMBLY-998)] - Transitive dependencies are not properly excluded as of 3.1.1 4.[[MASSEMBLY-1008](https://issues.apache.org/jira/browse/MASSEMBLY-1008)] - Assembly plugin handles scopes wrongly 5.[[MASSEMBLY-1020](https://issues.apache.org/jira/browse/MASSEMBLY-1020)] - Cannot invoke "java.io.File.isFile()" because "this.inputFile" is null 6.[[MASSEMBLY-1021](https://issues.apache.org/jira/browse/MASSEMBLY-1021)] - Nullpointer in assembly:single when upgrading to 3.7.0 7.[[MASSEMBLY-1022](https://issues.apache.org/jira/browse/MASSEMBLY-1022)] - Unresolved artifacts should be not processed - `cyclonedx-maven-plugin` from `2.7.9` to `2.8.0` https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.8.0 https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.11 https://github.com/CycloneDX/cyclonedx-maven-plugin/releases/tag/cyclonedx-maven-plugin-2.7.10 Bug Fixes: 1.check if configured schemaVersion is supported (https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/479) 2.ignore bomGenerator.generate() call (https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/376) ### Does 
this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46043 from panbingkun/update_maven_plugins. Authored-by: panbingkun Signed-off-by: Dongjoon Hyun --- pom.xml | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/pom.xml b/pom.xml index 99b238aac1dc..bf8d4f1b417d 100644 --- a/pom.xml +++ b/pom.xml @@ -116,7 +116,7 @@ 17 ${java.version} 3.9.6 -3.1.0 +3.2.0 spark 9.6 2.0.13 @@ -2994,7 +2994,7 @@ org.codehaus.mojo build-helper-maven-plugin - 3.4.0 + 3.5.0 module-timestamp-property @@ -3108,7 +3108,7 @@ org.apache.maven.plugins maven-compiler-plugin - 3.12.1 + 3.13.0 ${java.version} true @@ -3234,7 +3234,7 @@ org.apache.maven.plugins maven-jar-plugin - 3.3.0 + 3.4.0 org.apache.mave
(spark) branch branch-3.5 updated: [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.5 by this push: new d54f24cf3c3d [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 d54f24cf3c3d is described below commit d54f24cf3c3dc8107fc143d47f7c61edb3ebdc32 Author: Dongjoon Hyun AuthorDate: Mon Apr 15 20:39:32 2024 -0700 [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache Maven` to 3.9.6 for Apache Spark 3.5.2+ This is a backport of the following PR. `Apache Maven 3.9.6` has been used over 4 months in `master` branch. - #44267 ### Why are the changes needed? To bring the latest bug fixes, - https://maven.apache.org/docs/3.9.0/release-notes.html - https://maven.apache.org/docs/3.9.1/release-notes.html - https://maven.apache.org/docs/3.9.2/release-notes.html - https://maven.apache.org/docs/3.9.3/release-notes.html - https://maven.apache.org/docs/3.9.5/release-notes.html - https://maven.apache.org/docs/3.9.6/release-notes.html ### Does this PR introduce _any_ user-facing change? No because this is a build time change. ### How was this patch tested? Pass the CIs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46069 from dongjoon-hyun/SPARK-46335. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index 3737382eb86e..792a9aa4e979 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.8" +# $mavenVer = "3.9.6" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 33d253a49dbf..4f626b4ff58c 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. +Building Spark using Maven requires Maven 3.9.6 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index 34cbefbeb3f7..6bb764e0c28c 100644 --- a/pom.xml +++ b/pom.xml @@ -115,7 +115,7 @@ 1.8 ${java.version} ${java.version} -3.8.8 +3.9.6 3.1.0 spark 9.5 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated: [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3ff339362b75 [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 3ff339362b75 is described below commit 3ff339362b759d5aef46a7668cbdca1f72ba289e Author: Dongjoon Hyun AuthorDate: Mon Apr 15 20:37:43 2024 -0700 [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 ### What changes were proposed in this pull request? This PR aims to upgrade `slf4j` to 2.0.13. ### Why are the changes needed? To bring the following bug fix, - https://www.slf4j.org/news.html#2.0.13 - https://github.com/qos-ch/slf4j/issues/409 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass the Cis. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46067 from dongjoon-hyun/SPARK-47861. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 6 +++--- pom.xml | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 7f48a4327dba..466e8d09d89e 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -128,7 +128,7 @@ javax.servlet-api/4.0.1//javax.servlet-api-4.0.1.jar javolution/5.5.1//javolution-5.5.1.jar jaxb-api/2.2.11//jaxb-api-2.2.11.jar jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar -jcl-over-slf4j/2.0.12//jcl-over-slf4j-2.0.12.jar +jcl-over-slf4j/2.0.13//jcl-over-slf4j-2.0.13.jar jdo-api/3.0.1//jdo-api-3.0.1.jar jdom2/2.0.6//jdom2-2.0.6.jar jersey-client/3.0.12//jersey-client-3.0.12.jar @@ -154,7 +154,7 @@ json4s-jackson_2.13/4.0.7//json4s-jackson_2.13-4.0.7.jar json4s-scalap_2.13/4.0.7//json4s-scalap_2.13-4.0.7.jar jsr305/3.0.0//jsr305-3.0.0.jar jta/1.1//jta-1.1.jar -jul-to-slf4j/2.0.12//jul-to-slf4j-2.0.12.jar +jul-to-slf4j/2.0.13//jul-to-slf4j-2.0.13.jar kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar kubernetes-client-api/6.12.0//kubernetes-client-api-6.12.0.jar kubernetes-client/6.12.0//kubernetes-client-6.12.0.jar @@ -255,7 +255,7 @@ scala-parallel-collections_2.13/1.0.4//scala-parallel-collections_2.13-1.0.4.jar scala-parser-combinators_2.13/2.3.0//scala-parser-combinators_2.13-2.3.0.jar scala-reflect/2.13.13//scala-reflect-2.13.13.jar scala-xml_2.13/2.2.0//scala-xml_2.13-2.2.0.jar -slf4j-api/2.0.12//slf4j-api-2.0.12.jar +slf4j-api/2.0.13//slf4j-api-2.0.13.jar snakeyaml-engine/2.7//snakeyaml-engine-2.7.jar snakeyaml/2.2//snakeyaml-2.2.jar snappy-java/1.1.10.5//snappy-java-1.1.10.5.jar diff --git a/pom.xml b/pom.xml index fef2601c24db..99b238aac1dc 100644 --- a/pom.xml +++ b/pom.xml @@ -119,7 +119,7 @@ 3.1.0 spark 9.6 -2.0.12 +2.0.13 2.22.1 3.4.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: [SPARK-43394][BUILD] Upgrade maven to 3.8.8
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 1ee3f4e6bd79 [SPARK-43394][BUILD] Upgrade maven to 3.8.8 1ee3f4e6bd79 is described below commit 1ee3f4e6bd7974c238556c538e90dda10dc2e2b7 Author: Cheng Pan AuthorDate: Sun May 7 08:24:12 2023 -0500 [SPARK-43394][BUILD] Upgrade maven to 3.8.8 Upgrade Maven from 3.8.7 to 3.8.8. Maven 3.8.8 is the latest patched version of 3.8.x https://maven.apache.org/docs/3.8.8/release-notes.html No Pass GA. Closes #41073 from pan3793/SPARK-43394. Authored-by: Cheng Pan Signed-off-by: Sean Owen (cherry picked from commit 04ef3d5d0f2bfebce8dd3b48b9861a2aa5ba1c3a) Signed-off-by: Dongjoon Hyun --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index a369e9285a0f..88090149f5c0 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven # Push-Location $tools # -# $mavenVer = "3.8.6" +# $mavenVer = "3.8.8" # Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip; "maven.zip" # # # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index be1c9062c5e2..5704da9cec85 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.8.6 and Java 8. +Building Spark using Maven requires Maven 3.8.8 and Java 8/11/17. Spark requires Scala 2.12/2.13; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index 3c8d0260c4a8..282a46910902 100644 --- a/pom.xml +++ b/pom.xml @@ -113,7 +113,7 @@ 1.8 ${java.version} ${java.version} -3.8.6 +3.8.8 1.6.0 spark 2.0.6 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.5 updated (b8e2498007a0 -> 3b3903dda363)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.5 in repository https://gitbox.apache.org/repos/asf/spark.git from b8e2498007a0 [SPARK-47318][CORE][3.5] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices add 3b3903dda363 [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/plan.py | 8 python/pyspark/sql/tests/test_readwriter.py | 7 ++- 2 files changed, 10 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
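For reference, the `DataFrameWriterV2.overwrite` API exercised by the SPARK-47828 fix above looks like this from the Scala side. This is a minimal sketch only: `demo.t` is a placeholder table assumed to already exist in a V2 catalog that supports overwrite by filter.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object OverwriteV2Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Replace all existing rows matching the condition (here: everything) with the new data.
    spark.range(5).toDF("v")
      .writeTo("demo.t") // placeholder: a table in a catalog implementing overwrite-by-filter
      .overwrite(lit(true))

    spark.stop()
  }
}
```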
(spark) branch master updated (becbca6752a5 -> 6d1b3668db42)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from becbca6752a5 [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 add 6d1b3668db42 [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list No new revisions were added by this update. Summary of changes: .../apache/spark/sql/connect/service/SparkConnectConfigHandler.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (61264f77fd68 -> becbca6752a5)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 61264f77fd68 [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework add becbca6752a5 [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 No new revisions were added by this update. Summary of changes: dev/deps/spark-deps-hadoop-3-hive-2.3 | 50 +-- pom.xml | 2 +- 2 files changed, 26 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch master updated (f3a6ca9e2c47 -> ba673d74973a)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from f3a6ca9e2c47 [SPARK-47357][SQL] Add support for Upper, Lower, InitCap (all collations) add ba673d74973a [SPARK-47856][SQL] Document Mapping Spark SQL Data Types from Oracle and add tests No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/OracleIntegrationSuite.scala| 47 +++ docs/sql-data-sources-jdbc.md | 144 + 2 files changed, 191 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) branch branch-3.4 updated: Fix network-commont module version to 3.4.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new 9993c39ef7a1 Fix network-commont module version to 3.4.4-SNAPSHOT 9993c39ef7a1 is described below commit 9993c39ef7a104056b143f8e12c824d6ca68ab60 Author: Dongjoon Hyun AuthorDate: Sun Apr 14 21:44:22 2024 -0700 Fix network-commont module version to 3.4.4-SNAPSHOT --- common/network-common/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index 8a1fe5781ba4..da85893ed3b6 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68519 - in /dev/spark/v3.4.3-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/R/articles/ _site/api/R/deps/ _site/api/R/deps/bootstrap-5.3.1/ _site/api/R/deps/jquery-3.6.0/ _site/api
Author: dongjoon Date: Mon Apr 15 02:33:02 2024 New Revision: 68519 Log: Apache Spark v3.4.3-rc2 docs [This commit notification would consist of 2987 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r68518 - /dev/spark/v3.4.3-rc2-bin/
Author: dongjoon Date: Mon Apr 15 01:30:44 2024 New Revision: 68518 Log: Apache Spark v3.4.3-rc2 Added: dev/spark/v3.4.3-rc2-bin/ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz (with props) dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz (with props) dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3-scala2.13.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-hadoop3.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3-bin-without-hadoop.tgz.sha512 dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz (with props) dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz.asc dev/spark/v3.4.3-rc2-bin/spark-3.4.3.tgz.sha512 Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc == --- dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc (added) +++ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.asc Mon Apr 15 01:30:44 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmYcgt0UHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FwVpQ//feuIM/HSzfE31Blc43Zc05sWRwZ2 +FZeiQGQ6dRbJpjKjLtKMsvlORov9Vx6225VX7bpBqyZ9gQDB8Hq1uoFPiQwagbBn +qFCDh3agkEVxDZHEYjIBNRW5IVR89rFCCLR+YafKnN+alfCaScmGfAhS2JQYvsfM +733xqFyxduPqPUVC7uJfi7qLEqrn8QV13duGzWmIEhAdl03/14UwWektNfQaSfPB +cwv26dnQdUBGoqIEW9eJIM47+Plj1WYMNZtjB60bid5cilm9NjLB6GaHpzijSTHX +Kpssu22OQPzG7d2D2D3EMvpHiAJC1oUIXnzzJiApOFg9dpcDhtH6Jp3J53UuMfBs +pX/Yt/0n8VlZoF6DwREtLi3L5AeJt+wrlQQUSwAUNU7bQrM5mtQmuzc9u/lUfcPQ +74860MGPWPx9+N+5NgSPop9UgP6fOSm53jFXIBJzedHLHhakSTu7+2mHEnpABwTE +02LuAzZVwJ0N/iH0rwIKzNiikydtQyO7nTCUruGuMLcRFM5wnn3DNeSqbw/zRNAl +Fabwq/x1dnA4ryoCV20s7ug0iVBsXN+eQzEegpshrUHZLFma4z7+iieX+xpuSu22 +ZWbbR0sh433tndVREpPg8K2oSsaASxkE0yUlgrp97uHDx7WAixReZCQ40szXJEC4 +MGp+TprPL1Ib4OI= +=w+kM +-END PGP SIGNATURE- Added: dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 == --- dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 (added) +++ dev/spark/v3.4.3-rc2-bin/SparkR_3.4.3.tar.gz.sha512 Mon Apr 15 01:30:44 2024 @@ -0,0 +1 @@ +2bf5b5b574c916bc74cad122f22c33afec129e56fe6672bb0eaeff7b0218853e1e426e554119b2b0b94c527f05ae041057efbfba53a8916a1c7cc01964366d7b SparkR_3.4.3.tar.gz Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc == --- dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc (added) +++ dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.asc Mon Apr 15 01:30:44 2024 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- + +iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmYcgt8UHGRvbmdqb29u +QGFwYWNoZS5vcmcACgkQ7aAM6DTw/Fynzg//TTKfsQ/w2lI1IqYLCJi8FBQJM3vx +XfzGDq+gkyBSc1ohbNn/nMi/OryOXui5o74d1xmiyWz36M97DRXBaI+ldTFi9lgy +DCECCDrNU0RcWkHtXCaP0EahN4pBK+82ftD7KrkZAILdvxZpSU2XIesBjs5lrSpn +NFvwYvWg4COc+tMxvFOybAzqIDhe1geoLeEgizbJcC7PyACH9cQccazco1xoEi6K +d+pMrBSGeV3ReiML7X6/fFXOwqe1P95NrdRLDdl0irow/p08Tbf8YW5b+Abo0j/E +37SEh8veYoX0otOFrc5K/Z4sNh5OlLuzXnhOG03bCXpJ71imZGaJUPW286Tbnl8p +fecG/aZ8Avb0yCWMIzeoffd00ObpFulU8zNQztdGJzQnJR12K1tefNPLA6Al0KE4 +7NljgNDfJL+WGhoip6rYLol7WK1RgGFHPYqcVINz6ZUNChAqdCgrSefCdb//Kavv +Qkq08Q3QqlcHTGJb2hRmvwMuVYTqyFsRu83/EDYdVNdEZ0lWR5P79z+N+Is6SdYc +Z/zcnPD83cNNCahyY97VkcyNBcZvx4maa/4AXCzBlGkebc4Yyymt3sft0/QWIM39 +FQz8mqciCQKfqIU4HI5yogxadmmFd5tELyBhbQz5mbgvFhHHDzzJpuDPIA6jZUQc +C5OZ9KHhC/pF1dg= +=xtzn +-END PGP SIGNATURE- Added: dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 == --- dev/spark/v3.4.3-rc2-bin/pyspark-3.4.3.tar.gz.sha512 (added) +++ dev/spark
(spark) 01/01: Preparing development version 3.4.4-SNAPSHOT
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git commit d8c01554e5b88d0739343f13fe1fddd17892b8bc Author: Dongjoon Hyun AuthorDate: Mon Apr 15 00:21:16 2024 + Preparing development version 3.4.4-SNAPSHOT --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 42 files changed, 44 insertions(+), 44 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 8f6d8f1b3b6e..6d2bd4eb9759 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.3 +Version: 3.4.4 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 7df44b0eb82c..4f5d6213bca5 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 2b6f51089248..161d12d8cd05 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index 4ab02df6003c..f772d3d080ed 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index 5b256c629847..eda2c13558ae 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index e74655b629df..4f9d962818d2 100644 --- a/common/sketch/pom.xml +++ b/common/sketch/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.3 +3.4.4-SNAPSHOT ../../pom.xml diff --git a/common/tags/pom.xml b/common/tags/pom.xml index d5213c22fd4c..a7a2f2d27adb 100644 --- a/common/tags/pom.xml +++ b/common/tags/pom.xml @@ -22,7
(spark) branch branch-3.4 updated (df3e8e4d2a3a -> d8c01554e5b8)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git from df3e8e4d2a3a Preparing development version 3.4.4-SNAPSHOT add 1eb558c3a6fb Preparing Spark release v3.4.3-rc2 new d8c01554e5b8 Preparing development version 3.4.4-SNAPSHOT The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: common/network-common/pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
(spark) 01/01: Preparing Spark release v3.4.3-rc2
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to tag v3.4.3-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git commit 1eb558c3a6fbdd59e5a305bc3ab12ce748f6511f Author: Dongjoon Hyun AuthorDate: Mon Apr 15 00:21:11 2024 + Preparing Spark release v3.4.3-rc2 --- R/pkg/DESCRIPTION | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- connector/avro/pom.xml | 2 +- connector/connect/client/jvm/pom.xml | 2 +- connector/connect/common/pom.xml | 2 +- connector/connect/server/pom.xml | 2 +- connector/docker-integration-tests/pom.xml | 2 +- connector/kafka-0-10-assembly/pom.xml | 2 +- connector/kafka-0-10-sql/pom.xml | 2 +- connector/kafka-0-10-token-provider/pom.xml| 2 +- connector/kafka-0-10/pom.xml | 2 +- connector/kinesis-asl-assembly/pom.xml | 2 +- connector/kinesis-asl/pom.xml | 2 +- connector/protobuf/pom.xml | 2 +- connector/spark-ganglia-lgpl/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 6 +++--- examples/pom.xml | 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- python/pyspark/version.py | 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 43 files changed, 45 insertions(+), 45 deletions(-) diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION index 6d2bd4eb9759..8f6d8f1b3b6e 100644 --- a/R/pkg/DESCRIPTION +++ b/R/pkg/DESCRIPTION @@ -1,6 +1,6 @@ Package: SparkR Type: Package -Version: 3.4.4 +Version: 3.4.3 Title: R Front End for 'Apache Spark' Description: Provides an R Front end for 'Apache Spark' <https://spark.apache.org>. Authors@R: diff --git a/assembly/pom.xml b/assembly/pom.xml index 4f5d6213bca5..7df44b0eb82c 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -21,7 +21,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../pom.xml diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml index 161d12d8cd05..2b6f51089248 100644 --- a/common/kvstore/pom.xml +++ b/common/kvstore/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml index da85893ed3b6..8a1fe5781ba4 100644 --- a/common/network-common/pom.xml +++ b/common/network-common/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml index f772d3d080ed..4ab02df6003c 100644 --- a/common/network-shuffle/pom.xml +++ b/common/network-shuffle/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml index eda2c13558ae..5b256c629847 100644 --- a/common/network-yarn/pom.xml +++ b/common/network-yarn/pom.xml @@ -22,7 +22,7 @@ org.apache.spark spark-parent_2.12 -3.4.4-SNAPSHOT +3.4.3 ../../pom.xml diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml index 4f9d96