[spark] branch master updated: [SPARK-37586][SQL] Add the `mode` and `padding` args to `aes_encrypt()`/`aes_decrypt()`

2021-12-08 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f6e439  [SPARK-37586][SQL] Add the `mode` and `padding` args to 
`aes_encrypt()`/`aes_decrypt()`
8f6e439 is described below

commit 8f6e439068281633acefb895f8c4bd9203868c24
Author: Max Gekk 
AuthorDate: Thu Dec 9 14:36:47 2021 +0900

[SPARK-37586][SQL] Add the `mode` and `padding` args to 
`aes_encrypt()`/`aes_decrypt()`

### What changes were proposed in this pull request?
In the PR, I propose to add new optional arguments to the `aes_encrypt()` 
and `aes_decrypt()` functions with default values:
1. `mode` - specifies which block cipher mode should be used to encrypt/decrypt messages. The only valid value at the moment is `ECB`.
2. `padding` - specifies how to pad messages whose length is not a multiple 
of the block size. Currently, only `PKCS` is supported.

In this way, when a user doesn't pass `mode`/`padding` to the functions, the functions apply AES encryption/decryption in `ECB` mode with `PKCS5Padding`, as shown in the sketch below.
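
For example, a sketch in `spark-shell` (where `spark` is the predefined SparkSession); the argument order `(input, key, mode, padding)` follows the diff below, and the 16-character key is an arbitrary AES-128 example:
```scala
// Omitting mode/padding is equivalent to passing 'ECB' and 'PKCS' explicitly:
spark.sql("""
  SELECT cast(aes_decrypt(
           aes_encrypt('Spark', '0000111122223333'),
           '0000111122223333', 'ECB', 'PKCS') AS STRING)
""").show()  // the round trip returns 'Spark' under the default mode/padding
```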

### Why are the changes needed?
1. For now, `aes_encrypt()` and `aes_decrypt()` rely on the JVM's configuration for which cipher mode to use. This is problematic because the default is not fixed across JVM versions and systems. By using default constants for the new arguments, we can guarantee the same behaviour across all supported platforms (see the sketch after this list).
2. We can consider the new arguments as a new extension point in the current implementation of the AES algorithm in Spark SQL. In the future, in OSS or in a private Spark fork, devs can implement other modes (and paddings) such as GCM. Other systems already support different AES modes, see:
   1. Snowflake: 
https://docs.snowflake.com/en/sql-reference/functions/encrypt.html
   2. BigQuery: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/aead-encryption-concepts#block_cipher_modes
   3. MySQL: 
https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-encrypt
   4. Hive: 
https://cwiki.apache.org/confluence/display/hive/languagemanual+udf
   5. PostgreSQL: 
https://www.postgresql.org/docs/12/pgcrypto.html#id-1.11.7.34.8
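
To illustrate point 1, a minimal sketch of the difference using the standard `javax.crypto` API (illustrative only, not the Spark code):
```scala
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

val key = new SecretKeySpec("0000111122223333".getBytes("UTF-8"), "AES")

// Relies on the JVM provider's defaults for mode and padding; on most JDKs
// "AES" resolves to "AES/ECB/PKCS5Padding", but that is provider-dependent:
val implicitCipher = Cipher.getInstance("AES")

// Pins the transformation explicitly, so behaviour is identical everywhere:
val explicitCipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
explicitCipher.init(Cipher.ENCRYPT_MODE, key)
val ciphertext = explicitCipher.doFinal("Spark".getBytes("UTF-8"))
```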

### Does this PR introduce _any_ user-facing change?
No. This PR just extends existing APIs.

### How was this patch tested?
By running new checks:
```
$ build/sbt "test:testOnly org.apache.spark.sql.DataFrameFunctionsSuite"
$ build/sbt "sql/test:testOnly 
org.apache.spark.sql.expressions.ExpressionInfoSuite"
$ build/sbt "sql/testOnly *ExpressionsSchemaSuite"
```

Closes #34837 from MaxGekk/aes-gsm-mode.

Authored-by: Max Gekk 
Signed-off-by: Kousuke Saruta 
---
 .../catalyst/expressions/ExpressionImplUtils.java  | 24 +--
 .../spark/sql/catalyst/expressions/misc.scala  | 78 +-
 .../spark/sql/errors/QueryExecutionErrors.scala| 10 ++-
 .../sql-functions/sql-expression-schema.md |  2 +-
 .../apache/spark/sql/DataFrameFunctionsSuite.scala | 16 +
 5 files changed, 104 insertions(+), 26 deletions(-)

diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
index 9afa5a6..83205c1 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions;
 
 import org.apache.spark.sql.errors.QueryExecutionErrors;
+import org.apache.spark.unsafe.types.UTF8String;
 
 import javax.crypto.Cipher;
 import javax.crypto.spec.SecretKeySpec;
@@ -27,19 +28,28 @@ import java.security.GeneralSecurityException;
  * An utility class for constructing expressions.
  */
 public class ExpressionImplUtils {
-  public static byte[] aesEncrypt(byte[] input, byte[] key) {
-    return aesInternal(input, key, Cipher.ENCRYPT_MODE);
+  public static byte[] aesEncrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) {
+    return aesInternal(input, key, mode.toString(), padding.toString(), Cipher.ENCRYPT_MODE);
   }
 
-  public static byte[] aesDecrypt(byte[] input, byte[] key) {
-    return aesInternal(input, key, Cipher.DECRYPT_MODE);
+  public static byte[] aesDecrypt(byte[] input, byte[] key, UTF8String mode, UTF8String padding) {
+    return aesInternal(input, key, mode.toString(), padding.toString(), Cipher.DECRYPT_MODE);
   }
 
-  private static byte[] aesInternal(byte[] input, byte[] key, int mode) {
+  private static byte[] aesInternal(
+      byte[] input,
+      byte[] key,
+      String mode,
+      String padding,
+      int opmode) {
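
The diff is truncated here by the archive. For context, a minimal sketch of what the body of `aesInternal` plausibly does with the new arguments (an assumption based on the signature above, not the actual Spark code):
```scala
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

// Hypothetical Scala rendering of the Java helper: map the (mode, padding)
// pair to a fixed JCE transformation and fail fast on unsupported combos.
def aesInternal(input: Array[Byte], key: Array[Byte],
                mode: String, padding: String, opmode: Int): Array[Byte] =
  (mode.toUpperCase, padding.toUpperCase) match {
    case ("ECB", "PKCS") =>
      // 'PKCS' maps to JCE's PKCS5Padding, so the behaviour no longer
      // depends on the JVM provider's default cipher configuration.
      val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
      cipher.init(opmode, new SecretKeySpec(key, "AES"))
      cipher.doFinal(input)
    case other =>
      throw new IllegalArgumentException(s"Unsupported AES mode/padding: $other")
  }
```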

[spark] branch master updated (5edd959 -> c7dd2d5)

2021-12-08 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5edd959  [SPARK-37561][SQL] Avoid loading all functions when obtaining 
hive's DelegationToken
 add c7dd2d5  [SPARK-36137][SQL][FOLLOWUP] Correct the config key in error 
msg

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

[spark] branch master updated (77a8778 -> 5edd959)

2021-12-08 Thread sunchao
This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 77a8778  [SPARK-37205][YARN] Introduce a new config 
'spark.yarn.am.tokenConfRegex' to support renewing delegation tokens in a 
multi-cluster environment
 add 5edd959  [SPARK-37561][SQL] Avoid loading all functions when obtaining 
hive's DelegationToken

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/hive/client/HiveClientImpl.scala | 36 ++
 .../security/HiveDelegationTokenProvider.scala |  3 +-
 2 files changed, 26 insertions(+), 13 deletions(-)

[spark] branch master updated (ca9a68d -> 77a8778)

2021-12-08 Thread sunchao
This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ca9a68d  [SPARK-37529][K8S][TESTS][FOLLOWUP] Allow 
dev-run-integration-tests.sh to take a custom Dockerfile
 add 77a8778  [SPARK-37205][YARN] Introduce a new config 
'spark.yarn.am.tokenConfRegex' to support renewing delegation tokens in a 
multi-cluster environment

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/deploy/yarn/Client.scala  | 49 +-
 .../org/apache/spark/deploy/yarn/config.scala  | 17 
 2 files changed, 64 insertions(+), 2 deletions(-)

[spark] branch master updated (88f5122 -> ca9a68d)

2021-12-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 88f5122  [SPARK-37576][K8S] Support built-in K8s executor rolling 
plugin
 add ca9a68d  [SPARK-37529][K8S][TESTS][FOLLOWUP] Allow 
dev-run-integration-tests.sh to take a custom Dockerfile

No new revisions were added by this update.

Summary of changes:
 .../kubernetes/integration-tests/README.md | 14 +
 .../dev/dev-run-integration-tests.sh   | 24 +-
 2 files changed, 37 insertions(+), 1 deletion(-)

[spark] branch master updated: [SPARK-37576][K8S] Support built-in K8s executor rolling plugin

2021-12-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 88f5122  [SPARK-37576][K8S] Support built-in K8s executor rolling 
plugin
88f5122 is described below

commit 88f5122d37e3cea783616d3c3a8a6198464f1b4d
Author: Dongjoon Hyun 
AuthorDate: Wed Dec 8 07:23:30 2021 -0800

[SPARK-37576][K8S] Support built-in K8s executor rolling plugin

### What changes were proposed in this pull request?

This PR aims to add a built-in plugin that rolls K8s executors via decommissioning. It can be enabled as follows:

```
spark-3.3.0-SNAPSHOT-bin-3.3.1/bin/spark-submit \
--master k8s://https://kubernetes.docker.internal:6443 \
--deploy-mode cluster \
-c spark.decommission.enabled=true \
-c spark.plugins=org.apache.spark.scheduler.cluster.k8s.ExecutorRollPlugin \
-c spark.kubernetes.executor.rollInterval=60 \
-c spark.executor.instances=2 \
-c spark.kubernetes.container.image=spark:latest \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar 
20
```

### Why are the changes needed?

This built-in plugin is helpful when we want to replace long-lived executors with new ones, as sketched below.
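
For reference, a minimal sketch of the `SparkPlugin` shape such a driver-side plugin builds on (a hypothetical skeleton, not the actual `ExecutorRollPlugin` implementation):
```scala
import java.util.{Collections, Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class RollPluginSketch extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // A real rolling plugin would start a periodic task here that picks an
      // executor and asks the scheduler backend to decommission it, driven by
      // a config such as spark.kubernetes.executor.rollInterval.
      Collections.emptyMap[String, String]()
    }
  }

  // Driver-side only; no executor-side hooks are needed for rolling.
  override def executorPlugin(): ExecutorPlugin = null
}
```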

### Does this PR introduce _any_ user-facing change?

No. This is a new feature.

### How was this patch tested?

Pass the K8s IT test.

I verified that the newly added test case `Rolling decommissioning` passed (1 minute, 11 seconds).
```
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (16 seconds, 49 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (15 
seconds, 604 milliseconds)
[info] - Run SparkPi with a very long application name. (16 seconds, 439 
milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (15 seconds, 433 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (15 seconds, 528 
milliseconds)
[info] - Run SparkPi with an argument. (16 seconds, 396 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment 
variables. (15 seconds, 436 milliseconds)
[info] - All pods have the same service account by default (16 seconds, 451 
milliseconds)
[info] - Run extraJVMOptions check on driver (8 seconds, 361 milliseconds)
... (omitted some irrelevant failures) ...
[info] - Verify logging configuration is picked from the provided 
SPARK_CONF_DIR/log4j.properties (16 seconds, 663 milliseconds)
[info] - Run SparkPi with env and mount secrets. (26 seconds, 276 
milliseconds)
[info] - Run PySpark on simple pi.py example (18 seconds, 479 milliseconds)
[info] - Run PySpark to test a pyfiles example (21 seconds, 673 
milliseconds)
[info] - Run PySpark with memory customization (16 seconds, 411 
milliseconds)
[info] - Run in client mode. (12 seconds, 357 milliseconds)
[info] - Start pod creation from template (15 seconds, 619 milliseconds)
... (omitted some irrelevant failures) ...
[info] - Launcher client dependencies (31 seconds, 421 milliseconds)
[info] - SPARK-33615: Launcher client archives (1 minute, 17 seconds)
[info] - SPARK-33748: Launcher python client respecting PYSPARK_PYTHON (37 
seconds, 28 milliseconds)
[info] - SPARK-33748: Launcher python client respecting 
spark.pyspark.python and spark.pyspark.driver.python (36 seconds, 661 
milliseconds)
[info] - Launcher python client dependencies using a zip file (36 seconds, 
411 milliseconds)
[info] - Test basic decommissioning (50 seconds, 105 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (51 seconds, 285 
milliseconds)
... (omitted some irrelevant failures) ...
[info] - Test decommissioning timeouts (51 seconds, 40 milliseconds)
[info] - Rolling decommissioning (1 minute, 11 seconds)
... (omitted some irrelevant failures) ...
```

Closes #34832 from dongjoon-hyun/SPARK-37576.

Lead-authored-by: Dongjoon Hyun 
Co-authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/deploy/k8s/Config.scala |  8 ++
 .../scheduler/cluster/k8s/ExecutorRollPlugin.scala | 99 ++
 .../k8s/integrationtest/DecommissionSuite.scala| 29 +++
 3 files changed, 136 insertions(+)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
index 2458e2d..aff6473 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
@@ -137,6 +137,14 @@ private[spark] object Config 

[spark] branch master updated: [SPARK-37445][INFRA][FOLLOWUP] Use hadoop3.2 profile instead of hadoop3 for the scheduled GA job for branch-3.2

2021-12-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f14f82  [SPARK-37445][INFRA][FOLLOWUP] Use hadoop3.2 profile instead 
of hadoop3 for the scheduled GA job for branch-3.2
8f14f82 is described below

commit 8f14f824a5f541ffc6847d75ef66feef9543cf98
Author: Kousuke Saruta 
AuthorDate: Wed Dec 8 22:02:35 2021 +0900

[SPARK-37445][INFRA][FOLLOWUP] Use hadoop3.2 profile instead of hadoop3 for 
the scheduled GA job for branch-3.2

### What changes were proposed in this pull request?

This PR fixes an issue where the scheduled GA job for `branch-3.2` fails.
SPARK-37445 (#34715) renamed the profile `hadoop3.2` to `hadoop3`, but it should remain `hadoop3.2` for the scheduled build against `branch-3.2`:
https://github.com/apache/spark/runs/4453894964?check_suite_focus=true

### Why are the changes needed?

To recover the job.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The scheduled job itself.

Closes #34835 from sarutak/followup-SPARK-37445.

Authored-by: Kousuke Saruta 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 2083063..a8f5edf 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -69,7 +69,7 @@ jobs:
   echo '::set-output name=branch::branch-3.2'
   echo '::set-output name=type::scheduled'
   echo '::set-output name=envs::{"SCALA_PROFILE": "scala2.13"}'
-  echo '::set-output name=hadoop::hadoop3'
+  echo '::set-output name=hadoop::hadoop3.2'
 elif [ "${{ github.event.schedule }}" = "0 10 * * *" ]; then
   echo '::set-output name=java::8'
   echo '::set-output name=branch::master'

[spark] branch master updated (1fac7a9 -> cf19cf5)

2021-12-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1fac7a9  [SPARK-37392][SQL] Fix the performance bug when inferring 
constraints for Generate
 add cf19cf5  [SPARK-37533][SQL][FOLLOWUP] try_element_at should throw an 
error on 0 array index
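
For reference, the behaviour this follow-up enforces (a sketch run in `spark-shell`; the exact error class and message may differ):
```scala
// try_element_at tolerates out-of-range indices by returning NULL, but index 0
// is invalid by definition because SQL array indices are 1-based, so it throws:
spark.sql("SELECT try_element_at(array(1, 2, 3), 4)").show()  // NULL
spark.sql("SELECT try_element_at(array(1, 2, 3), 0)").show()  // runtime error
```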

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/TryEval.scala   | 38 --
 .../expressions/collectionOperations.scala | 46 --
 .../sql-tests/results/ansi/try_element_at.sql.out  |  5 ++-
 .../sql-tests/results/try_element_at.sql.out   |  5 ++-
 4 files changed, 49 insertions(+), 45 deletions(-)
