[spark] branch master updated (c01dad46c77 -> 0f8218c4324)

2023-04-12 Thread yao
This is an automated email from the ASF dual-hosted git repository.

yao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from c01dad46c77 [MINOR][SQL] Simplify the method resolveExprsAndAddMissingAttrs
 add 0f8218c4324 [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |  2 +-
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  |  6 +++--
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 14 ++-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   |  3 +++
 .../apache/spark/sql/jdbc/PostgresDialect.scala|  8 +--
 .../v2/jdbc/JDBCTableCatalogSuite.scala| 19 +--
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 27 +-
 7 files changed, 55 insertions(+), 24 deletions(-)
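For orientation, a minimal hypothetical sketch of what the read-side change affects (the catalog name "h2", the URL, driver, and table below are illustrative assumptions, not from this commit): after registering a JDBC catalog, describing a table whose columns were declared CHAR/VARCHAR in the remote database should now surface the char/varchar metadata instead of plain strings.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: catalog name, URL, driver, and table are assumptions.
val spark = SparkSession.builder()
  .appName("jdbc-char-varchar-read-side")
  .master("local[*]")
  .config("spark.sql.catalog.h2",
    "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
  .config("spark.sql.catalog.h2.url", "jdbc:h2:mem:testdb")
  .config("spark.sql.catalog.h2.driver", "org.h2.Driver")
  .getOrCreate()

// A column declared e.g. CHAR(10) or VARCHAR(20) in the remote database should
// now be described with its char/varchar type information kept on the read side.
spark.sql("DESCRIBE TABLE h2.test.people").show(truncate = false)
```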





[spark] branch master updated: [MINOR][SQL] Simplify the method resolveExprsAndAddMissingAttrs

2023-04-12 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c01dad46c77 [MINOR][SQL] Simplify the method resolveExprsAndAddMissingAttrs
c01dad46c77 is described below

commit c01dad46c773146b27b8ffc09fdf83b09edefec1
Author: Gengliang Wang 
AuthorDate: Wed Apr 12 22:04:31 2023 -0700

[MINOR][SQL] Simplify the method resolveExprsAndAddMissingAttrs

### What changes were proposed in this pull request?

The method `resolveExprsAndAddMissingAttrs` contains redundant code: computing `newExprs` and `newChild` shows up 4 times across different branches. This PR simplifies the implementation of the method.

### Why are the changes needed?

Code cleanup: remove the redundant code.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

Closes #40761 from gengliangwang/cleanup.

Lead-authored-by: Gengliang Wang 
Co-authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 .../catalyst/analysis/ColumnResolutionHelper.scala | 59 +++---
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
index ba550bce791..c5634278490 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
@@ -50,40 +50,39 @@ trait ColumnResolutionHelper extends Logging {
   (exprs, plan)
 } else {
   plan match {
-        case p: Project =>
-          // Resolving expressions against current plan.
-          val maybeResolvedExprs = exprs.map(resolveExpressionByPlanOutput(_, p))
-          // Recursively resolving expressions on the child of current plan.
-          val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, p.child)
-          // If some attributes used by expressions are resolvable only on the rewritten child
-          // plan, we need to add them into original projection.
-          val missingAttrs = (AttributeSet(newExprs) -- p.outputSet).intersect(newChild.outputSet)
-          (newExprs, Project(p.projectList ++ missingAttrs, newChild))
-
-        case a @ Aggregate(groupExprs, aggExprs, child) =>
-          val maybeResolvedExprs = exprs.map(resolveExpressionByPlanOutput(_, a))
-          val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, child)
-          val missingAttrs = (AttributeSet(newExprs) -- a.outputSet).intersect(newChild.outputSet)
-          if (missingAttrs.forall(attr => groupExprs.exists(_.semanticEquals(attr)))) {
-            // All the missing attributes are grouping expressions, valid case.
-            (newExprs, a.copy(aggregateExpressions = aggExprs ++ missingAttrs, child = newChild))
-          } else {
-            // Need to add non-grouping attributes, invalid case.
-            (exprs, a)
-          }
-
-        case g: Generate =>
-          val maybeResolvedExprs = exprs.map(resolveExpressionByPlanOutput(_, g))
-          val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, g.child)
-          (newExprs, g.copy(unrequiredChildIndex = Nil, child = newChild))
-
         // For `Distinct` and `SubqueryAlias`, we can't recursively resolve and add attributes
         // via its children.
         case u: UnaryNode if !u.isInstanceOf[Distinct] && !u.isInstanceOf[SubqueryAlias] =>
-          val maybeResolvedExprs = exprs.map(resolveExpressionByPlanOutput(_, u))
-          val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, u.child)
-          (newExprs, u.withNewChildren(Seq(newChild)))
+          val (newExprs, newChild) = {
+            // Resolving expressions against current plan.
+            val maybeResolvedExprs = exprs.map(resolveExpressionByPlanOutput(_, u))
+            // Recursively resolving expressions on the child of current plan.
+            resolveExprsAndAddMissingAttrs(maybeResolvedExprs, u.child)
+          }
+          // If some attributes used by expressions are resolvable only on the rewritten child
+          // plan, we need to add them into original projection.
+          lazy val missingAttrs =
+            (AttributeSet(newExprs) -- u.outputSet).intersect(newChild.outputSet)
+          u match {
+            case p: Project =>
+              (newExprs, Project(p.projectList ++ missingAttrs, newChild))
+
+            case a @ Aggregate(groupExprs, aggExprs, child) =>
+              if (missingAttrs.forall(attr =>

[spark] branch master updated (a45affe3c8e -> 69abf14b966)

2023-04-12 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a45affe3c8e [SPARK-43063][SQL] `df.show` handle null should print NULL instead of null
 add 69abf14b966 [SPARK-43115][CONNECT][PS][TESTS] Split pyspark-pandas-connect from pyspark-connect module

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml |  2 ++
 dev/sparktestsupport/modules.py  | 15 +++
 dev/sparktestsupport/utils.py| 13 -
 3 files changed, 25 insertions(+), 5 deletions(-)





[spark] branch master updated: [SPARK-43063][SQL] `df.show` handle null should print NULL instead of null

2023-04-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a45affe3c8e [SPARK-43063][SQL] `df.show` handle null should print NULL instead of null
a45affe3c8e is described below

commit a45affe3c8e7a724aea7dbbc1af08e36001c7540
Author: Yikf 
AuthorDate: Thu Apr 13 10:15:14 2023 +0800

[SPARK-43063][SQL] `df.show` handle null should print NULL instead of null

### What changes were proposed in this pull request?

`df.show` should print NULL instead of null when handling null values, to keep the behavior consistent.

For example, the following behavior is currently inconsistent:
``` shell
scala> spark.sql("select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle') as result").show(false)
+------+
|result|
+------+
|null  |
+------+
```
``` shell
spark-sql> DESC FUNCTION EXTENDED decode;
function_desc
Function: decode
Class: org.apache.spark.sql.catalyst.expressions.Decode
Usage:
decode(bin, charset) - Decodes the first argument using the second argument character set.

decode(expr, search, result [, search, result ] ... [, default]) - Compares expr
  to each search value in order. If expr is equal to a search value, decode returns
  the corresponding result. If no match is found, then it returns default. If default
  is omitted, it returns null.

Extended Usage:
Examples:
  > SELECT decode(encode('abc', 'utf-8'), 'utf-8');
   abc
  > SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic');
   San Francisco
  > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic');
   Non domestic
  > SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle');
   NULL

Since: 3.2.0

Time taken: 0.074 seconds, Fetched 4 row(s)
```
``` shell
spark-sql> select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle');
NULL
```
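
For reference, a small Scala sketch of the same query after this change (assuming a running SparkSession named `spark`); the expected rendering matches the spark-sql output above:

```scala
// After the change, df.show renders SQL NULL as "NULL", consistent with the spark-sql CLI.
spark.sql(
    "select decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle') as result")
  .show(false)
// +------+
// |result|
// +------+
// |NULL  |
// +------+
```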

### Why are the changes needed?

`df.show` keeps behavior consistent with the spark-sql CLI when handling `null`.

### Does this PR introduce _any_ user-facing change?

Yes, null values will now be displayed as NULL instead of null.

### How was this patch tested?

GA

Closes #40699 from Yikf/show-NULL.

Authored-by: Yikf 
Signed-off-by: Wenchen Fan 
---
 python/pyspark/ml/feature.py   |  2 +-
 python/pyspark/pandas/frame.py | 20 ++---
 python/pyspark/sql/column.py   |  2 +-
 python/pyspark/sql/dataframe.py| 68 -
 python/pyspark/sql/functions.py| 86 +++---
 python/pyspark/sql/readwriter.py   | 10 +--
 .../sql/tests/connect/test_connect_basic.py| 38 +-
 .../sql/tests/connect/test_connect_column.py   | 36 -
 .../sql/tests/connect/test_connect_function.py | 62 
 .../spark/sql/catalyst/expressions/Cast.scala  | 22 +++---
 .../sql/catalyst/expressions/CastSuiteBase.scala   | 10 +--
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 10 +--
 .../scala/org/apache/spark/sql/DatasetSuite.scala  |  2 +-
 13 files changed, 184 insertions(+), 184 deletions(-)

diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index ff7aaf71f9c..e7ec35bffa0 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -5313,7 +5313,7 @@ class VectorAssembler(
 +---+---+----+-------------+
 |  a|  b|   c|     features|
 +---+---+----+-------------+
-|1.0|2.0|null|[1.0,2.0,NaN]|
+|1.0|2.0|NULL|[1.0,2.0,NaN]|
 |3.0|NaN| 4.0|[3.0,NaN,4.0]|
 |5.0|6.0| 7.0|[5.0,6.0,7.0]|
 +---+---+----+-------------+
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 1f81f0addf9..8bddcb6bae8 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -1530,7 +1530,7 @@ class DataFrame(Frame, Generic[T]):
 # |  A|  B|   C|
 # +---+---+----+
 # |  1|  2| 3.0|
-# |  4|  1|null|
+# |  4|  1|NULL|
 # +---+---+----+
 
 pair_scols: List[GenericColumn] = []
@@ -1560,10 +1560,10 @@ class DataFrame(Frame, Generic[T]):
 # |  2|  2|3.0|
3.0|
 # |  0|  0|4.0|
4.0|
 # |  0|  1|4.0|
1.0|
-# |  0|  2|   null|
  

[spark] branch master updated: [SPARK-43110][SQL] Move asIntegral to PhysicalDataType

2023-04-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a3839d8cf2e [SPARK-43110][SQL] Move asIntegral to PhysicalDataType
a3839d8cf2e is described below

commit a3839d8cf2e4d17c930cf3902538e39b58779e88
Author: Rui Wang 
AuthorDate: Thu Apr 13 09:53:01 2023 +0800

[SPARK-43110][SQL] Move asIntegral to PhysicalDataType

### What changes were proposed in this pull request?

This PR proposes moving asIntegral to PhysicalDataType. This is to simplify the DataType class so that it becomes a simple interface, without coupling in too many internal representations.

### Why are the changes needed?

To make DataType a simpler interface, non-public code can be moved out of the DataType class.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

UT

Closes #40758 from amaliujia/catalyst_datatype_refactor_5.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/catalyst/expressions/arithmetic.scala  | 6 +++---
 .../org/apache/spark/sql/catalyst/types/PhysicalDataType.scala  | 4 
 .../main/scala/org/apache/spark/sql/types/AbstractDataType.scala| 4 +---
 .../src/main/scala/org/apache/spark/sql/types/DecimalType.scala | 1 -
 .../src/main/scala/org/apache/spark/sql/types/DoubleType.scala  | 1 -
 .../src/main/scala/org/apache/spark/sql/types/FloatType.scala   | 1 -
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
index 3fbe7269cb7..31d4d71cd40 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
@@ -886,8 +886,8 @@ case class IntegralDivide(
 val integral = left.dataType match {
   case i: IntegralType =>
 PhysicalIntegralType.integral(i)
-  case d: DecimalType =>
-d.asIntegral.asInstanceOf[Integral[Any]]
+  case DecimalType.Fixed(p, s) =>
+PhysicalDecimalType(p, s).asIntegral.asInstanceOf[Integral[Any]]
   case _: YearMonthIntervalType =>
 PhysicalIntegerType.integral.asInstanceOf[Integral[Any]]
   case _: DayTimeIntervalType =>
@@ -981,7 +981,7 @@ case class Remainder(
   (left, right) => integral.rem(left, right)
 
 case d @ DecimalType.Fixed(precision, scale) =>
-  val integral = d.asIntegral.asInstanceOf[Integral[Any]]
+      val integral = PhysicalDecimalType(precision, scale).asIntegral.asInstanceOf[Integral[Any]]
       (left, right) =>
         checkDecimalOverflow(integral.rem(left, right).asInstanceOf[Decimal], precision, scale)
   }
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala
index e7e9a2aa83b..b6e0cd88f08 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala
@@ -89,6 +89,7 @@ object PhysicalNumericType {
 
 sealed abstract class PhysicalFractionalType extends PhysicalNumericType {
   private[sql] val fractional: Fractional[InternalType]
+  private[sql] val asIntegral: Integral[InternalType]
 }
 
 object PhysicalFractionalType {
@@ -160,6 +161,7 @@ case class PhysicalDecimalType(precision: Int, scale: Int) extends PhysicalFract
   private[sql] val numeric = Decimal.DecimalIsFractional
   override private[sql] def exactNumeric = DecimalExactNumeric
   private[sql] val fractional = Decimal.DecimalIsFractional
+  private[sql] val asIntegral = Decimal.DecimalAsIfIntegral
 }
 
 case object PhysicalDecimalType {
@@ -179,6 +181,7 @@ class PhysicalDoubleType() extends PhysicalFractionalType with PhysicalPrimitive
   private[sql] val numeric = implicitly[Numeric[Double]]
   override private[sql] def exactNumeric = DoubleExactNumeric
   private[sql] val fractional = implicitly[Fractional[Double]]
+  private[sql] val asIntegral = DoubleType.DoubleAsIfIntegral
 }
 case object PhysicalDoubleType extends PhysicalDoubleType
 
@@ -193,6 +196,7 @@ class PhysicalFloatType() extends PhysicalFractionalType with PhysicalPrimitiveT
   private[sql] val numeric = implicitly[Numeric[Float]]
   override private[sql] def exactNumeric = FloatExactNumeric
   private[sql] val fractional = implicitly[Fractional[Float]]
+  private[sql] val asIntegral = FloatType.FloatAsIfIntegral
 }
 case object PhysicalFloatType extends PhysicalFloatType
 
diff --git 

[spark] branch master updated: [SPARK-42656][FOLLOWUP] `chmod+x` for `connector/connect/bin/spark-connect-scala-client-classpath` script

2023-04-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 549c43a8553 [SPARK-42656][FOLLOWUP] `chmod+x` for `connector/connect/bin/spark-connect-scala-client-classpath` script
549c43a8553 is described below

commit 549c43a8553461e5e35f6c3f7d597c7322705170
Author: Juliusz Sompolski 
AuthorDate: Thu Apr 13 09:34:59 2023 +0900

[SPARK-42656][FOLLOWUP] `chmod+x` for `connector/connect/bin/spark-connect-scala-client-classpath` script

### What changes were proposed in this pull request?

Make the script introduced in https://github.com/apache/spark/pull/40676 runnable.

### Why are the changes needed?

Somehow the chmod didn't commit with the previous PR, which only became 
apparent when I pulled back from master...

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual dev again...

Closes #40757 from juliuszsompolski/spark-connect-scala-client-classpath-chmod.

Authored-by: Juliusz Sompolski 
Signed-off-by: Hyukjin Kwon 
---
 connector/connect/bin/spark-connect-scala-client-classpath | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/connector/connect/bin/spark-connect-scala-client-classpath b/connector/connect/bin/spark-connect-scala-client-classpath
old mode 100644
new mode 100755





[spark] branch master updated (2931993e059 -> 76bd695084c)

2023-04-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2931993e059 [SPARK-42437][PYTHON][CONNECT] PySpark catalog.cacheTable will allow to specify storage level
 add 76bd695084c [SPARK-43031][SS][CONNECT] Enable unit test and doctest for streaming

No new revisions were added by this update.

Summary of changes:
 dev/sparktestsupport/modules.py|   5 +
 python/pyspark/sql/connect/streaming/query.py  |  35 +-
 python/pyspark/sql/connect/streaming/readwriter.py |  40 ++-
 python/pyspark/sql/dataframe.py|   2 +-
 python/pyspark/sql/streaming/query.py  |  14 +-
 python/pyspark/sql/streaming/readwriter.py |  31 +-
 .../connect/streaming/test_parity_streaming.py |  68 
 .../pyspark/sql/tests/streaming/test_streaming.py  | 370 +++--
 .../sql/tests/streaming/test_streaming_foreach.py  | 297 +
 .../tests/streaming/test_streaming_foreachBatch.py | 102 ++
 10 files changed, 603 insertions(+), 361 deletions(-)
 create mode 100644 python/pyspark/sql/tests/connect/streaming/test_parity_streaming.py
 create mode 100644 python/pyspark/sql/tests/streaming/test_streaming_foreach.py
 create mode 100644 python/pyspark/sql/tests/streaming/test_streaming_foreachBatch.py





[spark] branch master updated (dabd771c37b -> 2931993e059)

2023-04-12 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from dabd771c37b [SPARK-43038][SQL] Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`
 add 2931993e059 [SPARK-42437][PYTHON][CONNECT] PySpark catalog.cacheTable will allow to specify storage level

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/catalog.py  | 23 ---
 python/pyspark/sql/connect/catalog.py  |  5 +++--
 python/pyspark/sql/connect/plan.py | 23 +--
 python/pyspark/sql/tests/test_catalog.py   | 26 --
 python/pyspark/sql/tests/test_dataframe.py | 16 +++-
 5 files changed, 71 insertions(+), 22 deletions(-)





[spark] branch master updated: [SPARK-43038][SQL] Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`

2023-04-12 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dabd771c37b [SPARK-43038][SQL] Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`
dabd771c37b is described below

commit dabd771c37be9cbd773b5223d8c78226ece84f8a
Author: Max Gekk 
AuthorDate: Wed Apr 12 16:02:29 2023 +0300

[SPARK-43038][SQL] Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`

### What changes were proposed in this pull request?
In the PR, I propose a new AES mode for the `aes_encrypt()`/`aes_decrypt()` functions: `CBC` ([Cipher Block Chaining](https://www.ibm.com/docs/en/linux-on-systems?topic=operation-cipher-block-chaining-cbc-mode)) with `PKCS7(5)` padding. The `aes_encrypt()` function returns a binary value which consists of the following fields:
1. The salt magic prefix `Salted__`, 8 bytes long.
2. A salt generated for every `aes_encrypt()` call using `java.security.SecureRandom`, 8 bytes long.
3. The encrypted input.

The encrypt function derives the secret key and the initialization vector (16 bytes) from the salt and the user's key, using the same algorithm as OpenSSL's `EVP_BytesToKey()` (versions >= 1.1.0c).

The `aes_decrypt()` function assumes that its input has the fields shown above.

For example:
```sql
spark-sql> SELECT base64(aes_encrypt('Apache Spark', '', 'CBC', 'PKCS'));
U2FsdGVkX1/ERGxwEOTDpDD4bQvDtQaNe+gXGudCcUk=
spark-sql> SELECT aes_decrypt(unbase64('U2FsdGVkX1/ERGxwEOTDpDD4bQvDtQaNe+gXGudCcUk='), '', 'CBC', 'PKCS');
Apache Spark
```
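
A small Scala sketch of the salted layout described above (8-byte `Salted__` magic, 8-byte salt, then the ciphertext); the helper is illustrative and only splits the fields, it does not derive keys or decrypt:

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object SaltedLayout {
  // Splits an aes_encrypt() CBC output into (salt, ciphertext) per the layout above.
  def split(blob: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val magic = "Salted__".getBytes(StandardCharsets.UTF_8)
    require(blob.length > 16 && blob.take(8).sameElements(magic),
      "input does not start with the 'Salted__' magic prefix")
    (blob.slice(8, 16), blob.drop(16))
  }

  def main(args: Array[String]): Unit = {
    val blob = Base64.getDecoder.decode("U2FsdGVkX1/ERGxwEOTDpDD4bQvDtQaNe+gXGudCcUk=")
    val (salt, ciphertext) = split(blob)
    println(s"salt: ${salt.length} bytes, ciphertext: ${ciphertext.length} bytes")
  }
}
```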

### Why are the changes needed?
To achieve feature parity with other systems/frameworks, and make the 
migration process from them to Spark SQL easier. For example, the `CBC` mode is 
supported by:
- BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/aead-encryption-concepts#block_cipher_modes
- Snowflake: https://docs.snowflake.com/en/sql-reference/functions/encrypt.html

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running new checks:
```
$ build/sbt "sql/testOnly *QueryExecutionErrorsSuite"
$ build/sbt "sql/test:testOnly 
org.apache.spark.sql.expressions.ExpressionInfoSuite"
$ build/sbt "test:testOnly org.apache.spark.sql.MiscFunctionsSuite"
$ build/sbt "core/testOnly *SparkThrowableSuite"
```
and checked compatibility with LibreSSL/OpenSSL:
```
$ openssl version
LibreSSL 3.3.6
$ echo -n 'Apache Spark' | openssl enc -e -aes-128-cbc -pass pass: -a
U2FsdGVkX1+5GyAmmG7wDWWDBAuUuxjMy++cMFytpls=
```
```sql
spark-sql (default)> SELECT aes_decrypt(unbase64('U2FsdGVkX1+5GyAmmG7wDWWDBAuUuxjMy++cMFytpls='), '', 'CBC');
Apache Spark
```
Decrypting Spark's output with OpenSSL:
```sql
spark-sql (default)> SELECT base64(aes_encrypt('Apache Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'PKCS'));
U2FsdGVkX1+maU2vmxrulgxXuQSyZ3ODnlHKqnt2fDA=
```
```
$ echo 'U2FsdGVkX1+maU2vmxrulgxXuQSyZ3ODnlHKqnt2fDA=' | openssl aes-256-cbc -a -d -pass pass:abcdefghijklmnop12345678ABCDEFGH
Apache Spark
```

Closes #40704 from MaxGekk/aes-cbc.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 core/src/main/resources/error/error-classes.json   |  5 ++
 .../catalyst/expressions/ExpressionImplUtils.java  | 72 ++
 .../spark/sql/catalyst/expressions/misc.scala  | 16 +++--
 .../spark/sql/errors/QueryExecutionErrors.scala|  9 +++
 .../org/apache/spark/sql/MiscFunctionsSuite.scala  | 33 +-
 .../sql/errors/QueryExecutionErrorsSuite.scala | 31 --
 6 files changed, 141 insertions(+), 25 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index ae73071a120..1edf625fdc3 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -978,6 +978,11 @@
   "expects a binary value with 16, 24 or 32 bytes, but got 
 bytes."
 ]
   },
+  "AES_SALTED_MAGIC" : {
+"message" : [
+  "Initial bytes from input  do not match 'Salted__' 
(0x53616C7465645F5F)."
+]
+  },
   "PATTERN" : {
 "message" : [
   "."
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
index a6e482db57b..680ad11ad73 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionImplUtils.java
+++ 

[spark] branch master updated (f8751e2afeb -> 74d840c247a)

2023-04-12 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f8751e2afeb [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode
 add 74d840c247a [SPARK-43103][SQL] Moving Integral to PhysicalDataType

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala  | 12 +-
 .../sql/catalyst/expressions/arithmetic.scala  | 10 
 .../expressions/collectionOperations.scala |  7 +++---
 .../sql/catalyst/types/PhysicalDataType.scala  | 28 ++
 .../apache/spark/sql/types/AbstractDataType.scala  |  4 +---
 .../org/apache/spark/sql/types/ByteType.scala  |  1 -
 .../org/apache/spark/sql/types/IntegerType.scala   |  3 ---
 .../org/apache/spark/sql/types/LongType.scala  |  1 -
 .../org/apache/spark/sql/types/ShortType.scala |  1 -
 9 files changed, 39 insertions(+), 28 deletions(-)





[spark] branch master updated (7e2c6c7ab23 -> f8751e2afeb)

2023-04-12 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7e2c6c7ab23 [SPARK-42985][CONNECT][PYTHON] Fix createDataFrame to respect the SQL configs
 add f8751e2afeb [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode

No new revisions were added by this update.

Summary of changes:
 .../src/main/protobuf/spark/connect/base.proto |   3 +
 .../src/main/protobuf/spark/connect/commands.proto |   9 ++
 .../src/main/protobuf/spark/connect/common.proto   |  10 ++
 .../sql/connect/planner/SparkConnectPlanner.scala  |  27 
 .../tests/connect/test_parity_torch_distributor.py |  59 +
 python/pyspark/ml/torch/distributor.py |  89 +
 python/pyspark/ml/torch/tests/test_distributor.py  |  30 -
 python/pyspark/sql/connect/client.py   |  16 +++
 python/pyspark/sql/connect/proto/base_pb2.py   | 108 
 python/pyspark/sql/connect/proto/base_pb2.pyi  |  13 ++
 python/pyspark/sql/connect/proto/commands_pb2.py   | 144 ++---
 python/pyspark/sql/connect/proto/commands_pb2.pyi  |  68 ++
 python/pyspark/sql/connect/proto/common_pb2.py |  16 ++-
 python/pyspark/sql/connect/proto/common_pb2.pyi|  30 +
 14 files changed, 462 insertions(+), 160 deletions(-)





[spark] branch master updated (631ee6706e6 -> 7e2c6c7ab23)

2023-04-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 631ee6706e6 [SPARK-43055][CONNECT][PYTHON] Support duplicated nested field names
 add 7e2c6c7ab23 [SPARK-42985][CONNECT][PYTHON] Fix createDataFrame to respect the SQL configs

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/conversion.py   | 36 +-
 python/pyspark/sql/connect/session.py  | 43 +-
 .../pyspark/sql/tests/connect/test_parity_arrow.py |  5 ---
 .../pyspark/sql/tests/connect/test_parity_types.py | 13 ++-
 python/pyspark/sql/tests/test_types.py | 30 +++
 5 files changed, 80 insertions(+), 47 deletions(-)





[spark] branch master updated (a31ac0492a5 -> 631ee6706e6)

2023-04-12 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a31ac0492a5 [SPARK-43039][SQL] Support custom fields in the file source _metadata column
 add 631ee6706e6 [SPARK-43055][CONNECT][PYTHON] Support duplicated nested field names

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/connect/client/SparkResult.scala |  12 ++-
 .../service/SparkConnectStreamHandler.scala|  36 ++-
 python/pyspark/sql/connect/client.py   |   5 +-
 python/pyspark/sql/connect/conversion.py   | 111 +
 python/pyspark/sql/tests/test_dataframe.py |  20 
 5 files changed, 135 insertions(+), 49 deletions(-)

