[spark] branch master updated: [SPARK-45625][BUILD] Upgrade log4j to 2.21.0

2023-10-22 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dbf2f4943c1 [SPARK-45625][BUILD] Upgrade log4j to 2.21.0
dbf2f4943c1 is described below

commit dbf2f4943c17258aa0d3478695022aade92c4e6a
Author: yangjie01 
AuthorDate: Mon Oct 23 12:34:11 2023 +0800

[SPARK-45625][BUILD] Upgrade log4j to 2.21.0

### What changes were proposed in this pull request?
This PR aims to upgrade log4j from 2.20.0 to 2.21.0.

### Why are the changes needed?
The new version adds support for the zstd compression algorithm: 
https://github.com/apache/logging-log4j2/issues/1508 | 
https://github.com/apache/logging-log4j2/pull/1514
The new version is also now built with Java 11, while the runtime remains 
compatible with Java 8: https://github.com/apache/logging-log4j2/pull/1369
The new version also brings some bug fixes, such as:
- Fixed logging of `java.sql.Date` objects by appending them before Log4j tries 
to call `java.util.Date.toInstant()` on them: 
https://github.com/apache/logging-log4j2/issues/1366
- Fixed a concurrent date-time formatting issue in PatternLayout: 
https://github.com/apache/logging-log4j2/pull/1485
- Fixed the buffer size in the Log4jFixedFormatter date-time formatter: 
https://github.com/apache/logging-log4j2/issues/1418
- Fixed RollingFileManager and FileRenameAction to propagate failed synchronous 
actions correctly: https://github.com/apache/logging-log4j2/issues/1445 | 
https://github.com/apache/logging-log4j2/pull/1549
and more.

The complete release notes are available at:
- https://github.com/apache/logging-log4j2/releases/tag/rel%2F2.21.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43478 from LuciferYang/SPARK-45625.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 ++++----
 pom.xml   | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 71192da2058..c6fa77c84ca 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -175,10 +175,10 @@ lapack/3.0.3//lapack-3.0.3.jar
 leveldbjni-all/1.8//leveldbjni-all-1.8.jar
 libfb303/0.9.3//libfb303-0.9.3.jar
 libthrift/0.12.0//libthrift-0.12.0.jar
-log4j-1.2-api/2.20.0//log4j-1.2-api-2.20.0.jar
-log4j-api/2.20.0//log4j-api-2.20.0.jar
-log4j-core/2.20.0//log4j-core-2.20.0.jar
-log4j-slf4j2-impl/2.20.0//log4j-slf4j2-impl-2.20.0.jar
+log4j-1.2-api/2.21.0//log4j-1.2-api-2.21.0.jar
+log4j-api/2.21.0//log4j-api-2.21.0.jar
+log4j-core/2.21.0//log4j-core-2.21.0.jar
+log4j-slf4j2-impl/2.21.0//log4j-slf4j2-impl-2.21.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 metrics-core/4.2.19//metrics-core-4.2.19.jar
diff --git a/pom.xml b/pom.xml
index da9245a4c8a..ededd5f207c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -120,7 +120,7 @@
     <sbt.project.name>spark</sbt.project.name>
     <asm.version>9.6</asm.version>
     <slf4j.version>2.0.9</slf4j.version>
-    <log4j.version>2.20.0</log4j.version>
+    <log4j.version>2.21.0</log4j.version>

     <hadoop.version>3.3.6</hadoop.version>
 





[spark] branch branch-3.5 updated: [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

2023-10-22 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 75a38b9024a [SPARK-45616][CORE] Avoid ParVector, which does not 
propagate ThreadLocals or SparkSession
75a38b9024a is described below

commit 75a38b9024af3c9cfd85e916c46359f7e7315c87
Author: Ankur Dave 
AuthorDate: Mon Oct 23 10:47:42 2023 +0800

[SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals 
or SparkSession

### What changes were proposed in this pull request?
`CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run 
Spark SQL queries in parallel. They incorrectly assume that each parallel 
operation will inherit the main thread's active SparkSession. This is only true 
when these parallel operations run in freshly-created threads. However, when 
other code has already run some parallel operations before Spark was started, 
there may be existing threads that do not have an active SparkSession. In 
that case, these tests fail [...]

The fix is to use the existing method `ThreadUtils.parmap()`. This method 
creates fresh threads that inherit the current active SparkSession, and it 
propagates the Spark ThreadLocals.
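
For illustration, here is a minimal sketch of the replacement pattern. The 
`inputs` collection and `runOne` body are hypothetical stand-ins; the 
`parmap(in, prefix, maxThreads)(f)` shape follows the parameter docs shown in 
the `ThreadUtils.scala` diff below.

```scala
import org.apache.spark.util.ThreadUtils

object ParmapSketch {
  // Hypothetical stand-ins for the suite's test inputs and per-test body.
  val inputs: Seq[Int] = 0 until 100
  def runOne(i: Int): Unit = { /* e.g. run a SQL query against the active SparkSession */ }

  // Before (brittle): ParVector may reuse pooled threads that carry no active SparkSession.
  //   new ParVector(inputs.toVector).foreach(runOne)

  // After: parmap runs each element on a freshly-created thread that inherits the
  // calling thread's Spark ThreadLocals and active SparkSession.
  def runAll(): Unit = ThreadUtils.parmap(inputs, "suite-parallel-exec", 8)(runOne)
}
```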

This PR also adds a scalastyle warning against use of ParVector.

### Why are the changes needed?
This change makes `CastSuiteBase` and `ExpressionInfoSuite` less brittle to 
future changes that may run parallel operations during test startup.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Reproduced the test failures by running a ParVector operation before Spark 
starts. Verified that this PR fixes the test failures in this condition.

```scala
  protected override def beforeAll(): Unit = {
    // Run a ParVector operation before initializing the SparkSession. This starts some Scala
    // execution context threads that have no active SparkSession. These threads will be reused for
    // later ParVector operations, reproducing SPARK-45616.
    new ParVector((0 until 100).toVector).foreach { _ => }

    super.beforeAll()
  }
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43466 from ankurdave/SPARK-45616.

Authored-by: Ankur Dave 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 376de8a502fca6b46d7f21560a60024d643144ea)
Signed-off-by: Wenchen Fan 
---
 core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala  |  2 ++
 core/src/main/scala/org/apache/spark/util/ThreadUtils.scala  |  4 
 scalastyle-config.xml| 12 
 .../spark/sql/catalyst/expressions/CastSuiteBase.scala   |  9 ++---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala   |  2 ++
 .../apache/spark/sql/expressions/ExpressionInfoSuite.scala   | 11 ++-
 .../main/scala/org/apache/spark/streaming/DStreamGraph.scala |  4 
 .../apache/spark/streaming/util/FileBasedWriteAheadLog.scala |  2 ++
 8 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala 
b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
index 0a930234437..3c1451a0185 100644
--- a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
@@ -76,8 +76,10 @@ class UnionRDD[T: ClassTag](
 
   override def getPartitions: Array[Partition] = {
 val parRDDs = if (isPartitionListingParallel) {
+  // scalastyle:off parvector
   val parArray = new ParVector(rdds.toVector)
   parArray.tasksupport = UnionRDD.partitionEvalTaskSupport
+  // scalastyle:on parvector
   parArray
 } else {
   rdds
diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala 
b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
index 16d7de56c39..2d3d6ec89ff 100644
--- a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
@@ -363,6 +363,10 @@ private[spark] object ThreadUtils {
* Comparing to the map() method of Scala parallel collections, this method 
can be interrupted
* at any time. This is useful on canceling of task execution, for example.
*
+   * Functions are guaranteed to be executed in freshly-created threads that 
inherit the calling
+   * thread's Spark thread-local variables. These threads also inherit the 
calling thread's active
+   * SparkSession.
+   *
* @param in - the input collection which should be transformed in parallel.
* @param prefix - the prefix assigned to the underlying thread pool.
* @param maxThreads - maximum number of thread can be created during 
execution.
diff --git 

[spark] branch master updated: [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

2023-10-22 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 376de8a502f [SPARK-45616][CORE] Avoid ParVector, which does not 
propagate ThreadLocals or SparkSession
376de8a502f is described below

commit 376de8a502fca6b46d7f21560a60024d643144ea
Author: Ankur Dave 
AuthorDate: Mon Oct 23 10:47:42 2023 +0800

[SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals 
or SparkSession

### What changes were proposed in this pull request?
`CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run 
Spark SQL queries in parallel. They incorrectly assume that each parallel 
operation will inherit the main thread's active SparkSession. This is only true 
when these parallel operations run in freshly-created threads. However, when 
other code has already run some parallel operations before Spark was started, 
there may be existing threads that do not have an active SparkSession. In 
that case, these tests fail [...]

The fix is to use the existing method `ThreadUtils.parmap()`. This method 
creates fresh threads that inherit the current active SparkSession, and it 
propagates the Spark ThreadLocals.

This PR also adds a scalastyle warning against use of ParVector.

### Why are the changes needed?
This change makes `CastSuiteBase` and `ExpressionInfoSuite` less brittle to 
future changes that may run parallel operations during test startup.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Reproduced the test failures by running a ParVector operation before Spark 
starts. Verified that this PR fixes the test failures in this condition.

```scala
  protected override def beforeAll(): Unit = {
    // Run a ParVector operation before initializing the SparkSession. This starts some Scala
    // execution context threads that have no active SparkSession. These threads will be reused for
    // later ParVector operations, reproducing SPARK-45616.
    new ParVector((0 until 100).toVector).foreach { _ => }

    super.beforeAll()
  }
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43466 from ankurdave/SPARK-45616.

Authored-by: Ankur Dave 
Signed-off-by: Wenchen Fan 
---
 core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala  |  2 ++
 core/src/main/scala/org/apache/spark/util/ThreadUtils.scala  |  4 
 scalastyle-config.xml| 12 
 .../spark/sql/catalyst/expressions/CastSuiteBase.scala   |  9 ++---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala   |  2 ++
 .../apache/spark/sql/expressions/ExpressionInfoSuite.scala   | 11 ++-
 .../main/scala/org/apache/spark/streaming/DStreamGraph.scala |  4 
 .../apache/spark/streaming/util/FileBasedWriteAheadLog.scala |  2 ++
 8 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala 
b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
index 0a930234437..3c1451a0185 100644
--- a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
@@ -76,8 +76,10 @@ class UnionRDD[T: ClassTag](
 
   override def getPartitions: Array[Partition] = {
 val parRDDs = if (isPartitionListingParallel) {
+  // scalastyle:off parvector
   val parArray = new ParVector(rdds.toVector)
   parArray.tasksupport = UnionRDD.partitionEvalTaskSupport
+  // scalastyle:on parvector
   parArray
 } else {
   rdds
diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala 
b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
index 16d7de56c39..2d3d6ec89ff 100644
--- a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
@@ -363,6 +363,10 @@ private[spark] object ThreadUtils {
* Comparing to the map() method of Scala parallel collections, this method 
can be interrupted
* at any time. This is useful on canceling of task execution, for example.
*
+   * Functions are guaranteed to be executed in freshly-created threads that 
inherit the calling
+   * thread's Spark thread-local variables. These threads also inherit the 
calling thread's active
+   * SparkSession.
+   *
* @param in - the input collection which should be transformed in parallel.
* @param prefix - the prefix assigned to the underlying thread pool.
* @param maxThreads - maximum number of thread can be created during 
execution.
diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 987b4235c19..2077769c71d 100644
--- a/scalastyle-config.xml
+++ 

[spark] branch master updated: [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase

2023-10-22 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e3ba9cf0403 [SPARK-45620][PYTHON] Fix user-facing APIs related to 
Python UDTF to use camelCase
e3ba9cf0403 is described below

commit e3ba9cf0403ade734f87621472088687e533b2cd
Author: Takuya UESHIN 
AuthorDate: Mon Oct 23 10:35:30 2023 +0900

[SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use 
camelCase

### What changes were proposed in this pull request?

Fix user-facing APIs related to Python UDTF to use camelCase.

### Why are the changes needed?

To keep the naming convention for user-facing APIs.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated the related tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43470 from ueshin/issues/SPARK-45620/field_names.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/docs/source/user_guide/sql/python_udtf.rst | 22 +++---
 python/pyspark/sql/functions.py   | 12 ++--
 python/pyspark/sql/tests/test_udtf.py | 84 +++
 python/pyspark/sql/udtf.py| 24 +++
 python/pyspark/sql/worker/analyze_udtf.py | 12 ++--
 5 files changed, 77 insertions(+), 77 deletions(-)

diff --git a/python/docs/source/user_guide/sql/python_udtf.rst 
b/python/docs/source/user_guide/sql/python_udtf.rst
index fb42644dc70..0e0c6e28578 100644
--- a/python/docs/source/user_guide/sql/python_udtf.rst
+++ b/python/docs/source/user_guide/sql/python_udtf.rst
@@ -77,29 +77,29 @@ To implement a Python UDTF, you first need to define a 
class implementing the me
 the particular UDTF call under consideration. Each parameter is an 
instance of the
 `AnalyzeArgument` class, which contains fields including the 
provided argument's data
 type and value (in the case of literal scalar arguments only). For 
table arguments, the
-`is_table` field is set to true and the `data_type` field is a 
StructType representing
+`isTable` field is set to true and the `dataType` field is a 
StructType representing
 the table's column types:
 
-data_type: DataType
+dataType: DataType
 value: Optional[Any]
-is_table: bool
+isTable: bool
 
 This method returns an instance of the `AnalyzeResult` class which 
includes the result
 table's schema as a StructType. If the UDTF accepts an input table 
argument, then the
 `AnalyzeResult` can also include a requested way to partition the 
rows of the input
-table across several UDTF calls. If `with_single_partition` is set 
to True, the query
+table across several UDTF calls. If `withSinglePartition` is set 
to True, the query
 planner will arrange a repartitioning operation from the previous 
execution stage such
 that all rows of the input table are consumed by the `eval` method 
from exactly one
-instance of the UDTF class. On the other hand, if the 
`partition_by` list is non-empty,
+instance of the UDTF class. On the other hand, if the 
`partitionBy` list is non-empty,
 the query planner will arrange a repartitioning such that all rows 
with each unique
 combination of values of the partitioning columns are consumed by 
a separate unique
-instance of the UDTF class. If `order_by` is non-empty, this 
specifies the requested
+instance of the UDTF class. If `orderBy` is non-empty, this 
specifies the requested
 ordering of rows within each partition.
 
 schema: StructType
-with_single_partition: bool = False
-partition_by: Sequence[PartitioningColumn] = 
field(default_factory=tuple)
-order_by: Sequence[OrderingColumn] = 
field(default_factory=tuple)
+withSinglePartition: bool = False
+partitionBy: Sequence[PartitioningColumn] = 
field(default_factory=tuple)
+orderBy: Sequence[OrderingColumn] = 
field(default_factory=tuple)
 
 Examples
 
@@ -116,7 +116,7 @@ To implement a Python UDTF, you first need to define a 
class implementing the me
 
 >>> def analyze(self, *args) -> AnalyzeResult:
 ... assert len(args) == 1, "This function accepts one argument 
only"
-... assert args[0].data_type == StringType(), "Only string 
arguments are supported"
+... assert args[0].dataType == StringType(), "Only string 
arguments are 

[spark] branch master updated (7763c91950e -> 9f675c54a56)

2023-10-22 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 7763c91950e [SPARK-45624][CORE][TESTS] Use 
`AccessibleObject#canAccess` instead of `AccessibleObject#isAccessible`
 add 9f675c54a56 [SPARK-44753][PYTHON][CONNECT] XML: pyspark sql xml reader 
writer

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/readwriter.py   |  84 +
 python/pyspark/sql/connect/streaming/readwriter.py |  54 ++
 python/pyspark/sql/readwriter.py   | 200 +
 python/pyspark/sql/streaming/readwriter.py |  99 ++
 .../sql/tests/connect/test_connect_basic.py|  56 ++
 .../sql/tests/connect/test_parity_datasources.py   |   4 +
 python/pyspark/sql/tests/test_datasources.py   |  49 +
 7 files changed, 546 insertions(+)
 mode change 100644 => 100755 
python/pyspark/sql/tests/connect/test_connect_basic.py





[spark] branch master updated (48e207f4a21 -> 7763c91950e)

2023-10-22 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 48e207f4a21 
[SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES]
 Fix the compilation warning "Auto-application to `()` is deprecated" and turn 
it into a compilation error
 add 7763c91950e [SPARK-45624][CORE][TESTS] Use 
`AccessibleObject#canAccess` instead of `AccessibleObject#isAccessible`

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
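
For illustration, a minimal sketch of the replacement pattern (the field lookup 
is hypothetical; `AccessibleObject#isAccessible` has been deprecated since 
Java 9 in favor of `canAccess`):

```scala
import java.lang.reflect.Field

object CanAccessSketch {
  // Hypothetical: look up a declared instance field via reflection.
  val field: Field = classOf[String].getDeclaredField("value")
  val target: String = "abc"

  // Deprecated since Java 9:
  //   field.isAccessible()
  // Replacement: checks whether the caller can access the member on this receiver.
  val readable: Boolean = field.canAccess(target)
}
```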





[spark] branch master updated (2709426f0f6 -> 48e207f4a21)

2023-10-22 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory
 add 48e207f4a21 
[SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES]
 Fix the compilation warning "Auto-application to `()` is deprecated" and turn 
it into a compilation error
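
For context, a minimal Scala sketch of the warning being eliminated (the class 
is hypothetical; Scala 2.13 reports the warning when a method declared with an 
empty parameter list is applied without `()`):

```scala
class Counter {
  private var n = 0
  def increment(): Unit = { n += 1 }
  def value(): Int = n
}

val c = new Counter
// c.increment        // warns: "Auto-application to `()` is deprecated" (now a build error in Spark)
c.increment()         // fixed: supply the empty argument list explicitly
println(c.value())
```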

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  38 ++---
 .../execution/benchmark/AvroReadBenchmark.scala|  10 +-
 .../execution/benchmark/AvroWriteBenchmark.scala   |   4 +-
 .../org/apache/spark/sql/ClientE2ETestSuite.scala  |  12 +-
 .../spark/sql/DataFrameNaFunctionSuite.scala   |   2 +-
 .../sql/UserDefinedFunctionE2ETestSuite.scala  |   2 +-
 .../spark/sql/connect/client/ArtifactManager.scala |   2 +-
 .../sql/connect/client/GrpcRetryHandler.scala  |   2 +-
 .../execution/ExecuteResponseObserver.scala|   4 +-
 .../execution/SparkConnectPlanExecution.scala  |   2 +-
 .../sql/connect/planner/SparkConnectPlanner.scala  |  12 +-
 .../sql/connect/service/ExecuteEventsManager.scala |   2 +-
 .../sql/connect/service/SparkConnectServer.scala   |   2 +-
 .../connect/planner/SparkConnectServiceSuite.scala |   2 +-
 .../connect/service/AddArtifactsHandlerSuite.scala |  12 +-
 .../service/ArtifactStatusesHandlerSuite.scala |   2 +-
 .../service/FetchErrorDetailsHandlerSuite.scala|   2 +-
 .../connect/service/InterceptorRegistrySuite.scala |  12 +-
 .../spark/sql/jdbc/DB2IntegrationSuite.scala   |   8 +-
 .../sql/jdbc/MsSqlServerIntegrationSuite.scala |   6 +-
 .../spark/sql/jdbc/MySQLIntegrationSuite.scala |   4 +-
 .../spark/sql/jdbc/OracleIntegrationSuite.scala|   8 +-
 .../spark/sql/jdbc/PostgresIntegrationSuite.scala  |   4 +-
 .../sql/kafka010/KafkaOffsetReaderConsumer.scala   |   2 +-
 .../sql/kafka010/consumer/KafkaDataConsumer.scala  |   2 +-
 .../sql/kafka010/KafkaContinuousSourceSuite.scala  |  16 +-
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |   8 +-
 .../spark/sql/kafka010/KafkaRelationSuite.scala|  36 ++--
 .../kafka010/consumer/KafkaDataConsumerSuite.scala |   2 +-
 .../org/apache/spark/kafka010/KafkaTokenUtil.scala |   4 +-
 .../kafka010/DirectKafkaInputDStream.scala |  12 +-
 .../streaming/kafka010/KafkaDataConsumer.scala |   2 +-
 .../apache/spark/streaming/kafka010/KafkaRDD.scala |  14 +-
 .../kafka010/DirectKafkaStreamSuite.scala  |  14 +-
 .../kafka010/KafkaDataConsumerSuite.scala  |   4 +-
 .../spark/streaming/kafka010/KafkaRDDSuite.scala   |  36 ++--
 .../kinesis/KPLBasedKinesisTestUtils.scala |   2 +-
 .../kinesis/KinesisInputDStreamBuilderSuite.scala  |   2 +-
 .../org/apache/spark/BarrierCoordinator.scala  |   2 +-
 .../org/apache/spark/BarrierTaskContext.scala  |  19 ++-
 .../main/scala/org/apache/spark/Heartbeater.scala  |   2 +-
 .../scala/org/apache/spark/SecurityManager.scala   |   4 +-
 .../main/scala/org/apache/spark/SparkContext.scala |  12 +-
 .../main/scala/org/apache/spark/TestUtils.scala|   2 +-
 .../apache/spark/api/java/JavaSparkContext.scala   |   2 +-
 .../org/apache/spark/api/python/PythonRunner.scala |  22 +--
 .../scala/org/apache/spark/api/r/BaseRRunner.scala |   2 +-
 .../org/apache/spark/deploy/JsonProtocol.scala |   2 +-
 .../apache/spark/deploy/SparkSubmitArguments.scala |   2 +-
 .../spark/deploy/history/FsHistoryProvider.scala   |   2 +-
 .../apache/spark/deploy/history/HistoryPage.scala  |   2 +-
 .../apache/spark/deploy/worker/CommandUtils.scala  |   4 +-
 .../org/apache/spark/deploy/worker/Worker.scala|   2 +-
 .../spark/executor/ProcfsMetricsGetter.scala   |   4 +-
 .../apache/spark/input/PortableDataStream.scala|   2 +-
 .../spark/internal/io/SparkHadoopWriter.scala  |   4 +-
 .../apache/spark/launcher/LauncherBackend.scala|   4 +-
 .../apache/spark/memory/UnifiedMemoryManager.scala |   2 +-
 .../org/apache/spark/metrics/MetricsSystem.scala   |   4 +-
 .../org/apache/spark/rdd/AsyncRDDActions.scala |   2 +-
 .../scala/org/apache/spark/rdd/HadoopRDD.scala |   2 +-
 .../main/scala/org/apache/spark/rdd/PipedRDD.scala |   4 +-
 .../apache/spark/rdd/ReliableCheckpointRDD.scala   |   6 +-
 .../org/apache/spark/scheduler/DAGScheduler.scala  |  16 +-
 .../spark/scheduler/StatsReportListener.scala  |   2 +-
 .../scala/org/apache/spark/scheduler/Task.scala|   4 +-
 .../apache/spark/scheduler/TaskSchedulerImpl.scala |   4 +-
 .../apache/spark/scheduler/TaskSetManager.scala|   4 +-
 .../cluster/CoarseGrainedSchedulerBackend.scala|   6 +-
 .../cluster/StandaloneSchedulerBackend.scala   |   2 +-
 .../apache/spark/serializer/KryoSerializer.scala   |  10 +-
 .../apache/spark/status/AppStatusListener.scala|  10 +-
 .../apache/spark/status/ElementTrackingStore.scala 

[spark] branch master updated: [SPARK-45541][CORE] Add SSLFactory

2023-10-22 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory
2709426f0f6 is described below

commit 2709426f0f622a214c51954664458e7dd2ab3304
Author: Hasnain Lakhani 
AuthorDate: Sun Oct 22 03:21:19 2023 -0500

[SPARK-45541][CORE] Add SSLFactory

### What changes were proposed in this pull request?

As titled: add a factory that supports creating SSL engines, and a 
corresponding builder for it. This will be used in a follow-up PR by the 
`TransportContext` and related files to add SSL support.
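
For context, a rough sketch of what such a factory does under the hood, using 
only the standard `javax.net.ssl` APIs (this is not the API of the 
`SSLFactory`/builder added by this PR):

```scala
import java.security.KeyStore
import javax.net.ssl.{KeyManagerFactory, SSLContext, SSLEngine, TrustManagerFactory}

def createEngine(protocol: String, ciphers: Array[String],
    keyStore: KeyStore, keyPassword: Array[Char]): SSLEngine = {
  val kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm)
  kmf.init(keyStore, keyPassword)
  val tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm)
  tmf.init(keyStore)
  val context = SSLContext.getInstance(protocol)   // e.g. "TLSv1.3"
  context.init(kmf.getKeyManagers, tmf.getTrustManagers, null)
  val engine = context.createSSLEngine()
  engine.setEnabledCipherSuites(ciphers)           // restrict to the configured ciphers
  engine.setUseClientMode(false)                   // server-side engine for RPC connections
  engine
}
```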

### Why are the changes needed?

We need a mechanism to initialize the appropriate SSL implementation with 
the configured settings (such as protocol, ciphers, etc.) for RPC SSL support.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests. This will be more thoroughly tested in a follow-up PR that 
adds call sites to it. It has been integration-tested as part of 
https://github.com/apache/spark/pull/42685

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43386 from hasnain-db/spark-tls-factory.

Authored-by: Hasnain Lakhani 
Signed-off-by: Mridul Muralidharan 
---
 .../spark/network/protocol/SslMessageEncoder.java  |   2 +-
 .../network/ssl/ReloadingX509TrustManager.java |   5 +-
 .../org/apache/spark/network/ssl/SSLFactory.java   | 470 +
 3 files changed, 474 insertions(+), 3 deletions(-)

diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java
 
b/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java
index f43d0789ee6..87723c6613e 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java
@@ -36,7 +36,7 @@ import org.slf4j.LoggerFactory;
 @ChannelHandler.Sharable
 public final class SslMessageEncoder extends MessageToMessageEncoder<Message> {
 
-  private final Logger logger = 
LoggerFactory.getLogger(SslMessageEncoder.class);
+  private static final Logger logger = 
LoggerFactory.getLogger(SslMessageEncoder.class);
 
   private SslMessageEncoder() {}
 
diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java
 
b/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java
index 4c39a5d2a3d..87798bda2a0 100644
--- 
a/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java
@@ -46,7 +46,7 @@ import org.slf4j.LoggerFactory;
 public final class ReloadingX509TrustManager
 implements X509TrustManager, Runnable {
 
-  private final Logger logger = 
LoggerFactory.getLogger(ReloadingX509TrustManager.class);
+  private static final Logger logger = 
LoggerFactory.getLogger(ReloadingX509TrustManager.class);
 
   private final String type;
   private final File file;
@@ -180,7 +180,8 @@ public final class ReloadingX509TrustManager
 canonicalPath = latestCanonicalFile.getPath();
 lastLoaded = latestCanonicalFile.lastModified();
 try (FileInputStream in = new FileInputStream(latestCanonicalFile)) {
-  ks.load(in, password.toCharArray());
+  char[] passwordCharacters = password != null? password.toCharArray() : 
null;
+  ks.load(in, passwordCharacters);
   logger.debug("Loaded truststore '" + file + "'");
 }
 
diff --git 
a/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java
 
b/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java
new file mode 100644
index 000..fc03dba617f
--- /dev/null
+++ 
b/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java
@@ -0,0 +1,470 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the 

[spark] branch master updated: [SPARK-45574][SQL] Add :: syntax as a shorthand for casting

2023-10-22 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 936e98dbc07 [SPARK-45574][SQL] Add :: syntax as a shorthand for casting
936e98dbc07 is described below

commit 936e98dbc073fcfcb7e6c40720d55dac63a73d51
Author: Ivan Mitic 
AuthorDate: Sun Oct 22 11:23:01 2023 +0500

[SPARK-45574][SQL] Add :: syntax as a shorthand for casting

### What changes were proposed in this pull request?

Adds the `::` syntax as syntactic sugar for casting columns. This is a 
pretty common syntax across many industry databases.
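
For example, a minimal usage sketch assuming a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("cast-shorthand").getOrCreate()

// The new shorthand: '10'::INT is parsed the same way as CAST('10' AS INT).
val df = spark.sql("SELECT '10'::INT AS ten")
df.printSchema() // the "ten" column has integer type
df.show()        // single row containing 10
```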

### Does this PR introduce _any_ user-facing change?

Yes, new casting syntax.

### How was this patch tested?

Unit tests.
SQL tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43430 from mitkedb/master.

Authored-by: Ivan Mitic 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/parser/SqlBaseLexer.g4  |   1 +
 .../spark/sql/catalyst/parser/SqlBaseParser.g4 |   1 +
 .../spark/sql/catalyst/expressions/Cast.scala  |   5 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  11 +
 .../sql/catalyst/parser/CastingSyntaxSuite.scala   | 103 ++
 .../sql-tests/analyzer-results/ansi/cast.sql.out   | 234 +
 .../sql-tests/analyzer-results/cast.sql.out| 217 
 .../src/test/resources/sql-tests/inputs/cast.sql   |  31 ++
 .../resources/sql-tests/results/ansi/cast.sql.out  | 385 +
 .../test/resources/sql-tests/results/cast.sql.out  | 245 +
 .../spark/sql/errors/QueryParsingErrorsSuite.scala |   4 +-
 .../sql/expressions/ExpressionInfoSuite.scala  |   4 +-
 12 files changed, 1237 insertions(+), 4 deletions(-)

diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
index d9128de0f5d..e8b5cb012fc 100644
--- 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
+++ 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4
@@ -447,6 +447,7 @@ PIPE: '|';
 CONCAT_PIPE: '||';
 HAT: '^';
 COLON: ':';
+DOUBLE_COLON: '::';
 ARROW: '->';
 FAT_ARROW : '=>';
 HENT_START: '/*+';
diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index 77a9108e063..84a31dafed9 100644
--- 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -957,6 +957,7 @@ primaryExpression
 | CASE whenClause+ (ELSE elseExpression=expression)? END   
#searchedCase
 | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END  
#simpleCase
 | name=(CAST | TRY_CAST) LEFT_PAREN expression AS dataType RIGHT_PAREN 
#cast
+| primaryExpression DOUBLE_COLON dataType  
#castByColon
 | STRUCT LEFT_PAREN (argument+=namedExpression (COMMA 
argument+=namedExpression)*)? RIGHT_PAREN #struct
 | FIRST LEFT_PAREN expression (IGNORE NULLS)? RIGHT_PAREN  
#first
 | ANY_VALUE LEFT_PAREN expression (IGNORE NULLS)? RIGHT_PAREN  
#any_value
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index b975dc3c7a5..99117d81b34 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -438,11 +438,14 @@ object Cast extends QueryErrorsBase {
  * session local timezone by an analyzer [[ResolveTimeZone]].
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data 
type `type`.",
+  usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data 
type `type`." +
+  " `expr` :: `type` alternative casting syntax is also supported.",
   examples = """
 Examples:
   > SELECT _FUNC_('10' as int);
10
+  > SELECT '10' :: int;
+   10
   """,
   since = "1.0.0",
   group = "conversion_funcs")
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 8ce58ef7688..7e0aafca31c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++