[spark] branch master updated: [SPARK-45625][BUILD] Upgrade log4j to 2.21.0
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new dbf2f4943c1 [SPARK-45625][BUILD] Upgrade log4j to 2.21.0

dbf2f4943c1 is described below

commit dbf2f4943c17258aa0d3478695022aade92c4e6a
Author: yangjie01
AuthorDate: Mon Oct 23 12:34:11 2023 +0800

    [SPARK-45625][BUILD] Upgrade log4j to 2.21.0

    ### What changes were proposed in this pull request?
    This PR aims to upgrade log4j from 2.20.0 to 2.21.0.

    ### Why are the changes needed?
    The new version adds support for the zstd compression algorithm: https://github.com/apache/logging-log4j2/issues/1508 | https://github.com/apache/logging-log4j2/pull/1514

    Meanwhile, the new version is now built with Java 11, while it remains compatible with Java 8 at runtime: https://github.com/apache/logging-log4j2/pull/1369

    The new version also brings bug fixes, such as:
    - Fixed logging of java.sql.Date objects by appending it before Log4J tries to call java.util.Date.toInstant() on it: https://github.com/apache/logging-log4j2/issues/1366
    - Fixed a concurrent date-time formatting issue in PatternLayout: https://github.com/apache/logging-log4j2/pull/1485
    - Fixed the buffer size in the Log4jFixedFormatter date-time formatter: https://github.com/apache/logging-log4j2/issues/1418
    - Fixed RollingFileManager and FileRenameAction to propagate failed synchronous actions correctly: https://github.com/apache/logging-log4j2/issues/1445 | https://github.com/apache/logging-log4j2/pull/1549

    and more. The complete release notes:
    - https://github.com/apache/logging-log4j2/releases/tag/rel%2F2.21.0

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43478 from LuciferYang/SPARK-45625.

    Authored-by: yangjie01
    Signed-off-by: yangjie01
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 ++++----
 pom.xml                               | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 71192da2058..c6fa77c84ca 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -175,10 +175,10 @@ lapack/3.0.3//lapack-3.0.3.jar
 leveldbjni-all/1.8//leveldbjni-all-1.8.jar
 libfb303/0.9.3//libfb303-0.9.3.jar
 libthrift/0.12.0//libthrift-0.12.0.jar
-log4j-1.2-api/2.20.0//log4j-1.2-api-2.20.0.jar
-log4j-api/2.20.0//log4j-api-2.20.0.jar
-log4j-core/2.20.0//log4j-core-2.20.0.jar
-log4j-slf4j2-impl/2.20.0//log4j-slf4j2-impl-2.20.0.jar
+log4j-1.2-api/2.21.0//log4j-1.2-api-2.21.0.jar
+log4j-api/2.21.0//log4j-api-2.21.0.jar
+log4j-core/2.21.0//log4j-core-2.21.0.jar
+log4j-slf4j2-impl/2.21.0//log4j-slf4j2-impl-2.21.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 metrics-core/4.2.19//metrics-core-4.2.19.jar

diff --git a/pom.xml b/pom.xml
index da9245a4c8a..ededd5f207c 100644
--- a/pom.xml
+++ b/pom.xml
@@ -120,7 +120,7 @@
     spark
     9.6
     2.0.9
-    2.20.0
+    2.21.0
     3.3.6
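For context, a minimal log4j2.xml sketch of how the newly supported zstd compression is typically enabled. This is not from the commit above; it assumes that, as with log4j's other compression formats, the rotated file's extension selects the algorithm, and that commons-compress and zstd-jni are on the classpath.

```xml
<!-- Hedged sketch: a RollingFileAppender whose filePattern ends in .zst,
     so rotated files are compressed with zstd (log4j 2.21.0+). -->
<Configuration>
  <Appenders>
    <RollingFile name="rolling" fileName="logs/app.log"
                 filePattern="logs/app-%d{yyyy-MM-dd}-%i.log.zst">
      <PatternLayout pattern="%d %p %c: %m%n"/>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="rolling"/>
    </Root>
  </Loggers>
</Configuration>
```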
[spark] branch branch-3.5 updated: [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 75a38b9024a [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

75a38b9024a is described below

commit 75a38b9024af3c9cfd85e916c46359f7e7315c87
Author: Ankur Dave
AuthorDate: Mon Oct 23 10:47:42 2023 +0800

    [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

    ### What changes were proposed in this pull request?
    `CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run Spark SQL queries in parallel. They incorrectly assume that each parallel operation will inherit the main thread’s active SparkSession. This is only true when these parallel operations run in freshly-created threads. However, when other code has already run some parallel operations before Spark was started, then there may be existing threads that do not have an active SparkSession. In that case, these tests fai [...]

    The fix is to use the existing method `ThreadUtils.parmap()`. This method creates fresh threads that inherit the current active SparkSession, and it propagates the Spark ThreadLocals.

    This PR also adds a scalastyle warning against use of ParVector.

    ### Why are the changes needed?
    This change makes `CastSuiteBase` and `ExpressionInfoSuite` less brittle to future changes that may run parallel operations during test startup.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Reproduced the test failures by running a ParVector operation before Spark starts. Verified that this PR fixes the test failures in this condition.

    ```scala
    protected override def beforeAll(): Unit = {
      // Run a ParVector operation before initializing the SparkSession. This starts some Scala
      // execution context threads that have no active SparkSession. These threads will be reused for
      // later ParVector operations, reproducing SPARK-45616.
      new ParVector((0 until 100).toVector).foreach { _ => }

      super.beforeAll()
    }
    ```

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43466 from ankurdave/SPARK-45616.
    Authored-by: Ankur Dave
    Signed-off-by: Wenchen Fan
    (cherry picked from commit 376de8a502fca6b46d7f21560a60024d643144ea)
    Signed-off-by: Wenchen Fan
---
 core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala      |  2 ++
 core/src/main/scala/org/apache/spark/util/ThreadUtils.scala  |  4 ++++
 scalastyle-config.xml                                        | 12 ++++++++++++
 .../spark/sql/catalyst/expressions/CastSuiteBase.scala       |  9 ++---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala   |  2 ++
 .../apache/spark/sql/expressions/ExpressionInfoSuite.scala   | 11 ++-
 .../main/scala/org/apache/spark/streaming/DStreamGraph.scala |  4 ++++
 .../apache/spark/streaming/util/FileBasedWriteAheadLog.scala |  2 ++
 8 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
index 0a930234437..3c1451a0185 100644
--- a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
@@ -76,8 +76,10 @@ class UnionRDD[T: ClassTag](
   override def getPartitions: Array[Partition] = {
     val parRDDs = if (isPartitionListingParallel) {
+      // scalastyle:off parvector
       val parArray = new ParVector(rdds.toVector)
       parArray.tasksupport = UnionRDD.partitionEvalTaskSupport
+      // scalastyle:on parvector
       parArray
     } else {
       rdds

diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
index 16d7de56c39..2d3d6ec89ff 100644
--- a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
@@ -363,6 +363,10 @@ private[spark] object ThreadUtils {
    * Comparing to the map() method of Scala parallel collections, this method can be interrupted
    * at any time. This is useful on canceling of task execution, for example.
    *
+   * Functions are guaranteed to be executed in freshly-created threads that inherit the calling
+   * thread's Spark thread-local variables. These threads also inherit the calling thread's active
+   * SparkSession.
+   *
    * @param in - the input collection which should be transformed in parallel.
    * @param prefix - the prefix assigned to the underlying thread pool.
    * @param maxThreads - maximum number of thread can be created during execution.

diff --git
[spark] branch master updated: [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 376de8a502f [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

376de8a502f is described below

commit 376de8a502fca6b46d7f21560a60024d643144ea
Author: Ankur Dave
AuthorDate: Mon Oct 23 10:47:42 2023 +0800

    [SPARK-45616][CORE] Avoid ParVector, which does not propagate ThreadLocals or SparkSession

    ### What changes were proposed in this pull request?
    `CastSuiteBase` and `ExpressionInfoSuite` use `ParVector.foreach()` to run Spark SQL queries in parallel. They incorrectly assume that each parallel operation will inherit the main thread’s active SparkSession. This is only true when these parallel operations run in freshly-created threads. However, when other code has already run some parallel operations before Spark was started, then there may be existing threads that do not have an active SparkSession. In that case, these tests fai [...]

    The fix is to use the existing method `ThreadUtils.parmap()`. This method creates fresh threads that inherit the current active SparkSession, and it propagates the Spark ThreadLocals.

    This PR also adds a scalastyle warning against use of ParVector.

    ### Why are the changes needed?
    This change makes `CastSuiteBase` and `ExpressionInfoSuite` less brittle to future changes that may run parallel operations during test startup.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Reproduced the test failures by running a ParVector operation before Spark starts. Verified that this PR fixes the test failures in this condition.

    ```scala
    protected override def beforeAll(): Unit = {
      // Run a ParVector operation before initializing the SparkSession. This starts some Scala
      // execution context threads that have no active SparkSession. These threads will be reused for
      // later ParVector operations, reproducing SPARK-45616.
      new ParVector((0 until 100).toVector).foreach { _ => }

      super.beforeAll()
    }
    ```

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43466 from ankurdave/SPARK-45616.
    Authored-by: Ankur Dave
    Signed-off-by: Wenchen Fan
---
 core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala      |  2 ++
 core/src/main/scala/org/apache/spark/util/ThreadUtils.scala  |  4 ++++
 scalastyle-config.xml                                        | 12 ++++++++++++
 .../spark/sql/catalyst/expressions/CastSuiteBase.scala       |  9 ++---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala   |  2 ++
 .../apache/spark/sql/expressions/ExpressionInfoSuite.scala   | 11 ++-
 .../main/scala/org/apache/spark/streaming/DStreamGraph.scala |  4 ++++
 .../apache/spark/streaming/util/FileBasedWriteAheadLog.scala |  2 ++
 8 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
index 0a930234437..3c1451a0185 100644
--- a/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala
@@ -76,8 +76,10 @@ class UnionRDD[T: ClassTag](
   override def getPartitions: Array[Partition] = {
     val parRDDs = if (isPartitionListingParallel) {
+      // scalastyle:off parvector
       val parArray = new ParVector(rdds.toVector)
       parArray.tasksupport = UnionRDD.partitionEvalTaskSupport
+      // scalastyle:on parvector
       parArray
     } else {
       rdds

diff --git a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
index 16d7de56c39..2d3d6ec89ff 100644
--- a/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
+++ b/core/src/main/scala/org/apache/spark/util/ThreadUtils.scala
@@ -363,6 +363,10 @@ private[spark] object ThreadUtils {
    * Comparing to the map() method of Scala parallel collections, this method can be interrupted
    * at any time. This is useful on canceling of task execution, for example.
    *
+   * Functions are guaranteed to be executed in freshly-created threads that inherit the calling
+   * thread's Spark thread-local variables. These threads also inherit the calling thread's active
+   * SparkSession.
+   *
    * @param in - the input collection which should be transformed in parallel.
    * @param prefix - the prefix assigned to the underlying thread pool.
    * @param maxThreads - maximum number of thread can be created during execution.

diff --git a/scalastyle-config.xml b/scalastyle-config.xml
index 987b4235c19..2077769c71d 100644
--- a/scalastyle-config.xml
+++
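For reference, a minimal sketch of the `ThreadUtils.parmap()` pattern the fix switches to. The inputs and the doubling function are hypothetical stand-ins for the per-expression checks in the suites; `ThreadUtils` is `private[spark]`, so callers must live under `org.apache.spark`.

```scala
package org.apache.spark.demo // hypothetical package, under org.apache.spark

import org.apache.spark.util.ThreadUtils

object ParmapDemo {
  def main(args: Array[String]): Unit = {
    val inputs = 0 until 100
    val doubled = ThreadUtils.parmap(inputs, prefix = "SPARK-45616-demo", maxThreads = 8) { i =>
      // Each invocation runs in a freshly created thread that inherits the
      // caller's Spark ThreadLocals and active SparkSession (see the scaladoc
      // added above) -- unlike ParVector, whose pooled threads may not.
      i * 2
    }
    assert(doubled.sum == inputs.map(_ * 2).sum)
  }
}
```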
[spark] branch master updated: [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e3ba9cf0403 [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase e3ba9cf0403 is described below commit e3ba9cf0403ade734f87621472088687e533b2cd Author: Takuya UESHIN AuthorDate: Mon Oct 23 10:35:30 2023 +0900 [SPARK-45620][PYTHON] Fix user-facing APIs related to Python UDTF to use camelCase ### What changes were proposed in this pull request? Fix user-facing APIs related to Python UDTF to use camelCase. ### Why are the changes needed? To keep the naming convention for user-facing APIs. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated the related tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43470 from ueshin/issues/SPARK-45620/field_names. Authored-by: Takuya UESHIN Signed-off-by: Hyukjin Kwon --- python/docs/source/user_guide/sql/python_udtf.rst | 22 +++--- python/pyspark/sql/functions.py | 12 ++-- python/pyspark/sql/tests/test_udtf.py | 84 +++ python/pyspark/sql/udtf.py| 24 +++ python/pyspark/sql/worker/analyze_udtf.py | 12 ++-- 5 files changed, 77 insertions(+), 77 deletions(-) diff --git a/python/docs/source/user_guide/sql/python_udtf.rst b/python/docs/source/user_guide/sql/python_udtf.rst index fb42644dc70..0e0c6e28578 100644 --- a/python/docs/source/user_guide/sql/python_udtf.rst +++ b/python/docs/source/user_guide/sql/python_udtf.rst @@ -77,29 +77,29 @@ To implement a Python UDTF, you first need to define a class implementing the me the particular UDTF call under consideration. Each parameter is an instance of the `AnalyzeArgument` class, which contains fields including the provided argument's data type and value (in the case of literal scalar arguments only). For table arguments, the -`is_table` field is set to true and the `data_type` field is a StructType representing +`isTable` field is set to true and the `dataType` field is a StructType representing the table's column types: -data_type: DataType +dataType: DataType value: Optional[Any] -is_table: bool +isTable: bool This method returns an instance of the `AnalyzeResult` class which includes the result table's schema as a StructType. If the UDTF accepts an input table argument, then the `AnalyzeResult` can also include a requested way to partition the rows of the input -table across several UDTF calls. If `with_single_partition` is set to True, the query +table across several UDTF calls. If `withSinglePartition` is set to True, the query planner will arrange a repartitioning operation from the previous execution stage such that all rows of the input table are consumed by the `eval` method from exactly one -instance of the UDTF class. On the other hand, if the `partition_by` list is non-empty, +instance of the UDTF class. On the other hand, if the `partitionBy` list is non-empty, the query planner will arrange a repartitioning such that all rows with each unique combination of values of the partitioning columns are consumed by a separate unique -instance of the UDTF class. If `order_by` is non-empty, this specifies the requested +instance of the UDTF class. If `orderBy` is non-empty, this specifies the requested ordering of rows within each partition. 
schema: StructType -with_single_partition: bool = False -partition_by: Sequence[PartitioningColumn] = field(default_factory=tuple) -order_by: Sequence[OrderingColumn] = field(default_factory=tuple) +withSinglePartition: bool = False +partitionBy: Sequence[PartitioningColumn] = field(default_factory=tuple) +orderBy: Sequence[OrderingColumn] = field(default_factory=tuple) Examples @@ -116,7 +116,7 @@ To implement a Python UDTF, you first need to define a class implementing the me >>> def analyze(self, *args) -> AnalyzeResult: ... assert len(args) == 1, "This function accepts one argument only" -... assert args[0].data_type == StringType(), "Only string arguments are supported" +... assert args[0].dataType == StringType(), "Only string arguments are
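For reference, a minimal sketch of a UDTF written against the renamed camelCase fields. The `SquareNumbers` class is hypothetical, not from this commit, and assumes a PySpark build that includes the rename.

```python
from pyspark.sql.functions import udtf
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult
from pyspark.sql.types import IntegerType, StructType

@udtf
class SquareNumbers:
    @staticmethod
    def analyze(arg: AnalyzeArgument) -> AnalyzeResult:
        # Formerly arg.is_table and arg.data_type; now camelCase.
        assert not arg.isTable
        assert arg.dataType == IntegerType()
        return AnalyzeResult(schema=StructType().add("squared", IntegerType()))

    def eval(self, n):
        yield (n * n,)

# Usage (hypothetical): SquareNumbers(lit(2)).show() yields one row, squared=4,
# with lit imported from pyspark.sql.functions.
```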
[spark] branch master updated (7763c91950e -> 9f675c54a56)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7763c91950e [SPARK-45624][CORE][TESTS] Use `AccessibleObject#canAccess` instead of `AccessibleObject#isAccessible`
     add 9f675c54a56 [SPARK-44753][PYTHON][CONNECT] XML: pyspark sql xml reader writer

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/readwriter.py           |  84 +
 python/pyspark/sql/connect/streaming/readwriter.py |  54 ++
 python/pyspark/sql/readwriter.py                   | 200 +
 python/pyspark/sql/streaming/readwriter.py         |  99 ++
 .../sql/tests/connect/test_connect_basic.py        |  56 ++
 .../sql/tests/connect/test_parity_datasources.py   |   4 +
 python/pyspark/sql/tests/test_datasources.py       |  49 +
 7 files changed, 546 insertions(+)
 mode change 100644 => 100755 python/pyspark/sql/tests/connect/test_connect_basic.py
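For reference, a hedged sketch of the PySpark XML API this change adds. The path, `rowTag`, and `rootTag` values are hypothetical, and `spark` is an existing SparkSession.

```python
# Read: each <book> element becomes one row (rowTag selects the row element).
df = spark.read.option("rowTag", "book").xml("/tmp/books.xml")
df.printSchema()

# Write: wrap rows back into <book> elements under a <books> root.
(df.write
   .mode("overwrite")
   .option("rootTag", "books")
   .option("rowTag", "book")
   .xml("/tmp/books-out"))
```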
[spark] branch master updated (48e207f4a21 -> 7763c91950e)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 48e207f4a21 [SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES] Fix the compilation warning "Auto-application to `()` is deprecated" and turn it into a compilation error
     add 7763c91950e [SPARK-45624][CORE][TESTS] Use `AccessibleObject#canAccess` instead of `AccessibleObject#isAccessible`

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
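For reference, a minimal illustration of the swap (the `Greeter` class is hypothetical): on Java 9+, `AccessibleObject#isAccessible` is deprecated in favor of `canAccess`, which takes the receiver the access check is evaluated against.

```scala
class Greeter { private def greet(): String = "hello" }

val target = new Greeter
val m = classOf[Greeter].getDeclaredMethod("greet")
// Before (deprecated since Java 9): if (!m.isAccessible) m.setAccessible(true)
if (!m.canAccess(target)) { // pass null as the receiver for static members
  m.setAccessible(true)
}
assert(m.invoke(target) == "hello")
```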
[spark] branch master updated (2709426f0f6 -> 48e207f4a21)
This is an automated email from the ASF dual-hosted git repository. srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory add 48e207f4a21 [SPARK-45610][BUILD][CORE][SQL][SS][CONNECT][GRAPHX][DSTREAM][ML][MLLIB][K8S][YARN][SHELL][PYTHON][R][AVRO][UI][EXAMPLES] Fix the compilation warning "Auto-application to `()` is deprecated" and turn it into a compilation error No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/avro/AvroSuite.scala | 38 ++--- .../execution/benchmark/AvroReadBenchmark.scala| 10 +- .../execution/benchmark/AvroWriteBenchmark.scala | 4 +- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 12 +- .../spark/sql/DataFrameNaFunctionSuite.scala | 2 +- .../sql/UserDefinedFunctionE2ETestSuite.scala | 2 +- .../spark/sql/connect/client/ArtifactManager.scala | 2 +- .../sql/connect/client/GrpcRetryHandler.scala | 2 +- .../execution/ExecuteResponseObserver.scala| 4 +- .../execution/SparkConnectPlanExecution.scala | 2 +- .../sql/connect/planner/SparkConnectPlanner.scala | 12 +- .../sql/connect/service/ExecuteEventsManager.scala | 2 +- .../sql/connect/service/SparkConnectServer.scala | 2 +- .../connect/planner/SparkConnectServiceSuite.scala | 2 +- .../connect/service/AddArtifactsHandlerSuite.scala | 12 +- .../service/ArtifactStatusesHandlerSuite.scala | 2 +- .../service/FetchErrorDetailsHandlerSuite.scala| 2 +- .../connect/service/InterceptorRegistrySuite.scala | 12 +- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 8 +- .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 6 +- .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 4 +- .../spark/sql/jdbc/OracleIntegrationSuite.scala| 8 +- .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 4 +- .../sql/kafka010/KafkaOffsetReaderConsumer.scala | 2 +- .../sql/kafka010/consumer/KafkaDataConsumer.scala | 2 +- .../sql/kafka010/KafkaContinuousSourceSuite.scala | 16 +- .../sql/kafka010/KafkaMicroBatchSourceSuite.scala | 8 +- .../spark/sql/kafka010/KafkaRelationSuite.scala| 36 ++-- .../kafka010/consumer/KafkaDataConsumerSuite.scala | 2 +- .../org/apache/spark/kafka010/KafkaTokenUtil.scala | 4 +- .../kafka010/DirectKafkaInputDStream.scala | 12 +- .../streaming/kafka010/KafkaDataConsumer.scala | 2 +- .../apache/spark/streaming/kafka010/KafkaRDD.scala | 14 +- .../kafka010/DirectKafkaStreamSuite.scala | 14 +- .../kafka010/KafkaDataConsumerSuite.scala | 4 +- .../spark/streaming/kafka010/KafkaRDDSuite.scala | 36 ++-- .../kinesis/KPLBasedKinesisTestUtils.scala | 2 +- .../kinesis/KinesisInputDStreamBuilderSuite.scala | 2 +- .../org/apache/spark/BarrierCoordinator.scala | 2 +- .../org/apache/spark/BarrierTaskContext.scala | 19 ++- .../main/scala/org/apache/spark/Heartbeater.scala | 2 +- .../scala/org/apache/spark/SecurityManager.scala | 4 +- .../main/scala/org/apache/spark/SparkContext.scala | 12 +- .../main/scala/org/apache/spark/TestUtils.scala| 2 +- .../apache/spark/api/java/JavaSparkContext.scala | 2 +- .../org/apache/spark/api/python/PythonRunner.scala | 22 +-- .../scala/org/apache/spark/api/r/BaseRRunner.scala | 2 +- .../org/apache/spark/deploy/JsonProtocol.scala | 2 +- .../apache/spark/deploy/SparkSubmitArguments.scala | 2 +- .../spark/deploy/history/FsHistoryProvider.scala | 2 +- .../apache/spark/deploy/history/HistoryPage.scala | 2 +- .../apache/spark/deploy/worker/CommandUtils.scala | 4 +- .../org/apache/spark/deploy/worker/Worker.scala| 2 +- .../spark/executor/ProcfsMetricsGetter.scala | 4 +- 
.../apache/spark/input/PortableDataStream.scala| 2 +- .../spark/internal/io/SparkHadoopWriter.scala | 4 +- .../apache/spark/launcher/LauncherBackend.scala| 4 +- .../apache/spark/memory/UnifiedMemoryManager.scala | 2 +- .../org/apache/spark/metrics/MetricsSystem.scala | 4 +- .../org/apache/spark/rdd/AsyncRDDActions.scala | 2 +- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 2 +- .../main/scala/org/apache/spark/rdd/PipedRDD.scala | 4 +- .../apache/spark/rdd/ReliableCheckpointRDD.scala | 6 +- .../org/apache/spark/scheduler/DAGScheduler.scala | 16 +- .../spark/scheduler/StatsReportListener.scala | 2 +- .../scala/org/apache/spark/scheduler/Task.scala| 4 +- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 4 +- .../apache/spark/scheduler/TaskSetManager.scala| 4 +- .../cluster/CoarseGrainedSchedulerBackend.scala| 6 +- .../cluster/StandaloneSchedulerBackend.scala | 2 +- .../apache/spark/serializer/KryoSerializer.scala | 10 +- .../apache/spark/status/AppStatusListener.scala| 10 +- .../apache/spark/status/ElementTrackingStore.scala
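For reference, a minimal reproduction (the `Counter` class is hypothetical) of the Scala 2.13 warning this commit fixes throughout the codebase and then escalates to a compilation error.

```scala
class Counter {
  private var n = 0
  def increment(): Unit = n += 1 // declared with an empty parameter list
  def value: Int = n
}

val c = new Counter
c.increment() // correct: explicit ()
// c.increment    <- auto-application: "Auto-application to `()` is deprecated"
//                   in Scala 2.13; rejected by Spark's build after this commit
assert(c.value == 1)
```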
[spark] branch master updated: [SPARK-45541][CORE] Add SSLFactory
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2709426f0f6 [SPARK-45541][CORE] Add SSLFactory 2709426f0f6 is described below commit 2709426f0f622a214c51954664458e7dd2ab3304 Author: Hasnain Lakhani AuthorDate: Sun Oct 22 03:21:19 2023 -0500 [SPARK-45541][CORE] Add SSLFactory ### What changes were proposed in this pull request? As titled - add a factory which supports creating SSL engines, and a corresponding builder for it. This will be used in a follow up PR by the `TransportContext` and related files to add SSL support. ### Why are the changes needed? We need a mechanism to initialize the appropriate SSL implementation with the configured settings (such as protocol, ciphers, etc) for RPC SSL support. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? existing tests. This will be more thoroughly tested in a follow up PR which adds callsites to it. It has been integration tested as part of https://github.com/apache/spark/pull/42685 ### Was this patch authored or co-authored using generative AI tooling? No Closes #43386 from hasnain-db/spark-tls-factory. Authored-by: Hasnain Lakhani Signed-off-by: Mridul Muralidharan gmail.com> --- .../spark/network/protocol/SslMessageEncoder.java | 2 +- .../network/ssl/ReloadingX509TrustManager.java | 5 +- .../org/apache/spark/network/ssl/SSLFactory.java | 470 + 3 files changed, 474 insertions(+), 3 deletions(-) diff --git a/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java b/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java index f43d0789ee6..87723c6613e 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java +++ b/common/network-common/src/main/java/org/apache/spark/network/protocol/SslMessageEncoder.java @@ -36,7 +36,7 @@ import org.slf4j.LoggerFactory; @ChannelHandler.Sharable public final class SslMessageEncoder extends MessageToMessageEncoder { - private final Logger logger = LoggerFactory.getLogger(SslMessageEncoder.class); + private static final Logger logger = LoggerFactory.getLogger(SslMessageEncoder.class); private SslMessageEncoder() {} diff --git a/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java b/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java index 4c39a5d2a3d..87798bda2a0 100644 --- a/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java +++ b/common/network-common/src/main/java/org/apache/spark/network/ssl/ReloadingX509TrustManager.java @@ -46,7 +46,7 @@ import org.slf4j.LoggerFactory; public final class ReloadingX509TrustManager implements X509TrustManager, Runnable { - private final Logger logger = LoggerFactory.getLogger(ReloadingX509TrustManager.class); + private static final Logger logger = LoggerFactory.getLogger(ReloadingX509TrustManager.class); private final String type; private final File file; @@ -180,7 +180,8 @@ public final class ReloadingX509TrustManager canonicalPath = latestCanonicalFile.getPath(); lastLoaded = latestCanonicalFile.lastModified(); try (FileInputStream in = new FileInputStream(latestCanonicalFile)) { - ks.load(in, password.toCharArray()); + char[] passwordCharacters = password != null? 
password.toCharArray() : null; + ks.load(in, passwordCharacters); logger.debug("Loaded truststore '" + file + "'"); } diff --git a/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java b/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java new file mode 100644 index 000..fc03dba617f --- /dev/null +++ b/common/network-common/src/main/java/org/apache/spark/network/ssl/SSLFactory.java @@ -0,0 +1,470 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the
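The new `SSLFactory` source is truncated above, so the following is not Spark's API. It is a hedged sketch of the underlying JSSE calls such a factory wraps: build an `SSLContext`, then create `SSLEngine` instances configured with protocol and cipher settings.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class SslEngineSketch {
    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLSv1.3");
        // Default key/trust managers; a real factory loads configured stores.
        ctx.init(null, null, null);

        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false); // server side
        engine.setEnabledProtocols(new String[] {"TLSv1.3", "TLSv1.2"});
    }
}
```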
[spark] branch master updated: [SPARK-45574][SQL] Add :: syntax as a shorthand for casting
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 936e98dbc07 [SPARK-45574][SQL] Add :: syntax as a shorthand for casting 936e98dbc07 is described below commit 936e98dbc073fcfcb7e6c40720d55dac63a73d51 Author: Ivan Mitic AuthorDate: Sun Oct 22 11:23:01 2023 +0500 [SPARK-45574][SQL] Add :: syntax as a shorthand for casting ### What changes were proposed in this pull request? Adds the `::` syntax as syntactic sugar for casting columns. This is a pretty common syntax across many industry databases. ### Does this PR introduce _any_ user-facing change? Yes, new casting syntax. ### How was this patch tested? Unit tests. SQL tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43430 from mitkedb/master. Authored-by: Ivan Mitic Signed-off-by: Max Gekk --- .../spark/sql/catalyst/parser/SqlBaseLexer.g4 | 1 + .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 1 + .../spark/sql/catalyst/expressions/Cast.scala | 5 +- .../spark/sql/catalyst/parser/AstBuilder.scala | 11 + .../sql/catalyst/parser/CastingSyntaxSuite.scala | 103 ++ .../sql-tests/analyzer-results/ansi/cast.sql.out | 234 + .../sql-tests/analyzer-results/cast.sql.out| 217 .../src/test/resources/sql-tests/inputs/cast.sql | 31 ++ .../resources/sql-tests/results/ansi/cast.sql.out | 385 + .../test/resources/sql-tests/results/cast.sql.out | 245 + .../spark/sql/errors/QueryParsingErrorsSuite.scala | 4 +- .../sql/expressions/ExpressionInfoSuite.scala | 4 +- 12 files changed, 1237 insertions(+), 4 deletions(-) diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 index d9128de0f5d..e8b5cb012fc 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4 @@ -447,6 +447,7 @@ PIPE: '|'; CONCAT_PIPE: '||'; HAT: '^'; COLON: ':'; +DOUBLE_COLON: '::'; ARROW: '->'; FAT_ARROW : '=>'; HENT_START: '/*+'; diff --git a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index 77a9108e063..84a31dafed9 100644 --- a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -957,6 +957,7 @@ primaryExpression | CASE whenClause+ (ELSE elseExpression=expression)? END #searchedCase | CASE value=expression whenClause+ (ELSE elseExpression=expression)? END #simpleCase | name=(CAST | TRY_CAST) LEFT_PAREN expression AS dataType RIGHT_PAREN #cast +| primaryExpression DOUBLE_COLON dataType #castByColon | STRUCT LEFT_PAREN (argument+=namedExpression (COMMA argument+=namedExpression)*)? RIGHT_PAREN #struct | FIRST LEFT_PAREN expression (IGNORE NULLS)? RIGHT_PAREN #first | ANY_VALUE LEFT_PAREN expression (IGNORE NULLS)? 
RIGHT_PAREN #any_value diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index b975dc3c7a5..99117d81b34 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -438,11 +438,14 @@ object Cast extends QueryErrorsBase { * session local timezone by an analyzer [[ResolveTimeZone]]. */ @ExpressionDescription( - usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`.", + usage = "_FUNC_(expr AS type) - Casts the value `expr` to the target data type `type`." + + " `expr` :: `type` alternative casting syntax is also supported.", examples = """ Examples: > SELECT _FUNC_('10' as int); 10 + > SELECT '10' :: int; + 10 """, since = "1.0.0", group = "conversion_funcs") diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index 8ce58ef7688..7e0aafca31c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++
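For reference, a brief usage sketch of the new shorthand (the `items` table and its `price` column are hypothetical):

```sql
-- `expr :: type` is shorthand for CAST(expr AS type).
SELECT '10' :: int;                      -- 10
SELECT CAST('10' AS INT);                -- equivalent long form
SELECT avg(price :: double) FROM items;  -- casting a column before aggregation
```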