This is an automated email from the ASF dual-hosted git repository.
dongjoon-hyun pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-connect-swift.git
The following commit(s) were added to refs/heads/main by this push:
new 773c238 [SPARK-57062] Support `json(DataFrame)` in `DataFrameReader`
773c238 is described below
commit 773c23861d0b4c56240106272a545a157c1519cd
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Mon May 25 18:55:18 2026 -0700
[SPARK-57062] Support `json(DataFrame)` in `DataFrameReader`
### What changes were proposed in this pull request?
This PR adds a `json(DataFrame)` overload to `DataFrameReader`.
### Why are the changes needed?
The overload provides feature parity with PySpark/Scala `spark.read.json(jsonDataset)`
and exercises `Spark_Connect_Parse.ParseFormat.json` (Apache Spark 4.2.0+).
- https://github.com/apache/spark/pull/55097
### Does this PR introduce _any_ user-facing change?
No existing behavior changes. This PR only adds a new public overload, `json(DataFrame)`.
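A minimal usage sketch of the new overload, mirroring the test case added in this commit. It assumes a reachable Spark Connect server running Apache Spark 4.2.0 or later and the `SparkConnect` package as a dependency. The sample rows are illustrative only.

```swift
import SparkConnect

// Assumption: a Spark Connect server (Apache Spark 4.2.0+) is reachable
// with the default builder settings.
let spark = try await SparkSession.builder.getOrCreate()

// Build a DataFrame with a single string column, one JSON document per row.
let jsonDF = try await spark.sql(
  "SELECT * FROM VALUES "
    + "('{\"name\":\"Alice\",\"age\":25}'), "
    + "('{\"name\":\"Bob\",\"age\":30}') AS T(value)"
)

// Parse the JSON strings with the new overload and count the resulting rows.
let rowCount = try await spark.read.json(jsonDF).count()
print(rowCount)

await spark.stop()
```

Options set on the reader before calling `json(_:)` are forwarded to the parse relation, since the implementation copies `extraOptions` into `parse.options`.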
### How was this patch tested?
Pass the CIs with the newly added test case.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
Closes #390 from dongjoon-hyun/SPARK-57062.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
Sources/SparkConnect/DataFrameReader.swift | 21 +++++++++++++++++++++
Tests/SparkConnectTests/DataFrameReaderTests.swift | 14 ++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/Sources/SparkConnect/DataFrameReader.swift b/Sources/SparkConnect/DataFrameReader.swift
index 5182ec7..5556fdf 100644
--- a/Sources/SparkConnect/DataFrameReader.swift
+++ b/Sources/SparkConnect/DataFrameReader.swift
@@ -210,6 +210,27 @@ public actor DataFrameReader: Sendable {
return load(paths)
}
+ /// Loads a JSON dataset and returns the result as a ``DataFrame``.
+ /// The input ``DataFrame`` must have a single string column whose values are JSON documents.
+ /// - Parameter jsonDataset: A ``DataFrame`` with a single string column.
+ /// - Returns: A ``DataFrame``.
+ public func json(_ jsonDataset: DataFrame) async -> DataFrame {
+ var parse = Parse()
+ parse.format = .json
+ parse.options = self.extraOptions.toStringDictionary()
+ if case .root(let input) = await jsonDataset.plan.opType {
+ parse.input = input
+ }
+
+ var relation = Relation()
+ relation.parse = parse
+
+ var plan = Plan()
+ plan.opType = .root(relation)
+
+ return DataFrame(spark: sparkSession, plan: plan)
+ }
+
/// Loads an XML file and returns the result as a ``DataFrame``.
/// - Parameter path: A path string
/// - Returns: A ``DataFrame``.
diff --git a/Tests/SparkConnectTests/DataFrameReaderTests.swift b/Tests/SparkConnectTests/DataFrameReaderTests.swift
index f755879..25162d4 100644
--- a/Tests/SparkConnectTests/DataFrameReaderTests.swift
+++ b/Tests/SparkConnectTests/DataFrameReaderTests.swift
@@ -49,6 +49,20 @@ struct DataFrameReaderTests {
await spark.stop()
}
+ @Test
+ func jsonDataset() async throws {
+ let spark = try await SparkSession.builder.getOrCreate()
+ if await spark.version >= "4.2.0" {
+ let jsonDF = try await spark.sql(
+ "SELECT * FROM VALUES "
+ + "('{\"name\":\"Alice\",\"age\":25}'), "
+ + "('{\"name\":\"Bob\",\"age\":30}') AS T(value)"
+ )
+ #expect(try await spark.read.json(jsonDF).count() == 2)
+ }
+ await spark.stop()
+ }
+
@Test
func xml() async throws {
let spark = try await SparkSession.builder.getOrCreate()