[PR] [CONNECT][SPARK-53054] Fix the connect.DataFrameReader default format behavior [spark]

via GitHub Thu, 31 Jul 2025 16:41:40 -0700


dillitz opened a new pull request, #51759:
URL: https://github.com/apache/spark/pull/51759

### What changes were proposed in this pull request?
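Make Scala Spark Connect's `DataFrameReader` fall back to the format set via the `spark.sql.sources.default` config when no format is specified explicitly.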

### Why are the changes needed?
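Scala Spark Connect's `DataFrameReader` does not adhere to the
[documented](https://spark.apache.org/docs/3.5.6/sql-data-sources-load-save-functions.html)
behavior: calling `load` without an explicit format fails instead of using the default data source.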

### Does this PR introduce _any_ user-facing change?
As documented in [Generic Load/Save Functions - Spark 3.5.6
Documentation](https://spark.apache.org/docs/3.5.6/sql-data-sources-load-save-functions.html),
and similar to Spark Classic and the Python Spark Connect, Scala Spark
Connect's `DataFrameReader` should default to the format set via the
`spark.sql.sources.default` config.

**Currently**: `spark.read.load("...")` throws

```
java.lang.IllegalArgumentException: The source format must be specified.
```
**Expected**: `spark.read.load("...")` uses the format specified via
`spark.sql.sources.default`
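
The expected fallback can be illustrated with a small standalone sketch. This is a hypothetical model of the resolution rule only, not the code in this patch: `DefaultFormatSketch`, `resolveFormat`, and the plain `Map` standing in for the session config are all assumptions made for illustration. The `parquet` fallback reflects the documented built-in default for `spark.sql.sources.default`.

```scala
// Hypothetical model of the format-resolution rule; not Spark's actual code.
object DefaultFormatSketch {
  // Config key named in this PR; "parquet" is the documented built-in default.
  val DefaultSourceKey = "spark.sql.sources.default"
  val BuiltInDefault = "parquet"

  // `explicitFormat` models whether DataFrameReader.format(...) was called.
  // `conf` is a stand-in for the session configuration.
  def resolveFormat(explicitFormat: Option[String], conf: Map[String, String]): String =
    explicitFormat.getOrElse(conf.getOrElse(DefaultSourceKey, BuiltInDefault))

  def main(args: Array[String]): Unit = {
    val conf = Map(DefaultSourceKey -> "json")
    println(resolveFormat(None, conf))        // no explicit format: use the configured default
    println(resolveFormat(Some("csv"), conf)) // explicit format takes precedence
    println(resolveFormat(None, Map.empty))   // nothing configured: built-in default
  }
}
```

In this model, the pre-patch behavior corresponds to throwing when `explicitFormat` is `None` rather than consulting the config.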

### How was this patch tested?
Test case added to ClientE2ETestSuite.

### Was this patch authored or co-authored using generative AI tooling?
No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
