Re: [PR] [CONNECT][SPARK-53054] Fix the connect.DataFrameReader default format behavior [spark]

via GitHub Thu, 31 Jul 2025 18:46:50 -0700


violetnspct commented on PR #51761:
URL: https://github.com/apache/spark/pull/51761#issuecomment-3141883787


   @dillitz Should you be adding tests to cover the following scenarios?
   
   1. When both default format is configured and explicit format is specified. 
   
   2. Edge case:
       - Null source with no default format configured. This wasn't possible in 
old method (would throw exception). Important because: Could lead to runtime 
errors if default format isn't properly configured
       - Empty string source. This also wasn't possible in old method (would 
throw exception). Important because: Could be mistakenly passed as a format 
instead of null
   
   
   3. Maybe, unit test method, `test("read path collision")`, should be 
updated? The changes to the load() method affect how source format is handled, 
removing the explicit format check (assertSourceFormatSpecified). While the 
path collision scenario is still being tested, the changes could affect how 
format and path options interact when loading data. The original test focuses 
only on path collision but doesn't verify format behavior. While the PR adds a 
new test "ES-1538905: DataFrameReader defaults to spark.sql.sources.default" 
that covers the default format case, the path collision test should be updated 
to verify that path collision errors still occur correctly regardless of how 
the format is specified.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [CONNECT][SPARK-53054] Fix the connect.DataFrameReader default format behavior [spark]

Reply via email to