201573 opened a new pull request, #55660:
URL: https://github.com/apache/spark/pull/55660

   ### What changes were proposed in this pull request?
   
   This PR allows `os.PathLike` objects, such as `pathlib.Path`, to be passed to the PySpark readwriter path APIs.
   
   The change normalizes path-like objects to `str` with `os.fsdecode` before the paths are sent to the JVM or embedded in Spark Connect plans.
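
To illustrate the normalization step, here is a minimal sketch, not the code in this PR. It assumes a single hypothetical helper, `normalize_paths`, that readwriter methods would call on their `path` argument before handing it to the JVM or a Spark Connect plan; the helper name and its list/tuple handling are assumptions.

```python
import os
from pathlib import PurePosixPath
from typing import List, Sequence, Union

# Hypothetical input type: a plain string or any object implementing __fspath__.
PathInput = Union[str, "os.PathLike[str]"]


def normalize_paths(
    path: Union[PathInput, Sequence[PathInput]],
) -> Union[str, List[str]]:
    """Hypothetical helper: turn a path or a list of paths into plain str values.

    os.fsdecode returns str input unchanged and calls __fspath__ on
    os.PathLike objects such as pathlib.Path, so downstream code only
    ever sees str.
    """
    # A bare str is also a Sequence, so check for list/tuple explicitly.
    if isinstance(path, (list, tuple)):
        return [os.fsdecode(p) for p in path]
    return os.fsdecode(path)


# Example: a mixed list of pathlib objects and plain strings.
example = normalize_paths([PurePosixPath("/tmp/data/a"), "/tmp/data/b"])
print(example)  # ['/tmp/data/a', '/tmp/data/b']
```

Because `os.fsdecode` passes `str` through untouched, URI-style paths such as `s3a://bucket/data` are unaffected; only objects implementing `__fspath__` are converted.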
   
   ### Why are the changes needed?
   
   Currently, several PySpark readwriter methods accept only `str` or `list[str]` paths. Python users commonly work with `pathlib.Path`, and these objects should work with file-system-backed data sources.
   
   Closes #55203.
   
   ### Does this PR introduce any user-facing change?
   
   Yes. Users can pass `pathlib.Path` and other `os.PathLike` objects to the supported readwriter APIs.
   
   ### How was this patch tested?
   
   - `./dev/lint-python --compile`
   - `git diff --check`
   - Added PySpark readwriter tests for `pathlib.Path`
   - Added Spark Connect plan coverage for lists of path-like objects
   
   Full PySpark runtime tests were not run locally because this machine does not have a Java runtime installed.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes. I used OpenAI Codex to help implement and test this change. I have reviewed the changes and take responsibility for them. This contribution is my original work and I license the work to the project under the project's open source license.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
