RaghunandanKumar opened a new pull request, #55904:
URL: https://github.com/apache/spark/pull/55904

   ### What changes were proposed in this pull request?
   
   This change teaches Spark Connect `SparkSession.copyFromLocalToFs` to accept 
a local directory path in addition to a single file path.
   
   Changes in this PR:
   - update the Spark Connect artifact manager to expand a local directory into 
per-file `forward_to_fs` artifacts while preserving the nested relative layout
   - keep the existing file upload path unchanged
   - update the PySpark ML Connect helper to use the new recursive 
directory-copy behavior in remote mode
   - remove the old one-level directory limitation from the local ML helper as 
well
   - add a regression test that copies a directory containing a nested file 
tree and verifies both files arrive at the destination
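
The directory expansion described in the first bullet can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the function name `expand_local_dir` and the `(local_file, dest_file)` pair shape are assumptions, and the actual artifact manager wraps each file as a `forward_to_fs` artifact rather than returning pairs.

```python
import os
import posixpath


def expand_local_dir(local_path, dest_path):
    """Return (local_file, dest_file) pairs for a file or a directory tree.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if os.path.isfile(local_path):
        # Single-file case: one pair, destination used as given.
        return [(local_path, dest_path)]
    if not os.path.isdir(local_path):
        raise FileNotFoundError(local_path)

    pairs = []
    for root, dirs, files in os.walk(local_path):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            local_file = os.path.join(root, name)
            rel = os.path.relpath(local_file, local_path)
            # Destination paths use forward slashes regardless of local OS.
            rel_posix = rel.replace(os.sep, "/")
            pairs.append((local_file, posixpath.join(dest_path, rel_posix)))
    return pairs
```

Each returned pair keeps the file's path relative to the source directory, which is what preserves the nested layout under the destination.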
   
   ### Why are the changes needed?
   
   Today the Connect artifact path only accepts a single file for 
`copyFromLocalToFs`, even though PySpark ML Connect has a directory-copy helper 
and model save flows naturally need to stage directory trees.
   
   This leaves two rough edges:
   - the Connect path cannot directly copy a directory tree
   - the ML helper has its own one-level directory workaround instead of 
reusing a stronger Connect primitive
   
   Supporting recursive directory uploads in the Connect artifact path makes 
the API more generally useful and removes the need for the shallow workaround 
in ML Connect.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   
   Before this change, `SparkSession.copyFromLocalToFs(local_path, dest_path)` 
in Spark Connect accepted only a single local file path and did not copy a 
directory tree.
   
   After this change, the same API accepts a local directory and copies all 
files under it recursively while preserving relative paths under the 
destination.
   
   ### How was this patch tested?
   
   Added a focused regression test in 
`pyspark.sql.tests.connect.client.test_artifact` covering nested directory copy.
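
The shape of such a check can be sketched without a Connect server. This is an illustration under stated assumptions, not the test added in this PR: `relative_files` and `check_nested_copy` are hypothetical names, and `shutil.copytree` stands in for the `copyFromLocalToFs` call that a real test would issue against a Connect session.

```python
import os
import shutil
import tempfile


def relative_files(root):
    """Return the sorted relative paths (forward slashes) of files under root."""
    found = []
    for current, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(current, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def check_nested_copy():
    """Build a small nested tree, copy it, and return both file listings."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dest = os.path.join(tmp, "dest")
        os.makedirs(os.path.join(src, "nested"))
        with open(os.path.join(src, "top.txt"), "w") as f:
            f.write("top")
        with open(os.path.join(src, "nested", "inner.txt"), "w") as f:
            f.write("inner")
        # Stand-in (assumption) for spark.copyFromLocalToFs(src, dest).
        shutil.copytree(src, dest)
        return relative_files(src), relative_files(dest)
```

Comparing the two listings checks both that every file arrived and that the relative layout was preserved.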
   
   Attempted local verification with:
   - `python/run-tests --testnames pyspark.sql.tests.connect.client.test_artifact`
   - `build/sbt -Phive package`
   
   Neither command completed in this environment: Spark is not built and the 
machine does not have a usable Java runtime configured for `build/sbt`, so the 
new test has not yet been run locally.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: OpenAI Codex GPT-5
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

