RaghunandanKumar opened a new pull request, #55904: URL: https://github.com/apache/spark/pull/55904
### What changes were proposed in this pull request?

This change teaches Spark Connect `SparkSession.copyFromLocalToFs` to accept a local directory path in addition to a single file path.

Changes in this PR:

- update the Spark Connect artifact manager to expand a local directory into per-file `forward_to_fs` artifacts while preserving the nested relative layout
- keep the existing file upload path unchanged
- update the PySpark ML Connect helper to use the new recursive directory-copy behavior in remote mode
- remove the old one-level directory limitation from the local ML helper as well
- add a regression test that copies a directory containing a nested file tree and verifies both files arrive at the destination

### Why are the changes needed?

Today the Connect artifact path only accepts a single file for `copyFromLocalToFs`, even though PySpark ML Connect has a directory-copy helper and model save flows naturally need to stage directory trees. This leaves two rough edges:

- the Connect path cannot directly copy a directory tree
- the ML helper had its own one-level directory workaround instead of reusing a stronger Connect primitive

Supporting recursive directory uploads in the Connect artifact path makes the API more generally useful and removes the need for the shallow workaround in ML Connect.

### Does this PR introduce _any_ user-facing change?

Yes. Before this change, `SparkSession.copyFromLocalToFs(local_dir, dest_path)` in Spark Connect only supported a single local file path and would not handle a directory tree. After this change, the same API accepts a local directory and copies all files under it recursively while preserving relative paths under the destination.

### How was this patch tested?

Added a focused regression test in `pyspark.sql.tests.connect.client.test_artifact` covering nested directory copy.

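For illustration only, the nested-directory behavior that the regression test targets can be sketched in plain Python. This is not the code in this PR: `expand_local_dir` is a hypothetical helper name, and the real artifact manager emits `forward_to_fs` artifacts rather than returning path pairs. The sketch only shows how a local tree might be expanded into per-file uploads while keeping each file's relative path under the destination.

```python
import os
import tempfile
from typing import List, Tuple


def expand_local_dir(local_path: str, dest_path: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair every local file with its destination path.

    A single file maps straight to dest_path (the existing behavior).
    A directory is walked recursively and each file keeps its path
    relative to the directory root under dest_path.
    """
    if os.path.isfile(local_path):
        return [(local_path, dest_path)]

    pairs = []
    for root, _dirs, files in os.walk(local_path):
        for name in files:
            src = os.path.join(root, name)
            # Relative path from the directory root, normalized to "/" separators.
            rel = os.path.relpath(src, local_path).replace(os.sep, "/")
            pairs.append((src, dest_path.rstrip("/") + "/" + rel))
    # Sort so the result does not depend on filesystem listing order.
    return sorted(pairs)


if __name__ == "__main__":
    # Build a small nested tree: a.txt and sub/b.txt.
    with tempfile.TemporaryDirectory() as local_dir:
        os.makedirs(os.path.join(local_dir, "sub"))
        for rel_name in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(local_dir, rel_name), "w") as f:
                f.write("payload")
        for _src, dest in expand_local_dir(local_dir, "/tmp/dest"):
            print(dest)
```

Run as a script, this prints one destination path per file, with `sub/b.txt` kept under `/tmp/dest/sub/`. A test along these lines would then check that both files exist at the destination after the copy.
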
Attempted local verification with:

- `python/run-tests --testnames pyspark.sql.tests.connect.client.test_artifact`
- `build/sbt -Phive package`

In this environment, local verification is currently blocked because Spark is not built and the machine does not have a usable Java runtime configured for `build/sbt`.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex GPT-5
