timsaucer opened a new pull request, #1547: URL: https://github.com/apache/datafusion-python/pull/1547
# Which issue does this PR close?

No associated issue. **PR 4 of 4**, stacked on [#1546](https://github.com/apache/datafusion-python/pull/1546). The diff against `main` is cumulative until the prior PRs merge. To see only the PR 4 delta, review the commits on `pr4-docs-examples` directly.

# Rationale for this change

PRs 1-3 close the round-trip for Python UDFs and add the toggle that controls it. None of that is discoverable without user-facing documentation. This PR ships three things: the user guide page that explains the multiprocessing / Ray / datafusion-distributed patterns, the runnable examples, and a centralized Security section that nails down what the toggle does and does not protect against.

# What changes are included in this PR?

**User guide.** `docs/source/user-guide/io/distributing_work.rst` is the new canonical page. It walks through:

- The shipped-expression model (what travels inline vs. by name).
- Worker setup (`datafusion.ipc.set_worker_ctx`).
- Sender-side configuration (`datafusion.ipc.set_sender_ctx` and `SessionContext.with_python_udf_inlining`).
- A Security section that is the single source of truth for the cloudpickle / `pickle.loads` threat model.
- Pointers to the runnable examples and `datafusion-distributed`.

`docs/source/user-guide/io/index.rst` gets the toctree entry.

**Runnable examples.**

- `examples/multiprocessing_pickle_expr.py`: `Pool.map` of a closure-capturing UDF across processes, with the worker initializer wiring up the worker context. The closure carries non-trivial state to demonstrate that captured state survives the round-trip.
- `examples/ray_pickle_expr.py`: the Ray actor analogue.
- `examples/datafusion-ffi-example/python/tests/_test_pickle_strict_ffi.py`: strict-mode refusal exercised end-to-end against an FFI capsule scalar UDF. It lives under the FFI example crate because it needs that crate's compiled artifacts. The leading `_` keeps pytest from auto-collecting it as a unit test; the FFI example's own test harness runs it explicitly.
- `examples/README.md` picks up index entries for the new files.

**Docstring centralization.** Three docstrings previously carried near-duplicate copies of the pickle / cloudpickle security warning. Each is now reduced to a one-line summary plus a pointer to the Security section, so the threat model has a single canonical home:

- `PythonLogicalCodec::with_python_udf_inlining` rustdoc.
- `SessionContext.with_python_udf_inlining` docstring.
- `datafusion.ipc` module docstring.

The crate-level `codec.rs` module rustdoc also changes "pure-Python scalar UDFs" to "scalar / aggregate / window UDFs", now that PR 2 has shipped inlining for aggregate and window UDFs.

# Are there any user-facing changes?

Docs and examples only. No code behavior changes and no new public APIs, so the `api change` label is not added.
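A stdlib-only toy model of the inline-vs-by-name distinction the user guide describes. It is an illustration under stated assumptions, not datafusion-python's wire format or API: `ship`, `receive`, `set_worker_registry`, and `_WORKER_REGISTRY` are hypothetical names, stdlib `pickle` stands in for cloudpickle, and `functools.partial(operator.mul, 2.5)` stands in for a closure, because stdlib `pickle` cannot serialize closures while cloudpickle can. With inlining on, the callable and its captured state travel in the payload; with it off, only the name travels and the worker must already hold a UDF under that name. This sketch assumes that is roughly the role a worker-setup step such as `datafusion.ipc.set_worker_ctx` plays.

```python
import functools
import operator
import pickle
from typing import Callable, Dict, Optional, Tuple

# Toy model only: every name below is hypothetical, not a datafusion-python API.
# Worker-side registry of UDFs that can be resolved by name.
_WORKER_REGISTRY: Dict[str, Callable] = {}


def set_worker_registry(udfs: Dict[str, Callable]) -> None:
    """Hypothetical per-worker setup step (e.g. run from a Pool initializer)."""
    _WORKER_REGISTRY.clear()
    _WORKER_REGISTRY.update(udfs)


def ship(name: str, func: Callable, inline: bool) -> bytes:
    """Sender side: embed the pickled callable, or send only its name."""
    payload: Optional[bytes] = pickle.dumps(func) if inline else None
    envelope: Tuple[str, Optional[bytes]] = (name, payload)
    return pickle.dumps(envelope)


def receive(wire: bytes) -> Callable:
    """Worker side: unpickle an inlined callable, or look the name up."""
    name, payload = pickle.loads(wire)
    if payload is not None:
        return pickle.loads(payload)
    # Raises KeyError when the worker never registered this name.
    return _WORKER_REGISTRY[name]


# The captured state (the factor 2.5) travels inside the inlined payload.
udf = functools.partial(operator.mul, 2.5)

inlined = receive(ship("scale", udf, inline=True))
print(inlined(4.0))  # 10.0

# By name, only "scale" crosses the wire; the worker's own registration wins.
set_worker_registry({"scale": functools.partial(operator.mul, 3.0)})
by_name = receive(ship("scale", udf, inline=False))
print(by_name(4.0))  # 12.0
```

In this model, a name the worker never registered fails at decode time with `KeyError`, while an inlined UDF needs no worker-side registration at all.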

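The Security section centers on the cloudpickle / `pickle.loads` threat model. The underlying property fits in a few lines of stdlib Python: cloudpickle output is decoded with the standard `pickle.loads`, and unpickling calls whatever callable the byte stream names via `__reduce__`. In the sketch below, `Payload` is a hypothetical class and the callable is a harmless `eval("6 * 7")`; nothing here describes what datafusion-python's toggle actually checks.

```python
import pickle


class Payload:
    """Hypothetical 'UDF' whose pickled form names an arbitrary callable."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("6 * 7"). A hostile payload could
        # name any importable callable here instead (os.system, for example).
        return (eval, ("6 * 7",))


wire = pickle.dumps(Payload())

# The receiver expected an object back but got the result of running code.
result = pickle.loads(wire)
print(type(result).__name__, result)  # int 42
```

This is why the Python `pickle` documentation warns to only unpickle data you trust.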