timsaucer opened a new pull request, #1547:
URL: https://github.com/apache/datafusion-python/pull/1547

   # Which issue does this PR close?
   
   No associated issue. **PR 4 of 4** stacked on 
[#1546](https://github.com/apache/datafusion-python/pull/1546). The diff 
against \`main\` is cumulative until the prior PRs merge — review the commits 
on \`pr4-docs-examples\` directly for the PR4 delta.
   
   # Rationale for this change
   
   PRs 1-3 close the round-trip for Python UDFs and add the toggle that 
controls it. None of that is discoverable without user-facing documentation. 
This PR ships the user guide page that explains the multiprocessing / Ray / 
datafusion-distributed patterns, the runnable examples, and the centralized 
Security section that nails down what the toggle does and does not protect 
against.
   
   # What changes are included in this PR?
   
   **User guide.** \`docs/source/user-guide/io/distributing_work.rst\` is the 
new canonical page. It walks through:
   
   - The shipped-expression model (what travels inline vs by name).
   - Worker setup (\`datafusion.ipc.set_worker_ctx\`).
   - Sender-side configuration (\`datafusion.ipc.set_sender_ctx\` and 
\`SessionContext.with_python_udf_inlining\`).
   - A Security section that is the single source of truth for the cloudpickle 
/ \`pickle.loads\` threat model.
   - Pointers to the runnable examples and \`datafusion-distributed\`.
   
   \`docs/source/user-guide/io/index.rst\` gets the toctree entry.
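
As background for the "inline vs by name" distinction, here is a minimal sketch using only the standard library. It does not use DataFusion or show its wire format; it only illustrates the general pickle behavior that makes by-value serializers such as cloudpickle necessary for closures. `make_scaler` is a hypothetical helper introduced for the illustration.

```python
import math
import pickle


def make_scaler(factor):
    """Return a closure; the inner function has no importable name."""
    def scale(x):
        return factor * x
    return scale


# By name: the payload records where to find the function ("math", "sqrt"),
# not the function body. The receiver must be able to import the same name.
by_name = pickle.dumps(math.sqrt)
print(b"math" in by_name, b"sqrt" in by_name)

# Stdlib pickle has no by-value path for functions, so a closure is refused.
# The exception type differs between Python versions, hence the tuple.
try:
    pickle.dumps(make_scaler(3))
    closure_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    closure_pickled = False
    print(f"stdlib pickle refused the closure: {type(exc).__name__}")
```

A by-value serializer ships the function's code and captured variables in the payload itself, which is what lets a closure cross a process boundary and also why the receiving side has to trust the bytes it loads.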
   
   **Runnable examples.**
   
   - \`examples/multiprocessing_pickle_expr.py\` — \`Pool.map\` of a 
closure-capturing UDF across processes, with the worker initializer wiring the 
worker context. The closure carries non-trivial state to demonstrate that 
captured state survives the round-trip.
   - \`examples/ray_pickle_expr.py\` — Ray actor analogue.
   - 
\`examples/datafusion-ffi-example/python/tests/_test_pickle_strict_ffi.py\` — 
strict-mode refusal exercised end-to-end against an FFI capsule scalar UDF. 
Kept under the FFI example crate because it needs that crate's compiled 
artifacts. The leading \`_\` keeps pytest from auto-collecting it as a unit 
test; the FFI example's own test harness runs it explicitly.
   - \`examples/README.md\` picks up index entries for the new files.
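
For readers who have not used a pool initializer before, the pattern behind the multiprocessing example can be sketched with the standard library alone. This is not the code in `examples/multiprocessing_pickle_expr.py`: a plain dict stands in for the DataFusion worker context, and `init_worker`, `resolve`, and `run_with_pool` are hypothetical names chosen for the illustration.

```python
import multiprocessing as mp

# Per-process state, filled in once by the initializer. In the PR's example
# the equivalent step wires the worker context; here it is a plain dict.
_WORKER_STATE = {}


def init_worker(lookup):
    """Run once in each worker process before it handles any task."""
    _WORKER_STATE["lookup"] = lookup


def resolve(name):
    """Task body: reads state installed by init_worker, not sent per task."""
    return _WORKER_STATE["lookup"][name]


def run_with_pool(names, lookup, processes=2):
    """Map resolve over names in a pool whose workers were given lookup."""
    with mp.Pool(processes, initializer=init_worker, initargs=(lookup,)) as pool:
        return pool.map(resolve, names)
```

Call `run_with_pool(["a", "b"], {"a": 1, "b": 2})` from under an `if __name__ == "__main__":` guard so that start methods which re-import the main module do not launch a second pool. The initializer runs once per worker, so context setup is not repeated for every task.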
   
   **Docstring centralization.** Three docstrings previously carried 
near-duplicate copies of the pickle / cloudpickle security warning. Each is 
now reduced to a one-line summary plus a pointer to the Security section, so 
the threat model has a single canonical home:
   
   - \`PythonLogicalCodec::with_python_udf_inlining\` rustdoc.
   - \`SessionContext.with_python_udf_inlining\` docstring.
   - \`datafusion.ipc\` module docstring.
   
   The crate-level \`codec.rs\` module rustdoc also changes "pure-Python scalar 
UDFs" to "scalar / aggregate / window UDFs", since PR 2 added inline support 
for aggregate and window UDFs.
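
For context on why the warning exists at all, the following standard-library sketch shows the general `pickle.loads` risk and the "restricting globals" pattern from the Python pickle documentation. It is not a description of how the strict mode in this PR stack is implemented; `Payload` and `RefuseAllGlobals` are hypothetical names used only here.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-controlled bytes."""

    def __reduce__(self):
        # The sender chooses the callable and its arguments. This one is
        # harmless, but any importable callable would be invoked the same way.
        return (eval, ("6 * 7",))


malicious = pickle.dumps(Payload())

# Unpickling calls eval("6 * 7") on the receiving side; no Payload comes back.
result = pickle.loads(malicious)
print(result)


class RefuseAllGlobals(pickle.Unpickler):
    """Unpickler that rejects every global lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


try:
    RefuseAllGlobals(io.BytesIO(malicious)).load()
    refused = False
except pickle.UnpicklingError as exc:
    refused = True
    print(f"refused: {exc}")
```

The first half is the reason unpickling bytes from an untrusted sender amounts to running that sender's code. The second half shows one generic way to refuse such payloads, at the cost of also refusing legitimate ones that reference globals.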
   
   # Are there any user-facing changes?
   
   Docs and examples only. No code behavior changes, no new public APIs.
   
   The \`api change\` label is not added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

