[PR] DF54 follow-ups wave 1: SessionContext APIs, codec typing, test fixes [datafusion-python]

via GitHub Thu, 21 May 2026 12:10:50 -0700


timsaucer opened a new pull request, #1554:
URL: https://github.com/apache/datafusion-python/pull/1554


   # Which issue does this PR close?
   
   No single issue — this is wave 1 of follow-up work after the DataFusion 54 
upgrade (#1532). Each commit is self-contained and can be reviewed 
independently.
   
   # Rationale for this change
   
   DataFusion 54 introduced or deprecated several pieces of upstream API 
surface that the Python bindings had not yet caught up with. This PR closes the 
highest-value gaps (UDF lookup, `read_batches`, variadic `get_field`), tightens 
typing on the codec setters added in #1541, migrates the FFI example off the 
deprecated `TableFunctionImpl::call`, and cleans up two long-standing pytest 
annoyances (an `xfail` that no longer needs to fail, and a deprecation warning 
that was leaking through `pytest.raises`).
   
   # What changes are included in this PR?
   
   - `refactor: migrate FFI example table function to call_with_args` — 
`PyTableFunction` already moved to `call_with_args` in 5a64b0d; this brings the 
FFI example along so it no longer relies on the deprecated entry point.
   - `feat: type SessionContext codec setters with exportable Protocols` — adds 
`LogicalExtensionCodecExportable` / `PhysicalExtensionCodecExportable` 
Protocols and tightens `with_logical_extension_codec` / 
`with_physical_extension_codec` signatures from `codec: Any` to `Protocol | 
_PyCapsule`. Pure typing change; runtime behavior is unchanged.
   - `feat: accept variadic field path in get_field` — collapses 
`get_field(expr, name)` and `get_field_path(expr, [names...])` into a single 
variadic `get_field(expr, *names)` that dispatches through one Rust binding.
   - `feat: SessionContext.read_batches / read_batch` — wraps upstream 
`SessionContext::read_batches` to materialize a DataFrame directly from a 
sequence of `RecordBatch`es without registering a named table. The single-batch 
`read_batch` is implemented in pure Python on top of `read_batches([batch])`.
   - `feat: SessionContext UDF lookup helpers` — exposes `udf(name)` / 
`udaf(name)` / `udwf(name)` lookups symmetric with the existing register 
helpers, plus `udfs()` / `udafs()` / `udwfs()` enumerators that return the 
registered names as a sorted list (a sorted `Vec<String>` on the Rust side) 
instead of the raw upstream `HashSet`, so the order is deterministic.
   - `chore: bump pre-commit so it stops failing CI checks`.
   - `test: drop xfail on timestamp[s] parquet roundtrip` — pyarrow.parquet 
promotes `timestamp[s]` to `timestamp[ms]` on write 
([apache/arrow#41382](https://github.com/apache/arrow/issues/41382)); cast the 
expected array so the test asserts DataFusion reads what Arrow actually stored, 
instead of relying on `xfail`.
   - `test: capture deprecation warning in repr_rows conflict case` — 
`DataFrameHtmlFormatter(repr_rows=..., max_rows=...)` fires the deprecation 
warning before raising `ValueError`, but `pytest.raises` does not catch 
warnings. Wrap the call in both `pytest.raises` and `pytest.warns` so the 
warning is asserted, not leaked into every pytest run.
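
The warn-then-raise ordering in the last item can be illustrated with a minimal, standard-library-only sketch. It assumes a hypothetical `StandInFormatter` (not the real `DataFrameHtmlFormatter`) and uses `unittest`'s `assertWarns` / `assertRaises` as stand-ins for `pytest.warns` / `pytest.raises`; the nesting idea is the same, but the actual test in this PR may be written differently.

```python
import unittest
import warnings


class StandInFormatter:
    """Hypothetical stand-in; not the real DataFrameHtmlFormatter."""

    def __init__(self, max_rows=None, repr_rows=None):
        if repr_rows is not None:
            # The deprecation warning fires first ...
            warnings.warn(
                "repr_rows is deprecated; use max_rows",
                DeprecationWarning,
                stacklevel=2,
            )
            if max_rows is not None:
                # ... and only then is the conflicting pair rejected.
                raise ValueError("specify only one of repr_rows and max_rows")
            max_rows = repr_rows
        self.max_rows = max_rows


def check_conflict():
    """Assert both the warning and the error, so neither escapes the test."""
    case = unittest.TestCase()
    # Outer context captures the warning; inner context captures the error.
    with case.assertWarns(DeprecationWarning) as warned:
        with case.assertRaises(ValueError) as raised:
            StandInFormatter(repr_rows=5, max_rows=10)
    return str(warned.warning), str(raised.exception)
```

Asserting only the exception would leave the warning uncaptured, which is why it showed up in every test run.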
   
   # Are there any user-facing changes?
   
   Yes — several new public APIs:
   
   - `SessionContext.read_batches(batches)` / 
`SessionContext.read_batch(batch)` — materialize a DataFrame directly from 
`RecordBatch`es.
   - `SessionContext.udf(name)` / `udaf(name)` / `udwf(name)` lookup helpers, 
and `udfs()` / `udafs()` / `udwfs()` enumerators.
   - `get_field(expr, *names)` now accepts a variadic field path (single-name 
calls are unchanged).
   - `with_logical_extension_codec` / `with_physical_extension_codec` setters 
are now typed as `Protocol | _PyCapsule` instead of `Any`; runtime behavior is 
unchanged.
   
   No breaking changes to existing public APIs.
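
To show why the variadic `get_field(expr, *names)` form stays compatible with single-name calls, here is a pure-Python sketch over nested mappings. It is a stand-in for illustration only: the real function operates on DataFusion expressions and dispatches to a Rust binding, and rejecting an empty path is an assumption of this sketch.

```python
from typing import Any, Mapping


def get_field(value: Mapping[str, Any], *names: str) -> Any:
    """Walk a nested mapping along a variadic field path (illustrative only)."""
    if not names:
        # Assumption for this sketch: an empty path is an error.
        raise ValueError("get_field requires at least one field name")
    current: Any = value
    for name in names:
        current = current[name]
    return current


row = {"address": {"city": {"name": "Oslo"}}}

# Single-name call: same shape as before the change.
city_struct = get_field(row, "address")

# Multi-name call: replaces a separate path-taking helper.
city_name = get_field(row, "address", "city", "name")
```

Because a single name is just a path of length one, both call styles share one code path.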
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
