1fanwang opened a new pull request, #49869:
URL: https://github.com/apache/arrow/pull/49869
### Rationale for this change
Round-tripping an empty `Table` via `to_struct_array()` ->
`Table.from_struct_array()` raises `ValueError: Must pass schema, or at least
one RecordBatch`. This is the inverse of the issue fixed in GH-46355.
Reproducer (against pyarrow 21.0.0, macOS 15.3, Python 3.12.5):
```python
import pyarrow as pa
table = pa.table({"a": pa.array([]), "b": pa.array([], type=pa.float64())})
array = table.to_struct_array()
pa.Table.from_struct_array(array)
# ValueError: Must pass schema, or at least one RecordBatch
```
Root cause: after GH-46355, `Table.to_struct_array()` returns a
`ChunkedArray` with **zero chunks** for an empty table.
`Table.from_struct_array` then iterates `struct_array.chunks` and forwards an
empty list to `Table.from_batches`, which has no schema to infer from.
### What changes are included in this PR?
In `Table.from_struct_array`, the `ChunkedArray` branch now passes
`schema=schema(struct_array.type.fields)` to `Table.from_batches`, so the
zero-chunk path preserves the field names and dtypes carried on the input's
struct type. The single-`Array` branch is unchanged; it always produces a
non-empty batch list and `RecordBatch.from_struct_array` already preserves the
schema in C++.
This mirrors the inverse fix in GH-46355.
### Are these changes tested?
Yes. Added `test_table_from_struct_array_for_empty_chunked_array` in
`python/pyarrow/tests/test_table.py` which exercises the zero-chunk path and
asserts both data equality and schema equality with the expected empty table.
The test fails on `main` with the same `ValueError` and passes with the patch.
I verified the shape of the fix end-to-end against pyarrow 21.0.0 by
re-implementing the patched method in pure Python and running the issue's exact
reproducer plus four additional shapes (non-empty Array, non-empty
ChunkedArray, single-empty-chunk ChunkedArray, zero-chunk ChunkedArray); all
behave correctly. CI will run the actual Cython-compiled test.
`flake8` (with `python/setup.cfg`) and `cython-lint --no-pycodestyle` both
pass on the touched files.
### Are there any user-facing changes?
`pa.Table.from_struct_array` now accepts an empty `ChunkedArray` of struct
type and returns the corresponding empty `Table` with the schema preserved,
instead of raising `ValueError`.
---
AI disclosure: I used Claude Code to scout Apache Arrow's open issues,
locate the precedent fix, and draft this patch. I read both
`Table.from_struct_array` and the GH-46355 commit, ran the reproducer myself,
validated the patched logic against pyarrow 21.0.0 across the five input shapes
above, and own the change. Happy to address review feedback.