1fanwang opened a new pull request, #49869:
URL: https://github.com/apache/arrow/pull/49869

   ### Rationale for this change
   
   Round-tripping an empty `Table` via `to_struct_array()` -> 
`Table.from_struct_array()` raises `ValueError: Must pass schema, or at least 
one RecordBatch`. This is the inverse of the issue fixed in GH-46355.
   
   Reproducer (against pyarrow 21.0.0, macOS 15.3, Python 3.12.5):
   
   ```python
   import pyarrow as pa
   table = pa.table({"a": pa.array([]), "b": pa.array([], type=pa.float64())})
   array = table.to_struct_array()
   pa.Table.from_struct_array(array)
   # ValueError: Must pass schema, or at least one RecordBatch
   ```
   
   Root cause: after GH-46355, `Table.to_struct_array()` returns a 
`ChunkedArray` with **zero chunks** for an empty table. 
`Table.from_struct_array` then iterates `struct_array.chunks` and forwards an 
empty list to `Table.from_batches`, which has neither an explicit schema nor a 
batch to infer one from.
   
   ### What changes are included in this PR?
   
   In `Table.from_struct_array`, the `ChunkedArray` branch now passes 
`schema=schema(struct_array.type.fields)` to `Table.from_batches`, so the 
zero-chunk path preserves the field names and data types carried on the input's 
struct type. The single-`Array` branch is unchanged: it always produces a 
non-empty batch list, and `RecordBatch.from_struct_array` already preserves the 
schema in C++.
   
   This mirrors the inverse fix in GH-46355.
   
   ### Are these changes tested?
   
   Yes. Added `test_table_from_struct_array_for_empty_chunked_array` in 
`python/pyarrow/tests/test_table.py` which exercises the zero-chunk path and 
asserts both data equality and schema equality with the expected empty table. 
The test fails on `main` with the same `ValueError` and passes with the patch.
   
   I also checked the logic of the fix against pyarrow 21.0.0 by 
re-implementing the patched method in pure Python and running the issue's exact 
reproducer plus four additional input shapes (non-empty Array, non-empty 
ChunkedArray, single-empty-chunk ChunkedArray, zero-chunk ChunkedArray); all 
behave correctly. CI will run the actual Cython-compiled test.
   
   `flake8` (with `python/setup.cfg`) and `cython-lint --no-pycodestyle` both 
pass on the touched files.
   
   ### Are there any user-facing changes?
   
   `pa.Table.from_struct_array` now accepts an empty `ChunkedArray` of struct 
type and returns the corresponding empty `Table` with the schema preserved, 
instead of raising `ValueError`.
   
   ---
   
   AI disclosure: I used Claude Code to scout Apache Arrow's open issues, 
locate the precedent fix, and draft this patch. I read both 
`Table.from_struct_array` and the GH-46355 commit, ran the reproducer myself, 
validated the patched logic against pyarrow 21.0.0 across the five input shapes 
above, and own the change. Happy to address review feedback.
   

