HyukjinKwon opened a new pull request, #48443: URL: https://github.com/apache/arrow/pull/48443
### Rationale for this change

The `chunked_arrays` hypothesis strategy had a workaround that excluded struct types, on the assumption that field metadata is not preserved (added in https://github.com/apache/arrow/commit/dd0988b49cb6726cf915bb9f53d7320e3a97b00b). Testing confirms that field metadata is now correctly preserved in chunked arrays with struct types, so the workaround is no longer necessary. The underlying issue was fixed by https://github.com/apache/arrow/commit/d06c664a1966da682a2382e46fe148be96cca1aa: chunked array construction now calls `CChunkedArray::Make()` explicitly instead of constructing `CChunkedArray` manually.

### What changes are included in this PR?

Remove the workaround that assumed field metadata is not preserved, so the strategy no longer excludes struct types.

### Are these changes tested?

Manually tested with the following script (generated by ChatGPT), which creates struct fields with metadata and checks that the metadata survives in chunked arrays:

```python
import sys

import pyarrow as pa

# Create a struct type with custom field metadata
struct_type = pa.struct([
    pa.field('a', pa.int32(), metadata={'custom_key': 'custom_value_a', 'description': 'field a'}),
    pa.field('b', pa.string(), metadata={'custom_key': 'custom_value_b', 'description': 'field b'})
])

print("=== Original struct type ===")
print(f"Type: {struct_type}")
print(f"Field 'a' metadata: {struct_type[0].metadata}")
print(f"Field 'b' metadata: {struct_type[1].metadata}")
print()

# Create arrays with this struct type
arr1 = pa.array([
    {'a': 1, 'b': 'foo'},
    {'a': 2, 'b': 'bar'}
], type=struct_type)

arr2 = pa.array([
    {'a': 3, 'b': 'baz'},
    {'a': 4, 'b': 'qux'}
], type=struct_type)

print("=== Individual arrays ===")
print(f"arr1.type: {arr1.type}")
print(f"arr1.type[0].metadata: {arr1.type[0].metadata}")
print(f"arr2.type: {arr2.type}")
print(f"arr2.type[0].metadata: {arr2.type[0].metadata}")
print()

# Create chunked array WITH explicit type parameter (preserves metadata)
chunked_with_type = pa.chunked_array([arr1, arr2], type=struct_type)

print("=== Chunked array (with explicit type) ===")
print(f"Type: {chunked_with_type.type}")
print(f"Field 'a' metadata: {chunked_with_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_with_type.type[1].metadata}")
print()

# Verify metadata is preserved
if (chunked_with_type.type[0].metadata == struct_type[0].metadata and
        chunked_with_type.type[1].metadata == struct_type[1].metadata):
    print("✓ SUCCESS: Field metadata IS preserved!")
    print(f"  Field 'a': {dict(chunked_with_type.type[0].metadata)}")
    print(f"  Field 'b': {dict(chunked_with_type.type[1].metadata)}")
    exit_code = 0
else:
    print("✗ FAILED: Field metadata was lost")
    exit_code = 1

print()
print("=== Test without explicit type (for comparison) ===")
# What happens without explicit type? (type is inferred from the first chunk)
chunked_without_type = pa.chunked_array([arr1, arr2])
print(f"Type: {chunked_without_type.type}")
print(f"Field 'a' metadata: {chunked_without_type.type[0].metadata}")
print(f"Field 'b' metadata: {chunked_without_type.type[1].metadata}")

if chunked_without_type.type[0].metadata == struct_type[0].metadata:
    print("  → Metadata preserved even without explicit type (from first chunk)")
else:
    print("  → Metadata NOT preserved without explicit type")

# Report the result of the explicit-type check as the process exit status
sys.exit(exit_code)
```

### Are there any user-facing changes?

No, test-only.
