HyukjinKwon opened a new pull request, #48443:
URL: https://github.com/apache/arrow/pull/48443

   ### Rationale for this change
   
   The `chunked_arrays` hypothesis strategy had a workaround that excluded
   struct types, on the assumption that field metadata is not preserved (added in
   https://github.com/apache/arrow/commit/dd0988b49cb6726cf915bb9f53d7320e3a97b00b).
   
   Testing confirms that field metadata is now correctly preserved in chunked
   arrays with struct types, so the workaround is no longer necessary. The
   underlying issue was fixed by
   https://github.com/apache/arrow/commit/d06c664a1966da682a2382e46fe148be96cca1aa.
   
   The code now explicitly calls `CChunkedArray::Make()` instead of
   constructing a `CChunkedArray` manually.
   
   ### What changes are included in this PR?
   
   Remove the workaround in the `chunked_arrays` hypothesis strategy that
   excluded struct types on the assumption that field metadata is not preserved.
   
   ### Are these changes tested?
   
   Manually tested that field metadata is preserved, using the following
   script (generated by ChatGPT):
   
   ```python
   import sys
   import pyarrow as pa
   
   # Create a struct type with custom field metadata
   struct_type = pa.struct([
       pa.field('a', pa.int32(),
                metadata={'custom_key': 'custom_value_a', 'description': 'field a'}),
       pa.field('b', pa.string(),
                metadata={'custom_key': 'custom_value_b', 'description': 'field b'})
   ])
   
   print("=== Original struct type ===")
   print(f"Type: {struct_type}")
   print(f"Field 'a' metadata: {struct_type[0].metadata}")
   print(f"Field 'b' metadata: {struct_type[1].metadata}")
   print()
   
   # Create arrays with this struct type
   arr1 = pa.array([
       {'a': 1, 'b': 'foo'},
       {'a': 2, 'b': 'bar'}
   ], type=struct_type)
   
   arr2 = pa.array([
       {'a': 3, 'b': 'baz'},
       {'a': 4, 'b': 'qux'}
   ], type=struct_type)
   
   print("=== Individual arrays ===")
   print(f"arr1.type: {arr1.type}")
   print(f"arr1.type[0].metadata: {arr1.type[0].metadata}")
   print(f"arr2.type: {arr2.type}")
   print(f"arr2.type[0].metadata: {arr2.type[0].metadata}")
   print()
   
   # Create chunked array WITH explicit type parameter (preserves metadata)
   chunked_with_type = pa.chunked_array([arr1, arr2], type=struct_type)
   
   print("=== Chunked array (with explicit type) ===")
   print(f"Type: {chunked_with_type.type}")
   print(f"Field 'a' metadata: {chunked_with_type.type[0].metadata}")
   print(f"Field 'b' metadata: {chunked_with_type.type[1].metadata}")
   print()
   
   # Verify metadata is preserved
   if (chunked_with_type.type[0].metadata == struct_type[0].metadata and
       chunked_with_type.type[1].metadata == struct_type[1].metadata):
       print("✓ SUCCESS: Field metadata IS preserved!")
       print(f"  Field 'a': {dict(chunked_with_type.type[0].metadata)}")
       print(f"  Field 'b': {dict(chunked_with_type.type[1].metadata)}")
       exit_code = 0
   else:
       print("✗ FAILED: Field metadata was lost")
       exit_code = 1
   
   print()
   print("=== Test without explicit type (for comparison) ===")
   # What happens without explicit type? (inferred from first chunk)
   chunked_without_type = pa.chunked_array([arr1, arr2])
   print(f"Type: {chunked_without_type.type}")
   print(f"Field 'a' metadata: {chunked_without_type.type[0].metadata}")
   print(f"Field 'b' metadata: {chunked_without_type.type[1].metadata}")
   
   if chunked_without_type.type[0].metadata == struct_type[0].metadata:
       print("  → Metadata preserved even without explicit type (from first chunk)")
   else:
       print("  → Note: without explicit type, metadata was NOT preserved")
   
   # Report the result of the explicit-type check through the exit status
   sys.exit(exit_code)
   ```
   
   ### Are there any user-facing changes?
   
   No, test-only.

