paleolimbot opened a new issue, #7982:
URL: https://github.com/apache/arrow-rs/issues/7982

   **Describe the bug**
   
   The `Dictionary` variant of the `DataType` enum appears to have no place for 
field metadata, so extension types are dropped when they pass through arrow-rs 
structures:
   
   
https://github.com/apache/arrow-rs/blob/a7f3ba8f3a748243af1575bce8d50dfc6a81ab73/arrow-schema/src/datatype.rs#L359
   
   The definitions of `RunEndEncoded` and others seem to use a `FieldRef`, and 
I'm wondering whether not doing the same for `Dictionary` was a deliberate 
choice or whether it has just never come up.
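
To illustrate the difference described above, here is a toy model in plain Python. It is not the arrow-rs API: the `Field`, `DictionaryOfTypes`, `DictionaryOfFields`, and `extension_name` names are hypothetical and exist only for this sketch. It assumes that extension information travels as key/value metadata attached to a field, as the reproduction below shows for `geoarrow.wkb`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a field: a storage type plus key/value metadata."""
    name: str
    storage: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class DictionaryOfTypes:
    """Dictionary described by two bare types: no slot for value metadata."""
    key_type: str
    value_type: str


@dataclass(frozen=True)
class DictionaryOfFields:
    """Hypothetical alternative: the values are described by a full Field."""
    key_type: str
    values: Field


def extension_name(dictionary) -> Optional[str]:
    """Return the extension name carried by the dictionary values, if any."""
    values = getattr(dictionary, "values", None)
    if values is None:
        # Only a bare value type is stored, so there is no metadata to read.
        return None
    return values.metadata.get("ARROW:extension:name")


# A values field tagged as an extension type through its metadata.
wkb = Field(
    "values",
    "binary",
    {"ARROW:extension:name": "geoarrow.wkb", "ARROW:extension:metadata": "{}"},
)

# Keeping only the storage type discards the extension metadata.
lossy = DictionaryOfTypes("int32", wkb.storage)

# Keeping the whole field preserves it.
lossless = DictionaryOfFields("int32", wkb)
```

In this sketch `extension_name(lossy)` has nothing to return, while `extension_name(lossless)` can still recover the extension name, which is the round-trip behavior the issue asks about.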
   
   **To Reproduce**
   
   I used arro3 to reproduce:
   
   ```python
   import arro3.core as a3
   import geoarrow.pyarrow as ga
   import nanoarrow as na
   import pyarrow as pa
   
   c_schema = na.c_schema(pa.dictionary(pa.int32(), ga.wkb()))
   
   c_schema.metadata is None
   #> True
   c_schema.dictionary.metadata
   #> <nanoarrow._schema.SchemaMetadata>
   #> - b'ARROW:extension:name': b'geoarrow.wkb'
   #> - b'ARROW:extension:metadata': b'{}'
   
   c_schema2 = na.c_schema(a3.DataType.dictionary(pa.int32(), ga.wkb()))
   c_schema2.metadata is None
   #> True
   c_schema2.dictionary.metadata is None
   #> True
   ```
   
   **Expected behavior**
   
   I would have expected the metadata to round-trip through the arrow-rs data 
type representation.
   
   **Additional context**
   
   Occasionally Parquet readers will return dictionary-encoded arrays on read, 
and their representation is not entirely under the user's control.

