alamb opened a new issue, #8479:
URL: https://github.com/apache/arrow-rs/issues/8479

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   The Parquet type system includes LogicalTypes without a direct Arrow 
equivalent, such as JSON, Variant, and UUID.
   
   However, Arrow includes the idea of "Extension" types that add extra 
semantics to an existing Arrow physical type, and the arrow-rs parquet reader 
automatically maps the relevant Parquet types to a canonical Arrow 
extension type if the `arrow_canonical_extension_types` feature is enabled.
   
   Right now, though, that mapping of Parquet LogicalType --> Arrow (Canonical) 
ExtensionType is hard-coded, which is unfortunate as it means:
   1. Users cannot override the mapping (if they want to write their own 
implementation of Parquet LogicalTypes, for example)
   2. The code has a bunch of `#[cfg(...)]` sprinkled in it -- see 
https://github.com/apache/arrow-rs/pull/8409 for an example
   
   **Describe the solution you'd like**
   @paleolimbot suggested on 
https://github.com/apache/arrow-rs/pull/8409/files#r2371071848 that we could 
maintain some sort of registry that is more ergonomic to configure and 
allows user-defined extension types.
   
   **Describe alternatives you've considered**
   Quoting @paleolimbot on 
https://github.com/apache/arrow-rs/pull/8409/files#r2371071848:
   
   > you could also consider an injection approach like:
   
   ```rust
   pub trait ParquetArrowExtension {
       // Presumably returns Ok(None) when this extension does not handle `logical_type`.
       fn try_from_logical_type(
           &self,
           arrow_field: Field,
           logical_type: &LogicalType,
       ) -> Result<Option<Field>>;
       fn try_to_logical_type(&self, field: &Field) -> Result<Option<LogicalType>>;
   }
   ```
   
   ...and maintain a registry of those in the reader/writer options. Then you 
don't need compile time flags to support the extensions (something like 
DataFusion or a derivative could wire it all together at runtime).
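   
   To make the idea concrete, here is a minimal, self-contained sketch of what 
   such a registry might look like. It is only an illustration under stated 
   assumptions: `Field`, `LogicalType`, and `Result` are simplified stand-ins 
   for the real arrow-rs / parquet types, and `UuidExtension` and 
   `ExtensionRegistry` are hypothetical names that do not exist in arrow-rs 
   today. The only details taken from Arrow itself are the 
   `ARROW:extension:name` metadata key and the `arrow.uuid` canonical 
   extension name.
   
   ```rust
   use std::collections::HashMap;
   use std::sync::Arc;
   
   // Simplified stand-ins (assumptions), NOT the real arrow-rs / parquet types.
   #[derive(Debug, Clone, PartialEq)]
   pub enum LogicalType {
       Json,
       Uuid,
   }
   
   #[derive(Debug, Clone, PartialEq)]
   pub struct Field {
       pub name: String,
       pub metadata: HashMap<String, String>,
   }
   
   pub type Result<T> = std::result::Result<T, String>;
   
   // Field metadata key under which Arrow records an extension type name.
   const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
   
   pub trait ParquetArrowExtension {
       fn try_from_logical_type(
           &self,
           arrow_field: Field,
           logical_type: &LogicalType,
       ) -> Result<Option<Field>>;
       fn try_to_logical_type(&self, field: &Field) -> Result<Option<LogicalType>>;
   }
   
   // Hypothetical extension: Parquet UUID <-> the `arrow.uuid` extension type.
   pub struct UuidExtension;
   
   impl ParquetArrowExtension for UuidExtension {
       fn try_from_logical_type(
           &self,
           mut arrow_field: Field,
           logical_type: &LogicalType,
       ) -> Result<Option<Field>> {
           if *logical_type != LogicalType::Uuid {
               return Ok(None); // not ours; let another extension try
           }
           arrow_field
               .metadata
               .insert(EXTENSION_NAME_KEY.to_string(), "arrow.uuid".to_string());
           Ok(Some(arrow_field))
       }
   
       fn try_to_logical_type(&self, field: &Field) -> Result<Option<LogicalType>> {
           match field.metadata.get(EXTENSION_NAME_KEY).map(String::as_str) {
               Some("arrow.uuid") => Ok(Some(LogicalType::Uuid)),
               _ => Ok(None),
           }
       }
   }
   
   // Hypothetical registry held by reader/writer options; first match wins.
   #[derive(Default, Clone)]
   pub struct ExtensionRegistry {
       extensions: Vec<Arc<dyn ParquetArrowExtension>>,
   }
   
   impl ExtensionRegistry {
       pub fn register(&mut self, ext: Arc<dyn ParquetArrowExtension>) {
           self.extensions.push(ext);
       }
   
       // Reader side: returns the field unchanged if no extension claims the type.
       pub fn from_logical_type(&self, field: Field, logical_type: &LogicalType) -> Result<Field> {
           for ext in &self.extensions {
               // Each extension gets its own copy, since the trait takes the field by value.
               if let Some(mapped) = ext.try_from_logical_type(field.clone(), logical_type)? {
                   return Ok(mapped);
               }
           }
           Ok(field)
       }
   
       // Writer side: returns None if no extension recognizes the field.
       pub fn to_logical_type(&self, field: &Field) -> Result<Option<LogicalType>> {
           for ext in &self.extensions {
               if let Some(logical_type) = ext.try_to_logical_type(field)? {
                   return Ok(Some(logical_type));
               }
           }
           Ok(None)
       }
   }
   ```
   
   Because the extensions are stored as trait objects, a downstream project 
   could register its own implementations ahead of (or instead of) the 
   canonical ones at runtime, which is what would remove the need for the 
   `#[cfg(...)]` gates.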
   
   **Additional context**
   


