milenkovicm opened a new issue, #16944:
URL: https://github.com/apache/datafusion/issues/16944

   ### Is your feature request related to a problem or challenge?
   
   At the moment, Ballista overrides `LogicalExtensionCodec::try_decode_file_format` and `LogicalExtensionCodec::try_encode_file_format` to support the following file format codecs:
   
   ```rust
   file_format_codecs: vec![
               Arc::new(ParquetLogicalExtensionCodec {}),
               Arc::new(CsvLogicalExtensionCodec {}),
               Arc::new(JsonLogicalExtensionCodec {}),
               Arc::new(ArrowLogicalExtensionCodec {}),
               Arc::new(AvroLogicalExtensionCodec {}),
           ],
   ```
   
   as seen at [1]. If we want to integrate Ballista with DataFusion Python, we would need to provide a custom `LogicalExtensionCodec` that either implements the same logic or reuses Ballista's `LogicalExtensionCodec` implementation. As these file types are supported out of the box in DataFusion, would it make sense to implement the encoder/decoder for them in `DefaultLogicalExtensionCodec`?
   
   [1]: https://github.com/milenkovicm/arrow-ballista/blob/e1e9f6ca423fd558664a7f2fb3b1bc3ed07d7db8/ballista/core/src/serde/mod.rs#L164-L165
   
   ### Describe the solution you'd like
   
   If this proposal makes sense, implement support for it in `DefaultLogicalExtensionCodec`, similar to what Ballista already supports:
   
   ```rust
   fn try_decode_file_format(
       &self,
       buf: &[u8],
       ctx: &datafusion::prelude::SessionContext,
   ) -> Result<Arc<dyn datafusion::datasource::file_format::FileFormatFactory>> {
       let proto = FileFormatProto::decode(buf)
           .map_err(|e| DataFusionError::Internal(e.to_string()))?;

       let codec = self
           .file_format_codecs
           .get(proto.encoder_position as usize)
           .ok_or(DataFusionError::Internal(
               "Can't find required codec in file codec list".to_owned(),
           ))?;

       codec.try_decode_file_format(&proto.blob, ctx)
   }

   fn try_encode_file_format(
       &self,
       buf: &mut Vec<u8>,
       node: Arc<dyn datafusion::datasource::file_format::FileFormatFactory>,
   ) -> Result<()> {
       let mut blob = vec![];
       let (encoder_position, _) =
           self.try_any(|codec| codec.try_encode_file_format(&mut blob, node.clone()))?;

       let proto = FileFormatProto {
           encoder_position,
           blob,
       };
       proto
           .encode(buf)
           .map_err(|e| DataFusionError::Internal(e.to_string()))
   }
   ```
   
   
https://github.com/milenkovicm/arrow-ballista/blob/e1e9f6ca423fd558664a7f2fb3b1bc3ed07d7db8/ballista/core/src/serde/mod.rs#L214-L215
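
The snippet above calls a `try_any` helper that is not shown in this issue. To illustrate the positional dispatch it relies on, here is a minimal, std-only sketch. Everything in it is hypothetical and exists only for illustration: the `Format` enum stands in for `Arc<dyn FileFormatFactory>`, `FormatCodec` and `TagCodec` stand in for the per-format codecs, and a 4-byte little-endian prefix stands in for the protobuf `FileFormatProto` envelope. It is not Ballista's or DataFusion's actual implementation.

```rust
/// Hypothetical stand-in for `Arc<dyn FileFormatFactory>`.
#[derive(Debug, Clone, PartialEq)]
pub enum Format {
    Parquet,
    Csv,
    Json,
}

/// Hypothetical per-format codec, mirroring the encode/decode pair above.
pub trait FormatCodec {
    fn try_encode(&self, buf: &mut Vec<u8>, node: &Format) -> Result<(), String>;
    fn try_decode(&self, buf: &[u8]) -> Result<Format, String>;
}

/// Toy codec: handles exactly one format and encodes it as a single tag byte.
pub struct TagCodec {
    format: Format,
    tag: u8,
}

impl FormatCodec for TagCodec {
    fn try_encode(&self, buf: &mut Vec<u8>, node: &Format) -> Result<(), String> {
        if *node == self.format {
            buf.push(self.tag);
            Ok(())
        } else {
            Err(format!("codec for {:?} cannot encode {:?}", self.format, node))
        }
    }

    fn try_decode(&self, buf: &[u8]) -> Result<Format, String> {
        match buf {
            [tag] if *tag == self.tag => Ok(self.format.clone()),
            _ => Err("unexpected payload".to_owned()),
        }
    }
}

/// Composite codec that dispatches by position in `codecs`.
pub struct CompositeCodec {
    codecs: Vec<Box<dyn FormatCodec>>,
}

impl CompositeCodec {
    pub fn new() -> Self {
        Self {
            codecs: vec![
                Box::new(TagCodec { format: Format::Parquet, tag: b'p' }),
                Box::new(TagCodec { format: Format::Csv, tag: b'c' }),
                Box::new(TagCodec { format: Format::Json, tag: b'j' }),
            ],
        }
    }

    /// Assumed shape of `try_any`: run `f` on each codec in order and return
    /// the position and result of the first one that succeeds.
    fn try_any<T>(
        &self,
        mut f: impl FnMut(&dyn FormatCodec) -> Result<T, String>,
    ) -> Result<(u32, T), String> {
        let mut last_err = "codec list is empty".to_owned();
        for (position, codec) in self.codecs.iter().enumerate() {
            match f(&**codec) {
                Ok(value) => return Ok((position as u32, value)),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }

    /// Writes the codec position (4 bytes, little-endian) followed by the blob.
    pub fn encode(&self, node: &Format) -> Result<Vec<u8>, String> {
        let mut blob = Vec::new();
        let (position, _) = self.try_any(|codec| {
            blob.clear(); // drop partial output from a codec that failed
            codec.try_encode(&mut blob, node)
        })?;
        let mut out = position.to_le_bytes().to_vec();
        out.extend_from_slice(&blob);
        Ok(out)
    }

    /// Reads the position prefix and hands the rest to the codec at that index.
    pub fn decode(&self, buf: &[u8]) -> Result<Format, String> {
        if buf.len() < 4 {
            return Err("buffer too short for position prefix".to_owned());
        }
        let (head, blob) = buf.split_at(4);
        let position = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
        let codec = self
            .codecs
            .get(position)
            .ok_or_else(|| "Can't find required codec in file codec list".to_owned())?;
        codec.try_decode(blob)
    }
}
```

Because only the position is written to the wire, the encoding and decoding sides must register the codecs in the same order. That is one practical argument for keeping the list in a single shared default codec.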
   
   ### Describe alternatives you've considered
   
   An alternative would be to keep everything as it is.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

