Xuanwo commented on issue #7171:
URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2677020625

   > Edit: Ultimately something has to glue together downstream abstractions, 
e.g. in the context of [#7135](https://github.com/apache/arrow-rs/issues/7135) 
providing a way to connect DF's SessionContext through to some IO subsystem. 
Either DF needs to overload some existing interface e.g. ObjectStore/OpenDAL 
inevitably leading to challenges like 
[#7155](https://github.com/apache/arrow-rs/issues/7155) or it needs to define 
its own mechanism. In the case of parquet and AsyncFileReaderFactory, this 
interface already exists; we just need to point people at it.
   
   Thank you @tustvold for inviting me to join this discussion.
   
   I believe we should build `datafusion-storage` primarily focused on 
DataFusion's own needs while maintaining `datafusion-storage-object-store` and 
`datafusion-storage-opendal` separately. The benefit is that users can 
implement innovative features like `datafusion-storage-cudf` or 
`datafusion-storage-io_uring` without being constrained by the current I/O 
abstraction of object-store or OpenDAL.
   
   If this becomes a reality, DataFusion can design the abstraction based on 
its own requirements without having to push everything upstream to 
`object_store`. This would allow DataFusion to keep useful features such as 
context management and add further requirements to the trait, while 
`datafusion-storage-object-store` and `datafusion-storage-opendal` handle the 
extra work.
   
   We can start by aliasing the `ObjectStore` trait inside 
`datafusion-storage`. I'm happy to initiate a proposal if that sounds like a 
good idea to you.
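
   To make the layering concrete, here is a rough sketch of what a 
DataFusion-owned abstraction could look like. Everything in it is hypothetical 
and only for illustration: the `Storage` trait, `IoContext`, `InMemoryStorage` 
and `read_prefix` are names I made up, the API is synchronous to keep the 
example std-only (a real design would presumably be async), and 
`InMemoryStorage` merely stands in for an adapter crate such as 
`datafusion-storage-object-store`.

```rust
use std::collections::HashMap;
use std::io;
use std::ops::Range;

/// Hypothetical per-request context that the engine threads through to I/O
/// (the kind of "context management" the trait could carry).
#[derive(Debug, Clone, Default)]
pub struct IoContext {
    pub session_id: Option<String>,
}

/// Hypothetical trait that would live in `datafusion-storage`.
/// It is shaped by the engine's needs, not by any one backend.
pub trait Storage: Send + Sync {
    fn read_range(&self, ctx: &IoContext, path: &str, range: Range<usize>) -> io::Result<Vec<u8>>;
}

/// Stand-in for a backend adapter crate; a real adapter would wrap
/// object_store, OpenDAL, io_uring, and so on.
#[derive(Default)]
pub struct InMemoryStorage {
    files: HashMap<String, Vec<u8>>,
}

impl InMemoryStorage {
    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.files.insert(path.to_string(), data.to_vec());
    }
}

impl Storage for InMemoryStorage {
    fn read_range(&self, _ctx: &IoContext, path: &str, range: Range<usize>) -> io::Result<Vec<u8>> {
        let data = self
            .files
            .get(path)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no such object: {path}")))?;
        // `get` returns None when the range falls outside the stored bytes.
        data.get(range)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds"))
    }
}

/// Engine-side code depends only on the trait, never on a concrete backend.
pub fn read_prefix(store: &dyn Storage, ctx: &IoContext, path: &str, n: usize) -> io::Result<Vec<u8>> {
    store.read_range(ctx, path, 0..n)
}
```

   With this shape, swapping the backend means passing a different `Storage` 
implementation to `read_prefix`; the engine-side code does not change.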


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

