Xuanwo commented on issue #7171: URL: https://github.com/apache/arrow-rs/issues/7171#issuecomment-2677020625
> Edit: Ultimately something has to glue together downstream abstractions, e.g. in the context of [#7135](https://github.com/apache/arrow-rs/issues/7135) providing a way to connect DF's SessionContext through to some IO subsystem. Either DF needs to overload some existing interface, e.g. ObjectStore/OpenDAL, inevitably leading to challenges like [#7155](https://github.com/apache/arrow-rs/issues/7155), or it needs to define its own mechanism. In the case of parquet and AsyncFileReaderFactory, this interface already exists; we just need to point people at it.

Thank you @tustvold for inviting me to join this discussion.

I believe we should build `datafusion-storage` primarily around DataFusion's own needs, while maintaining `datafusion-storage-object-store` and `datafusion-storage-opendal` separately. The benefit is that users can implement innovative backends like `datafusion-storage-cudf` or `datafusion-storage-io_uring` without being constrained by the current I/O abstraction of object-store or OpenDAL.

If this becomes a reality, DataFusion can design the abstraction based on its own requirements without having to push everything upstream to `object_store`. DataFusion could then keep useful features such as context management and add further requirements to the trait, while `datafusion-storage-object-store` and `datafusion-storage-opendal` handle the extra work of adapting each backend.

We can start by aliasing the `ObjectStore` trait inside `datafusion-storage` first.

I'm happy to initiate a proposal if that sounds like a good idea to you.
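To make the proposed layering concrete, here is a minimal sketch, not an existing API. The `Storage` trait, `InMemoryStorage`, and `read_footer` are hypothetical names invented for illustration. A real `datafusion-storage` trait would most likely be async and carry more context; this version is synchronous and uses only the standard library so that it stays self-contained. The in-memory backend merely stands in for an adapter crate such as `datafusion-storage-object-store` or `datafusion-storage-opendal`.

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical trait that a `datafusion-storage` crate might own.
/// The engine would depend only on this, never on a concrete backend.
pub trait Storage {
    /// Read `len` bytes starting at `offset` from the object at `path`.
    fn read_range(&self, path: &str, offset: usize, len: usize) -> io::Result<Vec<u8>>;

    /// Store `data` at `path`, replacing any existing object.
    fn write(&mut self, path: &str, data: Vec<u8>) -> io::Result<()>;
}

/// Stand-in backend for illustration. A real adapter crate would wrap
/// `object_store` or OpenDAL here instead of a `HashMap`.
#[derive(Default)]
pub struct InMemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl Storage for InMemoryStorage {
    fn read_range(&self, path: &str, offset: usize, len: usize) -> io::Result<Vec<u8>> {
        let data = self.objects.get(path).ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("no such object: {path}"))
        })?;
        // Reject ranges that overflow or run past the end of the object.
        let end = offset
            .checked_add(len)
            .filter(|&end| end <= data.len())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds"))?;
        Ok(data[offset..end].to_vec())
    }

    fn write(&mut self, path: &str, data: Vec<u8>) -> io::Result<()> {
        self.objects.insert(path.to_string(), data);
        Ok(())
    }
}

/// Engine-side helper written against the trait only, so any backend
/// (object-store, OpenDAL, io_uring, ...) could be plugged in.
pub fn read_footer(
    storage: &dyn Storage,
    path: &str,
    file_len: usize,
    footer_len: usize,
) -> io::Result<Vec<u8>> {
    let offset = file_len.checked_sub(footer_len).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "footer longer than file")
    })?;
    storage.read_range(path, offset, footer_len)
}
```

Because `read_footer` takes `&dyn Storage`, swapping the backend requires no change on the engine side, which is the point of keeping the trait in `datafusion-storage` and the adapters in separate crates.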
