alamb commented on code in PR #5543:
URL: https://github.com/apache/arrow-datafusion/pull/5543#discussion_r1133763264
##########
datafusion/execution/src/object_store.rs:
##########
@@ -89,6 +89,138 @@ pub trait ObjectStoreProvider: Send + Sync + 'static {
fn get_by_url(&self, url: &Url) -> Result<Arc<dyn ObjectStore>>;
}
+/// Provides a mechanism to get and put object stores.
+pub trait ObjectStoreManager: Send + Sync + std::fmt::Debug + 'static {
+ /// If a store with the same schema and host existed before, it is replaced and returned
+ fn register_store(
+ &self,
+ scheme: &str,
+ host: &str,
+ store: Arc<dyn ObjectStore>,
+ ) -> Option<Arc<dyn ObjectStore>>;
+
+ /// Get a suitable store for the provided URL. For example:
+ ///
+ /// - URL with scheme `file:///` or no schema will return the default LocalFS store
+ /// - URL with scheme `s3://bucket/` will return the S3 store
+ /// - URL with scheme `hdfs://hostname:port/` will return the hdfs store
+ fn get_by_url(&self, url: &Url) -> Result<Arc<dyn ObjectStore>>;
Review Comment:
Got it, thank you for the explanation @yahoNanJing. I missed the
`register_store` method.

> I have no preference. Here, the main reason of not combining them is for
> the backward compatibility of existing usage of ObjectStoreProvider.

I think we can add a new method with a default implementation to
`ObjectStoreProvider` and still maintain backwards compatibility.

For example, what if we added something like this (not tested), which would
require no changes to existing `ObjectStoreProvider` implementations?
```rust
/// Registers the specified object store for URLs with the given scheme/host,
/// returning the previously registered store, if any.
fn register_store(
    &self,
    scheme: &str,
    host: &str,
    store: Arc<dyn ObjectStore>,
) -> Result<Option<Arc<dyn ObjectStore>>> {
    // `DataFusionError::NotImplemented` carries a `String`, so convert the literal
    Err(DataFusionError::NotImplemented(
        "register_store is not supported by this provider".to_string(),
    ))
}
```
If we could extend `ObjectStoreProvider`, that would be my preference, as I
think it keeps the overall code simpler.
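
To make the backwards-compatibility point concrete, here is a minimal, self-contained sketch of the pattern. It is an illustration under stated assumptions, not DataFusion code: `ObjectStore`, `Error`, `DummyStore`, `LegacyProvider` and `RegistryProvider` are hypothetical stand-ins, and `get_by_url` takes a scheme and host instead of a `Url` so the example compiles without external crates.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for `object_store::ObjectStore` and `DataFusionError`.
pub trait ObjectStore: Debug + Send + Sync {}

#[derive(Debug)]
pub enum Error {
    NotImplemented(String),
    NotFound(String),
}

pub type Result<T> = std::result::Result<T, Error>;

pub trait ObjectStoreProvider: Send + Sync + 'static {
    /// The pre-existing required method (simplified signature for this sketch).
    fn get_by_url(&self, scheme: &str, host: &str) -> Result<Arc<dyn ObjectStore>>;

    /// The new method. Because it has a default body, implementors written
    /// before it was added keep compiling without changes.
    fn register_store(
        &self,
        _scheme: &str,
        _host: &str,
        _store: Arc<dyn ObjectStore>,
    ) -> Result<Option<Arc<dyn ObjectStore>>> {
        Err(Error::NotImplemented(
            "register_store is not supported by this provider".to_string(),
        ))
    }
}

#[derive(Debug)]
pub struct DummyStore;
impl ObjectStore for DummyStore {}

/// An "existing" provider that only knows about `get_by_url`.
pub struct LegacyProvider;

impl ObjectStoreProvider for LegacyProvider {
    fn get_by_url(&self, _scheme: &str, _host: &str) -> Result<Arc<dyn ObjectStore>> {
        Ok(Arc::new(DummyStore))
    }
}

/// A provider that opts in to registration by overriding the default.
#[derive(Default)]
pub struct RegistryProvider {
    stores: Mutex<HashMap<String, Arc<dyn ObjectStore>>>,
}

impl ObjectStoreProvider for RegistryProvider {
    fn get_by_url(&self, scheme: &str, host: &str) -> Result<Arc<dyn ObjectStore>> {
        let key = format!("{}://{}", scheme, host);
        let stores = self.stores.lock().unwrap();
        let found = stores.get(&key).cloned();
        found.ok_or(Error::NotFound(key))
    }

    fn register_store(
        &self,
        scheme: &str,
        host: &str,
        store: Arc<dyn ObjectStore>,
    ) -> Result<Option<Arc<dyn ObjectStore>>> {
        let key = format!("{}://{}", scheme, host);
        let mut stores = self.stores.lock().unwrap();
        // `HashMap::insert` returns the previously stored value, if any.
        Ok(stores.insert(key, store))
    }
}
```

With this shape, calling `register_store` on `LegacyProvider` returns the `NotImplemented` error, while `RegistryProvider` returns the previously registered store (or `None` on first registration). The trade-off is that lack of support surfaces at runtime rather than at compile time.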