alamb commented on a change in pull request #1779:
URL: https://github.com/apache/arrow-datafusion/pull/1779#discussion_r801987584



##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default 
LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's 
registered
+    /// Returns a tuple with the store and the self-described uri of the file 
in that store
     pub fn get_by_uri<'a>(
         &self,
         uri: &'a str,
     ) -> Result<(Arc<dyn ObjectStore>, &'a str)> {
-        if let Some((scheme, path)) = uri.split_once("://") {
-            let stores = self.object_stores.read();
-            let store = stores
-                .get(&*scheme.to_lowercase())
-                .map(Clone::clone)
-                .ok_or_else(|| {
-                    DataFusionError::Internal(format!(
-                        "No suitable object store found for {}",
-                        scheme
-                    ))
-                })?;
-            Ok((store, path))
+        // We do not support the remote object store on Windows OS

Review comment:
       I would personally prefer to keep any `uri` parsing in the datafusion 
crate as simple as possible and leave more sophisticated uri interpretation to 
the actual `ObjectStoreInstance`. 
   
   Among other things this makes it easier to implement arbitrary 
`ObjectStoreInstance`
   
   For the usecase mentioned in 
https://github.com/apache/arrow-datafusion/issues/1778 I wonder if you could 
write a wrapping object store like:
   
   ```rust
   struct HostAwareHDFS {
   
   }
   
   impl ObjectStore for HostAwareHDFS {
       async fn list_file(&self, prefix: &str) -> Result<FileMetaStream> {
         /// form valid hdfs url:
         let hdfs_url = format!("hdfs://{}", prefix);
         match Url::parse(&hdfs_url) {
           Ok(url) => { 
              self.get_object_store_for_host(url.host()).list_file(prefix)?
              ...
          Err(..) ...
       }
   ...
   }
   ```
   
   
   In other words, push the interpretation of urls into the ObjectStore
   
   
   If this won't work, I would suggest passing the entire parsed url into 
`store.get()` rather than some synthetic key that is datafusion specific

##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -219,25 +220,42 @@ impl ObjectStoreRegistry {
 
     /// Get a suitable store for the URI based on it's scheme. For example:
     /// - URI with scheme `file://` or no schema will return the default 
LocalFS store
-    /// - URI with scheme `s3://` will return the S3 store if it's registered
-    /// Returns a tuple with the store and the path of the file in that store
-    /// (URI=scheme://path).
+    /// - URI with scheme `s3://host:port` will return the S3 store if it's 
registered

Review comment:
       URI schemes also allow for `username` and `password` 
   
   e.g. `s3://user:pass@host:port`
   
   https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to