[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1062: Add support of HDFS as remote object store

GitBox Wed, 29 Sep 2021 17:13:15 -0700


houqp commented on a change in pull request #1062:
URL: https://github.com/apache/arrow-datafusion/pull/1062#discussion_r718131881




##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -75,8 +83,15 @@ pub type ListEntryStream =
 /// It maps strings (e.g. URLs, filesystem paths, etc) to sources of bytes
 #[async_trait]
 pub trait ObjectStore: Sync + Send + Debug {
+    /// Get file system scheme
+    fn get_schema(&self) -> &'static str;

Review comment:
       An object store could have multiple schemes, for example s3/s3a or 
file/fs/filesystem, so it would be better to return a slice of `&str` here.
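
A minimal sketch of what returning a slice could look like. This is not the
PR's code: the trait is a simplified stand-in with the async methods omitted,
and the method name `get_schemes`, the helper `handles_scheme`, and the
`LocalFileSystem` type are hypothetical names chosen for illustration.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the trait under review (async methods omitted).
pub trait ObjectStore: Sync + Send + Debug {
    /// All URI schemes this store answers to, e.g. `["s3", "s3a"]`.
    /// Hypothetical name; the PR currently has `get_schema`.
    fn get_schemes(&self) -> &'static [&'static str];

    /// Hypothetical convenience check built on top of `get_schemes`.
    fn handles_scheme(&self, scheme: &str) -> bool {
        self.get_schemes()
            .iter()
            .any(|s| s.eq_ignore_ascii_case(scheme))
    }
}

/// Hypothetical local store that registers several aliases.
#[derive(Debug)]
pub struct LocalFileSystem;

impl ObjectStore for LocalFileSystem {
    fn get_schemes(&self) -> &'static [&'static str] {
        // The array literal is promoted to a `'static` constant.
        &["file", "fs", "filesystem"]
    }
}
```

With this shape a registry can iterate over `get_schemes()` and register the
same store under each alias, instead of assuming one scheme per store.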

##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -75,8 +83,15 @@ pub type ListEntryStream =
 /// It maps strings (e.g. URLs, filesystem paths, etc) to sources of bytes
 #[async_trait]
 pub trait ObjectStore: Sync + Send + Debug {
+    /// Get file system scheme
+    fn get_schema(&self) -> &'static str;

Review comment:
       Also, I think the name should be `get_scheme` instead?

##########
File path: datafusion/src/datasource/object_store/mod.rs
##########
@@ -75,8 +83,15 @@ pub type ListEntryStream =
 /// It maps strings (e.g. URLs, filesystem paths, etc) to sources of bytes
 #[async_trait]
 pub trait ObjectStore: Sync + Send + Debug {
+    /// Get file system scheme
+    fn get_schema(&self) -> &'static str;

Review comment:
       hmm... after taking a closer look at this, it looks like this is mainly 
used in `get_chunk_reader` to build object store specific chunk readers based on 
the file scheme. I think the ideal abstraction would be to make file format 
modules agnostic to object stores instead of implementing object store specific 
format readers like `HadoopParquetFileReader`.
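
A rough sketch of the store-agnostic split suggested above, under stated
assumptions. The format-side function only requires `Read + Seek`, and each
store decides how to produce such a reader. `ByteSource`, `InMemoryStore`, and
`has_parquet_magic` are hypothetical names for illustration, not DataFusion or
parquet crate APIs. The only Parquet fact relied on is that a file ends with
the 4-byte magic `PAR1`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical store-side trait: each object store hands out a
/// seekable byte reader and nothing format specific.
pub trait ByteSource {
    type Reader: Read + Seek;
    fn open(&self, path: &str) -> io::Result<Self::Reader>;
}

/// Format-side code: knows about Parquet's trailing magic bytes,
/// but nothing about which store produced the reader.
pub fn has_parquet_magic<R: Read + Seek>(reader: &mut R) -> io::Result<bool> {
    let len = reader.seek(SeekFrom::End(0))?;
    if len < 4 {
        return Ok(false);
    }
    reader.seek(SeekFrom::End(-4))?;
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    Ok(&magic == b"PAR1")
}

/// Hypothetical in-memory store, standing in for HDFS, S3, or local disk.
pub struct InMemoryStore {
    pub bytes: Vec<u8>,
}

impl ByteSource for InMemoryStore {
    type Reader = Cursor<Vec<u8>>;

    fn open(&self, _path: &str) -> io::Result<Self::Reader> {
        // The path is ignored here; a real store would resolve it.
        Ok(Cursor::new(self.bytes.clone()))
    }
}
```

Under this split, an HDFS store would only need to implement the reader side,
and no `HadoopParquetFileReader`-style type would be required in the format
module.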





