yjshen commented on pull request #1905:
URL: https://github.com/apache/arrow-datafusion/pull/1905#issuecomment-1062686947


   Thanks @tustvold for the detailed analysis. ❤️
   
   We already have a workaround for the repeated-open issue in the HDFS object
store, and I'm changing the object reader API here so that future object reader
implementations don't unintentionally fall into the same repeated-open pitfall.
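   
   To make the intent concrete, here is a minimal sketch of the handle-reuse pattern I mean: open the underlying file once and serve every ranged read from that shared handle instead of reopening per chunk. The `ReusedHandleReader` type and its `read_range` method are hypothetical names for illustration, not code from this PR or from the DataFusion API.
   ```rust
   use std::fs::File;
   use std::io::{Read, Seek, SeekFrom};
   use std::sync::Mutex;

   /// Hypothetical reader that opens the file once and serves all ranged
   /// reads from the shared handle instead of reopening per chunk.
   struct ReusedHandleReader {
       file: Mutex<File>,
   }

   impl ReusedHandleReader {
       fn try_new(path: &str) -> std::io::Result<Self> {
           // Single open; every subsequent ranged read reuses this handle.
           Ok(Self {
               file: Mutex::new(File::open(path)?),
           })
       }

       /// Read `length` bytes starting at `start` without reopening the file.
       fn read_range(&self, start: u64, length: usize) -> std::io::Result<Vec<u8>> {
           let mut file = self.file.lock().unwrap();
           file.seek(SeekFrom::Start(start))?;
           let mut buf = vec![0u8; length];
           file.read_exact(&mut buf)?;
           Ok(buf)
       }
   }
   ```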
   
   I really like the idea of getting rid of the `ChunkReader` APIs and moving to an
async Parquet exec. I expect we could also achieve file-handle reuse for the
async reading path on top of Tokio's async I/O. And I think we could remove this
API:
   ```rust
       /// Get reader for a part [start, start + length] in the file
       fn sync_chunk_reader(
           &self,
           start: u64,
           length: usize,
       ) -> Result<Box<dyn Read + Send + Sync>>;
   ```
   
   entirely, since it is misuse-prone and, in your #1617, is only used by the
Parquet exec anyway.
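   
   For the async path, file-handle reuse could look roughly like the sketch below, built on `tokio::fs::File` with a single shared handle serving many ranged reads. The `AsyncRangeReader` type and `read_range` method are made-up names for illustration, not an API from #1617 or this PR.
   ```rust
   use std::io::SeekFrom;
   use tokio::fs::File;
   use tokio::io::{AsyncReadExt, AsyncSeekExt};
   use tokio::sync::Mutex;

   /// Hypothetical async reader: one open handle, many ranged reads.
   struct AsyncRangeReader {
       file: Mutex<File>,
   }

   impl AsyncRangeReader {
       async fn try_new(path: &str) -> std::io::Result<Self> {
           // Open once; later ranged reads share this handle.
           Ok(Self {
               file: Mutex::new(File::open(path).await?),
           })
       }

       /// Asynchronously read `length` bytes at `start`, reusing the open handle.
       async fn read_range(&self, start: u64, length: usize) -> std::io::Result<Vec<u8>> {
           let mut file = self.file.lock().await;
           file.seek(SeekFrom::Start(start)).await?;
           let mut buf = vec![0u8; length];
           file.read_exact(&mut buf).await?;
           Ok(buf)
       }
   }
   ```
   A real implementation would likely want positional reads or a small pool of handles rather than one `Mutex`-guarded file, but the point stands: the open happens once per file, not once per chunk.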



