alamb commented on a change in pull request #1905:
URL: https://github.com/apache/arrow-datafusion/pull/1905#discussion_r820111433



##########
File path: datafusion/src/datasource/object_store/local.rs
##########
@@ -82,23 +112,12 @@ impl ObjectReader for LocalFileReader {
         )
     }
 
-    fn sync_chunk_reader(
-        &self,
-        start: u64,
-        length: usize,
-    ) -> Result<Box<dyn Read + Send + Sync>> {
-        // A new file descriptor is opened for each chunk reader.
-        // This okay because chunks are usually fairly large.
-        let mut file = File::open(&self.file.path)?;

Review comment:
       I am probably misunderstanding something here, and I am sorry that I 
don't quite follow all the comments on this PR. 
   
   If the issue you are trying to solve is that `File::open` is called too 
often, would it be possible to "memoize" the open here with a mutex inside 
the `LocalFileReader`?
   
   Something like
   
   ```rust
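   struct LocalFileReader {
   ...
       /// Keep the open file descriptor to avoid reopening it
       cache: Mutex<Option<File>>,
   }
   
   impl LocalFileReader {
   ...
       fn sync_chunk_reader(
           &self,
           start: u64,
           length: usize,
       ) -> Result<Box<dyn Read + Send + Sync>> {
           let mut cache = self.cache.lock().unwrap();
           // Open the file only on the first call
           if cache.is_none() {
               *cache = Some(File::open(...)?);
           }
           // `dyn Read` cannot be `Clone`, so duplicate the handle instead.
           // Note: handles from `try_clone` share one cursor with the original.
           let file = cache.as_ref().unwrap().try_clone()?;
           // (seeking to `start` and limiting to `length` omitted here)
           Ok(Box::new(file))
       }
   }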
   ```
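
To make the idea concrete, here is a minimal self-contained sketch of what I have in mind. It is not the actual DataFusion code: `CachedFileReader` and `ChunkReader` are hypothetical names, it returns `std::io::Result` rather than the DataFusion `Result`, and it assumes it is acceptable for all chunk readers to share a single descriptor behind a mutex. Each read seeks to its own offset first, so chunk readers that share the handle do not disturb each other's position.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `LocalFileReader`; only the path and the
/// memoized handle are modeled.
struct CachedFileReader {
    path: PathBuf,
    /// Lazily opened file handle, shared by every chunk reader.
    cache: Mutex<Option<Arc<Mutex<File>>>>,
}

/// Reads up to `remaining` bytes starting at `pos` through the shared handle.
struct ChunkReader {
    file: Arc<Mutex<File>>,
    pos: u64,
    remaining: u64,
}

impl Read for ChunkReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never read past the end of this chunk.
        let max = (buf.len() as u64).min(self.remaining) as usize;
        if max == 0 {
            return Ok(0);
        }
        let mut file = self.file.lock().unwrap();
        // The cursor is shared, so reposition it before every read.
        file.seek(SeekFrom::Start(self.pos))?;
        let n = file.read(&mut buf[..max])?;
        self.pos += n as u64;
        self.remaining -= n as u64;
        Ok(n)
    }
}

impl CachedFileReader {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), cache: Mutex::new(None) }
    }

    fn sync_chunk_reader(
        &self,
        start: u64,
        length: usize,
    ) -> io::Result<Box<dyn Read + Send + Sync>> {
        let mut cache = self.cache.lock().unwrap();
        // `File::open` runs only on the first call; later calls reuse the handle.
        if cache.is_none() {
            *cache = Some(Arc::new(Mutex::new(File::open(&self.path)?)));
        }
        let file = Arc::clone(cache.as_ref().unwrap());
        Ok(Box::new(ChunkReader { file, pos: start, remaining: length as u64 }))
    }
}
```

The tradeoff is that every `read` takes the lock and does an extra seek, so this only pays off if reopening the file is more expensive than that. Wrapping the returned reader in a `BufReader` would reduce how often the lock is taken.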
         



