tustvold commented on issue #1163:
URL: https://github.com/apache/arrow-rs/issues/1163#issuecomment-1011957344


   I had a brief play around with this and found the following.
   
   The write side is serial, so it should be possible to use standard 
library abstractions. The current trait topology will likely require using 
`Rc<RefCell<W>>` or similar. I tried using mutable borrows, but this runs into 
issues: the types need to be boxed in order to be used in traits (due to the 
lack of GATs), but by-value trait methods (i.e. `fn close(self)`) can't be 
called through a boxed trait object. That makes the ergonomics of such an API 
suck, as you need to manually `std::mem::drop`.
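
A minimal sketch of the `Rc<RefCell<W>>` shape described above, under stated assumptions: `ColumnSink`, `SharedSink`, `next_column` and `write_two_columns` are hypothetical names for illustration, not the parquet crate's traits. It also uses `self: Box<Self>` as one possible way to keep `close` callable on a boxed trait object, which is an option I did not try above.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Hypothetical stand-in for a per-column writer trait.
trait ColumnSink {
    fn write_values(&mut self, values: &[u8]) -> io::Result<()>;
    /// Taking `Box<Self>` keeps `close` callable on `Box<dyn ColumnSink>`.
    fn close(self: Box<Self>) -> io::Result<usize>;
}

/// Holds a shared handle to the output instead of a `&mut W` borrow.
struct SharedSink<W: Write> {
    inner: Rc<RefCell<W>>,
    written: usize,
}

impl<W: Write> ColumnSink for SharedSink<W> {
    fn write_values(&mut self, values: &[u8]) -> io::Result<()> {
        self.inner.borrow_mut().write_all(values)?;
        self.written += values.len();
        Ok(())
    }

    fn close(self: Box<Self>) -> io::Result<usize> {
        self.inner.borrow_mut().flush()?;
        Ok(self.written)
    }
}

/// Hands out a boxed sink over the same underlying writer.
fn next_column<W: Write + 'static>(inner: &Rc<RefCell<W>>) -> Box<dyn ColumnSink> {
    Box::new(SharedSink { inner: Rc::clone(inner), written: 0 })
}

/// Writes two columns one after the other (the write side is serial).
fn write_two_columns() -> io::Result<Vec<u8>> {
    let out: Rc<RefCell<Vec<u8>>> = Rc::new(RefCell::new(Vec::new()));

    let mut first = next_column(&out);
    first.write_values(b"abc")?;
    first.close()?; // consumes the box, no manual drop needed

    let mut second = next_column(&out);
    second.write_values(b"de")?;
    second.close()?;

    let bytes = out.borrow().clone();
    Ok(bytes)
}
```

Because each sink owns an `Rc` rather than a borrow, no lifetime parameter leaks into the boxed trait object. The cost is a runtime borrow check on every write.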
   
   The read side is more complicated. The problem can be seen in 
`SerializedRowGroupReader::get_column_page_reader`, which wants to return a 
`Box<dyn PageReader>` that can be used asynchronously (although not 
concurrently) with respect to others from the same row group. This is what 
requires `FileSource`: we want buffered reads on a shared file descriptor.
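
To illustrate what "buffered reads on a shared file descriptor" means here, a rough sketch follows. It is not the crate's actual `FileSource`; `RangeReader` and `interleaved_reads` are hypothetical names, and the `Arc<Mutex<R>>` sharing is an assumption made for the example. Each reader tracks its own position and seeks the shared handle before every read, so readers can be interleaved without disturbing each other.

```rust
use std::io::{self, BufReader, Cursor, Read, Seek, SeekFrom};
use std::sync::{Arc, Mutex};

/// Hypothetical reader over one byte range of a shared handle.
struct RangeReader<R> {
    inner: Arc<Mutex<R>>,
    pos: u64, // next absolute offset to read
    end: u64, // exclusive end of this reader's range
}

impl<R> RangeReader<R> {
    fn new(inner: Arc<Mutex<R>>, start: u64, len: u64) -> Self {
        RangeReader { inner, pos: start, end: start + len }
    }
}

impl<R: Read + Seek> Read for RangeReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.end.saturating_sub(self.pos);
        let len = remaining.min(buf.len() as u64) as usize;
        if len == 0 {
            return Ok(0);
        }
        let mut file = self
            .inner
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "poisoned lock"))?;
        // Another reader may have moved the cursor, so always seek first.
        file.seek(SeekFrom::Start(self.pos))?;
        let n = file.read(&mut buf[..len])?;
        self.pos += n as u64;
        Ok(n)
    }
}

/// Alternates between two buffered readers over the same handle.
fn interleaved_reads() -> io::Result<(Vec<u8>, Vec<u8>)> {
    let shared = Arc::new(Mutex::new(Cursor::new(b"AAAABBBB".to_vec())));
    // A tiny buffer so the example refills (and re-seeks) more than once.
    let mut a = BufReader::with_capacity(2, RangeReader::new(Arc::clone(&shared), 0, 4));
    let mut b = BufReader::with_capacity(2, RangeReader::new(Arc::clone(&shared), 4, 4));

    let (mut out_a, mut out_b) = (Vec::new(), Vec::new());
    let mut byte = [0u8; 1];
    for _ in 0..4 {
        a.read_exact(&mut byte)?;
        out_a.push(byte[0]);
        b.read_exact(&mut byte)?;
        out_b.push(byte[0]);
    }
    Ok((out_a, out_b))
}
```

Each reader only ever sees bytes from its own range, even though both share one cursor underneath.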
   
   It occurs to me that one of the things an async API able to support object 
stores will need is a sparse file abstraction, and ultimately this is what the 
read path wants. I'll therefore park this for now and see what shakes out of 
that.
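
For a rough idea of what a sparse file abstraction could look like, here is a purely hypothetical, synchronous sketch; `SparseFile` and its methods are invented for illustration and nothing like this exists in the crate as far as this comment is concerned. It stores only the byte ranges that have been fetched and answers reads that fall entirely inside one of them.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical sparse file: fetched chunks keyed by their start offset.
#[derive(Default)]
struct SparseFile {
    chunks: BTreeMap<u64, Vec<u8>>,
}

impl SparseFile {
    /// Records bytes fetched from `offset` (e.g. one object store range request).
    fn insert(&mut self, offset: u64, data: Vec<u8>) {
        self.chunks.insert(offset, data);
    }

    /// Returns the bytes for `range` if a single fetched chunk covers it.
    fn get(&self, range: Range<u64>) -> Option<&[u8]> {
        // Closest chunk starting at or before the requested offset.
        let (&start, data) = self.chunks.range(..=range.start).next_back()?;
        let lo = (range.start - start) as usize;
        let hi = range.end.checked_sub(start)? as usize;
        data.get(lo..hi)
    }
}
```

A read that returns `None` tells the caller which range still has to be fetched, which is the part an async API would presumably drive.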
   

