tustvold commented on issue #1163: URL: https://github.com/apache/arrow-rs/issues/1163#issuecomment-1011957344
I had a brief play around with this and found the following.

The write side is serial, so it should be possible to use standard library abstractions. The current trait topology will likely require using `Rc<RefCell<W>>` or similar. I tried using mutable borrows, but the types need to be boxed in order to be used in traits (due to the lack of GATs), and by-value trait methods (e.g. `fn close(self)`) can't be called through a boxed trait object. That makes the ergonomics of such an API suck, as you need to manually `std::mem::drop`.

The read side is more complicated; the problem can be seen in `SerializedRowGroupReader::get_column_page_reader`. This wants to return a `Box<dyn PageReader>` which can be used asynchronously (although not concurrently) with respect to the others from the same row group. This is what requires `FileSource`: we want buffered reads on a shared file descriptor.

It occurs to me that one of the things an async API able to support object stores will need is a sparse file abstraction; ultimately this is what the read path wants. I'll therefore park this for now and see what shakes out of that.
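For anyone following along, here is a minimal sketch of the `Rc<RefCell<W>>` idea for the serial write side. It is not the parquet crate's API: `ColumnSink`, `SharedColumnSink`, `next_column` and `write_two_columns` are hypothetical names made up for illustration, and taking `self: Box<Self>` in `close` is just one known Rust workaround for consuming a boxed trait object, not necessarily what the crate would adopt.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Hypothetical stand-in for a per-column writer; not the parquet crate's trait.
trait ColumnSink {
    fn write_values(&mut self, values: &[u8]) -> io::Result<()>;
    /// A plain `fn close(self)` could not be called on `Box<dyn ColumnSink>`,
    /// so this sketch takes `self: Box<Self>` instead (an assumption, see above).
    fn close(self: Box<Self>) -> io::Result<usize>;
}

/// Column writer that holds a shared handle to the underlying sink.
struct SharedColumnSink<W: Write> {
    sink: Rc<RefCell<W>>,
    written: usize,
}

impl<W: Write> ColumnSink for SharedColumnSink<W> {
    fn write_values(&mut self, values: &[u8]) -> io::Result<()> {
        // The mutable borrow lasts only for this statement, so strictly
        // serial use never hits a `RefCell` double-borrow panic.
        self.sink.borrow_mut().write_all(values)?;
        self.written += values.len();
        Ok(())
    }

    fn close(self: Box<Self>) -> io::Result<usize> {
        self.sink.borrow_mut().flush()?;
        Ok(self.written)
    }
}

/// Hands out boxed column writers that all write to the same sink.
fn next_column<W: Write + 'static>(sink: &Rc<RefCell<W>>) -> Box<dyn ColumnSink> {
    Box::new(SharedColumnSink {
        sink: Rc::clone(sink),
        written: 0,
    })
}

/// Writes two columns one after the other and returns the sink contents.
fn write_two_columns() -> io::Result<Vec<u8>> {
    let sink: Rc<RefCell<Vec<u8>>> = Rc::new(RefCell::new(Vec::new()));

    let mut first = next_column(&sink);
    first.write_values(b"abc")?;
    first.close()?;

    let mut second = next_column(&sink);
    second.write_values(b"de")?;
    second.close()?;

    // Both column writers have been consumed, so nothing else borrows the sink.
    let bytes = sink.borrow().clone();
    Ok(bytes)
}

fn main() -> io::Result<()> {
    let bytes = write_two_columns()?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```

The trade-off is that borrow checking moves to runtime: the sketch only stays panic-free because each column writer borrows the sink for the duration of a single call and the writers are used one after another.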
