On Wed, 5 Feb 2020 15:46:15 -0600
Wes McKinney <[email protected]> wrote:
>
> I'll comment in more detail on some of the other items in due course,
> but I think this should be handled by an implementation of
> RandomAccessFile (that wraps a naked RandomAccessFile) with some
> additional methods, rather than adding this to the abstract
> RandomAccessFile interface, e.g.
>
> class CachingInputFile : public RandomAccessFile {
>  public:
>   CachingInputFile(std::shared_ptr<RandomAccessFile> naked_file);
>   Status CacheRanges(...);
> };
>
> etc.

IMHO it may be more beneficial to expose it as an asynchronous API on
RandomAccessFile, for example:

class RandomAccessFile {
 public:
  struct Range {
    int64_t offset;
    int64_t length;
  };

  std::vector<Promise<std::shared_ptr<Buffer>>>
  ReadRangesAsync(std::vector<Range> ranges);
};

The reason is that some APIs such as the C++ AWS S3 API have their own
async support, which may be beneficial to use over a generic Arrow
thread-pool implementation.
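
To make the override point concrete, here is a rough sketch of how a generic fallback could sit in the base class while a backend with native async support overrides it. Everything in the `sketch` namespace is an assumption for illustration only: `Buffer` is a stand-in string type, `Promise<T>` is spelled as a plain `std::future<T>`, `ReadAt` is assumed as the synchronous primitive, and `InMemoryFile` is a hypothetical backend. It also uses one `std::async` task per range rather than a real thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for a real buffer type; here just an owned byte string.
using Buffer = std::string;

// Assumption: Promise<T> is a plain std::future<T> for brevity.
template <typename T>
using Promise = std::future<T>;

class RandomAccessFile {
 public:
  struct Range {
    int64_t offset;
    int64_t length;
  };

  virtual ~RandomAccessFile() = default;

  // Synchronous positional read that every implementation provides.
  virtual std::shared_ptr<Buffer> ReadAt(int64_t offset, int64_t length) = 0;

  // Generic fallback: one asynchronous task per range. A backend with its
  // own async machinery would override this method instead.
  // Note: the file object must outlive the returned futures.
  virtual std::vector<Promise<std::shared_ptr<Buffer>>> ReadRangesAsync(
      std::vector<Range> ranges) {
    std::vector<Promise<std::shared_ptr<Buffer>>> results;
    results.reserve(ranges.size());
    for (const Range& r : ranges) {
      results.push_back(std::async(
          std::launch::async, [this, r] { return ReadAt(r.offset, r.length); }));
    }
    return results;
  }
};

// Hypothetical in-memory backend, used only to exercise the fallback.
class InMemoryFile : public RandomAccessFile {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}

  std::shared_ptr<Buffer> ReadAt(int64_t offset, int64_t length) override {
    return std::make_shared<Buffer>(data_.substr(
        static_cast<std::size_t>(offset), static_cast<std::size_t>(length)));
  }

 private:
  std::string data_;
};

}  // namespace sketch
```

A caller would pass a list of ranges, keep the returned futures, and call get() on each one when the bytes are actually needed. An S3-backed file could override ReadRangesAsync to issue its native async requests and leave the fallback untouched.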

Also, by returning a Promise instead of simply caching the results, you
make it easier to handle the lifetime of the results.

(Promise<T> can be something like std::future<Result<T>>, though
std::future<> has annoying limitations and we may want to write our own
instead.)
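
For illustration, a minimal sketch of what the std::future<Result<T>> spelling could look like. The `Result<T>` below is a deliberately tiny stand-in (a value or an error string), and `ReadAsync` is a hypothetical producer. Neither is meant as the actual Arrow API.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Minimal stand-in for Result<T>: either a value or an error message.
template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)), ok_(true) {}

  static Result Error(std::string msg) {
    Result r;
    r.error_ = std::move(msg);
    return r;
  }

  bool ok() const { return ok_; }
  const T& value() const { return value_; }
  const std::string& error() const { return error_; }

 private:
  Result() = default;
  T value_{};
  std::string error_;
  bool ok_ = false;
};

// The spelling suggested above: a future carrying a Result.
template <typename T>
using Promise = std::future<Result<T>>;

// Stand-in for a real buffer type.
using Buffer = std::string;

// Hypothetical producer that completes on another thread, either with a
// buffer or with an error, depending on `fail`.
inline Promise<std::shared_ptr<Buffer>> ReadAsync(std::string data, bool fail) {
  return std::async(
      std::launch::async,
      [data = std::move(data), fail]() -> Result<std::shared_ptr<Buffer>> {
        if (fail) {
          return Result<std::shared_ptr<Buffer>>::Error("read failed");
        }
        return std::make_shared<Buffer>(data);
      });
}

}  // namespace sketch
```

The caller owns the buffer through the shared_ptr once get() returns, so nothing has to stay alive in a cache. Two of the limitations show up right away: get() can only be called once (the future is no longer valid() afterwards), and there is no way to attach a continuation.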

Regards

Antoine.