On Wed, 5 Feb 2020 15:46:15 -0600
Wes McKinney <wesmck...@gmail.com> wrote:
>
> I'll comment in more detail on some of the other items in due course,
> but I think this should be handled by an implementation of
> RandomAccessFile (that wraps a naked RandomAccessFile) with some
> additional methods, rather than adding this to the abstract
> RandomAccessFile interface, e.g.
>
> class CachingInputFile : public RandomAccessFile {
>  public:
>   CachingInputFile(std::shared_ptr<RandomAccessFile> naked_file);
>   Status CacheRanges(...);
> };
>
> etc.
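To make the wrapper idea concrete, here is a minimal sketch of what such a CachingInputFile might do: CacheRanges() eagerly reads the requested ranges from the wrapped file, and ReadAt() serves any read that falls entirely inside a cached range, falling back to the naked file otherwise. This is not Arrow's API. The RandomAccessFile base class below is a hypothetical stand-in that returns std::string instead of Status/Result and Buffer, and the Range struct and the cache layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for arrow::io::RandomAccessFile
// (no Status/Result, no Buffer).
class RandomAccessFile {
 public:
  virtual ~RandomAccessFile() = default;
  // Reads up to `length` bytes starting at `offset`.
  virtual std::string ReadAt(int64_t offset, int64_t length) = 0;
};

// Hypothetical byte range descriptor.
struct Range {
  int64_t offset;
  int64_t length;
};

// Wraps a "naked" file and serves reads from ranges cached in advance.
class CachingInputFile : public RandomAccessFile {
 public:
  explicit CachingInputFile(std::shared_ptr<RandomAccessFile> naked_file)
      : naked_file_(std::move(naked_file)) {}

  // Eagerly reads each range from the wrapped file and keeps it in memory.
  void CacheRanges(const std::vector<Range>& ranges) {
    for (const Range& r : ranges) {
      std::string data = naked_file_->ReadAt(r.offset, r.length);
      std::lock_guard<std::mutex> lock(mutex_);
      cache_[r.offset] = std::move(data);
    }
  }

  // Serves the read from a cached range if one fully contains it,
  // otherwise falls through to the wrapped file.
  std::string ReadAt(int64_t offset, int64_t length) override {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      // Last cached range starting at or before `offset`.
      auto it = cache_.upper_bound(offset);
      if (it != cache_.begin()) {
        --it;
        const int64_t start = it->first;
        const std::string& data = it->second;
        if (offset + length <= start + static_cast<int64_t>(data.size())) {
          return data.substr(static_cast<size_t>(offset - start),
                             static_cast<size_t>(length));
        }
      }
    }
    return naked_file_->ReadAt(offset, length);
  }

 private:
  std::shared_ptr<RandomAccessFile> naked_file_;
  std::mutex mutex_;
  std::map<int64_t, std::string> cache_;  // keyed by range start offset
};
```

Because the caching lives in the wrapper, the abstract interface stays unchanged and callers opt in only where prefetching helps.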
IMHO it may be more beneficial to expose this as an asynchronous API on
RandomAccessFile, for example:

  class RandomAccessFile {
   public:
    struct Range {
      int64_t offset;
      int64_t length;
    };

    std::vector<Promise<std::shared_ptr<Buffer>>> ReadRangesAsync(
        std::vector<Range> ranges);
  };

The reason is that some APIs, such as the C++ AWS S3 API, have their own
async support, which may be preferable to a generic Arrow thread-pool
implementation.

Also, by returning a Promise instead of simply caching the results, you
make it easier to handle the lifetime of the results.

(Promise<T> can be something like std::future<Result<T>>, though
std::future<> has annoying limitations and we may want to write our
own instead.)

Regards

Antoine.
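For comparison, a minimal sketch of the asynchronous variant, assuming std::future<std::string> as a stand-in for the Promise<std::shared_ptr<Buffer>> discussed above (errors propagate as exceptions through future::get() rather than as Result<T>). The default ReadRangesAsync() launches one std::async task per range; a backend with native async support, such as an S3-backed file, could override it. The base class, Range struct and method bodies are hypothetical, not Arrow code.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical byte range descriptor.
struct Range {
  int64_t offset;
  int64_t length;
};

// Hypothetical, simplified stand-in for arrow::io::RandomAccessFile.
class RandomAccessFile {
 public:
  virtual ~RandomAccessFile() = default;
  // Must be safe to call concurrently from several threads.
  virtual std::string ReadAt(int64_t offset, int64_t length) = 0;

  // Default implementation: one std::async task per range. Each future
  // owns its result, so the caller controls how long the data lives.
  // The file itself must outlive the returned futures.
  virtual std::vector<std::future<std::string>> ReadRangesAsync(
      const std::vector<Range>& ranges) {
    std::vector<std::future<std::string>> futures;
    futures.reserve(ranges.size());
    for (const Range& r : ranges) {
      futures.push_back(std::async(std::launch::async, [this, r] {
        return ReadAt(r.offset, r.length);
      }));
    }
    return futures;
  }
};
```

One of the std::future limitations shows up here: a future returned by std::async blocks in its destructor until the task finishes, and there is no way to attach a continuation, which is part of why a custom Promise type may be worth writing.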