On Wed, 5 Feb 2020 15:46:15 -0600
Wes McKinney <wesmck...@gmail.com> wrote:
> 
> I'll comment in more detail on some of the other items in due course,
> but I think this should be handled by an implementation of
> RandomAccessFile (that wraps a naked RandomAccessFile) with some
> additional methods, rather than adding this to the abstract
> RandomAccessFile interface, e.g.
> 
> class CachingInputFile : public RandomAccessFile {
>  public:
>    CachingInputFile(std::shared_ptr<RandomAccessFile> naked_file);
>    Status CacheRanges(...);
> };
> 
> etc.

IMHO it may be more beneficial to expose it as an asynchronous API on
RandomAccessFile, for example:

class RandomAccessFile {
 public:
  struct Range {
    int64_t offset;
    int64_t length;
  };

  std::vector<Promise<std::shared_ptr<Buffer>>>
    ReadRangesAsync(std::vector<Range> ranges);
};


The reason is that some APIs such as the C++ AWS S3 API have their own
async support, which may be beneficial to use over a generic Arrow
thread-pool implementation.
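
A rough sketch of how that could work, assuming ReadRangesAsync is a virtual method with a generic fallback that a backend with native async support overrides. Everything here is hypothetical and only illustrative: Buffer is a std::string stand-in, plain std::future replaces Promise, std::async stands in for the Arrow thread pool, and MemoryFile, NativeAsyncFile, ReadAt and ReadHelloWorld are made-up names. No real S3 call is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::string;  // hypothetical stand-in for an Arrow buffer

struct Range {
  int64_t offset;
  int64_t length;
};

using BufferFuture = std::future<std::shared_ptr<Buffer>>;

class RandomAccessFile {
 public:
  virtual ~RandomAccessFile() = default;

  // Blocking read that every backend implements (assumed name).
  virtual std::shared_ptr<Buffer> ReadAt(int64_t offset, int64_t length) = 0;

  // Generic fallback: run each blocking read as its own std::async task.
  virtual std::vector<BufferFuture> ReadRangesAsync(std::vector<Range> ranges) {
    std::vector<BufferFuture> futures;
    futures.reserve(ranges.size());
    for (const Range& r : ranges) {
      futures.push_back(std::async(std::launch::async, [this, r] {
        return ReadAt(r.offset, r.length);
      }));
    }
    return futures;
  }
};

// Backend without async support of its own: inherits the fallback.
class MemoryFile : public RandomAccessFile {
 public:
  explicit MemoryFile(std::string data) : data_(std::move(data)) {}

  std::shared_ptr<Buffer> ReadAt(int64_t offset, int64_t length) override {
    return std::make_shared<Buffer>(data_.substr(offset, length));
  }

 private:
  std::string data_;
};

// Backend with (pretend) native async support: overrides the fallback.
class NativeAsyncFile : public MemoryFile {
 public:
  using MemoryFile::MemoryFile;

  std::vector<BufferFuture> ReadRangesAsync(std::vector<Range> ranges) override {
    std::vector<BufferFuture> futures;
    futures.reserve(ranges.size());
    for (const Range& r : ranges) {
      std::promise<std::shared_ptr<Buffer>> promise;
      futures.push_back(promise.get_future());
      // A real backend would hand the promise to its SDK's completion
      // callback; this sketch simply fulfils it immediately.
      promise.set_value(ReadAt(r.offset, r.length));
    }
    ++native_batches;
    return futures;
  }

  int native_batches = 0;  // counts batches served by the override
};

// Reads two ranges through the base-class interface only.
std::string ReadHelloWorld(RandomAccessFile& file) {
  auto futures = file.ReadRangesAsync({Range{0, 5}, Range{6, 5}});
  return *futures[0].get() + " " + *futures[1].get();
}

void Example() {
  MemoryFile plain("hello world");
  NativeAsyncFile native("hello world");
  assert(ReadHelloWorld(plain) == "hello world");   // thread-based fallback
  assert(ReadHelloWorld(native) == "hello world");  // backend's own path
  assert(native.native_batches == 1);
}
```

The caller only ever sees RandomAccessFile, so whether the reads go through the generic fallback or the backend's own machinery is invisible to it.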

Also, by returning a Promise instead of simply caching the results, you
make it easier to handle the lifetime of the results.


(Promise<T> can be something like std::future<Result<T>>, though
std::future<> has annoying limitations, such as the lack of continuations
or completion callbacks, and we may want to write our own instead)
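
To make that concrete, here is a minimal sketch assuming Promise<T> is literally an alias for std::future<Result<T>>. The Result type below is a hypothetical simplification (a value plus an error string), Buffer is a std::string stand-in, and InMemoryFile is a made-up backend. It is only meant to show the shape of the API and the lifetime behaviour, not an actual Arrow implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal Result<T>: a value, or a non-empty error message.
template <typename T>
struct Result {
  T value{};
  std::string error;  // empty means success
  bool ok() const { return error.empty(); }
};

// The simplest candidate mentioned above.
template <typename T>
using Promise = std::future<Result<T>>;

using Buffer = std::string;  // hypothetical stand-in for an Arrow buffer

struct Range {
  int64_t offset;
  int64_t length;
};

class InMemoryFile {
 public:
  explicit InMemoryFile(std::string data)
      : data_(std::make_shared<const std::string>(std::move(data))) {}

  std::vector<Promise<std::shared_ptr<Buffer>>> ReadRangesAsync(
      std::vector<Range> ranges) {
    std::vector<Promise<std::shared_ptr<Buffer>>> promises;
    promises.reserve(ranges.size());
    for (const Range& r : ranges) {
      // The task holds its own reference to the data, so it does not
      // depend on the file object staying alive.
      promises.push_back(std::async(std::launch::async, [data = data_, r] {
        Result<std::shared_ptr<Buffer>> res;
        const int64_t size = static_cast<int64_t>(data->size());
        if (r.offset < 0 || r.length < 0 || r.offset > size ||
            r.length > size - r.offset) {
          res.error = "range out of bounds";
          return res;
        }
        res.value = std::make_shared<Buffer>(data->substr(r.offset, r.length));
        return res;
      }));
    }
    return promises;
  }

 private:
  std::shared_ptr<const std::string> data_;
};

void Example() {
  std::shared_ptr<Buffer> kept;
  {
    InMemoryFile file("hello world");
    auto promises = file.ReadRangesAsync({Range{0, 5}, Range{8, 10}});
    auto first = promises[0].get();
    auto second = promises[1].get();
    assert(first.ok());
    assert(!second.ok());  // each range reports its own error
    kept = first.value;
  }  // file and promises are gone here
  assert(*kept == "hello");  // the buffer lives as long as the caller holds it
}
```

Each range gets its own result and its own error, and the caller decides how long a buffer lives simply by holding or dropping the shared_ptr. There is no cache to size or evict.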

Regards

Antoine.

