On Thu, Feb 6, 2020, 12:41 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 06/02/2020 at 19:37, Wes McKinney wrote:
> > On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >> On 06/02/2020 at 16:26, Wes McKinney wrote:
> >>>
> >>> This seems useful, too. It becomes a question of where you want to
> >>> manage the cached memory segments, however you obtain them. I'm
> >>> arguing that we should not have much custom code in the Parquet
> >>> library to manage the prefetched segments (and providing the correct
> >>> buffer slice to each column reader when they need it), and instead
> >>> encapsulate this logic so it can be reused.
> >>
> >> I see, so RandomAccessFile would have some associative caching logic to
> >> find whether the exact requested range was cached and then return it to
> >> the caller?  That sounds doable.  How is lifetime handled then?  Are
> >> cached buffers kept on the RandomAccessFile until they are requested, at
> >> which point their ownership is transferred to the caller?
> >>
> >
> > This seems like too much to try to build into RandomAccessFile. I would
> > suggest a class that wraps a random access file and manages cached
> > segments and their lifetimes through explicit APIs.
>
> So Parquet would expect to receive that class rather than
> RandomAccessFile?  Or it would grow separate paths for it?
>
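One way to picture the wrapper Wes suggests (a class that owns prefetched segments and hands out slices, instead of caching logic inside RandomAccessFile) is the minimal sketch below. It is an illustration under stated assumptions, not Arrow's API: `Source`, `MemorySource`, `Slice`, and `CachedRangeFile` are hypothetical names, `Source` stands in for `arrow::io::RandomAccessFile`, and buffers are plain `std::string`s rather than Arrow buffers. It answers Antoine's lifetime question in one possible way: each slice shares ownership of its parent segment, so explicitly clearing the cache never invalidates a slice that a column reader still holds.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::io::RandomAccessFile: positional reads only.
struct Source {
  virtual ~Source() = default;
  virtual std::string ReadAt(int64_t offset, int64_t length) = 0;
};

// Hypothetical in-memory source that counts the physical reads it serves.
struct MemorySource : Source {
  explicit MemorySource(std::string data) : data_(std::move(data)) {}
  std::string ReadAt(int64_t offset, int64_t length) override {
    ++reads;
    return data_.substr(static_cast<size_t>(offset), static_cast<size_t>(length));
  }
  int reads = 0;

 private:
  std::string data_;
};

// Read-only view into a segment. Holding a Slice keeps the whole segment
// alive, so the cache is free to drop the segment at any time.
struct Slice {
  std::shared_ptr<const std::string> segment;
  size_t offset;
  size_t length;
  std::string ToString() const { return segment->substr(offset, length); }
};

class CachedRangeFile {
 public:
  explicit CachedRangeFile(std::shared_ptr<Source> source) : source_(std::move(source)) {}

  // Explicit API: read one (possibly coalesced) range now and keep it cached.
  void Cache(int64_t offset, int64_t length) {
    segments_[offset] = std::make_shared<const std::string>(source_->ReadAt(offset, length));
  }

  // Serve [offset, offset + length) as a slice of a cached segment that fully
  // contains it; on a miss, fall back to a direct read of just that range.
  // Assumes cached segments do not overlap, so only the nearest segment
  // starting at or before `offset` needs to be checked.
  Slice Read(int64_t offset, int64_t length) {
    assert(length >= 0);
    auto it = segments_.upper_bound(offset);  // first segment starting after offset
    if (it != segments_.begin()) {
      --it;  // last segment starting at or before offset
      const int64_t start = it->first;
      const auto& seg = it->second;
      if (offset + length <= start + static_cast<int64_t>(seg->size())) {
        return Slice{seg, static_cast<size_t>(offset - start), static_cast<size_t>(length)};
      }
    }
    auto direct = std::make_shared<const std::string>(source_->ReadAt(offset, length));
    return Slice{direct, 0, static_cast<size_t>(length)};
  }

  // Explicit API: drop every cached segment; outstanding Slices stay valid.
  void Clear() { segments_.clear(); }

 private:
  std::shared_ptr<Source> source_;
  std::map<int64_t, std::shared_ptr<const std::string>> segments_;  // keyed by start offset
};
```

Exact-range lookup, as in Antoine's question, is the special case where a request matches a cached segment exactly; serving sub-slices of larger segments is what would let one coalesced read feed several column readers.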

If the user opts in to coalesced prefetching then the RowGroupReader would
instantiate the wrapper under the hood. Public APIs (aside from new APIs in
ReaderProperties for prefetching) would be unchanged.
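With coalesced prefetching turned on, the reader would first merge nearby column-chunk byte ranges into fewer, larger reads before caching them. The following is a minimal sketch of that merging step, assuming a hypothetical `ReadRange` struct and a hypothetical `hole_size_limit` threshold (the number of unneeded bytes between two ranges that is still worth reading to avoid a separate I/O); neither name comes from the Parquet or Arrow sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte range, e.g. one column chunk within a row group.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge ranges separated by at most `hole_size_limit` bytes so that several
// small reads become fewer, larger ones. Overlapping or adjacent ranges are
// always merged.
std::vector<ReadRange> CoalesceRanges(std::vector<ReadRange> ranges, int64_t hole_size_limit) {
  assert(hole_size_limit >= 0);
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      if (r.offset - last_end <= hole_size_limit) {
        // Extend the previous range to cover r, including the gap before it.
        last.length = std::max(last_end, r.offset + r.length) - last.offset;
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}
```

For example, ranges `{100, 10}`, `{0, 10}`, `{15, 5}`, `{50, 10}` with `hole_size_limit = 5` coalesce to `{0, 20}`, `{50, 10}`, `{100, 10}`: the 5-byte gap after the first range is read anyway, while the larger gaps are skipped. Each coalesced range would then be fetched once and cached, with individual column chunks served as slices of it.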



>
>
>
> Regards
>
> Antoine.
>
