On 06/02/2020 at 19:40, Antoine Pitrou wrote:
>
> On 06/02/2020 at 19:37, Wes McKinney wrote:
>> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>> On 06/02/2020 at 16:26, Wes McKinney wrote:
>>>>
>>>> This seems useful, too. It becomes a question of where you want to
>>>> manage the cached memory segments, however you obtain them. I'm
>>>> arguing that we should not have much custom code in the Parquet
>>>> library to manage the prefetched segments (and provide the correct
>>>> buffer slice to each column reader when it needs it), but should
>>>> instead encapsulate this logic so it can be reused.
>>>
>>> I see, so RandomAccessFile would have some associative caching logic to
>>> find whether the exact requested range was cached and then return it to
>>> the caller? That sounds doable. How is lifetime handled then? Are
>>> cached buffers kept on the RandomAccessFile until they are requested, at
>>> which point their ownership is transferred to the caller?
>>
>> This seems like too much to try to build into RandomAccessFile. I would
>> suggest a class that wraps a random access file and manages cached
>> segments and their lifetimes through explicit APIs.
>
> So Parquet would expect to receive that class rather than
> RandomAccessFile? Or would it grow separate paths for it?
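For concreteness, the wrapping approach Wes describes could look something like the following sketch. This is hypothetical Python pseudocode, not Arrow's actual API: the class name, `prefetch`, and `read_at` are all invented for illustration. It caches exact (offset, length) ranges and releases ownership of a cached buffer to the caller on first request, which is one possible answer to the lifetime question above.

```python
import io

class RangeCachingFile:
    """Hypothetical wrapper around a seekable binary file that caches
    prefetched (offset, length) byte ranges through explicit APIs."""

    def __init__(self, raw):
        self._raw = raw
        self._cache = {}  # (offset, length) -> bytes

    def prefetch(self, ranges):
        """Read and cache the given [(offset, length), ...] ranges."""
        for offset, length in ranges:
            self._raw.seek(offset)
            self._cache[(offset, length)] = self._raw.read(length)

    def read_at(self, offset, length):
        """Return the cached buffer if this exact range was prefetched,
        removing it from the cache (ownership moves to the caller);
        otherwise fall back to reading the underlying file."""
        buf = self._cache.pop((offset, length), None)
        if buf is not None:
            return buf
        self._raw.seek(offset)
        return self._raw.read(length)

f = RangeCachingFile(io.BytesIO(b"abcdefghij"))
f.prefetch([(2, 3), (7, 2)])
print(f.read_at(2, 3))  # served from cache, then dropped from it
print(f.read_at(0, 2))  # cache miss: direct read from the raw file
```

A Parquet reader would then either take this wrapper in place of a plain RandomAccessFile, or the wrapper could itself implement the RandomAccessFile interface so existing read paths stay unchanged.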
Actually, at a higher level: is the goal to prefetch for sequential consumption of row groups?

Regards,

Antoine.