On 06/02/2020 at 19:40, Antoine Pitrou wrote:
> 
> On 06/02/2020 at 19:37, Wes McKinney wrote:
>> On Thu, Feb 6, 2020, 12:12 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>> On 06/02/2020 at 16:26, Wes McKinney wrote:
>>>>
>>>> This seems useful, too. It becomes a question of where you want to
>>>> manage the cached memory segments, however you obtain them. I'm
>>>> arguing that we should not have much custom code in the Parquet
>>>> library to manage the prefetched segments (and provide the correct
>>>> buffer slice to each column reader when it needs it), and should
>>>> instead encapsulate this logic so it can be reused.
>>>
>>> I see, so RandomAccessFile would have some associative caching logic to
>>> find whether the exact requested range was cached and then return it to
>>> the caller?  That sounds doable.  How is lifetime handled then?  Are
>>> cached buffers kept on the RandomAccessFile until they are requested, at
>>> which point their ownership is transferred to the caller?
>>>
>>
>> This seems like too much to try to build into RandomAccessFile. I would
>> suggest a class that wraps a random access file and manages cached segments
>> and their lifetimes through explicit APIs.
> 
> So Parquet would expect to receive that class rather than
> RandomAccessFile?  Or would it grow separate paths for it?
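
For concreteness, the kind of wrapper being discussed might look roughly
like the sketch below. The class name, its methods and its caching policy
are hypothetical (not an existing Arrow or Parquet API); it assumes
arrow::io::RandomAccessFile and the Result-returning ReadAt overload, and
simply illustrates the idea: ranges are prefetched up front, held by the
wrapper, and handed over (with ownership) when the exact same range is
requested.

// Hypothetical sketch only -- not an existing Arrow API.
#include <map>
#include <memory>
#include <utility>
#include <vector>

#include "arrow/buffer.h"
#include "arrow/io/interfaces.h"
#include "arrow/result.h"
#include "arrow/status.h"

class CachingFileWrapper {
 public:
  explicit CachingFileWrapper(std::shared_ptr<arrow::io::RandomAccessFile> file)
      : file_(std::move(file)) {}

  // Read and cache the given (offset, length) ranges, e.g. the column
  // chunk ranges of the row groups about to be consumed.
  arrow::Status Prefetch(const std::vector<std::pair<int64_t, int64_t>>& ranges) {
    for (const auto& range : ranges) {
      ARROW_ASSIGN_OR_RAISE(auto buffer, file_->ReadAt(range.first, range.second));
      cache_[range] = std::move(buffer);
    }
    return arrow::Status::OK();
  }

  // Return the cached buffer for an exact previously-prefetched range,
  // transferring ownership to the caller; otherwise fall through to the
  // underlying file.
  arrow::Result<std::shared_ptr<arrow::Buffer>> ReadAt(int64_t offset, int64_t length) {
    auto it = cache_.find({offset, length});
    if (it != cache_.end()) {
      auto buffer = std::move(it->second);
      cache_.erase(it);  // the wrapper no longer keeps it alive
      return buffer;
    }
    return file_->ReadAt(offset, length);
  }

 private:
  std::shared_ptr<arrow::io::RandomAccessFile> file_;
  std::map<std::pair<int64_t, int64_t>, std::shared_ptr<arrow::Buffer>> cache_;
};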

Actually, at a higher level, is the goal to prefetch for sequential
consumption of row groups?
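
If that is the goal, a rough usage sketch (again hypothetical, building
on the wrapper above and the parquet-cpp metadata accessors; the helper
name and the `cached_file` variable are made up) could compute the column
chunk byte ranges of the next row group and prefetch them before its
column readers run:

// Hypothetical usage sketch: prefetch one row group's column chunks.
#include <memory>
#include <utility>
#include <vector>

#include "parquet/metadata.h"

std::vector<std::pair<int64_t, int64_t>> ColumnChunkRanges(
    const parquet::FileMetaData& metadata, int row_group) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  std::unique_ptr<parquet::RowGroupMetaData> rg = metadata.RowGroup(row_group);
  for (int col = 0; col < rg->num_columns(); ++col) {
    std::unique_ptr<parquet::ColumnChunkMetaData> chunk = rg->ColumnChunk(col);
    // A column chunk starts at its dictionary page if it has one, else
    // at its first data page, and spans total_compressed_size() bytes.
    const int64_t start = chunk->has_dictionary_page()
                              ? chunk->dictionary_page_offset()
                              : chunk->data_page_offset();
    ranges.emplace_back(start, chunk->total_compressed_size());
  }
  return ranges;
}

// e.g. before consuming row group i:
//   ARROW_RETURN_NOT_OK(cached_file.Prefetch(ColumnChunkRanges(*metadata, i)));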

Regards

Antoine.
