Re: [Discuss] Proposal for optimizing Datasets over S3/object storage

Antoine Pitrou Thu, 06 Feb 2020 11:30:19 -0800


On 06/02/2020 at 20:20, Wes McKinney wrote:
>> Actually, on a more high-level basis, is the goal to prefetch for
>> sequential consumption of row groups?
>>
> 
> Essentially yes. One "easy" optimization is to prefetch the entire
> serialized row group. This is an evolution of that idea where we want to
> prefetch only the needed parts of a row group in a minimum number of IO
> calls (consider reading the first 10 columns from a file with 1000 columns
> -- so we want to do one IO call instead of 10 like we do now).
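For illustration, here is a minimal sketch of the "fewest IO calls" idea in Python. It assumes the byte offset and length of each needed column chunk are already known from the Parquet footer metadata. The names `coalesce_ranges`, `hole_size_limit` and `range_size_limit` are hypothetical and are not an existing Arrow API.

```python
from typing import List, Tuple

# Hypothetical representation: (offset, length) in bytes within the file.
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], hole_size_limit: int,
                    range_size_limit: int) -> List[Range]:
    """Merge the byte ranges of needed column chunks into fewer reads.

    Neighbouring ranges are merged when the gap between them is at most
    hole_size_limit bytes (the unneeded gap is read and discarded) and the
    merged read stays within range_size_limit bytes.
    """
    # Drop empty ranges and process the rest in file order.
    ordered = sorted(r for r in ranges if r[1] > 0)
    if not ordered:
        return []
    merged: List[Range] = []
    start, end = ordered[0][0], ordered[0][0] + ordered[0][1]
    for offset, length in ordered[1:]:
        gap = offset - end  # negative if the ranges overlap
        new_end = max(end, offset + length)
        if gap <= hole_size_limit and new_end - start <= range_size_limit:
            end = new_end  # extend the current read
        else:
            merged.append((start, end - start))  # close the current read
            start, end = offset, offset + length
    merged.append((start, end - start))
    return merged


# Hypothetical layout: 1000 contiguous column chunks of 1000 bytes each;
# reading the first 10 columns becomes a single request.
first_ten = [(i * 1000, 1000) for i in range(10)]
print(coalesce_ranges(first_ten, hole_size_limit=0, range_size_limit=1 << 20))
```

On object storage each request carries significant latency, so reading a small unneeded hole is usually cheaper than issuing an extra request; the size cap keeps any single buffered read bounded. The same routine applies unchanged to whatever subset of column chunks (or row groups) has been selected.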


There are no situations where you would want to consume a scattered
subset of row groups (e.g. predicate pushdown)?

Regards

Antoine.
