On Thu, Feb 6, 2020 at 1:30 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 06/02/2020 à 20:20, Wes McKinney a écrit :
> >> Actually, on a more high-level basis, is the goal to prefetch for
> >> sequential consumption of row groups?
> >>
> >
> > Essentially yes. One "easy" optimization is to prefetch the entire
> > serialized row group. This is an evolution of that idea where we want to
> > prefetch only the needed parts of a row group in a minimum number of IO
> > calls (consider reading the first 10 columns from a file with 1000 columns
> > -- so we want to do one IO call instead of 10 like we do now).
>
> There are no situations where you would want to consume a scattered
> subset of row groups (e.g. predicate pushdown)?

There are. If it can be demonstrated that there are performance gains
resulting from IO optimizations involving multiple row groups then I
see no reason not to implement them.

> Regards
>
> Antoine.
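
To make the "one IO call instead of 10" idea concrete, here is a minimal sketch of coalescing the byte ranges of the needed column chunks before issuing reads. It is an illustration only, not how the Arrow Parquet reader is implemented: the function name `coalesce_ranges`, the `max_gap` parameter, and the example offsets are all hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge byte ranges that overlap or lie within max_gap bytes of each other.

    Hypothetical helper for illustration; each returned range stands for
    one IO call.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # nothing to read for an empty range
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= max_gap:
                # Close enough to the previous range: extend it.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Assumed layout: 10 contiguous column chunks of 100 bytes each,
# starting at byte 4 of the file.
chunks = [(4 + i * 100, 100) for i in range(10)]
reads = coalesce_ranges(chunks)  # a single (offset, length) read
```

The same helper also covers the scattered case raised above: ranges from row groups that are far apart stay separate reads unless `max_gap` is set large enough to make over-reading the bytes in between worthwhile, which is the kind of trade-off that would need to be measured.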