On 18/03/2020 at 17:36, David Li wrote:
> Hi all,
>
> Thanks to Antoine for implementing the core read coalescing logic.
>
> We've taken a look at what else needs to be done to get this working,
> and it sounds like the following changes would be worthwhile,
> independent of the rest of the optimizations we discussed:
>
> - Add benchmarks of the current Parquet reader with the current S3File
>   (and other file implementations) so we can track
>   improvements/regressions

Instead of S3, you can use the Slow streams and Slow filesystem
implementations. It may better protect against varying external conditions.

> - Use the coalescing inside the Parquet reader (even without a column
>   filter hint - this would subsume PARQUET-1698)

I'm assuming this would be done at the RowGroupReader level, right?

> - In coalescing, split large read ranges into smaller ones (this would
>   further improve on PARQUET-1698 by taking advantage of parallel reads)

I don't understand what the "advantage" would be. Can you elaborate?

Regards

Antoine.
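
As a rough illustration of the two coalescing items quoted above, here is a minimal Python sketch of merging nearby read ranges and then splitting large ones. This is not Arrow's actual implementation: the function names, the (offset, length) tuples, and the hole_size_limit and max_size parameters are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical representation of a read request: (offset, length) in bytes.
Range = Tuple[int, int]


def coalesce(ranges: List[Range], hole_size_limit: int) -> List[Range]:
    """Merge ranges separated by a gap of at most hole_size_limit bytes."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= hole_size_limit:
                # Close enough (or overlapping): extend the previous range.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


def split_large(ranges: List[Range], max_size: int) -> List[Range]:
    """Cut every range into consecutive chunks of at most max_size bytes."""
    chunks: List[Range] = []
    for offset, length in ranges:
        end = offset + length
        while offset < end:
            size = min(max_size, end - offset)
            chunks.append((offset, size))
            offset += size
    return chunks


requests = [(0, 100), (110, 50), (1000, 5000)]
coalesced = coalesce(requests, hole_size_limit=16)
split = split_large(coalesced, max_size=2048)
print(coalesced)
print(split)
```

In this sketch the first two requests are merged because the 10-byte hole between them is under the limit, and the 5000-byte request is cut into three chunks. Each resulting chunk could then be issued as a separate read, which is presumably where the parallel reads mentioned in the quoted item would come in.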