On Wed, Jun 23, 2021 at 3:03 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On Tue, 22 Jun 2021 19:04:49 -0500
> Wes McKinney <wesmck...@gmail.com> wrote:
> > Some on this list might be interested in a new paper out of CMU/MIT
> > about the use of selection vectors and bitmaps for handling the
> > intermediate results of filters:
> >
> > https://db.cs.cmu.edu/papers/2021/ngom-damon2021.pdf
> >
> > The research was done in the context of NoisePage which uses Arrow as
> > its memory format. I found some of the observations related to AVX512
> > to be interesting.
>
> Too bad they didn't compare with the simple strategy of materializing
> filtered results.
I think this strategy has been consistently rejected in vectorized query
engines on empirical performance grounds. "Pushing down" the filter into
aggregate or elementwise kernels (to avoid a temporary materialization /
memory allocation) is how the systems I'm aware of work. I'm not sure of
the best reference for learning more, but Marcin Zukowski's PhD thesis is
here (he went on to build Vectorwise and then Snowflake):

https://www.semanticscholar.org/paper/Balancing-vectorized-query-execution-with-storage-Zukowski/8ded53b9756d7a45065f92e91946c8049e92eecd

>
> Another issue is that the "Full" computation strategy is delicate to
> implement when kernels may raise errors (e.g. checked arithmetic).
>
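To make the strategies in this thread concrete, here is a minimal Python sketch. It is illustrative only: plain Python lists stand in for Arrow arrays, the filter result is packed into an int bitmap, and `checked_add`, `to_bitmap`, `selection_vector`, `sum_materialized`, `sum_selected`, `add_selected` and `add_full` are hypothetical helpers, not Arrow or NoisePage APIs. It contrasts (1) materializing the filtered column, (2) pushing a selection vector into the kernel, and (3) the "Full" strategy (as I understand the term: compute every row and carry the bitmap alongside), where a checked kernel can fail on a row the filter already discarded.

```python
"""Filter strategies over a batch: materialize, selection vector, full+bitmap.

Illustrative sketch only: plain Python lists stand in for Arrow arrays and
checked_add is a hypothetical stand-in for a checked int32 arithmetic kernel.
"""

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def checked_add(a, b):
    """Hypothetical checked int32 addition that raises on overflow."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} overflows int32")
    return result


def to_bitmap(mask):
    """Pack a list of booleans into an int bitmap (bit i set = row i passes)."""
    bitmap = 0
    for i, keep in enumerate(mask):
        if keep:
            bitmap |= 1 << i
    return bitmap


def selection_vector(bitmap, length):
    """Convert a filter bitmap into the list of selected row indices."""
    return [i for i in range(length) if (bitmap >> i) & 1]


def sum_materialized(values, bitmap):
    """Strategy 1: copy surviving rows into a temporary, then aggregate."""
    filtered = [v for i, v in enumerate(values) if (bitmap >> i) & 1]
    return sum(filtered)


def sum_selected(values, sel):
    """Strategy 2: push the filter into the aggregate; no temporary column."""
    total = 0
    for i in sel:
        total += values[i]
    return total


def add_selected(xs, ys, sel):
    """Elementwise kernel run only on selected rows (others left as None)."""
    out = [None] * len(xs)
    for i in sel:
        out[i] = checked_add(xs[i], ys[i])
    return out


def add_full(xs, ys, bitmap):
    """Strategy 3 ("Full"): compute every row, carry the bitmap alongside.

    Filtered-out rows are computed too, so a checked kernel can raise on a
    row that the query has already discarded.
    """
    return [checked_add(x, y) for x, y in zip(xs, ys)], bitmap


if __name__ == "__main__":
    xs = [1, 2, INT32_MAX, 4]
    ys = [10, 20, 1, 40]
    bitmap = to_bitmap([True, True, False, True])  # row 2 is filtered out
    sel = selection_vector(bitmap, len(xs))        # [0, 1, 3]

    print(sum_materialized(xs, bitmap))  # 7
    print(sum_selected(xs, sel))         # 7
    print(add_selected(xs, ys, sel))     # [11, 22, None, 44]
    try:
        add_full(xs, ys, bitmap)
    except OverflowError as exc:
        # Row 2 (INT32_MAX + 1) overflows even though it was filtered out.
        print("Full strategy failed:", exc)
```

In a real engine these loops would be vectorized, and a "Full" implementation would have to suppress or defer errors for rows whose bit is unset, which is the delicacy mentioned above.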