On Wed, Jun 23, 2021 at 3:03 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On Tue, 22 Jun 2021 19:04:49 -0500
> Wes McKinney <wesmck...@gmail.com> wrote:
> > Some on this list might be interested in a new paper out of CMU/MIT
> > about the use of selection vectors and bitmaps for handling the
> > intermediate results of filters:
> >
> > https://db.cs.cmu.edu/papers/2021/ngom-damon2021.pdf
> >
> > The research was done in the context of NoisePage which uses Arrow as
> > its memory format. I found some of the observations related to AVX512
> > to be interesting.
>
> Too bad they didn't compare with the simple strategy of materializing
> filtered results.

I think this strategy has been rejected consistently in vectorized
query engines on empirical performance grounds. "Pushing down" the
filter into aggregate or elementwise kernels (to avoid a temporary
materialization / memory allocation) is how the systems I'm aware of
work.
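To make the terms concrete, here is a minimal plain-Python sketch of the two filter representations from the paper (selection vectors and bitmaps). It also shows two ways of summing the filtered rows: materializing them into a temporary array first, or pushing the filter down into the aggregation loop. This is not Arrow or NoisePage code, and all function names are hypothetical. It only illustrates the data flow. The performance trade-offs discussed in this thread (allocation, SIMD, branching) do not show up in Python.

```python
# Plain-Python sketch; hypothetical names, not Arrow or NoisePage APIs.

def filter_to_selection_vector(values, threshold):
    # Selection vector: indices of the rows that pass `value > threshold`.
    return [i for i, v in enumerate(values) if v > threshold]

def filter_to_bitmap(values, threshold):
    # Bitmap: one bit per row (bit i set if row i passes), packed
    # least-significant-bit first into a Python int for simplicity.
    bitmap = 0
    for i, v in enumerate(values):
        if v > threshold:
            bitmap |= 1 << i
    return bitmap

def sum_materialized(values, selection):
    # Materialize the filtered rows into a temporary array, then aggregate.
    filtered = [values[i] for i in selection]
    return sum(filtered)

def sum_with_selection_vector(values, selection):
    # Filter pushed down into the aggregate: visit only the selected rows,
    # with no temporary array allocated.
    total = 0
    for i in selection:
        total += values[i]
    return total

def sum_with_bitmap(values, bitmap):
    # Filter pushed down into the aggregate: scan every row and add only
    # the rows whose bit is set.
    total = 0
    for i, v in enumerate(values):
        if (bitmap >> i) & 1:
            total += v
    return total
```

For values [3, 15, 8, 42, 11, 7] and threshold 10, the selection vector is [1, 3, 4], the bitmap is 0b11010, and all three sums return 68. Only sum_materialized builds an intermediate array.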

I'm not sure what the best reference is to learn more about this, but
Marcin Zukowski's PhD thesis (he went on to build Vectorwise and then
Snowflake) is here:

https://www.semanticscholar.org/paper/Balancing-vectorized-query-execution-with-storage-Zukowski/8ded53b9756d7a45065f92e91946c8049e92eecd

>
> Another issue is that the "Full" computation strategy is delicate to
> implement when kernels may raise errors (e.g. checked arithmetic).
>
>
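The following is a minimal sketch of why checked arithmetic complicates computing a kernel over every row and discarding the unselected results afterwards. It uses the same plain-Python style. The names are hypothetical, and `checked_add` assumes int32 overflow semantics (raise instead of wrapping). A row that the filter already removed can still raise an error. Computing only the selected rows avoids that.

```python
# Plain-Python sketch; hypothetical names, assuming int32 checked addition.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def checked_add(a, b):
    # Raise on int32 overflow instead of silently wrapping around.
    r = a + b
    if r < INT32_MIN or r > INT32_MAX:
        raise OverflowError(f"int32 overflow: {a} + {b}")
    return r

def add_full(a, b, selection):
    # Compute over every row, then keep only the selected results.
    # An overflow on a filtered-out row still surfaces as an error.
    out = [checked_add(x, y) for x, y in zip(a, b)]
    return [out[i] for i in selection]

def add_selected_only(a, b, selection):
    # Compute only over the selected rows; filtered-out rows never run.
    return [checked_add(a[i], b[i]) for i in selection]
```

With a = [1, INT32_MAX, 5], b = [2, 1, 6] and selection [0, 2], add_selected_only returns [3, 11], while add_full raises OverflowError because of row 1, which the filter had excluded. Making the compute-everything variant correct means the kernel must know which rows are selected, so it can suppress their errors.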
