On Wed, Aug 9, 2023 at 11:15 AM Tomas Vondra
<[email protected]> wrote:
> Cool. I'll try to build my own set of examples that I find interesting
> either because it's what the patch aims to help with, or because I
> expect it to be problematic for some reason. And then we can compare.
That would be great. I definitely want to make this a collaborative thing.
> Yup, I agree with that principle. The AM can evaluate the expression in
> a smarter way, without the visibility checks.
Attached text document (which I guess might be my first draft) is an
attempt to put the discussion up until this point on a more formal
footing.
The format here tries to reduce the principles to a handful of bullet
points. For example, one line reads:
+ Index quals are better than equivalent index filters because bitmap
index scans can only use index quals
I'm pretty sure that these are all points that you and I both agree on
already. But you should confirm that. And make your own revisions, as
you see fit.
It's definitely possible that I overlooked an interesting and durable
advantage that index filters have. If there is some other way in which
index filters are likely to remain the best and only viable approach,
then we should note that. I just went with the really obvious case of
an expression that definitely needs visibility checks to avoid ever
throwing a division-by-zero error, related to some invisible tuple.
It's an extreme case in that it focuses on requirements that seem just
about unavoidable in any future world. (Come to think of it, the
biggest and most durable advantage for index filters is probably just
how general they are, which I do mention.)
--
Peter Geoghegan
General principles for index quals
==================================
Tomas Vondra's "evaluate filters on the index tuple" patch works by making
expressions that didn't become index quals into index filters (not table
filters) where possible. The general idea is that the planner should recognize
a kind of "qual hierarchy": it is always preferable for each index path to use
true index quals where possible, but, failing that, it is still preferable to
make the predicate into an index filter rather than a table filter.
In planner:
Index Quals > Index Filter > Table Filter
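The hierarchy can be made concrete with a toy Python model -- a sketch, not
PostgreSQL code; every name and number in it is invented for illustration. It
counts heap page fetches for the same predicate placed at each level of the
hierarchy (it deliberately does not model the index-side advantages that true
index quals additionally have, such as scan positioning and early termination):

```python
# Toy model of the qual hierarchy (a sketch, not PostgreSQL code).
# We scan an index on (four, two) for "four = 1 AND two <> 1" and
# count how many distinct heap pages each strategy must fetch.

# 800 index tuples, each pointing at one of 100 heap pages.
index_tuples = [
    {"four": i % 4, "two": i % 2, "heap_page": i % 100}
    for i in range(800)
]

def scan(index_quals, index_filter, table_filter):
    heap_pages = set()
    results = 0
    for t in index_tuples:
        if not index_quals(t):
            continue  # rejected inside the index AM
        if not index_filter(t):
            continue  # evaluated on the index tuple: no heap access needed
        heap_pages.add(t["heap_page"])  # table filters require the heap tuple
        if table_filter(t):
            results += 1
    return results, len(heap_pages)

true_ = lambda t: True

# The same predicate, placed at each of the three levels:
as_table_filter = scan(lambda t: t["four"] == 1, true_,
                       lambda t: t["two"] != 1)
as_index_filter = scan(lambda t: t["four"] == 1,
                       lambda t: t["two"] != 1, true_)
as_index_qual   = scan(lambda t: t["four"] == 1 and t["two"] != 1,
                       true_, true_)

print(as_table_filter)  # (0, 25): no rows, yet 25 heap pages fetched
print(as_index_filter)  # (0, 0): same rows, zero heap pages
print(as_index_qual)    # (0, 0): same again, plus AM-side benefits not modeled
```

The toy makes the first step of the hierarchy obvious (table filters fetch
heap pages that index filters avoid); the qual-versus-filter step only shows
up in index-side behavior, which the examples below demonstrate with real
plans.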
Although this general design makes sense, it nevertheless can lead to
suboptimal outcomes: situations where the planner
chooses a plan that is obviously not as good as some hypothetical other plan
that manages to use true index quals
through some (currently unimplemented) other means.
This isn't really a problem with the patch itself; it is more a case of the
planner lacking complementary optimizations
that make predicates into true index quals before the mechanism added by the
patch is even reached. The "catch-all"
nature of the patch isn't a problem to be fixed -- its very general nature is a
key strength of the approach (it uses
full expression evaluation, which is much more general than any approach that
ultimately reduces predicates to indexable
operators from an opclass).
This document lists certain specific scenarios where the patch doesn't quite do
as well as one would hope. That puts
things on a good footing for adding other complementary techniques later on.
Theoretically this will be independent
work that could have happened at any point in the past. But the patch has
brought these other related issues into
sharp relief for the first time -- so it doesn't necessarily feel all that
independent.
A working "taxonomy" for all of these techniques seems useful, even one such as
this, that's generally acknowledged to
be incomplete and far from perfect.
Why index filters are better than equivalent table filters
----------------------------------------------------------
This part is very simple.
+ They're better because they have a decent chance of avoiding significant
amounts of heap accesses for expression evaluation.
This extends some of the benefits that index-only scans have to plain index
scans.
Why index quals are better than equivalent index filters
--------------------------------------------------------
The advantages that index quals have over index filters can be much less clear
than one would hope because index filters
are confusingly similar to index quals. But index quals are significantly
better, overall. This is guaranteed to be
true, assuming we're comparing "equivalent" index quals and index filters --
logically interchangeable forms of quals
with different physical/runtime characteristics.
(An index filter might be better than an index qual when it happens to be much
more selective, because it filters tuples
in some fundamentally different way, but that's not relevant to this
discussion. We're focussed on clear-cut cases,
where we can speak with the most confidence, since that's a useful basis for
further improvements later on.)
We may even be missing out on the advantages of true index quals in cases where
the patch actually offers big
improvements compared to what was possible in Postgres 16, since, of course,
index filters can still be much better
than table filters. To make matters more confusing, the consequences may only
be apparent at certain times, for a given
query. And minor variants of the same query can be very sensitive to whether
true index quals or mere index filters
could be used. We must keep all of this in perspective, which is hard.
Example 1: bitmap index scan, prefer index quals
------------------------------------------------
+ Index quals are better than equivalent index filters because bitmap index
scans can only use index quals
If we have an index on "tenk1 (four, two)", then we miss out on the opportunity
to eliminate heap accesses for a query
like this one:
select ctid, *
from tenk1
where four = 1 and two != 1;
This gives us a "bad" plan (at least in relative-to-ideal-case terms), since
there is an excessive amount of heap access:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on public.tenk1  (cost=25.35..407.85 rows=1250 width=250) (actual time=0.707..0.707 rows=0 loops=1)
   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Recheck Cond: (tenk1.four = 1)
   Filter: (tenk1.two <> 1)
   Rows Removed by Filter: 2500
   Heap Blocks: exact=345
   Buffers: shared hit=349
   ->  Bitmap Index Scan on tenk1_four_two_idx  (cost=0.00..25.04 rows=2500 width=0) (actual time=0.073..0.074 rows=2500 loops=1)
         Index Cond: (tenk1.four = 1)
         Buffers: shared hit=4
 Planning Time: 0.037 ms
 Execution Time: 0.719 ms
Here we see a bitmap index scan plan (that uses our composite index), which
makes sense overall. But the details beyond
that make no sense -- since we're using table filter quals for "two". It turns
out that the bitmap heap scan will
access every single heap page in the tenk1 table as a result, even though we
could have literally avoided all heap
accesses had we been able to push down the != as an index qual.
As things stand, the patch gives an enormous advantage to index scans over
bitmap index scans. But the costing fails to
capture that difference. That may be okay in many individual cases, but it
seems questionable here. Intuitively, there
is no particular reason why this particular case (involving a simple
inequality) couldn't find a way to use index quals
instead of relying on index filters. That would be in keeping with the
traditional tendency of index scans to do
approximately the same thing as similar bitmap index scans (just in a slightly
different order).
This example relies on an understanding that we really could make != into just
another indexable operator, since it is
already a kind of an honorary member of the btree opclass. That is pretty
clear cut -- nbtree knows how to do a bunch
of fairly similar tricks already.
Just for context, here is the patch's "good" index scan plan (or better plan,
at least). This plan does the trick that
we'd ideally see from both index scans and bitmap index scans alike (namely, it
avoids all heap accesses):
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_four_two_idx on public.tenk1  (cost=0.29..709.59 rows=1250 width=250) (actual time=0.300..0.300 rows=0 loops=1)
   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.four = 1)
   Index Filter: (tenk1.two <> 1)
   Rows Removed by Index Recheck: 2500
   Filter: (tenk1.two <> 1)
   Buffers: shared hit=5
 Planning Time: 0.035 ms
 Execution Time: 0.311 ms
(Though of course it should actually make everything into "Index Cond:" index
quals -- we'd need to teach nbtree how to do that in some related patch that I
have yet to start work on.)
Costing each variant becomes much less important if we can find a way to make
all useful runtime behavior available with
both plan variants.
Example 2: using index filters is inevitable and natural
--------------------------------------------------------
Similarly, we can imagine a somewhat similar query where the only possible
approach will be index filters put there by
the patch. Here we use the same index as before, on "tenk1 (four, two)":
select ctid, *
from tenk1
where four = 1 and 2/two = 1;
This gives us a good plan:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_four_two_idx on public.tenk1  (cost=0.29..59.14 rows=14 width=250) (actual time=0.321..0.321 rows=0 loops=1)
   Output: ctid, unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.four = 1)
   Index Filter: ((2 / tenk1.two) = 1)
   Rows Removed by Index Recheck: 2500
   Filter: ((2 / tenk1.two) = 1)
   Buffers: shared hit=5
 Planning Time: 0.043 ms
 Execution Time: 0.334 ms
Here we must do a visibility check in order to know ahead of time that any
division-by-zero errors are limited to those
index tuples actually visible to our MVCC snapshot. That factor is what makes
it very clear that the only sensible way
to do this is by using an index filter -- index AMs cannot be expected to do
true expression evaluation.
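The hazard can be sketched with a toy Python model -- not PostgreSQL code; the
tuples and the "visible" flag are invented for illustration:

```python
# Toy illustration of why "2/two = 1" must only be evaluated against
# tuples visible to our MVCC snapshot (a sketch, not PostgreSQL code).
index_tuples = [
    {"four": 1, "two": 2, "visible": True},
    {"four": 1, "two": 0, "visible": False},  # dead tuple with two = 0
]

def with_index_filter(tuples):
    """Visibility is checked first, so we never divide using the dead tuple."""
    return [t for t in tuples if t["visible"] and 2 // t["two"] == 1]

def as_naive_index_qual(tuples):
    """Evaluated inside the index AM, before any visibility check:
    the invisible tuple raises a division-by-zero error."""
    return [t for t in tuples if 2 // t["two"] == 1]

print(len(with_index_filter(index_tuples)))  # 1 row, no error
# as_naive_index_qual(index_tuples) would raise ZeroDivisionError
```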
One would hope that the optimizer would be able to give full credit to the
index scan plan over a similar, potentially
much slower bitmap index scan plan (where the use of "table filters" is
inevitable and has the potential to make things
far slower). Whether or not that happens is ultimately a fairly standard
question of costing, which makes it out of
scope for this document.
Our first and second examples can be placed into each of two different buckets
in a fairly clear cut, black-and-white
way. There is also a huge amount of gray area that one can think of, but
that's out of scope for this document. This
document aims to be a starting point, providing guidelines
about how to categorize any of these gray
area cases. Of course, our ability to implement various transformations as a
practical matter (e.g., teaching nbtree to
deal with inequalities natively) is bound to play a very important role in
practice -- maybe even the largest role.
Example 3: "risky" plan shifts away from a BitmapOr plan
--------------------------------------------------------
+ Index quals are better than index filters because they have the potential to
give the index AM important context.
+ Index quals are better than index filters because they can reliably avoid
heap access, no matter how the VM is set.
Currently, the patch has one notable change in regression tests output, for
this query:
SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR
tenthous = 42);
Master branch plan, using regression test composite index on "tenk1(thousand,
tenthous)":
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on public.tenk1  (cost=6.89..8.91 rows=1 width=244) (actual time=0.010..0.011 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Recheck Cond: (((tenk1.thousand = 42) AND (tenk1.tenthous = 1)) OR ((tenk1.thousand = 42) AND (tenk1.tenthous = 3)) OR ((tenk1.thousand = 42) AND (tenk1.tenthous = 42)))
   Heap Blocks: exact=1
   Buffers: shared hit=7
   ->  BitmapOr  (cost=6.89..6.89 rows=1 width=0) (actual time=0.008..0.009 rows=0 loops=1)
         Buffers: shared hit=6
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 1))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 3))
               Buffers: shared hit=2
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..2.29 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
               Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = 42))
               Buffers: shared hit=2
 Planning Time: 0.059 ms
 Execution Time: 0.028 ms
The patch uses this plan, which is the faster plan according to every
traditional measure:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..4.38 rows=1 width=244) (actual time=0.010..0.012 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
   Rows Removed by Index Recheck: 9
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
   Buffers: shared hit=4
 Planning Time: 0.051 ms
 Execution Time: 0.023 ms
Notably, the original BitmapOr plan only accesses one heap page buffer. So the
patch gets a faster plan by saving on
index page accesses, even though that wasn't really its intent.
Example 4: alternative approach, by SAOP patch
----------------------------------------------
Another patch (this one from Peter Geoghegan) improves SAOP execution. If you
run the same query with that patch
applied, rewritten to use an IN ( ... ), you get the following similar though
slightly better plan:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..8.33 rows=1 width=244) (actual time=0.008..0.008 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = ANY ('{1,3,42}'::integer[])))
   Buffers: shared hit=3
 Planning Time: 0.043 ms
 Execution Time: 0.017 ms
This plan does the absolute bare minimum number of buffer accesses, since it
doesn't repeat any single individual access
to the same buffer. This is a key design goal for the SAOP patch. That in
itself isn't terribly interesting just yet --
what's interesting is that the plan is more _robust_, in the sense that it will
_reliably_ perform a lot better than the
similar index filter plan, even if various closely related things should happen
to shift.
Suppose, for example, we do this:
insert into tenk1 (thousand, tenthous) select 42, i from generate_series(43,
1000) i;
The situation at execution time with the SAOP patch remains completely
unchanged. If we re-execute the query we once
again get 3 buffer hits (the plan will _reliably_ do the absolute minimum
number of page accesses in the way we already
described, even).
Meanwhile, this is what we see for the index filter patch -- much higher
execution cost compared to our original example 3.
This is due to a change that doesn't affect any of the rows the query needs to
return anyway:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..4.38 rows=1 width=244) (actual time=0.020..0.382 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: (tenk1.thousand = 42)
   Index Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
   Rows Removed by Index Recheck: 967
   Filter: ((tenk1.tenthous = 1) OR (tenk1.tenthous = 3) OR (tenk1.tenthous = 42))
   Buffers: shared hit=336
 Planning Time: 0.088 ms
 Execution Time: 0.402 ms
To be fair, the index filter patch wouldn't have done too badly had we
consistently used SAOP syntax (it would have been
just like Postgres 16, which isn't "robust", but also isn't as "risky"). It is
the same plan as the one the SAOP patch gets,
except that we see the same issue with excessive primitive index scans/repeated
index page accesses:
                                                      QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using tenk1_thous_tenthous on public.tenk1  (cost=0.29..8.90 rows=1 width=244) (actual time=0.011..0.012 rows=1 loops=1)
   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
   Index Cond: ((tenk1.thousand = 42) AND (tenk1.tenthous = ANY ('{1,3,42}'::integer[])))
   Buffers: shared hit=7
 Planning Time: 0.047 ms
 Execution Time: 0.022 ms
The extra primitive index scans are of minimal concern here (though that might
not be true in more elaborate cases with
hundreds of constants).
The real problem here seems to be a perverse interaction with OR expression
evaluation. It's already possible to
effectively transform the expression into index quals for the BitmapOr path,
through other means. Ideally, we'd
consistently "normalize to CNF" earlier, so that we'd at least get a "robust"
SAOP-based plan in most simple cases
involving OR expressions (no less robust than in Postgres 16). That would mean
that the index filter patch would get a
plan that was expected to be more expensive, but also one that is relatively
robust. But once we had the main SAOP
patch then everything would be strictly better here.
In general, a SAOP (with nbtree, which has native support for SAOP clauses)
passes down true index quals that give the
index AM the context required to terminate scans early, and to avoid heap
accesses. While the SAOP patch pushes that
idea a lot further, it isn't really a new idea.
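The buffer-access difference can be sketched with a toy Python model of a
two-level btree -- not PostgreSQL code; the page layout and helper names are
invented for illustration:

```python
# Toy count of index page accesses (a sketch, not PostgreSQL code).
# Leaf page p holds keys [10*p, 10*p + 10); every descent reads the
# root page first, then one leaf page.

def descend(key):
    return ["root", f"leaf{key // 10}"]

def separate_primitive_scans(keys):
    # One full descent per constant, as with a BitmapOr (or a naive
    # SAOP execution that restarts the scan for each array element).
    pages = []
    for k in sorted(keys):
        pages += descend(k)
    return pages

def saop_single_pass(keys):
    # One descent to the first key, then step through the remaining
    # keys in order, reading each index page at most once.
    pages = []
    seen = set()
    for k in sorted(keys):
        for p in (descend(k) if not pages else [f"leaf{k // 10}"]):
            if p not in seen:
                seen.add(p)
                pages.append(p)
    return pages

print(len(separate_primitive_scans([1, 3, 42])))  # 6 page reads
print(len(saop_single_pass([1, 3, 42])))          # 3 page reads
```

Because 1 and 3 land on the same leaf page, the single-pass scan never pays
twice for it, which is the sense in which the SAOP plan is guaranteed to do no
more index page accesses than any alternative.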
It should also be noted that the problem with the index filter patch is one of
degree. Even Postgres 16 will eventually
have the same problem, with a sufficiently elaborate OR-based expression --
since the cost of repeated index accesses
pushes things in that direction eventually. The SAOP patch has the advantage
of reliably avoiding the issue by reliably
avoiding repeat accesses to index pages. This means that the plan we saw is
always the cheapest plan, no matter how the
details are varied. This happens because we simply don't have to ever make a
trade-off between heap page accesses and
index page accesses -- the SAOP plan is literally guaranteed to win, no matter
what happens at runtime.
For that reason, "risky OR plans" can be addressed within the scope of the main
SAOP patch, and/or Alena Rybakina's
OR-to-SAOP transformation patch. It seems unreasonable to make that the
problem of the index filter patch.
(Actually, it's still possible that we ought to have chosen a sequential scan
and made the wrong choice, but that's not something that we can just avoid.)
+ It's suboptimal (and weird) that we might sometimes rely on index filters to
reduce index page accesses
+ Index quals can achieve the same desirable outcome "robustly", because the
index AM has all required context
+ "Choice is confusion" -- we should always prefer one index path that "offers
all of the advantages" over two or more hypothetical alternative index paths
that can only do one thing well
+ More generally, we should try to avoid nonlinear cost differences between
highly similar plans (e.g., bitmap index scan vs index scan of same index).
In nbtree (using Markus Winand's terminology, which we should arguably have
exposed in EXPLAIN output):
Access predicates > Index filter predicates
Within nbtree itself, we could say that SK_BT_REQFWD and SK_BT_REQBKWD scan
keys are both types of "Access predicates".
It's a bit more complicated than that because with inequalities you'll often
have scan keys that are required in one
scan direction (e.g., forward scan case), but not required in the other
direction (should the direction change).
BTEqualStrategyNumber scan keys are both SK_BT_REQFWD and SK_BT_REQBKWD at the
same time, regardless of other details.
But overall SK_BT_REQFWD/SK_BT_REQBKWD scan keys indicate Access Predicates,
while anything else is merely index filter
predicates, which don't affect our scan's initial position nor affect when the
scan can terminate.
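In that scheme, whether a scan key counts as an access predicate depends on the
current scan direction. A toy Python sketch -- the flag names mirror nbtree's
scan key flags, but the values and the helper function are invented for
illustration:

```python
# Toy classification of scan keys into Winand-style "access predicates"
# vs. mere index filter predicates (a sketch, not PostgreSQL code;
# only the flag names are borrowed from nbtree).
SK_BT_REQFWD = 0x1   # required to continue a forward scan
SK_BT_REQBKWD = 0x2  # required to continue a backward scan

def is_access_predicate(flags, forward_scan):
    required = SK_BT_REQFWD if forward_scan else SK_BT_REQBKWD
    return bool(flags & required)

equality = SK_BT_REQFWD | SK_BT_REQBKWD  # "= 42": required in both directions
lt_bound = SK_BT_REQFWD                  # e.g. an upper bound, forward scans only

print(is_access_predicate(equality, forward_scan=True))   # True either way
print(is_access_predicate(lt_bound, forward_scan=False))  # False: filter only
```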
Presumably we'd need to make != scan keys (discussed in example 1) into index
filter predicates in this sense, as part
of any patch to teach index AMs (likely just nbtree) about simple inequalities.