Hi And here is a proposed code change, alternative to the doc change.
I hope that it is ok to relax the restriction like so, as `cost_bitmap_node_and` is already resigned to inputs not being independent: * We estimate AND selectivity on the assumption that the inputs are * independent. This is probably often wrong, but we don't have the info * to do better. I've ran "make check", and one test have changed (the last one in the "test extraction of restriction OR clauses from join OR clause" group - in an expected way, IMO). I am attaching the regression.diff for convenience. Is this a generally desirable change to pursue? Best regards, Dmytro On Wed, Feb 26, 2025 at 7:31 PM Dmytro Astapov <[email protected]> wrote: > Hi! > > I am (still) very unsure if the code change I mentioned will make sense, > but documentation chage could perhaps look like something along these lines? > > > > Best regards, Dmytro > > > On Tue, Feb 25, 2025 at 9:14 PM Dmytro Astapov <[email protected]> wrote: > >> Hi! >> >> I've been investigating why postgres does not do BitmapAnd of two >> well-suited indexes, and reading indxpath.c >> >> In my case, there is a table (d date, col1 int, col2 int) -- types not >> really important -- and there are two indices on (d,col1) and (d, col2). >> >> For queries that do WHERE d>=X AND col1=Y AND col2=Z postgres will never >> BitmapAnd those two indices because both indexes include (d) and we have a >> condition on (d). Here is a full example, which could also be seen here: >> https://www.db-fiddle.com/f/uPLx5bRtDEoZw3Dx4kkwKh/0: >> >> begin; >> >> CREATE TABLE test_table ( >> d date, >> col1 int, >> col2 int >> ); >> >> INSERT INTO test_table (d, col1, col2) >> SELECT >> d.date, >> c1.val as col1, >> c2.val as col2 >> FROM >> generate_series( >> '2023-01-01'::date, >> '2023-12-31'::date, >> '1 day'::interval >> ) as d(date), >> generate_series(1, 1000) as c1(val), >> generate_series(1, 1000) as c2(val) >> WHERE >> random() < 0.001; >> >> create index on test_table(col1,d); >> create index on test_table(col2,d); >> >> -- This uses BitmapAnd >> explain select * from test_table where col1=123 and col2=321; >> >> -- This does not use BitmapAnd, even though it could! >> explain select * from test_table where col1=123 and col2=321 and d >= >> '2023-05-05'; >> >> I checked that BitmapAnd is rejected by this >> <https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/indxpath.c#L1878> >> line in choose_bitmap_and: >> >> if (bms_overlap(pathinfo->clauseids, clauseidsofar)) >> continue; /* consider it redundant */ >> >> There is a comment on choose_bitmap_and that explains the rationale of >> this check, but reading it I can't help but feel that what the comment >> describes is this condition: >> >> if (bms_is_subset(pathinfo->clauseids, clauseidsofar)) >> continue; /* consider it redundant */ >> >> And indeed, in my (admittedly not super-extensive) testing changing >> bms_overlap to bms_is_subset leads to better faster execution plans. >> >> Is it possible that this condition could thus be relaxed? >> >> Even if I am wrong, and the condition absolutely should be bms_overlap, I >> feel that this restriction is very very hard to discover and perhaps >> https://www.postgresql.org/docs/current/indexes-bitmap-scans.html should >> mention that compound indexes that have columns in common will never be >> combined? >> >> Best regards, Dmytro >> >
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index a43ca16d68..f34d82f2e9 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1767,16 +1767,16 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
* because of the prefiltering step. The cheapest of these is returned.
*
* We will only consider AND combinations in which no two indexes use the
- * same WHERE clause. This is a bit of a kluge: it's needed because
- * costsize.c and clausesel.c aren't very smart about redundant clauses.
- * They will usually double-count the redundant clauses, producing a
- * too-small selectivity that makes a redundant AND step look like it
- * reduces the total cost. Perhaps someday that code will be smarter and
- * we can remove this limitation. (But note that this also defends
- * against flat-out duplicate input paths, which can happen because
- * match_join_clauses_to_index will find the same OR join clauses that
- * extract_restriction_or_clauses has pulled OR restriction clauses out
- * of.)
+ * WHERE clauses that are subset of another. This is a bit of a kluge:
+ * it's needed because costsize.c and clausesel.c aren't very smart about
+ * redundant clauses. They will usually double-count the redundant
+ * clauses, producing a too-small selectivity that makes a redundant AND
+ * step look like it reduces the total cost. Perhaps someday that code
+ * will be smarter and we can remove this limitation. (But note that this
+ * also defends against flat-out duplicate input paths, which can happen
+ * because match_join_clauses_to_index will find the same OR join clauses
+ * that extract_restriction_or_clauses has pulled OR restriction clauses
+ * out of.)
*
* For the same reason, we reject AND combinations in which an index
* predicate clause duplicates another clause. Here we find it necessary
@@ -1875,7 +1875,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel, List *paths)
pathinfo = pathinfoarray[j];
/* Check for redundancy */
- if (bms_overlap(pathinfo->clauseids, clauseidsofar))
+ if (bms_is_subset(pathinfo->clauseids, clauseidsofar))
continue; /* consider it redundant */
if (pathinfo->preds)
{
regression.diffs
Description: Binary data
