Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
On Wed, Dec 21, 2011 at 03:03, Tom Lane wrote: > I've applied a revised version of this patch that factors things in a > way I found nicer. Nice, thanks! Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Marti Raudsepp writes: > On Tue, Dec 20, 2011 at 07:08, Tom Lane wrote: >> but I think I don't >> like this refactoring much. Will take a closer look tomorrow. > I was afraid you'd say that, especially for a change that should be > backpatched. But I couldn't think of alternative ways to do it that > give non-bogus estimates. I've applied a revised version of this patch that factors things in a way I found nicer. The main concrete thing I didn't like about what you'd done was dropping the haveFullScan logic. If we have more than one qual triggering that, we're still going to do one full scan, not multiples of that. It seemed unreasonably hard to get that exactly right when there are multiple array quals each doing such a thing, but I didn't want to let it regress in its handling of multiple plain quals. Also, while looking at this I realized that we had the costing of nestloop cases all wrong. The idea is to scale up the number of tuples (pages) fetched, apply index_pages_fetched(), then scale down again. I think maybe somebody thought that was redundant, but it's not because index_pages_fetched() is nonlinear. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
On Tue, Dec 20, 2011 at 07:08, Tom Lane wrote: > it'd likely be better if this code ignored unrecognized qual expression > types rather than Assert'ing they're not there. The patch replaced that Assert with an elog(ERROR) > Hmm. I am reminded of how utterly unreadable "diff -u" format is for > anything longer than single-line changes :-( ... Sorry, the new patch is in context (-C) diff format proper. I also moved around code a bit and removed an unused variable that was left around from the refactoring. > but I think I don't > like this refactoring much. Will take a closer look tomorrow. I was afraid you'd say that, especially for a change that should be backpatched. But I couldn't think of alternative ways to do it that give non-bogus estimates. While writing this patch, the largest dilemma was where to account for the multiple array scans. Given that this code is mostly a heuristic and I lack a deep understanding of GIN indexes, it's likely that I got this part wrong. Currently I'm doing this: partialEntriesInQuals *= array_scans; exactEntriesInQuals *= array_scans; searchEntriesInQuals *= array_scans; Which seems to be the right thing as far as random disk accesses are concerned (successive scans are more likely to hit the cache) and also works well with queries that don't touch most of the index. But this fails spectacularly when multiple full scans are performed e.g. LIKE ANY ('{%,%,%}'). Because index_pages_fetched() ends up removing all of the rescan costs. Another approach is multiplying the total cost from the number of scans. This overestimates random accesses from rescans, but fixes the above case: *indexTotalCost = (*indexStartupCost + dataPagesFetched * spc_random_page_cost) * array_scans; Regards, Marti diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out new file mode 100644 index e7af7d4..250d853 *** a/contrib/pg_trgm/expected/pg_trgm.out --- b/contrib/pg_trgm/expected/pg_trgm.out *** explain (costs off) *** 3486,3491 --- 3486,3501 Index Cond: (t ~~* '%BCD%'::text) (4 rows) + explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); +QUERY PLAN + - + Bitmap Heap Scan on test2 +Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +-> Bitmap Index Scan on test2_idx_gin + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + (4 rows) + select * from test2 where t like '%BCD%'; t --- *** select * from test2 where t ilike 'qua%' *** 3509,3514 --- 3519,3531 quark (1 row) + select * from test2 where t like any ('{%bcd%,qua%}'); +t + + abcdef + quark + (2 rows) + drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; *** explain (costs off) *** 3528,3533 --- 3545,3560 Index Cond: (t ~~* '%BCD%'::text) (2 rows) + explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); +QUERY PLAN + - + Bitmap Heap Scan on test2 +Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +-> Bitmap Index Scan on test2_idx_gist + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + (4 rows) + select * from test2 where t like '%BCD%'; t --- *** select * from test2 where t ilike 'qua%' *** 3551,3553 --- 3578,3587 quark (1 row) + select * from test2 where t like any ('{%bcd%,qua%}'); +t + + abcdef + quark + (2 rows) + diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql new file mode 100644 index ea902f6..ac969e6 *** a/contrib/pg_trgm/sql/pg_trgm.sql --- b/contrib/pg_trgm/sql/pg_trgm.sql *** explain (costs off) *** 47,56 --- 47,59 select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; + explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * from test2 where t ilike '%BCD%'; select * from test2 where t ilike 'qua%'; + select * from test2 where t like any ('{%bcd%,qua%}'); drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; *** explain (costs off) *** 58,64 --- 61,70 select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; + explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * from test2 where t ilike '%BCD%'; select
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Marti Raudsepp writes: > Since PostgreSQL 9.1, GIN has new cost estimation code. This code > assumes that the only expression type it's going to see is OpExpr. > However, ScalarArrayOpExpr has also been possible in earlier versions. > Estimating col ANY () queries segfaults in 9.1 if there's > a GIN index on the column. Ugh. I think we subconsciously assumed that ScalarArrayOpExpr couldn't appear here because GIN doesn't set amsearcharray, but of course that's wrong. > (It seems that RowCompareExpr and NullTest clauses are impossible for > a GIN index -- at least my efforts to find such cases failed) No, those are safe for the moment --- indxpath.c has a hard-wired assumption that RowCompareExpr is only usable with btree, and NullTest is only considered to be indexable if amsearchnulls is set. Still, it'd likely be better if this code ignored unrecognized qual expression types rather than Assert'ing they're not there. It's not like the cases it *does* handle are done so perfectly that ignoring an unknown qual could be said to bollix the estimate unreasonably. > Attached is an attempted fix for the issue. I split out the code for > the extract call and now run that for each array element, adding > together the average of (partialEntriesInQuals, exactEntriesInQuals, > searchEntriesInQuals) for each array element. After processing all > quals, I multiply the entries by the number of array_scans (which is > the product of all array lengths) to get the total cost. > This required a fair bit of refactoring, but I tried to follow the > patterns for OpExpr pretty strictly -- discounting scans over NULL > elements, returning 0 cost when none of the array elements can match, > accounting for cache effects when there are multiple scans, etc. But > it's also possible that I have no idea what I'm really doing. :) Hmm. I am reminded of how utterly unreadable "diff -u" format is for anything longer than single-line changes :-( ... but I think I don't like this refactoring much. Will take a closer look tomorrow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Hi list, Since PostgreSQL 9.1, GIN has new cost estimation code. This code assumes that the only expression type it's going to see is OpExpr. However, ScalarArrayOpExpr has also been possible in earlier versions. Estimating col ANY () queries segfaults in 9.1 if there's a GIN index on the column. Case in point: create table words (word text); create index on words using gin (to_tsvector('english', word)); explain analyze select * from words where to_tsvector('english', word) @@ any ('{foo}'); (It seems that RowCompareExpr and NullTest clauses are impossible for a GIN index -- at least my efforts to find such cases failed) Attached is an attempted fix for the issue. I split out the code for the extract call and now run that for each array element, adding together the average of (partialEntriesInQuals, exactEntriesInQuals, searchEntriesInQuals) for each array element. After processing all quals, I multiply the entries by the number of array_scans (which is the product of all array lengths) to get the total cost. This required a fair bit of refactoring, but I tried to follow the patterns for OpExpr pretty strictly -- discounting scans over NULL elements, returning 0 cost when none of the array elements can match, accounting for cache effects when there are multiple scans, etc. But it's also possible that I have no idea what I'm really doing. :) I also added regression tests for this to tsearch and pg_trgm. Regards, Marti diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out index e7af7d4..250d853 100644 --- a/contrib/pg_trgm/expected/pg_trgm.out +++ b/contrib/pg_trgm/expected/pg_trgm.out @@ -3486,6 +3486,16 @@ explain (costs off) Index Cond: (t ~~* '%BCD%'::text) (4 rows) +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); + QUERY PLAN +- + Bitmap Heap Scan on test2 + Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + -> Bitmap Index Scan on test2_idx_gin + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +(4 rows) + select * from test2 where t like '%BCD%'; t --- @@ -3509,6 +3519,13 @@ select * from test2 where t ilike 'qua%'; quark (1 row) +select * from test2 where t like any ('{%bcd%,qua%}'); + t + + abcdef + quark +(2 rows) + drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; @@ -3528,6 +3545,16 @@ explain (costs off) Index Cond: (t ~~* '%BCD%'::text) (2 rows) +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); + QUERY PLAN +- + Bitmap Heap Scan on test2 + Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + -> Bitmap Index Scan on test2_idx_gist + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +(4 rows) + select * from test2 where t like '%BCD%'; t --- @@ -3551,3 +3578,10 @@ select * from test2 where t ilike 'qua%'; quark (1 row) +select * from test2 where t like any ('{%bcd%,qua%}'); + t + + abcdef + quark +(2 rows) + diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql index ea902f6..ac969e6 100644 --- a/contrib/pg_trgm/sql/pg_trgm.sql +++ b/contrib/pg_trgm/sql/pg_trgm.sql @@ -47,10 +47,13 @@ explain (costs off) select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * from test2 where t ilike '%BCD%'; select * from test2 where t ilike 'qua%'; +select * from test2 where t like any ('{%bcd%,qua%}'); drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; @@ -58,7 +61,10 @@ explain (costs off) select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * from test2 where t ilike '%BCD%'; select * from test2 where t ilike 'qua%'; +select * from test2 where t like any ('{%bcd%,qua%}'); diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index d06809e..64b732e 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -6591,6 +6591,70 @@ find_index_column(Node *op, IndexOptInfo *index) } /* + * Estimate gin costs for a single search value. Returns false if no match + * is possible, true otherwise. All pointer arguments must be initialized by + * the caller. + */ +static bool +gin_costestimate_search(Oid extractProcOid, Datum value, int strategy_op, +