On Wed, Oct 11, 2023 at 01:00:44PM -0700, Peter Geoghegan wrote:
> On Wed, Oct 11, 2023 at 11:38 AM Noah Misch <n...@leadboat.com> wrote:
> > Interesting. So, >99% of interval-type indexes, even ones WITH
> > (deduplicate_items=off), will get amcheck failures. The <1% of exceptions
> > might include indexes having allequalimage=off due to an additional column,
> > e.g. a two-column (interval, numeric) index. If interval indexes are common
> > enough and "pg_amcheck --heapallindexed" failures from $SUBJECT are relatively
> > rare, that could argue for giving amcheck a special case. Specifically,
> > downgrade its "metapage incorrectly indicates that deduplication is safe" from
> > ERROR to WARNING for interval_ops only.
>
> I am not aware of any user actually running "deduplicate_items = off"
> in production, for any index. It was added purely as a defensive thing
> -- not because I anticipated any real need to disable deduplication.
> Deduplication was optimized for being enabled by default.

Sure. Low-importance background information: deduplicate_items=off got on my
radar while I was wondering if ALTER INDEX ... SET (deduplicate_items=off)
would clear allequalimage. If it had, we could have advised people to use
ALTER INDEX, then rebuild only those indexes still failing "pg_amcheck
--heapallindexed". ALTER INDEX doesn't do that, ruling out that idea.

> > Without that special case (i.e. with
> > the v1 patch), the release notes should probably resemble, "After updating,
> > run REINDEX on all indexes having an interval-type column."
>
> +1
>
> > There's little
> > point in recommending pg_amcheck if >99% will fail. I'm inclined to bet that
> > interval-type indexes are rare, so I lean against adding the amcheck special
> > case. It's not a strong preference. Other opinions?
>
> exactly one case like that post-fix (interval_ops is at least the only
> affected core code opfamily), so why not point that out directly with
> a HINT? A HINT could go a long way towards putting the problem in
> context, without really adding a special case, and without any real
> question of users being misled.

Works for me. Added.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Dissociate btequalimage() from interval_ops, ending its deduplication.

    Under interval_ops, some equal values are distinguishable. One such
    pair is '24:00:00' and '1 day'. With that being so, btequalimage()
    breaches the documented contract for the "equalimage" btree support
    function. This can cause incorrect results from index-only scans.
    Users should REINDEX any btree indexes having interval-type columns.
    After updating, pg_amcheck will report an error for almost all such
    indexes. This fix makes interval_ops simply omit the support function,
    like numeric_ops does. Back-patch to v13, where btequalimage() first
    appeared. In back branches, for the benefit of old catalog content,
    btequalimage() code will return false for type "interval". Going
    forward, back-branch initdb will include the catalog change.

    Reviewed by Peter Geoghegan.

    Discussion: https://postgr.es/m/20231011013317.22.nmi...@google.com

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index dbb83d8..3e07a3e 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -31,6 +31,7 @@
 #include "access/xact.h"
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_opfamily_d.h"
 #include "commands/tablecmds.h"
 #include "common/pg_prng.h"
 #include "lib/bloomfilter.h"
@@ -338,10 +339,20 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
                             errmsg("index \"%s\" metapage has equalimage field set on unsupported nbtree version",
                                    RelationGetRelationName(indrel))));
         if (allequalimage && !_bt_allequalimage(indrel, false))
+        {
+            bool        has_interval_ops = false;
+
+            for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(indrel); i++)
+                if (indrel->rd_opfamily[i] == INTERVAL_BTREE_FAM_OID)
+                    has_interval_ops = true;
             ereport(ERROR,
                     (errcode(ERRCODE_INDEX_CORRUPTED),
                      errmsg("index \"%s\" metapage incorrectly indicates that deduplication is safe",
-                            RelationGetRelationName(indrel))));
+                            RelationGetRelationName(indrel)),
+                     has_interval_ops
+                     ? errhint("This is known of \"interval\" indexes last built on a version predating 2023-11.")
+                     : 0));
+        }
 
         /* Check index, possibly against table it is an index on */
         bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 5b95012..4c70da4 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -172,8 +172,6 @@
 { amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
   amprocrighttype => 'interval', amprocnum => '3',
   amproc => 'in_range(interval,interval,interval,bool,bool)' },
-{ amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
-  amprocrighttype => 'interval', amprocnum => '4', amproc => 'btequalimage' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
   amprocrighttype => 'macaddr', amprocnum => '1', amproc => 'macaddr_cmp' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 91587b9..81a8525 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -50,7 +50,7 @@
   opfmethod => 'btree', opfname => 'integer_ops' },
 { oid => '1977',
   opfmethod => 'hash', opfname => 'integer_ops' },
-{ oid => '1982',
+{ oid => '1982', oid_symbol => 'INTERVAL_BTREE_FAM_OID',
   opfmethod => 'btree', opfname => 'interval_ops' },
 { oid => '1983',
   opfmethod => 'hash', opfname => 'interval_ops' },
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c..7a6f36a 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2208,6 +2208,7 @@ ORDER BY 1, 2, 3;
  | array_ops | array_ops | anyarray
  | float_ops | float4_ops | real
  | float_ops | float8_ops | double precision
+ | interval_ops | interval_ops | interval
  | jsonb_ops | jsonb_ops | jsonb
  | multirange_ops | multirange_ops | anymultirange
  | numeric_ops | numeric_ops | numeric
@@ -2216,7 +2217,7 @@ ORDER BY 1, 2, 3;
  | record_ops | record_ops | record
  | tsquery_ops | tsquery_ops | tsquery
  | tsvector_ops | tsvector_ops | tsvector
-(15 rows)
+(16 rows)
 
 -- **************** pg_index ****************
 -- Look for illegal values in pg_index fields.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Dissociate btequalimage() from interval_ops, ending its deduplication.

    Under interval_ops, some equal values are distinguishable. One such
    pair is '24:00:00' and '1 day'. With that being so, btequalimage()
    breaches the documented contract for the "equalimage" btree support
    function. This can cause incorrect results from index-only scans.
    Users should REINDEX any btree indexes having interval-type columns.
    After updating, pg_amcheck will report an error for almost all such
    indexes. This fix makes interval_ops simply omit the support function,
    like numeric_ops does. Back-patch to v13, where btequalimage() first
    appeared. In back branches, for the benefit of old catalog content,
    btequalimage() code will return false for type "interval". Going
    forward, back-branch initdb will include the catalog change.

    Reviewed by Peter Geoghegan.

    Discussion: https://postgr.es/m/20231011013317.22.nmi...@google.com

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index dbb83d8..3e07a3e 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -31,6 +31,7 @@
 #include "access/xact.h"
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_opfamily_d.h"
 #include "commands/tablecmds.h"
 #include "common/pg_prng.h"
 #include "lib/bloomfilter.h"
@@ -338,10 +339,20 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
                             errmsg("index \"%s\" metapage has equalimage field set on unsupported nbtree version",
                                    RelationGetRelationName(indrel))));
         if (allequalimage && !_bt_allequalimage(indrel, false))
+        {
+            bool        has_interval_ops = false;
+
+            for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(indrel); i++)
+                if (indrel->rd_opfamily[i] == INTERVAL_BTREE_FAM_OID)
+                    has_interval_ops = true;
             ereport(ERROR,
                     (errcode(ERRCODE_INDEX_CORRUPTED),
                      errmsg("index \"%s\" metapage incorrectly indicates that deduplication is safe",
-                            RelationGetRelationName(indrel))));
+                            RelationGetRelationName(indrel)),
+                     has_interval_ops
+                     ? errhint("This is known of \"interval\" indexes last built on a version predating 2023-11.")
+                     : 0));
+        }
 
         /* Check index, possibly against table it is an index on */
         bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
diff --git a/src/backend/utils/adt/datum.c b/src/backend/utils/adt/datum.c
index 9f06ee7..251dd23 100644
--- a/src/backend/utils/adt/datum.c
+++ b/src/backend/utils/adt/datum.c
@@ -43,6 +43,7 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
+#include "catalog/pg_type_d.h"
 #include "common/hashfn.h"
 #include "fmgr.h"
 #include "utils/builtins.h"
@@ -385,20 +386,17 @@ datum_image_hash(Datum value, bool typByVal, int typLen)
  * datum_image_eq() in all cases can use this as their "equalimage" support
  * function.
  *
- * Currently, we unconditionally assume that any B-Tree operator class that
- * registers btequalimage as its support function 4 must be able to safely use
- * optimizations like deduplication (i.e. we return true unconditionally). If
- * it ever proved necessary to rescind support for an operator class, we could
- * do that in a targeted fashion by doing something with the opcintype
- * argument.
+ * Earlier minor releases erroneously associated this function with
+ * interval_ops. Detect that case to rescind deduplication support, without
+ * requiring initdb.
  *-------------------------------------------------------------------------
  */
 Datum
 btequalimage(PG_FUNCTION_ARGS)
 {
-    /* Oid      opcintype = PG_GETARG_OID(0); */
+    Oid         opcintype = PG_GETARG_OID(0);
 
-    PG_RETURN_BOOL(true);
+    PG_RETURN_BOOL(opcintype != INTERVALOID);
 }
 
 /*-------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 5b95012..4c70da4 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -172,8 +172,6 @@
 { amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
   amprocrighttype => 'interval', amprocnum => '3',
   amproc => 'in_range(interval,interval,interval,bool,bool)' },
-{ amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
-  amprocrighttype => 'interval', amprocnum => '4', amproc => 'btequalimage' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
   amprocrighttype => 'macaddr', amprocnum => '1', amproc => 'macaddr_cmp' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
diff --git a/src/include/catalog/pg_opfamily.dat b/src/include/catalog/pg_opfamily.dat
index 91587b9..81a8525 100644
--- a/src/include/catalog/pg_opfamily.dat
+++ b/src/include/catalog/pg_opfamily.dat
@@ -50,7 +50,7 @@
   opfmethod => 'btree', opfname => 'integer_ops' },
 { oid => '1977',
   opfmethod => 'hash', opfname => 'integer_ops' },
-{ oid => '1982',
+{ oid => '1982', oid_symbol => 'INTERVAL_BTREE_FAM_OID',
   opfmethod => 'btree', opfname => 'interval_ops' },
 { oid => '1983',
   opfmethod => 'hash', opfname => 'interval_ops' },
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c..7a6f36a 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2208,6 +2208,7 @@ ORDER BY 1, 2, 3;
  | array_ops | array_ops | anyarray
  | float_ops | float4_ops | real
  | float_ops | float8_ops | double precision
+ | interval_ops | interval_ops | interval
  | jsonb_ops | jsonb_ops | jsonb
  | multirange_ops | multirange_ops | anymultirange
  | numeric_ops | numeric_ops | numeric
@@ -2216,7 +2217,7 @@ ORDER BY 1, 2, 3;
  | record_ops | record_ops | record
  | tsquery_ops | tsquery_ops | tsquery
  | tsvector_ops | tsvector_ops | tsvector
-(15 rows)
+(16 rows)
 
 -- **************** pg_index ****************
 -- Look for illegal values in pg_index fields.