The btequalimage() header comment says:

 * Generic "equalimage" support function.
 *
 * B-Tree operator classes whose equality function could safely be replaced by
 * datum_image_eq() in all cases can use this as their "equalimage" support
 * function.

interval_ops, however, recognizes equal-but-distinguishable values:

  create temp table t (c interval);
  insert into t values ('1d'::interval), ('24h');
  table t;
  select distinct c from t;

The CREATE INDEX of the following test:

  begin;
  create table t (c interval);
  insert into t select x from generate_series(1,500), (values ('1 year 1 month'::interval), ('1 year 30 days')) t(x);
  select distinct c from t;
  create index ti on t (c);
  rollback;

Fails with:

  2498151 2023-10-10 05:06:46.177 GMT DEBUG:  building index "ti" on table "t" serially
  2498151 2023-10-10 05:06:46.178 GMT DEBUG:  index "ti" can safely use deduplication
  TRAP: failed Assert("!itup_key->allequalimage || keepnatts == _bt_keep_natts_fast(rel, lastleft, firstright)"), File: "nbtutils.c", Line: 2443, PID: 2498151

I've also caught btree posting lists where one TID refers to a '1d' heap
tuple, while another TID refers to a '24h' heap tuple.  amcheck complains.
Index-only scans can return the '1d' bits where the actual tuple had the '24h'
bits.  Are there other consequences to highlight in the release notes?

The back-branch patch is larger, to fix things without initdb.  Hence, I'm
attaching patches for HEAD and for v16 (trivial to merge back from there).

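To make "equal-but-distinguishable" concrete, here is a minimal standalone
sketch in C.  It is not PostgreSQL code: DemoInterval and the demo_* functions
are hypothetical names invented for illustration, and the comparison is a
simplified model that assumes a month counts as 30 days and a day as 24 hours
and that ignores overflow.  It only shows how a value-based comparator can
report equality for values whose stored fields differ, which is the situation
an image comparison cannot tolerate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an interval value: months, days, and microseconds
 * are stored separately, so several representations can denote equal spans. */
typedef struct DemoInterval
{
	int64_t		time;	/* microseconds */
	int32_t		day;
	int32_t		month;
} DemoInterval;

#define DEMO_USECS_PER_DAY	INT64_C(86400000000)
#define DEMO_DAYS_PER_MONTH	30

/* Flatten to one microsecond count.  Simplified: overflow is ignored here. */
int64_t
demo_interval_value(const DemoInterval *iv)
{
	int64_t		days = (int64_t) iv->month * DEMO_DAYS_PER_MONTH + iv->day;

	return iv->time + days * DEMO_USECS_PER_DAY;
}

/* Operator-class style comparison: returns <0, 0, or >0. */
int
demo_interval_cmp(const DemoInterval *a, const DemoInterval *b)
{
	int64_t		va = demo_interval_value(a);
	int64_t		vb = demo_interval_value(b);

	return (va > vb) - (va < vb);
}

/* Stand-in for an image comparison: every stored field must match. */
bool
demo_image_eq(const DemoInterval *a, const DemoInterval *b)
{
	return a->time == b->time && a->day == b->day && a->month == b->month;
}

/* The two pairs from the tests above: equal to the comparator, yet distinct. */
void
demo_check(void)
{
	DemoInterval one_day = {.time = 0, .day = 1, .month = 0};
	DemoInterval hours_24 = {.time = DEMO_USECS_PER_DAY, .day = 0, .month = 0};
	DemoInterval months_13 = {.time = 0, .day = 0, .month = 13};
	DemoInterval year_30d = {.time = 0, .day = 30, .month = 12};

	assert(demo_interval_cmp(&one_day, &hours_24) == 0);
	assert(!demo_image_eq(&one_day, &hours_24));
	assert(demo_interval_cmp(&months_13, &year_30d) == 0);
	assert(!demo_image_eq(&months_13, &year_30d));
}
```

Calling demo_check() from a main() exercises both pairs.  Any optimization
that keeps only one stored image per group of "equal" keys would lose the
difference between the members of each pair.
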
I glanced at the other opfamilies permitting deduplication, and they look okay:

[local] test=*# select amproc, amproclefttype = amprocrighttype as l_eq_r, array_agg(array[opfname, amproclefttype::regtype::text]) from pg_amproc join pg_opfamily f on amprocfamily = f.oid where amprocnum = 4 and opfmethod = 403 group by 1,2;
─[ RECORD 1 ]────────────────────────────────────────
amproc    │ btequalimage
l_eq_r    │ t
array_agg │ {{bit_ops,bit},{bool_ops,boolean},{bytea_ops,bytea},{char_ops,"\"char\""},{datetime_ops,date},{datetime_ops,"timestamp without time zone"},{datetime_ops,"timestamp with time zone"},{network_ops,inet},{integer_ops,smallint},{integer_ops,integer},{integer_ops,bigint},{interval_ops,interval},{macaddr_ops,macaddr},{oid_ops,oid},{oidvector_ops,oidvector},{time_ops,"time without time zone"},{timetz_ops,"time with time zone"},{varbit_ops,"bit varying"},{text_pattern_ops,text},{bpchar_pattern_ops,character},{money_ops,money},{tid_ops,tid},{uuid_ops,uuid},{pg_lsn_ops,pg_lsn},{macaddr8_ops,macaddr8},{enum_ops,anyenum},{xid8_ops,xid8}}
─[ RECORD 2 ]────────────────────────────────────────
amproc    │ btvarstrequalimage
l_eq_r    │ t
array_agg │ {{bpchar_ops,character},{text_ops,text},{text_ops,name}}

Thanks,
nm

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Dissociate btequalimage() from interval_ops, ending its deduplication.

    Under interval_ops, some equal values are distinguishable.  One such
    pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
    breaches the documented contract for the "equalimage" btree support
    function.  This can cause incorrect results from index-only scans.
    Users should REINDEX any indexes having interval-type columns and for
    which "pg_amcheck --heapallindexed" reports an error.  This fix makes
    interval_ops simply omit the support function, like numeric_ops does.

    Back-patch to v13, where btequalimage() first appeared.  In back
    branches, for the benefit of old catalog content, btequalimage() code
    will return false for type "interval".  Going forward, back-branch
    initdb will include the catalog change.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/FIXME

diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 5b95012..4c70da4 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -172,8 +172,6 @@
 { amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
   amprocrighttype => 'interval', amprocnum => '3',
   amproc => 'in_range(interval,interval,interval,bool,bool)' },
-{ amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
-  amprocrighttype => 'interval', amprocnum => '4', amproc => 'btequalimage' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
   amprocrighttype => 'macaddr', amprocnum => '1', amproc => 'macaddr_cmp' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c..7a6f36a 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2208,6 +2208,7 @@ ORDER BY 1, 2, 3;
         | array_ops        | array_ops        | anyarray
         | float_ops        | float4_ops       | real
         | float_ops        | float8_ops       | double precision
+        | interval_ops     | interval_ops     | interval
         | jsonb_ops        | jsonb_ops        | jsonb
         | multirange_ops   | multirange_ops   | anymultirange
         | numeric_ops      | numeric_ops      | numeric
@@ -2216,7 +2217,7 @@ ORDER BY 1, 2, 3;
         | record_ops       | record_ops       | record
         | tsquery_ops      | tsquery_ops      | tsquery
         | tsvector_ops     | tsvector_ops     | tsvector
-(15 rows)
+(16 rows)
 
 -- **************** pg_index ****************
 -- Look for illegal values in pg_index fields.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Dissociate btequalimage() from interval_ops, ending its deduplication.

    Under interval_ops, some equal values are distinguishable.  One such
    pair is '24:00:00' and '1 day'.  With that being so, btequalimage()
    breaches the documented contract for the "equalimage" btree support
    function.  This can cause incorrect results from index-only scans.
    Users should REINDEX any indexes having interval-type columns and for
    which "pg_amcheck --heapallindexed" reports an error.  This fix makes
    interval_ops simply omit the support function, like numeric_ops does.

    Back-patch to v13, where btequalimage() first appeared.  In back
    branches, for the benefit of old catalog content, btequalimage() code
    will return false for type "interval".  Going forward, back-branch
    initdb will include the catalog change.

    Reviewed by FIXME.

    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/utils/adt/datum.c b/src/backend/utils/adt/datum.c
index 9f06ee7..251dd23 100644
--- a/src/backend/utils/adt/datum.c
+++ b/src/backend/utils/adt/datum.c
@@ -43,6 +43,7 @@
 #include "postgres.h"
 
 #include "access/detoast.h"
+#include "catalog/pg_type_d.h"
 #include "common/hashfn.h"
 #include "fmgr.h"
 #include "utils/builtins.h"
@@ -385,20 +386,17 @@ datum_image_hash(Datum value, bool typByVal, int typLen)
  * datum_image_eq() in all cases can use this as their "equalimage" support
  * function.
  *
- * Currently, we unconditionally assume that any B-Tree operator class that
- * registers btequalimage as its support function 4 must be able to safely use
- * optimizations like deduplication (i.e. we return true unconditionally).  If
- * it ever proved necessary to rescind support for an operator class, we could
- * do that in a targeted fashion by doing something with the opcintype
- * argument.
+ * Earlier minor releases erroneously associated this function with
+ * interval_ops.  Detect that case to rescind deduplication support, without
+ * requiring initdb.
  *-------------------------------------------------------------------------
  */
 Datum
 btequalimage(PG_FUNCTION_ARGS)
 {
-	/* Oid		opcintype = PG_GETARG_OID(0); */
+	Oid			opcintype = PG_GETARG_OID(0);
 
-	PG_RETURN_BOOL(true);
+	PG_RETURN_BOOL(opcintype != INTERVALOID);
 }
 
 /*-------------------------------------------------------------------------
diff --git a/src/include/catalog/pg_amproc.dat b/src/include/catalog/pg_amproc.dat
index 5b95012..4c70da4 100644
--- a/src/include/catalog/pg_amproc.dat
+++ b/src/include/catalog/pg_amproc.dat
@@ -172,8 +172,6 @@
 { amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
   amprocrighttype => 'interval', amprocnum => '3',
   amproc => 'in_range(interval,interval,interval,bool,bool)' },
-{ amprocfamily => 'btree/interval_ops', amproclefttype => 'interval',
-  amprocrighttype => 'interval', amprocnum => '4', amproc => 'btequalimage' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
   amprocrighttype => 'macaddr', amprocnum => '1', amproc => 'macaddr_cmp' },
 { amprocfamily => 'btree/macaddr_ops', amproclefttype => 'macaddr',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c..7a6f36a 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -2208,6 +2208,7 @@ ORDER BY 1, 2, 3;
         | array_ops        | array_ops        | anyarray
         | float_ops        | float4_ops       | real
         | float_ops        | float8_ops       | double precision
+        | interval_ops     | interval_ops     | interval
         | jsonb_ops        | jsonb_ops        | jsonb
         | multirange_ops   | multirange_ops   | anymultirange
         | numeric_ops      | numeric_ops      | numeric
@@ -2216,7 +2217,7 @@ ORDER BY 1, 2, 3;
         | record_ops       | record_ops       | record
         | tsquery_ops      | tsquery_ops      | tsquery
         | tsvector_ops     | tsvector_ops     | tsvector
-(15 rows)
+(16 rows)
 
 -- **************** pg_index ****************
 -- Look for illegal values in pg_index fields.