Hi,

Ashutosh Bapat reported me off-list a possible issue in how BRIN
minmax-multi calculate distance for infinite timestamp/date values.

The current code does this:

    if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
        PG_RETURN_FLOAT8(0);

so means infinite values are "very close" to any other value, and thus
likely to be merged into a summary range. That's exactly the opposite of
what we want to do, possibly resulting in inefficient indexes.

Consider this example

    create table test (a timestamptz) with (fillfactor=50);

    insert into test
    select (now() + ((10000 * random())::int || ' seconds')::interval)
      from generate_series(1,1000000) s(i);

    update test set a = '-infinity'::timestamptz where random() < 0.01;
    update test set a = 'infinity'::timestamptz where random() < 0.01;

    explain (analyze, timing off, costs off)
     select * from test where a = '2024-01-01'::timestamptz;

                                  QUERY PLAN

------------------------------------------------------------------------------
 Bitmap Heap Scan on test (actual rows=0 loops=1)
   Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
   Rows Removed by Index Recheck: 680662
   Heap Blocks: lossy=6024
   ->  Bitmap Index Scan on test_a_idx (actual rows=60240 loops=1)
         Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
 Planning Time: 0.075 ms
 Execution Time: 106.871 ms
(8 rows)

Clearly, large part of the table gets scanned - this happens because
when building the index, we end up with ranges like this:


    [-infinity,a,b,c,...,x,y,z,infinity]

and we conclude that distance for [-infinity,a] is 0, and we combine
these values into a range. And the same for [z,infinity]. But we should
do exactly the opposite thing - never merge those.

Attached is a patch fixing this, with which the plan looks like this:

                                  QUERY PLAN

------------------------------------------------------------------------------
 Bitmap Heap Scan on test (actual rows=0 loops=1)
   Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
   ->  Bitmap Index Scan on test_a_idx (actual rows=0 loops=1)
         Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
 Planning Time: 0.289 ms
 Execution Time: 9.432 ms
(6 rows)

Which seems much better.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc..c8775c274e 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2084,8 +2084,14 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
+	/*
+	 * If either value is infinite, we treat them as in infinite distance.
+	 * We deduplicate the values before calculating distances for them, so
+	 * either one value is finite, or the sign is different - so the
+	 * inifinite distance is appropriate for both cases.
+	 */
 	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
+		PG_RETURN_FLOAT8(get_float8_infinity());
 
 	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
 }
@@ -2141,8 +2147,14 @@ brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
+	/*
+	 * If either value is infinite, we treat them as in infinite distance.
+	 * We deduplicate the values before calculating distances for them, so
+	 * either one value is finite, or the sign is different - so the
+	 * inifinite distance is appropriate for both cases.
+	 */
 	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
+		PG_RETURN_FLOAT8(get_float8_infinity());
 
 	delta = dt2 - dt1;
 

Reply via email to