Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-29 Thread Ashutosh Bapat
On Fri, Oct 27, 2023 at 10:32 PM Tomas Vondra
 wrote:
>
> FWIW I've cleaned up and pushed all the patches we came up with this
> thread. And I've backpatched all of them to 14+.
>

Thanks a lot Tomas.

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-27 Thread Tomas Vondra
FWIW I've cleaned up and pushed all the patches we came up with this
thread. And I've backpatched all of them to 14+.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-22 Thread Tomas Vondra
On 10/20/23 11:52, Ashutosh Bapat wrote:
> On Thu, Oct 19, 2023 at 6:11 PM Ashutosh Bapat
>  wrote:
>>
>> I think we should provide generate_series(date, date, integer) which
>> will use date + integer -> date.
> 
> Just to be clear, I don't mean that this patch should add it.
> 

I'm not against adding such generate_series() variant. For this patch
I'll use something like the query you proposed, I think.

I was thinking about the (date + interval) failure a bit more, and while
I think it's confusing it's not quite wrong. The problem is that the
interval may have hours/minutes, so it makes sense that the operator
returns timestamp. That's not what most operators do, where the data
type does not change. So a bit unexpected, but seems correct.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-20 Thread Ashutosh Bapat
On Thu, Oct 19, 2023 at 6:11 PM Ashutosh Bapat
 wrote:
>
> I think we should provide generate_series(date, date, integer) which
> will use date + integer -> date.

Just to be clear, I don't mean that this patch should add it.

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Ashutosh Bapat
On Thu, Oct 19, 2023 at 4:51 PM Tomas Vondra
 wrote:
>
> On 10/19/23 11:22, Ashutosh Bapat wrote:
> > On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
> >  wrote:
> >
> >>
> >> Does that explain the algorithm? I'm not against clarifying the comment,
> >> of course.
> >
> > Thanks a lot for this explanation. It's clear now.
> >
> >> I tried to do that, but I ran into troubles with the "date" tests. I
> >> needed to build values that close to the min/max values, so I did
> >> something like
> >>
> >> SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
> >> generate_series(1,10) s(i);
> >>
> >> And then the same for the max date, but that fails because of the
> >> date/timestamp conversion in date plus operator.
> >>
> >> However, maybe two simple generate_series() would work ...
> >>
> >
> > Something like this? select i::date from  generate_series('4713-02-01
> > BC'::date,  '4713-01-01 BC'::date, '-1 day'::interval) i;
>
> That works, but if you try the same thing with the largest date, that'll
> fail
>
>   select i::date from  generate_series('5874896-12-01'::date,
>'5874897-01-01'::date,
>'1 day'::interval) i;
>
>   ERROR:  date out of range for timestamp

Hmm, I see. This uses generate_series(timestamp, timestamp, interval) version.

date + integer -> date though, so the following works. It's also an
example at https://www.postgresql.org/docs/16/functions-srf.html.
#SELECT '5874896-12-01'::date + i FROM
generate_series(1,10) s(i);

I think we should provide generate_series(date, date, integer) which
will use date + integer -> date.

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Tomas Vondra
On 10/19/23 11:22, Ashutosh Bapat wrote:
> On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
>  wrote:
> 
>>
>> Does that explain the algorithm? I'm not against clarifying the comment,
>> of course.
> 
> Thanks a lot for this explanation. It's clear now.
> 
>> I tried to do that, but I ran into troubles with the "date" tests. I
>> needed to build values that close to the min/max values, so I did
>> something like
>>
>> SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
>> generate_series(1,10) s(i);
>>
>> And then the same for the max date, but that fails because of the
>> date/timestamp conversion in date plus operator.
>>
>> However, maybe two simple generate_series() would work ...
>>
> 
> Something like this? select i::date from  generate_series('4713-02-01
> BC'::date,  '4713-01-01 BC'::date, '-1 day'::interval) i;

That works, but if you try the same thing with the largest date, that'll
fail

  select i::date from  generate_series('5874896-12-01'::date,
   '5874897-01-01'::date,
   '1 day'::interval) i;

  ERROR:  date out of range for timestamp


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Ashutosh Bapat
On Thu, Oct 19, 2023 at 2:31 PM Tomas Vondra
 wrote:

>
> Does that explain the algorithm? I'm not against clarifying the comment,
> of course.

Thanks a lot for this explanation. It's clear now.

> I tried to do that, but I ran into troubles with the "date" tests. I
> needed to build values that close to the min/max values, so I did
> something like
>
> SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
> generate_series(1,10) s(i);
>
> And then the same for the max date, but that fails because of the
> date/timestamp conversion in date plus operator.
>
> However, maybe two simple generate_series() would work ...
>

Something like this? select i::date from  generate_series('4713-02-01
BC'::date,  '4713-01-01 BC'::date, '-1 day'::interval) i;
   i
---
 4713-02-01 BC
 4713-01-31 BC
 4713-01-30 BC
 4713-01-29 BC
 4713-01-28 BC
 4713-01-27 BC
 4713-01-26 BC
 4713-01-25 BC
 4713-01-24 BC
 4713-01-23 BC
 4713-01-22 BC
 4713-01-21 BC
 4713-01-20 BC
 4713-01-19 BC
 4713-01-18 BC
 4713-01-17 BC
 4713-01-16 BC
 4713-01-15 BC
 4713-01-14 BC
 4713-01-13 BC
 4713-01-12 BC
 4713-01-11 BC
 4713-01-10 BC
 4713-01-09 BC
 4713-01-08 BC
 4713-01-07 BC
 4713-01-06 BC
 4713-01-05 BC
 4713-01-04 BC
 4713-01-03 BC
 4713-01-02 BC
 4713-01-01 BC
(32 rows)

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Tomas Vondra



On 10/19/23 09:04, Dean Rasheed wrote:
> On Thu, 19 Oct 2023, 05:32 Ashutosh Bapat,  > wrote:
> 
> On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
>  > wrote:
> 
> >
> > I did use that many values to actually force "compaction" and
> merging of
> > points into ranges. We only keep 32 values per page range, so with 2
> > values we'll not build a range. You're right it may still trigger the
> > overflow (we probably still calculate distances, I didn't realize
> that),
> > but without the compaction we can't check the query plans.
> >
> > However, I agree 60 values may be a bit too much. And I realized
> we can
> > reduce the count quite a bit by using the values_per_range option. We
> > could set it to 8 (which is the minimum).
> >
> 
> I haven't read BRIN code, except the comments in the beginning of the
> file. From what you describe it seems we will store first 32 values as
> is, but later as the number of values grow create ranges from those?
> Please point me to the relevant source code/documentation. Anyway, if
> we can reduce the number of rows we insert, that will be good.
> 
> 
> I don't think 60 values is excessive, but instead of listing them out by
> hand, perhaps use generate_series().
> 

I tried to do that, but I ran into troubles with the "date" tests. I
needed to build values that close to the min/max values, so I did
something like

SELECT '4713-01-01 BC'::date + (i || ' days')::interval FROM
generate_series(1,10) s(i);

And then the same for the max date, but that fails because of the
date/timestamp conversion in date plus operator.

However, maybe two simple generate_series() would work ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Tomas Vondra



On 10/19/23 06:32, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
>  wrote:
> 
>>
>> I did use that many values to actually force "compaction" and merging of
>> points into ranges. We only keep 32 values per page range, so with 2
>> values we'll not build a range. You're right it may still trigger the
>> overflow (we probably still calculate distances, I didn't realize that),
>> but without the compaction we can't check the query plans.
>>
>> However, I agree 60 values may be a bit too much. And I realized we can
>> reduce the count quite a bit by using the values_per_range option. We
>> could set it to 8 (which is the minimum).
>>
> 
> I haven't read BRIN code, except the comments in the beginning of the
> file. From what you describe it seems we will store first 32 values as
> is, but later as the number of values grow create ranges from those?
> Please point me to the relevant source code/documentation. Anyway, if
> we can reduce the number of rows we insert, that will be good.
> 

I don't think we have documentation other than what's at the beginning
of the file. What the comment tries to explain is that the summary has a
maximum size (32 values by default), and each value can be either a
point or a range. A point requires one value, range requires two. So we
accumulate values one by one - until 32 values that's fine. Once we get
33, we have to merge some of the points into ranges, and we do that in a
greedy way by distance.

For example, this may happen:

33 values
-> 31 values + 1 range  [requires 33]
-> 30 values + 1 range  [requires 32]
...

The exact steps depend on which values/ranges are picked for the merge,
of course. In any case, there's no difference between the initial 32
values and the values added later.

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

>>>
>>
>> Not sure what you mean by "crash". Yes, it doesn't hit an assert,
>> because there's none when calculating distance for date. It however
>> should fail in the query plan check due to the incorrect order of
>> subtractions.
>>
>> Also, the commit message does not claim to fix overflow. In fact, it
>> says it can't overflow ...
>>
> 
> 
> Reading the commit message
> "Tests for overflows with dates and timestamps in BRIN ...
> 
> ...
> 
> The new regression tests check this for date and timestamp data types.
> It adds tables with data close to the allowed min/max values, and builds
> a minmax-multi index on it."
> 
> I expected the CREATE INDEX statement to throw an error or fail the
> "Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
> later commit mentions that the overflow is not possible.
> 

Hmmm, yeah. The comment should mention the date doesn't have issue with
overflows, but other bugs.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-19 Thread Dean Rasheed
On Thu, 19 Oct 2023, 05:32 Ashutosh Bapat, 
wrote:

> On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
>  wrote:
>
> >
> > I did use that many values to actually force "compaction" and merging of
> > points into ranges. We only keep 32 values per page range, so with 2
> > values we'll not build a range. You're right it may still trigger the
> > overflow (we probably still calculate distances, I didn't realize that),
> > but without the compaction we can't check the query plans.
> >
> > However, I agree 60 values may be a bit too much. And I realized we can
> > reduce the count quite a bit by using the values_per_range option. We
> > could set it to 8 (which is the minimum).
> >
>
> I haven't read BRIN code, except the comments in the beginning of the
> file. From what you describe it seems we will store first 32 values as
> is, but later as the number of values grow create ranges from those?
> Please point me to the relevant source code/documentation. Anyway, if
> we can reduce the number of rows we insert, that will be good.
>

I don't think 60 values is excessive, but instead of listing them out by
hand, perhaps use generate_series().

Regards,
Dean


Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
 wrote:

>
> I did use that many values to actually force "compaction" and merging of
> points into ranges. We only keep 32 values per page range, so with 2
> values we'll not build a range. You're right it may still trigger the
> overflow (we probably still calculate distances, I didn't realize that),
> but without the compaction we can't check the query plans.
>
> However, I agree 60 values may be a bit too much. And I realized we can
> reduce the count quite a bit by using the values_per_range option. We
> could set it to 8 (which is the minimum).
>

I haven't read BRIN code, except the comments in the beginning of the
file. From what you describe it seems we will store first 32 values as
is, but later as the number of values grow create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.

> >
>
> Not sure what you mean by "crash". Yes, it doesn't hit an assert,
> because there's none when calculating distance for date. It however
> should fail in the query plan check due to the incorrect order of
> subtractions.
>
> Also, the commit message does not claim to fix overflow. In fact, it
> says it can't overflow ...
>


Reading the commit message
"Tests for overflows with dates and timestamps in BRIN ...

...

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it."

I expected the CREATE INDEX statement to throw an error or fail the
"Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
later commit mentions that the overflow is not possible.

> >
> > We may want to combine various test cases though. Like the test adding
> > infinity and extreme values could be combined. Also the number of
> > values it inserts in the table for the reasons stated above.
> >
>
> I prefer not to do that. I find it more comprehensible to keep the tests
> separate / testing different things. If the tests were expensive to
> setup or something like that, that'd be a different situation.

Fair enough.

> >>
> >
> > Right. Do we highlight that in the commit message so that the person
> > writing release notes picks it up from there?
> >
>
> Yes, I think I'll mention what impact each issue can have on indexes.

Thanks.


-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra
On 10/18/23 12:47, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> Here's a couple cleaned-up patches fixing the various discussed here.
>> I've tried to always add a regression test demonstrating the issue
>> first, and then fix it in the next patch.
> 
> It will be good to commit the test changes as well.
> 

I do plan to commit them, ofc. I was just explaining why I'm adding the
tests first, and then fixing the issue in a separate commit.

>>
>> In particular, this deals with these issues:
>>
>> 1) overflows in distance calculation for large timestamp values (0002)
> 
> I could reduce the SQL for timestamp overflow test to just
> -- test overflows during CREATE INDEX with extreme timestamp values
> CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);
> 
> SET datestyle TO iso;
> 
> INSERT INTO brin_timestamp_test VALUES
> ('4713-01-01 00:00:30 BC'),
> ('294276-12-01 00:00:01');
> 
> CREATE INDEX ON brin_timestamp_test USING brin (a
> timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
> 
> I didn't understand the purpose of adding 60 odd values to the table.
> It didn't tell which of those values triggers the overflow. Minimal
> set above is much easier to understand IMO. Using a temporary table
> just avoids DROP TABLE statement. But I am ok if you want to use
> non-temporary table with DROP.
> 
> Code changes in 0002 look fine. Do we want to add a comment "cast to a
> wider datatype to avoid overflow"? Or is that too explicit?
> 
> The code changes fix the timestamp issue but there's a diff in case of
> 

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).

>>
>> 2) incorrect subtraction in distance for date values (0003)
> 
> The test case for date brin index didn't crash though. Even after
> applying 0003 patch. The reason why date subtraction can't overflow is
> a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
> because of the code below
> #define IS_VALID_DATE(d) \
> ((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
> (d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
> This prevents the lower side to be well within the negative int32
> overflow threshold and we always subtract higher value from the lower
> one. May be good to elaborate this? A later patch does use float 8
> casting eliminating "any" possibility of overflow. So the comment may
> not be necessary after squashing all the changes.
> 

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

>>
>> 3) incorrect distance for infinite date/timestamp values (0005)
> 
> The tests could use a minimal set of rows here too.
> 
> The code changes look fine and fix the problem seen with the tests alone.
> 

OK

>>
>> 4) failing distance for extreme interval values (0007)
> 
> I could reproduce the issue with a minimal set of values
> -- test handling of overflow for interval values
> CREATE TABLE brin_interval_test(a INTERVAL);
> 
> INSERT INTO brin_interval_test VALUES
> ('17785 years'),
> ('-17800 years');
> 
> CREATE INDEX ON brin_interval_test USING brin (a
> interval_minmax_multi_ops) WITH (pages_per_range=1);
> DROP TABLE brin_interval_test;
> 
> The code looks fine and fixed the issue seen with the test.
> 
> We may want to combine various test cases though. Like the test adding
> infinity and extreme values could be combined. Also the number of
> values it inserts in the table for the reasons stated above.
> 

I prefer not to do that. I find it more comprehensible to keep the tests
separate / testing different things. If the tests were expensive to
setup or something like that, that'd be a different situation.

>>
>> All the problems except "2" have been discussed earlier, but this seems
>> a bit more serious than the other issues, as it's easier to hit. It
>> subtracts the values in the opposite order (smaller - larger), so the
>> distances are negated. Which means we actually merge the values from the
>> most distant ones, and thus are "guaranteed" to build very a very
>> inefficient summary. People with multi-minmax indexes on "date" columns
>> probably will need to reindex.
>>
> 
> Right. Do we highlight that in the commit message so that the person
> writing release notes picks it up from there?
> 

Yes, I think I'll mention 

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra



On 10/18/23 12:13, Dean Rasheed wrote:
> On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
>  wrote:
>>
>> Here's a couple cleaned-up patches fixing the various discussed here.
>> I've tried to always add a regression test demonstrating the issue
>> first, and then fix it in the next patch.
>>
> 
> This looks good to me.
> 
>> 2) incorrect subtraction in distance for date values (0003)
>>
>> All the problems except "2" have been discussed earlier, but this seems
>> a bit more serious than the other issues, as it's easier to hit. It
>> subtracts the values in the opposite order (smaller - larger), so the
>> distances are negated. Which means we actually merge the values from the
>> most distant ones, and thus are "guaranteed" to build very a very
>> inefficient summary.
>>
> 
> Yeah, that's not good. Amusingly this accidentally made infinite dates
> behave correctly, since they were distance 0 away from anything else,
> which was larger than all the other negative distances! But yes, that
> needed fixing properly.
> 

Right. Apparently two wrongs can make a right, after all ;-)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Dean Rasheed
On Wed, 18 Oct 2023 at 09:16, Tomas Vondra
 wrote:
>
> BTW when adding the tests with extreme values, I noticed this:
>
>   test=# select '5874897-01-01'::date;
>date
>   ---
>5874897-01-01
>   (1 row)
>
>   test=# select '5874897-01-01'::date + '1 second'::interval;
>   ERROR:  date out of range for timestamp
>

That's correct because date + interval returns timestamp, and the
value is out of range for a timestamp. This is equivalent to:

select '5874897-01-01'::date::timestamp + '1 second'::interval;
ERROR:  date out of range for timestamp

and I think it's good that it gives a different error from this:

select '294276-01-01'::date::timestamp + '1 year'::interval;
ERROR:  timestamp out of range

so you can tell that the overflow in the first case happens before the addition.

Of course a side effect of internally casting first is that you can't
do things like this:

select '5874897-01-01'::date - '5872897 years'::interval;
ERROR:  date out of range for timestamp

which arguably ought to return '2000-01-01 00:00:00'. In practice
though, I think it would be far more trouble than it's worth trying to
change that.

Regards,
Dean




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
 wrote:
>
> Hi,
>
> Here's a couple cleaned-up patches fixing the various discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.

It will be good to commit the test changes as well.

>
> In particular, this deals with these issues:
>
> 1) overflows in distance calculation for large timestamp values (0002)

I could reduce the SQL for timestamp overflow test to just
-- test overflows during CREATE INDEX with extreme timestamp values
CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);

SET datestyle TO iso;

INSERT INTO brin_timestamp_test VALUES
('4713-01-01 00:00:30 BC'),
('294276-12-01 00:00:01');

CREATE INDEX ON brin_timestamp_test USING brin (a
timestamptz_minmax_multi_ops) WITH (pages_per_range=1);

I didn't understand the purpose of adding 60 odd values to the table.
It didn't tell which of those values triggers the overflow. Minimal
set above is much easier to understand IMO. Using a temporary table
just avoids DROP TABLE statement. But I am ok if you want to use
non-temporary table with DROP.

Code changes in 0002 look fine. Do we want to add a comment "cast to a
wider datatype to avoid overflow"? Or is that too explicit?

The code changes fix the timestamp issue but there's a diff in case of

>
> 2) incorrect subtraction in distance for date values (0003)

The test case for date brin index didn't crash though. Even after
applying 0003 patch. The reason why date subtraction can't overflow is
a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
because of the code below
#define IS_VALID_DATE(d) \
((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
(d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
This prevents the lower side to be well within the negative int32
overflow threshold and we always subtract higher value from the lower
one. May be good to elaborate this? A later patch does use float 8
casting eliminating "any" possibility of overflow. So the comment may
not be necessary after squashing all the changes.

>
> 3) incorrect distance for infinite date/timestamp values (0005)

The tests could use a minimal set of rows here too.

The code changes look fine and fix the problem seen with the tests alone.

>
> 4) failing distance for extreme interval values (0007)

I could reproduce the issue with a minimal set of values
-- test handling of overflow for interval values
CREATE TABLE brin_interval_test(a INTERVAL);

INSERT INTO brin_interval_test VALUES
('17785 years'),
('-17800 years');

CREATE INDEX ON brin_interval_test USING brin (a
interval_minmax_multi_ops) WITH (pages_per_range=1);
DROP TABLE brin_interval_test;

The code looks fine and fixed the issue seen with the test.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also the number of
values it inserts in the table for the reasons stated above.

>
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build very a very
> inefficient summary. People with multi-minmax indexes on "date" columns
> probably will need to reindex.
>

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

--
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Dean Rasheed
On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
 wrote:
>
> Here's a couple cleaned-up patches fixing the various discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.
>

This looks good to me.

> 2) incorrect subtraction in distance for date values (0003)
>
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build very a very
> inefficient summary.
>

Yeah, that's not good. Amusingly this accidentally made infinite dates
behave correctly, since they were distance 0 away from anything else,
which was larger than all the other negative distances! But yes, that
needed fixing properly.

Thanks for taking care of this.

Regards,
Dean




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra
On 10/17/23 22:25, Tomas Vondra wrote:
> Hi,
> 
> Here's a couple cleaned-up patches fixing the various discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.
> 
> In particular, this deals with these issues:
> 
> 1) overflows in distance calculation for large timestamp values (0002)
> 
> 2) incorrect subtraction in distance for date values (0003)
> 
> 3) incorrect distance for infinite date/timestamp values (0005)
> 
> 4) failing distance for extreme interval values (0007)
> 
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build very a very
> inefficient summary. People with multi-minmax indexes on "date" columns
> probably will need to reindex.
> 

BTW when adding the tests with extreme values, I noticed this:

  test=# select '5874897-01-01'::date;
   date
  ---
   5874897-01-01
  (1 row)

  test=# select '5874897-01-01'::date + '1 second'::interval;
  ERROR:  date out of range for timestamp

IIUC this happens because the first thing date_pl_interval does is
date2timestamp, ignoring the fact that the ranges of those data types
are different - dates allow values up to '5874897 AD' while timestamps
only allows values up to '294276 AD'.

This seems to be a long-standing behavior, added by a9e08392dd6f in
2004. Not sure how serious it is, I just noticed when I tried to do
arithmetics on the extreme values in tests.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-17 Thread Tomas Vondra
Hi,

Here's a couple cleaned-up patches fixing the various discussed here.
I've tried to always add a regression test demonstrating the issue
first, and then fix it in the next patch.

In particular, this deals with these issues:

1) overflows in distance calculation for large timestamp values (0002)

2) incorrect subtraction in distance for date values (0003)

3) incorrect distance for infinite date/timestamp values (0005)

4) failing distance for extreme interval values (0007)

All the problems except "2" have been discussed earlier, but this seems
a bit more serious than the other issues, as it's easier to hit. It
subtracts the values in the opposite order (smaller - larger), so the
distances are negated. Which means we actually merge the values from the
most distant ones, and thus are "guaranteed" to build very a very
inefficient summary. People with multi-minmax indexes on "date" columns
probably will need to reindex.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom e23ca4e53d352e05951fb314dea347682794c25b Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 17 Oct 2023 18:18:53 +0200
Subject: [PATCH 1/8] Tests for overflows with dates and timestamps in BRIN

When calculating distances for date and timestamp values for BRIN
minmax-multi indexes, we need to be careful about overflows for extreme
values. In that case the distance is negative, resulting in building of
inefficient summaries.

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it.
---
 src/test/regress/expected/brin_multi.out | 61 ++
 src/test/regress/sql/brin_multi.sql  | 65 
 2 files changed, 126 insertions(+)

diff --git a/src/test/regress/expected/brin_multi.out b/src/test/regress/expected/brin_multi.out
index 9f46934c9be..d5bd600f8fd 100644
--- a/src/test/regress/expected/brin_multi.out
+++ b/src/test/regress/expected/brin_multi.out
@@ -823,3 +823,64 @@ SELECT COUNT(*) FROM brin_test_multi_2 WHERE a = 'aab32389-22bc-c25a-6f60-6eb525
 
 DROP TABLE brin_test_multi_2;
 RESET enable_seqscan;
+-- test overflows during CREATE INDEX with extreme timestamp values
+CREATE TABLE brin_timestamp_test(a TIMESTAMPTZ);
+SET datestyle TO iso;
+INSERT INTO brin_timestamp_test VALUES
+('4713-01-01 00:00:01 BC'), ('4713-01-01 00:00:02 BC'), ('4713-01-01 00:00:03 BC'),
+('4713-01-01 00:00:04 BC'), ('4713-01-01 00:00:05 BC'), ('4713-01-01 00:00:06 BC'),
+('4713-01-01 00:00:07 BC'), ('4713-01-01 00:00:08 BC'), ('4713-01-01 00:00:09 BC'),
+('4713-01-01 00:00:10 BC'), ('4713-01-01 00:00:11 BC'), ('4713-01-01 00:00:12 BC'),
+('4713-01-01 00:00:13 BC'), ('4713-01-01 00:00:14 BC'), ('4713-01-01 00:00:15 BC'),
+('4713-01-01 00:00:16 BC'), ('4713-01-01 00:00:17 BC'), ('4713-01-01 00:00:18 BC'),
+('4713-01-01 00:00:19 BC'), ('4713-01-01 00:00:20 BC'), ('4713-01-01 00:00:21 BC'),
+('4713-01-01 00:00:22 BC'), ('4713-01-01 00:00:23 BC'), ('4713-01-01 00:00:24 BC'),
+('4713-01-01 00:00:25 BC'), ('4713-01-01 00:00:26 BC'), ('4713-01-01 00:00:27 BC'),
+('4713-01-01 00:00:28 BC'), ('4713-01-01 00:00:29 BC'), ('4713-01-01 00:00:30 BC'),
+('294276-12-01 00:00:01'), ('294276-12-01 00:00:02'), ('294276-12-01 00:00:03'),
+('294276-12-01 00:00:04'), ('294276-12-01 00:00:05'), ('294276-12-01 00:00:06'),
+('294276-12-01 00:00:07'), ('294276-12-01 00:00:08'), ('294276-12-01 00:00:09'),
+('294276-12-01 00:00:10'), ('294276-12-01 00:00:11'), ('294276-12-01 00:00:12'),
+('294276-12-01 00:00:13'), ('294276-12-01 00:00:14'), ('294276-12-01 00:00:15'),
+('294276-12-01 00:00:16'), ('294276-12-01 00:00:17'), ('294276-12-01 00:00:18'),
+('294276-12-01 00:00:19'), ('294276-12-01 00:00:20'), ('294276-12-01 00:00:21'),
+('294276-12-01 00:00:22'), ('294276-12-01 00:00:23'), ('294276-12-01 00:00:24'),
+('294276-12-01 00:00:25'), ('294276-12-01 00:00:26'), ('294276-12-01 00:00:27'),
+('294276-12-01 00:00:28'), ('294276-12-01 00:00:29'), ('294276-12-01 00:00:30');
+CREATE INDEX ON brin_timestamp_test USING brin (a timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
+DROP TABLE brin_timestamp_test;
+-- test overflows during CREATE INDEX with extreme date values
+CREATE TABLE brin_date_test(a DATE);
+INSERT INTO brin_date_test VALUES
+('4713-01-01 BC'), ('4713-01-02 BC'), ('4713-01-03 BC'), ('4713-01-04 BC'),
+('4713-01-05 BC'), ('4713-01-06 BC'), ('4713-01-07 BC'), ('4713-01-08 BC'),
+('4713-01-09 BC'), ('4713-01-10 BC'), ('4713-01-11 BC'), ('4713-01-12 BC'),
+('4713-01-13 BC'), ('4713-01-14 BC'), ('4713-01-15 BC'), ('4713-01-16 BC'),
+('4713-01-17 BC'), ('4713-01-18 BC'), ('4713-01-19 BC'), ('4713-01-20 BC'),
+('4713-01-21 BC'), ('4713-01-22 BC'), ('4713-01-23 BC'), ('4713-01-24 BC'),
+('4713-01-25 BC'), ('4713-01-26 BC'), ('4713-01-27 BC'), ('4713-01-28 BC'),
+('4713-01-29 BC'), ('4713-01-30 BC'), ('4713-01-31 BC'),
+('5874897-12-01'), ('5874897-12-02'), 

Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Ashutosh Bapat
On Mon, Oct 16, 2023 at 7:33 PM Tomas Vondra
 wrote:
>
> On 10/16/23 11:25, Ashutosh Bapat wrote:
> > Thanks Tomas for bringing this discussion to hackers.
> >
> >
> > On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  
> > wrote:
> >>
> >> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
> >>  wrote:
> >>>
> >>> I do plan to backpatch this, yes. I don't think there are many people
> >>> affected by this (few people are using infinite dates/timestamps, but
> >>> maybe the overflow could be more common).
> >>>
> >
> > The example you gave is missing CREATE INDEX command. Is it "create
> > index test_idx_a on test using brin(a);"
>
> Ah, you're right - apologies. FWIW when testing I usually use 1-page
> ranges, to possibly hit more combinations / problems. More importantly,
> it needs to specify the opclass, otherwise it'll use the default minmax
> opclass which does not use distance at all:
>
> create index test_idx_a on test
>  using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);
>

Thanks.

> >
> > Do already create indexes have this issue? Do they need to rebuilt
> > after upgrading?
> >
>
> Yes, existing indexes will have inefficient ranges. I'm not sure we want
> to push people to reindex everything, the issue seem somewhat unlikely
> in practice.
>

If the column has infinity values only then they need to rebuild the
index. Such users may notice this bug fix in the release notes and
decide to rebuild the index themselves.

> >> I think this should be rewritten to compute delta from ia and ib
> >> without going via an intermediate Interval value. I.e., instead of
> >> computing "result", just do something like
> >>
> >> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
> >> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
> >> days += (int64) ib->day - (int64) ia->day;
> >> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
> >>
> >> then convert to double precision as it does now:
> >>
> >> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
> >>
> >
> > Given Tomas's explanation of how these functions are supposed to work,
> > I think your suggestions is better.
> >
> > I was worried that above calculations may not produce the same result
> > as the current code when there is no error because modulo and integer
> > division are not distributive over subtraction. But it looks like
> > together they behave as normal division which is distributive over
> > subtraction. I couldn't find an example where that is not true.
> >
> > Tomas, you may want to incorporate this in your patch. If not, I will
> > incorporate it in my infinite interval patchset in [1].
> >
>
> I'd rather keep it as separate patch, although maybe let's deal with it
> separately from the larger patches. It's a bug, and having it in a patch
> set that adds a feature does not seem like a good idea (or maybe I don't
> understand what the other thread does, I haven't looked very closely).
>

If you incorporate these changes, I will need to remove 0009, which
mostly rewrites that function, from my patchset. If you don't, my
patch rewrites anyway. Either way is fine with me.

-- 
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Tomas Vondra
On 10/16/23 11:25, Ashutosh Bapat wrote:
> Thanks Tomas for bringing this discussion to hackers.
> 
> 
> On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  wrote:
>>
>> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
>>  wrote:
>>>
>>> I do plan to backpatch this, yes. I don't think there are many people
>>> affected by this (few people are using infinite dates/timestamps, but
>>> maybe the overflow could be more common).
>>>
> 
> The example you gave is missing CREATE INDEX command. Is it "create
> index test_idx_a on test using brin(a);"

Ah, you're right - apologies. FWIW when testing I usually use 1-page
ranges, to possibly hit more combinations / problems. More importantly,
it needs to specify the opclass, otherwise it'll use the default minmax
opclass which does not use distance at all:

create index test_idx_a on test
 using brin (a timestamptz_minmax_multi_ops) with (pages_per_range=1);

> 
> Do already create indexes have this issue? Do they need to rebuilt
> after upgrading?
> 

Yes, existing indexes will have inefficient ranges. I'm not sure we want
to push people to reindex everything, the issue seem somewhat unlikely
in practice.

>>
>> OK, though I doubt that such values are common in practice.
>>
>> There's also an overflow problem in
>> brin_minmax_multi_distance_interval() though, and I think that's worse
>> because overflows there throw "interval out of range" errors, which
>> can prevent index creation or inserts.
>>
>> There's a patch (0009 in [1]) as part of the infinite interval work,
>> but that just uses interval_mi(), and so doesn't fix the
>> interval-out-of-range errors, except for infinite intervals, which are
>> treated as special cases, which I don't think is really necessary.
>>
> 
> Right. I used interval_mi() to preserve the finite value behaviour as
> is. But ...
> 
>> I think this should be rewritten to compute delta from ia and ib
>> without going via an intermediate Interval value. I.e., instead of
>> computing "result", just do something like
>>
>> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
>> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
>> days += (int64) ib->day - (int64) ia->day;
>> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
>>
>> then convert to double precision as it does now:
>>
>> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
>>
> 
> Given Tomas's explanation of how these functions are supposed to work,
> I think your suggestions is better.
> 
> I was worried that above calculations may not produce the same result
> as the current code when there is no error because modulo and integer
> division are not distributive over subtraction. But it looks like
> together they behave as normal division which is distributive over
> subtraction. I couldn't find an example where that is not true.
> 
> Tomas, you may want to incorporate this in your patch. If not, I will
> incorporate it in my infinite interval patchset in [1].
> 

I'd rather keep it as separate patch, although maybe let's deal with it
separately from the larger patches. It's a bug, and having it in a patch
set that adds a feature does not seem like a good idea (or maybe I don't
understand what the other thread does, I haven't looked very closely).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-16 Thread Ashutosh Bapat
Thanks Tomas for bringing this discussion to hackers.


On Fri, Oct 13, 2023 at 8:58 PM Dean Rasheed  wrote:
>
> On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
>  wrote:
> >
> > I do plan to backpatch this, yes. I don't think there are many people
> > affected by this (few people are using infinite dates/timestamps, but
> > maybe the overflow could be more common).
> >

The example you gave is missing CREATE INDEX command. Is it "create
index test_idx_a on test using brin(a);"

Do already create indexes have this issue? Do they need to rebuilt
after upgrading?

>
> OK, though I doubt that such values are common in practice.
>
> There's also an overflow problem in
> brin_minmax_multi_distance_interval() though, and I think that's worse
> because overflows there throw "interval out of range" errors, which
> can prevent index creation or inserts.
>
> There's a patch (0009 in [1]) as part of the infinite interval work,
> but that just uses interval_mi(), and so doesn't fix the
> interval-out-of-range errors, except for infinite intervals, which are
> treated as special cases, which I don't think is really necessary.
>

Right. I used interval_mi() to preserve the finite value behaviour as
is. But ...

> I think this should be rewritten to compute delta from ia and ib
> without going via an intermediate Interval value. I.e., instead of
> computing "result", just do something like
>
> dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
> days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
> days += (int64) ib->day - (int64) ia->day;
> days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);
>
> then convert to double precision as it does now:
>
> delta = (double) days + dayfraction / (double) USECS_PER_DAY;
>

Given Tomas's explanation of how these functions are supposed to work,
I think your suggestions is better.

I was worried that above calculations may not produce the same result
as the current code when there is no error because modulo and integer
division are not distributive over subtraction. But it looks like
together they behave as normal division which is distributive over
subtraction. I couldn't find an example where that is not true.

Tomas, you may want to incorporate this in your patch. If not, I will
incorporate it in my infinite interval patchset in [1].

[1] 
https://www.postgresql.org/message-id/CAExHW5u1JE7dxK=wlzqhcszntoxqzjdiermhrepw6r8w6kc...@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-13 Thread Dean Rasheed
On Fri, 13 Oct 2023 at 13:17, Tomas Vondra
 wrote:
>
> I do plan to backpatch this, yes. I don't think there are many people
> affected by this (few people are using infinite dates/timestamps, but
> maybe the overflow could be more common).
>

OK, though I doubt that such values are common in practice.

There's also an overflow problem in
brin_minmax_multi_distance_interval() though, and I think that's worse
because overflows there throw "interval out of range" errors, which
can prevent index creation or inserts.

There's a patch (0009 in [1]) as part of the infinite interval work,
but that just uses interval_mi(), and so doesn't fix the
interval-out-of-range errors, except for infinite intervals, which are
treated as special cases, which I don't think is really necessary.

I think this should be rewritten to compute delta from ia and ib
without going via an intermediate Interval value. I.e., instead of
computing "result", just do something like

dayfraction = (ib->time % USECS_PER_DAY) - (ia->time % USECS_PER_DAY);
days = (ib->time / USECS_PER_DAY) - (ia->time / USECS_PER_DAY);
days += (int64) ib->day - (int64) ia->day;
days += ((int64) ib->month - (int64) ia->month) * INT64CONST(30);

then convert to double precision as it does now:

delta = (double) days + dayfraction / (double) USECS_PER_DAY;

So the first part is exact 64-bit integer arithmetic, with no chance
of overflow, and it'll handle "infinite" intervals just fine, when
that feature gets added.

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/CAExHW5u1JE7dxK=wlzqhcszntoxqzjdiermhrepw6r8w6kc...@mail.gmail.com




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-13 Thread Tomas Vondra
On 10/13/23 14:04, Dean Rasheed wrote:
> On Fri, 13 Oct 2023 at 11:44, Tomas Vondra
>  wrote:
>>
>> On 10/13/23 11:21, Dean Rasheed wrote:
>>>
>>> Is this only inefficient? Or can it also lead to wrong query results?
>>
>> I don't think it can produce incorrect results. It only affects which
>> values we "merge" into an interval when building the summaries.
>>
> 
> Ah, I get it now. These "distance" support functions are only used to
> see how far apart 2 ranges are, for the purposes of the algorithm that
> merges the 2 closest ranges. So if it gets it wrong, it only leads to
> a poor choice of ranges to merge, making the query inefficient, but
> still correct.
> 

Right.

> Presumably, that also makes this kind of change safe to back-patch
> (not sure if you were planning to do that?), since it will only affect
> range merging choices when inserting new values into existing indexes.
> 

I do plan to backpatch this, yes. I don't think there are many people
affected by this (few people are using infinite dates/timestamps, but
maybe the overflow could be more common).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-13 Thread Dean Rasheed
On Fri, 13 Oct 2023 at 11:44, Tomas Vondra
 wrote:
>
> On 10/13/23 11:21, Dean Rasheed wrote:
> >
> > Is this only inefficient? Or can it also lead to wrong query results?
>
> I don't think it can produce incorrect results. It only affects which
> values we "merge" into an interval when building the summaries.
>

Ah, I get it now. These "distance" support functions are only used to
see how far apart 2 ranges are, for the purposes of the algorithm that
merges the 2 closest ranges. So if it gets it wrong, it only leads to
a poor choice of ranges to merge, making the query inefficient, but
still correct.

Presumably, that also makes this kind of change safe to back-patch
(not sure if you were planning to do that?), since it will only affect
range merging choices when inserting new values into existing indexes.

Regards,
Dean




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-13 Thread Tomas Vondra


On 10/13/23 11:21, Dean Rasheed wrote:
> On Thu, 12 Oct 2023 at 23:43, Tomas Vondra
>  wrote:
>>
>> Ashutosh Bapat reported me off-list a possible issue in how BRIN
>> minmax-multi calculate distance for infinite timestamp/date values.
>>
>> The current code does this:
>>
>> if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
>> PG_RETURN_FLOAT8(0);
>>
> 
> Yes indeed, that looks wrong. I noticed the same thing while reviewing
> the infinite interval patch.
> 
>> so means infinite values are "very close" to any other value, and thus
>> likely to be merged into a summary range. That's exactly the opposite of
>> what we want to do, possibly resulting in inefficient indexes.
>>
> 
> Is this only inefficient? Or can it also lead to wrong query results?
> 

I don't think it can produce incorrect results. It only affects which
values we "merge" into an interval when building the summaries.

>> Attached is a patch fixing this
>>
> 
> I wonder if it's actually necessary to give infinity any special
> handling at all for dates and timestamps. For those types, "infinity"
> is actually just INT_MIN/MAX, which compares correctly with any finite
> value, and will be much larger/smaller than any common value, so it
> seems like it isn't necessary to give "infinite" values any special
> treatment. That would be consistent with date_cmp() and
> timestamp_cmp().
> 

Right, but 

> Something else that looks wrong about that BRIN code is that the
> integer subtraction might lead to overflow -- it is subtracting two
> integer values, then casting the result to float8. It should cast each
> input before subtracting, more like brin_minmax_multi_distance_int8().
> 

... it also needs to fix this, otherwise it overflows. Consider

 delta = dt2 - dt1;

and assume dt1 is INT64_MIN, or that dt2 is INT64_MAX.

> IOW, I think brin_minmax_multi_distance_date/timestamp could be made
> basically identical to brin_minmax_multi_distance_int4/8.
> 

Right. Attached is a patch doing it this way.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Companydiff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index f8b2a3f9bc6..3d5e6d0c8f6 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2084,10 +2084,13 @@ brin_minmax_multi_distance_date(PG_FUNCTION_ARGS)
 	DateADT		dateVal1 = PG_GETARG_DATEADT(0);
 	DateADT		dateVal2 = PG_GETARG_DATEADT(1);
 
-	if (DATE_NOT_FINITE(dateVal1) || DATE_NOT_FINITE(dateVal2))
-		PG_RETURN_FLOAT8(0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dateVal1 <= dateVal2);
 
-	PG_RETURN_FLOAT8(dateVal1 - dateVal2);
+	PG_RETURN_FLOAT8((double) dateVal2 - (double) dateVal1);
 }
 
 /*
@@ -2136,19 +2139,16 @@ brin_minmax_multi_distance_timetz(PG_FUNCTION_ARGS)
 Datum
 brin_minmax_multi_distance_timestamp(PG_FUNCTION_ARGS)
 {
-	float8		delta = 0;
-
 	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
 	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
 
-	if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
-		PG_RETURN_FLOAT8(0);
-
-	delta = dt2 - dt1;
-
-	Assert(delta >= 0);
+	/*
+	 * We know the values are range boundaries, but the range may be collapsed
+	 * (i.e. single points), with equal values.
+	 */
+	Assert(dt1 <= dt2);
 
-	PG_RETURN_FLOAT8(delta);
+	PG_RETURN_FLOAT8((double) dt2 - (double) dt1);
 }
 
 /*


Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-13 Thread Dean Rasheed
On Thu, 12 Oct 2023 at 23:43, Tomas Vondra
 wrote:
>
> Ashutosh Bapat reported me off-list a possible issue in how BRIN
> minmax-multi calculate distance for infinite timestamp/date values.
>
> The current code does this:
>
> if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
> PG_RETURN_FLOAT8(0);
>

Yes indeed, that looks wrong. I noticed the same thing while reviewing
the infinite interval patch.

> so means infinite values are "very close" to any other value, and thus
> likely to be merged into a summary range. That's exactly the opposite of
> what we want to do, possibly resulting in inefficient indexes.
>

Is this only inefficient? Or can it also lead to wrong query results?

> Attached is a patch fixing this
>

I wonder if it's actually necessary to give infinity any special
handling at all for dates and timestamps. For those types, "infinity"
is actually just INT_MIN/MAX, which compares correctly with any finite
value, and will be much larger/smaller than any common value, so it
seems like it isn't necessary to give "infinite" values any special
treatment. That would be consistent with date_cmp() and
timestamp_cmp().

Something else that looks wrong about that BRIN code is that the
integer subtraction might lead to overflow -- it is subtracting two
integer values, then casting the result to float8. It should cast each
input before subtracting, more like brin_minmax_multi_distance_int8().

IOW, I think brin_minmax_multi_distance_date/timestamp could be made
basically identical to brin_minmax_multi_distance_int4/8.

Regards,
Dean