On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote: > Justin Pryzby <pry...@telsasoft.com> writes: > > That seems to be only used for ineq_histogram_selectivity() interpolation of > > histogram bins. It looks to me that at least isn't working for "special > > values", and needs to use something other than isnan(). > > Uh, what? This seems completely wrong to me. We could possibly > promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but > I don't really see the point. They'll compare to other timestamp > values correctly without that, cf timestamp_cmp_internal(). > The example you give seems to me to be working sanely, or at least > as sanely as it can given the number of histogram points available, > with the existing code. In any case, shoving NaNs into the > computation is not going to make anything better.
As I see it, the problem is that the existing code tests for isnan(), but infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's absurdly large values being used as if they were isnormal(). src/include/datatype/timestamp.h:#define DT_NOBEGIN PG_INT64_MIN src/include/datatype/timestamp.h-#define DT_NOEND PG_INT64_MAX On v12, my test gives: |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes'); |INSERT INTO t VALUES('-infinity'); |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t; |explain analyze SELECT * FROM t WHERE t>='2010-12-29'; | Seq Scan on t (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1) vs patched master: |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes'); |INSERT INTO t VALUES('-infinity'); |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t; |explain analyze SELECT * FROM t WHERE t>='2010-12-29'; | Seq Scan on t (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1) IMO 146 rows is a reasonable estimate given a single histogram bucket of infinite width, and 3 rows is a less reasonable result of returning INT64_MAX in one place and then handling it as a normal value. The comments in ineq_histogram seem to indicate that this case is intended to get binfrac=0.5: | Watch out for the possibility that we got a NaN or Infinity from the | division. This can happen despite the previous checks, if for example "low" is | -Infinity. I changed to use INFINITY, -INFINITY and !isnormal() rather than nan() and isnan() (although binfrac is actually NAN at that point so the existing test is ok). Justin