Re: pg_stats and range statistics

Egor Rogov Mon, 12 Jul 2021 04:11:10 -0700

Hi,

thanks for the review and corrections.


On 11.07.2021 21:54, Soumyadeep Chakraborty wrote:

Hello,

This should have been added with [1].

Excerpt from the documentation:
"pg_stats is also designed to present the information in a more readable
format than the underlying catalog — at the cost that its schema must
be extended whenever new slot types are defined for pg_statistic." [2]

So, I added a reminder in pg_statistic.h.


Good point.

Attached is v2 of this patch with some cosmetic changes.

I wonder why "TODO: catalog version bump"? This patch doesn't changecatalog structure, or I miss something?

Renamed the columns a
bit and updated the docs to be a bit more descriptive.
(range_length_empty_frac -> empty_range_frac, range_bounds_histogram ->
range_bounds_histograms)

I intended to make the same prefix ("range_") for all columns concernedwith range types, although I'm fine with the proposed naming.

One question:

We do have the option of representing the histogram of lower bounds separately
from the histogram of upper bounds, as two separate view columns. Don't know if
there is much utility though and there is a fair bit of added complexity: see
below. Thoughts?

I thought about it too, and decided not to transform the underlying datastructure. As far as I can see, pg_stats never employed suchtransformations. For example, STATISTIC_KIND_DECHIST is an arraycontaining the histogram followed by the average in its last element. Itis shown in pg_stats.elem_count_histogram as is, although it arguablymay be splitted into two fields. All in all, I believe pg_stats's job isto "unpack" stavalues and stanumbers into meaningful fields, and not totry to go deeper than that.


My attempts via SQL (unnest -> lower|upper -> array_agg) were futile given
unnest does not play nice with anyarray. For instance:

select unnest(stavalues1) from pg_statistic;
ERROR:  cannot determine element type of "anyarray" argument

Maybe the only option is to write a UDF pg_get_{lower|upper}_bounds_histogram
which can do something similar to what calc_hist_selectivity does:

/*
  * Convert histogram of ranges into histograms of its lower and upper
  * bounds.
  */
nhist = hslot.nvalues;
hist_lower = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
hist_upper = (RangeBound *) palloc(sizeof(RangeBound) * nhist);
for (i = 0; i < nhist; i++)
{
bool empty;

range_deserialize(rng_typcache, DatumGetRangeTypeP(hslot.values[i]),
   &hist_lower[i], &hist_upper[i], &empty);
/* The histogram should not contain any empty ranges */
if (empty)
elog(ERROR, "bounds histogram contains an empty range");
}

This is looking good and ready.

[1] 
https://github.com/postgres/postgres/commit/918eee0c497c88260a2e107318843c9b1947bc6f
[2] https://www.postgresql.org/docs/devel/view-pg-stats.html

Regards,
Soumyadeep (VMware)

Re: pg_stats and range statistics

Reply via email to