On Mon, Oct 1, 2012 at 3:22 AM, Jeff Davis <pg...@j-davis.com> wrote:

> On Tue, 2012-09-04 at 17:27 +0400, Alexander Korotkov wrote:
> > Addon patch is attached. Actually, I don't get your intention of
> > introducing STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to
> > leave it as empty frac in distinct stakind or replace this stakind
> > with STATISTIC_KIND_LENGTH_HISTOGRAM? In the attached
> > patch STATISTIC_KIND_RANGE_EMPTY_FRAC is replaced
> > with STATISTIC_KIND_LENGTH_HISTOGRAM.
>
> Review comments:
>
> 1. In compute_range_stats, you need to guard against the case where
> there is no subdiff function. Perhaps default to 1.0 or something?
>

Let it be 1.0 without subdiff function. However, there is not so much use
of this method of estimation without subdiff. But, probably it's better
than nothing.

2. I think it would be helpful to add comments regarding what happens
> when lengths are identical, right now it's a little confusing. For
> instance, the comment: "Generate a length histogram slot entry if there
> are at least two length values" doesn't seem right, because the
> condition still matches even if there is only one distinct value.
>
I've rephrased comment. Not it's implicitly says that collected values are
not necessary distinct.


> 3. It looks like get_distance also needs to guard against a missing
> subdiff.
>

Same to compute_range_stats. Let default value be 1.0.


> 4. There are 3 binary search functions, which seems a little excessive:
>   * rbound_bsearch: greatest i such that hist[i] < v; or -1
>   * rbound_bsearch_equal: greatest i such that:
>       hist[i] <= v and (i=0 or hist[i-1] != hist[i]); or -1
>   * length_hist_bsearch: least i such that hist[i] >= v;
>       or length of hist
> (let me know if I misunderstood the definitions)
> At a minimum, we need more consistent and informative names. Also, the
> definition of rbound_bsearch_equal is a little confusing because it's
> looking for the highest index among distinct values, but the lowest
> index among identical values. Do you see a way to refactor these to be a
> little easier to understand?
>

Actually, goal of rbound_bsearch_equal is to find histogram bin to start
interpolation from. I've renamed it to rbound_bsearch_bin and added
corresponding comment.

------
With best regards,
Alexander Korotkov.

Attachment: range_stat-0.8.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to