On Tue, Mar 8, 2016 at 3:36 PM, Joel Jacobson <j...@trustly.com> wrote:

> Hi Alex,
>
> Thanks for excellent research.
>

Joel,

Thank you for spending your time to run these :-)

I've ran your queries against Trustly's production database and I can
> confirm your findings, the results are similar:
>
> WITH ...
> SELECT count(1),
>        min(hist_ratio)::real,
>        avg(hist_ratio)::real,
>        max(hist_ratio)::real,
>        stddev(hist_ratio)::real
>   FROM stats2
>  WHERE histogram_bounds IS NOT NULL;
>
> -[ RECORD 1 ]----
> count  | 2814
> min    | 0.193548
> avg    | 0.927357
> max    | 1
> stddev | 0.164134
>
>
> WHERE distinct_hist < num_hist
> -[ RECORD 1 ]----
> count  | 624
> min    | 0.193548
> avg    | 0.672407
> max    | 0.990099
> stddev | 0.194901
>
>
> WITH ..
> SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
> WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
>        n_distinct, null_frac,
>        num_mcv, most_common_vals, most_common_freqs,
>        mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
>        distinct_hist, num_hist, hist_ratio,
>        histogram_bounds
>   FROM stats2
>  ORDER BY hist_ratio
>  LIMIT 1;
>
>  -[ RECORD 1
> ]-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> columnname        | public.x.y
> n_distinct        | 103
> null_frac         | 0
> num_mcv           | 10
> most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
> most_common_freqs |
>
> {0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667}
> mcv_frac          | 0.971267
> nonnull_mcv_frac  | 0.971267
> distinct_hist     | 18
> num_hist          | 93
> hist_ratio        | 0.193548387096774
> histogram_bounds  |
>
> {10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}
>

I don't want to be asking for too much here, but is there a chance you
could try the effects of the proposed patch on an offline copy of your
database?

Do you envision or maybe have experienced problems with query plans
referring to the columns that are near the top of the above hist_ratio
report?  In other words: what are the practical implications for you with
the values being duplicated rather badly throughout the histogram like in
the example you shown?

Thank you!
--
Alex

Reply via email to