Re: [HACKERS] Statistics and selectivity estimation for ranges

Alexander Korotkov Fri, 01 Mar 2013 06:23:15 -0800

On Wed, Feb 13, 2013 at 5:55 PM, Alexander Korotkov <[email protected]>wrote:


> On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas <
> [email protected]> wrote:
>
>> On 04.01.2013 10:42, Alexander Korotkov wrote:
>>
>>> /*
>>>  * Calculate selectivity of "&&" operator using histograms of range
>>> lower bounds
>>>  * and histogram of range lengths.
>>>  */
>>> static double
>>> calc_hist_selectivity_overlap(**TypeCacheEntry *typcache, RangeBound
>>> *lower,
>>>                                         RangeBound *upper, RangeBound
>>> *hist_lower, int hist_nvalues,
>>>                                                         Datum
>>> *length_hist_values, int length_hist_nvalues)
>>>
>>
>> We already have code to estimate &&, based on the lower and upper bound
>> histograms:
>>
>>                  case OID_RANGE_OVERLAP_OP:
>>>                 case OID_RANGE_CONTAINS_ELEM_OP:
>>>                         /*
>>>                          * A && B <=> NOT (A << B OR A >> B).
>>>                          *
>>>                          * "range @> elem" is equivalent to "range &&
>>> [elem,elem]". The
>>>                          * caller already constructed the singular range
>>> from the element
>>>                          * constant, so just treat it the same as &&.
>>>                          */
>>>                         hist_selec =
>>>                                 calc_hist_selectivity_scalar(**typcache,
>>> &const_lower, hist_upper,
>>>
>>>                  nhist, false);
>>>                         hist_selec +=
>>>                                 (1.0 - 
>>> calc_hist_selectivity_scalar(**typcache,
>>> &const_upper, hist_lower,
>>>
>>>                           nhist, true));
>>>                         hist_selec = 1.0 - hist_selec;
>>>                         break;
>>>
>>
>> I don't think the method based on lower bound and length histograms is
>> any better. In fact, my gut feeling is that it's less accurate. I'd suggest
>> dropping that part of the patch.
>>
>
> Right. This estimation has an accuracy of histogram, while estimation
> based on lower bound and length histograms rely on additional assumption
> about independence of lower bound and length histogram. We can sum A << B
> and A >> B probabilities because they are mutually exclusive. It's pretty
> evident but I would like to mention it in the comments, because typical
> assumption about events in statistics calculation is their independence.
>

These changes were made in attached patch.

------
With best regards,
Alexander Korotkov.

range_stat-0.11.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Statistics and selectivity estimation for ranges

Reply via email to