Re: [HACKERS] Collect frequency statistics for arrays

Noah Misch Tue, 17 Jan 2012 02:34:02 -0800

On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. They look correct to me. I did some
> fixes in the patch. The proof of some concepts is still needed. I'm going
> to provide it in a few days.


Your further fixes look good.  Could you also answer my question about the
header comment of mcelem_array_contained_selec()?

/*
 * Estimate selectivity of "column <@ const" based on most common element
 * statistics.  Independent element occurrence would imply a particular
 * distribution of distinct element counts among matching rows.  Real data
 * usually falsifies that assumption.  For example, in a set of 1-element
 * integer arrays having elements in the range [0;10], element occurrences are
 * not independent.  If they were, a sufficiently-large set would include all
 * distinct element counts 0 through 11.  We correct for this using the
 * histogram of distinct element counts.
 *
 * In the "column @> const" and "column && const" cases, we usually have
 * "const" with low summary frequency of elements (otherwise we have
 * selectivity close to 0 or 1 correspondingly).  That's why the effect of
 * dependence related to distinct element counts distribution is negligible
 * there.  In the "column <@ const" case, summary frequency of elements is
 * high (otherwise we have selectivity close to 0).  That's why we should do
 * correction due to array distinct element counts distribution.
 */

By "summary frequency of elements", do you mean literally P_0 + P_1 + ... + P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const".  For "column @> const", selectivity
cannot exceed the smallest frequency among const elements.  A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.
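
To make that objection concrete, here is a minimal sketch in Python. It is not
code from the patch: the function names and the element frequencies are
hypothetical, and the "independent" estimates simply assume element
occurrences are independent. It compares the sum of const element frequencies
with the bound on "column @> const" selectivity.

```python
from math import prod


def summary_frequency(freqs):
    """Sum of the per-element frequencies, P_0 + P_1 + ... + P_N."""
    return sum(freqs)


def contains_upper_bound(freqs):
    """Upper bound on "column @> const" selectivity.

    A matching row must contain every const element, so the selectivity
    cannot exceed the frequency of the rarest const element.
    """
    return min(freqs)


def contains_independent(freqs):
    """"column @> const" estimate, assuming independent element occurrence."""
    return prod(freqs)


def overlap_independent(freqs):
    """"column && const" estimate, assuming independent element occurrence."""
    return 1.0 - prod(1.0 - f for f in freqs)


# Hypothetical const arrays: one rare element alone, then the same rare
# element accompanied by three high-frequency elements.
rare_only = [0.01]
with_common = [0.01, 0.9, 0.9, 0.9]

for label, freqs in (("rare_only", rare_only), ("with_common", with_common)):
    print(label,
          "sum:", summary_frequency(freqs),
          "@> bound:", contains_upper_bound(freqs),
          "@> independent:", contains_independent(freqs),
          "&& independent:", overlap_independent(freqs))
```

Adding the three common elements raises the sum from 0.01 to about 2.71, yet
the "@>" bound stays at 0.01. The "&&" estimate, by contrast, does rise with
the sum.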

Thanks,
nm
