Re: [HACKERS] Cross-column statistics revisited

Nathan Boley Sun, 19 Oct 2008 10:10:16 -0700

> I still need to go through backend/utils/adt/selfuncs.c
> to figure out exactly how we use the one-dimensional values.
>


Here's a page that helped me figure all this out.

http://www.postgresql.org/docs/8.1/static/planner-stats-details.html

>>
>> 2) Do we want to fold the MCV's into the dependence histogram? That
>> will cause problems in our copula approach but I'd hate to have to
>> keep an N^d histogram dependence relation in addition to the copula.
>
> Yeah, if we're already trying to figure out how to compress copulae,
> having also to compress MCV matrices seems painful and error-prone.
> But I'm not sure why it would cause problems to keep them in the
> copula -- is that just because we are most interested in the copula
> modeling the parts of the distribution that are most sparsely
> populated?
>

The problem I was thinking of is that we don't currently store the
true marginal distribution. As it stands, histograms only include
non-MCV values. So we would either need to handle the MCVs separately
(which would assume independence between MCV and non-MCV values) or
store multiple histograms.
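
To illustrate what "true marginal" means here, below is a minimal Python
sketch of rebuilding a one-column CDF from an MCV list plus a histogram
that covers only the non-MCV values. It is an illustration under
assumptions, not the planner's code: the function name marginal_cdf, its
parameters, and the linear interpolation within a bucket are all
hypothetical choices, and the histogram is assumed to be equi-depth with
numeric bounds.

```python
from bisect import bisect_right


def marginal_cdf(x, mcv_vals, mcv_freqs, hist_bounds, null_frac=0.0):
    """Estimate P(col <= x) from MCVs plus a histogram of non-MCV values.

    Hypothetical helper: mcv_freqs are fractions of all rows, and
    hist_bounds are the boundaries of equal-population buckets that
    describe only the rows that are neither NULL nor an MCV.
    """
    # Mass contributed by the most common values at or below x.
    mcv_mass = sum(f for v, f in zip(mcv_vals, mcv_freqs) if v <= x)

    # Fraction of all rows that the histogram describes.
    hist_total = 1.0 - null_frac - sum(mcv_freqs)

    n_buckets = len(hist_bounds) - 1
    if x < hist_bounds[0]:
        hist_frac = 0.0
    elif x >= hist_bounds[-1]:
        hist_frac = 1.0
    else:
        # Bucket i satisfies hist_bounds[i] <= x < hist_bounds[i + 1].
        i = bisect_right(hist_bounds, x) - 1
        lo, hi = hist_bounds[i], hist_bounds[i + 1]
        # Assume values are spread uniformly inside a bucket.
        within = (x - lo) / (hi - lo)
        hist_frac = (i + within) / n_buckets

    return mcv_mass + hist_total * hist_frac


# One MCV (value 5, 20% of rows) and four buckets over the remaining 80%.
print(marginal_cdf(20, [5], [0.2], [0, 10, 20, 30, 40]))
```

Feeding the copula a marginal built this way is the "multiple pieces"
bookkeeping described above; the alternative of treating MCVs separately
skips this step at the cost of the independence assumption.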

>> 4) How will this approach deal with histogram buckets that have
>> scaling count sizes ( ie -0.4 )?
>
> I'm not sure what you mean here.
>

That was more a note to myself, and should have been numbered 3.5.
ndistinct estimates currently switch to a scaled form (a negative
multiplier of the row count, such as -0.4) past a certain
row/ndistinct ratio. If we try to model ndistinct, we need to deal
with scaling ndistinct counts somehow. But that's way off in the
future; it was probably pointless to mention it.
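
For anyone unfamiliar with the -0.4 notation: a negative stadistinct in
pg_statistic is read as a multiplier on the table's row count, so the
estimate grows with the table. A minimal Python sketch of that
convention follows; the function name resolve_ndistinct and the clamp to
at least one distinct value are assumptions made for illustration, not
a copy of the backend code.

```python
def resolve_ndistinct(stadistinct, reltuples):
    """Turn a stored stadistinct value into an absolute distinct count.

    Hypothetical helper illustrating the storage convention:
      > 0  absolute number of distinct values
      < 0  negated fraction of the row count (scales with the table)
      = 0  unknown
    """
    if stadistinct > 0:
        return float(stadistinct)
    if stadistinct < 0:
        # Assumed clamp: never report fewer than one distinct value.
        return max(1.0, -stadistinct * reltuples)
    return None


# -0.4 means "40% of the rows are distinct", whatever the row count is.
print(resolve_ndistinct(-0.4, 1000))
print(resolve_ndistinct(-0.4, 1000000))
```

Any cross-column ndistinct model would have to decide whether each
per-column input is in the absolute or the scaled form before combining
them.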

