On Sun, Dec 12, 2010 at 8:46 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> On 13.12.2010 01:05, Robert Haas wrote:
>> This is a good idea, but I guess the question is what you do next. If
>> you know that the "applicability" is 100%, you can disregard the
>> restriction clause on the implied column. And if it has no
>> implicatory power, then you just do what we do now. But what if it
>> has some intermediate degree of implicability?
>
> Well, I think you've missed the e-mail from Florian Pflug - he actually
> pointed out that the 'implicativeness' Heikki mentioned is called
> conditional probability. And conditional probability can be used to
> express the "AND" probability we are looking for (selectiveness).
>
> For two columns, this is actually pretty straightforward - as Florian
> wrote, the equation is
>
>    P(A and B) = P(A|B) * P(B) = P(B|A) * P(A)
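
To make the quoted identity concrete, here is a minimal sketch that checks it on a toy table. Nothing in it comes from the thread or from the PostgreSQL source: the rows, the city/zip columns, and the helpers prob() and cond_prob() are all hypothetical, invented for illustration. It also computes the estimate you would get by assuming the two clauses are independent, for comparison.

```python
from fractions import Fraction

# Hypothetical toy table of (city, zip_code) rows, invented for illustration.
ROWS = [
    ("Prague", "11000"),
    ("Prague", "11000"),
    ("Prague", "12000"),
    ("Brno", "60200"),
    ("Brno", "60200"),
    ("Brno", "60200"),
    ("Ostrava", "70200"),
    ("Ostrava", "70200"),
]


def prob(rows, pred):
    """Fraction of rows that satisfy pred (an exact selectivity)."""
    return Fraction(sum(1 for r in rows if pred(r)), len(rows))


def cond_prob(rows, pred, given):
    """P(pred | given): selectivity of pred among the rows satisfying given."""
    subset = [r for r in rows if given(r)]
    if not subset:
        raise ValueError("conditioning event matches no rows")
    return prob(subset, pred)


def a(row):
    """Clause A: city = 'Brno'."""
    return row[0] == "Brno"


def b(row):
    """Clause B: zip_code = '60200'."""
    return row[1] == "60200"


p_a = prob(ROWS, a)
p_b = prob(ROWS, b)
p_a_given_b = cond_prob(ROWS, a, b)
p_b_given_a = cond_prob(ROWS, b, a)

# Exact selectivity of "A AND B", counted directly from the rows.
p_and = prob(ROWS, lambda r: a(r) and b(r))

# Estimate obtained by treating A and B as independent.
p_independent = p_a * p_b

print("P(A and B) exact       :", p_and)
print("P(A|B) * P(B)          :", p_a_given_b * p_b)
print("P(B|A) * P(A)          :", p_b_given_a * p_a)
print("P(A) * P(B) independent:", p_independent)
```

In this toy data the zip code fully determines the city, so both conditional forms reproduce the exact selectivity while the independence estimate comes out lower. Note that cond_prob() needs the actual rows for one specific pair of values, which a single per-column-pair number would not provide.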

Well, the question is what data you are actually storing. It's
appealing to store a measure of the extent to which a constraint on
column X constrains column Y, because you'd only need to store
O(ncolumns^2) values, which would be reasonably compact and would
potentially handle the zip code problem - a classic "hard case" -
rather neatly. But that wouldn't be sufficient to use the above
equation, because there A and B need to be things like "column X has
value x", and it's not going to be practical to store a complete set
of MCVs (most common values) for column X for each possible value
that could appear in column Y.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company