On 12.12.2010 15:17, Martijn van Oosterhout wrote:
On Sun, Dec 12, 2010 at 03:58:49AM +0100, Tomas Vondra wrote:
Very cool that you're working on this.

+1

Let's talk about one special case - I'll explain how the proposed
solution works, and then I'll explain how to make it more general, what
improvements are possible, and what issues there are. Anyway, this is by no
means a perfect or complete solution - it's just a starting point.

It looks like you handled most of the issues. Just a few points:

- This is obviously applicable to more than just integers, probably
   anything with a b-tree operator class. What you've coded seems to rely
   on calculations on the values. Have you thought about how it could
   work for, for example, strings?

The classic failure case has always been: postcodes and city names.
Strongly correlated, but in a way that the computer can't easily see.

Yeah, and that's actually analogous to the example I used in my presentation.

The way I think of that problem is that once you know the postcode, knowing the city name doesn't add any information. The postcode implies the city name. So the selectivity for "postcode = ? AND city = ?" should be the selectivity of "postcode = ?" alone. The measurement we need is "implicativeness": How strongly does column A imply a certain value for column B. Perhaps that could be measured by counting the number of distinct values of column B for each value of column A, or something like that. I don't know what the statisticians call that property, or if there's some existing theory on how to measure that from a sample.
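To make the "distinct values of B per value of A" idea concrete, here is a rough Python sketch. It is only an illustration of the counting, not a proposal for how the planner should do it: the function name and the sample rows are made up, and it ignores the question of how well a small sample reflects the whole table (with few rows per postcode, the count will tend to look closer to 1 than it really is).

```python
from collections import defaultdict


def avg_distinct_b_per_a(rows):
    """Average number of distinct B values seen for each A value.

    rows is an iterable of (a, b) pairs. A result close to 1.0 suggests
    that knowing A (nearly) determines B in this sample.
    """
    b_values = defaultdict(set)
    for a, b in rows:
        b_values[a].add(b)
    if not b_values:
        raise ValueError("empty sample")
    return sum(len(s) for s in b_values.values()) / len(b_values)


# Hypothetical sample of (postcode, city) pairs, invented for illustration.
sample = [
    ("1011", "Amsterdam"),
    ("1012", "Amsterdam"),
    ("1012", "Amsterdam"),
    ("3011", "Rotterdam"),
    ("3012", "Rotterdam"),
    ("00100", "Helsinki"),
]

# Every postcode maps to exactly one city in this sample.
postcode_implies_city = avg_distinct_b_per_a(sample)

# The reverse direction is weaker: a city can have several postcodes.
city_implies_postcode = avg_distinct_b_per_a([(c, p) for p, c in sample])
```

When the first number is about 1, the estimate for "postcode = ? AND city = ?" could simply reuse the selectivity of "postcode = ?" instead of multiplying the two selectivities together. The second number being larger than 1 shows that the property is directional.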

That's assuming the combination has any matches. It's possible that the user chooses a postcode and city combination that doesn't exist, but that's no different from a user doing "city = 'fsdfsdfsd'" on a single column, returning no matches. We should assume that the combination makes sense.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
