I'm gradually slogging my way through the KNNGIST patches which were posted here:
http://archives.postgresql.org/pgsql-hackers/2010-07/msg01183.php I have a couple of conceptual questions. 1. Is KNNGIST intended to work if there's more than one pathkey? If so, how? Example: SELECT * FROM tab ORDER BY this_point <-> '(0,0)', this_point <-> '(1,1)' As far as I can see, there's nothing in match_pathkey_to_index() which would prevent lists of pathkeys and sortclauses of length 2 from being constructed, but can GIST actually handle that? It seems hard. If it can't, which I suspect is the case, then we can make this logic a lot simpler by removing one of the loops from match_pathkey_to_index() and bailing out quickly whenever list_length(root->query_pathkeys) != 1. 2. I'm concerned by the fact that the new consistent() methods only return 0 or -1.0 for non-leaf pages. If we have a query like: SELECT * FROM tab WHERE this_point <-> '(0,0)' ...it seems that every non-leaf page will be consistent and we'll end up loading every key in the entire index into an RBTree, which doesn't seem practical from a memory-usage perspective. Maybe I'm misunderstanding something, but it seems like in a case like this you'd need to have the consistent function for each page return a minimum and maximum distance to '(0,0)'. If you have one page that has a minimum distance of 0 and a maximum distance of 100 and another page which has a minimum distance of 101 and a maximum distance of 200, you don't even need to look at the second page until all the keys from the first one have been processed. If there's a third page with a minimum distance of 50 and a maximum distance of 150, you can return the tuples from the first page with distances less than 50, in order; then you can do a merge between the remaining keys on that page and the keys on the third page with values <= 100; then you can do a merge between the remaining keys on that page and those on the second page with values <= 150; and finally you can return the remaining keys on the second page in order. That still doesn't seem to provide any particularly tight bound on memory usage, but it's certainly way better than traversing the entire tree up front. If you are already doing something like this somewhere in the code, please point me in the right direction... 3. I've been scratching my head over the following bit of code and it doesn't make any sense to me. As far as I can tell, this is effectively comparing the number of columns in the ORDER BY clause to the number of restriction clauses applicable to the relation being scanned. Those two quantities don't seem to have much to do with each other, so either I'm confused or the code is. It doesn't seem like it should matter anyway, since I don't think we're planning on any AMs being both amoptionalkey and amcanorderbyop. + if (list_length(restrictclauses) < indexcol && !index->amoptionalkey) + break; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers