Re: [HACKERS] KNN-GiST with recheck

Heikki Linnakangas Tue, 28 Jan 2014 05:56:04 -0800

On 01/13/2014 07:17 PM, Alexander Korotkov wrote:

Here goes a desription of this patch same as in original thread.


KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.

See an example.

create table test as (select id, polygon(3+(random()*10)::int,
circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
generate_series(1,1000000) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
point(0.5,0.5) limit 10;
    id   |       ?column?
--------+----------------------
  755611 | 0.000405855808916853
  807562 | 0.000464123777564343
  437778 | 0.000738524708741959
  947860 |  0.00076250998760724
  389843 | 0.000886362723569568
   17586 | 0.000981960100555216
  411329 |  0.00145338112316853
  894191 |  0.00149399559703506
  391907 |   0.0016647896049741
  235381 |  0.00167554614889509
(10 rows)

It's fast using just index scan.

                                                             QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
rows=10 loops=1)
    ->  Index Scan using test_idx on test  (cost=0.29..157672.29
rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
          Order By: (p <-> '(0.5,0.5)'::point)
  Total runtime: 0.305 ms
(4 rows)


Nice! Some thoughts:

1. This patch introduces a new "polygon <-> point" operator. That seemsuseful on its own, with or without this patch.

2. I wonder how useful it really is to allow mixing exact and non-exactreturn values from the distance function. The distance function includedin the patch always returns recheck=true. I have a feeling that allother distance function will also always return either true or false.

3. A binary heap would be a better data structure to buffer therechecked values. A Red-Black tree allows random insertions anddeletions, but in this case you need to insert arbitrary values but onlyremove the minimum item. That's exactly what a binary heap excels at. Wehave a nice binary heap implementation in the backend that you can use,see src/backend/lib/binaryheap.c.

4. (as you mentioned in the other thread: ) It's a modularity violationthat you peek into the heap tuple from gist. I think the proper way todo this would be to extend the IndexScan executor node to perform there-shuffling of tuples that come from the index in wrong order, orperhaps add a new node type for it.

Of course that's exactly what your partial sort patch does :-). Ihaven't looked at that in detail, but I don't think the approach thepartial sort patch takes will work here as is. In the KNN-GiST case, theindex is returning tuples roughly in the right order, but a tuple thatit returns might in reality belong somewhere later in the ordering. Inthe partial sort patch, the "input stream" of tuples is divided intonon-overlapping groups, so that the tuples within the group are notsorted, but the groups are. I think the partial sort case is a specialcase of the KNN-GiST case, if you consider the lower bound of each tupleto be the leading keys that you don't need to sort.

BTW, this capability might also be highly useful for the min/max indexesas well. A min/max index cannot return an exact ordering of tuples, butit can also give a lower bound for a group of tuples.


- Heikki


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] KNN-GiST with recheck

Reply via email to