Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Sep 27, 2018 at 8:28 PM Mark Dilger wrote: > > On Sep 18, 2018, at 3:58 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > > > > On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin > wrote: > >>> 17 сент. 2018 г., в 2:03, Alexander Korotkov < > a.korot...@postgrespro.ru> написал(а): > >>> > >>> Also, it appears to me that it's OK to be a single patch > >> > >> +1, ISTM that these 6 patches represent atomic unit of work. > > > > Thank you, pushed. > > One note on this commit, > > in my fork, I have converted BoxPGetDatum from a macro to a static inline > function, > and it shows a thinko in your commit, namely that in->leafDatum is of type > (BOX *), > but is actually (as the name implies) already a Datum: > > spgquadtreeproc.c:469:27: error: incompatible integer to pointer > conversion passing 'Datum' (aka 'unsigned long') to parameter of type 'BOX > *' [-Werror,-Wint-conversion] > > BoxPGetDatum(in->leafDatum), true, > > I'm not claiming this creates any active bug, just that it would make more > sense to > handle the typing cleanly. Perhaps the following: > > diff --git a/src/backend/access/spgist/spgquadtreeproc.c > b/src/backend/access/spgist/spgquadtreeproc.c > index dee438a307..90cc776899 100644 > --- a/src/backend/access/spgist/spgquadtreeproc.c > +++ b/src/backend/access/spgist/spgquadtreeproc.c > @@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS) > > if (res && in->norderbys > 0) > /* ok, it passes -> let's compute the distances */ > - out->distances = spg_key_orderbys_distances( > - > BoxPGetDatum(in->leafDatum), true, > + out->distances = spg_key_orderbys_distances(in->leafDatum, > true, > > in->orderbys, in->norderbys); > True. Pushed, thanks! -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
> On Sep 18, 2018, at 3:58 PM, Alexander Korotkov > wrote: > > On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin wrote: >>> 17 сент. 2018 г., в 2:03, Alexander Korotkov >>> написал(а): >>> >>> Also, it appears to me that it's OK to be a single patch >> >> +1, ISTM that these 6 patches represent atomic unit of work. > > Thank you, pushed. One note on this commit, in my fork, I have converted BoxPGetDatum from a macro to a static inline function, and it shows a thinko in your commit, namely that in->leafDatum is of type (BOX *), but is actually (as the name implies) already a Datum: spgquadtreeproc.c:469:27: error: incompatible integer to pointer conversion passing 'Datum' (aka 'unsigned long') to parameter of type 'BOX *' [-Werror,-Wint-conversion] BoxPGetDatum(in->leafDatum), true, I'm not claiming this creates any active bug, just that it would make more sense to handle the typing cleanly. Perhaps the following: diff --git a/src/backend/access/spgist/spgquadtreeproc.c b/src/backend/access/spgist/spgquadtreeproc.c index dee438a307..90cc776899 100644 --- a/src/backend/access/spgist/spgquadtreeproc.c +++ b/src/backend/access/spgist/spgquadtreeproc.c @@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS) if (res && in->norderbys > 0) /* ok, it passes -> let's compute the distances */ - out->distances = spg_key_orderbys_distances( - BoxPGetDatum(in->leafDatum), true, + out->distances = spg_key_orderbys_distances(in->leafDatum, true, in->orderbys, in->norderbys); mark
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin wrote: > > 17 сент. 2018 г., в 2:03, Alexander Korotkov > > написал(а): > > > > Also, it appears to me that it's OK to be a single patch > > +1, ISTM that these 6 patches represent atomic unit of work. Thank you, pushed. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! > 17 сент. 2018 г., в 2:03, Alexander Korotkov > написал(а): > > Also, it appears to me that it's OK to be a single patch +1, ISTM that these 6 patches represent atomic unit of work. Best regards, Andrey Borodin.
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Aug 30, 2018 at 3:57 PM Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > Generally patch looks close to committable shape for me. I'm going to > revise code and documentation again, split it up, and then propose to > commit. > I've revised this patch again. This revision includes a lot of code beautification, adjustments to comments and documentation. Also, after reviewing bug fix [1] I found that we don't need separate memory context for queue. traversalCxt looks perfectly fine for that purpose, because there it's single scan lifetime. Also, it appears to me that it's OK to be a single patch: besides KNN SP-GiST and opclasses it contains only extraction of few common function between GiST and SP-GiST. So, I'm going to commit this if no objections. Links. 1. https://www.postgresql.org/message-id/flat/153663176628.23136.11901365223750051490%40wrigleys.postgresql.org -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company knn-spgist-v9.patch Description: Binary data
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Aug 30, 2018 at 12:41 PM Alexander Korotkov wrote: > Right, performance regression appears to be caused by queue memory > context allocation. I've tried to apply the same approach that we've > in GiST: allocate separate memory context for queue only at second > rescan call. And it appears to work, there is no measurable > regression in comparison to master. > > Patch v8 > x run 1 run 2 run 3 > 0.1 680 660 662 > 0.01 1403 1395 1508 > 0.003 6561 6475 6223 > > Revised version of patch is attached. I've squashed patchset into one > patch. Later I'll split it into fewer parts before committing. I'm > going to also make some benchmarking of KNN itself: GiST vs SP-GiST. I've made KNN GiST vs SP-GiST benchmarking similar to what I've done previously for plain search. create table knn_test as (select point(random(), random()) p from generate_series(1,100) i); create index knn_test_idx on knn_test using spgist(p); create index knn_test_idx on knn_test using gist(p); CREATE FUNCTION knn_bench(step float8, lim int) RETURNS void AS $$ DECLARE x float8; y float8; BEGIN y := 0.0; WHILE y <= 1.0 LOOP x := 0.0; WHILE x <= 1.0 LOOP PERFORM * FROM knn_test ORDER BY p <-> point(x,y) LIMIT lim; x := x + step; END LOOP; y := y + step; END LOOP; END; $$ LANGUAGE plpgsql; And the results are following. # GiST step limit run 1 run 2 run 3 buffers 0.1 1000 156 161 158 123191 0.03 100 298 305 301 122902 0.0110 1216 1214 1212 138591 # SP-GiST step limit run 1 run 2 run 3 buffers 0.1 1000 147 152 153 126446 0.03 100 227 223 226 131241 0.0110 734 733 730 175908 For me it looks expectable. SP-GiST takes less CPU, but uses more buffers. It's pretty same to what we have for plain scans. So, I see no anomaly here. Generally patch looks close to committable shape for me. I'm going to revise code and documentation again, split it up, and then propose to commit. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Aug 30, 2018 at 12:30 PM Alexander Korotkov wrote: > On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov > wrote: > > # Current patch (use list) > > x run 1 run 2 run 3 > > 0.11206 1230 1231 > > 0.01 2862 2813 2802 > > 0.003 13915 13911 13900 > > Sorry, I didn't explain what these tables means. There are times in > milliseconds for 3 runs of spgist_bench($x, $x) Right, performance regression appears to be caused by queue memory context allocation. I've tried to apply the same approach that we've in GiST: allocate separate memory context for queue only at second rescan call. And it appears to work, there is no measurable regression in comparison to master. Patch v8 x run 1 run 2 run 3 0.1 680 660 662 0.01 1403 1395 1508 0.003 6561 6475 6223 Revised version of patch is attached. I've squashed patchset into one patch. Later I'll split it into fewer parts before committing. I'm going to also make some benchmarking of KNN itself: GiST vs SP-GiST. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company knn-spgist-v8.patch Description: Binary data
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov wrote: > # Current patch (use list) > x run 1 run 2 run 3 > 0.11206 1230 1231 > 0.01 2862 2813 2802 > 0.003 13915 13911 13900 Sorry, I didn't explain what these tables means. There are times in milliseconds for 3 runs of spgist_bench($x, $x) -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! On Tue, Aug 28, 2018 at 12:50 PM Alexander Korotkov wrote: > On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin wrote: > > I'm not sure in architectural point of view: supporting two ways (list and > > heap) to store result seems may be a bit heavy, but OK. At least, it has > > meaningful benefits. > > It seems that Nikita have reworked it that way after my suspect that > switching regular scans to pairing heap *might* cause a regression. > However, I didn't insist that we need to support two ways. For > instance, if we can prove that there is no regression then it's fine > to use just heap... So, I decided to check does it really matter to support both list and pairing heap? Or can we just always use pairing heap? I wrote following simple test. Points are uniformly distributed in box (0,0)-(1,1). # create table spgist_test as (select point(random(), random()) p from generate_series(1,100) i); # create index spgist_test_idx on spgist_test using spgist(p); spgist_bench() walks over this box with given step and queries it each time with boxes of given size. CREATE FUNCTION spgist_bench(step float8, size float8) RETURNS void AS $$ DECLARE x float8; y float8; BEGIN y := 0.0; WHILE y <= 1.0 LOOP x := 0.0; WHILE x <= 1.0 LOOP PERFORM * FROM spgist_test WHERE p <@ box(point(x,y), point(x+size, y+size)); x := x + step; END LOOP; y := y + step; END LOOP; END; $$ LANGUAGE plpgsql; It fist I've compared current patch with modified patch, which I made to to always queue scan. # Current patch (use list) x run 1 run 2 run 3 0.11206 1230 1231 0.01 2862 2813 2802 0.003 13915 13911 13900 # Always use queue x run 1 run 2 run 3 0.11238 1189 1170 0.01 2740 2852 2769 0.003 13627 13751 13757 And it appears that difference is less than statistical error. But then I compared that with master. And it appears that master is much faster. # Master x run 1 run 2 run 3 0.1 725 691 700 0.01 1488 1480 1495 0.003 6696 6210 6295 It's probably because we're anyway allocating separate queue memory context. I'll explore this more and come up with details. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin wrote: > I'm not sure in architectural point of view: supporting two ways (list and > heap) to store result seems may be a bit heavy, but OK. At least, it has > meaningful benefits. It seems that Nikita have reworked it that way after my suspect that switching regular scans to pairing heap *might* cause a regression. However, I didn't insist that we need to support two ways. For instance, if we can prove that there is no regression then it's fine to use just heap... Links 1. https://www.postgresql.org/message-id/CAPpHfdu6Wm4DSAp8Pvwq0uo7fCSzsbrNy7x2v5EKK_g4Nkjx1Q%40mail.gmail.com -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
17 июля 2018 г., в 16:42, Nikita Glukhov написал(а): > > Fixed. Patch works as advertised, adds documentation and tests. I didn't succeeded in attempts to break it's functionality. I have no more notices about the code. Maybe except this const looks unusual, but is correct +extern BOX *box_copy(const BOX *box); I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit heavy, but OK. At least, it has meaningful benefits. I think the patch is ready for committer. Best regards, Andrey Borodin.
Re: [HACKERS] [PATCH] kNN for SP-GiST
Attached 7th version of the patches: * renamed recheck fields and variables * fixed formatting of the one if-statement On 15.07.2018 14:54, Andrey Borodin wrote: 14.07.2018, 1:31, Nikita Glukhov wrote: Attached 6th version of the patches. ... 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need to update it's behavior somehow to be used as-is? Or to support new functionality? There are no SP-GiST opclasses in contrib/, so there is nothing to change in contrib/. Moreover, existing extensions need to be updated only for support of new distance-ordered searches -- they need to implement distances[][] array calculation in their spgInnerConsistent() and spgLeafConsistent() support functions. So, if extension will not update anything - it will keep all preexisting functionality, right? Yes, of course. I have some more nitpicks about naming + recheckQual = out.recheck; + recheckDist = out.recheckDistances; Can we call this things somehow in one fashion? I would prefer to rename "spgLeafConsitentOut.recheck" to "recheckQual" but it is not good to rename user-visible fields, so I decided to rename all fields and variables to "recheck"/"recheckDistances". Also, this my be a stupid question, but do we really need to differ this two recheck cases? It is certain that we cannot merge them? This recheck cases are absolutely different. In the first case, opclass support functions can not accurately determine whether the leaf value satisfies the boolean search operator (compressed values can be stored in leaves). In the second case, opclass returns only a approximate value (the lower bound) of the leaf distances. In the example below operator 'polygon >> polygon' does not need recheck because bounding box (they are stored in the index instead of polygons) test gives exact results, but the ordering operator 'polygon <-> point' needs recheck: SELECT * FROM polygons WHERE poly >> polygon(box '((0,0),(1,1))') ORDER BY poly <-> point '(0,0)'; This seems strange if-formatting + /* If allTheSame, they should all or none of 'em match */ + if (innerTuple->allTheSame) + if (out.nNodes != 0 && out.nNodes != nNodes) + elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple"); + + if (out.nNodes) // few lines before you compare with 0 Fixed. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index c4e8a3b..9279756 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #include "postgres.h" +#include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" -#include "catalog/pg_type.h" #include "miscadmin.h" #include "storage/lmgr.h" #include "storage/predicate.h" @@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; bool res = false; - int i; if (scan->xs_hitup) { @@ -564,45 +563,10 @@ getNextNearest(IndexScanDesc scan) /* found a heap item at currently minimal distance */ scan->xs_ctup.t_self = item->data.heap.heapPtr; scan->xs_recheck = item->data.heap.recheck; - scan->xs_recheckorderby = item->data.
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! > 14 июля 2018 г., в 1:31, Nikita Glukhov написал(а): > > Attached 6th version of the patches. > ... >> 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses >> need to update it's behavior somehow to be used as-is? Or to support new >> functionality? > > There are no SP-GiST opclasses in contrib/, so there is nothing to change in > contrib/. Moreover, existing extensions need to be updated only for support > of > new distance-ordered searches -- they need to implement distances[][] array > calculation in their spgInnerConsistent() and spgLeafConsistent() support > functions. So, if extension will not update anything - it will keep all preexisting functionality, right? I have some more nitpicks about naming + recheckQual = out.recheck; + recheckDist = out.recheckDistances; Can we call this things somehow in one fashion? Also, this my be a stupid question, but do we really need to differ this two recheck cases? It is certain that we cannot merge them? This seems strange if-formatting + /* If allTheSame, they should all or none of 'em match */ + if (innerTuple->allTheSame) + if (out.nNodes != 0 && out.nNodes != nNodes) + elog(ERROR, "inconsistent inner_consistent results for allTheSame inner tuple"); + + if (out.nNodes) // few lines before you compare with 0 Best regards, Andrey Borodin.
Re: [HACKERS] [PATCH] kNN for SP-GiST
Attached 6th version of the patches. On 09.07.2018 20:47, Andrey Borodin wrote: 4 июля 2018 г., в 3:21, Nikita Glukhov написал(а): Attached 5th version of the patches, where minor refactoring of distance handling was done (see below). I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small notices. Thank your for your review. Following directives can be omitted: #include "lib/rbtree.h" #include "storage/relfilenode.h" This message is not noly GiST related, is it? elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy"); Some small typos: usefull -> useful nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?) Fixed. On 10.07.2018 20:09, Andrey Borodin wrote: I've passed through the code one more time. Here are few more questions: 1. Logic behind division of the patch into steps is described last time 2017-01-30, but ISTM actual steps have changed since than? May I ask you to write a bit about steps of the patchset? A brief description of the current patch set: 1. Exported two box functions that are used in patch 5. 2. Extracted subroutine index_store_float8_orderby_distances() from GiST code that is common for GiST, SP-GiST, and possibly other kNN implementations (used in patch 4). 3. Extracted two subroutines from GiST code common for gistproperty() and spgistproperty() (used in path 4). 4. The main patch: added kNN support to SP-GiST. This patch in theory can be split into two patches: refactoring and kNN support, but it will require some additional effort. Refactoring basically consists in the extraction of spgInnerTest(), spgTestLeafTuple(), spgMakeInnerItem() subroutines. A more detailed commit history can be found in my github [1]. 5. Added ordering operator point <-> point to kd_point_ops and quad_point_ops. 6. Added ordering operator polygon <-> point to poly_ops. 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need to update it's behavior somehow to be used as-is? Or to support new functionality? There are no SP-GiST opclasses in contrib/, so there is nothing to change in contrib/. Moreover, existing extensions need to be updated only for support of new distance-ordered searches -- they need to implement distances[][] array calculation in their spgInnerConsistent() and spgLeafConsistent() support functions. 3. There is a patch about predicate locking in SP-GiST [2] Is this KNN patch theoretically compatible with predicate locking? Seems like it is, I just want to note that this functionality may exist. I think kNN is compatible with predicate locking. Patch [2] does not apply cleanly on kNN but the conflict resolution is trivial. 4. Scan state now have scanStack and queue. May be it's better to name scanStack and scanQueue or stack and queue? "queue" was renamed to "scanQueue". [1] https://github.com/glukhovn/postgres/commits/knn_spgist [2] https://commitfest.postgresql.org/14/1215/ -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index c4e8a3b..9279756 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #i
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! > I'm reviewing this patch. Currently I'm trying to understand sp-gist scan > deeeper, but as for now have some small notices. I've passed through the code one more time. Here are few more questions: 1. Logic behind division of the patch into steps is described last time 2017-01-30, but ISTM actual steps have changed since than? May I ask you to write a bit about steps of the patchset? 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need to update it's behavior somehow to be used as-is? Or to support new functionality? 3. There is a patch about predicate locking in SP-GiST [0] Is this KNN patch theoretically compatible with predicate locking? Seems like it is, I just want to note that this functionality may exist. 4. Scan state now have scanStack and queue. May be it's better to name scanStack and scanQueue or stack and queue? Best regards, Andrey Borodin. [0] https://commitfest.postgresql.org/14/1215/
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! > 4 июля 2018 г., в 3:21, Nikita Glukhov написал(а): > Attached 5th version of the patches, where minor refactoring of distance > handling was done (see below). > I'm reviewing this patch. Currently I'm trying to understand sp-gist scan deeeper, but as for now have some small notices. Following directives can be omitted: #include "lib/rbtree.h" #include "storage/relfilenode.h" This message is not noly GiST related, is it? elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy"); Some small typos: usefull -> useful nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?) Best regards, Andrey Borodin.
Re: [HACKERS] [PATCH] kNN for SP-GiST
Attached 5th version of the patches, where minor refactoring of distance handling was done (see below). On 02.07.2018 19:06, Alexander Korotkov wrote: Hi! On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov wrote: On 06.03.2018 17:30, David Steele wrote: I agree with Andres. Pushing this patch to the next CF. Attached 4th version of the patches rebased onto the current master. Nothing interesting has changed from the previous version. I took a look to this patchset. In general, it looks good for me, but I've following notes about it. * We're going to replace scan stack with pairing heap not only for KNN-search, but for regular search as well. Did you verify that it doesn't cause regression for regular search case, because insertion into pairing heap might be slower than insertion into stack? One may say that it didn't caused regression in GiST, and that's why it shouldn't cause regression in SP-GiST. However, SP-GiST trees might be much higher and these performance aspects might be different. I decided to bring back the List-based scan stack in the previous version of the patches, and here is how spgAddSearchItemToQueue() looks like now: static void spgAddSearchItemToQueue(SpGistScanOpaque so, SpGistSearchItem *item) { if (so->queue) pairingheap_add(so->queue, &item->phNode); else so->scanStack = lcons(item, so->scanStack); } so->queue is initialized in spgrescan() only for ordered searches. * I think null handling requires better explanation. Nulls are specially handled in pairingheap_SpGistSearchItem_cmp(). In the same time you explicitly set infinity distances for leaf nulls. You probably have reasons to implement it this way, but I think this should be explained. Also isnull property of SpGistSearchItem doesn't have comment. Distances for NULL items are expected to be NULL (it would not be true for non-strict the distance operators, so we do not seem to support them here), and so NULL items are considered to be greater than any non-NULL items in pairingheap_SpGistSearchItem_cmp(). Distances are copied into SpGistSearchItem only if it is non-NULL item. Also in the last version of the patch I have introduced spgAllocSearchItem() which conditionally allocates distance-array in SpGistSearchItem. Now inifinity distances are used only in one place, but if we require that innerConsistent() should always return distances, then we can completely get rid of so->infDistances field. * I think KNN support should be briefly described in src/backed/access/spgist/README. A minimal description written by the original author Vlad Sterzhanov already present in README. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index c4e8a3b..9279756 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #include "postgres.h" +#include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" -#include "catalog/pg_type.h" #include "miscadmin.h" #include "storage/lmgr.h" #include "storage/predicate.h" @@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; bool res = false; - int i; if (scan->xs_hitup) { @@ -564,45 +
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi! On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov wrote: > On 06.03.2018 17:30, David Steele wrote: > > > I agree with Andres. Pushing this patch to the next CF. > > Attached 4th version of the patches rebased onto the current master. > Nothing interesting has changed from the previous version. I took a look to this patchset. In general, it looks good for me, but I've following notes about it. * We're going to replace scan stack with pairing heap not only for KNN-search, but for regular search as well. Did you verify that it doesn't cause regression for regular search case, because insertion into pairing heap might be slower than insertion into stack? One may say that it didn't caused regression in GiST, and that's why it shouldn't cause regression in SP-GiST. However, SP-GiST trees might be much higher and these performance aspects might be different. * I think null handling requires better explanation. Nulls are specially handled in pairingheap_SpGistSearchItem_cmp(). In the same time you explicitly set infinity distances for leaf nulls. You probably have reasons to implement it this way, but I think this should be explained. Also isnull property of SpGistSearchItem doesn't have comment. * I think KNN support should be briefly described in src/backed/access/spgist/README. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] kNN for SP-GiST
On 06.03.2018 17:30, David Steele wrote: I agree with Andres. Pushing this patch to the next CF. Attached 4th version of the patches rebased onto the current master. Nothing interesting has changed from the previous version. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index c4e8a3b..9279756 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #include "postgres.h" +#include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" -#include "catalog/pg_type.h" #include "miscadmin.h" #include "storage/lmgr.h" #include "storage/predicate.h" @@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; bool res = false; - int i; if (scan->xs_hitup) { @@ -564,45 +563,10 @@ getNextNearest(IndexScanDesc scan) /* found a heap item at currently minimal distance */ scan->xs_ctup.t_self = item->data.heap.heapPtr; scan->xs_recheck = item->data.heap.recheck; - scan->xs_recheckorderby = item->data.heap.recheckDistances; - for (i = 0; i < scan->numberOfOrderBys; i++) - { -if (so->orderByTypes[i] == FLOAT8OID) -{ -#ifndef USE_FLOAT8_BYVAL - /* must free any old value to avoid memory leakage */ - if (!scan->xs_orderbynulls[i]) - pfree(DatumGetPointer(scan->xs_orderbyvals[i])); -#endif - scan->xs_orderbyvals[i] = Float8GetDatum(item->distances[i]); - scan->xs_orderbynulls[i] = false; -} -else if (so->orderByTypes[i] == FLOAT4OID) -{ - /* convert distance function's result to ORDER BY type */ -#ifndef USE_FLOAT4_BYVAL - /* must free any old value to avoid memory leakage */ - if (!scan->xs_orderbynulls[i]) - pfree(DatumGetPointer(scan->xs_orderbyvals[i])); -#endif - scan->xs_orderbyvals[i] = Float4GetDatum((float4) item->distances[i]); - scan->xs_orderbynulls[i] = false; -} -else -{ - /* - * If the ordering operator's return value is anything - * else, we don't know how to convert the float8 bound - * calculated by the distance function to that. The - * executor won't actually need the order by values we - * return here, if there are no lossy results, so only - * insist on converting if the *recheck flag is set. - */ - if (scan->xs_recheckorderby) - elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy"); - scan->xs_orderbynulls[i] = true; -} - } + + index_store_float8_orderby_distances(scan, so->orderByTypes, + item->distances, + item->data.heap.recheckDistances); /* in an index-only scan, also return the reconstructed tuple. */ if (scan->xs_want_itup) diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index 22b5cc9..6dac90c 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -74,6 +74,7 @@ #include "access/transam.h" #include "access/xlog.h" #include "catalog/index.h" +#include "catalog/pg_type.h" #include "pgstat.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" @@ -897,3 +898,72 @@ index_getprocinfo(Relation irel,
Re: Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi Nikita, On 3/2/18 1:35 AM, Andres Freund wrote: > > On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote: >> Attached 3rd version of kNN for SP-GiST. > > Given that this was submitted to the last v11 CF, after not being > developed for a year, I think it's unfortunately too late for v11. As we > should be concentrating on getting things into v11, I think it'd be > appropriate to move this to the next CF. I agree with Andres. Pushing this patch to the next CF. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] [PATCH] kNN for SP-GiST
Hi, On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote: > Attached 3rd version of kNN for SP-GiST. Given that this was submitted to the last v11 CF, after not being developed for a year, I think it's unfortunately too late for v11. As we should be concentrating on getting things into v11, I think it'd be appropriate to move this to the next CF. Greetings, Andres Freund
Re: [HACKERS] [PATCH] kNN for SP-GiST
Attached 3rd version of kNN for SP-GiST. On 09.03.2017 16:52, Alexander Korotkov wrote: Hi, Nikita! I take a look to this patchset. My first notes are following. * 0003-Extract-index_store_orderby_distances-v02.patch Function index_store_orderby_distances doesn't look general enough for its name. GiST supports ordering only by float4 and float8 distances. SP-GiST also goes that way. But in the future, access methods could support distances of other types. Thus, I suggest to rename function to index_store_float8_orderby_distances(). This function was renamed. * 0004-Add-kNN-support-to-SP-GiST-v02.patch Patch didn't applied cleanly. Patches were rebased onto current master. *** AUTHORS *** 371,373 --- 376,379 Teodor Sigaev mailto:teo...@sigaev.ru>> Oleg Bartunov mailto:o...@sai.msu.su>> + Vlad Sterzhanov mailto:glide...@gmail.com>> I don't think it worth mentioning here GSoC student who made just dirty prototype and abandon this work just after final evaluation. Student's mention was removed. More generally, you switched SP-GiST from scan stack to pairing heap. We should check if it introduces overhead to non-KNN scans. I decided to bring back scan stack for unordered scans. Also some benchmarks comparing KNN SP-GIST vs KNN GiST would be now. * 0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch I faced following warning. spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration] return get_float8_nan(); ^ 1 warning generated. Fixed. * 0006-Add-spg_compress-method-to-SP-GiST-v02.patch * 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch I think this is out of scope for KNN SP-GiST. This is very valuable, but completely separate feature. And it should be posted and discussed separately. Compress method for SP-GiST with poly_ops already have been committed, so I left only ordering operators for polygons in the last 6th patch. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index b30b931..da76a76 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #include "postgres.h" +#include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" -#include "catalog/pg_type.h" #include "miscadmin.h" #include "pgstat.h" #include "lib/pairingheap.h" @@ -540,7 +540,6 @@ getNextNearest(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; bool res = false; - int i; if (scan->xs_hitup) { @@ -561,45 +560,10 @@ getNextNearest(IndexScanDesc scan) /* found a heap item at currently minimal distance */ scan->xs_ctup.t_self = item->data.heap.heapPtr; scan->xs_recheck = item->data.heap.recheck; - scan->xs_recheckorderby = item->data.heap.recheckDistances; - for (i = 0; i < scan->numberOfOrderBys; i++) - { -if (so->orderByTypes[i] == FLOAT8OID) -{ -#ifndef USE_FLOAT8_BYVAL - /* must free any old value to avoid memory leakage */ - if (!scan->xs_orderbynulls[i]) - pfree(DatumGetPointer(scan->xs_orderbyvals[i])); -#endif - scan->xs_orderbyvals[i] =