Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-09-27 Thread Alexander Korotkov
On Thu, Sep 27, 2018 at 8:28 PM Mark Dilger  wrote:

> > On Sep 18, 2018, at 3:58 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> >
> > On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin 
> wrote:
> >>> 17 сент. 2018 г., в 2:03, Alexander Korotkov <
> a.korot...@postgrespro.ru> написал(а):
> >>>
> >>> Also, it appears to me that it's OK to be a single patch
> >>
> >> +1, ISTM that these 6 patches represent atomic unit of work.
> >
> > Thank you, pushed.
>
> One note on this commit,
>
> in my fork, I have converted BoxPGetDatum from a macro to a static inline
> function,
> and it shows a thinko in your commit, namely that in->leafDatum is of type
> (BOX *),
> but is actually (as the name implies) already a Datum:
>
> spgquadtreeproc.c:469:27: error: incompatible integer to pointer
> conversion passing 'Datum' (aka 'unsigned long') to parameter of type 'BOX
> *' [-Werror,-Wint-conversion]
>
>   BoxPGetDatum(in->leafDatum), true,
>
> I'm not claiming this creates any active bug, just that it would make more
> sense to
> handle the typing cleanly.  Perhaps the following:
>
> diff --git a/src/backend/access/spgist/spgquadtreeproc.c
> b/src/backend/access/spgist/spgquadtreeproc.c
> index dee438a307..90cc776899 100644
> --- a/src/backend/access/spgist/spgquadtreeproc.c
> +++ b/src/backend/access/spgist/spgquadtreeproc.c
> @@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)
>
> if (res && in->norderbys > 0)
> /* ok, it passes -> let's compute the distances */
> -   out->distances = spg_key_orderbys_distances(
> -
>  BoxPGetDatum(in->leafDatum), true,
> +   out->distances = spg_key_orderbys_distances(in->leafDatum,
> true,
>
>   in->orderbys, in->norderbys);
>

True.  Pushed, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-09-27 Thread Mark Dilger



> On Sep 18, 2018, at 3:58 PM, Alexander Korotkov  
> wrote:
> 
> On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin  wrote:
>>> 17 сент. 2018 г., в 2:03, Alexander Korotkov  
>>> написал(а):
>>> 
>>> Also, it appears to me that it's OK to be a single patch
>> 
>> +1, ISTM that these 6 patches represent atomic unit of work.
> 
> Thank you, pushed.

One note on this commit,

in my fork, I have converted BoxPGetDatum from a macro to a static inline 
function,
and it shows a thinko in your commit, namely that in->leafDatum is of type (BOX 
*),
but is actually (as the name implies) already a Datum:

spgquadtreeproc.c:469:27: error: incompatible integer to pointer conversion 
passing 'Datum' (aka 'unsigned long') to parameter of type 'BOX *' 
[-Werror,-Wint-conversion]

BoxPGetDatum(in->leafDatum), true,

I'm not claiming this creates any active bug, just that it would make more 
sense to
handle the typing cleanly.  Perhaps the following:

diff --git a/src/backend/access/spgist/spgquadtreeproc.c 
b/src/backend/access/spgist/spgquadtreeproc.c
index dee438a307..90cc776899 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -465,8 +465,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)
 
if (res && in->norderbys > 0)
/* ok, it passes -> let's compute the distances */
-   out->distances = spg_key_orderbys_distances(
-   
BoxPGetDatum(in->leafDatum), true,
+   out->distances = spg_key_orderbys_distances(in->leafDatum, true,

in->orderbys, in->norderbys);

mark


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-09-18 Thread Alexander Korotkov
On Mon, Sep 17, 2018 at 12:42 PM Andrey Borodin  wrote:
> > 17 сент. 2018 г., в 2:03, Alexander Korotkov  
> > написал(а):
> >
> > Also, it appears to me that it's OK to be a single patch
>
> +1, ISTM that these 6 patches represent atomic unit of work.

Thank you, pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-09-17 Thread Andrey Borodin
Hi!

> 17 сент. 2018 г., в 2:03, Alexander Korotkov  
> написал(а):
> 
> Also, it appears to me that it's OK to be a single patch

+1, ISTM that these 6 patches represent atomic unit of work.

Best regards, Andrey Borodin.


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-09-16 Thread Alexander Korotkov
On Thu, Aug 30, 2018 at 3:57 PM Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Generally patch looks close to committable shape for me.  I'm going to
> revise code and documentation again, split it up, and then propose to
> commit.
>

I've revised this patch again.  This revision includes a lot of code
beautification, adjustments to comments and documentation.  Also, after
reviewing bug fix [1] I found that we don't need separate memory context
for queue.  traversalCxt looks perfectly fine for that purpose, because
there it's single scan lifetime.  Also, it appears to me that it's OK to be
a single patch: besides KNN SP-GiST and opclasses it contains only
extraction of few common function between GiST and SP-GiST.

So, I'm going to commit this if no objections.

Links.
1.
https://www.postgresql.org/message-id/flat/153663176628.23136.11901365223750051490%40wrigleys.postgresql.org

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


knn-spgist-v9.patch
Description: Binary data


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-08-30 Thread Alexander Korotkov
On Thu, Aug 30, 2018 at 12:41 PM Alexander Korotkov
 wrote:
> Right, performance regression appears to be caused by queue memory
> context allocation.  I've tried to apply the same approach that we've
> in GiST: allocate separate memory context for queue only at second
> rescan call.  And it appears to work, there is no measurable
> regression in comparison to master.
>
> Patch v8
> x run 1 run 2 run 3
> 0.1 680   660   662
> 0.01   1403  1395  1508
> 0.003  6561  6475  6223
>
> Revised version of patch is attached.  I've squashed patchset into one
> patch.  Later I'll split it into fewer parts before committing.  I'm
> going to also make some benchmarking of KNN itself: GiST vs SP-GiST.

I've made KNN GiST vs SP-GiST benchmarking similar to what I've done
previously for plain search.

create table knn_test as (select point(random(), random()) p from
generate_series(1,100) i);
create index knn_test_idx on knn_test using spgist(p);
create index knn_test_idx on knn_test using gist(p);

CREATE FUNCTION knn_bench(step float8, lim int) RETURNS void AS $$
DECLARE
x float8;
y float8;
BEGIN
y := 0.0;
WHILE y <= 1.0 LOOP
x := 0.0;
WHILE x <= 1.0 LOOP
PERFORM * FROM knn_test ORDER BY p <-> point(x,y) LIMIT lim;
x := x + step;
END LOOP;
y := y + step;
END LOOP;
END;
$$ LANGUAGE plpgsql;

And the results are following.

# GiST
step limit run 1 run 2 run 3 buffers
 0.1  1000   156   161   158  123191
0.03   100   298   305   301  122902
0.0110  1216  1214  1212  138591

# SP-GiST
step limit run 1 run 2 run 3 buffers
 0.1  1000   147   152   153  126446
0.03   100   227   223   226  131241
0.0110   734   733   730  175908

For me it looks expectable.  SP-GiST takes less CPU, but uses more
buffers.  It's pretty same to what we have for plain scans.  So, I see
no anomaly here.

Generally patch looks close to committable shape for me.  I'm going to
revise code and documentation again, split it up, and then propose to
commit.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-08-30 Thread Alexander Korotkov
On Thu, Aug 30, 2018 at 12:30 PM Alexander Korotkov
 wrote:
> On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov
>  wrote:
> > # Current patch (use list)
> > x run 1 run 2 run 3
> > 0.11206  1230  1231
> > 0.01   2862  2813  2802
> > 0.003 13915 13911 13900
>
> Sorry, I didn't explain what these tables means.  There are times in
> milliseconds for 3 runs of spgist_bench($x, $x)

Right, performance regression appears to be caused by queue memory
context allocation.  I've tried to apply the same approach that we've
in GiST: allocate separate memory context for queue only at second
rescan call.  And it appears to work, there is no measurable
regression in comparison to master.

Patch v8
x run 1 run 2 run 3
0.1 680   660   662
0.01   1403  1395  1508
0.003  6561  6475  6223

Revised version of patch is attached.  I've squashed patchset into one
patch.  Later I'll split it into fewer parts before committing.  I'm
going to also make some benchmarking of KNN itself: GiST vs SP-GiST.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


knn-spgist-v8.patch
Description: Binary data


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-08-30 Thread Alexander Korotkov
On Thu, Aug 30, 2018 at 12:17 PM Alexander Korotkov
 wrote:
> # Current patch (use list)
> x run 1 run 2 run 3
> 0.11206  1230  1231
> 0.01   2862  2813  2802
> 0.003 13915 13911 13900

Sorry, I didn't explain what these tables means.  There are times in
milliseconds for 3 runs of spgist_bench($x, $x)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-08-30 Thread Alexander Korotkov
Hi!

On Tue, Aug 28, 2018 at 12:50 PM Alexander Korotkov
 wrote:
> On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin  wrote:
> > I'm not sure in architectural point of view: supporting two ways (list and 
> > heap) to store result seems may be a bit heavy, but OK. At least, it has 
> > meaningful benefits.
>
> It seems that Nikita have reworked it that way after my suspect that
> switching regular scans to pairing heap *might* cause a regression.
> However, I didn't insist that we need to support two ways.  For
> instance, if we can prove that there is no regression then it's fine
> to use just heap...

So, I decided to check does it really matter to support both list and
pairing heap?  Or can we just always use pairing heap?

I wrote following simple test.

Points are uniformly distributed in box (0,0)-(1,1).
# create table spgist_test as (select point(random(), random()) p from
generate_series(1,100) i);
# create index spgist_test_idx on spgist_test using spgist(p);

spgist_bench() walks over this box with given step and queries it each
time with boxes of given size.
CREATE FUNCTION spgist_bench(step float8, size float8) RETURNS void AS $$
DECLARE
x float8;
y float8;
BEGIN
y := 0.0;
WHILE y <= 1.0 LOOP
x := 0.0;
WHILE x <= 1.0 LOOP
PERFORM * FROM spgist_test WHERE p <@ box(point(x,y),
point(x+size, y+size));
x := x + step;
END LOOP;
y := y + step;
END LOOP;
END;
$$ LANGUAGE plpgsql;

It fist I've compared current patch with modified patch, which I made
to to always queue scan.

# Current patch (use list)
x run 1 run 2 run 3
0.11206  1230  1231
0.01   2862  2813  2802
0.003 13915 13911 13900

# Always use queue
x run 1 run 2 run 3
0.11238  1189  1170
0.01   2740  2852  2769
0.003 13627 13751 13757

And it appears that difference is less than statistical error.  But
then I compared that with master.  And it appears that master is much
faster.

# Master
x run 1 run 2 run 3
0.1 725   691   700
0.01   1488  1480  1495
0.003  6696  6210  6295

It's probably because we're anyway allocating separate queue memory
context.  I'll explore this more and come up with details.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-08-28 Thread Alexander Korotkov
On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin  wrote:
> I'm not sure in architectural point of view: supporting two ways (list and 
> heap) to store result seems may be a bit heavy, but OK. At least, it has 
> meaningful benefits.

It seems that Nikita have reworked it that way after my suspect that
switching regular scans to pairing heap *might* cause a regression.
However, I didn't insist that we need to support two ways.  For
instance, if we can prove that there is no regression then it's fine
to use just heap...

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdu6Wm4DSAp8Pvwq0uo7fCSzsbrNy7x2v5EKK_g4Nkjx1Q%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-26 Thread Andrey Borodin
17 июля 2018 г., в 16:42, Nikita Glukhov  написал(а):
> 
> Fixed.

Patch works as advertised, adds documentation and tests. I didn't succeeded in 
attempts to break it's functionality. I have no more notices about the code.
Maybe except this const looks unusual, but is correct
+extern BOX *box_copy(const BOX *box);

I'm not sure in architectural point of view: supporting two ways (list and 
heap) to store result seems may be a bit heavy, but OK. At least, it has 
meaningful benefits.

I think the patch is ready for committer.

Best regards, Andrey Borodin.


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-17 Thread Nikita Glukhov

Attached 7th version of the patches:
 * renamed recheck fields and variables
 * fixed formatting of the one if-statement

On 15.07.2018 14:54, Andrey Borodin wrote:


14.07.2018, 1:31, Nikita Glukhov  wrote:

Attached  6th version of the patches.
...

2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
need to update it's behavior somehow to be used as-is? Or to support new
functionality?

There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/.  Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.

So, if extension will not update anything - it will keep all preexisting 
functionality, right?


Yes, of course.


I have some more nitpicks about naming
+   recheckQual = out.recheck;
+   recheckDist = out.recheckDistances;
Can we call this things somehow in one fashion?


I would prefer to rename "spgLeafConsitentOut.recheck" to "recheckQual" but it
is not good to rename user-visible fields, so I decided to rename all fields
and variables to "recheck"/"recheckDistances".


Also, this my be a stupid question, but do we really need to differ this two
recheck cases? It is certain that we cannot merge them?


This recheck cases are absolutely different.

In the first case, opclass support functions can not accurately determine
whether the leaf value satisfies the boolean search operator (compressed
values can be stored in leaves).

In the second case, opclass returns only a approximate value (the lower bound)
of the leaf distances.

In the example below operator 'polygon >> polygon' does not need recheck because
bounding box (they are stored in the index instead of polygons) test gives exact
results, but the ordering operator 'polygon <-> point' needs recheck:

SELECT * FROM polygons
WHERE poly >> polygon(box '((0,0),(1,1))')
ORDER BY poly <-> point '(0,0)';


This seems strange if-formatting
+   /* If allTheSame, they should all or none of 'em match */
+   if (innerTuple->allTheSame)
+   if (out.nNodes != 0 && out.nNodes != nNodes)
+   elog(ERROR, "inconsistent inner_consistent results for 
allTheSame inner tuple");
+
+   if (out.nNodes) // few lines before you compare with 0


Fixed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index c4e8a3b..9279756 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
-#include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
-	int			i;
 
 	if (scan->xs_hitup)
 	{
@@ -564,45 +563,10 @@ getNextNearest(IndexScanDesc scan)
 			/* found a heap item at currently minimal distance */
 			scan->xs_ctup.t_self = item->data.heap.heapPtr;
 			scan->xs_recheck = item->data.heap.recheck;
-			scan->xs_recheckorderby = item->data.

Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-15 Thread Andrey Borodin
Hi!

> 14 июля 2018 г., в 1:31, Nikita Glukhov  написал(а):
> 
> Attached  6th version of the patches.
> ...
>> 2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
>> need to update it's behavior somehow to be used as-is? Or to support new
>> functionality?
> 
> There are no SP-GiST opclasses in contrib/, so there is nothing to change in
> contrib/.  Moreover, existing extensions need to be updated only for support 
> of
> new distance-ordered searches -- they need to implement distances[][] array
> calculation in their spgInnerConsistent() and spgLeafConsistent() support
> functions.
So, if extension will not update anything - it will keep all preexisting 
functionality, right?

I have some more nitpicks about naming
+   recheckQual = out.recheck;
+   recheckDist = out.recheckDistances;
Can we call this things somehow in one fashion?

Also, this my be a stupid question, but do we really need to differ this two 
recheck cases? It is certain that we cannot merge them?

This seems strange if-formatting
+   /* If allTheSame, they should all or none of 'em match */
+   if (innerTuple->allTheSame)
+   if (out.nNodes != 0 && out.nNodes != nNodes)
+   elog(ERROR, "inconsistent inner_consistent results for 
allTheSame inner tuple");
+
+   if (out.nNodes) // few lines before you compare with 0

Best regards, Andrey Borodin.


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-13 Thread Nikita Glukhov

Attached  6th version of the patches.

On 09.07.2018 20:47, Andrey Borodin wrote:


4 июля 2018 г., в 3:21, Nikita Glukhov  написал(а):
Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan
deeeper, but as for now have some small notices.


Thank your for your review.



Following directives can be omitted:
#include "lib/rbtree.h"
#include "storage/relfilenode.h"

This message is not noly GiST related, is it?
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or 
float4 if the distance function is lossy");

Some small typos:
usefull -> useful
nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)


Fixed.

On 10.07.2018 20:09, Andrey Borodin wrote:


I've passed through the code one more time. Here are few more questions:
1. Logic behind division of the patch into steps is described last time
2017-01-30, but ISTM actual steps have changed since than? May I ask you
to write a bit about steps of the patchset?


A brief description of the current patch set:
1. Exported two box functions that are used in patch 5.
2. Extracted subroutine index_store_float8_orderby_distances() from GiST code
   that is common for GiST, SP-GiST, and possibly other kNN implementations
   (used in patch 4).
3. Extracted two subroutines from GiST code common for gistproperty() and
   spgistproperty() (used in path 4).
4. The main patch: added kNN support to SP-GiST.
   This patch in theory can be split into two patches: refactoring and kNN
   support, but it will require some additional effort.
   Refactoring basically consists in the extraction of spgInnerTest(),
   spgTestLeafTuple(), spgMakeInnerItem() subroutines.
   A more detailed commit history can be found in my github [1].
5. Added ordering operator point <-> point to kd_point_ops and quad_point_ops.
6. Added ordering operator polygon <-> point to poly_ops.


2. The patch leaves contribs intact. Do extensions with sp-gist opclasses
need to update it's behavior somehow to be used as-is? Or to support new
functionality?


There are no SP-GiST opclasses in contrib/, so there is nothing to change in
contrib/.  Moreover, existing extensions need to be updated only for support of
new distance-ordered searches -- they need to implement distances[][] array
calculation in their spgInnerConsistent() and spgLeafConsistent() support
functions.


3. There is a patch about predicate locking in SP-GiST [2] Is this KNN patch
theoretically compatible with predicate locking? Seems like it is, I just
want to note that this functionality may exist.


I think kNN is compatible with predicate locking.  Patch [2] does not apply
cleanly on kNN but the conflict resolution is trivial.


4. Scan state now have scanStack and queue. May be it's better to name
scanStack and scanQueue or stack and queue?


"queue" was renamed to "scanQueue".

[1] https://github.com/glukhovn/postgres/commits/knn_spgist
[2] https://commitfest.postgresql.org/14/1215/

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index c4e8a3b..9279756 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #i

Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-10 Thread Andrey Borodin
Hi!

> I'm reviewing this patch. Currently I'm trying to understand sp-gist scan 
> deeeper, but as for now have some small notices.

I've passed through the code one more time. Here are few more questions:
1. Logic behind division of the patch into steps is described last time 
2017-01-30, but ISTM actual steps have changed since than? May I ask you to 
write a bit about steps of the patchset?
2. The patch leaves contribs intact. Do extensions with sp-gist opclasses need 
to update it's behavior somehow to be used as-is? Or to support new 
functionality?
3. There is a patch about predicate locking in SP-GiST [0] Is this KNN patch 
theoretically compatible with predicate locking? Seems like it is, I just want 
to note that this functionality may exist.
4. Scan state now have scanStack and queue. May be it's better to name 
scanStack and scanQueue or stack and queue?

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/14/1215/


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-09 Thread Andrey Borodin
Hi!


> 4 июля 2018 г., в 3:21, Nikita Glukhov  написал(а):
> Attached 5th version of the patches, where minor refactoring of distance 
> handling was done (see below).
> 

I'm reviewing this patch. Currently I'm trying to understand sp-gist scan 
deeeper, but as for now have some small notices.

Following directives can be omitted:
#include "lib/rbtree.h"
#include "storage/relfilenode.h"

This message is not noly GiST related, is it?
elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or 
float4 if the distance function is lossy");

Some small typos:
usefull -> useful
nearest-neighbour -> nearest-neighbor (or do we use e.g. "colour"?)

Best regards, Andrey Borodin.


Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-03 Thread Nikita Glukhov

Attached 5th version of the patches, where minor refactoring of distance
handling was done (see below).

On 02.07.2018 19:06, Alexander Korotkov wrote:


Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov  wrote:

On 06.03.2018 17:30, David Steele wrote:


I agree with Andres.  Pushing this patch to the next CF.

Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.


I decided to bring back the List-based scan stack in the previous version
of the patches, and here is how spgAddSearchItemToQueue() looks like now:

static void
spgAddSearchItemToQueue(SpGistScanOpaque so, SpGistSearchItem *item)
{
if (so->queue)
pairingheap_add(so->queue, &item->phNode);
else
so->scanStack = lcons(item, so->scanStack);
}

so->queue is initialized in spgrescan() only for ordered searches.


* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  In the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.


Distances for NULL items are expected to be NULL (it would not be true for
non-strict the distance operators, so we do not seem to support them here),
and so NULL items are considered to be greater than any non-NULL items in
pairingheap_SpGistSearchItem_cmp().  Distances are copied into SpGistSearchItem
only if it is non-NULL item.  Also in the last version of the patch I have
introduced spgAllocSearchItem() which conditionally allocates distance-array in
SpGistSearchItem.  Now inifinity distances are used only in one place, but
if we require that innerConsistent() should always return distances, then
we can completely get rid of so->infDistances field.



* I think KNN support should be briefly described in
src/backed/access/spgist/README.


A minimal description written by the original author Vlad Sterzhanov
already present in README.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index c4e8a3b..9279756 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
-#include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
-	int			i;
 
 	if (scan->xs_hitup)
 	{
@@ -564,45 +

Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-07-02 Thread Alexander Korotkov
Hi!

On Fri, Jun 29, 2018 at 5:37 PM Nikita Glukhov  wrote:
> On 06.03.2018 17:30, David Steele wrote:
>
> > I agree with Andres.  Pushing this patch to the next CF.
>
> Attached 4th version of the patches rebased onto the current master.
> Nothing interesting has changed from the previous version.

I took a look to this patchset.  In general, it looks good for me, but
I've following notes about it.

* We're going to replace scan stack with pairing heap not only for
KNN-search, but for regular search as well.  Did you verify that it
doesn't cause regression for regular search case, because insertion
into pairing heap might be slower than insertion into stack?  One may
say that it didn't caused regression in GiST, and that's why it
shouldn't cause regression in SP-GiST.  However, SP-GiST trees might
be much higher and these performance aspects might be different.

* I think null handling requires better explanation. Nulls are
specially handled in pairingheap_SpGistSearchItem_cmp().  In the same
time you explicitly set infinity distances for leaf nulls.  You
probably have reasons to implement it this way, but I think this
should be explained.  Also isnull property of SpGistSearchItem doesn't
have comment.

* I think KNN support should be briefly described in
src/backed/access/spgist/README.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-06-29 Thread Nikita Glukhov

On 06.03.2018 17:30, David Steele wrote:


I agree with Andres.  Pushing this patch to the next CF.


Attached 4th version of the patches rebased onto the current master.
Nothing interesting has changed from the previous version.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index c4e8a3b..9279756 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
-#include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -543,7 +543,6 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
-	int			i;
 
 	if (scan->xs_hitup)
 	{
@@ -564,45 +563,10 @@ getNextNearest(IndexScanDesc scan)
 			/* found a heap item at currently minimal distance */
 			scan->xs_ctup.t_self = item->data.heap.heapPtr;
 			scan->xs_recheck = item->data.heap.recheck;
-			scan->xs_recheckorderby = item->data.heap.recheckDistances;
-			for (i = 0; i < scan->numberOfOrderBys; i++)
-			{
-if (so->orderByTypes[i] == FLOAT8OID)
-{
-#ifndef USE_FLOAT8_BYVAL
-	/* must free any old value to avoid memory leakage */
-	if (!scan->xs_orderbynulls[i])
-		pfree(DatumGetPointer(scan->xs_orderbyvals[i]));
-#endif
-	scan->xs_orderbyvals[i] = Float8GetDatum(item->distances[i]);
-	scan->xs_orderbynulls[i] = false;
-}
-else if (so->orderByTypes[i] == FLOAT4OID)
-{
-	/* convert distance function's result to ORDER BY type */
-#ifndef USE_FLOAT4_BYVAL
-	/* must free any old value to avoid memory leakage */
-	if (!scan->xs_orderbynulls[i])
-		pfree(DatumGetPointer(scan->xs_orderbyvals[i]));
-#endif
-	scan->xs_orderbyvals[i] = Float4GetDatum((float4) item->distances[i]);
-	scan->xs_orderbynulls[i] = false;
-}
-else
-{
-	/*
-	 * If the ordering operator's return value is anything
-	 * else, we don't know how to convert the float8 bound
-	 * calculated by the distance function to that.  The
-	 * executor won't actually need the order by values we
-	 * return here, if there are no lossy results, so only
-	 * insist on converting if the *recheck flag is set.
-	 */
-	if (scan->xs_recheckorderby)
-		elog(ERROR, "GiST operator family's FOR ORDER BY operator must return float8 or float4 if the distance function is lossy");
-	scan->xs_orderbynulls[i] = true;
-}
-			}
+
+			index_store_float8_orderby_distances(scan, so->orderByTypes,
+ item->distances,
+ item->data.heap.recheckDistances);
 
 			/* in an index-only scan, also return the reconstructed tuple. */
 			if (scan->xs_want_itup)
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 22b5cc9..6dac90c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -74,6 +74,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/index.h"
+#include "catalog/pg_type.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -897,3 +898,72 @@ index_getprocinfo(Relation irel,
 

Re: Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-03-06 Thread David Steele
Hi Nikita,

On 3/2/18 1:35 AM, Andres Freund wrote:
> 
> On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:
>> Attached 3rd version of kNN for SP-GiST.
> 
> Given that this was submitted to the last v11 CF, after not being
> developed for a year, I think it's unfortunately too late for v11. As we
> should be concentrating on getting things into v11, I think it'd be
> appropriate to move this to the next CF.

I agree with Andres.  Pushing this patch to the next CF.

Regards,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-03-01 Thread Andres Freund
Hi,

On 2018-03-01 00:58:42 +0300, Nikita Glukhov wrote:
> Attached 3rd version of kNN for SP-GiST.

Given that this was submitted to the last v11 CF, after not being
developed for a year, I think it's unfortunately too late for v11. As we
should be concentrating on getting things into v11, I think it'd be
appropriate to move this to the next CF.

Greetings,

Andres Freund



Re: [HACKERS] [PATCH] kNN for SP-GiST

2018-02-28 Thread Nikita Glukhov

Attached 3rd version of kNN for SP-GiST.


On 09.03.2017 16:52, Alexander Korotkov wrote:


Hi, Nikita!

I take a look to this patchset.  My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for 
its name.  GiST supports ordering only by float4 and float8 
distances.  SP-GiST also goes that way. But in the future, access 
methods could support distances of other types.
Thus, I suggest to rename function to 
 index_store_float8_orderby_distances().

This function was renamed.


* 0004-Add-kNN-support-to-SP-GiST-v02.patch

Patch didn't applied cleanly.

Patches were rebased onto current master.



*** AUTHORS
*** 371,373 
--- 376,379 

      Teodor Sigaev mailto:teo...@sigaev.ru>>
      Oleg Bartunov mailto:o...@sai.msu.su>>
+     Vlad Sterzhanov mailto:glide...@gmail.com>>


I don't think it worth mentioning here GSoC student who made just 
dirty prototype and abandon this work just after final evaluation.

Student's mention was removed.

More generally, you switched SP-GiST from scan stack to pairing heap.  
We should check if it introduces overhead to non-KNN scans.

I decided to bring back scan stack for unordered scans.


Also some benchmarks comparing KNN SP-GIST vs KNN GiST would be now.


* 
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch


I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 
'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration]

return get_float8_nan();
 ^
1 warning generated.

Fixed.


* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable, 
but completely separate feature.  And it should be posted and 
discussed separately.
Compress method for SP-GiST with poly_ops already have been committed, 
so I left only ordering operators for polygons in the last 6th patch.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index f57380a..a4345ef 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -41,7 +41,6 @@ enum path_delim
 static int	point_inside(Point *p, int npts, Point *plist);
 static int	lseg_crossing(double x, double y, double px, double py);
 static BOX *box_construct(double x1, double x2, double y1, double y2);
-static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2);
 static bool box_ov(BOX *box1, BOX *box2);
 static double box_ht(BOX *box);
 static double box_wd(BOX *box);
@@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2)
 
 /*		box_fill		-		fill in a given box struct
  */
-static BOX *
+BOX *
 box_fill(BOX *result, double x1, double x2, double y1, double y2)
 {
 	if (x1 > x2)
@@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2)
 /*		box_copy		-		copy a box
  */
 BOX *
-box_copy(BOX *box)
+box_copy(const BOX *box)
 {
 	BOX		   *result = (BOX *) palloc(sizeof(BOX));
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index a589e42..94806c2 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -182,6 +182,7 @@ typedef struct
 extern double point_dt(Point *pt1, Point *pt2);
 extern double point_sl(Point *pt1, Point *pt2);
 extern double pg_hypot(double x, double y);
-extern BOX *box_copy(BOX *box);
+extern BOX *box_copy(const BOX *box);
+extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi);
 
 #endif			/* GEO_DECLS_H */
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index b30b931..da76a76 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -14,9 +14,9 @@
  */
 #include "postgres.h"
 
+#include "access/genam.h"
 #include "access/gist_private.h"
 #include "access/relscan.h"
-#include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "lib/pairingheap.h"
@@ -540,7 +540,6 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
-	int			i;
 
 	if (scan->xs_hitup)
 	{
@@ -561,45 +560,10 @@ getNextNearest(IndexScanDesc scan)
 			/* found a heap item at currently minimal distance */
 			scan->xs_ctup.t_self = item->data.heap.heapPtr;
 			scan->xs_recheck = item->data.heap.recheck;
-			scan->xs_recheckorderby = item->data.heap.recheckDistances;
-			for (i = 0; i < scan->numberOfOrderBys; i++)
-			{
-if (so->orderByTypes[i] == FLOAT8OID)
-{
-#ifndef USE_FLOAT8_BYVAL
-	/* must free any old value to avoid memory leakage */
-	if (!scan->xs_orderbynulls[i])
-		pfree(DatumGetPointer(scan->xs_orderbyvals[i]));
-#endif
-	scan->xs_orderbyvals[i] =