Re: [HACKERS] GiST kNN search queue (Re: KNN-GiST with recheck)

Heikki Linnakangas Mon, 22 Dec 2014 02:09:13 -0800

On 12/20/2014 10:50 PM, Peter Geoghegan wrote:

On Wed, Dec 17, 2014 at 6:07 AM, Heikki Linnakangas
<[email protected]> wrote:

How about adding a src/backend/lib/README for that, per attached?


I took a quick look at this. Some observations:

* I like the idea of adding a .../lib README. However, the README
fails to note that ilist.c implements an *integrated* list, unlike the
much more prevalent cell-based List structure. It should note that,
since that's the whole point of ilist.c.


Added a sentence on that.

* pairingheap_remove() is technically dead code. It still makes sense
that you'd have it in this patch, but I think there's an argument for
not including it at all on the theory that if you need to use it you
should use a different data structure. After all, the actual
(non-amortized) complexity of that operation is O(n) [1], and if
remove operations are infrequent as we might expect, that might be the
more important consideration. As long as you are including
pairingheap_remove(), though, why is the local variable "prev_ptr" a
pointer to a pointer to a pairingheap_node, rather than just a pointer
to a pairingheap_node?

* Similarly, the function-like macro pairingheap_reset() doesn't seem
to pull its weight. Why does it exist alongside pairingheap_free()?
I'm not seeing a need to re-use a heap like that.

pairingheap_remove and pairingheap_reset are both unused in this patch,but they were needed for the other use case, tracking snapshots toadvance xmin more aggressively, discussed here:http://www.postgresql.org/message-id/[email protected]. Infact, without the pairingheap_remove() operation, the prev_or_parentpointer wouldn't be necessarily at all. We could've added them as aseparate patch, but that seems like unnecessary code churn.

The prev_ptr variable is used to set the parent's first_child pointer,or the previous sibling's next_sibling pointer, depending on whether theremoved node is the parent's first child or not. I'll add more commentsin pairingheap_remove to explain that.

*  "Assert(!pairingheap_is_empty(heap))" appears in a few places.
You're basically asserting that a pointer isn't null, often
immediately before dereferencing the pointer. This seems to be of
questionable value.

I copied that from binaryheap.c. It has some documentation value; theymake it easy to see that the functions require the heap to not be empty.It's also explained in comments, but still.

* I think that the indentation of code could use some tweaking.

* More comments, please. In particular, comment the struct fields in
pairingheap_node. There are various blocks of code that could use at
least an additional terse comment, too.


Added some comments, hope it's better now.

* You talked about tuplesort.c integration. In order for that to
happen, I think the comparator logic should know less about min-heaps.
This should formally be a max-heap, with the ability to provide
customizations only encapsulated in the comparator (like inverting the
comparison logic to get a min-heap, or like custom NULLs first/last
behavior). So IMV this comment should be more generic/anticipatory:

+ /*
+  * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+  * and >0 iff a > b.  For a min-heap, the conditions are reversed.
+  */
+ typedef int (*pairingheap_comparator) (const pairingheap_node *a,
const pairingheap_node *b, void *arg);

I think the functions should be called pairing_max_heap* for this
reason, too. Although that isn't consistent with binaryheap.c, so I
guess this whole argument is a non-starter.

I don't see what the problem is. The pairingheap.c (and binaryheap.c)code works the same for min and max-heaps. The comments assume amax-heap in a few places, but that seems OK to me in the context.

* We should just move rbtree.c to .../lib. We're not using CVS anymore
-- the history will be magically preserved.

Yeah, I tend to agree. Tom Lane has not liked moving things, because itbreaks back-patching. That's true in general, even though git has somesmarts to follow renames. I think it would work in this case, though.Anyway, let's discuss and do that as a separate patch, so that we don'tget stuck on that.

Anyway, to get to the heart of the matter: in general, I think the
argument for the patch is sound. It's not a stellar improvement, but
it's worthwhile. That's all I have for now...

Ok, thanks for the review! I have committed this, with some cleanup andmore comments added.


- Heikki



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GiST kNN search queue (Re: KNN-GiST with recheck)

Reply via email to