I was mostly unclear on how I'd go about attaching the extension functions to the relevant indexing mechanism. From the looks of the vptree.tar.gz file (which is really, *really* helpful, incidentally!), a it's done via a custom operator class, which then gets passed to the actual index creation mechanism when you're declaring the index.
I think I had looked at that at one point, but it's been a while. In my case, I'm using discrete-cosine-transform based perceptual hashes for searching. They are nice and compact (64-bits per hash), while still producing good search results. I have a dataset of ~36 million images, and it does searches in < 50 milliseconds with a hamming distance of 4, while touching ~0.25% of the tree (And occupying ~18 GB of ram). My BK tree is up on github here <https://github.com/fake-name/IntraArchiveDeduplicator/blob/master/deduplicator/bktree.hpp>, if anyone needs something like that (BSD licensed, pretty well tested). There's also a python wrapper for it. I'll probably not have time to poke about until this weekend, but thanks! Connor On Mon, Oct 30, 2017 at 4:50 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > Hi! > > On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf < > w...@imaginaryindustries.com> wrote: > >> I'm looking at implementing a custom indexing scheme, and I've been >> having trouble understanding the proper approach. >> >> Basically, I need a BK tree, which is a tree-structure useful for >> indexing arbitrary discrete metric-spaces (in my case, I'm interested in >> indexing across the hamming edit-distance of perceptual hashes, for fuzzy >> image searching). I'm *pretty* sure a SP-GiST index is the correct index >> type, as my tree is intrinsically unbalanced. >> > > Yes, SP-GiST is appropriate index type for BK tree. I'm pretty sure BK > tree could be implemented as SP-GiST opclass. > The only thing worrying me is selection pivot values for nodes. SP-GiST > builds by insertion of index tuples on by one. First pivot value for root > node in SP-GIST would be created once first leaf page overflows. Thus, you > would have to select this pivot value basing on very small fraction in the > beginning of dataset. > As I know, BK tree is most efficient when root pivot value is selected > after looking in whole dataset and then hierarchically to subtrees. > > BTW, did you try my extension for searching similar images. It's quite > primitive, but works for some cases. > https://github.com/postgrespro/imgsmlr > https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf > > I have a functional stand-alone implementation of a BK-Tree, and it works >> very well, but the complexity of managing what is basically a external >> index for my database has reached the point where it's significantly >> problematic, and it seems to be it should be moved into the database. >> > > Sure, moving this index to the database is right decision. > > Anyways, looking at the contents of postgres/src/backend/access/spgist, >> it looks pretty straightforward in terms of the actual C implementation, >> but I'm stuck understanding how to "install" a custom SP-GiST >> implementation. There are several GiST indexing implementations in the >> contrib directory, but no examples for how I'd go about implementing a >> loadable SP-GiST index. >> >> Basically, my questions are: >> >> - Is it possible to implement a SP-GiST indexing scheme as a loadable >> module? >> >> Yes, it's possible to define SP-GiST. > >> >> - If so, how? >> >> The pretty same way as GiST opclass extension. You have to define > supporting functions and operators and then define operator class over them. > >> >> - And is there an example I can base my implementation off of? >> >> >> >> I'm relatively comfortable with C (much moreso with C++), but I haven't >> spent a lot of time looking at the postgresql codebase. I don't think I >> could start from a empty folder and make a properly-implemented module in >> any reasonable period of time, so if I have a working example for some sort >> of index that uses the same interfaces that would really help a lot >> > I don't think there is an example in PostgreSQL source code tree or on > github. But I've attached by early experiment with VP-tree (seems to be > pretty same as BK tree) using SP-GiST (see vptree.tar.gz). Basing on this > experiment I realized that it's important to select root pivot value basing > on the whole dataset. However, for your metric/dataset/queries it might > appear to be different. > > It also would be nice to someday improve SP-GiST to support some global > strategies on index creation. In particular, it would allow to resolve > selection of pivot values problem that I mention above. Right now my > colleagues and me don't have time for that. But I can assist you with > advises if you will decide to implement that. > > ------ > Alexander Korotkov > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >