Peter, Paul and Alexander,

One last question: is there anything actionable by me regarding this, e.g. opening a ticket on the PostgreSQL issue tracker?

That is as much as I can do, because I have zero experience with PostgreSQL development, but if you want me to open an issue describing the performance issues and benefits to the PostGIS community of having much faster GiST indexing and referencing this mailing list thread, that is something I could potentially do. Probably not a whole lot of value in it, but at least it would document the enhancement request.

Marco

Op 21-9-2020 om 18:39 schreef Peter Geoghegan:
On Wed, Sep 16, 2020 at 3:17 PM Alexander Korotkov <aekorot...@gmail.com> wrote:
In fact, GiST indexes are inferior to B-tree indexes for many reasons.
For instance, B-tree implements disk-ordered scan during VACUUM, while
GiST doesn't.  B-tree could implement retail index tuple delete (which
will be very useful for pluggable table AMs), but for GiST retail
delete is problematic, because its performance is not guaranteed to be
log(N).  B-tree implements deduplication, but GiST doesn't.  And so
on.  So, most of advances are applied to B-tree, while GiST is
retarded.  That makes me think it's a good idea to reimplement GiST on
top of B-tree.
A standard B-Tree with Z ordering is sometimes called a UB-Tree. This
is described in "Modern B-Tree techniques", which promotes UB-Trees as
having all of the advantages you list here. Though the book also says
that it probably has somewhat inferior read performance to
"specialized structures" (which probably refers to loosely ordered
tree structures like GiST). This matches my own high level intuitions
about it.

In short, my idea is to add to the B-tree ability to maintain "union
key" (MBR for spatial) in pivot tuple (including leftmost pivot
tuple), while navigation of insertions will remain comparison-based
(Z-ordered for spatial).  Once we have union keys correctly
maintained, we can implement a special "union key" scan for B-tree,
which would work similar to the GiST scan. Maintaining union keys
could be tricky, because we will have to make sure no concurrent split
ruined our work on extension of parent union key before we've extended
the child union key.  But nevertheless, I think it's solvable.
That sounds hard. It sounds like you more or less want to make the
structure a true UB-Tree for inserts -- a structure where you cannot
allow there to be more than one place for any possible key in the
index. But you also want to maintain a union key for selects, in order
to make it possible to have a choice of subtree to insert into, just
like GiST. Is it really possible to do both at the same time? It feels
like a fundamental trade-off.

I suspect that it would be helpful (if you just did a traditional
UB-Tree, without the union key stuff) to add specialized suffix
truncation logic, that picked a split point according to special
criteria for the conditioned Z-order keys. That might limit the extent
of the performance hit for reads, without the added complexity of the
union key thing. This is just a guess.

I don't have a particularly good sense of the costs here. It's obvious
that the benefits would be very significant, but you don't need me to
say that. The hard parts are 1.) determining the costs, and 2.)
determining if the benefits are worth it.


--
Peter Geoghegan
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to