On 16/09/16 18:35, Amit Kapila wrote:

On Thu, Sep 15, 2016 at 7:53 PM, Andres Freund <and...@anarazel.de> wrote:
Hi,

On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
For making hash indexes usable in production systems, we need to improve
its concurrency and make them crash-safe by WAL logging them.
One earlier question about this is whether that is actually a worthwhile
goal.  Are the speed and space benefits big enough in the general case?

I think there will surely by speed benefits w.r.t reads for larger
indexes.  All our measurements till now have shown that there is a
benefit varying from 30~60% (for reads) with hash index as compare to
btree, and I think it could be even more if we further increase the
size of index.  On space front, I have not done any detailed study, so
it is not right to conclude anything, but it appears to me that if the
index is on char/varchar column where size of key is 10 or 20 bytes,
hash indexes should be beneficial as they store just hash-key.

Could those benefits not be achieved in a more maintainable manner by
adding a layer that uses a btree over hash(columns), and adds
appropriate rechecks after heap scans?

I don't think it can be faster for reads than using real hash index,
but surely one can have that as a workaround.

Note that I'm not saying that hash indexes are not worthwhile, I'm just
doubtful that question has been explored sufficiently.

I think theoretically hash indexes are faster than btree considering
logarithmic complexity (O(1) vs. O(logn)), also the results after
recent optimizations indicate that hash indexes are faster than btree
for equal to searches.  I am not saying after the recent set of
patches proposed for hash indexes they will be better in all kind of
cases.  It could be beneficial for cases where indexed columns are not
updated heavily.

I think one can definitely argue that we can some optimizations in
btree and make them equivalent or better than hash indexes, but I am
not sure if it is possible for all-kind of use-cases.



I think having the choice for a more equality optimized index design is desirable. Now that they are wal logged they are first class citizens so to speak. I suspect that there are a lot of further speed optimizations that can be considered to tease out the best performance - now that the basics of reliability have been sorted. I think this patch/set of patches is/are important!

regards

Mark



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to