On Tue, Oct 14, 2014 at 10:25 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> The key idea here is that lookups are done without any locks, only
>> memory barriers; and inserts and deletes are done using atomic ops.
>
> Hm. I quickly looked and I see that you use two full barriers for every
> lookup. That's pretty expensive. I think this should be doable using
> only read/write barriers on the lookup side? Even on architectures where
> read/write memory barriers have to be something other than a compiler
> barrier, they're still usually an order of magnitude or so cheaper than
> full barriers.

The code in CHashSearch shows the problem there: you need to STORE the
hazard pointer before you begin to do the LOADs to scan the bucket, and
you must finish all of those LOADs before you STORE the NULL hazard
pointer.  A read or write barrier won't provide store/load or load/store
ordering.
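To make that concrete, the lookup path has roughly this shape.  This is
a sketch only: the variable and field names are made up, though
pg_memory_barrier() is the real primitive.

    /* Sketch only: my_hazard_pointer, CHashNode, etc. are illustrative. */
    my_hazard_pointer = bucket_head;   /* STORE: publish what we scan */
    pg_memory_barrier();               /* store/load ordering: full barrier */
    for (node = *bucket_head; node != NULL; node = node->next)
    {
        /* LOADs: must not be reordered before the STORE above */
        if (node->hashcode == hashcode && keys_match(node, key))
            break;
    }
    pg_memory_barrier();               /* load/store ordering: full barrier */
    my_hazard_pointer = NULL;          /* STORE: done with the bucket */

pg_read_barrier() and pg_write_barrier() only give load/load and
store/store ordering respectively, so neither fence above can be
weakened to one of those.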
> I don't see much reason to strive for full lock-freeness. If the locks
> aren't noticeable in real-world scenarios - who cares?

That's my feeling too.  ISTR that when I stress-tested the hash table
back in 2012, when I started this code, the concurrency was far better
than dynahash with 16 lwlock partitions.  I don't remember offhand
exactly how I tested that, but it was with the code in hashtest.c.  I
also seem to recall that it was possible to make the freelists get VERY
hot, but that was a workload where hash table updates were the only
thing happening, so I assume that on a workload where we're actually
doing real work (like copying data in and out of kernel buffers) that
won't be a problem unless you have thousands of cores.  But we'll see.

> With regard to using a hash table for the buffer mapping lock I'm
> doubtful that any form of separate chaining is the right one. We
> currently have a quite noticeable problem with the number of cache
> misses in the buffer mapping hash (and also the heavyweight lock mgr) -
> if we stay with hashes that's only going to be fundamentally lower than
> dynahash if we change the type of hashing. I've had good,
> *preliminary*, results using an open addressing + linear probing
> approach.

I'm very skeptical of open addressing + linear probing.  Such hash
tables tend to degrade over time, because you have to leave tombstones
behind to ensure that future scans advance far enough; there's really
no way to recover short of rebuilding the whole hash table, and
eventually lookups degrade to linear search (see the sketch at the end
of this mail).  If we're spending too much time walking hash chains, I
think the solution is to increase the number of buckets so that the
chains get shorter.

> My guess is that the additional indirection via the arena explains the
> difference in cache misses? But I might be completely off here.

The arena makes the calculation of the pointer address less
predictable, which might make things more difficult for the CPU
pipeline.  But aside from that, I think it's the same basic idea: you
bounce from some sort of bucket header to the first element of the
chain and then, hopefully, you're done.  Am I missing something?

> It'd be quite interesting to see a perf stat -ddd of dynahash.c- and
> chash-employing builds for some comparable load.

I'm hoping you can test this out on x86.  All I have to work with are
the POWER machines, and their characteristics are somewhat different.
I can test it there, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
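P.S. To illustrate the tombstone problem above, here's a minimal sketch
(made-up code, not taken from any patch) of search and delete under
open addressing + linear probing:

    typedef enum { EMPTY, TOMBSTONE, OCCUPIED } SlotState;

    typedef struct
    {
        SlotState   state;
        uint32      key;
    } Slot;

    static int
    probe(Slot *table, uint32 nslots, uint32 key)
    {
        uint32      pos = key % nslots;
        uint32      scanned = 0;

        /*
         * An EMPTY slot terminates the probe, but a TOMBSTONE must not:
         * otherwise keys inserted past a since-deleted slot would become
         * unreachable.  So deletes leave tombstones behind, probes get
         * longer and longer, and in the worst case every lookup scans
         * the entire table.
         */
        while (table[pos].state != EMPTY && scanned++ < nslots)
        {
            if (table[pos].state == OCCUPIED && table[pos].key == key)
                return (int) pos;
            pos = (pos + 1) % nslots;   /* linear probing */
        }
        return -1;                      /* not present */
    }

    /* Deletion can only mark a slot, not clear it; this is what
     * accumulates until the table is rebuilt. */
    static void
    delete_key(Slot *table, uint32 nslots, uint32 key)
    {
        int         pos = probe(table, nslots, key);

        if (pos >= 0)
            table[pos].state = TOMBSTONE;
    }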