On 12/20/14, 11:51 AM, Tom Lane wrote:
Andres Freund <and...@2ndquadrant.com> writes:
On 2014-12-19 22:03:55 -0600, Jim Nasby wrote:
What I am thinking is not using all of those fields in their raw form to 
calculate the hash value. I.e., something analogous to:
hash_any(SharedBufHash, (((rot(forkNum, 2) | dbNode) ^ relNode) << 32) | blockNum)

perhaps that actual code wouldn't work, but I don't see why we couldn't do 
something similar... am I missing something?
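
Spelling that out as a compilable sketch (rot32() and buftag_fold() are 
made-up names; the field names follow BufferTag/RelFileNode, and the 
tablespace is ignored exactly as in the expression above):

    #include <stdint.h>

    /* Plain 32-bit left rotate; k must be 1..31. */
    static inline uint32_t
    rot32(uint32_t x, int k)
    {
        return (x << k) | (x >> (32 - k));
    }

    /* Fold the interesting tag fields into a single 64-bit key. */
    static inline uint64_t
    buftag_fold(uint32_t dbNode, uint32_t relNode,
                uint32_t forkNum, uint32_t blockNum)
    {
        uint32_t rel = (rot32(forkNum, 2) | dbNode) ^ relNode;

        return ((uint64_t) rel << 32) | blockNum;
    }

The only possible saving is that hash_any() would then run over the 8-byte 
fold instead of the full 20-byte tag.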

I don't think that'd improve anything. Jenkins' hash has quite good
mixing properties; I don't believe the above would improve the
quality of the hash.

I think what Jim is suggesting is to intentionally degrade the quality of
the hash in order to let it be calculated a tad faster.  We could do that,
but I doubt it would be a win, especially in systems with lots of buffers.
IIRC, when we put in Jenkins hashing to replace the older homebrew hash
function, it improved performance even though the hash itself was slower.
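
For context, the path in question is roughly this (simplified from 
buf_table.c as it stood then; the shared table is built with tag_hash, 
so every lookup runs Jenkins' hash_any() over the whole 20-byte tag):

    uint32
    BufTableHashCode(BufferTag *tagPtr)
    {
        return get_hash_value(SharedBufHash, (void *) tagPtr);
    }

So a cheaper fold saves at most the difference between hashing 8 bytes 
and 20, while each extra collision costs a cache-missing chain walk.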

Right. Now that you mention it, I vaguely recall the discussions about changing 
the hash function to reduce collisions.

I'll still take a look at fasthash, but it's looking like there may not be 
anything we can do here unless we change how we identify relation files 
(combining dbid, tablespace id, fork number and file id, at least for 
searching). If we had 64-bit hash support then maybe that'd be a significant 
win, since the combined 64-bit identifier could serve as the hash value 
directly. But that certainly doesn't seem to be low-hanging fruit to me...
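
To illustrate (purely hypothetical -- relid64, buftag_bucket() and the 
constant are assumptions, not anything the backend has today): with the 
identifiers packed into one 64-bit value, bucket selection could be a 
single multiply (Fibonacci hashing) rather than a general-purpose hash:

    #include <stdint.h>

    /* relid64 = dbid, tablespace, fork and file id packed together.
     * Multiplying by 2^64/phi spreads the bits enough for bucket
     * selection; no Jenkins-style mixing pass is needed. */
    static inline uint64_t
    buftag_bucket(uint64_t relid64, uint32_t blockNum, uint64_t nbuckets)
    {
        uint64_t h = (relid64 ^ blockNum) * UINT64_C(0x9E3779B97F4A7C15);

        return h % nbuckets;    /* or h >> (64 - log2(nbuckets)) */
    }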
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

