On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote: > On 12/18/14, 5:00 PM, Jim Nasby wrote: > >2201582 20 -- Mostly LOCALLOCK and Shared Buffer > > Started looking into this; perhaps https://code.google.com/p/fast-hash/ would > be worth looking at, though it requires uint64. > > It also occurs to me that we're needlessly shoving a lot of 0's into the hash > input by using RelFileNode and ForkNumber. RelFileNode includes the > tablespace Oid, which is pointless here because relid is unique per-database. > We also have very few forks and typically care about very few databases. If > we crammed dbid and ForkNum together that gets us down to 12 bytes, which at > minimum saves us the trip through the case logic. I suspect it also means we > could eliminate one of the mix() calls. > > But I wonder if we could still do better, because we typically also won't > have that many relations. Is there some fast way we could combine dbid, > forkNum and relid into a uint32? That gets us down to 8 bytes, which means we > could use fash-hash, or a stripped down mix(). > > Unfortunately I don't know much about hash algorithms, so I don't know how > practical any of this actually is, or what a good method for combining those > fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ > relid, but if you got unlucky with your oid values you could end up with a > lot of collissions from that. > > I can put some effort into this, but I'd like some guidance. > -- > Jim Nasby, Data Architect, Blue Treble Consulting > Data in Trouble? Get it in Treble! http://BlueTreble.com >
Hi, If we are going to consider changing the hash function, we should consider something like xxhash which runs at 13.8GB/s on a 2.7GHz x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which is double the speed of fast-hash according to the page running on a 3GHz x86_64. In addition, something like that could be used a checksum instead of the current CRC32, although some work has already gone into speeding it up, as is. Otherwise, it probably makes sense to just stick with creating the fastpath 8-byte analogously to the 4-byte fastpath that was just added. Is calculating the hash the bottle-neck? Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers