Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

k...@rice.edu Fri, 19 Dec 2014 15:38:20 -0800

On Fri, Dec 19, 2014 at 04:41:51PM -0600, Jim Nasby wrote:
> On 12/18/14, 5:00 PM, Jim Nasby wrote:
> >2201582 20 -- Mostly LOCALLOCK and Shared Buffer
> 
> Started looking into this; perhaps https://code.google.com/p/fast-hash/ would 
> be worth looking at, though it requires uint64.
> 
> It also occurs to me that we're needlessly shoving a lot of 0's into the hash 
> input by using RelFileNode and ForkNumber. RelFileNode includes the 
> tablespace Oid, which is pointless here because relid is unique per-database. 
> We also have very few forks and typically care about very few databases. If 
> we crammed dbid and ForkNum together that gets us down to 12 bytes, which at 
> minimum saves us the trip through the case logic. I suspect it also means we 
> could eliminate one of the mix() calls.
> 
> But I wonder if we could still do better, because we typically also won't 
> have that many relations. Is there some fast way we could combine dbid, 
> forkNum and relid into a uint32? That gets us down to 8 bytes, which means we 
> could use fash-hash, or a stripped down mix().
> 
> Unfortunately I don't know much about hash algorithms, so I don't know how 
> practical any of this actually is, or what a good method for combining those 
> fields would be. My current idea is something like (rot(forkNum, 2) | dbid) ^ 
> relid, but if you got unlucky with your oid values you could end up with a 
> lot of collissions from that.
> 
> I can put some effort into this, but I'd like some guidance.
> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Hi,

If we are going to consider changing the hash function, we should
consider something like xxhash which runs at 13.8GB/s on a 2.7GHz
x86_64 for the XXH64 variant and 6.8GB/s for the XXH32 variant which
is double the speed of fast-hash according to the page running on a
3GHz x86_64. In addition, something like that could be used a checksum
instead of the current CRC32, although some work has already gone into
speeding it up, as is. Otherwise, it probably makes sense to just stick
with creating the fastpath 8-byte analogously to the 4-byte fastpath
that was just added. Is calculating the hash the bottle-neck?

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: hash_create API changes (was Re: [HACKERS] speedup tidbitmap patch: hash BlockNumber)

Reply via email to