On Tue, Aug 28, 2018 at 2:26 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2018-08-28 13:50:43 +1200, Thomas Munro wrote:
> >> What bad thing would happen if we used OIDs directly as hash values in
> >> internal hash tables (that is, instead of uint32_hash() we'd use
> >> uint32_identity(), or somehow optimise it away entirely, as you can
> >> see in some C++ standard libraries for eg std::hash<int>)?
>
> > Oids are very much not equally distributed, so in all likelihood you'd
> > get cases very you currently have a reasonably well averaged out usage
> > of the hashtable, not be that anymore.
>
> Right. In particular, most of our hash usages assume that all bits of
> the hash value are equally "random", so that we can just mask off the
> lowest N bits of the hash and not get values that are biased towards
> particular hash buckets. It's unlikely that raw OIDs would have that
> property.

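To make the masking concern concrete, here is a minimal standalone sketch.
It is not PostgreSQL code: identity_hash(), mixing_hash() and buckets_used()
are hypothetical names, the mixer is the MurmurHash3 32-bit finalizer used
as a stand-in for a real hash function, and the evenly spaced keys are a
contrived worst case rather than a measured OID distribution.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 16u            /* power of two, so "& (NBUCKETS - 1)" masks */

/* Hypothetical "no hashing" option: the key itself is the hash value. */
static uint32_t
identity_hash(uint32_t key)
{
    return key;
}

/* Stand-in mixer: the MurmurHash3 32-bit finalizer (assumption, see above). */
static uint32_t
mixing_hash(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/*
 * Count how many distinct buckets are hit by the keys
 * first, first + stride, ..., first + (nkeys - 1) * stride
 * when the bucket is chosen by masking off the lowest bits of the hash.
 */
static unsigned
buckets_used(uint32_t (*hash) (uint32_t),
             uint32_t first, uint32_t stride, unsigned nkeys)
{
    unsigned char seen[NBUCKETS] = {0};
    unsigned    used = 0;

    for (unsigned i = 0; i < nkeys; i++)
    {
        uint32_t    bucket = hash(first + i * stride) & (NBUCKETS - 1);

        if (!seen[bucket])
        {
            seen[bucket] = 1;
            used++;
        }
    }
    assert(used <= NBUCKETS);
    return used;
}
```

With keys that are all multiples of 16, buckets_used(identity_hash, 16384,
16, 64) puts every key in one bucket, because the low four bits never
change. Consecutive keys behave fine with the identity function, which is
why the outcome depends entirely on how the keys happen to be distributed.
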
Yeah, it would be a terrible idea as a general hash function for use in
contexts that assume an "avalanche effect", that is, that information is
spread out over all the bits (the hash join batching code wouldn't work,
for example). I was wondering specifically about the limited case of
hash tables that are used to look things up in caches.

-- 
Thomas Munro
http://www.enterprisedb.com