On Tue, Aug 15, 2017 at 03:03:07PM +0200, Michal Privoznik wrote: > On 08/15/2017 02:01 PM, Peter Krempa wrote: > > On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote: > >> On 08/03/2017 04:50 AM, Peter Krempa wrote: > > > > [ trimmed off-topic part ] > > > >> NB: I didn't dig into the qemumonitorjsontest algorithm, but... > >> > >> While going through the common object changes, I ended up looking at and > >> thinking about the hash table algorithms, initially it was just the > >> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful) > >> since as soon as 1 bucket chain exceeds 8 elements, the table is resized > >> by 8 unless the new size is 8 * 2048 - at which time no matter how full > >> a bucket is we don't resize the table > > > > Resizing the table is very expensive, so it makes sense to scale it > > aggresively. At some point though it would take too much memory. > > > > If some place in the code expects massive hash table, it can set the > > hash table to > 2048 manually. > > > >> The next thing I considered was using a prime number as the table bucket > >> size and it seems by just doing that will cause qemumonitorjsontest to > > > > What would be the benefit of such change? I don't really see how prime > > numbers would improve anything performance-wise. > > Because if your hash function is stupid it can cluster all the keys > together. Primes, however, have way fewer divisors, therefore clustering > is harder to achieve. Of course, you can construct a hash function so > that it's utterly stupid (e.g. {return 4;}) and then all of our > optimizations are thrown out of the window. But I believe that using > prime numbers for table size is advised in the literature too. > Apparently, wiki mentions this fact too: > > https://en.wikipedia.org/wiki/Hash_table#Choosing_a_hash_function
If your hash function has a problem with clustering, then you really should replace the hash function with a better one - which we already did a few years back now :-) Incidentally, we should replace our murmurhash funtion with siphash at some point, because people have found attacks against murmurhash, and so most devs have migrated over to siphash. Regards, Daniel [1] https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/ -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list