On Tue, Aug 15, 2017 at 10:02:38AM -0400, John Ferlan wrote: > > > On 08/15/2017 08:01 AM, Peter Krempa wrote: > > On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote: > >> On 08/03/2017 04:50 AM, Peter Krempa wrote: > > > > [ trimmed off-topic part ] > > > >> NB: I didn't dig into the qemumonitorjsontest algorithm, but... > >> > >> While going through the common object changes, I ended up looking at and > >> thinking about the hash table algorithms, initially it was just the > >> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful) > >> since as soon as 1 bucket chain exceeds 8 elements, the table is resized > >> by 8 unless the new size is 8 * 2048 - at which time no matter how full > >> a bucket is we don't resize the table > > > > Resizing the table is very expensive, so it makes sense to scale it > > aggresively. At some point though it would take too much memory. > > > > If some place in the code expects massive hash table, it can set the > > hash table to > 2048 manually. > > > > Right - avoiding excessive or unnecessary resizing is something I was > considering. One thing that was at the back of my mind is "somewhat > recent" discussions about 10s of thousands of disk/volumes. IIRC it's > been discussed on qemu list as it relates to libguestfs. > > Because volume names tend to be less random than a UUID, I was looking > at how the existing algorithms operate when you have say "disk00001" > through "disk99999" w/r/t bucket layout and length of chains when we > reach the maximum bucket count.
For any sensible hash algorithm, such a "predictable" naming convention is not a realk problem - indeed if it were a problem it would be considered a security vulnerability these days. While murmurhash is not a cryptographic hash, a single bit difference in names should still produce a very different hashcode value. If we switch to siphash though, which is a cryptographic hash, we will have a strong guarantee of a completely different hash code for such names. So again I don't think bucket size is vs primes is important here - the choice of hash function more directly determines the collision resistance we have. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list