Brian Dessent wrote on 02 June 2008 19:43: > Dave Korn wrote: > >> Regardless of how well (or poorly) the >> hash function distributes DLLS into the various buckets, there are only >> 1024 of them, and we have many DLLs, many of which will occupy multiple >> buckets; collisions are inevitable. > > First of all, I don't see where this 1024 comes from. By my reading the > hash distributes over the range 0x61300000 - 0x712C0000 in 64k > increments, meaning 4092 buckets.
static unsigned long compute_dll_image_base (const char *ofile) { unsigned long hash = strhash (ofile); return 0x61300000 + ((hash << 16) & 0x0FFC0000); } #endif Looking at the mask value, it has ten bits set. Hence 1024 possible results. Looking at the 0 bits to the right of them, they are spaced in 1<<18 = 1<<(16+2) = 1<<16 * 1<<2 = 64k * 4 = 256kB units. I did this in my head, so I may be having a brain fart for all I know. > But what I really meant wasn't necessarily to improve the hashing > function per se but to give it more buckets, a wider range. Point, but short of the world moving to 64-bit address space, we're always likely to run into collisions round the top end of the 2Gb space. cheers, DaveK -- Can't think of a witty .sigline today.... -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/