> > At a previous job, we tested a 32-bit hash function by running it
> > against hundreds of thousands of unique URL's stored in our
> > database.  We found one collision.  A 64-bit hash is billions of
> > times better (4 billion, to be exact).
> 
> Good to know.  I wonder how many collisions I'd find if I ran it over
> every URL listed in the directory www.yahoo.com.
> 
> Which 64 bit hash function did you use?  Invent your own, or something
> "off the shelf"?
> 

We found a public domain one on the net see 
http://www.burtleburtle.net/bob/hash/evahash.html for some sample code.  It's only a 
32-bit hash though.  However, that same page appears to have instructions for a 64-bit 
hash function as well, but I haven't tried it at all.  I'd be curious to know how many 
collisions you find hashing all the URL's in yahoo's database :)  I don't know how 
long that would take, but if you do it I'd like to hear the results.

Since the hash function takes a key and an initial value, you could try running it 
with two different initial values and/or keys.  This would give you effectively a 
128-bit hash, which you could store across two fields in MySQL.  I'm guessing that the 
64-bit hash will probably be good enough though.

Steve Meyers



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to