On 17/07/2006 5:52 PM, Arash Partow wrote: > Hi Paul, > > For different data types different hash functions work > better/worse aka fewer or more collisions. > > I believe the more choice people have and also the more > ways people see how a particular thing can be done, then > the easier it will be for them to come up with their own > specific efficient solutions.
Who is likely to bother? In timbot we trust. Have you read the comments at the start of Objects/dictobject.c? > > That said, I believe at least one (most likely more) of > the hash functions in the group above will most always work > better (ala less collisions) than the standard default hash > available in the built-in dict for any random set of strings. [Aside: it's not "in the built-in dict". Any type which wants its instances to be hashable defines its own hash method, one that suits the type.] This belief would be based on: (a) actual testing by you or (b) a refereed academic paper which did such tests on hash functions (including the Python "standard default hash") or (c) ...??? What is the Python "standard default hash", anyway? It is written in C. Wouldn't it have been a good idea to provide a Python function for it, so that people who are going to "come up with their own specific efficient solutions" had something to compare with? Or perhaps they are intended to "port" your functions back into C? A few more questions: Why have you truncated the answers to 31 bits? Have you tested that these functions produce the same output (apart from the 31-bit thing) as the originals that you worked from? The reason for asking is that Python unlike C doesn't lose any bits in the << operation; if this is followed by a >> you may be shifting some unwanted 1-bits back in again. Talking about not losing bits: For your 36-byte example input, the SDBMHash (with its << 16) is up to about 566 bits before you truncate it to 31. A little over the top, perhaps. Maybe not the fastest way of doing it. What is the purpose of the calls to long() in PJWHash? And the $64K question: What is the quintessential difference between PJWHash and ELFHash? Cheers, John -- http://mail.python.org/mailman/listinfo/python-list