John Machin wrote:
> Who is likely to bother? In timbot we trust. Have you read the comments
> at the start of Objects/dictobject.c?
>

No I haven't, and probably won't any time soon, time being what it is. But people who are interested (which is how my original port started) would be more than willing to try and assess the routines against the sets of strings that they want to hash. This site may help explain what I mean, and has some code snippets as well:

http://www.partow.net/programming/hashfunctions/index.html
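To make "try and assess" a bit more concrete, here is a rough sketch of the kind of comparison I mean. It's my own throwaway Python, not code from that page; the toy routine, the bucket count and the key set are made up for illustration. It just counts how many strings land in an already-occupied slot under the built-in hash() versus some candidate routine:

def toy_hash(key):
    # Stand-in candidate routine (not one of the ones on the page):
    # a simple multiply-and-add, kept to 32 bits the way C would.
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h

def count_collisions(hash_func, strings, buckets=8192):
    # How many strings fall into a slot that is already occupied.
    used = set()
    clashes = 0
    for s in strings:
        slot = hash_func(s) % buckets
        if slot in used:
            clashes += 1
        used.add(slot)
    return clashes

strings = ["key%06d" % i for i in range(5000)]   # any set of strings you care about
print("built-in hash:", count_collisions(hash, strings))
print("toy hash:     ", count_collisions(toy_hash, strings))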
> [Aside: it's not "in the built-in dict". Any type which wants its
> instances to be hashable defines its own hash method, one that suits the
> type.]
>
> This belief would be based on:
> (a) actual testing by you
> or (b) a refereed academic paper which did such tests on hash functions
> (including the Python "standard default hash")
> or (c) ...???
>

Experience has shown me that one "default" way of hashing is generally not the optimal way for the overwhelming majority of data, but then again....

> What is the Python "standard default hash", anyway? It is written in C.
> Wouldn't it have been a good idea to provide a Python function for it,
> so that people who are going to "come up with their own specific
> efficient solutions" had something to compare with? Or perhaps they are
> intended to "port" your functions back into C?
>

The above link provides a very simple hash test framework; the "standard default hash" can be dropped in there and tested against the other algorithms.

> A few more questions:
>
> Why have you truncated the answers to 31 bits?
>

The algorithms were adapted from C code using unsigned int; I should have removed that final truncation in the Python versions.

> Have you tested that these functions produce the same output (apart from
> the 31-bit thing) as the originals that you worked from? The reason for
> asking is that Python unlike C doesn't lose any bits in the <<
> operation; if this is followed by a >> you may be shifting some unwanted
> 1-bits back in again.
>

Some of them do, others don't (not really important unless you are trying to be compatible with other implementations, which this is not). I am aware that Python doesn't truncate/wrap values under these operations; I think it's a nice little side effect of Python that gives you more bits to play with, as long as you don't truncate them the way I have.

> Talking about not losing bits: For your 36-byte example input, the
> SDBMHash (with its << 16) is up to about 566 bits before you truncate it
> to 31. A little over the top, perhaps. Maybe not the fastest way of
> doing it.
>

Possibly. Do you have a better solution? I'm very keen to learn... (one way of keeping the intermediate values bounded is sketched at the end of this message).

> What is the purpose of the calls to long() in PJWHash?
>

They were an attempt to cast to long; looking at it now, they're rather superfluous.

> And the $64K question: What is the quintessential difference between
> PJWHash and ELFHash?
>

Nothing, really: ELF is essentially PJW optimised for 32-bit systems, in that the shift amounts and high-bit mask are static, whereas PJW derives them with sizeof, which I couldn't find a way of doing in Python, so I fudged it in the hope that some sort of workaround could be developed in the future (one possible workaround is also sketched at the end of this message).

Again, this was just a simple posting; I hoped it might get some comments and present some ideas to people. I don't expect anyone to drop everything and start using these, I just thought it might be something interesting for this NG. BTW, I'm not a very good Python programmer ATM.


Arash Partow
________________________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net
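The two sketches mentioned above. Again this is my own rough Python, not code from the page, and the function names are just for illustration: masking after every round keeps SDBMHash inside 32 bits the way C's unsigned int wrap-around would, and passing the word size in as a parameter stands in for the sizeof that PJW wants.

def sdbm_hash(key):
    # Masking every round emulates C's unsigned int wrap-around, so the
    # intermediate value never grows beyond 32 bits and no final 31-bit
    # truncation is needed just to keep the numbers small.
    h = 0
    for ch in key:
        h = (ord(ch) + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h

def pjw_hash(key, bits=32):
    # In C the thresholds come from sizeof(unsigned int) * 8; Python has
    # no sizeof, so the word size is passed in explicitly instead. With
    # bits=32 these work out to ELFHash's static constants (shift of 4,
    # xor of the top nibble shifted right by 24, mask 0xF0000000).
    three_quarters = (bits * 3) // 4
    one_eighth = bits // 8
    mask = (1 << bits) - 1
    high_bits = (mask << (bits - one_eighth)) & mask

    h = 0
    for ch in key:
        h = ((h << one_eighth) + ord(ch)) & mask
        test = h & high_bits
        if test != 0:
            h = (h ^ (test >> three_quarters)) & ~high_bits
    return h

key = "abcdefghijklmnopqrstuvwxyz1234567890"   # a 36-byte input
print(hex(sdbm_hash(key)))
print(hex(pjw_hash(key)))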