On Feb 27, 10:07 am, Steve Holden <st...@holdenweb.com> wrote: > Assuming no duplicates, how does this help? You still have to verify > collisions.
Absolutely. But a decent hashing function (particularly since you know quite a bit about the data beforehand) will have very few collisions (theoretically no collisions, again since we know the data beforehand). > The problem is you can't guarantee any hash value will be unique, so > ultimately you have to check whether list entries hashing to the same > value are or are not equal. Which would mean either having the whole > list in memory or having access to it via some other random access method. Again, absolutely. But as Tim pointed out, with a decent hash the number of superfluous checks should be minimal, so your checks will mostly occur on valid duplicate values. It's worth noting that I picked this algorithm with the assumption that any storage to physical memory (such as required by using a set) would be out of the question. ~G -- http://mail.python.org/mailman/listinfo/python-list