Robert Haas <robertmh...@gmail.com> wrote: > Tom Lane <t...@sss.pgh.pa.us> wrote: >> Robert Haas <robertmh...@gmail.com> writes: >>> The goal is to make those hard to predict, though. >> >> Really? I think "I don't understand when this fails" isn't >> obviously better than being able to predict when it fails ... > > Isn't that the whole point of hash functions? Collisions are > inevitable, but you want them to be unpredictable. Are you sure you're not confusing attributes of a good random number generator with hash generation? (They have much in common, but I've only seen concern with "hard to predict" when working on random number algorithms, as for financial audits or jury selection.) > If you want a hash function with predictable collision behavior, > just has the first element. It'll be highly predictable AND > wicked fast. You're confusing "unpredictable" with "widely distributed", I think. There's no reason that the hash value of an integer of the same size as the hash shouldn't be the integer itself, for example. It's hard to get more predictable than that, yet it works well for hash lookups. I know that for a hash of a character string, the total = (total * 31) + nextcharacter has been shown to be effective. That's hardly random or hard to predict, but it does tend to spread out the hash values well. Whether it is as good for arrays, I don't know. It seems like a reasonable place to start, though, since a character string is rather like an array of characters. >> What concerns me about that is that it tends to push the bits to >> the left --- I think the effects of the earlier inputs are going >> to overflow out as you incorporate a lot of newer inputs. What >> you want is a scheme where every input item affects all the bits >> of the final result about equally, and my gut feeling is this >> doesn't provide that. > > I don't think so. That would be a problem if you multiplied by an > even number. You can see that you don't get dups if you just add > in the same value over and over: Yeah, that 1 in the low order position ensures that the impact of any value which is ever added into the total is never entirely lost. -Kevin
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers