On Fri, Sep 07, 2007 at 11:08:13AM -0400, Brian Hurt wrote: > Kenneth Marshall wrote: > >>>> >>> How likely is it that you will get a hash collision, two strings that are >>> different that will hash to the same value? To avoid this requires a >>> very large hash key (128 bits, minimum)- otherwise you get into birthday >>> attack problems. With a 32-bit hash, the likelyhood is greater than 50% >>> that two strings in a collection of 100,000 will hash to the same value. >>> With a 64-bit hash, the likelyhood is greater than 50% that two strings >>> in a collection of 10 billion will has to same value. 10 billion is a >>> large number, but not an unreasonable number, of strings to want to put >>> into a hash table- and it's exactly this case where the O(1) cost of >>> hashtables starts being a real win. >>> >>> Brian >>> >>> >> Yes, there is a non-negligible chance of collision (In a DB is there >> any chance that is non-negligible? :) ) and the values must be checked >> against the actual. The win is the collapse of the index size and only >> needed to check a small fraction of the actual tuples. >> >> >> > > Ah, OK- I misunderstood you. I thought you were saying that the hash > values would need to be unique, and you wouldn't check the original values > at all. My bad. > > Brian > No, you were correct. I misstated originally and you and Mark both pointed out my mistake.
Regards, Ken ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq