Re: [HACKERS] Hash index todo list item

Brian Hurt Fri, 07 Sep 2007 08:11:39 -0700

Kenneth Marshall wrote:

How likely is it that you will get a hash collision, two strings that are different that will hash to the same value? To avoid this requires a very large hash key (128 bits, minimum), otherwise you get into birthday attack problems. With a 32-bit hash, the likelihood is greater than 50% that two strings in a collection of 100,000 will hash to the same value. With a 64-bit hash, the likelihood is greater than 50% that two strings in a collection of 10 billion will hash to the same value. 10 billion is a large number, but not an unreasonable number, of strings to want to put into a hash table, and it's exactly this case where the O(1) cost of hashtables starts being a real win.
Brian
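
For reference, the birthday figures quoted above can be checked with a short calculation. This is only a sketch: it assumes an ideal, uniformly distributed hash, and the function names are made up for illustration.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Assumes a uniformly distributed hash with 2**hash_bits possible values
    and uses the usual birthday approximation p ~= 1 - exp(-n(n-1) / 2N).
    """
    space = 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but keeps precision when x is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

def items_for_even_odds(hash_bits: int) -> float:
    """Approximate collection size at which a collision becomes 50% likely."""
    return math.sqrt(2.0 * math.log(2.0) * 2.0 ** hash_bits)

if __name__ == "__main__":
    for bits, n in ((32, 100_000), (64, 10_000_000_000)):
        p = collision_probability(n, bits)
        print(f"{bits}-bit hash, {n:,} items: p(collision) ~= {p:.3f}")
        print(f"  50% point near {items_for_even_odds(bits):,.0f} items")
```

Under that assumption, both collection sizes mentioned above sit past the 50% point, which is roughly 77 thousand items for 32 bits and roughly 5 billion for 64 bits.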
Yes, there is a non-negligible chance of collision (In a DB is there
any chance that is non-negligible? :) ) and the values must be checked
against the actual values. The win is the collapse of the index size and
only needing to check a small fraction of the actual tuples.
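
A rough sketch of that idea: the index stores only hash codes and row ids, and a lookup rechecks the few candidate rows against the stored values. This is not PostgreSQL's actual hash index code; the class name, the in-memory "heap" list, and the use of zlib.crc32 as the default hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

class HashOnlyIndex:
    """Toy index that keeps hash codes, not the indexed strings themselves."""

    def __init__(self, hash_fn=None):
        # Default hash is CRC-32, chosen only because it ships with Python.
        self.hash_fn = hash_fn or (lambda s: zlib.crc32(s.encode("utf-8")))
        self.heap = []                    # stands in for the table: row id -> value
        self.buckets = defaultdict(list)  # hash code -> list of row ids

    def insert(self, value: str) -> int:
        row_id = len(self.heap)
        self.heap.append(value)
        self.buckets[self.hash_fn(value)].append(row_id)
        return row_id

    def lookup(self, value: str) -> list:
        # Only rows whose hash code matches are candidates ...
        candidates = self.buckets.get(self.hash_fn(value), [])
        # ... and each candidate is rechecked against the actual value,
        # so a hash collision costs extra comparisons but never a wrong answer.
        return [rid for rid in candidates if self.heap[rid] == value]

if __name__ == "__main__":
    # A deliberately weak hash (string length) forces collisions for the demo.
    idx = HashOnlyIndex(hash_fn=len)
    for s in ("abc", "xyz", "abcd"):
        idx.insert(s)
    print(idx.lookup("xyz"))
```

With the weak demo hash, "abc" and "xyz" land in the same bucket, and the recheck is what tells them apart.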

Ah, OK - I misunderstood you. I thought you were saying that the hash values would need to be unique, and you wouldn't check the original values at all. My bad.


Brian
