Jae Hyuk Kwak wrote:
For the issue of Modulus operation, it does not need to be divide or hardware modulus operation. Let me give you an example here: 13 % 2 = 1 13 & (2-1) = 1
Yes, of course, we all know that, and of course the compiler does these simple optimizations. However, for computing hashes you typically want modulus with OTHER than a power of 2 to involve all bits.
In English, we can use AND operator as substitutes of modulus operator. In the way, the size of hash table should be restricted to certain values like 2, 4, 8, 16, and so on. But it is not a big problem because by the nature of Hash Table, we don't make the table full. For example, we have 257 cases on a switch statement, then we need a hash table whose bucket size is 512. It will have 50% filled hash table, but 50% filled hash table is usual.
Again, yes, of course, but typically power of two size for a hash table followed by an and that takes only the low order bits is likely to be a very poor hash choice.
In order to prevent the false positives, we need to store the key value as you correctly pointed. So the final process will be first to the hash key value calculation and then go to the hash table, and compare whether the value is correct or not.
Again, you can assume we all know the elementary algorithms for managing hash tables. One thing to realize is that hashing only has good average performance. So your O(N) is talking about average case performance, whereas the O(NlogN) for a tree search is worst case. That's comparing apples and oranges. It is worrisome to have the possibility of really bad performance for a particular bad luck case.