There are entire books on the subject, so I can't give you everything, but here's a quick summary:
First you need a hash function which maps your input to an index in the table. Hashing strings is a common application, and there are a lot of string hash functions. Most common is something like this: int hash(char *s) { int result = 0; for(int i = 0; s[i]; ++i) result = 17 * result + s[i]; return result; } You would then use mod to map the return value into the size of the hash table. There are better hash functions, but that one is not bad for most purposes. The next issue to deal with his how to handle collisions. There is always the possibility that two values will map to the same spot in the hash table, and the more full the table gets, the more that will happen. The most basic method is linear probing, in which you simply check the next slot, and then the next, etc. until you find an empty slot. There are several problems with this scheme. First, as the table gets full it tends to create big clumps of filled cells, which reduces the efficiency of searching and adding to the table. Secondly, when you want to delete from the table, you might leave some other element in a spot where it won't be found. That's because with this scheme, the search table method must look in the location indicated by the hash function, and if it is occupied, it must search each cell until it finds one which matches or until it reaches an empty cell. If you deleted one of the elements between the hash value of the element you are searching for and the location where it was ultimately stored, you will run into trouble if you are not careful. This is typically addressed by marking a cell where an element has been deleted as "deleted". Then the search knows to keep looking past that location. A cleanup can be run occasionally to put things back closer to where they should be, but the cleanup is costly in terms of time, so you don't want to do it too often. Using a rehash function is a better way to deal with collisions. The rehash is a different hash function which puts the element in another location. I like to use a hash function which has a second parameter which defaults to zero, but other values can be used to rehash a string. Here is one: int hash(char *str, int rehash=0) { int result = rehash; for(int i = 0; s[i]; ++i) result ^= s[result^str[i]]; // s[] is a permutation of values 0..255 return result; } This has the nice characteristic that if hash(s1) == hash(s2), hash(s1,1) and hash(s2,1) are likely to be different. In some cases you can design a hash function which avoids collisions. If you have 100 known strings to hash, you can usually use the second function above and find a permutation s which results in no collisions. Better yet, if you can control the hash tags, you can ensure no collisions. In one case I was managing a set of up to N items, and each one had a unique id tag. The tags didn't need to be sequential, but each one had to be unique. So I assigned the tags by building a queue of integers initialized with values 1..N. When I needed an id for a new item I pulled it out of the queue, and when an item was deleted, I added N to it's id and put that id into the queue. This guaranteed that a hash function of id%N would never generate collisions. One of my favorite ways to deal with collisions is to have each cell in the table be a bucket which can contain multiple elements. Making it be a binary search tree is very nice, because it makes resolving collisions into an O(ln n) operation rather than O(n) for linear probing. If you already have a binary search tree class, you can just make a table of them to use as a hash table. In most cases you need to design in the ability to delete elements from the hash table. If you don't need that ability, that makes your job easier, but most applications do require it. You must make sure that deleting one item doesn't affect searches on other items, and that repeated insertions and deletes don't degrade the performance of the hash table. Chosing the right size for a hash table is very important. Hash table performance degrades significantly when the table gets more than half full. By the time it is 80% full you might as well be using a linear search. Some hash systems use variable sized hash tables. But changing size is very expensive because you must reinsert all the items into the new table. You are better off, when possible, picking the right size to start with. There are many schools of thought on hashing, and many ways to tailor a hashing system to a particular need. Think about the requirements of your application and look for the best way to accomplish your specific task. I hope this helps. Don On Sep 16, 9:17 am, prasanth n <nprasnt...@gmail.com> wrote: > can anyone tell how to implement a hash table in C?? how to choose the hash > function?? algorithm?? > > -- > *prasanth* -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.