The absolute fastest way to solve this problem I can think of short of resorting to hand-crafted assembly is to write a C program using a fast hash library, e.g uthash, then split the key space among threads and go for it. There are a few technical gotchas. Off the top of my head without actually having my hands on it (which means my ideas need to be taken with a grain of salt) I am thinking of the following that needs to be taken care of:
- maintain a local hash for each thread and synchronize/dump/merge into the global in batches. Otherwise multi-processor will be of little help because you have to have to take a mutex when writing into the global hash, and this will cause a serious performance nightmare if you do it once per entry. - if your key space is read from files, mmap() the files - the devil is often in the details - make sure you are parsing/generating strings efficiently, do things in place when possible, etc - to test the correctness make the key space of two threads partially overlap on purpose and see if you are catching the duplicates -- Sasha Pachev Fast Running Blog. http://fastrunningblog.com Run. Blog. Improve. Repeat. /* PLUG: http://plug.org, #utah on irc.freenode.net Unsubscribe: http://plug.org/mailman/options/plug Don't fear the penguin. */