On 12/28/2013 2:01 PM, Sasha Pachev wrote:
> However, the exercise of computing and comparing a lot of hashes within some
> reasonable time is very interesting. Knowing how to do it efficiently
> is a skill that will come handy at some point in your life.

In all honesty, I think this is the opposite of an interesting problem. It's more of an exercise for a second- or third-year CS student. It's a trade-off between CPU time and memory on the computing side, and between memory and storage access time on the comparison side. The computing side is mostly CPU-bound unless you're on a microprocessor architecture of some sort, because memory isn't going to matter unless your algorithm uses a LOT of it (think scrypt).

As far as comparison goes, your data structure is going to be an array of strings for a small set, or maybe a hash map. For a large set, you'd better be using a tree for indexes or you're going to have a bad time. Which algorithm you put that data structure through will depend on your hardware and priorities (speed? data analysis? data set size?). Really, anyone who's had a data structures course should be able to bang out the comparison side of things in an hour or less, depending on language/libs.
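
To make the small-set case concrete, here is a rough sketch of the hash map approach in Python. SHA-256, the sample data, and the names digest and find_duplicates are all my own assumptions for illustration, not anything from the original problem:

```python
import hashlib
from collections import defaultdict


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(items):
    """Map each digest seen more than once to the indexes of its items."""
    groups = defaultdict(list)
    for index, data in enumerate(items):
        groups[digest(data)].append(index)
    # Keep only the digests shared by two or more items.
    return {d: idxs for d, idxs in groups.items() if len(idxs) > 1}


# Hypothetical sample data: "alpha" appears three times, "beta" twice.
items = [b"alpha", b"beta", b"alpha", b"gamma", b"beta", b"alpha"]
duplicates = find_duplicates(items)
for d, idxs in duplicates.items():
    print(d[:12], idxs)
```

This keeps every digest in memory, so it only suits a set that fits in RAM. Past that point you're into the tree-backed index territory mentioned above.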

This is, of course, ignoring the idea that a sysadmin would approach this with a "pipe sort to uniq" solution, which we established wouldn't be helpful anyway.
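
For reference, the "pipe sort to uniq" idea boils down to sorting the digests and scanning for adjacent repeats. A minimal Python sketch of that, assuming SHA-256 and a made-up helper name (repeated_digests), would look something like this:

```python
import hashlib
from itertools import groupby


def repeated_digests(items):
    """Sort the digests, then report each one that occurs more than once."""
    digests = sorted(hashlib.sha256(data).hexdigest() for data in items)
    # After sorting, equal digests are adjacent, so groupby sees each run once.
    return [d for d, run in groupby(digests) if sum(1 for _ in run) > 1]


# Hypothetical sample data: only b"a" is repeated.
repeated = repeated_digests([b"a", b"b", b"a"])
print(repeated)
```

It costs a sort instead of a hash table, and it only tells you which digests repeat, not which inputs produced them.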

Anyway, those are my thoughts on the subject.

-Tod Hansmann

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
