Re: 24 Million entries and I need to what?

Tod Hansmann Sun, 29 Dec 2013 00:31:27 -0800


On 12/28/2013 2:01 PM, Sasha Pachev wrote:

> However, the exercise of computing and comparing a lot of hashes within some
> reasonable time is very interesting. Knowing how to do it efficiently
> is a skill that will come in handy at some point in your life.

In all honesty, I think this is the opposite of an interesting problem. This is more of an exercise for a second- or third-year CS student. It's a trade-off between CPU time and memory on the computing side, and between memory and storage access time on the comparison side. The computing side is mostly CPU-bound unless you're on a microprocessor architecture of some sort, because memory isn't going to matter unless your algorithm uses a LOT of it (think scrypt).
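To make the compute-side trade-off concrete, here's a minimal Python sketch (the function names are hypothetical, made up for illustration) contrasting plain SHA-256, which is CPU-bound and keeps only a small, fixed amount of state per entry, with scrypt, which is deliberately memory-hard. The 128 * r * n figure is the usual rough estimate of scrypt's working memory, the parameter values are just example choices, and hashlib.scrypt is only available when Python is built against OpenSSL 1.1 or newer.

```python
import hashlib


def sha256_hex(entry: str) -> str:
    """CPU-bound hash: a small, fixed amount of state per call."""
    return hashlib.sha256(entry.encode("utf-8")).hexdigest()


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Rough estimate of scrypt's working memory: 128 * r * n bytes."""
    return 128 * r * n


def scrypt_hex(entry: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> str:
    """Memory-hard hash: each call touches roughly 128 * r * n bytes of RAM.

    n must be a power of two greater than 1; these defaults are example values.
    """
    digest = hashlib.scrypt(entry.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32)
    return digest.hex()


if __name__ == "__main__":
    for e in ["alice", "bob"]:
        print(e, sha256_hex(e))
    print("scrypt working memory (bytes):", scrypt_memory_bytes(2**14, 8))
```

Hashing 24 million entries with sha256_hex is a single pass that never holds more than one entry's worth of state at a time; doing the same with scrypt_hex at n=2**14, r=8 would touch about 16 MiB on every call, which is where memory starts to matter.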

As far as comparison goes, your data structure is going to be an array of strings for a small set, or maybe a hash map. For a large set, you'd better be using a tree for indexes or you're going to have a bad time. What algorithm you put that data structure through will depend on your hardware and priorities (speed? data analysis? data set size?). Really, someone who's had a data structures course should be able to bang out the comparison side of things in an hour or less, depending on language/libs.
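A rough sketch of the comparison side in Python, assuming the hashes are already computed as strings. The duplicate finder uses a hash set (expected O(1) per lookup). Python has no built-in balanced tree, so the hypothetical SortedIndex class below uses a sorted list searched with bisect as a stand-in for the tree-based index mentioned above: lookups are O(log n), but inserts are O(n) because list.insert shifts elements, which a real B-tree avoids.

```python
import bisect
from typing import Iterable, List, Set


def find_duplicates_hashset(hashes: Iterable[str]) -> Set[str]:
    """Return every hash seen more than once, using a hash set."""
    seen: Set[str] = set()
    dups: Set[str] = set()
    for h in hashes:
        if h in seen:
            dups.add(h)
        else:
            seen.add(h)
    return dups


class SortedIndex:
    """Ordered index over hashes: a sorted list searched with bisect.

    Stand-in for a tree index: O(log n) lookups, O(n) inserts.
    """

    def __init__(self, hashes: Iterable[str]) -> None:
        self._keys: List[str] = sorted(hashes)

    def contains(self, h: str) -> bool:
        # bisect_left finds the first position where h could go.
        i = bisect.bisect_left(self._keys, h)
        return i < len(self._keys) and self._keys[i] == h

    def count(self, h: str) -> int:
        # Equal keys sit in one contiguous run in a sorted list.
        return bisect.bisect_right(self._keys, h) - bisect.bisect_left(self._keys, h)

    def insert(self, h: str) -> None:
        bisect.insort(self._keys, h)


if __name__ == "__main__":
    data = ["c3", "a1", "b2", "a1"]
    print(find_duplicates_hashset(data))
    idx = SortedIndex(data)
    print(idx.contains("a1"), idx.count("a1"), idx.contains("zz"))
```

The hash set is the simpler choice when all you need is "have I seen this before?"; the ordered index earns its keep when you also want range scans or sorted output.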

This is, of course, ignoring the idea that a sysadmin would be approaching this with a "pipe sort to uniq" solution, which we established wouldn't be helpful anyway.
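For reference, the "pipe sort to uniq" idea works because sorting puts equal lines next to each other, so uniq only has to compare neighbors. A rough in-memory Python equivalent of `sort | uniq -c` and `sort | uniq -d` (function names are hypothetical) looks like this; note that sorted() holds everything in RAM, whereas GNU sort spills to temporary files for large inputs, and Python orders strings by code point rather than by locale collation.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def sort_uniq_c(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Like `sort | uniq -c`: count each run of equal adjacent lines after sorting."""
    return [(sum(1 for _ in group), key) for key, group in groupby(sorted(lines))]


def sort_uniq_d(lines: Iterable[str]) -> List[str]:
    """Like `sort | uniq -d`: one copy of each line that occurs more than once."""
    return [key for count, key in sort_uniq_c(lines) if count > 1]


if __name__ == "__main__":
    sample = ["b", "a", "b", "c", "a"]
    print(sort_uniq_c(sample))
    print(sort_uniq_d(sample))
```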


Anyway, those are my thoughts on the subject.

-Tod Hansmann

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
