I've done similar things in the past. Here is how I did it:

Generate and sort the first file.
Generate and sort the second file.
join first second
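
A minimal shell sketch of those three steps. It assumes each generated file holds one "HASH STRING" pair per line with the hash first, so join can match on its default join field (field 1). The function name find_collisions and that file layout are illustrative assumptions, not part of the original setup. Both sort and join run under LC_ALL=C because join expects its input sorted with the same collation it uses itself.

```shell
#!/bin/sh
# Hypothetical helper: $1 = "HASH STRING" lines for the known word list,
# $2 = "HASH STRING" lines for the randomly generated strings.
find_collisions() {
    tmp_known=$(mktemp)
    tmp_random=$(mktemp)
    # join needs both inputs sorted on the join field (field 1), using the
    # same collation join itself uses, so pin the locale to C everywhere.
    LC_ALL=C sort -k1,1 "$1" > "$tmp_known"
    LC_ALL=C sort -k1,1 "$2" > "$tmp_random"
    # One output line per matching hash: "HASH KNOWN_STRING RANDOM_STRING".
    LC_ALL=C join "$tmp_known" "$tmp_random"
    rm -f "$tmp_known" "$tmp_random"
}
```

For example, find_collisions known.txt random.txt | wc -l would print the raw number of matching hashes (known.txt and random.txt are hypothetical names).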

The join command will only print lines whose join field (the first field,
by default) appears in both the first and second file. This would give you
a list of possible collisions. You'd just need to verify that each collision
is valid (i.e. the random string was not the actual string from the first
file). If you just wanted a count you could:

join first second | wc -l
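
To drop the false positives mentioned above, where the random generator happened to reproduce the original word, one option is to filter the join output before counting. This sketch assumes the same hypothetical "HASH STRING" layout, both files already sorted with LC_ALL=C sort, and strings without embedded spaces; real_collisions is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: $1 and $2 are pre-sorted "HASH STRING" files.
# join emits "HASH KNOWN_STRING RANDOM_STRING"; keep only lines where the
# two strings differ. Appending "" forces awk to compare them as strings,
# so values like "1" and "1.0" are not treated as numerically equal.
real_collisions() {
    LC_ALL=C join "$1" "$2" | awk '($2 "") != ($3 "")'
}
```

Counting works the same way as the one-liner above: real_collisions first second | wc -l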


On Fri, Dec 27, 2013 at 10:55 AM, S. Dale Morrey <sdalemor...@gmail.com> wrote:

> I would love for you to tell me that, but I'm still trying to verify a
> particular implementation of the algorithm.
>
> Your CDB file idea is a good one.  I'm going to investigate it further.
>
>
>
> On Fri, Dec 27, 2013 at 10:50 AM, Steve Meyers <st...@plug.org> wrote:
>
> > On 12/27/13 10:43 AM, S. Dale Morrey wrote:
> >
> >> Yes, that is exactly what I'm doing.  I'm checking the propensity for a
> >> random string of characters to have a hash collision with an existing
> >> known set of words given an unsalted hashing algorithm.
> >>
> >
> > Can I save you the trouble by telling you how unlikely it is to happen?
> >
> > Since your 24 million existing hashes are static, I'd look into how big a
> > CDB file would be, and then use that to check for collisions.
> >
> > Steve
> >
> >
> >
> > /*
> > PLUG: http://plug.org, #utah on irc.freenode.net
> > Unsubscribe: http://plug.org/mailman/options/plug
> > Don't fear the penguin.
> > */
> >
>
