More on this: http://glinden.blogspot.com/2008/04/detecting-near-duplicates-in-big-data.html
http://www.conradweb.org/~jackg/pubs/SIGIR04_Conrad.pdf
http://code.google.com/p/simhash/

On Wed, Aug 13, 2008 at 11:40 AM, Luke Tucker <[EMAIL PROTECTED]> wrote:
> Thanks to anil for sending this along
>
> http://gatekeeper.research.compaq.com/pub/DEC/SRC/technical-notes/SRC-1997-015-html/
>
> - Luke
>
> --
> Archive: http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218642056754

--
Archive: http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218649093016
To unsubscribe send an email with subject "unsubscribe" to [EMAIL PROTECTED] Please contact [EMAIL PROTECTED] for questions.

