More on near-duplicate detection:

http://glinden.blogspot.com/2008/04/detecting-near-duplicates-in-big-data.html

http://www.conradweb.org/~jackg/pubs/SIGIR04_Conrad.pdf

http://code.google.com/p/simhash/


On Wed, Aug 13, 2008 at 11:40 AM, Luke Tucker <[EMAIL PROTECTED]> wrote:
> Thanks to anil for sending this along
>
> http://gatekeeper.research.compaq.com/pub/DEC/SRC/technical-notes/SRC-1997-015-html/
>
> - Luke
>
>
>
> --
> Archive:
> http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218642056754
> To unsubscribe send an email with subject "unsubscribe" to
> [EMAIL PROTECTED]  Please contact
> [EMAIL PROTECTED] for questions.
>
>


--
Archive: 
http://www.openplans.org/projects/melkjug/lists/melkjug-development-list/archive/2008/08/1218649093016
