Compute a hash of each page and compare the hash values.
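A minimal sketch of that idea in Python (the URLs and page contents below are made up for illustration). Exact duplicates produce identical digests, so one pass over the pages with a digest-to-URL map finds them in O(n) time; with a strong hash like SHA-256, accidental collisions are negligible. At a billion pages the map would of course have to be sharded across machines rather than held in one dict.

```python
import hashlib

def detect_duplicates(pages):
    """Group pages by a content hash; identical digests flag duplicates.

    pages: iterable of (url, content) pairs.
    Returns a list of (duplicate_url, original_url) pairs.
    """
    seen = {}        # digest -> first URL seen with that content
    duplicates = []
    for url, content in pages:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append((url, seen[digest]))
        else:
            seen[digest] = url
    return duplicates

# Illustrative data only:
pages = [
    ("http://a.example/1", "hello world"),
    ("http://b.example/2", "hello world"),
    ("http://c.example/3", "different page"),
]
print(detect_duplicates(pages))  # [('http://b.example/2', 'http://a.example/1')]
```

Note this only catches byte-identical pages; near-duplicates (boilerplate changes, ads) would need shingling or a similarity hash such as MinHash/SimHash instead.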
Thanks & regards,
Sathaiah Dontula

On Tue, May 3, 2011 at 8:59 PM, bittu <shashank7andr...@gmail.com> wrote:
> Suppose you have a billion URLs, where each is a huge page. How do you
> detect the duplicate documents? On what criteria will you detect them,
> with what algorithm and approach, and what will be the complexity of
> each approach? As this has many applications in computer science, I
> would like to have some good discussion on this topic.
>
> Let's explore all the approaches.
>
> Thanks & Regards
> Shashank
> CSE, BIT Mesra
>
> --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algogeeks@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.