1. Use the Hadoop & MapReduce framework. Obviously we need a distributed
algorithm: make one computer the master and assign jobs to all the
slave computers, which crawl the web partitioned by geographic area
(I'm thinking of it as a real-time problem). To crawl the maximum
number of pages in the least time we need the above framework, or any
other distributed framework such as Google's MapReduce or GFS. The
computers are given, so the objective is to maximize the pages crawled
while minimizing the total crawling time (see the sketch below).
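
To make that concrete, here is a minimal sketch (not a full crawler, and not
the only way to do it) of how the MapReduce part could look with Hadoop's
Java API: seed URLs are split across the slave nodes, each mapper fetches its
URLs, and a reducer totals the pages crawled. I'm using the URL's host as a
stand-in for "geographic region", and the class names (CrawlCount,
CrawlMapper, SumReducer) are just illustrative.

import java.io.IOException;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrawlCount {

    // Mapper: one input line = one seed URL; try to fetch it and
    // emit (host, 1) for every page successfully crawled.
    public static class CrawlMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String urlString = value.toString().trim();
            if (urlString.isEmpty()) {
                return;
            }
            try {
                URL url = new URL(urlString);
                // Opening the stream stands in for the real crawl step
                // (fetching, parsing, extracting new links to enqueue).
                url.openStream().close();
                context.write(new Text(url.getHost()), ONE);
            } catch (IOException e) {
                // Unreachable or malformed URL: skip it, nothing is emitted.
            }
        }
    }

    // Reducer: sum pages crawled per host (our stand-in for "region").
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "crawl count");
        job.setJarByClass(CrawlCount.class);
        job.setMapperClass(CrawlMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // seed URL list
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // crawl counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In a real crawler the mapper would also emit the links it discovers so that a
later round (or a queue on the master) can schedule them, but the skeleton of
"map = crawl, reduce = aggregate per region" stays the same.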

Algorithmically, you can think of it as a graph with 100 connected
components; we run a BFS to traverse each computer and find out how
many pages it has crawled so far (a small sketch follows below).
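
Here is a small sketch of that graph view: each node is a crawling machine,
edges connect machines that coordinate with each other, and a BFS over every
connected component totals the pages crawled so far. The adjacency list and
the per-machine counts (adj, pagesCrawled) are hypothetical inputs, just to
show the traversal.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CrawlProgress {

    // BFS from every unvisited machine; returns pages crawled per component.
    public static List<Long> pagesPerComponent(List<List<Integer>> adj,
                                               long[] pagesCrawled) {
        int n = adj.size();
        boolean[] visited = new boolean[n];
        List<Long> totals = new ArrayList<>();

        for (int start = 0; start < n; start++) {
            if (visited[start]) {
                continue;                // already counted in some component
            }
            long componentTotal = 0;
            Queue<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            visited[start] = true;
            while (!queue.isEmpty()) {
                int machine = queue.poll();
                componentTotal += pagesCrawled[machine];
                for (int neighbor : adj.get(machine)) {
                    if (!visited[neighbor]) {
                        visited[neighbor] = true;
                        queue.add(neighbor);
                    }
                }
            }
            totals.add(componentTotal);  // one total per connected component
        }
        return totals;
    }
}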

I have given a rough overview; hope it helps.


Thanks
Shashank "I Don't Do Code to Code But I Do Code to Build Product"
Computer Science & Engineering
Birla Institute of Technology, Mesra
