1. Use Hadoop and its MapReduce framework. We obviously need a distributed algorithm: make one computer the master and have it assign crawling jobs to all the slave computers, partitioning the web to crawl by geographic area (I'm thinking of this as a real-world problem). To crawl the maximum number of pages in the least time we need a framework like this, or some other distributed framework such as Google's MapReduce or GFS. The computers are given, so the goal is to maximize the crawling throughput while minimizing the crawling time. A rough sketch of the "count pages crawled per machine" step is below.
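If we do go the Hadoop route, that counting step maps naturally onto a word-count-style job. This is just a minimal, untested sketch under my own assumptions: each crawler appends log lines of the form "workerId<TAB>crawledUrl" to HDFS, and the class names (CrawlCount, CrawlMapper, CrawlReducer) are hypothetical, not anything from the original question.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CrawlCount {

  // Mapper: each input line is assumed to look like "workerId \t crawledUrl".
  // Emit (workerId, 1) for every page that worker has crawled.
  public static class CrawlMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      if (parts.length >= 2) {
        context.write(new Text(parts[0]), ONE);
      }
    }
  }

  // Reducer: sum up the pages crawled by each worker.
  public static class CrawlReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "crawl page count");
    job.setJarByClass(CrawlCount.class);
    job.setMapperClass(CrawlMapper.class);
    job.setCombinerClass(CrawlReducer.class);
    job.setReducerClass(CrawlReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would run it with something like "hadoop jar crawlcount.jar CrawlCount /crawl/logs /crawl/counts" (paths here are made up); the output is one line per worker with its page count.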
Algorithmically you can think of it as a graph that has 100 connected components in it, and we run BFS to traverse each computer and find out the number of pages it has crawled till now (a small BFS sketch is at the end of this mail). I have given some overview, hope it will help.

Thanks,
Shashank
"I Don't Do Code to Code But I Do Code to Build Product"
Computer Science & Engineering
Birla Institute of Technology, Mesra
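P.S. A rough BFS sketch in plain Java, untested; the Node type and its pagesCrawled field are hypothetical stand-ins for whatever status the master can read off each machine. You would run this once per connected component and add up the results.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CrawlStatusBfs {

  // Hypothetical view of one crawler machine: its neighbours in the
  // cluster graph and how many pages it has crawled so far.
  static class Node {
    final String id;
    final int pagesCrawled;
    final List<Node> neighbours = new ArrayList<>();

    Node(String id, int pagesCrawled) {
      this.id = id;
      this.pagesCrawled = pagesCrawled;
    }
  }

  // BFS from a start node: visits every machine reachable in its
  // connected component and totals the pages crawled so far.
  static int totalPagesCrawled(Node start) {
    int total = 0;
    Set<Node> visited = new HashSet<>();
    Deque<Node> queue = new ArrayDeque<>();
    queue.add(start);
    visited.add(start);
    while (!queue.isEmpty()) {
      Node current = queue.poll();
      total += current.pagesCrawled;
      for (Node next : current.neighbours) {
        if (visited.add(next)) {
          queue.add(next);
        }
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // Tiny example: a master connected to two workers.
    Node master = new Node("master", 0);
    Node w1 = new Node("worker-1", 120);
    Node w2 = new Node("worker-2", 95);
    master.neighbours.add(w1);
    master.neighbours.add(w2);
    System.out.println("Pages crawled so far: " + totalPagesCrawled(master)); // prints 215
  }
}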