@^ : A crawler essentially does a BFS over the web's link graph, so it's better to use a *queue* to implement it. For the rest of the storage, choose according to your needs: if you only need to store the URLs, a simple data structure such as an array will do (or a linked list, if the list is very large). But if you need to store the entire HTML, you'll have to use a DOM-style structure to store it, e.g. something like XML tags.
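A minimal Python sketch of the queue-based BFS described above. It assumes a hypothetical `get_links(url)` callback that fetches a page and returns the URLs it links to; the callback is passed in so the traversal can be run without network access. The `max_pages` limit and the toy `web` graph are illustrative assumptions, not part of the original post.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Visit pages breadth-first from `seed`; return URLs in visit order."""
    frontier = deque([seed])  # FIFO queue: oldest discovered URL is crawled first
    seen = {seed}             # every URL ever queued, so none is queued twice
    visited: List[str] = []   # plain list of crawled URLs, in crawl order

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):  # hypothetical: fetch `url`, return its out-links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph standing in for real pages (illustrative data only).
web = {
    "a": ["b", "c"],
    "b": ["d", "a"],
    "c": ["d", "e"],
    "d": [],
    "e": ["a"],
}
print(bfs_crawl("a", lambda url: web.get(url, [])))  # → ['a', 'b', 'c', 'd', 'e']
```

The `seen` set is what keeps "visit each URL once" cheap: a Python set answers membership in O(1) on average, whereas checking a plain array or linked list means scanning it.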
On 2/7/12, Durgesh Kumar <durgesh1...@gmail.com> wrote:
> You can use a dictionary or a linked list.
>
> It's better to choose a language like Python or Java.
>
> Python has modules named "urllib2" and "httplib2" which implement
> the functions for getting, posting and browsing data.
>
> INFORMAL ALGORITHM
>
> 1. Start with any arbitrary link: LINK = [new link]
> 2. a) Take the next link from LINK and get its HTML content.
>    b) Parse the required content and store it.
>    c) Add each new link found on the page to LINK if it is not already present.
> 3. Repeat step 2 for as long as you want to keep crawling.
>
> On 2/5/12, Ravi Ranjan <ravi.cool2...@gmail.com> wrote:
>> What would be the algorithm and the appropriate data structure to implement a
>> web crawler?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Algorithm Geeks" group.
>> To post to this group, send email to algogeeks@googlegroups.com.
>> To unsubscribe from this group, send email to
>> algogeeks+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/algogeeks?hl=en.
>
> --
> *Durgesh Kumar*
> Final Year, B.Tech
> Information Technology
> HALDIA INSTITUTE OF TECHNOLOGY
> HALDIA
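Steps 2a to 2c of the informal algorithm quoted above (get a page's HTML, parse it, collect the links it contains) can be sketched with Python 3's standard library. Note that `urllib2` is Python 2; in Python 3 its functionality lives in `urllib.request`, which this sketch uses instead. The names `fetch`, `LinkExtractor` and `extract_links` are hypothetical, and only `href` values of `<a>` tags are treated as links, which is a simplifying assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> str:
    """Step 2a: download a page and decode it to text (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class LinkExtractor(HTMLParser):
    """Step 2b: collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # HTMLParser already lowercases tag and attribute names
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # #fragment, so "b.html#top" and "b.html" are the same link.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the links on a page, in document order (step 2c's input)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative input; a real crawler would pass in fetch(url) instead.
page = '<a href="/about">About</a> <a href="b.html#top">B</a> <a name="x">no href</a>'
print(extract_links(page, "http://example.com/dir/index.html"))
# → ['http://example.com/about', 'http://example.com/dir/b.html']
```

Normalizing each link to an absolute, fragment-free URL before the "is it already in LINK?" check in step 2c avoids queueing the same page several times under different spellings.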