Hmm, I am not sure about all the requirements, but maybe snarf? It has a lot more features than wget, for example resuming downloads.
Oleg.

----- Original Message -----
From: "Tal, Shachar" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 09, 2003 3:17 PM
Subject: Open-source webcrawler required

> Hi All,
>
> I am in need of an (open-source) web crawler (a la wget), but one that
> does all of the following:
> 1. Performs breadth-first search, not depth-first search (so a stopping
> condition based on disk space gives a wide crawl rather than a deep
> crawl).
> 2. Lets me define whether to recurse into a link or not, based on
> criteria (leaving the domain or not being the most obvious, but also
> by regexping the URL, etc.).
> 3. Optimally, should allow me to provide a lambda function that returns
> a rating based on page content, so I can decide where to recurse and
> what to avoid.
>
> Anyone?
>
> I will write such a thing if none is found, but would really prefer not to.
>
> Shachar Tal
> Verint Systems
>
> This electronic message contains information from Verint Systems, which
> may be privileged and confidential. The information is intended to be
> for the use of the individual(s) or entity named above. If you are not
> the intended recipient, be aware that any disclosure, copying,
> distribution or use of the contents of this information is prohibited.
> If you have received this electronic message in error, please notify us
> by replying to this email.
>
> =================================================================
> To unsubscribe, send mail to [EMAIL PROTECTED] with
> the word "unsubscribe" in the message body, e.g., run the command
> echo unsubscribe | mail [EMAIL PROTECTED]
> =================================================================
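In case no existing tool covers all three points, the requested behaviour is small enough to sketch: a FIFO queue gives the breadth-first order, a follow-predicate decides whether a link is enqueued at all, and a rating callback decides whether a fetched page is expanded further. The sketch below is only an illustration of that structure, not a real crawler; all function names are hypothetical, and the network fetch is passed in as a parameter (here faked with an in-memory site) rather than implemented.

```python
from collections import deque
import re


def crawl_bfs(start_url, fetch, should_follow, rate_page, max_pages=100):
    """Breadth-first crawl skeleton (hypothetical API, not an existing tool).

    fetch(url)         -> page content (injected, so this sketch needs no network)
    should_follow(url) -> bool: whether a discovered link may be enqueued
    rate_page(content) -> number: pages rated <= 0 are kept but not expanded
    """
    seen = {start_url}          # URLs ever enqueued, to avoid revisits
    queue = deque([start_url])  # FIFO queue => breadth-first order
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        content = fetch(url)
        results[url] = content
        if rate_page(content) <= 0:
            continue  # low-rated page: stored, but its links are not followed
        for link in re.findall(r"http://\S+", content):
            if link not in seen and should_follow(link):
                seen.add(link)
                queue.append(link)
    return results


# Usage with a fake in-memory "site" standing in for real HTTP fetches:
site = {
    "http://a/":  "links: http://a/1 http://a/2",
    "http://a/1": "links: http://a/3",
    "http://a/2": "stop here, do not recurse",
    "http://a/3": "leaf page",
}
pages = crawl_bfs(
    "http://a/",
    fetch=site.get,
    should_follow=lambda u: u.startswith("http://a/"),   # domain criterion
    rate_page=lambda c: 0 if "stop" in c else 1,          # content rating
)
# http://a/2 is fetched but rated 0, so nothing beyond it is followed;
# http://a/3 is still reached via http://a/1 in breadth-first order.
```

A real implementation would replace `fetch` with an HTTP client and the regex with proper HTML link extraction, but the queue/predicate/rating split is the part the original message asks for.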