Hmm, I am not sure about all the requirements, but maybe snarf? It has a lot more features than wget, for example resuming downloads.
Oleg.

----- Original Message -----
From: "Tal, Shachar" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, October 09, 2003 3:17 PM
Subject: Open-source webcrawler required

> Hi All,
>
> I am in need of an (open-source) web crawler (a la wget), but one that
> does all of the following:
> 1. Performs breadth-first search, not depth-first search (so a stopping
> condition based on disk space gives a wide crawl rather than a deep
> crawl).
> 2. Lets me define whether to recurse into a link or not, based on
> criteria (leaving the domain or not being the most obvious, but also
> by regexping the URL, etc.).
> 3. Optimally, should allow me to provide a lambda function that returns
> a rating based on page content, so I can decide where to recurse and
> what to avoid.
>
> Anyone?
>
> I will write such a thing if none is found, but would really prefer not to.
>
> Shachar Tal
> Verint Systems
>
> This electronic message contains information from Verint Systems, which
> may be privileged and confidential. The information is intended to be
> for the use of the individual(s) or entity named above. If you are not
> the intended recipient, be aware that any disclosure, copying,
> distribution or use of the contents of this information is prohibited.
> If you have received this electronic message in error, please notify us
> by replying to this email.
>
> =================================================================
> To unsubscribe, send mail to [EMAIL PROTECTED] with
> the word "unsubscribe" in the message body, e.g., run the command
> echo unsubscribe | mail [EMAIL PROTECTED]
> =================================================================
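In case no existing tool covers all three points, the requested behaviour is small enough to sketch: a FIFO queue gives the breadth-first order, a follow-predicate decides whether a link is enqueued at all, and a rating callback decides whether a fetched page is expanded further. The sketch below is only an illustration of that structure, not a real crawler; all function names are hypothetical, and the network fetch is passed in as a parameter (here faked with an in-memory site) rather than implemented.

```python
from collections import deque
import re


def crawl_bfs(start_url, fetch, should_follow, rate_page, max_pages=100):
    """Breadth-first crawl skeleton (hypothetical API, not an existing tool).

    fetch(url)         -> page content (injected, so this sketch needs no network)
    should_follow(url) -> bool: whether a discovered link may be enqueued
    rate_page(content) -> number: pages rated <= 0 are kept but not expanded
    """
    seen = {start_url}          # URLs ever enqueued, to avoid revisits
    queue = deque([start_url])  # FIFO queue => breadth-first order
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        content = fetch(url)
        results[url] = content
        if rate_page(content) <= 0:
            continue  # low-rated page: stored, but its links are not followed
        for link in re.findall(r"http://\S+", content):
            if link not in seen and should_follow(link):
                seen.add(link)
                queue.append(link)
    return results


# Usage with a fake in-memory "site" standing in for real HTTP fetches:
site = {
    "http://a/":  "links: http://a/1 http://a/2",
    "http://a/1": "links: http://a/3",
    "http://a/2": "stop here, do not recurse",
    "http://a/3": "leaf page",
}
pages = crawl_bfs(
    "http://a/",
    fetch=site.get,
    should_follow=lambda u: u.startswith("http://a/"),   # domain criterion
    rate_page=lambda c: 0 if "stop" in c else 1,          # content rating
)
# http://a/2 is fetched but rated 0, so nothing beyond it is followed;
# http://a/3 is still reached via http://a/1 in breadth-first order.
```

A real implementation would replace `fetch` with an HTTP client and the regex with proper HTML link extraction, but the queue/predicate/rating split is the part the original message asks for.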