----- Original Message ----- 
From: "Berlin Brown" <[EMAIL PROTECTED]>
Sent: Sunday, June 10, 2007 11:24 AM

> Yes, but how do I crawl the actual pages the way you would in an intranet
> crawl? For example, let's say that I have 20 URLs in my set from the
> DmozParser. Let's also say that I want to go 3 levels deep into each of
> the 20 URLs. Is that possible?
>
> For example, with the intranet crawl I would start with some seed URL
> and then go to some depth. How would I do that with URLs fetched from,
> for example, dmoz?

The only way I can imagine is doing it on a host-by-host basis, restricting
the host you crawl at various stages with a URLFilter, e.g. by changing the
content of regex-urlfilter.txt.
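
As a rough sketch of that idea, the filter lines could be generated from the
seed list rather than edited by hand. The helper below is hypothetical (it is
not part of Nutch) and the example hosts are made up. It assumes the usual
regex-urlfilter.txt convention: one rule per line, "+" to accept and "-" to
reject, with the first matching rule deciding.

```python
from urllib.parse import urlparse


def host_filter_rules(seed_urls):
    """Build filter lines that accept only the hosts of the given seed URLs.

    Hypothetical helper: the "+regex" / "-regex" line format is assumed to
    match what regex-urlfilter.txt expects.
    """
    # Collect the distinct host names, skipping URLs that have none.
    hosts = sorted({urlparse(u).hostname for u in seed_urls
                    if urlparse(u).hostname})
    # Escape the dots so "www.example.org" does not match "wwwXexample.org".
    rules = ["+^https?://" + h.replace(".", r"\.") + "/" for h in hosts]
    # Final catch-all rule: reject every URL that matched no host above.
    rules.append("-.")
    return rules


if __name__ == "__main__":
    seeds = ["http://www.example.org/a", "http://news.example.com/"]
    print("\n".join(host_filter_rules(seeds)))
```

The printed lines would replace the accept rules in regex-urlfilter.txt before
each stage of the crawl, so links leading off the chosen hosts get dropped.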

Enzo


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
