----- Original Message -----
From: "Berlin Brown" <[EMAIL PROTECTED]>
Sent: Sunday, June 10, 2007 11:24 AM
> Yeah, but how do I crawl the actual pages like you would in an intranet
> crawl? For example, let's say that I have 20 URLs in my set from the
> DmozParser. Let's also say that I want to go 3 levels
> deep into the 20 URLs. Is that possible?
>
> For example, with the intranet crawl I would start with some seed URL
> and then go to some depth. How would I do that with URLs fetched from,
> for example, dmoz?

The only way I can imagine is doing it on a host-by-host basis, restricting the host you crawl at various stages with a URLFilter, e.g. by changing the content of regex-urlfilter.txt.

Enzo

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
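
As a rough illustration of the host-by-host idea, here is a minimal sketch in Python that turns a batch of seed URLs into rules for regex-urlfilter.txt. It assumes the usual format of that file: one rule per line, a leading "+" to accept or "-" to reject, "#" for comments, and the first matching rule wins. The function name host_rules and the example URLs are made up for this sketch and are not part of Nutch; hosts with an explicit port are not handled.

```python
from urllib.parse import urlsplit


def host_rules(seed_urls):
    """Build regex-urlfilter.txt lines that accept only the seed hosts.

    Hypothetical helper, not part of Nutch. Assumes one rule per line,
    '+' to accept, '-' to reject, first matching rule wins.
    """
    # Collect the distinct host names; urlsplit lowercases them.
    hosts = sorted({urlsplit(u).hostname for u in seed_urls
                    if urlsplit(u).hostname})

    lines = ["# accept only the hosts of the current batch of seed URLs"]
    for host in hosts:
        # Host names contain only letters, digits, '-' and '.', so the
        # dot is the only character that needs escaping here.
        lines.append("+^https?://" + host.replace(".", "\\.") + "/")

    lines.append("# reject everything else")
    lines.append("-.")
    return lines


if __name__ == "__main__":
    # Example seed URLs (placeholders, standing in for DmozParser output).
    seeds = ["http://www.example.org/a/b", "http://example.com/"]
    print("\n".join(host_rules(seeds)))
```

The printed lines could replace the accept rules in regex-urlfilter.txt before crawling that batch to the desired depth, and the file could then be regenerated for the next batch of hosts.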
