[Nutch-general] Nutch 0.9 recrawl

Annona Keene Tue, 24 Apr 2007 14:58:29 -0700

I've been using Nutch for a little while now, and the new release is great. I'm 
hoping someone can help me with what I'm trying to do.


One of the sites I crawl is basically an archive for a mailing list, so there's 
lots of data that never changes, and then there are new pages every day. I'm 
not entirely clear on how recrawling works. Is there a way I can use Nutch to 
crawl just those new pages and ignore all the old ones I've already crawled, 
which are pretty much static forever?

Any help would be greatly appreciated.

Thanks,
Ann




