About two months ago, John Kleven posted asking about using Nutch just to crawl.

I have essentially the same question.  One possible development tack for my 
project is to use Nutch for crawling, then use Xapian for tokenization, 
indexing, etc.  Over time we will need to spider a lot of sites, so I'm 
disinclined to use wget.

Does Nutch have an out-of-the-box capability to spider sites and write the 
output to HTML files?  If not, can someone give me a quick summary of how I 
would properly modify or subclass the Nutch code?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
