Hi Tom, Nice article!
Tom White wrote:
Hi, I've written an article about using Nutch at the intranet scale, which you may find interesting: http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html .
I found it very enlightening. In the following installments I'd personally like to learn about:
* How to keep the index up to date - Nutch makes it simple to crawl the intranet, and then you start Tomcat and you are flying, but what then? What's the best way to keep the search db fresh, i.e. revisit the existing pages and crawl new links?
* How to use the parse-ext module - to parse content on your intranet that is not supported by the existing parsers
* How to customize the web-interface
Please post any comments on the article page itself.
I have to register to do that!
