Hi,

I've installed Nutch on my machine and convinced it to crawl our intranet, i.e. the local NFS and Samba shares (via the local filesystem) and our intranet web servers, and I'm quite impressed with how well it works. One thing I'm not sure about, though, is how the index is kept up to date. Is the "nutch crawl" command only used for creating the initial index/db? What do I need to do to keep the index/db up to date?

Things work well with HTML, MS Word and PDF, but I'd like to index zip archives, tar.gz archives, RPM files and OpenOffice documents as well. Are plugins for these file types available?

Regards,

Thomas Sondergaard


_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
