Hi Bruno, I see something similar to this when crawling about 35,000 local files. The last file is fetched and all visible progress stops. After about 8 minutes it says "Fetcher: done" and continues with the following stages of the crawl. I know it's doing something in that time since the CPU usage remains high. Try letting it run overnight. Maybe it's just quiet instead of frozen.
Best,
Ian

-----Original Message-----
From: Sami Siren [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 18, 2006 7:26 AM
To: [email protected]
Subject: Re: Indexing the file system / best approach

Bruno Thiel wrote:
> All,
>
> I want to get nutch to index the file system. My first approach was to
> nfs-mount the file system and let nutch crawl through the hierarchy over
> http/Apache. This turned out to be fairly slow, ~3,000 fetches per hour.
> Next approach was to go via file:/// and to generate a file list
> to be crawled. This file list is fairly big, ~200,000 entries, and with the
> current 0.8.1 release of nutch the fetcher just freezes right at the end of
> a crawl.

What exactly happens when your fetcher freezes? 200,000 entries is not a big
list to be fetched.

--
Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
