Hi Bruno,
  I see something similar to this when crawling about 35,000 local
files.  The last file is fetched and all visible progress stops.  After
about 8 minutes it says "Fetcher: done" and continues with the following
stages of the crawl.  I know it's doing something in that time since the
CPU usage remains high.  Try letting it run overnight.  Maybe it's just
quiet instead of frozen.
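
In case it is useful for the file:/// approach, here is a minimal sketch of one way to build the file list. This is only an illustration: the function name build_seed_list is made up, it is not part of nutch, and I am assuming you want one file:// URL per regular file under a root directory.

```python
import os
from pathlib import Path


def build_seed_list(root):
    """Return a sorted list of file:// URLs, one per regular file under root.

    Hypothetical helper for illustration only; not part of nutch.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # as_uri() requires an absolute path and percent-encodes
            # characters such as spaces.
            urls.append((Path(dirpath) / name).resolve().as_uri())
    return sorted(urls)
```

Writing the returned URLs to a text file, one per line, gives a list that can be split into smaller batches if a single 200,000-entry run is hard to monitor.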

Best,
Ian

-----Original Message-----
From: Sami Siren [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 18, 2006 7:26 AM
To: [email protected]
Subject: Re: Indexing the file system / best approach

Bruno Thiel wrote:
> All,
>
> I want to get nutch to index the file system. My first approach was to
> nfs-mount the file system and let nutch crawl through the hierarchy over
> http/Apache. This turned out to be fairly slow, ~3,000 fetches per hour.
> Next approach was to go via file:/// and to generate a file list
> to be crawled. This file list is fairly big, ~200,000 entries, and with the
> current 0.8.1 release of nutch the fetcher just freezes right at the end of
> a crawl.
What exactly happens when your fetcher freezes? 200 000 entries is not a
big list to be fetched.

--
 Sami Siren


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
