The problem was that I reused part of the NutchBean code
to perform searches and kept the FetchedSegments instantiation
in the constructor. As a result, each query I performed
opened many files of this kind:

(segment)/fetcher/data
(segment)/fetcher_content/data
(segment)/fetcher_text/data
...

and never closed them.
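
For anyone hitting the same thing, here is a minimal sketch of the pattern,
not the actual Nutch code. SegmentReaders, StatsSearcher and OpenFilesDemo
are hypothetical names, the file handles are only simulated with a counter,
and the code assumes a current JDK (try-with-resources does not exist in
Java 1.4.2). It shows why constructing the searcher once per query leaks
handles, while reusing one instance and closing it does not.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reader set such as FetchedSegments; not Nutch code. */
class SegmentReaders implements Closeable {
    // One simulated handle each for fetcher, fetcher_content and fetcher_text.
    static final int FILES_PER_SEGMENT = 3;

    private final AtomicInteger openHandles; // simulated file-descriptor count
    private boolean closed = false;

    SegmentReaders(AtomicInteger openHandles) {
        this.openHandles = openHandles;
        openHandles.addAndGet(FILES_PER_SEGMENT); // "open" the data files
    }

    String search(String query) {
        if (closed) {
            throw new IllegalStateException("readers already closed");
        }
        return "results for " + query;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openHandles.addAndGet(-FILES_PER_SEGMENT); // release the handles
        }
    }
}

/** Hypothetical search helper that opens its readers in the constructor. */
class StatsSearcher implements Closeable {
    private final SegmentReaders readers;

    StatsSearcher(AtomicInteger openHandles) {
        readers = new SegmentReaders(openHandles);
    }

    String query(String q) {
        return readers.search(q);
    }

    @Override
    public void close() {
        readers.close();
    }
}

class OpenFilesDemo {
    /** Leaky pattern: a new searcher per query, never closed. Returns handles left open. */
    static int leakyRun(int queries) {
        AtomicInteger open = new AtomicInteger();
        for (int i = 0; i < queries; i++) {
            new StatsSearcher(open).query("stats " + i);
        }
        return open.get();
    }

    /** Fixed pattern: one searcher reused for every query and closed at the end. */
    static int reusingRun(int queries) {
        AtomicInteger open = new AtomicInteger();
        try (StatsSearcher searcher = new StatsSearcher(open)) {
            for (int i = 0; i < queries; i++) {
                searcher.query("stats " + i);
            }
        }
        return open.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky:   " + leakyRun(20) + " handles still open");
        System.out.println("reusing: " + reusingRun(20) + " handles still open");
    }
}
```

With 20 queries the leaky version leaves 60 simulated handles open and the
reusing version leaves 0. In the real crawler the count would grow faster,
since every iteration also adds a new segment.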
Thanks

Fabio


> -----Original Message-----
> From: Fabio Gasparetti [mailto:[EMAIL PROTECTED] 
> Sent: Monday, June 28, 2004 4:53 PM
> To: '[EMAIL PROTECTED]'
> Subject: Nutch & focused crawling
> 
> 
> I've been working for a couple of weeks on a simple focused
> crawler based on Nutch. I use the score field to assign a
> priority to each URL to be crawled, by means of a particular
> Prioritizer implementation (which could also be the current
> Nutch link analysis algorithm, of course). I basically iterate
> the basic cycle of generate segment, fetch, updatedb, but in
> place of the analyzer I call the ad hoc prioritizer. Each
> iteration corresponds to a new segment. But when I instantiate
> the MultiSearcher to run some query inside the cycle, for
> example to show some statistics, after nearly 20 iterations
> (fewer than 1000 URLs), that is, 20 Searcher calls, I get the
> "Too many open files" message. I took care to close the
> Searcher when I finished with it, and I also raised the
> maximum open-files setting, but the problem persists. Any
> suggestions? Thanks
> 
> Fabio Gasparetti
> 
> 
> 
> Nutch: 0.4
> Java: 1.4.2_01
> OS: Linux Red Hat 7.1
> RAM: 1 GB
> 
> 
> 040628 150916 10 SEVERE Exception in CrawlerStat call:java.io.FileNotFoundException: pluto/segments/20040628150731/fetcher_text/index (Too many open files)
> 040628 150916 10 indexing segment: pluto/segments/20040628150903
> java.lang.NullPointerException
>       at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:141)
>       at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:128)
>       at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
>       at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:193)
>       at net.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:49)
>       at net.nutch.indexer.IndexSegment.main(IndexSegment.java:182)
>       at com.parc.search.focusedcrawler.FocusedCrawlTool.run(FocusedCrawlTool.java:173)
>       at com.parc.search.focusedcrawler.FocusedCrawlTool.main(FocusedCrawlTool.java:388)
> 040628 150916 10 SEVERE java.lang.NullPointerException
> 
> 
> 



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers