The problem was that I used part of the NutchBean code to perform searches and I kept the FetchedSegments instantiation in the constructor. Each time I performed a query, the system opened many files of this kind:

(segment)/fetcher/data
(segment)/fetcher_content/data
(segment)/fetcher_text/data
...

and it never closed them.

Thanks
Fabio

> -----Original Message-----
> From: Fabio Gasparetti [mailto:[EMAIL PROTECTED]
> Sent: Monday, June 28, 2004 4:53 PM
> To: '[EMAIL PROTECTED]'
> Subject: Nutch & focused crawling
>
>
> I've been working for a couple of weeks on a simple focused
> crawler based on Nutch. I used the score field to assign a
> priority to each url to be crawled by means of a particular
> Prioritizer implementation, that could also be the current
> Nutch link analysis algorithm of course. I basically iterate
> the basic cycle: generate segment, fetch, updatedb, but in
> the analyzer's place I placed a call to the ad hoc
> prioritizer. Each iteration corresponds to a new segment. But
> when I need to instantiate the MultiSearcher to run some
> query in the cycle, for example to show some statistics,
> after nearly 20 iterations (fewer than 1000 urls), that is 20
> Searcher calls, I got the "Too many open files" message. I
> took care to close the Searcher when I finished with it and I
> also raised the max opened-file settings but the problem
> persists. Any suggestions?
>
> Thanks
>
> Fabio Gasparetti
>
>
>
> Nutch: 0.4
> Java: 1.4.2_01
> OS: Linux Red Hat 7.1
> 1Gbytes ram
>
>
> 040628 150916 10 SEVERE Exception in CrawlerStat call:java.io.FileNotFoundException: pluto/segments/20040628150731/fetcher_text/index (Too many open files)
> 040628 150916 10 indexing segment: pluto/segments/20040628150903
> java.lang.NullPointerException
>         at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:141)
>         at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:128)
>         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:193)
>         at net.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:49)
>         at net.nutch.indexer.IndexSegment.main(IndexSegment.java:182)
>         at com.parc.search.focusedcrawler.FocusedCrawlTool.run(FocusedCrawlTool.java:173)
>         at com.parc.search.focusedcrawler.FocusedCrawlTool.main(FocusedCrawlTool.java:388)
> 040628 150916 10 SEVERE java.lang.NullPointerException
>
>

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
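
For illustration, the leak described at the top of this message follows roughly the pattern below. This is only a sketch under assumptions: SegmentFiles is a hypothetical stand-in for an object that opens per-segment data files in its constructor (it is not the Nutch FetchedSegments class, whose API is not shown here), and the code uses modern Java syntax rather than Java 1.4. It contrasts a query that builds such an object and never closes it with one that scopes the handles to the query.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SegmentHandleDemo {

    /** Hypothetical stand-in for an object that opens per-segment data files. */
    static class SegmentFiles implements Closeable {
        static int openCount = 0; // streams opened by this class and not yet closed
        private final List<FileInputStream> streams = new ArrayList<>();

        SegmentFiles(List<File> dataFiles) throws IOException {
            for (File f : dataFiles) {
                streams.add(new FileInputStream(f)); // one file descriptor per data file
                openCount++;
            }
        }

        @Override
        public void close() throws IOException {
            for (FileInputStream in : streams) {
                in.close();
                openCount--;
            }
            streams.clear();
        }
    }

    /** Leaky pattern: every query builds a new SegmentFiles and never closes it. */
    static void leakyQuery(List<File> dataFiles) throws IOException {
        new SegmentFiles(dataFiles); // close() is never called on this instance
    }

    /** Safer pattern: the handles are scoped to the query and always closed. */
    static void closingQuery(List<File> dataFiles) throws IOException {
        try (SegmentFiles segments = new SegmentFiles(dataFiles)) {
            // ... run the query against 'segments' here ...
        }
    }

    public static void main(String[] args) throws IOException {
        List<File> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            File f = File.createTempFile("segment", ".data");
            f.deleteOnExit();
            files.add(f);
        }
        for (int i = 0; i < 20; i++) closingQuery(files);
        System.out.println("unclosed after closing queries: " + SegmentFiles.openCount);
        for (int i = 0; i < 20; i++) leakyQuery(files);
        System.out.println("unclosed after leaky queries: " + SegmentFiles.openCount);
    }
}
```

An alternative with the same effect is to create the file-holding object once and share it across queries, so the number of open files grows with the number of segments rather than with the number of queries.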
