Hi all,

We have been using Nutch to crawl individual URLs from a queue, with the crawling process scheduled by Quartz. We have started to see memory leaks around the Language Identifier plugin, which seems to be retaining NGramEntry objects for some (still) unknown reason.
We are using Spring for DI of the Nutch tools (i.e. Generator, Indexer, etc.) into our Crawl object. The Crawl object is released (crawl = null) after each crawl finishes, and all objects use Spring's prototype scope (including the Indexer), which means Spring won't retain references to them.

So, any suggestions on how to solve this? Why are these objects retained when the Crawl object is released? Was Nutch designed for such long-running, iterative processes? Could this be a Plugin Framework issue?

I am attaching a reference image generated with JRockit.

Thanks in advance,
Rodrigo
