Hi all

We have been using Nutch to crawl individual URLs from a queue, with the
crawling process scheduled by Quartz. We have started to see memory leaks
around the Language Identifier plugin, which seems to be retaining
NGramEntry instances for some (still) unknown reason.

We are using Spring for DI of the Nutch tools (i.e. Generator, Indexer, etc.)
into our Crawl object. The Crawl object is released (crawl = null) after
each crawl finishes, and all objects use Spring's prototype scope
(including the Indexer), which means Spring won't retain references to those
objects.

So, any suggestions about how to solve this? Why are these objects retained
when the Crawl object is released? Was Nutch designed for such long-running
iterative processes? Could this be a Plugin Framework issue?
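
To illustrate the kind of retention I have in mind, here is a minimal sketch
of one pattern that would produce this symptom: entries accumulated in a
static field survive no matter how often the owning objects are released.
This is only an assumption about a possible cause, not a confirmed diagnosis
of the Language Identifier plugin. All class and method names below
(HypotheticalPlugin, HypotheticalCrawl, Entry, retainedAfter) are made up
for the example and are not Nutch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Nutch classes.
public class StaticRetentionSketch {

    // Placeholder for an n-gram entry held by a language profile.
    static final class Entry {
        final String gram;
        Entry(String gram) { this.gram = gram; }
    }

    // Placeholder plugin that accumulates entries in a static field.
    static final class HypotheticalPlugin {
        // Static state lives as long as the class is loaded,
        // independent of any plugin or crawl instance.
        static final List<Entry> CACHE = new ArrayList<>();

        // Adds one entry per trigram of the text to the static cache.
        void analyze(String text) {
            for (int i = 0; i + 3 <= text.length(); i++) {
                CACHE.add(new Entry(text.substring(i, i + 3)));
            }
        }
    }

    // Placeholder crawl that creates a fresh plugin instance on every run.
    static final class HypotheticalCrawl {
        void run(String text) {
            new HypotheticalPlugin().analyze(text);
        }
    }

    // Runs the given number of crawls, releasing each one afterwards,
    // and returns how many entries are still reachable via the static cache.
    static int retainedAfter(int crawls) {
        HypotheticalPlugin.CACHE.clear();
        for (int i = 0; i < crawls; i++) {
            HypotheticalCrawl crawl = new HypotheticalCrawl();
            crawl.run("nutch");
            crawl = null; // releasing the crawl does not touch the static cache
        }
        return HypotheticalPlugin.CACHE.size();
    }

    public static void main(String[] args) {
        System.out.println("after 1 crawl:   " + retainedAfter(1));
        System.out.println("after 10 crawls: " + retainedAfter(10));
    }
}
```

In this sketch the retained count grows linearly with the number of crawls
even though every crawl object is dropped, which is the shape of growth we
are seeing. If something similar happens inside the plugin or the plugin
repository, prototype scope in Spring would not help.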

I am attaching a reference image generated with JRockit. Thanks in
advance...

Rodrigo
