I'm having big trouble with Nutch 0.9 that I didn't have with 0.8. It seems
that the plugin repository keeps re-initializing itself until I get
an out-of-memory exception. I've been looking at the code... the plugin
repository maintains a map from Configuration to plugin repositories, but
the Configuration object does not have an equals or hashCode method...
wouldn't it be nice to add such methods (comparing property values)?
Wouldn't that help prevent initializing many plugin repositories? What
could be the cause of my problem? (Aaah.. so many questions... =) )
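
To make the equals/hashCode idea concrete, here is a minimal sketch. It does not use the real Hadoop Configuration or Nutch PluginRepository classes: ConfKey, Repo and get() are hypothetical stand-ins, and a plain Map<String, String> plays the role of the configuration's properties. It only illustrates that keying the cache on property values lets two distinct but equal configurations share one repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ConfKeySketch {

    /** Hypothetical cache key: a snapshot of property values, compared by content. */
    static final class ConfKey {
        private final Map<String, String> props;

        ConfKey(Map<String, String> source) {
            // Copy so later changes to the source map cannot corrupt the key.
            this.props = new TreeMap<>(source);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ConfKey && props.equals(((ConfKey) o).props);
        }

        @Override
        public int hashCode() {
            return props.hashCode();
        }
    }

    /** Hypothetical stand-in for an expensive-to-build plugin repository. */
    static final class Repo {
    }

    private static final Map<ConfKey, Repo> CACHE = new HashMap<>();

    /** Returns the cached repository for these property values, creating it on first use. */
    static synchronized Repo get(Map<String, String> conf) {
        return CACHE.computeIfAbsent(new ConfKey(conf), k -> new Repo());
    }

    public static void main(String[] args) {
        // Two separate objects holding the same (illustrative) property value.
        Map<String, String> a = new HashMap<>();
        a.put("plugin.includes", "protocol-http");
        Map<String, String> b = new HashMap<>();
        b.put("plugin.includes", "protocol-http");

        // Same property values, so both lookups hit the same cache entry.
        System.out.println(get(a) == get(b));
    }
}
```

With identity-based keys (what you get when equals and hashCode are not overridden), the two lookups above would each create their own repository.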

Which job causes the problem? Perhaps we can find out what keeps
creating a conf object over and over.

Also, I have tried what you have suggested (better caching for plugin
repository) and it really seems to make a difference. Can you try with
this patch(*) to see if it solves your problem?

(*) http://www.ceng.metu.edu.tr/~e1345172/plugin_repository_cache.patch

I'm running it. So far it's working ok, and I haven't seen all those plugin loadings...

I've modified your patch though to define CACHE like this:

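 // Access-ordered (third constructor argument is true), so the eldest
 // entry is the least recently used one, not just the first inserted.
 private static final Map<PluginProperty, PluginRepository> CACHE =
     new LinkedHashMap<PluginProperty, PluginRepository>(16, 0.75f, true) {
   @Override
   protected boolean removeEldestEntry(
       Map.Entry<PluginProperty, PluginRepository> eldest) {
     return size() > 10;  // evict once the cache holds more than 10 entries
   }
 };

...which means an LRU cache with a maximum size of 10.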
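
One detail worth checking with this kind of cache: a LinkedHashMap evicts in least-recently-used order only when it is constructed with accessOrder set to true; the default constructor keeps insertion order, so the eldest entry is simply the first one put in. The sketch below shows the access-ordered variant in isolation. LruSketch and newLru are hypothetical names used only for illustration, and the capacity is a parameter rather than the fixed 10 used in the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruSketch {

    /** Builds a size-bounded map that evicts the least recently accessed entry. */
    static <K, V> Map<K, V> newLru(final int maxEntries) {
        // Third argument true = access order: get() and put() move an entry to the end.
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newLru(3);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);

        cache.get("a");     // "a" becomes the most recently used entry
        cache.put("d", 4);  // exceeds the limit, so the least recently used ("b") goes

        System.out.println(cache.keySet()); // prints [c, a, d]
    }
}
```

Note that in access-ordered mode even get() modifies the map's internal ordering, so a shared static cache like this needs external synchronization if several threads can reach it.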
