I'm having big trouble with Nutch 0.9 that I didn't have with 0.8. It seems
that the plugin repository initializes itself over and over until I get
an out-of-memory exception. I've been looking at the code: the plugin
repository maintains a map from Configuration to plugin repositories, but
the Configuration object does not have an equals or hashCode method, so
two configurations with identical properties count as different keys.
Wouldn't it be nice to add such methods (comparing property values)?
Wouldn't that help prevent initializing many plugin repositories? What
could be the cause of my problem? (Aaah.. so many questions... =) )
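
To illustrate the equals/hashCode point, here is a minimal sketch. It is not the Nutch code: Conf is a hypothetical stand-in for a configuration class that inherits identity-based equals/hashCode from Object, ConfKey is a hypothetical value-based key built from the property values, and the "plugin.includes" value is only an example. A map keyed by Conf grows by one entry per new object, while a map keyed by ConfKey reuses the entry when the properties match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class ConfKeyDemo {

    /** Hypothetical stand-in for a config class that does NOT override equals/hashCode. */
    static final class Conf {
        final Properties props = new Properties();

        Conf set(String name, String value) {
            props.setProperty(name, value);
            return this;
        }
    }

    /** Hypothetical cache key: a snapshot of the property values, compared by value. */
    static final class ConfKey {
        private final Map<String, String> values = new HashMap<String, String>();

        ConfKey(Conf conf) {
            for (String name : conf.props.stringPropertyNames()) {
                values.put(name, conf.props.getProperty(name));
            }
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ConfKey && values.equals(((ConfKey) other).values);
        }

        @Override
        public int hashCode() {
            return values.hashCode();
        }
    }

    /** Builds a fresh but identical configuration, as a job might do on every call. */
    static Conf newConf() {
        return new Conf().set("plugin.includes", "protocol-http|parse-html");
    }

    /** Cache keyed by the Conf object itself: every new object is a cache miss. */
    static int repositoriesKeyedByIdentity(int lookups) {
        Map<Conf, Object> cache = new HashMap<Conf, Object>();
        for (int i = 0; i < lookups; i++) {
            Conf conf = newConf();
            if (!cache.containsKey(conf)) {
                cache.put(conf, new Object()); // stands in for building a repository
            }
        }
        return cache.size();
    }

    /** Cache keyed by property values: identical configurations share one entry. */
    static int repositoriesKeyedByValue(int lookups) {
        Map<ConfKey, Object> cache = new HashMap<ConfKey, Object>();
        for (int i = 0; i < lookups; i++) {
            ConfKey key = new ConfKey(newConf());
            if (!cache.containsKey(key)) {
                cache.put(key, new Object());
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("keyed by identity: " + repositoriesKeyedByIdentity(5));
        System.out.println("keyed by value:    " + repositoriesKeyedByValue(5));
    }
}
```

With five lookups, the identity-keyed map ends up with five entries and the value-keyed map with one. That is the kind of growth that could lead to repeated plugin loading if something keeps creating new conf objects.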

Which job causes the problem? Perhaps we can find out what keeps
creating a conf object over and over.

Also, I have tried what you suggested (better caching for the plugin
repository) and it really seems to make a difference. Can you try with
this patch(*) to see if it solves your problem?
(*) http://www.ceng.metu.edu.tr/~e1345172/plugin_repository_cache.patch

I'm running it. So far it's working OK, and I haven't seen all those
plugin loadings...

I've modified your patch, though, to define CACHE like this:

    private static final Map<PluginProperty, PluginRepository> CACHE =
        // accessOrder = true, so the eldest entry is the least recently used
        new LinkedHashMap<PluginProperty, PluginRepository>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(
              Map.Entry<PluginProperty, PluginRepository> eldest) {
            return size() > 10;
          }
        };

...which means an LRU cache that holds at most 10 entries.
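
One detail worth checking with a LinkedHashMap-based cache: the no-argument constructor keeps insertion order, so removeEldestEntry evicts the oldest inserted entry (FIFO). LRU eviction needs the three-argument constructor with accessOrder set to true. The sketch below shows the difference with a capacity of 2. LruDemo, boundedMap and survivesAfterRead are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruDemo {

    /** Bounded map; accessOrder=true gives LRU eviction, false gives FIFO eviction. */
    static <K, V> Map<K, V> boundedMap(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; returning true drops the eldest entry.
                return size() > maxSize;
            }
        };
    }

    /** Inserts a and b, reads a, inserts c, then reports whether a is still cached. */
    static boolean survivesAfterRead(boolean accessOrder) {
        Map<String, String> cache = boundedMap(2, accessOrder);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // in access order, this makes "b" the eldest entry
        cache.put("c", "3"); // exceeds the capacity, so one entry is evicted
        return cache.containsKey("a");
    }

    public static void main(String[] args) {
        System.out.println("access order keeps a:    " + survivesAfterRead(true));
        System.out.println("insertion order keeps a: " + survivesAfterRead(false));
    }
}
```

Note that LinkedHashMap is not synchronized, and in access-order mode even get() reorders the entries. If the cache can be reached from several threads, wrapping it with Collections.synchronizedMap is one option.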