I'll have to try this at some point. I have already 'forked' the code, but I would like to get back on track too, so I guess I will post something someday. The plugin loading is now the least of my worries; again, the parsing is what is killing me. I don't use Nutch in the 'out-of-the-box' fashion: my app runs in a container that crawls whenever it receives a message telling it to crawl.
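
For what it's worth, here is a rough sketch of the caching problem described in the quoted thread below: a map keyed by a configuration object that has no equals()/hashCode() misses on every new instance, while a key built from the property values hits. This is not Nutch code. Conf, Repo and RepositoryCacheSketch are hypothetical stand-ins, and I am only assuming that comparing property values is an acceptable notion of "same configuration". It also ignores thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a configuration class that does not
// override equals()/hashCode().
final class Conf {
    private final Map<String, String> props = new TreeMap<>();

    Conf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    // Copy of the properties. TreeMap compares by content, so two
    // snapshots with the same entries are equal and hash alike.
    Map<String, String> snapshot() {
        return new TreeMap<>(props);
    }
}

// Hypothetical stand-in for something expensive to build.
final class Repo {
}

final class RepositoryCacheSketch {
    // Keyed by the Conf object itself: lookups fall back to object
    // identity, so every new Conf instance creates a new Repo.
    private static final Map<Conf, Repo> byIdentity = new HashMap<>();

    // Keyed by a snapshot of the property values: equal settings
    // share one Repo.
    private static final Map<Map<String, String>, Repo> byValue = new HashMap<>();

    static Repo getByIdentity(Conf conf) {
        return byIdentity.computeIfAbsent(conf, c -> new Repo());
    }

    static Repo getByValue(Conf conf) {
        return byValue.computeIfAbsent(conf.snapshot(), k -> new Repo());
    }

    public static void main(String[] args) {
        Conf a = new Conf().set("plugin.folders", "plugins");
        Conf b = new Conf().set("plugin.folders", "plugins");

        // Two separate Repo objects for identical settings.
        System.out.println(getByIdentity(a) == getByIdentity(b)); // false
        // One shared Repo for identical settings.
        System.out.println(getByValue(a) == getByValue(b)); // true
    }
}
```

If a fresh configuration object is created for every fetch or parse, the first map grows without bound, which would match the symptom of the repository initializing itself over and over.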

On 5/29/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> On 5/29/07, Briggs <[EMAIL PROTECTED]> wrote:
> > I have also noticed this. The code explicitly loads an instance of the
> > plugins for every fetch (well, or parse etc., depending on what you
> > are doing). This causes OutOfMemoryErrors. So, if you dump the heap,
> > you can see the filter classes get loaded and then never get unloaded
> > (they are loaded within their own classloader). So, you'll see the
> > same class loaded thousands of times, which is bad.
> >
> > So, in my case, I had to change the way the plugins are loaded.
> > Basically, I changed all the main plugin loaders (like
> > URLFilters.java, IndexFilters.java) to be singletons with a single
> > 'getInstance()' method on each. I don't need special configs for
> > filters so I can deal with singletons.
> >
> > You'll find the heart of the problem somewhere in the extension point
> > class(es). It calls newInstance() an awful lot. But, the classloader
> > (one per plugin) never gets destroyed, or something so.... this can be
> > nasty.
> >
> > I'm still dealing with my OutOfMemory errors on parsing, yuck.
>
> Well then can you test the patch too? Nicolas's idea seems to be the
> right one. After this patch, I think plugin loaders will see the same
> PluginRepository instance.
>
> > On 5/29/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On 5/28/07, Nicolás Lichtmaier <[EMAIL PROTECTED]> wrote:
> > > > I'm having big troubles with nutch 0.9 that I hadn't with 0.8. It seems
> > > > that the plugin repository initializes itself all the time until I get
> > > > an out of memory exception. I've been seeing the code... the plugin
> > > > repository maintains a map from Configuration to plugin repositories, but
> > > > the Configuration object does not have an equals or hashCode method...
> > > > wouldn't it be nice to add such a method (comparing property values)?
> > > > Wouldn't that help prevent initializing many plugin repositories? What
> > > > could be the cause of my problem? (Aaah.. so many questions... =) )
> > >
> > > Which job causes the problem? Perhaps, we can find out what keeps
> > > creating a conf object over and over.
> > >
> > > Also, I have tried what you have suggested (better caching for plugin
> > > repository) and it really seems to make a difference. Can you try with
> > > this patch(*) to see if it solves your problem?
> > >
> > > (*) http://www.ceng.metu.edu.tr/~e1345172/plugin_repository_cache.patch
> > >
> > > > Bye!
> > >
> > > --
> > > Doğacan Güney
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
>
> --
> Doğacan Güney

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers