I'll have to get around to trying the patch at some point. I have
already 'forked' the code, but I would like to get back on track too,
so I guess I will post something someday. The plugin part is now the
least of my worries; again, the parsing is what is killing me now. I
don't use Nutch in the 'out-of-the-box' fashion: my app runs in a
container that crawls when it receives messages telling it to crawl.

On 5/29/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> On 5/29/07, Briggs <[EMAIL PROTECTED]> wrote:
> > I have also noticed this. The code explicitly loads an instance of the
> > plugins for every fetch (well, or parse etc., depending on what you
> > are doing). This causes OutOfMemoryErrors. So, if you dump the heap,
> > you can see the filter classes get loaded and then never get unloaded
> > (they are loaded within their own classloader). So, you'll see the
> > same class loaded thousands of times, which is bad.
> >
> > So, in my case, I had to change the way the plugins are loaded.
> > Basically, I changed all the main plugin loaders (like
> > URLFilters.java, IndexFilters.java) to be singletons with a single
> > 'getInstance()' method on each. I don't need special configs for
> > filters so I can deal with singletons.
> >
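
A minimal sketch of the singleton approach described above. The names
UrlFilter and SingletonUrlFilters are hypothetical stand-ins for
illustration and are not the actual Nutch URLFilters code. The point is
only that the expensive loading step runs once per JVM, however many
times getInstance() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a plugin-provided filter; not a Nutch interface.
interface UrlFilter {
    String filter(String url); // returns null to reject the URL
}

// Hypothetical singleton loader: filters are instantiated once per JVM.
final class SingletonUrlFilters {
    // Counts how often the (expensive) loading step actually runs.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    private final List<UrlFilter> filters = new ArrayList<>();

    private SingletonUrlFilters() {
        LOAD_COUNT.incrementAndGet();
        // A real loader would create the plugin instances here.
        filters.add(url -> url.startsWith("http") ? url : null);
    }

    // Initialization-on-demand holder: lazy and thread-safe without locking.
    private static final class Holder {
        static final SingletonUrlFilters INSTANCE = new SingletonUrlFilters();
    }

    static SingletonUrlFilters getInstance() {
        return Holder.INSTANCE;
    }

    // Runs the URL through every filter; stops at the first rejection.
    String filter(String url) {
        String current = url;
        for (UrlFilter f : filters) {
            if (current == null) {
                return null;
            }
            current = f.filter(current);
        }
        return current;
    }
}
```

The trade-off is the one already mentioned: a singleton ignores any
per-configuration differences, so it only fits when all callers can
share the same filter setup.
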
> > You'll find the heart of the problem somewhere in the extension point
> > class(es).  It calls newInstance() an awful lot. But the classloader
> > (one per plugin) never gets destroyed, or something like that, so this
> > can be nasty.
> >
> > I'm still dealing with my OutOfMemory errors on parsing, yuck.
>
> Well then can you test the patch too? Nicolas's idea seems to be the
> right one. After this patch, I think plugin loaders will see the same
> PluginRepository instance.
>
> >
> > On 5/29/07, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On 5/28/07, Nicolás Lichtmaier <[EMAIL PROTECTED]> wrote:
> > > > I'm having big troubles with Nutch 0.9 that I didn't have with 0.8. It seems
> > > > that the plugin repository initializes itself all the time until I get
> > > > an out of memory exception. I've been looking at the code... the plugin
> > > > repository maintains a map from Configuration to plugin repositories, but
> > > > the Configuration object does not have an equals or hashCode method...
> > > > wouldn't it be nice to add such a method (comparing property values)?
> > > > Wouldn't that help prevent initializing many plugin repositories? What
> > > > could be the cause of my problem? (Aaah.. so many questions... =) )
> > >
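
To illustrate the caching question raised above, here is a rough sketch
of why a map keyed by an object without equals/hashCode misses on every
new instance, and how keying by property values avoids that. Conf,
Repository and RepositoryCache are hypothetical stand-ins, not the Nutch
or Hadoop classes, and this is not necessarily what the linked patch
does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical configuration object that, like the one described above,
// does not override equals() or hashCode().
final class Conf {
    private final Map<String, String> props = new TreeMap<>();

    Conf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    // Read-only copy of the property values, usable as a map key.
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new TreeMap<>(props));
    }
}

// Hypothetical stand-in for an expensive-to-build plugin repository.
final class Repository {
    static final AtomicInteger CREATED = new AtomicInteger();

    Repository() {
        CREATED.incrementAndGet();
    }
}

// Not thread-safe; kept minimal for illustration.
final class RepositoryCache {
    // Keyed by identity: a new Conf always misses, even with equal properties.
    private final Map<Conf, Repository> byIdentity = new HashMap<>();
    // Keyed by property values: equal settings share one repository.
    private final Map<Map<String, String>, Repository> byValue = new HashMap<>();

    Repository getByIdentity(Conf conf) {
        return byIdentity.computeIfAbsent(conf, c -> new Repository());
    }

    Repository getByValue(Conf conf) {
        return byValue.computeIfAbsent(conf.snapshot(), k -> new Repository());
    }
}
```

With two Conf objects holding the same properties, getByIdentity builds
two repositories while getByValue builds one and reuses it. The cost is
copying and hashing the properties on each lookup.
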
> > > Which job causes the problem? Perhaps, we can find out what keeps
> > > creating a conf object over and over.
> > >
> > > Also, I have tried what you have suggested (better caching for plugin
> > > repository) and it really seems to make a difference. Can you try with
> > > this patch(*) to see if it solves your problem?
> > >
> > > (*) http://www.ceng.metu.edu.tr/~e1345172/plugin_repository_cache.patch
> > >
> > > >
> > > > Bye!
> > > >
> > >
> > >
> > > --
> > > Doğacan Güney
> > >
> >
> >
> > --
> > "Conscious decisions by conscious minds are what make reality real"
> >
>
>
> --
> Doğacan Güney
>


-- 
"Conscious decisions by conscious minds are what make reality real"
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
