Re: Possible memory leak?
You do not need to implement any special interface; any object will do.

--
Sami Siren

Enrico Triolo wrote:
> I'm trying to fix this bug, so I looked at some source code to see how
> other objects are cached in the configuration. I see for example in
> CommonGrams.java that a Hashtable is put into the configuration using
> the setObject() method. Could I use the same method? Can I put
> arbitrary objects in the configuration, or must they implement/extend
> some interface/class (maybe Serializable)?
>
> Enrico
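The get-or-create pattern referred to here (CommonGrams caching a plain object in the per-job configuration) can be sketched as below. This is a minimal stand-in, not the real Nutch/Hadoop `Configuration` class: the `Conf` class and `ExpensiveIdentifier` are hypothetical names that only mirror the `setObject()`/`getObject()` API discussed in the thread. The point Sami makes is visible in the code: the cached value is an ordinary object, with no marker interface required, because it never leaves the JVM.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the per-job configuration's object cache
// (hypothetical; mirrors the setObject()/getObject() API named in the thread).
class Conf {
    private final Map<String, Object> objects = new HashMap<String, Object>();

    public void setObject(String name, Object value) {
        objects.put(name, value);
    }

    public Object getObject(String name) {
        return objects.get(name);
    }
}

// Stands in for a heavyweight object such as a language identifier
// with its n-gram profiles loaded.
class ExpensiveIdentifier {
}

public class ConfCacheDemo {
    private static final String KEY = ExpensiveIdentifier.class.getName();

    // Get-or-create: build the object once per configuration, then reuse it.
    public static ExpensiveIdentifier getIdentifier(Conf conf) {
        ExpensiveIdentifier id = (ExpensiveIdentifier) conf.getObject(KEY);
        if (id == null) {
            id = new ExpensiveIdentifier();
            conf.setObject(KEY, id);
        }
        return id;
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        ExpensiveIdentifier a = getIdentifier(conf);
        ExpensiveIdentifier b = getIdentifier(conf);
        System.out.println(a == b); // prints "true": one instance per conf
    }
}
```

A fresh `Conf` naturally gets its own instance, which is the desired per-job (rather than global) scoping.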
Re: Possible memory leak?
I'm trying to fix this bug, so I looked at some source code to see how
other objects are cached in the configuration. I see for example in
CommonGrams.java that a Hashtable is put into the configuration using
the setObject() method. Could I use the same method? Can I put arbitrary
objects in the configuration, or must they implement/extend some
interface/class (maybe Serializable)?

Enrico

On 6/28/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:
> Sure!
>
> On 6/28/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> > It seems to be a side effect of NUTCH-169 (remove static NutchConf).
> > Prior to this, the language identifier was a singleton. I think we
> > should cache its instance in the conf as we do for many other
> > objects in Nutch. Enrico, could you please create a JIRA issue.
> >
> > Thanks
> >
> > Jérôme
Re: Possible memory leak?
Sure!

On 6/28/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> It seems to be a side effect of NUTCH-169 (remove static NutchConf).
> Prior to this, the language identifier was a singleton. I think we
> should cache its instance in the conf as we do for many other objects
> in Nutch. Enrico, could you please create a JIRA issue.
>
> Thanks
>
> Jérôme
Re: Possible memory leak?
It seems to be a side effect of NUTCH-169 (remove static NutchConf).
Prior to this, the language identifier was a singleton. I think we
should cache its instance in the conf as we do for many other objects
in Nutch. Enrico, could you please create a JIRA issue.

Thanks

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
Re: Possible memory leak?
Enrico Triolo wrote:
> Using a profiler (specifically, the NetBeans profiler) I found out
> that for each submitted url a new LanguageIdentifier instance is
> created, and never released. With the memory inspector tool I can see
> as many instances of LanguageIdentifier and NGramProfile$NGramEntry as
> the number of fetched pages, each of them occupying about 180kb.
> Forcing garbage collection doesn't release much memory.

Yes, this looks like a bug. A single instance of LanguageIdentifier per
task should be cached in the job "context" (i.e. the Configuration
instance), to avoid too many instantiations.

> Since I was still having some strange results with the profiler, I
> added a println message in the getInstance method, to effectively
> monitor singleton creation. It turns out that the singleton is
> re-instantiated each time! I can't really understand why this is
> happening; maybe it's something related to Hadoop internals?

I remember a similar situation I had, where instance variables were not
initialized after the object was created with Class.newInstance(). VM
bug? Not sure... I didn't track it down that time; I simply moved the
variable initialization to setConf(), which solved my problem.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com  Contact: info at sigram dot com
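The workaround Andrzej describes (do no heavy work in the constructor of a reflectively instantiated plugin; build or fetch the shared object in setConf(), keyed by configuration) can be sketched as follows. All names here are hypothetical: `SetConfDemo`, `Filter`, and the string-keyed cache only illustrate the shape of the fix, with a plain `Object` standing in for the ~180 KB LanguageIdentifier.

```java
import java.util.HashMap;
import java.util.Map;

public class SetConfDemo {

    // Stand-in for the per-configuration object cache (hypothetical).
    private static final Map<String, Object> CACHE = new HashMap<String, Object>();

    public static class Filter {
        // Deliberately left null by the no-arg constructor: nothing heavy
        // happens during reflective instantiation.
        private Object identifier;

        // All initialization lives here, as in Andrzej's fix.
        public void setConf(String confId) {
            synchronized (CACHE) {
                Object cached = CACHE.get(confId);
                if (cached == null) {
                    // imagine: new LanguageIdentifier(conf), profiles and all
                    cached = new Object();
                    CACHE.put(confId, cached);
                }
                this.identifier = cached;
            }
        }

        public Object getIdentifier() {
            return identifier;
        }
    }

    public static void main(String[] args) throws Exception {
        // Plugins are created reflectively, roughly like this:
        Filter f1 = Filter.class.newInstance();
        Filter f2 = Filter.class.newInstance();
        f1.setConf("job-1");
        f2.setConf("job-1");
        // Two filter objects, but one shared identifier per configuration.
        System.out.println(f1.getIdentifier() == f2.getIdentifier()); // prints "true"
    }
}
```

Because the cache is keyed by configuration rather than held in a static singleton field, it keeps working even when the framework creates many filter instances per run.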
Possible memory leak?
Hi all,
in my application I often need to perform the inject -> generate -> ...
-> index loop multiple times, since users can 'suggest' new web pages to
be crawled and indexed. I also need to enable the language identifier
plugin.

Everything seems to work correctly, but after some time I get an
OutOfMemoryException. Actually the time isn't important, since I noticed
that the problem arises when the user submits many urls (~100). As I
said, for each submitted url a new loop is performed (similar to the one
in the Crawl.main method).

Using a profiler (specifically, the NetBeans profiler) I found out that
for each submitted url a new LanguageIdentifier instance is created, and
never released. With the memory inspector tool I can see as many
instances of LanguageIdentifier and NGramProfile$NGramEntry as the
number of fetched pages, each of them occupying about 180kb. Forcing
garbage collection doesn't release much memory.

LanguageIdentifier has a static class variable 'identifier' that is
never used; reading through the code it seems that the original idea was
to implement a singleton pattern. So, to limit memory usage, I
implemented a static getInstance method and modified the
LanguageIndexingFilter class, making it use the singleton.

Since I was still having some strange results with the profiler, I added
a println message in the getInstance method, to effectively monitor
singleton creation. It turns out that the singleton is re-instantiated
each time! I can't really understand why this is happening; maybe it's
something related to Hadoop internals?

Cheers,
Enrico
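The lazy singleton Enrico describes adding (a static `identifier` field plus a `getInstance()` method) would look roughly like this sketch. The class name and the simulated ~180 KB payload are placeholders, not the actual Nutch code; `getInstance()` is synchronized so concurrent indexing threads cannot race to create two copies.

```java
public class LanguageIdentifierSketch {
    // The lazily created shared instance (the 'identifier' field mentioned
    // in the message, here actually put to use).
    private static LanguageIdentifierSketch identifier;

    // Simulates the ~180 KB of n-gram profiles each instance holds.
    private final byte[] ngramProfiles = new byte[180 * 1024];

    private LanguageIdentifierSketch() {
        // private: callers must go through getInstance()
    }

    // Classic lazy singleton accessor, synchronized for thread safety.
    public static synchronized LanguageIdentifierSketch getInstance() {
        if (identifier == null) {
            identifier = new LanguageIdentifierSketch();
        }
        return identifier;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // prints "true"
    }
}
```

Note that within a single JVM and classloader this does guarantee one instance; the re-instantiation Enrico then observes, and the replies above discuss, is consistent with each task seeing its own class (and therefore its own static field), which is why the thread converges on caching the instance in the Configuration instead.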