Re: Possible memory leak?

2006-07-13 Thread Sami Siren

You do not need to implement any special interface; any object will do.
--
Sami Siren

Enrico Triolo wrote:


I'm trying to fix this bug, so I looked at some source code to see how
other objects are cached in the configuration.
I see, for example, in CommonGrams.java that a Hashtable is put into
the configuration using the setObject() method. Could I use the same
method? Can I put arbitrary objects in the configuration, or must they
implement/extend some interface/class (maybe Serializable)?

Enrico

On 6/28/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:


Sure!

On 6/28/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> It seems to be a side effect of NUTCH-169 (remove static NutchConf).
> Prior to this, the language identifier was a singleton.
> I think we should cache its instance in the conf, as we do for many
> other objects in Nutch.
> Enrico, could you please create a JIRA issue.
>
> Thanks
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>
>







Re: Possible memory leak?

2006-07-13 Thread Enrico Triolo

I'm trying to fix this bug, so I looked at some source code to see how
other objects are cached in the configuration.
I see, for example, in CommonGrams.java that a Hashtable is put into
the configuration using the setObject() method. Could I use the same
method? Can I put arbitrary objects in the configuration, or must they
implement/extend some interface/class (maybe Serializable)?
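A minimal, self-contained sketch of the pattern CommonGrams uses (the class names here are illustrative stand-ins, not the actual Nutch/Hadoop API): the configuration's object cache is simply an in-memory map, so any object can be stored under a key without implementing a special interface.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Illustrative stand-in for the per-job Configuration object cache.
// The old setObject()/getObject() calls behaved like this: a plain
// in-memory map, so stored values need not be Serializable.
class ConfObjectCache {
    private final Map<String, Object> objects = new HashMap<String, Object>();

    public void setObject(String name, Object value) {
        objects.put(name, value);
    }

    public Object getObject(String name) {
        return objects.get(name);
    }
}

public class SetObjectDemo {
    public static void main(String[] args) {
        ConfObjectCache conf = new ConfObjectCache();
        // Any object works -- here a Hashtable, as CommonGrams does.
        Hashtable<String, Integer> grams = new Hashtable<String, Integer>();
        grams.put("the", 1);
        conf.setObject("common.grams", grams);

        @SuppressWarnings("unchecked")
        Hashtable<String, Integer> cached =
            (Hashtable<String, Integer>) conf.getObject("common.grams");
        System.out.println(cached.get("the")); // prints 1
    }
}
```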

Enrico

On 6/28/06, Enrico Triolo <[EMAIL PROTECTED]> wrote:

Sure!

On 6/28/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:
> It seems to be a side effect of NUTCH-169 (remove static NutchConf).
> Prior to this, the language identifier was a singleton.
> I think we should cache its instance in the conf, as we do for many
> other objects in Nutch.
> Enrico, could you please create a JIRA issue.
>
> Thanks
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>
>



Re: Possible memory leak?

2006-06-28 Thread Enrico Triolo

Sure!

On 6/28/06, Jérôme Charron <[EMAIL PROTECTED]> wrote:

It seems to be a side effect of NUTCH-169 (remove static NutchConf).
Prior to this, the language identifier was a singleton.
I think we should cache its instance in the conf, as we do for many
other objects in Nutch.
Enrico, could you please create a JIRA issue.

Thanks

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/




Re: Possible memory leak?

2006-06-28 Thread Jérôme Charron

It seems to be a side effect of NUTCH-169 (remove static NutchConf).
Prior to this, the language identifier was a singleton.
I think we should cache its instance in the conf, as we do for many
other objects in Nutch.
Enrico, could you please create a JIRA issue.

Thanks

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: Possible memory leak?

2006-06-28 Thread Andrzej Bialecki

Enrico Triolo wrote:

Using a profiler (specifically, the NetBeans Profiler) I found out that
for each submitted URL a new LanguageIdentifier instance is created
and never released. With the memory inspector tool I can see as many
instances of LanguageIdentifier and NGramProfile$NGramEntry as the
number of fetched pages, each of them occupying about 180 KB. Forcing
garbage collection doesn't release much memory.


Yes, this looks like a bug. A single instance of LanguageIdentifier per
task should be cached in the job "context" (i.e., the Configuration
instance) to avoid repeated instantiation.
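A sketch of the get-or-create caching suggested here (identifiers are illustrative stand-ins, not the actual Nutch classes): look the instance up in the per-job object cache and construct it only on a miss, so each context builds the expensive object at most once.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the job Configuration's object cache (names illustrative).
class JobContext {
    private final Map<String, Object> cache = new HashMap<String, Object>();
    public Object getObject(String key) { return cache.get(key); }
    public void setObject(String key, Object value) { cache.put(key, value); }
}

// Stand-in for the expensive-to-build LanguageIdentifier.
class LanguageIdentifierStub {
    static int constructions = 0;          // visible for the demo only
    LanguageIdentifierStub() { constructions++; /* load n-gram profiles... */ }
}

public class CachedLookup {
    private static final String KEY = "LanguageIdentifierStub";

    // Construct at most one instance per JobContext.
    public static LanguageIdentifierStub get(JobContext conf) {
        LanguageIdentifierStub id = (LanguageIdentifierStub) conf.getObject(KEY);
        if (id == null) {
            id = new LanguageIdentifierStub();
            conf.setObject(KEY, id);
        }
        return id;
    }

    public static void main(String[] args) {
        JobContext conf = new JobContext();
        CachedLookup.get(conf);
        CachedLookup.get(conf);            // cache hit, no new instance
        System.out.println(LanguageIdentifierStub.constructions); // prints 1
    }
}
```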




Since I was still having some strange results with the profiler, I
added a println message in the getInstance method to effectively
monitor singleton creation. It turns out that the singleton is
re-instantiated each time!
I can't really understand why this is happening; maybe it is something
related to Hadoop internals?


I remember a similar situation I had, where instance variables were not
initialized after the object was created with Class.newInstance(). A VM
bug? I'm not sure... I didn't track it down at the time; I simply moved
the variable initialization to setConf(), which solved my problem.
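A minimal sketch of this workaround, under the assumption that the framework constructs the object reflectively and then hands it its configuration (the interface and class names below are illustrative, loosely modeled on the Configurable/setConf pattern): heavy initialization is deferred from the constructor to setConf().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative stand-in for a framework configuration hook.
interface SimpleConfigurable {
    void setConf(Properties conf);
}

class ProfileLoader implements SimpleConfigurable {
    private List<String> profiles;   // deliberately NOT set in the constructor

    // All initialization happens here, after reflective construction.
    public void setConf(Properties conf) {
        profiles = new ArrayList<String>();
        profiles.add(conf.getProperty("lang.profile", "en"));
    }

    public List<String> getProfiles() { return profiles; }
}

public class SetConfDemo {
    public static void main(String[] args) throws Exception {
        // Mimic the framework: reflective construction, then configuration.
        SimpleConfigurable obj =
            (SimpleConfigurable) ProfileLoader.class.newInstance();
        Properties props = new Properties();
        props.setProperty("lang.profile", "it");
        obj.setConf(props);
        System.out.println(((ProfileLoader) obj).getProfiles()); // prints [it]
    }
}
```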


--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Possible memory leak?

2006-06-28 Thread Enrico Triolo

Hi all, in my application I often need to perform the inject ->
generate -> .. -> index loop multiple times, since users can 'suggest'
new web pages to be crawled and indexed.
I also need to enable the language identifier plugin.

Everything seems to work correctly, but after some time I get an
OutOfMemoryException. Actually the time isn't important, since I
noticed that the problem arises when the user submits many URLs
(~100). As I said, for each submitted URL a new loop is performed
(similar to the one in the Crawl.main method).

Using a profiler (specifically, the NetBeans Profiler) I found out that
for each submitted URL a new LanguageIdentifier instance is created
and never released. With the memory inspector tool I can see as many
instances of LanguageIdentifier and NGramProfile$NGramEntry as the
number of fetched pages, each of them occupying about 180 KB. Forcing
garbage collection doesn't release much memory.

LanguageIdentifier has a static class variable 'identifier' that is
never used; reading through the code it seems that the original idea
was to implement a singleton pattern.
So, to limit memory usage, I implemented a static getInstance method
and modified the LanguageIndexingFilter class, making it use the
singleton.
Since I was still having some strange results with the profiler, I
added a println message in the getInstance method to effectively
monitor singleton creation. It turns out that the singleton is
re-instantiated each time!
I can't really understand why this is happening; maybe it is something
related to Hadoop internals?
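For reference, a plain-Java sketch of the lazy singleton described above (the class is an illustrative stand-in, not the actual Nutch code). One caveat that may explain the repeated instantiation: a static field guarantees one instance per classloader only, so a plugin framework that loads the class through separate classloaders still ends up with several copies.

```java
// Plain-Java lazy-singleton sketch (illustrative stand-in).
// Caveat: "static" means one instance per classloader, so plugin
// systems using separate classloaders can still create several copies.
class IdentifierSingleton {
    private static IdentifierSingleton identifier;  // the cached instance
    static int constructions = 0;                   // demo counter only

    private IdentifierSingleton() {
        constructions++;
        /* load n-gram profiles... */
    }

    public static synchronized IdentifierSingleton getInstance() {
        if (identifier == null) {
            identifier = new IdentifierSingleton();
        }
        return identifier;
    }
}

public class SingletonDemo {
    public static void main(String[] args) {
        IdentifierSingleton a = IdentifierSingleton.getInstance();
        IdentifierSingleton b = IdentifierSingleton.getInstance();
        System.out.println(a == b);                            // prints true
        System.out.println(IdentifierSingleton.constructions); // prints 1
    }
}
```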

Cheers,
Enrico