Briggs wrote:
> Version:  Nutch 0.9 (but this applies to just about all versions)
> 
> I'm really in a bind.
> 
> Is anyone crawling from within a web application, or is everyone
> running Nutch using the shell scripts provided?  I am trying to write
> a web application around the Nutch crawling facilities, but it seems
> that there are huge memory issues when trying to do this.  The
> container (tomcat 5.5.17 with 1.5 gigs of memory allocated, and 128K
> on the stack) runs out of memory in less than an hour.  When profiling
> version 0.7.2 we can see that there is a constant pool of objects that
> grows, but never gets garbage collected.  So, even when the crawl is
> finished, these objects tend to just hang around forever, until we get
> the wonderful: java.lang.OutOfMemoryError: PermGen space.  I updated
> the application to use Nutch 0.9 and the problem got about 80x worse

Have you analyzed in any detail what is causing this memory waste?
Have you tried tweaking the JVM's -XX:MaxPermSize setting?

I believe that all the classes required by plugins need to be loaded
multiple times (every time you execute a command where a Configuration
object is created) because of the design of the plugin system, where
every plugin has its own class loader (per configuration).
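
To illustrate the effect described above, here is a minimal sketch of a
per-configuration class loader cache. It is a model only, not Nutch's
plugin code: Config, loaderFor and loaderCount are hypothetical names,
and the empty URLClassLoader stands in for a loader that would carry
plugin jars.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.IdentityHashMap;
import java.util.Map;

public class PerConfigLoaders {
    /** Hypothetical stand-in for a configuration object; not Nutch's class. */
    static final class Config {}

    // One class loader per configuration instance, keyed by object identity.
    private static final Map<Config, ClassLoader> LOADERS =
            new IdentityHashMap<Config, ClassLoader>();

    /** Returns the loader tied to this configuration, creating it on first use. */
    static synchronized ClassLoader loaderFor(Config conf) {
        ClassLoader loader = LOADERS.get(conf);
        if (loader == null) {
            // A real plugin system would put the plugin jars on this loader's path.
            loader = new URLClassLoader(new URL[0],
                    PerConfigLoaders.class.getClassLoader());
            LOADERS.put(conf, loader);
        }
        return loader;
    }

    static synchronized int loaderCount() {
        return LOADERS.size();
    }

    public static void main(String[] args) {
        // A fresh configuration per command: every call adds another loader.
        for (int i = 0; i < 3; i++) {
            loaderFor(new Config());
        }
        System.out.println(loaderCount()); // prints 3

        // One configuration reused across commands: only one more loader.
        Config shared = new Config();
        for (int i = 0; i < 3; i++) {
            loaderFor(shared);
        }
        System.out.println(loaderCount()); // prints 4
    }
}
```

Each distinct loader that defines a class keeps its own copy of that
class's metadata, so in a model like this the number of loaders is what
drives permanent generation usage.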

> So, the current design is/was to have an event happen within the
> system, which would fire off a crawler (currently just calls
> org.apache.nutch.crawl.Crawl.main()).  But, this has caused nothing
> but grief.  We need to have several crawlers running concurrently. We

You should perhaps call the classes directly and take control of
managing the Configuration object; this way PermGen space is not wasted
by loading the same classes over and over again.
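
As a rough sketch of that suggestion, the code below creates one
configuration object up front and hands the same instance to several
concurrent crawl jobs. Everything in it is an assumption made for
illustration: Config and CrawlJob are hypothetical stand-ins rather
than Nutch classes, and the body of call() is a placeholder for
whatever crawl steps the application would invoke directly instead of
Crawl.main(). Whether the real Configuration and plugin classes are
safe to share across threads is not established here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedConfigCrawls {
    /** Hypothetical stand-in for the configuration object, created once. */
    static final class Config {}

    /** Hypothetical crawl job that receives the shared configuration. */
    static final class CrawlJob implements Callable<Config> {
        private final Config conf;
        private final String seed;

        CrawlJob(Config conf, String seed) {
            this.conf = conf;
            this.seed = seed;
        }

        public Config call() {
            // Placeholder: run the crawl steps for `seed` using `conf` here.
            return conf; // report which configuration instance was used
        }
    }

    /** Runs one job per seed on a fixed pool, all sharing one configuration. */
    static List<Config> runAll(Config shared, List<String> seeds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Config>> futures = new ArrayList<Future<Config>>();
            for (String seed : seeds) {
                futures.add(pool.submit(new CrawlJob(shared, seed)));
            }
            List<Config> used = new ArrayList<Config>();
            for (Future<Config> f : futures) {
                used.add(f.get()); // waits for the job to finish
            }
            return used;
        } catch (Exception e) {
            throw new RuntimeException("crawl job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Config shared = new Config();
        List<Config> used = runAll(shared, Arrays.asList("a", "b", "c"), 2);
        System.out.println(used.size()); // prints 3
    }
}
```

The point of the design is that the configuration's lifetime is owned by
the web application rather than by each crawl invocation, so starting
another crawl does not by itself create another configuration.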

-- 
 Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
