I'll look through the code to make sure I am creating only one instance of Configuration in my classes, and will experiment with the MaxPermSize setting.
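For anyone following along, here is a rough sketch of what I mean by "only one instance". It is not Nutch code: SharedConfigHolder and ConfigFactory are hypothetical names I made up for illustration, and a plain Object stands in for the real Configuration. In the actual application I assume the factory would wrap something like NutchConfiguration.create(), so that every crawl started inside the webapp reuses the same Configuration (and, if Sami's explanation holds, the same plugin class loaders).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory for an expensive-to-create configuration object. */
interface ConfigFactory<T> {
    T create();
}

/** Hypothetical holder: builds the object on first use, then always returns that instance. */
final class SharedConfigHolder<T> {
    private final ConfigFactory<T> factory;
    private volatile T instance;

    SharedConfigHolder(ConfigFactory<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;
        if (local == null) {
            // Double-checked locking so concurrent crawlers cannot create two instances.
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.create();
                    instance = local;
                }
            }
        }
        return local;
    }
}

class SharedConfigDemo {
    public static void main(String[] args) {
        final AtomicInteger created = new AtomicInteger();
        SharedConfigHolder<Object> holder = new SharedConfigHolder<Object>(
            new ConfigFactory<Object>() {
                public Object create() {
                    // Stand-in for the real (assumed) configuration factory call.
                    created.incrementAndGet();
                    return new Object();
                }
            });

        Object first = holder.get();
        Object second = holder.get();
        System.out.println("same instance: " + (first == second));
        System.out.println("factory calls: " + created.get());
    }
}
```

The idea would be to keep one such holder for the lifetime of the webapp and hand its result to each crawl, instead of letting every invocation build its own Configuration.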
Any other input from people who have attempted this sort of setup would be appreciated.

On 4/30/07, Briggs <[EMAIL PROTECTED]> wrote:
> Well, in nutch 0.7 it was all due to NGramEntry instances held within
> hashmaps that never get cleaned up. This code was in the language
> plugin, but it has been moved into the nutch codebase.
>
> That wasn't the only problem, but that was a big one. I thought
> removing it would solve the problem, but then another crept up.
>
> On 4/30/07, Sami Siren <[EMAIL PROTECTED]> wrote:
> > Briggs wrote:
> > > Version: Nutch 0.9 (but this applies to just about all versions)
> > >
> > > I'm really in a bind.
> > >
> > > Is anyone crawling from within a web application, or is everyone
> > > running Nutch using the shell scripts provided? I am trying to write
> > > a web application around the Nutch crawling facilities, but it seems
> > > that there are huge memory issues when trying to do this. The
> > > container (tomcat 5.5.17 with 1.5 gigs of memory allocated, and 128K
> > > on the stack) runs out of memory in less than an hour. When profiling
> > > version 0.7.2 we can see that there is a constant pool of objects that
> > > grow, but never get garbage collected. So, even when the crawl is
> > > finished, these objects tend to just hang around forever, until we get
> > > the wonderful: java.lang.OutOfMemoryError: PermGen space. I updated
> > > the application to use Nutch 0.9 and the problem got about 80x worse
> >
> > Have you analyzed in any level of detail what is causing this memory
> > wasting? Have you tried tweaking the JVM's -XX:MaxPermSize?
> >
> > I believe that all the classes required by plugins need to be loaded
> > multiple times (every time you execute a command where a Configuration
> > object is created) because of the design of the plugin system, where
> > every plugin has its own class loader (per configuration).
> >
> > > So, the current design is/was to have an event happen within the
> > > system, which would fire off a crawler (currently just calls
> > > org.apache.nutch.crawl.Crawl.main()). But, this has caused nothing
> > > but grief. We need to have several crawlers running concurrently. We
> >
> > You should perhaps use and call the classes directly and take control of
> > managing the Configuration object; this way PermGen space is not wasted
> > by loading the same classes over and over again.
> >
> > --
> > Sami Siren
> >
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>

-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
