Andrzej: I agree on all points. I have implemented the crawl in my own class (I just recently posted about this), for the exact reasons that you mentioned. You are 100 percent correct.
I will, as you suggested, read up on the PluginRepository classloader issue. Thanks. I was trying to kill that thread but could not find a good way of doing that. I am on XP in Eclipse. How would I do that? I would love to know. If you could tell me, that would be great.

ray

-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]]
Sent: Thursday, April 23, 2009 10:35 AM
To: [email protected]
Subject: Re: Hadoop thread seems to remain alive

Lukas, Ray wrote:
> Hey Ray.. Great name you have there.. HA..
>
> I don't actually care about deleting these files. That is not the issue. I have embedded Nutch in my application. That application calls Nutch over and over again to do crawling and index creation. This thread that stays alive eventually exceeds some limit (native threads) in Java and crashes my application. That is why I need to find and properly shut down that service, or whatever it is. I noticed that the Hadoop files are still locked, so I am taking that as a hint that it is Hadoop.
>
> Bottom line is
>
> When you run Crawl in the java directory, some thread stays open. That thread is killing me. What is it that stays alive past the completion of the Crawl.java code?
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something stays alive. How to close that is the issue.
>
> See what I am asking?
>

First, don't use the Crawl class to implement continuous crawling in a long-running application. This class was never meant to do this - e.g. it instantiates various Nutch tools over and over again. Just replicate the logic there in your own class so that you instantiate things once.

Second, it's likely that you're experiencing the PluginRepository classloader issue, described here: https://issues.apache.org/jira/browse/NUTCH-356 .
The patch in this issue is still not applied, because it's a hack, and there were few active users who experienced this problem - because it occurs only in long-running applications that run Nutch tools in the context of a single JVM, and most users run Nutch tools from the command line.

And finally, if the application is stuck and doesn't exit due to a still running thread, generate a thread dump and see what that thread is.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
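
On the thread dump: one way to get it from inside an embedded application, without relying on any external tool, is the standard Thread.getAllStackTraces() call (available since Java 5). The sketch below is only an illustration and is not part of Nutch or Hadoop; the class name ThreadDump and the method dump() are made up for this example. Calling it right after a crawl finishes should show which threads are still alive and where they are blocked.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ThreadDump {

    /** Returns a text listing of every live thread in this JVM with its stack trace. */
    public static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: thread name, daemon flag, current state.
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" state=").append(t.getState())
               .append('\n');
            // One line per stack frame, innermost frame first.
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Threads not marked "daemon" are the ones that keep the JVM from exiting, but daemon threads still count toward the native thread limit, so both are worth checking. Outside the application, the JVM also prints a similar dump when you press Ctrl+Break in the Windows console it was started from, and Eclipse's Debug view lists the live threads of a program launched in debug mode.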
