Andrzej:
I agree on all points. I have (as I just recently posted) implemented
the Crawl logic in my own class, for exactly the reasons you mentioned.
You are 100 percent correct.

I will, as you suggested, read up on the PluginRepository classloader
issue. Thanks.

I was trying to kill that thread but could not find a good way of doing
that. I am on Windows XP in Eclipse. How would I do that? I would love to
know, so if you could tell me, that would be great.

ray



-----Original Message-----
From: Andrzej Bialecki [mailto:[email protected]] 
Sent: Thursday, April 23, 2009 10:35 AM
To: [email protected]
Subject: Re: Hadoop thread seems to remain alive

Lukas, Ray wrote:
> Hey Ray. Great name you have there. Ha.
> 
> I don't actually care about deleting these files; that is not the
> issue. I have embedded Nutch in my application, and that application
> calls Nutch over and over again to do crawling and index creation. The
> thread that stays alive eventually exceeds some limit (native
> threads) in Java and crashes my application. That is why I need to
> find and properly shut down that service, or whatever it is. I noticed
> that Hadoop files are still locked, so I am taking that as a hint that
> it is Hadoop.
> 
> Bottom line is:
> 
> When you run Crawl from Java code, some thread stays open, and that
> thread is killing me. What is it that stays alive past the completion
> of the Crawl.java code?
> If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse,
> something stays alive. How to close it is the issue.
> 
> See what I am asking?
> 

First, don't use the Crawl class to implement continuous crawling in a 
long-running application. This class was never meant to do this - e.g. 
it instantiates various Nutch tools over and over again. Just replicate 
the logic there in your own class so that you instantiate things once.
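A minimal sketch of that idea, assuming hypothetical `CrawlStep` objects as stand-ins for the real Nutch tools (their actual constructors and method signatures are not shown here): the shared configuration and every step are created once in the constructor, and only the per-round work runs on each cycle. The `Properties` object is likewise only a placeholder for whatever configuration object your application passes around.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for one crawl phase (inject, generate, fetch, updatedb, index).
// This is NOT the Nutch API; wire each step to the real Nutch tool classes yourself.
interface CrawlStep {
    void run(Properties conf, int round);
}

// Long-running crawler that creates its configuration and step objects exactly once,
// instead of invoking Crawl (which builds everything anew) on every cycle.
final class ReusableCrawler {
    private final Properties conf;        // hypothetical: one shared config, never re-created
    private final List<CrawlStep> steps;  // built once in the constructor, reused every round
    private int roundsCompleted = 0;

    ReusableCrawler(Properties conf, List<CrawlStep> steps) {
        this.conf = conf;
        this.steps = new ArrayList<>(steps);
    }

    // Runs one crawl cycle with the already-constructed steps and the shared config.
    void crawlOnce() {
        for (CrawlStep step : steps) {
            step.run(conf, roundsCompleted);
        }
        roundsCompleted++;
    }

    int roundsCompleted() {
        return roundsCompleted;
    }
}
```

The application then keeps one `ReusableCrawler` alive and calls `crawlOnce()` whenever it needs a new crawl, so no tool or configuration object is rebuilt between cycles.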

Second, it's likely that you're experiencing the PluginRepository 
classloader issue, described here: 
https://issues.apache.org/jira/browse/NUTCH-356 . The patch in this 
issue is still not applied, because it's a hack, and there were few 
active users who experienced this problem - because it occurs only in 
long-running applications that run Nutch tools in the context of a 
single JVM, and most users run Nutch tools from the command line.
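One way to confirm that threads really accumulate across in-process runs (the symptom that eventually hits the native thread limit) is to record the JVM's live thread count after each run. This is a minimal diagnostic sketch using the standard `java.lang.management` API; the class and method names are illustrative, not part of Nutch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class ThreadLeakCheck {
    // Runs the task repeatedly and records the JVM's live thread count after each run.
    // A count that keeps climbing suggests each run leaves at least one thread behind.
    static List<Integer> liveThreadsAfterEachRun(Runnable task, int runs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            task.run();
            counts.add(mx.getThreadCount()); // live threads, daemon and non-daemon
        }
        return counts;
    }
}
```

For example, `ThreadLeakCheck.liveThreadsAfterEachRun(() -> runCrawl(), 5)`, where `runCrawl` is a hypothetical placeholder for one crawl cycle in your application, should return a roughly flat list if nothing leaks.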

And finally, if the application is stuck and doesn't exit due to a still
running thread, generate a thread dump and see what that thread is.
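A thread dump can also be produced from inside the application, which is convenient when it runs under Eclipse. The sketch below relies only on `Thread.getAllStackTraces()` from the JDK; the `ThreadDumper` name and its methods are illustrative. Since the JVM exits once only daemon threads remain, `nonDaemonThreads()` narrows the list to the threads that can keep the process alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ThreadDumper {
    // Builds a plain-text dump of every live thread: name, daemon flag, state, and stack.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(t.isDaemon() ? " daemon" : "")
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Returns live non-daemon threads other than the caller; these prevent JVM exit.
    static List<Thread> nonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (!t.isDaemon() && t != self) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Printing `ThreadDumper.dump()` after a crawl completes should show the name and stack of the lingering thread, which usually points at the component that started it. With JDK 6 or later, the `jstack <pid>` tool should print a similar dump from outside the process.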


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
