Lukas, Ray wrote:
Hey Ray. Great name you have there, ha.
I don't actually care about deleting these files; that is not the issue. I have embedded Nutch in my application, and that application calls Nutch over and over again to do crawling and index creation. The thread that stays alive eventually exceeds some limit (native threads) in Java and crashes my application. That is why I need to find and properly shut down that service, or whatever it is. I noticed that Hadoop files are still locked, so I take that as a hint that it is Hadoop.
Bottom line:
When you run Crawl from Java, some thread stays open. That thread is killing me. What is it that stays alive past the completion of the Crawl.java code?
If you run org.apache.nutch.crawl.Crawl from within Java/Eclipse, something stays alive. How to close that is the issue.
See what I am asking?
First, don't use the Crawl class to implement continuous crawling in a
long-running application. This class was never meant for that; for example,
it instantiates the various Nutch tools over and over again. Instead, replicate
its logic in your own class so that you instantiate things once.
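
As a rough sketch of what that could look like, here is the structure only, assuming the 0.9/1.x-era tool classes that Crawl.java itself instantiates (Injector, Generator, Fetcher, ParseSegment, CrawlDb, LinkDb). The class name, method and parameters are made up for illustration, and the generate/fetch/parse/update signatures changed between releases, so copy that part of the loop from the Crawl.java that ships with your version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.nutch.crawl.CrawlDb;
import org.apache.nutch.crawl.Generator;
import org.apache.nutch.crawl.Injector;
import org.apache.nutch.crawl.LinkDb;
import org.apache.nutch.fetcher.Fetcher;
import org.apache.nutch.parse.ParseSegment;
import org.apache.nutch.util.NutchConfiguration;

public class ContinuousCrawler {

  private final Configuration conf;
  // Created once and reused for every crawl cycle, unlike Crawl.main(),
  // which builds fresh instances on every invocation.
  private final Injector injector;
  private final Generator generator;
  private final Fetcher fetcher;
  private final ParseSegment parser;
  private final CrawlDb crawlDbTool;
  private final LinkDb linkDbTool;

  public ContinuousCrawler() {
    conf = NutchConfiguration.create();
    injector = new Injector(conf);
    generator = new Generator(conf);
    fetcher = new Fetcher(conf);
    parser = new ParseSegment(conf);
    crawlDbTool = new CrawlDb(conf);
    linkDbTool = new LinkDb(conf);
  }

  /** One crawl cycle over an existing crawldb/segments directory layout. */
  public void crawlOnce(Path crawlDb, Path segments, Path urlDir) throws Exception {
    injector.inject(crawlDb, urlDir);
    // Then call generator.generate(...), fetcher.fetch(...), parser.parse(...)
    // and crawlDbTool.update(...) exactly as the loop body in Crawl.java does;
    // those signatures differ between Nutch releases, so copy them from your
    // version's Crawl.java rather than from this sketch.
  }
}

The point is purely structural: the Configuration and the tool objects exist once for the lifetime of the embedding application instead of once per crawl.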
Second, it's likely that you're experiencing the PluginRepository
classloader issue, described here:
https://issues.apache.org/jira/browse/NUTCH-356 . The patch in that
issue has still not been applied, because it's a hack and few active
users have run into the problem: it occurs only in long-running
applications that run Nutch tools in the context of a single JVM, and
most users run the Nutch tools from the command line.
And finally, if the application is stuck and doesn't exit because of a
still-running thread, generate a thread dump and see what that thread is.
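
The usual way is jstack <pid>, or kill -QUIT <pid> on Unix, against the running JVM. If it's easier to do it from inside your embedding application, a small helper along these lines (the class name is made up here) prints every live thread; the non-daemon ones are what keep the JVM from exiting:

import java.util.Map;

// Prints every live thread with its daemon flag, state and stack trace,
// roughly what jstack would show for the same JVM.
public class ThreadDumper {
  public static void dump() {
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      System.out.printf("\"%s\" daemon=%b state=%s%n",
          t.getName(), t.isDaemon(), t.getState());
      for (StackTraceElement frame : e.getValue()) {
        System.out.println("    at " + frame);
      }
    }
  }
}

Call ThreadDumper.dump() after your crawl finishes and look for non-daemon threads that are not part of your own application.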
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com