Shawn Gervais wrote:
When I have been at the terminal to observe the timed-out process before it is reaped, I have seen that it continues to use 100% of a single processor. Running strace on the java process did not produce any usable leads. When the reduce task is reassigned, either to the same machine or to another, it dies at around the same percentage of completion.

Did you try running 'kill -QUIT' on the process? That should print a stack trace for every thread.
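If sending the signal is awkward, a similar per-thread dump can be produced from inside the JVM. The following is a minimal sketch, assuming Java 5 or later; the class name ThreadDump is hypothetical and is not part of Nutch or Hadoop. It relies only on the standard Thread.getAllStackTraces() call. (On a HotSpot JVM, the 'kill -QUIT' dump goes to the JVM's standard output, wherever that has been redirected.)

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch: builds a text dump of every live
// thread and its stack frames, similar in spirit to a 'kill -QUIT' dump.
public class ThreadDump {

    /** Returns one block per live thread: its name, its state, then its frames. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        // Thread.getAllStackTraces() returns a snapshot of all live threads.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Write to stderr so the dump lands in the task's error log.
        System.err.print(dumpAllThreads());
    }
}
```

A thread stuck at 100% CPU typically shows as RUNNABLE in such a dump, and taking two or three dumps a few seconds apart shows whether it stays in the same frames.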

Is there an option I can enable somewhere that will allow for more verbose output to be written to the logs? Any other suggestions on debugging this issue?

You could add some print statements to FetcherOutputFormat.java, in the RecordWriter.write() method, printing each key (URL) as it is written. That might let you figure out which page is hanging things.
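One way to add that logging without touching the write logic itself is a small wrapper that prints each key before delegating. This is a sketch under stated assumptions: KeyValueWriter is a hypothetical stand-in, not the real RecordWriter interface (whose signature differs across Nutch/Hadoop versions), and the lambda requires Java 8 or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LoggingWriterSketch {

    // Hypothetical stand-in for a RecordWriter; NOT the actual Nutch/Hadoop signature.
    public interface KeyValueWriter<K, V> {
        void write(K key, V value) throws IOException;
    }

    // Wraps another writer and prints each key (e.g. a URL) before delegating.
    public static final class LoggingWriter<K, V> implements KeyValueWriter<K, V> {
        private final KeyValueWriter<K, V> delegate;

        public LoggingWriter(KeyValueWriter<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(K key, V value) throws IOException {
            // Log and flush first: if the delegate never returns, the last
            // logged key identifies the record that was being written.
            System.err.println("write key=" + key);
            System.err.flush();
            delegate.write(key, value);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> written = new ArrayList<>();
        // Toy sink that just records keys, standing in for the real output.
        KeyValueWriter<String, String> sink = (k, v) -> written.add(k);
        KeyValueWriter<String, String> w = new LoggingWriter<>(sink);
        w.write("http://example.com/a", "content-a");
        w.write("http://example.com/b", "content-b");
        System.out.println(written); // [http://example.com/a, http://example.com/b]
    }
}
```

Printing before the delegated call (rather than after) is the point of the design: a write that hangs still leaves its key as the last line in the log.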

It seems to me that it might be possible to take a snapshot of the task while it is running (i.e. its data and the job jar), so that I can debug it in isolation without re-running an entire fetch. I am unsure how this might be done, though.

Once you know the page (assuming the problem is deterministic), you should be able to run a fetch of just that page to test things.
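When testing that single page, it can help to run the suspect step under a timeout so that a hang is reported instead of blocking the test. The sketch below assumes Java 8 or later; processPage is a hypothetical placeholder for whatever fetch or parse step is under suspicion, and SinglePageCheck is not a Nutch class.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class SinglePageCheck {

    /** Returns true if task.apply(url) completes within timeoutMs, false if it times out. */
    public static boolean finishesWithin(Function<String, String> task, String url, long timeoutMs) {
        // Daemon worker thread so a spinning task cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "single-page-check");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<String> f = pool.submit(() -> task.apply(url));
            try {
                f.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                // cancel(true) only interrupts; a CPU-bound loop that ignores
                // interrupts keeps running on the daemon thread.
                f.cancel(true);
                return false;
            } catch (ExecutionException e) {
                // The task finished, but by throwing: surface its cause.
                throw new RuntimeException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://example.com/";
        // Hypothetical placeholder for the real fetch/parse step under test.
        Function<String, String> processPage = u -> "processed " + u;
        System.out.println(finishesWithin(processPage, url, 5000) ? "finished" : "timed out");
    }
}
```

Because a task spinning at 100% CPU may never check for interruption, the timeout only detects the hang; taking a thread dump while it spins is what shows where it is stuck.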

Doug


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
