Andreas Kostyrka wrote:
Ok, a new dead job: ;(
This time after 2.4GB/11,3M lines ;(
Any idea what I could do to debug this?
(No idea how to go at debugging a Java process that is distributed and does
GBs of data.
It's one of the big problems of distributed computing: distributed debugging.
How does one stabilize that kind of stuff to generate a
reproducible situation?)
-If you are using vmware/xen or similar with a private you can snapshot
the entire cluster, but then you are left with many GB of machine state
to deal with. But virtualisation has its own problems, as timing can get
very screwed up.
-any hung process, kill -QUIT it and you get a stack trace of every
thread (the JVM prints the thread dump to its stdout)
-logging, lots of it. Get all the boxes in perfect NTP-driven sync and
you can correlate events in a single-site cluster. Dealing with logs
from the other side of the world is a harder problem -don't go there if
you can help it.
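
As a rough illustration of the log-correlation point: once the clocks agree, the per-host logs can be interleaved into one timeline by timestamp. The sketch below is only that, a sketch. It assumes every line starts with a log4j-style "YYYY-MM-DD HH:MM:SS,mmm" stamp and that each host's log is already in time order; the host names and messages in the usage note are made up, and continuation lines such as stack traces are not handled.

```python
import heapq
from datetime import datetime

# Assumed line layout: "YYYY-MM-DD HH:MM:SS,mmm <message>".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # length of a stamp such as "2008-01-01 12:00:00,000"


def parse_line(host, line):
    """Return (timestamp, host, message) for one log line."""
    stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    return stamp, host, line[TS_LENGTH:].strip()


def merge_logs(logs_by_host):
    """Merge per-host logs (each already in time order) into one timeline."""
    streams = [
        [parse_line(host, line) for line in lines if line.strip()]
        for host, lines in logs_by_host.items()
    ]
    # heapq.merge interleaves the sorted streams by timestamp.
    return list(heapq.merge(*streams))
```

Calling merge_logs({"node1": [...], "node2": [...]}) with each host's lines returns a single list of (timestamp, host, message) tuples in time order, which is only as trustworthy as the clock sync between the boxes.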
The X-Trace team are trying to instrument Hadoop for better debugging:
http://radlab.cs.berkeley.edu/wiki/Projects/X-Trace_on_Hadoop
This looks really interesting.
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/