Re: Stackoverflow

Steve Loughran Wed, 04 Jun 2008 03:01:41 -0700

Andreas Kostyrka wrote:

Ok, a new dead job: ;(
This time after 2.4GB/11,3M lines ;(

Any idea what I could do to debug this?
(No idea how to go at debugging a Java process that is distributed and does GBs of data.


It's one of the big problems of distributed computing: distributed debugging.

How does one stabilize that kind of stuff to generate a reproducible situation?)

-If you are using VMware/Xen or similar with a private you can snapshot the entire cluster, but then you are left with many GB of machine state to deal with. But virtualisation has its own problems, as timing can get very screwed up.


-any hung process, kill -QUIT it and you get a stack trace of every thread (the JVM prints it to its standard output and keeps running)

-logging, lots of it. Get all the boxes in perfect NTP-driven sync and you can correlate events in a single-site cluster. Dealing with logs from the other side of the world is a harder problem -don't go there if you can help it.
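
The log-correlation idea can be sketched in a few lines of Python. This is only a rough illustration, not anything Hadoop ships: it assumes every log line starts with a log4j-style timestamp such as "2008-06-04 03:01:41,123", that the clocks really are in sync, and that each host's log is already in time order. The names tag_lines and merge_logs are made up for the example.

```python
import heapq
from datetime import datetime

# Assumption: each log line starts with a timestamp like
# "2008-06-04 03:01:41,123". Adjust both constants to match your logs.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23


def tag_lines(host, lines):
    """Yield (timestamp, host, text) for every line that starts with a timestamp."""
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Continuation lines (e.g. stack trace frames) carry no timestamp;
            # this sketch simply drops them.
            continue
        yield stamp, host, line.rstrip("\n")


def merge_logs(logs_by_host):
    """Merge per-host logs into one list ordered by timestamp.

    logs_by_host maps a host name to an iterable of log lines, for example
    an open file. Each host's lines must already be in time order.
    """
    streams = [tag_lines(host, lines) for host, lines in logs_by_host.items()]
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))
```

Feeding it one open log file per node gives a single timeline in which each entry still carries its host name, which makes it easier to see what the other machines were doing just before a task died.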


The X-Trace team are trying to instrument Hadoop for better debugging:
http://radlab.cs.berkeley.edu/wiki/Projects/X-Trace_on_Hadoop

This looks really interesting.

--
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/
