Re: [MTT devel] Analysis of hung jobs.
For the record, Ethan and I took this off-list and got it working shortly afterwards; the results are now online and the code is in SVN.

Attached is a final patch to clean up the output by removing an extraneous space that was inserted in the final output.

Ashley.

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk

Index: lib/MTT/DoCommand.pm
===
--- lib/MTT/DoCommand.pm (revision 1327)
+++ lib/MTT/DoCommand.pm (working copy)
@@ -619,7 +619,7 @@
 if (FindProgram(qw(padb))) {
 my $padb_cmd = "padb --config-option rmgr=mpirun --full-report=$pid";
-$ret .= "\n $padb_cmd";
+$ret .= "\n$padb_cmd";
 $ret .= "\n" . `$padb_cmd`;
 } else {
Re: [MTT devel] Analysis of hung jobs.
On Wed, 2009-10-07 at 16:21 -0400, Ethan Mallove wrote:
> No secret file (/home/em162155/.padb-secret)
> Error: Could not load secret file on this node

You need to do this once to set a secret key for security purposes. Run the following two commands and try again.

echo secret=ochi4aeZ > /home/em162155/.padb-secret
chmod 0600 /home/em162155/.padb-secret

Ashley,

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
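
For anyone scripting the one-time secret setup across test accounts, the two commands above could be wrapped roughly as follows. This is only a sketch: `write_padb_secret` is a hypothetical helper, not something padb ships, and it assumes the `secret=<value>` file format shown in the commands above. It leaves an existing file's contents alone and makes sure the file is readable by its owner only.

```shell
# Hypothetical helper (not part of padb): create a padb secret file with
# owner-only permissions, without overwriting an existing one.
#   $1 = path to the secret file
#   $2 = secret value to store
write_padb_secret() {
    file=$1
    secret=$2
    if [ ! -f "$file" ]; then
        # Run in a subshell with umask 077 so the file is never created
        # group- or world-readable, even briefly.
        ( umask 077 && printf 'secret=%s\n' "$secret" > "$file" )
    fi
    # Enforce the permissions from the original instructions in any case.
    chmod 0600 "$file"
}
```

It would be called as `write_padb_secret "$HOME/.padb-secret" ochi4aeZ`; running it a second time keeps the existing secret and only reapplies the permissions.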
Re: [MTT devel] Analysis of hung jobs.
On Wed, 2009-10-07 at 15:41 -0400, Ethan Mallove wrote:
> I got the following error doing a simple test:

As it happens I saw this error earlier on FC8; r279 should fix this problem.

> $ perl --version
> This is perl, v5.8.4 built for sun4-solaris-64int

I had wondered if you'd be using Solaris. This is not something I've tested and not something I'd expect to work. The stack trace code should all be fine but there might be some problems reading data from /proc. In the past padb has worked on Tru64; possibly all that needs porting is getting the parent pid and process name from ps rather than /proc/status.

Ashley,

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
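
As an illustration of the ps-based approach mentioned above, the parent pid and process name can be read with the POSIX `-o` and `-p` options of ps instead of parsing /proc. This is a sketch only; `proc_info` is a made-up name, it is not how padb currently does it, and it has not been tried on Solaris here.

```shell
# Hypothetical helper (not padb code): print "<ppid> <command name>" for
# the process whose pid is given as $1, using only POSIX ps options.
# A trailing "=" after each field name suppresses the header line.
proc_info() {
    ps -o ppid= -o comm= -p "$1"
}
```

For example, `proc_info $$` prints the parent pid and command name of the current shell. The exact form of the command name (bare name or full path) varies between systems, so a port would need to normalise it.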
Re: [MTT devel] Analysis of hung jobs.
On Tue, 2009-10-06 at 11:25 -0400, Ethan Mallove wrote:
> On Tue, Oct/06/2009 10:23:48AM, Ashley Pittman wrote:
> >
> > Further to the mail linked below, padb is able to perform diagnostics,
> > including backtraces on hung jobs and integrates well into automated
> > testing environments.
>
> Can padb get a backtrace from a non-debuggable MPI (e.g., not compiled
> with -g)?

It gets what is available from the application: without -g it will give you function names only; with -g it will also give you file names and line numbers and, optionally, variables with their types and values. It can show the message queues regardless of the -g option.

Ashley.

-- 
Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk