Re: [MTT devel] Analysis of hung jobs.

2009-11-02 Thread Ashley Pittman

For the record, Ethan and I took this off-list and got it working shortly
afterwards; results are now online and the code is in SVN.

Attached is a final patch to clean up the output by removing an extraneous
space which is inserted before the padb command line in the final output.

Ashley.

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
Index: lib/MTT/DoCommand.pm
===
--- lib/MTT/DoCommand.pm	(revision 1327)
+++ lib/MTT/DoCommand.pm	(working copy)
@@ -619,7 +619,7 @@
 if (FindProgram(qw(padb))) {

 my $padb_cmd = "padb --config-option rmgr=mpirun --full-report=$pid";
-$ret .= "\n $padb_cmd";
+$ret .= "\n$padb_cmd";
 $ret .= "\n" . `$padb_cmd`;

 } else {
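
For readers who do not follow Perl, the patched branch can be approximated
in shell. This is only a sketch of the logic in the hunk above, not code from
MTT: padb_report and the PADB override are hypothetical names, and the
message in the else branch is a placeholder because the original else branch
is not shown in the patch.

```shell
#!/bin/sh
# Sketch: emit the padb command line followed by its output, with no
# leading space before the command (the behaviour after the patch).
# padb_report and PADB are hypothetical; PADB only exists so the
# function can be tried on a machine without padb installed.
padb_report() {
    pid=$1
    padb_bin=${PADB:-padb}
    if command -v "$padb_bin" >/dev/null 2>&1; then
        padb_cmd="$padb_bin --config-option rmgr=mpirun --full-report=$pid"
        # Blank line, then the command itself starting in column one.
        printf '\n%s\n' "$padb_cmd"
        # Unquoted on purpose so the string splits into command and arguments.
        $padb_cmd
    else
        # Placeholder: the real else branch is not part of the patch.
        printf '\n%s not found, no padb report available\n' "$padb_bin"
    fi
}
```

Calling padb_report with the pid of the hung mpirun would then print the
command line and the report beneath it.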


Re: [MTT devel] Analysis of hung jobs.

2009-10-07 Thread Ashley Pittman
On Wed, 2009-10-07 at 16:21 -0400, Ethan Mallove wrote:

>   No secret file (/home/em162155/.padb-secret)
>   Error: Could not load secret file on this node

You need to do this once to set a secret key for security purposes. Run
the following two commands and try again.

echo secret=ochi4aeZ > /home/em162155/.padb-secret
chmod 0600 /home/em162155/.padb-secret
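
The same setup can be wrapped in a small function that generates a fresh key
instead of reusing the one quoted in this mail. This is a sketch only:
write_padb_secret is a hypothetical helper, and it assumes padb accepts any
opaque string after "secret=".

```shell
#!/bin/sh
# Sketch: create a padb secret file with a freshly generated key.
# write_padb_secret is a hypothetical helper, not part of padb.
write_padb_secret() {
    secret_file=${1:-"$HOME/.padb-secret"}
    # 8 random bytes rendered as 16 hex characters
    # (assumption: any opaque string is a valid secret).
    key=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    # Restrictive umask in a subshell so a newly created file is
    # never readable by other users, even briefly.
    (
        umask 077
        printf 'secret=%s\n' "$key" > "$secret_file"
    )
    # Covers the case where the file already existed with looser permissions.
    chmod 0600 "$secret_file"
}
```

Running write_padb_secret with no argument writes to .padb-secret in the
home directory of the current user.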

Ashley,




Re: [MTT devel] Analysis of hung jobs.

2009-10-07 Thread Ashley Pittman
On Wed, 2009-10-07 at 15:41 -0400, Ethan Mallove wrote:

> I got the following error doing a simple test:

As it happens, I saw this error earlier on FC8; r279 should fix this
problem.

>   $ perl --version
>   This is perl, v5.8.4 built for sun4-solaris-64int

I had wondered if you'd be using Solaris. It is not something I've
tested and not something I'd expect to work.  The stack trace code
should all be fine, but there might be some problems reading data
from /proc.  In the past padb has worked on Tru64; possibly all that
needs porting is getting the parent pid and process name from ps
rather than /proc/status.
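
As a rough illustration of the ps approach (a sketch, not padb code;
proc_info is a hypothetical name), the standard -o and -p options of ps
return both values without touching /proc:

```shell
#!/bin/sh
# Sketch: look up the parent pid and command name via ps instead of /proc.
# proc_info is a hypothetical helper, not padb code.
proc_info() {
    # "-o ppid=" and "-o comm=" select the two columns and suppress
    # their headers; -p restricts the listing to a single process.
    ps -o ppid= -o comm= -p "$1"
}

# Example: describe the current shell.
proc_info "$$"
```

The output is a single line holding the parent pid followed by the command
name, which is easy to split on whitespace.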

Ashley,




Re: [MTT devel] Analysis of hung jobs.

2009-10-06 Thread Ashley Pittman
On Tue, 2009-10-06 at 11:25 -0400, Ethan Mallove wrote:
> On Tue, Oct/06/2009 10:23:48AM, Ashley Pittman wrote:
> > 
> > Further to the mail linked below, padb is able to perform diagnostics,
> > including backtraces on hung jobs and integrates well into automated
> > testing environments.
> 
> Can padb get a backtrace from a non-debuggable MPI (e.g., not compiled
> with -g)?

It gets what is available from the application: without -g it will
give you function names only; with -g it will also give you file names
and line numbers and, optionally, variables with their types and values.

It can show the message queues regardless of the -g option.

Ashley.
