Re: [MTT devel] Analysis of hung jobs.

2009-10-07 Thread Ashley Pittman
On Wed, 2009-10-07 at 16:21 -0400, Ethan Mallove wrote:

>   No secret file (/home/em162155/.padb-secret)
>   Error: Could not load secret file on this node

You need to do this once to set a secret key for security purposes. Run
the following two commands and try again.

echo secret=ochi4aeZ > /home/em162155/.padb-secret
chmod 0600 /home/em162155/.padb-secret
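
If you would rather not reuse the example key above, the same two steps
can be wrapped in a small script that generates a random value. This is
only a sketch: write_padb_secret is a hypothetical helper (not part of
padb), and it assumes padb accepts any string after "secret=", which
this thread does not confirm.

```shell
#!/bin/sh
# Sketch only: write a random secret to a padb secret file.
# write_padb_secret is a hypothetical helper, not part of padb.
# Assumption: padb accepts an arbitrary string after "secret=".
write_padb_secret() {
    file=$1
    # 16 random bytes rendered as 32 lowercase hex characters.
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file under a restrictive umask so a new file is never
    # readable by other users, then set the mode explicitly in case the
    # file already existed with looser permissions.
    (umask 077 && printf 'secret=%s\n' "$key" > "$file")
    chmod 0600 "$file"
}
```

It would be called with the path named in the error message, for
example: write_padb_secret "$HOME/.padb-secret"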

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [MTT devel] Analysis of hung jobs.

2009-10-07 Thread Ashley Pittman
On Wed, 2009-10-07 at 15:41 -0400, Ethan Mallove wrote:

> I got the following error doing a simple test:

As it happens I saw this error earlier on FC8; r279 should fix this
problem.

>   $ perl --version
>   This is perl, v5.8.4 built for sun4-solaris-64int

I had wondered if you'd be using Solaris. This is not something I've
tested and not something I'd expect to work.  The stack trace code
should all be fine but there might be some problems reading data
from /proc.  In the past padb has worked on Tru64; possibly all that
needs porting would be getting the parent pid and process name from ps
rather than /proc/status.
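
As a rough illustration of that port, the two fields can be requested
from ps with the POSIX -o format option. This is a sketch, not padb
code: proc_info is a hypothetical helper, and it has not been checked
on Solaris or Tru64.

```shell
#!/bin/sh
# Sketch only: look up the parent pid and command name with ps
# instead of reading /proc. proc_info is a hypothetical helper.
proc_info() {
    # "-o ppid= -o comm=" selects the two fields; the empty header
    # after "=" suppresses the heading line.
    ps -o ppid= -o comm= -p "$1"
}

# Example: read this shell's own entry. "read" strips the leading
# padding and puts the rest of the line (the command name) in "name".
read -r ppid name <<EOF
$(proc_info $$)
EOF
echo "ppid=$ppid name=$name"
```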

Ashley,




Re: [MTT devel] Analysis of hung jobs.

2009-10-07 Thread Ethan Mallove
On Tue, Oct/06/2009 04:30:52PM, Ashley Pittman wrote:
> On Tue, 2009-10-06 at 11:25 -0400, Ethan Mallove wrote:
> > On Tue, Oct/06/2009 10:23:48AM, Ashley Pittman wrote:
> > > 
> > > Further to the mail linked below, padb is able to perform diagnostics,
> > > including backtraces on hung jobs and integrates well into automated
> > > testing environments.
> > 
> > Can padb get a backtrace from a non-debuggable MPI (e.g., not compiled
> > with -g)?
> 
> It gets what is available from the application: without -g it will
> give you function names only; with -g it will also give you file names
> and line numbers and, optionally, variables with their types and
> values.
> 
> It can show the message queues regardless of the -g option.

I got the following error doing a simple test:

  $ padb --config-option rmgr=mpirun --full-report=12480
  Nested quantifiers in regex; marked by <-- HERE in m/\A# Start of str.
 "# Quote
 ((?:[^"\\]++ <-- HERE |\\.)*+) # Anyting which isn't \"
 "# Close quote
 ,?   # An optional comma.
 (.*) # Rest of line
 \z   # end.
 / at padb line 5044.

  $ perl --version
  This is perl, v5.8.4 built for sun4-solaris-64int
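
For context on the error above: possessive quantifiers such as ++ and *+
were introduced in Perl 5.10, so Perl 5.8 reads the second + as a
quantifier applied to a quantifier. One way to express the same pattern
on older perls is with atomic groups, (?>...). The sketch below is an
illustration only and is not necessarily how padb addressed it;
parse_quoted is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: the quoted-field pattern using atomic groups (?>...)
# in place of the possessive quantifiers ++ and *+.
# parse_quoted is a hypothetical helper, not padb code.
parse_quoted() {
    perl -e '
        my $re = qr/\A"((?>(?:(?>[^"\\]+)|\\.)*))",?(.*)\z/;
        if ($ARGV[0] =~ $re) { print "$1|$2\n"; } else { exit 1; }
    ' -- "$1"
}

# Example: split a leading quoted field from the rest of the line.
parse_quoted '"hello world",tail'
```

The helper prints the quoted field and the remainder separated by "|",
and exits non-zero when the input does not start with a quoted field.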

-Ethan

> 
> Ashley.