Actually, SLURM 2.2.0, which I am using now, does support parallel debuggers:
srun provides the needed information, fills in the proc_table, and sets all the
debug variables correctly. The only problem I see so far is the one I
described. Maybe the solution would be to check whether the job was started by
something other than orterun, and then (or instead) check MPIR_debug_gate
before waiting for the signal.
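
To make the suggestion concrete, here is a rough sketch of the check I have in
mind. It is not the actual Open MPI code. MPIR_being_debugged and
MPIR_debug_gate are the usual MPIR debugger-interface variables, defined
locally here only so that the fragment stands alone. The launched_by_orterun
flag, the enum, and both helper functions are hypothetical names made up for
illustration, and the poll limit exists only so that the sketch terminates.

```c
/* MPIR debugger-interface variables. In a real MPI library these are defined
 * by the library and written by the debugger; they are defined here only so
 * that the sketch is self-contained. */
volatile int MPIR_being_debugged = 0;
volatile int MPIR_debug_gate = 0;

/* Hypothetical: the ways a rank could wait for the debugger. */
enum wait_path {
    NO_WAIT,               /* no debugger attached, continue at once */
    WAIT_FOR_HNP_MESSAGE,  /* started by orterun, which sends the release */
    WAIT_ON_DEBUG_GATE     /* started by something else, e.g. srun */
};

/* Hypothetical helper: pick the wait path. launched_by_orterun is an assumed
 * flag; how the library would actually detect its starter is not shown. */
static enum wait_path choose_wait_path(int launched_by_orterun)
{
    if (!MPIR_being_debugged) {
        return NO_WAIT;
    }
    if (launched_by_orterun) {
        return WAIT_FOR_HNP_MESSAGE;
    }
    return WAIT_ON_DEBUG_GATE;
}

/* Hypothetical helper: poll the gate until the debugger sets it to a nonzero
 * value. Returns 1 if the gate opened, 0 if max_polls was reached first.
 * Real code would sleep between polls and would not give up. */
static int spin_on_debug_gate(int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (MPIR_debug_gate != 0) {
            return 1;
        }
    }
    return 0;
}
```

With something like this, a rank started by srun would take the
WAIT_ON_DEBUG_GATE path and never block on a message that only orterun can
send.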

Nikolay Piskun | Director of Continuing Engineering | Totalview Technologies |
Rogue Wave Software Inc  |  24 Prime Parkway, Natick, MA 01760 | p 508-652-7739|
nikolay.pis...@roguewave.com<mailto:niko...@totalviewtech.com>
www.roguewave.com<http://www.roguewave.com>

From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Ralph Castain
Sent: Thursday, February 10, 2011 10:47 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in 
OMPI)

If you srun a job, then there is no "mpirun" to provide a proc_table. So
running a job directly via srun means you cannot run TV on it.


On Feb 10, 2011, at 8:34 AM, Nikolay Piskun wrote:



Hi,
I am trying to use Totalview with srun and have hit an interesting problem. It
looks like, if OMPI is started by "srun   -mpi=ompi ", the MPI job hangs in the
ompi_wait_for_debugger() subroutine. What happens, I think, is that OMPI was
compiled without ORTE_DISABLE_FULL_SUPPORT, and as a result rank 0 waits for a
message from the HNP (by the way, what is HNP?) that was supposed to be sent by
orterun. The problem is that orterun was never invoked, because MPI was started
by srun, not orterun. So what is the solution? Should we always compile OMPI
with ORTE_DISABLE_FULL_SUPPORT=true for anything that uses a different starter,
such as srun from SLURM?
Thanks,
Nikolay

