Thanks much, looks like this should work. The patch is one line:

--------------------------------------------------------------
diff -c ompi_debuggers.c ompi_debuggers.c.old
*** ompi_debuggers.c Thu Feb 10 15:13:07 2011
--- ompi_debuggers.c.old Fri Jan 22 09:21:23 2010
***************
*** 222,228 ****
      mpimsgq_dll_locations = tmp1;
      mpidbg_dll_locations = tmp2;

!     if (ORTE_DISABLE_FULL_SUPPORT || orte_standalone_operation) {
          /* spin until debugger attaches and releases us */
          while (MPIR_debug_gate == 0) {
  #if defined(__WINDOWS__)
--- 222,228 ----
      mpimsgq_dll_locations = tmp1;
      mpidbg_dll_locations = tmp2;

!     if (ORTE_DISABLE_FULL_SUPPORT) {
          /* spin until debugger attaches and releases us */
          while (MPIR_debug_gate == 0) {
  #if defined(__WINDOWS__)
----------------------------------------------------------------

What would be the best way to put it in?

--
Nikolay Piskun
Director of Continuing Engineering
TotalView Technologies, Rogue Wave Software company
mailto:niko...@totalviewtech.com
phone: 508-652-7739
24 Prime Parkway, Natick, MA 01760
http://www.totalviewtech.com

________________________________________
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org]
Sent: Thursday, February 10, 2011 12:42 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

FWIW: there already is a flag in ORTE that gets set when procs are launched by a non-orterun entity: orte_standalone_operation. So all you would have to do is add an appropriate check for that flag to be true.

On Feb 10, 2011, at 9:18 AM, Jeff Squyres wrote:

> I think what Ralph was trying to say is that Open MPI doesn't (currently)
> support running parallel debuggers when only srun is used (and mpirun is not).
>
> We'd certainly be open to someone submitting a patch to enable this
> functionality, though!
>
>
> On Feb 10, 2011, at 8:02 AM, Nikolay Piskun wrote:
>
>> Actually, in SLURM 2.2.0, which I am using now, there is support for
>> parallel debuggers: srun does provide the needed info, fills proc_table, and
>> sets up all the debug variables correctly. The only problem that I see so far
>> is the one that I described. Maybe the solution would be to check whether the
>> job was started by something other than orterun and then/or check
>> MPIR_debug_gate before waiting for the signal.
>>
>> Nikolay Piskun | Director of Continuing Engineering | Totalview Technologies |
>> Rogue Wave Software Inc | 24 Prime Parkway, Natick, MA 01760 | p 508-652-7739 |
>> nikolay.pis...@roguewave.com
>> www.roguewave.com
>>
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Thursday, February 10, 2011 10:47 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)
>>
>> If you srun a job, then there is no "mpirun" to provide a proc_table. So
>> running a job directly via srun means you cannot run TV on it.
>>
>>
>> On Feb 10, 2011, at 8:34 AM, Nikolay Piskun wrote:
>>
>> Hi,
>> I am trying to use TotalView with srun and hit an interesting problem. It
>> looks like if OMPI is started by "srun --mpi=ompi ", the MPI job hangs in the
>> ompi_wait_for_debugger() subroutine. What happens, I think, is that OMPI was
>> compiled without ORTE_DISABLE_FULL_SUPPORT, and as a result rank 0 is waiting
>> for a message from the HNP (by the way, what is HNP?) that was supposed to be
>> sent by orterun. The problem is that orterun was never invoked, because MPI
>> was initiated by srun, not orterun. So what is the solution? Should we always
>> compile OMPI with ORTE_DISABLE_FULL_SUPPORT=true for anything that uses a
>> different starter, like srun from SLURM?
>> Thanks
>> Nikolay
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
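
For readers following the thread, here is a rough, standalone C sketch of the decision the one-line patch changes. It is not the Open MPI source. ORTE_DISABLE_FULL_SUPPORT, orte_standalone_operation and MPIR_debug_gate are the names used in the thread, but here they are modeled as plain parameters and a local variable. The helpers should_spin_on_gate() and spin_until_released() are hypothetical names invented for this illustration, and the spin is bounded only so the sketch terminates (the loop in the patch spins until the debugger releases the process).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the gate variable named in the thread. A debugger is
 * expected to set it to a nonzero value after it has attached. */
volatile int MPIR_debug_gate = 0;

/* Hypothetical helper: mirrors the condition in the patched "if".
 * Before the patch only full_support_disabled was checked; the patch
 * also spins when the job was started by something other than orterun
 * (for example srun), because nothing will send the release message. */
bool should_spin_on_gate(bool full_support_disabled, bool standalone_operation)
{
    return full_support_disabled || standalone_operation;
}

/* Hypothetical helper: polls the gate at most max_spins times.
 * Returns true if the gate was opened, false if the budget ran out.
 * The bound is an assumption of this sketch, not part of the patch. */
bool spin_until_released(long max_spins)
{
    assert(max_spins >= 0);
    for (long i = 0; i < max_spins; i++) {
        if (MPIR_debug_gate != 0) {
            return true;
        }
    }
    return MPIR_debug_gate != 0;
}
```

With these helpers, a process launched directly by srun would evaluate should_spin_on_gate(false, true), get true, and poll the gate itself rather than wait for a message that, per the thread, only orterun would send.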