Thanks much, looks like this should work. The patch is one line:
-------------------------------------------------------------- 
 diff -c ompi_debuggers.c ompi_debuggers.c.old 
*** ompi_debuggers.c    Thu Feb 10 15:13:07 2011
--- ompi_debuggers.c.old        Fri Jan 22 09:21:23 2010
***************
*** 222,228 ****
      mpimsgq_dll_locations = tmp1;
      mpidbg_dll_locations = tmp2;

!     if (ORTE_DISABLE_FULL_SUPPORT || orte_standalone_operation) {
          /* spin until debugger attaches and releases us */
          while (MPIR_debug_gate == 0) {
  #if defined(__WINDOWS__)
--- 222,228 ----
      mpimsgq_dll_locations = tmp1;
      mpidbg_dll_locations = tmp2;

!     if (ORTE_DISABLE_FULL_SUPPORT) {
          /* spin until debugger attaches and releases us */
          while (MPIR_debug_gate == 0) {
  #if defined(__WINDOWS__)
----------------------------------------------------------------
 What would be the best way to put it in? 

--
Nikolay Piskun
Director of Continuing Engineering
TotalView Technologies, Rogue Wave Software company
mailto:niko...@totalviewtech.com   phone: 508-652-7739
24 Prime Parkway,          Natick, MA 01760
http://www.totalviewtech.com
________________________________________
From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of 
Ralph Castain [r...@open-mpi.org]
Sent: Thursday, February 10, 2011 12:42 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

FWIW: there already is a flag in ORTE that gets set when procs are launched by 
a non-orterun entity: orte_standalone_operation. So all you would have to do is 
add an appropriate check for that flag to be true.
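
As a rough illustration of that check, here is a minimal, self-contained C sketch. It is not the Open MPI source: must_spin_on_gate(), spin_until_released(), and fake_debugger_idle() are hypothetical names made up for this example, and MPIR_debug_gate is a local stand-in for the real variable that the debugger sets once it has attached. The idea is only that a process launched by something other than orterun should poll the gate itself rather than wait for a message that nobody will send.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the real MPIR_debug_gate (assumption for this sketch):
 * a debugger writes a nonzero value here once it has attached. */
static volatile int MPIR_debug_gate = 0;

/* Hypothetical helper: should this process spin on the gate itself?
 * True when full ORTE support is disabled OR when the process was
 * launched by a non-orterun entity (e.g. srun). */
static bool must_spin_on_gate(bool disable_full_support, bool standalone)
{
    return disable_full_support || standalone;
}

/* Hypothetical helper: poll the gate until it is released, calling
 * idle() between polls (real code would presumably sleep briefly).
 * Returns the number of polls that found the gate still closed. */
static unsigned long spin_until_released(void (*idle)(void))
{
    unsigned long polls = 0;
    while (MPIR_debug_gate == 0) {
        polls++;
        if (idle != NULL) {
            idle();
        }
    }
    return polls;
}

/* Test stand-in for a debugger: opens the gate on the third idle call. */
static int fake_polls_left = 3;
static void fake_debugger_idle(void)
{
    if (--fake_polls_left == 0) {
        MPIR_debug_gate = 1;
    }
}
```

In this sketch, must_spin_on_gate(false, true) corresponds to an srun launch of a build that has full ORTE support: the standalone flag alone is enough to select the spin loop.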


On Feb 10, 2011, at 9:18 AM, Jeff Squyres wrote:

> I think what Ralph was trying to say is that Open MPI doesn't (currently) 
> support running parallel debuggers when only srun is used (and mpirun is not).
>
> We'd certainly be open to someone submitting a patch to enable this 
> functionality, though!
>
>
> On Feb 10, 2011, at 8:02 AM, Nikolay Piskun wrote:
>
>> Actually, SLURM 2.2.0, which I am using now, does support parallel 
>> debuggers: srun provides the needed info, fills in proc_table, and sets up 
>> all the debug variables correctly. The only problem I see so far is the one 
>> I described. Maybe the solution would be to check whether the job was 
>> started by something other than orterun and then, or instead, check 
>> MPIR_debug_gate before waiting for the signal.
>>
>> Nikolay Piskun | Director of Continuing Engineering | Totalview Technologies 
>> |
>> Rogue Wave Software Inc  |  24 Prime Parkway, Natick, MA 01760 | p 
>> 508-652-7739|
>> nikolay.pis...@roguewave.com
>> www.roguewave.com
>>
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
>> Behalf Of Ralph Castain
>> Sent: Thursday, February 10, 2011 10:47 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang 
>> in OMPI)
>>
>> If you srun a job, then there is no "mpirun" to provide a proc_table. So 
>> running a job directly via srun means you cannot run TV on it.
>>
>>
>> On Feb 10, 2011, at 8:34 AM, Nikolay Piskun wrote:
>>
>>
>>
>>   Hi,
>> I am trying to use TotalView with srun and have hit an interesting problem. 
>> It looks like if OMPI is started by "srun --mpi=ompi", the MPI job hangs in 
>> the ompi_wait_for_debugger() subroutine. What happens, I think, is that OMPI 
>> was compiled without ORTE_DISABLE_FULL_SUPPORT, and as a result rank 0 is 
>> waiting for a message from the HNP (by the way, what is HNP?) that was 
>> supposed to be sent by orterun. The problem is that orterun was never 
>> invoked, because MPI was initiated by srun, not orterun. So what is the 
>> solution? Should we always compile OMPI with ORTE_DISABLE_FULL_SUPPORT=true 
>> for anything that uses a different starter, such as srun from SLURM?
>> Thanks
>> Nikolay
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>


