[OMPI devel] ORTE Scaling results: updated

2008-04-08 Thread Ralph H Castain
Hello all

The wiki page has been updated with the latest test results from a new
branch that implemented inbound collectives on the modex and barrier
operations. As you will see from the graphs, ORTE/OMPI now exhibits a
negative second derivative on the launch-time curve for mpi_no_op (i.e.,
a program that just calls MPI_Init/MPI_Finalize).

Some cleanup of the branch code is required before insertion into the trunk.
I'll send out a note when that occurs.

The wiki page is at:

https://svn.open-mpi.org/trac/ompi/wiki/ORTEScalabilityTesting

Ralph




Re: [OMPI devel] MPI_Comm_connect/Accept

2008-04-08 Thread Aurélien Bouteiller

Still no luck here,

I launch these three processes:
term1$ ompi-server -d --report-uri URIFILE

term2$ mpirun -mca routed unity -ompi-server file:URIFILE -np 1 simple_accept

term3$ mpirun -mca routed unity -ompi-server file:URIFILE -np 1 simple_connect


The output of ompi-server shows a successful publish and lookup. I get
the correct port on the client side. However, the result is the same
as when not using the Publish/Lookup mechanism: the connect fails,
saying the port cannot be reached.

Found port < 1940389889.0;tcp://160.36.252.99:49777;tcp6://2002:a024:ed65:9:21b:63ff:fecb:28:49778;tcp6://fec0::9:21b:63ff:fecb:28:49778;tcp6://2002:a024:ff7f:9:21b:63ff:fecb:28:49778:300 >
[abouteil.nomad.utk.edu:60339] [[29620,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../trunk/orte/mca/rml/oob/rml_oob_send.c at line 140
[abouteil.nomad.utk.edu:60339] [[29620,1],0] attempted to send to [[29608,1],0]
[abouteil.nomad.utk.edu:60339] [[29620,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../trunk/ompi/mca/dpm/orte/dpm_orte.c at line 455

[abouteil.nomad.utk.edu:60339] *** An error occurred in MPI_Comm_connect
[abouteil.nomad.utk.edu:60339] *** on communicator MPI_COMM_SELF
[abouteil.nomad.utk.edu:60339] *** MPI_ERR_UNKNOWN: unknown error
[abouteil.nomad.utk.edu:60339] *** MPI_ERRORS_ARE_FATAL (goodbye)

I took a look in the source code, and I think the problem comes from a
conceptual mistake in MPI_Comm_connect. The function "connect_accept" in
dpm_orte.c takes an orte_process_name_t as the destination port. This
structure only contains the jobid and the vpid (always set to 0, I
guess meaning you plan to contact the HNP of that job). Obviously, if
the accepting process does not share the same HNP with the connecting
process, there is no way for the MPI_Comm_connect function to fill
this field correctly. The whole purpose of the port_name string is to
provide a consistent way to access the remote endpoint without a
complicated name-resolution service. I think this function should instead
take the port_name (the string returned by MPI_Open_port) and contact that
endpoint directly over the OOB to get the contact information it needs
from there, rather than from the local HNP.
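
For reference, here is a minimal sketch of the accept/connect pattern being
exercised (the actual simple_accept/simple_connect test programs may differ;
the service name "test_service" and the argv-based role switch are purely
illustrative):

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch of MPI-2 dynamic connect/accept via publish/lookup.
 * Run one copy with an extra argument to act as the accept side. */
int main(int argc, char **argv)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1) {                       /* accept side */
        MPI_Open_port(MPI_INFO_NULL, port);
        MPI_Publish_name("test_service", MPI_INFO_NULL, port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Unpublish_name("test_service", MPI_INFO_NULL, port);
    } else {                              /* connect side */
        MPI_Lookup_name("test_service", MPI_INFO_NULL, port);
        printf("Found port < %s >\n", port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}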


Aurelien

On Apr 4, 2008, at 3:21 PM, Ralph H Castain wrote:
Okay, I have a partial fix in there now. You'll have to use "-mca routed
unity" as I still need to fix it for routed tree.

Couple of things:

1. I fixed the --debug flag so it automatically turns on the debug output
from the data server code itself. Now ompi-server will tell you when it is
accessed.

2. Remember, we added an MPI_Info key that specifies if you want the data
stored locally (on your own mpirun) or globally (on the ompi-server). If you
specify nothing, there is a precedence built into the code that defaults to
"local". So you have to tell us that this data is to be published "global"
if you want to connect multiple mpiruns.
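
For example, the publish side would pass an MPI_Info like the sketch below.
This is only a sketch: the exact key name (assumed here to be
"ompi_global_scope") and its accepted values should be checked against
pubsub_orte.c.

#include <mpi.h>

/* Sketch only: request "global" scope when publishing so a second
 * mpirun can look the name up via ompi-server.  The key
 * "ompi_global_scope" and value "true" are assumptions - see
 * ompi/mca/pubsub/orte/pubsub_orte.c for the real key/values. */
static void publish_globally(const char *service, const char *port)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "ompi_global_scope", "true");
    MPI_Publish_name(service, info, port);
    MPI_Info_free(&info);
}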

I believe Jeff wrote all that up somewhere - could be in an email thread,
though. Been too long ago for me to remember... ;-) You can look it up in
the code though as a last resort - it is in
ompi/mca/pubsub/orte/pubsub_orte.c.

Ralph



On 4/4/08 12:55 PM, "Ralph H Castain"  wrote:

Well, something got borked in here - will have to fix it, so this will
probably not get done until next week.


On 4/4/08 12:26 PM, "Ralph H Castain"  wrote:

Yeah, you didn't specify the file correctly... plus I found a bug in the
code when I looked (out-of-date a little in orterun).

I am updating orterun (commit soon) and will include a better help message
about the proper format of the orterun cmd-line option. The syntax is:


-ompi-server uri

or -ompi-server file:filename-where-uri-exists

Problem here is that you gave it a uri of "test", which means nothing. ;-)


Should have it up-and-going soon.
Ralph

On 4/4/08 12:02 PM, "Aurélien Bouteiller" wrote:



Ralph,

I've not been very successful at using ompi-server. I tried this:

xterm1$ ompi-server --debug-devel -d --report-uri test
[grosse-pomme.local:01097] proc_info: hnp_uri NULL
daemon uri NULL
[grosse-pomme.local:01097] [[34900,0],0] ompi-server: up and running!



xterm2$ mpirun -ompi-server test -np 1 mpi_accept_test
Port name:
2285895681.0;tcp://192.168.0.101:50065;tcp://192.168.0.150:50065:300


xterm3$ mpirun -ompi-server test  -np 1 simple_connect
--
Process rank 0 attempted to lookup from a global ompi_server that
could not be contacted. This is typically caused by either not
specifying the contact info for the server, or by the server not
currently executing. If you did specify the contact info for a
server, please check to see that the server is running and start
it again (or have your sys admin s

Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham



On 4/8/08 2:19 PM, "Ralph H Castain"  wrote:

> 
> 
> 
> On 4/8/08 12:10 PM, "Pak Lui"  wrote:
> 
>> Richard Graham wrote:
>>> What happens if I deliver sigusr2 to mpirun ?  What I observe (for both
>>> ssh/rsh and torque) that if I deliver a sigusr2 to mpirun, the signal does
>>> get propagated to the mpi procs, which do invoke the signal handler I
>>> registered, but the job is terminated right after that.  However, if I
>>> deliver the signal directly to the mpi procs, the signal handler is invoked,
>>> and the job continues to run.
>> 
>> This is exactly what I have observed previously when I made the
>> gridengine change. It is due to the fact that orterun (aka mpirun) is
>> the process fork and exec'ing the executables on the HNP. e.g. On the
>> remote nodes, you don't have this problem. So the wait_daemon function
>> picks up the signal from mpirun on HNP, then kill off the children.
> 
> I'll look into this, but I don't believe this is true UNLESS something
> exits. The wait_daemon function only gets called when a proc terminates - it
> doesn't "pickup" a signal on its own. Perhaps we are just having a language
> problem here...
> 
> In the rsh situation, the daemon "daemonizes" and closes the ssh session
> during launch. If the ssh session closed on a signal, then that would return
> and indicate that a daemon had failed to start, causing the abort. But that
> session is successfully closed PRIOR to the launch of any MPI procs. I note
> that we don't "deregister" the waitpid, though, so there may be some issue
> there.
> 
> However, we most certainly do NOT look for such things in Torque. My guess
> is that something is causing a proc/daemon to abort, which then causes the
> system to abort the job.
> 
> I have tried this on my Mac (got other things going on at the moment on the
> distributed machines), and all works as expected. However, that doesn't mean
> there isn't a problem in general.

Interesting - I do most of my development work on the Mac, and this is where
I also see the problem. I have not updated in a couple of days, so maybe
things have been fixed since then.

Rich

> 
> Will investigate when I have time shortly.
> 
>> 
>>> 
>>> So, I think that what was intended to happen is the correct thing, but for
>>> some reason it is not happening.
>>> 
>>> Rich
>>> 
>>> 
>>> On 4/8/08 1:47 PM, "Ralph H Castain"  wrote:
>>> 
 I found what Pak said a little confusing as the wait_daemon function
 doesn't
 actually receive a signal itself - it only detects that a proc has exited
 and checks to see if that happened due to a signal. If so, it flags that
 situation and will order the job aborted.
 
 So if the proc continues alive, the fact that it was hit with SIGUSR2 will
 not be detected by ORTE nor will anything happen - however, if the OS uses
 SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
 signal, we will see that proc terminate due to signal and abort the rest of
 the job.
 
 We could change it if that is what people want - it is trivial to insert
 code to say "kill everything except if it died due to a certain signal".
 
  up to you folks. Current behavior is what you said you wanted a
 long
 time ago - nothing has changed in this regard for several years.
 
 
 On 4/8/08 11:36 AM, "Pak Lui"  wrote:
 
> First, can your user executable create a signal handler to catch the
> SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
> you catch the signal and have the process to do nothing.
> 
> from signal(3HEAD)
>   Name Value   DefaultEvent
>   SIGUSR1  16  Exit   User Signal 1
>   SIGUSR2  17  Exit   User Signal 2
> 
> The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
> might cause the processes to exit if the orted (or mpirun if it's on
> HNP) receives a signal like SIGUSR2; it'd work on killing all the user
> processes on that node once it receives a signal.
> 
> I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
> receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
> acknowledge a signal returns, without declaring the launch_failed which
> would kill off the user processes. The signals would also get passed to
> the user processes, and let them decide what to do with the signals
> themselves.
> 
> SGE needed this so the job kill or job suspension notification to work
> properly since they would send a SIGUSR1/2 to mpirun. I believe this is
> probably what you need in the rsh plm.
> 
> Richard Graham wrote:
>> I am running into a situation where I am trying to deliver a signal to
>> the
>> mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to
>> the
>> mpi procs, but then proceeds to kill the children.  Is there 

Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain



On 4/8/08 12:10 PM, "Pak Lui"  wrote:

> Richard Graham wrote:
>> What happens if I deliver sigusr2 to mpirun ?  What I observe (for both
>> ssh/rsh and torque) that if I deliver a sigusr2 to mpirun, the signal does
>> get propagated to the mpi procs, which do invoke the signal handler I
>> registered, but the job is terminated right after that.  However, if I
>> deliver the signal directly to the mpi procs, the signal handler is invoked,
>> and the job continues to run.
> 
> This is exactly what I have observed previously when I made the
> gridengine change. It is due to the fact that orterun (aka mpirun) is
> the process fork and exec'ing the executables on the HNP. e.g. On the
> remote nodes, you don't have this problem. So the wait_daemon function
> picks up the signal from mpirun on HNP, then kill off the children.

I'll look into this, but I don't believe this is true UNLESS something
exits. The wait_daemon function only gets called when a proc terminates - it
doesn't "pick up" a signal on its own. Perhaps we are just having a language
problem here...

In the rsh situation, the daemon "daemonizes" and closes the ssh session
during launch. If the ssh session closed on a signal, then that would return
and indicate that a daemon had failed to start, causing the abort. But that
session is successfully closed PRIOR to the launch of any MPI procs. I note
that we don't "deregister" the waitpid, though, so there may be some issue
there.

However, we most certainly do NOT look for such things in Torque. My guess
is that something is causing a proc/daemon to abort, which then causes the
system to abort the job.

I have tried this on my Mac (got other things going on at the moment on the
distributed machines), and all works as expected. However, that doesn't mean
there isn't a problem in general.

Will investigate when I have time shortly.

> 
>> 
>> So, I think that what was intended to happen is the correct thing, but for
>> some reason it is not happening.
>> 
>> Rich
>> 
>> 
>> On 4/8/08 1:47 PM, "Ralph H Castain"  wrote:
>> 
>>> I found what Pak said a little confusing as the wait_daemon function doesn't
>>> actually receive a signal itself - it only detects that a proc has exited
>>> and checks to see if that happened due to a signal. If so, it flags that
>>> situation and will order the job aborted.
>>> 
>>> So if the proc continues alive, the fact that it was hit with SIGUSR2 will
>>> not be detected by ORTE nor will anything happen - however, if the OS uses
>>> SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
>>> signal, we will see that proc terminate due to signal and abort the rest of
>>> the job.
>>> 
>>> We could change it if that is what people want - it is trivial to insert
>>> code to say "kill everything except if it died due to a certain signal".
>>> 
>>>  up to you folks. Current behavior is what you said you wanted a long
>>> time ago - nothing has changed in this regard for several years.
>>> 
>>> 
>>> On 4/8/08 11:36 AM, "Pak Lui"  wrote:
>>> 
 First, can your user executable create a signal handler to catch the
 SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
 you catch the signal and have the process to do nothing.
 
 from signal(3HEAD)
   Name Value   DefaultEvent
   SIGUSR1  16  Exit   User Signal 1
   SIGUSR2  17  Exit   User Signal 2
 
 The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
 might cause the processes to exit if the orted (or mpirun if it's on
 HNP) receives a signal like SIGUSR2; it'd work on killing all the user
 processes on that node once it receives a signal.
 
 I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
 receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
 acknowledge a signal returns, without declaring the launch_failed which
 would kill off the user processes. The signals would also get passed to
 the user processes, and let them decide what to do with the signals
 themselves.
 
 SGE needed this so the job kill or job suspension notification to work
 properly since they would send a SIGUSR1/2 to mpirun. I believe this is
 probably what you need in the rsh plm.
 
 Richard Graham wrote:
> I am running into a situation where I am trying to deliver a signal to the
> mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
> mpi procs, but then proceeds to kill the children.  Is there an easy way
> that I can get around this ?  I am using this mechanism in a situation
> where
> I don't have a debugger, and trying to use this to turn on debugging when
> I
> hit a hang, so killing the mpi procs is really not what I want to have
> happen.
> 
> Thanks,
> Rich
> 
> ___
> devel maili

Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui

Richard Graham wrote:

What happens if I deliver sigusr2 to mpirun ?  What I observe (for both
ssh/rsh and torque) that if I deliver a sigusr2 to mpirun, the signal does
get propagated to the mpi procs, which do invoke the signal handler I
registered, but the job is terminated right after that.  However, if I
deliver the signal directly to the mpi procs, the signal handler is invoked,
and the job continues to run.


This is exactly what I have observed previously when I made the
gridengine change. It is due to the fact that orterun (aka mpirun) is
the process forking and exec'ing the executables on the HNP; on the
remote nodes you don't have this problem. So the wait_daemon function
picks up the signal from mpirun on the HNP, then kills off the children.




So, I think that what was intended to happen is the correct thing, but for
some reason it is not happening.

Rich


On 4/8/08 1:47 PM, "Ralph H Castain"  wrote:


I found what Pak said a little confusing as the wait_daemon function doesn't
actually receive a signal itself - it only detects that a proc has exited
and checks to see if that happened due to a signal. If so, it flags that
situation and will order the job aborted.

So if the proc continues alive, the fact that it was hit with SIGUSR2 will
not be detected by ORTE nor will anything happen - however, if the OS uses
SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
signal, we will see that proc terminate due to signal and abort the rest of
the job.

We could change it if that is what people want - it is trivial to insert
code to say "kill everything except if it died due to a certain signal".

 up to you folks. Current behavior is what you said you wanted a long
time ago - nothing has changed in this regard for several years.


On 4/8/08 11:36 AM, "Pak Lui"  wrote:


First, can your user executable create a signal handler to catch the
SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
you catch the signal and have the process to do nothing.

from signal(3HEAD)
  Name Value   DefaultEvent
  SIGUSR1  16  Exit   User Signal 1
  SIGUSR2  17  Exit   User Signal 2

The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
might cause the processes to exit if the orted (or mpirun if it's on
HNP) receives a signal like SIGUSR2; it'd work on killing all the user
processes on that node once it receives a signal.

I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
acknowledge a signal returns, without declaring the launch_failed which
would kill off the user processes. The signals would also get passed to
the user processes, and let them decide what to do with the signals
themselves.

SGE needed this so the job kill or job suspension notification to work
properly since they would send a SIGUSR1/2 to mpirun. I believe this is
probably what you need in the rsh plm.

Richard Graham wrote:

I am running into a situation where I am trying to deliver a signal to the
mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
mpi procs, but then proceeds to kill the children.  Is there an easy way
that I can get around this ?  I am using this mechanism in a situation where
I don't have a debugger, and trying to use this to turn on debugging when I
hit a hang, so killing the mpi procs is really not what I want to have
happen.

Thanks,
Rich

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--

- Pak Lui
pak@sun.com


Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain
Hmmm...well, I'll take a look. I haven't seen that behavior, but I haven't
checked it in some time.


On 4/8/08 11:54 AM, "Richard Graham"  wrote:

> What happens if I deliver sigusr2 to mpirun ?  What I observe (for both
> ssh/rsh and torque) that if I deliver a sigusr2 to mpirun, the signal does
> get propagated to the mpi procs, which do invoke the signal handler I
> registered, but the job is terminated right after that.  However, if I
> deliver the signal directly to the mpi procs, the signal handler is invoked,
> and the job continues to run.
> 
> So, I think that what was intended to happen is the correct thing, but for
> some reason it is not happening.
> 
> Rich
> 
> 
> On 4/8/08 1:47 PM, "Ralph H Castain"  wrote:
> 
>> I found what Pak said a little confusing as the wait_daemon function doesn't
>> actually receive a signal itself - it only detects that a proc has exited
>> and checks to see if that happened due to a signal. If so, it flags that
>> situation and will order the job aborted.
>> 
>> So if the proc continues alive, the fact that it was hit with SIGUSR2 will
>> not be detected by ORTE nor will anything happen - however, if the OS uses
>> SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
>> signal, we will see that proc terminate due to signal and abort the rest of
>> the job.
>> 
>> We could change it if that is what people want - it is trivial to insert
>> code to say "kill everything except if it died due to a certain signal".
>> 
>>  up to you folks. Current behavior is what you said you wanted a long
>> time ago - nothing has changed in this regard for several years.
>> 
>> 
>> On 4/8/08 11:36 AM, "Pak Lui"  wrote:
>> 
>>> First, can your user executable create a signal handler to catch the
>>> SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
>>> you catch the signal and have the process to do nothing.
>>> 
>>> from signal(3HEAD)
>>>   Name Value   DefaultEvent
>>>   SIGUSR1  16  Exit   User Signal 1
>>>   SIGUSR2  17  Exit   User Signal 2
>>> 
>>> The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
>>> might cause the processes to exit if the orted (or mpirun if it's on
>>> HNP) receives a signal like SIGUSR2; it'd work on killing all the user
>>> processes on that node once it receives a signal.
>>> 
>>> I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
>>> receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
>>> acknowledge a signal returns, without declaring the launch_failed which
>>> would kill off the user processes. The signals would also get passed to
>>> the user processes, and let them decide what to do with the signals
>>> themselves.
>>> 
>>> SGE needed this so the job kill or job suspension notification to work
>>> properly since they would send a SIGUSR1/2 to mpirun. I believe this is
>>> probably what you need in the rsh plm.
>>> 
>>> Richard Graham wrote:
 I am running into a situation where I am trying to deliver a signal to the
 mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
 mpi procs, but then proceeds to kill the children.  Is there an easy way
 that I can get around this ?  I am using this mechanism in a situation
 where
 I don't have a debugger, and trying to use this to turn on debugging when I
 hit a hang, so killing the mpi procs is really not what I want to have
 happen.
 
 Thanks,
 Rich
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Signals

2008-04-08 Thread Richard Graham
What happens if I deliver SIGUSR2 to mpirun?  What I observe (for both
ssh/rsh and Torque) is that if I deliver a SIGUSR2 to mpirun, the signal does
get propagated to the mpi procs, which do invoke the signal handler I
registered, but the job is terminated right after that.  However, if I
deliver the signal directly to the mpi procs, the signal handler is invoked,
and the job continues to run.

So, I think that what was intended to happen is the correct thing, but for
some reason it is not happening.

Rich


On 4/8/08 1:47 PM, "Ralph H Castain"  wrote:

> I found what Pak said a little confusing as the wait_daemon function doesn't
> actually receive a signal itself - it only detects that a proc has exited
> and checks to see if that happened due to a signal. If so, it flags that
> situation and will order the job aborted.
> 
> So if the proc continues alive, the fact that it was hit with SIGUSR2 will
> not be detected by ORTE nor will anything happen - however, if the OS uses
> SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
> signal, we will see that proc terminate due to signal and abort the rest of
> the job.
> 
> We could change it if that is what people want - it is trivial to insert
> code to say "kill everything except if it died due to a certain signal".
> 
>  up to you folks. Current behavior is what you said you wanted a long
> time ago - nothing has changed in this regard for several years.
> 
> 
> On 4/8/08 11:36 AM, "Pak Lui"  wrote:
> 
>> First, can your user executable create a signal handler to catch the
>> SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
>> you catch the signal and have the process to do nothing.
>> 
>> from signal(3HEAD)
>>   Name Value   DefaultEvent
>>   SIGUSR1  16  Exit   User Signal 1
>>   SIGUSR2  17  Exit   User Signal 2
>> 
>> The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
>> might cause the processes to exit if the orted (or mpirun if it's on
>> HNP) receives a signal like SIGUSR2; it'd work on killing all the user
>> processes on that node once it receives a signal.
>> 
>> I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
>> receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
>> acknowledge a signal returns, without declaring the launch_failed which
>> would kill off the user processes. The signals would also get passed to
>> the user processes, and let them decide what to do with the signals
>> themselves.
>> 
>> SGE needed this so the job kill or job suspension notification to work
>> properly since they would send a SIGUSR1/2 to mpirun. I believe this is
>> probably what you need in the rsh plm.
>> 
>> Richard Graham wrote:
>>> I am running into a situation where I am trying to deliver a signal to the
>>> mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
>>> mpi procs, but then proceeds to kill the children.  Is there an easy way
>>> that I can get around this ?  I am using this mechanism in a situation where
>>> I don't have a debugger, and trying to use this to turn on debugging when I
>>> hit a hang, so killing the mpi procs is really not what I want to have
>>> happen.
>>> 
>>> Thanks,
>>> Rich
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Signals

2008-04-08 Thread Ralph H Castain
I found what Pak said a little confusing as the wait_daemon function doesn't
actually receive a signal itself - it only detects that a proc has exited
and checks to see if that happened due to a signal. If so, it flags that
situation and will order the job aborted.

So if the proc continues alive, the fact that it was hit with SIGUSR2 will
not be detected by ORTE nor will anything happen - however, if the OS uses
SIGUSR2 to terminate the proc, or if the proc terminates when it gets that
signal, we will see that proc terminate due to signal and abort the rest of
the job.

We could change it if that is what people want - it is trivial to insert
code to say "kill everything except if it died due to a certain signal".

Up to you folks. Current behavior is what you said you wanted a long
time ago - nothing has changed in this regard for several years.
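
For illustration only, the check would look something like the generic POSIX
sketch below (this is not the actual ORTE waitpid callback; the function name
and policy are made up to show the idea):

#include <stdbool.h>
#include <sys/wait.h>

/* Sketch: decide whether a child's exit status should abort the job,
 * treating death-by-a-chosen-signal (e.g. SIGUSR2) as benign. */
static bool should_abort_job(int status, int benign_sig)
{
    if (WIFSIGNALED(status)) {
        if (WTERMSIG(status) == benign_sig) {
            return false;   /* died on the benign signal: leave the job alone */
        }
        return true;        /* any other signal: abort as today */
    }
    return WEXITSTATUS(status) != 0;   /* nonzero exit also aborts */
}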


On 4/8/08 11:36 AM, "Pak Lui"  wrote:

> First, can your user executable create a signal handler to catch the
> SIGUSR2 to not exit? By default on Solaris it is going to exit, unless
> you catch the signal and have the process to do nothing.
> 
> from signal(3HEAD)
>   Name Value   DefaultEvent
>   SIGUSR1  16  Exit   User Signal 1
>   SIGUSR2  17  Exit   User Signal 2
> 
> The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
> might cause the processes to exit if the orted (or mpirun if it's on
> HNP) receives a signal like SIGUSR2; it'd work on killing all the user
> processes on that node once it receives a signal.
> 
> I workaround this for gridengine PLM. Once the gridengine_wait_daemon()
> receives a SIGUSR1/SIGUSR2 signal, it just lets the signals to
> acknowledge a signal returns, without declaring the launch_failed which
> would kill off the user processes. The signals would also get passed to
> the user processes, and let them decide what to do with the signals
> themselves.
> 
> SGE needed this so the job kill or job suspension notification to work
> properly since they would send a SIGUSR1/2 to mpirun. I believe this is
> probably what you need in the rsh plm.
> 
> Richard Graham wrote:
>> I am running into a situation where I am trying to deliver a signal to the
>> mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
>> mpi procs, but then proceeds to kill the children.  Is there an easy way
>> that I can get around this ?  I am using this mechanism in a situation where
>> I don't have a debugger, and trying to use this to turn on debugging when I
>> hit a hang, so killing the mpi procs is really not what I want to have
>> happen.
>> 
>> Thanks,
>> Rich
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 




Re: [OMPI devel] Signals

2008-04-08 Thread Pak Lui
First, can your user executable install a signal handler to catch SIGUSR2
so it does not exit? By default on Solaris it is going to exit, unless
you catch the signal and have the process do nothing.


from signal(3HEAD)
 Name Value   DefaultEvent
 SIGUSR1  16  Exit   User Signal 1
 SIGUSR2  17  Exit   User Signal 2
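
A minimal sketch of such a handler, assuming the application only wants to
note the signal and keep running (the flag name and what it is later used
for are illustrative):

#include <signal.h>

/* Catch SIGUSR2 so the process does not exit; the handler only sets a
 * flag (async-signal-safe) that the application can poll, e.g. to turn
 * on debug output. */
static volatile sig_atomic_t debug_requested = 0;

static void usr2_handler(int sig)
{
    (void)sig;
    debug_requested = 1;
}

static void install_usr2_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = usr2_handler;
    sa.sa_flags = SA_RESTART;     /* restart interrupted syscalls */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, NULL);
}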

The other thing is, I suspect orte_plm_rsh_wait_daemon() in the rsh plm
might cause the processes to exit if the orted (or mpirun, if it's on the
HNP) receives a signal like SIGUSR2; it would kill all the user
processes on that node once it receives a signal.


I worked around this for the gridengine PLM. Once gridengine_wait_daemon()
receives a SIGUSR1/SIGUSR2 signal, it simply acknowledges the signal and
returns, without declaring launch_failed, which would kill off the user
processes. The signals also get passed to the user processes, which can
decide what to do with them themselves.


SGE needed this so that job-kill or job-suspension notification works
properly, since it sends a SIGUSR1/2 to mpirun. I believe this is
probably what you need in the rsh plm.


Richard Graham wrote:

I am running into a situation where I am trying to deliver a signal to the
mpi procs (sigusr2).  I deliver this to mpirun, which propagates it to the
mpi procs, but then proceeds to kill the children.  Is there an easy way
that I can get around this ?  I am using this mechanism in a situation where
I don't have a debugger, and trying to use this to turn on debugging when I
hit a hang, so killing the mpi procs is really not what I want to have
happen.

Thanks,
Rich

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--

- Pak Lui
pak@sun.com


[OMPI devel] Signals

2008-04-08 Thread Richard Graham
I am running into a situation where I am trying to deliver a signal (SIGUSR2)
to the mpi procs. I deliver this to mpirun, which propagates it to the
mpi procs, but then proceeds to kill the children. Is there an easy way
that I can get around this? I am using this mechanism in a situation where
I don't have a debugger, and am trying to use it to turn on debugging when I
hit a hang, so killing the mpi procs is really not what I want to have
happen.

Thanks,
Rich



Re: [OMPI devel] mpirun return code problems

2008-04-08 Thread Ralph H Castain
I'm aware - as we discussed on a recent telecon, I put it on my list of
things to resolve. Solution is known - just busy with other things at the
moment.


On 4/8/08 6:06 AM, "Tim Prins"  wrote:

> Hi all,
> 
> I reported this before, but it seems that the report got lost. I have
> found some situations where mpirun will return a '0' when there is an error.
> 
> An easy way to reproduce this is to edit the file
> 'orte/mca/plm/base/plm_base_launch_support.c' and on line 154 put in
> 'return ORTE_ERROR;' (or apply the attached diff).
> 
> Then recompile and run mpirun. mpirun will indicate there was an error,
> but will still return 0. The reason this is concerning to me is that MTT
> only looks at return codes, so our tests may be failing and we wouldn't
> know it.
> 
> Thanks,
> 
> Tim
> Index: orte/mca/plm/base/plm_base_launch_support.c
> ===
> --- orte/mca/plm/base/plm_base_launch_support.c (revision 18092)
> +++ orte/mca/plm/base/plm_base_launch_support.c (working copy)
> @@ -151,7 +151,7 @@
>   ORTE_JOBID_PRINT(job), ORTE_ERROR_NAME(rc)));
>  return rc;
>  }
> -
> +   return ORTE_ERROR;
>  /* complete wiring up the iof */
>  OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
>   "%s plm:base:launch wiring up iof",
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] mpirun return code problems

2008-04-08 Thread Tim Prins

Hi all,

I reported this before, but it seems that the report got lost. I have 
found some situations where mpirun will return a '0' when there is an error.


An easy way to reproduce this is to edit the file 
'orte/mca/plm/base/plm_base_launch_support.c' and on line 154 put in 
'return ORTE_ERROR;' (or apply the attached diff).


Then recompile and run mpirun. mpirun will indicate there was an error, 
but will still return 0. The reason this is concerning to me is that MTT 
only looks at return codes, so our tests may be failing and we wouldn't 
know it.


Thanks,

Tim
Index: orte/mca/plm/base/plm_base_launch_support.c
===
--- orte/mca/plm/base/plm_base_launch_support.c (revision 18092)
+++ orte/mca/plm/base/plm_base_launch_support.c (working copy)
@@ -151,7 +151,7 @@
  ORTE_JOBID_PRINT(job), ORTE_ERROR_NAME(rc)));
 return rc;
 }
-
+   return ORTE_ERROR; 
 /* complete wiring up the iof */
 OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
  "%s plm:base:launch wiring up iof",