It sounds reasonable to me. I agree with Ralf W about having mpirun send a STOP to itself - that would seem to solve the problem about stopping everything.

It would seem, however, that you cannot similarly STOP the daemons or else you won't be able to CONT the job. I'm not sure how big a deal that is - I can't think of any issue it creates offhand.

Is there any issue in the MPI comm layers if you abruptly STOP a process while it's communicating? Especially since the STOP is going to be asynchronous. Do you need to quiesce networks like IB first?

Ralph


On Dec 5, 2008, at 12:00 PM, Rolf Vandevaart wrote:


We have had requests to be able to suspend/resume MPI jobs within an SGE environment. SGE sends a signal (which is configurable) to mpirun to stop the job and another signal to resume it. To support this, I propose that we add support in ORTE to catch SIGTSTP/SIGCONT and forward these to the a.outs. More precisely, SIGTSTP will be caught, forwarded, then converted to SIGSTOP before being delivered to the a.outs. The one disadvantage is that we override the default SIGTSTP behavior, which is typically to stop mpirun.
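
To make the catch/forward/convert idea concrete, here is a minimal standalone sketch in plain POSIX C. It is not ORTE code: translate_signal(), forward_signal() and demo_suspend_resume() are hypothetical names invented for illustration, and a forked child sitting in pause() stands in for an a.out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map the catchable SIGTSTP onto SIGSTOP, which the
 * target can neither catch nor ignore. Every other signal passes through. */
static int translate_signal(int signo)
{
    return (SIGTSTP == signo) ? SIGSTOP : signo;
}

/* Hypothetical helper: deliver the (possibly translated) signal to one child. */
static int forward_signal(pid_t child, int signo)
{
    assert(child > 0);  /* pid <= 0 would address process groups instead */
    return kill(child, translate_signal(signo));
}

/* Fork a child that idles in pause(), suspend it, then resume it.
 * Returns 0 if the child was seen stopped by SIGSTOP and then continued. */
static int demo_suspend_resume(void)
{
    int status;
    int rc = -1;
    pid_t child = fork();

    if (child < 0) {
        return -1;
    }
    if (0 == child) {
        for (;;) {
            pause();    /* stand-in for the application's work */
        }
    }

    if (0 == forward_signal(child, SIGTSTP) &&
        waitpid(child, &status, WUNTRACED) == child &&
        WIFSTOPPED(status) && SIGSTOP == WSTOPSIG(status) &&
        0 == forward_signal(child, SIGCONT) &&
        waitpid(child, &status, WCONTINUED) == child &&
        WIFCONTINUED(status)) {
        rc = 0;
    }

    /* Clean up the child regardless of the outcome. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return rc;
}
```

Calling demo_suspend_resume() from a main() should return 0 on a system that supports WCONTINUED. The conversion matters because an application could install its own SIGTSTP handler and never stop, whereas SIGSTOP cannot be caught.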

Does anyone else have a requirement like this or does anyone have issues with these changes? FWIW, I know there is at least one other MPI that supports this type of behavior.

One problem is that, with SIGTSTP no longer stopping mpirun, users cannot press CTRL-Z at their terminal to stop mpirun. I am trying to figure out how big a problem that is.
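
One way to keep CTRL-Z usable is for the launcher to stop itself once it has forwarded the request. The following is a rough sketch of that pattern in plain POSIX C, not the orterun implementation: launcher_loop(), on_tstp() and demo_self_stop() are hypothetical names, and the forwarding step is only marked by a comment. The demo runs the stand-in launcher in a forked child so that the caller itself is never stopped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t tstp_pending = 0;

/* Handler only records the request; the real work happens in the loop. */
static void on_tstp(int signo)
{
    (void)signo;
    tstp_pending = 1;
}

/* Hypothetical stand-in for the launcher's main loop. */
static void launcher_loop(int ready_fd)
{
    struct sigaction sa;
    sigset_t block, wait_mask;

    assert(ready_fd >= 0);

    /* Keep SIGTSTP blocked except while waiting in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGTSTP);
    sigprocmask(SIG_BLOCK, &block, &wait_mask);
    sigdelset(&wait_mask, SIGTSTP);

    sa.sa_handler = on_tstp;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTSTP, &sa, NULL);

    /* Tell the parent that the handler is installed. */
    if (write(ready_fd, "r", 1) != 1) {
        _exit(1);
    }

    for (;;) {
        while (!tstp_pending) {
            sigsuspend(&wait_mask);
        }
        tstp_pending = 0;
        /* A real launcher would forward the suspend request here. */
        raise(SIGSTOP);     /* execution resumes here after SIGCONT */
    }
}

/* Returns the signal that stopped the stand-in launcher, or -1 on failure. */
static int demo_self_stop(void)
{
    int fds[2], status, stopsig = -1;
    char c;
    pid_t child;

    if (pipe(fds) != 0) {
        return -1;
    }
    child = fork();
    if (child < 0) {
        return -1;
    }
    if (0 == child) {
        close(fds[0]);
        launcher_loop(fds[1]);
        _exit(0);
    }
    close(fds[1]);

    if (read(fds[0], &c, 1) == 1 &&
        kill(child, SIGTSTP) == 0 &&
        waitpid(child, &status, WUNTRACED) == child &&
        WIFSTOPPED(status)) {
        stopsig = WSTOPSIG(status);
        kill(child, SIGCONT);
    }

    close(fds[0]);
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return stopsig;
}
```

With this pattern the parent (for example an interactive shell) still observes the launcher as stopped, even though the launcher handled SIGTSTP itself first.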

Rolf

PS: Here are the possible code changes.  Not too major.

burl-ct-v440-2 62 =>svn diff
Index: orte/tools/orterun/orterun.c
===================================================================
--- orte/tools/orterun/orterun.c        (revision 20072)
+++ orte/tools/orterun/orterun.c        (working copy)
@@ -99,6 +99,8 @@
#ifndef __WINDOWS__
static struct opal_event sigusr1_handler;
static struct opal_event sigusr2_handler;
+static struct opal_event sigtstp_handler;
+static struct opal_event sigcont_handler;
#endif  /* __WINDOWS__ */
static orte_job_t *jdata;
static char *orterun_basename = NULL;
@@ -511,6 +513,12 @@
    opal_signal_set(&sigusr2_handler, SIGUSR2,
                    signal_forward_callback, &sigusr2_handler);
    opal_signal_add(&sigusr2_handler, NULL);
+    opal_signal_set(&sigtstp_handler, SIGTSTP,
+                    signal_forward_callback, &sigtstp_handler);
+    opal_signal_add(&sigtstp_handler, NULL);
+    opal_signal_set(&sigcont_handler, SIGCONT,
+                    signal_forward_callback, &sigcont_handler);
+    opal_signal_add(&sigcont_handler, NULL);
#endif  /* __WINDOWS__ */

/* we are an hnp, so update the contact info field for later use */
@@ -763,6 +771,8 @@
    /** Remove the USR signal handlers */
    opal_signal_del(&sigusr1_handler);
    opal_signal_del(&sigusr2_handler);
+    opal_signal_del(&sigtstp_handler);
+    opal_signal_del(&sigcont_handler);
#endif  /* __WINDOWS__ */

    /* get the daemon job object */
Index: orte/orted/orted_comm.c
===================================================================
--- orte/orted/orted_comm.c     (revision 20072)
+++ orte/orted/orted_comm.c     (working copy)
@@ -457,10 +457,6 @@

        /****    SIGNAL_LOCAL_PROCS   ****/
        case ORTE_DAEMON_SIGNAL_LOCAL_PROCS:
-            if (orte_debug_daemons_flag) {
-                opal_output(0, "%s orted_cmd: received signal_local_procs",
-                            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
-            }
            /* unpack the jobid */
            n = 1;
            if (ORTE_SUCCESS != (ret = opal_dss.unpack(buffer, &job, &n, ORTE_JOBID))) {
@@ -474,7 +470,22 @@
                ORTE_ERROR_LOG(ret);
                goto CLEANUP;
            }
-
+
+            /* Convert SIGTSTP to SIGSTOP so we can suspend a.out */
+            if (SIGTSTP == signal) {
+                if (orte_debug_daemons_flag) {
+                    opal_output(0, "%s orted_cmd: converted SIGTSTP to SIGSTOP before delivering",
+                                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
+                }
+                signal = SIGSTOP;
+            }
+
+            if (orte_debug_daemons_flag) {
+                opal_output(0, "%s orted_cmd: received signal_local_procs, delivering signal %d",
+                            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
+                            signal);
+            }
+
            /* signal them */
            if (ORTE_SUCCESS != (ret = orte_odls.signal_local_procs(NULL, signal))) {
                ORTE_ERROR_LOG(ret);
burl-ct-v440-2 63 =>

--

=========================
rolf.vandeva...@sun.com
781-442-3043
=========================