Interesting - I see why. Please try this version.

Ralph


On Thu, Oct 15, 2015 at 4:05 AM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:

>
> > On 15 Oct 2015, at 4:38, Ralph Castain <r...@open-mpi.org> wrote:
> > Okay, please try the attached patch.
>
> *scratch*
>
> Although I reported results with the patch earlier, I can't reproduce it
> anymore.
> Now orte-dvm shuts down after the first orte-submit completes with:
>
>
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SPAWN_JOB_CMD
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_ADD_LOCAL_PROCS
> [netbook:72038] [[9827,0],0] Releasing job data for [INVALID]
> [netbook:72038] sess_dir_finalize: proc session dir does not exist
> [netbook:72038] [[9827,0],0] JOB [9827,1] HAS TERMINATED
> [netbook:72038] [[9827,0],0] NOTIFYING [[9826,0],0] OF JOB [9827,1] COMPLETION
> [netbook:72038] [[9827,0],0] JOB [9827,1] HAS TERMINATED
> [netbook:72038] [[9827,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_EXIT_CMD
> [netbook:72038] sess_dir_finalize: proc session dir does not exist
> [netbook:72038] sess_dir_cleanup: job session dir does not exist
> exiting with status 0
>
>
> (Earlier I may have had an unpatched instance of orte-dvm still running, and
> either the installation or some dynamic linking got messed up?!?!)
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/10/18178.php
>
diff --git a/orte/mca/state/dvm/state_dvm.c b/orte/mca/state/dvm/state_dvm.c
index 0e7309c..5b1a841 100644
--- a/orte/mca/state/dvm/state_dvm.c
+++ b/orte/mca/state/dvm/state_dvm.c
@@ -267,6 +267,7 @@ void check_complete(int fd, short args, void *cbdata)
         if (jdata->state < ORTE_JOB_STATE_UNTERMINATED) {
             jdata->state = ORTE_JOB_STATE_TERMINATED;
         }
+        opal_output(0, "%s JOB %s HAS TERMINATED", ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_JOBID_PRINT(jdata->jobid));
     }
 
     /* tell the IOF that the job is complete */
diff --git a/orte/tools/orte-dvm/orte-dvm.c b/orte/tools/orte-dvm/orte-dvm.c
index 3cdf585..f9a969a 100644
--- a/orte/tools/orte-dvm/orte-dvm.c
+++ b/orte/tools/orte-dvm/orte-dvm.c
@@ -462,6 +462,11 @@ static void notify_requestor(int sd, short args, void *cbdata)
     int ret;
     opal_buffer_t *reply;
 
+opal_output(0, "%s NOTIFYING %s OF JOB %s COMPLETION",
+            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
+            ORTE_NAME_PRINT(&jdata->originator),
+            ORTE_JOBID_PRINT(jdata->jobid));
+
     /* notify the requestor */
     reply = OBJ_NEW(opal_buffer_t);
     /* see if there was any problem */
@@ -476,6 +481,7 @@ static void notify_requestor(int sd, short args, void *cbdata)
     /* we cannot cleanup the job object as we might
      * hit an error during transmission, so clean it
      * up in the send callback */
+
     OBJ_RELEASE(caddy);
 }
 