Yes; $done is a little odd and probably misnamed. It's the number of file descriptors still open from the child process. When it reaches 0, we're done.
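
To make the counting idea concrete, here is a rough, self-contained sketch. It is not the actual DoCommand.pm code: run_and_drain, $open_fds, and the single TERM/KILL escalation are hypothetical names and simplifications chosen for illustration. $open_fds plays the role of $done: it starts at 2 (the child's stdout and stderr), drops by one each time a pipe hits EOF, and is forced to 0 once we are more than drain_timeout seconds past the command timeout.

```perl
use strict;
use warnings;
use IO::Select;
use IPC::Open3;
use Symbol qw(gensym);

# Hypothetical helper (NOT MTT's DoCommand): run a command, collect its
# stdout/stderr, and stop once every pipe from the child is closed or
# the drain timeout has passed.
sub run_and_drain {
    my ($timeout, $drain_timeout, @cmd) = @_;

    my $err = gensym;    # open3 needs a real glob for a separate stderr
    my $pid = open3(my $in, my $out, $err, @cmd);
    close($in);

    my $select = IO::Select->new($out, $err);

    # Number of child file descriptors still open (what the real code
    # calls $done). 0 means there is nothing left to read.
    my $open_fds = 2;
    my $end_time = time() + $timeout;
    my ($output, $timed_out) = ("", 0);

    while ($open_fds > 0) {
        for my $fh ($select->can_read(1)) {
            my $n = sysread($fh, my $buf, 4096);
            if ($n) {
                $output .= $buf;
            } else {
                # EOF (or read error): this descriptor is finished
                $select->remove($fh);
                close($fh);
                --$open_fds;
            }
        }

        my $over = time() - $end_time;
        if ($over > 0 && !$timed_out) {
            # Past the command timeout: ask the child to stop
            kill("TERM", $pid);
            $timed_out = 1;
        }
        if ($over > $drain_timeout && $open_fds > 0) {
            # Past the drain timeout: stop waiting for output.
            # Setting the counter to 0 is what ends the loop.
            kill("KILL", $pid);
            $open_fds = 0;
        }
    }

    waitpid($pid, 0);
    return { output => $output, timed_out => $timed_out, status => $? };
}
```

In this sketch, "quitting" means setting the counter to 0, because the loop runs only while descriptors are still considered open. A call might look like run_and_drain(30, 5, "some_command", "arg"), with 30 and 5 standing in for the command timeout and drain_timeout.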

On Oct 31, 2006, at 9:39 AM, Ethan Mallove wrote:

I've run with these changes and they seem to work (I did
need to change the INI param "module" to "specify_module",
from the previous commit). Just one question (see below).


On Sun, Oct/29/2006 08:36:04AM, jsquy...@osl.iu.edu wrote:
Author: jsquyres
Date: 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
New Revision: 403

Modified:
   trunk/CHANGES
   trunk/lib/MTT/DoCommand.pm
   trunk/lib/MTT/Globals.pm
   trunk/samples/ompi-core-template.ini

Log:
 * Add textwrap to Global defaults
 * Add new global: drain_timeout
 * In DoCommand, after the timeout, we'll wait drain_timeout more
   seconds to get any final output and then unconditionally move on.
 * Add some Verbose statements to catch when kill() does not seem to
   be working. Have not nailed this down yet; want to see some output
   from when it occurs.


Modified: trunk/CHANGES
==============================================================================
--- trunk/CHANGES       (original)
+++ trunk/CHANGES       2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
@@ -1,2 +1,5 @@
 To announce to OMPI core testers:

+- added new fields to MTT section to ini file
+  - textwrap
+  - drain_timeout

Modified: trunk/lib/MTT/DoCommand.pm
==============================================================================
--- trunk/lib/MTT/DoCommand.pm  (original)
+++ trunk/lib/MTT/DoCommand.pm 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
@@ -32,6 +32,7 @@
     if ($kid != 0) {
         return $?;
     }
+    Verbose("** Kill TERM didn't work!\n");

     # Nope, that didn't work.  Sleep a few seconds and try again.
     sleep(2);
@@ -39,6 +40,7 @@
     if ($kid != 0) {
         return $?;
     }
+    Verbose("** Kill TERM (more waiting) didn't work!\n");

     # That didn't work either.  Try SIGINT;
     kill("INT", $pid);
@@ -46,6 +48,7 @@
     if ($kid != 0) {
         return $?;
     }
+    Verbose("** Kill INT didn't work!\n");

     # Nope, that didn't work.  Sleep a few seconds and try again.
     sleep(2);
@@ -53,6 +56,7 @@
     if ($kid != 0) {
         return $?;
     }
+    Verbose("** Kill INT (more waiting) didn't work!\n");

     # Ok, now we're mad.  Be violent.
     while (1) {
@@ -61,13 +65,7 @@
         if ($kid != 0) {
             return $?;
         }
-        sleep(1);
-
-        kill("KILL", $pid);
-        $kid = waitpid($pid, WNOHANG);
-        if ($kid != 0) {
-            return $?;
-        }
+        Verbose("** Kill KILL didn't work!\n");
         sleep(1);
     }
 }
@@ -278,7 +276,7 @@
         if (defined($end_time) && time() > $end_time) {
             my $over = time() - $end_time;
             if ($over > $last_over) {
-                Debug("*** Past timeout by $over seconds\n");
+                Verbose("*** Past timeout by $over seconds\n");
                 my $st = _kill_proc($pid);
                 if (!defined($killed_status)) {
                     $killed_status = $st;
@@ -286,6 +284,12 @@
                 $ret->{timed_out} = 1;
             }
             $last_over = $over;
+
+            # See if we're over the drain_timeout
+            if ($over > $MTT::Globals::Values->{drain_timeout}) {
+                Verbose("*** Past drain timeout; quitting\n");
+                $done = 0;
+            }


I would have thought that if we're "quitting" here, then $done = 1.

-Ethan


         }
     }
     close OUTerr;

Modified: trunk/lib/MTT/Globals.pm
==============================================================================
--- trunk/lib/MTT/Globals.pm    (original)
+++ trunk/lib/MTT/Globals.pm 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
@@ -26,6 +26,8 @@
     hostfile => undef,
     hostlist => undef,
     max_np => undef,
+    textwrap => 76,
+    drain_timeout => 5,
 };

 # Reset $Globals per a specific ini file
@@ -68,6 +70,13 @@
     if ($val) {
         $Values->{textwrap} = $val;
     }
+
+    # Output display preference
+
+    my $val = MTT::Values::Value($ini, "MTT", "drain_timeout");
+    if ($val) {
+        $Values->{drain_timeout} = $val;
+    }
 }



Modified: trunk/samples/ompi-core-template.ini
==============================================================================
--- trunk/samples/ompi-core-template.ini        (original)
+++ trunk/samples/ompi-core-template.ini 2006-10-29 08:35:58 EST (Sun, 29 Oct 2006)
@@ -91,9 +91,15 @@
 # returned by &env_max_procs(), you can fill in an integer here.
 max_np =

-# Output display preference
+# OMPI Core: Output display preference; the default width at which MTT
+# output will wrap.
 textwrap = 76

+# OMPI Core: After the timeout for a command has passed, wait this
+# many additional seconds to drain all output, and then kill it with
+# extreme prejudice.
+drain_timeout = 5
+
 #======================================================================
 # MPI get phase
 #======================================================================
_______________________________________________
mtt-svn mailing list
mtt-...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-svn


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
