On Tue, Sep 11, 2007 at 04:56:13AM -0400, Daniel Veillard wrote:
> On Tue, Sep 11, 2007 at 12:59:59AM +0100, Daniel P. Berrange wrote:
> > I noticed that when using the SSH tunnel for the remote driver I ended up
> > with alot of zombie SSH processes. We simply forgot to waitpid() on the
> > child when a connection attempt failed, or when shutting down an open remote
> > connection. Attached is a possible patch
> 
>   Looks fine, like Rich maybe a bit of refactoring might be good. The
> only worries I have is the following scenario:
>    - the ssh process dies
>    - libvirt based application takes some time to notice it
>    - the OS span a new process with the same PID after a PID rollabck
>      (not completely unlikely since the ssh may have been started a long
>      time ago)
>    - we end-up killing a random process in the system
> 
> I think this is mostly avoidable by resetting priv->pid to -1 or 0 on 
> any child communication error, and before doing the kill in the patch.
> Even better would be to be able to check that the process corresponding
> to priv->pid is still a child of the current process, I wonder if this
> can be achieved without blocking with an initial waitpid()

After some testing I believe we can safely do away with kill() completely.
If we simply close(priv->sock) which is our end of the socketpair used
to talk to SSH, then SSH detects the end-of-file condition and exits of
its on accord. So there's no need to kill() - just close the socket and
waitpid() seems to do the trickm, unless I'm missing something bad.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 
Index: src/remote_internal.c
===================================================================
RCS file: /data/cvs/libvirt/src/remote_internal.c,v
retrieving revision 1.22
diff -u -p -r1.22 remote_internal.c
--- src/remote_internal.c       12 Sep 2007 10:33:48 -0000      1.22
+++ src/remote_internal.c       14 Sep 2007 02:09:38 -0000
@@ -62,6 +62,7 @@
 struct private_data {
     int magic;                  /* Should be MAGIC or DEAD. */
     int sock;                   /* Socket. */
+    pid_t pid;                  /* PID of tunnel process */
     int uses_tls;               /* TLS enabled on socket? */
     gnutls_session_t session;   /* GnuTLS session (if uses_tls != 0). */
     char *type;                 /* Cached return from remoteType. */
@@ -578,7 +579,7 @@ doRemoteOpen (virConnectPtr conn, struct
 
         /*FALLTHROUGH*/
     case trans_ext: {
-        int pid;
+        pid_t pid;
         int sv[2];
 
         /* Fork off the external process.  Use socketpair to create a private
@@ -617,6 +618,7 @@ doRemoteOpen (virConnectPtr conn, struct
         /* Parent continues here. */
         close (sv[1]);
         priv->sock = sv[0];
+        priv->pid = pid;
     }
     } /* switch (transport) */
 
@@ -646,6 +648,14 @@ doRemoteOpen (virConnectPtr conn, struct
             gnutls_bye (priv->session, GNUTLS_SHUT_RDWR);
         close (priv->sock);
     }
+    if (priv->pid > 0) {
+        pid_t reap;
+        do {
+            reap = waitpid(priv->pid, NULL, 0);
+            if (reap == -1 && errno == EINTR)
+                continue;
+        } while (reap != -1 && reap != priv->pid);
+    }
 
     /* Free up the URL and strings. */
     xmlFreeURI (uri);
@@ -1170,6 +1180,15 @@ doRemoteClose (virConnectPtr conn, struc
         gnutls_bye (priv->session, GNUTLS_SHUT_RDWR);
     close (priv->sock);
 
+    if (priv->pid > 0) {
+        pid_t reap;
+        do {
+            reap = waitpid(priv->pid, NULL, 0);
+            if (reap == -1 && errno == EINTR)
+                continue;
+        } while (reap != -1 && reap != priv->pid);
+    }
+
     /* See comment for remoteType. */
     if (priv->type) free (priv->type);
 
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Reply via email to