Hi Danny,

Danny Milosavljevic <dan...@scratchpost.org> skribis:

> what happens when the loop reads the pid file when it contains just half of a
> numeral?  It won't detect it, right?

Correct.

I’m proposing the addition below to be on the verrrry safe side.
WDYT?

The weird thing, as I mentioned earlier, is that systemd and Pies do not
protect against truncated PID files, and I couldn’t find any
“documentation” of the problem on the intertubes.  For systemd it’s
maybe less of a problem since services are started in a cgroup, I think.

Thanks,
Ludo’.

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index b937609..ef27d88 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -709,10 +709,16 @@ results."
 set when starting a service."
   (environ))
 
-(define* (read-pid-file file #:key (max-delay 5))
+(define* (read-pid-file file #:key (max-delay 5)
+                        (validate-pid? #f))
   "Wait for MAX-DELAY seconds for FILE to show up, and read its content as a
 number.  Return #f if FILE was not created or does not contain a number;
-otherwise return the number that was read (a PID)."
+otherwise return the number that was read (a PID).
+
+When VALIDATE-PID? is true, succeed if and only if the number that was read is
+the PID of an existing process in the current PID namespace.  This test cannot
+be used if FILE might contain a PID from another PID namespace (i.e., the
+daemon writing FILE is running in a separate PID namespace.)"
   (define start (current-time))
 
   (let loop ()
@@ -736,11 +742,13 @@ otherwise return the number that was read (a PID)."
            (try-again))
           ((? integer? pid)
            ;; It's possible, though unlikely, that PID is not a valid PID, for
-           ;; instance because writes to FILE did not complete.  However, we
-           ;; don't do (kill pid 0) because if the process lives in a separate
-           ;; PID namespace, then PID is probably invalid in our own
-           ;; namespace.
-           pid)))
+           ;; instance because writes to FILE did not complete.  When
+           ;; VALIDATE-PID? is true, check that PID is valid in the current
+           ;; PID namespace.
+           (if (or (not validate-pid?)
+                   (catch-system-error (kill pid 0) #t))
+               pid
+               (try-again)))))
       (lambda args
         (let ((errno (system-error-errno args)))
           (if (= ENOENT errno)
@@ -931,7 +939,8 @@ start."
                                         environment-variables)))
             (if pid-file
                 (match (read-pid-file pid-file
-                                      #:max-delay pid-file-timeout)
+                                      #:max-delay pid-file-timeout
+                                      #:validate-pid? #t)
                   (#f
                    (catch-system-error (kill pid SIGTERM))
                    #f)

Reply via email to