Hello,

Maxim Cournoyer <maxim.courno...@gmail.com> wrote:

> + herd -s t-socket-1651 status root
> Started:
>  + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)

What happens here is that the shepherd process is still alive after
‘herd stop root’ has completed, contrary to what’s expected:

--8<---------------cut here---------------start------------->8---
$herd stop root

if kill `cat "$pid"`
then
    exit 1
fi
--8<---------------cut here---------------end--------------->8---

The expectation is that shepherd has terminated by the time ‘herd stop
root’ exits; I wonder if that’s bogus.

‘herd stop root’ terminates when shepherd has closed its connection,
which normally happens when shepherd exits:

--8<---------------cut here---------------start------------->8---
28003 read(15, "(shepherd-command (version 0) (action stop) (service root) (arguments ()) (directory \"/data/src/shepherd\"))", 1024) = 107
28003 brk(0x1030000)                    = 0x1030000
28003 mmap(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0072be8000
28003 brk(0x100f000)                    = 0x100f000
28003 getcwd("/data/src/shepherd", 100) = 19
28003 chdir("/data/src/shepherd")       = 0
28003 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0
28003 write(7, "2022-01-13 16:21:16 Exiting shepherd...\n", 40) = 40
28003 chdir("/data/src/shepherd")       = 0
28003 getuid()                          = 1000
28003 close(13)                         = 0
28003 unlink("test")                    = 0
28003 exit_group(0)                     = ?
28006 <... futex resumed>)              = ?
28008 <... read resumed> <unfinished ...>) = ?
28005 <... futex resumed>)              = ?
28004 <... futex resumed>)              = ?
28008 +++ exited with 0 +++
28006 +++ exited with 0 +++
28005 +++ exited with 0 +++
28004 +++ exited with 0 +++
28003 +++ exited with 0 +++
--8<---------------cut here---------------end--------------->8---

Maybe there’s a chance that the shell hasn’t processed the shepherd’s
SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
case, the shepherd process still exists as a zombie, so ‘kill’ succeeds
and the test exits with 1.
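
For what it’s worth, the zombie case can be reproduced outside the test
suite.  Here is a minimal sketch, assuming a Linux-like kernel where
signal 0 can be sent to a zombie; ‘sleep’ stands in for both shepherd and
its parent, and the temporary file and variable names are made up for the
illustration:

```shell
#!/bin/sh
# Sketch only: 'sleep' stands in for shepherd and for its parent.
tmp=$(mktemp)

# The inner shell starts a short-lived child, prints its PID, then
# replaces itself with a 'sleep' that never calls wait(2).  Once the
# child exits, it stays around as a zombie.
sh -c 'sleep 1 & echo $!; exec sleep 5' > "$tmp" &
holder=$!

sleep 2                         # the child has exited by now
zpid=$(cat "$tmp")

# Signal 0 only checks that the PID exists; a zombie still counts.
if kill -0 "$zpid" 2>/dev/null
then
    zombie_signalable=yes
else
    zombie_signalable=no
fi
echo "zombie can be signaled: $zombie_signalable"

# Clean up the stand-in parent and the temporary file.
kill "$holder" 2>/dev/null || true
wait "$holder" 2>/dev/null || true
rm -f "$tmp"
```

In the test the window would be much shorter, since the shell does reap
its child eventually; the sketch only keeps the zombie around long enough
to look at it.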

A more robust approach might be to use the shell’s builtin ‘wait’,
because then I suppose the shell will be forced to process pending
SIGCHLDs:

diff --git a/tests/no-home.sh b/tests/no-home.sh
index 85b6116..5a8c278 100644
--- a/tests/no-home.sh
+++ b/tests/no-home.sh
@@ -1,5 +1,5 @@
 # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
-# Copyright © 2014, 2016 Ludovic Courtès <l...@gnu.org>
+# Copyright © 2014, 2016, 2022 Ludovic Courtès <l...@gnu.org>
 #
 # This file is part of the GNU Shepherd.
 #
@@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
 $herd status root
 $herd stop root
 
-if kill `cat "$pid"`
-then
-    exit 1
-fi
+wait `cat "$pid"`

With this change, I can’t get the test to fail after running the
following loop for a few minutes:

  while make check TESTS=tests/no-home.sh ; do : ; done

… but I cannot get the original one to fail either.
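
As a sanity check of what ‘wait’ does in this situation, here is a
minimal sketch with ‘sleep’ standing in for shepherd (the variable names
are made up for the illustration):

```shell
#!/bin/sh
# 'sleep' is a stand-in for the daemon started by the test.
sleep 1 &
pid=$!

# Block until the child has exited and been reaped; $? is then the
# child's exit status.
wait "$pid"
status=$?

# The PID no longer names a process, so signal 0 cannot be delivered.
if kill -0 "$pid" 2>/dev/null
then
    state=alive
else
    state=gone
fi
echo "exit status: $status, process: $state"
```

Note that ‘wait’ only works on children of the current shell, so the
patch assumes that the PID in the file is that of the process the test
itself started in the background.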

Does it work for you?

Ludo’.
