Hello :) Thanks a lot for putting this together!
Ludovic Courtès <[email protected]> writes: > Hi Tomas, > > Tomas Volf <[email protected]> writes: > >> So my suggestion is that when the socket is deleted, the shepherd >> process stops itself. > > I posted a patch that does exactly that: > > https://codeberg.org/shepherd/shepherd/pulls/14 > > Let me know what you think! I wanted to try it, but did not figure out how to get the source code, for some reason my git does not see the civodul/monitor-socket-deletion branch: --8<---------------cut here---------------start------------->8--- ~/src/shepherd $ git remote -v origin https://codeberg.org/shepherd/shepherd.git (fetch) origin https://codeberg.org/shepherd/shepherd.git (push) ~/src/shepherd $ git fetch ~/src/shepherd $ git branch -r origin/HEAD -> origin/main origin/keyring origin/main origin/wip-goblinsify ~/src/shepherd $ git checkout civodul/monitor-socket-deletion error: pathspec 'civodul/monitor-socket-deletion' did not match any file(s) known to git --8<---------------cut here---------------end--------------->8--- So all I could do is read it in the browser, without being able to test the code locally. It seems nice and there is only a one bug I have noticed, so the list below is mostly just few suggestions and/or observations. In configure.ac, I wonder whether you could use `action-if-fails' argument of AC_COMPUTE_INT instead of the separate `test -z ...' block. But not sure, maybe the current approach is more readable. In the comment, you have `Inotify', but it seems that is should always be spelled in lower case (as in `inotify'). At least that is what wikipedia does, even at the start of a sentence. In the shepherd.texi file, the `If this' on a line by itself looks bit weird, but maybe you did it this way intentionally to minimize the diff? In the Note, you use `it' often and it is not always obvious to what it refers to (without knowing the context). Maybe ending the first sentence with `to control @command{shepherd}' would be bit cleaner? Also, if I read the code correctly, even the PID 1 shuts down if the attempt to re-open the socket fails, that is not mentioned here. In `socket-monitor', I admit I did not figure out when the wait-for-file-deletion returns #f, it seems to me that when it returns, it returns #t? So I am unsure what the point of the `when' is instead of just (wait-for-file-deletion socket-file) (on-deletion) (loop). In `on-socket-deletion' (the PID 1 branch), I have to admit I am not sure what exactly happens when you call (stop-service root-service). Is that a clean shutdown equivalent to running `shutdown' command? Since simply stopping the PID 1 leads to a kernel panic, I want to make sure that is not what we are doing here. I am unsure whether stopping the system is the right thing to do (even if we fail to re-open the socket), but at least it should be grateful. For the system.scm.in, I do not have much experience with inotify, so cannot comment much here. However I believe you have a race condition in `wait-for-file-deletion'. You are checking via (file-exists? file), but you have no guarantee that the file on the disk matches the file you have open as a socket. I think you should use `stat' and check whether `stat:dev' and `stat:ino' match what you expect them to be. One more observation (if I read the code right) is that if you fail to set up the inotify watch, you will close the socket and immediately return #t. Which will cause the shepherd to spin forever in a tight loop opening and closing the socket. Did I miss anything? I am not sure what is the correct approach here. Maybe, if the inotify setup fails, we should fallback to the polling approach from GNU/Hurd? Tomas -- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.
