Re: [systemd-devel] Warning "Supervising process..." due to SIGCHLD from grand-parent
I had my focus on the child (main daemon process), therefore the term went backward parent and grand-parent. Sorry, if that caused confusion, but let's go with your definitions. Thank you for the explanation and pointing out the steps! Especially 1. makes it much clearer than the list described here https://www.freedesktop.org/software/systemd/man/daemon.html, which only mention that P' shall call exit(). chrony is probably not making sure of 1., were P should wait until P' has exited. Now that you mentioned sysv semantics is old I come across an option in chrony which will not fork. I shall probably try using that instead. Best Regards, Christopher Wong From: Lennart Poettering Sent: Monday, October 31, 2022 11:43:17 AM To: Christopher Wong Cc: systemd-devel@lists.freedesktop.org Subject: Re: [systemd-devel] Warning "Supervising process..." due to SIGCHLD from grand-parent On Mo, 31.10.22 11:40, Lennart Poettering (lenn...@poettering.net) wrote: > This is almost certainly a bug in chrony. If you use Type=forking, > then the process that systemd forks off (let's call it "P") should > wait until all of the below holds: > > 1. The middle child P' has exited > 2. The grandchild (and main daemon process) P'' is running > 3. The PID file has been successfully written to contain the PID of P''. BTW, let me add an explanation, *why* this is needed: if they leave P'' running for a bit longer, then there's a race: if for some reason the deamon ends up failing shortly after starting up there is a race if P' or P'' die first. If P'' dies first, then the service manager will never see its SIGCHLD and cannot determine there was a failure. If P' dies first then all is good, as the P'' SIGCHLD will be properly collected by the service manager. But anyway, it's 2022, chrony being stuck in sysv semantics is sad. Use sd_notify(). Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] Warning "Supervising process..." due to SIGCHLD from grand-parent
On Mo, 31.10.22 11:40, Lennart Poettering (lenn...@poettering.net) wrote: > This is almost certainly a bug in chrony. If you use Type=forking, > then the process that systemd forks off (let's call it "P") should > wait until all of the below holds: > > 1. The middle child P' has exited > 2. The grandchild (and main daemon process) P'' is running > 3. The PID file has been successfully written to contain the PID of P''. BTW, let me add an explanation, *why* this is needed: if they leave P'' running for a bit longer, then there's a race: if for some reason the deamon ends up failing shortly after starting up there is a race if P' or P'' die first. If P'' dies first, then the service manager will never see its SIGCHLD and cannot determine there was a failure. If P' dies first then all is good, as the P'' SIGCHLD will be properly collected by the service manager. But anyway, it's 2022, chrony being stuck in sysv semantics is sad. Use sd_notify(). Lennart -- Lennart Poettering, Berlin
Re: [systemd-devel] Warning "Supervising process..." due to SIGCHLD from grand-parent
On Mo, 31.10.22 08:04, Christopher Wong (christopher.w...@axis.com) wrote: > Hi, > > > We have during boot received the "Supervising process..." warning > from systemd related to chronyd.service. This is not always > happening, but when it happens systemd receives SIGCHLD from > grand-parent (22955) before the parent (22956). See logs below: grand-parent? do you mean grand-child? I am a bit confusing what you are trying to say? > Oct 25 10:34:55.104980 axis-accc8ed1c728 systemd[22955]: chronyd.service: > Executing: /usr/sbin/chronyd -u chronyd -f /run/chrony/chronyd.conf > Oct 25 10:34:55.117999 axis-accc8ed1c728 chronyd[22957]: chronyd version 4.2 > starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER -SIGND +ASYNCDNS > +NTS -SECHASH +IPV6 +DEBUG) > Oct 25 10:34:55.120781 axis-accc8ed1c728 chronyd[22957]: Frequency -8.172 +/- > 1.366 ppm read from /var/lib/chrony/drift > Oct 25 10:34:55.124304 axis-accc8ed1c728 systemd[1]: > systemd-journald.service: Received EPOLLHUP on stored fd 82 (stored), closing. > Oct 25 10:34:55.126460 axis-accc8ed1c728 systemd[1]: Received SIGCHLD from > PID 22955 (chronyd). > Oct 25 10:34:55.126708 axis-accc8ed1c728 systemd[1]: Child 22955 (chronyd) > died (code=exited, status=0/SUCCESS) > Oct 25 10:34:55.126920 axis-accc8ed1c728 systemd[1]: chronyd.service: Child > 22955 belongs to chronyd.service. > Oct 25 10:34:55.127000 axis-accc8ed1c728 systemd[1]: chronyd.service: Control > process exited, code=exited, status=0/SUCCESS (success) > Oct 25 10:34:55.127027 axis-accc8ed1c728 systemd[1]: chronyd.service: Got > final SIGCHLD for state start. > Oct 25 10:34:55.127160 axis-accc8ed1c728 systemd[1]: chronyd.service: > Potentially unsafe symlink chain, will now retry with relaxed checks: > /run/chrony/chronyd.pid > Oct 25 10:34:55.127571 axis-accc8ed1c728 systemd[1]: chronyd.service: New > main PID 22957 belongs to service, we are happy. > Oct 25 10:34:55.127598 axis-accc8ed1c728 systemd[1]: chronyd.service: Main > PID loaded: 22957 > Oct 25 10:34:55.127759 axis-accc8ed1c728 systemd[1]: Custom log in > process-util.c fnc pid_is_my_child(): pid: 22957, ppid: 22956, cached_pid: 1. > Oct 25 10:34:55.127785 axis-accc8ed1c728 systemd[1]: chronyd.service: > Supervising process 22957 which is not our child. We'll most likely not > notice when it exits. > Oct 25 10:34:55.127964 axis-accc8ed1c728 systemd[1]: chronyd.service: Changed > start -> running > Oct 25 10:34:55.128006 axis-accc8ed1c728 systemd[1]: chronyd.service: Job > 117032 chronyd.service/start finished, result=done > Oct 25 10:34:55.128053 axis-accc8ed1c728 systemd[1]: Started NTP > client/server. > ... > Oct 25 10:34:55.158173 axis-accc8ed1c728 systemd[1]: Received SIGCHLD from > PID 22956 (chronyd). > Oct 25 10:34:55.158436 axis-accc8ed1c728 systemd[1]: Child 22956 (chronyd) > died (code=exited, status=0/SUCCESS) > Oct 25 10:34:55.158679 axis-accc8ed1c728 systemd[1]: chronyd.service: Child > 22956 belongs to chronyd.service. > > > The chronyd does two forks. In the normal case the parent will die > first and then the grand-parent will die. This behavior is according > to the SysV Daemons implementation > https://www.freedesktop.org/software/systemd/man/daemon.html > However, it seems scheduling for parent and grand-parent can vary > and result in a different behavior. This is almost certainly a bug in chrony. If you use Type=forking, then the process that systemd forks off (let's call it "P") should wait until all of the below holds: 1. The middle child P' has exited 2. The grandchild (and main daemon process) P'' is running 3. The PID file has been successfully written to contain the PID of P''. That all said, it's 2022, maybe chrony should just use Type=notify and sd_notify() like any modern code? Lennart -- Lennart Poettering, Berlin
[systemd-devel] Warning "Supervising process..." due to SIGCHLD from grand-parent
Hi, We have during boot received the "Supervising process..." warning from systemd related to chronyd.service. This is not always happening, but when it happens systemd receives SIGCHLD from grand-parent (22955) before the parent (22956). See logs below: Oct 25 10:34:55.104980 axis-accc8ed1c728 systemd[22955]: chronyd.service: Executing: /usr/sbin/chronyd -u chronyd -f /run/chrony/chronyd.conf Oct 25 10:34:55.117999 axis-accc8ed1c728 chronyd[22957]: chronyd version 4.2 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP -SCFILTER -SIGND +ASYNCDNS +NTS -SECHASH +IPV6 +DEBUG) Oct 25 10:34:55.120781 axis-accc8ed1c728 chronyd[22957]: Frequency -8.172 +/- 1.366 ppm read from /var/lib/chrony/drift Oct 25 10:34:55.124304 axis-accc8ed1c728 systemd[1]: systemd-journald.service: Received EPOLLHUP on stored fd 82 (stored), closing. Oct 25 10:34:55.126460 axis-accc8ed1c728 systemd[1]: Received SIGCHLD from PID 22955 (chronyd). Oct 25 10:34:55.126708 axis-accc8ed1c728 systemd[1]: Child 22955 (chronyd) died (code=exited, status=0/SUCCESS) Oct 25 10:34:55.126920 axis-accc8ed1c728 systemd[1]: chronyd.service: Child 22955 belongs to chronyd.service. Oct 25 10:34:55.127000 axis-accc8ed1c728 systemd[1]: chronyd.service: Control process exited, code=exited, status=0/SUCCESS (success) Oct 25 10:34:55.127027 axis-accc8ed1c728 systemd[1]: chronyd.service: Got final SIGCHLD for state start. Oct 25 10:34:55.127160 axis-accc8ed1c728 systemd[1]: chronyd.service: Potentially unsafe symlink chain, will now retry with relaxed checks: /run/chrony/chronyd.pid Oct 25 10:34:55.127571 axis-accc8ed1c728 systemd[1]: chronyd.service: New main PID 22957 belongs to service, we are happy. Oct 25 10:34:55.127598 axis-accc8ed1c728 systemd[1]: chronyd.service: Main PID loaded: 22957 Oct 25 10:34:55.127759 axis-accc8ed1c728 systemd[1]: Custom log in process-util.c fnc pid_is_my_child(): pid: 22957, ppid: 22956, cached_pid: 1. Oct 25 10:34:55.127785 axis-accc8ed1c728 systemd[1]: chronyd.service: Supervising process 22957 which is not our child. We'll most likely not notice when it exits. Oct 25 10:34:55.127964 axis-accc8ed1c728 systemd[1]: chronyd.service: Changed start -> running Oct 25 10:34:55.128006 axis-accc8ed1c728 systemd[1]: chronyd.service: Job 117032 chronyd.service/start finished, result=done Oct 25 10:34:55.128053 axis-accc8ed1c728 systemd[1]: Started NTP client/server. ... Oct 25 10:34:55.158173 axis-accc8ed1c728 systemd[1]: Received SIGCHLD from PID 22956 (chronyd). Oct 25 10:34:55.158436 axis-accc8ed1c728 systemd[1]: Child 22956 (chronyd) died (code=exited, status=0/SUCCESS) Oct 25 10:34:55.158679 axis-accc8ed1c728 systemd[1]: chronyd.service: Child 22956 belongs to chronyd.service. The chronyd does two forks. In the normal case the parent will die first and then the grand-parent will die. This behavior is according to the SysV Daemons implementation https://www.freedesktop.org/software/systemd/man/daemon.html However, it seems scheduling for parent and grand-parent can vary and result in a different behavior. When systemd receives the SIGCHLD from the grand-parent (22955) it start to run service_load_pid_file() and set the new main PID, which also later triggers the warning "Supervising process...", but shouldn't systemd wait for the parent (22956) to the main PID to die before considering the service as started? Now chronyd is started before the parent (22956) has died, indicated by the log entry "Started NTP client/server." above. In https://www.freedesktop.org/software/systemd/man/systemd.service.html it is only mentioning that when the parent process exits then the unit is started. It doesn't mention grand-parent when type=forking and the service binary is doing double fork(). Testing with systemd 251.3. Best Regards, Christopher Wong