On Thu, Dec 22, 2022, at 1:29 PM, Adam Williamson wrote:
> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
>> On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
>> > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>> > 
>> > This document represents a proposed Change. As part of the Changes
>> > process, proposals are publicly announced in order to receive
>> > community feedback. This proposal will only be implemented if approved
>> > by the Fedora Engineering Steering Committee.
>> > 
>> > == Summary ==
>> > A downstream configuration change to reduce the systemd unit timeout
>> > from 2 minutes to 15 seconds.
>> 
>>   Great change, please do it!
>> Also, sometimes after reaching the timeout, systemd extends wait by
>> another 2 minutes (or 1m30). I wasn't able to find in the sources or
>> documentation why this happens, but this behaviour should be blocked.
>> Otherwise some services after 15s will get another 15, and then another…
>
> 15 seconds feels very aggressive to me. I can think of some cases, like
> libvirtd automatically suspending or cleanly shutting down running VMs,
> that might well take longer than that. Could we not go for 30 seconds?
> Going all the way from 90/120 down to 15 seems pretty radical.

Yeah. I'm not opposed to the change, and I understand the main impetus behind 
it (PackageKitd), but it's the unknown consequences that leave me scratching 
my head, trying to imagine the worst case before we actually subject users to 
it.

There really isn't a good kernel facility for something in between SIGTERM, 
which is ignorable, and SIGKILL, which isn't. And I'm not familiar with 
systemd's facilities for tracking service shutdown progress. That is, I'm OK 
with SIGKILL for a process that isn't responding. But I'm not sure whether 
there's a facility for a process to indicate either "I'm working on it" or 
"don't force kill me or it'll be bad".
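
To make the SIGTERM/SIGKILL gap concrete, here is a rough Python sketch of the 
escalation as I understand it: send SIGTERM, wait for a timeout, then fall back 
to SIGKILL. This is not systemd's code. The names child_code, stop_with_timeout 
and demo are made up for illustration, and the child process is a stand-in for 
a service that either honors or ignores SIGTERM.

```python
import signal
import subprocess
import sys


def child_code(ignore_term):
    """Source for a hypothetical stand-in service (not a real daemon)."""
    lines = ["import signal, time"]
    if ignore_term:
        # SIGTERM is ignorable: the process simply opts out of it.
        lines.append("signal.signal(signal.SIGTERM, signal.SIG_IGN)")
    lines.append("print('ready', flush=True)")
    lines.append("time.sleep(60)")
    return "\n".join(lines)


def stop_with_timeout(proc, timeout_s):
    """Send SIGTERM, wait up to timeout_s seconds, then fall back to SIGKILL.

    Returns the name of the signal that ended the process, or "exited"
    if the process returned on its own.
    """
    proc.terminate()  # SIGTERM: may be caught or ignored
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        proc.wait()
    if proc.returncode < 0:
        return signal.Signals(-proc.returncode).name
    return "exited"


def demo(ignore_term, timeout_s=0.5):
    """Start the stand-in service, then stop it with the given timeout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code(ignore_term)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child reports it is ready
    result = stop_with_timeout(proc, timeout_s)
    proc.stdout.close()
    return result


if __name__ == "__main__":
    print(demo(ignore_term=False))
    print(demo(ignore_term=True))
```

The point of the sketch is that the stopper only learns "exited in time" or 
"didn't". Nothing in this pattern lets the child say it is still making 
progress, which is the gap I'm asking about.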

I also don't know if privileged services doing writes to the file system can 
inhibit either remount read-only or umount? And if so, do we just wait for all 
of that to complete? I think we'd have to. I'm pretty leery of rebooting 
forcibly even if we can't remount ro because some process is holding things up, 
doing the best it can to flush. Databases and VMs do come to mind, in 
particular because I routinely run VMs on my laptop with cache mode unsafe. If 
the VM is forcibly quit, it's fine. But if the host is forcibly rebooted before 
the VM's pending writes are completed by the host, that'd be bad (regardless of 
the file system choice).

Also, I wonder if there's a way for desktops to opt into this behavior? Or a 
way for servers, IoT, cloud, and rpm-ostree-based systems to opt out? They 
very well might have legitimate reasons for very long service shutdowns: 
they're really super busy, and forward progress is being made, but it'll take 
a *lot* longer than 15 seconds to get to a safe shutdown point.



-- 
Chris Murphy