Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-17 Thread Peter Boy


> On 17.01.2023 at 22:30, Chris Murphy wrote:
> 
> 
> 
> On Tue, Jan 17, 2023, at 11:51 AM, Peter Boy wrote:
>>> On 16.01.2023 at 13:23, Lennart Poettering wrote:
>>> 
>>> Just to say this clearly btw: when we introduced the time-out initially
>>> we were coming from sysvinit where no such time-out existed at
>>> all. Hence we picked a conservative (i.e. overly long) value to not
>>> upset things too badly. And yes, some people were very much upset we
>>> now defaulted to a time-out.
>>> 
>>> If we'd start from scratch without sysvinit heritage, I think we
>>> would have started with something much much lower right-away.
>> 
>> When introducing a timeout, you obviously had the grace to choose a 
>> fairly conservative  (i.e. cautious) default value that did not lead to 
>> major problems. Would be interesting what would have been if you had 
>> started with 15 sec.
> 
> Why? it was 0 sec before systemd.

As far as I understood Lennart, there was no timeout in Sys V that killed a 
hanging process. But that is not the relevant point. 

> If anything, the time out behavior is masking problems with services not 
> shutting down in a timely manner.

It's not necessarily that. It is only one of at least 2 possibilities. 

One possibility is indeed that a service "hangs" and therefore does not 
terminate in a timely manner. This is then a bug or inappropriate programming 
in the service. And there is no point in waiting for this service, you have to 
abort, the sooner the better. 

The other possibility, especially on a highly loaded server, is that processes
impede each other in the special situation of a shutdown, because of resource
bottlenecks or resource contention. This does not depend on the individual
service, but on the multitude of services and their interdependencies. The
process is not deterministic; it is driven by chance. The time required for a
single event, i.e. an individual shutdown, is not predictable. At best, one can
approximate a range. If the range is exceeded, the assumption of a non-faulty
shutdown becomes increasingly improbable and there is no point in waiting for
any service anymore. No further improvement can be expected. You have to abort.

Unfortunately, we have no data in this case, only different "feelings". We
can't estimate a plausible range, we can only guess. And in the case of a
server, we might accept waiting a little longer in light of potential major
follow-on issues.

So, the current decision is not optimal, but OK and manageable.



>> The way it is proposed it doesn’t make a lot of sense. Desktops and 
>> servers work very differently and have different requirements. For 
>> servers, this proposal in its present form makes no sense at all, and 
>> is on the contrary dangerous.
> 
> Why? It's been said in this thread that servers come with a higher 
> expectation of rebooting upon request rather than indefinitely hanging, in 
> contrast to desktops where there can be some tolerance for delay in exchange 
> for safety.

Maybe I don’t fully understand this due to translation issues. On a server, a
reboot is a rare event. Ideally it is up 24/7/365. If I suffer the misfortune
of having to reboot the server, it doesn't matter if it takes 45 sec, 2 min or
5 min. All important services are redundant, so there is no total failure. And
the BIOS processing at startup often takes longer than any (regular) shutdown
process. So a 15 sec timeout instead of 2 min is no noticeable improvement.
The most important thing is to get back up without any damage.


> What I've seen on Fedora Server when there are services that hold things up 
> is invariably sshd does immediately quit so now I can't even log back in to 
> find out what's holding up the reboot. It's quite substantially a worse Ux 
> than on the desktop. I mean, ostensibly I know what I'm doing on my own 
> server and don't need to be second guessed like a desktop user.

Yes, it's pretty annoying that ssh always reliably stops immediately, unlike 
all other processes. It would be most helpful if systemd would terminate ssh 
last. 

> At least postgresql and libvirtd are configured to inhibit reboot/shutdown 
> indefinitely until they properly quit. Services can opt into this behavior, 
> overriding the default. But indefinite delay would  pose a bigger problem on 
> server than on desktops, due to the loss of any feedback and control.

Agreed. Nobody voted for an indefinite delay, as far as I have read the posts.
It's all about who is willing to wait for how long, and about the relevance of
possible damage.





--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org

Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-17 Thread Chris Murphy


On Tue, Jan 17, 2023, at 11:51 AM, Peter Boy wrote:
>> On 16.01.2023 at 13:23, Lennart Poettering wrote:
>> 
>> Just to say this clearly btw: when we introduced the time-out initially
>> we were coming from sysvinit where no such time-out existed at
>> all. Hence we picked a conservative (i.e. overly long) value to not
>> upset things too badly. And yes, some people were very much upset we
>> now defaulted to a time-out.
>> 
>> If we'd start from scratch without sysvinit heritage, I think we
>> would have started with something much much lower right-away.
>
> When introducing a timeout, you obviously had the grace to choose a 
> fairly conservative  (i.e. cautious) default value that did not lead to 
> major problems. Would be interesting what would have been if you had 
> started with 15 sec.

Why? It was 0 sec before systemd. If anything, the timeout behavior is masking
problems with services not shutting down in a timely manner.


>> It
>> appears to me fedora is considering switch to that now, and I
>> certainly think that would make a lot of sense.
>
> The way it is proposed it doesn’t make a lot of sense. Desktops and 
> servers work very differently and have different requirements. For 
> servers, this proposal in its present form makes no sense at all, and 
> is on the contrary dangerous.

Why? It's been said in this thread that servers come with a higher expectation 
of rebooting upon request rather than indefinitely hanging, in contrast to 
desktops where there can be some tolerance for delay in exchange for safety.

Why should a server sysadmin's request for a reboot or shutdown be second 
guessed? What are the consequences of second guessing?

What I've seen on Fedora Server when there are services that hold things up is 
invariably sshd does immediately quit so now I can't even log back in to find 
out what's holding up the reboot. It's quite substantially a worse Ux than on 
the desktop. I mean, ostensibly I know what I'm doing on my own server and 
don't need to be second guessed like a desktop user. 

At least postgresql and libvirtd are configured to inhibit reboot/shutdown 
indefinitely until they properly quit. Services can opt into this behavior, 
overriding the default. But indefinite delay would  pose a bigger problem on 
server than on desktops, due to the loss of any feedback and control.


> A strangely ignorant attitude to take a positive view of the change, 
> even if those affected, the customers, are upset and fear considerable 
> disadvantages. Only someone who is not responsible for TBs of data and 
> thousands of users can talk like this. The least you have to do is test 
> and check what effects it has and prove that the concern is unjustified.

The proposal changes a default behavior. It's not itself an override.



-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-17 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Jan 12, 2023 at 09:36:39AM -0500, Colin Walters wrote:
> Ideally, we'd have a mechanism to define timeouts like this somehow
> relative to system speed (throughput) not simple wall clock time.

That's a nice idea. Meson has '-t', which is a multiplier for test
timeouts, and it's quite useful. I guess we could add something like this
in systemd (with the usual trifecta of a config file setting, a command-line
option, and a kernel command-line option).
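
A multiplier of this kind could look roughly like the sketch below. It is
only an illustration of the idea: the function name scaled_timeout and the
environment variable STOP_TIMEOUT_MULTIPLIER are made up for this example
and are not existing systemd settings.

```python
import os

def scaled_timeout(base_seconds, multiplier=None, env=os.environ):
    """Return base_seconds scaled by a system-wide multiplier.

    The multiplier comes from the argument if given, otherwise from the
    hypothetical STOP_TIMEOUT_MULTIPLIER environment variable, otherwise 1.
    """
    if multiplier is None:
        multiplier = float(env.get("STOP_TIMEOUT_MULTIPLIER", "1"))
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return base_seconds * multiplier

# A slow, heavily loaded machine could opt into four times the default.
print(scaled_timeout(15, multiplier=4))  # 60
```

The point of a single multiplier is that individual timeouts stay as they
are and only one knob has to be turned on slow systems.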

Another option is for things that have hard-to-predict timeout needs to extend
the timeout dynamically using sd_notify with EXTEND_TIMEOUT_USEC=…. [1]
It's a bit more work for the service, but is a much more flexible solution.

[1] 
https://www.freedesktop.org/software/systemd/man/sd_notify.html#EXTEND_TIMEOUT_USEC=%E2%80%A6
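
For illustration, a service could send this message itself over the
notification socket, roughly as in the sketch below. Assumptions: the
service manager has passed NOTIFY_SOCKET to the process and accepts
notifications from it; the helper name extend_timeout is made up here.
Real services would normally call sd_notify() from libsystemd instead.

```python
import os
import socket

def extend_timeout(seconds, env=os.environ):
    """Ask the service manager for `seconds` more time; return True if sent."""
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket was passed to this process
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" prefix means abstract namespace socket
    # The value is given in microseconds.
    message = "EXTEND_TIMEOUT_USEC=%d" % int(seconds * 1_000_000)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), path)
    return True
```

A service that is still busy flushing data during shutdown would call
something like extend_timeout(10) repeatedly, each time before the
previous extension runs out.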

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-17 Thread Peter Boy


> On 16.01.2023 at 13:23, Lennart Poettering wrote:
> 
> Just to say this clearly btw: when we introduced the time-out initially
> we were coming from sysvinit where no such time-out existed at
> all. Hence we picked a conservative (i.e. overly long) value to not
> upset things too badly. And yes, some people were very much upset we
> now defaulted to a time-out.
> 
> If we'd start from scratch without sysvinit heritage, I think we
> would have started with something much much lower right-away.

When introducing a timeout, you obviously had the grace to choose a fairly
conservative (i.e. cautious) default value that did not lead to major
problems. It would be interesting to know what would have happened if you had
started with 15 sec.


> It
> appears to me fedora is considering switch to that now, and I
> certainly think that would make a lot of sense.

The way it is proposed it doesn’t make a lot of sense. Desktops and servers 
work very differently and have different requirements. For servers, this 
proposal in its present form makes no sense at all, and is on the contrary 
dangerous.

One indispensable amendment is that nothing changes for servers.


> Anyway, if fedora now wants to lower the default setup, then I
> certainly sympathize. I think a policy of "aggressive time-out by
> default, individual opt-outs per-service" is a better policy for a
> stable OS than the current "conservative time-out by default,
> individual opt-in per-service for something more aggressive".
> 
> So yes, lowering the time-outs by default would make sense to me, but
> of course, people will be upset...

It is a strangely ignorant attitude to take a positive view of the change even
though those affected, the customers, are upset and fear considerable
disadvantages. Only someone who is not responsible for TBs of data and
thousands of users can talk like this. The least you have to do is test and
check what effects it has and prove that the concern is unjustified.


--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor and board member
Java developer and enthusiast



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-16 Thread Lennart Poettering
On Wed, 11.01.23 16:35, Zbigniew Jędrzejewski-Szmek (zbys...@in.waw.pl) wrote:

> We have thousands of systemd services in Fedora. To "just add timeouts
> to things that take too long" would mean updating them individually.
> (Or maybe only some, but we don't really know which ones.)
> This is never going to happen, it's just too much work, and there is
> no clear understanding if it is "safe" for any specific service.
>
> Instead, the idea is to attack the problem from the other end: reduce
> the timeout for everyone. Once this happens, we should start getting
> feedback about which services this doesn't work for. Some services
> legitimately need a long timeout (databases, etc), and for those the
> maintainers would usually have a good idea and can extend the timeout
> easily. Some services are just buggy, and with the additional visibility
> and tracebacks, it should be much easier to diagnose why they are slow.
>
> Approaching the problem from this side is much more feasible. We'll
> probably have to touch a dozen files instead of thousands.

Just to say this clearly btw: when we introduced the time-out initially
we were coming from sysvinit where no such time-out existed at
all. Hence we picked a conservative (i.e. overly long) value to not
upset things too badly. And yes, some people were very much upset we
now defaulted to a time-out.

If we'd start from scratch without sysvinit heritage, I think we
would have started with something much much lower right-away. It
appears to me fedora is considering switch to that now, and I
certainly think that would make a lot of sense.

Anyway, if fedora now wants to lower the default setup, then I
certainly sympathize. I think a policy of "aggressive time-out by
default, individual opt-outs per-service" is a better policy for a
stable OS than the current "conservative time-out by default,
individual opt-in per-service for something more aggressive".

So yes, lowering the time-outs by default would make sense to me, but
of course, people will be upset...

Lennart

--
Lennart Poettering, Berlin


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-12 Thread Michael Catanzaro
On Thu, Jan 12 2023 at 08:31:33 PM +, Jonathan Wakely wrote:
> IIUC the difficulty is finding out which ones are being slow, but that
> could be solved by changing the signal to SIGABRT, right?


Well we still have to end with SIGKILL because SIGABRT is ignorable, so 
the new proposed behavior is SIGTERM -> wait until timeout -> SIGABRT 
-> SIGKILL. This is what the TimeoutStopFailureMode=abort configuration 
that Lennart suggested already does.
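
The escalation order described above can be sketched as follows. This is
not how systemd implements it, only a small illustration of the sequence
using a child process; the function name stop_with_escalation and the two
timeout parameters are made up for the example.

```python
import signal
import subprocess

def stop_with_escalation(proc, stop_timeout, abort_timeout):
    """Stop proc with SIGTERM, then SIGABRT, then SIGKILL.

    Returns the list of signals that had to be sent.
    """
    sent = []
    for sig, wait in ((signal.SIGTERM, stop_timeout),
                      (signal.SIGABRT, abort_timeout)):
        proc.send_signal(sig)
        sent.append(sig)
        try:
            proc.wait(timeout=wait)
            return sent  # the process exited after this signal
        except subprocess.TimeoutExpired:
            pass  # still running, escalate to the next signal
    proc.kill()  # SIGKILL cannot be caught or ignored
    sent.append(signal.SIGKILL)
    proc.wait()
    return sent
```

A process that ignores SIGTERM but does not handle SIGABRT ends at the
second step, which is the case where a core dump and a crash report
become available.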




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-12 Thread Jonathan Wakely
On Wed, 11 Jan 2023 at 16:36, Zbigniew Jędrzejewski-Szmek wrote:
>
> On Tue, Jan 10, 2023 at 06:16:07PM -0800, Kevin Fenzi wrote:
> > Ok, but it seems safer to just add timeouts to things that take too long
> > and can safely be killed off rather than lowering the timeout for
> > everything and potentially kill things that cannot safely be killed.
> >
> > I realize it's a lot more work to try and fix particular slow things.
> >
> > It's hard to know what really needs more time and what just should be
> > killed off sooner. :(
>
> We have thousands of systemd services in Fedora. To "just add timeouts
> to things that take too long" would mean updating them individually.
> (Or maybe only some, but we don't really know which ones.)
> This is never going to happen, it's just too much work, and there is
> no clear understanding if it is "safe" for any specific service.
>
> Instead, the idea is to attack the problem from the other end: reduce
> the timeout for everyone. Once this happens, we should start getting
> feedback about which services this doesn't work for. Some services
> legitimately need a long timeout (databases, etc), and for those the
> maintainers would usually have a good idea and can extend the timeout
> easily. Some services are just buggy, and with the additional visibility
> and tracebacks, it should be much easier to diagnose why they are slow.
>
> Approaching the problem from this side is much more feasible. We'll
> probably have to touch a dozen files instead of thousands.

But most of those thousands of services never cause delays at
shutdown, so don't need to be touched in either case. The default
doesn't matter for them, as they never time out anyway. The services
that do cause problems are less common, and only those ones would need
to be given a shorter timeout to solve most of the problems users
experience.

IIUC the difficulty is finding out which ones are being slow, but that
could be solved by changing the signal to SIGABRT, right?


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-12 Thread Colin Walters


On Thu, Dec 22, 2022, at 12:35 PM, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.
>
> == Summary ==
> A downstream configuration change to reduce the systemd unit timeout
> from 2 minutes to 15 seconds.

A problem I've seen in the past is that timeouts like this have very different 
effects on slow/heavily loaded systems.  For example, an OpenStack environment 
that has relatively slow storage (or a public cloud environment without 
provisioned IOPS).  Or a bare metal server that is loaded to the limit.

The effect of a 15 second timeout in those scenarios is wildly different from
that on a relatively idle desktop system with an SSD or modern NVMe drive.

DBus for example defaults to a 25 second timeout on method calls, and I've seen 
problems like the above there.

Ideally, we'd have a mechanism to define timeouts like this somehow relative to 
system speed (throughput) not simple wall clock time.

That said, I think the simplest option is for this change to apply only to
desktop systems.


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-11 Thread Michal Schorm
On Wed, Jan 11, 2023 at 5:36 PM Zbigniew Jędrzejewski-Szmek
 wrote:
> Instead, the idea is to attack the problem from the other end: reduce
> the timeout for everyone. Once this happens, we should start getting
> feedback about which services this doesn't work for.

Sounds terrible to me.
Could we achieve this without knowingly breaking users' machine
shutdowns? Or annoying the user base in general?
I can imagine a new standalone package that only changes the timeout,
so that only people willing to test the short shutdowns would participate
and install it.

--

If somehow we end up with the terrible approach of "change crucial
system-wide config and see what happens", could we at least do it step
by step?
Shortening the timeout by e.g. 10 or 30 seconds every Fedora release?
30 minutes spent every half a year copy-pasting the Fedora change
from the last release sounds like a solid tradeoff for not ending up
with upset users.

--

A different configuration for different Fedora editions doesn't
sound right to me.
The edition originally installed doesn't tell us anything about any other
software installed on that machine later.

Hunting for well-known trouble-making services (regarding shutdown
times) is a much cleaner approach.

--

Is there any existing piece of software that can analyze the shutdowns
and report the mean times and messages of each service / component ?
I'd use it on my devices with problematic shutdowns.

And I personally believe that strictly opt-in telemetry is a good way
to gather user (telemetric) data.

--

Michal Schorm
Software Engineer
Core Services - Databases Team
Red Hat

--

On Wed, Jan 11, 2023 at 9:56 PM Michael Catanzaro  wrote:
>
> On Wed, Jan 11 2023 at 11:48:27 AM -0800, Kevin Fenzi 
> wrote:
> > I do appreciate the change to do an abort on these units. Will that
> > get
> > reported via abrt?
>
> It should, yes, the same as any other crash.
>


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-11 Thread Michael Catanzaro
On Wed, Jan 11 2023 at 11:48:27 AM -0800, Kevin Fenzi wrote:
> I do appreciate the change to do an abort on these units. Will that get
> reported via abrt?


It should, yes, the same as any other crash.



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-11 Thread Kevin Fenzi
On Wed, Jan 11, 2023 at 04:35:33PM +, Zbigniew Jędrzejewski-Szmek wrote:
> 
> We have thousands of systemd services in Fedora. To "just add timeouts
> to things that take too long" would mean updating them individually.
> (Or maybe only some, but we don't really know which ones.)

Sure, although you can get user reports of them. 

> This is never going to happen, it's just too much work, and there is
> no clear understanding if it is "safe" for any specific service.

Well, I would hope package maintainers would be able to know/figure this
out for their packages? I realize there are a lot of variables there, but
if a service is ok with the current timeout, but not ok with the new
proposed timeout, the maintainer should be able to figure out why and fix
it, or set the timeout back to the old value?

> Instead, the idea is to attack the problem from the other end: reduce
> the timeout for everyone. Once this happens, we should start getting
> feedback about which services this doesn't work for. Some services
> legitimately need a long timeout (databases, etc), and for those the
> maintainers would usually have a good idea and can extend the timeout
> easily. Some services are just buggy, and with the additional visibility
> and tracebacks, it should be much easier to diagnose why they are slow.
> 
> Approaching the problem from this side is much more feasible. We'll
> probably have to touch a dozen files instead of thousands.

Well, yes, for you, but it's not so great for the user with the dead
modem or broken databases. :(

But I guess in the end a service that is ok with 120 seconds, but not ok
with 15 seconds would hopefully be quite rare. 

I do appreciate the change to do an abort on these units. Will that get
reported via abrt?

kevin




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-11 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jan 10, 2023 at 06:16:07PM -0800, Kevin Fenzi wrote:
> On Tue, Jan 10, 2023 at 06:00:59PM -0600, Michael Catanzaro wrote:
> > On Tue, Jan 10 2023 at 03:19:10 PM -0800, Kevin Fenzi 
> > wrote:
> > > Is there something wrong with that approach that I am not understanding?
> > 
> > No, I don't think you're missing anything. That should work fine for
> > PackageKit. But of course it won't do a thing to help with other
> > services that are misbehaving! The goal here is to make the operating system
> > generally more robust.
> 
> Ok, but it seems safer to just add timeouts to things that take too long
> and can safely be killed off rather than lowering the timeout for
> everything and potentially kill things that cannot safely be killed. 
> 
> I realize it's a lot more work to try and fix particular slow things.
> 
> It's hard to know what really needs more time and what just should be
> killed off sooner. :(

We have thousands of systemd services in Fedora. To "just add timeouts
to things that take too long" would mean updating them individually.
(Or maybe only some, but we don't really know which ones.)
This is never going to happen, it's just too much work, and there is
no clear understanding if it is "safe" for any specific service.

Instead, the idea is to attack the problem from the other end: reduce
the timeout for everyone. Once this happens, we should start getting
feedback about which services this doesn't work for. Some services
legitimately need a long timeout (databases, etc), and for those the
maintainers would usually have a good idea and can extend the timeout
easily. Some services are just buggy, and with the additional visibility
and tracebacks, it should be much easier to diagnose why they are slow.

Approaching the problem from this side is much more feasible. We'll
probably have to touch a dozen files instead of thousands.
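
To give an idea of how small such a per-service extension is, here is a
sketch that generates a drop-in with a longer TimeoutStopSec= for a single
unit. The unit name exampledb.service, the drop-in file name and the helper
names are made up for the example; a package maintainer would more likely
set the value directly in the shipped unit file.

```python
def render_stop_timeout_dropin(seconds):
    """Return the text of a drop-in that sets TimeoutStopSec= for one unit."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return "[Service]\nTimeoutStopSec=%ds\n" % seconds

def dropin_path(unit, name="50-stop-timeout.conf"):
    """Return the drop-in location for local configuration of a unit."""
    return "/etc/systemd/system/%s.d/%s" % (unit, name)

# Hypothetical database service that needs two minutes to shut down cleanly.
print(dropin_path("exampledb.service"))
print(render_stop_timeout_dropin(120), end="")
```

Every other service keeps the lower system-wide default without any change
to its unit file.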

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-11 Thread Michael Catanzaro
On Tue, Jan 10 2023 at 06:00:59 PM -0600, Michael Catanzaro wrote:
> I'm going to amend this proposal to specify
> TimeoutStopFailureMode=abort as well, so we can get crash dumps and
> bug reports when programs fail to quit properly.


By the way, the goal with this is to surface bugs and dodge any
concerns that using a shorter shutdown timer will hide bugs. SIGKILL is
basically silent and easily goes unnoticed. But it's a lot harder for
developers to ignore a crash report.




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-10 Thread Kevin Fenzi
On Tue, Jan 10, 2023 at 06:00:59PM -0600, Michael Catanzaro wrote:
> On Tue, Jan 10 2023 at 03:19:10 PM -0800, Kevin Fenzi 
> wrote:
> > Is there something wrong with that approach that I am not understanding?
> 
> No, I don't think you're missing anything. That should work fine for
> PackageKit. But of course it won't do a thing to help with other
> services that are misbehaving! The goal here is to make the operating system
> generally more robust.

Ok, but it seems safer to just add timeouts to things that take too long
and can safely be killed off rather than lowering the timeout for
everything and potentially killing things that cannot safely be killed.

I realize it's a lot more work to try and fix particular slow things.

It's hard to know what really needs more time and what just should be
killed off sooner. :(

kevin




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-10 Thread Michael Catanzaro
On Tue, Jan 10 2023 at 03:19:10 PM -0800, Kevin Fenzi wrote:
> Is there something wrong with that approach that I am not understanding?


No, I don't think you're missing anything. That should work fine for
PackageKit. But of course it won't do a thing to help with other
services that are misbehaving! The goal here is to make the operating
system generally more robust.


I'm going to amend this proposal to specify 
TimeoutStopFailureMode=abort as well, so we can get crash dumps and bug 
reports when programs fail to quit properly.
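
As a rough per-unit illustration of that setting (a sketch only: the drop-in
path and the choice of packagekit.service as the unit are assumptions made for
the example, not part of the proposal):

```ini
# /etc/systemd/system/packagekit.service.d/50-stop-abort.conf  (hypothetical path)
[Service]
# When the stop timeout is hit, send WatchdogSignal= (SIGABRT by default)
# so that a core dump can be written, instead of terminating silently.
TimeoutStopFailureMode=abort
```

A `systemctl daemon-reload` would be needed before the drop-in takes effect.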


Michael



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-10 Thread Kevin Fenzi
On Mon, Jan 09, 2023 at 08:45:43PM +, Zbigniew Jędrzejewski-Szmek wrote:
> 
> The current default is mostly arbitrary. It was just selected as a nice round
> value, in the spirit of "let's pick something large enough to be larger than
> any realistic process will ever need".
> 
> I think you're misinterpreting Michael's words that "it's safe enough to
> ignore this problem".
> IIUC, the idea is to set a longer timeout in those cases at the service level.
> I.e. the problem is "ignored" only in the sense of the system-wide default
> being smaller, and the specific services setting a higher timeout as required.
> 
> Also, even with the current high defaults, some services still actually time
> out. If something bad happens in that case, it is already happening. This is
> bad for users in at least two ways. First, because they have to wait and wait,
> and second because the timeout is actually hit so things *do* get terminated
> but when this happens, we do nothing. The idea would be to lower the default
> timeouts, but also approach any cases where we hit the timeout much more
> seriously.

I'm a bit confused why we can't just fix the units that are taking too
long instead of changing the global value. The change page mentions that
"it's not possible to fix every misbehaving service: in some cases the
misbehaviour comes from design flaws that are difficult to resolve." but
can't we just change their timeout? I.e., add to packagekitd's service
file: 
TimeoutStopSec=30s
(or 15 or whatever)?

Is there something wrong with that approach that I am not understanding?

kevin




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-10 Thread Michael Catanzaro
On Mon, Jan 9 2023 at 11:04:11 AM +0100, Lennart Poettering wrote:
> That said: dumping core is potentially extremely expensive (web
> browsers have gigabytes of virtual memory that we might end up
> processing and compressing). Quite often the stuff that is slow when
> exiting is also the stuff that is expensive to dump.


Web browser core dumps will fail by default due to the 2 GB limit on 
core dumps, after which systemd-coredump truncates the core dump. I 
think we should raise the default core size limit to at least 20 GB.
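
For reference, the limits in question are set in coredump.conf. A sketch of
what such an override could look like, assuming the 20 GB figure above and a
hypothetical drop-in file name:

```ini
# /etc/systemd/coredump.conf.d/50-size.conf  (hypothetical file name)
[Coredump]
# Largest core for which systemd-coredump will still generate a backtrace.
ProcessSizeMax=20G
# Largest core that will be saved as a separate file on disk.
ExternalSizeMax=20G
```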




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-09 Thread Zbigniew Jędrzejewski-Szmek
On Sat, Jan 07, 2023 at 12:59:26AM +0100, Peter Boy wrote:
> 
> > On 06.01.2023 at 18:06, Michael Catanzaro wrote:
> > 
> > ...
> > 
> > I think most of the feedback on this change can be summarized as:
> > 
> > (a) Specific services want longer timeouts.
> > 
> > This can already be configured via existing configuration mechanisms, so I 
> > think it's safe enough to ignore this problem. E.g. if a quick shutdown 
> > will brick your Pinephone modem or corrupt your database, then whatever 
> > service is involved there should request a larger timeout.
> 
> As several posts have shown, it is specifically not safe to ignore the 
> problem. It is a mystery to me how you can come to this assessment. 
> 
> We don't know if all affected services explicitly request a longer timeout. 
> We don't have a test procedure nor a QA criterion for this that is testable. 
> We don't know how many rely on the current default timeout because it has 
> worked so far. And in view of these known circumstances to introduce a "quick 
> shutdown" so nonchalantly and without exact data and tests is simply 
> irresponsible and endangers the good reputation of the distribution and 
> especially Fedora Server known to run reliably stable with (or in spite of) a 
> quick release sequence. 
> 
> And it does not take into account in any way the other fact, expressed here 
> in several posts, that it is not a problem of individual, singular processes, 
> but the interaction of several processes in the specific shutdown situation, 
> whereby individual processes can not terminate themselves as quickly as they 
> do in normal circumstances. And it's obviously a non-determinant random 
> process that turns out differently for each shutdown.
> 
> The current timeout may not be perfect, but long experience shows that in the 
> vast majority of cases the value results in a safe, uncorrupted shutdown.  We 
> do not have a wave of complaints about system corruption after shutdown.
> 
> And the current value may be the result of a wild guess. I do not know how it 
> was achieved. But replacing one wild guess with another wild guess that 
> introduces additional, unpredictable risks is not a sound and robust approach 
> (and that is true not only for server, by the way).

The current default is mostly arbitrary. It was just selected as a nice round
value, in the spirit of "let's pick something large enough to be larger than any
realistic process will ever need".

I think you're misinterpreting Michael's words that "it's safe enough to ignore 
this problem".
IIUC, the idea is to set a longer timeout in those cases at the service level.
I.e. the problem is "ignored" only in the sense of the system-wide default being
smaller, and the specific services setting a higher timeout as required.

Also, even with the current high defaults, some services still actually time
out. If something bad happens in that case, it is already happening. This is bad
for users in at least two ways. First, because they have to wait and wait, and
second because the timeout is actually hit so things *do* get terminated but
when this happens, we do nothing. The idea would be to lower the default
timeouts, but also approach any cases where we hit the timeout much more
seriously.

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-09 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Jan 09, 2023 at 11:04:11AM +0100, Lennart Poettering wrote:
> On Fri, 06.01.23 11:06, Michael Catanzaro (mcatanz...@redhat.com) wrote:
> 
> > Maybe instead of SIGKILL, we should send SIGQUIT instead. That way abrt
> > should complain next time you boot and users will have an opportunity to
> > report bugs to the package maintainer, instead of the problem being forever
> > ignored. Killing things silently makes it real hard to report bugs. And as a
> > bonus, the core dump should actually show what the process was doing at the
> > time it got killed. The more I think about it, the better this sounds.
> > Currently this can be configured using FinalKillSignal=SIGQUIT, so we'd just
> > need to figure out the right place to put that.
>
> > systemd already has a configuration option for this so we'd just have to
> > turn it on.
> 
> Don't use FinalKillSignal=SIGQUIT.
> 
> Use TimeoutStopFailureMode=abort instead. (which covers more ground,
> and sends SIGABRT rather than SIGQUIT on failure, which has the same
> effect: coredumping).

I guess we could add DefaultTimeoutStopFailureMode= setting and a
-Ddefault-default-timeout-stop-failure-mode= compile-time default for it.

Barring that, it's possible to do per-type drop-ins:
/usr/lib/systemd/system/{service,scope,mount}.d/10-kill-mode.conf
or so, maybe for more types. But that'd be harder to override and more
messy in general.
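
A minimal sketch of the per-type variant, for services only. It assumes a
systemd version that supports type-level drop-in directories, and it does not
verify whether scope and mount units accept the same setting:

```ini
# /usr/lib/systemd/system/service.d/10-kill-mode.conf
[Service]
# Applies to every .service unit that does not override it itself.
TimeoutStopFailureMode=abort
```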

> That said: dumping core is potentially extremely expensive (web
> browsers have gigabytes of virtual memory that we might end up
> processing and compressing). Quite often the stuff that is slow when
> exiting is also the stuff that is expensive to dump.
> 
> Hence, I am not sure you'll gain that much via this mechanism: you cut
> a long operation short and then execute long operation as result. You
> might end delaying things more than you hope shortening them.

That is true, but I don't think that it's an actual reason to not do this. The
job for the coredump gets a separate timeout, so the coredump would generally
run successfully during shutdown.

It'll obviously delay the shutdown, making the whole thing even more painful.
I assume that we would treat any such cases as bugs. If we get the coredumps
reported through abrt, it'd indeed make it easier to diagnose those cases.

--

Digging into some details:

It seems that coredumping usually takes a few seconds at most, even with
gigabytes of RSS. I won't cite specific numbers, since that's just a very
biased sample on my laptop gathered via
  journalctl --grep 'systemd-coredump@.*: Consumed'

If the default stop timeout is set to 15s, we would probably have to raise the
timeout for the systemd-coredump@.service to something higher. This would let
the coredump process run successfully in most cases.
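
Such a bump could be shipped as a drop-in for the template unit. A sketch, with
the file name and the 2min value picked purely for illustration:

```ini
# /usr/lib/systemd/system/systemd-coredump@.service.d/10-stop-timeout.conf  (hypothetical)
[Service]
# Give the coredump handler more time than a shortened system-wide default.
TimeoutStopSec=2min
```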

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-09 Thread Matthias Clasen
On Sat, Jan 7, 2023 at 9:31 AM Giuseppe Scrivano 
wrote:

>
> I've just opened a PR upstream for Podman to kill -9 all the remaining
> exec sessions when the container process terminates, so both --pid=host
> and --pid=private behaves in the same way.  It would solve the issue we
> are seeing.
>
>
That is fantastic. Thanks, Giuseppe!


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-09 Thread Lennart Poettering
On Fri, 06.01.23 11:06, Michael Catanzaro (mcatanz...@redhat.com) wrote:

> Maybe instead of SIGKILL, we should send SIGQUIT instead. That way abrt
> should complain next time you boot and users will have an opportunity to
> report bugs to the package maintainer, instead of the problem being forever
> ignored. Killing things silently makes it real hard to report bugs. And as a
> bonus, the core dump should actually show what the process was doing at the
> time it got killed. The more I think about it, the better this sounds.
> Currently this can be configured using FinalKillSignal=SIGQUIT, so we'd just
> need to figure out the right place to put that.
>
> systemd already has a configuration option for this so we'd just have to
> turn it on.

Don't use FinalKillSignal=SIGQUIT.

Use TimeoutStopFailureMode=abort instead. (which covers more ground,
and sends SIGABRT rather than SIGQUIT on failure, which has the same
effect: coredumping).

That said: dumping core is potentially extremely expensive (web
browsers have gigabytes of virtual memory that we might end up
processing and compressing). Quite often the stuff that is slow when
exiting is also the stuff that is expensive to dump.

Hence, I am not sure you'll gain that much via this mechanism: you cut
a long operation short and then execute another long operation as a
result. You might end up delaying things more than you hoped to shorten
them.

Lennart

--
Lennart Poettering, Berlin


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-06 Thread Peter Boy

> On 06.01.2023 at 18:06, Michael Catanzaro wrote:
> 
> ...
> 
> I think most of the feedback on this change can be summarized as:
> 
> (a) Specific services want longer timeouts.
> 
> This can already be configured via existing configuration mechanisms, so I 
> think it's safe enough to ignore this problem. E.g. if a quick shutdown will 
> brick your Pinephone modem or corrupt your database, then whatever service is 
> involved there should request a larger timeout.

As several posts have shown, it is specifically not safe to ignore the problem. 
It is a mystery to me how you can come to this assessment. 

We don't know whether all affected services explicitly request a longer
timeout. We have neither a test procedure nor a testable QA criterion for this.
We don't know how many services rely on the current default timeout simply
because it has worked so far. Given these known circumstances, introducing a
"quick shutdown" so nonchalantly, without exact data and tests, is simply
irresponsible. It endangers the good reputation of the distribution, and
especially of Fedora Server, which is known to run reliably and stably with (or
in spite of) a rapid release cycle.

Nor does it take into account the other fact, expressed here in several posts:
the problem is not one of individual processes, but of the interaction of
several processes in the specific shutdown situation, in which individual
processes cannot terminate as quickly as they do under normal circumstances.
This is evidently a non-deterministic process that turns out differently for
each shutdown.

The current timeout may not be perfect, but long experience shows that in the
vast majority of cases the value results in a safe, uncorrupted shutdown. We
do not have a wave of complaints about system corruption after shutdown.

And the current value may be the result of a wild guess. I do not know how it
was arrived at. But replacing one wild guess with another wild guess that
introduces additional, unpredictable risks is not a sound and robust approach
(and that is true not only for Server, by the way).


> (b) Also, Fedora Server wants to opt out of this change entirely.
> 
> But I think all other Fedora editions and spins do want this change, so we 
> shouldn't make it a Workstation-specific change. Maybe we can change systemd 
> defaults and Fedora Server could install a configuration override?

You are welcome to do the work on such a configuration override. If you can 
guarantee to successfully complete this task in time, that's OK. Our work 
schedule is already at capacity, and without free resources, unfortunately.






--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor and board member
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-06 Thread Michael Catanzaro
On Fri, Jan 6 2023 at 11:06:26 AM -0600, Michael Catanzaro wrote:
> (a) Specific services want longer timeouts.
>
> This can already be configured via existing configuration mechanisms,
> so I think it's safe enough to ignore this problem. E.g. if a quick
> shutdown will brick your Pinephone modem or corrupt your database,
> then whatever service is involved there should request a larger
> timeout.


Erm, at least... I think that would work? We'd need to make sure



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-06 Thread Michael Catanzaro



On Fri, Jan 6 2023 at 09:47:29 AM -0500, Matthias Clasen wrote:
> On my Silverblue system, the main offender for this is podman.
>
> As soon as I have a toolbox running, conmon holds up the reboot for a
> very long time because it refuses to shutdown properly.


Maybe instead of SIGKILL, we should send SIGQUIT. That way abrt 
should complain next time you boot and users will have an opportunity 
to report bugs to the package maintainer, instead of the problem being 
forever ignored. Killing things silently makes it real hard to report 
bugs. And as a bonus, the core dump should actually show what the 
process was doing at the time it got killed. The more I think about it, 
the better this sounds. Currently this can be configured using 
FinalKillSignal=SIGQUIT, so we'd just need to figure out the right 
place to put that.


systemd already has a configuration option for this so we'd just have 
to turn it on.


I think most of the feedback on this change can be summarized as:

(a) Specific services want longer timeouts.

This can already be configured via existing configuration mechanisms, 
so I think it's safe enough to ignore this problem. E.g. if a quick 
shutdown will brick your Pinephone modem or corrupt your database, then 
whatever service is involved there should request a larger timeout.


(b) Also, Fedora Server wants to opt out of this change entirely.

But I think all other Fedora editions and spins do want this change, so 
we shouldn't make it a Workstation-specific change. Maybe we can change 
systemd defaults and Fedora Server could install a configuration 
override?
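
Such an override could be a manager-level drop-in shipped by a Server-specific
package. A sketch, assuming Server wants to stay at systemd's upstream default
of 90s; the file name is hypothetical:

```ini
# /usr/lib/systemd/system.conf.d/50-server-timeouts.conf  (hypothetical file name)
[Manager]
# Keep the longer stop timeout even if the distribution-wide default is lowered.
DefaultTimeoutStopSec=90s
```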


Michael



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-06 Thread Ralf Corsépius



On 30.12.22 at 10:42, Peter Boy wrote:
>> On 30.12.2022 at 06:59, Nico Kadel-Garcia wrote:
>>
>>> On 28.12.22 at 11:49, Peter Boy wrote:
>>>>
>>>> It is a good idea to make the timeout configurable.  But the default
>>>> timeout for servers must remain unchanged.
>>>
>>> My problem is not "defined timeouts" it is systemd delaying shutdowns
>>> for no obvious reasons.
>> ...
>>> And as you asked: On my (bare metal) servers, I am occasionally
>>> experiencing delayed shutdowns in the order of several minutes.
>>>
>>> This is simply unacceptable!
>
> If it is not acceptable for you, configure it to your needs.


FWIW: Yesterday, I had an infinite shutdown hang that wasn't caused by
systemd timers and that I could only work around by pressing the reset
switch.

Unfortunately, I don't know the cause, but it was apparently something
related to a non-responsive NFS connection.


Ralf


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-06 Thread Matthias Clasen
On my Silverblue system, the main offender for this is podman.

As soon as I have a toolbox running, conmon holds up the reboot for a very
long time because it refuses to shut down properly.


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-05 Thread Richard W.M. Jones
On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
> The most common service to cause this issue is PackageKit, but there
> are others.

NFSv4 unmounts too.  I think there's some ordering issue.  I use NFS
everywhere and this delay is frustrating, so a shorter delay would be
welcome.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2023-01-03 Thread Peter Boy


> On 30.12.2022 at 19:45, Nico Kadel-Garcia wrote:
> 
> We need to be cautious about
> not being able to personally picture why someone would use an existing
> default and overriding it casually, and inflicting our new logic on
> the unsuspecting existing userbase.

+1!




--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-30 Thread Nico Kadel-Garcia
On Fri, Dec 30, 2022 at 12:02 PM Tomasz Torcz  wrote:
>
> On Fri, Dec 30, 2022 at 12:59:19AM -0500, Nico Kadel-Garcia wrote:
> > On Wed, Dec 28, 2022 at 7:01 AM Ralf Corsépius  wrote:
> > >
> > > > On 28.12.22 at 11:49, Peter Boy wrote:
> > > >
> > > > It is a good idea to make the timeout configurable.  But the default 
> > > > timeout for servers must remain unchanged.
> > >
> > > My problem is not "defined timeouts" it is systemd delaying shutdowns
> > > for no obvious reasons.
> >
> > You've apparently not encountered the corruption of a database under
> > heavy load where the cache where swapspace has not yet been propagated
> > to disk. Imagine a server running a lot of virtual machines for an
> > image of what an overly aggressive shutdown timeout can do to your
> > otherwise stable systems.
>
>   This sounds serious, and this is the situation in which default
> setting is not correct, no matter if its 15 seconds or 120 seconds.
> The database and VM services should define own timeout (it goes from 0
> to infinity, plenty of values to choose from).

I didn't mean to give Ralf a hard time.

systemd has been used to inflict unwelcome timeouts before, so it
should be modified only with caution. I'm especially thinking of that
infamous "let's make systemd responsible for ending logins" change
that broke screen, nohup, tmux, and leaving background tasks running.
(See https://lwn.net/Articles/690151/ ) We need to be cautious about
casually overriding an existing default just because we cannot
personally picture why someone would rely on it, and about inflicting
our new logic on the unsuspecting existing userbase.


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-30 Thread Tomasz Torcz
On Fri, Dec 30, 2022 at 12:59:19AM -0500, Nico Kadel-Garcia wrote:
> On Wed, Dec 28, 2022 at 7:01 AM Ralf Corsépius  wrote:
> >
> > On 28.12.22 at 11:49, Peter Boy wrote:
> > >
> > > It is a good idea to make the timeout configurable.  But the default 
> > > timeout for servers must remain unchanged.
> >
> > My problem is not "defined timeouts" it is systemd delaying shutdowns
> > for no obvious reasons.
> 
> You've apparently not encountered the corruption of a database under
> heavy load where the cache where swapspace has not yet been propagated
> to disk. Imagine a server running a lot of virtual machines for an
> image of what an overly aggressive shutdown timeout can do to your
> otherwise stable systems.

  This sounds serious, and this is a situation in which the default
setting is not correct, no matter if it's 15 seconds or 120 seconds.
The database and VM services should define their own timeout (it goes
from 0 to infinity, plenty of values to choose from).
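
For example, a database package could ship something like the following. The
unit name and the ten-minute value are made up for this sketch:

```ini
# /usr/lib/systemd/system/exampledb.service.d/10-stop-timeout.conf  (hypothetical unit)
[Service]
# Allow a slow but healthy flush to finish; "infinity" would disable the timeout.
TimeoutStopSec=10min
```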

-- 
Tomasz Torcz           “If you try to upissue this patchset I shall be seeking
to...@pipebreaker.pl    an IP-routable hand grenade.” -- Andrew Morton (LKML)


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-30 Thread Michael Catanzaro
On Fri, Dec 30 2022 at 10:42:29 AM +0100, Peter Boy wrote:
> But the **default** values must provide the most safe operation
> possible and not require any intervention from the system
> administrator to achieve that.

I think we need to find some way for Workstation to have different
defaults than Server. Unfortunately, as the change is currently
implemented, it will apply to all Fedora editions.




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-30 Thread Roberto Ragusa

On 12/30/22 06:59, Nico Kadel-Garcia wrote:
> You've apparently not encountered the corruption of a database under
> heavy load where the cache where swapspace has not yet been propagated
> to disk. Imagine a server running a lot of virtual machines for an
> image of what an overly aggressive shutdown timeout can do to your
> otherwise stable systems.


Wait a moment, if you have memory cached data (dirty pages), the stuff
will reach the disk whatever you do to the processes; the kernel will
absolutely write any dirty page to disk when unmounting the fs.

The problem you are describing can only happen if your database is in
a VM, which gets killed during operation.
But killing a VM is equivalent to suddenly powering off a bare-metal
machine, and if your DB becomes corrupted because of this, it is possibly
low-quality software, abusing the "DB" name.
Additionally, there are options governing how a "sync" in the VM
should be handled (e.g. assure data is in host RAM vs assure
data is in host disks).

Regards.
--
   Roberto Ragusa    mail at robertoragusa.it


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-30 Thread Peter Boy


> On 30.12.2022 at 06:59, Nico Kadel-Garcia wrote:
> 
>> On 28.12.22 at 11:49, Peter Boy wrote:
>>> 
>>> It is a good idea to make the timeout configurable.  But the default 
>>> timeout for servers must remain unchanged.
>> 
>> My problem is not "defined timeouts" it is systemd delaying shutdowns
>> for no obvious reasons.
...
>> And as you asked: On my (bare metal) servers, I am occasionally
>> experiencing delayed shutdowns in the order of several minutes.
>> 
>> This is simply unacceptable!

If it is not acceptable for you, configure it to your needs.

But the **default** values must provide the safest operation possible and not
require any intervention from the system administrator to achieve that.

This has been a fundamental and outstanding principle of Fedora for a long
time, maybe even from the beginning. And I still remember, very unpleasantly,
the times when I could of course go online with a default Fedora installation
without worrying, whereas Debian/Ubuntu, to my surprise, did not even install
a firewall, let alone a safely preconfigured one.


The current values may be too high. But they have proven themselves at least. 
And as long as we don't have reliable data for more optimal values, it would be 
**negligent** to shorten them out of sheer impatience. The long time frames 
only come into effect anyway when something doesn't run as smoothly as one 
would like. And it is precisely in such a case that caution is called for.



--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-29 Thread Frank Crawford
On Fri, 2022-12-30 at 00:59 -0500, Nico Kadel-Garcia wrote:
> On Wed, Dec 28, 2022 at 7:01 AM Ralf Corsépius 
> wrote:
> > 
> > On 28.12.22 at 11:49, Peter Boy wrote:
> > > 
> > > It is a good idea to make the timeout configurable.  But the
> > > default timeout for servers must remain unchanged.
> > 
> > My problem is not "defined timeouts" it is systemd delaying
> > shutdowns
> > for no obvious reasons.
> 
> You've apparently not encountered the corruption of a database under
> heavy load where the cache where swapspace has not yet been
> propagated
> to disk. Imagine a server running a lot of virtual machines for an
> image of what an overly aggressive shutdown timeout can do to your
> otherwise stable systems.
> 
I should ask here, have you timed how long you need the shutdown to be?
Is the current default of 90 sec between progressively stronger signals
sufficient?

Also, given all these complaints about shortening the timeout, how many
people know or have got around to changing either the default or the
timeout for a specific service?

This is all configurable (and yes, I have previously changed the
default, because I felt it was too long).  However, I even found out
while investigating this email, that it is possible for a service to
ask for more time on the fly, although it does take coding with the
appropriate systemd API.

I would suggest reading the man page "systemd.kill" for a better idea
on what actually happens and what is possible.
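
For reference, the "ask for more time" mechanism appears to be the
`EXTEND_TIMEOUT_USEC=` message described in sd_notify(3). A minimal Python
sketch follows, assuming the unit accepts notifications from the service (for
example Type=notify or a suitable NotifyAccess= setting). The helper names
`extend_timeout_message` and `notify` are made up for this example.

```python
import os
import socket


def extend_timeout_message(extra_seconds: float) -> bytes:
    """Build the notification payload asking for `extra_seconds` more time."""
    return b"EXTEND_TIMEOUT_USEC=%d" % int(extra_seconds * 1_000_000)


def notify(message: bytes) -> bool:
    """Send `message` to the service manager's notification socket.

    Returns False when NOTIFY_SOCKET is not set, i.e. when the process is
    not running under a manager that listens for notifications.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" denotes a Linux abstract socket, whose real name
        # starts with a NUL byte instead.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True
```

A service that is still flushing data in its SIGTERM handler would call
`notify(extend_timeout_message(30))` periodically. Per the man page, the
message has to be repeated before the requested interval runs out, otherwise
the normal timeout handling resumes.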

Regards
Frank


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-29 Thread Nico Kadel-Garcia
On Wed, Dec 28, 2022 at 7:01 AM Ralf Corsépius  wrote:
>
> On 28.12.22 at 11:49, Peter Boy wrote:
> >
> > It is a good idea to make the timeout configurable.  But the default 
> > timeout for servers must remain unchanged.
>
> My problem is not "defined timeouts" it is systemd delaying shutdowns
> for no obvious reasons.

You've apparently not encountered the corruption of a database under
heavy load where the cache or swap space has not yet been propagated
to disk. Imagine a server running a lot of virtual machines for an
image of what an overly aggressive shutdown timeout can do to your
otherwise stable systems.

> And as you asked: On my (bare metal) servers, I am occasionally
> experiencing delayed shutdowns in the order of several minutes.
>
> This is simply unacceptable!
>
> Ralf

I'm assuming you have very busy filesystems, perhaps not well
configured for their load. There are databases that can be *really*
corrupted by interrupted shutdowns, especially when the change has
been written to the disk cache but not yet committed to disk. And some
network services, like NFS, can take way too long to shut down
gracefully, but risk the upstream server if clients are forcefully
shut down.

Nico Kadel-Garcia


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Björn Persson
Fabio Valentini wrote:
> Even if systemd prints nice diagnostic messages, they're useless if
> nobody is going to see them.
> And I doubt that many people know that pressing the Esc key makes
> plymouth go away.

Quite. Troubleshooting information as an Easter egg! Seriously people,
is there some competition to produce the most textless user interface?

Björn Persson




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Stephen Smoogen
On Wed, 28 Dec 2022 at 08:45, Peter Boy  wrote:

>
>
> > On 28.12.2022 at 13:00, Ralf Corsépius wrote:
> >
> >
> >
> > On 28.12.22 at 11:49, Peter Boy wrote:
> >> It is a good idea to make the timeout configurable.  But the default
> >> timeout for servers must remain unchanged.
> >
> > My problem is not "defined timeouts" it is systemd delaying shutdowns
> > for no obvious reasons.
>
> Yes, but instead of just „pulling the plug“ wouldn’t it be better to hunt
> for the reasons?
>
>
Most of the time, system administrators don't have time to hunt for the
reasons because something else is going to happen (like a UPS dropping
power) or a dozen other things. And once that is fixed, the server is going
to stay up until the next major incident where I need to have everything
rebooted/off in an outage window. Theoretically this testing should be done
in a staging environment, but I have only seen 3 places in 40 years with any
time or ability to do so. Most of the places I have worked have had
'staging' environments which are really spare parts for production or
actually in some level of production for other departments to 'stage' their
code.

I think 30 seconds is going to be a better fit for most services. Ones
which need a longer time can override it in their service files and those
will be easier to find.
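
A per-service override of this kind is normally a small drop-in file that sets
`TimeoutStopSec=` in the `[Service]` section. The Python sketch below only
renders the drop-in text and the conventional path. The unit name
`mydb.service` and both function names are hypothetical, and the 300-second
value is an arbitrary example, not a recommendation.

```python
def render_stop_timeout_dropin(seconds: int) -> str:
    """Return the text of a drop-in that sets the stop timeout for one unit."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return f"[Service]\nTimeoutStopSec={seconds}s\n"


def dropin_path(unit: str, name: str = "override.conf") -> str:
    """Return the conventional location of an administrator drop-in."""
    return f"/etc/systemd/system/{unit}.d/{name}"


# Example: a hypothetical database unit that needs five minutes to stop.
print(dropin_path("mydb.service"))
print(render_stop_timeout_dropin(300), end="")
```

After such a file is written, `systemctl daemon-reload` is needed before the
new value takes effect.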



>
> > And as you asked: On my (bare metal) servers, I am occasionally
> > experiencing delayed shutdowns in the order of several minutes.
> >
> > This is simply unacceptable!
>
> Yes, but then always simply pulling the plug is not acceptable either, I
> think.
>
>
>


-- 
Stephen Smoogen, Red Hat Automotive
Let us be kind to one another, for most of us are fighting a hard battle.
-- Ian MacClaren


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Peter Boy


> On 28.12.2022 at 15:23, Neal Gompa wrote:
> 
> On Wed, Dec 28, 2022 at 9:13 AM Peter Boy  wrote:
>> 
>> 
>> 
>>> On 28.12.2022 at 13:34, Neal Gompa wrote:
>>> 
>>> On Wed, Dec 28, 2022 at 7:25 AM Frank Crawford wrote:
>>>> 
>>>> ...
>>>> 
>>>> I'd also note that this has always been configurable, it is just now
>>>> suggesting a different value from the default.
>>> 
>>> I would also suggest that if you're shutting down/rebooting a server,
>>> then you really want that to be done sooner rather than later, so
>>> stuff getting killed as it shuts down if it doesn't do it normally
>>> after SIGTERM is probably fine, because they're likely hung.
>>> 
>>> Waiting a half hour for a system to reboot is not acceptable.
>> 
>> Indeed, but it is not acceptable because the configuration of the timeout is 
>> too long, but because there is probably something earnestly going wrong. And 
>> as a server admin, I would like to decide myself to wait patiently and hope, 
>> to evaluate the issue, or to pull the plug. I don't want to leave that to a 
>> stupid timeout.
>> 
>> 
>>> 15 or 30 seconds is probably fine, even for servers, because of the
>>> nature of how systemd processes this timeout. It's per service being
>>> shut down, rather than a global timeout.
>> 
>> Yeah, *probably* fine, maybe not. We don’t know, just guess. And as the 
>> saying goes: better safe than sorry.
>> 
>> And the 2 mins are empirically obviously a safe solution.
>> 
> 
> Is it? If the end result is the same, it doesn't matter whether it's
> 30 seconds or 2 minutes.

Yes indeed. But it’s the sysadmin who decides to reset after those "15 or 30" 
secs. And that makes a difference.

And mind you, it is not the common case that a standalone server with maybe a
few VMs and a moderate workload needs those 10 minutes quoted here for
shutdown. It is the exception, and an urgent reason to take a serious look not
at the 10 minutes but at the server.

For desktops, it's probably a different tradeoff. 


--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Neal Gompa
On Wed, Dec 28, 2022 at 9:13 AM Peter Boy  wrote:
>
>
>
> > On 28.12.2022 at 13:34, Neal Gompa wrote:
> >
> > On Wed, Dec 28, 2022 at 7:25 AM Frank Crawford  
> > wrote:
> >>
> >> ...
> >>
> >> I'd also note that this has always been configurable, it is just now
> >> suggesting a different value from the default.
> >
> > I would also suggest that if you're shutting down/rebooting a server,
> > then you really want that to be done sooner rather than later, so
> > stuff getting killed as it shuts down if it doesn't do it normally
> > after SIGTERM is probably fine, because they're likely hung.
> >
> > Waiting a half hour for a system to reboot is not acceptable.
>
> Indeed, but it is not acceptable because the configuration of the timeout is 
> too long, but because there is probably something earnestly going wrong. And 
> as a server admin, I would like to decide myself to wait patiently and hope, 
> to evaluate the issue, or to pull the plug. I don't want to leave that to a 
> stupid timeout.
>
>
> > 15 or 30 seconds is probably fine, even for servers, because of the
> > nature of how systemd processes this timeout. It's per service being
> > shut down, rather than a global timeout.
>
> Yeah, *probably* fine, maybe not. We don’t know, just guess. And as the 
> saying goes: better safe than sorry.
>
> And the 2 mins are empirically obviously a safe solution.
>

Is it? If the end result is the same, it doesn't matter whether it's
30 seconds or 2 minutes.



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Peter Boy


> On 28.12.2022 at 13:34, Neal Gompa wrote:
> 
> On Wed, Dec 28, 2022 at 7:25 AM Frank Crawford  
> wrote:
>> 
>> ...
>> 
>> I'd also note that this has always been configurable, it is just now
>> suggesting a different value from the default.
> 
> I would also suggest that if you're shutting down/rebooting a server,
> then you really want that to be done sooner rather than later, so
> stuff getting killed as it shuts down if it doesn't do it normally
> after SIGTERM is probably fine, because they're likely hung.
> 
> Waiting a half hour for a system to reboot is not acceptable.

Indeed, but it is unacceptable not because the configured timeout is too
long, but because something is probably going seriously wrong. And as a
server admin, I would like to decide for myself whether to wait patiently and
hope, to evaluate the issue, or to pull the plug. I don't want to leave that
to a stupid timeout.


> 15 or 30 seconds is probably fine, even for servers, because of the
> nature of how systemd processes this timeout. It's per service being
> shut down, rather than a global timeout.

Yeah, *probably* fine, maybe not. We don’t know, just guess. And as the saying 
goes: better safe than sorry.

And the 2 mins are empirically obviously a safe solution. 






--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Neal Gompa
On Wed, Dec 28, 2022 at 8:45 AM Peter Boy  wrote:
>
>
>
> > On 28.12.2022 at 13:00, Ralf Corsépius wrote:
> >
> >
> >
> > On 28.12.22 at 11:49, Peter Boy wrote:
> >> It is a good idea to make the timeout configurable.  But the default
> >> timeout for servers must remain unchanged.
> >
> > My problem is not "defined timeouts" it is systemd delaying shutdowns for
> > no obvious reasons.
>
> Yes, but instead of just „pulling the plug“ wouldn’t it be better to hunt for
> the reasons?
>
>
> > And as you asked: On my (bare metal) servers, I am occasionally
> > experiencing delayed shutdowns in the order of several minutes.
> >
> > This is simply unacceptable!
>
> Yes, but then always simply pulling the plug is not acceptable either, I 
> think.
>

Actually, in most server environments I've operated in, it probably is
something people can deal with better because there's a plan for it.



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Peter Boy


> On 28.12.2022 at 13:00, Ralf Corsépius wrote:
> 
> 
> 
> On 28.12.22 at 11:49, Peter Boy wrote:
>> It is a good idea to make the timeout configurable.  But the default timeout
>> for servers must remain unchanged.
> 
> My problem is not "defined timeouts" it is systemd delaying shutdowns for no
> obvious reasons.

Yes, but instead of just "pulling the plug", wouldn't it be better to hunt for
the reasons?


> And as you asked: On my (bare metal) servers, I am occasionally experiencing
> delayed shutdowns in the order of several minutes.
> 
> This is simply unacceptable!

Yes, but then always simply pulling the plug is not acceptable either, I think. 




--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Fabio Valentini
On Sat, Dec 24, 2022 at 4:38 AM Steve Grubb  wrote:
>
> On Friday, December 23, 2022 1:34:48 PM EST Alexander Ploumistos wrote:
> > On Fri, Dec 23, 2022 at 7:21 PM Steve Grubb  wrote:
> > > This is nice, but all I have ever seen is a black screen and a spinning
> > > circle. No  text of any kind. If something were written to the console,
> > > how do you see it?
> >
> > Have you tried hitting "Esc" when that happens?
>
> No. Why would I? There is no text on that screen that even mentions that is a
> possible option. If that is possible, advertise it. Or better, kill the
> graphical  shutdown and explain why it's delayed.

Even if systemd prints nice diagnostic messages, they're useless if
nobody is going to see them.
And I doubt that many people know that pressing the Esc key makes
plymouth go away.

Would it be possible to print an informative message in Plymouth
instead? Something like "Shutdown is taking longer than expected,
please do not force off the computer". Other parts of the system
already use Plymouth for communicating other things to the user
(asking for LUKS decryption password, showing system upgrade process,
etc.) so I think that would make sense. It would also be far more
obvious than "you need to press the Esc key to see what's going on -
but don't ask me why it is that way or how I know that" ...

Fabio


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Neal Gompa
On Wed, Dec 28, 2022 at 7:25 AM Frank Crawford  wrote:
>
> On Wed, 2022-12-28 at 13:00 +0100, Ralf Corsépius wrote:
> >
> >
> > On 28.12.22 at 11:49, Peter Boy wrote:
> > >
> > > It is a good idea to make the timeout configurable.  But the
> > > default timeout for servers must remain unchanged.
> >
> > My problem is not "defined timeouts" it is systemd delaying shutdowns
> > for no obvious reasons.
> >
> > And as you asked: On my (bare metal) servers, I am occasionally
> > experiencing delayed shutdowns in the order of several minutes.
> >
> > This is simply unacceptable!
>
> At one stage I timed this for an NFS failure and it took 30-40 minutes to
> finally time out and reboot.  In some cases, especially those related to
> filesystems and umounts, it will try a number of times, eventually give
> up after 3 * 90s, and then come back to it again later, going through
> the whole process again.  Worse still, it had already had issues with
> stopping executables on that filesystem, as not everything is run in
> parallel.
>
> However, given some of the arguments raised, it may be worth looking at
> different values for workstations/laptops versus real servers.  If a
> workstation or laptop doesn't reboot in a minute or so, they will
> usually get hit with a force reset.  Real server installations are very
> different, they will almost always be left to reboot "eventually".
>
> I'd also note that this has always been configurable, it is just now
> suggesting a different value from the default.

I would also suggest that if you're shutting down/rebooting a server,
then you really want that to be done sooner rather than later, so
stuff getting killed as it shuts down if it doesn't do it normally
after SIGTERM is probably fine, because they're likely hung.

Waiting a half hour for a system to reboot is not acceptable.

15 or 30 seconds is probably fine, even for servers, because of the
nature of how systemd processes this timeout. It's per service being
shut down, rather than a global timeout.
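
To make the "per service, not global" point concrete, here is a deliberately
simplified Python model, not how systemd itself computes anything. It assumes
every hung unit consumes exactly one full timeout, healthy units stop
instantly, independent units stop in parallel, and ordered units stop one
after another. The function name and the example unit names are hypothetical.

```python
from functools import lru_cache


def worst_case_stop_time(stop_after, hung, timeout):
    """Estimate the wall-clock seconds until every unit has stopped.

    stop_after[u] lists the units that must finish stopping before the
    stop of u may begin. The ordering graph is assumed to be acyclic.
    """
    @lru_cache(maxsize=None)
    def finish(unit):
        # A unit's stop begins once everything ordered before it is gone.
        begin = max((finish(dep) for dep in stop_after.get(unit, ())),
                    default=0.0)
        return begin + (timeout if unit in hung else 0.0)

    units = set(stop_after) | {d for deps in stop_after.values() for d in deps}
    return max((finish(u) for u in units), default=0.0)


# Three independent hung services are stopped in parallel: one timeout.
independent = {"a": [], "b": [], "c": []}
# Three hung services ordered in a chain: the timeouts add up.
chain = {"app": [], "db": ["app"], "mount": ["db"]}
for limit in (90, 15):
    print(limit,
          worst_case_stop_time(independent, {"a", "b", "c"}, limit),
          worst_case_stop_time(chain, {"app", "db", "mount"}, limit))
```

Under these assumptions a shorter per-unit value shrinks the worst case
proportionally, but an ordered chain of hung units still multiplies it.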




--
真実はいつも一つ!/ Always, there's only one truth!


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Frank Crawford
On Wed, 2022-12-28 at 13:00 +0100, Ralf Corsépius wrote:
> 
> 
> On 28.12.22 at 11:49, Peter Boy wrote:
> > 
> > It is a good idea to make the timeout configurable.  But the
> > default timeout for servers must remain unchanged.
> 
> My problem is not "defined timeouts" it is systemd delaying shutdowns
> for no obvious reasons.
> 
> And as you asked: On my (bare metal) servers, I am occasionally
> experiencing delayed shutdowns in the order of several minutes.
> 
> This is simply unacceptable!

At one stage I timed this for an NFS failure and it took 30-40 minutes to
finally time out and reboot.  In some cases, especially those related to
filesystems and umounts, it will try a number of times, eventually give
up after 3 * 90s, and then come back to it again later, going through
the whole process again.  Worse still, it had already had issues with
stopping executables on that filesystem, as not everything is run in
parallel.

However, given some of the arguments raised, it may be worth looking at
different values for workstations/laptops versus real servers.  If a
workstation or laptop doesn't reboot in a minute or so, they will
usually get hit with a force reset.  Real server installations are very
different, they will almost always be left to reboot "eventually".

I'd also note that this has always been configurable, it is just now
suggesting a different value from the default.
> 
> Ralf

Regards
Frank


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Ralf Corsépius



On 28.12.22 at 11:49, Peter Boy wrote:


> It is a good idea to make the timeout configurable.  But the default timeout
> for servers must remain unchanged.


My problem is not "defined timeouts" it is systemd delaying shutdowns
for no obvious reasons.


And as you asked: On my (bare metal) servers, I am occasionally
experiencing delayed shutdowns in the order of several minutes.


This is simply unacceptable!

Ralf


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-28 Thread Peter Boy


> On 22.12.2022 at 19:29, Adam Williamson wrote:
> 
> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
>> On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
>>> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>>> 
>>> This document represents a proposed Change. As part of the Changes
>>> process, proposals are publicly announced in order to receive
>>> community feedback. This proposal will only be implemented if approved
>>> by the Fedora Engineering Steering Committee.
>>> 
>>> == Summary ==
>>> A downstream configuration change to reduce the systemd unit timeout
>>> from 2 minutes to 15 seconds.
>> 
>>  Great change, please do it!
>> Also, sometimes after reaching the timeout, systemd extends wait by
>> another 2 minutes (or 1m30). I wasn't able to find in the sources or
>> documentation why this happens, but this behaviour should be blocked.
>> Otherwise some services after 15s will get another 15, and then another…
> 
> 15 seconds feels very aggressive to me. I can think of some cases, like
> libvirtd automatically suspending or cleanly shutting down running VMs,
> that might well take longer than that. Could we not go for 30 seconds?
> Going all the way from 90/120 down to 15 seems pretty radical.

Even though I am a bit late due to seasonal commitments, I would like to affirm
a strong rejection of this change for the server variant on behalf of the
server WG.

The proposed limit of 15 seconds is much too aggressive, and so is the proposal
of 30 seconds. Unfortunately, we do not have data on how long, on average, a
production server shutdown takes to terminate all VMs (with their own delays),
all open database transactions, all service sessions, etc. But obviously the
current time interval does not cause any striking problems and is therefore OK
for servers. For a production server, a shutdown is a rare event, and the main
problem is not a period of 30 seconds or 2 minutes, but the shutdown itself.
And server admins are terribly conservative (and cautious) people: never change
a working system unless there is a clear advantage. And this proposal brings no
advantage at all for servers, only potential problems.
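
The missing data could in principle be collected by timing how long processes
take to exit once asked to stop. The Python sketch below imitates, in a much
reduced form, the escalation described in systemd.kill(5): send SIGTERM, wait
up to a limit, then send SIGKILL. It is an assumption-laden illustration for
single child processes, not a measurement tool for real services, and
`stop_with_timeout` is a hypothetical name.

```python
import signal
import subprocess
import time


def stop_with_timeout(proc: subprocess.Popen, timeout: float):
    """Ask `proc` to terminate and report how long it took.

    Returns a tuple (seconds_until_exit, was_killed), where was_killed is
    True if the process ignored SIGTERM for `timeout` seconds and had to
    be ended with SIGKILL.
    """
    started = time.monotonic()
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        was_killed = False
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL cannot be caught or ignored
        proc.wait()
        was_killed = True
    return time.monotonic() - started, was_killed
```

Recording the first value of the tuple over many real shutdowns would show how
close typical stop times come to 15, 30, or 90 seconds.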

It is a good idea to make the timeout configurable.  But the default timeout 
for servers must remain unchanged. 





--
Peter Boy
https://fedoraproject.org/wiki/User:Pboy
p...@fedoraproject.org

Timezone: CET (UTC+1) / CEST (UTC+2)


Fedora Server Edition Working Group member
Fedora docs team contributor and board member
Java developer and enthusiast




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Steve Grubb
On Friday, December 23, 2022 1:34:48 PM EST Alexander Ploumistos wrote:
> On Fri, Dec 23, 2022 at 7:21 PM Steve Grubb  wrote:
> > This is nice, but all I have ever seen is a black screen and a spinning
> > circle. No  text of any kind. If something were written to the console,
> > how do you see it?
> 
> Have you tried hitting "Esc" when that happens?

No. Why would I? There is no text on that screen that even mentions that this
is a possible option. If that is possible, advertise it. Or better, kill the
graphical shutdown and explain why it's delayed.

-Steve



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Alexander Ploumistos
On Fri, Dec 23, 2022 at 7:21 PM Steve Grubb  wrote:
>
> This is nice, but all I have ever seen is a black screen and a spinning circle. No
> text of any kind. If something were written to the console, how do you see
> it?

Have you tried hitting "Esc" when that happens?


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Steve Grubb
Hello,

On Friday, December 23, 2022 9:48:22 AM EST Zbigniew Jędrzejewski-Szmek wrote:
> On Fri, Dec 23, 2022 at 08:09:56AM +0100, Tomasz Torcz wrote:
> > On Thu, Dec 22, 2022 at 05:22:09PM -0500, Steve Grubb wrote:
> > > On Thursday, December 22, 2022 1:29:29 PM EST Adam Williamson wrote:
> > > > 15 seconds feels very aggressive to me. I can think of some cases,
> > > > like libvirtd automatically suspending or cleanly shutting down
> > > > running VMs, that might well take longer than that. Could we not go
> > > > for 30 seconds? Going all the way from 90/120 down to 15 seems pretty
> > > > radical.
> > > 
> > > I run across this with some regularity. PackageKit is not installed on
> > > my system. What I wished was that when there is a stall shutting
> > > down, a message to the console or a dialog box explains who is holding
> > > up shutdown. If we knew who was holding things up, bugs might get
> > > filed.
> > 
> > >   But there already is such a message! First "waiting for shutdown" with
> > > unit name and a timer. Then, in the last phase there's also a "waiting
> > > for process:" message.
> 
> Yeah. We added printing of a lot of information in
> https://github.com/systemd/systemd/commit/3889fc6fc347f0e12070f7b873d065508c8df816:
>
> Example output:

This is nice, but all I have ever seen is a black screen and a spinning
circle. No text of any kind. If something were written to the console, how do
you see it? If stalls were detected, maybe the graphical shutdown should be
suspended so that you can see the console.

-Steve
 
>  Stopping user@1000.service...
> [  OK  ] Stopped dracut-shutdown.service.
> [  OK  ] Stopped systemd-logind.service.
> [  OK  ] Stopped systemd-logind.service - User Login Management.
> [* ] Job user@1000.service/stop running (2s / 2min): (1 of 2) User job
> slowstop.service/stop running (1s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (3s / 2min): (2 of 2) User job
> slowstop2.service/stop running (2s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (4s / 2min): (1 of 2) User job
> slowstop.service/stop running (4s / 1min 30s)...
> [ *] Job user@1000.service/stop running (5s / 2min): (1 of 2) User job
> slowstop.service/stop running (5s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (6s / 2min): (2 of 2) User job
> slowstop2.service/stop running (6s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (8s / 2min): (1 of 2) User job
> slowstop.service/stop running (7s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (10s / 2min): (2 of 2) User job
> slowstop2.service/stop running (9s / 1min 30s)...
> [  *** ] Job user@1000.service/stop running (11s / 2min): (1 of 2) User job
> slowstop.service/stop running (10s / 1min 30s)...
> [ *] Job user@1000.service/stop running (12s / 2min): (2 of 2) User job
> slowstop2.service/stop running (12s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (13s / 2min): (1 of 2) User job
> slowstop.service/stop running (13s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job
> slowstop2.service/stop running (14s / 1min 30s)...
> [* ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job
> slowstop2.service/stop running (14s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (16s / 2min): User job
> slowstop.service/stop running (16s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (18s / 2min): User job
> slowstop.service/stop running (17s / 1min 30s)...
> [ *] Job user@1000.service/stop running (19s / 2min): User job
> slowstop.service/stop running (18s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (20s / 2min): User job
> slowstop.service/stop running (19s / 1min 30s)...
> [* ] Job user@1000.service/stop running (22s / 2min): User job
> slowstop.service/stop running (22s / 1min 30s)...
> [**] Job user@1000.service/stop running (30s / 2min): User job
> slowstop.service/stop running (29s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (32s / 2min): User job
> slowstop.service/stop running (31s / 1min 30s)...
> [ *] Job user@1000.service/stop running (33s / 2min): User job
> slowstop.service/stop running (32s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (34s / 2min): User job
> slowstop.service/stop running (33s / 1min 30s)...
> [**] Job user@1000.service/stop running (37s / 2min): User job
> slowstop.service/stop running (36s / 1min 30s)...
> [  *** ] Job user@1000.service/stop running (41s / 2min): User job
> slowstop.service/stop running (41s / 1min 30s)...
> [  OK  ] Stopped user@1000.service - User Manager for UID 1000.
>  Stopping user-runtime-dir@1000.service - User Runtime Directory
> /run/user/1000...
> [  OK  ] Unmounted run-user-1000.mount - /run/user/1000.
> [  OK  ] Stopped user-runtime-dir@1000.service - User Runtime Directory
> /run/user/1000.
 
> Further ideas on how to improve things are welcome…

Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Chris Murphy


On Fri, Dec 23, 2022, at 12:56 AM, Demi Marie Obenour wrote:

> Why cache mode unsafe?  How big a performance win is it?

Huge. In effect fsync is ignored. So if the host dies, write order is not 
guaranteed and can toast the guest file system.

The guest dying shouldn't pose a problem because the write order is eventually 
honored by the host. There's a variety of complex journal replay behaviors of 
the various file systems that'll come into play (no pun intended).

-- 
Chris Murphy


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Steve Grubb
Hello,

On Friday, December 23, 2022 6:52:02 AM EST Tom Hughes via devel wrote:
> On 23/12/2022 11:45, Naheem Zaffar wrote:
> > On Fri, 23 Dec 2022 at 08:26, Vitaly Zaitsev via devel wrote:
> > > On 23/12/2022 09:20, Mattia Verga via devel wrote:
> > > > I know this is way harder, but the right approach would be having a way
> > > > to tell systemd what processes can be killed and what other processes
> > > > must not be forced off in any case, then display a user friendly message
> > > > which informs the user that the system cannot be forced off ATM "because
> > > > I'm doing this or that". In the worst case, the user can choose to pull
> > > > the plug themselves.
> > >
> > > I agree. Terminating the PackageKit service while updates are being
> > > installed can result in a broken system.
> >
> > Is there a way to be smarter about all this?
> >
> > 1. Set default at 15s or something short.
> > 2. For services known to require longer (older pinephone modem firmware,
> > libvirtd), allow a larger timeout for that specific service only
> > 3. For services that should NOT be terminated have a mechanism for them
> > to not be cut off
> 
> Despite the title of this change I believe the proposal is only
> to change the default timeout and a service would still be able to
> set a different timeout in its service file.

I wonder if this proposal should also include verifying that known problem
services have an appropriate override?

-Steve



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Neil Hanlon
If you have persistent journaling enabled, you'll find them in the log of
your previous boot.
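
A minimal sketch of how that lookup could be scripted, assuming persistent
journal storage is enabled (for example `Storage=persistent` in journald.conf,
or an existing /var/log/journal directory). The function names are
hypothetical; `-b -1` selects the previous boot and `-n` limits the output to
its last entries, which is where shutdown messages would be expected.

```python
import shutil
import subprocess


def previous_boot_journal_cmd(lines=200):
    """Build a journalctl command for the tail of the previous boot's log."""
    # -b -1: previous boot; -n: only the last N entries; --no-pager: plain output.
    return ["journalctl", "-b", "-1", "-n", str(lines), "--no-pager"]


def show_previous_shutdown(lines=200):
    """Run the command if journalctl is installed; return its output or None."""
    if shutil.which("journalctl") is None:
        return None
    result = subprocess.run(
        previous_boot_journal_cmd(lines),
        capture_output=True,
        text=True,
        check=False,  # a missing previous boot should not raise here
    )
    return result.stdout
```

Whether every console status line ends up in the journal is not something this
sketch checks; it only shows where to look first.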

On Fri, Dec 23, 2022, 08:55 Mattia Verga via devel <
devel@lists.fedoraproject.org> wrote:

> Is that output available in any log at the next system startup? I
> usually hit the power button and then run away, but I would like to
> check if anything went slow when I restart my PC.
>
> Mattia

Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Mattia Verga via devel
On 23/12/22 15:48, Zbigniew Jędrzejewski-Szmek wrote:
>
> Yeah. We added printing of a lot of information in
> https://github.com/systemd/systemd/commit/3889fc6fc347f0e12070f7b873d065508c8df816:
>
> Example output:
>
>   Stopping user@1000.service...
> [  OK  ] Stopped dracut-shutdown.service.
> [  OK  ] Stopped systemd-logind.service.
> [  OK  ] Stopped systemd-logind.service - User Login Management.
> [* ] Job user@1000.service/stop running (2s / 2min): (1 of 2) User job 
> slowstop.service/stop running (1s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (3s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (2s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (4s / 2min): (1 of 2) User job 
> slowstop.service/stop running (4s / 1min 30s)...
> [ *] Job user@1000.service/stop running (5s / 2min): (1 of 2) User job 
> slowstop.service/stop running (5s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (6s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (6s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (8s / 2min): (1 of 2) User job 
> slowstop.service/stop running (7s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (10s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (9s / 1min 30s)...
> [  *** ] Job user@1000.service/stop running (11s / 2min): (1 of 2) User job 
> slowstop.service/stop running (10s / 1min 30s)...
> [ *] Job user@1000.service/stop running (12s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (12s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (13s / 2min): (1 of 2) User job 
> slowstop.service/stop running (13s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (14s / 1min 30s)...
> [* ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job 
> slowstop2.service/stop running (14s / 1min 30s)...
> [***   ] Job user@1000.service/stop running (16s / 2min): User job 
> slowstop.service/stop running (16s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (18s / 2min): User job 
> slowstop.service/stop running (17s / 1min 30s)...
> [ *] Job user@1000.service/stop running (19s / 2min): User job 
> slowstop.service/stop running (18s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (20s / 2min): User job 
> slowstop.service/stop running (19s / 1min 30s)...
> [* ] Job user@1000.service/stop running (22s / 2min): User job 
> slowstop.service/stop running (22s / 1min 30s)...
> [**] Job user@1000.service/stop running (30s / 2min): User job 
> slowstop.service/stop running (29s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (32s / 2min): User job 
> slowstop.service/stop running (31s / 1min 30s)...
> [ *] Job user@1000.service/stop running (33s / 2min): User job 
> slowstop.service/stop running (32s / 1min 30s)...
> [   ***] Job user@1000.service/stop running (34s / 2min): User job 
> slowstop.service/stop running (33s / 1min 30s)...
> [**] Job user@1000.service/stop running (37s / 2min): User job 
> slowstop.service/stop running (36s / 1min 30s)...
> [  *** ] Job user@1000.service/stop running (41s / 2min): User job 
> slowstop.service/stop running (41s / 1min 30s)...
> [  OK  ] Stopped user@1000.service - User Manager for UID 1000.
>   Stopping user-runtime-dir@1000.service - User Runtime Directory 
> /run/user/1000...
> [  OK  ] Unmounted run-user-1000.mount - /run/user/1000.
> [  OK  ] Stopped user-runtime-dir@1000.service - User Runtime Directory 
> /run/user/1000.
>
> Further ideas on how to improve things are welcome… But I think that right
> now we report enough to narrow things down to the system or user service
> that is blocking shutdown.
>
> Zbyszek

Is that output available in any log at the next system startup? I
usually hit the power button and then run away, but I would like to
check if anything went slow when I restart my PC.

Mattia



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> 
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.
> 
> == Summary ==
> A downstream configuration change to reduce the systemd unit timeout
> from 2 minutes to 15 seconds.
> 
> == Owner ==
> * Name: catanzaro
> * Email: mcatanzaro at redhat dot com
> * Name: aday
> * Email: aday at redhat dot com
> 
> 
> == Detailed Description ==
> Currently, a service that fails to stop at shutdown time can block
> shutdown for up to 2 minutes. This is extremely frustrating for our
> users - someone goes to shutdown or reboot their system, and then
> unexpectedly has to wait for a long time before they can do anything
> else.
> 
> The most common service to cause this issue is PackageKit, but there are 
> others.
> 
> When a service fails to shutdown when it is instructed to do so, it is
> not behaving properly, and it is preventing the system from behaving
> in an orderly and predictable manner. Desktop APIs exist for cases
> when services or apps legitimately need to prevent shutdown, and these
> allow the shutdown inhibit to be communicated to admins and users, so
> they understand what is happening. When the user decides to shut down
> anyway, services must terminate in a timely manner. The Workstation
> Working Group feels that 15 seconds is the maximum appropriate time
> for both system and user services, and that Fedora should be robust to
> buggy and misbehaving services that do not shut down in an appropriate
> manner.
> 
> === History ===
> 
> The Workstation Working Group has been
> [https://pagure.io/fedora-workstation/issue/163 working on this issue
> for several years]. Investigations have revealed that it's not
> possible to fix every misbehaving service: in some cases the
> misbehaviour comes from design flaws that are difficult to resolve.
> 
> An attempt has also been
> [https://github.com/systemd/systemd/pull/18386 made to have the unit
> timeout changed in upstream systemd]. That attempt did not go
> anywhere, despite various efforts to move it along. We are no longer
> comfortable waiting for upstream changes to land.
> 
> To our knowledge, there are no issues that will result from forcing
> services to stop after 15 seconds on typical systems. However, system
> administrators may need to configure a higher timeout if waiting
> longer for a particular service, which may be true for database
> services, for example.

I hope we can finally get this done. I'm sorry for my part in having this
stalled for so long without any progress. It never seemed safe to do. And as
the discussion so far in this thread shows, there will be some potential
issues in specific setups (databases, VMs, pinephones), so I think that going
through the Change process is the right way. At least it will be visible
enough to get feedback and add workarounds where necessary.

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Dec 23, 2022 at 08:09:56AM +0100, Tomasz Torcz wrote:
> On Thu, Dec 22, 2022 at 05:22:09PM -0500, Steve Grubb wrote:
> > On Thursday, December 22, 2022 1:29:29 PM EST Adam Williamson wrote:
> > > 15 seconds feels very aggressive to me. I can think of some cases, like
> > > libvirtd automatically suspending or cleanly shutting down running VMs,
> > > that might well take longer than that. Could we not go for 30 seconds?
> > > Going all the way from 90/120 down to 15 seems pretty radical.
> > 
> > I run across this with some regularity. PackageKit is not installed on my
> > system. What I wished was that when there is a stall shutting down, a
> > message to the console or a dialog box explains who is holding up shutdown.
> > If we knew who was holding things up, bugs might get filed.
>
>   But there already is such a message! First "waiting for shutdown" with
> unit name and a timer. Then, in the last phase there's also a "waiting for
> process:" message.

Yeah. We added printing of a lot of information in
https://github.com/systemd/systemd/commit/3889fc6fc347f0e12070f7b873d065508c8df816:

Example output:

 Stopping user@1000.service...
[  OK  ] Stopped dracut-shutdown.service.
[  OK  ] Stopped systemd-logind.service.
[  OK  ] Stopped systemd-logind.service - User Login Management.
[* ] Job user@1000.service/stop running (2s / 2min): (1 of 2) User job 
slowstop.service/stop running (1s / 1min 30s)...
[***   ] Job user@1000.service/stop running (3s / 2min): (2 of 2) User job 
slowstop2.service/stop running (2s / 1min 30s)...
[   ***] Job user@1000.service/stop running (4s / 2min): (1 of 2) User job 
slowstop.service/stop running (4s / 1min 30s)...
[ *] Job user@1000.service/stop running (5s / 2min): (1 of 2) User job 
slowstop.service/stop running (5s / 1min 30s)...
[   ***] Job user@1000.service/stop running (6s / 2min): (2 of 2) User job 
slowstop2.service/stop running (6s / 1min 30s)...
[***   ] Job user@1000.service/stop running (8s / 2min): (1 of 2) User job 
slowstop.service/stop running (7s / 1min 30s)...
[***   ] Job user@1000.service/stop running (10s / 2min): (2 of 2) User job 
slowstop2.service/stop running (9s / 1min 30s)...
[  *** ] Job user@1000.service/stop running (11s / 2min): (1 of 2) User job 
slowstop.service/stop running (10s / 1min 30s)...
[ *] Job user@1000.service/stop running (12s / 2min): (2 of 2) User job 
slowstop2.service/stop running (12s / 1min 30s)...
[   ***] Job user@1000.service/stop running (13s / 2min): (1 of 2) User job 
slowstop.service/stop running (13s / 1min 30s)...
[***   ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job 
slowstop2.service/stop running (14s / 1min 30s)...
[* ] Job user@1000.service/stop running (15s / 2min): (2 of 2) User job 
slowstop2.service/stop running (14s / 1min 30s)...
[***   ] Job user@1000.service/stop running (16s / 2min): User job 
slowstop.service/stop running (16s / 1min 30s)...
[   ***] Job user@1000.service/stop running (18s / 2min): User job 
slowstop.service/stop running (17s / 1min 30s)...
[ *] Job user@1000.service/stop running (19s / 2min): User job 
slowstop.service/stop running (18s / 1min 30s)...
[   ***] Job user@1000.service/stop running (20s / 2min): User job 
slowstop.service/stop running (19s / 1min 30s)...
[* ] Job user@1000.service/stop running (22s / 2min): User job 
slowstop.service/stop running (22s / 1min 30s)...
[**] Job user@1000.service/stop running (30s / 2min): User job 
slowstop.service/stop running (29s / 1min 30s)...
[   ***] Job user@1000.service/stop running (32s / 2min): User job 
slowstop.service/stop running (31s / 1min 30s)...
[ *] Job user@1000.service/stop running (33s / 2min): User job 
slowstop.service/stop running (32s / 1min 30s)...
[   ***] Job user@1000.service/stop running (34s / 2min): User job 
slowstop.service/stop running (33s / 1min 30s)...
[**] Job user@1000.service/stop running (37s / 2min): User job 
slowstop.service/stop running (36s / 1min 30s)...
[  *** ] Job user@1000.service/stop running (41s / 2min): User job 
slowstop.service/stop running (41s / 1min 30s)...
[  OK  ] Stopped user@1000.service - User Manager for UID 1000.
 Stopping user-runtime-dir@1000.service - User Runtime Directory 
/run/user/1000...
[  OK  ] Unmounted run-user-1000.mount - /run/user/1000.
[  OK  ] Stopped user-runtime-dir@1000.service - User Runtime Directory 
/run/user/1000.

Further ideas on how to improve things are welcome… But I think that right now
we report enough to narrow things down to the system or user service that is
blocking shutdown.
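
To illustrate how the status lines above identify the blocking unit, here is a
rough sketch that pulls the unit name, elapsed time, and limit out of one such
line. The regular expression is only an assumption based on the example output
in this mail; it is not a stable interface, and the names `JOB_RE` and
`blocking_jobs` are made up for the example.

```python
import re

# Matches both the outer "Job <unit>/stop running (...)" part and the nested
# "User job <unit>/stop running (...)" part of a status line.
JOB_RE = re.compile(r"(?:User job|Job) (\S+)/stop running \(([^/]+?) / ([^)]+)\)")


def blocking_jobs(status_line):
    """Return (unit, elapsed, limit) tuples for each job named in the line."""
    return [m.groups() for m in JOB_RE.finditer(status_line)]


line = (
    "[***   ] Job user@1000.service/stop running (3s / 2min): (2 of 2) "
    "User job slowstop2.service/stop running (2s / 1min 30s)..."
)
for unit, elapsed, limit in blocking_jobs(line):
    print(f"{unit}: {elapsed} of {limit}")
```

For the sample line, the first tuple is the system-level job (the user
manager) and the second is the user service it is waiting for.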

Zbyszek

Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Zbigniew Jędrzejewski-Szmek
On Thu, Dec 22, 2022 at 10:40:23PM +0100, allan2016--- via devel wrote:
> On Thu, 22 Dec 2022 10:29:29 -0800
> Adam Williamson wrote:
> > On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
> > > On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:  
> > > > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> > > > 
> > > > This document represents a proposed Change. As part of the Changes
> > > > process, proposals are publicly announced in order to receive
> > > > community feedback. This proposal will only be implemented if
> > > > approved by the Fedora Engineering Steering Committee.
> > > > 
> > > > == Summary ==
> > > > A downstream configuration change to reduce the systemd unit
> > > > timeout from 2 minutes to 15 seconds.  
> > > 
> > >   Great change, please do it!
> > > Also, sometimes after reaching the timeout, systemd extends wait by
> > > another 2 minutes (or 1m30). I wasn't able to find in the sources or
> > > documentation why this happens, but this behaviour should be
> > > blocked. Otherwise some services after 15s will get another 15, and
> > > then another…  
> > 
> > 15 seconds feels very aggressive to me. I can think of some cases,
> > like libvirtd automatically suspending or cleanly shutting down
> > running VMs, that might well take longer than that. Could we not go
> > for 30 seconds? Going all the way from 90/120 down to 15 seems pretty
> > radical.
> 
> 15 seconds will for sure kill the modem on the Pinephones for good.
> When the shutdown command is sent to the modem, it takes 20-30 seconds
> for the modem to shut down completely. Powering off the phone before the
> modem has completely shut down is more or less a sure way to kill the
> modem for good, as it can destroy the user space data in the modem.
> You will get a lot of angry Pinephone users if this "feature" is
> introduced in rawhide!

That's good feedback. I believe that in this case it'd be suitable for
the pinephone to install an override that extends the timeout.

Zbyszek


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Tom Hughes via devel

On 23/12/2022 11:45, Naheem Zaffar wrote:
> On Fri, 23 Dec 2022 at 08:26, Vitaly Zaitsev via devel wrote:
> > On 23/12/2022 09:20, Mattia Verga via devel wrote:
> > > I know this is way harder, but the right approach would be having a way
> > > to tell systemd what processes can be killed and what other processes
> > > must not be forced off in any case, then display a user friendly message
> > > which informs the user that the system cannot be forced off ATM "because
> > > I'm doing this or that". In the worst case, the user can choose to pull
> > > the plug themselves.
> >
> > I agree. Terminating the PackageKit service while updates are being
> > installed can result in a broken system.
>
> Is there a way to be smarter about all this?
>
> 1. Set default at 15s or something short.
> 2. For services known to require longer (older pinephone modem firmware,
> libvirtd), allow a larger timeout for that specific service only
> 3. For services that should NOT be terminated have a mechanism for them
> to not be cut off


Despite the title of this change I believe the proposal is only
to change the default timeout and a service would still be able to
set a different timeout in its service file.
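
As a rough illustration of such a per-service setting, the sketch below
renders a drop-in file that sets `TimeoutStopSec=` for a single unit. The unit
name `example-modem.service`, the 45 second value, the file name, and the
helper function are all hypothetical; only the `[Service]` option and the
`<unit>.d/` drop-in directory layout are standard systemd conventions.

```python
from pathlib import PurePosixPath


def stop_timeout_dropin(unit, seconds, name="10-stop-timeout.conf"):
    """Return (path, contents) of a drop-in that sets TimeoutStopSec= for one unit."""
    # Drop-ins under /etc/systemd/system/<unit>.d/ extend the packaged unit file.
    path = PurePosixPath("/etc/systemd/system") / f"{unit}.d" / name
    contents = f"[Service]\nTimeoutStopSec={seconds}s\n"
    return str(path), contents


# Hypothetical unit that needs longer than the proposed default to stop.
path, contents = stop_timeout_dropin("example-modem.service", 45)
print(path)
print(contents, end="")
```

After such a file is written, `systemctl daemon-reload` would be needed for
systemd to pick it up. A package could ship the same setting directly in its
own unit file instead.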

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Naheem Zaffar
On Fri, 23 Dec 2022 at 08:26, Vitaly Zaitsev via devel <
devel@lists.fedoraproject.org> wrote:

> On 23/12/2022 09:20, Mattia Verga via devel wrote:
> > I know this is way harder, but the right approach would be having a way
> > to tell systemd what processes can be killed and what other processes
> > must not be forced off in any case, then display a user friendly message
> > which informs the user that the system cannot be forced off ATM "because
> > I'm doing this or that". In the worst case, the user can choose to pull
> > the plug themselves.
>
> I agree. Terminating the PackageKit service while updates are being
> installed can result in a broken system.
>

Is there a way to be smarter about all this?

1. Set default at 15s or something short.
2. For services known to require longer (older pinephone modem firmware,
libvirtd), allow a larger timeout for that specific service only
3. For services that should NOT be terminated have a mechanism for them to
not be cut off



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Vitaly Zaitsev via devel

On 23/12/2022 09:20, Mattia Verga via devel wrote:

> I know this is way harder, but the right approach would be having a way
> to tell systemd what processes can be killed and what other processes
> must not be forced off in any case, then display a user friendly message
> which inform the user that the system cannot be forced off ATM "because
> I'm doing this or that". In the worst case, the user can choose to pull
> the plug themselves.


I agree. Terminating the PackageKit service while updates are being 
installed can result in a broken system.


--
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Mattia Verga via devel
On 22/12/22 18:35, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>
I think this is not the right approach to solve the problem. Basically,
we're saying "I don't know what's going on, just pull the plug, I don't
care about data corruption", which is going to cause even more problems
to workstation users if data corruption happens.

I know this is way harder, but the right approach would be having a way
to tell systemd what processes can be killed and what other processes
must not be forced off in any case, then display a user-friendly message
which informs the user that the system cannot be forced off ATM "because
I'm doing this or that". In the worst case, the user can choose to pull
the plug themselves.

Mattia



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-23 Thread Vitaly Zaitsev via devel

On 22/12/2022 18:35, Ben Cotton wrote:

> A downstream configuration change to reduce the systemd unit timeout
> from 2 minutes to 15 seconds.


+1 for this change. I've already reduced this timeout both on my desktop 
and laptop to even 10 seconds.


--
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Tomasz Torcz
On Thu, Dec 22, 2022 at 05:22:09PM -0500, Steve Grubb wrote:
> On Thursday, December 22, 2022 1:29:29 PM EST Adam Williamson wrote:
> > 15 seconds feels very aggressive to me. I can think of some cases, like
> > libvirtd automatically suspending or cleanly shutting down running VMs,
> > that might well take longer than that. Could we not go for 30 seconds?
> > Going all the way from 90/120 down to 15 seems pretty radical.
> 
> I run across this with some regularity. PackageKit is not installed on my 
> system. What I wished was that when there is a stall shutting down, a message 
> to the console or a dialog box explains who is holding up shutdown. If we 
> knew who was holding things up, bugs might get filed.

  But there already is such a message! First "waiting for shutdown" with
the unit name and a timer. Then, in the last phase, there's also a
"waiting for process:" message.
  The problem is: at this point it is hardly debuggable. One cannot
start a new shell, sshd is off already, journalctl too. No way to gather
any information about what's wrong with the process holding up shutdown. We
only get a name. And usually you cannot easily reproduce the problem on the
next shutdown.
  Maybe netconsole is still functioning at this point, but I doubt it.

-- 
Tomasz Torcz                To co nierealne – tutaj jest normalne.
to...@pipebreaker.pl        Ziomale na życie mają tu patenty specjalne.


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Demi Marie Obenour
On 12/22/22 14:55, Chris Murphy wrote:
> 
> 
> On Thu, Dec 22, 2022, at 1:29 PM, Adam Williamson wrote:
>> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
>>> On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
>>>> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>>>>
>>>> This document represents a proposed Change. As part of the Changes
>>>> process, proposals are publicly announced in order to receive
>>>> community feedback. This proposal will only be implemented if approved
>>>> by the Fedora Engineering Steering Committee.
>>>>
>>>> == Summary ==
>>>> A downstream configuration change to reduce the systemd unit timeout
>>>> from 2 minutes to 15 seconds.
>>>
>>>   Great change, please do it!
>>> Also, sometimes after reaching the timeout, systemd extends wait by
>>> another 2 minutes (or 1m30). I wasn't able to find in the sources or
>>> documentation why this happens, but this behaviour should be blocked.
>>> Otherwise some services after 15s will get another 15, and then another…
>>
>> 15 seconds feels very aggressive to me. I can think of some cases, like
>> libvirtd automatically suspending or cleanly shutting down running VMs,
>> that might well take longer than that. Could we not go for 30 seconds?
>> Going all the way from 90/120 down to 15 seems pretty radical.
> 
> Yeah. I'm not opposed to the change, and I understand the main impetus behind 
> it (PackageKitd), but it's the consequences of unknowns that I'm still left 
> scratching my head trying to imagine the worst case before we actually subject 
> users to it.
> 
> There really isn't a good kernel facility for something in between SIGTERM 
> which is ignorable, and SIGKILL which isn't. And I'm not familiar with 
> systemd's facilities for tracking service shutdown progress. i.e. I'm OK with 
> SIGKILL for a process that isn't responding. But I'm also not sure if there's 
> a facility for a process indicating either "I'm working on it" or "don't 
> force kill me or it'll be bad".
> 
> I also don't know if privileged services doing writes to the file system can 
> inhibit either remount read-only or umount? And if so, do we just wait for 
> all of that to complete? I think we'd have to. I'm pretty leery of rebooting 
> forcibly even if we can't remount ro because some process is holding things 
> up, doing the best it can to flush. Databases and VMs do come to mind, in 
> particular because I routinely run VMs on my laptop with cache mode unsafe. 
> If the VM is forcibly quit, it's fine. But if the host is forcibly rebooted 
> before the VM's pending writes are completed by the host, that'd be bad 
> (regardless of the file system choice).

Why cache mode unsafe?  How big a performance win is it?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Chris Murphy


On Thu, Dec 22, 2022, at 5:22 PM, Steve Grubb wrote:
> On Thursday, December 22, 2022 1:29:29 PM EST Adam Williamson wrote:
>> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
>> 
>> > On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
>> > 
>> > > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>> > > 
>> > > This document represents a proposed Change. As part of the Changes
>> > > process, proposals are publicly announced in order to receive
>> > > community feedback. This proposal will only be implemented if approved
>> > > by the Fedora Engineering Steering Committee.
>> > > 
>> > > == Summary ==
>> > > A downstream configuration change to reduce the systemd unit timeout
>> > > from 2 minutes to 15 seconds.
>> > 
>> >   Great change, please do it!
>> > 
>> > Also, sometimes after reaching the timeout, systemd extends wait by
>> > another 2 minutes (or 1m30). I wasn't able to find in the sources or
>> > documentation why this happens, but this behaviour should be blocked.
>> > Otherwise some services after 15s will get another 15, and then another…
>> 
>> 15 seconds feels very aggressive to me. I can think of some cases, like
>> libvirtd automatically suspending or cleanly shutting down running VMs,
>> that might well take longer than that. Could we not go for 30 seconds?
>> Going all the way from 90/120 down to 15 seems pretty radical.
>
> I run across this with some regularity. PackageKit is not installed on my 
> system. What I wished was that when there is a stall shutting down, a message 
> to the console or a dialog box explains who is holding up shutdown. If we 
> knew who was holding things up, bugs might get filed.

I wonder if systemctl list-jobs would be too much? 

This information needs to be logged too because 15 seconds won't be enough to 
see much. And for it to be logged, sysroot needs to be rw.


>
> In some cases I know that the system is rebuilding the nvidia drivers so that 
> graphics work on boot up. I'd like to let that finish and it certainly takes 
> more than 15 seconds. But without a blame message, how do we know what needs 
> looking into?

I expect there is (or will be) a way of tagging service units with indefinite 
wait. Reboot can't happen in the middle of kmod updates.


-- 
Chris Murphy


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Steve Grubb
On Thursday, December 22, 2022 1:29:29 PM EST Adam Williamson wrote:
> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
> 
> > On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
> > 
> > > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> > > 
> > > This document represents a proposed Change. As part of the Changes
> > > process, proposals are publicly announced in order to receive
> > > community feedback. This proposal will only be implemented if approved
> > > by the Fedora Engineering Steering Committee.
> > > 
> > > == Summary ==
> > > A downstream configuration change to reduce the systemd unit timeout
> > > from 2 minutes to 15 seconds.
> > 
> >   Great change, please do it!
> > 
> > Also, sometimes after reaching the timeout, systemd extends wait by
> > another 2 minutes (or 1m30). I wasn't able to find in the sources or
> > documentation why this happens, but this behaviour should be blocked.
> > Otherwise some services after 15s will get another 15, and then another…
> 
> 15 seconds feels very aggressive to me. I can think of some cases, like
> libvirtd automatically suspending or cleanly shutting down running VMs,
> that might well take longer than that. Could we not go for 30 seconds?
> Going all the way from 90/120 down to 15 seems pretty radical.

I run across this with some regularity. PackageKit is not installed on my 
system. What I wished was that when there is a stall shutting down, a message 
to the console or a dialog box explains who is holding up shutdown. If we 
knew who was holding things up, bugs might get filed.

In some cases I know that the system is rebuilding the nvidia drivers so that 
graphics work on boot up. I'd like to let that finish and it certainly takes 
more than 15 seconds. But without a blame message, how do we know what needs 
looking into?

-Steve



Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread allan2016--- via devel
On Thu, 22 Dec 2022 10:29:29 -0800
Adam Williamson wrote:
> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
> > On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:  
> > > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> > > 
> > > This document represents a proposed Change. As part of the Changes
> > > process, proposals are publicly announced in order to receive
> > > community feedback. This proposal will only be implemented if
> > > approved by the Fedora Engineering Steering Committee.
> > > 
> > > == Summary ==
> > > A downstream configuration change to reduce the systemd unit
> > > timeout from 2 minutes to 15 seconds.  
> > 
> >   Great change, please do it!
> > Also, sometimes after reaching the timeout, systemd extends wait by
> > another 2 minutes (or 1m30). I wasn't able to find in the sources or
> > documentation why this happens, but this behaviour should be
> > blocked. Otherwise some services after 15s will get another 15, and
> > then another…  
> 
> 15 seconds feels very aggressive to me. I can think of some cases,
> like libvirtd automatically suspending or cleanly shutting down
> running VMs, that might well take longer than that. Could we not go
> for 30 seconds? Going all the way from 90/120 down to 15 seems pretty
> radical.

15 seconds will for sure kill the modem on the Pinephones for good.
When the shutdown command is sent to the modem, it takes 20-30 seconds
for the modem to shut down completely. Powering off the phone before the
modem has completely shut down is more or less a sure way to kill the
modem for good, as it can destroy the user space data in the modem.
You will get a lot of angry Pinephone users if this "feature" is
introduced in rawhide!




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Alexander Ploumistos
On Thu, Dec 22, 2022 at 8:55 PM Chris Murphy  wrote:
>
> Also I wonder if  there's a way for desktops to opt into this behavior? Or a 
> way for servers, iot, cloud, and rpm-ostree based systems to opt out?

Do you mean like setting the "DefaultTimeoutStopSec" variable in
/etc/systemd/system.conf?
(at least that's the one I *think* is responsible)
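
For anyone who wants to check what a given machine is set to, here is a rough Python sketch that pulls that key out of system.conf-style text. It assumes the setting lives in the `[Manager]` section and deliberately ignores drop-in directories and compiled-in defaults, so it is not a substitute for asking systemd itself; the function name and the sample text are made up for illustration.

```python
import configparser
from typing import Optional


def read_default_stop_timeout(conf_text: str) -> Optional[str]:
    """Return the DefaultTimeoutStopSec= value from system.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the key's original capitalisation
    parser.read_string(conf_text)
    # fallback=None covers both a missing section and a missing key
    return parser.get("Manager", "DefaultTimeoutStopSec", fallback=None)


# Sample text for illustration; a real file would be read from
# /etc/systemd/system.conf instead.
sample = """\
[Manager]
#DefaultTimeoutStartSec=90s
DefaultTimeoutStopSec=15s
"""
print(read_default_stop_timeout(sample))
```

A result of `None` just means the key is commented out or absent in that text, in which case whatever default the package was built with applies.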


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Chris Murphy


On Thu, Dec 22, 2022, at 1:29 PM, Adam Williamson wrote:
> On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
>> On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
>> > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
>> > 
>> > This document represents a proposed Change. As part of the Changes
>> > process, proposals are publicly announced in order to receive
>> > community feedback. This proposal will only be implemented if approved
>> > by the Fedora Engineering Steering Committee.
>> > 
>> > == Summary ==
>> > A downstream configuration change to reduce the systemd unit timeout
>> > from 2 minutes to 15 seconds.
>> 
>>   Great change, please do it!
>> Also, sometimes after reaching the timeout, systemd extends wait by
>> another 2 minutes (or 1m30). I wasn't able to find in the sources or
>> documentation why this happens, but this behaviour should be blocked.
>> Otherwise some services after 15s will get another 15, and then another…
>
> 15 seconds feels very aggressive to me. I can think of some cases, like
> libvirtd automatically suspending or cleanly shutting down running VMs,
> that might well take longer than that. Could we not go for 30 seconds?
> Going all the way from 90/120 down to 15 seems pretty radical.

Yeah. I'm not opposed to the change, and I understand the main impetus behind 
it (PackageKitd), but it's the consequences of unknowns that I'm still left 
scratching my head trying to imagine the worst case before we actually subject 
users to it.

There really isn't a good kernel facility for something in between SIGTERM 
which is ignorable, and SIGKILL which isn't. And I'm not familiar with 
systemd's facilities for tracking service shutdown progress. i.e. I'm OK with 
SIGKILL for a process that isn't responding. But I'm also not sure if there's a 
facility for a process indicating either "I'm working on it" or "don't force 
kill me or it'll be bad".
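
For reference, systemd's sd_notify(3) protocol documents `STOPPING=1` and `EXTEND_TIMEOUT_USEC=` messages, which a service can send to ask for more time while it is still making progress. Below is a minimal Python sketch of sending them. It assumes the unit is configured to accept notifications (for example via `NotifyAccess=`) and that `NOTIFY_SOCKET` is set in the service's environment; the helper names are hypothetical and whether any particular service does this is a separate question.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to the service manager, if any."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by a notify-aware service manager
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = b"\0" + path[1:].encode()
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def request_more_stop_time(extra_seconds: int) -> bool:
    """Tell the manager we are stopping and need extra_seconds more."""
    usec = extra_seconds * 1_000_000
    return sd_notify(f"STOPPING=1\nEXTEND_TIMEOUT_USEC={usec}")
```

The extension only holds for the requested interval, so a service that is still busy would need to send the message again before it runs out. That covers "I'm working on it"; it does not by itself give a "never kill me" guarantee.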

I also don't know if privileged services doing writes to the file system can 
inhibit either remount read-only or umount? And if so, do we just wait for all 
of that to complete? I think we'd have to. I'm pretty leery of rebooting 
forcibly even if we can't remount ro because some process is holding things up, 
doing the best it can to flush. Databases and VMs do come to mind, in 
particular because I routinely run VMs on my laptop with cache mode unsafe. If 
the VM is forcibly quit, it's fine. But if the host is forcibly rebooted before 
the VM's pending writes are completed by the host, that'd be bad (regardless of 
the file system choice).

Also I wonder if there's a way for desktops to opt into this behavior? Or a 
way for servers, IoT, cloud, and rpm-ostree based systems to opt out? They very 
well might have legitimate reasons for very long service shutdowns: they're 
really super busy, and forward progress is being made but it'll take a *lot* 
longer than 15 seconds to get to a safe shutdown point.



-- 
Chris Murphy


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Tom Hughes via devel

On 22/12/2022 19:18, Michael Catanzaro wrote:
> On Thu, Dec 22 2022 at 10:29:29 AM -0800, Adam Williamson wrote:
>
>> Could we not go for 30 seconds?
>
> Personally I think 30 seconds is way too long for desktop users. But
> it's a lot better than 2 minutes, so if that's what we settle on, I
> won't complain.


The thing is that it's not really two minutes anyway, it's more
like eight minutes because each time the timer expires systemd
sends a stronger signal and restarts the timer until it eventually
gets to SIGKILL.

So in the worst case where a process is stuck in D wait then
you get all the way to the SIGKILL phase and then wait two minutes
for that before it eventually gives up and continues.

At least that is what usually seems to happen when I run into
this problem.
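
As a back-of-the-envelope check of that figure, here is a small Python sketch. The number of escalation stages is an assumption, picked so that it reproduces the "eight minutes" estimate above; it is not taken from the systemd sources.

```python
def worst_case_wait(timeout_sec: float, stages: int) -> float:
    """Total wait in seconds if the timer restarts once per stage."""
    return timeout_sec * stages


# Assumption: four timer runs, so that 4 x 2 minutes matches the
# "eight minutes" estimate.
STAGES = 4

print(worst_case_wait(120, STAGES) / 60)  # minutes with the 2-minute timer
print(worst_case_wait(15, STAGES))        # seconds with a 15-second timer
```

If the same escalation applied with the proposed default, the worst case would drop from about eight minutes to about one.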

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/


Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Michael Catanzaro
On Thu, Dec 22 2022 at 10:29:29 AM -0800, Adam Williamson wrote:

> Could we not go for 30 seconds?


Personally I think 30 seconds is way too long for desktop users. But 
it's a lot better than 2 minutes, so if that's what we settle on, I 
won't complain.


libvirtd should probably take an inhibitor to inform the user to close 
things cleanly before starting to shut down the host.
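
One low-effort way to experiment with an inhibitor is the `systemd-inhibit` command-line tool, which holds a lock for as long as a wrapped command runs. The Python sketch below only builds such a command line; the `who`/`why` strings and the wrapped `sleep` command are placeholders, and nothing here reflects how libvirtd actually behaves.

```python
from typing import List


def inhibit_command(who: str, why: str, command: List[str]) -> List[str]:
    """Build a systemd-inhibit call that blocks shutdown while command runs."""
    return [
        "systemd-inhibit",
        "--what=shutdown",
        f"--who={who}",
        f"--why={why}",
        "--mode=block",
        *command,
    ]


# Hypothetical example: hold the lock while some long-running job finishes.
argv = inhibit_command("vm-helper", "Saving running virtual machines",
                       ["sleep", "30"])
print(" ".join(argv))
```

Passing `argv` to `subprocess.run()` would take the lock for the lifetime of the wrapped command, and the reason string is what a desktop can show to the user who asked for the shutdown.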




Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Adam Williamson
On Thu, 2022-12-22 at 18:44 +0100, Tomasz Torcz wrote:
> On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
> > https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> > 
> > This document represents a proposed Change. As part of the Changes
> > process, proposals are publicly announced in order to receive
> > community feedback. This proposal will only be implemented if approved
> > by the Fedora Engineering Steering Committee.
> > 
> > == Summary ==
> > A downstream configuration change to reduce the systemd unit timeout
> > from 2 minutes to 15 seconds.
> 
>   Great change, please do it!
> Also, sometimes after reaching the timeout, systemd extends wait by
> another 2 minutes (or 1m30). I wasn't able to find in the sources or
> documentation why this happens, but this behaviour should be blocked.
> Otherwise some services after 15s will get another 15, and then another…

15 seconds feels very aggressive to me. I can think of some cases, like
libvirtd automatically suspending or cleanly shutting down running VMs,
that might well take longer than that. Could we not go for 30 seconds?
Going all the way from 90/120 down to 15 seems pretty radical.
-- 
Adam Williamson
Fedora QA
IRC: adamw | Twitter: adamw_ha
https://www.happyassassin.net



F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Ben Cotton
https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer

This document represents a proposed Change. As part of the Changes
process, proposals are publicly announced in order to receive
community feedback. This proposal will only be implemented if approved
by the Fedora Engineering Steering Committee.

== Summary ==
A downstream configuration change to reduce the systemd unit timeout
from 2 minutes to 15 seconds.

== Owner ==
* Name: catanzaro
* Email: mcatanzaro at redhat dot com
* Name: aday
* Email: aday at redhat dot com


== Detailed Description ==
Currently, a service that fails to stop at shutdown time can block
shutdown for up to 2 minutes. This is extremely frustrating for our
users - someone goes to shut down or reboot their system, and then
unexpectedly has to wait for a long time before they can do anything
else.

The most common service to cause this issue is PackageKit, but there are others.

When a service fails to shut down when it is instructed to do so, it is
not behaving properly, and it is preventing the system from behaving
in an orderly and predictable manner. Desktop APIs exist for cases
when services or apps legitimately need to prevent shutdown, and these
allow the shutdown inhibit to be communicated to admins and users, so
they understand what is happening. When the user decides to shut down
anyway, services must terminate in a timely manner. The Workstation
Working Group feels that 15 seconds is the maximum appropriate time
for both system and user services, and that Fedora should be robust to
buggy and misbehaving services that do not shut down in an appropriate
manner.

=== History ===

The Workstation Working Group has been
[https://pagure.io/fedora-workstation/issue/163 working on this issue
for several years]. Investigations have revealed that it's not
possible to fix every misbehaving service: in some cases the
misbehaviour comes from design flaws that are difficult to resolve.

An attempt has also been
[https://github.com/systemd/systemd/pull/18386 made to have the unit
timeout changed in upstream systemd]. That attempt did not go
anywhere, despite various efforts to move it along. We are no longer
comfortable waiting for upstream changes to land.

To our knowledge, there are no issues that will result from forcing
services to stop after 15 seconds on typical systems. However, system
administrators may need to configure a higher timeout if waiting
longer for a particular service, which may be true for database
services, for example.

== Feedback ==
The relevant [https://pagure.io/fedora-workstation/issue/163
Workstation Working Group ticket] includes some discussion. This
change [https://pagure.io/fesco/issue/2853 was also previously
proposed to FESCo].

== Benefit to Fedora ==
The primary benefit of the change will be to mitigate a very annoying
and - frankly - embarrassing bug. Our users shouldn't have to randomly
sit waiting for their machine to shut down. It will also encourage the
correct use of shutdown inhibit APIs.

Although this change will "paper over" bugs in services without fixing
them, we emphasize that reducing the timeout is not merely a
workaround for buggy services, but also the desired permanent design.
Of course it is desirable to fix the underlying bugs as well, but it
doesn't make sense to require this before fixing the service timeout
to match our needs.

== Scope ==
* Proposal owners:
** Merge [https://src.fedoraproject.org/rpms/systemd/pull-request/85
the downstream change] to {{package|systemd}}.
* Other developers:
** Test their packages with the new behavior and report issues as necessary.
* Release engineering: [https://pagure.io/releng/issue/11193 #11193]
* Policies and guidelines: No policy or guideline changes required
* Trademark approval: N/A (not needed for this Change)
* Alignment with Objectives:

== Upgrade/compatibility impact ==
System and user services will be killed with SIGKILL 15 seconds after
receiving SIGTERM, from previously 1 minute 30 seconds for most system
and user services, or 2 minutes for user manager system services (the
system service that runs all user services for a user), so services
will have less time to shut down gracefully by default. These defaults
are configurable and system administrators who require longer timeouts
would need to adjust them before or after upgrade. You may edit the
DefaultTimeoutStopSec= setting in /etc/systemd/user.conf and
/etc/systemd/system.conf. You may also create a drop-in to change the
TimeoutStopSec= setting for user@.service.
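
The SIGTERM-then-SIGKILL sequence described above can be tried out in miniature. This is a Python sketch of the general pattern, not of systemd's implementation: the helper names are made up, the child process simply ignores SIGTERM to stand in for a misbehaving service, and a half-second timeout stands in for the 15-second default.

```python
import subprocess
import sys


def stop_with_timeout(proc: subprocess.Popen, timeout_sec: float) -> str:
    """Send SIGTERM, wait up to timeout_sec, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM: the process may handle or ignore it
    try:
        proc.wait(timeout=timeout_sec)
        return "exited after SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        proc.wait()
        return "killed after timeout"


def spawn_stubborn_child() -> subprocess.Popen:
    """Start a child that ignores SIGTERM, like a service that will not stop."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until SIGTERM is being ignored
    return proc


if __name__ == "__main__":
    child = spawn_stubborn_child()
    print(stop_with_timeout(child, 0.5))
```

A well-behaved child returns through the first branch long before the timeout, which is why a shorter default only changes the outcome for processes that do not exit on SIGTERM in time.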

== How To Test ==
Given the intermittent and unpredictable nature of the bug that is
being targeted, the best way to test is by using the upcoming Fedora
release. Are shutdown delays eliminated as intended? Do system
services experience issues as a result of the change?

== User Experience ==
This change will make the Fedora user experience less annoying. It
will also encourage the use of the existing inhibit APIs, which
provide better feedback for users 

Re: F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Tomasz Torcz
On Thu, Dec 22, 2022 at 12:35:54PM -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer
> 
> This document represents a proposed Change. As part of the Changes
> process, proposals are publicly announced in order to receive
> community feedback. This proposal will only be implemented if approved
> by the Fedora Engineering Steering Committee.
> 
> == Summary ==
> A downstream configuration change to reduce the systemd unit timeout
> from 2 minutes to 15 seconds.

  Great change, please do it!
Also, sometimes after reaching the timeout, systemd extends wait by
another 2 minutes (or 1m30). I wasn't able to find in the sources or
documentation why this happens, but this behaviour should be blocked.
Otherwise some services after 15s will get another 15, and then another…


-- 
Tomasz Torcz            What is unreal is normal here.
to...@pipebreaker.pl    The homies here have their own special tricks for getting by.


F38 proposal: Shorter Shutdown Timer (System-Wide Change proposal)

2022-12-22 Thread Ben Cotton
https://fedoraproject.org/wiki/Changes/Shorter_Shutdown_Timer

This document represents a proposed Change. As part of the Changes
process, proposals are publicly announced in order to receive
community feedback. This proposal will only be implemented if approved
by the Fedora Engineering Steering Committee.

== Summary ==
A downstream configuration change to reduce the systemd unit timeout
from 2 minutes to 15 seconds.

== Owner ==
* Name: catanzaro
* Email: mcatanzaro at redhat dot com
* Name: aday
* Email: aday at redhat dot com


== Detailed Description ==
Currently, a service that fails to stop at shutdown time can block
shutdown for up to 2 minutes. This is extremely frustrating for our
users - someone goes to shut down or reboot their system, and then
unexpectedly has to wait for a long time before they can do anything
else.

The most common service to cause this issue is PackageKit, but there are others.

When a service fails to shut down when it is instructed to do so, it is
not behaving properly, and it is preventing the system from behaving
in an orderly and predictable manner. Desktop APIs exist for cases
when services or apps legitimately need to prevent shutdown, and these
allow the shutdown inhibit to be communicated to admins and users, so
they understand what is happening. When the user decides to shut down
anyway, services must terminate in a timely manner. The Workstation
Working Group feels that 15 seconds is the maximum appropriate time
for both system and user services, and that Fedora should be robust to
buggy and misbehaving services that do not shut down in an appropriate
manner.
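
As an illustration of the inhibit mechanism mentioned above, here is a
minimal Python sketch that wraps a long-running command in
systemd-inhibit, the command-line front end to logind's inhibitor
locks. The proposal does not name a specific API, so treating
systemd-inhibit as the relevant one is an assumption; desktop
applications would more likely use their session's own inhibit API.
The helper names, the who/why strings and the wrapped command are
hypothetical.

```python
import subprocess


def inhibit_argv(command, why, who="example-service", what="shutdown", mode="block"):
    """Build an argv that runs `command` while holding an inhibitor lock.

    `who` and `why` are free-form strings shown to users and admins;
    the defaults here are placeholders, not values from the proposal.
    """
    return [
        "systemd-inhibit",
        f"--what={what}",
        f"--who={who}",
        f"--why={why}",
        f"--mode={mode}",
        *command,
    ]


def run_inhibited(command, why):
    """Run `command` under the lock; the lock is released when it exits."""
    return subprocess.run(inhibit_argv(command, why), check=True)
```

With a lock like this held, tools that honour inhibitors can report who
is holding up shutdown and why, rather than leaving the user to wait
for a stop timeout to expire.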

=== History ===

The Workstation Working Group has been
[https://pagure.io/fedora-workstation/issue/163 working on this issue
for several years]. Investigations have revealed that it's not
possible to fix every misbehaving service: in some cases the
misbehaviour comes from design flaws that are difficult to resolve.

An attempt has also been
[https://github.com/systemd/systemd/pull/18386 made to have the unit
timeout changed in upstream systemd]. That attempt did not go
anywhere, despite various efforts to move it along. We are no longer
comfortable waiting for upstream changes to land.

To our knowledge, there are no issues that will result from forcing
services to stop after 15 seconds on typical systems. However, system
administrators may need to configure a higher timeout if a particular
service needs longer to stop, which may be true for database services,
for example.

== Feedback ==
The relevant [https://pagure.io/fedora-workstation/issue/163
Workstation Working Group ticket] includes some discussion. This
change [https://pagure.io/fesco/issue/2853 was also previously
proposed to FESCo].

== Benefit to Fedora ==
The primary benefit of the change will be to mitigate a very annoying
and - frankly - embarrassing bug. Our users shouldn't have to randomly
sit waiting for their machine to shut down. It will also encourage the
correct use of shutdown inhibit APIs.

Although this change will "paper over" bugs in services without fixing
them, we emphasize that reducing the timeout is not merely a
workaround for buggy services, but also the desired permanent design.
Of course it is desirable to fix the underlying bugs as well, but it
doesn't make sense to require this before fixing the service timeout
to match our needs.

== Scope ==
* Proposal owners:
** Merge [https://src.fedoraproject.org/rpms/systemd/pull-request/85
the downstream change] to {{package|systemd}}.
* Other developers:
** Test their packages with the new behavior and report issues as necessary.
* Release engineering: [https://pagure.io/releng/issue/11193 #11193]
* Policies and guidelines: No policy or guideline changes required
* Trademark approval: N/A (not needed for this Change)
* Alignment with Objectives:

== Upgrade/compatibility impact ==
System and user services will be killed with SIGKILL 15 seconds after
receiving SIGTERM, down from the previous 1 minute 30 seconds for most
system and user services, or 2 minutes for user manager system services
(the system service that runs all user services for a user), so services
will have less time to shut down gracefully by default. These defaults
are configurable, and system administrators who require longer timeouts
will need to adjust them before or after the upgrade. You may edit the
DefaultTimeoutStopSec= setting in /etc/systemd/user.conf and
/etc/systemd/system.conf. You may also create a drop-in to change the
TimeoutStopSec= setting for user@.service.
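
For administrators who want to keep the old timeouts, the settings
named above could be written out along the following lines. This is a
minimal Python sketch, not part of the proposal: it assumes drop-in
directories (system.conf.d, user.conf.d and user@.service.d) are used
instead of editing the main files directly, the file name
50-stop-timeout.conf is hypothetical, and the values simply restore the
previous 1 minute 30 seconds and 2 minutes.

```python
from pathlib import Path

# Relative path -> (section, key, seconds). The values restore the
# pre-change defaults quoted in the text; the file name is an
# illustrative assumption.
DROPINS = {
    "etc/systemd/system.conf.d/50-stop-timeout.conf":
        ("Manager", "DefaultTimeoutStopSec", 90),
    "etc/systemd/user.conf.d/50-stop-timeout.conf":
        ("Manager", "DefaultTimeoutStopSec", 90),
    "etc/systemd/system/user@.service.d/50-stop-timeout.conf":
        ("Service", "TimeoutStopSec", 120),
}


def render(section, key, seconds):
    """Return the text of a drop-in that sets a single timeout key."""
    return f"[{section}]\n{key}={seconds}s\n"


def write_dropins(root):
    """Write all drop-ins below `root` and return the paths written."""
    written = []
    for rel, (section, key, seconds) in DROPINS.items():
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render(section, key, seconds))
        written.append(path)
    return written
```

Calling write_dropins("/") as root would install the files on a live
system; pointing it at a scratch directory first makes it easy to
inspect the result. Rebooting afterwards is the simplest way to be sure
the new values are in effect.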

== How To Test ==
Given the intermittent and unpredictable nature of the bug that is
being targeted, the best way to test is by using the upcoming Fedora
release. Are shutdown delays eliminated as intended? Do system
services experience issues as a result of the change?
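
One way to confirm which default a running system is actually using is
to ask the service manager. The following is a minimal Python sketch,
assuming systemctl is available and that "systemctl show" reports the
DefaultTimeoutStopUSec property in systemd's usual time-span format
(for example "1min 30s"). The function names are hypothetical and the
parser only covers the common units.

```python
import re
import subprocess

# Seconds per unit for the units handled by this sketch.
_UNIT_SECONDS = {
    "us": 1e-6, "ms": 1e-3, "s": 1.0,
    "min": 60.0, "h": 3600.0, "d": 86400.0, "w": 604800.0,
}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([a-z]+)")


def parse_timespan(text):
    """Convert a time span such as '1min 30s' into seconds."""
    text = text.strip()
    if text == "infinity":
        return float("inf")
    tokens = text.split()
    if not tokens:
        raise ValueError("empty time span")
    total = 0.0
    for token in tokens:
        match = _TOKEN.fullmatch(token)
        if match is None or match.group(2) not in _UNIT_SECONDS:
            raise ValueError(f"unrecognised time span token: {token!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


def effective_stop_timeout(user=False):
    """Query the system (or user) manager's default stop timeout."""
    cmd = ["systemctl"]
    if user:
        cmd.append("--user")
    cmd += ["show", "--property=DefaultTimeoutStopUSec", "--value"]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_timespan(result.stdout)
```

On a system with the change applied, effective_stop_timeout() would be
expected to return 15.0; calling it with user=True checks the user
manager instead.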

== User Experience ==
This change will make the Fedora user experience less annoying. It
will also encourage the use of the existing inhibit APIs, which
provide better feedback for users