Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17 Mar 2011 22:48:36 +0100 Lennart Poettering <lenn...@poettering.net> wrote:
> On Thu, 17.03.11 10:20, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> > On Thu, 17 Mar 2011 01:39:19 +0100 Lennart Poettering <lenn...@poettering.net> wrote:
> > Experiencing several reboots on a machine with 50+ enabled daemons, I've noticed that some of them (mostly the ones started via a launcher script like apachectl, pg_ctl, ejabberdctl, etc.) tend to fail cleanly and at random on start, just because the GuessMainPID= mechanism fails and systemd actually kills the service.
>
> Hmm, GuessPID= fails? Do you know why exactly? Ideas for improvements? The current logic is pretty simple: we look for all processes in the service cgroup which have PPID == 1. If there is only one of these, we assume it is the main process. In your case there hence must be more than one where this condition applies? Any recommendation what else we could check?

For some services I've observed the following behavior:

* the logs state that the service received SIGTERM and is shutting down;
* systemctl status shows a Main PID that differs from the one in the logs and/or the pidfile.

Thus I assume that in all these cases the launcher forks more than one process, and when the first forked one (which gets marked as main) dies, systemd pulls the plug and just kills the rest of them.

The problem seems to be related to timing, and maybe some switch like GuessMainPIDAfterRunningForSec= would help, but it would still be racy. So disabling PID guessing and using PIDFile= seems to be a better way to do it, with the app's cooperation, and all the apps with such a complicated start seem to support pidfiles. I don't think anything else is necessary there, unless pidfile eradication becomes some kind of crusade, but then all such launchers should probably just go away as well.

> > I understand that there's a limited number of reasons for such a clean stop (manual interaction, units like rsyslog.service, Conflicts=, isolate, etc.), but it's still the wrong way to approach this particular problem. I've solved the problem for myself by writing a simple dbus-python script (http://goo.gl/V6e7V). It shows exactly everything that's enabled and not active (with the oneshot exception), not some random subset of it.
>
> Hmm, jupp. I agree, this is very useful. I added this to the todo list now.

Thanks!

> > Unfortunately, the new rsyslog.service (and services using systemctl stop directly) can affect such a display, which I think shows the flawed assumption on my part that "enabled" in systemd means "should be active, period" (with the exception of oneshot units), and I don't know an easy solution to this, short of adding another enabled-like state.
>
> Hmm, yeah. This problem is hard. But I think simply showing "enabled but not running" is already quite useful, even if a service on that list is not necessarily buggy, but just not hooked in by anything.

I think Andrey's systemctl --query suggestion in this thread, or a special systemd-query tool, should already be able to provide such functionality (and more), so it should be a good enough solution. Combined with the failed state for dead services that shouldn't be dead, it would be even better: services stopped via systemctl like that won't be in the failed state, so they can easily be filtered out by the same query tool or grep/awk.

--
Mike Kazantsev // fraggod.net
signature.asc Description: PGP signature
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
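For readers unfamiliar with the heuristic, here is a minimal Python model of the rule Lennart describes (exactly one PPID == 1 process in the service's cgroup becomes the main PID), together with the timing problem Mike reports. The Proc type, the PIDs and the two snapshots are hypothetical, and this is only an illustration, not systemd's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Proc:
    """A process as seen in the service's cgroup (hypothetical model)."""
    pid: int
    ppid: int


def guess_main_pid(cgroup_procs: Iterable[Proc]) -> Optional[int]:
    """Return the PID of the only process whose parent is PID 1, else None."""
    # Candidates are processes re-parented to init, which is what happens
    # to a daemon after the launcher script that forked it exits.
    candidates = [p.pid for p in cgroup_procs if p.ppid == 1]
    # A guess is only made when exactly one candidate exists.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical timeline: a launcher detaches two daemons a moment apart.
early = [Proc(pid=101, ppid=1)]                        # only the first is up
late = [Proc(pid=101, ppid=1), Proc(pid=102, ppid=1)]  # both are up

print(guess_main_pid(early))  # 101: the first child is taken as the main PID
print(guess_main_pid(late))   # None: ambiguous, no guess possible
```

If the guess is taken at the early point, PID 101 becomes the main PID; when that process later exits, the service is considered ended and, per Mike's report, the remaining processes get killed. A PIDFile= written by the daemon itself removes the dependence on when the guess happens.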
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17 Mar 2011 22:58:00 +0100 Lennart Poettering <lenn...@poettering.net> wrote:
> On Thu, 17.03.11 22:48, Lennart Poettering (lenn...@poettering.net) wrote:
> > > Unfortunately, the new rsyslog.service (and services using systemctl stop directly) can affect such a display, which I think shows the flawed assumption on my part that "enabled" in systemd means "should be active, period" (with the exception of oneshot units), and I don't know an easy solution to this, short of adding another enabled-like state.
> >
> > Hmm, yeah. This problem is hard. But I think simply showing "enabled but not running" is already quite useful, even if a service on that list is not necessarily buggy, but just not hooked in by anything.
>
> Thinking about this, maybe a simpler solution would be to add a switch to list all services that have been running since boot but are not running anymore. That would be quite trivial to implement. Does that make sense to you?

Looks kinda like systemctl snapshot diffs to me. It's not really hard to do now with systemctl, grep, sort and diff, plus some service which takes an initial snapshot late in boot, but it looks like a kludge to me: services tend to fail at boot as well, and the desired system state (enabled/disabled) may change over uptime (e.g. new stuff installed, something got disabled), so it doesn't make sense to compare anything to just the boot state.

Maybe it'd make sense to add the ability to diff state snapshots in this regard, comparing (and saving) not just running but also enabled/masked services. That looks way out of scope for systemctl, too, and more in the realm of a systemd-query tool, maybe developed separately, as it sounds more like a separate systemd monitoring solution at this point ;)

--
Mike Kazantsev // fraggod.net
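Mike's point, that a diff should record enablement alongside activity, can be sketched with snapshots kept as plain dictionaries. The snapshot format, unit names and states below are assumptions, and collecting the snapshots from systemd is left out; nothing here is an existing systemctl feature.

```python
from typing import Dict, List, Tuple

# unit name -> (unit-file state, active state); the values are illustrative
Snapshot = Dict[str, Tuple[str, str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[str]:
    """Report units whose enablement or activity changed between two snapshots."""
    changes = []
    for unit in sorted(old.keys() | new.keys()):
        before, after = old.get(unit), new.get(unit)
        if before == after:
            continue  # unchanged, nothing to report
        if before is None:
            changes.append(f"+ {unit}: {'/'.join(after)}")
        elif after is None:
            changes.append(f"- {unit}: was {'/'.join(before)}")
        else:
            changes.append(f"~ {unit}: {'/'.join(before)} -> {'/'.join(after)}")
    return changes


boot = {
    "sshd.service": ("enabled", "active"),
    "cups.service": ("enabled", "active"),
    "foo.service": ("disabled", "inactive"),
}
later = {
    "sshd.service": ("enabled", "active"),
    "cups.service": ("enabled", "inactive"),  # still wanted, but no longer running
    "foo.service": ("masked", "inactive"),    # admin changed intent, not a problem
    "bar.service": ("enabled", "active"),     # installed after boot
}

for line in diff_snapshots(boot, later):
    print(line)
```

Because each snapshot carries the unit-file state, a unit that was masked or newly installed in between shows up as a change of intent rather than as a service that silently died, which a running-only comparison against the boot state cannot distinguish.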
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, Feb 24, 2011 at 11:55 AM, Mike Kazantsev <mk.frag...@gmail.com> wrote:
> Something like systemctl --enabled would certainly be much more useful for such cases than the current systemctl --all, yet there would still be a lot of oneshot stuff, which is supposed to be dead, so a separate state for enabled, non-oneshot services that have exited, like "stopped" (in place of "inactive"), and maybe a view like systemctl --stopped would be of great help from my sysadmin's perspective.
>
> I understand that there's already a looong TODO, but maybe it's possible to cram such systemctl view option(s) somewhere in that list?

I was vaguely thinking about adding a generic selection option to replace the current ad-hoc ones, something like --state=enabled,!exited, or even allowing generic filtering on a specific property, like --query Type=socket,dbus

The latter is probably the way to go, as it can be used to implement any custom query. And add better output format control, of course :) It just needs some proper, but not too complicated, syntax. Is it good enough for TODO?
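Andrey's two forms could be interpreted roughly as below: --state takes comma-separated state words where a leading ! excludes one, and --query compares a property against a comma-separated list of alternatives. This interpretation and the function names are assumptions for illustration; the thread only proposes these options.

```python
from typing import Dict, Set, Tuple


def parse_state_filter(spec: str) -> Tuple[Set[str], Set[str]]:
    """Split e.g. "enabled,!exited" into required and forbidden state words."""
    required: Set[str] = set()
    forbidden: Set[str] = set()
    for word in (w.strip() for w in spec.split(",")):
        if not word:
            continue  # tolerate stray commas
        if word.startswith("!"):
            forbidden.add(word[1:])
        else:
            required.add(word)
    return required, forbidden


def state_matches(unit_states: Set[str], spec: str) -> bool:
    """True if the unit has every required state word and none of the forbidden ones."""
    required, forbidden = parse_state_filter(spec)
    return required <= unit_states and not (forbidden & unit_states)


def query_matches(props: Dict[str, str], spec: str) -> bool:
    """Match e.g. "Type=socket,dbus": the property must equal one of the listed values."""
    key, _, values = spec.partition("=")
    return props.get(key) in set(values.split(","))


print(state_matches({"enabled", "exited"}, "enabled,!exited"))  # False
print(state_matches({"enabled", "dead"}, "enabled,!exited"))    # True
print(query_matches({"Type": "dbus"}, "Type=socket,dbus"))      # True
```

The ! is awkward on the command line, which is the objection Lennart raises in his reply: in an interactive bash an unquoted or double-quoted ! triggers history expansion, so users would have to single-quote the argument, e.g. '--state=enabled,!exited'.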
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17 Mar 2011 09:52:21 +0300 Andrey Borzenkov <arvidj...@mail.ru> wrote:
> On Thu, Feb 24, 2011 at 11:55 AM, Mike Kazantsev <mk.frag...@gmail.com> wrote:
> > Something like systemctl --enabled would certainly be much more useful for such cases than the current systemctl --all, yet there would still be a lot of oneshot stuff, which is supposed to be dead, so a separate state for enabled, non-oneshot services that have exited, like "stopped" (in place of "inactive"), and maybe a view like systemctl --stopped would be of great help from my sysadmin's perspective.
> >
> > I understand that there's already a looong TODO, but maybe it's possible to cram such systemctl view option(s) somewhere in that list?
>
> I was vaguely thinking about adding a generic selection option to replace the current ad-hoc ones, something like --state=enabled,!exited, or even allowing generic filtering on a specific property, like --query Type=socket,dbus
>
> The latter is probably the way to go, as it can be used to implement any custom query. And add better output format control, of course :) It just needs some proper, but not too complicated, syntax. Is it good enough for TODO?

I think it'd be great, although the syntax could indeed be problematic if "and" and "or" logic were implemented. The simplest thing (from my POV) that comes to mind is perl / bash / libpcap / etc. syntax like rule1 && ( rule2 || rule3 ) && ! rule4, disallowing multiple values for the sake of uniformity (i.e. Type=socket || Type=dbus instead of Type=socket,dbus).

--
Mike Kazantsev // fraggod.net
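Mike's suggestion could be prototyped as a small recursive-descent evaluator: each rule is a Key=Value comparison against a unit's properties, and rules combine with &&, || and ! plus parentheses. The grammar, the precedence (&& binding tighter than ||, as in C and Perl) and the property names in the example are assumptions for illustration, not a systemctl interface.

```python
import re
from typing import Dict, List, Optional

# A token is a parenthesis, an operator, or a Key=Value rule without spaces.
TOKEN_RE = re.compile(r"\s*(\(|\)|&&|\|\||!|[^\s()!&|]+)")


def tokenize(expr: str) -> List[str]:
    tokens: List[str] = []
    expr = expr.strip()
    pos = 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"cannot parse at {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr: str, props: Dict[str, str]) -> bool:
    """Evaluate e.g. "A=x && ( B=y || B=z ) && ! C=w" against unit properties.

    Precedence, loosest first: ||, then &&, then !.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the operand so its tokens are consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        key, sep, wanted = tok.partition("=")
        if not sep or not key:
            raise ValueError(f"expected Key=Value, got {tok!r}")
        return props.get(key) == wanted

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


unit = {"Enabled": "yes", "Type": "dbus", "Active": "no"}  # hypothetical properties
print(evaluate("Enabled=yes && ( Type=socket || Type=dbus ) && ! Active=yes", unit))  # True
```

Following Mike's proposal, each rule carries a single value, so alternatives are spelled out with || (Type=socket || Type=dbus) rather than a comma list.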
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17.03.11 10:20, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> On Thu, 17 Mar 2011 01:39:19 +0100 Lennart Poettering <lenn...@poettering.net> wrote:
> > On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> > > Something like systemctl --enabled would certainly be much more useful for such cases than the current systemctl --all, yet there would still be a lot of oneshot stuff, which is supposed to be dead, so a separate state for enabled, non-oneshot services that have exited, like "stopped" (in place of "inactive"), and maybe a view like systemctl --stopped would be of great help from my sysadmin's perspective.
> >
> > Hmm, thinking about this: wouldn't it be a lot more useful for your case if we added an option which causes a service to enter the failed state if it exits cleanly but for no reason, i.e. without being asked to do so by systemd? Or maybe that should even be the default for most services? After all, only services which implement exit-on-idle would otherwise exit cleanly just for fun without being asked to...
>
> I think it'd be an improvement, but it would also make the failed state a bit more ambiguous, although maybe that's not such a bad thing.
>
> Experiencing several reboots on a machine with 50+ enabled daemons, I've noticed that some of them (mostly the ones started via a launcher script like apachectl, pg_ctl, ejabberdctl, etc.) tend to fail cleanly and at random on start, just because the GuessMainPID= mechanism fails and systemd actually kills the service.

Hmm, GuessPID= fails? Do you know why exactly? Ideas for improvements? The current logic is pretty simple: we look for all processes in the service cgroup which have PPID == 1. If there is only one of these, we assume it is the main process. In your case there hence must be more than one where this condition applies? Any recommendation what else we could check?

> I understand that there's a limited number of reasons for such a clean stop (manual interaction, units like rsyslog.service, Conflicts=, isolate, etc.), but it's still the wrong way to approach this particular problem. I've solved the problem for myself by writing a simple dbus-python script (http://goo.gl/V6e7V). It shows exactly everything that's enabled and not active (with the oneshot exception), not some random subset of it.

Hmm, jupp. I agree, this is very useful. I added this to the todo list now.

> Unfortunately, the new rsyslog.service (and services using systemctl stop directly) can affect such a display, which I think shows the flawed assumption on my part that "enabled" in systemd means "should be active, period" (with the exception of oneshot units), and I don't know an easy solution to this, short of adding another enabled-like state.

Hmm, yeah. This problem is hard. But I think simply showing "enabled but not running" is already quite useful, even if a service on that list is not necessarily buggy, but just not hooked in by anything.

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17.03.11 22:48, Lennart Poettering (lenn...@poettering.net) wrote:
> > Unfortunately, the new rsyslog.service (and services using systemctl stop directly) can affect such a display, which I think shows the flawed assumption on my part that "enabled" in systemd means "should be active, period" (with the exception of oneshot units), and I don't know an easy solution to this, short of adding another enabled-like state.
>
> Hmm, yeah. This problem is hard. But I think simply showing "enabled but not running" is already quite useful, even if a service on that list is not necessarily buggy, but just not hooked in by anything.

Thinking about this, maybe a simpler solution would be to add a switch to list all services that have been running since boot but are not running anymore. That would be quite trivial to implement. Does that make sense to you?

Lennart

--
Lennart Poettering - Red Hat, Inc.
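The proposed switch boils down to a set difference: units seen active at any point since boot, minus units active now. A minimal sketch, assuming the tracker starts at boot and is fed periodic lists of active units (ActivityTracker is a hypothetical helper). The service manager itself sees every state transition, so it could set a "has been active" flag directly instead of polling, and would not miss short-lived runs the way this polling version can.

```python
from typing import Iterable, List, Set


class ActivityTracker:
    """Remember every unit that has been seen active since tracking began."""

    def __init__(self) -> None:
        self.ever_active: Set[str] = set()

    def observe(self, active_now: Iterable[str]) -> None:
        # Record the units that are active at this observation.
        self.ever_active.update(active_now)

    def stopped_since_boot(self, active_now: Iterable[str]) -> List[str]:
        # Units that ran at some point but are not in the current active set.
        return sorted(self.ever_active - set(active_now))


tracker = ActivityTracker()
tracker.observe(["sshd.service", "cups.service"])
tracker.observe(["sshd.service", "cups.service", "rsyslog.service"])
print(tracker.stopped_since_boot(["sshd.service", "rsyslog.service"]))  # ['cups.service']
```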
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17.03.11 09:52, Andrey Borzenkov (arvidj...@mail.ru) wrote:
> > I understand that there's already a looong TODO, but maybe it's possible to cram such systemctl view option(s) somewhere in that list?
>
> I was vaguely thinking about adding a generic selection option to replace the current ad-hoc ones, something like --state=enabled,!exited

A "!" on the shell is always a bit weird. But this might be a good idea to have.

> or even allowing generic filtering on a specific property, like --query Type=socket,dbus

Hmm, having this would make systemctl almost an SQL database ;-) I think we should try to figure out what exactly we need, instead of losing ourselves in inventing new matching languages. systemctl already has quite a long man page, and I'd prefer it didn't become a novel. Maybe one option would be to introduce it as a separate 'systemd-query' tool or so?

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17.03.11 12:52, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> > --query Type=socket,dbus
> >
> > The latter is probably the way to go, as it can be used to implement any custom query. And add better output format control, of course :) It just needs some proper, but not too complicated, syntax. Is it good enough for TODO?
>
> I think it'd be great, although the syntax could indeed be problematic if "and" and "or" logic were implemented. The simplest thing (from my POV) that comes to mind is perl / bash / libpcap / etc. syntax like rule1 && ( rule2 || rule3 ) && ! rule4, disallowing multiple values for the sake of uniformity (i.e. Type=socket || Type=dbus instead of Type=socket,dbus).

Hmm, I am not convinced we want to come up with our own matching syntax there. A different approach might be to just use awk here. awk was invented for this kind of matching. So it might be nicer to just have a mode where our output is easily processable by awk, and then people can do all the matching with awk? awk is pretty well known and widely spoken. And I think it is pretty good with CSV-style dumps, so maybe we should just support that nicely...

Lennart

--
Lennart Poettering - Red Hat, Inc.
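To make the awk approach concrete, here is a sketch of what such a dump might look like, assuming a tab-separated layout with a header row and a hypothetical column set; this is not an existing systemctl output mode. The awk one-liner in the comment and the Python filter select the same rows.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set for a machine-readable unit listing.
FIELDS = ["unit", "load", "active", "sub", "enabled"]


def dump_units(units: List[Dict[str, str]]) -> str:
    """Render units as tab-separated lines with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(units)
    return buf.getvalue()


def enabled_but_inactive(dump: str) -> List[str]:
    # Equivalent awk filter on the same dump (columns 1..5 as in FIELDS):
    #   awk -F'\t' 'NR > 1 && $5 == "enabled" && $3 != "active" { print $1 }'
    rows = csv.DictReader(io.StringIO(dump), delimiter="\t")
    return [r["unit"] for r in rows if r["enabled"] == "enabled" and r["active"] != "active"]


units = [
    {"unit": "sshd.service", "load": "loaded", "active": "active", "sub": "running", "enabled": "enabled"},
    {"unit": "cups.service", "load": "loaded", "active": "inactive", "sub": "dead", "enabled": "enabled"},
    {"unit": "foo.service", "load": "loaded", "active": "inactive", "sub": "dead", "enabled": "disabled"},
]
dump = dump_units(units)
print(dump, end="")
print(enabled_but_inactive(dump))  # ['cups.service']
```

A plain delimiter is only safe because none of these fields can contain a tab; otherwise the csv module would start quoting fields, and awk's simple field splitting does not understand quoting.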
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
2011/3/17 Lennart Poettering <lenn...@poettering.net>:
> Hmm, I am not convinced we want to come up with our own matching syntax there. A different approach might be to just use awk here. awk was invented for this kind of matching. So it might be nicer to just have a mode where our output is easily processable by awk, and then people can do all the matching with awk? awk is pretty well known and widely spoken. And I think it is pretty good with CSV-style dumps, so maybe we should just support that nicely...

+1

--
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> Something like systemctl --enabled would certainly be much more useful for such cases than the current systemctl --all, yet there would still be a lot of oneshot stuff, which is supposed to be dead, so a separate state for enabled, non-oneshot services that have exited, like "stopped" (in place of "inactive"), and maybe a view like systemctl --stopped would be of great help from my sysadmin's perspective.

Hmm, thinking about this: wouldn't it be a lot more useful for your case if we added an option which causes a service to enter the failed state if it exits cleanly but for no reason, i.e. without being asked to do so by systemd? Or maybe that should even be the default for most services? After all, only services which implement exit-on-idle would otherwise exit cleanly just for fun without being asked to...

Lennart

--
Lennart Poettering - Red Hat, Inc.
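Lennart's proposal amounts to a small decision rule applied when a service's main process exits. A sketch, assuming a hypothetical per-service flag for units that are expected to end by themselves (oneshot units, exit-on-idle daemons); neither the flag nor the function is a systemd option.

```python
from enum import Enum


class Outcome(Enum):
    INACTIVE = "inactive"  # stopped as expected
    FAILED = "failed"


def state_after_exit(clean_exit: bool, stop_requested: bool,
                     exits_on_its_own: bool = False) -> Outcome:
    """Decide a service's state after its main process exits (hypothetical policy).

    exits_on_its_own marks services expected to terminate by themselves,
    such as oneshot units or daemons implementing exit-on-idle.
    """
    if not clean_exit:
        return Outcome.FAILED    # crashes and error exits always fail
    if stop_requested or exits_on_its_own:
        return Outcome.INACTIVE  # clean exit with a reason
    return Outcome.FAILED        # clean exit "for no reason"


print(state_after_exit(clean_exit=True, stop_requested=True).value)   # inactive
print(state_after_exit(clean_exit=True, stop_requested=False).value)  # failed
```

The ambiguity Mike mentions in his reply shows up directly: FAILED now covers both a crash and a service that quietly went away, so a consumer would need the exit status to tell the two apart.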
Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services
On Thu, 17 Mar 2011 01:39:19 +0100 Lennart Poettering <lenn...@poettering.net> wrote:
> On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:
> > Something like systemctl --enabled would certainly be much more useful for such cases than the current systemctl --all, yet there would still be a lot of oneshot stuff, which is supposed to be dead, so a separate state for enabled, non-oneshot services that have exited, like "stopped" (in place of "inactive"), and maybe a view like systemctl --stopped would be of great help from my sysadmin's perspective.
>
> Hmm, thinking about this: wouldn't it be a lot more useful for your case if we added an option which causes a service to enter the failed state if it exits cleanly but for no reason, i.e. without being asked to do so by systemd? Or maybe that should even be the default for most services? After all, only services which implement exit-on-idle would otherwise exit cleanly just for fun without being asked to...

I think it'd be an improvement, but it would also make the failed state a bit more ambiguous, although maybe that's not such a bad thing.

Experiencing several reboots on a machine with 50+ enabled daemons, I've noticed that some of them (mostly the ones started via a launcher script like apachectl, pg_ctl, ejabberdctl, etc.) tend to fail cleanly and at random on start, just because the GuessMainPID= mechanism fails and systemd actually kills the service. The proposed solution should at least be useful for detecting these (quite common) cases. These are mostly one-time issues, however, indicating a bug in the systemd unit file.

A shortcoming of this approach is that cleanly stopped but enabled services still won't be shown anywhere, so you can't really assert that "all the services I've requested are running", which kinda defeats the purpose of such a display: you still can't trust it (or rather, it doesn't show what you need), and you have to either deploy software to work around this shortcoming or check the status of all the services manually.

I understand that there's a limited number of reasons for such a clean stop (manual interaction, units like rsyslog.service, Conflicts=, isolate, etc.), but it's still the wrong way to approach this particular problem. I've solved the problem for myself by writing a simple dbus-python script (http://goo.gl/V6e7V). It shows exactly everything that's enabled and not active (with the oneshot exception), not some random subset of it.

Unfortunately, the new rsyslog.service (and services using systemctl stop directly) can affect such a display, which I think shows the flawed assumption on my part that "enabled" in systemd means "should be active, period" (with the exception of oneshot units), and I don't know an easy solution to this, short of adding another enabled-like state.

--
Mike Kazantsev // fraggod.net
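The selection that Mike's script performs can be written as a plain filter. This sketch works on an in-memory list instead of querying systemd over D-Bus; the Unit record, its fields and the unit names are assumptions, and the linked script itself is not reproduced here. In this simple version, transitional states such as "activating" also count as not active.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Unit:
    name: str
    enabled: bool      # the unit file is enabled
    active_state: str  # e.g. "active", "inactive", "failed"
    service_type: str  # e.g. "simple", "forking", "oneshot"


def enabled_but_not_active(units: Iterable[Unit]) -> List[str]:
    """Enabled units that are not active, skipping oneshot services."""
    return sorted(
        u.name
        for u in units
        if u.enabled and u.active_state != "active" and u.service_type != "oneshot"
    )


units = [
    Unit("sshd.service", True, "active", "simple"),
    Unit("postgresql.service", True, "failed", "forking"),
    Unit("ejabberd.service", True, "inactive", "forking"),   # maybe stopped on purpose
    Unit("setup-once.service", True, "inactive", "oneshot"),  # expected to be dead
    Unit("cups.service", False, "inactive", "simple"),        # not enabled
]
print(enabled_but_not_active(units))  # ['ejabberd.service', 'postgresql.service']
```

The result also shows the limitation Mike describes: ejabberd.service is listed whether it exited cleanly on its own or was stopped deliberately with systemctl stop, since both end up inactive.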