Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-18 Thread Mike Kazantsev
On Thu, 17 Mar 2011 22:48:36 +0100
Lennart Poettering lenn...@poettering.net wrote:

 On Thu, 17.03.11 10:20, Mike Kazantsev (mk.frag...@gmail.com) wrote:
 
  On Thu, 17 Mar 2011 01:39:19 +0100
  Lennart Poettering lenn...@poettering.net wrote:
  
  Experiencing several reboots on a machine with 50+ enabled daemons
  I've noticed that some of them (mostly the ones, started via some
  launcher script like apachectl, pg_ctl, ejabberdctl, etc) tend to
  cleanly fail randomly on start just because GuessMainPID= mechanism
  fails and systemd actually kills the service.
 
 Hmm, GuessMainPID= fails? Do you know why exactly? Ideas for improvements?
 
 The current logic is pretty simple: we look for all processes in the
 service cgroup which have PPID == 1. If there is only one of these, we
 assume it is the main process. In your case there hence must be more
 than one where this condition applies? Any recommendation what else we
 could check?
 

For some services I've observed the following behavior:
 * logs state that the service received SIGTERM and is shutting down.
 * systemd status shows a Main PID that differs from the one in the
   logs and/or pidfile.

Thus I assume that in all these cases the launcher forks more than one
process, and when the first forked one (which gets marked as main)
dies, systemd pulls the plug and just kills the rest of them.

The problem seems to be related to timing, and maybe some switch like
GuessMainPIDAfterRunningForSec= would help, but it'd still be racy. So
disabling PID guessing and using PIDFile= seems to be a better way to do
it with the app's cooperation. All the apps with such a complicated start
seem to support pidfiles, so I don't think anything else is necessary
there, unless pidfile eradication becomes some kind of crusade, but
then all such launchers should probably just go away as well.


  I understand that there's a limited number of reasons for such clean
  stop (manual interaction, units like rsyslog.service, Conflicts=,
  isolate, etc), but still it's a wrong way to approach the particular
  problem.
  
  I've solved the problem for myself by writing a simple dbus-python
  script (http://goo.gl/V6e7V). It shows exactly everything that's
  enabled and not active (with oneshot exception), not some random
  subset of this.
 
 Hmm, jupp. I agree, this is very useful. I added this to the todo list now.
 

Thanks!


  Unfortunately, new rsyslog.service (and services using systemctl stop
  directly) can affect such display, which I think shows the flawed
  assumption that "enabled" in systemd means "should be active,
  period" (with the exception of oneshot units) on my part, and I don't
  know an easy solution to this, short of adding another enabled-like state.
 
 Hmm, yeah. This problem is hard. But I think simply showing enabled but
 not running is already quite useful, even if a service on that list is
 not necessarily buggy, but just not hooked in by anything.
 

I think Andrey's systemctl --query suggestion in this thread or
special systemd-query tool should already be able to provide such
functionality (and more), so it should be a good enough solution.

Combined with failed state for dead-services-that-shouldn't-be it
should be even better - services stopped via systemctl like that won't
have failed state, so they can be easily filtered out by the same
query tool or grep/awk.


-- 
Mike Kazantsev // fraggod.net


signature.asc
Description: PGP signature
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-18 Thread Mike Kazantsev
On Thu, 17 Mar 2011 22:58:00 +0100
Lennart Poettering lenn...@poettering.net wrote:

 On Thu, 17.03.11 22:48, Lennart Poettering (lenn...@poettering.net) wrote:
 
   Unfortunately, new rsyslog.service (and services using systemctl stop
   directly) can affect such display, which I think shows the flawed
   assumption that "enabled" in systemd means "should be active,
   period" (with the exception of oneshot units) on my part, and I don't
   know an easy solution to this, short of adding another enabled-like state.
  
  Hmm, yeah. This problem is hard. But I think simply showing enabled but
  not running is already quite useful, even if a service on that list is
  not necessarily buggy, but just not hooked in by anything.
 
 Thinking about this, maybe a simpler solution would be to add a switch
 to list all services that have been running since the boot but are not
 running anymore. That would be quite trivial to implement. Does that
 make sense to you?
 

Looks like kinda systemctl snapshot diffs to me.

It's not really hard to do now with systemctl, grep, sort and diff, plus
some service which does initial snapshot late at boot, but it looks
like a kludge to me - services tend to fail at boot as well, and
desired system state (enabled/disabled) may change over uptime (i.e.
new stuff installed, something got disabled), so it doesn't make sense
anymore to compare anything to just a boot-state.

Maybe it'd make sense to add ability to diff state snapshots in this
regard, comparing (and saving) not just running but also enabled/masked
services.
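
To illustrate the snapshot-diff idea, here is a minimal Python sketch. It
assumes a snapshot is simply a mapping from unit name to an
(enablement, active state) pair; that layout and the name diff_snapshots
are invented for this example and do not correspond to any systemd or
systemctl format.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout: unit name -> (enablement, active state),
# e.g. {"sshd.service": ("enabled", "active")}. Not a systemd format.
Snapshot = Dict[str, Tuple[str, str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[str]:
    """Describe units that were added, removed or changed between snapshots."""
    changes = []
    for unit in sorted(set(old) | set(new)):
        before, after = old.get(unit), new.get(unit)
        if before == after:
            continue  # nothing changed for this unit
        if before is None:
            changes.append(f"+ {unit} {after[0]}/{after[1]}")
        elif after is None:
            changes.append(f"- {unit} {before[0]}/{before[1]}")
        else:
            changes.append(
                f"~ {unit} {before[0]}/{before[1]} -> {after[0]}/{after[1]}"
            )
    return changes
```

Because enablement is stored next to the active state, a unit that was
disabled or masked during uptime shows up as a change rather than as a
false alarm, which is the point made above about not comparing against
the boot state alone.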

It looks way out of scope of systemctl, too, and more in the realm of
systemd-query tool, maybe developed separately, as it sounds more like
a separate systemd-monitoring solution at this point ;)


-- 
Mike Kazantsev // fraggod.net




Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Andrey Borzenkov
On Thu, Feb 24, 2011 at 11:55 AM, Mike Kazantsev mk.frag...@gmail.com wrote:
 Something like systemctl --enabled would certainly be much more
 useful for such cases than the current systemctl --all, yet
 there will still be a lot of oneshot stuff, which are supposed to be
 dead, so a separate state for !oneshot && enabled && exited services
 like stopped (in place of inactive) and maybe a view like systemctl
 --stopped would be of a great help from my sysadmin's perspective.

 I understand that there's already a looong TODO, but maybe it's possible
 to cram such systemctl view option(s) somewhere in that list?


I was vaguely thinking about adding generic selection option to
replace current ad-hoc ones, something like

--state=enabled,!exited

or even allowing generic filtering on specific property, like

--query Type=socket,dbus

The latter is probably the way to go as it can be used to implement
any custom query. And add better output format control of course :) It
just needs some proper, but not too complicated, syntax.

Is it good enough for TODO?


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Mike Kazantsev
On Thu, 17 Mar 2011 09:52:21 +0300
Andrey Borzenkov arvidj...@mail.ru wrote:

 On Thu, Feb 24, 2011 at 11:55 AM, Mike Kazantsev mk.frag...@gmail.com wrote:
  Something like systemctl --enabled would certainly be much more
  useful for such cases than the current systemctl --all, yet
  there will still be a lot of oneshot stuff, which are supposed to be
  dead, so a separate state for !oneshot && enabled && exited services
  like stopped (in place of inactive) and maybe a view like systemctl
  --stopped would be of a great help from my sysadmin's perspective.
 
  I understand that there's already a looong TODO, but maybe it's possible
  to cram such systemctl view option(s) somewhere in that list?
 
 
 I was vaguely thinking about adding generic selection option to
 replace current ad-hoc ones, something like
 
 --state=enabled,!exited
 
 or even allowing generic filtering on specific property, like
 
 --query Type=socket,dbus
 
 The latter is probably the way to go as it can be used to implement
 any custom query. And add better output format control of course :) It
 just needs some proper, but not too complicated, syntax.
 
 Is it good enough for TODO?

I think it'd be great, although the syntax could indeed be problematic if
"and" and "or" logic is implemented.

The simplest (from my POV) thing that comes to mind is perl / bash /
libpcap / etc. syntax like "rule1 && ( rule2 || rule3 ) && ! rule4",
disallowing multiple values for the sake of uniformity (i.e. "Type=socket ||
Type=dbus" instead of "Type=socket,dbus").
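
As a rough illustration of how small such a syntax could stay, here is a
Python sketch of an evaluator for expressions of that shape. Everything in
it is an assumption made for the example: the function names, the
restriction of rules to plain Key=value equality, and the property names
used in the usage note. It is not a proposal for actual systemctl code.

```python
import re
from typing import Dict, List, Optional

# Tokens: "&&", "||", "!", parentheses, or a Key=value rule.
_TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|[A-Za-z_]\w*=[^\s()!&|]+)")


def tokenize(expr: str) -> List[str]:
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def matches(expr: str, props: Dict[str, str]) -> bool:
    """Evaluate expr against one unit's properties (a plain dict)."""
    tokens = tokenize(expr)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse, so the position advances
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if "=" not in tok:
            raise ValueError(f"expected a Key=value rule, got {tok!r}")
        key, _, wanted = tok.partition("=")
        return props.get(key) == wanted

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"trailing input: {peek()!r}")
    return result
```

With a hypothetical property dict such as {"Type": "socket", "Enabled":
"yes", "Active": "inactive"}, the expression "Enabled=yes && ( Type=socket
|| Type=dbus ) && ! Active=active" selects the unit. Precedence follows the
usual order: "!" binds tightest, then "&&", then "||".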


-- 
Mike Kazantsev // fraggod.net




Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Lennart Poettering
On Thu, 17.03.11 10:20, Mike Kazantsev (mk.frag...@gmail.com) wrote:

 On Thu, 17 Mar 2011 01:39:19 +0100
 Lennart Poettering lenn...@poettering.net wrote:
 
  On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:
  
   Something like systemctl --enabled would certainly be much more
   useful for such cases than the current systemctl --all, yet
   there will still be a lot of oneshot stuff, which are supposed to be
   dead, so a separate state for !oneshot && enabled && exited services
   like stopped (in place of inactive) and maybe a view like systemctl
   --stopped would be of a great help from my sysadmin's perspective.
  
  Hmm, thinking about this: wouldn't it be a lot more useful for your case
  if we add an option which causes services to enter fail if a service
  exits cleanly, but does so for no reason, i.e. without being asked to do
  that from systemd?
  
  or maybe that should even be the default for most services? After all
  only services which implement exit-on-idle would otherwise exit cleanly
  just for fun without being asked for that...
  
 
 I think it'd be an improvement, but that'd also give failed state a
 bit more ambiguity, although maybe it's not such a bad thing.
 
 Experiencing several reboots on a machine with 50+ enabled daemons
 I've noticed that some of them (mostly the ones, started via some
 launcher script like apachectl, pg_ctl, ejabberdctl, etc) tend to
 cleanly fail randomly on start just because GuessMainPID= mechanism
 fails and systemd actually kills the service.

Hmm, GuessMainPID= fails? Do you know why exactly? Ideas for improvements?

The current logic is pretty simple: we look for all processes in the
service cgroup which have PPID == 1. If there is only one of these, we
assume it is the main process. In your case there hence must be more
than one where this condition applies? Any recommendation what else we
could check?
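
The heuristic described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not systemd's
code: the function name guess_main_pid and the (pid, ppid) input list are
made up for the example, and how the processes of a cgroup are enumerated
is left out entirely.

```python
from typing import Iterable, Optional, Tuple


def guess_main_pid(procs: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the single PID whose parent is PID 1, or None if ambiguous.

    `procs` is a hypothetical list of (pid, ppid) pairs for the processes
    in one service cgroup; it is not a systemd data structure.
    """
    candidates = [pid for pid, ppid in procs if ppid == 1]
    # Exactly one process reparented to PID 1: assume it is the main one.
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several candidates: the heuristic cannot decide.
    return None
```

A launcher that leaves two daemonized processes behind, for example
[(100, 1), (200, 1)], gives no unique answer, and a check made while a
short-lived helper is still the only child of PID 1 picks the wrong one.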

 I understand that there's a limited number of reasons for such clean
 stop (manual interaction, units like rsyslog.service, Conflicts=,
 isolate, etc), but still it's a wrong way to approach the particular
 problem.
 
 I've solved the problem for myself by writing a simple dbus-python
 script (http://goo.gl/V6e7V). It shows exactly everything that's
 enabled and not active (with oneshot exception), not some random
 subset of this.

Hmm, jupp. I agree, this is very useful. I added this to the todo list now.

 Unfortunately, new rsyslog.service (and services using systemctl stop
 directly) can affect such display, which I think shows the flawed
 assumption that "enabled" in systemd means "should be active,
 period" (with the exception of oneshot units) on my part, and I don't
 know an easy solution to this, short of adding another enabled-like state.

Hmm, yeah. This problem is hard. But I think simply showing enabled but
not running is already quite useful, even if a service on that list is
not necessarily buggy, but just not hooked in by anything.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Lennart Poettering
On Thu, 17.03.11 22:48, Lennart Poettering (lenn...@poettering.net) wrote:

  Unfortunately, new rsyslog.service (and services using systemctl stop
  directly) can affect such display, which I think shows the flawed
  assumption that "enabled" in systemd means "should be active,
  period" (with the exception of oneshot units) on my part, and I don't
  know an easy solution to this, short of adding another enabled-like state.
 
 Hmm, yeah. This problem is hard. But I think simply showing enabled but
 not running is already quite useful, even if a service on that list is
 not necessarily buggy, but just not hooked in by anything.

Thinking about this, maybe a simpler solution would be to add a switch
to list all services that have been running since the boot but are not
running anymore. That would be quite trivial to implement. Does that
make sense to you?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Lennart Poettering
On Thu, 17.03.11 09:52, Andrey Borzenkov (arvidj...@mail.ru) wrote:
  I understand that there's already a looong TODO, but maybe it's possible
  to cram such systemctl view option(s) somewhere in that list?
 
 
 I was vaguely thinking about adding generic selection option to
 replace current ad-hoc ones, something like
 
 --state=enabled,!exited

! on the shell is always a bit weird. But this might be a good idea to
have.

 or even allowing generic filtering on specific property, like
 
 --query Type=socket,dbus

Hmm, having this would make systemctl almost an SQL database ;-) 

I think we should try to figure out what exactly we need, instead of
losing ourselves in inventing new matching languages. systemctl already
has quite a long man page, and I'd prefer if it didn't become a novel.

Maybe one option would be to introduce it as a separate 'systemd-query'
tool or so?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Lennart Poettering
On Thu, 17.03.11 12:52, Mike Kazantsev (mk.frag...@gmail.com) wrote:

  --query Type=socket,dbus
  
  The latter is probably the way to go as it can be used to implement
  any custom query. And add better output format control of course :) It
  just needs some proper, but not too complicated, syntax.
  
  Is it good enough for TODO?
 
 I think it'd be great, although the syntax could indeed be problematic if
 "and" and "or" logic is implemented.
 
 The simplest (from my POV) thing that comes to mind is perl / bash /
 libpcap / etc. syntax like "rule1 && ( rule2 || rule3 ) && ! rule4",
 disallowing multiple values for the sake of uniformity (i.e. "Type=socket ||
 Type=dbus" instead of "Type=socket,dbus").

Hmm, I am not convinced we want to come up with our own matching syntax
there.

A different approach might be to just use awk here. awk has been
invented for this kind of matching. So it might be nicer to just have a
mode where we have an output that is easily processable by awk, and then
people can do all matching with awk?

awk is pretty well known and widely spoken. And I think it is pretty
good with CSV style dumps, so maybe we should just support that nicely...
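
awk would be the natural tool for this. Purely to show that the matching
needs no new language once the output is machine-readable, here is the
same kind of filter as a short Python sketch. The dump format, its column
names, the sample rows and the function select_units are all invented for
the example; systemctl does not produce this output.

```python
import csv
import io
from typing import Iterable, List

# Hypothetical CSV-style dump; columns and rows are made up for this sketch.
SAMPLE_DUMP = """\
unit,type,active
dbus.socket,socket,active
NetworkManager.service,dbus,active
sshd.service,forking,inactive
"""


def select_units(dump: str, column: str, values: Iterable[str]) -> List[str]:
    """Return unit names whose `column` equals any of `values`."""
    wanted = set(values)
    reader = csv.DictReader(io.StringIO(dump))
    return [row["unit"] for row in reader if row[column] in wanted]
```

Here select_units(SAMPLE_DUMP, "type", ["socket", "dbus"]) plays the role
of the "--query Type=socket,dbus" idea from earlier in the thread, with
all the matching done outside of systemctl.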

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-17 Thread Michael Biebl
2011/3/17 Lennart Poettering lenn...@poettering.net:
 Hmm, I am not convinced we want to come up with our own matching syntax
 there.

 A different approach might be to just use awk here. awk has been
 invented for this kind of matching. So it might be nicer to just have a
 mode where we have an output that is easily processable by awk, and then
 people can do all matching with awk?

 awk is pretty well known and widely spoken. And I think it is pretty
 good with CSV style dumps, so maybe we should just support that nicely...

+1


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-16 Thread Lennart Poettering
On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:

 Something like systemctl --enabled would certainly be much more
 useful for such cases than the current systemctl --all, yet
 there will still be a lot of oneshot stuff, which are supposed to be
 dead, so a separate state for !oneshot && enabled && exited services
 like stopped (in place of inactive) and maybe a view like systemctl
 --stopped would be of a great help from my sysadmin's perspective.

Hmm, thinking about this: wouldn't it be a lot more useful for your case
if we add an option which causes a service to enter the failed state if it
exits cleanly, but does so for no reason, i.e. without being asked to do
that by systemd?

or maybe that should even be the default for most services? After all
only services which implement exit-on-idle would otherwise exit cleanly
just for fun without being asked for that...
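
Spelled out as a decision rule, the idea might look like the Python sketch
below. It only restates the proposal and is not existing systemd
behaviour; the function name and the exit_on_idle opt-out parameter are
hypothetical, and treating every non-zero exit as a failure is a
simplifying assumption.

```python
def state_after_exit(
    exit_code: int, stop_requested: bool, exit_on_idle: bool = False
) -> str:
    """Return the state a service would enter after its main process exits."""
    if exit_code != 0:
        # Assumption for this sketch: any unclean exit counts as a failure.
        return "failed"
    if stop_requested or exit_on_idle:
        # Clean exit that was asked for, or that the service is allowed
        # to do on its own (hypothetical exit-on-idle opt-out).
        return "inactive"
    # Clean exit nobody asked for: flag it.
    return "failed"
```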

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] Inactive/dead services that are enabled are indistinguishable from unused or oneshot services

2011-03-16 Thread Mike Kazantsev
On Thu, 17 Mar 2011 01:39:19 +0100
Lennart Poettering lenn...@poettering.net wrote:

 On Thu, 24.02.11 13:55, Mike Kazantsev (mk.frag...@gmail.com) wrote:
 
  Something like systemctl --enabled would certainly be much more
  useful for such cases than the current systemctl --all, yet
  there will still be a lot of oneshot stuff, which are supposed to be
  dead, so a separate state for !oneshot && enabled && exited services
  like stopped (in place of inactive) and maybe a view like systemctl
  --stopped would be of a great help from my sysadmin's perspective.
 
 Hmm, thinking about this: wouldn't it be a lot more useful for your case
 if we add an option which causes services to enter fail if a service
 exits cleanly, but does so for no reason, i.e. without being asked to do
 that from systemd?
 
 or maybe that should even be the default for most services? After all
 only services which implement exit-on-idle would otherwise exit cleanly
 just for fun without being asked for that...
 

I think it'd be an improvement, but that'd also give failed state a
bit more ambiguity, although maybe it's not such a bad thing.

After several reboots on a machine with 50+ enabled daemons,
I've noticed that some of them (mostly the ones started via some
launcher script like apachectl, pg_ctl, ejabberdctl, etc.) tend to
"cleanly fail" randomly on start, just because the GuessMainPID= mechanism
fails and systemd actually kills the service.
The proposed solution should at least be useful for detecting these (quite
common) cases. These are mostly one-time issues, however, each pointing to
a bug in the systemd unit file.

The shortcoming of this approach is that cleanly stopped but enabled
services still won't be shown anywhere, so you can't really assert that
"all the services I've requested are running", which kinda defeats the
purpose of such a display: you still can't trust it (or rather it
doesn't show what you need) and have to either deploy software to work
around this shortcoming or check the status of all the services manually.

I understand that there's a limited number of reasons for such clean
stop (manual interaction, units like rsyslog.service, Conflicts=,
isolate, etc), but still it's a wrong way to approach the particular
problem.

I've solved the problem for myself by writing a simple dbus-python
script (http://goo.gl/V6e7V). It shows exactly everything that's
enabled and not active (with oneshot exception), not some random
subset of this.
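
The selection rule itself is small. The Python sketch below is not the
linked script and does not talk to D-Bus at all; it only shows the filter,
applied to a hypothetical list of unit records whose keys (name, enabled,
active, type) are assumptions made for the example.

```python
from typing import Dict, Iterable, List


def enabled_but_not_active(units: Iterable[Dict[str, str]]) -> List[str]:
    """Return names of units that are enabled, not active and not oneshot."""
    return sorted(
        unit["name"]
        for unit in units
        if unit["enabled"] == "enabled"
        and unit["active"] != "active"
        and unit.get("type") != "oneshot"  # oneshot units are expected to be dead
    )
```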

Unfortunately, new rsyslog.service (and services using systemctl stop
directly) can affect such display, which I think shows the flawed
assumption that "enabled" in systemd means "should be active,
period" (with the exception of oneshot units) on my part, and I don't
know an easy solution to this, short of adding another enabled-like state.


-- 
Mike Kazantsev // fraggod.net

