
Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-05-04 Thread Brandon Black

On Thu, Apr 24, 2014 at 1:34 AM, Lennart Poettering
wrote:

> On Wed, 23.04.14 21:20, Brandon Black (blbl...@gmail.com) wrote:
>
> > The problem here is that the daemon performs operations that require root
> > privilege on startup, and then dumps its privileges for runtime, and thus
> > its execve'd child won't have the root privs it would need to start
> > everything over again.  In theory some of these privileged things, like
> > listening sockets, could be handed to the exec child, but that assumes the
> > configured set of listening sockets hasn't changed (which might be the
> > reason for the restart).
>
> There's always the option to raise the privs again via some setuid
> helper...

That would seem to defeat the purpose of losing them in the first place
(limiting the damage potential of a compromised daemon).

> > Other things like mlockall() can't be handed off
> > over fork/execve once privileges are gone.
>
> mlockall()? what's that supposed to do here? this is usually snakeoil...

This seems like another side-topic, but what about mlockall is snakeoil?
 Should that be documented somewhere? It was just the first example of a
privileged operation we use that came to mind.  It's an optional
(default-off) thing in gdnsd, but it seems like if you care about response
latency enough to minimize syscalls, minimizing pagefaults in the presence
of less-important batch processes that may consume significant memory is a
good idea as well.  In any case, no, I don't think I can completely get rid
of privileged ops on startup at this time.

> > officially letting a control process from ExecReload= become the main
> > process via some reasonably-standard mechanism?  That's already what
> > happens to the "control process" for ExecStart=.
>
> Well ExecStart= is very special, it's not the control process, really.

Semantics.  Can we not have some other verb be as special?  My point is,
the systemd code certainly knows how to do this, it just doesn't choose to
for ExecReload.  There could be an option declared for that behavior,
though, if it were a solution.

> > 2) Given the above, would it be reasonable that if a control process from a
> > verb like ExecReload sent a MAINPID= message over the control socket,
> > systemd would accept this as the new main pid *and* internally take care of
> > promoting the specified PID to the proper cgroup?
>
> Hmm, this becomes messy if the daemon actually is more than one
> process (think worker processes)... Not sure how we would handle that?

I assume you mean worker processes which detach themselves from the parent
via setsid() and thus don't show a relationship to it (why would the daemon
choose to disassociate worker children like that? I have no idea).
 Otherwise we could just move the whole process group of the sender of the
MAINPID= message.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-23 Thread Lennart Poettering

On Wed, 23.04.14 21:20, Brandon Black (blbl...@gmail.com) wrote:

> The problem here is that the daemon performs operations that require root
> privilege on startup, and then dumps its privileges for runtime, and thus
> its execve'd child won't have the root privs it would need to start
> everything over again.  In theory some of these privileged things, like
> listening sockets, could be handed to the exec child, but that assumes the
> configured set of listening sockets hasn't changed (which might be the
> reason for the restart).  

There's always the option to raise the privs again via some setuid
helper...

> Other things like mlockall() can't be handed off
> over fork/execve once privileges are gone.

mlockall()? what's that supposed to do here? this is usually snakeoil...

> > Control processes really can't become the main process I
> > fear...
> 
> 
> They can; I've already done it by writing to /sys as documented above, but
> that doesn't seem like a reliable API for doing so going forward on all
> platforms and in all situations.  What's the fundamental problem with

Also note that sooner or later cgroupfs write access will be removed
from userspace applications...

> officially letting a control process from ExecReload= become the main
> process via some reasonably-standard mechanism?  That's already what
> happens to the "control process" for ExecStart=.

Well ExecStart= is very special, it's not the control process, really.

> I'd propose two changes (and work on the patches myself) that would make
> this case work for me reliably, if they're acceptable:
> 
> 1) Can we get $NOTIFY_SOCKET set for control procs like ExecReload
> when NotifyAccess=all ?  That's what I initially thought that setting would
> do, but apparently it doesn't.  Or any other standard mechanism I could
> rely on so that I'm not hardcoding a fallback socket path.

Hmm, we don't do this yet? This sounds like a useful thing to do. Added
to the TODO list for now...

> 2) Given the above, would it be reasonable that if a control process from a
> verb like ExecReload sent a MAINPID= message over the control socket,
> systemd would accept this as the new main pid *and* internally take care of
> promoting the specified PID to the proper cgroup?

Hmm, this becomes messy if the daemon actually is more than one
process (think worker processes)... Not sure how we would handle that?

Lennart

-- 
Lennart Poettering, Red Hat

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-23 Thread Lennart Poettering

On Wed, 23.04.14 21:01, Brandon Black (blbl...@gmail.com) wrote:

> > At this point in time I am quite sure that ExecReload= should simply be
> > used for this.
> 
> That's an acceptable answer, although I think in the long term it poses
> some questions about additional custom verbs, since at least gdnsd now
> really wants two different reload-like actions (a simple SIGHUP that
> reloads zone data vs the overlapped-restart under discussion here).  But
> for now, the easy case (SIGHUP) can just be done outside of
> systemd/systemctl without any ill effects.

Yeah, I am not convinced that custom verbs are something to support in
systemd. They are not generic, and systemd/systemctl should really just
cover the generic verbs. I mean, as soon as you do generic verbs you
probably also want to extend them with extra modifying switches and so
on. But that all is probably better done in some specific, auxiliary
tool shipped along with the package.

I mean, there's really no point in abstracting something within the
systemd/systemctl context that is inherently not abstractable, if you
follow what I mean.

SMF allowed extending services with custom verbs. I don't think that
that was one of their better design decisions...

Lennart

-- 
Lennart Poettering, Red Hat

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-23 Thread Brandon Black

On Wed, Apr 23, 2014 at 3:06 PM, Lennart Poettering
wrote:

> > To recap my results: there were primarily two things in the way of naively
> > using ExecReload to trigger gdnsd's overlapped restart:
> >
> > 1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
> > new daemon, which is a descendant of the ExecReload process.  The
> > ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
> > NotifyAccess=all.  So I hacked around that by having the daemon set
> > $NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
> > which seems semi-standard for the moment.
> >
> > 2) ExecReload control processes can't become the MAINPID even after
> > notification because they're not in the correct cgroup (or subgroup, or
> > whatever it is that's special about most control procs), unlike
> > ExecStart's control process, which is in the right
> > cgroup for its descendants to become MAINPID successfully.  This was hacked
> > around by grabbing the basic unit name from sd_pid_get_unit() (let's call
> > the result "$U") and then writing our pid to
> > "/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
> > before it drops root privs and later notifies about the MAINPID switch.
>
> Hmm, yeah, the new process really needs to be forked off the original
> main process.

The problem here is that the daemon performs operations that require root
privilege on startup, and then dumps its privileges for runtime, and thus
its execve'd child won't have the root privs it would need to start
everything over again.  In theory some of these privileged things, like
listening sockets, could be handed to the exec child, but that assumes the
configured set of listening sockets hasn't changed (which might be the
reason for the restart).  Other things like mlockall() can't be handed off
over fork/execve once privileges are gone.

> Control processes really can't become the main process I
> fear...

They can; I've already done it by writing to /sys as documented above, but
that doesn't seem like a reliable API for doing so going forward on all
platforms and in all situations.  What's the fundamental problem with
officially letting a control process from ExecReload= become the main
process via some reasonably-standard mechanism?  That's already what
happens to the "control process" for ExecStart=.

I'd propose two changes (and work on the patches myself) that would make
this case work for me reliably, if they're acceptable:

1) Can we get $NOTIFY_SOCKET set for control procs like ExecReload
when NotifyAccess=all ?  That's what I initially thought that setting would
do, but apparently it doesn't.  Or any other standard mechanism I could
rely on so that I'm not hardcoding a fallback socket path.

2) Given the above, would it be reasonable that if a control process from a
verb like ExecReload sent a MAINPID= message over the control socket,
systemd would accept this as the new main pid *and* internally take care of
promoting the specified PID to the proper cgroup?

-- Brandon

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-23 Thread Brandon Black

On Wed, Apr 23, 2014 at 1:18 AM, Lennart Poettering
wrote:
>
> UDP is lossy anyway, and a startup delay of a few seconds shouldn't be
> an issue at all. If we are speaking of 15min or so here, that might be a
> problem, but otherwise this really sounds fine. And if your daemon
> really takes 15min this sounds like something to look into...

 There are many values between a few seconds and 15 minutes that are both
(a) reasonable startup times given the user's large configuration and (b)
undesirable downtime for a critical service like DNS.

> At this point in time I am quite sure that ExecReload= should simply be
> used for this.
>

That's an acceptable answer, although I think in the long term it poses
some questions about additional custom verbs, since at least gdnsd now
really wants two different reload-like actions (a simple SIGHUP that
reloads zone data vs the overlapped-restart under discussion here).  But
for now, the easy case (SIGHUP) can just be done outside of
systemd/systemctl without any ill effects.

-- Brandon

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-23 Thread Lennart Poettering

On Tue, 01.04.14 01:55, Brandon Black (blbl...@gmail.com) wrote:

> On Fri, Mar 28, 2014 at 12:12 PM, Brandon Black  wrote:
> >
> >   Given where things are at today, as best I can tell my best bet is to go
> > down that sort of road, though, and try to clone over the cgroups
> > memberships manually somehow during an ExecReload= command for this restart
> > (even though it really is a restart), and leaving true reloads (SIGHUP to a
> > running daemon) to be done from outside systemd.
> >
> 
> I've done some experimenting this evening (on a Fedora 20 system w/
> systemd-208), playing with methods of MAINPID notification and how to
> coerce ExecReload into letting me do an overlapped restart.  The result is
> that I can make it work, but it's hacky.  The main thing that bothers me
> about it is that the mechanisms probably aren't officially supported
> interfaces and my methods will randomly fail on a future version of systemd
> (or a differently-configured distro).

You should be able to either inform systemd of the new PID by sending
MAINPID, or simply write a new PID file out; systemd should read it
again if you configure it with PIDFile.
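
For reference, the MAINPID notification mentioned here is just a datagram
sent to the AF_UNIX socket named by $NOTIFY_SOCKET, which is what
sd_notify() does under the hood. A minimal Python sketch of that, not
gdnsd's actual code: the function name notify_main_pid and its notify_path
parameter are made up for illustration, and whether systemd accepts the
message still depends on NotifyAccess= and on the sender's cgroup, as
discussed in this thread.

```python
import os
import socket


def notify_main_pid(pid, notify_path=None):
    """Send MAINPID=<pid> to the notification socket.

    Returns True if a datagram was sent, False if no socket is configured.
    notify_path is a hypothetical override; by default $NOTIFY_SOCKET is used.
    """
    path = notify_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" means a Linux abstract-namespace socket, which is
        # addressed with a leading NUL byte instead of the "@".
        addr = b"\0" + path[1:].encode()
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"MAINPID={pid}".encode("ascii"), addr)
    return True
```

A daemon would call notify_main_pid(os.getpid()) once the new instance is
ready to take over.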

> To recap my results: there were primarily two things in the way of naively
> using ExecReload to trigger gdnsd's overlapped restart:
> 
> 1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
> new daemon, which is a descendant of the ExecReload process.  The
> ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
> NotifyAccess=all.  So I hacked around that by having the daemon set
> $NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
> which seems semi-standard for the moment.
> 
> 2) ExecReload control processes can't become the MAINPID even after
> notification because they're not in the correct cgroup (or subgroup, or
> whatever it is that's special about most control procs), unlike
> ExecStart's control process, which is in the right
> cgroup for its descendants to become MAINPID successfully.  This was hacked
> around by grabbing the basic unit name from sd_pid_get_unit() (let's call
> the result "$U") and then writing our pid to
> "/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
> before it drops root privs and later notifies about the MAINPID switch.

Hmm, yeah, the new process really needs to be forked off the original
main process. Control processes really can't become the main process I
fear...


Lennart

-- 
Lennart Poettering, Red Hat

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-04-22 Thread Lennart Poettering

On Fri, 28.03.14 12:12, Brandon Black (blbl...@gmail.com) wrote:

> 4) Socket Activation! I know this is what some will scream when they skim
> the above, but it's not a realistic solution in this case for a few reasons:
> a) The startup delay, in some cases, can be many whole wallclock
> seconds.  This is necessary and acceptable in the general sense (this is
> a network service that people use with large server-side installations, not a
> desktop thing).

UDP is lossy anyway, and a startup delay of a few seconds shouldn't be
an issue at all. If we are speaking of 15min or so here, that might be a
problem, but otherwise this really sounds fine. And if your daemon
really takes 15min this sounds like something to look into...

> c) Another side-point that might be better addressed in another thread:
> even if both of the above weren't true, this daemon uses several sockets
> for multiple "roles" internally, some of which share all low-level details
> (e.g. two distinct use-cases for multiple TCP sockets that serve different
> high-level protocols, where the user might choose arbitrary ports for
> both).  I'm not seeing any trivial way to distinguish these via socket
> activation - perhaps some kind of socket "label" that could be accessed by
> the daemon via sd_* APIs to distinguish would be useful here?

You can query the listening ports and properties using getsockname() and
friends. Also, sd-daemon provides sd_is_socket() which allows you to do
similar checks.
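
A rough sketch of the getsockname() approach, in Python for brevity. It
assumes the inherited descriptors are IPv4/IPv6 sockets and that the
daemon's own configuration already supplies a (protocol, port) to role
table; classify_inherited, inherited_fds and roles_by_port are hypothetical
names, not part of sd-daemon.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # systemd passes inherited sockets starting at fd 3


def inherited_fds():
    """Return the fds passed via the LISTEN_PID/LISTEN_FDS environment."""
    if os.environ.get("LISTEN_PID") != str(os.getpid()):
        return []
    count = int(os.environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def classify_inherited(fds, roles_by_port):
    """Map each fd to a role, keyed on (protocol, bound port)."""
    roles = {}
    for fd in fds:
        sock = socket.socket(fileno=fd)  # family and type are auto-detected
        try:
            port = sock.getsockname()[1]
            proto = "tcp" if sock.type == socket.SOCK_STREAM else "udp"
        finally:
            sock.detach()  # keep the descriptor open for the daemon
        roles[fd] = roles_by_port.get((proto, port), "unknown")
    return roles
```

This only helps when the bound port alone identifies the role, so the
table has to come from the daemon's own configuration.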

On our TODO list is to add an "fd store" concept to units where service
code can push fds to systemd, and pull them out again (to make reloads
nice). At the same time we'd add the concept of labelling them.

> 5) ExecReexec - this was one of Lennart's musings in the previous thread in
> Dec2012.  However, this doesn't map well to gdnsd's model if implemented in
> the "obvious" manner of having ExecReexec send a signal to the running
> daemon to re-exec itself.  It would map well if gdnsd could respond to
> SIGFOO via fork()->execve() on itself with the "restart" verb and let the
> new instance replace itself when it's ready.  The problem is that the new
> restarting copy needs elevated privileges to bind its sockets, which it
> then loses permanently by the time it becomes a real daemon (and thus can't
> provide to the newly execve'd copy).  In some cases we could pass the
> sockets on by clearing FD_CLOEXEC, but there's no guarantee as to what
> socket bindings the new daemon will have: typically the same as before, but
> perhaps the address or port number has changed in the config file for one
> of five different sockets.

At this point in time I am quite sure that ExecReload= should simply be
used for this.

I am quite sure that "systemctl restart" should do the same thing for
all services, and that means stopping the service, followed by starting,
and have both of these jobs follow the usual ordering dependency
logic (so that other jobs might be ordered between the stop/start!).

OTOH "systemctl reload" should be that verb where some service-specific
reload operation is executed, where no restriction is made how this
ultimately is implemented, and where no ordering logic really
applies. Whether a process reexec is done for this or not is an
implementation detail of the specific service, where systemd shouldn't
really have to be involved. In general the only suggestion we'd make is
that the effect of ExecReload should be synchronous, as comprehensive as
possible, yet also as graceful as possible. Reexecing as part of reload
sounds like a good idea, if enough care is taken not to stop any ongoing
connections or transactions.

There have been some changes in systemd a while back that make sure
that ExecReload= can replace the process, so this should pretty much
work now if the daemon is up to it.

Lennart

-- 
Lennart Poettering, Red Hat

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-03-31 Thread Brandon Black

On Fri, Mar 28, 2014 at 12:12 PM, Brandon Black  wrote:
>
>   Given where things are at today, as best I can tell my best bet is to go
> down that sort of road, though, and try to clone over the cgroups
> memberships manually somehow during an ExecReload= command for this restart
> (even though it really is a restart), and leaving true reloads (SIGHUP to a
> running daemon) to be done from outside systemd.
>

I've done some experimenting this evening (on a Fedora 20 system w/
systemd-208), playing with methods of MAINPID notification and how to
coerce ExecReload into letting me do an overlapped restart.  The result is
that I can make it work, but it's hacky.  The main thing that bothers me
about it is that the mechanisms probably aren't officially supported
interfaces and my methods will randomly fail on a future version of systemd
(or a differently-configured distro).

To recap my results: there were primarily two things in the way of naively
using ExecReload to trigger gdnsd's overlapped restart:

1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
new daemon, which is a descendant of the ExecReload process.  The
ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
NotifyAccess=all.  So I hacked around that by having the daemon set
$NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
which seems semi-standard for the moment.

2) ExecReload control processes can't become the MAINPID even after
notification because they're not in the correct cgroup (or subgroup, or
whatever it is that's special about most control procs), unlike
ExecStart's control process, which is in the right
cgroup for its descendants to become MAINPID successfully.  This was hacked
around by grabbing the basic unit name from sd_pid_get_unit() (let's call
the result "$U") and then writing our pid to
"/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
before it drops root privs and later notifies about the MAINPID switch.
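
The cgroup half of that workaround amounts to something like the following
Python sketch (the real change is in C, in the commit linked below). The
path layout is the one quoted above for systemd-208 on Fedora 20; the
function names and the root and slice_name parameters are invented here so
the sketch can be exercised against a scratch directory. It is not a
supported interface.

```python
import os


def cgroup_procs_path(unit, root="/sys/fs/cgroup/systemd",
                      slice_name="system.slice"):
    """Build the cgroup.procs path for a unit, using the layout quoted above."""
    return os.path.join(root, slice_name, unit, "cgroup.procs")


def join_unit_cgroup(unit, pid=None, root="/sys/fs/cgroup/systemd"):
    """Write a pid into the unit's cgroup.procs file.

    On a real system this needs root, so it has to run before the
    privilege drop. Defaults to the calling process's own pid.
    """
    path = cgroup_procs_path(unit, root=root)
    with open(path, "w") as procs:
        procs.write(f"{os.getpid() if pid is None else pid}\n")
    return path
```

In the hack described above, "$U" would be the result of sd_pid_get_unit(),
for example join_unit_cgroup("gdnsd.service").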

(And of course, re-purposing ExecReload isn't ideal in the first place.
 It's semantically wrong and it wastes the reload verb, forcing actual
reload actions to need to happen from outside of systemctl)

The resulting commit (which is off in a testing branch of a development
branch for now, there's plenty of time to work out alternate solutions) is
here:

https://github.com/blblack/gdnsd/commit/17a40b0483da7d072912169e832df31d69349440

From going through this exercise, I think I can refine my feature-plea to
this: What would be ideal (well, from the limited perspective of making
things easier for this one daemon) would be an ExecRestart (or whatever)
verb which acts almost exactly like ExecStart (correct control group for
final daemon, gets $NOTIFY_SOCKET), but has its own separate command string
and doesn't pre-check that the service is currently considered inactive.  I
don't think it would be too hard to write such a patch, but my first
concern is whether such a patch is even remotely likely to be accepted, or
whether there are better alternatives (other patches that could be made, or
perhaps better interfaces I'm unaware of in the current code can obviate
the hacky stuff above without any patching).

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-03-31 Thread Brandon Black

On Sat, Mar 29, 2014 at 9:09 PM, Michael Scherer  wrote:

> Le vendredi 28 mars 2014 à 12:12 -0500, Brandon Black a écrit :
> > 4) Socket Activation! I know this is what some will scream when they
> > skim the above, but it's not a realistic solution in this case for a
> > few reasons:
> > a) The startup delay, in some cases, can be many whole wallclock
> > seconds.  This is necessary and acceptable in the general sense (this
> > is a network service that people use with large server-side
> > installations, not a desktop thing).
>
> It only occurs on the first start, no ?

No, these delays (well, for configurations large enough to involve
substantial delays) will happen on every fresh start, including "restart"
starts.  This means the sequential stop->start that systemd wants to do is
always going to give an availability gap where no daemon is processing
requests for a while.  Socket activation would keep the sockets open during
that window, but the buffers would just overflow anyways and/or the
eventual responses would be way too late to matter.  The command I want to
execute for ExecRestart doesn't have this issue because it knows how to
coordinate with itself for overlapping, so that the expensive "start"
operations happen before "stop".

> > socket "label" that could be accessed by the daemon via sd_* APIs to
> > distinguish would be useful here?
>
> You can use getsockopt to get some information, and match the port/type
> to the appropriate structure.
> See https://trac.torproject.org/projects/tor/ticket/8908 for a patch
> doing that kind of thing for tor.
>

What I was trying to say (perhaps very unclearly): there might be
distinctions between the many sockets which getsockname() does not capture.
 For a generic example: the daemon may allow the user to configure 0->N TCP
sockets for HTTP and 0->M other TCP sockets for HTTPS.  The user gets to
choose arbitrary port numbers for them all.  getsockname() is going to show
me M+N TCP sockets on arbitrary ports, but how does the information about
which was meant for which role get from user -> service unit -> actual
daemon code?

Re: [systemd-devel] Revisiting the "ExecRestart" issue

2014-03-29 Thread Michael Scherer

Le vendredi 28 mars 2014 à 12:12 -0500, Brandon Black a écrit :
> 
> Hi all,
>I've brought this up before, but I became busy/discouraged and
> dropped the ball.  As systemd becomes increasingly widely deployed, I
> can no longer afford to do so, so I'd like to explore this area a bit
> further on the list again and see if we can't come up with a workable
> solution, or if perhaps I've missed some systemd/cgroups change in the
> past year or so that already allows a workaround.
>
> [.. snip .. ]
>
> 4) Socket Activation! I know this is what some will scream when they
> skim the above, but it's not a realistic solution in this case for a
> few reasons:
> a) The startup delay, in some cases, can be many whole wallclock
> seconds.  This is necessary and acceptable in the general sense (this
> is a network service that people use with large server-side
> installations, not a desktop thing).

It only occurs on the first start, no ?


> b) The primary socket traffic we care about is UDP, and further we
> *really* care about request->response latency for this traffic.  Even
> if you could set a large enough receive buffer to handle several
> seconds of heavy UDP requests (and you can't, for at least some
> installations), the multi-second-delay in the responses isn't
> reasonable.

Again, that's a multiple second delay only for the first start, after,
this will be the regular way since the socket is directly used by the
daemon.


> c) Another side-point that might be better addressed in another
> thread: even if both of the above weren't true, this daemon uses
> several sockets for multiple "roles" internally, some of which share
> all low-level details (e.g. two distinct use-cases for multiple TCP
> sockets that serve different high-level protocols, where the user
> might choose arbitrary ports for both).  I'm not seeing any trivial
> way to distinguish these via socket activation - perhaps some kind of
> socket "label" that could be accessed by the daemon via sd_* APIs to
> distinguish would be useful here?

You can use getsockopt to get some information, and match the port/type
to the appropriate structure.
See https://trac.torproject.org/projects/tor/ticket/8908 for a patch
doing that kind of thing for tor. 


-- 
Michael Scherer


[systemd-devel] Revisiting the "ExecRestart" issue

2014-03-28 Thread Brandon Black

Hi all,
   I've brought this up before, but I became busy/discouraged and dropped
the ball.  As systemd becomes increasingly widely deployed, I can no longer
afford to do so, so I'd like to explore this area a bit further on the list
again and see if we can't come up with a workable solution, or if perhaps
I've missed some systemd/cgroups change in the past year or so that already
allows a workaround.

  To recap the previous discussion, see the threads at these links (same
thread, two different months in the thread-list):
http://lists.freedesktop.org/archives/systemd-devel/2012-November/007595.html
http://lists.freedesktop.org/archives/systemd-devel/2012-December/007804.html
  As well as this referenced/related thread from even earlier (different
author, but I suspect his issues are similar at the core of things):
http://lists.freedesktop.org/archives/systemd-devel/2012-June/005400.html

  The daemon I'm working on is the DNS server gdnsd (
https://github.com/~blblack/gdnsd ).  While trying to keep this short (fat
chance!), these are the core unique things that matter about it from a
systemd perspective, and how they seem to paint me into a corner:

0) It's meant to be somewhat portable outside of systemd and Linux, at
least to the *BSDs.  While I'm completely open to doing some small
(runtime-|autoconf-)conditional blocks of systemd-specific code in place of
traditional daemon code where it makes sense, I can't go and rewrite
everything in a new structure that only makes sense under systemd.

1) The daemon is designed to work as its own initscript.  Not unique, but
certainly less-common.  It ships a daemon binary which accepts
initscript-actions on the commandline.  So, "/usr/sbin/gdnsd start" forks
off a daemon, "/usr/sbin/gdnsd stop" kills the existing daemon, ditto for
"/usr/sbin/gdnsd status", and all the other common initscript verbs.  The
internal code is already handling unracy stops and starts, pidfile locking,
reliable "status", proper daemonization, privilege drop, etc through all of
this.  Most traditional sysvinit-like systems of course will use a real
shell initscript at runtime, and the real initscript can just invoke these
verbs, perhaps redirecting their verbose output to /dev/null (and know that
pidfiles and processes and whatnot are already well-managed and not need to
write clunky/racy shell code to try to solve those problems).

2) During startup of a fresh daemon, a number of operations have to happen
in a serial fashion due to hard dependency constraints, and for some users
these startup operations can take significant wallclock time relative to
desired service availability.  These events include things like loading
zonefiles (which can be expensive for large files or large counts of files,
which is a real world use-case today) and doing initial network-monitoring
polls of remote resources to set their initial state (which involve
timeouts for network responses - these are done in parallel to the degree
possible, but this can still add several seconds for reasonable
monitor-counts with reasonable timeouts).  All of these things must
complete before the new daemon can begin answering requests legitimately on
its listening sockets.

3) As you can imagine, this creates a problem for the traditional "restart"
verb: If one stops and then starts, there can be a long gap of service
unavailability.  To remedy this, I moved in the direction of having the
internal "restart" verb work in an overlapped fashion.  The way "restart"
is implemented basically follows this logic:
   a) restart is just a special case of "start"
   b) it parses configuration and does all the potentially-long operations
of a normal start first
   c) if anything fails (due to a new configuration error, etc), it dies
and leaves the old daemon instance alone.
   d) when it successfully reaches the point where it and the existing
daemon can no longer co-exist (because it needs to steal the bound
sockets), it *then* kills the old daemon using the "stop" logic, locks the
pidfile for itself, binds the sockets, and continues on as the new daemon.
   e) (and actually, in the upcoming next branch, SO_REUSEPORT will be used
to overlap the sockets as well, allowing for truly zero-packets-lost during
these restart operations).
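
To illustrate point (e): on Linux (3.9 or later), two sockets that both set
SO_REUSEPORT before bind() can share one UDP address, so the new instance
can bind while the old one is still serving. A small Python sketch of just
that socket behaviour, with a made-up helper name bind_udp_reuseport; it is
not gdnsd code and leaves out the pidfile and stop logic from (d).

```python
import socket


def bind_udp_reuseport(host, port):
    """Bind a UDP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Every socket sharing the address must set the option, and on Linux
    # they must belong to the same effective user.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock


# Overlap sketch: the old instance is bound, the new one binds the same
# port, and only then does the old one close its socket.
old = bind_udp_reuseport("127.0.0.1", 0)   # port 0: let the kernel pick one
port = old.getsockname()[1]
new = bind_udp_reuseport("127.0.0.1", port)
old.close()                                # the new socket keeps serving
```

While both sockets are open the kernel spreads incoming datagrams across
them, so the old instance still has to drain its own queue before exiting.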

4) Socket Activation! I know this is what some will scream when they skim
the above, but it's not a realistic solution in this case for a few reasons:
a) The startup delay, in some cases, can be many whole wallclock
seconds.  This is necessary and acceptable in the general sense (this is
a network service that people use with large server-side installations, not a
desktop thing).
b) The primary socket traffic we care about is UDP, and further we
*really* care about request->response latency for this traffic.  Even if
you could set a large enough receive buffer to handle several seconds of
heavy UDP requests (and you can't, for at least some installations), the
multi-second-delay in the responses isn't
