Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-03 Thread Mike Hearn
Ah, to clarify, I'm talking about app-specific servers not Linux
system services, so dbus isn't really relevant (what would it be used
for?). The sort of programs that tend to be packaged with Docker
today, or deployed using AWS Lambda or just copied up to the server.
For example a typical business-specific Ruby on Rails or Spring Boot
app. Such programs don't have much use for dbus, will have complex but
short lived per-request state and will often be written on other
platforms, only deployed to Linux. You don't want to just cut a
connection whilst it's live because that'd break things like file
downloads and users would see 500 errors. At the same time, trying to
serialize the full state of the connection to a buffer is impractical
because the app is highly complex and changing regularly (e.g. daily).


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-03 Thread Lennart Poettering
On Fr, 03.03.23 10:16, Mike Hearn (mike@hydraulic.software) wrote:

> Sorry, by "apps" I meant anything not supplied by OS developers. In
> this context, servers e.g. custom web app servers. I do currently run
> some of those with DynamicUser=1 and similar.
>
> > As long as the tool updating the disk image creates the new one under
> > a temporary name, and then replaces the old one with it via renaming,
> > upgrading portable services is as easy as restarting them
>
> Great.
>
> > > > But of course such an approach requires that services are written in a
> > > > way this is possible
> > >
> > > Right. I think that'd be quite hard to do especially with servers
> > > written in portable languages that don't expose stuff unavailable on
> > > Windows e.g. the JVM.
> >
> > Why would that be? portable services are just regular services that
> > happen to come with their own disk images, that's all.
>
> Sorry I meant the serialization and transmission of FDs to the fd
> store to support user-transparent restart. For example the Java API
> has no way to send fds over a UNIX domain socket because Windows
> doesn't support that, so you need third party libraries. And then it
> would appear to turn into a general problem of serializing the entire
> state of the app which is quite hard. Easier to assume that one
> connection should stick with one server version for the lifetime of
> that connection and then just phase in new servers as new connections
> roll in.

Right, writing system services in Java is indeed a headache I am
sure. No ready notifications, no socket activation, no fdstore, no
signals, no dbus, no watchdog logic, …

It's a race to the bottom if you never want to make use of the *good*
stuff. But then you shouldn't be surprised if you can't do certain
things...

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-03 Thread Luca Boccassi
On Fri, 3 Mar 2023 at 09:17, Mike Hearn wrote:
> > > > But of course such an approach requires that services are written in a
> > > > way this is possible
> > >
> > > Right. I think that'd be quite hard to do especially with servers
> > > written in portable languages that don't expose stuff unavailable on
> > > Windows e.g. the JVM.
> >
> > Why would that be? portable services are just regular services that
> > happen to come with their own disk images, that's all.
>
> Sorry I meant the serialization and transmission of FDs to the fd
> store to support user-transparent restart. For example the Java API
> has no way to send fds over a UNIX domain socket because Windows
> doesn't support that, so you need third party libraries. And then it
> would appear to turn into a general problem of serializing the entire
> state of the app which is quite hard. Easier to assume that one
> connection should stick with one server version for the lifetime of
> that connection and then just phase in new servers as new connections
> roll in.

It only sounds easier, because it postpones the difficult part for
later. It requires every service to behave perfectly well and
according to the specification, and delegates process management down
to them. Except services cannot be relied upon, and will get it wrong,
and that will cause multiple versions of the same service to exist at
the same time and conflict with each other, and require manual
intervention to fix. On a "pet" machine (i.e. your laptop) it's fixable
busywork; on a system with tens of thousands of headless nodes, not so
much.

It is not a reliable and trustworthy pattern. The advantage of moving
state across via FD is not only speed and memory (double amount of
services, double amount of memory/cpu consumed and double hard cap of
memory needed on the system), but it's mainly about reliability by not
having to delegate process management to clients. I.e. when systemd
tells you to stop, you stop, end of story.


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-03 Thread Mike Hearn
Sorry, by "apps" I meant anything not supplied by OS developers. In
this context, servers e.g. custom web app servers. I do currently run
some of those with DynamicUser=1 and similar.

> As long as the tool updating the disk image creates the new one under
> a temporary name, and then replaces the old one with it via renaming,
> upgrading portable services is as easy as restarting them

Great.

> > > But of course such an approach requires that services are written in a
> > > way this is possible
> >
> > Right. I think that'd be quite hard to do especially with servers
> > written in portable languages that don't expose stuff unavailable on
> > Windows e.g. the JVM.
>
> Why would that be? portable services are just regular services that
> happen to come with their own disk images, that's all.

Sorry I meant the serialization and transmission of FDs to the fd
store to support user-transparent restart. For example the Java API
has no way to send fds over a UNIX domain socket because Windows
doesn't support that, so you need third party libraries. And then it
would appear to turn into a general problem of serializing the entire
state of the app which is quite hard. Easier to assume that one
connection should stick with one server version for the lifetime of
that connection and then just phase in new servers as new connections
roll in.


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-03 Thread Lennart Poettering
On Do, 02.03.23 23:05, Mike Hearn (mike@hydraulic.software) wrote:

> > There's currently no mechanism for that. File an RFE issue.
>
> https://github.com/systemd/systemd/issues/26647
>
> > In the "Portable Services" concept we currently assume you update the
> > disk image ("DDI") the service is on, and then simply restart the
> > service while leaving the socket around.
>
> I've always wanted to understand portable services better. I never
> quite grokked if portable services were meant for apps or operating
> system level stuff, or if it didn't matter.

Not sure what you mean by "apps"? desktop apps? They are conceptually
suitable for that, but not realistically, since we currently require
privs to mount disk images, and thus the whole concept is simply not
available for unpriv code.

So the focus is system-level services or system-level
"apps". i.e. stuff that might or might not have privs, stuff that
could use DynamicUser=1 (though this is not a requirement) and similar.

> It also wasn't quite clear
> to me how upgrades worked for them either - presumably if you stick
> them inside a deb or rpm you have the same problem, or if you rsync up
> a new image, etc. It'd be great to have some blog posts that tackle
> portable services end-to-end from the perspective of running
> servers.

As long as the tool updating the disk image creates the new one under
a temporary name, and then replaces the old one with it via renaming,
upgrading portable services is as easy as restarting them (well,
unless you make changes to the service definitions, in that case you
need to issue "portablectl reattach").

If tools update files like that, then the old version of the portable
service can use the old image as long as it wants, and only once the
last reference to it is dropped does it disappear from memory or disk.
At the same time the new invocation will only use the new disk image.
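
[Editorial sketch] The create-under-a-temporary-name-then-rename pattern
described above could look roughly like this. It is only an illustration
under stated assumptions: the function name replace_image_atomically and
the ".tmp-image-" prefix are hypothetical, and a real updater would stream
the image rather than hold it in memory.

```python
import os
import tempfile


def replace_image_atomically(new_bytes: bytes, target_path: str) -> None:
    """Write new_bytes to a temporary file next to target_path, then rename over it."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-image-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk before renaming
        # rename(2) semantics: the name switches to the new file in one step,
        # while processes that already opened the old file keep using it.
        os.replace(tmp_path, target_path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A process that opened the old file before the call keeps reading the old
contents; anything that opens the path afterwards sees the new file.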

> > But of course such an approach requires that services are written in a
> > way this is possible
>
> Right. I think that'd be quite hard to do especially with servers
> written in portable languages that don't expose stuff unavailable on
> Windows e.g. the JVM.

Why would that be? portable services are just regular services that
happen to come with their own disk images, that's all.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-02 Thread Mike Hearn
> There's currently no mechanism for that. File an RFE issue.

https://github.com/systemd/systemd/issues/26647

> In the "Portable Services" concept we currently assume you update the
> disk image ("DDI") the service is on, and then simply restart the
> service while leaving the socket around.

I've always wanted to understand portable services better. I never
quite grokked if portable services were meant for apps or operating
system level stuff, or if it didn't matter. It also wasn't quite clear
to me how upgrades worked for them either - presumably if you stick
them inside a deb or rpm you have the same problem, or if you rsync up
a new image, etc. It'd be great to have some blog posts that tackle
portable services end-to-end from the perspective of running servers.

> But of course such an approach requires that services are written in a
> way this is possible

Right. I think that'd be quite hard to do especially with servers
written in portable languages that don't expose stuff unavailable on
Windows e.g. the JVM. Also, from the perspective of a packaging
tool/docker alternative, asking users to add major new features to
their servers is a non-starter. You don't need to do that stuff with
containers.


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-02 Thread Lennart Poettering
On Mo, 20.02.23 11:05, Mike Hearn (mike@hydraulic.software) wrote:

> Hi,
>
> I'm exploring socket activation as part of work on a tool that makes
> systemd-controlled servers easier to deploy and use. Given a config
> file the tool builds a package that contains the app and systemd
> units, uploads it, installs it with dependency resolution, the
> postinst scripts start the service etc. It's sort of a Docker
> alternative that's more classically Linux-y, designed for a world
> where really big machines are really cheap and thus many apps don't
> need to be cattle-ized. Pets are sometimes OK.
>
> As part of this I'm looking at how to make upgrades smooth. Socket
> activation already allows you to shut down, upgrade and restart a
> service without dropping connections because systemd will hold the
> connections until the service comes back but there are a couple of
> aspects that weren't really clear to me from reading the excellent
> "pid eins" blog post series. Could we maybe get a new blog post
> exploring these issues?
>
> 1. How exactly should you stop a service that's socket activated so it
> won't be re-activated during the upgrade but new connections won't be
> lost, e.g. in package scripts that are executed across upgrades.
> Currently the scripts stop the service before the upgrade happens,
> then restart afterwards.

There's currently no mechanism for that. File an RFE issue.

In the "Portable Services" concept we currently assume you update the
disk image ("DDI") the service is on, and then simply restart the
service while leaving the socket around.

I can see though that if you operate without disk images, then you
might want an explicit synchronization step.

Currently we implement a "freeze" concept for services (which uses the
cgroup freezer underneath), maybe we should extend this for socket
units to mean that we keep the sockets open but don't act
anymore. You'd then issue "systemctl freeze foobar.socket" before you
do your upgrade and "systemctl thaw" afterwards.

> 2. Is it possible to run two versions of a service unit at once such
> that the old version finishes handling connections and then shuts
> down, whilst new connections are being handled by the new version?

Currently not.

We have been discussing this scenario many times, and we could
certainly add something for this, but this kinda conflicts with the
goal to provide a pristine execution context for services: if we'd
restart a service like this and leave old processes around then the
cgroup of the service would of course still contain "legacy"
processes, which contradicts the rule that we always start with a
pristine execution environment.

So, there are two conflicting goals: the goal of guaranteeing clean
invocation and the goal of allowing old stuff to "passivate".

Inside of Microsoft we mostly settled on a different approach: instead
of leaving processes around during such restarts, let's instead
serialize all state of ongoing connections and upload their sockets to the
fdstore (i.e. see FileDescriptorStoreMax= docs), along with a memfd of
the serialized state. Benefit of this approach: you solve the problem
properly and fully: after the restart only new code is in place, and
all old code is flushed out.

But of course such an approach requires that services are written in a
way this is possible, i.e. are capable of serializing their full
state for all ongoing connections along with the socket fds to the
fdstore, and then deserialize all that when initializing again. This
is not hard but also not exactly trivial.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-01 Thread Michal Koutný
Hello Mike.

On Mon, Feb 20, 2023 at 11:05:41AM +0100, Mike Hearn wrote:
> 2. Is it possible to run two versions of a service unit at once such
> that the old version finishes handling connections and then shuts
> down, whilst new connections are being handled by the new version?

This is a recurring topic, tracked in [1]. I hope to make some progress
there soon.

Feel free to add your ideas there,
Michal

[1] https://github.com/systemd/systemd/issues/10228


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-02-21 Thread Uoti Urpala
On Mon, 2023-02-20 at 12:22 +0100, Mike Hearn wrote:
> I see. So basically you have to keep the service running across the
> upgrade and then wait for it to shut down due to inactivity, then be
> restarted by systemd to make the update apply. Or alternatively you
> could make the app detect that it's been updated, stop accepting new
> connections, finish servicing the old connections, and then shut
> itself down once all existing connections are finished. On restart
> it'd then be using the new code, re-accept the socket from systemd
> and start accepting again.

Instead of "detect that it's been updated", I believe a more common and
recommendable approach would be to make it part of the daemon's normal
clean shutdown (for daemons where this behavior is appropriate). That
is, stop accepting new connections from the listening socket, but
finish serving already accepted connections. Then the "restart" part
alone is enough to switch to a new version without losing connections
(at least if things don't take so long that connections time out).
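
[Editorial sketch] One way to picture this kind of clean shutdown: an
accept loop that stops taking new connections when asked to stop, but
waits for already accepted ones to finish. This is a minimal threaded
sketch, not a recommendation of a particular server design; the class
name DrainingServer and its methods are hypothetical.

```python
import socket
import threading


class DrainingServer:
    """Accept loop that stops accepting on request but lets open connections finish."""

    def __init__(self, listener: socket.socket, handler):
        self.listener = listener
        self.handler = handler  # callable that takes one connected socket
        self.stopping = threading.Event()
        self.workers: list[threading.Thread] = []

    def serve(self) -> None:
        # Short timeout so the loop notices a stop request promptly.
        self.listener.settimeout(0.1)
        while not self.stopping.is_set():
            try:
                conn, _ = self.listener.accept()
            except socket.timeout:
                continue
            worker = threading.Thread(target=self._run, args=(conn,))
            worker.start()
            self.workers.append(worker)
        # Stop requested: accept nothing further, but wait for every
        # connection that was already accepted to be served.
        for worker in self.workers:
            worker.join()

    def _run(self, conn: socket.socket) -> None:
        with conn:
            conn.setblocking(True)
            self.handler(conn)

    def request_stop(self, *_args) -> None:
        """Usable directly as a signal handler (extra arguments are ignored)."""
        self.stopping.set()
```

A daemon would typically call signal.signal(signal.SIGTERM,
server.request_stop) before server.serve() in its main thread. Connections
that arrive while it drains stay queued on the listening socket that
systemd keeps open. The drain has to finish within the unit's
TimeoutStopSec=, otherwise the remaining processes are killed.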



Re: [systemd-devel] Smooth upgrades for socket activated services

2023-02-20 Thread Mike Hearn
I see. So basically you have to keep the service running across the upgrade
and then wait for it to shut down due to inactivity, then be restarted by
systemd to make the update apply. Or alternatively you could make the app
detect that it's been updated, stop accepting new connections, finish
servicing the old connections, and then shut itself down once all existing
connections are finished. On restart it'd then be using the new code,
re-accept the socket from systemd and start accepting again.
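
[Editorial sketch] "Re-accepting the socket from systemd" means picking up
the passed descriptors again on startup. Under the socket activation
protocol they start at fd 3 and are announced through $LISTEN_PID and
$LISTEN_FDS. A minimal sketch follows; the function names are
hypothetical, and a real service might prefer an existing binding to
sd_listen_fds() instead.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor in the activation protocol


def listen_fd_numbers(env=os.environ, pid=None) -> list[int]:
    """Return the descriptor numbers passed in by the service manager, if any."""
    pid = os.getpid() if pid is None else pid
    # The variables only apply to the process they were set for.
    if env.get("LISTEN_PID") != str(pid):
        return []
    count = int(env.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listen_sockets(env=os.environ) -> list[socket.socket]:
    """Wrap the passed descriptors; family and type are detected from each fd."""
    return [socket.socket(fileno=fd) for fd in listen_fd_numbers(env)]
```

Because the listening socket belongs to the .socket unit and not to the
service, connections that arrive between the old process exiting and the
new one calling accept() simply wait in the backlog.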

I guess this can work for quiet services that are safe to change on disk
because they open everything at startup and never close or re-open the fds,
or if there's a snapshotting layer on top.


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-02-20 Thread Michael Biebl
On Mon, 20 Feb 2023 at 11:06, Mike Hearn wrote:
>
> Hi,
>
> I'm exploring socket activation as part of work on a tool that makes
> systemd-controlled servers easier to deploy and use. Given a config
> file the tool builds a package that contains the app and systemd
> units, uploads it, installs it with dependency resolution, the
> postinst scripts start the service etc. It's sort of a Docker
> alternative that's more classically Linux-y, designed for a world
> where really big machines are really cheap and thus many apps don't
> need to be cattle-ized. Pets are sometimes OK.
>
> As part of this I'm looking at how to make upgrades smooth. Socket
> activation already allows you to shut down, upgrade and restart a
> service without dropping connections because systemd will hold the
> connections until the service comes back but there are a couple of
> aspects that weren't really clear to me from reading the excellent
> "pid eins" blog post series. Could we maybe get a new blog post
> exploring these issues?
>
> 1. How exactly should you stop a service that's socket activated so it
> won't be re-activated during the upgrade but new connections won't be
> lost, e.g. in package scripts that are executed across upgrades.
> Currently the scripts stop the service before the upgrade happens,
> then restart afterwards.

Currently, there is no way to "freeze" the execution of a socket
activated service.
A feature I'm missing as well, fwiw.


[systemd-devel] Smooth upgrades for socket activated services

2023-02-20 Thread Mike Hearn
Hi,

I'm exploring socket activation as part of work on a tool that makes
systemd-controlled servers easier to deploy and use. Given a config
file the tool builds a package that contains the app and systemd
units, uploads it, installs it with dependency resolution, the
postinst scripts start the service etc. It's sort of a Docker
alternative that's more classically Linux-y, designed for a world
where really big machines are really cheap and thus many apps don't
need to be cattle-ized. Pets are sometimes OK.

As part of this I'm looking at how to make upgrades smooth. Socket
activation already allows you to shut down, upgrade and restart a
service without dropping connections because systemd will hold the
connections until the service comes back but there are a couple of
aspects that weren't really clear to me from reading the excellent
"pid eins" blog post series. Could we maybe get a new blog post
exploring these issues?

1. How exactly should you stop a service that's socket activated so it
won't be re-activated during the upgrade but new connections won't be
lost, e.g. in package scripts that are executed across upgrades.
Currently the scripts stop the service before the upgrade happens,
then restart afterwards.

2. Is it possible to run two versions of a service unit at once such
that the old version finishes handling connections and then shuts
down, whilst new connections are being handled by the new version?

I feel intuitively that this should be possible for services like ssh,
but you'd need it for anything that serves downloads. Obviously
services would have to opt in to this, as they'd have to be able to
handle two versions running at once in terms of shared
state/config/caches etc, but for servers that can handle this it would
make upgrades entirely transparent.

thanks,
-mike