Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Michal Koutný
On Tue, Mar 15, 2022 at 04:35:12PM +0100, Felip Moll wrote:
> Meaning that it would be great to have a delegated cgroup subtree without
> the need of a service or scope.
> Just an empty subtree.

It looks appealing to add a Delegate= directive to slice units.
Firstly, that would prevent the slice from being used by anything else systemd manages.
Then some notion of an owner of that subtree would have to be defined (if
only for cleanup).
That owner would be a process, and bang: you have created a service with
delegation, or a scope with a "keepalive" process.

(The above is slightly misleading.) There could be an alternative:
something like RemainAfterExit=yes for scopes, i.e. such scopes would
not be stopped after the last process exits (but systemd would still be in
charge of cleaning up the cgroup after an explicit stop request, and that
would also mark the scope as truly stopped).
Such a recycled scope would only be useful via
org.freedesktop.systemd1.Manager.AttachProcessesToUnit().
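The RemainAfterExit=-style scope above is only a proposal, but the attach step itself is an existing Manager method. A minimal sketch of calling it from Python, assuming the dbus-python bindings: AttachProcessesToUnit() takes a unit name, a sub-cgroup path (empty for the unit's own cgroup, otherwise an absolute path below it) and an array of PIDs. The helper names and the "/job1" sub-cgroup are hypothetical; privilege and delegation requirements are not handled here.

```python
SYSTEMD_BUS_NAME = "org.freedesktop.systemd1"
SYSTEMD_OBJECT_PATH = "/org/freedesktop/systemd1"
MANAGER_IFACE = "org.freedesktop.systemd1.Manager"


def attach_args(unit_name, pids, subcgroup=""):
    """Build the (s unit, s subcgroup, au pids) arguments for
    Manager.AttachProcessesToUnit().

    An empty subcgroup targets the unit's own cgroup; otherwise it must be an
    absolute path below that cgroup, e.g. "/job1" (a hypothetical layout).
    """
    if subcgroup and not subcgroup.startswith("/"):
        raise ValueError("subcgroup must be empty or start with '/'")
    pid_list = [int(p) for p in pids]
    if not pid_list or any(p <= 0 for p in pid_list):
        raise ValueError("need at least one positive PID")
    return (unit_name, subcgroup, pid_list)


def attach_processes(unit_name, pids, subcgroup=""):
    """Ask PID 1 to move `pids` into an existing, running unit."""
    import dbus  # dbus-python, imported lazily so attach_args() works without it

    name, sub, pid_list = attach_args(unit_name, pids, subcgroup)
    manager = dbus.Interface(
        dbus.SystemBus().get_object(SYSTEMD_BUS_NAME, SYSTEMD_OBJECT_PATH),
        MANAGER_IFACE,
    )
    manager.AttachProcessesToUnit(
        name, sub, dbus.Array([dbus.UInt32(p) for p in pid_list], signature="u")
    )
```

For example, `attach_processes("slurmstepd.scope", [pid], "/job1")` would target a hypothetical per-job sub-cgroup of a delegated scope; the argument checks are conservative choices made for this sketch, not a restatement of systemd's own validation.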

BTW, I'm also wondering how you detect a job finishing in the case where
the original parent is gone (due to a main service restart) and the job's main
process has been reparented?

BTW 2: you didn't like having a scope for each job. Is that because of the
setup time (IOW, jobs are short-lived) or the overhead of persistent scopes (too
many units, PID 1 scalability)?

Michal


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Felip Moll
> It's shown as active, so where is the problem?
>
>
I have found the problem.
I start my main process (slurmd) in a terminal; it then fork-execs a
/bin/sleep infinity and creates a new scope containing the PID of the sleep.

If slurmd is terminated with Ctrl+C, its child processes die too, so
the scope is destroyed. So I need to daemonize the sleep.
Or... use a service directly.
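The reason the sleep dies is that Ctrl+C makes the terminal send SIGINT to its whole foreground process group, which includes the forked child. A minimal sketch of "daemonizing" the stub, assuming Python on Linux: start it in a new session with setsid() (which `subprocess.Popen(start_new_session=True)` does in the child before exec) and detach its stdio from the terminal. The helper names are hypothetical, and this only shields the stub from terminal-generated signals, not from an explicit kill.

```python
import os
import subprocess


def spawn_stub(argv=("/bin/sleep", "infinity")):
    """Start a keep-alive stub process detached from the caller's terminal.

    start_new_session=True makes the child call setsid() before exec, so it
    runs in its own session and process group and no longer receives the
    SIGINT that Ctrl+C sends to the terminal's foreground process group.
    """
    return subprocess.Popen(
        list(argv),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


def is_session_leader(pid):
    """True if `pid` leads its own session, as a setsid()'d stub does."""
    return os.getsid(pid) == pid
```

The stub's `pid` attribute is what would then go into the scope's PIDs= property. If slurmd exits, the stub is reparented and keeps the scope populated, which is exactly why someone has to stop the scope explicitly later.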


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Felip Moll
On Tue, Mar 15, 2022 at 1:29 PM Lennart Poettering wrote:

> On Mo, 14.03.22 23:12, Felip Moll (fe...@schedmd.com) wrote:
>
> > > But note that you can also run your main service as a service, and
> > > then allocate a *single* scope unit for *all* your payloads.
> >
> > The main issue is the scope needs a pid attached to it. I thought that the
> > scope could live without any process inside, but that's not happening.
> > So every time a user step/job finishes, my main process must take care of
> > it, and launch the scope again on the next coming job.
>
> Leave a stub process around in it, i.e. something similar to
> "/bin/sleep infinity".
>
>
OK... this was my initial idea.


> > The forked process just does the dbus call, and when the scope is ready it
> > is moved to the corresponding cgroup (PIDFile=).
>
> Hmm? PIDFile= is a property of *services*, not *scopes*.
>
>
Sorry, I meant PIDs, not PIDFile, of course.


> And "scopes" cannot be moved to "cgroups". I cannot parse the above.
>
>
The forked process X does the dbus call to start the scope with
PIDs=$(pidof X), and when the scope is ready,
X is moved into the scope cgroup.
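A minimal sketch of the call described above, assuming the dbus-python bindings: Manager.StartTransientUnit() takes a unit name, a job mode, an a(sv) property list and an auxiliary-unit list, and returns the object path of the queued job. Here the properties are PIDs= (au) holding the caller's own PID and Delegate= (b); the helper names and the optional Description are illustrative, not Slurm's actual code.

```python
SYSTEMD_BUS_NAME = "org.freedesktop.systemd1"
SYSTEMD_OBJECT_PATH = "/org/freedesktop/systemd1"
MANAGER_IFACE = "org.freedesktop.systemd1.Manager"


def transient_scope_args(name, pids, delegate=True, description=None):
    """Build (name, mode, properties, aux) for Manager.StartTransientUnit().

    Each property is (key, (dbus_signature, value)); the real call sends the
    value as a D-Bus variant of that signature.
    """
    if not name.endswith(".scope"):
        raise ValueError("a transient scope unit name must end in .scope")
    pid_list = [int(p) for p in pids]
    if not pid_list:
        raise ValueError("a scope must be started with at least one PID")
    props = [("PIDs", ("au", pid_list)), ("Delegate", ("b", bool(delegate)))]
    if description:
        props.append(("Description", ("s", description)))
    return (name, "fail", props, [])


def start_transient_scope(name, pids, **kwargs):
    """Send the call on the system bus; returns the queued job's object path."""
    import dbus  # dbus-python, imported lazily so the builder works without it

    unit, mode, props, _aux = transient_scope_args(name, pids, **kwargs)
    to_dbus = {
        "au": lambda v: dbus.Array([dbus.UInt32(p) for p in v], signature="u"),
        "b": dbus.Boolean,
        "s": dbus.String,
    }
    dbus_props = dbus.Array(
        [dbus.Struct((key, to_dbus[sig](val)), signature="sv")
         for key, (sig, val) in props],
        signature="(sv)",
    )
    manager = dbus.Interface(
        dbus.SystemBus().get_object(SYSTEMD_BUS_NAME, SYSTEMD_OBJECT_PATH),
        MANAGER_IFACE,
    )
    return manager.StartTransientUnit(
        unit, mode, dbus_props, dbus.Array([], signature="(sa(sv))"))
```

In the forked process X this would be something like `start_transient_scope("gamba11_slurmstepd.scope", [os.getpid()])`. Note that the call returns once the job is queued, so X is only guaranteed to be in the scope's cgroup after that job has completed.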


> Did you read up on scopes and services?
>
> See https://systemd.io/CGROUP_DELEGATION/, it explains the concept of
> "scopes". Scopes *have* cgroups, but cannot be moved to "cgroups".
>
>
Yes, that was a misunderstanding caused by my previous sentence.


> > Problem number one: if other processes are in the scope, the dbus call
> > won't work since I am using the same name all the time, e.g.
> > slurmstepd.scope.
> > So I first need to check if the scope exists and if so put the new
> > slurmstepd process inside. But we still have the race condition: if during
> > this phase all steps end, systemd will do the cleanup.
>
> Leave a stub process around in it.


OK, then I don't see the real difference from starting up a new service.


> > If instead I could just ask systemd to delegate a part of the tree for my
> > processes, then everything would be solved.
>
> I don't follow. You can enable delegation on the scope. I mean, that's
> the reason I suggested to use a scope.
>
>
Meaning that it would be great to have a delegated cgroup subtree without
the need of a service or scope.
Just an empty subtree.


> > Do you have any other suggestions?
>
> Not really, except maybe: please read up on the documentation, it
> explains a lot of the concepts.
>
>
I have; I may not be expressing myself perfectly, though. I apologize for
that.


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Lennart Poettering
On Di, 15.03.22 10:50, Felip Moll (fe...@schedmd.com) wrote:

> Another thing I have found is that if the process which created the scope
> (e.g. my main process, slurmd) terminates, then the scope is stopped even
> if I abandoned it and there's a pid inside.
> So this means the proposed solution doesn't work. What am I missing?
>
> ● gamba11_slurmstepd.scope
>      Loaded: loaded (/run/systemd/transient/gamba11_slurmstepd.scope; transient)
>   Transient: yes
>      Active: active (abandoned) since Tue 2022-03-15 10:40:34 CET; 4s ago

It's shown as active, so where is the problem?

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Lennart Poettering
On Mo, 14.03.22 23:12, Felip Moll (fe...@schedmd.com) wrote:

> > But note that you can also run your main service as a service, and
> > then allocate a *single* scope unit for *all* your payloads.
>
> The main issue is the scope needs a pid attached to it. I thought that the
> scope could live without any process inside, but that's not happening.
> So every time a user step/job finishes, my main process must take care of
> it, and launch the scope again on the next coming job.

Leave a stub process around in it, i.e. something similar to
"/bin/sleep infinity".

> The forked process just does the dbus call, and when the scope is ready it
> is moved to the corresponding cgroup (PIDFile=).

Hmm? PIDFile= is a property of *services*, not *scopes*.

And "scopes" cannot be moved to "cgroups". I cannot parse the above.

Did you read up on scopes and services?

See https://systemd.io/CGROUP_DELEGATION/, it explains the concept of
"scopes". Scopes *have* cgroups, but cannot be moved to "cgroups".

> Problem number one: if other processes are in the scope, the dbus call
> won't work since I am using the same name all the time, e.g.
> slurmstepd.scope.
> So I first need to check if the scope exists and if so put the new
> slurmstepd process inside. But we still have the race condition: if during
> this phase all steps end, systemd will do the cleanup.

Leave a stub process around in it.

> Problem number two: there's a significant delay between creating the
> scope and it being ready with the pid attached. The only way it
> worked was to put a 'sleep' after the dbus call and make my process wait
> for the async dbus call to materialize. This is really
> inelegant.

If you want to synchronize on the cgroup creation completing, just
wait for the JobRemoved bus signal for the job returned by
StartTransientUnit().
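A minimal sketch of that synchronization, assuming dbus-python with the GLib main loop (PyGObject): install the JobRemoved(u id, o job, s unit, s result) match before calling StartTransientUnit(), keep every signal seen, and stop once the signal for the returned job path has arrived. Buffering matters because the job may already be finished by the time the reply carrying its path is processed. `JobWaiter` and `start_scope_and_wait` are hypothetical names for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

SYSTEMD_BUS_NAME = "org.freedesktop.systemd1"
SYSTEMD_OBJECT_PATH = "/org/freedesktop/systemd1"
MANAGER_IFACE = "org.freedesktop.systemd1.Manager"


@dataclass
class JobWaiter:
    """Match JobRemoved(u id, o job, s unit, s result) signals to one job.

    Signals are buffered, because the job can finish before the
    StartTransientUnit() reply carrying its object path has been processed.
    """
    seen: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    job_path: Optional[str] = None

    def on_job_removed(self, job_id, job_path, unit, result):
        self.seen[str(job_path)] = (str(unit), str(result))

    def expect(self, job_path):
        self.job_path = str(job_path)

    def result(self):
        """Job result string ("done", "failed", ...) or None while pending."""
        if self.job_path is None or self.job_path not in self.seen:
            return None
        return self.seen[self.job_path][1]


def start_scope_and_wait(name, props, aux):
    """Start a transient unit and block until its start job is removed."""
    import dbus  # dbus-python plus PyGObject, imported lazily
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()
    loop = GLib.MainLoop()
    waiter = JobWaiter()

    def handler(job_id, job_path, unit, result):
        waiter.on_job_removed(job_id, job_path, unit, result)
        if waiter.result() is not None:
            loop.quit()

    # Install the match *before* starting the unit so no JobRemoved is missed.
    bus.add_signal_receiver(handler, signal_name="JobRemoved",
                            dbus_interface=MANAGER_IFACE,
                            path=SYSTEMD_OBJECT_PATH)
    manager = dbus.Interface(
        bus.get_object(SYSTEMD_BUS_NAME, SYSTEMD_OBJECT_PATH), MANAGER_IFACE)
    waiter.expect(manager.StartTransientUnit(name, "fail", props, aux))
    if waiter.result() is None:
        loop.run()  # dispatches queued signals, then waits for the job
    return waiter.result()
```

A result of "done" means the start job completed; for a scope, that is the point at which the listed PIDs have been placed into its cgroup, so no extra 'sleep' is needed. `props` and `aux` are expected as D-Bus-typed arrays, e.g. `dbus.Array([], signature="(sa(sv))")` for an empty `aux`.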

> If instead I could just ask systemd to delegate a part of the tree for my
> processes, then everything would be solved.

I don't follow. You can enable delegation on the scope. I mean, that's
the reason I suggested to use a scope.

> Do you have any other suggestions?

Not really, except maybe: please read up on the documentation, it
explains a lot of the concepts.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Felip Moll
Another thing I have found is that if the process which created the scope
(e.g. my main process, slurmd) terminates, then the scope is stopped even
if I abandoned it and there's a pid inside.
So this means the proposed solution doesn't work. What am I missing?

● gamba11_slurmstepd.scope
     Loaded: loaded (/run/systemd/transient/gamba11_slurmstepd.scope; transient)
  Transient: yes
     Active: active (abandoned) since Tue 2022-03-15 10:40:34 CET; 4s ago
      Tasks: 1 (limit: 38333)
     Memory: 0B
        CPU: 0
     CGroup: /system.slice/gamba11_slurmstepd.scope
             └─system
               └─18000 /home/lipi/slurm/master/inst/sbin/slurmstepd infinity


mar 15 10:40:53 llit systemd[1]: gamba11_slurmstepd.scope: Succeeded.