On Sat, May 27, 2023 at 11:09:32PM +0200, Helmut Grohne wrote:
> Hi,
> 
> I sat down with Jochen in Hamburg to try and fix this.
> 
> On Sun, May 14, 2023 at 03:21:24PM -0400, Theodore Ts'o wrote:
> > Can someone send the instructions on how to fix this?
> 
> We wish we could give you. Instead, we document our findings, so maybe the
> next one looking into this bug has a better idea, but for now we give up
> as it is too late for bookworm anyway.

Helmut, Jochem, thanks so much for trying to look into this.  Here's
some additional context from my research.

First of all, the change to use WantedBy=default.target to
Wantedby-multi-user.target, as described in Message #19 of this bug,
was in response to a bug report from Ansgar, bug report #991349:

>I noticed that e2scrub_reap.service uses
>
>  WantedBy=default.target
>
>instead of the more usual
>
>  WantedBy=multi-user.target.
>
>As default.target is usually just an alias for multi-user.target or
>graphical.target, this means it will even be pulled in if someone uses
>some other custom target. This feels rather unexpected.
>
>Is there any reason not to use WantedBy=multi-user.target?

At the time, I thought to myself, sure, makes sense, and made the
change in commit b42c9788c75d ("e2scrub: use
WantedBy=multi-user.target in e2scrub_reap.service"), and in the
commit I noted "Addresses-Debian-Bug: #991349"

As near as I can tell, on a system that started with the Bullseye
version of e2fsprogs, and which has then updated to the Bookform
version e2fsprogs, via periodic updates to testing (Bookworm), the
default.target link still exists:

% ls -l /etc/systemd/system/default.target.wants/e2scrub_reap.service
0 lrwxrwxrwx 1 root root 40 Dec 19  2020 
/etc/systemd/system/default.target.wants/e2scrub_reap.service -> 
/lib/systemd/system/e2scrub_reap.service

... and this is enough for systemctl status to seem to think that
e2scrub_reap is still enabled:

% systemctl status e2scrub_reap
○ e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots
     Loaded: loaded (/lib/systemd/system/e2scrub_reap.service; enabled; preset: 
enable>
     Active: inactive (dead) since Sat 2023-05-27 17:53:22 EDT; 1h 34min ago
       Docs: man:e2scrub_all(8)
    Process: 1309 ExecStart=/sbin/e2scrub_all -A -r (code=exited, 
status=0/SUCCESS)
   Main PID: 1309 (code=exited, status=0/SUCCESS)
        CPU: 12ms
 ...

So sure, /etc/systemd.d/system/multi-user.target.wants/e2scrub_reap.service
doesn't exist.  *But* it still exists in .../default.target.wants/...
which seems to be enough to keep the e2scrub_reap service enabled.  Right?

What am I missing?

In any case, I am still unclear (a) what is actually broken in this
particular setup, since according to systemctl status the systemd unit
is apparently still appropriate enabled, even if it isn't via the
expected Wanted-b: multi-user.target.

And secondly, (b) what is e2fsprogs's control scripts supposed to have
done differently?  That is, if this is indeed this is a bug in
e2fsprogs --- what did I do wrong, and how do I fix it?

And if the answer is you should never, ever, try to change a Wanted-by
line in a systemd script, because debian's systemd unit file
infrastructure is too fragile to handle this correctly, given that
bookworm is about to ship with "Wanted-by: multi-user.target", what's
the best path forward at this point?

I'll note that e2scrub_reap.service is just a helper unit file which
is only needed to clean up after a system crash while e2scrub is
running --- and that will only happen if the user has edited and
appropriately configured e2scrub in /etc/e2scrub.conf.

So from my maintainer's perspective, what I am going for is that
e2scrub_reap.service and e2scrub_all.timer should *always* be enabled,
since the real control point (as far as I am concerned) is
/etc/e2scrub.conf.  I really don't actually *care* whether it is
enabled via default.target.wanted or multi-user.target.wanted.

If I need to be sent to some systemd re-educational camp to understand
the finer points about default.target vs multi-user.target, and
whether it acctually makes any difference whether the systemd unit
file says "Wanted-by: multi-user.target", but in the upgraded
bullseye->bookworm installation, the symlink is still in
*/default.target.wanted/* --- please point me at the documentation.

Otherwise, I'm beginning to think that nothing is actually broken, and
the bullseye2bookworm piuparts tests is just being overly picky, but
nothing is actually broken in actual practice.  And perhaps I should
just close this bug as "Working as Intended".

Again, what am I missing? 

                                        - Ted

P.S.  I really *am* trying to get with the systemd program, but this
all of this complexity is just hopelessly confusing.  :-( :-( :-(

P.P.S.  And there is actually a case where this will actually break a
real user, can someone give me clear reproduction instructions, which
starts with "install bookworm in a VM", then do X, then do Y, and then
observe breakage.  Please don't just point me at piuparts output,
because unfortunately, it's too complicated for my tiny brain to
understand.

Reply via email to