Bug#948193: e2scrub services delay boot by 2 seconds

2020-01-06 Thread Josh Triplett
On Mon, 6 Jan 2020 16:14:24 -0500 "Theodore Y. Ts'o"  wrote:
> On Mon, Jan 06, 2020 at 12:39:52AM -0800, Josh Triplett wrote:
> > That's an *additional* delay, on top of the sleeps above. The two-second
> > sleep in the "exitcode" function seems like the primary culprit.  Note
> > that I don't even have lvm2-tools installed.
> 
> Ah, yes, sorry, I had missed the sleep in the exitcode function.
> Actually it's not needed in e2scrub_all at all; it was there due
> copy/paste oversight.
> 
> The commit below should address your concern.

That addresses the two-second sleep in a default install, thank you.

Please also move the entirely commented-out e2scrub.conf to an example
in /usr/share/doc (deleting /etc/e2scrub.conf in postinst if it matches
the current example), and making the timer/service use
ConditionPathExists=/etc/e2scrub.conf, which will avoid running the
service at all.

- Josh Triplett



Bug#948193: e2scrub services delay boot by 2 seconds

2020-01-06 Thread Theodore Y. Ts'o
On Mon, Jan 06, 2020 at 12:39:52AM -0800, Josh Triplett wrote:
> That's an *additional* delay, on top of the sleeps above. The two-second
> sleep in the "exitcode" function seems like the primary culprit.  Note
> that I don't even have lvm2-tools installed.

Ah, yes, sorry, I had missed the sleep in the exitcode function.
Actually it's not needed in e2scrub_all at all; it was there due
copy/paste oversight.

The commit below should address your concern.

Cheers,

- Ted

commit 0b3208958eb63df6cd8b38ee63f3bc4266a683e7
Author: Theodore Ts'o 
Date:   Mon Jan 6 16:01:23 2020 -0500

e2scrub, e2scrub_all: don't sleep unnecessarily in exitcode

The two second sleep is only needed in e2scrub, and when there is a
failure, so that systemd has a chance to gather the log output before
e2scrub exits.  It's not needed if the script is exiting successfully,
and it's never needed for e2scrub_all ever.

Addresses-Debian-Bug: #948193
Signed-off-by: Theodore Ts'o 

diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
index f21499b6..30ab7cbd 100644
--- a/scrub/e2scrub.in
+++ b/scrub/e2scrub.in
@@ -66,7 +66,7 @@ exitcode() {
# for capturing all the log messages if the scrub fails, because the
# fail service uses the service name to gather log messages for the
# error report.
-   if [ -n "${SERVICE_MODE}" ]; then
+   if [ -n "${SERVICE_MODE}" -a "${ret}" -ne 0 ]; then
test "${ret}" -ne 0 && ret=1
sleep 2
fi
diff --git a/scrub/e2scrub_all.in b/scrub/e2scrub_all.in
index f0336711..4288b969 100644
--- a/scrub/e2scrub_all.in
+++ b/scrub/e2scrub_all.in
@@ -56,14 +56,8 @@ exitcode() {
# section 22.2) and hope the admin will scan the log for what
# actually happened.
 
-   # We have to sleep 2 seconds here because journald uses the pid to
-   # connect our log messages to the systemd service.  This is critical
-   # for capturing all the log messages if the scrub fails, because the
-   # fail service uses the service name to gather log messages for the
-   # error report.
-   if [ -n "${SERVICE_MODE}" ]; then
+   if [ -n "${SERVICE_MODE}" -a "${ret}" -ne 0 ]; then
test "${ret}" -ne 0 && ret=1
-   sleep 2
fi
 
exit "${ret}"



Bug#948193: e2scrub services delay boot by 2 seconds

2020-01-06 Thread Josh Triplett
On Sun, 5 Jan 2020 15:41:57 -0500 "Theodore Y. Ts'o"  wrote:
> On Sat, Jan 04, 2020 at 07:57:16PM -0800, Josh Triplett wrote:
> > Package: e2fsprogs
> > Version: 1.45.4-1
> > Severity: important
> >
> > The e2fsprogs package installs a service and timer to run e2scrub. That
> > service sleeps for 2 seconds before exiting, delaying the boot by 2
> > seconds.
>
> It's not necessarily 2 seconds, and it's not directly sleeping.

$ grep sleep /sbin/e2scrub*
/sbin/e2scrub:  # We have to sleep 2 seconds here because journald uses the pid 
to
/sbin/e2scrub:  sleep 2
/sbin/e2scrub:  sleep 0.5
/sbin/e2scrub:  sleep 0.5
/sbin/e2scrub_all:  # We have to sleep 2 seconds here because journald uses 
the pid to
/sbin/e2scrub_all:  sleep 2

$ systemd-analyze blame | grep e2scrub
   2.012s e2scrub_all.service

> It's
> however long it takes to spin up any storage devices, caused by
> running lvs.  The bulk of the time of running "e2scrub_all -A -r" is
> the time to run 'lsblk' and 'lvs'.

That's an *additional* delay, on top of the sleeps above. The two-second
sleep in the "exitcode" function seems like the primary culprit.  Note
that I don't even have lvm2-tools installed.

> > Second, please use ConditionPathExists or similar to check for the tools
> > e2scrub needs (lsblk and lvcreate), rather than running a script that
> > checks for them and then exits.
> 
> That's not the cause of most of the time needed to run e2scrub_all.
> We also need to run these sanity checks when e2scrub_all is run by
> hand, or run out of cron.

It is when I'm trying to boot a system in ~100ms. I want to avoid
launching the service and its shell script *at all*.

> > And third, please consider *not* enabling this by default.
>
> It wasn't enabled by default.

I just installed a fresh system, and it definitely *is* enabled by
default, both via a systemd timer and via cron. I noticed that while
debugging the boot process on that fresh system, which also led me to
find it enabled on my laptop, where it similarly delays the boot process
by two seconds.

It's possible that we're talking past each other here. There's an
enabled systemd timer unit and an enabled cron job that run a shell
script. That shell script then sources its configuration file, just to
find out that it isn't actually configured, and then sleeps before
exiting. That's still enabled. (And even if it didn't have the sleep in
it, it's still enabled.)

This is the equivalent of an /etc/default/foo file with an ENABLED=0
flag or similar, which has similar problems and shouldn't ever happen
either.  The right way to have a disabled-by-default service is to
disable it, and tell people how to enable it. One way to do that would
be to disable the service/timer and tell people how to enable it via
systemctl or similar; another way would be to remove the sample
/etc/e2scrub.conf and then use ConditionPathExists=/etc/e2scrub.conf .
Personally, I would suggest the latter.

(Along the same lines, e2scrub_fail should have a ConditionPathExists on
sendmail, because it can't do anything useful if sendmail doesn't
exist.)

- Josh Triplett



Bug#948193: e2scrub services delay boot by 2 seconds

2020-01-05 Thread Theodore Y. Ts'o
On Sat, Jan 04, 2020 at 07:57:16PM -0800, Josh Triplett wrote:
> Package: e2fsprogs
> Version: 1.45.4-1
> Severity: important
> 
> The e2fsprogs package installs a service and timer to run e2scrub. That
> service sleeps for 2 seconds before exiting, delaying the boot by 2
> seconds.

It's not necessarily 2 seconds, and it's not directly sleeping.  It's
however long it takes to spin up any storage devices, caused by
running lvs.  The bulk of the time of running "e2scrub_all -A -r" is
the time to run 'lsblk' and 'lvs'.

I've already queued up a change (see below) so that we won't attempt
to clean up any left-over LVM snapshot volumes if e2scrub is not
enabled via /etc/e2scrub.conf, and even if scrubbing is enabled, we
check for snapshots via scanning /dev/mapper instead of using lvs.
This commit will be part of e2fsprogs 1.45.5, to be released in the
next few days.

> Second, please use ConditionPathExists or similar to check for the tools
> e2scrub needs (lsblk and lvcreate), rather than running a script that
> checks for them and then exits.

That's not the cause of most of the time needed to run e2scrub_all.
We also need to run these sanity checks when e2scrub_all is run by
hand, or run out of cron.

> And third, please consider *not* enabling this by default.

It wasn't enabled by default.  And the issue of lvs being slow is fixed by:

commit 333268d65d26fbb2d22f7a8b6ac797babcc69543
Author: Darrick J. Wong 
Date:   Mon Nov 4 17:54:14 2019 -0800

e2scrub_all: don't even reap if the config file doesn't allow it

Dave Chinner complains that the automated on-boot e2scrub reaping takes
a long time (because the lvs command can take a while to run) even
though the automated e2scrub is disabled via e2scrub.conf on his
systems.

We still need the reaping service to kill off stale e2scrub snapshots
after a crash, but it's unnecessary to annoy everyone with slow bootup.
Because we can look for the e2scrub snapshots in /dev/mapper, let's
skip reaping if periodic e2scrub is disabled unless we find evidence of
e2scrub snapshots in /dev.

Reported-by: Dave Chinner 
Signed-off-by: Darrick J. Wong 
Signed-off-by: Theodore Ts'o 

- Ted



Bug#948193: e2scrub services delay boot by 2 seconds

2020-01-04 Thread Josh Triplett
Package: e2fsprogs
Version: 1.45.4-1
Severity: important

The e2fsprogs package installs a service and timer to run e2scrub. That
service sleeps for 2 seconds before exiting, delaying the boot by 2
seconds.

First of all, sleeping for 2 seconds is not OK.

Second, please use ConditionPathExists or similar to check for the tools
e2scrub needs (lsblk and lvcreate), rather than running a script that
checks for them and then exits.

And third, please consider *not* enabling this by default.

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.4.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE=C.UTF-8 
(charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages e2fsprogs depends on:
ii  libblkid12.34-0.1
ii  libc62.29-7
ii  libcom-err2  1.45.4-1
ii  libext2fs2   1.45.4-1
ii  libss2   1.45.4-1
ii  libuuid1 2.34-0.1
ii  logsave  1.45.4-1

Versions of packages e2fsprogs recommends:
pn  e2fsprogs-l10n  

Versions of packages e2fsprogs suggests:
pn  e2fsck-static  
pn  fuse2fs
pn  gpart  
ii  parted 3.3-1

-- no debconf information