Am 12.01.20 um 20:15 schrieb Santiago Vila:
> My theory is that this is some kind of race condition.
> 
> I initially cloned the machine from another one where this happened.
> 
> Then I discovered that the problem also happens (randomly) in a brand
> new machine if I copy the journal from the "bad" machine.
> 
> However, there is nothing special about the journal (or at least
> "journalctl --verify" says it's ok), except maybe that it's several
> megabytes long.
> 
> Could it be that systemd spends some time processing the journal at
> boot time and this is what triggers the race condition?

On my buster system:


> an 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking at job 
> systemd-timesyncd.service/stop conflicted_by=yes
> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Looking at job 
> systemd-timesyncd.service/start conflicted_by=no
> Jan 12 20:04:46 debian systemd[1]: systemd-timesyncd.service: Fixing 
> conflicting jobs 
> systemd-timesyncd.service/stop,systemd-timesyncd.service/start by deleting 
> job systemd-timesyncd.service/start
> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job 
> chrony.service/start conflicted_by=no
> Jan 12 20:04:46 debian systemd[1]: chrony.service: Looking at job 
> chrony.service/stop conflicted_by=no
> Jan 12 20:04:46 debian systemd[1]: chrony.service: Fixing conflicting jobs 
> chrony.service/start,chrony.service/stop by deleting job chrony.service/stop
> Jan 12 20:04:46 debian systemd[1]: Found redundant job 
> systemd-timesyncd.service/stop, dropping from transaction.


Those lines are missing on GCE system. Instead I see
> Jan 12 17:02:01 d1 systemd[1]: Keeping job systemd-timesyncd.service/start 
> because of sysinit.target/start
> Jan 12 17:02:01 d1 systemd[1]: Keeping job chrony.service/stop because of 
> systemd-timesyncd.service/start
> Jan 12 17:02:01 d1 systemd[1]: systemd-timesyncd.service: Installed new job 
> systemd-timesyncd.service/start as 119
> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 82 chrony.service/start 
> finished, result=canceled
> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Installed new job 
> chrony.service/stop as 121
> Jan 12 17:02:01 d1 systemd[1]: chrony.service: Job 121 chrony.service/stop 
> finished, result=done


The problem is, that the transaction is computed *before*
ConditionFileIsExecutable=!/usr/sbin/chronyd is evaluated (conditions
are evaluation just before the binary is executed) and it might indeed
depend on the ordering, in which the jobs are scheduled.

So, the simplest fix would be to drop the line
> Conflicts=systemd-timesyncd.service openntpd.service ntp.service 
> ntpsec.service
from chrony.service.
This way, systemd will schedule the start of both services.
chrony.service will be started properly and for
systemd-timesyncd.service the ConditionFileIsExecutable will kick in.

I'm sorry, that I first asked chrony (and other NTP implementations) to
add a Conflicts=systemd-timesyncd.service, hoping I could get rid of the
/lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
hack.

But it appears, this hack is actually much more reliable then using
Conflicts=
See
https://github.com/systemd/systemd/issues/7104#issuecomment-456755980
which explains the problem. We'd need some kind of asymmetrical conflicts.


I guess at this point it is best to ask chrony, ntp, openntpd, ntpsec
and virtualbox [1] to drop the Conflicts= line again.
Maybe we should even do that for buster via a stable upload?

Thoughts?

Michael


[1]
https://codesearch.debian.net/search?q=Conflicts%3Dsystemd-timesyncd.service&literal=1

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to