Hi,

Tomas Volf <[email protected]> skribis:

>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here.  While, based on the description, I agree that it
> is (bad) luck based, in practice it seems to be extremely reliably
> reproducible.

Yes, I could reproduce it 100% with just ‘bare-bones.tmpl’.  The thing
is, as soon as you changed something non-trivial, for instance making
the ‘message-destination’ procedure of shepherd write everything to
/dev/console, the problem would go away.  Even just commenting out some
of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me so long to figure it out.
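To make the race concrete, here is a minimal POSIX illustration in
Python.  It is not Shepherd code, and ‘reap_twice’ is a made-up name.
It shows only that a child's exit status can be collected by exactly
one ‘waitpid’ call, so whichever caller gets there first wins.  In plain
POSIX the loser gets ECHILD; how that turns into an indefinite wait
inside the Shepherd depends on surrounding code this sketch does not
model.

```python
import os


def reap_twice():
    """Fork a child, reap it once, then try to reap it a second time.

    Hypothetical illustration: the first waitpid() stands in for
    shepherd's SIGCHLD handler, the second for a service that calls
    waitpid() on the same PID itself.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with status 7.
        os._exit(7)

    # First caller collects the exit status.
    _, status = os.waitpid(pid, 0)
    first = os.WEXITSTATUS(status)

    # Second caller finds nothing left to collect: the kernel has
    # already released the child, so waitpid() fails with ECHILD.
    try:
        os.waitpid(pid, 0)
        second = "reaped again"
    except ChildProcessError:
        second = "ECHILD"
    return first, second


if __name__ == "__main__":
    print(reap_twice())  # prints (7, 'ECHILD')
```

Which caller wins in real life depends on timing, which would explain
why seemingly unrelated changes (more logging, fewer ‘system-log’
parameters) made the problem come and go.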

>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?

[...]

> I can confirm that patch 2 fixes the issue for me, both in the VM and
> on the physical machine.

Yay!

> The only thing I have noticed is that even when deploying the "good"
> commit, I see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>

I think I understand this one now.

The old service has only one name: syslogd.  The new one, which upgrades
it, has two names: system-log and syslogd (system-log is its “canonical
name”).

The service upgrade machinery gets confused because it uses the
canonical name in one place.
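One plausible reading of that confusion, as a toy model in Python.
Everything here (‘Service’, ‘lookup’, the two ‘find_…’ helpers) is
hypothetical and is not the Guix or Shepherd code, which is Guile and
may well differ.  Looking up the running service by the new service's
canonical name alone fails, because the running one only knows itself
as syslogd; matching on any of the new service's names would find it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    names: tuple  # hypothetical: the first entry is the canonical name

    @property
    def canonical_name(self):
        return self.names[0]


class ServiceNotFound(Exception):
    pass


def lookup(running, name):
    """Return the running service that provides NAME, or raise."""
    for svc in running:
        if name in svc.names:
            return svc
    raise ServiceNotFound(name)


def find_by_canonical_name(running, new):
    # Only tries the new service's canonical name ("system-log").
    return lookup(running, new.canonical_name)


def find_by_any_name(running, new):
    # Tries every name the new service provides, in order.
    for name in new.names:
        try:
            return lookup(running, name)
        except ServiceNotFound:
            continue
    raise ServiceNotFound(new.canonical_name)


if __name__ == "__main__":
    old = Service(("syslogd",))
    new = Service(("system-log", "syslogd"))
    running = [old]
    try:
        find_by_canonical_name(running, new)
    except ServiceNotFound as e:
        print("not found:", e)  # prints not found: system-log
    print(find_by_any_name(running, new) is old)  # prints True
```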

I’ll investigate.

Ludo’.