Hi Bruno, Bruno Victal <mi...@makinata.eu> skribis:
> Upon a guix system reconfigure, if a running service has undergone a > canonical name change since the previous generation > then shutdown or reboot commands fail, with shepherd indicating itself (root > service) as stopped. Oh, fun. > mirai@guix-nuc ~$ sudo reboot […] > Service xvnc has been stopped. > Service git-daemon has been stopped. > Assertion (eq? (canonical-name new) (canonical-name old)) failed. > assertion-failed() > Service root has been stopped. I think this assertion failure is the root cause. It comes from ‘register-services’ in 0.9.3, which reads: --8<---------------cut here---------------start------------->8--- (define (register-services . new-services) "Add NEW-SERVICES to the list of known services. If a service has already been registered, arrange to have it replaced when it is next stopped. If it is currently stopped, replace it immediately." (define (register-single-service new) ;; Sanity-checks first. (assert (list-of-symbols? (provided-by new))) (assert (list-of-symbols? (required-by new))) (assert (boolean? (respawn? new))) ;; FIXME: Just because we have a unique canonical name now doesn't mean it ;; will remain unique as other services are added. Whenever a service is ;; added it should check that it's not conflicting with any already ;; registered canonical names. (match (lookup-services (canonical-name new)) (() ;; empty, so we can safely add ourselves (for-each (lambda (name) (let ((old (lookup-services name))) (hashq-set! %services name (cons new old)))) (provided-by new))) ((old . rest) ;; one service registered, it may be an old version of us (assert (null? rest)) (assert (eq? (canonical-name new) (canonical-name old))) (if (running? old) (slot-set! old 'replacement new) (replace-service old new))))) (for-each register-single-service new-services)) --8<---------------cut here---------------end--------------->8--- ‘register-services’ was called from ‘replace-service’, itself called from ‘stop’ (right after ‘networking’ had been actually stopped): --8<---------------cut here---------------start------------->8--- ;; SERVICE is no longer running. (slot-set! service 'running #f) ;; Reset the list of respawns. (slot-set! service 'last-respawns '()) ;; Replace the service with its replacement, if it has one (let ((replacement (slot-ref service 'replacement))) (when replacement (replace-service service replacement))) ;<- here --8<---------------cut here---------------end--------------->8--- The assertion failure was thrown here. ‘stop’ calls itself recursively but it doesn’t catch exceptions from recursive calls: --8<---------------cut here---------------start------------->8--- (fold-services (lambda (other acc) (if (and (running? other) (required-by? service other)) (append (stop other) acc) ;<- here acc)) '()) --8<---------------cut here---------------end--------------->8--- The problem is that ‘root-service’ marks itself as stopped before it has effectively shut down the services: --8<---------------cut here---------------start------------->8--- #:stop (lambda (unused . args) (local-output (l10n "Exiting shepherd...")) ;; Prevent that we try to stop ourself again. (slot-set! root-service 'running #f) (shutdown-services) (quit)) --8<---------------cut here---------------end--------------->8--- So what happened is that ‘shutdown-services’ threw, the exception wasn’t caught, and thus it never called ‘quit’. QED. The service registry in the soon-to-be-released 0.10.0 no longer has that assertion failure (it vanished in 08510a2a2aaab388c90dd402bd7506d33014454f). Instead, it registers a replacement for the first service found to have one of the names of the new service. The problem of the #:stop method of ‘root-service’ still exists: if ‘shutdown-services’ throws, the ‘root’ service won’t terminate and it’ll remain in ‘stopping’ status. We could wrap the ‘shutdown-services’ call in ‘false-if-exception’, but I’d lean towards not doing it: it’s not supposed to throw, so maybe it’s best not to swallow the exception. To summarize, I believe the problem is solved in 0.10. Thanks, Ludo’.