Hello Guix devel,

When `guix system init` fails, there are a number of possible causes of failure:
* Packages being downloaded are so broken that they cannot actually be built.
  * QC should filter this out.
* The hardware being installed on is broken, usually a failure of the storage device being installed into.
* Downloading substitutes from the substitute server failed.

Of the above, the last is the most likely to occur in practice.

I have been doing a number of repeated installation tests on VMs using the SJTUG mirror server, as well as the Berlin Cuirass, and a significant number of installation attempts via the guided installer fail due to problems with downloading substitutes.

* From my system, the Berlin Cuirass server is very very very slow (< 40kiB/s, sometimes as low as 4kiB/s) and, possibly because of the slowness, the download gets interrupted partway through, which causes the install to fail.
* The SJTUG server sometimes responds in ways that the Guix downloader does not expect, causing failures.

What I do instead is to use the "manual" mode and just keep running `guix system build` over and over until it manages to pull through.

I think that the guided installer should also use the same technique of trying `guix system build` repeatedly for at least some number of tries, possibly asking the user if they want to keep trying (in case the issue is a permanent network error rather than a transient one).

Yes, currently a failure to install "just" kicks the user back to the guided install and they can rerun `guix system init`. ***HOWEVER***, because the store is in a COW mode, this sometimes leaves the store in a wonky state, and the rerun `guix system init` either performs the system build from 0 or fails. Not to mention that this requires more keypresses from the user.

So, let me sketch proposed changes to `gnu/installer/final.scm`:

```patch
@@ -169,6 +169,15 @@ or #f. Return #t on success and #f on failure."
                                   "/tmp/installer-system-init-options"
                                 read))
                             (const '())))
+         (build-command (append (list "guix" "system" "build"
+                                      "--fallback")
+                                options
+                                (list (%installer-configuration-file))))
+         (build-grub-command
+          (append (list "guix" "build"
+                        "--fallback"
+                        "grub" "grub-efi")
+                  options))
          (install-command (append (list "guix" "system" "init"
                                         "--fallback")
                                   options
@@ -178,6 +187,36 @@ or #f. Return #t on success and #f on failure."
          (database-file (string-append database-dir "/db.sqlite"))
          (saved-database (string-append database-dir "/db.save"))
          (ret #f))
+
+    (define* (perform-install #:optional (tries 0))
+
+      (define (retry)
+        (perform-install (+ tries 1)))
+
+      (define (ask-if-retry)
+        ;; TODO. Not sure best way to query user whether they
+        ;; would like to retry again.
+        #f)
+
+      (if (and (run-command build-command #:locale locale)
+               (run-command build-grub-command #:locale locale))
+          (run-command install-command #:locale locale)
+          ;; Try to recover.
+          (begin
+            (format #t "~%~%~a~%~a~%~%"
+                    (G_ "Failure while building system.")
+                    (G_ "This is usually caused by (hopefully transient) network errors."))
+            (cond
+             ((< tries %max-auto-system-build-retries)
+              (format #t "~a~%"
+                      (G_ "Will wait 3 seconds and retry..."))
+              (sleep 3)
+              (retry))
+             (else
+              #f)))))
+
     (mkdir-p (%installer-target-dir))
 
     ;; We want to initialize user passwords but we don't want to store them in
@@ -221,9 +260,8 @@ or #f. Return #t on success and #f on failure."
                         (lambda ()
                           (with-error-to-file "/dev/console"
                             (lambda ()
-                              (run-command install-command
-                                           #:locale locale)))))
-                      (run-command install-command #:locale locale))))
+                              (perform-install)))))
+                      (perform-install))))
           (lambda ()
             ;; Restart guix-daemon so that it does no keep the MNT namespace
             ;; alive.
```

Notes:

* `guix system build` only builds the *system*. It doesn't build the bootloader.
  I can't find a command that builds the bootloader; only `guix system init` or `guix system reconfigure` do that, but we need to distinguish the failure "downloading from the substituter failed" (which might be fixable by just retrying) from "writing to the device being installed into failed".
  * In the above I use `guix build grub grub-efi` as a proxy for this, but it would be nice if there were some kind of `guix system build-bootloader` that would perform *building* of the script that installs the bootloader, but doesn't actually install the bootloader *yet*.
* I don't know how best to ask the user if they want to retry the system building process.

Thanks
raid5atemyhomework
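
P.S. For anyone who wants to reproduce the manual workaround described above (rerunning `guix system build` until it gets through), the loop can be sketched in shell roughly as follows. This is only a sketch under assumptions: `retry_command` is a hypothetical helper and not part of Guix, and the attempt limit and delay are arbitrary knobs chosen for illustration.

```shell
# Hypothetical helper (not part of Guix): run a command repeatedly
# until it succeeds or the attempt limit is reached.
#
# Usage: retry_command MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry_command() {
    max_tries=$1
    delay=$2
    shift 2          # the remaining arguments are the command to run
    try=1
    while ! "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try attempts" >&2
            return 1
        fi
        echo "attempt $try failed; retrying in ${delay}s..." >&2
        sleep "$delay"
        try=$((try + 1))
    done
}
```

For example, `retry_command 10 3 guix system build --fallback /mnt/etc/config.scm` would attempt the build up to 10 times with a 3-second pause between attempts. The configuration path there is an assumption; substitute whatever file the installation actually uses. Like the patch above, this cannot tell a transient network error from a permanent one, so the attempt limit is what stops it from looping forever.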