bug#58084: guix deploy fails, leaving the newly installed system generation active

2022-09-25 Thread Maxim Cournoyer
Hi,

While attempting to deploy to overdrive1, using the 9971141 commit in
the maintenance repo, I encountered the following error:

--8<---cut here---start->8---
maxim@hurd ~/src/guix-maintenance/hydra$ guix time-machine \
    --commit=08d515233241ee0921b8b5ab706f98170c62437c -- deploy -L modules \
    deploy-overdrive1.scm
The following 1 machine will be deployed:
  overdrive1

guix deploy: deploying to overdrive1...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
guix deploy: error: failed to deploy overdrive1: failed to switch systems while deploying 'overdrive1':
system-error "symlink" "~A" ("File exists") (17)
--8<---cut here---end--->8---

It also looks like, even though the deployment failed, the new system
generation was left as the current one:

--8<---cut here---start->8---
[...]
Generation 28   Sep 26 2022 04:04:36   (current)
  file name: /var/guix/profiles/system-28-link
  canonical file name: /gnu/store/c02w7nyl5nr19x856455p2wh959r25h8-system
  label: GNU with Linux-Libre 5.19.10
  bootloader: grub-efi
  root device: /dev/sda3
  kernel: /gnu/store/nmdy7c4i34y12w8af7zl6sl9fmrp8wa0-linux-libre-5.19.10/Image
  channels:
    sfl-packages:
      repository URL: https://gitlab.com/Apteryks/sfl-guix-channel
      branch: master
      commit: 6385881124429016f750b0f562b70e07f592275e
    guix:
      repository URL: https://git.savannah.gnu.org/git/guix.git
      commit: 08d515233241ee0921b8b5ab706f98170c62437c
  configuration file: /gnu/store/myvzd1kpw2pfzfj3krl4lzpcbqsdn48x-configuration.scm
--8<---cut here---end--->8---

Which leaves me with two questions:

1. Why did it fail?

2. When an error occurs while deploying, shouldn't the new generation
be removed instead of being left as the active one?

Thanks,

Maxim






2022-09-26 Thread Ludovic Courtès
Hi,

Maxim Cournoyer wrote:

> While attempting to deploy to overdrive1, using the 9971141 commit in
> the maintenance repo, I encountered the following error:
>
> maxim@hurd ~/src/guix-maintenance/hydra$ guix time-machine \
>     --commit=08d515233241ee0921b8b5ab706f98170c62437c -- deploy -L modules \
>     deploy-overdrive1.scm
> The following 1 machine will be deployed:
>   overdrive1
>
> guix deploy: deploying to overdrive1...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: sending 0 store items (0 MiB) to 'overdrive1.guix.gnu.org'...
> guix deploy: error: failed to deploy overdrive1: failed to switch systems while deploying 'overdrive1':
> system-error "symlink" "~A" ("File exists") (17)

I can reproduce it.

The failing code is in /gnu/store/…-switch-to-system.scm:

--8<---cut here---start->8---
(begin
  (use-modules
   (guix config)
   (guix profiles)
   (guix utils))
  (define profile
    (or #f
        (string-append %state-directory "/profiles/system")))
  (let* ((number
          (#{1+}# (generation-number profile)))
         (generation
          (generation-file-name profile number)))
    (switch-symlinks generation
                     "/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system")
    (switch-symlinks profile generation)
    (setenv "GUIX_NEW_SYSTEM"
            "/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system")
    (primitive-load
     "/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")))
--8<---cut here---end--->8---

We can run it manually to get debugging data:

--8<---cut here---start->8---
ludo@overdrive1 ~$ sudo -E env -i COLUMNS=100  
"/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm"
making '/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system' the current 
system...
WARNING: (guile-user): imported module (guix build utils) overrides core 
binding `delete'
setting up setuid programs in '/run/setuid-programs'...
populating /etc from /gnu/store/hf3qxlaiajvapwis0lq20avgl2whfa5w-etc...
Backtrace:
   6 (primitive-load 
"/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm")
   5 (primitive-load 
"/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")
In ice-9/boot-9.scm:
   260:13  4 (for-each # _)
In unknown file:
   3 (primitive-load 
"/gnu/store/v03vaksmkpj7wv4dhm0yrd3y65lzbixz-activate-service.scm")
In srfi/srfi-1.scm:
    634:9  2 (for-each #<procedure … at gnu/build/activation.scm:257:12 (file)> _)
In gnu/build/activation.scm:
   267:20  1 (_ "modprobe.d")
In unknown file:
   0 (symlink "/etc/static/modprobe.d" "/etc/modprobe.d")

ERROR: In procedure symlink:
In procedure symlink: File exists
--8<---cut here---end--->8---

This is because ‘zram-device-service-type’ contributes a file to
/etc/modprobe.d:

--8<---cut here---start->8---
(define %zram-device-config
  `("modprobe.d/zram.conf"
    ,(plain-file "zram.conf"
                 "options zram num_devices=1")))

(define zram-device-service-type
  (service-type
    (name 'zram)
    (default-value (zram-device-configuration))
    (extensions
      (list (service-extension kernel-module-loader-service-type
                               (const (list "zram")))
            (service-extension etc-service-type
                               (const (list %zram-device-config)))
            (service-extension udev-service-type
                               (compose list zram-device-udev-rule))))
    (description "Creates a zram swap device.")))
--8<---cut here---end--->8---

… which is fine, except that there was already a pre-existing
/etc/modprobe.d directory (coming from openSUSE, the distro that was
initially installed on this machine), which caused this activation code
to break:

--8<---cut here---start->8---
ludo@overdrive1 ~$ ls -l /etc/modprobe.d
total 36
-rw-r--r-- 1 root root 3221 Nov  6  2016 00-system.conf
-rw-r--r-- 1 root root  532 Nov 14  2012 10-unsupported-modules.conf
-rw-r--r-- 1 root root  181 May  5  2017 50-alsa.conf
-rw-r--r-- 1 root root 5009 Sep 15  2016 50-blacklist.conf
-rw-r--r-- 1 root root  128 Oct 12  2017 50-bluetooth.conf
-rw-r--r-- 1 root root   33 Oct 20  2016 50-ipw2200.conf
-rw-r--r-- 1 root root   34 Oct 20  2016 50-iwl3945.conf
-rw-r--r-- 1 root root   47 Nov 22  2011 99-local.conf
ludo@overdrive1 ~$ ls -ld /etc/modprobe.d
drwxr-xr-x 1 root root 260 Jan 29  2018 /etc/modprobe.d/
--8<---cut here---end--->8---
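
The same failure can be reproduced in isolation, which may help confirm
the diagnosis.  Below is a minimal Guile sketch, assuming a scratch
directory under /tmp; the path and the names 'scratch', 'link-name' and
'caught-errno' are made up for illustration.  It puts a directory where
the symlink should go, then attempts the same 'symlink' call as in the
backtrace:

```scheme
;; Minimal reproduction sketch.  SCRATCH and LINK-NAME are hypothetical
;; names standing in for /etc and /etc/modprobe.d.
(define scratch
  (string-append "/tmp/eexist-demo-" (number->string (getpid))))
(define link-name (string-append scratch "/modprobe.d"))

(mkdir scratch)
(mkdir link-name)          ;the directory left over from the previous distro

;; Attempt the symlink seen in the backtrace and keep the errno of the
;; 'system-error' exception, or #f if the call unexpectedly succeeds.
(define caught-errno
  (catch 'system-error
    (lambda ()
      (symlink "/etc/static/modprobe.d" link-name)
      #f)
    (lambda args
      (system-error-errno args))))

(format #t "symlink failed with EEXIST? ~a~%"
        (and caught-errno (= caught-errno EEXIST)))

;; Clean up the scratch directories.
(rmdir link-name)
(rmdir scratch)
```

On GNU/Linux, EEXIST is 17, which matches the "(17)" at the end of the
error reported by 'guix deploy'.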

Once moved out of the way, reconfiguration proceeds just fine and
happiness ensues:

--8<---cut here---start->8---
ludo@overdrive1 ~$ ls -l /etc/modprobe.d
lrwxrwxrwx 1 root root 22 Sep 26 17:19 /etc/modprobe.d -> /etc/static/modprobe.d
[...]
--8<---cut here---end--->8---


2022-09-26 Thread Maxim Cournoyer
Hi,

Ludovic Courtès writes:

[...]

> We can run it manually to get debugging data:
>
> ludo@overdrive1 ~$ sudo -E env -i COLUMNS=100  
> "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm"
> making '/gnu/store/kifxq4hmp4ihn6nb06ia8wms33qrndxn-system' the current 
> system...
> WARNING: (guile-user): imported module (guix build utils) overrides core 
> binding `delete'
> setting up setuid programs in '/run/setuid-programs'...
> populating /etc from /gnu/store/hf3qxlaiajvapwis0lq20avgl2whfa5w-etc...
> Backtrace:
>    6 (primitive-load
> "/gnu/store/xv7j4im9ap92mv0mbsm1wa4px93zxrms-switch-to-system.scm")
>    5 (primitive-load
> "/gnu/store/1wdwlaqkmixb1d7by7fj23lxppw8x44r-activate.scm")
> In ice-9/boot-9.scm:
>    260:13  4 (for-each # _)
> In unknown file:
>    3 (primitive-load
> "/gnu/store/v03vaksmkpj7wv4dhm0yrd3y65lzbixz-activate-service.scm")
> In srfi/srfi-1.scm:
>     634:9  2 (for-each #<procedure … at gnu/build/activation.scm:257:12 (file)> _)
> In gnu/build/activation.scm:
>    267:20  1 (_ "modprobe.d")
> In unknown file:
>    0 (symlink "/etc/static/modprobe.d" "/etc/modprobe.d")
>
> ERROR: In procedure symlink:
> In procedure symlink: File exists
>
>
> This is because ‘zram-device-service-type’ contributes a file to
> /etc/modprobe.d:
>
> (define %zram-device-config
>   `("modprobe.d/zram.conf"
>     ,(plain-file "zram.conf"
>                  "options zram num_devices=1")))
>
> (define zram-device-service-type
>   (service-type
>     (name 'zram)
>     (default-value (zram-device-configuration))
>     (extensions
>       (list (service-extension kernel-module-loader-service-type
>                                (const (list "zram")))
>             (service-extension etc-service-type
>                                (const (list %zram-device-config)))
>             (service-extension udev-service-type
>                                (compose list zram-device-udev-rule))))
>     (description "Creates a zram swap device.")))
>
>
> … which is fine, except that there was already a pre-existing
> /etc/modprobe.d directory (coming from openSUSE, the distro that was
> initially installed on this machine), which caused this activation code
> to break:

Oh wow! Should we be extra careful and always rm files before linking to
their location?  Or define our own 'symlink' procedure that'd take care
of it?  That's not very elegant but better than obscure crashes like
this.

What do you think?

Thanks for the debugging!

Maxim






2022-09-26 Thread Maxim Cournoyer
Hello again,

Maxim Cournoyer writes:

[...]

>> … which is fine, except that there was already a pre-existing
>> /etc/modprobe.d directory (coming from openSUSE, the distro that was
>> initially installed on this machine), which caused this activation code
>> to break:
>
> Oh wow! Should we be extra careful and always rm files before linking to
> their location?  Or define our own 'symlink' procedure that'd take care
> of it?  That's not very elegant but better than obscure crashes like
> this.

I just had a better idea: fail and report that an unexpected file was
found there, leaving the user to inspect it and choose a proper action.
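
To make the idea concrete, here is a rough sketch of what such a check
could look like.  'symlink/checked' is a hypothetical name, not an
existing procedure in Guix or Guile, and the policy it applies (replace
a stale symlink, refuse anything else) is an assumption rather than
what the activation code does today:

```scheme
;; Hypothetical helper; not part of (gnu build activation).
(define (symlink/checked target link)
  "Create the symbolic link LINK pointing to TARGET.  When LINK already
exists and is not itself a symbolic link, raise an error that names the
unexpected file instead of a bare 'File exists'."
  ;; 'lstat' does not follow symlinks, so a dangling link is still seen.
  (let ((st (false-if-exception (lstat link))))
    (cond ((not st)                       ;nothing there: just link
           (symlink target link))
          ((eq? (stat:type st) 'symlink)  ;stale link: replace it
           (delete-file link)
           (symlink target link))
          (else                           ;regular file, directory, etc.
           (error (string-append
                   "unexpected " (symbol->string (stat:type st))
                   " found at '" link
                   "'; inspect it and move it out of the way"))))))
```

With a helper along these lines, the overdrive1 case would report an
unexpected directory at '/etc/modprobe.d' rather than an anonymous
"File exists" from 'symlink'.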

Thanks,

Maxim






2022-09-29 Thread Ludovic Courtès
Hi,

Maxim Cournoyer wrote:

> Maxim Cournoyer writes:
>
> [...]
>
>>> … which is fine, except that there was already a pre-existing
>>> /etc/modprobe.d directory (coming from openSUSE, the distro that was
>>> initially installed on this machine), which caused this activation code
>>> to break:
>>
>> Oh wow! Should we be extra careful and always rm files before linking to
>> their location?  Or define our own 'symlink' procedure that'd take care
>> of it?  That's not very elegant but better than obscure crashes like
>> this.
>
> I just had a better idea: fail and report that an unexpected file was
> found there, leaving the user to inspect it and choose a proper action.

Yeah, that’d be nice.  It’s really a corner case that you’ll only hit
when installing on a non-empty file system, but gracefully handling it
would be nice for sure.

Ludo’.