Hi,
yelninei--- via Bug reports for GNU Guix <[email protected]> writes:
> Reverting da741d89310efd0530351670d9c55ec2f952ab98 "services: account: Create
> /var/guix/profiles/per-user/$USER." fixes this, but I am not sure why.
Wow, thanks for bisecting this; I would never have thought this could be
a problem.
I built the image for ‘bare-hurd.tmpl’ and booted it (with
“console=com1” on the Mach command line) and here’s what we see:
--8<---------------cut here---------------start------------->8---
shepherd[1]: Starting service file-systems...
shepherd[1]: Service file-systems started.
shepherd[1]: Service file-systems running with value #t.
shepherd[1]: Service file-systems has been started.
shepherd[1]: Starting service user-homes...
shepherd[1]: Service user-homes failed to start.
shepherd[1]: Exception caught while starting user-homes: (misc-error "scm_fdes_to_port" "requested file mode not available on fdes" () #f)
shepherd[1]: Service loopback has been started.
shepherd[1]: Service loopback started.
shepherd[1]: Service loopback running with value #t.
--8<---------------cut here---------------end--------------->8---
The ‘user-homes’ service fails to start, so basically the system isn’t
brought up.
The culprit appears to be ‘mkdir-p/perms’:
--8<---------------cut here---------------start------------->8---
ludo@childhurd ~$ rpctrace -o log guile -c '(use-modules (gnu build
activation)) (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #o755)'
Backtrace:
In ice-9/boot-9.scm:
1752:10 7 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
6 (apply-smob/0 #<thunk 20f91a0>)
In ice-9/boot-9.scm:
724:2 5 (call-with-prompt _ _ #<procedure default-prompt-handle?>)
In ice-9/eval.scm:
619:8 4 (_ #(#(#<directory (guile-user) 20ec6e0>)))
In ice-9/command-line.scm:
185:19 3 (_ #<input: string 2106fc0>)
In unknown file:
2 (eval (mkdir-p/perms "foo/bar/baz" (getpwnam "ludo") #) #)
In gnu/build/activation.scm:
97:20 1 (mkdir-p/perms _ #("ludo" "x" 1000 998 "Ludovic Cou?" ?) ?)
In unknown file:
0 (open "." 7340032 #<undefined>)
ERROR: In procedure open:
In procedure scm_fdes_to_port: requested file mode not available on fdes
--8<---------------cut here---------------end--------------->8---
The relevant log snippet is this:
--8<---------------cut here---------------start------------->8---
17<--33(pid168)->dir_lookup ("etc/passwd" 4194305 0) = 0 1 "" 66<--74(pid168)
66<--74(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID)
66<--74(pid168)->io_stat_request () = 0 {23 7 0 56029 0 1745320104 0 33188 1 0 0 1841 0 1745319370 840000000 1745319369 220000000 1745319369 220000000 8192 8 0 0 0 0 0 0 0 0 0 0 0}
66<--74(pid168)->io_seek_request (0 0) = 0 0
66<--74(pid168)->io_read_request (-1 8192) = 0 "root:x:0:0:System administrator:/root:/gnu/store/a1vynvd381hxsf979qzv8r25bc3pd2r"
task13(pid168)-> 3206 (pn{ 30}) = 0
20<--32(pid168)->dir_lookup ("." 7340160 0) = 0 1 "" 66<--70(pid168)
66<--70(pid168)->io_stat_request () = 0 {23 7 0 264001 0 1745320625 0 16832 3 1000 998 4096 0 1745342831 30000000 1745342821 950000000 1745319372 110000000 8192 8 0 0 0 8388736 8388736 8388736 8388736 8388736 8388736 8388736 8388736}
66<--70(pid168)->term_getctty () = 0xfffffed1 ((ipc/mig) bad request message ID)
66<--70(pid168)->io_get_openmodes_request () = 0 0
25<--37(pid168)->io_write_request ("Backtrace:\n" -1) = 0 11
--8<---------------cut here---------------end--------------->8---
The ‘io_get_openmodes’ RPC corresponds to F_GETFL in
‘scm_i_fdes_is_valid’ in Guile.
It can be reproduced with just this:
guile -c '(open "." O_DIRECTORY)'
I think ‘flags_to_mode’ in Guile returns “r” on Linux, which is fine
because O_RDONLY is set. But on the Hurd, O_RDONLY is not set:
--8<---------------cut here---------------start------------->8---
ludo@childhurd ~$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY)
F_GETFL))'
;;; ("i586-pc-gnu" 0)
--8<---------------cut here---------------end--------------->8---
vs.:
--8<---------------cut here---------------start------------->8---
$ guile -c '(pk %host-type (fcntl (open-fdes "." O_DIRECTORY) F_GETFL))'
;;; ("x86_64-unknown-linux-gnu" 98304)
--8<---------------cut here---------------end--------------->8---
Long story short, O_RDONLY = 0 on Linux but it’s non-zero on the Hurd,
so to placate ‘scm_i_fdes_is_valid’, we need to show it that the
directory is opened with O_RDONLY:
diff --git a/gnu/build/activation.scm b/gnu/build/activation.scm
index 11f7c82d67..038d8327de 100644
--- a/gnu/build/activation.scm
+++ b/gnu/build/activation.scm
@@ -90,6 +90,7 @@ (define (mkdir-p/perms directory owner bits)
;; By combining O_NOFOLLOW and O_DIRECTORY, this procedure automatically
;; verifies that no components are symlinks.
(define open-flags (logior O_CLOEXEC ; don't pass the port on to subprocesses
+ O_RDONLY ; needed on the Hurd, harmless on Linux
O_NOFOLLOW ; don't follow symlinks
O_DIRECTORY)) ; reject anything not a directory
Tested on both systems and it seems to work.
Let me know how it goes for you!
> Finding this was a lot of trial and error (bisecting did not work
> because of the python cross compilation failure) but sshd not showing
> up is caught by the childhurd system test. Encountering a record ABI
> mismatch requiring a recompile of the entire guix tree slowed this
> down as well.
For the ABI mismatch, you could probably rebuild just the small subset
of modules affected by this (for example, those that refer to
<guix-configuration> if that’s what’s involved).
> Also https://issues.guix.gnu.org/77610 is causing the rest of the
> failures in the childhurd system test which expect the guix daemon to
> be available immediately. I started looking around in glibc and hurd
> but I haven't found a good setup yet to easily try changes without a
> full rebuild.
For such things, I found that testing interactively in QEMU is best.
Thanks for finding and debugging this!
Ludo’.