bug#39925: `guix pull` failure in multi-machine setup
Hey Ludo,

> For #2, we would need to do something like Jakub did in (guix scripts
> system reconfigure), where the effectful bits can be transparently
> evaluated either locally or remotely.

I don’t understand why #2 needs different mechanics. As I said, /var/guix is
mounted r/w on every machine and in fact `guix package -i` is working as
intended. Maybe we’ve got a communication issue here and we’re talking about
two different things?

Lars
bug#39925: `guix pull` failure in multi-machine setup
Hi!

Lars-Dominik Braun writes:

>> In fact, the former would probably not work because ‘guix pull’ modifies
>> the local /var/guix/profiles, not the one on the host that runs the
>> daemon.
> Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
> wouldn’t work at all.
>
>> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
>> powerful as you thought. :-)
> It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
> i.e. figure out where exactly the error message is coming from?

Well, I think you’re really asking for a new feature; we need more than just
talking to a remote daemon. Updating profiles, as ‘guix package’ and
‘guix pull’ do, involves two things:

  1. building the profile, which is done by talking to the daemon;

  2. modifying things in /var/guix/profiles & co.

GUIX_DAEMON_SOCKET addresses #1 but not #2. For #2, we would need to do
something like Jakub did in (guix scripts system reconfigure), where the
effectful bits can be transparently evaluated either locally or remotely.

But really, that’d be a brand new feature, so I’m marking it as a wishlist
if you don’t mind. :-)

Thanks,
Ludo’.
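The split between the two steps can be pictured with a small shell sketch. It is only an illustration under stated assumptions: a temporary directory stands in for /var/guix/profiles, two empty directories stand in for profiles built by the daemon, and the link names (`guix-profile`, `guix-profile-N-link`) are chosen for the example. It is not what ‘guix package’ or ‘guix pull’ literally run.

```shell
#!/bin/sh
set -e

# Stand-in for the per-user profile directory (assumption for this sketch).
profiles=$(mktemp -d)

# Step 1 would be the daemon's job: build a profile in the store.
# Two empty directories play the role of those build results here.
mkdir "$profiles/store-item-1" "$profiles/store-item-2"

# Step 2 is a client-side file system change: add a generation link and
# repoint the profile link at it.  No daemon connection is involved.
ln -s "$profiles/store-item-1" "$profiles/guix-profile-1-link"
ln -sfn guix-profile-1-link "$profiles/guix-profile"

ln -s "$profiles/store-item-2" "$profiles/guix-profile-2-link"
ln -sfn guix-profile-2-link "$profiles/guix-profile"

current=$(readlink "$profiles/guix-profile")
echo "current generation: $current"
```

The point of the sketch is that the second step touches whatever file system the client process sees, which is why the daemon address alone does not determine where it happens.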
bug#39925: `guix pull` failure in multi-machine setup
Hi Ludo,

> Oh it may be that we would also need to let ‘HOME’ through, so that
> ~/.ssh/config is found, for example. That could have undesirable side
> effects that are best avoided, though (e.g., ~/.cache/guile would become
> visible.)

Shouldn’t be a problem, since ~/.ssh/config does not exist for that user and
known hosts are globally declared in /etc/ssh/ssh_known_hosts (strace
indicates that guile-ssh/libssh reads that file).

> I agree that the error message is sub-optimal. Not sure how to improve
> on it (how can ‘build-self.scm’ know that it’s failing because of
> that?).

If I stop the daemon and run `guix pull`, it just says “guix pull: error:
failed to connect to `/var/guix/daemon-socket/socket': Connection refused”.
Something similar should do. I don’t know whether that’s possible though.

> You could run:
>   ssh host guix pull

Sure, that’s the only workaround I can think of right now.

> In fact, the former would probably not work because ‘guix pull’ modifies
> the local /var/guix/profiles, not the one on the host that runs the
> daemon.

Yes, /var/guix is shared via NFS too. Otherwise roaming between machines
wouldn’t work at all.

> So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
> powerful as you thought. :-)

It is, it’s just a bug we have to fix :) Can I help you debug this somehow,
i.e. figure out where exactly the error message is coming from?

Cheers,
Lars
bug#39925: `guix pull` failure in multi-machine setup
Hi,

Lars-Dominik Braun writes:

>> Sounds like this ssh URI is not valid on the nodes, is that right?
> I would consider it valid, since `ssh master.` and `guix build
> ` both work just fine from the nodes. It’s just `guix pull`, which is
> causing issues.

Oh it may be that we would also need to let ‘HOME’ through, so that
~/.ssh/config is found, for example. That could have undesirable side
effects that are best avoided, though (e.g., ~/.cache/guile would become
visible.)

>> Right. So perhaps I don’t quite understand the use case. What about
>> simply pulling from one of these machines, if everything is shared over
>> NFS?
> Sure, that’s an option, but anyone who tries will get a strange error message.

I agree that the error message is sub-optimal. Not sure how to improve
on it (how can ‘build-self.scm’ know that it’s failing because of
that?).

> And it breaks the appeal of having a remote guix daemon in the first place,
> that is being able to run `guix ` on any machine I log into. If that
> is not the case (i.e. not for `guix pull`) it would be more consistent to ask
> users to SSH into a different machine every time they interact with guix. Does
> that explain my use case?

Instead of:

  GUIX_DAEMON_SOCKET=ssh://host guix pull

You could run:

  ssh host guix pull

In fact, the former would probably not work because ‘guix pull’ modifies
the local /var/guix/profiles, not the one on the host that runs the
daemon.

So maybe the problem is that ‘GUIX_DAEMON_SOCKET=ssh://’ isn’t quite as
powerful as you thought. :-) It’s really just a way to talk to a remote
daemon, but ‘guix pull’, ‘guix package’, etc. also need to access
/var/guix/profiles.

Thanks,
Ludo’.
bug#39925: `guix pull` failure in multi-machine setup
Hi,

> Sounds like this ssh URI is not valid on the nodes, is that right?

I would consider it valid, since `ssh master.` and `guix build ` both work
just fine from the nodes. It’s just `guix pull`, which is causing issues.

> Right. So perhaps I don’t quite understand the use case. What about
> simply pulling from one of these machines, if everything is shared over
> NFS?

Sure, that’s an option, but anyone who tries will get a strange error message.
And it breaks the appeal of having a remote guix daemon in the first place,
that is being able to run `guix ` on any machine I log into. If that is not
the case (i.e. not for `guix pull`) it would be more consistent to ask users
to SSH into a different machine every time they interact with guix. Does that
explain my use case?

Lars
bug#39925: `guix pull` failure in multi-machine setup
Hello,

Lars-Dominik Braun writes:

>> This is a limitation in ‘build-aux/build-self.scm’: […]
> I don’t understand what’s going on there unfortunately. Is there a high-level
> explanation somewhere in the manual?
>
>> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
>> variable through, along these lines:
> Nope, that does not seem to be enough. After pulling on master doing the same
> on a node (with a patched guix) yields:
>
> ---snip---
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master." errno: 95] 7f0f325f77b0>)'.
> ---snap---
>
> Any ideas?

Sounds like this ssh URI is not valid on the nodes, is that right?

>> + (when (and (not (file-port? port) daemon-socket))
>     (when (and (not (file-port? port)) daemon-socket)
> I assume:                           ↑
>
>> […] and won’t work with old Guix revisions anyway.
> That means `guix time-machine` could not go back beyond a commit that fixes
> the issue, correct? Not a concern for me.

Correct.

>> However, for your use case, you could perhaps simply pull on one machine
>> and use ‘guix copy’ to send Guix elsewhere?
> The store is the same on all machines, since /gnu/store, /var/guix and /home
> are all shared via NFS. As far as I understand the manual `guix copy` would be
> useful for store to store transfers on different machines only.

Right. So perhaps I don’t quite understand the use case. What about
simply pulling from one of these machines, if everything is shared over
NFS?

HTH,
Ludo’.
bug#39925: `guix pull` failure in multi-machine setup
Hi Ludo,

> This is a limitation in ‘build-aux/build-self.scm’: […]

I don’t understand what’s going on there unfortunately. Is there a high-level
explanation somewhere in the manual?

> We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
> variable through, along these lines:

Nope, that does not seem to be enough. After pulling on master doing the same
on a node (with a patched guix) yields:

---snip---
ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "ssh://master." errno: 95] 7f0f325f77b0>)'.
---snap---

Any ideas?

> + (when (and (not (file-port? port) daemon-socket))
    (when (and (not (file-port? port)) daemon-socket)
I assume:                            ↑

> […] and won’t work with old Guix revisions anyway.

That means `guix time-machine` could not go back beyond a commit that fixes
the issue, correct? Not a concern for me.

> However, for your use case, you could perhaps simply pull on one machine
> and use ‘guix copy’ to send Guix elsewhere?

The store is the same on all machines, since /gnu/store, /var/guix and /home
are all shared via NFS. As far as I understand the manual `guix copy` would be
useful for store to store transfers on different machines only.

Lars
bug#39925: `guix pull` failure in multi-machine setup
Hi,

Lars-Dominik Braun writes:

> I’m using guix on a multi-machine setup with a single remote guix-daemon that
> can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master. on the
> compute nodes. Running `guix pull` on master works fine (the variable is not
> set here), but it does not on a compute node. Instead it fails with this
> error:
>
> ---snip---
> Backtrace:
>            1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
> In ice-9/eval.scm:
>    293:34  0 (_ #(#(#(#(#(#(#(#(#(#(#(#(# 7f19dd213140> (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))
>
> ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
> guix pull: error: You found a bug: the program
> '/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
> failed to compute the derivation for Guix (version:
> "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
> host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
> Please report it by email to .
> ---snap---
>
> Obviously the socket on that compute machine is not working, because it’s on
> an NFS share /var/guix belonging to master. But why is the socket considered
> in the first place?

This is a limitation in ‘build-aux/build-self.scm’:

  ;; Use the port beneath the current store as the stdin of BUILD.  This
  ;; way, we know 'open-pipe*' will not close it on 'exec'.  If PORT is
  ;; not a file port (e.g., it's an SSH channel), then the subprocess's
  ;; stdin will actually be /dev/null.
  (let* ((pipe (with-input-from-port port
                 (lambda ()
                   ;; …
                   (if (file-port? port)   ;<- here
                       (number->string
                        (logior major minor))
                       "none"))

We could work around it by letting the ‘GUIX_DAEMON_SOCKET’ environment
variable through, along these lines:

diff --git a/build-aux/build-self.scm b/build-aux/build-self.scm
index f2e785b7f1..18a78b5f41 100644
--- a/build-aux/build-self.scm
+++ b/build-aux/build-self.scm
@@ -400,6 +400,7 @@ files."
                                        #:pull-version pull-version))
                        (system (if system (return system) (current-system)))
                        (home -> (getenv "HOME"))
+                       (daemon-socket -> (getenv "GUIX_DAEMON_SOCKET"))

                        ;; Note: Use the deprecated names here because the
                        ;; caller might be Guix <= 0.16.0.
@@ -424,6 +425,8 @@ files."
       (when home
         ;; Inherit HOME so that 'xdg-directory' works.
         (setenv "HOME" home))
+      (when (and (not (file-port? port) daemon-socket))
+        (setenv "GUIX_DAEMON_SOCKET" daemon-socket))
       (open-pipe* OPEN_READ
                   (derivation->output-path build)
                   source system version

It’s a bit hacky though, and won’t work with old Guix revisions anyway.

However, for your use case, you could perhaps simply pull on one machine
and use ‘guix copy’ to send Guix elsewhere? Or even explicitly run
‘guix pull’ on each node?

Thanks,
Ludo’.
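To illustrate why the variable has to be let through, here is a minimal shell sketch. It assumes the helper process runs with a scrubbed environment (which the explicit re-export of ‘HOME’ in the patch context suggests) and uses a hypothetical one-liner, `daemon_uri`, as a stand-in for a client that falls back to the local socket when GUIX_DAEMON_SOCKET is unset. It does not call Guix at all.

```shell
#!/bin/sh
# Hypothetical stand-in for a client choosing the daemon address:
# use GUIX_DAEMON_SOCKET when set, else the local socket path that
# appears in the reported error message.
daemon_uri='printf "%s\n" "${GUIX_DAEMON_SOCKET:-/var/guix/daemon-socket/socket}"'

GUIX_DAEMON_SOCKET=ssh://host
export GUIX_DAEMON_SOCKET

# Child started with an empty environment: the variable is lost, so the
# child falls back to the local socket.
without=$(env -i /bin/sh -c "$daemon_uri")

# Child started with the variable passed through explicitly.
with=$(env -i GUIX_DAEMON_SOCKET="$GUIX_DAEMON_SOCKET" /bin/sh -c "$daemon_uri")

echo "scrubbed:     $without"
echo "passed along: $with"
```

In the first case the child ends up at the local socket, which on a compute node has nothing listening behind it; that matches the “errno: 111” (connection refused) in the report.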
bug#39925: `guix pull` failure in multi-machine setup
Hi,

I’m using guix on a multi-machine setup with a single remote guix-daemon that
can be reached via SSH. Thus GUIX_DAEMON_SOCKET=ssh://master. on the
compute nodes. Running `guix pull` on master works fine (the variable is not
set here), but it does not on a compute node. Instead it fails with this
error:

---snip---
Backtrace:
           1 (primitive-load "/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation")
In ice-9/eval.scm:
   293:34  0 (_ #(#(#(#(#(#(#(#(#(#(#(#(# (?)) #) # ?) ?) ?) ?) ?) ?) ?) ?) ?) ?))

ice-9/eval.scm:293:34: Throw to key `srfi-34' with args `(#<condition &store-connection-error [file: "/var/guix/daemon-socket/socket" errno: 111] 7f19dba3a090>)'.
guix pull: error: You found a bug: the program
'/gnu/store/n5wgvz287dwm62474mr42x34wl5j5wh7-compute-guix-derivation'
failed to compute the derivation for Guix (version:
"aac148a87b9a79b9992b8b1a9d76c217175d4a88"; system: "x86_64-linux";
host version: "aac148a87b9a79b9992b8b1a9d76c217175d4a88"; pull-version: 1).
Please report it by email to .
---snap---

Obviously the socket on that compute machine is not working, because it’s on
an NFS share /var/guix belonging to master. But why is the socket considered
in the first place?

Cheers,
Lars