bug#58290: guile ssh error on guix deploy

2023-01-30 Thread Ludovic Courtès
Hi,

The bug should be fixed with the upgrade to Guile-SSH 0.16.3 in commit
e6f557dd23fbb298afa92dba3133ed985e560699.

Thanks Artyom for the prompt fix & release!

Ludo’.





bug#58290: guile ssh error on guix deploy

2023-01-24 Thread Andrew Tropin
On 2023-01-23 22:50, Ludovic Courtès wrote:

> Hi Artyom,
>
> "Artyom V. Poptsov"  skribis:
>
>> I figured out how to fix Guile-SSH channel "leak", so to say, that lead
>> to the OpenSSH "no more sessions" problem.
>>
>> Please run your tests with this branch and let me know if it works for
>> you (including all the edge cases):
>>   https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-channel-leak
>
> It works!  Specifically, I ran:
>
>   guix shell guile guile-ssh \
> --with-branch=guile-ssh=wip-fix-channel-leak -- \
> guile ssh-channels.scm
>
> and the script (same one as before) ran several hundreds of iterations
> just fine.

Can confirm the same, works on my machine :)  Thank you very much.

>
> I had looked at ‘ptob_close’ and related code but didn’t see this issue;
> good catch!
>
> If you plan to push a new release, we’ll just upgrade in Guix; otherwise
> we can apply the patch locally.
>
> Thank you!
>
> Ludo’.

-- 
Best regards,
Andrew Tropin


signature.asc
Description: PGP signature


bug#58290: guile ssh error on guix deploy

2023-01-23 Thread Ludovic Courtès
Hi Artyom,

"Artyom V. Poptsov"  skribis:

> I figured out how to fix Guile-SSH channel "leak", so to say, that lead
> to the OpenSSH "no more sessions" problem.
>
> Please run your tests with this branch and let me know if it works for
> you (including all the edge cases):
>   https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-channel-leak

It works!  Specifically, I ran:

  guix shell guile guile-ssh \
--with-branch=guile-ssh=wip-fix-channel-leak -- \
guile ssh-channels.scm

and the script (same one as before) ran several hundreds of iterations
just fine.

I had looked at ‘ptob_close’ and related code but didn’t see this issue;
good catch!

If you plan to push a new release, we’ll just upgrade in Guix; otherwise
we can apply the patch locally.

Thank you!

Ludo’.





bug#58290: guile ssh error on guix deploy

2023-01-23 Thread Artyom V. Poptsov
Hello,

I figured out how to fix Guile-SSH channel "leak", so to say, that lead
to the OpenSSH "no more sessions" problem.

Please run your tests with this branch and let me know if it works for
you (including all the edge cases):
  https://github.com/artyom-poptsov/guile-ssh/tree/wip-fix-channel-leak

Thanks,
- avp

-- 
Artyom "avp" Poptsov 
Home page: https://memory-heap.org/~avp/
CADR Hackerspace co-founder: https://cadrspace.ru/
GPG: D0C2 EAC1 3310 822D 98DE  B57C E9C5 A2D9 0898 A02F


signature.asc
Description: PGP signature


bug#58290: guile ssh error on guix deploy

2023-01-22 Thread Artyom V. Poptsov
Hello Ludovic, Andrew and all,

thank you for reporting the bug!

The code for reproducing the bug was very helpful indeed, I just
slightly modified the it:

--8<---cut here---start->8---
(use-modules (ssh session)
 (ssh popen)
 (ssh auth)
 (ssh log)
 (rnrs io ports))

(define session
  (make-session #:host "localhost"))

(define (main args)
  (session-parse-config! session)
  (connect! session)
  (userauth-public-key/auto! session)
  (set-log-verbosity! 'functions)

  (let loop ((i 0))
(format (current-error-port) "-- ~a --~%" i)
(let ((pipe (open-remote-pipe session "date" "r")))
  (pk 'x (get-string-all pipe))
  (close-port pipe)
  (loop (+ 1 i)
--8<---cut here---end--->8---

It seems to me that the reason Guile-SSH fails at least partly lies in
changes that were introduced in OpenSSH 5.1.

I set "LogLevel" to "DEBUG3" and ran the test code.  System logs have
shown the following error:

--8<---cut here---start->8---
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 8: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 9: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug2: channel 9: rcvd adjust 1216033
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 0: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 1: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 2: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 3: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 4: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 5: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 6: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 7: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 8: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: channel 9: will not send data 
after close
янв 22 22:40:06 elephant sshd[131488]: debug3: receive packet: type 90
янв 22 22:40:06 elephant sshd[131488]: debug1: server_input_channel_open: ctype 
session rchan 53 win 64000 max 32768
янв 22 22:40:06 elephant sshd[131488]: debug1: input_session_request
янв 22 22:40:06 elephant sshd[131488]: debug2: channel: expanding 20
янв 22 22:40:06 elephant sshd[131488]: debug1: channel 10: new [server-session]
янв 22 22:40:06 elephant sshd[131488]: debug1: session_open: channel 10
янв 22 22:40:06 elephant sshd[131488]: error: no more sessions
--8<---cut here---end--->8---

This "no more sessions" error was the key to understanding the problem.

Here's an excerpt from OpenSSH 5.1 changelog [1]:

--8<---cut here---start->8---
Changes since OpenSSH 5.0
=
[...]
 * Added a MaxSessions option to sshd_config(5) to allow control of the
   number of multiplexed sessions supported over a single TCP connection.
   This allows increasing the number of allowed sessions above the
   previous default of 10, disabling connection multiplexing 
   (MaxSessions=1) or disallowing login/shell/subsystem sessions
   entirely (MaxSessions=0).
--8<---cut here---end--->8---

When I set "MaxSessions" in "/etc/ssh/sshd_config" to 100 and restarted
the sshd daemon, the test code fails on the 100th iteration.

--8<---cut here---start->8---
[...]
;;; [2023/01/22 22:52:03.246549, 3] ssh_packet_need_rekey: rekey: 
[data_rekey_needed=0, out_blocks=817, in_blocks=1309]
;;; [2023/01/22 22:52:03.246559, 3] ssh_packet_socket_callback: rcv packet cb 
(len=0, state=INIT)
;;; [2023/01/22 22:52:03.246568, 3] ssh_packet_socket_callback: Waiting for 
more data (0 < 4)
;;; [2023/01/22 22:52:03.246579, 4] ssh_socket_pollcallback: sending control 
flow event
;;; [2023/01/22 22:52:03.246587, 4] ssh_packet_socket_controlflow_callback: 
sending channel_write_wontblock callback
;;; [2023/01/22 22:52:03.246600, 4] [GSSH DEBUG] read result: 0: #

;;; (x "Вс 22 янв 2023 22:52:03 MSK\n")
;;; [2023/01/22 22:52:03.246636, 4] [GSSH DEBUG] the channel is already freed 
by the closing request from the remote side.
-- 100 --
;;; [2023/01/22 22:52:03.246655, 2] channel_open: Creating a channel 143 with 
64000 window and 32768 max packet
;;; [2023/01/22 22:52:03.246664, 3] ssh_packet_need_rekey: rekey: 
[data_rekey_needed=0, out_blocks=820, in_blocks=1312]
;;; [2023/01/22 22:52:03.246693, 3] ssh_socket_unbuffered_write: Enabling 
POLLOUT for socket
;;; [2023/01/22 22:52:03.246703, 3] packet_send2: packet: wrote [type=90, 
len=32, padding_size=7, comp=24, payload=24]
;;; [2023/01/22 

bug#58290: guile ssh error on guix deploy

2023-01-12 Thread Ludovic Courtès
A simpler reproducer:

--8<---cut here---start->8---
(use-modules (ssh session)
 (ssh popen)
 (ssh auth)
 (rnrs io ports))

(define session
  (make-session #:host "HOST-OF-YOUR-CHOICE"))

(session-parse-config! session)
(connect! session)
(userauth-public-key/auto! session)

(let loop ((i 0))
  (pk 'loOp i)
  (let ((pipe (open-remote-pipe session "date" "r")))
(pk 'x (get-string-all pipe))
(close-port pipe)
(loop (+ 1 i
--8<---cut here---end--->8---

Ludo’.





bug#58290: guile ssh error on guix deploy

2023-01-12 Thread Ludovic Courtès
Hi,

Marius Bakke  skribis:

> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
> session rchan 66 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free 
> channel 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: 
> server-session, nchannels 11
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: 
> failure session
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, 
> nchannels 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, 
> nchannels 9
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, 
> nchannels 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, 
> nchannels 7
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, 
> nchannels 6

[...]

> In my current tests the failure always occur at the
> very end of deploy:
>
> building 
> /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: 
> channel 66 error (2) open failed: # 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it.  All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.

Here’s a reproducer:

--8<---cut here---start->8---
(use-modules (guix ssh)
 (guix))

(define %local
  (open-connection))

(define %remote
  (connect-to-remote-daemon (open-ssh-session "HOST-OF-YOUR-CHOICE")))

(let loop ((i 0))
  (pk 'loop i)
  (send-files %local 
'("/gnu/store/8fpk2cja3f07xls48jfnpgrzrljpqivr-coreutils-8.32") %remote)
  (loop (+ 1 i)))
--8<---cut here---end--->8---

It fails after 8 iterations.  It looks like we’re leaking SSH channels
or something.

However, in GDB, we can see that ‘ssh_channel_close’ is indeed called
after each iteration (via ‘ptob_close’ in Guile-SSH).

Artyom, does that ring a bell?

Thanks,
Ludo’.





bug#58290: guile ssh error on guix deploy

2022-11-10 Thread Andrew Tropin
On 2022-11-04 00:47, Marius Bakke wrote:

> Ludovic Courtès  skriver:
>
>> Hi,
>>
>> Marius Bakke  skribis:
>>
>>> I have the same problem.  Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
>>> 185.70.53.164 port 1915 ssh2: RSA 
>>> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after 
>>> sshd-2 termination. 
>>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---cut here---start->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>>  Session *s = session_new();
>>  debug("session_open: channel %d", chanid);
>>  if (s == NULL) {
>>  error("no more sessions");
>>  return 0;
>>  }
>>
>> [...]
>>
>> }
>> --8<---cut here---end--->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 
> channel 8 pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release 
> channel 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
> session rchan 65 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link 
> with channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: 
> confirm session
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: 
> channel 0 request exec reply 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 
> channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: 
> session 0 req exec
> Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
> session rchan 66 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free 
> channel 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: 
> server-session, nchannels 11
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: 
> failure session
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, 
> nchannels 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, 
> nchannels 9
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, 
> nchannels 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, 
> nchannels 7
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, 
> nchannels 6
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, 
> nchannels 5
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, 
> nchannels 4
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, 
> nchannels 3
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, 
> nchannels 2
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, 
> nchannels 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 
> channel 6 pid 25842
> Nov  3 

bug#58290: guile ssh error on guix deploy

2022-11-03 Thread Marius Bakke
Ludovic Courtès  skriver:

> Hi,
>
> Marius Bakke  skribis:
>
>> I have the same problem.  Log messages around failure look something
>> like this (extracted from above message):
>>
>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
>> 185.70.53.164 port 1915 ssh2: RSA 
>> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after 
>> sshd-2 termination. 
>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>
>> Then deploy crashes with 'Channel opening failure'.
>
> “no more sessions” comes from this:
>
> --8<---cut here---start->8---
> int
> session_open(Authctxt *authctxt, int chanid)
> {
>   Session *s = session_new();
>   debug("session_open: channel %d", chanid);
>   if (s == NULL) {
>   error("no more sessions");
>   return 0;
>   }
>
> [...]
>
> }
> --8<---cut here---end--->8---
>
> Would be interesting to run sshd in verbose/debug mode and see why we
> hit that; it could be because the maximum number of sessions has been
> reached or something.

I enabled (log-level 'debug) and this is what happens server side:

Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 
channel 8 pid 26345
Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release 
channel 8
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
session rchan 65 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with 
channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm 
session
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 
0 request exec reply 1
Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 
channel 0
Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 
0 req exec
Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype 
session rchan 66 win 64000 max 32768
Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
["no more sessions" error occurs here, in a different log file]
Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 
10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, 
nchannels 11
Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure 
session
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, 
nchannels 10
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, 
nchannels 9
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, 
nchannels 8
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, 
nchannels 7
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, 
nchannels 6
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, 
nchannels 5
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, 
nchannels 4
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, 
nchannels 3
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, 
nchannels 2
Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, 
nchannels 1
Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup

I compared this with a successful deploy:

Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 
channel 6 pid 25842
Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release 
channel 6
Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype 
session rchan 65 win 64000 max 32768
Nov  3 21:46:26 

bug#58290: guile ssh error on guix deploy

2022-10-18 Thread Ludovic Courtès
Hi,

Marius Bakke  skribis:

> I have the same problem.  Log messages around failure look something
> like this (extracted from above message):
>
>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
> 185.70.53.164 port 1915 ssh2: RSA 
> SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after 
> sshd-2 termination. 
>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>
> Then deploy crashes with 'Channel opening failure'.

“no more sessions” comes from this:

--8<---cut here---start->8---
int
session_open(Authctxt *authctxt, int chanid)
{
Session *s = session_new();
debug("session_open: channel %d", chanid);
if (s == NULL) {
error("no more sessions");
return 0;
}

[...]

}
--8<---cut here---end--->8---

Would be interesting to run sshd in verbose/debug mode and see why we
hit that; it could be because the maximum number of sessions has been
reached or something.

> Due to the number of SSH connections made by deploy and frequent
> occurence of this problem, I was not able to successfully deploy through
> many attempts.

Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
session per machine.  (I think it works well with the 25 local build
machines behind berlin; it’s also fine for a small set of machines I
take care of at work.)

> I found that removing the memoizing open-machine-ssh-session* in
> (gnu machine ssh) works around it and can happily deploy again.
>
> (that is, just use 'open-machine-ssh-session' instead)

Interesting.  That gives us a hint.  Could you add a ‘pk’ in
‘open-machine-ssh-session*’ to see how many connections it opens?

Thanks for your feedback!

Ludo’.





bug#58290: guile ssh error on guix deploy

2022-10-17 Thread Marius Bakke
Ludovic Courtès  skriver:

> Hi,
>
> Andrew Tropin  skribis:
>
 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
 Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel 
 opening failure: channel 67 error (2) open failed" #>>> (closed) 7f7d1af9e140> #f)'.
>>>
>>> Are there any hints from sshd in the /var/log/messages of that machine
>>> as to why the connection was closed?
>>
>> bob@pinky /var/log$ zcat messages.1.gz | grep "Oct  4" | grep ssh
>
> It’s hard (if not impossible) to see which lines correspond to the error
> above.  :-)
>
> Could try to reproduce the bug, and have ‘tail -f /var/log/messages’
> running on the server side so you can capture just the right lines?
> /var/log/debug might be interesting too.

I have the same problem.  Log messages around failure look something
like this (extracted from above message):

  Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
  Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 
185.70.53.164 port 1915 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
  Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
  Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 
termination. 
  Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 

Then deploy crashes with 'Channel opening failure'.

Due to the number of SSH connections made by deploy and frequent
occurence of this problem, I was not able to successfully deploy through
many attempts.

I found that removing the memoizing open-machine-ssh-session* in
(gnu machine ssh) works around it and can happily deploy again.

(that is, just use 'open-machine-ssh-session' instead)


signature.asc
Description: PGP signature


bug#58290: guile ssh error on guix deploy

2022-10-17 Thread Ludovic Courtès
Hi,

Andrew Tropin  skribis:

>>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>>> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel 
>>> opening failure: channel 67 error (2) open failed" #>> (closed) 7f7d1af9e140> #f)'.
>>
>> Are there any hints from sshd in the /var/log/messages of that machine
>> as to why the connection was closed?
>
> bob@pinky /var/log$ zcat messages.1.gz | grep "Oct  4" | grep ssh

It’s hard (if not impossible) to see which lines correspond to the error
above.  :-)

Could try to reproduce the bug, and have ‘tail -f /var/log/messages’
running on the server side so you can capture just the right lines?
/var/log/debug might be interesting too.

Thanks,
Ludo’.





bug#58290: guile ssh error on guix deploy

2022-10-14 Thread Andrew Tropin
On 2022-10-08 14:39, Ludovic Courtès wrote:

> Hi!
>
> Andrew Tropin  skribis:
>
>> From time to time I get the following error, the only thing I change is
>> IPv6 config in static-networking service.  Sometimes it fails, but after
>> a few more attempts with the same config it deploys sucessfully.
>>
>> -*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
>> Compilation started at Tue Oct  4 14:19:46
>>
>> make -k deploy-pinky
>> guix deploy ./guix/pinky.scm
>
> [...]
>
>> In guix/ssh.scm:
>> 376:2  7 (send-files # _ # …)
>> 222:5  6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
>> In ssh/popen.scm:
>>  64:4  5 (open-remote-pipe* _ "r+" _ . _)
>> In unknown file:
>>4 (channel-open-session #)
>> In ice-9/boot-9.scm:
>>   1685:16  3 (raise-exception _ #:continuable? _)
>>   1683:16  2 (raise-exception _ #:continuable? _)
>>   1685:16  1 (raise-exception _ #:continuable? _)
>>   1685:16  0 (raise-exception _ #:continuable? _)
>>
>> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel 
>> opening failure: channel 67 error (2) open failed" #> (closed) 7f7d1af9e140> #f)'.
>
> Are there any hints from sshd in the /var/log/messages of that machine
> as to why the connection was closed?

--8<---cut here---start->8---
bob@pinky /var/log$ zcat messages.1.gz | grep "Oct  4" | grep ssh
Oct  4 05:57:09 localhost shepherd[1]: Service sshd-380 has been started. 
Oct  4 05:57:11 localhost sshd[15950]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 54710 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:34:17 localhost shepherd[1]: Service sshd-381 has been started. 
Oct  4 06:34:19 localhost sshd[15958]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 57064 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:36:55 localhost shepherd[1]: Service sshd-382 has been started. 
Oct  4 06:36:57 localhost sshd[15991]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 41088 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:37:30 localhost shepherd[1]: Service sshd-383 has been started. 
Oct  4 06:37:33 localhost sshd[16031]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 56148 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:43:39 localhost shepherd[1]: 3 connections still in use after 
sshd-382 termination. 
Oct  4 06:43:39 localhost shepherd[1]: Service sshd-382 (PID 15991) exited with 
255. 
Oct  4 06:43:39 localhost shepherd[1]: Service sshd-382 has been disabled. 
Oct  4 06:43:39 localhost shepherd[1]: Transient service sshd-382 terminated, 
now unregistered. 
Oct  4 06:43:41 localhost shepherd[1]: Service sshd-384 has been started. 
Oct  4 06:43:42 localhost sshd[16166]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 36040 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:43:49 localhost shepherd[1]: 3 connections still in use after 
sshd-384 termination. 
Oct  4 06:43:49 localhost shepherd[1]: Service sshd-384 has been disabled. 
Oct  4 06:43:49 localhost shepherd[1]: Transient service sshd-384 terminated, 
now unregistered. 
Oct  4 06:48:58 localhost shepherd[1]: Service sshd-385 has been started. 
Oct  4 06:49:00 localhost sshd[16205]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 34134 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:52:05 localhost shepherd[1]: Service sshd-386 has been started. 
Oct  4 06:52:06 localhost sshd[16212]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 55922 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:53:02 localhost shepherd[1]: 4 connections still in use after 
sshd-386 termination. 
Oct  4 06:53:02 localhost shepherd[1]: Service sshd-386 has been disabled. 
Oct  4 06:53:02 localhost shepherd[1]: Transient service sshd-386 terminated, 
now unregistered. 
Oct  4 06:53:03 localhost shepherd[1]: Service sshd-387 has been started. 
Oct  4 06:53:04 localhost sshd[16370]: Accepted publickey for bob from 
201:2a73:cac3:e128:cf3d:8dc:8afa:df66 port 50604 ssh2: RSA 
SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
Oct  4 06:53:11 localhost shepherd[1]: 4 connections still in use after 
sshd-387 termination. 
Oct  4 06:53:11 localhost shepherd[1]: Service sshd-387 has been disabled. 
Oct  4 06:53:11 localhost shepherd[1]: Transient service sshd-387 terminated, 
now unregistered. 
Oct  4 06:54:25 localhost shepherd[1]: Service ssh-daemon has been stopped. 
Oct  4 06:57:42 localhost shepherd[1]: Service ssh-daemon has been started. 
Oct  4 06:58:20 localhost shepherd[1]: Service sshd-1 has been started. 
Oct  4 06:58:22 localhost sshd[239]: Accepted publickey for bob from 

bug#58290: guile ssh error on guix deploy

2022-10-08 Thread Ludovic Courtès
Hi!

Andrew Tropin  skribis:

> From time to time I get the following error, the only thing I change is
> IPv6 config in static-networking service.  Sometimes it fails, but after
> a few more attempts with the same config it deploys sucessfully.
>
> -*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
> Compilation started at Tue Oct  4 14:19:46
>
> make -k deploy-pinky
> guix deploy ./guix/pinky.scm

[...]

> In guix/ssh.scm:
> 376:2  7 (send-files # _ # …)
> 222:5  6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
> In ssh/popen.scm:
>  64:4  5 (open-remote-pipe* _ "r+" _ . _)
> In unknown file:
>4 (channel-open-session #)
> In ice-9/boot-9.scm:
>   1685:16  3 (raise-exception _ #:continuable? _)
>   1683:16  2 (raise-exception _ #:continuable? _)
>   1685:16  1 (raise-exception _ #:continuable? _)
>   1685:16  0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel 
> opening failure: channel 67 error (2) open failed" # (closed) 7f7d1af9e140> #f)'.

Are there any hints from sshd in the /var/log/messages of that machine
as to why the connection was closed?

Thanks,
Ludo’.





bug#58290: guile ssh error on guix deploy

2022-10-04 Thread Andrew Tropin
From time to time I get the following error, the only thing I change is
IPv6 config in static-networking service.  Sometimes it fails, but after
a few more attempts with the same config it deploys sucessfully.

--8<---cut here---start->8---
-*- mode: compilation; default-directory: "~/work/abcdw/trop.in/" -*-
Compilation started at Tue Oct  4 14:19:46

make -k deploy-pinky
guix deploy ./guix/pinky.scm
The following 1 machine will be deployed:
  pinky

guix deploy: deploying to pinky...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
substitute: updating substitutes from 'https://substitutes.nonguix.org'... 
100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'http://ci.guix.trop.in'... 100.0%
The following derivations will be built:
  /gnu/store/yzilvx4jr4s7174f6azxxbb5c24311xc-remote-exp.scm.drv
  /gnu/store/sqlbbg5z19h2ww86p8nzhfbnidsm5sag-switch-to-system.scm.drv
  /gnu/store/ayssizgz8i4mnk6p2yhilqgzwim4ql7j-system.drv
  /gnu/store/9fxf3gcgi07hg5nmhjvnvlhk6829h304-provenance.drv
  /gnu/store/djs7vabv5wr3inz3pva87wfw7ya10ajq-boot.drv
  /gnu/store/smmzsd1b9v4qmncvl0pl6f0nrasp39ks-activate.scm.drv
  /gnu/store/5hpm7v0xnqqilhj7qdg66cjkf8dvkimx-activate-service.scm.drv
  /gnu/store/g62yd6a8wd081w0y4jji83dbcxhiks3i-etc.drv

building /gnu/store/9fxf3gcgi07hg5nmhjvnvlhk6829h304-provenance.drv...
building /gnu/store/g62yd6a8wd081w0y4jji83dbcxhiks3i-etc.drv...
building /gnu/store/5hpm7v0xnqqilhj7qdg66cjkf8dvkimx-activate-service.scm.drv...
building /gnu/store/smmzsd1b9v4qmncvl0pl6f0nrasp39ks-activate.scm.drv...
building /gnu/store/djs7vabv5wr3inz3pva87wfw7ya10ajq-boot.drv...
building /gnu/store/ayssizgz8i4mnk6p2yhilqgzwim4ql7j-system.drv...
building /gnu/store/sqlbbg5z19h2ww86p8nzhfbnidsm5sag-switch-to-system.scm.drv...
building /gnu/store/yzilvx4jr4s7174f6azxxbb5c24311xc-remote-exp.scm.drv...
guix deploy: sending 10 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
guix deploy: sending 0 store items (0 MiB) to '23.137.249.202'...
The following derivations will be built:
  /gnu/store/36rmj2xp0sd7lx0ndv8imkyxrh2lmmdy-remote-exp.scm.drv
  /gnu/store/823hs0g4wxdmfwmj6nn8smcnrp45a1g7-install-bootloader.scm.drv
  /gnu/store/0m7f6gg0d0iq6r99p40rw3m65z52fjiy-grub.cfg.drv

building /gnu/store/0m7f6gg0d0iq6r99p40rw3m65z52fjiy-grub.cfg.drv...
building 
/gnu/store/823hs0g4wxdmfwmj6nn8smcnrp45a1g7-install-bootloader.scm.drv...
building /gnu/store/36rmj2xp0sd7lx0ndv8imkyxrh2lmmdy-remote-exp.scm.drv...
;;; [2022/10/04 14:20:23.191118, 0] [GSSH ERROR] Channel opening failure: 
channel 67 error (2) open failed: #
Backtrace:
In guix/store.scm:
  1402:13 19 (map/accumulate-builds # …)
  1377:11 18 (map/accumulate-builds # …)
   1295:8 17 (call-with-build-handler # …)
In ice-9/boot-9.scm:
  1752:10 16 (with-exception-handler _ _ #:unwind? _ # _)
In guix/scripts/deploy.scm:
168:6 15 (_)
In guix/store.scm:
  2165:25 14 (run-with-store # …)
In gnu/machine/ssh.scm:
   506:32 13 (_ _)
In ice-9/boot-9.scm:
  1752:10 12 (with-exception-handler _ _ #:unwind? _ # _)
In gnu/machine/ssh.scm:
   506:32 11 (_)
In guix/store.scm:
  2165:25 10 (run-with-store # …)
In guix/remote.scm:
   138:10  9 (_ _)
In guix/store.scm:
  2037:38  8 (_ #)
In guix/ssh.scm:
376:2  7 (send-files # _ # …)
222:5  6 (remote-run (begin (use-modules (guix) (srfi #) # #) …) #)
In ssh/popen.scm:
 64:4  5 (open-remote-pipe* _ "r+" _ . _)
In unknown file:
   4 (channel-open-session #)
In ice-9/boot-9.scm:
  1685:16  3 (raise-exception _ #:continuable? _)
  1683:16  2 (raise-exception _ #:continuable? _)
  1685:16  1 (raise-exception _ #:continuable? _)
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `guile-ssh-error' with args `("channel-open-session" "Channel 
opening failure: channel 67 error (2) open failed" # #f)'.
make: *** [Makefile:28: deploy-pinky] Error 1

Compilation exited abnormally with code 2 at Tue Oct  4 14:20:23
--8<---cut here---end--->8---

-- 
Best regards,
Andrew Tropin


signature.asc
Description: PGP signature