On Wed, Apr 2, 2014 at 3:35 PM, Andy Kurth <[email protected]> wrote:
> Your output looks almost the same as when I successfully ssh in to a
> working VM here. The only difference I can see up to when yours times
> out is that the last line refers to "vcl.key-cert":
>
> Yours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type 1
> debug1: identity file /etc/vcl/vcl.key-cert type -1
>
> Ours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type -1
>
> While logged in as root, you can try stopping the sshd service and
> then from a Cygwin shell, run:
> /usr/sbin/sshd.exe -ddd
>
> Then try to connect from the management node. The debugging output
> from sshd.exe should be displayed in the Cygwin window. What does it
> look like? I'll compare it with one of ours. You can also try the
> same on a working computer and compare the output.
>
Every time I get one of these hung sshd instances, it's the same thing --
I can't restart sshd with cygrunsrv or from the Windows Services console,
but I can kill it with taskkill, start it again, and then it works fine.

I did grab the output of an sshd -ddd session, but it only shows a
normal, working connection, because once sshd has been killed and
restarted it works fine.

Thanks,
Curtis.

>
> On Wed, Apr 2, 2014 at 3:41 PM, Curtis <[email protected]> wrote:
>>
>> On Wed, Apr 2, 2014 at 11:06 AM, Andy Kurth <[email protected]> wrote:
>> > It looks like ssh on the management node is using a ConnectTimeout value of
>> > 2 seconds:
>> > debug3: timeout: 1999 ms remain after connect
>> >
>> > Does specifying a longer time make a difference?
>> > ssh -o ConnectTimeout=10 -vvvv vm79
>> >
>>
>> No, it doesn't seem to change anything. I only set the connect
>> timeout to 2 recently, because I was testing rebooting virtual
>> machines and checking whether I could connect to them via ssh after a
>> reboot. When this first started breaking, it was still set to
>> whatever the default was.
>>
>> Below is a session with it set to 10.
>>
>> root@VCL-PROD:~] $ ssh -vvvv vm79
>> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> debug1: Reading configuration data /root/.ssh/config
>> debug1: Applying options for vm*
>> debug1: Reading configuration data /etc/ssh/ssh_config
>> debug1: Applying options for *
>> debug2: ssh_connect: needpriv 0
>> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> debug2: fd 3 setting O_NONBLOCK
>> debug1: fd 3 clearing O_NONBLOCK
>> debug1: Connection established.
>> debug3: timeout: 10000 ms remain after connect
>> debug1: permanently_set_uid: 0/0
>> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> debug3: key_read: missing keytype
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug2: key_type_from_name: unknown key type '-----END'
>> debug3: key_read: missing keytype
>> debug1: identity file /etc/vcl/vcl.key type 1
>> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> Connection timed out during banner exchange
>>
>> >
>> >
>> > On Wed, Apr 2, 2014 at 11:19 AM, Curtis <[email protected]> wrote:
>> >
>> >> Hi Andy,
>> >>
>> >> Thanks, inline...
>> >>
>> >> On Wed, Apr 2, 2014 at 8:22 AM, Andy Kurth <[email protected]> wrote:
>> >> > I can't tell from just the commands. They look normal. Were there any
>> >> > WARNING messages during the image process prior to the reboot?
>> >> >
>> >> > What error message is reported when you try to ssh from the management
>> >> > node? (Connection timed out, etc) It may be helpful if you send the output
>> >> > from running "ssh -v <win_computer>".
>> >> >
>> >>
>> >> This is what that output looks like:
>> >>
>> >> [root@VCL-PROD:~] $ ssh -vvvv vm79
>> >> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> >> debug1: Reading configuration data /root/.ssh/config
>> >> debug1: Applying options for vm*
>> >> debug1: Reading configuration data /etc/ssh/ssh_config
>> >> debug1: Applying options for *
>> >> debug2: ssh_connect: needpriv 0
>> >> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> >> debug2: fd 3 setting O_NONBLOCK
>> >> debug1: fd 3 clearing O_NONBLOCK
>> >> debug1: Connection established.
>> >> debug3: timeout: 1999 ms remain after connect
>> >> debug1: permanently_set_uid: 0/0
>> >> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> >> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> >> debug3: key_read: missing keytype
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug2: key_type_from_name: unknown key type '-----END'
>> >> debug3: key_read: missing keytype
>> >> debug1: identity file /etc/vcl/vcl.key type 1
>> >> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> >> Connection timed out during banner exchange
>> >>
>> >> > To troubleshoot, you'll need to log in as root using the password which was
>> >> > redacted from the vcld.log output. Check the following:
>> >> >
>> >> > Is the Cygwin SSHD service running? If not, try to start it. If you get
>> >> > an error related to incorrect credentials, then something went wrong when
>> >> > root's password was set early on in the image capture process.
>> >>
>> >> It's usually hung, i.e. it won't respond to commands.
>> >>
>> >> If I log in to the VM on its console (with virt-manager), sshd
>> >> can't be restarted from the Windows Services console or with cygrunsrv,
>> >> but if I kill it with taskkill and then start it, it starts up fine.
>> >>
>> >> Could it have something to do with long logon times?
>> >>
>> >> >
>> >> > If SSHD is running, it could be a firewall problem. Try simply turning off
>> >> > the firewall temporarily on the Windows computer and try to ssh from the
>> >> > management node.
>> >>
>> >> The Windows firewall is not on, or at least it says it's not on. It's
>> >> turned off in the image.
>> >>
>> >> >
>> >> > If the firewall isn't the problem, something isn't configured correctly
>> >> > with the sshd service. While logged in as root, you can try running
>> >> > C:\cygwin\root\VCL\Scripts\update_cygwin.cmd. This gets run automatically
>> >> > when an image is loaded; it configures sshd correctly and starts the
>> >> > service. If running this solves the problem, then you'll have to figure
>> >> > out which commands or changes made by this script fixed it. If possible,
>> >> > take a snapshot of the computer before running this script so that you
>> >> > can revert to the broken state; that will make it easier to narrow down
>> >> > the problem.
>> >> >
>> >>
>> >> OK, I will give update_cygwin.cmd a shot.
>> >>
>> >> Thanks,
>> >> Curtis.
>> >>
>> >> > -Andy
>> >> >
>> >> > On Tue, Apr 1, 2014 at 6:28 PM, Curtis <[email protected]> wrote:
>> >> >
>> >> >> On Tue, Apr 1, 2014 at 4:16 PM, Curtis <[email protected]> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > We are having an issue with some of our images: when we try to
>> >> >> > create a new image from an existing image, everything goes OK until
>> >> >> > the virtual machine is rebooted. After the reboot, sshd does not
>> >> >> > start up and the imaging process fails.
>> >> >> >
>> >> >> > Anyone have any thoughts? I'm fairly sure it has something to do with
>> >> >> > the various commands that are run on the image once an image creation
>> >> >> > process starts.
>> >> >>
>> >> >> Also, this gist has all the commands that are being run:
>> >> >>
>> >> >> https://gist.github.com/curtisgithub/6117a73b47e994d9be03
>> >> >>
>> >> >> But I'm not much of a Windows administrator -- does anyone see
>> >> >> anything unusual in that gist that might be causing issues? Perhaps
>> >> >> something with the root logon or password?
>> >> >>
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Curtis.
>> >> >> >
>> >> >> > --
>> >> >> > Twitter: @serverascode
>> >> >> > Blog: serverascode.com
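Both ssh -vvvv runs in this thread show the TCP connection succeeding ("debug1: Connection established.") and then failing with "Connection timed out during banner exchange". In other words, the hung sshd accepts the connection but never sends its SSH identification string. The following is a minimal sketch of a probe that checks for exactly that state from the management node without involving the ssh client or keys. It is a hypothetical helper and not part of VCL; the name probe_ssh_banner, its "ok"/"no-banner"/"no-connect" labels, and the 10-second default are assumptions. It relies only on the SSH transport rule (RFC 4253, section 4.2) that the server sends a line starting with "SSH-", possibly preceded by other lines, each at most 255 characters. A single deadline covers both the connect and the banner wait, loosely mirroring the "timeout: N ms remain after connect" line in the debug output.

```python
import socket
import time

# RFC 4253 section 4.2: identification lines are at most 255 chars incl. CR LF.
MAX_LINE = 255


def probe_ssh_banner(host, port=22, timeout=10.0):
    """Hypothetical helper: report whether an SSH server sends its banner.

    Returns (state, detail), where state is one of:
      "no-connect" - the TCP connection itself failed (refused, unreachable, ...)
      "no-banner"  - TCP connected, but no "SSH-" line arrived before the deadline
      "ok"         - a banner line arrived; detail is that line
    """
    # One overall budget for connect + banner, like ssh's ConnectTimeout.
    deadline = time.monotonic() + timeout
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "no-connect", str(exc)

    with sock:
        buf = b""
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return "no-banner", f"no banner within {timeout} s"
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(256)
            except socket.timeout:
                # Connected, but sshd never spoke: the "hung" case in this thread.
                return "no-banner", f"no banner within {timeout} s"
            if not chunk:
                return "no-banner", "connection closed before banner"
            buf += chunk
            # Servers may send other lines first; look for the "SSH-" line.
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                if line.startswith(b"SSH-"):
                    return "ok", line.rstrip(b"\r").decode("ascii", "replace")
            if len(buf) > MAX_LINE:
                return "no-banner", "overlong line; probably not an SSH server"
```

Run from the management node as, for example, print(probe_ssh_banner("vm79")). If the diagnosis in this thread is right, a VM in the hung state should report "no-banner", and the same VM should report "ok" after the taskkill-and-restart workaround. Unlike ssh, the probe never gets as far as key exchange, so the vcl.key / vcl.key-cert differences cannot affect its result.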
