On Wed, Apr 2, 2014 at 3:35 PM, Andy Kurth <[email protected]> wrote:
> Your output looks almost the same as when I successfully ssh in to a
> working VM here.  The only difference I can see up to when yours times
> out is that the last line refers to "vcl.key-cert":
>
> Yours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type 1
> debug1: identity file /etc/vcl/vcl.key-cert type -1
>
> Ours:
> debug3: key_read: missing keytype
> debug1: identity file /etc/vcl/vcl.key type -1
>
> While logged in as root, you can try stopping the sshd service and
> then from a Cygwin shell, run:
> /usr/sbin/sshd.exe -ddd
>
> Then try to connect from the management node.  The debugging output
> from sshd.exe should be displayed in the Cygwin window.  What does it
> look like?  I'll compare it with one of ours.  You can also try the
> same on a working computer and compare the output.
>

Every time I get one of these hung sshd's it's the same thing -- I
can't restart sshd with cygrunsrv or M$ services, but I can taskkill
it and then start it and it works fine.
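
A rough sketch of that workaround as a script, run from an elevated Cygwin
shell on the VM. It assumes the service is registered as "sshd" and that the
process image is "sshd.exe", neither of which is confirmed in this thread.
The function name restart_hung_sshd and the DRY_RUN switch are made up for
the example.

```shell
#!/bin/bash
# Sketch of the manual workaround: force-kill a hung sshd, then start the service.
# Assumptions (not confirmed in this thread): the Cygwin service is named "sshd"
# and its process image is "sshd.exe". restart_hung_sshd and DRY_RUN are
# hypothetical names used only for this example.

restart_hung_sshd() {
    local service="${1:-sshd}"
    local run=""

    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # A hung sshd ignores a normal service stop, so kill the process outright.
    $run taskkill /F /IM "${service}.exe"

    # Start the service again through cygrunsrv.
    $run cygrunsrv --start "$service"
}
```

Running "DRY_RUN=1 restart_hung_sshd" only prints the two commands, which is
a safe way to check them before running them for real.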

I did grab the output of an sshd -ddd session, but it just shows a
good working connection, because once sshd is killed and restarted it
works fine.
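
Since the failure on the management node is always "Connection timed out
during banner exchange", one way to tell a hung sshd from a healthy one
without a full ssh session might be to check whether the server sends its
banner at all. A minimal sketch, assuming bash and coreutils timeout are
available on the management node. check_ssh_banner is a hypothetical helper
name, not part of VCL.

```shell
#!/bin/bash
# Sketch: report whether an SSH server sends its banner within a time limit.
# check_ssh_banner is a hypothetical helper; host, port and timeout are
# whatever the caller passes in (port defaults to 22, timeout to 5 seconds).

check_ssh_banner() {
    local host="$1"
    local port="${2:-22}"
    local secs="${3:-5}"
    local banner

    # Open a TCP connection via bash's /dev/tcp and read the first line the
    # server sends. timeout kills the read if the server accepts the
    # connection but never sends anything, which is the hung case.
    banner="$(timeout "$secs" bash -c \
        'exec 3<>"/dev/tcp/$0/$1" && IFS= read -r line <&3 && printf "%s\n" "$line"' \
        "$host" "$port" 2>/dev/null | tr -d '\r')"

    case "$banner" in
        SSH-*)
            echo "banner ok: $banner"
            ;;
        *)
            echo "no banner within ${secs}s"
            return 1
            ;;
    esac
}
```

For example, "check_ssh_banner vm79 22 10" gives a quick yes/no for the VM
from this thread. Note that a refused connection is reported the same way as
a missing banner.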

Thanks,
Curtis.

>
> On Wed, Apr 2, 2014 at 3:41 PM, Curtis <[email protected]> wrote:
>>
>> On Wed, Apr 2, 2014 at 11:06 AM, Andy Kurth <[email protected]> wrote:
>> > It looks like ssh on the management node is using a ConnectTimeout value of
>> > 2 seconds:
>> > debug3: timeout: 1999 ms remain after connect
>> >
>> > Does specifying a longer time make a difference?
>> > ssh -o ConnectTimeout=10 -vvvv vm79
>> >
>>
>> No, doesn't seem to change anything. I had only recently set the connect
>> timeout to 2, because I was testing whether I could connect to virtual
>> machines via ssh after a reboot, so it was still at the default when
>> this first started breaking.
>>
>> Below is a session with it set to 10.
>>
>> [root@VCL-PROD:~] $ ssh -vvvv vm79
>> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> debug1: Reading configuration data /root/.ssh/config
>> debug1: Applying options for vm*
>> debug1: Reading configuration data /etc/ssh/ssh_config
>> debug1: Applying options for *
>> debug2: ssh_connect: needpriv 0
>> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> debug2: fd 3 setting O_NONBLOCK
>> debug1: fd 3 clearing O_NONBLOCK
>> debug1: Connection established.
>> debug3: timeout: 10000 ms remain after connect
>> debug1: permanently_set_uid: 0/0
>> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> debug3: key_read: missing keytype
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug3: key_read: missing whitespace
>> debug2: key_type_from_name: unknown key type '-----END'
>> debug3: key_read: missing keytype
>> debug1: identity file /etc/vcl/vcl.key type 1
>> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> Connection timed out during banner exchange
>>
>> >
>> >
>> > On Wed, Apr 2, 2014 at 11:19 AM, Curtis <[email protected]> wrote:
>> >
>> >> Hi Andy,
>> >>
>> >> Thanks, inline...
>> >>
>> >> On Wed, Apr 2, 2014 at 8:22 AM, Andy Kurth <[email protected]> wrote:
>> >> > I can't tell from just the commands.  They look normal.  Were there any
>> >> > WARNING messages during the image process prior to the reboot?
>> >> >
>> >> > What error message is reported when you try to ssh from the management
>> >> > node? (Connection timed out, etc.)  It may be helpful if you send the
>> >> > output from running "ssh -v <win_computer>".
>> >> >
>> >>
>> >> This is what that output looks like:
>> >>
>> >> [root@VCL-PROD:~] $ ssh -vvvv vm79
>> >> OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
>> >> debug1: Reading configuration data /root/.ssh/config
>> >> debug1: Applying options for vm*
>> >> debug1: Reading configuration data /etc/ssh/ssh_config
>> >> debug1: Applying options for *
>> >> debug2: ssh_connect: needpriv 0
>> >> debug1: Connecting to vm79 [10.1.0.195] port 22.
>> >> debug2: fd 3 setting O_NONBLOCK
>> >> debug1: fd 3 clearing O_NONBLOCK
>> >> debug1: Connection established.
>> >> debug3: timeout: 1999 ms remain after connect
>> >> debug1: permanently_set_uid: 0/0
>> >> debug3: Not a RSA1 key file /etc/vcl/vcl.key.
>> >> debug2: key_type_from_name: unknown key type '-----BEGIN'
>> >> debug3: key_read: missing keytype
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug3: key_read: missing whitespace
>> >> debug2: key_type_from_name: unknown key type '-----END'
>> >> debug3: key_read: missing keytype
>> >> debug1: identity file /etc/vcl/vcl.key type 1
>> >> debug1: identity file /etc/vcl/vcl.key-cert type -1
>> >> Connection timed out during banner exchange
>> >>
>> >> > To troubleshoot, you'll need to log in as root using the password which
>> >> > was redacted from the vcld.log output.  Check the following:
>> >> >
>> >> > Is the Cygwin SSHD service running?  If not, try to start it.  If you
>> >> > get an error related to incorrect credentials then something went wrong
>> >> > when root's password was set early on in the image capture process.
>> >>
>> >> It's usually hung, i.e. it won't respond to commands.
>> >>
>> >> If I log in to the VM on its console (with virt-manager), sshd
>> >> can't be restarted from the Windows service console or with cygrunsrv,
>> >> but if I kill it with taskkill and then start it, it starts up fine.
>> >>
>> >> Something to do with long logon times maybe?
>> >>
>> >> >
>> >> > If SSHD is running, it could be a firewall problem.  Try simply turning
>> >> > off the firewall temporarily on the Windows computer and try to ssh from
>> >> > the management node.
>> >>
>> >> The Windows firewall is not on, or at least it says it's not on. It's
>> >> turned off in the image.
>> >>
>> >> >
>> >> > If the firewall isn't the problem, something isn't configured correctly
>> >> > with the sshd service.  While logged in as root, you can try running
>> >> > C:\cygwin\root\VCL\Scripts\update_cygwin.cmd.  This gets run
>> >> > automatically when an image is loaded and configures sshd correctly and
>> >> > starts the service.  If running this solves the problem, then you'll
>> >> > have to figure out which commands or changes made by this script fixed
>> >> > it.  If possible, take a snapshot of the computer before running this
>> >> > script; it will be easier to troubleshoot if you can revert to the
>> >> > broken state in order to narrow down the problem.
>> >> >
>> >>
>> >> Ok will give the update_cygwin.cmd a shot.
>> >>
>> >> Thanks,
>> >> Curtis.
>> >>
>> >> > -Andy
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Apr 1, 2014 at 6:28 PM, Curtis <[email protected]> wrote:
>> >> >
>> >> >> On Tue, Apr 1, 2014 at 4:16 PM, Curtis <[email protected]> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > We are having an issue with some of our images where when we try to
>> >> >> > create a new image from an existing image, everything goes ok until
>> >> >> > the part where the virtual machine is rebooted, and after it's
>> >> >> > rebooted sshd does not start up and the imaging process fails.
>> >> >> >
>> >> >> > Anyone have any thoughts? I'm fairly sure it has something to do with
>> >> >> > the various commands that are run on the image once an image creation
>> >> >> > process starts.
>> >> >>
>> >> >> Also, this gist has all the commands that are being run:
>> >> >>
>> >> >> https://gist.github.com/curtisgithub/6117a73b47e994d9be03
>> >> >>
>> >> >> But I'm not much of a windows administrator -- does anyone see
>> >> >> anything unusual in that gist that might be causing issues? Perhaps
>> >> >> something with the root logon or password?
>> >> >>
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Curtis.
>> >> >> >
>> >> >> > --
>> >> >> > Twitter: @serverascode
>> >> >> > Blog: serverascode.com



-- 
Twitter: @serverascode
Blog: serverascode.com
