Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
For the record, once I added a new storage domain the Data center came up.

So in the end, this seems to have been due to known bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1160667
https://bugzilla.redhat.com/show_bug.cgi?id=1160423

Effectively, for hosts with static/manual IP addressing (i.e. not DHCP), the DNS and default route information are not set up correctly by hosted-engine-setup. I'm not sure why that's not considered a higher-priority bug (e.g. a blocker for 3.5.2?), since I believe the most typical configuration for servers is static IP addressing.

All seems to be working now. Many thanks to Simone for the invaluable assistance.

-Bob

On Mar 10, 2015 2:29 PM, Bob Doolittle b...@doolittle.us.com wrote:

On 03/10/2015 10:20 AM, Simone Tiraboschi wrote:

- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Simone Tiraboschi stira...@redhat.com
Cc: users-ovirt users@ovirt.org
Sent: Tuesday, March 10, 2015 2:40:13 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)

On 03/10/2015 04:58 AM, Simone Tiraboschi wrote:

- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Simone Tiraboschi stira...@redhat.com
Cc: users-ovirt users@ovirt.org
Sent: Monday, March 9, 2015 11:48:03 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)

On 03/09/2015 02:47 PM, Bob Doolittle wrote:

Resending with CC to list (and an update).

On 03/09/2015 01:40 PM, Simone Tiraboschi wrote:

- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Simone Tiraboschi stira...@redhat.com
Cc: users-ovirt users@ovirt.org
Sent: Monday, March 9, 2015 6:26:30 PM
Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)

...

OK, I've started over. Simply removing the storage domain was insufficient; the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh, starting with re-installing the OS on my host.

I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem:

[ INFO ] Engine replied: DB Up!Welcome to Health Status!
Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add ovirt-vm to the manager
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused

I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'r...@xion2.smartcity.net'

Can you please attach host-deploy logs from the engine VM?

OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable", which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

It seems that your host is now able to resolve those addresses, but it's not able to connect over HTTP. On your hosts, some of them resolve as
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 11/03/15 16:37, Bob Doolittle wrote:

For the record, once I added a new storage domain the Data center came up.

So in the end, this seems to have been due to known bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1160667
https://bugzilla.redhat.com/show_bug.cgi?id=1160423

Effectively, for hosts with static/manual IP addressing (i.e. not DHCP), the DNS and default route information are not set up correctly by hosted-engine-setup. I'm not sure why that's not considered a higher priority bug (e.g. blocker for 3.5.2?) since I believe the most typical configuration for servers is static IP addressing.

+1

--
Kind regards
Sven Kieske
System Administrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Managing Director: Robert Meyer
Tax No.: 331/5721/1033, VAT ID: DE814773217, HRA 6640, AG Bad Oeynhausen
General Partner: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
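The two symptoms named above (no default route, no DNS configuration) can be checked mechanically before and after a deploy. The following is only a minimal sketch, not part of oVirt or hosted-engine-setup: it parses text in the format printed by `ip route` and the format of `/etc/resolv.conf`. The function names and the gateway address 172.16.0.1 are assumptions made up for illustration; the failing routing table is the one posted in this thread.

```python
def has_default_route(ip_route_output: str) -> bool:
    """Return True if any line of `ip route` output is a default route."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            return True
    return False


def nameservers(resolv_conf_text: str) -> list:
    """Return the addresses from `nameserver` lines in resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Commented-out lines start with '#' or ';' and are skipped here
        # because their first field is not the literal word "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Routing table posted in this thread after the failed host-deploy.
FAILED_HOST_ROUTES = """\
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58
"""

# Hypothetical healthy table: the gateway 172.16.0.1 is invented for this example.
HEALTHY_ROUTES = "default via 172.16.0.1 dev ovirtmgmt\n" + FAILED_HOST_ROUTES

if __name__ == "__main__":
    print("failed host has default route:", has_default_route(FAILED_HOST_ROUTES))
    print("healthy host has default route:", has_default_route(HEALTHY_ROUTES))
```

Saving the output of `ip route` and the contents of `/etc/resolv.conf` before running the deploy, and feeding both snapshots through these helpers afterwards, would show whether the configuration was lost during the deploy or was never there.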
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
- Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Tuesday, March 10, 2015 7:29:44 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/10/2015 10:20 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Tuesday, March 10, 2015 2:40:13 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/10/2015 04:58 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/09/2015 02:47 PM, Bob Doolittle wrote: Resending with CC to list (and an update). On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 6:26:30 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) ... OK, I've started over. Simply removing the storage domain was insufficient, the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh starting with re-installing the OS on my host. I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem: [ INFO ] Engine replied: DB Up!Welcome to Health Status! 
Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add ovirt-vm to the manager
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused

I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'r...@xion2.smartcity.net'

Can you please attach host-deploy logs from the engine VM?

OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable", which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

It seems that your host is now able to resolve those addresses, but it's not able to connect over HTTP. On your hosts, some of them resolve to IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to get?

Can you please check your network configuration before and after host-deploy?

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night. In order to give you the config before host-deploy (or, apparently, for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.
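Simone's observation that some mirror hostnames resolve to IPv6 addresses can be checked from the engine VM before retrying the yum operation. What follows is a rough sketch under stated assumptions rather than anything oVirt provides: `resolve` and `split_by_family` are hypothetical helper names, and the only library calls used are the standard `socket.getaddrinfo` and `ipaddress.ip_address`.

```python
import ipaddress
import socket


def resolve(hostname, port=80):
    """Return the distinct addresses getaddrinfo reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def split_by_family(addresses):
    """Split textual IP addresses into (ipv4_list, ipv6_list)."""
    v4, v6 = [], []
    for text in addresses:
        # Drop an IPv6 zone suffix such as "%eth0" before parsing.
        ip = ipaddress.ip_address(text.split("%", 1)[0])
        (v4 if ip.version == 4 else v6).append(text)
    return v4, v6
```

Calling `split_by_family(resolve(name))` for a mirror hostname taken from the host-deploy log shows which address families are on offer. If IPv6 addresses are listed but the machine has no IPv6 route, a "Network is unreachable" error is a plausible result; fetching the same URL with `curl -4` and then `curl -6` would confirm which family fails.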
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
- Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Tuesday, March 10, 2015 2:40:13 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/10/2015 04:58 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/09/2015 02:47 PM, Bob Doolittle wrote: Resending with CC to list (and an update). On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 6:26:30 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) ... OK, I've started over. Simply removing the storage domain was insufficient, the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh starting with re-installing the OS on my host. I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem: [ INFO ] Engine replied: DB Up!Welcome to Health Status! Enter the name of the cluster to which you want to add the host (Default) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs. [ ERROR ] Unable to add ovirt-vm to the manager Please shutdown the VM allowing the system to launch it as a monitored service. 
The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused

I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'r...@xion2.smartcity.net'

Can you please attach host-deploy logs from the engine VM?

OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable", which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

It seems that your host is now able to resolve those addresses, but it's not able to connect over HTTP. On your hosts, some of them resolve to IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to get?

Can you please check your network configuration before and after host-deploy?

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night. In order to give you the config before host-deploy (or, apparently, for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.

Here's the host config after the failed host-deploy:

Host/Node:

# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

You are missing a default gateway, and that is the cause of the issue. Are you sure that it was properly configured before trying to deploy that host?

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/10/2015 04:58 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/09/2015 02:47 PM, Bob Doolittle wrote: Resending with CC to list (and an update). On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 6:26:30 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) ... OK, I've started over. Simply removing the storage domain was insufficient, the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh starting with re-installing the OS on my host. I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem: [ INFO ] Engine replied: DB Up!Welcome to Health Status! Enter the name of the cluster to which you want to add the host (Default) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs. [ ERROR ] Unable to add ovirt-vm to the manager Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused [ INFO ] Stage: Clean up [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused I've attached my engine log and the ovirt-hosted-engine-setup log. 
I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'r...@xion2.smartcity.net'

Can you please attach host-deploy logs from the engine VM?

OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable", which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

It seems that your host is now able to resolve those addresses, but it's not able to connect over HTTP. On your hosts, some of them resolve to IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to get?

Can you please check your network configuration before and after host-deploy?

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night. In order to give you the config before host-deploy (or, apparently, for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.

Here's the host config after the failed host-deploy:

Host/Node:

# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovirtmgmt state UP group default qlen 1000
    link/ether b8:ca:3a:79:22:12 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::baca:3aff:fe79:2212/64 scope link
       valid_lft forever preferred_lft forever
3: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
On 03/10/2015 10:20 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Tuesday, March 10, 2015 2:40:13 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/10/2015 04:58 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/09/2015 02:47 PM, Bob Doolittle wrote: Resending with CC to list (and an update). On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 6:26:30 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) ... OK, I've started over. Simply removing the storage domain was insufficient, the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh starting with re-installing the OS on my host. I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem: [ INFO ] Engine replied: DB Up!Welcome to Health Status! Enter the name of the cluster to which you want to add the host (Default) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs. 
[ ERROR ] Unable to add ovirt-vm to the manager
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
[ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
[ INFO ] Stage: Clean up
[ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused

I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host 217016bb-fdcd-4344-a0ca-4548262d10a8, ovirt-vm.: java.io.IOException: Command returned failure code 1 during SSH session 'r...@xion2.smartcity.net'

Can you please attach host-deploy logs from the engine VM?

OK, attached. Like I said, it looks to me like a name-resolution issue during the yum update on the engine. I think I've fixed that, but do you have a better suggestion for cleaning up and re-deploying other than installing the OS on my host and starting all over again?

I just finished starting over from scratch, starting with OS installation on my host/node, and wound up with a very similar problem: the engine couldn't reach the hosts during the yum operation. But this time the error was "Network is unreachable", which is weird, because I can ssh into the engine and ping many of those hosts after the operation has failed.

Here's my latest host-deploy log from the engine. I'd appreciate any clues.

It seems that your host is now able to resolve those addresses, but it's not able to connect over HTTP. On your hosts, some of them resolve to IPv6 addresses; can you please try to use curl to fetch one of the files that it wasn't able to get?

Can you please check your network configuration before and after host-deploy?

I can give you the network configuration after host-deploy, at least for the host/Node. The engine won't start for me this morning, after I shut down the host for the night. In order to give you the config before host-deploy (or, apparently, for the engine), I'll have to re-install the OS on the host and start again from scratch. Obviously I'd rather not do that unless absolutely necessary.

Here's the host config after the failed host-deploy:

Host/Node:

# ip route
169.254.0.0/16 dev ovirtmgmt scope link metric 1007
172.16.0.0/16 dev ovirtmgmt proto kernel scope link src 172.16.0.58

You are missing a default gateway, and that is the cause of the issue. Are you sure that it was properly configured before trying to deploy that host?

It should have been; it was a fresh OS install. So I'm starting again, and keeping careful records of my network config. Here
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state)
- Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 11:48:03 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (The VDSM host was found in a failed state) On 03/09/2015 02:47 PM, Bob Doolittle wrote: Resending with CC to list (and an update). On 03/09/2015 01:40 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 6:26:30 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) On 03/09/2015 12:53 PM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Cc: users-ovirt users@ovirt.org Sent: Monday, March 9, 2015 12:48:37 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) On 03/09/2015 07:12 AM, Simone Tiraboschi wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: Simone Tiraboschi stira...@redhat.com Sent: Monday, March 9, 2015 12:02:49 PM Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed) On Mar 9, 2015 5:23 AM, Simone Tiraboschi stira...@redhat.com wrote: - Original Message - From: Bob Doolittle b...@doolittle.us.com To: users-ovirt users@ovirt.org Sent: Friday, March 6, 2015 9:21:20 PM Subject: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... 
SSH has failed)

Hi,

I'm following the instructions here: http://www.ovirt.org/Hosted_Engine_Howto

My self-hosted install failed near the end:

To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup
(4) Destroy VM and abort setup

(1, 2, 3, 4)[1]: 1
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
Enter the name of the cluster to which you want to add the host (Default) [Default]:
[ ERROR ] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
[ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150306135624.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination

I can ssh into the engine VM both locally and remotely. There is no /root/.ssh directory, however. Did I need to set that up somehow?

It's the engine that needs to open an SSH connection to the host, calling it by its hostname. So please be sure that you can SSH to the host from the engine using its hostname and not its IP address.

I'm assuming this should be a password-less login (key-based authentication?).

Yes, it is.

As what user?

root

OK, I see a couple of problems. First off, I didn't have my deploying-host hostname in the hosts map for my engine.

This is enough by itself to make the deploy procedure fail. If possible, we recommend relying on a DNS infrastructure, especially if you are deploying more than one host.

OK, I've started over. Simply removing the storage domain was insufficient; the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh, starting with re-installing the OS on my host.
I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem: [ INFO ] Engine replied: DB Up!Welcome to Health Status! Enter the name of the cluster to which you want to add the host (Default) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs. [ ERROR ] Unable to add ovirt-vm to the manager Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused [ INFO ] Stage: Clean up [ ERROR ] Failed to execute stage