[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
Update:

Removing all gluster mounts from /etc/fstab solves the boot problem. I am
then able to mount all gluster bricks manually and bring up glusterd
properly. I'm trying to deploy the hosted engine VM again now, but I
suspect the problem there, as well, is that it's trying to mount a gluster
brick before bringing up networking.
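
If I end up keeping the mounts in /etc/fstab, options along these lines
might avoid the boot hang (the device name is from the boot error and the
mount point is just the wizard's usual default, so treat both as examples):

  # brick filesystem: don't block boot if the LV isn't active yet
  /dev/gluster_vg_vdb/gluster_lv_data /gluster_bricks/data xfs defaults,nofail,x-systemd.device-timeout=30s 0 0
  # any glusterfs (fuse) entries would additionally want _netdev so they
  # get ordered after the network comes up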

On Thu, Feb 7, 2019 at 8:37 AM feral  wrote:

> Which logs?
>
> The nodes hang on boot at "Started Flush Journal to Persistent Storage".
> This would be followed by the gluster mounts coming up (before networking,
> which still doesn't make sense to me...), but they of course all fail as
> networking is down.
> The gluster logs, post node failure, simply state that all connection
> attempts are failing (because networking is down).
>
> I managed to get networking online manually and push logs OUT (can't
> start sshd, as that causes a reboot).
> https://drive.google.com/open?id=1Kdb2pRUC0O-5u3ZkA3KT0qvAQIpv9SZm
>
> It seems to me that some vital systemd service must be failing on the
> nodes (and possibly that's what's happening on the VM as well?).
>
> On Thu, Feb 7, 2019 at 8:25 AM Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Thu, Feb 7, 2019 at 5:19 PM feral  wrote:
>>
>>> I've never managed to get a connection to the engine via VNC/Spice
>>> (works fine for my other hypervisors...)
>>>
>>> As I said, the network setup is super simple. All three nodes have one
>>> interface each (eth0). They are all set with static IPs, with matching
>>> DHCP reservations on the DHCP server and matching DNS. All nodes have
>>> entries in /etc/hosts on each machine. The IPs are 192.168.1.195-197,
>>> and the engine VM gets 192.168.1.198. During the engine deployment, the
>>> VM does come up on .198. I can ping it and ssh into it, but at some
>>> point, the connection drops.
>>> So I'm not relying on DHCP or DNS at all. The VM comes up where expected
>>> for a while, then it reboots to be transferred to the gluster_engine
>>> storage, and that's where it drops offline and never comes back.
>>>
>>> I did another round of deployment tests last night and discovered that
>>> the nodes all fail to boot immediately after the gluster deployment (not
>>> after the VM deployment, as I mistakenly stated earlier). So the nodes
>>> get into a bad state during the gluster deployment. They stay online
>>> just fine and gluster works perfectly, until a node tries to reboot
>>> (which it fails to do).
>>>
>>
>> So I suggest focusing on the gluster deployment; can you please share
>> the gluster logs?
>>
>>
>>>
>>> Also, the networking I'm using is identical to my oVirt 4.2 setup. I'm
>>> using the same MAC addresses, IPs, and hostnames (the 4.2 cluster is
>>> offline while I'm trying 4.3). The configurations are identical other
>>> than the version of ovirt-node.
>>>
>>> On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
>>> wrote:
>>>


 On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:

> I have no idea what's wrong at this point. Very vanilla install of 3
> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
> deployment, takes hours, eventually fails with:
>
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, 
> \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, 
> \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, 
> \"live-data\":
> true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, 
> \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", 

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
Which logs?

The nodes hang on boot at "Started Flush Journal to Persistent Storage".
This would be followed by the gluster mounts coming up (before networking,
which still doesn't make sense to me...), but they of course all fail as
networking is down.
The gluster logs, post node failure, simply state that all connection
attempts are failing (because networking is down).

I managed to get networking online manually and push logs OUT (can't start
sshd, as that causes a reboot).
https://drive.google.com/open?id=1Kdb2pRUC0O-5u3ZkA3KT0qvAQIpv9SZm
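
Roughly, that looked like this (the gateway and the copy destination are
placeholders; the interface and addresses are from the static setup
described further down):

  ip link set eth0 up
  ip addr add 192.168.1.195/24 dev eth0
  ip route add default via 192.168.1.1               # gateway is a guess
  scp -r /var/log root@192.168.1.50:/tmp/node-logs   # any reachable box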

It seems to me that some vital systemd service must be failing on the nodes
(and possibly that's what's happening on the VM as well?).
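
Once I get a shell, something like this should point at the failing unit
(all stock systemd tooling, nothing oVirt-specific):

  systemctl --failed --no-pager
  journalctl -b -p err --no-pager | tail -n 50
  systemd-analyze critical-chain network-online.target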

On Thu, Feb 7, 2019 at 8:25 AM Simone Tiraboschi 
wrote:

>
>
> On Thu, Feb 7, 2019 at 5:19 PM feral  wrote:
>
>> I've never managed to get a connection to the engine via VNC/Spice (works
>> fine for my other hypervisors...)
>>
>> As I said, the network setup is super simple. All three nodes have one
>> interface each (eth0). They are all set with static IPs, with matching
>> DHCP reservations on the DHCP server and matching DNS. All nodes have
>> entries in /etc/hosts on each machine. The IPs are 192.168.1.195-197, and
>> the engine VM gets 192.168.1.198. During the engine deployment, the VM
>> does come up on .198. I can ping it and ssh into it, but at some point,
>> the connection drops.
>> So I'm not relying on DHCP or DNS at all. The VM comes up where expected
>> for a while, then it reboots to be transferred to the gluster_engine
>> storage, and that's where it drops offline and never comes back.
>>
>> I did another round of deployment tests last night and discovered that
>> the nodes all fail to boot immediately after the gluster deployment (not
>> after the VM deployment, as I mistakenly stated earlier). So the nodes
>> get into a bad state during the gluster deployment. They stay online just
>> fine and gluster works perfectly, until a node tries to reboot (which it
>> fails to do).
>>
>
> So I suggest focusing on the gluster deployment; can you please share
> the gluster logs?
>
>
>>
>> Also, the networking I'm using is identical to my oVirt 4.2 setup. I'm
>> using the same MAC addresses, IPs, and hostnames (the 4.2 cluster is
>> offline while I'm trying 4.3). The configurations are identical other
>> than the version of ovirt-node.
>>
>> On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
>> wrote:
>>
>>>
>>>
>>> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>>>
 I have no idea what's wrong at this point. Very vanilla install of 3
 nodes. Run the Hyperconverged wizard, completes fine. Run the engine
 deployment, takes hours, eventually fails with:

 [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
 [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
 true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
 "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
 "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
 "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
 \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
 (Wed Feb 6 11:44:44
 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
 11:44:44
 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
 \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
 {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
 \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
 \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
 "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
 true, \"extra\":
 \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
 (Wed Feb 6 11:44:44
 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
 11:44:44
 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
 \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
 {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
 \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
 \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
 [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt
 level]
 [ INFO ] changed: [localhost]
 [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
 [ INFO ] ok: [localhost]
 [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
 running]
 [ INFO ] skipping: [localhost]
 [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
 address]
 [ INFO ] 

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread Simone Tiraboschi
On Thu, Feb 7, 2019 at 5:19 PM feral  wrote:

> I've never managed to get a connection to the engine via VNC/Spice (works
> fine for my other hypervisors...)
>
> As I said, the network setup is super simple. All three nodes have one
> interface each (eth0). They are all set with static IPs, with matching
> DHCP reservations on the DHCP server and matching DNS. All nodes have
> entries in /etc/hosts on each machine. The IPs are 192.168.1.195-197, and
> the engine VM gets 192.168.1.198. During the engine deployment, the VM
> does come up on .198. I can ping it and ssh into it, but at some point,
> the connection drops.
> So I'm not relying on DHCP or DNS at all. The VM comes up where expected
> for a while, then it reboots to be transferred to the gluster_engine
> storage, and that's where it drops offline and never comes back.
>
> I did another round of deployment tests last night and discovered that
> the nodes all fail to boot immediately after the gluster deployment (not
> after the VM deployment, as I mistakenly stated earlier). So the nodes get
> into a bad state during the gluster deployment. They stay online just fine
> and gluster works perfectly, until a node tries to reboot (which it fails
> to do).
>

So I suggest focusing on the gluster deployment; can you please share the
gluster logs?
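
For example, something like this from each node should capture the
relevant bits:

  gluster volume info
  gluster volume status
  journalctl -u glusterd -b --no-pager > /tmp/glusterd-$(hostname).log
  tar czf /tmp/gluster-logs-$(hostname).tar.gz /var/log/glusterfs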


>
> Also, the networking I'm using is identical to my oVirt 4.2 setup. I'm
> using the same MAC addresses, IPs, and hostnames (the 4.2 cluster is
> offline while I'm trying 4.3). The configurations are identical other
> than the version of ovirt-node.
>
> On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
> wrote:
>
>>
>>
>> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>>
>>> I have no idea what's wrong at this point. Very vanilla install of 3
>>> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
>>> deployment, takes hours, eventually fails with:
>>>
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
>>> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
>>> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
>>> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
>>> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
>>> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
>>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>>> (Wed Feb 6 11:44:44
>>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>>> 11:44:44
>>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
>>> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
>>> true, \"extra\":
>>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>>> (Wed Feb 6 11:44:44
>>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>>> 11:44:44
>>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
>>> [ INFO ] changed: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
>>> [ INFO ] ok: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
>>> running]
>>> [ INFO ] skipping: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
>>> address]
>>> [ INFO ] changed: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
>>> stats]
>>> [ INFO ] changed: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
>>> [ INFO ] ok: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
>>> address from VDSM stats]
>>> [ INFO ] ok: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
>>> [ INFO ] ok: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is
>>> different from engine's he_fqdn resolved IP]
>>> [ INFO ] skipping: [localhost]
>>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason
>>> the engine didn't started]
>>> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
>>> engine failed to start inside the 

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread feral
I've never managed to get a connection to the engine via VNC/Spice (works
fine for my other hypervisors...)

As I said, the network setup is super simple. All three nodes have one
interface each (eth0). They are all set with static IPs, with matching
DHCP reservations on the DHCP server and matching DNS. All nodes have
entries in /etc/hosts on each machine. The IPs are 192.168.1.195-197, and
the engine VM gets 192.168.1.198. During the engine deployment, the VM
does come up on .198. I can ping it and ssh into it, but at some point,
the connection drops.
So I'm not relying on DHCP or DNS at all. The VM comes up where expected
for a while, then it reboots to be transferred to the gluster_engine
storage, and that's where it drops offline and never comes back.

I did another round of deployment tests last night and discovered that the
nodes all fail to boot immediately after the gluster deployment (not after
the VM deployment, as I mistakenly stated earlier). So the nodes get into a
bad state during the gluster deployment. They stay online just fine and
gluster works perfectly, until a node tries to reboot (which it fails to
do).
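
For the record, a quick way to see what the gluster deployment put into
fstab, and how systemd ordered the resulting mount units, is something like
this (the /gluster_bricks/engine path is just the wizard's usual default,
so adjust as needed):

  grep -i gluster /etc/fstab
  systemctl list-dependencies --after gluster_bricks-engine.mount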

Also, the networking I'm using is identical to my oVirt 4.2 setup. I'm
using the same MAC addresses, IPs, and hostnames (the 4.2 cluster is
offline while I'm trying 4.3). The configurations are identical other than
the version of ovirt-node.

On Thu, Feb 7, 2019 at 12:15 AM Simone Tiraboschi 
wrote:

>
>
> On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:
>
>> I have no idea what's wrong at this point. Very vanilla install of 3
>> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
>> deployment, takes hours, eventually fails with:
>>
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
>> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
>> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
>> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
>> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
>> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>> (Wed Feb 6 11:44:44
>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>> 11:44:44
>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
>> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
>> true, \"extra\":
>> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
>> (Wed Feb 6 11:44:44
>> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
>> 11:44:44
>> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
>> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
>> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
>> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
>> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
>> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
>> running]
>> [ INFO ] skipping: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
>> address]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
>> stats]
>> [ INFO ] changed: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP
>> address from VDSM stats]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
>> [ INFO ] ok: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is different
>> from engine's he_fqdn resolved IP]
>> [ INFO ] skipping: [localhost]
>> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason
>> the engine didn't started]
>> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
>> engine failed to start inside the engine VM; please check engine.log."}
>>
>> ---
>>
>> I can't check the engine.log as I can't connect to the VM once this
>> failure occurs. I can ssh in prior to the VM being moved to gluster
>> 

[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-07 Thread Simone Tiraboschi
On Wed, Feb 6, 2019 at 11:07 PM feral  wrote:

> I have no idea what's wrong at this point. Very vanilla install of 3
> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
> deployment, takes hours, eventually fails with:
>
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
> true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
> running]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
> stats]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address
> from VDSM stats]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is different
> from engine's he_fqdn resolved IP]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason
> the engine didn't started]
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> engine failed to start inside the engine VM; please check engine.log."}
>
> ---
>
> I can't check the engine.log as I can't connect to the VM once this
> failure occurs. I can ssh in prior to the VM being moved to gluster
> storage, but as soon as it starts doing so, the VM never comes back online.
>

{\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
\"up\", \"detail\": \"Up\"}

This means that the engine VM is up at the virt level, but we cannot reach
the engine over HTTP for a liveness check.
A network issue, a wrong DHCP reservation, a bad name resolution, or
something like that could be the reason.
I suggest trying to connect to the engine VM via VNC or the serial console
to check what's wrong there.
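
For example (replace the engine FQDN; the URL is roughly what the
liveliness check polls):

  # set a temporary console password, then attach to the VM console
  hosted-engine --add-console-password
  hosted-engine --console
  # the liveliness check is roughly equivalent to polling this URL
  curl http://<engine-fqdn>/ovirt-engine/services/health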



>
>
> --
> _
> Fact:
> 1. Ninjas are mammals.
> 2. Ninjas fight ALL the time.
> 3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZLKM5S4AZ4YOGE4A3GJ363KJFAAHKII2/


[ovirt-users] Re: ovirt-node-4.3, deployment fails when moving hosted engine vm to gluster storage.

2019-02-06 Thread feral
Update: when the node is rebooted, it fails with "timed out waiting for
device dev-gluster_vg_vdb-gluster_lv_data.device".
The node also has no networking online, which is probably the cause of the
gluster failure.
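
If the node drops to the emergency shell, something along these lines (the
VG/LV names are taken from that error) should show whether it's LVM
activation or the mount itself that is stuck:

  systemctl list-jobs
  lvs -o +lv_active gluster_vg_vdb
  vgchange -ay gluster_vg_vdb
  journalctl -b --no-pager | grep -iE 'gluster|lvm' | tail -n 50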

On Wed, Feb 6, 2019 at 2:04 PM feral  wrote:

> I have no idea what's wrong at this point. Very vanilla install of 3
> nodes. Run the Hyperconverged wizard, completes fine. Run the engine
> deployment, takes hours, eventually fails with:
>
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check engine VM health]
> [ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed":
> true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta":
> "0:00:00.340985", "end": "2019-02-06 11:44:48.836431", "rc": 0, "start":
> "2019-02-06 11:44:48.495446", "stderr": "", "stderr_lines": [], "stdout":
> "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}",
> "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\":
> true, \"extra\":
> \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=12994
> (Wed Feb 6 11:44:44
> 2019)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=12995 (Wed Feb 6
> 11:44:44
> 2019)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStop\\nstopped=False\\n\",
> \"hostname\": \"ovirt-431.localdomain\", \"host-id\": 1, \"engine-status\":
> {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\":
> \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false,
> \"maintenance\": false, \"crc32\": \"5474927a\", \"local_conf_timestamp\":
> 12995, \"host-ts\": 12994}, \"global_maintenance\": false}"]}
> [ INFO ] TASK [oVirt.hosted-engine-setup : Check VM status at virt level]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if engine VM is not
> running]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get VDSM's target engine VM
> stats]
> [ INFO ] changed: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Convert stats to JSON format]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Get target engine VM IP address
> from VDSM stats]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : debug]
> [ INFO ] ok: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail if Engine IP is different
> from engine's he_fqdn resolved IP]
> [ INFO ] skipping: [localhost]
> [ INFO ] TASK [oVirt.hosted-engine-setup : Fail is for any other reason
> the engine didn't started]
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The
> engine failed to start inside the engine VM; please check engine.log."}
>
> ---
>
> I can't check the engine.log as I can't connect to the VM once this
> failure occurs. I can ssh in prior to the VM being moved to gluster
> storage, but as soon as it starts doing so, the VM never comes back online.
>
>
> --
> _
> Fact:
> 1. Ninjas are mammals.
> 2. Ninjas fight ALL the time.
> 3. The purpose of the ninja is to flip out and kill people.
>


-- 
_
Fact:
1. Ninjas are mammals.
2. Ninjas fight ALL the time.
3. The purpose of the ninja is to flip out and kill people.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NBBBT25VZDO5EXUKFUE2HUALDLVSAROB/