Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
Hi Denis,

>
> That sounds really strange; I would suspect storage problems or
> something similar. As I told you earlier, the output of --vm-status may
> shed light on that issue.

Unfortunately, I can't replicate it at the moment due to the need to
keep the VMs up.

> Did you try to migrate from the bare metal engine to the hosted engine?

Yes, I used this procedure:

http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/

Essentially, I used a brand new host not joined to the cluster to
deploy the Hosted Engine VM.

> The engine is responsible for starting those VMs. As you had no engine,
> there was no one to start them. The Hosted Engine tools are responsible
> only for the engine VM, not for other VMs.

From the logs I looked at, I could not determine why the engine would not
start. I didn't have the time to spend on it, as I had to get the VMs up
and running.

> I know there exists a 'bare metal to hosted engine' migration procedure,
> but I doubt I know it well enough. If I remember correctly, you need to take
> a backup of your bare metal engine database, run the migration preparation
> script (which handles spm_id duplications), deploy your first HE host,
> restore the database from the backup, and then deploy more HE hosts. I'm not
> sure those steps are correct; you would do better to ask Martin about the
> migration process.

I did all these steps as per the URL above, and it did not report any
errors during the process. The Hosted Engine VM started fine, but it did
not appear in the list of VMs. I think the problem there was that the
list of display types was incorrectly written in the hosted engine
properties file. I was still left with the issue that the Hosted Engine
could not be migrated to any other host. It was suggested to re-install
the other hosts with the 'deploy hosted engine' option (which was
missing in the official documentation). This didn't fix the issue, so it
was suggested that the host_id was incorrect (as it did not reflect the
SPM ID of the host). I fixed this, then restarted the cluster... with
the result that the engine would not start, and no VMs started. I could
not see any storage errors in any of the logs I looked at, but this had
not been a problem previously when rebooting hosts (though I'd never
restarted the whole cluster before). When I used the old bare metal
engine, I could get into the GUI to start the VMs; I'm not sure why they
didn't come up automatically.

I'd like to get it working and will work with the person who takes it
over to do this. I'd like to see it succeed so that eventually we could
use oVirt as a proof of concept to replace VMware with RHEV. Everyone's
help has been great, but unfortunately it hasn't been entirely smooth
sailing (for this migration) so far.

Thanks again,

Cam
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread Denis Chaplygin
Hello!

On Fri, Jun 30, 2017 at 5:46 PM, cmc  wrote:

> I ran 'hosted-engine --vm-start' after trying to ping the engine and
> running 'hosted-engine --vm-status' (which said it wasn't running) and
> it reported that it was 'destroying storage' and starting the engine,
> though it did not start it. I could not see any evidence from
> 'hosted-engine --vm-status' or logs that it started.


That sounds really strange; I would suspect storage problems or something
similar. As I told you earlier, the output of --vm-status may shed light on
that issue.
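
A quick way to capture that, run on any of the HA hosts (assuming the
standard hosted-engine CLI):

    # dump the HA state as each agent sees it
    hosted-engine --vm-status

    # same data in machine-readable form
    hosted-engine --vm-status --json

The 'Engine status', 'Score' and 'Host ID' fields for each host are the
interesting ones; a score of 0 or a stale timestamp usually points at a
storage or sanlock problem.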


> By this point I
> was in a panic to get VMs running. So I had to fire up the old bare
> metal engine. This has been a very disappointing experience. I still
> have no idea why the IDs in 'host_id' differed from the spm ID, and
>

Did you try to migrate from the bare metal engine to the hosted engine?

>
>
> 1. Why did the VMs (apart from the Hosted Engine VM) not start on
> power-up of the hosts? Is it because the hosts were powered down that
> they stay in a down state when the host powers up?
>
>
The engine is responsible for starting those VMs. As you had no engine,
there was no one to start them. The Hosted Engine tools are responsible
only for the engine VM, not for other VMs.


> 2. Now that I have connected the bare metal engine back to the
> cluster, is there a way back, or do I have to start from scratch
> again? I imagine there is no way of getting the Hosted Engine running
> again. If not, what do I need to do to 'clean' all the hosts of the
> remnants of the failed deployment? I can of course reinitialise the LUN
> that the Hosted Engine was on - anything else?
>

I know there exists a 'bare metal to hosted engine' migration procedure,
but I doubt I know it well enough. If I remember correctly, you need to
take a backup of your bare metal engine database, run the migration
preparation script (which handles spm_id duplications), deploy your first
HE host, restore the database from the backup, and then deploy more HE
hosts. I'm not sure those steps are correct; you would do better to ask
Martin about the migration process.
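
Roughly, I would expect it to look like this (a sketch only - the file
names here are placeholders and the exact flags depend on the engine
version, so verify against the documentation):

    # 1. on the bare metal engine: take a full engine backup
    engine-backup --mode=backup --file=engine.backup --log=backup.log

    # 2. deploy the first hosted engine host, then restore the backup
    #    inside the new engine VM (some versions need --provision-db here)
    engine-backup --mode=restore --file=engine.backup --log=restore.log

    # 3. finish with engine-setup inside the engine VM, then deploy the
    #    remaining hosts with the 'deploy hosted engine' option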
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
I ran 'hosted-engine --vm-start' after trying to ping the engine and
running 'hosted-engine --vm-status' (which said it wasn't running) and
it reported that it was 'destroying storage' and starting the engine,
though it did not start it. I could not see any evidence from
'hosted-engine --vm-status' or logs that it started. By this point I
was in a panic to get VMs running. So I had to fire up the old bare
metal engine. This has been a very disappointing experience. I still
have no idea why the IDs in 'host_id' differed from the spm ID, and
why, when I put the cluster into global maintenance and shut down all
the hosts, the Hosted Engine did not come up, nor did any of the VMs. I
don't feel confident in this any more. If I try deploying the
Hosted Engine again, I am not sure whether it will result in the same
non-functional cluster. It gave no error on deployment, but clearly
something was wrong.

I have two questions:

1. Why did the VMs (apart from the Hosted Engine VM) not start on
power-up of the hosts? Is it because the hosts were powered down that
they stay in a down state when the host powers up?

2. Now that I have connected the bare metal engine back to the
cluster, is there a way back, or do I have to start from scratch
again? I imagine there is no way of getting the Hosted Engine running
again. If not, what do I need to do to 'clean' all the hosts of the
remnants of the failed deployment? I can of course reinitialise the LUN
that the Hosted Engine was on - anything else?

Thanks

On Fri, Jun 30, 2017 at 4:30 PM, Denis Chaplygin  wrote:
> Hello!
>
> On Fri, Jun 30, 2017 at 4:19 PM, cmc  wrote:
>>
>> Help! I put the cluster into global maintenance, then powered off and
>> powered back on all of the nodes. I have taken it out of global
>> maintenance. No VM has started,
>> including the hosted engine. This is very bad. I am going to look
>> through logs to see why nothing has started. Help greatly appreciated.
>
>
> Global maintenance mode turns off high availability for the hosted engine
> VM. You should either cancel global maintenance or start the VM manually
> with hosted-engine --vm-start
>
> Global maintenance was added to allow manual maintenance of the engine VM,
> so in that mode the state of the engine VM and of the engine itself is not
> managed, and you are free to stop the engine, the VM, or both, and do
> whatever you like; the hosted engine tools will not interfere. Obviously,
> when the engine VM dies while the cluster is in global maintenance (or all
> nodes reboot, as in your case), there is no one to restart it :)
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread Denis Chaplygin
Hello!

On Fri, Jun 30, 2017 at 4:19 PM, cmc  wrote:

> Help! I put the cluster into global maintenance, then powered off and
> powered back on all of the nodes. I have taken it out of global
> maintenance. No VM has started,
> including the hosted engine. This is very bad. I am going to look
> through logs to see why nothing has started. Help greatly appreciated.
>

Global maintenance mode turns off high availability for the hosted engine
VM. You should either cancel global maintenance or start the VM manually
with hosted-engine --vm-start

Global maintenance was added to allow manual maintenance of the engine VM,
so in that mode the state of the engine VM and of the engine itself is not
managed, and you are free to stop the engine, the VM, or both, and do
whatever you like; the hosted engine tools will not interfere. Obviously,
when the engine VM dies while the cluster is in global maintenance (or all
nodes reboot, as in your case), there is no one to restart it :)
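
In practice the sequence for planned maintenance is (a minimal sketch,
assuming the standard hosted-engine CLI):

    # disable HA monitoring of the engine VM before the planned work
    hosted-engine --set-maintenance --mode=global

    # ...reboot hosts, patch storage, etc...

    # after a full shutdown the engine VM has to be started by hand,
    # because HA is still switched off at this point
    hosted-engine --vm-start

    # once the engine is back up, re-enable HA monitoring
    hosted-engine --set-maintenance --mode=none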
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
I've had no other choice but to power up the old bare metal engine to
be able to start the VMs. This is probably really bad, but I had to get
the VMs running.
I am guessing now that if the host is shut down rather than simply
rebooted, the VMs will not restart on power-up of the host. This
would not have been such a problem if the Hosted Engine had started.

So I'm not sure where to go from here...

I guess it's a case of starting from scratch again?

On Fri, Jun 30, 2017 at 3:19 PM, cmc  wrote:
> Help! I put the cluster into global maintenance, then powered off and
> powered back on all of the nodes. I have taken it out of global
> maintenance. No VM has started,
> including the hosted engine. This is very bad. I am going to look
> through logs to see why nothing has started. Help greatly appreciated.
>
> Thanks,
>
> Cam
>
> On Fri, Jun 30, 2017 at 1:00 PM, cmc  wrote:
>> So I can run from any node: hosted-engine --set-maintenance
>> --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This
>> shouldn't affect the running of any VMs, correct? Sorry for the
>> questions, just want to do it correctly and not make assumptions :)
>>
>> Cheers,
>>
>> C
>>
>> On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak  wrote:
>>> Hi,
>>>
 Just to clarify: you mean the host_id in
 /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
 correct?
>>>
>>> Exactly.
>>>
>>> Put the cluster to global maintenance first. Or kill all agents (has
>>> the same effect).
>>>
>>> Martin
>>>
>>> On Fri, Jun 30, 2017 at 12:47 PM, cmc  wrote:
 Just to clarify: you mean the host_id in
 /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
 correct?

 On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
> Hi,
>
> cleaning metadata won't help in this case. Try transferring the
> spm_ids you got from the engine to the proper hosted engine hosts so
> the hosted engine ids match the spm_ids. Then restart all hosted
> engine services. I would actually recommend restarting all hosts after
> this change, but I have no idea how many VMs you have running.
>
> Martin
>
> On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
>> Tried running a 'hosted-engine --clean-metadata" as per
>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>> ovirt-ha-agent was not running anyway, but it fails with the following
>> error:
>>
>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>> to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
>> call last):
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 191, in _run_agent
>> return action(he)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 67, in action_clean
>> return he.clean(options.force_cleanup)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 345, in clean
>> self._initialize_domain_monitor()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 823, in _initialize_domain_monitor
>> raise Exception(msg)
>> Exception: Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, 
>> attempt '0'
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>> occurred, giving up. Please review the log and consider filing a bug.
>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>
>> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
>>> Actually, it looks like sanlock problems:
>>>
>>>"SanlockInitializationError: Failed to initialize sanlock, the
>>> number of errors has exceeded the limit"
>>>
>>>
>>>
>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
 Sorry, I am mistaken; the agent failed on two hosts with the
 following error:

 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Failed to start monitoring domain
 (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
 during domain acquisition
 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Shutting down the agent because of 3 failures in a row!

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
Help! I put the cluster into global maintenance, then powered off and
powered back on all of the nodes. I have taken it out of global
maintenance. No VM has started,
including the hosted engine. This is very bad. I am going to look
through logs to see why nothing has started. Help greatly appreciated.

Thanks,

Cam

On Fri, Jun 30, 2017 at 1:00 PM, cmc  wrote:
> So I can run from any node: hosted-engine --set-maintenance
> --mode=global. By 'agents', you mean the ovirt-ha-agent, right? This
> shouldn't affect the running of any VMs, correct? Sorry for the
> questions, just want to do it correctly and not make assumptions :)
>
> Cheers,
>
> C
>
> On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak  wrote:
>> Hi,
>>
>>> Just to clarify: you mean the host_id in
>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>> correct?
>>
>> Exactly.
>>
>> Put the cluster to global maintenance first. Or kill all agents (has
>> the same effect).
>>
>> Martin
>>
>> On Fri, Jun 30, 2017 at 12:47 PM, cmc  wrote:
>>> Just to clarify: you mean the host_id in
>>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>>> correct?
>>>
>>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
 Hi,

 cleaning metadata won't help in this case. Try transferring the
 spm_ids you got from the engine to the proper hosted engine hosts so
 the hosted engine ids match the spm_ids. Then restart all hosted
 engine services. I would actually recommend restarting all hosts after
 this change, but I have no idea how many VMs you have running.

 Martin

 On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
> Tried running a 'hosted-engine --clean-metadata" as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
> ovirt-ha-agent was not running anyway, but it fails with the following
> error:
>
> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
> to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
> call last):
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 191, in _run_agent
> return action(he)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 67, in action_clean
> return he.clean(options.force_cleanup)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 345, in clean
> self._initialize_domain_monitor()
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 823, in _initialize_domain_monitor
> raise Exception(msg)
> Exception: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, 
> attempt '0'
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
> occurred, giving up. Please review the log and consider filing a bug.
> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>
> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
>> Actually, it looks like sanlock problems:
>>
>>"SanlockInitializationError: Failed to initialize sanlock, the
>> number of errors has exceeded the limit"
>>
>>
>>
>> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
>>> Sorry, I am mistaken; the agent failed on two hosts with the following
>>> error:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>> ERROR Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>> ERROR Shutting down the agent because of 3 failures in a row!
>>>
>>> What could cause these timeouts? Some other service not running?
>>>
>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
 Both services are up on all three hosts. The broker logs just report:

 Thread-6549::INFO::2017-06-29
 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
 Thread-6549::INFO::2017-06-29
 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed

 Thanks,

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
So I can run from any node: hosted-engine --set-maintenance
--mode=global. By 'agents', you mean the ovirt-ha-agent, right? This
shouldn't affect the running of any VMs, correct? Sorry for the
questions, just want to do it correctly and not make assumptions :)

Cheers,

C

On Fri, Jun 30, 2017 at 12:12 PM, Martin Sivak  wrote:
> Hi,
>
>> Just to clarify: you mean the host_id in
>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>> correct?
>
> Exactly.
>
> Put the cluster to global maintenance first. Or kill all agents (has
> the same effect).
>
> Martin
>
> On Fri, Jun 30, 2017 at 12:47 PM, cmc  wrote:
>> Just to clarify: you mean the host_id in
>> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
>> correct?
>>
>> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
>>> Hi,
>>>
>>> cleaning metadata won't help in this case. Try transferring the
>>> spm_ids you got from the engine to the proper hosted engine hosts so
>>> the hosted engine ids match the spm_ids. Then restart all hosted
>>> engine services. I would actually recommend restarting all hosts after
>>> this change, but I have no idea how many VMs you have running.
>>>
>>> Martin
>>>
>>> On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
 Tried running a 'hosted-engine --clean-metadata" as per
 https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
 ovirt-ha-agent was not running anyway, but it fails with the following
 error:

 ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
 to start monitoring domain
 (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
 during domain acquisition
 ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
 call last):
   File 
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
 line 191, in _run_agent
 return action(he)
   File 
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
 line 67, in action_clean
 return he.clean(options.force_cleanup)
   File 
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
 line 345, in clean
 self._initialize_domain_monitor()
   File 
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
 line 823, in _initialize_domain_monitor
 raise Exception(msg)
 Exception: Failed to start monitoring domain
 (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
 during domain acquisition
 ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
 WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt 
 '0'
 ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
 occurred, giving up. Please review the log and consider filing a bug.
 INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down

 On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
> Actually, it looks like sanlock problems:
>
>"SanlockInitializationError: Failed to initialize sanlock, the
> number of errors has exceeded the limit"
>
>
>
> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
>> Sorry, I am mistaken; the agent failed on two hosts with the following
>> error:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Shutting down the agent because of 3 failures in a row!
>>
>> What could cause these timeouts? Some other service not running?
>>
>> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
>>> Both services are up on all three hosts. The broker logs just report:
>>>
>>> Thread-6549::INFO::2017-06-29
>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>> Connection established
>>> Thread-6549::INFO::2017-06-29
>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>> Connection closed
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
 Hi,

 please make sure that both ovirt-ha-agent and ovirt-ha-broker services
 are restarted and up. The error says the agent can't talk to the
 broker. Is there anything in the broker.log?

 Best regards

 Martin Sivak

 On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
> I've restarted those two services across all 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread Martin Sivak
Hi,

> Just to clarify: you mean the host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
> correct?

Exactly.

Put the cluster to global maintenance first. Or kill all agents (has
the same effect).
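
Concretely, on each hosted engine host (the id value below is an
example only):

    # see which id this host currently claims
    grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
    # host_id=1

    # edit host_id so it matches the spm_id the engine reports for this
    # host, then restart the HA services
    systemctl restart ovirt-ha-broker ovirt-ha-agent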

Martin

On Fri, Jun 30, 2017 at 12:47 PM, cmc  wrote:
> Just to clarify: you mean the host_id in
> /etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
> correct?
>
> On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
>> Hi,
>>
>> cleaning metadata won't help in this case. Try transferring the
>> spm_ids you got from the engine to the proper hosted engine hosts so
>> the hosted engine ids match the spm_ids. Then restart all hosted
>> engine services. I would actually recommend restarting all hosts after
>> this change, but I have no idea how many VMs you have running.
>>
>> Martin
>>
>> On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
>>> Tried running a 'hosted-engine --clean-metadata" as per
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>>> ovirt-ha-agent was not running anyway, but it fails with the following
>>> error:
>>>
>>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>>> to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
>>> call last):
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 191, in _run_agent
>>> return action(he)
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 67, in action_clean
>>> return he.clean(options.force_cleanup)
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 345, in clean
>>> self._initialize_domain_monitor()
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 823, in _initialize_domain_monitor
>>> raise Exception(msg)
>>> Exception: Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt 
>>> '0'
>>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>>> occurred, giving up. Please review the log and consider filing a bug.
>>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>>
>>> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
 Actually, it looks like sanlock problems:

"SanlockInitializationError: Failed to initialize sanlock, the
 number of errors has exceeded the limit"



 On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
> Sorry, I am mistaken; the agent failed on two hosts with the following
> error:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
> ERROR Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
> ERROR Shutting down the agent because of 3 failures in a row!
>
> What could cause these timeouts? Some other service not running?
>
> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
>> Both services are up on all three hosts. The broker logs just report:
>>
>> Thread-6549::INFO::2017-06-29
>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>> Connection established
>> Thread-6549::INFO::2017-06-29
>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>> Connection closed
>>
>> Thanks,
>>
>> Cam
>>
>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
>>> Hi,
>>>
>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>> are restarted and up. The error says the agent can't talk to the
>>> broker. Is there anything in the broker.log?
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
 I've restarted those two services across all hosts, have taken the
 Hosted Engine host out of maintenance, and when I try to migrate the
 Hosted Engine over to another host, it reports that all three hosts
 'did not satisfy internal filter HA because it is not a Hosted Engine
 host'.

 On the host that the Hosted Engine is currently on it reports in the 
 agent.log:

 ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
Just to clarify: you mean the host_id in
/etc/ovirt-hosted-engine/hosted-engine.conf should match the spm_id,
correct?

On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
> Hi,
>
> cleaning metadata won't help in this case. Try transferring the
> spm_ids you got from the engine to the proper hosted engine hosts so
> the hosted engine ids match the spm_ids. Then restart all hosted
> engine services. I would actually recommend restarting all hosts after
> this change, but I have no idea how many VMs you have running.
>
> Martin
>
> On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
>> Tried running a 'hosted-engine --clean-metadata" as per
>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>> ovirt-ha-agent was not running anyway, but it fails with the following
>> error:
>>
>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>> to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
>> call last):
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 191, in _run_agent
>> return action(he)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 67, in action_clean
>> return he.clean(options.force_cleanup)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 345, in clean
>> self._initialize_domain_monitor()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 823, in _initialize_domain_monitor
>> raise Exception(msg)
>> Exception: Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt 
>> '0'
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>> occurred, giving up. Please review the log and consider filing a bug.
>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>
>> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
>>> Actually, it looks like sanlock problems:
>>>
>>>"SanlockInitializationError: Failed to initialize sanlock, the
>>> number of errors has exceeded the limit"
>>>
>>>
>>>
>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
 Sorry, I am mistaken; the agent failed on two hosts with the following
 error:

 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Failed to start monitoring domain
 (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
 during domain acquisition
 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Shutting down the agent because of 3 failures in a row!

 What could cause these timeouts? Some other service not running?

 On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
> Both services are up on all three hosts. The broker logs just report:
>
> Thread-6549::INFO::2017-06-29
> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
> Connection established
> Thread-6549::INFO::2017-06-29
> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
> Connection closed
>
> Thanks,
>
> Cam
>
> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
>> Hi,
>>
>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>> are restarted and up. The error says the agent can't talk to the
>> broker. Is there anything in the broker.log?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
>>> I've restarted those two services across all hosts, have taken the
>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>> Hosted Engine over to another host, it reports that all three hosts
>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>> host'.
>>>
>>> On the host that the Hosted Engine is currently on it reports in the 
>>> agent.log:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>> Connection closed: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>> getting service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>>> call last):

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread cmc
OK, thanks Martin. It should be feasible to get all the VMs onto one host,
so I can do that (unless you recommend just shutting the entire
cluster down at once?). As for the engine, since it won't migrate to
another host, I'll shut it down before shutting that host down.

Will let you know how it goes.

Thanks,

Cam

On Fri, Jun 30, 2017 at 9:47 AM, Martin Sivak  wrote:
> Hi,
>
> cleaning metadata won't help in this case. Try transferring the
> spm_ids you got from the engine to the proper hosted engine hosts so
> the hosted engine ids match the spm_ids. Then restart all hosted
> engine services. I would actually recommend restarting all hosts after
> this change, but I have no idea how many VMs you have running.
>
> Martin
>
> On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
>> Tried running a 'hosted-engine --clean-metadata" as per
>> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
>> ovirt-ha-agent was not running anyway, but it fails with the following
>> error:
>>
>> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
>> to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
>> call last):
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 191, in _run_agent
>> return action(he)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 67, in action_clean
>> return he.clean(options.force_cleanup)
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 345, in clean
>> self._initialize_domain_monitor()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 823, in _initialize_domain_monitor
>> raise Exception(msg)
>> Exception: Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
>> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt 
>> '0'
>> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
>> occurred, giving up. Please review the log and consider filing a bug.
>> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>>
>> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
>>> Actually, it looks like sanlock problems:
>>>
>>>"SanlockInitializationError: Failed to initialize sanlock, the
>>> number of errors has exceeded the limit"
>>>
>>>
>>>
>>> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
 Sorry, I am mistaken; the agent failed on two hosts with the following
 error:

 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Failed to start monitoring domain
 (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
 during domain acquisition
 ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
 ERROR Shutting down the agent because of 3 failures in a row!

 What could cause these timeouts? Some other service not running?

 On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
> Both services are up on all three hosts. The broker logs just report:
>
> Thread-6549::INFO::2017-06-29
> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
> Connection established
> Thread-6549::INFO::2017-06-29
> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
> Connection closed
>
> Thanks,
>
> Cam
>
> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
>> Hi,
>>
>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>> are restarted and up. The error says the agent can't talk to the
>> broker. Is there anything in the broker.log?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
>>> I've restarted those two services across all hosts, have taken the
>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>> Hosted Engine over to another host, it reports that all three hosts
>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>> host'.
>>>
>>> On the host that the Hosted Engine is currently on it reports in the 
>>> agent.log:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>> Connection closed: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>> getting service path: 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-30 Thread Martin Sivak
Hi,

cleaning metadata won't help in this case. Try transferring the
spm_ids you got from the engine to the proper hosted engine hosts so
the hosted engine ids match the spm_ids. Then restart all hosted
engine services. I would actually recommend restarting all hosts after
this change, but I have no idea how many VMs you have running.

Martin

On Thu, Jun 29, 2017 at 8:27 PM, cmc  wrote:
> Tried running a 'hosted-engine --clean-metadata" as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
> ovirt-ha-agent was not running anyway, but it fails with the following
> error:
>
> ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
> to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
> call last):
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 191, in _run_agent
> return action(he)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 67, in action_clean
> return he.clean(options.force_cleanup)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 345, in clean
> self._initialize_domain_monitor()
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 823, in _initialize_domain_monitor
> raise Exception(msg)
> Exception: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
> WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
> ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
> occurred, giving up. Please review the log and consider filing a bug.
> INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
>
> On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
>> Actually, it looks like sanlock problems:
>>
>>"SanlockInitializationError: Failed to initialize sanlock, the
>> number of errors has exceeded the limit"
>>
>>
>>
>> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
>>> Sorry, I am mistaken; the agent failed on two hosts with the following
>>> error:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>> ERROR Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>>> ERROR Shutting down the agent because of 3 failures in a row!
>>>
>>> What could cause these timeouts? Some other service not running?
>>>
>>> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
 Both services are up on all three hosts. The broker logs just report:

 Thread-6549::INFO::2017-06-29
 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
 Connection established
 Thread-6549::INFO::2017-06-29
 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed

 Thanks,

 Cam

 On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
> Hi,
>
> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
> are restarted and up. The error says the agent can't talk to the
> broker. Is there anything in the broker.log?
>
> Best regards
>
> Martin Sivak
>
> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
>> I've restarted those two services across all hosts, have taken the
>> Hosted Engine host out of maintenance, and when I try to migrate the
>> Hosted Engine over to another host, it reports that all three hosts
>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>> host'.
>>
>> On the host that the Hosted Engine is currently on it reports in the 
>> agent.log:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>> Connection closed: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>> getting service path: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>> call last):
>> File
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>> line 191, in _run_agent
>>   return action(he)
>> File

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Tried running a 'hosted-engine --clean-metadata" as per
https://bugzilla.redhat.com/show_bug.cgi?id=1350539, since
ovirt-ha-agent was not running anyway, but it fails with the following
error:

ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Failed
to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Traceback (most recent
call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 191, in _run_agent
return action(he)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
line 67, in action_clean
return he.clean(options.force_cleanup)
  File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 345, in clean
self._initialize_domain_monitor()
  File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 823, in _initialize_domain_monitor
raise Exception(msg)
Exception: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Trying to restart agent
WARNING:ovirt_hosted_engine_ha.agent.agent.Agent:Restarting agent, attempt '0'
ERROR:ovirt_hosted_engine_ha.agent.agent.Agent:Too many errors
occurred, giving up. Please review the log and consider filing a bug.
INFO:ovirt_hosted_engine_ha.agent.agent.Agent:Agent shutting down
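
For reference, the invocations were along these lines; the
--force-cleanup and --host-id flags are inferred from the
options.force_cleanup seen in the traceback above, so treat them as an
assumption to verify rather than documented syntax:

    # what I ran; it fails while acquiring the storage domain lease
    hosted-engine --clean-metadata

    # possible forced variant (unverified flag names - see note above)
    hosted-engine --clean-metadata --force-cleanup --host-id=1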

On Thu, Jun 29, 2017 at 6:10 PM, cmc  wrote:
> Actually, it looks like sanlock problems:
>
>"SanlockInitializationError: Failed to initialize sanlock, the
> number of errors has exceeded the limit"
>
>
>
> On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
>> Sorry, I am mistaken; the agent failed on two hosts with the following
>> error:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
>> ERROR Shutting down the agent because of 3 failures in a row!
>>
>> What could cause these timeouts? Some other service not running?
>>
>> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
>>> Both services are up on all three hosts. The broker logs just report:
>>>
>>> Thread-6549::INFO::2017-06-29
>>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>> Connection established
>>> Thread-6549::INFO::2017-06-29
>>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>> Connection closed
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
 Hi,

 please make sure that both ovirt-ha-agent and ovirt-ha-broker services
 are restarted and up. The error says the agent can't talk to the
 broker. Is there anything in the broker.log?

 Best regards

 Martin Sivak

 On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
> I've restarted those two services across all hosts, have taken the
> Hosted Engine host out of maintenance, and when I try to migrate the
> Hosted Engine over to another host, it reports that all three hosts
> 'did not satisfy internal filter HA because it is not a Hosted Engine
> host'.
>
> On the host that the Hosted Engine is currently on it reports in the 
> agent.log:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
> Connection closed: Connection closed
> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
> getting service path: Connection closed
> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
> call last):
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 191, in _run_agent
>   return action(he)
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
> line 64, in action_proper
>   return
> he.start_monitoring()
> File
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 411, in start_monitoring
>   
> self._initialize_sanlock()
> File

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Hi Denis,

I ran the query as you suggested, just by starting at spm_id=1 and on
up to 3 (the number of hosts I have), and it identified a different
host for each spm_id, indicating that they are indeed unique, so this
looks good.

Regards,

Cam

On Thu, Jun 29, 2017 at 2:07 PM, Denis Chaplygin  wrote:
> Hello!
>
> On Thu, Jun 29, 2017 at 1:22 PM, Martin Sivak  wrote:
>>
>> Change the ids so they are distinct. I need to check if there is a way
>> to read the SPM ids from the engine as using the same numbers would be
>> the best.
>
>
> Host (SPM) IDs are not shown in the UI, but you can search on them by typing
> 'spm_id=' into the search box; it will return the host with the
> specified ID, or nothing if that ID is not in use
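
For the record, that amounted to typing 'spm_id=1', 'spm_id=2' and
'spm_id=3' into the admin portal search box, one at a time. An
equivalent one-shot check straight from the engine database would be
something like the following (untested; it assumes the standard 'engine'
database and its vds_spm_id_map table):

    sudo -u postgres psql engine \
        -c 'SELECT vds_id, vds_spm_id FROM vds_spm_id_map ORDER BY vds_spm_id;'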
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Actually, it looks like sanlock problems:

   "SanlockInitializationError: Failed to initialize sanlock, the
number of errors has exceeded the limit"
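
To narrow that down I am checking the following on the affected hosts
(standard sanlock tooling, assuming the default log location):

    # is sanlock alive, and which lockspaces does it hold?
    systemctl status sanlock
    sanlock client status

    # sanlock logs its lease acquisition timeouts here
    tail -n 50 /var/log/sanlock.log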



On Thu, Jun 29, 2017 at 5:10 PM, cmc  wrote:
> Sorry, I am mistaken; the agent failed on two hosts with the following error:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
> ERROR Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
> ERROR Shutting down the agent because of 3 failures in a row!
>
> What could cause these timeouts? Some other service not running?
>
> On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
>> Both services are up on all three hosts. The broker logs just report:
>>
>> Thread-6549::INFO::2017-06-29
>> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>> Connection established
>> Thread-6549::INFO::2017-06-29
>> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>> Connection closed
>>
>> Thanks,
>>
>> Cam
>>
>> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
>>> Hi,
>>>
>>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>>> are restarted and up. The error says the agent can't talk to the
>>> broker. Is there anything in the broker.log?
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
 I've restarted those two services across all hosts, have taken the
 Hosted Engine host out of maintenance, and when I try to migrate the
 Hosted Engine over to another host, it reports that all three hosts
 'did not satisfy internal filter HA because it is not a Hosted Engine
 host'.

 On the host that the Hosted Engine is currently on it reports in the 
 agent.log:

 ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
 Connection closed: Connection closed
 Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
 ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
 getting service path: Connection closed
 Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
 ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
 call last):
 File
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
 line 191, in _run_agent
   return action(he)
 File
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
 line 64, in action_proper
   return
 he.start_monitoring()
 File
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
 line 411, in start_monitoring
   
 self._initialize_sanlock()
 File
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
 line 691, in _initialize_sanlock

 constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
 File
 "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
 line 162, in get_service_path
   .format(str(e)))
   RequestError: Failed
 to get service path: Connection closed
 Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
 ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent

 On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak  wrote:
> Hi,
>
> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>
> The scheduling message just means that the host has score 0 or is not
> reporting score at all.
>
> Martin
>
> On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
>> Thanks Martin, do I have to restart anything? When I try to use the
>> 'migrate' operation, it complains that the other two hosts 'did not
>> satisfy internal filter HA because it is not a Hosted Engine host..'
>> (even though I reinstalled both of these hosts with the 'deploy hosted
>> engine' option), which suggests that something needs restarting. Should
>> I worry about the sanlock errors, or will that be resolved by the
>> change in host_id?
>>
>> Kind regards,
>>
>> Cam
>>
>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Sorry, I am mistaken; the agent failed on two hosts with the following error:

ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
ERROR Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine
ERROR Shutting down the agent because of 3 failures in a row!

What could cause these timeouts? Some other service not running?

On Thu, Jun 29, 2017 at 5:03 PM, cmc  wrote:
> Both services are up on all three hosts. The broker logs just report:
>
> Thread-6549::INFO::2017-06-29
> 17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
> Connection established
> Thread-6549::INFO::2017-06-29
> 17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
> Connection closed
>
> Thanks,
>
> Cam
>
> On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
>> Hi,
>>
>> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
>> are restarted and up. The error says the agent can't talk to the
>> broker. Is there anything in the broker.log?
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
>>> I've restarted those two services across all hosts, have taken the
>>> Hosted Engine host out of maintenance, and when I try to migrate the
>>> Hosted Engine over to another host, it reports that all three hosts
>>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>>> host'.
>>>
>>> On the host that the Hosted Engine is currently on it reports in the 
>>> agent.log:
>>>
>>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>>> Connection closed: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>>> getting service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>>> call last):
>>> File
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 191, in _run_agent
>>>   return action(he)
>>> File
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py",
>>> line 64, in action_proper
>>>   return
>>> he.start_monitoring()
>>> File
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 411, in start_monitoring
>>>   
>>> self._initialize_sanlock()
>>> File
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 691, in _initialize_sanlock
>>>
>>> constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>>> File
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>> line 162, in get_service_path
>>>   .format(str(e)))
>>>   RequestError: Failed
>>> to get service path: Connection closed
>>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>>
>>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak  wrote:
 Hi,

 yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.

 The scheduling message just means that the host has score 0 or is not
 reporting score at all.

 Martin

 On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
> Thanks Martin, do I have to restart anything? When I try to use the
> 'migrate' operation, it complains that the other two hosts 'did not
> satisfy internal filter HA because it is not a Hosted Engine host..'
> (even though I reinstalled both of these hosts with the 'deploy hosted
> engine' option), which suggests that something needs restarting. Should
> I worry about the sanlock errors, or will that be resolved by the
> change in host_id?
>
> Kind regards,
>
> Cam
>
> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
>> Change the ids so they are distinct. I need to check if there is a way
>> to read the SPM ids from the engine as using the same numbers would be
>> the best.
>>
>> Martin
>>
>>
>>
>> On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
>>> Is there 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Both services are up on all three hosts. The broker logs just report:

Thread-6549::INFO::2017-06-29
17:01:51,481::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-6549::INFO::2017-06-29
17:01:51,483::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed

Thanks,

Cam

On Thu, Jun 29, 2017 at 4:00 PM, Martin Sivak  wrote:
> Hi,
>
> please make sure that both ovirt-ha-agent and ovirt-ha-broker services
> are restarted and up. The error says the agent can't talk to the
> broker. Is there anything in the broker.log?
>
> Best regards
>
> Martin Sivak
>
> On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
>> I've restarted those two services across all hosts, have taken the
>> Hosted Engine host out of maintenance, and when I try to migrate the
>> Hosted Engine over to another host, it reports that all three hosts
>> 'did not satisfy internal filter HA because it is not a Hosted Engine
>> host'.
>>
>> On the host that the Hosted Engine is currently on it reports in the 
>> agent.log:
>>
>> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
>> Connection closed: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
>> getting service path: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
>> call last):
>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>>   return action(he)
>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>>   return he.start_monitoring()
>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>>   self._initialize_sanlock()
>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>>   constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
>> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>>   .format(str(e)))
>> RequestError: Failed to get service path: Connection closed
>> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
>> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>>
>> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak  wrote:
>>> Hi,
>>>
>>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>>
>>> The scheduling message just means that the host has score 0 or is not
>>> reporting score at all.
>>>
>>> Martin
>>>
>>> On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
 Thanks Martin, do I have to restart anything? When I try to use the
 'migrate' operation, it complains that the other two hosts 'did not
 satisfy internal filter HA because it is not a Hosted Engine host..'
 (even though I reinstalled both these hosts with the 'deploy hosted
 engine' option), which suggests that something needs restarting. Should
 I worry about the sanlock errors, or will that be resolved by the
 change in host_id?

 Kind regards,

 Cam

 On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
> Change the ids so they are distinct. I need to check if there is a way
> to read the SPM ids from the engine as using the same numbers would be
> the best.
>
> Martin
>
>
>
> On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
>> Is there any way of recovering from this situation? I'd prefer to fix
>> the issue rather than re-deploy, but if there is no recovery path, I
>> could perhaps try re-deploying the hosted engine. In which case, would
>> the best option be to take a backup of the Hosted Engine, and then
>> shut it down, re-initialise the SAN partition (or use another
>> partition) and retry the deployment? Would it be better to use the
>> older backup from the bare metal engine that I originally used, or use
>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>> added since switching to Hosted Engine.
>>
>> Unfortunately I 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread Martin Sivak
Hi,

please make sure that both ovirt-ha-agent and ovirt-ha-broker services
are restarted and up. The error says the agent can't talk to the
broker. Is there anything in the broker.log?
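
On each host, something like this should do it (a sketch, assuming
systemd-managed services; the broker should come up before the agent):

  systemctl restart ovirt-ha-broker ovirt-ha-agent
  systemctl status ovirt-ha-broker ovirt-ha-agent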

Best regards

Martin Sivak

On Thu, Jun 29, 2017 at 4:42 PM, cmc  wrote:
> I've restarted those two services across all hosts, have taken the
> Hosted Engine host out of maintenance, and when I try to migrate the
> Hosted Engine over to another host, it reports that all three hosts
> 'did not satisfy internal filter HA because it is not a Hosted Engine
> host'.
>
> On the host that the Hosted Engine is currently on it reports in the 
> agent.log:
>
> ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
> Connection closed: Connection closed
> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
> getting service path: Connection closed
> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
> call last):
> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
>   return action(he)
> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
>   return he.start_monitoring()
> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
>   self._initialize_sanlock()
> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
>   constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
> File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
>   .format(str(e)))
> RequestError: Failed to get service path: Connection closed
> Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
>
> On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak  wrote:
>> Hi,
>>
>> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>>
>> The scheduling message just means that the host has score 0 or is not
>> reporting score at all.
>>
>> Martin
>>
>> On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
>>> Thanks Martin, do I have to restart anything? When I try to use the
>>> 'migrate' operation, it complains that the other two hosts 'did not
>>> satisfy internal filter HA because it is not a Hosted Engine host..'
>>> (even though I reinstalled both these hosts with the 'deploy hosted
>>> engine' option), which suggests that something needs restarting. Should
>>> I worry about the sanlock errors, or will that be resolved by the
>>> change in host_id?
>>>
>>> Kind regards,
>>>
>>> Cam
>>>
>>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
 Change the ids so they are distinct. I need to check if there is a way
 to read the SPM ids from the engine as using the same numbers would be
 the best.

 Martin



 On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
> Is there any way of recovering from this situation? I'd prefer to fix
> the issue rather than re-deploy, but if there is no recovery path, I
> could perhaps try re-deploying the hosted engine. In which case, would
> the best option be to take a backup of the Hosted Engine, and then
> shut it down, re-initialise the SAN partition (or use another
> partition) and retry the deployment? Would it be better to use the
> older backup from the bare metal engine that I originally used, or use
> a backup from the Hosted Engine? I'm not sure if any VMs have been
> added since switching to Hosted Engine.
>
> Unfortunately I have very little time left to get this working before
> I have to hand it over for eval (by end of Friday).
>
> Here are some log snippets from the cluster that are current
>
> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>
> 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
> 3) (clusterlock:282)
> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
> Error acquiring host id 3 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
I've restarted those two services across all hosts, have taken the
Hosted Engine host out of maintenance, and when I try to migrate the
Hosted Engine over to another host, it reports that all three hosts
'did not satisfy internal filter HA because it is not a Hosted Engine
host'.

On the host that the Hosted Engine is currently on it reports in the agent.log:

ovirt-ha-agent ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR
Connection closed: Connection closed
Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Exception
getting service path: Connection closed
Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent
call last):
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 191, in _run_agent
  return action(he)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 64, in action_proper
  return he.start_monitoring()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 411, in start_monitoring
  self._initialize_sanlock()
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 691, in _initialize_sanlock
  constants.SERVICE_TYPE + constants.LOCKSPACE_EXTENSION)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 162, in get_service_path
  .format(str(e)))
RequestError: Failed to get service path: Connection closed
Jun 29 15:22:25 kvm-ldn-03 ovirt-ha-agent[12653]: ovirt-ha-agent
ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent

On Thu, Jun 29, 2017 at 1:25 PM, Martin Sivak  wrote:
> Hi,
>
> yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.
>
> The scheduling message just means that the host has score 0 or is not
> reporting score at all.
>
> Martin
>
> On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
>> Thanks Martin, do I have to restart anything? When I try to use the
>> 'migrate' operation, it complains that the other two hosts 'did not
>> satisfy internal filter HA because it is not a Hosted Engine host..'
>> (even though I reinstalled both these hosts with the 'deploy hosted
>> engine' option), which suggests that something needs restarting. Should
>> I worry about the sanlock errors, or will that be resolved by the
>> change in host_id?
>>
>> Kind regards,
>>
>> Cam
>>
>> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
>>> Change the ids so they are distinct. I need to check if there is a way
>>> to read the SPM ids from the engine as using the same numbers would be
>>> the best.
>>>
>>> Martin
>>>
>>>
>>>
>>> On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
 Is there any way of recovering from this situation? I'd prefer to fix
 the issue rather than re-deploy, but if there is no recovery path, I
 could perhaps try re-deploying the hosted engine. In which case, would
 the best option be to take a backup of the Hosted Engine, and then
 shut it down, re-initialise the SAN partition (or use another
 partition) and retry the deployment? Would it be better to use the
 older backup from the bare metal engine that I originally used, or use
 a backup from the Hosted Engine? I'm not sure if any VMs have been
 added since switching to Hosted Engine.

 Unfortunately I have very little time left to get this working before
 I have to hand it over for eval (by end of Friday).

 Here are some log snippets from the cluster that are current

 In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:

 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
 Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
 3) (clusterlock:282)
 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
 Error acquiring host id 3 for domain
 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
 Traceback (most recent call last):
   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
 self.domain.acquireHostId(self.hostId, async=True)
   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
 self._manifest.acquireHostId(hostId, async)
   File "/usr/share/vdsm/storage/sd.py", line 449, 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread Denis Chaplygin
Hello!

On Thu, Jun 29, 2017 at 1:22 PM, Martin Sivak  wrote:

> Change the ids so they are distinct. I need to check if there is a way
> to read the SPM ids from the engine as using the same numbers would be
> the best.
>

Host (SPM) ids are not shown in the UI, but you can search on it by typing
'spm_id=' into a search box and it will return you host with the
specified id or nothing if that id is not in use
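
If you prefer the command line, the same mapping should be readable
straight from the engine database (a sketch from a 4.1 setup; the table
and column names are my assumption, run it on the engine VM):

  sudo -u postgres psql engine -c "select vds_name, vds_spm_id from vds;"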


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread Martin Sivak
Hi,

yep, you have to restart the ovirt-ha-agent and ovirt-ha-broker services.

The scheduling message just means that the host has score 0 or is not
reporting score at all.
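
The current score can be read from any HA host, e.g. (field names may
vary slightly between versions):

  hosted-engine --vm-status | grep -iE 'hostname|score|engine status'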

Martin

On Thu, Jun 29, 2017 at 1:33 PM, cmc  wrote:
> Thanks Martin, do I have to restart anything? When I try to use the
> 'migrate' operation, it complains that the other two hosts 'did not
> satisfy internal filter HA because it is not a Hosted Engine host..'
> (even though I reinstalled both these hosts with the 'deploy hosted
> engine' option), which suggests that something needs restarting. Should
> I worry about the sanlock errors, or will that be resolved by the
> change in host_id?
>
> Kind regards,
>
> Cam
>
> On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
>> Change the ids so they are distinct. I need to check if there is a way
>> to read the SPM ids from the engine as using the same numbers would be
>> the best.
>>
>> Martin
>>
>>
>>
>> On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
>>> Is there any way of recovering from this situation? I'd prefer to fix
>>> the issue rather than re-deploy, but if there is no recovery path, I
>>> could perhaps try re-deploying the hosted engine. In which case, would
>>> the best option be to take a backup of the Hosted Engine, and then
>>> shut it down, re-initialise the SAN partition (or use another
>>> partition) and retry the deployment? Would it be better to use the
>>> older backup from the bare metal engine that I originally used, or use
>>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>>> added since switching to Hosted Engine.
>>>
>>> Unfortunately I have very little time left to get this working before
>>> I have to hand it over for eval (by end of Friday).
>>>
>>> Here are some log snippets from the cluster that are current
>>>
>>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>>
>>> 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
>>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
>>> 3) (clusterlock:282)
>>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
>>> Error acquiring host id 3 for domain
>>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>>> Traceback (most recent call last):
>>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>>> self.domain.acquireHostId(self.hostId, async=True)
>>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>>> self._manifest.acquireHostId(hostId, async)
>>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>>> self._domainLock.acquireHostId(hostId, async)
>>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>> line 297, in acquireHostId
>>> raise se.AcquireHostIdFailure(self._sdUUID, e)
>>> AcquireHostIdFailure: Cannot acquire host id:
>>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
>>> lockspace add failure', 'Invalid argument'))
>>>
>>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>>
>>> MainThread::ERROR::2017-06-19
>>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>>> Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> MainThread::WARNING::2017-06-19
>>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>> Error while monitoring engine: Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> MainThread::WARNING::2017-06-19
>>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>> Unexpected error
>>> Traceback (most recent call last):
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 443, in start_monitoring
>>> self._initialize_domain_monitor()
>>>   File 
>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>> line 823, in _initialize_domain_monitor
>>> raise Exception(msg)
>>> Exception: Failed to start monitoring domain
>>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>>> during domain acquisition
>>> MainThread::ERROR::2017-06-19
>>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>>> Shutting down the agent because of 3 failures in a row!
>>>
>>> From sanlock.log:
>>>
>>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
>>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>> conflicts with name of list1 s5
>>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>>
>>> From the two other hosts:

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Thanks Martin, do I have to restart anything? When I try to use the
'migrate' operation, it complains that the other two hosts 'did not
satisfy internal filter HA because it is not a Hosted Engine host..'
(even though I reinstalled both these hosts with the 'deploy hosted
engine' option), which suggests that something needs restarting. Should
I worry about the sanlock errors, or will that be resolved by the
change in host_id?

Kind regards,

Cam

On Thu, Jun 29, 2017 at 12:22 PM, Martin Sivak  wrote:
> Change the ids so they are distinct. I need to check if there is a way
> to read the SPM ids from the engine as using the same numbers would be
> the best.
>
> Martin
>
>
>
> On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
>> Is there any way of recovering from this situation? I'd prefer to fix
>> the issue rather than re-deploy, but if there is no recovery path, I
>> could perhaps try re-deploying the hosted engine. In which case, would
>> the best option be to take a backup of the Hosted Engine, and then
>> shut it down, re-initialise the SAN partition (or use another
>> partition) and retry the deployment? Would it be better to use the
>> older backup from the bare metal engine that I originally used, or use
>> a backup from the Hosted Engine? I'm not sure if any VMs have been
>> added since switching to Hosted Engine.
>>
>> Unfortunately I have very little time left to get this working before
>> I have to hand it over for eval (by end of Friday).
>>
>> Here are some log snippets from the cluster that are current
>>
>> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>>
>> 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
>> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
>> 3) (clusterlock:282)
>> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
>> Error acquiring host id 3 for domain
>> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
>> self.domain.acquireHostId(self.hostId, async=True)
>>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
>> self._manifest.acquireHostId(hostId, async)
>>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
>> self._domainLock.acquireHostId(hostId, async)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>> line 297, in acquireHostId
>> raise se.AcquireHostIdFailure(self._sdUUID, e)
>> AcquireHostIdFailure: Cannot acquire host id:
>> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
>> lockspace add failure', 'Invalid argument'))
>>
>> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>>
>> MainThread::ERROR::2017-06-19
>> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
>> Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> MainThread::WARNING::2017-06-19
>> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Error while monitoring engine: Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> MainThread::WARNING::2017-06-19
>> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Unexpected error
>> Traceback (most recent call last):
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 443, in start_monitoring
>> self._initialize_domain_monitor()
>>   File 
>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>> line 823, in _initialize_domain_monitor
>> raise Exception(msg)
>> Exception: Failed to start monitoring domain
>> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
>> during domain acquisition
>> MainThread::ERROR::2017-06-19
>> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>> Shutting down the agent because of 3 failures in a row!
>>
>> From sanlock.log:
>>
>> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
>> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>> conflicts with name of list1 s5
>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>>
>> From the two other hosts:
>>
>> host 2:
>>
>> vdsm.log
>>
>> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
>> Internal server error (__init__:570)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
>> 565, in _handle_request
>> res = method(**params)
>>   File 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread Martin Sivak
Change the ids so they are distinct. I need to check if there is a way
to read the SPM ids from the engine as using the same numbers would be
the best.

Martin



On Thu, Jun 29, 2017 at 12:46 PM, cmc  wrote:
> Is there any way of recovering from this situation? I'd prefer to fix
> the issue rather than re-deploy, but if there is no recovery path, I
> could perhaps try re-deploying the hosted engine. In which case, would
> the best option be to take a backup of the Hosted Engine, and then
> shut it down, re-initialise the SAN partition (or use another
> partition) and retry the deployment? Would it be better to use the
> older backup from the bare metal engine that I originally used, or use
> a backup from the Hosted Engine? I'm not sure if any VMs have been
> added since switching to Hosted Engine.
>
> Unfortunately I have very little time left to get this working before
> I have to hand it over for eval (by end of Friday).
>
> Here are some log snippets from the cluster that are current
>
> In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:
>
> 2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
> Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
> 3) (clusterlock:282)
> 2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
> Error acquiring host id 3 for domain
> 207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
> self.domain.acquireHostId(self.hostId, async=True)
>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
> self._manifest.acquireHostId(hostId, async)
>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
> self._domainLock.acquireHostId(hostId, async)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
> line 297, in acquireHostId
> raise se.AcquireHostIdFailure(self._sdUUID, e)
> AcquireHostIdFailure: Cannot acquire host id:
> ('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
> lockspace add failure', 'Invalid argument'))
>
> From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:
>
> MainThread::ERROR::2017-06-19
> 13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
> Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-19
> 13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Error while monitoring engine: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::WARNING::2017-06-19
> 13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Unexpected error
> Traceback (most recent call last):
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 443, in start_monitoring
> self._initialize_domain_monitor()
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
> line 823, in _initialize_domain_monitor
> raise Exception(msg)
> Exception: Failed to start monitoring domain
> (sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
> during domain acquisition
> MainThread::ERROR::2017-06-19
> 13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
> Shutting down the agent because of 3 failures in a row!
>
> From sanlock.log:
>
> 2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
> 207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
> conflicts with name of list1 s5
> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>
> From the two other hosts:
>
> host 2:
>
> vdsm.log
>
> 2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
> Internal server error (__init__:570)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line
> 565, in _handle_request
> res = method(**params)
>   File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line
> 202, in _dynamicMethod
> result = fn(*methodArgs)
>   File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
> io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
>   File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
> 'current_values': v.getIoTune()}
>   File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
> result = self.getIoTuneResponse()
>   File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
> res = self._dom.blockIoTune(
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line
> 47, 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-29 Thread cmc
Is there any way of recovering from this situation? I'd prefer to fix
the issue rather than re-deploy, but if there is no recovery path, I
could perhaps try re-deploying the hosted engine. In which case, would
the best option be to take a backup of the Hosted Engine, and then
shut it down, re-initialise the SAN partition (or use another
partition) and retry the deployment? Would it be better to use the
older backup from the bare metal engine that I originally used, or use
a backup from the Hosted Engine? I'm not sure if any VMs have been
added since switching to Hosted Engine.

Unfortunately I have very little time left to get this working before
I have to hand it over for eval (by end of Friday).
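
If redeploy wins out, the backup/restore side is at least mechanical
(a sketch, run on the engine VM; file names are arbitrary):

  engine-backup --mode=backup --file=engine.backup --log=backup.log

and the restore during a fresh deployment is roughly:

  engine-backup --mode=restore --file=engine.backup --log=restore.log \
                --provision-db --restore-permissions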

Here are some log snippets from the cluster that are current

In /var/log/vdsm/vdsm.log on the host that has the Hosted Engine:

2017-06-29 10:50:15,071+0100 INFO  (monitor/207221b) [storage.SANLock]
Acquiring host id for domain 207221b2-959b-426b-b945-18e1adfed62f (id:
3) (clusterlock:282)
2017-06-29 10:50:15,072+0100 ERROR (monitor/207221b) [storage.Monitor]
Error acquiring host id 3 for domain
207221b2-959b-426b-b945-18e1adfed62f (monitor:558)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 555, in _acquireHostId
self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
self._manifest.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
line 297, in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id:
('207221b2-959b-426b-b945-18e1adfed62f', SanlockException(22, 'Sanlock
lockspace add failure', 'Invalid argument'))

From /var/log/ovirt-hosted-engine-ha/agent.log on the same host:

MainThread::ERROR::2017-06-19
13:30:50,592::hosted_engine::822::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor)
Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-19
13:30:50,593::hosted_engine::469::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Error while monitoring engine: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::WARNING::2017-06-19
13:30:50,593::hosted_engine::472::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Unexpected error
Traceback (most recent call last):
  File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 443, in start_monitoring
self._initialize_domain_monitor()
  File 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
line 823, in _initialize_domain_monitor
raise Exception(msg)
Exception: Failed to start monitoring domain
(sd_uuid=207221b2-959b-426b-b945-18e1adfed62f, host_id=1): timeout
during domain acquisition
MainThread::ERROR::2017-06-19
13:30:50,593::hosted_engine::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Shutting down the agent because of 3 failures in a row!

From sanlock.log:

2017-06-29 11:17:06+0100 1194149 [2530]: add_lockspace
207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
conflicts with name of list1 s5
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0

From the two other hosts:

host 2:

vdsm.log

2017-06-29 10:53:47,755+0100 ERROR (jsonrpc/4) [jsonrpc.JsonRpcServer]
Internal server error (__init__:570)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 565, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1454, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/share/vdsm/clientIF.py", line 448, in getAllVmIoTunePolicies
    'current_values': v.getIoTune()}
  File "/usr/share/vdsm/virt/vm.py", line 2803, in getIoTune
    result = self.getIoTuneResponse()
  File "/usr/share/vdsm/virt/vm.py", line 2816, in getIoTuneResponse
    res = self._dom.blockIoTune(
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 47, in __getattr__
    % self.vmid)
NotConnectedError: VM u'a79e6b0e-fff4-4cba-a02c-4c00be151300' was not started yet or was shut down

/var/log/ovirt-hosted-engine-ha/agent.log

MainThread::INFO::2017-06-29
10:56:33,636::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
Found OVF_STORE: imgUUID:222610db-7880-4f4f-8559-a3635fd73555,
volUUID:c6e0d29b-eabf-4a09-a330-df54cfdd73f1

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-28 Thread cmc
Hi Martin,

yes, on two of the machines they have the same host_id. The other has
a different host_id.

To update since yesterday: I reinstalled and deployed Hosted Engine on
the other host (so all three hosts in the cluster now have it
installed). The second one I deployed said it was able to host the
engine (unlike the first I reinstalled), so I tried putting the host
with the Hosted Engine on it into maintenance to see if it would
migrate over. It managed to move all VMs but the Hosted Engine. And
now the host that said it was able to host the engine says
'unavailable due to HA score'. The host that it was trying to migrate
from has now been in 'preparing for maintenance' for the last 12 hours.

The summary is:

kvm-ldn-01 - one of the original, pre-Hosted Engine hosts, reinstalled
with 'Deploy Hosted Engine'. No icon saying it can host the Hosted
Engine; host_id of '2' in /etc/ovirt-hosted-engine/hosted-engine.conf.
'add_lockspace' fails in sanlock.log

kvm-ldn-02 - the other host that was pre-existing before Hosted Engine
was created. Reinstalled with 'Deploy Hosted Engine'. Had an icon
saying that it was able to host the Hosted Engine, but after migration
was attempted when putting kvm-ldn-03 into maintenance, it reports:
'unavailable due to HA score'. It has a host_id of '1' in
/etc/ovirt-hosted-engine/hosted-engine.conf. No errors in sanlock.log

kvm-ldn-03 - this was the host I deployed Hosted Engine on, which was
not part of the original cluster. I restored the bare-metal engine
backup in the Hosted Engine on this host when deploying it, without
error. It currently has the Hosted Engine on it (as the only VM after
I put that host into maintenance to test the HA of Hosted Engine).
Sanlock log shows conflicts

I will look through all the logs for any other errors. Please let me
know if you need any logs or other clarification/information.
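
For the record, the quickest way to compare the IDs across the cluster
is something like this (a sketch; assumes root ssh to each host):

  for h in kvm-ldn-01 kvm-ldn-02 kvm-ldn-03; do
      echo -n "$h: "
      ssh root@$h grep ^host_id /etc/ovirt-hosted-engine/hosted-engine.conf
  done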

Thanks,

Campbell

On Wed, Jun 28, 2017 at 9:25 AM, Martin Sivak  wrote:
> Hi,
>
> can you please check the contents of
> /etc/ovirt-hosted-engine/hosted-engine.conf or
> /etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
> right now) and search for host-id?
>
> Make sure the IDs are different. If they are not, then there is a bug 
> somewhere.
>
> Martin
>
> On Tue, Jun 27, 2017 at 6:26 PM, cmc  wrote:
>> I see this on the host it is trying to migrate in /var/log/sanlock:
>>
>> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
>> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
>> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
>> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
>> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>>
>> The sanlock service is running. Why would this occur?
>>
>> Thanks,
>>
>> C
>>
>> On Tue, Jun 27, 2017 at 5:21 PM, cmc  wrote:
>>> Hi Martin,
>>>
>>> Thanks for the reply. I have done this, and the deployment completed
>>> without error. However, it still will not allow the Hosted Engine
>>> migrate to another host. The
>>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>>> reports:
>>>
>>> 8<---
>>>
>>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>>> High Availability Communications Broker...
>>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>>> Failed to read metadata from
>>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>>   Traceback (most recent call last):
>>>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>>       f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>>   OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>>
>>> 8<---
>>>
>>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>>> perms are slightly different on the host that is running the VM vs the
>>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>>> this a san locking issue?
>>>
>>> Thanks for any help,
>>>
>>> Cam
>>>
>>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak  wrote:
> Should it be? It was not in the instructions for the migration from
> bare-metal to Hosted VM

 The hosted engine will only migrate to hosts that have the services
 running. Please put one 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-28 Thread Martin Sivak
Hi,

can you please check the contents of
/etc/ovirt-hosted-engine/hosted-engine.conf or
/etc/ovirt-hosted-engine-ha/agent.conf (I am not sure which one it is
right now) and search for host-id?

Make sure the IDs are different. If they are not, then there is a bug somewhere.
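
The key in question looks like this (excerpt from a 4.1 host; the value
must be unique per host, and ideally matches the engine's SPM id for
that host):

  # /etc/ovirt-hosted-engine/hosted-engine.conf
  host_id=2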

Martin

On Tue, Jun 27, 2017 at 6:26 PM, cmc  wrote:
> I see this on the host it is trying to migrate in /var/log/sanlock:
>
> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>
> The sanlock service is running. Why would this occur?
>
> Thanks,
>
> C
>
> On Tue, Jun 27, 2017 at 5:21 PM, cmc  wrote:
>> Hi Martin,
>>
>> Thanks for the reply. I have done this, and the deployment completed
>> without error. However, it still will not allow the Hosted Engine
>> migrate to another host. The
>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>> reports:
>>
>> 8<---
>>
>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>> High Availability Communications Broker...
>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>> Failed to read metadata from
>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>>   Traceback (most recent call last):
>>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>>       f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>>   OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>
>> 8<---
>>
>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>> perms are slightly different on the host that is running the VM vs the
>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>> this a san locking issue?
>>
>> Thanks for any help,
>>
>> Cam
>>
>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak  wrote:
 Should it be? It was not in the instructions for the migration from
 bare-metal to Hosted VM
>>>
>>> The hosted engine will only migrate to hosts that have the services
>>> running. Please put one other host to maintenance and select Hosted
>>> engine action: DEPLOY in the reinstall dialog.
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc  wrote:
 I changed the 'os.other.devices.display.protocols.value.3.6 =
 spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
 as 4 and the hosted engine now appears in the list of VMs. I am
 guessing the compatibility version was causing it to use the 3.6
 version. However, I am still unable to migrate the engine VM to
 another host. When I try putting the host it is currently on into
 maintenance, it reports:

 Error while executing action: Cannot switch the Host(s) to Maintenance 
 mode.
 There are no available hosts capable of running the engine VM.

 Running 'hosted-engine --vm-status' still shows 'Engine status:
 unknown stale-data'.

 The ovirt-ha-broker service is only running on one host. It was set to
 'disabled' in systemd. It won't start as there is no
 /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
 Should it be? It was not in the instructions for the migration from
 bare-metal to Hosted VM

 Thanks,

 Cam

 On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
> Hi Tomas,
>
> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
> engine VM, I have:
>
> os.other.devices.display.protocols.value = 
> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
> os.other.devices.display.protocols.value.3.6 = 
> spice/qxl,vnc/cirrus,vnc/qxl
>
> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>
> Is there somewhere else I should be looking?
>
> Thanks,
>
> Cam
>
> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  
> wrote:
>>
>>
>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>  wrote:

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-27 Thread cmc
On the host that has the Hosted Engine VM, the sanlock.log reports:

2017-06-27 17:30:20+0100 1043742 [7307]: add_lockspace
207221b2-959b-426b-b945-18e1adfed62f:3:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
conflicts with name of list1 s5
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0

Again, I'm not sure what has happened here.
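
For anyone else chasing this: the lockspaces a host currently holds can
be listed with (needs root):

  sanlock client status

and I believe the hosted-engine tooling can rebuild the agent lockspace
once all HA agents are stopped (present in 4.1; use with care):

  hosted-engine --reinitialize-lockspace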

On Tue, Jun 27, 2017 at 5:26 PM, cmc  wrote:
> I see this on the host it is trying to migrate in /var/log/sanlock:
>
> 2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
> 207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
> 2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
> busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
> 2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262
>
> The sanlock service is running. Why would this occur?
>
> Thanks,
>
> C
>
> On Tue, Jun 27, 2017 at 5:21 PM, cmc  wrote:
>> Hi Martin,
>>
>> Thanks for the reply. I have done this, and the deployment completed
>> without error. However, it still will not allow the Hosted Engine
>> migrate to another host. The
>> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
>> I re-installed, but the ovirt-ha-broker.service, though it starts,
>> reports:
>>
>> 8<---
>>
>> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
>> High Availability Communications Broker...
>> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
>> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
>> Failed to read metadata from
>> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>   Traceback (most recent call last):
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>       f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>   OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>>
>> 8<---
>>
>> I checked the path, and it exists. I can run 'less -f' on it fine. The
>> perms are slightly different on the host that is running the VM vs the
>> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
>> this a san locking issue?
>>
>> Thanks for any help,
>>
>> Cam
>>
>> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak  wrote:
 Should it be? It was not in the instructions for the migration from
 bare-metal to Hosted VM
>>>
>>> The hosted engine will only migrate to hosts that have the services
>>> running. Please put one other host to maintenance and select Hosted
>>> engine action: DEPLOY in the reinstall dialog.
>>>
>>> Best regards
>>>
>>> Martin Sivak
>>>
>>> On Tue, Jun 27, 2017 at 1:23 PM, cmc  wrote:
 I changed the 'os.other.devices.display.protocols.value.3.6 =
 spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
 as 4 and the hosted engine now appears in the list of VMs. I am
 guessing the compatibility version was causing it to use the 3.6
 version. However, I am still unable to migrate the engine VM to
 another host. When I try putting the host it is currently on into
 maintenance, it reports:

 Error while executing action: Cannot switch the Host(s) to Maintenance 
 mode.
 There are no available hosts capable of running the engine VM.

 Running 'hosted-engine --vm-status' still shows 'Engine status:
 unknown stale-data'.

 The ovirt-ha-broker service is only running on one host. It was set to
 'disabled' in systemd. It won't start as there is no
 /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
 Should it be? It was not in the instructions for the migration from
 bare-metal to Hosted VM

 Thanks,

 Cam

 On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
> Hi Tomas,
>
> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
> engine VM, I have:
>
> os.other.devices.display.protocols.value = 
> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
> os.other.devices.display.protocols.value.3.6 = 
> spice/qxl,vnc/cirrus,vnc/qxl
>
> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>
> Is there somewhere else I should be looking?
>
> Thanks,
>
> Cam
>
> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  
> wrote:
>>
>>
>> On Thu, 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-27 Thread cmc
I see this on the host it is trying to migrate to, in /var/log/sanlock:

2017-06-27 17:10:40+0100 527703 [2407]: s3528 lockspace
207221b2-959b-426b-b945-18e1adfed62f:1:/dev/207221b2-959b-426b-b945-18e1adfed62f/ids:0
2017-06-27 17:13:00+0100 527843 [27446]: s3528 delta_acquire host_id 1
busy1 1 2 1042692 3d4ec963-8486-43a2-a7d9-afa82508f89f.kvm-ldn-03
2017-06-27 17:13:01+0100 527844 [2407]: s3528 add_lockspace fail result -262

The sanlock service is running. Why would this occur?
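
To see which host ids are actually writing delta leases on that
storage, dumping the ids volume should work (a sketch, as root; the
path is taken from the log above):

  sanlock direct dump /dev/207221b2-959b-426b-b945-18e1adfed62f/ids

Two hosts renewing the same lease slot there would explain the
add_lockspace failures.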

Thanks,

C

On Tue, Jun 27, 2017 at 5:21 PM, cmc  wrote:
> Hi Martin,
>
> Thanks for the reply. I have done this, and the deployment completed
> without error. However, it still will not allow the Hosted Engine
> migrate to another host. The
> /etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
> I re-installed, but the ovirt-ha-broker.service, though it starts,
> reports:
>
> 8<---
>
> Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
> High Availability Communications Broker...
> Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
> ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
> Failed to read metadata from
> /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
>   Traceback (most recent call last):
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
>       f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
>   OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'
>
> 8<---
>
> I checked the path, and it exists. I can run 'less -f' on it fine. The
> perms are slightly different on the host that is running the VM vs the
> one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
> this a san locking issue?
>
> Thanks for any help,
>
> Cam
>
> On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak  wrote:
>>> Should it be? It was not in the instructions for the migration from
>>> bare-metal to Hosted VM
>>
>> The hosted engine will only migrate to hosts that have the services
>> running. Please put one other host to maintenance and select Hosted
>> engine action: DEPLOY in the reinstall dialog.
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Tue, Jun 27, 2017 at 1:23 PM, cmc  wrote:
>>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>>> as 4 and the hosted engine now appears in the list of VMs. I am
>>> guessing the compatibility version was causing it to use the 3.6
>>> version. However, I am still unable to migrate the engine VM to
>>> another host. When I try putting the host it is currently on into
>>> maintenance, it reports:
>>>
>>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>>> There are no available hosts capable of running the engine VM.
>>>
>>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>>> unknown stale-data'.
>>>
>>> The ovirt-ha-broker service is only running on one host. It was set to
>>> 'disabled' in systemd. It won't start as there is no
>>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>>> Should it be? It was not in the instructions for the migration from
>>> bare-metal to Hosted VM
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
 Hi Tomas,

 So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
 engine VM, I have:

 os.other.devices.display.protocols.value = 
 spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
 os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl

 That seems to match - I assume since this is 4.1, the 3.6 should not apply

 Is there somewhere else I should be looking?

 Thanks,

 Cam

 On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  
 wrote:
>
>
> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>  wrote:
>>
>>
>> > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
>> >
>> > Tomas, what fields are needed in a VM to pass the check that causes
>> > the following error?
>> >
>> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>> > 'ImportVm'
>> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>> >
>> > 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-27 Thread cmc
Hi Martin,

Thanks for the reply. I have done this, and the deployment completed
without error. However, it still will not allow the Hosted Engine
migrate to another host. The
/etc/ovirt-hosted-engine/hosted-engine.conf got created ok on the host
I re-installed, but the ovirt-ha-broker.service, though it starts,
reports:

8<---

Jun 27 14:58:26 kvm-ldn-01 systemd[1]: Starting oVirt Hosted Engine
High Availability Communications Broker...
Jun 27 14:58:27 kvm-ldn-01 ovirt-ha-broker[6101]: ovirt-ha-broker
ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker ERROR
Failed to read metadata from
/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata
  Traceback (most recent call last):
    File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
      f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
  OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/hosted-engine.metadata'

8<---

I checked the path, and it exists. I can run 'less -f' on it fine. The
perms are slightly different on the host that is running the VM vs the
one that is reporting errors (600 vs 660), ownership is vdsm:qemu. Is
this a san locking issue?
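
One more data point worth checking: as far as I know the files under
ha_agent/ are symlinks into the storage domain, so it is worth making
sure they resolve on every host:

  ls -lL /rhev/data-center/mnt/blockSD/207221b2-959b-426b-b945-18e1adfed62f/ha_agent/

The -L makes ls follow the links, so a dangling symlink shows up as an
error rather than looking healthy.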

Thanks for any help,

Cam

On Tue, Jun 27, 2017 at 1:41 PM, Martin Sivak  wrote:
>> Should it be? It was not in the instructions for the migration from
>> bare-metal to Hosted VM
>
> The hosted engine will only migrate to hosts that have the services
> running. Please put one other host to maintenance and select Hosted
> engine action: DEPLOY in the reinstall dialog.
>
> Best regards
>
> Martin Sivak
>
> On Tue, Jun 27, 2017 at 1:23 PM, cmc  wrote:
>> I changed the 'os.other.devices.display.protocols.value.3.6 =
>> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
>> as 4 and the hosted engine now appears in the list of VMs. I am
>> guessing the compatibility version was causing it to use the 3.6
>> version. However, I am still unable to migrate the engine VM to
>> another host. When I try putting the host it is currently on into
>> maintenance, it reports:
>>
>> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
>> There are no available hosts capable of running the engine VM.
>>
>> Running 'hosted-engine --vm-status' still shows 'Engine status:
>> unknown stale-data'.
>>
>> The ovirt-ha-broker service is only running on one host. It was set to
>> 'disabled' in systemd. It won't start as there is no
>> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
>> Should it be? It was not in the instructions for the migration from
>> bare-metal to Hosted VM
>>
>> Thanks,
>>
>> Cam
>>
>> On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
>>> Hi Tomas,
>>>
>>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>>> engine VM, I have:
>>>
>>> os.other.devices.display.protocols.value = 
>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>>
>>> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>>>
>>> Is there somewhere else I should be looking?
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  wrote:


 On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
  wrote:
>
>
> > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
> >
> > Tomas, what fields are needed in a VM to pass the check that causes
> > the following error?
> >
> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
> > 'ImportVm'
> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> >
> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>
> to match the OS and VM Display type;-)
> Configuration is in osinfo… e.g. if that is an import from older releases
> on Linux this is typically caused by the change of cirrus to vga for
> non-SPICE VMs


 yep, the default supported combinations for 4.0+ is this:
 os.other.devices.display.protocols.value =
 spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus

>
>
> >
> > Thanks.
> >
> > On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
> >> Hi Martin,
> >>
> >>>
> >>> just as a random comment, do you still have the database backup from
> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-27 Thread Martin Sivak
> Should it be? It was not in the instructions for the migration from
> bare-metal to Hosted VM

The hosted engine will only migrate to hosts that have the services
running. Please put one other host to maintenance and select Hosted
engine action: DEPLOY in the reinstall dialog.

Best regards

Martin Sivak

On Tue, Jun 27, 2017 at 1:23 PM, cmc  wrote:
> I changed the 'os.other.devices.display.protocols.value.3.6 =
> spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
> as 4 and the hosted engine now appears in the list of VMs. I am
> guessing the compatibility version was causing it to use the 3.6
> version. However, I am still unable to migrate the engine VM to
> another host. When I try putting the host it is currently on into
> maintenance, it reports:
>
> Error while executing action: Cannot switch the Host(s) to Maintenance mode.
> There are no available hosts capable of running the engine VM.
>
> Running 'hosted-engine --vm-status' still shows 'Engine status:
> unknown stale-data'.
>
> The ovirt-ha-broker service is only running on one host. It was set to
> 'disabled' in systemd. It won't start as there is no
> /etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
> Should it be? It was not in the instructions for the migration from
> bare-metal to Hosted VM
>
> Thanks,
>
> Cam
>
> On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
>> Hi Tomas,
>>
>> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
>> engine VM, I have:
>>
>> os.other.devices.display.protocols.value = 
>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>>
>> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>>
>> Is there somewhere else I should be looking?
>>
>> Thanks,
>>
>> Cam
>>
>> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  wrote:
>>>
>>>
>>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>>  wrote:


 > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
 >
 > Tomas, what fields are needed in a VM to pass the check that causes
 > the following error?
 >
 > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
 > 'ImportVm'
 > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
 >
 > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS

 to match the OS and VM Display type ;-)
 Configuration is in osinfo… e.g. if that is an import from older releases on
 Linux, this is typically caused by the change from cirrus to vga for
 non-SPICE VMs
>>>
>>>
>>> yep, the default supported combinations for 4.0+ is this:
>>> os.other.devices.display.protocols.value =
>>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>>


 >
 > Thanks.
 >
 > On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
 >> Hi Martin,
 >>
 >>>
 >>> just as a random comment, do you still have the database backup from
 >>> the bare metal -> VM attempt? It might be possible to just try again
 >>> using it. Or in the worst case.. update the offending value there
 >>> before restoring it to the new engine instance.
 >>
 >> I still have the backup. I'd rather do the latter, as re-running the
 >> HE deployment is quite lengthy and involved (I have to re-initialise
 >> the FC storage each time). Do you know what the offending value(s)
 >> would be? Would it be in the Postgres DB or in a config file
 >> somewhere?
 >>
 >> Cheers,
 >>
 >> Cam
 >>
 >>> Regards
 >>>
 >>> Martin Sivak
 >>>
 >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
  Hi Yanir,
 
  Thanks for the reply.
 
 > First of all, maybe a chain reaction of:
 > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
 > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
 > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
 > is causing the hosted engine vm not to be set up correctly, and further
 > actions were made when the hosted engine vm wasn't in a stable state.
 >
 > As for now, are you trying to revert back to a previous/initial state?
 
  I'm not trying to revert it to a previous state for now. This was a
  migration from a bare metal engine, and it didn't report any error
  during the migration. I'd had some problems on my first attempts at
  this migration, whereby it never completed (due to a proxy issue) but
  I managed to resolve 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-27 Thread cmc
I changed the 'os.other.devices.display.protocols.value.3.6 =
spice/qxl,vnc/cirrus,vnc/qxl' line to have the same display protocols
as the 4.0+ line, and the hosted engine now appears in the list of VMs. I am
guessing the compatibility version was causing it to use the 3.6
version. However, I am still unable to migrate the engine VM to
another host. When I try putting the host it is currently on into
maintenance, it reports:

Error while executing action: Cannot switch the Host(s) to Maintenance mode.
There are no available hosts capable of running the engine VM.

Running 'hosted-engine --vm-status' still shows 'Engine status:
unknown stale-data'.
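
(A minimal first step against the stale-data report, assuming the HA
services are actually installed on the host showing it, is usually:

  systemctl restart ovirt-ha-broker ovirt-ha-agent
  hosted-engine --vm-status   # the timestamps should start updating again
)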

The ovirt-ha-broker service is only running on one host. It was set to
'disabled' in systemd. It won't start as there is no
/etc/ovirt-hosted-engine/hosted-engine.conf on the other two hosts.
Should it be? It was not in the instructions for the migration from
bare-metal to Hosted VM

Thanks,

Cam

On Thu, Jun 22, 2017 at 1:07 PM, cmc  wrote:
> Hi Tomas,
>
> So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
> engine VM, I have:
>
> os.other.devices.display.protocols.value = 
> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
> os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl
>
> That seems to match - I assume since this is 4.1, the 3.6 should not apply
>
> Is there somewhere else I should be looking?
>
> Thanks,
>
> Cam
>
> On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  wrote:
>>
>>
>> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>>  wrote:
>>>
>>>
>>> > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
>>> >
>>> > Tomas, what fields are needed in a VM to pass the check that causes
>>> > the following error?
>>> >
>>> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>>> > 'ImportVm'
>>> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>> >
>>> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>>
>>> to match the OS and VM Display type ;-)
>>> Configuration is in osinfo… e.g. if that is an import from older releases on
>>> Linux, this is typically caused by the change from cirrus to vga for
>>> non-SPICE VMs
>>
>>
>> yep, the default supported combinations for 4.0+ is this:
>> os.other.devices.display.protocols.value =
>> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>>
>>>
>>>
>>> >
>>> > Thanks.
>>> >
>>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
>>> >> Hi Martin,
>>> >>
>>> >>>
>>> >>> just as a random comment, do you still have the database backup from
>>> >>> the bare metal -> VM attempt? It might be possible to just try again
>>> >>> using it. Or in the worst case.. update the offending value there
>>> >>> before restoring it to the new engine instance.
>>> >>
>>> >> I still have the backup. I'd rather do the latter, as re-running the
>>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>>> >> the FC storage each time). Do you know what the offending value(s)
>>> >> would be? Would it be in the Postgres DB or in a config file
>>> >> somewhere?
>>> >>
>>> >> Cheers,
>>> >>
>>> >> Cam
>>> >>
>>> >>> Regards
>>> >>>
>>> >>> Martin Sivak
>>> >>>
>>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
>>>  Hi Yanir,
>>> 
>>>  Thanks for the reply.
>>> 
>>> > First of all, maybe a chain reaction of:
>>> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>> > is causing the hosted engine vm not to be set up correctly, and further
>>> > actions were made when the hosted engine vm wasn't in a stable state.
>>> >
>>> > As for now, are you trying to revert back to a previous/initial state?
>>> 
>>>  I'm not trying to revert it to a previous state for now. This was a
>>>  migration from a bare metal engine, and it didn't report any error
>>>  during the migration. I'd had some problems on my first attempts at
>>>  this migration, whereby it never completed (due to a proxy issue) but
>>>  I managed to resolve this. Do you know of a way to get the Hosted
>>>  Engine VM into a stable state, without rebuilding the entire cluster
>>>  from scratch (since I have a lot of VMs on it)?
>>> 
>>>  Thanks for any help.
>>> 
>>>  Regards,
>>> 
>>>  Cam
>>> 
>>> > Regards,
>>> > Yanir
>>> >
>>> > On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>>> >>
>>> >> Hi Jenny/Martin,
>>> >>
>>> >> Any idea what I can do here? The hosted engine VM has no 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread cmc
Hi Tomas,

So in my /usr/share/ovirt-engine/conf/osinfo-defaults.properties on my
engine VM, I have:

os.other.devices.display.protocols.value = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/cirrus,vnc/qxl

That seems to match - I assume since this is 4.1, the 3.6 should not apply

Is there somewhere else I should be looking?
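
(One caveat if that file does need changing: edits to the shipped
osinfo-defaults.properties are lost on upgrade. As far as I know the
supported pattern is a numbered override file; the file name below is
just an example:

  echo 'os.other.devices.display.protocols.value.3.6 = spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus' \
      > /etc/ovirt-engine/osinfo.conf.d/90-display.properties
  systemctl restart ovirt-engine
)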

Thanks,

Cam

On Thu, Jun 22, 2017 at 11:40 AM, Tomas Jelinek  wrote:
>
>
> On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek
>  wrote:
>>
>>
>> > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
>> >
>> > Tomas, what fields are needed in a VM to pass the check that causes
>> > the following error?
>> >
>> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>> > 'ImportVm'
>> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>> >
>> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>
>> to match the OS and VM Display type ;-)
>> Configuration is in osinfo… e.g. if that is an import from older releases on
>> Linux, this is typically caused by the change from cirrus to vga for
>> non-SPICE VMs
>
>
> yep, the default supported combinations for 4.0+ is this:
> os.other.devices.display.protocols.value =
> spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus
>
>>
>>
>> >
>> > Thanks.
>> >
>> > On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
>> >> Hi Martin,
>> >>
>> >>>
>> >>> just as a random comment, do you still have the database backup from
>> >>> the bare metal -> VM attempt? It might be possible to just try again
>> >>> using it. Or in the worst case.. update the offending value there
>> >>> before restoring it to the new engine instance.
>> >>
>> >> I still have the backup. I'd rather do the latter, as re-running the
>> >> HE deployment is quite lengthy and involved (I have to re-initialise
>> >> the FC storage each time). Do you know what the offending value(s)
>> >> would be? Would it be in the Postgres DB or in a config file
>> >> somewhere?
>> >>
>> >> Cheers,
>> >>
>> >> Cam
>> >>
>> >>> Regards
>> >>>
>> >>> Martin Sivak
>> >>>
>> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
>>  Hi Yanir,
>> 
>>  Thanks for the reply.
>> 
>> > First of all, maybe a chain reaction of:
>> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>> > is causing the hosted engine vm not to be set up correctly, and further
>> > actions were made when the hosted engine vm wasn't in a stable state.
>> >
>> > As for now, are you trying to revert back to a previous/initial state?
>> 
>>  I'm not trying to revert it to a previous state for now. This was a
>>  migration from a bare metal engine, and it didn't report any error
>>  during the migration. I'd had some problems on my first attempts at
>>  this migration, whereby it never completed (due to a proxy issue) but
>>  I managed to resolve this. Do you know of a way to get the Hosted
>>  Engine VM into a stable state, without rebuilding the entire cluster
>>  from scratch (since I have a lot of VMs on it)?
>> 
>>  Thanks for any help.
>> 
>>  Regards,
>> 
>>  Cam
>> 
>> > Regards,
>> > Yanir
>> >
>> > On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>> >>
>> >> Hi Jenny/Martin,
>> >>
>> >> Any idea what I can do here? The hosted engine VM has no log on any
>> >> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>> >> host into maintenance, e.g., to upgrade it that I created it on
>> >> (which
>> >> I think is hosting it), or if it fails for any reason, it won't get
>> >> migrated to another host, and I will not be able to manage the
>> >> cluster. It seems to be a very dangerous position to be in.
>> >>
>> >> Thanks,
>> >>
>> >> Cam
>> >>
>> >> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
>> >>> Thanks Martin. The hosts are all part of the same cluster.
>> >>>
>> >>> I get these errors in the engine.log on the engine:
>> >>>
>> >>> 2017-06-19 03:28:05,030Z WARN
>> >>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> >>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
>> >>> 'ImportVm'
>> >>> failed for user SYSTEM. Reasons:
>> >>>
>> >>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>> >>> 2017-06-19 03:28:05,030Z INFO
>> >>> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread Tomas Jelinek
On Thu, Jun 22, 2017 at 12:38 PM, Michal Skrivanek <
michal.skriva...@redhat.com> wrote:

>
> > On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
> >
> > Tomas, what fields are needed in a VM to pass the check that causes
> > the following error?
> >
> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action
> 'ImportVm'
> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_
> TYPE_IS_NOT_SUPPORTED_BY_OS
>
> to match the OS and VM Display type ;-)
> Configuration is in osinfo… e.g. if that is an import from older releases on
> Linux, this is typically caused by the change from cirrus to vga for
> non-SPICE VMs
>

yep, the default supported combinations for 4.0+ is this:
os.other.devices.display.protocols.value =
spice/qxl,vnc/vga,vnc/qxl,vnc/cirrus


>
> >
> > Thanks.
> >
> > On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
> >> Hi Martin,
> >>
> >>>
> >>> just as a random comment, do you still have the database backup from
> >>> the bare metal -> VM attempt? It might be possible to just try again
> >>> using it. Or in the worst case.. update the offending value there
> >>> before restoring it to the new engine instance.
> >>
> >> I still have the backup. I'd rather do the latter, as re-running the
> >> HE deployment is quite lengthy and involved (I have to re-initialise
> >> the FC storage each time). Do you know what the offending value(s)
> >> would be? Would it be in the Postgres DB or in a config file
> >> somewhere?
> >>
> >> Cheers,
> >>
> >> Cam
> >>
> >>> Regards
> >>>
> >>> Martin Sivak
> >>>
> >>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
>  Hi Yanir,
> 
>  Thanks for the reply.
> 
> > First of all, maybe a chain reaction of:
> > WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> > ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> > is causing the hosted engine vm not to be set up correctly, and further
> > actions were made when the hosted engine vm wasn't in a stable state.
> >
> > As for now, are you trying to revert back to a previous/initial state?
> 
>  I'm not trying to revert it to a previous state for now. This was a
>  migration from a bare metal engine, and it didn't report any error
>  during the migration. I'd had some problems on my first attempts at
>  this migration, whereby it never completed (due to a proxy issue) but
>  I managed to resolve this. Do you know of a way to get the Hosted
>  Engine VM into a stable state, without rebuilding the entire cluster
>  from scratch (since I have a lot of VMs on it)?
> 
>  Thanks for any help.
> 
>  Regards,
> 
>  Cam
> 
> > Regards,
> > Yanir
> >
> > On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
> >>
> >> Hi Jenny/Martin,
> >>
> >> Any idea what I can do here? The hosted engine VM has no log on any
> >> host in /var/log/libvirt/qemu, and I fear that if I need to put the
> >> host into maintenance, e.g., to upgrade it that I created it on
> (which
> >> I think is hosting it), or if it fails for any reason, it won't get
> >> migrated to another host, and I will not be able to manage the
> >> cluster. It seems to be a very dangerous position to be in.
> >>
> >> Thanks,
> >>
> >> Cam
> >>
> >> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
> >>> Thanks Martin. The hosts are all part of the same cluster.
> >>>
> >>> I get these errors in the engine.log on the engine:
> >>>
> >>> 2017-06-19 03:28:05,030Z WARN
> >>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> >>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action
> 'ImportVm'
> >>> failed for user SYSTEM. Reasons:
> >>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> >>> 2017-06-19 03:28:05,030Z INFO
> >>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> >>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
> >>> 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
> >>> HostedEngine=]',
> >>> sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
> >>> 2017-06-19 03:28:05,030Z ERROR
> >>> [org.ovirt.engine.core.bll.HostedEngineImporter]
> >>> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread Michal Skrivanek

> On 22 Jun 2017, at 12:31, Martin Sivak  wrote:
> 
> Tomas, what fields are needed in a VM to pass the check that causes
> the following error?
> 
> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS

to match the OS and VM Display type ;-)
Configuration is in osinfo… e.g. if that is an import from older releases on Linux,
this is typically caused by the change from cirrus to vga for non-SPICE VMs

> 
> Thanks.
> 
> On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
>> Hi Martin,
>> 
>>> 
>>> just as a random comment, do you still have the database backup from
>>> the bare metal -> VM attempt? It might be possible to just try again
>>> using it. Or in the worst case.. update the offending value there
>>> before restoring it to the new engine instance.
>> 
>> I still have the backup. I'd rather do the latter, as re-running the
>> HE deployment is quite lengthy and involved (I have to re-initialise
>> the FC storage each time). Do you know what the offending value(s)
>> would be? Would it be in the Postgres DB or in a config file
>> somewhere?
>> 
>> Cheers,
>> 
>> Cam
>> 
>>> Regards
>>> 
>>> Martin Sivak
>>> 
>>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
 Hi Yanir,
 
 Thanks for the reply.
 
> First of all, maybe a chain reaction of:
> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> is causing the hosted engine vm not to be set up correctly, and further
> actions were made when the hosted engine vm wasn't in a stable state.
>
> As for now, are you trying to revert back to a previous/initial state?
 
 I'm not trying to revert it to a previous state for now. This was a
 migration from a bare metal engine, and it didn't report any error
 during the migration. I'd had some problems on my first attempts at
 this migration, whereby it never completed (due to a proxy issue) but
 I managed to resolve this. Do you know of a way to get the Hosted
 Engine VM into a stable state, without rebuilding the entire cluster
 from scratch (since I have a lot of VMs on it)?
 
 Thanks for any help.
 
 Regards,
 
 Cam
 
> Regards,
> Yanir
> 
> On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>> 
>> Hi Jenny/Martin,
>> 
>> Any idea what I can do here? The hosted engine VM has no log on any
>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>> host into maintenance, e.g., to upgrade it that I created it on (which
>> I think is hosting it), or if it fails for any reason, it won't get
>> migrated to another host, and I will not be able to manage the
>> cluster. It seems to be a very dangerous position to be in.
>> 
>> Thanks,
>> 
>> Cam
>> 
>> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
>>> Thanks Martin. The hosts are all part of the same cluster.
>>> 
>>> I get these errors in the engine.log on the engine:
>>> 
>>> 2017-06-19 03:28:05,030Z WARN
>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>> failed for user SYST
>>> EM. Reasons:
>>> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>> 2017-06-19 03:28:05,030Z INFO
>>> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>> 'EngineLock:{exclusiveLocks='[a
>>> 79e6b0e-fff4-4cba-a02c-4c00be151300=>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>>> HostedEngine=]',
>>> sharedLocks=
>>> '[a79e6b0e-fff4-4cba-a02c-4c00be151300=>> ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>> 2017-06-19 03:28:05,030Z ERROR
>>> [org.ovirt.engine.core.bll.HostedEngineImporter]
>>> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>>> Engine VM
>>> 
>>> The sanlock.log reports conflicts on that same host, and a different
>>> error on the other hosts, not sure if they are related.
>>> 
>>> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
>>> which I deployed the hosted engine VM on:
>>> 
>>> MainThread::ERROR::2017-06-19
>>> 
>>> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread Martin Sivak
Tomas, what fields are needed in a VM to pass the check that causes
the following error?

 WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
 failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
 ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS

Thanks.

On Thu, Jun 22, 2017 at 12:19 PM, cmc  wrote:
> Hi Martin,
>
>>
>> just as a random comment, do you still have the database backup from
>> the bare metal -> VM attempt? It might be possible to just try again
>> using it. Or in the worst case.. update the offending value there
>> before restoring it to the new engine instance.
>
> I still have the backup. I'd rather do the latter, as re-running the
> HE deployment is quite lengthy and involved (I have to re-initialise
> the FC storage each time). Do you know what the offending value(s)
> would be? Would it be in the Postgres DB or in a config file
> somewhere?
>
> Cheers,
>
> Cam
>
>> Regards
>>
>> Martin Sivak
>>
>> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
>>> Hi Yanir,
>>>
>>> Thanks for the reply.
>>>
 First of all, maybe a chain reaction of:
 WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
 failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
 ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
 is causing the hosted engine vm not to be set up correctly, and further
 actions were made when the hosted engine vm wasn't in a stable state.

 As for now, are you trying to revert back to a previous/initial state?
>>>
>>> I'm not trying to revert it to a previous state for now. This was a
>>> migration from a bare metal engine, and it didn't report any error
>>> during the migration. I'd had some problems on my first attempts at
>>> this migration, whereby it never completed (due to a proxy issue) but
>>> I managed to resolve this. Do you know of a way to get the Hosted
>>> Engine VM into a stable state, without rebuilding the entire cluster
>>> from scratch (since I have a lot of VMs on it)?
>>>
>>> Thanks for any help.
>>>
>>> Regards,
>>>
>>> Cam
>>>
 Regards,
 Yanir

 On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>
> Hi Jenny/Martin,
>
> Any idea what I can do here? The hosted engine VM has no log on any
> host in /var/log/libvirt/qemu, and I fear that if I need to put the
> host into maintenance, e.g., to upgrade it that I created it on (which
> I think is hosting it), or if it fails for any reason, it won't get
> migrated to another host, and I will not be able to manage the
> cluster. It seems to be a very dangerous position to be in.
>
> Thanks,
>
> Cam
>
> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
> > Thanks Martin. The hosts are all part of the same cluster.
> >
> > I get these errors in the engine.log on the engine:
> >
> > 2017-06-19 03:28:05,030Z WARN
> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> > failed for user SYSTEM. Reasons:
> > VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> > 2017-06-19 03:28:05,030Z INFO
> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
> > 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
> > HostedEngine=]',
> > sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
> > 2017-06-19 03:28:05,030Z ERROR
> > [org.ovirt.engine.core.bll.HostedEngineImporter]
> > (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
> > Engine VM
> >
> > The sanlock.log reports conflicts on that same host, and a different
> > error on the other hosts, not sure if they are related.
> >
> > And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
> > which I deployed the hosted engine VM on:
> >
> > MainThread::ERROR::2017-06-19
> >
> > 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> > Unable to extract HEVM OVF
> > MainThread::ERROR::2017-06-19
> >
> > 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> > Failed extracting VM OVF from the OVF_STORE volume, falling back to
> > initial vm.conf
> >
> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread cmc
Hi Martin,

>
> just as a random comment, do you still have the database backup from
> the bare metal -> VM attempt? It might be possible to just try again
> using it. Or in the worst case.. update the offending value there
> before restoring it to the new engine instance.

I still have the backup. I'd rather do the latter, as re-running the
HE deployment is quite lengthy and involved (I have to re-initialise
the FC storage each time). Do you know what the offending value(s)
would be? Would it be in the Postgres DB or in a config file
somewhere?

Cheers,

Cam

> Regards
>
> Martin Sivak
>
> On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
>> Hi Yanir,
>>
>> Thanks for the reply.
>>
>>> First of all, maybe a chain reaction of:
>>> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>> is causing the hosted engine vm not to be set up correctly, and further
>>> actions were made when the hosted engine vm wasn't in a stable state.
>>>
>>> As for now, are you trying to revert back to a previous/initial state?
>>
>> I'm not trying to revert it to a previous state for now. This was a
>> migration from a bare metal engine, and it didn't report any error
>> during the migration. I'd had some problems on my first attempts at
>> this migration, whereby it never completed (due to a proxy issue) but
>> I managed to resolve this. Do you know of a way to get the Hosted
>> Engine VM into a stable state, without rebuilding the entire cluster
>> from scratch (since I have a lot of VMs on it)?
>>
>> Thanks for any help.
>>
>> Regards,
>>
>> Cam
>>
>>> Regards,
>>> Yanir
>>>
>>> On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:

 Hi Jenny/Martin,

 Any idea what I can do here? The hosted engine VM has no log on any
 host in /var/log/libvirt/qemu, and I fear that if I need to put the
 host into maintenance, e.g., to upgrade it that I created it on (which
 I think is hosting it), or if it fails for any reason, it won't get
 migrated to another host, and I will not be able to manage the
 cluster. It seems to be a very dangerous position to be in.

 Thanks,

 Cam

 On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
 > Thanks Martin. The hosts are all part of the same cluster.
 >
 > I get these errors in the engine.log on the engine:
 >
 > 2017-06-19 03:28:05,030Z WARN
 > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
 > failed for user SYSTEM. Reasons:
 > VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
 > 2017-06-19 03:28:05,030Z INFO
 > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
 > (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
 > 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
 > HostedEngine=]',
 > sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
 > 2017-06-19 03:28:05,030Z ERROR
 > [org.ovirt.engine.core.bll.HostedEngineImporter]
 > (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
 > Engine VM
 >
 > The sanlock.log reports conflicts on that same host, and a different
 > error on the other hosts, not sure if they are related.
 >
 > And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
 > which I deployed the hosted engine VM on:
 >
 > MainThread::ERROR::2017-06-19
 >
 > 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
 > Unable to extract HEVM OVF
 > MainThread::ERROR::2017-06-19
 >
 > 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
 > Failed extracting VM OVF from the OVF_STORE volume, falling back to
 > initial vm.conf
 >
 > I've seen some of these issues reported in bugzilla, but they were for
 > older versions of oVirt (and appear to be resolved).
 >
 > I will install that package on the other two hosts, for which I will
 > put them in maintenance as vdsm is installed as an upgrade. I guess
 > restarting vdsm is a good idea after that?
 >
 > Thanks,
 >
 > Campbell
 >
 > On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak 
 > wrote:
 >> Hi,
 >>
 >> you do not have to install it on all hosts. But you should have 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread Martin Sivak
Hi,

just as a random comment, do you still have the database backup from
the bare metal -> VM attempt? It might be possible to just try again
using it. Or, in the worst case, update the offending value there
before restoring it to the new engine instance.
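
A sketch of what "update the offending value" could look like, assuming
the engine database is named "engine" and that vm_static / vm_device are
still the relevant tables (verify against your own schema before
changing anything):

  sudo -u postgres psql engine -c \
      "SELECT vm_guid, vm_name, os, default_display_type FROM vm_static WHERE vm_name = 'HostedEngine';"
  sudo -u postgres psql engine -c \
      "SELECT device, type FROM vm_device WHERE type = 'video' AND vm_id = (SELECT vm_guid FROM vm_static WHERE vm_name = 'HostedEngine');"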

Regards

Martin Sivak

On Thu, Jun 22, 2017 at 11:39 AM, cmc  wrote:
> Hi Yanir,
>
> Thanks for the reply.
>
>> First of all, maybe a chain reaction of:
>> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
>> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>> is causing the hosted engine vm not to be set up correctly, and further
>> actions were made when the hosted engine vm wasn't in a stable state.
>>
>> As for now, are you trying to revert back to a previous/initial state?
>
> I'm not trying to revert it to a previous state for now. This was a
> migration from a bare metal engine, and it didn't report any error
> during the migration. I'd had some problems on my first attempts at
> this migration, whereby it never completed (due to a proxy issue) but
> I managed to resolve this. Do you know of a way to get the Hosted
> Engine VM into a stable state, without rebuilding the entire cluster
> from scratch (since I have a lot of VMs on it)?
>
> Thanks for any help.
>
> Regards,
>
> Cam
>
>> Regards,
>> Yanir
>>
>> On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>>>
>>> Hi Jenny/Martin,
>>>
>>> Any idea what I can do here? The hosted engine VM has no log on any
>>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>>> host into maintenance, e.g., to upgrade it that I created it on (which
>>> I think is hosting it), or if it fails for any reason, it won't get
>>> migrated to another host, and I will not be able to manage the
>>> cluster. It seems to be a very dangerous position to be in.
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
>>> > Thanks Martin. The hosts are all part of the same cluster.
>>> >
>>> > I get these errors in the engine.log on the engine:
>>> >
>>> > 2017-06-19 03:28:05,030Z WARN
>>> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>>> > failed for user SYSTEM. Reasons:
>>> > VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>>> > 2017-06-19 03:28:05,030Z INFO
>>> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>>> > (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>>> > 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>>> > HostedEngine=]',
>>> > sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>>> > 2017-06-19 03:28:05,030Z ERROR
>>> > [org.ovirt.engine.core.bll.HostedEngineImporter]
>>> > (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>>> > Engine VM
>>> >
>>> > The sanlock.log reports conflicts on that same host, and a different
>>> > error on the other hosts, not sure if they are related.
>>> >
>>> > And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
>>> > which I deployed the hosted engine VM on:
>>> >
>>> > MainThread::ERROR::2017-06-19
>>> >
>>> > 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>> > Unable to extract HEVM OVF
>>> > MainThread::ERROR::2017-06-19
>>> >
>>> > 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>> > Failed extracting VM OVF from the OVF_STORE volume, falling back to
>>> > initial vm.conf
>>> >
>>> > I've seen some of these issues reported in bugzilla, but they were for
>>> > older versions of oVirt (and appear to be resolved).
>>> >
>>> > I will install that package on the other two hosts, for which I will
>>> > put them in maintenance as vdsm is installed as an upgrade. I guess
>>> > restarting vdsm is a good idea after that?
>>> >
>>> > Thanks,
>>> >
>>> > Campbell
>>> >
>>> > On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak 
>>> > wrote:
>>> >> Hi,
>>> >>
>>> >> you do not have to install it on all hosts. But you should have more
>>> >> than one and ideally all hosted engine enabled nodes should belong to
>>> >> the same engine cluster.
>>> >>
>>> >> Best regards
>>> >>
>>> >> Martin Sivak
>>> >>
>>> >> On Wed, Jun 21, 2017 at 11:29 AM, cmc  wrote:
>>> >>> Hi Jenny,
>>> >>>
>>> >>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>>> >>> Could that be the reason it is failing to see it properly?
>>> 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread cmc
Hi Yanir,

Thanks for the reply.

> First of all, maybe a chain reaction of:
> WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
> ,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> is causing the hosted engine vm not to be set up correctly, and further
> actions were made when the hosted engine vm wasn't in a stable state.
>
> As for now, are you trying to revert back to a previous/initial state?

I'm not trying to revert it to a previous state for now. This was a
migration from a bare metal engine, and it didn't report any error
during the migration. I'd had some problems on my first attempts at
this migration, whereby it never completed (due to a proxy issue) but
I managed to resolve this. Do you know of a way to get the Hosted
Engine VM into a stable state, without rebuilding the entire cluster
from scratch (since I have a lot of VMs on it)?
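
(Before anything drastic, it is probably worth capturing a baseline from
each host; a sketch, assuming stock 4.1 tooling:

  hosted-engine --vm-status
  hosted-engine --check-liveliness   # probes the engine's health servlet
  systemctl status ovirt-ha-agent ovirt-ha-broker
)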

Thanks for any help.

Regards,

Cam

> Regards,
> Yanir
>
> On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:
>>
>> Hi Jenny/Martin,
>>
>> Any idea what I can do here? The hosted engine VM has no log on any
>> host in /var/log/libvirt/qemu, and I fear that if I need to put the
>> host into maintenance, e.g., to upgrade it that I created it on (which
>> I think is hosting it), or if it fails for any reason, it won't get
>> migrated to another host, and I will not be able to manage the
>> cluster. It seems to be a very dangerous position to be in.
>>
>> Thanks,
>>
>> Cam
>>
>> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
>> > Thanks Martin. The hosts are all part of the same cluster.
>> >
>> > I get these errors in the engine.log on the engine:
>> >
>> > 2017-06-19 03:28:05,030Z WARN
>> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
>> > failed for user SYSTEM. Reasons:
>> > VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
>> > 2017-06-19 03:28:05,030Z INFO
>> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
>> > (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
>> > 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
>> > HostedEngine=]',
>> > sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
>> > 2017-06-19 03:28:05,030Z ERROR
>> > [org.ovirt.engine.core.bll.HostedEngineImporter]
>> > (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
>> > Engine VM
>> >
>> > The sanlock.log reports conflicts on that same host, and a different
>> > error on the other hosts, not sure if they are related.
>> >
>> > And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
>> > which I deployed the hosted engine VM on:
>> >
>> > MainThread::ERROR::2017-06-19
>> >
>> > 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>> > Unable to extract HEVM OVF
>> > MainThread::ERROR::2017-06-19
>> >
>> > 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>> > Failed extracting VM OVF from the OVF_STORE volume, falling back to
>> > initial vm.conf
>> >
>> > I've seen some of these issues reported in bugzilla, but they were for
>> > older versions of oVirt (and appear to be resolved).
>> >
>> > I will install that package on the other two hosts, for which I will
>> > put them in maintenance as vdsm is installed as an upgrade. I guess
>> > restarting vdsm is a good idea after that?
>> >
>> > Thanks,
>> >
>> > Campbell
>> >
>> > On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak 
>> > wrote:
>> >> Hi,
>> >>
>> >> you do not have to install it on all hosts. But you should have more
>> >> than one and ideally all hosted engine enabled nodes should belong to
>> >> the same engine cluster.
>> >>
>> >> Best regards
>> >>
>> >> Martin Sivak
>> >>
>> >> On Wed, Jun 21, 2017 at 11:29 AM, cmc  wrote:
>> >>> Hi Jenny,
>> >>>
>> >>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>> >>> Could that be the reason it is failing to see it properly?
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Cam
>> >>>
>> >>> On Mon, Jun 19, 2017 at 1:27 PM, cmc  wrote:
>>  Hi Jenny,
>> 
>>  Logs are attached. I can see errors in there, but am unsure how they
>>  arose.
>> 
>>  Thanks,
>> 
>>  Campbell
>> 
>>  On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar 
>>  wrote:
>> > From the output it looks like the agent is down, try starting it by
>> > running:

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-22 Thread Yanir Quinn
Hi,
First of all, maybe a chain reaction of:
WARN  [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
(org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT
,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
is causing the hosted engine vm not to be set up correctly, and further
actions were made when the hosted engine vm wasn't in a stable state.

As for now, are you trying to revert back to a previous/initial state?

Regards,
Yanir

On Wed, Jun 21, 2017 at 4:32 PM, cmc  wrote:

> Hi Jenny/Martin,
>
> Any idea what I can do here? The hosted engine VM has no log on any
> host in /var/log/libvirt/qemu, and I fear that if I need to put the
> host into maintenance, e.g., to upgrade it that I created it on (which
> I think is hosting it), or if it fails for any reason, it won't get
> migrated to another host, and I will not be able to manage the
> cluster. It seems to be a very dangerous position to be in.
>
> Thanks,
>
> Cam
>
> On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
> > Thanks Martin. The hosts are all part of the same cluster.
> >
> > I get these errors in the engine.log on the engine:
> >
> > 2017-06-19 03:28:05,030Z WARN
> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> > failed for user SYSTEM. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> > 2017-06-19 03:28:05,030Z INFO
> > [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> > (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
> > 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
> > HostedEngine=]',
> > sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
> > 2017-06-19 03:28:05,030Z ERROR
> > [org.ovirt.engine.core.bll.HostedEngineImporter]
> > (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
> > Engine VM
> >
> > The sanlock.log reports conflicts on that same host, and a different
> > error on the other hosts, not sure if they are related.
> >
> > And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
> > which I deployed the hosted engine VM on:
> >
> > MainThread::ERROR::2017-06-19
> > 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.
> ovf.ovf_store.OVFStore::(getEngineVMOVF)
> > Unable to extract HEVM OVF
> > MainThread::ERROR::2017-06-19
> > 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.
> hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> > Failed extracting VM OVF from the OVF_STORE volume, falling back to
> > initial vm.conf
> >
> > I've seen some of these issues reported in bugzilla, but they were for
> > older versions of oVirt (and appear to be resolved).
> >
> > I will install that package on the other two hosts, for which I will
> > put them in maintenance as vdsm is installed as an upgrade. I guess
> > restarting vdsm is a good idea after that?
> >
> > Thanks,
> >
> > Campbell
> >
> > On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak 
> wrote:
> >> Hi,
> >>
> >> you do not have to install it on all hosts. But you should have more
> >> than one and ideally all hosted engine enabled nodes should belong to
> >> the same engine cluster.
> >>
> >> Best regards
> >>
> >> Martin Sivak
> >>
> >> On Wed, Jun 21, 2017 at 11:29 AM, cmc  wrote:
> >>> Hi Jenny,
> >>>
> >>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
> >>> Could that be the reason it is failing to see it properly?
> >>>
> >>> Thanks,
> >>>
> >>> Cam
> >>>
> >>> On Mon, Jun 19, 2017 at 1:27 PM, cmc  wrote:
>  Hi Jenny,
> 
>  Logs are attached. I can see errors in there, but am unsure how they
> arose.
> 
>  Thanks,
> 
>  Campbell
> 
>  On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar 
> wrote:
> > From the output it looks like the agent is down, try starting it by
> running:
> > systemctl start ovirt-ha-agent.
> >
> > The engine is supposed to see the hosted engine storage domain and
> import it
> > to the system, then it should import the hosted engine vm.
> >
> > Can you attach the agent log from the host
> > (/var/log/ovirt-hosted-engine-ha/agent.log)
> > and the engine log from the engine vm (/var/log/ovirt-engine/engine.
> log)?
> >
> > Thanks,
> > Jenny
> >
> >
> > On Mon, Jun 19, 2017 at 12:41 PM, cmc  wrote:
> >>
> >>  Hi Jenny,
> >>
> >> > What version are you running?
> >>
> >> 4.1.2.2-1.el7.centos
> >>
> >> > For the 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-21 Thread cmc
Hi Jenny/Martin,

Any idea what I can do here? The hosted engine VM has no log on any
host in /var/log/libvirt/qemu, and I fear that if I need to put the
host into maintenance, e.g., to upgrade it that I created it on (which
I think is hosting it), or if it fails for any reason, it won't get
migrated to another host, and I will not be able to manage the
cluster. It seems to be a very dangerous position to be in.

Thanks,

Cam

On Wed, Jun 21, 2017 at 11:48 AM, cmc  wrote:
> Thanks Martin. The hosts are all part of the same cluster.
>
> I get these errors in the engine.log on the engine:
>
> 2017-06-19 03:28:05,030Z WARN
> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> (org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
> failed for user SYSTEM. Reasons:
> VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
> 2017-06-19 03:28:05,030Z INFO
> [org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
> (org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
> 'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
> HostedEngine=]',
> sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
> 2017-06-19 03:28:05,030Z ERROR
> [org.ovirt.engine.core.bll.HostedEngineImporter]
> (org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
> Engine VM
>
> The sanlock.log reports conflicts on that same host, and a different
> error on the other hosts, not sure if they are related.
>
> And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
> which I deployed the hosted engine VM on:
>
> MainThread::ERROR::2017-06-19
> 13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
> Unable to extract HEVM OVF
> MainThread::ERROR::2017-06-19
> 13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
> Failed extracting VM OVF from the OVF_STORE volume, falling back to
> initial vm.conf
>
> I've seen some of these issues reported in bugzilla, but they were for
> older versions of oVirt (and appear to be resolved).
>
> I will install that package on the other two hosts, for which I will
> put them in maintenance as vdsm is installed as an upgrade. I guess
> restarting vdsm is a good idea after that?
>
> Thanks,
>
> Campbell
>
> On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak  wrote:
>> Hi,
>>
>> you do not have to install it on all hosts. But you should have more
>> than one and ideally all hosted engine enabled nodes should belong to
>> the same engine cluster.
>>
>> Best regards
>>
>> Martin Sivak
>>
>> On Wed, Jun 21, 2017 at 11:29 AM, cmc  wrote:
>>> Hi Jenny,
>>>
>>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>>> Could that be the reason it is failing to see it properly?
>>>
>>> Thanks,
>>>
>>> Cam
>>>
>>> On Mon, Jun 19, 2017 at 1:27 PM, cmc  wrote:
 Hi Jenny,

 Logs are attached. I can see errors in there, but am unsure how they arose.

 Thanks,

 Campbell

 On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar  wrote:
> From the output it looks like the agent is down, try starting it by 
> running:
> systemctl start ovirt-ha-agent.
>
> The engine is supposed to see the hosted engine storage domain and import 
> it
> to the system, then it should import the hosted engine vm.
>
> Can you attach the agent log from the host
> (/var/log/ovirt-hosted-engine-ha/agent.log)
> and the engine log from the engine vm (/var/log/ovirt-engine/engine.log)?
>
> Thanks,
> Jenny
>
>
> On Mon, Jun 19, 2017 at 12:41 PM, cmc  wrote:
>>
>>  Hi Jenny,
>>
>> > What version are you running?
>>
>> 4.1.2.2-1.el7.centos
>>
>> > For the hosted engine vm to be imported and displayed in the engine, 
>> > you
>> > must first create a master storage domain.
>>
>> To provide a bit more detail: this was a migration of a bare-metal
>> engine in an existing cluster to a hosted engine VM for that cluster.
>> As part of this migration, I built an entirely new host and ran
>> 'hosted-engine --deploy' (followed these instructions:
>>
>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
>> I restored the backup from the engine and it completed without any
>> errors. I didn't see any instructions regarding a master storage
>> domain in the page above. The cluster has two existing master storage
>> domains, one is fibre channel, which is up, and one ISO domain, 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-21 Thread cmc
Thanks Martin. The hosts are all part of the same cluster.

I get these errors in the engine.log on the engine:

2017-06-19 03:28:05,030Z WARN
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
(org.ovirt.thread.pool-6-thread-23) [] Validation of action 'ImportVm'
failed for user SYSTEM. Reasons:
VAR__ACTION__IMPORT,VAR__TYPE__VM,ACTION_TYPE_FAILED_ILLEGAL_VM_DISPLAY_TYPE_IS_NOT_SUPPORTED_BY_OS
2017-06-19 03:28:05,030Z INFO
[org.ovirt.engine.core.bll.exportimport.ImportVmCommand]
(org.ovirt.thread.pool-6-thread-23) [] Lock freed to object
'EngineLock:{exclusiveLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>,
HostedEngine=]',
sharedLocks='[a79e6b0e-fff4-4cba-a02c-4c00be151300=<ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName HostedEngine>]'}'
2017-06-19 03:28:05,030Z ERROR
[org.ovirt.engine.core.bll.HostedEngineImporter]
(org.ovirt.thread.pool-6-thread-23) [] Failed importing the Hosted
Engine VM

The sanlock.log reports conflicts on that same host, and a different
error on the other hosts, not sure if they are related.

And this in the /var/log/ovirt-hosted-engine-ha/agent log on the host
which I deployed the hosted engine VM on:

MainThread::ERROR::2017-06-19
13:09:49,743::ovf_store::124::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
Unable to extract HEVM OVF
MainThread::ERROR::2017-06-19
13:09:49,743::config::445::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
Failed extracting VM OVF from the OVF_STORE volume, falling back to
initial vm.conf
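
(When the agent falls back like that, it can help to see what it actually
fell back to; a small sketch, assuming the default 4.1 paths:

  cat /var/run/ovirt-hosted-engine-ha/vm.conf              # the initial vm.conf it used
  grep -i ovf /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 20
)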

I've seen some of these issues reported in bugzilla, but they were for
older versions of oVirt (and appear to be resolved).

I will install that package on the other two hosts, for which I will
put them in maintenance as vdsm is installed as an upgrade. I guess
restarting vdsm is a good idea after that?

Thanks,

Campbell

On Wed, Jun 21, 2017 at 10:51 AM, Martin Sivak  wrote:
> Hi,
>
> you do not have to install it on all hosts. But you should have more
> than one and ideally all hosted engine enabled nodes should belong to
> the same engine cluster.
>
> Best regards
>
> Martin Sivak
>
> On Wed, Jun 21, 2017 at 11:29 AM, cmc  wrote:
>> Hi Jenny,
>>
>> Does ovirt-hosted-engine-ha need to be installed across all hosts?
>> Could that be the reason it is failing to see it properly?
>>
>> Thanks,
>>
>> Cam
>>
>> On Mon, Jun 19, 2017 at 1:27 PM, cmc  wrote:
>>> Hi Jenny,
>>>
>>> Logs are attached. I can see errors in there, but am unsure how they arose.
>>>
>>> Thanks,
>>>
>>> Campbell
>>>
>>> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar  wrote:
 From the output it looks like the agent is down, try starting it by 
 running:
 systemctl start ovirt-ha-agent.

 The engine is supposed to see the hosted engine storage domain and import 
 it
 to the system, then it should import the hosted engine vm.

 Can you attach the agent log from the host
 (/var/log/ovirt-hosted-engine-ha/agent.log)
 and the engine log from the engine vm (/var/log/ovirt-engine/engine.log)?

 Thanks,
 Jenny


 On Mon, Jun 19, 2017 at 12:41 PM, cmc  wrote:
>
>  Hi Jenny,
>
> > What version are you running?
>
> 4.1.2.2-1.el7.centos
>
> > For the hosted engine vm to be imported and displayed in the engine, you
> > must first create a master storage domain.
>
> To provide a bit more detail: this was a migration of a bare-metal
> engine in an existing cluster to a hosted engine VM for that cluster.
> As part of this migration, I built an entirely new host and ran
> 'hosted-engine --deploy' (followed these instructions:
>
> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
> I restored the backup from the engine and it completed without any
> errors. I didn't see any instructions regarding a master storage
> domain in the page above. The cluster has two existing master storage
> domains, one is fibre channel, which is up, and one ISO domain, which
> is currently offline.
>
> > What do you mean the hosted engine commands are failing? What happens
> > when
> > you run hosted-engine --vm-status now?
>
> Interestingly, whereas when I ran it before, it exited with no output
> and a return code of '1', it now reports:
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : False
> Hostname   : kvm-ldn-03.ldn.fscfc.co.uk
> Host ID: 1
> Engine status  : unknown stale-data
> Score  : 

Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-21 Thread cmc
Hi Jenny,

Does ovirt-hosted-engine-ha need to be installed across all hosts?
Could that be the reason it is failing to see it properly?

Thanks,

Cam

On Mon, Jun 19, 2017 at 1:27 PM, cmc  wrote:
> Hi Jenny,
>
> Logs are attached. I can see errors in there, but am unsure how they arose.
>
> Thanks,
>
> Campbell
>
> On Mon, Jun 19, 2017 at 12:29 PM, Evgenia Tokar  wrote:
>> From the output it looks like the agent is down, try starting it by running:
>> systemctl start ovirt-ha-agent.
>>
>> The engine is supposed to see the hosted engine storage domain and import it
>> to the system, then it should import the hosted engine vm.
>>
>> Can you attach the agent log from the host
>> (/var/log/ovirt-hosted-engine-ha/agent.log)
>> and the engine log from the engine vm (/var/log/ovirt-engine/engine.log)?
>>
>> Thanks,
>> Jenny
>>
>>
>> On Mon, Jun 19, 2017 at 12:41 PM, cmc  wrote:
>>>
>>>  Hi Jenny,
>>>
>>> > What version are you running?
>>>
>>> 4.1.2.2-1.el7.centos
>>>
>>> > For the hosted engine vm to be imported and displayed in the engine, you
>>> > must first create a master storage domain.
>>>
>>> To provide a bit more detail: this was a migration of a bare-metal
>>> engine in an existing cluster to a hosted engine VM for that cluster.
>>> As part of this migration, I built an entirely new host and ran
>>> 'hosted-engine --deploy' (followed these instructions:
>>>
>>> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
>>> I restored the backup from the engine and it completed without any
>>> errors. I didn't see any instructions regarding a master storage
>>> domain in the page above. The cluster has two existing master storage
>>> domains, one is fibre channel, which is up, and one ISO domain, which
>>> is currently offline.
>>>
>>> > What do you mean the hosted engine commands are failing? What happens
>>> > when
>>> > you run hosted-engine --vm-status now?
>>>
>>> Interestingly, whereas when I ran it before, it exited with no output
>>> and a return code of '1', it now reports:
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage : True
>>> Status up-to-date  : False
>>> Hostname   : kvm-ldn-03.ldn.fscfc.co.uk
>>> Host ID: 1
>>> Engine status  : unknown stale-data
>>> Score  : 0
>>> stopped: True
>>> Local maintenance  : False
>>> crc32  : 0217f07b
>>> local_conf_timestamp   : 2911
>>> Host timestamp : 2897
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=2897 (Thu Jun 15 16:22:54 2017)
>>> host-id=1
>>> score=0
>>> vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=AgentStopped
>>> stopped=True
>>>
>>> Yet I can login to the web GUI fine. I guess it is not HA due to being
>>> in an unknown state currently? Does the hosted-engine-ha rpm need to
>>> be installed across all nodes in the cluster, btw?
>>>
>>> Thanks for the help,
>>>
>>> Cam
>>>
>>> >
>>> > Jenny Tokar
>>> >
>>> >
>>> > On Thu, Jun 15, 2017 at 6:32 PM, cmc  wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> I've migrated from a bare-metal engine to a hosted engine. There were
>>> >> no errors during the install; however, the hosted engine did not get
>>> >> started. I tried running:
>>> >>
>>> >> hosted-engine --status
>>> >>
>>> >> on the host I deployed it on, and it returns nothing (exit code is 1,
>>> >> however). I could not ping it either. So I tried starting it via
>>> >> 'hosted-engine --vm-start' and it returned:
>>> >>
>>> >> Virtual machine does not exist
>>> >>
>>> >> But it then became available. I logged into it successfully. It is not
>>> >> in the list of VMs, however.
>>> >>
>>> >> Any ideas why the hosted-engine commands fail, and why it is not in
>>> >> the list of virtual machines?
>>> >>
>>> >> Thanks for any help,
>>> >>
>>> >> Cam
>>> >> ___
>>> >> Users mailing list
>>> >> Users@ovirt.org
>>> >> http://lists.ovirt.org/mailman/listinfo/users
>>> >
>>> >
>>
>>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-19 Thread Evgenia Tokar
From the output it looks like the agent is down; try starting it by
running: systemctl start ovirt-ha-agent.
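
For example (a minimal sketch; I'm assuming the broker service needs
to be up as well, so start both and then re-check):

  systemctl start ovirt-ha-broker ovirt-ha-agent
  systemctl --no-pager status ovirt-ha-agent
  hosted-engine --vm-status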

The engine is supposed to see the hosted engine storage domain and import
it to the system; then it should import the hosted engine vm.

Can you attach the agent log from the host
(/var/log/ovirt-hosted-engine-ha/agent.log)
and the engine log from the engine vm (/var/log/ovirt-engine/engine.log)?
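
If the full files are too large to attach, something like this should
capture the interesting parts (the line counts are arbitrary):

  # recent errors from the HA agent, plus the tail of the engine log
  grep -i error /var/log/ovirt-hosted-engine-ha/agent.log | tail -n 50
  tail -n 500 /var/log/ovirt-engine/engine.log > engine-tail.log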

Thanks,
Jenny


On Mon, Jun 19, 2017 at 12:41 PM, cmc  wrote:

>  Hi Jenny,
>
> > What version are you running?
>
> 4.1.2.2-1.el7.centos
>
> > For the hosted engine vm to be imported and displayed in the engine, you
> > must first create a master storage domain.
>
> To provide a bit more detail: this was a migration of a bare-metal
> engine in an existing cluster to a hosted engine VM for that cluster.
> As part of this migration, I built an entirely new host and ran
> 'hosted-engine --deploy' (following these instructions:
> http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
> I restored the backup from the engine and it completed without any
> errors. I didn't see any instructions regarding a master storage
> domain in the page above. The cluster has two existing master storage
> domains: one is fibre channel, which is up, and one ISO domain, which
> is currently offline.
>
> > What do you mean the hosted engine commands are failing? What happens
> when
> > you run hosted-engine --vm-status now?
>
> Interestingly, whereas it previously exited with no output and a
> return code of '1', it now reports:
>
> --== Host 1 status ==--
>
> conf_on_shared_storage : True
> Status up-to-date  : False
> Hostname   : kvm-ldn-03.ldn.fscfc.co.uk
> Host ID: 1
> Engine status  : unknown stale-data
> Score  : 0
> stopped: True
> Local maintenance  : False
> crc32  : 0217f07b
> local_conf_timestamp   : 2911
> Host timestamp : 2897
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=2897 (Thu Jun 15 16:22:54 2017)
> host-id=1
> score=0
> vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
> conf_on_shared_storage=True
> maintenance=False
> state=AgentStopped
> stopped=True
>
> Yet I can log in to the web GUI fine. I guess it is not HA at the
> moment because it is in an unknown state? Does the hosted-engine-ha
> rpm need to be installed across all nodes in the cluster, btw?
>
> Thanks for the help,
>
> Cam
>
> >
> > Jenny Tokar
> >
> >
> > On Thu, Jun 15, 2017 at 6:32 PM, cmc  wrote:
> >>
> >> Hi,
> >>
> >> I've migrated from a bare-metal engine to a hosted engine. There were
> >> no errors during the install; however, the hosted engine did not get
> >> started. I tried running:
> >>
> >> hosted-engine --status
> >>
> >> on the host I deployed it on, and it returns nothing (exit code is 1,
> >> however). I could not ping it either. So I tried starting it via
> >> 'hosted-engine --vm-start' and it returned:
> >>
> >> Virtual machine does not exist
> >>
> >> But it then became available. I logged into it successfully. It is not
> >> in the list of VMs, however.
> >>
> >> Any ideas why the hosted-engine commands fail, and why it is not in
> >> the list of virtual machines?
> >>
> >> Thanks for any help,
> >>
> >> Cam
> >> ___
> >> Users mailing list
> >> Users@ovirt.org
> >> http://lists.ovirt.org/mailman/listinfo/users
> >
> >
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-19 Thread cmc
 Hi Jenny,

> What version are you running?

4.1.2.2-1.el7.centos

> For the hosted engine vm to be imported and displayed in the engine, you
> must first create a master storage domain.

To provide a bit more detail: this was a migration of a bare-metal
engine in an existing cluster to a hosted engine VM for that cluster.
As part of this migration, I built an entirely new host and ran
'hosted-engine --deploy' (following these instructions:
http://www.ovirt.org/documentation/self-hosted/chap-Migrating_from_Bare_Metal_to_an_EL-Based_Self-Hosted_Environment/).
I restored the backup from the engine and it completed without any
errors. I didn't see any instructions regarding a master storage
domain in the page above. The cluster has two existing master storage
domains: one is fibre channel, which is up, and one ISO domain, which
is currently offline.

> What do you mean the hosted engine commands are failing? What happens when
> you run hosted-engine --vm-status now?

Interestingly, whereas it previously exited with no output and a
return code of '1', it now reports:

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date  : False
Hostname   : kvm-ldn-03.ldn.fscfc.co.uk
Host ID: 1
Engine status  : unknown stale-data
Score  : 0
stopped: True
Local maintenance  : False
crc32  : 0217f07b
local_conf_timestamp   : 2911
Host timestamp : 2897
Extra metadata (valid at timestamp):
metadata_parse_version=1
metadata_feature_version=1
timestamp=2897 (Thu Jun 15 16:22:54 2017)
host-id=1
score=0
vm_conf_refresh_time=2911 (Thu Jun 15 16:23:08 2017)
conf_on_shared_storage=True
maintenance=False
state=AgentStopped
stopped=True

Yet I can log in to the web GUI fine. I guess it is not HA at the
moment because it is in an unknown state? Does the hosted-engine-ha
rpm need to be installed across all nodes in the cluster, btw?

Thanks for the help,

Cam

>
> Jenny Tokar
>
>
> On Thu, Jun 15, 2017 at 6:32 PM, cmc  wrote:
>>
>> Hi,
>>
>> I've migrated from a bare-metal engine to a hosted engine. There were
>> no errors during the install; however, the hosted engine did not get
>> started. I tried running:
>>
>> hosted-engine --status
>>
>> on the host I deployed it on, and it returns nothing (exit code is 1,
>> however). I could not ping it either. So I tried starting it via
>> 'hosted-engine --vm-start' and it returned:
>>
>> Virtual machine does not exist
>>
>> But it then became available. I logged into it successfully. It is not
>> in the list of VMs, however.
>>
>> Any ideas why the hosted-engine commands fail, and why it is not in
>> the list of virtual machines?
>>
>> Thanks for any help,
>>
>> Cam
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HostedEngine VM not visible, but running

2017-06-18 Thread Evgenia Tokar
Hi,

What version are you running?

For the hosted engine vm to be imported and displayed in the engine, you
must first create a master storage domain.
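
A quick way to check which storage domains the engine currently sees
is the REST API; for example (the URL, user and password below are
placeholders):

  # list storage domain names known to the engine (XML output)
  curl -s -k -u 'admin@internal:password' \
      'https://engine.example.com/ovirt-engine/api/storagedomains' | grep '<name>'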

What do you mean the hosted engine commands are failing? What happens when
you run hosted-engine --vm-status now?

Jenny Tokar


On Thu, Jun 15, 2017 at 6:32 PM, cmc  wrote:

> Hi,
>
> I've migrated from a bare-metal engine to a hosted engine. There were
> no errors during the install; however, the hosted engine did not get
> started. I tried running:
>
> hosted-engine --status
>
> on the host I deployed it on, and it returns nothing (exit code is 1,
> however). I could not ping it either. So I tried starting it via
> 'hosted-engine --vm-start' and it returned:
>
> Virtual machine does not exist
>
> But it then became available. I logged into it successfully. It is not
> in the list of VMs, however.
>
> Any ideas why the hosted-engine commands fail, and why it is not in
> the list of virtual machines?
>
> Thanks for any help,
>
> Cam
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] HostedEngine VM not visible, but running

2017-06-15 Thread cmc
Hi,

I've migrated from a bare-metal engine to a hosted engine. There were
no errors during the install; however, the hosted engine did not get
started. I tried running:

hosted-engine --status

on the host I deployed it on, and it returns nothing (exit code is 1,
however). I could not ping it either. So I tried starting it via
'hosted-engine --vm-start' and it returned:

Virtual machine does not exist

But it then became available. I logged into it successfully. It is not
in the list of VMs, however.

Any ideas why the hosted-engine commands fail, and why it is not in
the list of virtual machines?
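
For reference, this is the sequence I tried ('--vm-status' seems to be
the documented form, so '--status' may simply not be a valid option,
which might explain the empty output):

  hosted-engine --status      # what I ran: no output, exit code 1
  hosted-engine --vm-status   # the documented status query, worth trying instead
  hosted-engine --vm-start    # reported 'Virtual machine does not exist'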

Thanks for any help,

Cam
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users