Re: Recover VM after KVM host down (and HA not working) ?

2017-12-27 Thread Jean-Francois Nadeau
Hmm, could this be the culprit?

WARN  [c.c.h.KVMInvestigator] (AgentTaskPool-10:ctx-694feb6c)
(logid:160220c5) Agent investigation was requested on host
Host[-4-Routing], but host does not support investigation because it has no
NFS storage. Skipping investigation.

The primary storage is NFS.



Re: Recover VM after KVM host down (and HA not working) ?

2017-12-23 Thread Jean-Francois Nadeau
Clearly the management server doesn't realize that the instance on the failed
host is not running, even though the host is in Alert state, powered down,
and missing NFS heartbeats.

2017-12-23 14:57:52,427 DEBUG [c.c.h.Status]
(AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) Transition:[Resource state
= Enabled, Agent event = AgentDisconnected, Host id = 4, name =
r62-i122-36-01.domain.com]
2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4
2017-12-23 14:58:24,495 DEBUG [c.c.c.CapacityManagerImpl]
(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 0 VM, not running on
host 4
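To follow one request through the management log, it can help to group entries by their logid. Below is a minimal sketch in Python, assuming each entry has the shape seen in the excerpts above (timestamp, level, logger, thread, logid, message). The names parse_line and group_by_logid are hypothetical helpers, not part of CloudStack.

```python
import re
from collections import defaultdict

# Assumed entry shape, taken from the excerpts in this thread:
# "<date> <time> <LEVEL> [<logger>] (<thread>) (logid:<id>) <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\(logid:(?P<logid>[0-9a-f]+)\)\s+"
    r"(?P<message>.*)$"
)


def parse_line(entry):
    """Parse one log entry into a dict, or return None if it does not match.

    Whitespace (including line wraps added by a mail client) is collapsed
    to single spaces before matching.
    """
    match = LOG_RE.match(" ".join(entry.split()))
    return match.groupdict() if match else None


def group_by_logid(entries):
    """Group parsed entries by logid, keeping their original order."""
    groups = defaultdict(list)
    for entry in entries:
        record = parse_line(entry)
        if record is not None:
            groups[record["logid"]].append(record)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "2017-12-23 14:57:52,427 DEBUG [c.c.h.Status] "
        "(AgentTaskPool-10:ctx-694feb6c) (logid:160220c5) "
        "Transition:[Resource state = Enabled, Agent event = "
        "AgentDisconnected, Host id = 4, name = r62-i122-36-01.domain.com]",
        "2017-12-23 14:58:24,487 DEBUG [c.c.c.CapacityManagerImpl] "
        "(CapacityChecker:ctx-66fbe484) (logid:1f53cd63) Found 1 VMs on host 4",
    ]
    for logid, records in group_by_logid(sample).items():
        for record in records:
            print(logid, record["level"], record["logger"], record["message"])
```

Filtering the grouped output on logid 160220c5 would then show everything logged for the AgentDisconnected event on host 4 in one place.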

What is the next step?



Re: Recover VM after KVM host down (and HA not working) ?

2017-12-23 Thread Jean-Francois Nadeau
I'd really like to get to the bottom of this. It does sound like the
behavior described in https://issues.apache.org/jira/browse/CLOUDSTACK-5582
but that should have been fixed long ago.

One suspect log entry (possibly unrelated) that I noticed is this recurring
exception in the manager logs:

ERROR [c.c.v.UserVmManagerImpl] (UserVm-ipfetch-3:ctx-d4c44c2b)
(logid:16dd70ad) Caught the Exception in VmIpFetchTask

I guess this is caused by the use of an external DHCP, so the manager fails
to determine a running VM's IP. Which brings me to my next question: how
is a VM marked for HA actually monitored?




Re: Recover VM after KVM host down (and HA not working) ?

2017-12-23 Thread Eric Green
If all else fails, change its state to the correct state in the MySQL
database and restart the management service. Sadly, that is the only way I
could do it when my CloudStack got confused and stuck an instance in an
intermediate state where I couldn't do anything with it.
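For illustration only, the kind of guarded update described above might look like the sketch below. It runs against an in-memory SQLite stand-in, not a real CloudStack MySQL database. The table and column names (vm_instance, id, name, state, host_id) and the state values are assumptions, not confirmed anywhere in this thread, and mark_vm_state and make_demo_db are hypothetical helpers. Check the actual schema and take a database backup before trying anything similar.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in with an ASSUMED vm_instance layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vm_instance ("
        "id INTEGER PRIMARY KEY, name TEXT, state TEXT, host_id INTEGER)"
    )
    # One VM that the manager still believes is running on failed host 4.
    conn.execute(
        "INSERT INTO vm_instance (id, name, state, host_id) "
        "VALUES (1, 'vm-on-failed-host', 'Running', 4)"
    )
    conn.commit()
    return conn


def mark_vm_state(conn, vm_id, new_state, expected_state="Running"):
    """Set a VM's state only if it is still in expected_state.

    Returns the number of rows changed (0 means the row was not in the
    expected state, so nothing was touched).
    """
    cur = conn.execute(
        "UPDATE vm_instance SET state = ? WHERE id = ? AND state = ?",
        (new_state, vm_id, expected_state),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    demo = make_demo_db()
    changed = mark_vm_state(demo, vm_id=1, new_state="Stopped")
    print("rows changed:", changed)
    print(demo.execute("SELECT id, name, state FROM vm_instance").fetchall())
```

Matching on the expected current state in the WHERE clause keeps the update from overwriting a row that the management server has already moved to another state in the meantime.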



Recover VM after KVM host down (and HA not working) ?

2017-12-22 Thread Jean-Francois Nadeau
Good morning,

I'm new to ACS and doing a POC with 4.10 on CentOS 7 and KVM.

I'm trying to recover VMs after a host failure (powered off from OOB).

Primary storage is NFS and IPMI is configured for the KVM hosts. The zone is
in advanced mode with VLAN separation, and I created a shared network with no
services since I wish to use an external DHCP.

First, say I don't have a compute offering with HA enabled and a KVM host
goes down. I can't put it in maintenance mode while it is down, and disabling
it has no effect on the state of the lost VMs. The VMs stay in the Running
state according to the manager. What should I do to force a restart on the
remaining healthy hosts?

Then I enabled IPMI on all KVM hosts and attempted the same experiment
with a compute offering with HA enabled. Same result. The manager does see
the host as disconnected and powered off but takes no action. I am certainly
missing something here. Please help!

Regards,

Jean-Francois