Re: Cloudstack installation on Ubuntu Xenial

2018-03-15 Thread Daniel Coric
Hello Rohit,

I'm glad you noticed the thread. Thank you for the clarification.

It is definitely reproducible with 4.11.0.0 and Ubuntu Xenial (16.04.4) - 
unfortunately I didn't save any of the logs.

In the process of adding the host, I couldn't authenticate as the "root" user 
(the default value of "PermitRootLogin" in /etc/ssh/sshd_config is 
"prohibit-password" - I simply overlooked that fact), so I used a sudoer user 
and disabled auth strictness.
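
For anyone else who trips over this, the relevant sshd_config change looks 
roughly like the following (a sketch only - whether to allow root password 
login at all is a policy decision for your environment):

    # /etc/ssh/sshd_config on the KVM host
    # Ubuntu 16.04 default - root may only log in with a key:
    #PermitRootLogin prohibit-password
    # allow CloudStack to add the host as 'root' with a password:
    PermitRootLogin yes

followed by restarting the ssh service on the host.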

After adding the host that way, there were none of the keystore/certificate 
related files in the /etc/cloudstack/agent directory (only agent.properties, 
environment.properties and log4j-cloud.xml). I had to use the 
provisionCertificate API to generate them.
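
For reference, this is roughly how I drove it (a sketch using CloudMonkey, 
assuming it is already configured against the management server; the UUID is 
a placeholder):

    # look up the host id
    cloudmonkey list hosts type=Routing filter=id,name,state
    # push a fresh certificate/keystore to that host
    cloudmonkey provision certificate hostid=<host-uuid>

The same call is available over the HTTP API as provisionCertificate with a 
hostid parameter.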

Regards
Daniel

On 2018/03/15 11:56:43, Rohit Yadav  wrote: 
> Hi Daniel,
> 
> 
> After you added the Ubuntu host, does it have cloud.jks at 
> /etc/cloudstack/agent? Can you confirm whether any errors were seen during 
> addition of the KVM host to the Ubuntu based management server?
> 
> 
> The log:
> 
> 2018-03-12 20:44:03,787 WARN  [utils.nio.Link] (main:null) (logid:) Failed to 
> load keystore, using trust all manager
> 
> 
> This suggests that your KVM host failed to be secured (i.e. have the keystore 
> jks file set up), which could be due to several reasons. Can you check/confirm 
> that the user used to add the Ubuntu based KVM host was indeed 'root'? A sudoer 
> user may fail to create the jks/keystore file if it does not have access to 
> the /etc/cloudstack/agent directory.
> 
> 
> Furthermore, once the agent is up, with the auth strictness setting set to 
> false, you can re-attempt securing your KVM host using the 
> provisionCertificate API and pass it a host id. However, if you can reproduce 
> the issue where fresh addition of a KVM host fails to secure the host (i.e. 
> create the certificates and jks file), that indeed is an issue.
> 
> 
> A similar issue was recently fixed and will make it into 4.11.1.0:
> 
> https://github.com/apache/cloudstack/pull/2454 (with this fix, addHost will 
> also fail in case it fails to secure the KVM host)
> 
> 
> - Rohit
> 
> 
> 
> 
> 
> 
> From: Daniel Coric 
> Sent: Thursday, March 15, 2018 2:03:36 AM
> To: users@cloudstack.apache.org
> Subject: Re: Cloudstack installation on Ubuntu Xenial
> 
> Hello Rafael,
> 
> I'm aware of it, thank you. I also assumed that there could be some problem 
> with it, which is why I shared a link (the second one) in my first post, hoping 
> that someone could confirm that assumption.
> 
> After I set ca.plugin.root.auth.strictness to false, everything worked 
> just fine - although that shouldn't be needed for freshly installed 
> environments.
> 
> At least it was not needed on CentOS. The CA framework did "kick in" (as 
> the article says) and did its job.
> 
> Regards
> Daniel Coric
> 
> On 2018/03/14 00:48:11, Rafael Weingärtner  
> wrote:
> > Looking at the logs you provided, it looks like something is wrong with the
> > certificate used to secure communication with your KVM agent. I am not
> > familiar with KVM and ACS. I know, however, that there is a CA plugin that
> > can issue and install certificates on hosts. Have you tried that?
> >
> 
> 
> > On Tue, Mar 13, 2018 at 5:07 PM, Daniel Coric  wrote:
> >
> > > Hello Rafael,
> > >
> > > Thank you for your response.
> > >
> > > I really did nothing except install CS on a freshly installed Ubuntu VM -
> > > just as I did on CentOS. On CentOS everything worked out of the box -
> > > on Ubuntu, problems.
> > >
> > > I tried to install it from different package repositories (community,
> > > ShapeBlue, self-built), and compared and followed Ubuntu specific installation
> > > instructions from two different sources (ACS, ShapeBlue) - every time the same
> > > errors in agent.log.
> > >
> > > So, I would rather say that there is something wrong either with the
> > > source or Ubuntu - but, as a first-time CS user, I could be wrong, of
> > > course.
> > >
> > > Regards
> > > Daniel Coric
> > >
> > > On 2018/03/13 18:43:46, Rafael Weingärtner 
> > > 
> > > wrote:
> > > > The MySQL thing is only a warning and should not cause problems in your
> > > > POC. The other is an error. There is something wrong with your agent's
> > > > configuration/deployment.
> > > >
> > > > On Mon, Mar 12, 2018 at 9:57 PM, Daniel Coric  wrote:
> > > >
> > > > > Hello Everyone,
> > > > >
> > > > > I'm getting myself familiar with CloudStack so please excuse if I have
> > > > > overlooked something obvious.
> > > > >
> > > > > Using build and install instructions from the official documentation I
> > > > > have managed to successfully install CloudStack 4.11 on the nested CentOS
> > > > > 7.4 KVM (from both community provided package repositories and self-built
> > > > > packages).
> > > > >
> > 

Re: KVM HostHA

2018-03-15 Thread Jon Marshall
Hi Parth


Thanks for that.


I am a beginner too when it comes to this.


I am currently rebuilding, so I will update this thread when I have retested.


Jon



From: Parth Patel 
Sent: 15 March 2018 14:37
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

Hi Jon,

I have to admit that I have a beginner/mediocre understanding of cloudstack
overall (especially the host HA feature). But what works for me should work
for everyone. So, to answer your questions:

1) how many compute nodes do you have
>

I have tested using three agents: when using only two nodes, the management
server deemed the one node which was running the system VMs and router as unfit
for migration and stopped the VM. I currently use one node for running the
system VMs and router, and two agents (compute nodes, you could say), of which
one is running an HA-enabled VM and the other is running no VMs, as I only have
4GB RAM in each of those :| I use one machine (a fourth one) for running the
management server and MySQL database. I also have a fifth machine purely for
NFS, although you can easily have management, MySQL and NFS set up on the same
machine (depending on your machine's configuration/capacity)

>
>
> 2) are you running basic or advanced networking
>

I am using basic (flat) networking where my management IP addresses range
from 172.16.4.131 to 172.16.4.137 and guest IP addresses range from
172.16.4.138 to 172.16.4.149. Both are on a /24 network.

>
> 3) how have you set up your NICs, i.e. on each compute node I have 3 separate
> NICs, one for management, one for the VMs and one for storage (NFS).
>

I only have 1 NIC per machine (the same one is used for all 3 types of
traffic). I have seen the management server use peer routing via other agents
to perform some operations in my XenCluster, and this might be why your
management server does not mark a host as "Down" (as I said, I don't know the
internal workings of Cloudstack - this is just a guess from what I've seen in
the management server logs). I suggest you remove all three NICs of a host to
simulate my scenario.

>
>
> So far I have not managed to get any failover of VMs no matter what I try
>

I also recommend you update your qemu-kvm, NFS and other packages (there has
just been a recent update for CentOS 7). Again, I know this is superstitious,
but sometimes different package versions have been known to be the root cause
of an issue.

Side note: my ACS 4.11 agent auto-reboots itself after it has retried
communicating with the management server 4 times - at almost the exact same
time, the management server decides in its logs that the host and its
HA-enabled VM have stopped, and it restarts the HA-enabled VM on another host.


Hope this helps.

Regards,
Parth Patel.

>
>
> 
> From: Parth Patel 
> Sent: 14 March 2018 14:36
> To: users@cloudstack.apache.org
> Subject: Re: KVM HostHA
>
> Hi Paul and Andrija,
>
> I don't know the inner workings of the Host-HA feature, but my ACS 4.11 does
> what Paul explained even without host HA or IPMI access. As I stated
> earlier multiple times, without host HA and IPMI, my HA-enabled VMs
> executing on a normal host get restarted on another suitable host in the
> cluster after approximately 3 minutes of event ping timeout, after which
> the cloudstack agent, having no connection to the management server because
> of the unplugged NIC (all my machines currently have only one NIC / the whole
> zone is in a flat network), reboots itself (the reason was explained by Rohit
> in another thread). The management server marks the host down and only
> HA-enabled VMs executing on it get restarted on another host (without any
> mention of host HA or IPMI or fencing in the management server logs), while
> normal VMs executing on it are stopped.
>
> I don't know if this was the desired outcome, but I think my current ACS 4.11
> installation performs at least some of the features ;) provided by Host HA
> without configuring it or IPMI.
>
> Regards,
> Parth Patel
>
> On Wed 14 Mar, 2018, 18:41 Boris Stoyanov, 
> wrote:
>
> > yes, KVM + NFS shared storage.
> >
> > Boris.
> >
> >
> > boris.stoya...@shapeblue.com
> > www.shapeblue.com

Re: KVM HostHA

2018-03-15 Thread Parth Patel
Hi Jon,

I have to admit that I have a beginner/mediocre understanding of cloudstack
overall (especially the host HA feature). But what works for me should work
for everyone. So, to answer your questions:

1) how many compute nodes do you have
>

I have tested using three agents: when using only two nodes, the management
server deemed the one node which was running the system VMs and router as unfit
for migration and stopped the VM. I currently use one node for running the
system VMs and router, and two agents (compute nodes, you could say), of which
one is running an HA-enabled VM and the other is running no VMs, as I only have
4GB RAM in each of those :| I use one machine (a fourth one) for running the
management server and MySQL database. I also have a fifth machine purely for
NFS, although you can easily have management, MySQL and NFS set up on the same
machine (depending on your machine's configuration/capacity)

>
>
> 2) are you running basic or advanced networking
>

I am using basic (flat) networking where my management IP addresses range
from 172.16.4.131 to 172.16.4.137 and guest IP addresses range from
172.16.4.138 to 172.16.4.149. Both are on a /24 network.

>
> 3) how have you set up your NICs, i.e. on each compute node I have 3 separate
> NICs, one for management, one for the VMs and one for storage (NFS).
>

I only have 1 NIC per machine (the same one is used for all 3 types of
traffic). I have seen the management server use peer routing via other agents
to perform some operations in my XenCluster, and this might be why your
management server does not mark a host as "Down" (as I said, I don't know the
internal workings of Cloudstack - this is just a guess from what I've seen in
the management server logs). I suggest you remove all three NICs of a host to
simulate my scenario.

>
>
> So far I have not managed to get any failover of VMs no matter what I try
>

I also recommend you update your qemu-kvm, NFS and other packages (there has
just been a recent update for CentOS 7). Again, I know this is superstitious,
but sometimes different package versions have been known to be the root cause
of an issue.
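
On CentOS 7 that boils down to something like this on each host (a sketch -
the exact package set depends on which qemu/libvirt repositories you
installed from):

    sudo yum update qemu-kvm libvirt nfs-utils
    # or simply bring everything current:
    sudo yum update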

Side note: my ACS 4.11 agent auto-reboots itself after it has retried
communicating with the management server 4 times - at almost the exact same
time, the management server decides in its logs that the host and its
HA-enabled VM have stopped, and it restarts the HA-enabled VM on another host.


Hope this helps.

Regards,
Parth Patel.

>
>
> 
> From: Parth Patel 
> Sent: 14 March 2018 14:36
> To: users@cloudstack.apache.org
> Subject: Re: KVM HostHA
>
> Hi Paul and Andrija,
>
> I don't know the inner workings of the Host-HA feature, but my ACS 4.11 does
> what Paul explained even without host HA or IPMI access. As I stated
> earlier multiple times, without host HA and IPMI, my HA-enabled VMs
> executing on a normal host get restarted on another suitable host in the
> cluster after approximately 3 minutes of event ping timeout, after which
> the cloudstack agent, having no connection to the management server because
> of the unplugged NIC (all my machines currently have only one NIC / the whole
> zone is in a flat network), reboots itself (the reason was explained by Rohit
> in another thread). The management server marks the host down and only
> HA-enabled VMs executing on it get restarted on another host (without any
> mention of host HA or IPMI or fencing in the management server logs), while
> normal VMs executing on it are stopped.
>
> I don't know if this was the desired outcome, but I think my current ACS 4.11
> installation performs at least some of the features ;) provided by Host HA
> without configuring it or IPMI.
>
> Regards,
> Parth Patel
>
> On Wed 14 Mar, 2018, 18:41 Boris Stoyanov, 
> wrote:
>
> > yes, KVM + NFS shared storage.
> >
> > Boris.
> >
> >
> > boris.stoya...@shapeblue.com
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> > @shapeblue
> >
> >
> >
> > > On 14 Mar 2018, at 14:51, Andrija Panic 
> wrote:
> > >
> > > Hi Boris,
> > >
> > > ok thanks for the explanation - that makes sense, and covers the
> > > "exception case" that I have.
> > >
> > > This is atm only available for NFS, as far as I could read (KVM on NFS)?
> > >
> > > Cheers
> > >
> > > On 14 March 2018 at 13:02, Boris Stoyanov <
> boris.stoya...@shapeblue.com>
> > > wrote:
> > >
> > >> Hi Andrija,
> > >>
> > >> There are two types of checks Host-HA does to determine if a host is
> > >> healthy.
> > >>
> > >> 1. Health checks - pings the host as soon as there are connection issues
> > >> with the agent
> > >>
> > >> If

Re: Change VPC CIDR - and some Mailing List issues

2018-03-15 Thread Rafael Weingärtner
Could people please review PR https://github.com/apache/cloudstack-www/pull/43 -
it has to do with the mailing list search mechanism.

On Wed, Mar 7, 2018 at 11:30 AM, Andrija Panic 
wrote:

> root@r-5015-VM:~# grep -ir "10.128.0.0/18" /etc/ ### this is VPC CIDR
>
> /etc/iptables/router_rules.v4:-A INPUT -s 10.128.64.0/18 -d 10.128.0.0/18 -j MARK --set-xmark 0x524/0x
> /etc/iptables/router_rules.v4:-A FORWARD -s 10.128.64.0/18 -d 10.128.0.0/18 -j MARK --set-xmark 0x524/0x
> /etc/iptables/router_rules.v4:-A FORWARD -s 10.128.0.0/18 -d 10.128.64.0/18 -j MARK --set-xmark 0x525/0x
> /etc/iptables/router_rules.v4:-A OUTPUT -s 10.128.0.0/18 -d 10.128.64.0/18 -j MARK --set-xmark 0x525/0x
> /etc/iptables/router_rules.v4:-A FORWARD -s 10.128.0.0/18 ! -d 10.128.0.0/18 -j ACCEPT
> /etc/ipsec.d/ipsec.vpn-185.39.XXX.YYY.conf: leftsubnet=10.128.0.0/18
> /etc/cloudstack/cmdline.json:"vpccidr": "10.128.0.0/18"
> /etc/cloudstack/site2sitevpn.json:"local_guest_cidr": "10.128.0.0/18",
>
> So just restart the VPC - better safe than sorry :)
>
> Cheers
>
> On 7 March 2018 at 14:21,  wrote:
>
> > Hi,
> >
> > As far as I know, when creating a site-to-site VPN, you can only specify
> > the remote networks. The local network is always set to the whole VPC CIDR.
> > Or am I wrong?
> >
> > Regards
> > Daniel
> >
> > On 07.03.18, 12:39, "Rafael Weingärtner" 
> > wrote:
> >
> > I agree with you. I was not aware of that link on the ACS website. I have
> > already created a task for myself to fix that.
> >
> > I thought the VPC CIDR was used only as a logical value internally in ACS.
> > However, as you pointed out, you can create a VPN to the whole VPC. Then,
> > yes, a restart would be required.
> >
> >
> > On Wed, Mar 7, 2018 at 8:33 AM, 
> > wrote:
> >
> > > Hi,
> > >
> > > Maybe we could link to the Apache search system on the page listing the
> > > Cloudstack Mailing-Lists: https://cloudstack.apache.org/mailing-lists.html
> > >
> > > If you click on the list there, you get to
> > > http://mail-archives.apache.org/mod_mbox/cloudstack-users/. Then there is
> > > MarkMail linked, and the
> > > https://lists.apache.org/list.html?users@cloudstack.apache.org link you
> > > shared (which btw looks best to me, thanks).
> > >
> > > The tiers are going to stay as they are currently. I guess the CIDR is
> > > used in the StrongSwan VPN configuration as the local network, so I guess
> > > a restart might be required.
> > >
> > > Other thoughts?
> > >
> > > Thanks
> > > Daniel
> > >
> > > On 07.03.18, 12:25, "Rafael Weingärtner" <
> > rafaelweingart...@gmail.com>
> > > wrote:
> > >
> > > MarkMail is not an Apache system. If you want an Apache system to
> > > search the mailing lists you can use:
> > > https://lists.apache.org/list.html?d...@cloudstack.apache.org.
> > >
> > > Do you intend on changing the tiers' CIDRs as well? If it is only the
> > > VPC, you might not even need to restart with a cleanup. Of course, it is
> > > always good practice to test before applying in production.
> > >
> > > On Wed, Mar 7, 2018 at 8:07 AM,  > fraunhofer.de>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > >
> > > >
> > > > First of all: when trying to search the lists on MarkMail (
> > > > https://cloudstack.apache.org/mailing-lists.html) I get a warning that
> > > > the entered information will be transmitted insecurely (no HTTPS). If I
> > > > accept that, MarkMail redirects back to HTTPS but does not present a
> > > > valid certificate (unknown issuer, Firefox 58.0.2).
> > > >
> > > >
> > > >
> > > > Now, to the question:
> > > >
> > > >
> > > >
> > > > We have a VPC with a pretty large CIDR (172.19.0.0/16), which however
> > > > only has tiers in the upper half (172.19.128.0/17). We would now like to
> > > > reduce the VPC CIDR. Is it safe to edit this in the database and then do
> > > > a VPC restart with cleanup? Anything else to consider?
> > > >
> > > > We use a VPN s2s tunnel, so I guess we need to change the remote subnet
> > > > on the other VPN endpoints, but other than that?
> > > >
> > > >
> > > >
> > > > Is it possible like that? Any problems to expect?
> > > >
> > > >
> > > >
> > > > Thanks and regards
> > > >
> > > > Daniel
> > >
> > >
> >
> >
> > --
> > Rafael Weingärtner
> >
> >
>
>
> --
>
> Andrija Panić
>



-- 
Rafael Weingärtner


Re: Cloudstack installation on Ubuntu Xenial

2018-03-15 Thread Rohit Yadav
Hi Daniel,


After you added the Ubuntu host, does it have cloud.jks at 
/etc/cloudstack/agent? Can you confirm whether any errors were seen during 
addition of the KVM host to the Ubuntu based management server?


The log:

2018-03-12 20:44:03,787 WARN  [utils.nio.Link] (main:null) (logid:) Failed to 
load keystore, using trust all manager


This suggests that your KVM host failed to be secured (i.e. have the keystore 
jks file set up), which could be due to several reasons. Can you check/confirm 
that the user used to add the Ubuntu based KVM host was indeed 'root'? A sudoer 
user may fail to create the jks/keystore file if it does not have access to the 
/etc/cloudstack/agent directory.
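
A quick way to verify both points on the host (a sketch - the passphrase
property name is from memory, so check your agent.properties):

    # the directory should be writable by the user that added the host
    ls -ld /etc/cloudstack/agent
    ls -l /etc/cloudstack/agent/cloud.jks
    # if the keystore exists, this should list the agent's certificate;
    # the keystore passphrase should be recorded in agent.properties
    keytool -list -keystore /etc/cloudstack/agent/cloud.jks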


Furthermore, once the agent is up, with the auth strictness setting set to 
false, you can re-attempt securing your KVM host using the 
provisionCertificate API and pass it a host id. However, if you can reproduce 
the issue where fresh addition of a KVM host fails to secure the host (i.e. 
create the certificates and jks file), that indeed is an issue.


A similar issue was recently fixed and will make it into 4.11.1.0:

https://github.com/apache/cloudstack/pull/2454 (with this fix, addHost will 
also fail in case it fails to secure the KVM host)


- Rohit






From: Daniel Coric 
Sent: Thursday, March 15, 2018 2:03:36 AM
To: users@cloudstack.apache.org
Subject: Re: Cloudstack installation on Ubuntu Xenial

Hello Rafael,

I'm aware of it, thank you. I also assumed that there could be some problem 
with it, which is why I shared a link (the second one) in my first post, hoping 
that someone could confirm that assumption.

After I set ca.plugin.root.auth.strictness to false, everything worked just 
fine - although that shouldn't be needed for freshly installed 
environments.
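
For completeness, I flipped the setting roughly like this (a sketch via
CloudMonkey; it can equally be changed under Global Settings in the UI, and
may need a management server restart to take effect):

    cloudmonkey update configuration name=ca.plugin.root.auth.strictness value=false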

At least it was not needed on CentOS. The CA framework did "kick in" (as 
the article says) and did its job.

Regards
Daniel Coric

On 2018/03/14 00:48:11, Rafael Weingärtner  wrote:
> Looking at the logs you provided, it looks like something is wrong with the
> certificate used to secure communication with your KVM agent. I am not
> familiar with KVM and ACS. I know, however, that there is a CA plugin that
> can issue and install certificates on hosts. Have you tried that?
>


> On Tue, Mar 13, 2018 at 5:07 PM, Daniel Coric  wrote:
>
> > Hello Rafael,
> >
> > Thank you for your response.
> >
> > I really did nothing except install CS on a freshly installed Ubuntu VM -
> > just as I did on CentOS. On CentOS everything worked out of the box -
> > on Ubuntu, problems.
> >
> > I tried to install it from different package repositories (community,
> > ShapeBlue, self-built), and compared and followed Ubuntu specific installation
> > instructions from two different sources (ACS, ShapeBlue) - every time the same
> > errors in agent.log.
> >
> > So, I would rather say that there is something wrong either with the
> > source or Ubuntu - but, as a first-time CS user, I could be wrong, of
> > course.
> >
> > Regards
> > Daniel Coric
> >
> > On 2018/03/13 18:43:46, Rafael Weingärtner 
> > wrote:
> > > The MySQL thing is only a warning and should not cause problems in your
> > > POC. The other is an error. There is something wrong with your agent's
> > > configuration/deployment.
> > >
> > > On Mon, Mar 12, 2018 at 9:57 PM, Daniel Coric  wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > I'm getting myself familiar with CloudStack so please excuse if I have
> > > > overlooked something obvious.
> > > >
> > > > Using build and install instructions from the official documentation I
> > > > have managed to successfully install CloudStack 4.11 on the nested CentOS
> > > > 7.4 KVM (from both community provided package repositories and self-built
> > > > packages).
> > > >
> > > > I have tried some of the basic operations like uploading ISO images,
> > > > adding volumes and users, creating templates, creating and using VMs
> > > > (both as admin and user) etc.
> > > > As far as I can tell, everything worked as expected - except the fact
> > > > that the CentOS VM took about half an hour to shut down.
> > > >
> > > > Then I decided to give it a try on Ubuntu too. And indeed, the Ubuntu
> > > > 16.04.4 VM shut down normally.
> > > >
> > > > But that was also the only thing that worked as expected on that
> > > > Ubuntu VM.
> > > >
> > > > I have tried to find a solution on the internet, but the closest I
> > > > could get was this thread:
> > > > https://www.mail-archive.com/users@cloudstack.apache.org/msg22578.html
> > > > and this documentation:
> > > > http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#security
> > > >
> > > > And I'm not even sure if I am on the right path to the solution - any
> > > > assistance would be mu

Re: KVM HostHA

2018-03-15 Thread Jon Marshall
Hi Parth


Can I just ask a few questions -


1) how many compute nodes do you have


2) are you running basic or advanced networking


3) how have you set up your NICs, i.e. on each compute node I have 3 separate 
NICs, one for management, one for the VMs and one for storage (NFS).


So far I have not managed to get any failover of VMs no matter what I try


Jon



From: Parth Patel 
Sent: 14 March 2018 14:36
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

Hi Paul and Andrija,

I don't know the inner workings of the Host-HA feature, but my ACS 4.11 does
what Paul explained even without host HA or IPMI access. As I stated
earlier multiple times, without host HA and IPMI, my HA-enabled VMs
executing on a normal host get restarted on another suitable host in the
cluster after approximately 3 minutes of event ping timeout, after which
the cloudstack agent, having no connection to the management server because
of the unplugged NIC (all my machines currently have only one NIC / the whole
zone is in a flat network), reboots itself (the reason was explained by Rohit
in another thread). The management server marks the host down and only
HA-enabled VMs executing on it get restarted on another host (without any
mention of host HA or IPMI or fencing in the management server logs), while
normal VMs executing on it are stopped.

I don't know if this was the desired outcome, but I think my current ACS 4.11
installation performs at least some of the features ;) provided by Host HA
without configuring it or IPMI.

Regards,
Parth Patel

On Wed 14 Mar, 2018, 18:41 Boris Stoyanov, 
wrote:

> yes, KVM + NFS shared storage.
>
> Boris.
>
>
> boris.stoya...@shapeblue.com
> www.shapeblue.com



> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
> > On 14 Mar 2018, at 14:51, Andrija Panic  wrote:
> >
> > Hi Boris,
> >
> > ok thanks for the explanation - that makes sense, and covers the
> > "exception case" that I have.
> >
> > This is atm only available for NFS, as far as I could read (KVM on NFS)?
> > Cheers
> >
> > On 14 March 2018 at 13:02, Boris Stoyanov 
> > wrote:
> >
> >> Hi Andrija,
> >>
> >> There are two types of checks Host-HA does to determine if a host is
> >> healthy.
> >>
> >> 1. Health checks - pings the host as soon as there are connection issues
> >> with the agent
> >>
> >> If that fails,
> >>
> >> 2. Activity checks - checks if there are any write operations on the
> >> disks of the VMs that are running on the host. This is to determine if the
> >> VMs are actually alive and executing processes. Only if no disk operations
> >> are executed on the shared storage does it try to recover the
> >> host with an IPMI call; if that eventually fails, it migrates the VMs to a
> >> healthy host and fences the faulty one.
> >>
> >> Hope that explains your case.
> >>
> >> Boris.
> >>
> >>
> >> boris.stoya...@shapeblue.com
> >> www.shapeblue.com
> >> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> >> @shapeblue
> >>
> >>
> >>
> >>> On 14 Mar 2018, at 13:53, Andrija Panic 
> wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> sorry to bump in the middle of the thread, but just curious about the
> >>> idea behind host-HA and why it behaves the way you explained above:
> >>>
> >>>
> >>> Would it make more sense (or not?), when MGMT detects the agent is
> >>> unreachable or the host unreachable (or after an unsuccessful agent
> >>> restart, etc... - to be defined), to actually use IPMI to STONITH the
> >>> node, thus making sure no VMs are running, and then to really start all
> >>> HA-enabled VMs on other hosts?
> >>>
> >>> I'm just trying to draw a parallel to corosync/pacemaker as a
> >>> clustering suite/service in Linux (RHEL and others), where when the
> >>> majority of nodes detect that one node is down, a common thing
> >>> (especially for shared storage) is to STONITH that node, make sure it's
> >>> down, then move the "resource" (in our case VMs) to other cluster nodes?
> >>>
> >>> I see it's actually a much broader setup per
> >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA but
> >>> again - the whole idea (in my head at least...) is that when a host goes
> >>> down, we make sure it's down (avoid VM corruption by doing STONITH on
> >>> that node) and then start HA VMs on other hosts.
> >>>
> >>> I understand there might be exceptions, as I have right now (4.8) -
> >>> libvirt gets stuck (librbd exception or similar) so the agent gets
> >>> disconnected, but VMs are still running fine... (except the DB gets
> >>> messed up, all NICs lose isolation_uri, VRs lose MAC addresses and