Re: [Openstack] HA Compute & Instance Evacuation
Hi Torin,

>> Do you have a timetable on when the patch will be merged? If it is a
>> relatively small window of time, I would rather wait to use
>> the patched mainline code.

You should be able to test masakari successfully, as the three patches below are already merged.

1. https://review.openstack.org/#/c/546492/15 - openstack/masakari-monitors (it doesn't use masakariclient any more)
2. https://review.openstack.org/#/c/567781/ - openstack/requirements (openstacksdk lower constraints updated to 0.13.0)
3. https://review.openstack.org/#/c/536653/ - openstack/masakari (change service-type from "ha" to "instance-ha")

If you are planning to install OpenStack using the latest devstack, it will install openstacksdk 0.13.0 by default and you need take no further action. Otherwise, you need to ensure that you have the correct version of openstacksdk (0.13.0) and also add the masakari endpoint with the correct service-type. I recommend installing the latest masakari using devstack.

4. https://review.openstack.org/#/c/557634/2 - python-masakariclient (this patch needs to be merged ASAP)

If you are planning to use python-masakariclient to create failover segments, add hosts, etc., you will need to wait until this patch is merged. We need to update this patch to add the correct version of openstacksdk in requirements.txt. We will merge this particular patch by tomorrow.

But if you plan to add failover segments/hosts by calling the RESTful API using curl or any other method, then you probably won't face any issues.

Regards,
Tushar Patil

From: Torin Woltjer <torin.wolt...@granddial.com>
Sent: Friday, May 11, 2018 11:46:05 PM
To: Patil, Tushar
Cc: jpetr...@coredial.com; openstack@lists.openstack.org
Subject: Re: [Openstack] HA Compute & Instance Evacuation

On Friday, May 11, 2018 12:40:58 AM EDT Patil, Tushar wrote:
> I think this is what is needed to make it work.
> Install openstacksdk version 0.13.0.
>
> Apply patch: https://review.openstack.org/#/c/546492/
>
> In this patch, we need to bump openstacksdk version from 0.11.2 to 0.13.0.
> We will merge above patch soon.

Do you have a timetable on when the patch will be merged? If it is a relatively small window of time, I would rather wait to use the patched mainline code. Otherwise, I am willing to try to work with the patch. Additionally, patching python is something that I am not familiar with. Is there a good resource on doing this?

You have been a great help so far, thanks again.

Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
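[Editor's sketch] The message above mentions adding failover segments and hosts by calling the REST API directly instead of going through python-masakariclient. A minimal Python sketch of what such requests might look like follows. The endpoint URL, port, token, segment name and host name are placeholders, not values from this thread, and the body field names reflect my reading of the Masakari api-ref, so check them against the api-ref for your release. The script only builds the requests; it does not send them.

```python
import json
import urllib.request

# Placeholders for illustration only (assumptions, not values from the thread).
MASAKARI_URL = "http://controller:15868/v1"
TOKEN = "REPLACE_WITH_KEYSTONE_TOKEN"


def build_request(path, body):
    """Return a POST request for the Masakari API without sending it."""
    return urllib.request.Request(
        url=MASAKARI_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="POST",
    )


def segment_request(name, recovery_method="auto"):
    """Build the request that creates a failover segment."""
    body = {
        "segment": {
            "name": name,
            "recovery_method": recovery_method,
            "service_type": "compute",
        }
    }
    return build_request("/segments", body)


def host_request(segment_uuid, hostname, reserved=False):
    """Build the request that adds a compute host to an existing segment."""
    body = {
        "host": {
            "name": hostname,
            "type": "COMPUTE",
            "control_attributes": "SSH",
            "reserved": reserved,
        }
    }
    return build_request("/segments/%s/hosts" % segment_uuid, body)


if __name__ == "__main__":
    req = segment_request("segment1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The same JSON bodies can be passed to curl with -d if you prefer the command line.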
Re: [Openstack] HA Compute & Instance Evacuation
On Friday, May 11, 2018 12:40:58 AM EDT Patil, Tushar wrote:
> I think this is what is needed to make it work.
> Install openstacksdk version 0.13.0.
>
> Apply patch: https://review.openstack.org/#/c/546492/
>
> In this patch, we need to bump openstacksdk version from 0.11.2 to 0.13.0.
> We will merge above patch soon.

Do you have a timetable on when the patch will be merged? If it is a relatively small window of time, I would rather wait to use the patched mainline code. Otherwise, I am willing to try to work with the patch. Additionally, patching python is something that I am not familiar with. Is there a good resource on doing this?

You have been a great help so far, thanks again.
Re: [Openstack] HA Compute & Instance Evacuation
Hi Torin,

Presently, masakari-monitors is completely broken. Extremely sorry for the inconvenience.

I think this is what is needed to make it work.
Install openstacksdk version 0.13.0.

Apply patch: https://review.openstack.org/#/c/546492/

In this patch, we need to bump openstacksdk version from 0.11.2 to 0.13.0.
We will merge above patch soon.

Regards,
Tushar Patil

From: Torin Woltjer <torin.wolt...@granddial.com>
Sent: Thursday, May 10, 2018 11:08:58 PM
To: Patil, Tushar
Cc: jpetr...@coredial.com; openstack@lists.openstack.org
Subject: Re: [Openstack] HA Compute & Instance Evacuation

Hi Tushar,

I followed the documentation to set up the masakari monitors, after I installed the masakari API. None of the monitor services seem to work. I keep getting an error:

"AttributeError: 'module' object has no attribute 'URI'"

Here is the full output: http://paste.openstack.org/show/720761/

Are you aware of what causes the issue? Can you provide any example configs for a working masakari setup?

On Sunday, May 6, 2018 10:41:48 PM EDT Patil, Tushar wrote:
> Hi Torin,
>
> Masakari supports 4 different types of recovery methods at the time of
> creation of failover_segment.
>
> 1. auto: It will let nova decide on which compute host the instances should
> be evacuated.
>
> 2. reserved_host: You will first need to add reserved hosts to the failover
> segments. Masakari engine will select the first available reserved host
> from the failover segment, enable compute service in nova and then use that
> reserved host to evacuate the instances from the failed compute host.
>
> 3. auto_priority: It will first try to evacuate instances using the 'auto'
> recovery method; if it fails, it attempts to evacuate using the
> "reserved_host" recovery method.
>
> 4. rh_priority: It's the opposite of the above "auto_priority" recovery
> method. It will first try to evacuate instances using the 'reserved_host'
> recovery method; if it fails, it attempts to evacuate using the "auto"
> recovery method.
>
> In your case you will need to use the "auto" recovery method.
>
> Please refer to the below documentation links for more details.
>
> Masakari system architecture:
> https://docs.openstack.org/masakari/latest/
>
> Masakari api-ref:
> https://developer.openstack.org/api-ref/instance-ha/
>
> To install masakari-monitors with pacemaker/corosync:
> https://review.openstack.org/#/c/489095/6/doc/source/install_and_configure_debian.rst
>
> Other ways to reach us: Masakari weekly meeting on #openstack-meeting IRC
> channel on every Tuesday at 0400 UTC or else you can post your queries on
> #openstack-masakari IRC channel.
>
> Regards,
> Tushar
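[Editor's sketch] The fix above hinges on having openstacksdk 0.13.0 or newer rather than 0.11.2. One way to check what is installed is sketched below. It assumes Python 3.8+ for importlib.metadata (the deployment in this thread may well have been on Python 2, where `pip show openstacksdk` gives the same information), and the helper names are hypothetical. Versions are compared numerically, because a plain string comparison would rank "0.9.0" above "0.13.0".

```python
from importlib import metadata

REQUIRED = (0, 13, 0)  # minimum openstacksdk version named in the thread


def parse_version(text):
    """Turn '0.13.0' into (0, 13, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad short versions such as '0.13' so they compare as (0, 13, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def sdk_is_new_enough(installed, required=REQUIRED):
    """Return True if the installed version string meets the requirement."""
    return parse_version(installed) >= required


if __name__ == "__main__":
    try:
        installed = metadata.version("openstacksdk")
    except metadata.PackageNotFoundError:
        print("openstacksdk is not installed")
    else:
        verdict = "OK" if sdk_is_new_enough(installed) else "too old"
        print("openstacksdk %s: %s" % (installed, verdict))
```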
Re: [Openstack] HA Compute & Instance Evacuation
+++ Torin Woltjer [02/05/18 20:39 +]:
>> There is no HA behaviour for compute nodes.
>>
>> You are referring to HA of workloads running on compute nodes, not HA of
>> compute nodes themselves.
>
> It was a mistake for me to say HA when referring to compute and instances.
> Really I want to avoid a situation where one of my compute hosts gives up
> the ghost, and all of the instances are offline until someone reboots them
> on a different host. I would like them to automatically reboot on a healthy
> compute node.
>
>> Check out Masakari:
>>
>> https://wiki.openstack.org/wiki/Masakari
>
> This looks like the kind of thing I'm searching for.
>
> I'm seeing 3 components here, I'm assuming one goes on compute hosts and
> one or both of the others go on the control nodes? Is there any
> documentation outlining the procedure for deploying this? Will there be any
> problem running the Masakari API service on 2 machines simultaneously,
> sitting behind HAProxy?

Check for 'Instance HA':
https://blueprints.launchpad.net/tripleo/+spec/instance-ha

Which more or less came with:
https://github.com/beekhof/osp-ha-deploy/blob/master/pcmk/compute-managed.scenario
https://github.com/beekhof/osp-ha-deploy/blob/master/pcmk/controller-managed.scenario

Ansible scripts are at git://github.com/redhat-openstack/tripleo-quickstart-utils

And enabled via:

    ansible-playbook /home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.yml \
        -e release="RELEASE"

This of course requires a valid HA deployment setup on the controllers (usually TripleO or OSP Director).

Regards,
Pablo

--
Pablo Iranzo Gómez (pablo.ira...@redhat.com) GnuPG: 0x5BD8E1E4
Principal Software Maintenance Engineer - OpenStack, iranzo @ IRC
RHC{A,SS,DS,VA,E,SA,SP,AOSP}, JBCAA #110-215-852, RHCA Level V
Blog: https://iranzo.github.io
Citellus: https://citellus.org
Re: [Openstack] HA Compute & Instance Evacuation
Hi Torin,

Masakari supports 4 different types of recovery methods at the time of creation of failover_segment.

1. auto: It will let nova decide on which compute host the instances should be evacuated.

2. reserved_host: You will first need to add reserved hosts to the failover segments. Masakari engine will select the first available reserved host from the failover segment, enable compute service in nova and then use that reserved host to evacuate the instances from the failed compute host.

3. auto_priority: It will first try to evacuate instances using the 'auto' recovery method; if it fails, it attempts to evacuate using the "reserved_host" recovery method.

4. rh_priority: It's the opposite of the above "auto_priority" recovery method. It will first try to evacuate instances using the 'reserved_host' recovery method; if it fails, it attempts to evacuate using the "auto" recovery method.

In your case you will need to use the "auto" recovery method.

Please refer to the below documentation links for more details.

Masakari system architecture:
https://docs.openstack.org/masakari/latest/

Masakari api-ref:
https://developer.openstack.org/api-ref/instance-ha/

To install masakari-monitors with pacemaker/corosync:
https://review.openstack.org/#/c/489095/6/doc/source/install_and_configure_debian.rst

Other ways to reach us: Masakari weekly meeting on #openstack-meeting IRC channel on every Tuesday at 0400 UTC or else you can post your queries on #openstack-masakari IRC channel.

Regards,
Tushar

From: Torin Woltjer <torin.wolt...@granddial.com>
Sent: Saturday, May 5, 2018 3:43:05 AM
To: jpetr...@coredial.com
Cc: openstack@lists.openstack.org
Subject: Re: [Openstack] HA Compute & Instance Evacuation

Thank you very much for the information. Just for clarification, when you say reserved hosts, do you mean that I must keep unloaded virtualization hosts in reserve? Or can Masakari move instances from a downed host to an already loaded host that has open capacity?
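[Editor's sketch] The four recovery methods described above differ only in which evacuation strategy is tried and in what order. The Python sketch below models that ordering. It illustrates the behaviour as described in the message and is not Masakari's actual code; the function name and the `strategies` callables are hypothetical stand-ins for "let nova pick a host" and "evacuate to a reserved host".

```python
# Order in which each recovery method tries the two basic strategies,
# following the description in the message above.
RECOVERY_ORDER = {
    "auto": ["auto"],
    "reserved_host": ["reserved_host"],
    "auto_priority": ["auto", "reserved_host"],
    "rh_priority": ["reserved_host", "auto"],
}


def recover(recovery_method, strategies):
    """Try each strategy in order and return the name of the first success.

    `strategies` maps "auto" and "reserved_host" to zero-argument callables
    that return True when the evacuation succeeded. Returns None if every
    attempted strategy failed.
    """
    for name in RECOVERY_ORDER[recovery_method]:
        if strategies[name]():
            return name
    return None


if __name__ == "__main__":
    # Simulate a cloud where nova cannot place the instances on its own,
    # but a reserved host is available.
    strategies = {"auto": lambda: False, "reserved_host": lambda: True}
    for method in RECOVERY_ORDER:
        print(method, "->", recover(method, strategies))
```

With "auto", instances can land on any host that nova's scheduler considers to have capacity, which is why no idle hosts have to be held back for that method.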
Re: [Openstack] HA Compute & Instance Evacuation
Thank you very much for the information. Just for clarification, when you say reserved hosts, do you mean that I must keep unloaded virtualization hosts in reserve? Or can Masakari move instances from a downed host to an already loaded host that has open capacity?
Re: [Openstack] HA Compute & Instance Evacuation
Take this with a grain of salt because we're using the original version before the project moved under the Big Tent and I'm not sure how much it's evolved since then. I assume the basic functions are the same though.

You're correct; Corosync and Pacemaker are used to determine if a compute node goes down. The masakari-host-monitor process runs on each compute node, checks the cluster status, and sends a notification to masakari-controller when a node goes down. The controller process keeps a list of reserved hosts in its database and calls nova host-evacuate to move the instances to one of the reserved hosts.

In our environment I also configured STONITH and I'd highly recommend it. With STONITH, Pacemaker sends a shutdown command to the Out of Band Management card of the unreachable node to make sure that it can't come back and cause a conflict.

There are two other components, masakari-process-monitor and masakari-instance-monitor. These also run on your compute nodes. The former watches the nova-compute service and the latter monitors running instances and restarts them if necessary.

Looking here it seems they've split Masakari into three different repos:
https://github.com/openstack?utf8=%E2%9C%93=masakari==

masakari - The controller service and API
masakari-monitors - Compute node monitoring services
python-masakariclient - The cli tools
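[Editor's sketch] In the API-based Masakari discussed elsewhere in this thread (which may differ from the original version described above), the host monitor reports a failed node by posting a notification to the API. The Python sketch below builds such a body. The field names and values reflect my reading of the Masakari api-ref and should be verified there; the function name and the host name are hypothetical.

```python
import json
from datetime import datetime, timezone


def host_down_notification(hostname, generated_time=None):
    """Build the JSON body a host monitor might POST to /v1/notifications."""
    if generated_time is None:
        generated_time = datetime.now(timezone.utc)
    return {
        "notification": {
            "type": "COMPUTE_HOST",
            "hostname": hostname,
            # Timestamp without timezone suffix, microsecond precision.
            "generated_time": generated_time.strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "payload": {
                "event": "STOPPED",
                "host_status": "NORMAL",
                "cluster_status": "OFFLINE",
            },
        }
    }


if __name__ == "__main__":
    body = host_down_notification("compute1")
    print(json.dumps(body, indent=2))
```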
Re: [Openstack] HA Compute & Instance Evacuation
I'm vaguely familiar with Pacemaker/Corosync, as I'm using it with HAProxy on my controller nodes. I'm assuming in this instance that you use Pacemaker on your compute hosts so masakari can detect host outages? If possible could you go into more detail about the configuration? I would like to use Masakari and I'm having trouble finding a step by step or other documentation to get started with.
Re: [Openstack] HA Compute & Instance Evacuation
> There is no HA behaviour for compute nodes.
>
> You are referring to HA of workloads running on compute nodes, not HA of
> compute nodes themselves.

It was a mistake for me to say HA when referring to compute and instances. Really I want to avoid a situation where one of my compute hosts gives up the ghost, and all of the instances are offline until someone reboots them on a different host. I would like them to automatically reboot on a healthy compute node.

> Check out Masakari:
>
> https://wiki.openstack.org/wiki/Masakari

This looks like the kind of thing I'm searching for.

I'm seeing 3 components here, I'm assuming one goes on compute hosts and one or both of the others go on the control nodes? Is there any documentation outlining the procedure for deploying this? Will there be any problem running the Masakari API service on 2 machines simultaneously, sitting behind HAProxy?
Re: [Openstack] HA Compute & Instance Evacuation
We're using the original Masakari project for this and it works really well. In fact, just last week we lost a compute node and all of the VMs were successfully migrated to a reserve host in under 5 minutes. It's a really nice feeling when your infrastructure heals itself before you even get a chance to start troubleshooting.

It does require a good deal of configuration to get it up and running, especially the clustering with Pacemaker/Corosync, so be prepared to get familiar with those tools and STONITH if you're not already. Worth it if some of your infrastructure doesn't have redundancy built in at a higher level.
Re: [Openstack] HA Compute & Instance Evacuation
On 05/02/2018 02:43 PM, Torin Woltjer wrote:
> I am working on setting up Openstack for HA and one of the last orders of
> business is getting HA behavior out of the compute nodes.

There is no HA behaviour for compute nodes.

> Is there a project that will automatically evacuate instances from a
> downed or failed compute host, and automatically reboot them on their new
> host?

Check out Masakari:

https://wiki.openstack.org/wiki/Masakari

> I'm curious what suggestions people have about this, or whatever advice
> you might have. Is there a best way of getting this functionality, or
> anything else I should be aware of?

You are referring to HA of workloads running on compute nodes, not HA of compute nodes themselves.

My advice would be to install Kubernetes on one or more VMs (with the VMs acting as Kubernetes nodes) and use that project's excellent orchestrator for daemonsets/statefulsets which is essentially the use case you are describing.

The OpenStack Compute API (implemented in Nova) is not an orchestration API. It's a low-level infrastructure API for executing basic actions on compute resources.

Best,
-jay
[Openstack] HA Compute & Instance Evacuation
I am working on setting up Openstack for HA and one of the last orders of business is getting HA behavior out of the compute nodes. Is there a project that will automatically evacuate instances from a downed or failed compute host, and automatically reboot them on their new host? I'm curious what suggestions people have about this, or whatever advice you might have. Is there a best way of getting this functionality, or anything else I should be aware of?

Thanks,