Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On Mon, Aug 1, 2016 at 12:12 PM, Jeremy Stanley wrote:
> I'm hesitant to rely on unstack/clean/stack working consistently
> over time, though maybe others have seen them behave more reliably
> than I think they do. I had assumed we'd replace with fresh servers
> each time and bootstrap DevStack from scratch, though perhaps that's
> overkill?

This is what I was assuming as well, since we'd need a fresh version of DevStack itself each time so the latest translations apply cleanly. It would be hard to track all the changes by just doing unstack/clean/fresh DevStack clone/stack, even if it were reliable over time (my experience has also been that it's not).

I also learned the other day that the rejoin_stack.sh script was largely unmaintained and removed in Mitaka, so any reboot forces you to run unstack/clean/stack again, which is worth considering as we discuss snapshots.

--
Elizabeth Krumbach Joseph || Lyz || pleia2
___
OpenStack-Infra mailing list
OpenStack-Infra@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On 2016-07-25 19:05, Jeremy Stanley wrote:
> On 2016-07-25 11:08:35 +0530 (+0530), Vipul Nayyar wrote:
> > Honestly, I was also thinking that using containers for implementing
> > blue/green deployment would be the best way to minimize downtime. I
> > think a basic run-through of this idea with the community at
> > tomorrow's IRC meeting would be a good start.
>
> Waving containers at the problem doesn't really solve the
> fundamental issue at hand (we could just as easily use DNS or an
> Apache redirect to switch between virtual machines, possibly more
> easily since we already have existing mechanisms for deploying and
> replacing virtual machines). The issue that needs addressing first,
> I think, is how to get new DevStack deployments from master branch
> tip of all projects to work consistently at each rebuild interval
> or, more likely, to design a pattern that avoids replacing a working
> deployment with a broken one along with some means to find out that
> redeployment is failing so that it can effectively be troubleshot
> post-mortem.

Hi Jeremy,

a broken DevStack installation: that's exactly the point. With an LXD container you can take a snapshot, run the unstack or clean script, fetch new code, and stack again. If that fails, you can restore the snapshot and try a new installation another day. Without snapshots, you can start a new container with the new code and shut down the old one.

So I like the idea of haproxy in front, but I wouldn't change any DNS entries, because DNS changes take time to reach end users. If we have enough resources, we can work with 3 VMs: two DevStack installations with the translation check site, and one running haproxy that hosts the public FQDN, plus a kind of trigger that refreshes the installation on one DevStack VM _if_ the other VM is up. If the other DevStack service is down, the trigger should try an unstack/clean/stack after one day and switch over once the service is up. This could be done with lb-update (https://www.haproxy.com/doc/aloha/7.0/haproxy/lbupdate.html) or the haproxy API.

The process should also have some light monitoring of its status.

kind regards

Frank
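A minimal sketch of the trigger logic described above, under one reading of it: rebuild a DevStack VM only while the other one is serving, switch traffic when the live side is down but the other works, and otherwise wait a day. The `Backend` class, the function name, and the three action strings are hypothetical and only illustrate the decision; the haproxy update itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One of the two DevStack VMs behind the proxy (names are hypothetical)."""
    name: str
    healthy: bool


def plan_refresh(live: Backend, standby: Backend) -> str:
    """Decide what the watchdog should do on this run.

    Returns one of three illustrative action names:
      "rebuild-standby":   the live side is serving, so the other VM can be
                           rebuilt without taking the check site down
      "switch-to-standby": the live side is down but the other VM works,
                           so traffic should move there
      "retry-tomorrow":    neither side is usable; try unstack/clean/stack
                           again after one day
    """
    if live.healthy:
        return "rebuild-standby"
    if standby.healthy:
        return "switch-to-standby"
    return "retry-tomorrow"
```

Keeping the decision in a pure function like this makes it easy to test separately from whatever mechanism actually updates the load balancer.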
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
Oh, hahaha, I thought the dns.py was actually doing something. Now that I see the script I know what you mean :-).

2016-08-01 17:10 GMT+02:00 Jeremy Stanley:
> On 2016-08-01 16:46:07 +0200 (+0200), Ricardo Carrillo Cruz wrote:
> > In my mind, I thought set_dns would be really an ansible wrapper to
> > system-config launch/dns.py script.
> [...]
>
> There's a reason why that script only tells you what commands to
> run, and doesn't run them for you. At least that way we can still
> assert that we're not writing automation to communicate with
> Rackspace's (proprietary, non-free, nonstandard, non-OpenStack) DNS
> API if a sysadmin has to manually run commands to update records
> through it. Then it's no worse on a philosophical level than using a
> Web browser to make DNS changes through their similarly proprietary
> dashboard site.
> --
> Jeremy Stanley
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On 2016-08-01 16:46:07 +0200 (+0200), Ricardo Carrillo Cruz wrote:
> In my mind, I thought set_dns would be really an ansible wrapper to
> system-config launch/dns.py script.
[...]

There's a reason why that script only tells you what commands to run, and doesn't run them for you. At least that way we can still assert that we're not writing automation to communicate with Rackspace's (proprietary, non-free, nonstandard, non-OpenStack) DNS API if a sysadmin has to manually run commands to update records through it. Then it's no worse on a philosophical level than using a Web browser to make DNS changes through their similarly proprietary dashboard site.
--
Jeremy Stanley
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
In my mind, I thought set_dns would be really an ansible wrapper to system-config launch/dns.py script. But yeah, putting the switch between what's the latest DevStack and what's not on an Apache reverse proxy works too. The workflow would be similar to what I depicted.

I think the biggest issue is that DevStack really gives a lot of problems when you try to stack/unstack, so long-lived servers are asking for trouble here.

2016-08-01 16:36 GMT+02:00 Jeremy Stanley:
> On 2016-08-01 16:08:49 +0200 (+0200), Ricardo Carrillo Cruz wrote:
> [...]
> > The set DNS task would check a file on the puppetmaster which contains
> > the state of the blue/green DNS records (translate-latest.openstack.org
> > pointing to translate_a and translate-soon-to-be-deleted.openstack.org
> > pointing to translate_b, or vice versa) and would only run if any of
> > the preceding create_server tasks did anything.
> [...]
>
> Problem is we can't (okay, shouldn't) automate DNS changes while
> we're relying on Rackspace's DNS service, since it's not using a
> standard OpenStack API and we really don't want to write additional
> tooling to it.
>
> As mentioned in my earlier E-mail, a simple alternative is to just
> update an HTTP 302 (temporary) redirect or a rewrite/proxy to the
> "live" deployment in an Apache vhost on static.openstack.org or
> perhaps update a persistent haproxy pool. Proxying rather than
> redirecting probably makes the most sense as we can avoid presenting
> IP-address-based URLs to the consumer (and if we're forced to deploy
> with TLS then we might be able to stabilize a solution for that at
> the proxy too).
> --
> Jeremy Stanley
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On 2016-08-01 16:08:49 +0200 (+0200), Ricardo Carrillo Cruz wrote:
[...]
> The set DNS task would check a file on the puppetmaster which contains
> the state of the blue/green DNS records (translate-latest.openstack.org
> pointing to translate_a and translate-soon-to-be-deleted.openstack.org
> pointing to translate_b, or vice versa) and would only run if any of
> the preceding create_server tasks did anything.
[...]

Problem is we can't (okay, shouldn't) automate DNS changes while we're relying on Rackspace's DNS service, since it's not using a standard OpenStack API and we really don't want to write additional tooling to it.

As mentioned in my earlier E-mail, a simple alternative is to just update an HTTP 302 (temporary) redirect or a rewrite/proxy to the "live" deployment in an Apache vhost on static.openstack.org or perhaps update a persistent haproxy pool. Proxying rather than redirecting probably makes the most sense as we can avoid presenting IP-address-based URLs to the consumer (and if we're forced to deploy with TLS then we might be able to stabilize a solution for that at the proxy too).
--
Jeremy Stanley
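To picture the proxy option, here is a rough sketch of a helper that renders an Apache vhost pointing at whichever deployment is currently "live". `ProxyPass` and `ProxyPassReverse` are standard mod_proxy directives; everything else is an assumption for illustration: the function name, the use of translate-latest.openstack.org as the public name (borrowed from the proposal quoted above), plain HTTP on port 80, and the backend address, which comes from a documentation-only range.

```python
def render_vhost(public_name: str, backend_url: str) -> str:
    """Return Apache vhost text that proxies public_name to backend_url.

    Assumes mod_proxy and mod_proxy_http are enabled on the proxy host.
    """
    # Normalise to exactly one trailing slash so "/" maps cleanly onto the backend root.
    backend = backend_url.rstrip("/") + "/"
    return "\n".join([
        "<VirtualHost *:80>",
        f"    ServerName {public_name}",
        f"    ProxyPass / {backend}",
        f"    ProxyPassReverse / {backend}",
        "</VirtualHost>",
        "",
    ])


# Hypothetical values: the backend address is a placeholder, not a real server.
vhost = render_vhost("translate-latest.openstack.org", "http://203.0.113.10")
```

Switching deployments would then amount to re-rendering the file with the other backend and reloading Apache, so end users only ever see the stable public name.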
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
How about something like a playbook that runs on puppetmaster periodically, doing something like this:

  create_server_translate_a
  create_server_translate_b
  set_dns

The create_server_translate tasks would be idempotent, i.e. they won't leak servers. The set DNS task would check a file on the puppetmaster which contains the state of the blue/green DNS records (translate-latest.openstack.org pointing to translate_a and translate-soon-to-be-deleted.openstack.org pointing to translate_b, or vice versa) and would only run if any of the preceding create_server tasks did anything.

Then at $DAYS, we have a cron task that deletes whichever server is blue (or green, none of those colors are my favorites :-), swapping A is blue/B is green or vice versa. The main play from above for recreating them would then pick up, create a new server, and do the needful from the DNS perspective.

Thoughts?

2016-07-25 19:05 GMT+02:00 Jeremy Stanley:
> On 2016-07-25 11:08:35 +0530 (+0530), Vipul Nayyar wrote:
> > Honestly, I was also thinking that using containers for implementing
> > blue/green deployment would be the best way to minimize downtime. I
> > think a basic run-through of this idea with the community at
> > tomorrow's IRC meeting would be a good start.
>
> Waving containers at the problem doesn't really solve the
> fundamental issue at hand (we could just as easily use DNS or an
> Apache redirect to switch between virtual machines, possibly more
> easily since we already have existing mechanisms for deploying and
> replacing virtual machines). The issue that needs addressing first,
> I think, is how to get new DevStack deployments from master branch
> tip of all projects to work consistently at each rebuild interval
> or, more likely, to design a pattern that avoids replacing a working
> deployment with a broken one along with some means to find out that
> redeployment is failing so that it can effectively be troubleshot
> post-mortem.
> --
> Jeremy Stanley
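The state file in the proposal above could be as simple as a small JSON mapping from record name to server. The following is only a sketch of that bookkeeping: the JSON format, the function names, and the default of A being live are assumptions, and nothing here talks to a DNS service.

```python
import json
from pathlib import Path

LATEST = "translate-latest.openstack.org"
OLD = "translate-soon-to-be-deleted.openstack.org"


def load_state(path: Path) -> dict:
    """Read the record-to-server mapping; default to A live and B retiring."""
    if path.exists():
        return json.loads(path.read_text())
    return {LATEST: "translate_a", OLD: "translate_b"}


def swap(state: dict) -> dict:
    """Return a new mapping with the two servers exchanged."""
    return {LATEST: state[OLD], OLD: state[LATEST]}


def save_state(path: Path, state: dict) -> None:
    """Write the mapping back so the next periodic run sees the new roles."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
```

Because `swap` returns a new mapping instead of mutating its input, the cron task can compare old and new state before deciding whether anything needs to change.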
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On 2016-07-25 11:08:35 +0530 (+0530), Vipul Nayyar wrote:
> Honestly, I was also thinking that using containers for implementing
> blue/green deployment would be the best way to minimize downtime. I
> think a basic run-through of this idea with the community at
> tomorrow's IRC meeting would be a good start.

Waving containers at the problem doesn't really solve the fundamental issue at hand (we could just as easily use DNS or an Apache redirect to switch between virtual machines, possibly more easily since we already have existing mechanisms for deploying and replacing virtual machines). The issue that needs addressing first, I think, is how to get new DevStack deployments from master branch tip of all projects to work consistently at each rebuild interval or, more likely, to design a pattern that avoids replacing a working deployment with a broken one along with some means to find out that redeployment is failing so that it can effectively be troubleshot post-mortem.
--
Jeremy Stanley
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
Hey Frank,

Honestly, I was also thinking that using containers for implementing blue/green deployment would be the best way to minimize downtime. I think a basic run-through of this idea with the community at tomorrow's IRC meeting would be a good start.

Regards
Vipul Nayyar

On Wed, Jul 20, 2016 at 8:15 PM, Frank Kloeker wrote:
> On 2016-07-11 14:59, Vipul Nayyar wrote:
>
>> Hey Elizabeth,
>>
>> I'd like to contribute. :-)
>>
>> I have some past deployment and Ops experience and I'm really
>> interested in building something of a blue/green deployment system
>> here, to decrease the downtime. Although I'm still going through the
>> infra-related docs, which I'm fairly new to, with a little bit of
>> guidance early on I'll be happy to take over some responsibilities
>> over time.
>>
>> Maybe a good place for me to start would be to take a deep look at
>> the Puppet module written by Frank and note down the most common
>> errors that are encountered regularly. I'd like to hear more
>> concrete thoughts from the community about how to proceed on this,
>> if any.
>>
>
> Welcome Vipul,
>
> without a long preamble: I like the idea of blue/green deployment,
> because we have to bridge the downtime while DevStack is re-installing;
> the requirement is once a week (day). And we need a way to roll back if
> the DevStack installation fails. The reason for this is more
> DevStack-specific, because we want to use the master branch with the
> newest changes.
> I have gained some experience with LXD containers and want to push the
> topic a little bit forward. The draft of my idea is here:
> https://github.com/eumel8/translation_checksite/blob/container/translation_check_container.jpg
> There are 2 containers with a DevStack installation + translation
> checksite. In front of the containers is some magic, called Watchdog,
> for installing the stuff and guarding the installation. Traffic will be
> routed to the latest available container version. Container
> installation is described a little bit here:
> http://docs.openstack.org/developer/devstack/guides/lxc.html but it
> needs to be adapted for LXD 2.0.
> And we have to persuade the infra team to provide a 16.04 VM :-)
> Let me know what you think.
>
> kind regards
>
> Frank
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On 2016-07-11 14:59, Vipul Nayyar wrote:
> Hey Elizabeth,
>
> I'd like to contribute. :-)
>
> I have some past deployment and Ops experience and I'm really
> interested in building something of a blue/green deployment system
> here, to decrease the downtime. Although I'm still going through the
> infra-related docs, which I'm fairly new to, with a little bit of
> guidance early on I'll be happy to take over some responsibilities
> over time.
>
> Maybe a good place for me to start would be to take a deep look at
> the Puppet module written by Frank and note down the most common
> errors that are encountered regularly. I'd like to hear more
> concrete thoughts from the community about how to proceed on this,
> if any.

Welcome Vipul,

without a long preamble: I like the idea of blue/green deployment, because we have to bridge the downtime while DevStack is re-installing; the requirement is once a week (day). And we need a way to roll back if the DevStack installation fails. The reason for this is more DevStack-specific, because we want to use the master branch with the newest changes.

I have gained some experience with LXD containers and want to push the topic a little bit forward. The draft of my idea is here:
https://github.com/eumel8/translation_checksite/blob/container/translation_check_container.jpg

There are 2 containers with a DevStack installation + translation checksite. In front of the containers is some magic, called Watchdog, for installing the stuff and guarding the installation. Traffic will be routed to the latest available container version. Container installation is described a little bit here: http://docs.openstack.org/developer/devstack/guides/lxc.html but it needs to be adapted for LXD 2.0.

And we have to persuade the infra team to provide a 16.04 VM :-)
Let me know what you think.

kind regards

Frank
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
On Mon, Jul 11, 2016 at 5:59 AM, Vipul Nayyar wrote:
> Hey Elizabeth,
>
> I'd like to contribute. :-)
>
> I have some past deployment and Ops experience and I'm really
> interested in building something of a blue/green deployment system
> here, to decrease the downtime. Although I'm still going through the
> infra-related docs, which I'm fairly new to, with a little bit of
> guidance early on I'll be happy to take over some responsibilities
> over time.
>
> Maybe a good place for me to start would be to take a deep look at
> the Puppet module written by Frank and note down the most common
> errors that are encountered regularly. I'd like to hear more
> concrete thoughts from the community about how to proceed on this,
> if any.

Thank you for volunteering to help!

The following is a quick rundown of how I've been testing Frank's changes, which is also a bit of a crash course in how we test our Puppet work, which is valuable to learn:

First of all, I was testing on public cloud instances with 8G of RAM running Ubuntu 14.04, but I no longer have access to the one I was using. I now test this on a local KVM instance with 8G of RAM.

As for testing itself, you'll want to follow our instructions for "Making a change in Puppet" here: http://docs.openstack.org/infra/system-config/sysadmin.html#making-a-change-in-puppet

Put something like the following in the local.pp: http://paste.openstack.org/show/489372/

Before running the ./install_modules.sh command, apply Frank's https://review.openstack.org/#/c/276466/ to your cloned /root/system-config with git fetch, which will be something like, as root:

  cd system-config/
  git fetch https://review.openstack.org/openstack-infra/system-config refs/changes/66/276466/9 && git checkout FETCH_HEAD

...you can get this fetch link from Gerrit, at the top right of https://review.openstack.org/#/c/276466/ where it says "Download" and has a drop-down menu with all the links.

Then you can continue with install_modules and the puppet apply command.

If your system or internet connection is on the slow side, you may also need to bump the timeout in the checksite module, which I did in this patch: https://review.openstack.org/#/c/337912/

Since I really could use the help running these tests and improving the fault tolerance of this module, I really appreciate your effort. Please mail the list here or grab me on IRC (I'm pleia2 in #openstack-infra on freenode) if you need any help. Collecting error messages folks run into here will help us too.

--
Elizabeth Krumbach Joseph || Lyz || pleia2
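The ref in the fetch command above follows Gerrit's usual layout: the last two digits of the change number (zero-padded), then the change number, then the patchset. A small sketch of building it, with hypothetical helper names; it only assembles the arguments and does not run git:

```python
def gerrit_change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref for one patchset of a change.

    Gerrit shards change refs by the last two digits of the change
    number, so change 276466 patchset 9 lives under refs/changes/66/.
    """
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def fetch_command(repo_url: str, change: int, patchset: int) -> list:
    """Return the git argv that fetches that patchset.

    The caller would follow up with "git checkout FETCH_HEAD".
    """
    return ["git", "fetch", repo_url, gerrit_change_ref(change, patchset)]
```

The patchset number changes every time the review is updated, so copying the current link from the "Download" menu in Gerrit remains the safest way to get it.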
Re: [OpenStack-Infra] Work toward a translations checksite and call for help
Hey Elizabeth,

I'd like to contribute. :-)

I have some past deployment and Ops experience and I'm really interested in building something of a blue/green deployment system here, to decrease the downtime. Although I'm still going through the infra-related docs, which I'm fairly new to, with a little bit of guidance early on I'll be happy to take over some responsibilities over time.

Maybe a good place for me to start would be to take a deep look at the Puppet module written by Frank and note down the most common errors that are encountered regularly. I'd like to hear more concrete thoughts from the community about how to proceed on this, if any.

Thanks
Vipul Nayyar
[OpenStack-Infra] Work toward a translations checksite and call for help
Hi everyone,

I brought this up a few meetings ago, but I wanted to collect the thoughts in one place to more easily get infra team input on the status of work toward a translations checksite for the i18n team.

As some history, the i18n team wrote a specification a while back which we approved, which folks can read for background: http://specs.openstack.org/openstack-infra/infra-specs/specs/translation_check_site.html

The original assignees were mostly i18n people, and have been pulled off to other things. As one of the primary infra liaisons with the i18n team I've been pulled into helping, but my ability to help is limited due to time and the need for collaboration with some other infra folks on some decisions. So here I am emailing the rest of the team for help. Plus we also wanted to move the conversations about roadblocks that were happening privately into the open, so I don't continue to be a blocker here.

Over the past several months Frank Kloeker worked to write a preliminary Puppet module for us in puppet-translation_checksite (now merged) and he has an outstanding corresponding system-config patch: https://review.openstack.org/#/c/276466/

As the spec outlines, the assumption was that we'd run this on a long-lived server in some way, updating the translation strings directly from Zanata daily, and re-installing DevStack once a week.

We've run into a few issues with this, which I'd appreciate some thoughts about so I have some help evaluating how to move forward.

1. The Puppet module is really fragile.

In theory it works; Frank did a good job with it. But almost every time I run it I run into another problem. Sometimes it has to do with a DevStack error (there was a known problem a couple of times when I tried to run it), or trouble with my environment (DevStack doesn't fail gracefully if a dependency is not satisfied due to a network timeout or whatnot), and sometimes it's just a change in our infra that breaks things (yesterday it was an unexpected problem with the puppet apt module). The module itself doesn't yet have any recovery for any of this. If we had DevStack running along well for a week, and it gets to the next week and it fails to build, we're stuck with a broken system and no notification that it's broken.

We could spend time building fault tolerance and build failure alerts into it, but I want to make sure we're on the right track first.

2. We don't actually have a solution to run "new" DevStack once a week.

Some options:

- The once-a-week rebuild is just known downtime for the checksite; have a cron job to run ./unstack.sh and delete /opt/devstack?
- Get to a place where we're auto-building new servers, and just build a new one and swap DNS once a week once we know the new server is also running properly, with something like a health script that must pass
- Something else?

3. It takes a long time to run DevStack's stack.sh, which this module does.

The current timeout is 3600 (1 hour), but I have to bump it up to run it locally in my tests. Even at an hour, this will really gum up the works if it's part of system-config and running alongside all our other ansible+puppet runs, even if the building of DevStack is only once a week. Is this acceptable to us?

4. While we will have i18n team members logging into the Horizon interface to see the progress of their translations work (that's the whole point), the translations checksite is essentially read-only and we have a pretty good mechanism in place for spinning up daily DevStack instances for all our tests. Maybe we should backpedal and somehow leverage this tooling instead?

Thanks everyone.

--
Elizabeth Krumbach Joseph || Lyz || pleia2
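Points 1 and 3 above both come down to running a long build under a time limit and making a failure visible instead of silent. A rough sketch of that wrapper follows; the function names and message wording are hypothetical, and this is not how the Puppet module does it today, just one way the idea could look.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run a command and return (ok, reason) instead of raising.

    ok is True only when the command exits 0 within the time limit.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_seconds}s"
    except OSError as exc:
        return False, f"could not start: {exc}"
    if result.returncode != 0:
        return False, f"exited with status {result.returncode}"
    return True, "ok"


def report(ok, reason):
    """Print a failure line to stderr for a wrapper or monitor to pick up.

    Returns a process exit status: 0 on success, 1 on failure.
    """
    if not ok:
        print(f"DevStack rebuild failed: {reason}", file=sys.stderr)
    return 0 if ok else 1
```

A weekly job could call `run_with_timeout` with the path to stack.sh (for example under /opt/devstack, which is an assumption about the layout) and the 3600-second limit mentioned above, then exit with the status from `report` so the failure is not lost.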