Hi everyone, I brought this up a few meetings ago, but I wanted to collect my thoughts in one place so it's easier to get infra team input on the status of work toward a translations checksite for the i18n team. For background, the i18n team wrote a specification a while back, which we approved and which folks can read here: http://specs.openstack.org/openstack-infra/infra-specs/specs/translation_check_site.html
The original assignees were mostly i18n people and have since been pulled off to other things. As one of the primary infra liaisons with the i18n team I've been pulled in to help, but my ability to do so is limited by time and by the need to collaborate with some other infra folks on a few decisions. So here I am, emailing the rest of the team for help. We also wanted to move the conversations about roadblocks that have been happening privately into the open, so I don't continue to be a blocker here.

Over the past several months Frank Kloeker worked to write a preliminary Puppet module for us in puppet-translation_checksite (now merged), and he has an outstanding corresponding system-config patch: https://review.openstack.org/#/c/276466/

As the spec outlines, the assumption was that we'd run this on a long-lived server in some way, updating the translation strings directly from Zanata daily and re-installing DevStack once a week. We've run into a few issues with this, and I'd appreciate some thoughts so I have help evaluating how to move forward.

1. The Puppet module is really fragile. In theory it works, and Frank did a good job with it, but almost every time I run it I hit another problem. Sometimes it's a DevStack error (there was a known problem a couple of times when I tried to run it), sometimes trouble with my environment (DevStack doesn't fail gracefully if a dependency is not satisfied due to a network timeout or the like), and sometimes it's simply a change in our infra that breaks things (yesterday it was an unexpected problem with the Puppet apt module). The module itself doesn't yet have any recovery for any of this. If DevStack has been running along well for a week, and then the next week's rebuild fails, we're stuck with a broken system and no notification that it's broken. We could spend time building fault tolerance and build-failure alerts into it, but I want to make sure we're on the right track first.
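To make the fault-tolerance idea concrete, here's a rough sketch of what a retry-and-alert wrapper around the weekly rebuild could look like. Everything here is an assumption, not something that exists today: bash, a DevStack checkout in /opt/devstack, mail(1) being available on the host, and a placeholder notification address.

```shell
#!/bin/bash
# Sketch only: fault tolerance + failure alerts for the weekly rebuild.
# Paths, timeout, and the notification address are all placeholders.

# Retry a command up to $1 times; succeed as soon as one run passes.
run_with_retries() {
    local max=$1
    shift
    local attempt
    for attempt in $(seq 1 "$max"); do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed: $*" >&2
    done
    return 1
}

# Hypothetical cron entry point: tear down, rebuild, and send an alert
# if the rebuild keeps failing, rather than breaking silently.
weekly_rebuild() {
    cd /opt/devstack || return 1
    ./unstack.sh || true
    if ! run_with_retries 2 timeout 7200 ./stack.sh; then
        echo "checksite stack.sh failed; host needs attention" \
            | mail -s "translation checksite rebuild failed" infra-root@example.org
        return 1
    fi
}
```

This wouldn't fix a broken DevStack, of course, but it would at least turn a silent outage into a retry plus a notification we can act on.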
2. We don't actually have a solution for running a "new" DevStack once a week. Some options:

- Treat the weekly rebuild as known downtime for the checksite, with a cron job to run ./unstack.sh and delete /opt/devstack?
- Get to a place where we're auto-building new servers: build a new one each week and swap DNS once we know the new server is also running properly, verified by something like a health script that must pass.
- Something else?

3. It takes a long time to run DevStack's stack.sh, which this module does. The current timeout is 3600 seconds (1 hour), and I have to bump it up to run it locally in my tests. Even at an hour, this will really gum up the works if it's part of system-config and running alongside all our other Ansible+Puppet runs, even though the DevStack build only happens once a week. Is this acceptable to us?

4. While we will have i18n team members logging into the Horizon interface to see the progress of their translations work (that's the whole point), the translations checksite is essentially read-only, and we already have a pretty good mechanism in place for spinning up daily DevStack instances for all our tests. Maybe we should back-pedal and somehow leverage that tooling instead?

Thanks everyone.

-- 
Elizabeth Krumbach Joseph || Lyz || pleia2

_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
