Hi,

At the last infra team meeting, we talked about whether and how to proceed with Gitea. I'd like to quickly summarize that and make sure we're all on board.

* We will continue to deploy our own Kubernetes using the k8s-on-openstack Ansible playbook that Monty found. Since that is developed by a third party, we will use it by checking out the upstream source from GitHub but pinning to a known sha, so that we don't encounter surprises. (An illustrative sketch of the pin follows below.)

* We discussed deploying with a new version of rook which does not require the flex driver, but it turns out I was a bit ahead of things -- that hasn't landed yet. So we can probably keep our current deployment.
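As a concrete (purely illustrative) example, the pin could be expressed as an Ansible git task like the following; the upstream URL, checkout path, and sha here are placeholders, not our actual values:

  # Hypothetical values throughout; substitute the real upstream URL,
  # destination path, and whichever sha we agree to pin to.
  - name: Check out k8s-on-openstack at a pinned revision
    git:
      repo: https://github.com/infraly/k8s-on-openstack
      dest: /opt/k8s-on-openstack
      version: 0123456789abcdef0123456789abcdef01234567

Bumping the pin then becomes an explicit review step rather than something that happens to us.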
Ian raised two new issues:

1) We should verify that the system still functions if our single-master Kubernetes loses its master. Monty and I tried this -- it doesn't. The main culprit seems to be DNS: the single master is responsible for intra- (and extra!) cluster DNS, which makes gitea unhappy for three reasons:

   a) if its SQL connections have gone idle and been terminated, it cannot re-establish them;
   b) it is unable to resolve remote hostnames for avatars, which can greatly slow down page loads; and
   c) the replication receiver is not a long-running process (it is just run over SSH), so it can't connect to the database either, and therefore replication fails.

The obvious solution, a multi-master setup, apparently has issues if k8s is deployed in a cloud with LoadBalancer objects (which we are using). Kubernetes does have support for scale-out DNS, though it's not clear whether that still has a SPOF; Monty is experimenting with this (a sketch of the idea follows at the end of this message). If that doesn't improve things, we may still want to proceed, since the system should still mostly work for browsing and git clones if the master fails, and full operation will resume when it comes back online.

2) Rook is difficult to upgrade. This appears to be the case. When it does come time to upgrade rook, we may want to simply build a new Kubernetes cluster for the system. Presumably by that point it won't require the flexvolume driver, which will be a good reason to make a new cluster anyway, and perhaps further upgrades after that won't be as complicated.

Once we conclude the investigation into issue #1, I think these are the next steps:

* Land the patches to manage the opendev k8s cluster with Ansible.
* Pin the k8s-on-openstack repo to the current sha.
* Add HTTPS termination to the cluster (a sketch follows below).
* Update opendev.org DNS to point to the cluster.
* Treat this as a soft launch of the production service. Do not publicise it or encourage people to switch to it yet, but continue to observe it as we complete the rest of the tasks in [1].
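For issue #1, here is a rough sketch of the scale-out DNS idea, assuming the cluster runs CoreDNS as a Deployment named "coredns" in kube-system with the usual k8s-app: kube-dns pod label. It is a strategic-merge patch (applied with something like 'kubectl -n kube-system patch deployment coredns --patch "$(cat coredns-scale.yaml)"') that adds a second replica and keeps the replicas on different nodes:

  # coredns-scale.yaml -- illustrative only; the deployment name,
  # pod label, and replica count are assumptions about a stock cluster.
  spec:
    replicas: 2
    template:
      spec:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    k8s-app: kube-dns
                topologyKey: kubernetes.io/hostname

Note this only removes the single point of failure at the DNS-pod level; with one master, the API server itself is still a SPOF, which may be part of why it's unclear how much this helps.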
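For the HTTPS termination step, one common pattern is a TLS-enabled Ingress in front of the gitea service. Everything below (namespace, secret name, service name and port) is an assumption for illustration, and the apiVersion is the older API group; newer clusters use networking.k8s.io/v1 with a slightly different backend stanza:

  # Illustrative sketch; assumes the certificate and key are stored in
  # a Secret named opendev-org-tls and gitea's web service listens on
  # its default port, 3000.
  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    name: gitea
    namespace: gitea
  spec:
    tls:
      - hosts:
          - opendev.org
        secretName: opendev-org-tls
    rules:
      - host: opendev.org
        http:
          paths:
            - backend:
                serviceName: gitea-http
                servicePort: 3000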
[1] http://specs.openstack.org/openstack-infra/infra-specs/specs/opendev-gerrit.html#work-items

-Jim