On Sat, 2016-03-05 at 11:15 -0500, Emilien Macchi wrote:
> I'm kind of hijacking Dan's e-mail, but I would like to propose some
> technical improvements to stop having so many CI failures.
>
>
> 1/ Stop creating swap files. We don't have SSDs, and IMHO it is a
> terrible mistake to swap to files because we don't have enough RAM.
> In my experience, swapping on non-SSD disks is even worse than not
> having enough RAM. I think we should stop doing that.
>
>
> 2/ Split CI jobs into scenarios.
>
> Currently we have CI jobs for ceph, HA, non-ha and containers, and
> the current situation is that jobs fail randomly due to performance
> issues.
>
> Puppet OpenStack CI had the same issue: we had one integration job
> and we kept adding more services until everything became *very*
> unstable. We solved that issue by splitting the jobs and creating
> scenarios:
>
> https://github.com/openstack/puppet-openstack-integration#description
>
> What I propose is to split TripleO jobs into more jobs, each with
> fewer services.
>
> The benefits of that:
>
> * more service coverage
> * jobs will run faster
> * fewer random issues due to bad performance
>
> The cost, of course, is that it will consume more resources.
> That's why I suggest 3/.
>
> We could have:
>
> * HA job with ceph and a full compute scenario (glance, nova, cinder,
>   ceilometer, aodh & gnocchi).
> * Same with IPv6 & SSL.
> * HA job without ceph and a full compute scenario too.
> * HA job without ceph and basic compute (glance and nova), with extra
>   services like Trove, Sahara, etc.
> * ...
> (note: all jobs would have network isolation, which to me is a
> requirement when testing an installer like TripleO).

I'm not sure we have enough resources to entertain this option. I would
like to see us split the jobs up, but not in exactly the way you
describe above. I would rather see us put the effort into architecture
changes like "split stack", which could allow us to test the
configuration side of our Heat stack on normal cloud instances. Once we
have this in place, I think we would have more resources available and
could entertain running more jobs, and thus could split things out to
run in parallel if we choose to do so.

>
> 3/ Drop the non-ha job.
> I'm not sure why we have it, or what the benefit of testing it is
> compared to HA.

There are a couple of reasons we have the non-HA job, I think. First,
not everyone wants to use HA. We run our own TripleO CI cloud without
HA at this point, and I think there is interest in maintaining this as
a less complex installation alternative where HA isn't needed.

Second, we need to support functionally testing TripleO where
developers don't have enough resources for 3 controller nodes. At the
very least we'd need a second, single-node HA job (which wouldn't
really be doing HA) that would allow us to continue supporting the
compressed installation for developer testing, etc.

Dan

>
>
> Any comment / feedback is welcome,
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev