Both tests are now marked as bad, since there has been more than one instance where they failed even after the infra problem was fixed. I request the geo-rep team to take a look and revive these tests soon.
On Tue, Jan 23, 2018 at 2:30 PM, Atin Mukherjee <amukh...@redhat.com> wrote:

> On Mon, Jan 22, 2018 at 5:13 PM, Nigel Babu <nig...@redhat.com> wrote:
>
>> Update: All the nodes that had problems with geo-rep are now fixed.
>> Waiting on the patch to be merged before we switch over to Centos 7. If
>> things go well, we'll replace nodes one by one as soon as we have one
>> green run on Centos 7.
>
> I just noticed we failed again on the geo-rep tests @
> https://build.gluster.org/job/centos6-regression/8604/console . Nigel
> reconfirmed that we have all the machines cleaned up. What else could be
> going wrong here?
>
>> On Mon, Jan 22, 2018 at 12:21 PM, Nigel Babu <nig...@redhat.com> wrote:
>>
>>> Hello folks,
>>>
>>> As you may have noticed, we've had a lot of centos6-regression failures
>>> lately. The geo-replication failures are the new ones that particularly
>>> concern me. These failures have nothing to do with the tests themselves;
>>> they expose a problem in our infrastructure that we've carried around
>>> for a long time. Our machines are not clean machines that we automated:
>>> we set up automation on machines that already existed. At some point we
>>> loaned machines out for debugging, and during this time developers
>>> inadvertently ran 'make install' on the system, installing onto system
>>> paths rather than into /build/install. This is what is causing the
>>> geo-replication tests to fail. I've tried cleaning the machines up
>>> several times with little to no success.
>>>
>>> Last week, we decided to take an aggressive path to fix this problem.
>>> We planned to replace all our problematic nodes with new Centos 7
>>> nodes. This exposed more problems: we expected a specific type of
>>> machine from Rackspace, and these are no longer offered, so our
>>> automation fails on some steps. I've spent this weekend tweaking our
>>> automation so that it works on the new Rackspace machines, and I'm down
>>> to just one test failure[1]. I have a patch up to fix this failure[2].
>>> As soon as that patch is merged, we can push forward with Centos 7
>>> nodes. In 4.0 we're dropping support for Centos 6, so it makes sense to
>>> make this switch sooner rather than later.
>>>
>>> We'll no longer lend out machines from production. Instead, we'll
>>> create new nodes that are snapshots of an existing production node, and
>>> each such machine will be destroyed after use. This prevents this
>>> particular problem from recurring, and it also means our machine
>>> capacity stays at 100 at all times with minimal wastage.
>>>
>>> [1]: https://build.gluster.org/job/cage-test/184/consoleText
>>> [2]: https://review.gluster.org/#/c/19262/
>>>
>>> --
>>> nigelb
>>
>> --
>> nigelb
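The 'make install' contamination Nigel describes above amounts to stray Gluster binaries (and Python modules) landing on system paths instead of under /build/install, where the regression harness expects them. Below is a minimal sketch of how one might audit a node for this. The binary names, directories checked, and the Python import probe are illustrative assumptions, not the actual cleanup procedure used on the build machines:

#!/bin/bash
# Rough audit of a regression node: on a clean node the harness runs
# everything out of /build/install, so Gluster binaries found under /usr or
# /usr/local usually mean someone ran 'make install' into system paths.
# Binary and directory lists here are illustrative, not exhaustive.

EXPECTED_PREFIX="/build/install"

for bin in gluster glusterd glusterfs glusterfsd; do
    for dir in /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin; do
        if [ -x "$dir/$bin" ]; then
            echo "WARNING: $bin found on a system path: $dir/$bin"
        fi
    done
    if [ -x "$EXPECTED_PREFIX/sbin/$bin" ] || [ -x "$EXPECTED_PREFIX/bin/$bin" ]; then
        echo "OK: $bin present under $EXPECTED_PREFIX"
    fi
done

# Geo-replication also installs Python pieces; a copy sitting in the system
# site-packages can shadow the one built for the test run.
python -c 'import gluster; print(gluster.__file__)' 2>/dev/null \
    && echo "NOTE: 'gluster' Python package importable from the system interpreter"

A warning from a script like this would only flag a suspect node; as the thread notes, scrubbing such installs by hand proved unreliable, which is why the fix was to replace the nodes outright.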