Re: [openstack-dev] [TripleO] CI report : 1/11/2014 - 4/12/2014

2014-12-08, Derek Higgins
On 04/12/14 13:37, Dan Prince wrote:
 On Thu, 2014-12-04 at 11:51 +, Derek Higgins wrote:
 It's been a month since my last update, sorry about that.

 Since the last email we've had 5 incidents causing CI failures:

 26/11/2014 : Lots of Ubuntu jobs failed (maybe half of them) over about
 24 hours
 - We seem to suffer any time an Ubuntu mirror isn't in sync, causing
 hash mismatch errors. For now I've pinned DNS on our proxy to a
 specific server so we stop DNS round-robining
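 For anyone curious what that pin looks like, one simple way to do it
 (not necessarily exactly what's on our proxy; the hostname choice and
 the address below are illustrative) is a hosts entry that forces the
 archive hostname to a single mirror instead of letting round-robin DNS
 pick a different server per request:

   # hypothetical /etc/hosts entry on the CI proxy
   192.0.2.10  archive.ubuntu.com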
 
 This sounds fine to me. I personally like the model where you pin to a
 specific mirror, perhaps one that is geographically closer to your
 datacenter. This also makes Squid caching (in the rack) happier in some
 cases.
 

 21/11/2014 : All tripleo jobs failed for about 16 hours
 - Neutron started asserting that local_ip be set to a valid IP address;
 on the seed we had been leaving it blank
 - Cinder moved to using oslo.concurrency, which in turn requires that
 lock_path be set; we are now setting it
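 For reference, the settings involved look roughly like the following
 (file names, section names and values here are illustrative, not
 necessarily exactly what our templates write out):

   # neutron OVS agent config: local_ip must be a real address,
   # e.g. the seed's control plane IP, rather than left blank
   [ovs]
   local_ip = 192.0.2.1

   # cinder.conf: oslo.concurrency needs an explicit lock directory
   [oslo_concurrency]
   lock_path = /var/lib/cinder/tmp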
 
 
 I've been thinking about how we might catch these ahead of time with
 our limited resources at the moment. These sorts of failures all seem
 related to configuration and/or requirements changes. I wonder if we
 could selectively (automatically) run check experimental jobs on all
 reviews with associated tickets that either contain doc changes or
 modify requirements.txt. It's probably a bit of work to pull this off,
 but if we had a report of those results coming down the pike we might
 be able to catch these breakages ahead of time.
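 As a rough sketch of what that could look like (assuming our jobs are
 still driven from a zuul v2 layout.yaml; the job and file patterns
 here are made up), a file matcher on the experimental jobs would limit
 them to reviews that touch requirements or docs:

   # hypothetical zuul layout.yaml snippet, names illustrative
   jobs:
     - name: ^check-tripleo-.*$
       files:
         - '^requirements.txt$'
         - '^test-requirements.txt$'
         - '^doc/.*$'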
Yup, this sounds like it could be beneficial. Alternatively, if we soon
have the capacity to run on more projects (capacity is increasing),
we'll be running on all reviews and will be able to generate the report
you're talking about. Either way, we should do something like this soon.

 
 

 8/11/2014 : All Fedora tripleo jobs failed for about 60 hours (over a
 weekend)
 - A URL being accessed on https://bzr.linuxfoundation.org was no longer
 available; we removed the dependency

 7/11/2014 : All tripleo tests failed for about 24 hours
 - Options that had been deprecated were removed from nova.conf
 (although no deprecation warnings were being reported); we were still
 using these options in tripleo

 As always, more details can be found here:
 https://etherpad.openstack.org/p/tripleo-ci-breakages
 
 Thanks for sending this out! Very useful.
no problem
 
 Dan
 

 thanks,
 Derek.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

