I was looking through this timeout bug [1] this morning and was able to correlate that, around the time of the image snapshot timeout, ceilometer was hammering the CPU on the host. There are already threads on ceilometer performance and how it needs to be improved for Tempest runs, so I don't want to get into that here.

What I'm wondering is whether there is a way to be smarter about how we do timeouts in the tests, rather than just relying on globally configured, hard-coded timeouts, which are bound to fail intermittently in dynamic environments like this.

I'm thinking of something along the lines of keeping track of CPU stats at intervals in our waiter loops. Then, when we reach our configured timeout, calculate the average CPU load/idle, and if it falls below some threshold, cut the timeout in half and redo the timeout loop. We would continue that until the timeout reaches a level that no longer makes sense, e.g. once it drops below a minute. A rough sketch of what I mean is below.
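To make the idea concrete, here is a minimal sketch of such an adaptive waiter loop in Python. The sample_cpu_idle() and is_done() helpers are hypothetical placeholders (e.g. something reading /proc/stat and a poll of the resource state), and the threshold/interval values are made up; this is not an existing Tempest API.

    import time

    IDLE_THRESHOLD = 20.0   # avg % idle below which we consider the host overloaded
    MIN_TIMEOUT = 60        # stop retrying once the timeout drops below a minute
    SLEEP_INTERVAL = 5      # seconds between polls

    def adaptive_wait(is_done, sample_cpu_idle, timeout):
        """Wait for is_done(); if the host was CPU-starved when we hit the
        timeout, retry with half the timeout until it gets too small."""
        while timeout >= MIN_TIMEOUT:
            idle_samples = []
            deadline = time.time() + timeout
            while time.time() < deadline:
                if is_done():
                    return True
                idle_samples.append(sample_cpu_idle())
                time.sleep(SLEEP_INTERVAL)
            avg_idle = (sum(idle_samples) / len(idle_samples)
                        if idle_samples else 100.0)
            if avg_idle >= IDLE_THRESHOLD:
                # Host wasn't overloaded; treat this as a real failure.
                return False
            # Host was starved for CPU: grant another, shorter, waiting window.
            timeout /= 2.0
        return False

The point is just that the extra time we grant shrinks each round, so an operation that is genuinely stuck still fails in bounded time, while one that is merely slow because the host is overloaded gets a bit more slack.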

Are there other ideas here? My main concern is the number of random timeout failures we see in the tests; people try to fingerprint them with elastic-recheck, but the queries are so generic that they are not really useful. We now put the test class and test case in the compute test timeout messages, but it's also not very useful to fingerprint every individual permutation of test class/case that can hit a timeout.

[1] https://bugs.launchpad.net/nova/+bug/1320617

--

Thanks,

Matt Riedemann


