Hi David, 

I wanted to reply to this last week but I was swamped. This sounds like great 
work! I can't say exactly how the Tempest actions would need to be extended 
without seeing the code and understanding the proposed changes, but I'd 
definitely like to see what you have so we can figure out how it could 
integrate into our work.

Daryl


On Jan 26, 2012, at 11:02 AM, David Kranz wrote:

> (after long cursing at pep8) I am preparing to check in a set of stress tests 
> that we have developed and wanted to get some feedback before I attempt to do 
> so.
> 
> The problem to be solved is that nova is a distributed, asynchronous system 
> that is prone to race condition bugs. These bugs will not be easily found 
> during functional testing but will be encountered by users in large 
> deployments in a way that is hard to debug. The stress test tries to cause 
> these bugs to happen in a more controlled environment.
> 
> The basic idea of the test is that there are a number of actions, roughly 
> corresponding to the Compute API, that are fired pseudo-randomly at a nova 
> cluster as fast as possible. These actions consist of what to do, how to 
> verify success, and a state filter to make sure that the operation makes 
> sense. For example, if the action is to reboot a server and none are active, 
> nothing should be done. A test case is a set of actions to be performed and 
> the probability that each action should be selected. There are also 
> parameters controlling rate of fire and stuff like that. Currently there are 
> only a few actions defined but this test has discovered three bugs just with 
> that so I want to check it in as quickly as possible.
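A minimal sketch of the action model described above, assuming Python 3 and an in-memory dict standing in for the nova cluster. The names Action, choose_action, run_stress, CREATE, and REBOOT are hypothetical; they are not taken from the stress test code or from Tempest.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for cluster state: server name -> status string.
State = Dict[str, str]


@dataclass
class Action:
    name: str
    applicable: Callable[[State], bool]  # state filter: does this make sense now?
    run: Callable[[State], None]         # what to do
    verify: Callable[[State], bool]      # how to verify success


def choose_action(weighted: List[Tuple[Action, float]], state: State,
                  rng: random.Random) -> Optional[Action]:
    """Pick one applicable action, weighted by its configured probability."""
    candidates = [(a, w) for a, w in weighted if a.applicable(state)]
    if not candidates:
        return None  # e.g. reboot requested but no server is active
    actions, weights = zip(*candidates)
    return rng.choices(actions, weights=weights, k=1)[0]


def run_stress(weighted: List[Tuple[Action, float]], state: State,
               steps: int, seed: int = 0) -> List[str]:
    """Fire actions pseudo-randomly; raise on the first failed verification."""
    rng = random.Random(seed)
    executed = []
    for _ in range(steps):
        action = choose_action(weighted, state, rng)
        if action is None:
            continue
        action.run(state)
        if not action.verify(state):
            raise AssertionError(f"{action.name} failed verification")
        executed.append(action.name)
    return executed


def _create(state: State) -> None:
    state[f"server-{len(state)}"] = "ACTIVE"


def _has_active(state: State) -> bool:
    return any(status == "ACTIVE" for status in state.values())


# Two toy actions; a real action would call the Compute API instead.
CREATE = Action("create_server", lambda s: True, _create, _has_active)
REBOOT = Action("reboot_server", _has_active, lambda s: None, _has_active)

if __name__ == "__main__":
    # A "test case": the set of actions plus the probability of each.
    print(run_stress([(CREATE, 0.3), (REBOOT, 0.7)], {}, steps=10, seed=1))
```

Filtering on state before the weighted draw means an inapplicable action is never selected, which matches the "reboot with no active servers does nothing" example.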
> 
> I was going to check in a 'stress' directory parallel to the 'tempest' 
> directory.
> 
> This test is not like functional tests in that it can never succeed, only 
> fail. Ideally it will run for a long time and so cannot really be run
> after every checkin. It would be good to run a short-duration case as part of 
> the functional tests though.
> 
> This test requires some new parameters for the environment. For example, one 
> thing it does is periodically check the nova logs on all cluster nodes to 
> make sure there are no errors, and it will fail the test if there are. So it 
> needs the path to the ssh private key for the cluster nodes. It seems that 
> currently we have getters defined for all of the config parameters. Should 
> there be a new getter for every kind of config option that someone adds to 
> Tempest, or should we just provide a method to get the parameter by string 
> name?
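To make the two options concrete, here is a rough sketch assuming Python 3's standard configparser. The class StressConfig, the section name "stress", the option name "ssh_key_path", and the example path are all hypothetical; none of them comes from the existing Tempest config code.

```python
import configparser


class StressConfig:
    """Hypothetical config wrapper showing both access styles side by side."""

    def __init__(self, parser: configparser.ConfigParser,
                 section: str = "stress") -> None:
        self._parser = parser
        self._section = section

    def get(self, name: str, default=None):
        """Generic lookup by string name; returns default if unset."""
        return self._parser.get(self._section, name, fallback=default)

    @property
    def ssh_key_path(self):
        """Dedicated getter; one of these is needed per option."""
        return self.get("ssh_key_path")


# Example config text; the path is a placeholder, not a real location.
parser = configparser.ConfigParser()
parser.read_string("[stress]\nssh_key_path = /tmp/example_key\n")
cfg = StressConfig(parser)

if __name__ == "__main__":
    print(cfg.ssh_key_path)             # dedicated getter
    print(cfg.get("log_check_interval", "60"))  # by-name lookup with default
```

The by-name method avoids touching the config class for each new option, while dedicated getters give one place to document and validate each option; the two can coexist, as here.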
> 
> This test was developed before the tempest code was available. The "what to 
> do" part of each action is pretty similar to the tempest methods that
> call the API. Would it make sense at some point to extend the tempest actions 
> to include methods that enable them to participate in a stress test?
> 
> Any other comments or issues?
> 
> Thanks.
> 
> -David
> 
> -- 
> Mailing list: https://launchpad.net/~openstack-qa-team
> Post to     : openstack-qa-team@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack-qa-team
> More help   : https://help.launchpad.net/ListHelp

