Thanks, Daryl. Is it best to just do a 'git review', or let people see it in some other way?

 -David

On 1/31/2012 12:52 PM, Daryl Walleck wrote:
Hi David,

I wanted to reply to this last week, but I was swamped. This sounds like great
work! I can't say exactly how the Tempest actions would need to be extended
without seeing the code and understanding the proposed changes, but I'd
definitely like to see what you have so we can figure out how it could
integrate into our work.

Daryl


On Jan 26, 2012, at 11:02 AM, David Kranz wrote:

(after long cursing at pep8) I am preparing to check in a set of stress tests 
that we have developed and wanted to get some feedback before I attempt to do 
so.

The problem to be solved is that nova is a distributed, asynchronous system
that is prone to race condition bugs. These bugs will not be easily found
during functional testing but will be encountered by users in large
deployments in a way that is hard to debug. The stress test tries to cause
these bugs to happen in a more controlled environment.

The basic idea of the test is that there are a number of actions, roughly 
corresponding to the Compute API, that are fired pseudo-randomly at a nova 
cluster as fast as possible. These actions consist of what to do, how to verify 
success, and a state filter to make sure that the operation makes sense. For 
example, if the action is to reboot a server and none are active, nothing 
should be done. A test case is a set of actions to be performed and the 
probability that each action should be selected. There are also parameters
controlling the rate of fire and similar settings. Currently only a few
actions are defined, but the test has already discovered three bugs with just
those, so I want to check it in as quickly as possible.
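
To make the idea concrete, here is a minimal sketch of the action/test-case
structure described above. It is not the actual stress test code: the names
(Action, step, the in-memory state dict) are hypothetical, and the "run"
functions are stand-ins for real Compute API calls.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for cluster state: server name -> status string.
State = Dict[str, str]


@dataclass
class Action:
    """One stress action: a state filter, what to do, and how to verify."""
    name: str
    applies: Callable[[State], bool]  # state filter
    run: Callable[[State], None]      # what to do
    verify: Callable[[State], bool]   # how to verify success


def step(case: List[Tuple[Action, float]], state: State,
         rng: random.Random) -> Optional[str]:
    """Pick one action by its weight; do nothing if its filter rejects it."""
    actions = [a for a, _ in case]
    weights = [w for _, w in case]
    action = rng.choices(actions, weights=weights, k=1)[0]
    if not action.applies(state):
        return None  # e.g. reboot selected but no server is active
    action.run(state)
    if not action.verify(state):
        raise AssertionError(action.name + " failed verification")
    return action.name


def _create(state: State) -> None:
    # A real action would call the Compute API and wait for ACTIVE.
    state["server-%d" % len(state)] = "ACTIVE"


def _reboot(state: State) -> None:
    # A real action would issue a reboot; this stand-in changes nothing.
    pass


create = Action("create_server",
                applies=lambda s: len(s) < 5,
                run=_create,
                verify=lambda s: "ACTIVE" in s.values())
reboot = Action("reboot_server",
                applies=lambda s: "ACTIVE" in s.values(),
                run=_reboot,
                verify=lambda s: "ACTIVE" in s.values())

# A test case: actions paired with their selection weights.
case = [(create, 0.3), (reboot, 0.7)]
```

Calling step(case, state, random.Random()) in a loop would fire actions at
the chosen mix; a real run would also need the rate-of-fire controls.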

I was going to check in a 'stress' directory parallel to the 'tempest' 
directory.

This test is not like functional tests in that it can never succeed, only fail.
Ideally it will run for a long time and so cannot really be run after every
checkin. It would be good to run a short-duration case as part of the
functional tests though.

This test requires some new parameters for the environment. For example, one
thing it does is periodically check the nova logs on all cluster nodes to make
sure there are no errors, and it will fail the test if there are. So it needs
the path to the ssh private key for the cluster nodes. It seems that currently
we have getters defined for all of the config parameters. Should there be a
new getter for every kind of config option that someone adds to Tempest, or
should we just provide a method to get the parameter by string name?
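
As a rough illustration of the second option, a lookup by string name could
be as small as the sketch below. This is only an assumption about how it
might look, not existing Tempest code: the StressConfig class, the [stress]
section, the ssh_private_key_path option, and the key path are all
hypothetical.

```python
import configparser


class StressConfig:
    """Hypothetical wrapper that looks up options by string name."""

    def __init__(self, parser, section="stress"):
        self._parser = parser
        self._section = section

    def get(self, name, default=None):
        # Returns the default if the section or the option is missing.
        return self._parser.get(self._section, name, fallback=default)


parser = configparser.ConfigParser()
# Hypothetical section, option name, and path, for illustration only.
parser.read_string("[stress]\nssh_private_key_path = /tmp/example_key\n")
cfg = StressConfig(parser)
key_path = cfg.get("ssh_private_key_path")
```

The trade-off is that a misspelled option name only shows up at run time,
whereas a dedicated getter per option fails earlier and documents itself.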

This test was developed before the tempest code was available. The "what to do"
part of each action is pretty similar to the tempest methods that call the
API. Would it make sense at some point to extend the tempest actions to
include methods that enable them to participate in a stress test?

Any other comments or issues?

Thanks.

-David

--
Mailing list: https://launchpad.net/~openstack-qa-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~openstack-qa-team
More help   : https://help.launchpad.net/ListHelp

