On Wed, Jul 24, 2013 at 01:45:13PM -0700, Sheng Yang wrote:
> On Tue, Jul 23, 2013 at 11:54 PM, Prasanna Santhanam <t...@apache.org> wrote:
> > On Tue, Jul 23, 2013 at 11:20:22PM -0700, Sheng Yang wrote:
> > > About the patch I committed, in fact it's not moving. It's fixing.
> > > Because:
> > > 1. The logical reason is that currently, in many of the cases where I
> > > committed the patches, the account is already created per test case
> > > rather than per module. And if the account is created in setUp() rather
> > > than setUpClass(), the cleanup should be in tearDown() rather than
> > > tearDownClass(), because those resources are useless anyway once setUp()
> > > creates another account and more VMs. The patch didn't move account
> > > creation out of setUpClass().
> >
> > Sheng - thanks, I see your patches more clearly now. The diffs misled
> > me into thinking the accounts were perhaps moved. You're right that the
> > cleanup list should add the account in the locality where it was created,
> > i.e. per-test or per-module as the case may be.
> >
> > > 2. The more direct reason is that too many test cases in the regression
> > > run failed just because of a lack of resources to create a new
> > > deployment. E.g. https://issues.apache.org/jira/browse/CLOUDSTACK-3643 .
> > > Whenever I see "Fail to create VPC" or "Fail to deploy VM", it's mostly
> > > because the testing setup is overloaded.
> >
> > Yes, this is a concern. Resource exhaustion misleads us into treating it
> > as a failure in the test. Do you think the tests could be better organized
> > when run as a group against a limited-resource deployment?
>
> I can think of the following things:
> 1. The maximum number of test cases executing in parallel should leave some
> (but not too much) margin for misbehaving test cases.
This will be difficult to guess, as the deployments differ between runs. What
we can do is provide a threshold in the testrunner for the number of tests
that can be throttled at once against a deployment. Earlier we used Python's
multithreading to parallelize tests (Python is not very good at it), but now
it works using executors (JVM) available on the Jenkins node. So it will be
tricky to implement this.

> 2. Every test case should take care of its own resources and release them
> ASAP after they're no longer in use.

We could provide a generic cleanup so the test case doesn't have to think of
GC on its own. Every CloudResource.create call can be intercepted and added
to a GC chain. Based on the type of the resource, we schedule the GC in the
right order, say vm > network > account and so on.

> 3. Reduce the timeout for failed operations. E.g. we don't need to wait
> tens of minutes for an SSH timeout.

Yeah - I will fix this today. It is annoying to not see any logs during this
time. We should simply fail after 1-2 minutes.

> 4. Check for unnecessary and duplicate test cases.

Oh - this one's absolutely important. I noticed you fixed a couple of cases
that were invalid. Thanks. But it would be great to have the tests reviewed
by the dev as and when they come in as patches. I usually run them through
the dev via ReviewBoard. Hope we can do this better going forward when folks
are committing tests directly as well.

> If we can get test cases running faster, then they can release the
> resources sooner.
>
> > > I know VM creation is very time-consuming, and reusing the account and
> > > VM is really nice! But the fact is that many of the current test cases
> > > are not written this way. So I think we should release resources as soon
> > > as they're obsolete.
> >
> > Yup, that's the way it was intended. We probably deviated from there
> > and will fix these tests during this sprint.
> >
> > > About reusing an already deployed VM and account, it would be a
> > > case-by-case issue.
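The GC chain I mentioned above could look roughly like this. This is only a
sketch: `CleanupChain`, `CloudResource`, and the subclasses are hypothetical
stand-ins to show the interception-plus-ordering idea, not the actual Marvin
API.

```python
# Sketch of a generic cleanup ("GC") chain: every resource create is
# intercepted and registered, and cleanup() destroys everything in
# dependency order (VMs first, then networks, then accounts).
# All names here are hypothetical, not the real Marvin classes.

destroyed = []  # destruction log, so the ordering is visible in this sketch

# Teardown priority: lower numbers are destroyed first.
TEARDOWN_ORDER = {"vm": 0, "network": 1, "account": 2}

class CleanupChain:
    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def cleanup(self):
        # Sort by resource kind so a VM never outlives its network,
        # and a network never outlives its account.
        for res in sorted(self._resources,
                          key=lambda r: TEARDOWN_ORDER[r.kind]):
            res.delete()
        self._resources = []

class CloudResource:
    kind = "vm"

    def __init__(self, chain, name):
        self.name = name
        chain.register(self)  # interception point: every create lands here

    def delete(self):
        destroyed.append(self.kind)

class VirtualMachine(CloudResource):
    kind = "vm"

class Network(CloudResource):
    kind = "network"

class Account(CloudResource):
    kind = "account"
```

With something like this in place, a test only creates resources; a single
`chain.cleanup()` call in tearDown() or tearDownClass() releases them in the
right order.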
> > > For example, the TestAddVmToSubDomain case in test_accounts does
> > > create the two VMs shared across the module (class), then tears them
> > > down in tearDownClass(). But sometimes there is no easy way to do so.
> > > I think that's why there are not many VirtualMachine.create() calls
> > > happening in setUpClass().
> >
> > Agreed on this. Much of this information is within the test cases, and
> > until you drill down into the scenario you won't be aware of it. But
> > in general the guidelines probably were not clear (I got questions
> > from others working on the tests), so it was better to spell them out.
> >
> > It's time to write up these guidelines in our wiki, which has some
> > basic guidelines already:
> >
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Marvin+-+Testing+with+Python#Marvin-TestingwithPython-Guidelinestochoosescenariosforintegration
>
> Yes, I've added some cleanup guidelines to it.
>
> --Sheng
>
> > > Thanks for sharing your thoughts,
> > >
> > > --Sheng
> > >
> > > On Tue, Jul 23, 2013 at 10:56 PM, Prasanna Santhanam <t...@apache.org> wrote:
> > >
> > > > In the test modules, when you debug you will notice that accounts are
> > > > created once per module in setUpClass(), all resources are created
> > > > within it, and tearDownClass() destroys the account, initiating
> > > > cleanup. All the resources are appended to a cleanup [] list and
> > > > deleted in the appropriate order at the end of the test module in
> > > > tearDownClass().
> > > >
> > > > There are a few reasons for doing this for all the tests in the class
> > > > at once as opposed to doing it for every test.
> > > >
> > > > Modules are grouped by feature, e.g. test_tags does tags-related
> > > > tests. And more often than not, all tests in the module share similar
> > > > patterns in the set of API calls made to achieve a test scenario.
> > > > If an account were created per test, the overhead of cleaning up
> > > > would be much higher than when cleaning up once per module, because
> > > > every new account needs a new VR for the first VM deployed in it. And
> > > > almost every test will deploy a VM. So this slows down the test run
> > > > significantly and quickly eats up resources like VLANs, which are
> > > > needed for every account.
> > > >
> > > > I saw a few fixes that moved the resource cleanup from
> > > > tearDownClass() to tearDown(), and that prompted me to send this
> > > > email. Hope this makes sense. I'd like to hear others' thoughts on
> > > > how best to run all the tests in the most optimal way without hurting
> > > > resources on a deployment.
> > > >
> > > > Also - on the test infrastructure on jenkins.buildacloud.org there is
> > > > no way to time out a specific test if it takes more time than
> > > > necessary. So I'm going to introduce a timeout plugin in nose that
> > > > will abort a test if it takes longer than half an hour. I think this
> > > > should help weed out tests that do arbitrary sleeps or wait for very
> > > > long cleanup operations. That way we should be able to optimize those
> > > > tests as well.
> > > >
> > > > Thoughts?
> > > >
> > > > --
> > > > Prasanna.,
> > > >
> > > > ------------------------
> > > > Powered by BigRock.com

--
Prasanna.,

------------------------
Powered by BigRock.com