Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Tue, Mar 22, 2016 at 9:31 AM, Kevin Benton wrote:
> Thanks for doing this. I dug into the test_volume_boot_pattern test to
> see what was going on.
>
> On the first boot, Nova called Neutron to create a port at 23:29:44 and
> it took 441ms to return the port to Nova.[1]
> Nova then plugged the interface for that port into OVS a little over 6
> seconds later at 23:29:50.[2]
> The Neutron agent attempted to process this on the iteration at
> 23:29:52 [3]; however, it didn't get the ofport populated from the
> OVSDB monitor... a bug![4] The Neutron agent did catch it on the next
> iteration two seconds later on a retry and notified the Neutron server
> at 23:29:54.[5]

Good work as usual, Kevin; I just approved the fix for this bug.

> The Neutron server processed the port ACTIVE change in just under
> 80ms[6], but it did not dispatch the notification to Nova until 2
> seconds later at 23:29:56 [7] due to the Nova notification batching
> mechanism[8].
>
> Total time between port create and boot is about 12 seconds: 6 in Nova
> and 6 in Neutron.
>
> For the Neutron side, the bug fix should eliminate 2 seconds. We could
> probably make the Nova notifier batching mechanism a little more
> aggressive, so that it only batches up calls made within a very short
> interval rather than always using 2-second buckets. The remaining 2
> seconds is just the agent processing loop interval, which can be tuned
> with a config option, but that should be okay if it's the only
> bottleneck.
>
> For Nova, we need to improve the 6 seconds between creating the Neutron
> port and plugging it into the vswitch. I can see that Nova makes some
> other calls to Neutron in this window to list security groups and
> floating IPs. Maybe these can be done asynchronously, because I don't
> think they should block the initial VM boot phase that plugs in the
> VIF.
>
> Completely unrelated to the boot process: the entire tempest run spent
> ~412 seconds building and destroying Neutron resources in setup and
> teardown.[9] However, considering the number of tests executed, this
> seems reasonable, so I'm not sure we need to work on optimizing that
> yet.
>
> Cheers,
> Kevin Benton
>
> 1. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_45_341
> 2. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-n-cpu.txt.gz#_2016-03-21_23_29_50_629
> 3. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_52_216
> 4. https://bugs.launchpad.net/neutron/+bug/1560464
> 5. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_54_738
> 6. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_54_813
> 7. http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_56_782
> 8. http://git.openstack.org/cgit/openstack/neutron/tree/neutron/notifiers/nova.py
> 9. egrep -R 'tearDown|setUp' tempest.txt.gz | grep 9696 | awk '{print $(NF)}' | ./fsum
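Kevin's point about the notifier's fixed 2-second buckets can be illustrated with a small sketch. To be clear, this is a hypothetical toy, not the actual code in neutron/notifiers/nova.py: the `BatchNotifier` class, its `queue`/`tick` methods, and the injectable clock are all invented for the example.

```python
import time


class BatchNotifier:
    """Toy fixed-bucket notifier: an event waits until `interval`
    seconds have passed since its batch was opened, even if nothing
    else ever joins the batch -- the source of the ~2 s notification
    delay described above."""

    def __init__(self, send, interval=2.0, clock=time.monotonic):
        self._send = send          # callable receiving a list of events
        self._interval = interval  # bucket size in seconds
        self._clock = clock        # injectable for deterministic tests
        self._pending = []
        self._opened_at = None

    def queue(self, event):
        """Add an event; it is held until the current bucket expires."""
        if not self._pending:
            self._opened_at = self._clock()
        self._pending.append(event)
        self._maybe_flush()

    def tick(self):
        """Periodic poll (e.g. from a processing loop)."""
        self._maybe_flush()

    def _maybe_flush(self):
        if self._pending and self._clock() - self._opened_at >= self._interval:
            self._send(list(self._pending))
            self._pending.clear()
            self._opened_at = None


# With a fake clock: a lone "port ACTIVE" event queued at t=0 is still
# buffered at t=1 and only delivered once the 2-second bucket elapses.
now = [0.0]
sent = []
notifier = BatchNotifier(sent.append, interval=2.0, clock=lambda: now[0])
notifier.queue("port ACTIVE")
now[0] = 1.0
notifier.tick()   # still buffered
now[0] = 2.0
notifier.tick()   # bucket expired: flushed
```

Making the mechanism "more aggressive", as Kevin suggests, would amount to flushing as soon as no new event has arrived for a short quiet period instead of always waiting out the full bucket.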
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On 03/21/2016 09:48 PM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 06:37 PM, Assaf Muller wrote:
>> On Mon, Mar 21, 2016 at 9:26 PM, Clark Boylan wrote:
>>> On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
>>>> On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan wrote:
>>>>
>>>> If what we want is to cut down execution time, I'd suggest we stop
>>>> running Cinder tests on Neutron patches (call it an experiment) and
>>>> see how long it takes for a regression to slip in. Being an
>>>> optimist, I would guess: never!
>>>
>>> Experience has shown about a week, and that it's not an if but a
>>> when.
>>
>> I'm really curious how a Neutron patch can screw up Cinder (and how
>> the regression would be missed by the Neutron and Nova tests that
>> interact with Neutron). I guess I wasn't around when this was
>> happening. If anyone could shed historic light on this I'd appreciate
>> it.
>
> Not Neutron screwing up Cinder, just the general time to regression
> when the gate stops testing something. We saw it when we stopped
> testing postgres, for example.
>
>>>> If we're running these tests on Neutron patches solely as a data
>>>> point for performance testing, Tempest is obviously not the tool for
>>>> the job and doesn't provide any added value we can't get from Rally
>>>> and profilers, for example. If there's otherwise value in running
>>>> Cinder tests (and other tests that don't exercise the Neutron API),
>>>> I'd love to know what it is :) I cannot remember any legit Cinder
>>>> failure on Neutron patches.
>>>
>>> I think that is completely the wrong approach to take here. We have
>>> caught a problem in Neutron; your goal should be to fix it, not to
>>> stop testing it.
>>
>> You misunderstood my intentions. I'm not saying we should plant our
>> heads in the sand and sing until the problem goes away, but I am
>> saying that if we're interested in uncovering performance issues with
>> Neutron's control plane, then there are more effective ways to do so.
>> If you're interested and have the energy, profiling the neutron-server
>> process while running Rally tests is a much better use of time.
>> Comparing nova-network and Neutron is just not a useful data point.
>
> The question was why the Neutron CI is so slow. Upon investigation I
> found that jobs using nova-net are ~20 minutes faster in one cloud
> than those using Neutron. I am not attempting to do performance
> testing on Neutron; I am attempting to narrow down where this lost 20
> minutes can be found. In this case it is a very useful data point. We
> know we can run these tests faster because we have that data.
> Therefore the assumption is that Neutron can (and honestly it should)
> run just as quickly.
>
> We need these tests for integration testing (at least that's the
> assertion implied by them living in Tempest). We also want the jobs to
> run faster (the topic of this thread). Using the data available to us,
> we find that the biggest cost in these jobs is the Tempest testing
> itself. The best way to make the jobs run quicker is to address the
> tests themselves. Looking at the relative performance of the two
> solutions available to us, we find that there is room for improvement
> in the Neutron testing. That's all I am trying to point out. This has
> nothing to do with proper performance testing or running Rally, and
> everything to do with making the integration tests quicker.
>
>>> The fact that Neutron is much slower in these test cases is an
>>> indication that these tests DO exercise the Neutron API, that you do
>>> want to cover these code paths, and that you need to address them,
>>> not stop testing them.
>>>
>>> We are not running these tests on Neutron solely for performance
>>> testing. In fact, to get reasonable performance testing out of it I
>>> had to jump through a few hoops: make Tempest run serially, then
>>> recheck until the jobs ran in the same cloud more than once.
>>> Performance testing has never been the goal of these tests. These
>>> tests exist to make sure that OpenStack works. Boot from volume is an
>>> important piece of this, and we are making sure that OpenStack (this
>>> means Glance, Nova, Neutron, Cinder) continues to work for this use
>>> case.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

I would like to thank Clark, who could have chosen many different tasks
fighting for his attention today, yet chose to focus on getting data for
Neutron tests in order to help Rosella and Ihar in their stated goal.

Thank you, Clark,
Anita.
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
>> On 03/21/2016 04:09 PM, Clark Boylan wrote:
>>> On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>>>> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>>>>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>>>>>> Do you have a better insight into job runtimes vs jobs in other
>>>>>> projects? Most of the time in the job runtime is actually spent
>>>>>> setting the infrastructure up, and I am not sure we can do
>>>>>> anything about it, unless we take this up with Infra.
>>>>>
>>>>> I haven't done a comparison yet, but let's break down the runtime
>>>>> of a recent successful neutron full run against neutron master
>>>>> [0].
>>>>
>>>> And now for some comparative data from the gate-tempest-dsvm-full
>>>> job [0]. This job also ran against a master change that merged and
>>>> ran in the same cloud and region as the neutron job.
>>>>
>>> snip
>>>> Generally each step of this job was quicker. There were big
>>>> differences in devstack and tempest run time though. Is devstack
>>>> much slower to set up neutron when compared to nova net? For
>>>> tempest it looks like we run ~1510 tests against neutron and only
>>>> ~1269 against nova net. This may account for the large difference
>>>> there. I also recall that we run ipv6 tempest tests against neutron
>>>> deployments that were inefficient and booted 2 qemu VMs per test
>>>> (not sure if that is still the case, but it illustrates that the
>>>> tests themselves may not be very quick in the neutron case).
>>>
>>> Looking at the tempest slowest-tests output for each of these jobs
>>> (neutron and nova net), some tests line up really well across jobs
>>> and others do not. In order to get a better handle on the runtime of
>>> individual tests I have pushed https://review.openstack.org/295487,
>>> which will run tempest serially, reducing the competition for
>>> resources between tests.
>>>
>>> Hopefully the subunit logs generated by this change can provide more
>>> insight into where we are losing time during the tempest test runs.
>
> The results are in: we have gate-tempest-dsvm-full [0] and
> gate-tempest-dsvm-neutron-full [1] job results where tempest ran
> serially to reduce resource contention and provide accurate-ish
> per-test timing data. Both of these jobs ran on the same cloud, so
> they should have comparable performance from the underlying VMs.
>
> gate-tempest-dsvm-full
> Time spent in job before tempest: 700 seconds
> Time spent running tempest: 2428 seconds
> Tempest tests run: 1269 (113 skipped)
>
> gate-tempest-dsvm-neutron-full
> Time spent in job before tempest: 789 seconds
> Time spent running tempest: 4407 seconds
> Tempest tests run: 1510 (76 skipped)
>
> All times above are wall time as recorded by Jenkins.
>
> We can also compare the 10 slowest tests in the non-neutron job
> against their runtimes in the neutron job. (Note this isn't a list of
> the top 10 slowest tests in the neutron job, because that job runs
> extra tests.)
>
> nova net job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern 85.232
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern 83.319
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance 50.338
> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern 43.494
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario 40.225
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance 39.653
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete 37.720
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete 36.355
> tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped 27.375
> tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks 27.025
>
> neutron job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern 110.345
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern 108.170
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance 63.852
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
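For a rough sense of scale, the entries that appear in both lists can be compared directly. The `slowdown` helper below is illustrative arithmetic over the numbers quoted above, nothing more (short test-class names are substituted for the full module paths):

```python
# Serial-run timings quoted above, in seconds: (nova-net, neutron).
timings = {
    "TestVolumeBootPatternV2.test_volume_boot_pattern": (85.232, 110.345),
    "TestVolumeBootPattern.test_volume_boot_pattern": (83.319, 108.170),
    "TestShelveInstance.test_shelve_volume_backed_instance": (50.338, 63.852),
}


def slowdown(nova_net_s, neutron_s):
    """Neutron runtime as a multiple of the nova-net runtime."""
    return round(neutron_s / nova_net_s, 2)


for name, (nova_net_s, neutron_s) in timings.items():
    print(f"{name}: {slowdown(nova_net_s, neutron_s)}x")

# The whole-run tempest wall times (4407 s vs 2428 s) show a larger gap,
# but the neutron job also runs ~240 extra tests, so the per-test
# comparison above is the fairer one.
print(f"overall tempest wall time: {round(4407 / 2428, 2)}x")
```

The overlapping scenario tests come out roughly 1.3x slower under Neutron, which is consistent with Clark's observation that the per-test cost, not just the test count, differs between the jobs.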
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> On 03/21/2016 04:09 PM, Clark Boylan wrote:
>> On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>>> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>>>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>>>>> Do you have any better insight of job runtimes vs jobs in other
>>>>> projects? Most of the time in the job runtime is actually spent
>>>>> setting the infrastructure up, and I am not sure we can do anything
>>>>> about it, unless we take this with Infra.
>>>>
>>>> I haven't done a comparison yet but let's break down the runtime of a
>>>> recent successful neutron full run against neutron master [0].
>>>
>>> And now for some comparative data from the gate-tempest-dsvm-full job
>>> [0]. This job also ran against a master change that merged and ran in
>>> the same cloud and region as the neutron job.
>>>
>> snip
>>> Generally each step of this job was quicker. There were big differences
>>> in devstack and tempest run time though. Is devstack much slower to
>>> set up neutron when compared to nova net? For tempest it looks like we
>>> run ~1510 tests against neutron and only ~1269 against nova net. This
>>> may account for the large difference there. I also recall that we run
>>> ipv6 tempest tests against neutron deployments that were inefficient and
>>> booted 2 qemu VMs per test (not sure if that is still the case but it
>>> illustrates that the tests themselves may not be very quick in the
>>> neutron case).
>>
>> Looking at the tempest slowest tests output for each of these jobs
>> (neutron and nova net) some tests line up really well across jobs and
>> others do not. In order to get a better handle on the runtime for
>> individual tests I have pushed https://review.openstack.org/295487
>> which will run tempest serially, reducing the competition for
>> resources between tests.
> > > > Hopefully the subunit logs generated by this change can provide more
> > insight into where we are losing time during the tempest test runs.

The results are in: we have gate-tempest-dsvm-full [0] and
gate-tempest-dsvm-neutron-full [1] job results where tempest ran
serially to reduce resource contention and provide accurate-ish per-test
timing data. Both of these jobs ran on the same cloud, so they should
have comparable performance from the underlying VMs.

gate-tempest-dsvm-full
Time spent in job before tempest: 700 seconds
Time spent running tempest: 2428 seconds
Tempest tests run: 1269 (113 skipped)

gate-tempest-dsvm-neutron-full
Time spent in job before tempest: 789 seconds
Time spent running tempest: 4407 seconds
Tempest tests run: 1510 (76 skipped)

All times above are wall time as recorded by Jenkins.

We can also compare the 10 slowest tests in the non-neutron job against
their runtimes in the neutron job. (Note this isn't a list of the top 10
slowest tests in the neutron job, because that job runs extra tests.)
nova net job
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern 85.232
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern 83.319
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance 50.338
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern 43.494
tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario 40.225
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance 39.653
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete 37.720
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete 36.355
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped 27.375
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks 27.025

neutron job
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern 110.345
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern 108.170
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance 63.852
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance 59.931
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern 57.835
tempest.scenario.test_minimum
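For a rough sense of scale, the per-test averages implied by the job totals above can be computed directly. A quick sketch (numbers taken verbatim from the job summaries earlier in this message):

```python
# Per-test average runtime implied by the serialized job results above.
jobs = {
    "gate-tempest-dsvm-full":         {"tempest_seconds": 2428, "tests_run": 1269},
    "gate-tempest-dsvm-neutron-full": {"tempest_seconds": 4407, "tests_run": 1510},
}

averages = {}
for name, data in jobs.items():
    averages[name] = data["tempest_seconds"] / data["tests_run"]
    print(f"{name}: {averages[name]:.2f} s/test")

# The neutron job is roughly 1.5x slower per test, so the gap is not
# explained by the extra ~240 tests alone.
ratio = averages["gate-tempest-dsvm-neutron-full"] / averages["gate-tempest-dsvm-full"]
print(f"per-test slowdown: {ratio:.2f}x")
```

So even normalized per test, the neutron job is markedly slower, which points at the tests (or the services under them) rather than just the larger test count.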
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On 03/21/2016 04:09 PM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>>>> Do you have any better insight of job runtimes vs jobs in other
>>>> projects? Most of the time in the job runtime is actually spent
>>>> setting the infrastructure up, and I am not sure we can do anything
>>>> about it, unless we take this with Infra.
>>>
>>> I haven't done a comparison yet but let's break down the runtime of a
>>> recent successful neutron full run against neutron master [0].
>>
>> And now for some comparative data from the gate-tempest-dsvm-full job
>> [0]. This job also ran against a master change that merged and ran in
>> the same cloud and region as the neutron job.
>>
> snip
>> Generally each step of this job was quicker. There were big differences
>> in devstack and tempest run time though. Is devstack much slower to
>> set up neutron when compared to nova net? For tempest it looks like we
>> run ~1510 tests against neutron and only ~1269 against nova net. This
>> may account for the large difference there. I also recall that we run
>> ipv6 tempest tests against neutron deployments that were inefficient and
>> booted 2 qemu VMs per test (not sure if that is still the case but it
>> illustrates that the tests themselves may not be very quick in the
>> neutron case).
>
> Looking at the tempest slowest tests output for each of these jobs
> (neutron and nova net) some tests line up really well across jobs and
> others do not. In order to get a better handle on the runtime for
> individual tests I have pushed https://review.openstack.org/295487
> which will run tempest serially, reducing the competition for
> resources between tests.
>
> Hopefully the subunit logs generated by this change can provide more
> insight into where we are losing time during the tempest test runs.

Subunit logs aren't the full story here. Activity in addCleanup doesn't
get added to the subunit time accounting for the test, which causes some
interesting issues when waiting for resources to delete. I would be
especially cautious of that on some of these.

-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
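Sean's caveat can be illustrated with a toy example. This is plain `unittest`, not the subunit protocol itself, and the numbers are made up, but it shows the shape of the problem: `addCleanup` callbacks run after the test method returns, so time measured around the test body misses them entirely.

```python
import time
import unittest

class Demo(unittest.TestCase):
    body_seconds = None  # time as measured around the test body only

    def test_quick_body_slow_cleanup(self):
        t0 = time.monotonic()
        # Stands in for waiting on a resource delete in teardown.
        self.addCleanup(time.sleep, 0.2)
        time.sleep(0.01)  # the "real" test work is quick
        Demo.body_seconds = time.monotonic() - t0

# Run the test and measure total wall time, cleanups included.
t0 = time.monotonic()
unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(Demo))
wall_seconds = time.monotonic() - t0

print(f"body: {Demo.body_seconds:.2f}s  wall incl. cleanup: {wall_seconds:.2f}s")
```

If a per-test accounting only covers the body, a test like this looks nearly free while actually occupying the worker for the full cleanup duration, which is exactly the distortion to watch for when reading the slow-test lists above.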
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>>> Do you have any better insight of job runtimes vs jobs in other
>>> projects? Most of the time in the job runtime is actually spent
>>> setting the infrastructure up, and I am not sure we can do anything
>>> about it, unless we take this with Infra.
>>
>> I haven't done a comparison yet but let's break down the runtime of a
>> recent successful neutron full run against neutron master [0].
>
> And now for some comparative data from the gate-tempest-dsvm-full job
> [0]. This job also ran against a master change that merged and ran in
> the same cloud and region as the neutron job.
>
> snip
> Generally each step of this job was quicker. There were big differences
> in devstack and tempest run time though. Is devstack much slower to
> set up neutron when compared to nova net? For tempest it looks like we
> run ~1510 tests against neutron and only ~1269 against nova net. This
> may account for the large difference there. I also recall that we run
> ipv6 tempest tests against neutron deployments that were inefficient and
> booted 2 qemu VMs per test (not sure if that is still the case but it
> illustrates that the tests themselves may not be very quick in the
> neutron case).

Looking at the tempest slowest tests output for each of these jobs
(neutron and nova net) some tests line up really well across jobs and
others do not. In order to get a better handle on the runtime for
individual tests I have pushed https://review.openstack.org/295487
which will run tempest serially, reducing the competition for resources
between tests.

Hopefully the subunit logs generated by this change can provide more
insight into where we are losing time during the tempest test runs.
Clark
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>> Do you have any better insight of job runtimes vs jobs in other
>> projects? Most of the time in the job runtime is actually spent setting
>> the infrastructure up, and I am not sure we can do anything about it,
>> unless we take this with Infra.
>
> I haven't done a comparison yet but let's break down the runtime of a
> recent successful neutron full run against neutron master [0].

And now for some comparative data from the gate-tempest-dsvm-full job
[0]. This job also ran against a master change that merged and ran in
the same cloud and region as the neutron job.

Basic host setup takes 63 seconds. Start of job to 2016-03-21 16:46:41.058 [1]
Workspace setup takes 380 seconds. 2016-03-21 16:46:41.058 [1] to 2016-03-21 16:53:01.754 [2]
Devstack takes 890 seconds. 2016-03-21 16:53:19.235 [3] to 2016-03-21 17:08:10.082 [4]
Loading old tempest subunit streams takes 63 seconds. 2016-03-21 17:08:10.111 [5] to 2016-03-21 17:09:13.454 [6]
Tempest takes 1347 seconds. 2016-03-21 17:09:13.587 [7] to 2016-03-21 17:31:40.885 [8]
Then we spend the rest of the test time (52 seconds) cleaning up. 2016-03-21 17:31:40.885 [8] to end of job.
[0] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/
[1] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_46_41_058
[2] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_53_01_754
[3] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_53_19_235
[4] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_08_10_082
[5] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_08_10_111
[6] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_09_13_454
[7] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_09_13_587
[8] http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_31_40_885

Generally each step of this job was quicker. There were big differences
in devstack and tempest run time though. Is devstack much slower to set
up neutron when compared to nova net? For tempest it looks like we run
~1510 tests against neutron and only ~1269 against nova net. This may
account for the large difference there. I also recall that we run ipv6
tempest tests against neutron deployments that were inefficient and
booted 2 qemu VMs per test (not sure if that is still the case, but it
illustrates that the tests themselves may not be very quick in the
neutron case). Of course we may also be seeing differences in cloud VMs
(though I tried to control for that by looking at tests that ran in the
same region). Hard to say without more data. In any case this hopefully
serves as a good starting point for others to dig into the ~20 minute
discrepancy between nova net + tempest and neutron + tempest.
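The phase durations in this breakdown are just deltas between console-log timestamps. A quick sketch of the arithmetic, using the devstack and tempest timestamps from refs [3], [4], [7], and [8] above:

```python
from datetime import datetime

# Timestamp format used in the devstack-gate console logs quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def seconds_between(start, end):
    """Whole seconds elapsed between two console-log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# devstack: refs [3] -> [4]; tempest: refs [7] -> [8]
print(seconds_between("2016-03-21 16:53:19.235", "2016-03-21 17:08:10.082"))
print(seconds_between("2016-03-21 17:09:13.587", "2016-03-21 17:31:40.885"))
```

The same helper reproduces every phase figure quoted in this thread from the corresponding timestamp pairs.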
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On 21 March 2016 at 11:08, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>> Do you have any better insight of job runtimes vs jobs in other
>> projects? Most of the time in the job runtime is actually spent setting
>> the infrastructure up, and I am not sure we can do anything about it,
>> unless we take this with Infra.
>
> I haven't done a comparison yet but let's break down the runtime of a
> recent successful neutron full run against neutron master [0].
>
> Basic host setup takes 65 seconds. Start of job to 2016-03-17 22:14:27.397 [1]
> Workspace setup takes 520 seconds. 2016-03-17 22:14:27.397 [1] to 2016-03-17 22:23:07.429 [2]
> Devstack takes 1205 seconds. 2016-03-17 22:23:18.760 [3] to 2016-03-17 22:43:23.339 [4]
> Loading old tempest subunit streams takes 155 seconds. 2016-03-17 22:43:23.340 [5] to 2016-03-17 22:45:58.061 [6]
> Tempest takes 1982 seconds. 2016-03-17 22:45:58.201 [7] to 2016-03-17 23:19:00.117 [8]
> Then we spend the rest of the test time (76 seconds) cleaning up. 2016-03-17 23:19:00.117 [8] to end of job.
>
> Note that I haven't accounted for all of the time used and instead
> focused on the major steps that use the most time. Also it is Monday
> morning and some of my math may be off.
> [0] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/ > [1] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_14_27_397 > [2] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_07_429 > [3] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_18_760 > [4] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_339 > [5] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_340 > [6] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_061 > [7] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_201 > [8] > > http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_23_19_00_117 > > One big takeaway from this is that the vast majority of the time is > spent in devstack and tempest not in the infrastructure setup. You > should be able to dig into both the devstack setup and tempest test > runtimes and hopefully speed things up. > > Hopefully this gives you enough information to get started into digging > on this. > Clark: thanks for this insightful response. I should clarify my comment about infrastructure setup (it is Monday for me too :)): what I meant was the there is a good portion of time spent to get to a point where tests can be run. That includes node setup as well as stacking. That is obviously less than 50%, but even >30% feels like a substantial overhead. I am not sure what we can do about it, but looping you in this discussion seemed like the least this thread should do. 
That said, there are many tempest tests that take over 30 seconds to
complete, and those do not even touch Neutron. For those that do, we
should clearly identify where the slowness comes from, and I think
that's where, as a Neutron team, our focus should be. IMO, before we go
on and talk about evicting jobs, we should take a closer look (i.e.
profiling) at where time is spent so that we can make each test run
leaner.

[1] http://status.openstack.org//openstack-health/#/job/gate-tempest-dsvm-neutron-full?groupKey=project&resolutionKey=hour&end=2016-03-21T18:14:19.534Z
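Armando's ">30% but less than 50%" estimate is easy to check against Clark's phase numbers for the neutron full job. A quick sketch (durations in seconds, taken from Clark's breakdown quoted earlier in the thread; "setup" here means everything before tests can run, i.e. node setup plus stacking):

```python
# Phase durations (seconds) for the neutron full job, per Clark's breakdown.
phases = {
    "host_setup": 65,
    "workspace_setup": 520,
    "devstack": 1205,
    "subunit_load": 155,
    "tempest": 1982,
    "cleanup": 76,
}

total = sum(phases.values())
setup = phases["host_setup"] + phases["workspace_setup"] + phases["devstack"]
print(f"total: {total}s, setup: {setup}s ({setup / total:.1%})")
```

That puts setup at roughly 45% of the accounted-for job time, consistent with the estimate above.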
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
> Do you have any better insight of job runtimes vs jobs in other
> projects? Most of the time in the job runtime is actually spent setting
> the infrastructure up, and I am not sure we can do anything about it,
> unless we take this with Infra.

I haven't done a comparison yet but let's break down the runtime of a
recent successful neutron full run against neutron master [0].

Basic host setup takes 65 seconds. Start of job to 2016-03-17 22:14:27.397 [1]
Workspace setup takes 520 seconds. 2016-03-17 22:14:27.397 [1] to 2016-03-17 22:23:07.429 [2]
Devstack takes 1205 seconds. 2016-03-17 22:23:18.760 [3] to 2016-03-17 22:43:23.339 [4]
Loading old tempest subunit streams takes 155 seconds. 2016-03-17 22:43:23.340 [5] to 2016-03-17 22:45:58.061 [6]
Tempest takes 1982 seconds. 2016-03-17 22:45:58.201 [7] to 2016-03-17 23:19:00.117 [8]
Then we spend the rest of the test time (76 seconds) cleaning up. 2016-03-17 23:19:00.117 [8] to end of job.

Note that I haven't accounted for all of the time used and instead
focused on the major steps that use the most time. Also it is Monday
morning and some of my math may be off.
[0] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/
[1] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_14_27_397
[2] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_07_429
[3] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_18_760
[4] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_339
[5] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_340
[6] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_061
[7] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_201
[8] http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_23_19_00_117

One big takeaway from this is that the vast majority of the time is
spent in devstack and tempest, not in the infrastructure setup. You
should be able to dig into both the devstack setup and tempest test
runtimes and hopefully speed things up.

Hopefully this gives you enough information to get started into digging
on this.

Clark
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On 21 March 2016 at 04:15, Rossella Sblendido wrote:
> Hello all,
>
> the tests that we run on the gate for Neutron take pretty long (longer
> than one hour). I think we can improve that and make better use of the
> resources. Here are some ideas that came up when Ihar and I discussed
> this topic during the sprint in Brno:
>
> 1) We have a few jobs that are non-voting. I think it's OK to have
> non-voting jobs for a limited amount of time, while we try to make them
> stable, but this shouldn't be too long, otherwise we waste time running
> those tests without even using the results. If a job is still non-voting
> after 3 months (or 4 or 6, we can find a good time interval) the job
> should be removed. My hope is that this threat will make us find some
> time to actually fix the job and make it vote :)
>
> 2) multi-node jobs run for every patch set. Is that really what we want?
> They take pretty long. We could move them to a periodic job. I know we
> can easily forget about periodic jobs; to avoid that we could run them
> in the gate queue too. If a patch can't merge because of a failure we
> will fix the issue. To trigger them for a specific patch that might
> affect multi-node we can run the experimental jobs.
>
> Thoughts?

Thanks for raising the topic. That said, I am not sure I see how what
you propose is going to make things better. Jobs, either non-voting or
multinode, run in parallel, so reducing the number of jobs won't reduce
the time to feedback, though it would improve resource usage. We are
already pretty conscious of that, and compared to other projects we
already run a limited number of jobs, but we can do better, of course.
Do you have any better insight of job runtimes vs jobs in other
projects? Most of the time in the job runtime is actually spent setting
the infrastructure up, and I am not sure we can do anything about it,
unless we take this with Infra.
> Rossella
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
On 21 March 2016 at 04:32, Sean M. Collins wrote:
> Rossella Sblendido wrote:
>> 2) multi-node jobs run for every patch set. Is that really what we want?
>> They take pretty long. We could move them to a periodic job.
>
> I would rather remove all the single-node jobs. Nova has been moving to
> multinode jobs for their gate (if I recall correctly my conversation
> with Dan Smith) and we should be moving in this direction too. We
> should test Neutron the way it is deployed in production.

This was not true last time I checked. Switching to multinode jobs for
the gate means that all projects in the integrated gate will have to use
the multinode configuration.

> Also, who is really monitoring the periodic jobs? Truthfully? I know
> there are some IPv6 jobs that are periodic and I'll be the first to
> admit that I am not following them *at all*.
>
> So, my thinking is, unless it's running at the gate and inflicting pain
> on people, it's not going to be treated as a priority. Look at Linux
> Bridge - serious race conditions that existed for years only got fixed
> once I inflicted pain on all the Neutron devs by making it voting and
> running on every patchset (sorry, not sorry).
>
> --
> Sean M. Collins
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
> On Mar 21, 2016, at 5:40 AM, Ihar Hrachyshka wrote:
>
> Sean M. Collins wrote:
>> Rossella Sblendido wrote:
>>> 2) multi-node jobs run for every patch set. Is that really what we want?
>>> They take pretty long. We could move them to a periodic job.
>>
>> I would rather remove all the single-node jobs. Nova has been moving to
>> multinode jobs for their gate (if I recall correctly my conversation
>> with Dan Smith) and we should be moving in this direction too. We
>> should test Neutron the way it is deployed in production.
>>
>> Also, who is really monitoring the periodic jobs? Truthfully? I know
>> there are some IPv6 jobs that are periodic and I'll be the first to
>> admit that I am not following them *at all*.
>
> Well, stable maintainers track their periodic job failures. :) Email
> notifications when something starts failing help.
>
>> So, my thinking is, unless it's running at the gate and inflicting pain
>> on people, it's not going to be treated as a priority. Look at Linux
>> Bridge - serious race conditions that existed for years only got fixed
>> once I inflicted pain on all the Neutron devs by making it voting and
>> running on every patchset (sorry, not sorry).
>
> I think there is still common ground between you and Rossella’s stances:
> the fact that we want to inflict gating pain does not mean that we want
> to execute every single job on each PS uploaded to gerrit. For some
> advanced and non-obvious checks [like partial grenade] the validation
> could probably be postponed till the patch hits the gate.
>
> Yes, sometimes it will mean the gate being reset due to a bad patch.
> This can be avoided in most cases if reviewers and the author of a
> patch that potentially touches a specific scenario execute the jobs
> before hitting the gate with the patch [for example, if the job is in
> the experimental set, it’s a matter of ‘check experimental’ before
> pressing W+1].
We have been pretty consciously moving neutron jobs to cause pain to
*neutron* and not everyone else, which is the opposite of a “gate only”
plan. Aside from that being against infra policy, I think I’m reading
between the lines that folks are wanting faster iterations between
patch sets.

I note that the standard -full job is up to 55-65 minutes, from its old
time of 40-45. Have we characterized why that’s so much slower now?
Perhaps addressing that will bring down the turn-around for all.

Thanks,
doug

> Ihar
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
Sean M. Collins wrote:
> Rossella Sblendido wrote:
>> 2) multi-node jobs run for every patch set. Is that really what we want?
>> They take pretty long. We could move them to a periodic job.
>
> I would rather remove all the single-node jobs. Nova has been moving to
> multinode jobs for their gate (if I recall correctly my conversation
> with Dan Smith) and we should be moving in this direction too. We
> should test Neutron the way it is deployed in production.
>
> Also, who is really monitoring the periodic jobs? Truthfully? I know
> there are some IPv6 jobs that are periodic and I'll be the first to
> admit that I am not following them *at all*.

Well, stable maintainers track their periodic job failures. :) Email
notifications when something starts failing help.

> So, my thinking is, unless it's running at the gate and inflicting pain
> on people, it's not going to be treated as a priority. Look at Linux
> Bridge - serious race conditions that existed for years only got fixed
> once I inflicted pain on all the Neutron devs by making it voting and
> running on every patchset (sorry, not sorry).

I think there is still common ground between you and Rossella’s stances:
the fact that we want to inflict gating pain does not mean that we want
to execute every single job on each PS uploaded to gerrit. For some
advanced and non-obvious checks [like partial grenade] the validation
could probably be postponed till the patch hits the gate.

Yes, sometimes it will mean the gate being reset due to a bad patch.
This can be avoided in most cases if reviewers and the author of a patch
that potentially touches a specific scenario execute the jobs before
hitting the gate with the patch [for example, if the job is in the
experimental set, it’s a matter of ‘check experimental’ before pressing
W+1].

Ihar
Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
Rossella Sblendido wrote:
> 2) multi-node jobs run for every patch set. Is that really what we want?
> They take pretty long. We could move them to a periodic job.

I would rather remove all the single-node jobs. Nova has been moving to
multinode jobs for their gate (if I recall correctly my conversation
with Dan Smith) and we should be moving in this direction too. We should
test Neutron the way it is deployed in production.

Also, who is really monitoring the periodic jobs? Truthfully? I know
there are some IPv6 jobs that are periodic and I'll be the first to
admit that I am not following them *at all*.

So, my thinking is, unless it's running at the gate and inflicting pain
on people, it's not going to be treated as a priority. Look at Linux
Bridge - serious race conditions that existed for years only got fixed
once I inflicted pain on all the Neutron devs by making it voting and
running on every patchset (sorry, not sorry).

--
Sean M. Collins
[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?
Hello all,

the tests that we run on the gate for Neutron take pretty long (longer
than one hour). I think we can improve that and make better use of the
resources. Here are some ideas that came up when Ihar and I discussed
this topic during the sprint in Brno:

1) We have a few jobs that are non-voting. I think it's OK to have
non-voting jobs for a limited amount of time, while we try to make them
stable, but this shouldn't be too long, otherwise we waste time running
those tests without even using the results. If a job is still non-voting
after 3 months (or 4 or 6, we can find a good time interval) the job
should be removed. My hope is that this threat will make us find some
time to actually fix the job and make it vote :)

2) multi-node jobs run for every patch set. Is that really what we want?
They take pretty long. We could move them to a periodic job. I know we
can easily forget about periodic jobs; to avoid that we could run them
in the gate queue too. If a patch can't merge because of a failure we
will fix the issue. To trigger them for a specific patch that might
affect multi-node we can run the experimental jobs.

Thoughts?

Rossella