Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On 3 February 2016 at 18:49, Armando M. wrote:
> On 3 February 2016 at 04:28, Sean Dague wrote:
>> [...]
>> If the neutron PTL wants tests pulled, we should just do it.
>
> Thanks for the support! Having said that, I think it's important to make a
> judgement call on a case-by-case basis, because removing tests blindly
> might well backfire.
>
> In this specific instance and all things considered, merging [2] (or even
> better [1]) feels like a net gain.
>
> Cheers,
> Armando
>
> [1] https://review.openstack.org/#/c/239868/
> [2] https://review.openstack.org/#/c/275457/

Btw I did respin [1], because I am still seeing intermittent failures.

[1] https://review.openstack.org/#/c/275457/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On 3 February 2016 at 04:28, Sean Dague wrote:
> [...]
> We definitely shouldn't be running all the IPv6 tests.
>
> But I also think the assumption that the failure rate is low is not a
> valid reason to keep a test. Unreliable tests that don't have anyone
> looking into them should be deleted. They are providing negative value,
> because people just recheck past them even if their code made the race
> worse. So any legitimate issues they are exposing are being ignored.
>
> If the neutron PTL wants tests pulled, we should just do it.

Thanks for the support! Having said that, I think it's important to make a
judgement call on a case-by-case basis, because removing tests blindly
might well backfire.

In this specific instance and all things considered, merging [2] (or even
better [1]) feels like a net gain.

Cheers,
Armando

[1] https://review.openstack.org/#/c/239868/
[2] https://review.openstack.org/#/c/275457/
Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On 02/02/2016 10:03 PM, Matthew Treinish wrote:
> On Tue, Feb 02, 2016 at 05:09:47PM -0800, Armando M. wrote:
>> [...]
>
> So TBH I don't think the failure rate for these tests is really at a point
> necessitating a skip:
> [...]
> We really should have trimmed down this set a while ago so we only
> have a single case in tempest. Neutron should own the other possible
> configurations as an in-tree test.
>
> Brian Haley has a patch up from Dec. that was trying to clean it up:
>
> https://review.openstack.org/#/c/239868/
>
> We probably should revisit that soon, since quite clearly no one is
> looking at these right now.

We definitely shouldn't be running all the IPv6 tests.

But I also think the assumption that the failure rate is low is not a
valid reason to keep a test. Unreliable tests that don't have anyone
looking into them should be deleted. They are providing negative value,
because people just recheck past them even if their code made the race
worse. So any legitimate issues they are exposing are being ignored.

If the neutron PTL wants tests pulled, we should just do it.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On 2 February 2016 at 19:03, Matthew Treinish wrote:
> [...]
> Well, test_dualnet_dhcp6_stateless_from_os is kinda there with a ~10%
> failure rate, but the other 2 really aren't. I normally would be -1 on
> the skip patch because of that. We try to save the skips for cases where
> the bugs are really severe and preventing productivity at a large scale.

I am being overly aggressive here, just because I am conscious of the time
of the year :)

> But, in this case these ipv6 tests are kind of out of place in tempest.
> Having all the permutations of possible ip allocation configurations
> always seemed a bit too heavy handed. These tests are also consistently
> in the top 10 slowest for a run. We really should have trimmed down this
> set a while ago so we only have a single case in tempest. Neutron should
> own the other possible configurations as an in-tree test.

+1

> Brian Haley has a patch up from Dec. that was trying to clean it up:
>
> https://review.openstack.org/#/c/239868/
>
> We probably should revisit that soon, since quite clearly no one is
> looking at these right now.

I thought that had merged already...my memory doesn't serve me as it used
to anymore :(
Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On 02/02/2016 10:03 PM, Matthew Treinish wrote:
> [...]
> Brian Haley has a patch up from Dec. that was trying to clean it up:
>
> https://review.openstack.org/#/c/239868/

I just updated that to mark six of the eight tests as "slow" per your
previous comment, such that only the dual-NIC/dual-stack tests are run in
the gate; the others will run in the periodic nightly job:

http://status.openstack.org/openstack-health/#/job/periodic-tempest-dsvm-all-master

This will help lessen the impact until we can determine if it's the test
or Neutron.

-Brian

> We probably should revisit that soon, since quite clearly no one is
> looking at these right now.
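To illustrate the "slow" split described above, here is a minimal, self-contained sketch in plain Python. It is not the tempest implementation and not the content of the patch under review: the `slow` marker, the `select` helper, and all eight test names are hypothetical stand-ins, assumed only to show how tagging six of eight tests keeps them out of a gate run while a periodic run still picks them up.

```python
def slow(func):
    """Hypothetical marker: tag a test function as slow."""
    func.slow = True
    return func


def select(tests, include_slow):
    """Return the names of tests a job would run.

    A gate-style job passes include_slow=False; a periodic-style job
    passes include_slow=True and therefore runs everything.
    """
    return [t.__name__ for t in tests
            if include_slow or not getattr(t, "slow", False)]


# Two untagged placeholders stand in for the dual-NIC/dual-stack tests.
def test_dualnet_case_a(): pass
def test_dualnet_case_b(): pass


# Six tagged placeholders stand in for the remaining IPv6 tests.
@slow
def test_single_net_case_1(): pass
@slow
def test_single_net_case_2(): pass
@slow
def test_single_net_case_3(): pass
@slow
def test_single_net_case_4(): pass
@slow
def test_single_net_case_5(): pass
@slow
def test_single_net_case_6(): pass


ALL_TESTS = [
    test_dualnet_case_a, test_dualnet_case_b,
    test_single_net_case_1, test_single_net_case_2, test_single_net_case_3,
    test_single_net_case_4, test_single_net_case_5, test_single_net_case_6,
]

gate_run = select(ALL_TESTS, include_slow=False)
periodic_run = select(ALL_TESTS, include_slow=True)
```

With this split the tagged tests still produce results in the periodic job, so their failure history stays visible without blocking merges.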
Re: [openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
On Tue, Feb 02, 2016 at 05:09:47PM -0800, Armando M. wrote:
> Folks,
>
> We have some IPv6 related bugs [1,2,3] that have been lingering for some
> time now. They have been hurting the gate (e.g. [4] the most recent
> offending failure) and since it looks like they have been without owners
> or a plan of action for some time, I made the hard decision of skipping
> them [5] ahead of the busy times.

So TBH I don't think the failure rate for these tests is really at a point
necessitating a skip:

http://status.openstack.org/openstack-health/#/test/tempest.scenario.test_network_v6.TestGettingAddress.test_multi_prefix_slaac
http://status.openstack.org/openstack-health/#/test/tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_dhcp6_stateless_from_os
http://status.openstack.org/openstack-health/#/test/tempest.scenario.test_network_v6.TestGettingAddress.test_dhcp6_stateless_from_os

(also just a cool side-note, you can see the very obvious performance
regression caused by the keystonemiddleware release and when we excluded
that version in requirements)

Well, test_dualnet_dhcp6_stateless_from_os is kinda there with a ~10%
failure rate, but the other 2 really aren't. I normally would be -1 on the
skip patch because of that. We try to save the skips for cases where the
bugs are really severe and preventing productivity at a large scale.

But, in this case these ipv6 tests are kind of out of place in tempest.
Having all the permutations of possible ip allocation configurations always
seemed a bit too heavy handed. These tests are also consistently in the top
10 slowest for a run. We really should have trimmed down this set a while
ago so we only have a single case in tempest. Neutron should own the other
possible configurations as an in-tree test.

Brian Haley has a patch up from Dec. that was trying to clean it up:

https://review.openstack.org/#/c/239868/

We probably should revisit that soon, since quite clearly no one is looking
at these right now.

-Matt Treinish

> Now one might argue that skipping them is counterproductive because it may
> allow other regressions to sneak in, but I am hoping that this
> controversial action will indeed smoke out the right folks.
>
> Comments welcome.
>
> Regards,
> Armando
>
> [1] https://bugs.launchpad.net/neutron/+bug/1477192
> [2] https://bugs.launchpad.net/neutron/+bug/1509004
> [3] https://bugs.launchpad.net/openstack-gate/+bug/1540983
> [4] http://logs.openstack.org/37/264937/5/gate/gate-tempest-dsvm-neutron-full/afeaabd//logs/testr_results.html.gz
> [5] https://review.openstack.org/#/c/275457/
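As a rough worked example of the failure-rate argument above: if intermittent failures are treated as independent, the chance that a single run trips over at least one of them is one minus the product of the individual pass rates. The sketch below assumes independence, and only the ~10% figure comes from the thread; the two 1% values are placeholders for "the other 2 really aren't", not measured rates.

```python
def spurious_failure_probability(failure_rates):
    """Chance that at least one flaky test fails in a single run,
    assuming the failures are independent of each other."""
    pass_all = 1.0
    for rate in failure_rates:
        pass_all *= (1.0 - rate)
    return 1.0 - pass_all


def expected_runs_until_pass(failure_rates):
    """Mean number of runs (first attempt plus rechecks) until one run
    is free of spurious failures, under the same independence assumption."""
    return 1.0 / (1.0 - spurious_failure_probability(failure_rates))


# ~10% is quoted in the thread; the two 1% rates are assumed placeholders.
ASSUMED_RATES = [0.10, 0.01, 0.01]

per_run = spurious_failure_probability(ASSUMED_RATES)
mean_runs = expected_runs_until_pass(ASSUMED_RATES)
```

Under these assumed numbers the single ~10% test dominates the result, which is consistent with singling out test_dualnet_dhcp6_stateless_from_os rather than all three tests.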
[openstack-dev] [QA][Neutron] IPv6 related intermittent test failures
Folks,

We have some IPv6 related bugs [1,2,3] that have been lingering for some
time now. They have been hurting the gate (e.g. [4] the most recent
offending failure) and since it looks like they have been without owners
or a plan of action for some time, I made the hard decision of skipping
them [5] ahead of the busy times.

Now one might argue that skipping them is counterproductive because it may
allow other regressions to sneak in, but I am hoping that this
controversial action will indeed smoke out the right folks.

Comments welcome.

Regards,
Armando

[1] https://bugs.launchpad.net/neutron/+bug/1477192
[2] https://bugs.launchpad.net/neutron/+bug/1509004
[3] https://bugs.launchpad.net/openstack-gate/+bug/1540983
[4] http://logs.openstack.org/37/264937/5/gate/gate-tempest-dsvm-neutron-full/afeaabd//logs/testr_results.html.gz
[5] https://review.openstack.org/#/c/275457/
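For readers unfamiliar with what skipping a test against a bug looks like, here is a minimal sketch using only the Python standard library. It is not the skip patch [5] and does not use tempest's own decorators: `skip_for_bug`, the test class, and the test names are hypothetical, and pairing bug 1477192 with this particular placeholder test is arbitrary, chosen only for illustration.

```python
import unittest

BUG_URL = "https://bugs.launchpad.net/neutron/+bug/%s"


def skip_for_bug(bug):
    """Hypothetical helper: skip a test and record the tracking bug
    in the skip reason so the skip can be traced and later reverted."""
    return unittest.skip("Skipped until bug %s is resolved" % (BUG_URL % bug))


class PlaceholderAddressTest(unittest.TestCase):
    """Stand-in test class, not tempest's TestGettingAddress."""

    @skip_for_bug("1477192")  # arbitrary pairing, for illustration only
    def test_flaky_scenario(self):
        self.fail("the intermittently failing scenario would run here")

    def test_stable_scenario(self):
        self.assertTrue(True)


def run_suite():
    """Run the placeholder tests and return the raw result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(PlaceholderAddressTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skipped test is reported as skipped rather than failed, and the reason string carries the bug link, which is what lets a later patch find and remove the skip once the bug is closed.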