On 3/27/2014 8:00 AM, Salvatore Orlando wrote:

On 26 March 2014 19:19, James E. Blair <jebl...@openstack.org
<mailto:jebl...@openstack.org>> wrote:

    Salvatore Orlando <sorla...@nicira.com <mailto:sorla...@nicira.com>>
    writes:

     > On another note, we noticed that the duplicated jobs currently
    executed for
     > redundancy in neutron actually seem to point all to the same
    build id.
     > I'm not sure then if we're actually executing each job twice or just
     > duplicating lines in the jenkins report.

    Thanks for catching that, and I'm sorry that didn't work right.  Zuul is
    in fact running the jobs twice, but it is only looking at one of them
    when sending reports and (more importantly) deciding whether the change
    has succeeded or failed.  Fixing this is possible, of course, but turns
    out to be a rather complicated change.  Since we don't make heavy use of
    this feature, I lean toward simply instantiating multiple instances of
    identically configured jobs and invoking them (e.g. "neutron-pg-1",
    "neutron-pg-2").

    Matthew Treinish has already worked up a patch to do that, and I've
    written a patch to revert the incomplete feature from Zuul.


That makes sense to me. I think it is just a matter of how the results are
reported to Gerrit, since from what I gather in logstash the jobs are
executed twice for each new patchset or recheck.
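
(In case anyone wants to double-check, below is a rough sketch of the kind
of query I have in mind for verifying this in logstash. The Elasticsearch
endpoint URL and the build_change/build_patchset/build_name/build_uuid field
names are my assumptions based on how elastic-recheck queries look, and the
change/patchset/job values are just placeholders, so treat it as illustrative
rather than a working script.)

# Sketch only: count distinct build_uuid values per job for a given
# change/patchset via the logstash Elasticsearch API.  Endpoint and field
# names are assumptions; the change/patchset/job below are placeholders.
import json
import requests

LOGSTASH_ES = "http://logstash.openstack.org/elasticsearch/_search"  # assumed

def distinct_builds(change, patchset, job_name):
    query = {
        "query": {
            "query_string": {
                "query": 'build_change:"%s" AND build_patchset:"%s" '
                         'AND build_name:"%s"' % (change, patchset, job_name)
            }
        },
        "size": 1000,
    }
    resp = requests.post(LOGSTASH_ES, data=json.dumps(query))
    resp.raise_for_status()
    hits = resp.json()["hits"]["hits"]
    # One distinct build_uuid would mean the "duplicated" jobs all point to
    # the same build; two or more means the job really executed twice.
    return {hit["_source"]["build_uuid"] for hit in hits}

print(distinct_builds("12345", "1", "check-tempest-dsvm-neutron-full"))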


For the status of the full job, I had a look at the numbers reported by
Rossella.
All the bugs are already known; some of them are not even bugs; others
have been recently fixed (given the time span of Rossella's analysis, and
the fact that it also covers non-rebased patches, this kind of false
positive is to be expected).

Of all full job failures, 44% should be discarded.
Bug 1291611 (12%) is definitely not a neutron bug... hopefully.
Bug 1281969 (12%) is really too generic.
It bears the hallmark of bug 1283522, so the high number might be due to
the fact that trunk was plagued by that bug until a few days before the
analysis.
However, it's worth noting that there is also another instance of "lock
timeout" which has caused 11 failures in the full job in the past week.
A new bug has been filed for this issue:
https://bugs.launchpad.net/neutron/+bug/1298355
Bug 1294603 was related to a test which is now skipped. It is still being
debated whether the problem lies in the test design, neutron LBaaS, or
neutron L3.

The following bugs seem not to be neutron bugs:
1290642, 1291920, 1252971, 1257885

Bug 1292242 appears to have been fixed while the analysis was going on.
Bug 1277439, instead, is already known to affect neutron jobs occasionally.

The actual state of the job is perhaps better than what the raw numbers
say. I would keep monitoring it, and then make it voting after the
Icehouse release is cut, so that we'll be able to deal with a possibly
higher failure rate in the "quiet" period of the release cycle.



    -Jim







I reported this bug [1] yesterday. We hit it in our internal Tempest runs on RHEL 6.5 (x86_64) with the nova libvirt driver and the neutron openvswitch ML2 driver. We're running without tenant isolation on python 2.6 (no testr yet), so the tests run serially. We're running basically the full set of API/CLI/scenario tests though, with no filtering on the smoke tag.

Out of 1,971 tests run, we had 3 failures where a nova instance failed to spawn because the networking callback events failed, i.e. neutron sends a server event request to nova with a bad URL, so the nova API pukes and then the networking request in neutron server fails. As linked in the bug report, I'm seeing the same neutron server log error show up in logstash for community jobs, but it's not a 100% failure. I haven't seen the n-api log error show up in logstash though.
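
To make the failure mode a bit more concrete, here's roughly the shape of the callback that's failing. This is my own sketch of an os-server-external-events call, not the actual neutron notifier code; the nova URL, token, and event payload details are placeholders/assumptions:

# Sketch of the neutron -> nova callback involved in bug 1298640.  This is
# NOT neutron's notifier code; it just illustrates the request shape.  The
# nova URL, token, and event payload details are placeholders/assumptions.
import json
import requests

NOVA_URL = "http://127.0.0.1:8774/v2/%(tenant_id)s"  # placeholder; a bad URL here is the failure we see
TOKEN = "<keystone token>"  # placeholder

def notify_nova_vif_plugged(tenant_id, server_uuid, port_id, port_active):
    # Neutron tells nova about VIF/port state changes so nova can finish
    # bringing the instance ACTIVE.  If neutron ends up with a bad nova URL,
    # this request fails, nova never sees the event, and the instance
    # fails to spawn -- which matches what we're hitting.
    body = {"events": [{
        "name": "network-vif-plugged",
        "server_uuid": server_uuid,
        "tag": port_id,
        "status": "completed" if port_active else "failed",
    }]}
    resp = requests.post(
        NOVA_URL % {"tenant_id": tenant_id} + "/os-server-external-events",
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        data=json.dumps(body),
    )
    resp.raise_for_status()  # a bad URL surfaces here as a 4xx/connection error
    return resp.json()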

Just bringing this to people's attention in case anyone else sees it.

[1] https://bugs.launchpad.net/nova/+bug/1298640

--

Thanks,

Matt Riedemann


