Re: [controller-dev] [integration-dev] [mdsal-dev] 3node cluster regression in Carbon - since Jan 5th

Mainzer, Gal Wed, 11 Jan 2017 03:24:33 -0800

Choosing one of the jobs to become the ODL cloud job is probably the right thing.
The same goes for other ODL-wide jobs that can cover multiple projects.

As for why other dependent projects should accept being blocked until a
regression is fixed - I can offer 2 reasons:
1. As a community, we do need to face the question of why use ODL and not other
open SDN controllers/standards. Such a process can strengthen our position -
maybe it's a TSC discussion, but I do think it should be addressed.
2. The cloud use case, which builds on the kernel/protocol projects, uses many
major features in those projects, so it can keep them honest and stable as well.
I do agree that false alarms/failing jobs are extremely problematic, but given
that those jobs are gating for netvirt, genius & neutron-northbound, we reduce
the per-project false alarms to a minimum, which IMO is a cost worth paying for
the coverage of each project's major features.

Regarding the dashboard, I think that for one job this is great, but if we
have multiple use cases and each one has a job with its own set of affected
projects, this can become more difficult.

Gal

-----Original Message-----
From: Jamo Luhrsen [mailto:jluhr...@gmail.com] 
Sent: Wednesday, January 11, 2017 12:12 AM
To: Mainzer, Gal <gmain...@hpe.com>; Robert Varga <n...@hq.sk>
Cc: Luis Gomez <ece...@gmail.com>; netvirt-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org; genius-...@lists.opendaylight.org
Subject: Re: [integration-dev] [controller-dev] [mdsal-dev] 3node cluster 
regression in Carbon - since Jan 5th

On 01/09/2017 08:33 PM, Mainzer, Gal wrote:
> Maybe not as a gate job but more of a periodic that runs every 4-6 hours.

well, if you put it like this, we already have it, right? except we run every 24
hours, not 4-6. I'm leery of bumping our openstack csit jobs to a quicker cadence
because each job takes ~90m (and it's growing). And we have a significant 
number of them. Our infra is sometimes pretty saturated as it is. Maybe we can 
pick just one of these jobs to run at a higher frequency.
That would at least give us a shorter time frame to search for patches, when we 
get a regression.

> 
> At this stage, those jobs are stable enough (and if not we are really
> close to that point) for a single failure to indicate that there is a
> regression. All we need to agree on is that if that cloud suite is failing,
> all relevant projects should stop merging (even as a process and not by a
> mechanical gerrit lock) until the regression is fixed.

we aren't totally stable enough yet, imho. We are very close though.

<devils advocate>
however, convincing these dependent projects to stop merging is asking a lot.
Who says md-sal or controller gives a hoot about the "cloud" stuff working for
opendaylight? maybe other ODL projects are still working fine and the
assumption is that our cloud projects are the ones that need to fix themselves,
while everyone else can continue to do work.
</devils advocate>

> We can add an additional job that, with a single click, will collect all
> suspected commits from all relevant projects
> - as Jamo said, ~15 are dependent. This will reduce our analysis time,
> maybe even by reverting suspected commits just to recover from the
> regression and release the "lock".

this would be super cool.

> Without a proper dashboard I'm not really expecting all projects to
> monitor this, but as a first stage we can monitor that job (like we do today)
> and send a critical mail on certain failures.

our poor man's dashboard is just the regular jenkins job landing page. We need 
to get things coming back blue on a consistent basis first, because every time 
I look it's mostly red and yellow.

JamO

> On 10 Jan 2017, at 1:31, Robert Varga <n...@hq.sk> wrote:
> 
>> On 01/09/2017 10:37 PM, Jamo Luhrsen wrote:
>> so you mean to have this "cloud suite" run as a gating job on gerrit
>> patches for all projects that our "ODL for openstack" needs, I think.
>> That would be nice, but we would need to convince a lot of projects to
>> do it. Looks like at least 12 projects are dependencies for netvirt:
>>
>> controller,dlux,genius,infrautils,mdsal,netconf,neutron,odlparent,
>> openflowplugin,ovsdb,sfc,yangtools
> 
> Judging from how long it takes for -autorelease and -distcheck to 
> stabilize for each release, I would hate to see such a job gate offset-0 
> patches.
> 
> In this particular set of projects, there is a history of breakage 
> happening on OFP/OVSDB and OVSDB/SFC (I think) boundaries.
> 
> Just my .02, Robert
> 
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev