Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 11, 2014 at 02:02:00PM -0400, Dan Prince wrote:
> I've always referred to the virt/driver.py API as an internal API
> meaning there are no guarantees about it being preserved across
> releases. I'm not saying this is correct... just that it is what we've
> got. While OpenStack attempts to do a good job at stabilizing its
> public APIs we haven't done the same for internal APIs. It is actually
> quite painful to be out of tree at this point as I've seen with the
> Ironic driver being out of the Nova tree. (really glad that is back in
> now!)

Oh absolutely, I've always insisted that virt/driver.py is unstable and
that as a result out of tree drivers get to keep both pieces when it
breaks.

> So because we haven't designed things to be split out in this regard we
> can't just go and do it.

I don't think that conclusion follows directly. We certainly need to do
some prep work to firm up our virt driver interface, as outlined in my
original mail, but if we agreed to push forward in this I think it is
practical to get that done in Kilo and split in L. It is mostly a matter
of having the will to do it IMHO.

> I tinkered with some numbers... not sure if this helps or hurts my
> stance but here goes. By my calculation this is the number of commits
> we've made that touched each virt driver tree for the last 3 releases
> plus stuff done to-date in Juno.
>
> Created using a command like this in each virt directory for each
> release:
>
>   git log origin/stable/havana..origin/stable/icehouse --no-merges --pretty=oneline . | wc -l
>
> essex => folsom:
>
>  baremetal: 26
>  hyperv: 9
>  libvirt: 222
>  vmwareapi: 18
>  xenapi: 164
>  * total for above: 439
>
> folsom => grizzly:
>
>  baremetal: 83
>  hyperv: 58
>  libvirt: 254
>  vmwareapi: 59
>  xenapi: 126
>  * total for above: 580
>
> grizzly => havana:
>
>  baremetal: 48
>  hyperv: 55
>  libvirt: 157
>  vmwareapi: 105
>  xenapi: 123
>  * total for above: 488
>
> havana => icehouse:
>
>  baremetal: 45
>  hyperv: 42
>  libvirt: 212
>  vmwareapi: 121
>  xenapi: 100
>  * total for above: 520
>
> icehouse => master:
>
>  baremetal: 26
>  hyperv: 32
>  libvirt: 188
>  vmwareapi: 121
>  xenapi: 71
>  * total for above: 438
>
> ---
>
> A couple of things jump out at me from the numbers:
>
> -drivers that are being deprecated (baremetal) still have lots of
> changes. Some of these changes are valid bug fixes for the driver but a
> majority of them are actually related to internal cleanups and interface
> changes. This goes towards the fact that Nova isn't mature enough to do
> a split like this yet.

Our position that the virt driver is internal only has permitted us to
make backwards incompatible changes to it at will. Given that freedom
people inevitably take that route since it is the least effort option.
If our position had been that the virt driver needed to be forwards
compatible, people would have been forced to make the same changes
without breaking existing drivers. IOW, the fact that we've made lots
of changes to baremetal historically doesn't imply that we can't decide
to make the virt driver API stable henceforth & thus avoid further
changes of that kind.

> -the number of commits landed isn't growing *that* much across releases
> in the virt driver trees. Presumably we think we were doing a better job
> 2 years ago? But the number of changes in the virt trees is largely the
> same... perhaps this is because people aren't submitting stuff because
> they are frustrated though?
Our core team size & thus review bandwidth has been fairly static over
that time, so the only way virt driver commits could have risen is if
core reviewers increased their focus on virt drivers at the expense of
other parts of nova. I actually read those numbers as showing that as
we've put more effort into reviewing vmware contributions, we've lost
resource going into libvirt contributions. In addition we're of course
missing out on capturing the changes that were never submitted, or
submitted but abandoned, or submitted but slipped across multiple
releases waiting for merge. Overall I think the figures paint a pretty
depressing picture of no overall growth, perhaps even a decline.

> For comparison here are the total number of commits for each Nova
> release (includes the above commits):
>
>  essex -> folsom: 1708
>  folsom -> grizzly: 2131
>  grizzly -> havana: 2188
>  havana -> icehouse: 1696
>  icehouse -> master: 1493
>
> ---

So we've still a way to go for the juno cycle, but I'd be surprised if
we got beyond the havana numbers given where we are today. Again I think
those numbers show a plateau or even decline, which just reinforces my
point that our model is not scaling today.

> So say around 30% of the commits for a given release touch the virt
> drivers themselves.. many of them aren't specifically related to the
> virt drivers. Rather just general Nova internal cleanups because the
> interfaces aren't stable.
>
> And while
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/11/2014 12:02 PM, Dan Prince wrote:
> Maybe I'm impatient (I totally am!) but I see much of the review
> slowdown as a result of the feedback loop times increasing over the
> years. OpenStack has some really great CI and testing but I think our
> focus on not breaking things actually has us painted into a corner. We
> are losing our agility and the review process is paying the price. At
> this point I think splitting out the virt drivers would be more of a
> distraction than a help.

I think the only solution to feedback loop times increasing is to scale
the review process, which I think means giving more people
responsibility for a smaller amount of code.

I don't think it's strictly necessary to split the code out into a
totally separate repo, but I do think it would make sense to have
changes that are entirely contained within a virt driver be reviewed
only by developers of that virt driver rather than requiring review by
the project as a whole. And they should only have to pass a subset of
the CI testing--that way they wouldn't be held up by gating bugs in
other areas.

Chris

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
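To make the routing rule in the message above concrete (a change contained entirely within one virt driver goes to that driver's reviewers), here is a minimal sketch that classifies a change by the paths it touches. It assumes drivers live under nova/virt/<driver>/; the function names and the driver set are hypothetical and chosen only for illustration. This is not how Gerrit or the OpenStack CI actually route reviews.

```python
from pathlib import PurePosixPath

# Assumption: driver directory names as they appear in the commit counts
# quoted elsewhere in this thread.
KNOWN_DRIVERS = {"baremetal", "hyperv", "libvirt", "vmwareapi", "xenapi"}


def driver_for_path(path):
    """Return the driver name if `path` is a file inside
    nova/virt/<driver>/, otherwise None."""
    parts = PurePosixPath(path).parts
    inside_driver_dir = (
        len(parts) >= 4
        and parts[:2] == ("nova", "virt")
        and parts[2] in KNOWN_DRIVERS
    )
    return parts[2] if inside_driver_dir else None


def sole_driver(changed_paths):
    """Return the single driver a change is confined to, or None if the
    change touches common code, several drivers, or nothing at all."""
    drivers = {driver_for_path(p) for p in changed_paths}
    if len(drivers) == 1 and None not in drivers:
        return next(iter(drivers))
    return None
```

A change for which sole_driver() returns a name could be handed to that driver's reviewers and a reduced test set; anything returning None (for example a change that also edits nova/virt/driver.py) would still need project-wide review.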
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, 2014-09-04 at 11:24 +0100, Daniel P. Berrange wrote:
> Position statement
> ==================
>
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
>
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.
>

I've always referred to the virt/driver.py API as an internal API
meaning there are no guarantees about it being preserved across
releases. I'm not saying this is correct... just that it is what we've
got. While OpenStack attempts to do a good job at stabilizing its
public APIs we haven't done the same for internal APIs. It is actually
quite painful to be out of tree at this point as I've seen with the
Ironic driver being out of the Nova tree. (really glad that is back in
now!)

So because we haven't designed things to be split out in this regard we
can't just go and do it.

I tinkered with some numbers... not sure if this helps or hurts my
stance but here goes. By my calculation this is the number of commits
we've made that touched each virt driver tree for the last 3 releases
plus stuff done to-date in Juno.
Created using a command like this in each virt directory for each
release:

  git log origin/stable/havana..origin/stable/icehouse --no-merges --pretty=oneline . | wc -l

essex => folsom:

 baremetal: 26
 hyperv: 9
 libvirt: 222
 vmwareapi: 18
 xenapi: 164
 * total for above: 439

folsom => grizzly:

 baremetal: 83
 hyperv: 58
 libvirt: 254
 vmwareapi: 59
 xenapi: 126
 * total for above: 580

grizzly => havana:

 baremetal: 48
 hyperv: 55
 libvirt: 157
 vmwareapi: 105
 xenapi: 123
 * total for above: 488

havana => icehouse:

 baremetal: 45
 hyperv: 42
 libvirt: 212
 vmwareapi: 121
 xenapi: 100
 * total for above: 520

icehouse => master:

 baremetal: 26
 hyperv: 32
 libvirt: 188
 vmwareapi: 121
 xenapi: 71
 * total for above: 438

---

A couple of things jump out at me from the numbers:

-drivers that are being deprecated (baremetal) still have lots of
changes. Some of these changes are valid bug fixes for the driver but a
majority of them are actually related to internal cleanups and interface
changes. This goes towards the fact that Nova isn't mature enough to do
a split like this yet.

-the number of commits landed isn't growing *that* much across releases
in the virt driver trees. Presumably we think we were doing a better job
2 years ago? But the number of changes in the virt trees is largely the
same... perhaps this is because people aren't submitting stuff because
they are frustrated though?

---

For comparison here are the total number of commits for each Nova
release (includes the above commits):

 essex -> folsom: 1708
 folsom -> grizzly: 2131
 grizzly -> havana: 2188
 havana -> icehouse: 1696
 icehouse -> master: 1493

---

So say around 30% of the commits for a given release touch the virt
drivers themselves.. many of them aren't specifically related to the
virt drivers. Rather just general Nova internal cleanups because the
interfaces aren't stable.
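As a sanity check on the "around 30%" estimate, the short sketch below recomputes the per-release virt driver totals and their share of all Nova commits, using only the figures quoted in this mail. The variable and function names are made up for illustration; nothing here queries a real repository.

```python
# Per-driver commit counts copied from the mail, keyed by release range.
VIRT_COMMITS = {
    "essex => folsom": {"baremetal": 26, "hyperv": 9, "libvirt": 222,
                        "vmwareapi": 18, "xenapi": 164},
    "folsom => grizzly": {"baremetal": 83, "hyperv": 58, "libvirt": 254,
                          "vmwareapi": 59, "xenapi": 126},
    "grizzly => havana": {"baremetal": 48, "hyperv": 55, "libvirt": 157,
                          "vmwareapi": 105, "xenapi": 123},
    "havana => icehouse": {"baremetal": 45, "hyperv": 42, "libvirt": 212,
                           "vmwareapi": 121, "xenapi": 100},
    "icehouse => master": {"baremetal": 26, "hyperv": 32, "libvirt": 188,
                           "vmwareapi": 121, "xenapi": 71},
}

# Total Nova commits per release range, also copied from the mail.
NOVA_TOTALS = {
    "essex => folsom": 1708,
    "folsom => grizzly": 2131,
    "grizzly => havana": 2188,
    "havana => icehouse": 1696,
    "icehouse => master": 1493,
}


def virt_total(release):
    """Sum the commits that touched any of the five virt driver trees."""
    return sum(VIRT_COMMITS[release].values())


def virt_share(release):
    """Percentage of all Nova commits that touched a virt driver tree."""
    return round(100.0 * virt_total(release) / NOVA_TOTALS[release], 1)


if __name__ == "__main__":
    for release in VIRT_COMMITS:
        print(f"{release}: {virt_total(release)} of "
              f"{NOVA_TOTALS[release]} commits ({virt_share(release)}%)")
```

With these inputs the share lands between roughly 22% and 31% per release range, so "around 30%" fits the two most recent ranges and is somewhat high for the earlier ones.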
And while splitting Nova virt drivers might help out some I'm not sure
it helps the general Nova issue in that we have more reviews with fewer
of the good ones landing. Nova is a weird beast at the moment and just
splitting things like this is probably going to harm as much as it helps
(like we saw with Ironic) unless we stabilize the APIs... and even then
I'm skeptical of death by a million tiny sub-projects. I'm just not
convinced this is the #1 pain point around Nova reviews. What about the
other 70%?

For me a lot of the frustration with reviews is around test/gate time,
pushing things through, rechecks, etc... and if we break something it
takes just as much time to get the revert in. The last point (the
ability to revert code quickly) is a really important one as it
sometimes takes days to get a simple (obvious) revert landed. This
leaves groups like TripleO, who have their own CI and 3rd party testing
systems which are also capable of finding many critical issues, in the
difficult position of having to revert/cherry pick critical changes for
days at a time in order to keep things running.

Maybe I'm impatient (I totally am!) but I see much of the review slowdown as a result of the feedbac
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 05:27:06PM +, Alessandro Pilotti wrote:
> This means that if we reach a point in which we agree to spin off the
> drivers in separate core projects, we need to consider how driver
> related CIs will be still included in the Nova review process, possibly
> with voting rights when the individual CI stability allows it. Having
> each third party CI vote only on its spin-off driver project is not an
> option IMO, as it won’t catch regressions introduced in Nova that
> affect the drivers, including race conditions [5]

Yes, the 3rd party CI would still need to be run against the nova
common repos to ensure changes there don't cause regressions on the virt
drivers in question. I'd expect them to continue to be non-gating as
they are today though. The 3rd party CI would only be gating on the virt
driver repo.

> An interesting area of discussion is who is going to be part of the
> initial core teams for each new subproject. I truly appreciated the
> experience and help of the Nova core guys, so in order to allow a
> smoother transition I’d suggest having for each new project (e.g.
> nova-compute-hyperv, nova-compute-vmware, etc) an initial core team
> consisting of one or two members of the current Nova sub-team and one
> Nova core, with ideally each patch reviewed by both the domain experts
> and the Nova core. The team could then go on its way by voting its own
> members as any other OpenStack project does.

The question of precisely who should be on the core team of each virt
driver will probably vary depending on the driver. In the Xen & libvirt
cases, they are already privileged to have several nova-core members who
would naturally also be core on the virt drivers. In the VMWare / HyperV
cases, the idea you mention of having a couple of existing nova cores
(temporarily) join their new teams would be a good way to bootstrap the
new team.
Beyond those cores though, I think what I'd suggest is that we look at the list of people who have contributed most code to each driver, and also the people who have reviewed most code in each driver and finally people active in the sub-team meetings. From those lists identify approx 5-10 top candidates to form the nucleus of the new team. Once up & running for a few months they can then look to promote any other candidates who show commitment to the driver in question. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Wed, Sep 10, 2014 at 12:41:44PM -0700, Vishvananda Ishaya wrote:
>
> On Sep 5, 2014, at 4:12 AM, Sean Dague wrote:
>
> > On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> >>
> >>
> >> Just some things to think about with regards to the whole idea, by no
> >> means exhaustive.
> >
> > So maybe the better question is: what are the top sources of technical
> > debt in Nova that we need to address? And if we did, everyone would be
> > more sane, and feel less burnt.
> >
> > Maybe the drivers are the worst debt, and jettisoning them makes them
> > someone else's problem, so that helps some. I'm not entirely convinced
> > right now.
> >
> > I think Cells represents a lot of debt right now. It doesn't fully work
> > with the rest of Nova, and produces a ton of extra code paths special
> > cased for the cells path.
> >
> > The Scheduler has a ton of debt as has been pointed out by the efforts
> > in and around Gantt. The focus has been on the split, but realistically
> > I'm with Jay in that we should focus on the debt, and exposing a REST
> > interface in Nova.
> >
> > What about the Nova objects transition? That continues to be slow
> > because it's basically Dan (with a few other helpers from time to time).
> > Would it be helpful if we did an all hands on deck transition of the
> > rest of Nova for K1 and just get it done? Would be nice to have the bulk
> > of Nova core working on one thing like this and actually be in shared
> > context with everyone else for a while.
>
> In my mind, splitting helps with all of these things. A lot of the cleanup
> related work is completely delayed because the review queue starts to seem
> like an insurmountable hurdle. There are various cleanups needed in the
> drivers as well but they are not progressing due to the glacial pace we
> are moving right now. Some examples: Vmware spawn refactor, Hyper-v bug
> fixes, Libvirt resize/migrate (this is still using ssh to copy data!)
>
> People need smaller areas of work. And they need a sense of pride and
> ownership of the things that they work on. In my mind that is the best
> way to ensure success.

I do like to look at past experience for guidance, and with Nova we have
had a history of splitting out pieces of code and I think it is fair to
say that all those splits have been very successful for both sides (the
new project and Nova). E.g. if we look at the size and scope of the
cinder project & team today, I don't think it could ever have grown to
that scale if it had remained part of Nova. Splitting it out unleashed
its latent potential for success.

Regards,
Daniel
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 2014-09-10 12:19:08 -0700 (-0700), Vishvananda Ishaya wrote: > I don’t think this is a viable option for us, but if we were going > to do it, we would probably be better off using > https://code.google.com/p/rietveld/ as a base, since it is > actually written in python. The proposal floated in Atlanta was to write a new python-based front-end built on Gerrit's API layer (in fact, at least one such alternative front-end now exists in the form of gertty, but that's console-oriented and so probably not to everyone's tastes). I'll let the vinz developers speak to their plans and current progress though. -- Jeremy Stanley
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Sep 5, 2014, at 4:12 AM, Sean Dague wrote:

> On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
>>
>>
>> Just some things to think about with regards to the whole idea, by no
>> means exhaustive.
>
> So maybe the better question is: what are the top sources of technical
> debt in Nova that we need to address? And if we did, everyone would be
> more sane, and feel less burnt.
>
> Maybe the drivers are the worst debt, and jettisoning them makes them
> someone else's problem, so that helps some. I'm not entirely convinced
> right now.
>
> I think Cells represents a lot of debt right now. It doesn't fully work
> with the rest of Nova, and produces a ton of extra code paths special
> cased for the cells path.
>
> The Scheduler has a ton of debt as has been pointed out by the efforts
> in and around Gantt. The focus has been on the split, but realistically
> I'm with Jay in that we should focus on the debt, and exposing a REST
> interface in Nova.
>
> What about the Nova objects transition? That continues to be slow
> because it's basically Dan (with a few other helpers from time to time).
> Would it be helpful if we did an all hands on deck transition of the
> rest of Nova for K1 and just get it done? Would be nice to have the bulk
> of Nova core working on one thing like this and actually be in shared
> context with everyone else for a while.

In my mind, splitting helps with all of these things. A lot of the
cleanup related work is completely delayed because the review queue
starts to seem like an insurmountable hurdle. There are various cleanups
needed in the drivers as well but they are not progressing due to the
glacial pace we are moving right now. Some examples: Vmware spawn
refactor, Hyper-v bug fixes, Libvirt resize/migrate (this is still using
ssh to copy data!)

People need smaller areas of work. And they need a sense of pride and
ownership of the things that they work on. In my mind that is the best
way to ensure success.
Vish
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Sep 4, 2014, at 8:33 AM, Daniel P. Berrange wrote:

> On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote:
>> Hi,
>> I do not think that Nova is in a death spiral. I just think that the
>> current way of working at the moment is strangling the project. I do not
>> understand why we need to split drivers out of the core project. Why not
>> have the ability to provide 'core review' status to people for reviewing
>> those parts of the code? We have enough talented people in OpenStack to be
>> able to write a driver above gerrit to enable that.
>
> The consensus view at the summit was that, having tried & failed at getting
> useful changes into gerrit, it is not a viable option unless we undertake a
> permanent fork of the code base. There didn't seem to be any appetite for
> maintaining & developing a large java app ourselves. So people were looking
> to start writing a replacement for gerrit from scratch (albeit reusing the
> database schema).

I don’t think this is a viable option for us, but if we were going to do
it, we would probably be better off using
https://code.google.com/p/rietveld/ as a base, since it is actually
written in python.

Vish
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Sep 4, 2014, at 3:24 AM, Daniel P. Berrange wrote:

> Position statement
> ==================
>
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
>
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.

I am highly in favor of this approach (and have been for at least a
year). Every time we have brought this up in the past there has been
concern about the shared code, but we have to make a change. We have
tried various other approaches and none of them have made a dent.

+1000

Vish

>
>
> Background information
> ======================
>
> I see many factors coming together to form the crisis
>
>  - Burn out of core team members from over work
>  - Difficulty bringing new talent into the core team
>  - Long delay in getting code reviewed & merged
>  - Marginalization of code areas which aren't popular
>  - Increasing size of nova code through new drivers
>  - Exclusion of developers without corporate backing
>
> Each item on its own may not seem too bad, but combined they
> add up to a big problem.
>
> Core team burn out
> ------------------
>
> Having been involved in Nova for several dev cycles now, it is clear
> that the backlog of code up for review never goes away.
> Even intensive code review efforts at various points in the dev cycle
> make only a small impact on the backlog. This has a pretty
> significant impact on core team members, as their work is never
> done. At best, the dial is sometimes set to 10, instead of 11.
>
> Many people, myself included, have built tools to help deal with
> the reviews in a more efficient manner than plain gerrit allows
> for. These certainly help, but they can't ever solve the problem
> on their own - just make it slightly more bearable. And this is
> not even considering that core team members might have useful
> contributions to make in ways beyond just code review. Ultimately
> the workload is just too high to sustain the levels of review
> required, so core team members will eventually burn out (as they
> have done many times already).
>
> Even if one person attempts to take the initiative to heavily
> invest in review of certain features it is often to no avail.
> Unless a second dedicated core reviewer can be found to 'tag
> team' it is hard for one person to make a difference. The end
> result is that a patch is +2d and then sits idle for weeks or
> more until a merge conflict requires it to be reposted at which
> point even that one +2 is lost. This is a pretty demotivating
> outcome for both reviewers & the patch contributor.
>
>
> New core team talent
> --------------------
>
> It can't escape attention that the Nova core team does not grow
> in size very often. When Nova was younger and its code base was
> smaller, it was easier for contributors to get onto core because
> the base level of knowledge required was that much smaller. To
> get onto core today requires a major investment in learning Nova
> over a year or more. Even people who potentially have the latent
> skills may not have the time available to invest in learning the
> entirety of Nova.
>
> With the number of reviews proposed to Nova, the core team should
> probably be at least double its current size[1].
> There is plenty of expertise in the project as a whole but it is
> typically focused into specific areas of the codebase. There is
> nowhere we can find 20 more people with broad knowledge of the
> codebase who could be promoted even over the next year, let alone
> today. This is ignoring that many existing members of core are
> relatively inactive due to burnout and so need replacing. That
> means we really need another 25-30 people for core. That's not
> going to happen.
>
>
> Code review delays
> ------------------
>
> The obvious result of having too much work for too few reviewers
> is that code contributors face major delays in getting their work
> reviewed and merged. From personal experience, during Juno, I've
> probably spent 1 week in aggregate on actual code development vs
> 8 weeks on waiting on code review. You have to constantly be on
> alert for review comments because unless you can respond quickly
> (and repost) while you still have the at
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 9/8/14, 7:23 PM, "Sylvain Bauza" wrote: > >Le 08/09/2014 18:06, Steven Dake a écrit : >> On 09/05/2014 06:10 AM, Sylvain Bauza wrote: >>> >>> Le 05/09/2014 12:48, Sean Dague a écrit : On 09/05/2014 03:02 AM, Sylvain Bauza wrote: > Le 05/09/2014 01:22, Michael Still a écrit : >> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange >> wrote: >> >> [Heavy snipping because of length] >> >>> The radical (?) solution to the nova core team bottleneck is thus >>>to >>> follow this lead and split the nova virt drivers out into separate >>> projects and delegate their maintainence to new dedicated teams. >>> >>>- Nova becomes the home for the public APIs, RPC system, >>>database >>> persistent and the glue that ties all this together with the >>> virt driver API. >>> >>>- Each virt driver project gets its own core team and is >>> responsible >>> for dealing with review, merge & release of their codebase. >> I think this is the crux of the matter. We're not doing a great >> job of >> landing code at the moment, because we can't keep up with the review >> workload. >> >> So far we've had two proposals mooted: >> >>- slots / runways, where we try to rate limit the number of >>things >> we're trying to review at once to maintain focus >>- splitting all the virt drivers out of the nova tree > Ahem, IIRC, there is a third proposal for Kilo : > - create subteam's half-cores responsible for reviewing patch's > iterations and send to cores approvals requests once they consider >the > patch enough stable for it. > > As I explained, it would allow to free up reviewing time for cores > without loosing the control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point is in core is to have trust over the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. 
>>> >>> Well, this thread is becoming huge so that's becoming hard to follow >>> all the discussion but I explained the idea elsewhere. Let me just >>> provide it here too : >>> The idea is *not* to land patches by the halfcores. Core team will >>> still be fully responsible for approving patches. The main problem in >>> Nova is that cores are spending lots of time because they review each >>> iteration of a patch, and also have to look at if a patch is good or >>> not. >>> >>> That's really time consuming, and for most of the time, quite >>> frustrating as it requires to follow the patch's life, so there are >>> high risks that your core attention is becoming distracted over the >>> life of the patch. >>> >>> Here, the idea is to reduce dramatically this time by having teams >>> dedicated to specific areas (as it's already done anyway for the >>> various majority of reviewers) who could on their own take time for >>> reviewing all the iterations. Of course, that doesn't mean cores >>> would loose the possibility to specifically follow a patch and bypass >>> the halfcores, that's just for helping them if they're overwhelmed. >>> >>> About the question of trusting cores or halfcores, I can just say >>> that Nova team is anyway needing to grow up or divide it so the >>> trusting delegation has to be real anyway. >>> >>> This whole process is IMHO very encouraging for newcomers because >>> that creates dedicated teams that could help them to improve their >>> changes, and not waiting 2 months for getting a -1 and a frank reply. >>> >>> >> Interesting idea, but having been core on Heat for ~2 years, it is >> critical to be involved in the review from the beginning of the patch >> set. Typically you won't see core reviewer's participate in a review >> that is already being handled by two core reviewers. 
>> >> The reason it is important from the beginning of the change request is >> that the project core can store the iterations and purpose of the >> change in their heads. Delegating all that up front work to a >> non-core just seems counter to the entire process of code reviews. >> Better would be reduce the # of reviews in the queue (what is proposed >> by this change) or trust new reviewers "faster". I'm not sure how you >> do that - but this second model is what your proposing. >> >> I think one thing that would be helpful is to point out somehow in the >> workflow that two core reviewers are involved in the review so core >> reviewers don't have to sift through 10 pages of reviews to find new >> work. >> > >Now that the specs repo is in place and has been proved with Juno, most >of the design stage is approved before the implementation is going. If >the cores are getting more time because they wouldn't be focused on each >single patchset, they could really find some pat
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Le 08/09/2014 18:06, Steven Dake a écrit : On 09/05/2014 06:10 AM, Sylvain Bauza wrote: Le 05/09/2014 12:48, Sean Dague a écrit : On 09/05/2014 03:02 AM, Sylvain Bauza wrote: Le 05/09/2014 01:22, Michael Still a écrit : On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote: [Heavy snipping because of length] The radical (?) solution to the nova core team bottleneck is thus to follow this lead and split the nova virt drivers out into separate projects and delegate their maintainence to new dedicated teams. - Nova becomes the home for the public APIs, RPC system, database persistent and the glue that ties all this together with the virt driver API. - Each virt driver project gets its own core team and is responsible for dealing with review, merge & release of their codebase. I think this is the crux of the matter. We're not doing a great job of landing code at the moment, because we can't keep up with the review workload. So far we've had two proposals mooted: - slots / runways, where we try to rate limit the number of things we're trying to review at once to maintain focus - splitting all the virt drivers out of the nova tree Ahem, IIRC, there is a third proposal for Kilo : - create subteam's half-cores responsible for reviewing patch's iterations and send to cores approvals requests once they consider the patch enough stable for it. As I explained, it would allow to free up reviewing time for cores without loosing the control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point is in core is to have trust over the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. Well, this thread is becoming huge so that's becoming hard to follow all the discussion but I explained the idea elsewhere. Let me just provide it here too : The idea is *not* to land patches by the halfcores. 
Core team will still be fully responsible for approving patches. The main problem in Nova is that cores are spending lots of time because they review each iteration of a patch, and also have to judge whether a patch is good or not. That's really time consuming and, most of the time, quite frustrating, as it requires following the patch's life, so there is a high risk that a core's attention gets distracted over the life of the patch. Here, the idea is to dramatically reduce this time by having teams dedicated to specific areas (as is already the case anyway for the vast majority of reviewers) who could take the time on their own to review all the iterations. Of course, that doesn't mean cores would lose the possibility to specifically follow a patch and bypass the halfcores; that's just for helping them if they're overwhelmed. About the question of trusting cores or halfcores, I can just say that the Nova team needs to either grow or divide anyway, so the delegation of trust has to be real anyway. This whole process is IMHO very encouraging for newcomers because it creates dedicated teams that could help them improve their changes, rather than waiting 2 months to get a -1 and a frank reply. Interesting idea, but having been core on Heat for ~2 years, it is critical to be involved in the review from the beginning of the patch set. Typically you won't see core reviewers participate in a review that is already being handled by two core reviewers. The reason it is important from the beginning of the change request is that the project core can store the iterations and purpose of the change in their heads. Delegating all that up front work to a non-core just seems counter to the entire process of code reviews. Better would be to reduce the # of reviews in the queue (what is proposed by this change) or trust new reviewers "faster". I'm not sure how you do that - but this second model is what you're proposing.
I think one thing that would be helpful is to point out somehow in the workflow that two core reviewers are involved in the review so core reviewers don't have to sift through 10 pages of reviews to find new work. Now that the specs repo is in place and has been proven with Juno, most of the design stage is approved before the implementation starts. If the cores get more time because they aren't focused on every single patchset, they could really find some patches they would like to look at, or they could just wait for the half-approvals from the halfcores. If a core thinks that a patch is tricky enough to warrant looking at each iteration, I don't see anything bad in that. At least, it's up to the core reviewer to choose which patches he could look at, and he would be freer than under the slots proposal. I'm a core on a tiny project but I know how time consuming it is. I would really enjoy if I could delegate
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 06:10 AM, Sylvain Bauza wrote: On 05/09/2014 12:48, Sean Dague wrote: On 09/05/2014 03:02 AM, Sylvain Bauza wrote: On 05/09/2014 01:22, Michael Still wrote: On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote: [Heavy snipping because of length] The radical (?) solution to the nova core team bottleneck is thus to follow this lead and split the nova virt drivers out into separate projects and delegate their maintenance to new dedicated teams. - Nova becomes the home for the public APIs, RPC system, database persistence and the glue that ties all this together with the virt driver API. - Each virt driver project gets its own core team and is responsible for dealing with review, merge & release of their codebase. I think this is the crux of the matter. We're not doing a great job of landing code at the moment, because we can't keep up with the review workload. So far we've had two proposals mooted: - slots / runways, where we try to rate limit the number of things we're trying to review at once to maintain focus - splitting all the virt drivers out of the nova tree Ahem, IIRC, there is a third proposal for Kilo: - create subteams of half-cores responsible for reviewing a patch's iterations and sending approval requests to cores once they consider the patch stable enough. As I explained, it would free up reviewing time for cores without losing control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point of core is to have trust in the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. Well, this thread is becoming huge, so it's becoming hard to follow all the discussion, but I explained the idea elsewhere. Let me just provide it here too: The idea is *not* to land patches by the halfcores.
Core team will still be fully responsible for approving patches. The main problem in Nova is that cores are spending lots of time because they review each iteration of a patch, and also have to judge whether a patch is good or not. That's really time consuming and, most of the time, quite frustrating, as it requires following the patch's life, so there is a high risk that a core's attention gets distracted over the life of the patch. Here, the idea is to dramatically reduce this time by having teams dedicated to specific areas (as is already the case anyway for the vast majority of reviewers) who could take the time on their own to review all the iterations. Of course, that doesn't mean cores would lose the possibility to specifically follow a patch and bypass the halfcores; that's just for helping them if they're overwhelmed. About the question of trusting cores or halfcores, I can just say that the Nova team needs to either grow or divide anyway, so the delegation of trust has to be real anyway. This whole process is IMHO very encouraging for newcomers because it creates dedicated teams that could help them improve their changes, rather than waiting 2 months to get a -1 and a frank reply. Interesting idea, but having been core on Heat for ~2 years, it is critical to be involved in the review from the beginning of the patch set. Typically you won't see core reviewers participate in a review that is already being handled by two core reviewers. The reason it is important from the beginning of the change request is that the project core can store the iterations and purpose of the change in their heads. Delegating all that up front work to a non-core just seems counter to the entire process of code reviews. Better would be to reduce the # of reviews in the queue (what is proposed by this change) or trust new reviewers "faster". I'm not sure how you do that - but this second model is what you're proposing.
I think one thing that would be helpful is to point out somehow in the workflow that two core reviewers are involved in the review so core reviewers don't have to sift through 10 pages of reviews to find new work. Regards, -steve As I said elsewhere, I dislike the slots proposal because it sends developers the message that the price to pay for contributing to Nova is increasing. Again, prioritizing does not by itself increase your velocity; those are 2 distinct subjects. -Sylvain -Sean ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
>> The last few days have been interesting as I watch FFEs come through. >> People post explaining their feature, its importance, and the risk >> associated with it. Three cores sign on for review. All of the ones >> I've looked at have received active review since being posted. Would >> it be bonkers to declare nova to be in "permanent feature freeze"? If >> we could maintain the level of focus we see now, then we'd be getting >> heaps more done than before. > > Agreed. Honestly, this has been a really nice flow. I'd love to figure > out what part of this focus is capturable for normal cadence. This > realistically is what I was hoping slots would provide, because I feel > like we actually move really fast when we call out 5-10 things to go > look at this week. The funny thing is, last week I was thinking how similar FF is to what slots/runways would likely provide. That is, intense directed focus on a single thing by a group of people until it's merged (or fails). Context is kept between iterations because everyone is on board for quick iterations with minimal distraction between them. It *does* work during FF, as we've seen in the past -- I'd expect we have nearly 100% merge rate of FFEs. How we arrive at a thing getting focus is different in slots/runways, but I feel the result could be the same. Splitting out the virt drivers is an easy way to make the life of a core much easier, but I think the negative impacts are severe and potentially irreversible, so I'd rather make sure we're totally out of options before we exercise it. --Dan
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, 2014-09-05 at 14:14 +0200, Thierry Carrez wrote: > Daniel P. Berrange wrote: > > For a long time I've used the LKML 'subsystem maintainers' model as the > > reference point for ideas. In a more LKML like model, each virt team > > (or other subsystem team) would have their own separate GIT repo with > > a complete Nova codebase, where they did their day to day code submissions, > > reviews and merges. Periodically the primary subsystem maintainer would > > submit a large pull / merge request to the overall Nova maintainer. > > The $1,000,000 question in such a model is what kind of code review > > happens during the big pull requests to integrate subsystem trees. > > Please note that the Kernel subsystem model is actually a trust tree > based on 20 years of trust building. OpenStack is only 4 years old, so > it's difficult to apply the same model as-is. That's true but not entirely accurate. The kernel maintainership is a trust tree, but not every person in that tree has been in the position for 20 years. We have one or two who have (Dave Miller, net maintainer, for instance), but we have some newcomers: Sarah Sharp has only been on USB3.0 for a year. People pass in and out of the maintainer tree all the time. In many ways, the OpenStack core model is also a trust tree (you elect people to the core and support their nominations because you trust them to do the required job). It's not a 1 for 1 conversion, but it should be possible to derive the trust you need from the model you already have, should you wish to make OpenStack function more like the Linux Kernel. Essentially Daniel's proposal boils down to making the trust boundaries align with separated community interests to get more scaling in the model. This is very similar to the way the kernel operates: most maintainers only have expertise in their own areas. We have a few people with broad reach, like Andrew and Linus, but by and large most people settle down in a much smaller area.
However, you don't have to follow the kernel model to get this to happen, you just have to identify the natural interest boundaries of the contributors and align around them (provided they have enough mass to form their own community). James
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Daniel, Thanks for the well thought out and thorough proposal to help Nova. As an OpenStack operator/developer since Cactus time, it has definitely gotten harder and harder to get fixes in Nova for small bugs that we find running at scale with production systems. This forces us to maintain more and more custom patches in-house (or for longer periods of time). The huge amount of time necessary to shepherd patches through review discourages additional devs from contributing patches because of the amount of time investment required. I believe whatever we can do to improve the ability to fix technical debt within Nova and both keep and grow the non-core contributors of Nova would be greatly beneficial. Thanks! Nate
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, 2014-09-05 at 08:02 -0400, Sean Dague wrote: > On 09/05/2014 07:40 AM, Daniel P. Berrange wrote: > > On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote: > >> On 09/05/2014 06:40 AM, Nikola Đipanov wrote: > >>> A handy example of this I can think of is the currently granted FFE for > >>> serial consoles - consider how much of the code went into the common > >>> part vs. the libvirt specific part, I would say the ratio is very close > >>> to 1 if not even in favour of the common part (current 4 outstanding > >>> patches are all for core, and out of the 5 merged - only one of them was > >>> purely libvirt specific, assuming virt/ will live in nova-common). > >>> > >>> Joe asked a similar question elsewhere on the thread. > >>> > >>> Once again - I am not against doing it - what I am saying is that we > >>> need to look into this closer as it may not be as big of a win from the > >>> number of changes needed per feature as we may think. > >>> > >>> Just some things to think about with regards to the whole idea, by no > >>> means exhaustive. > >> > >> So maybe the better question is: what are the top sources of technical > >> debt in Nova that we need to address? And if we did, everyone would be > >> more sane, and feel less burnt. > >> > >> Maybe the drivers are the worst debt, and jettisoning them makes them > >> someone else's problem, so that helps some. I'm not entirely convinced > >> right now. > >> > >> I think Cells represents a lot of debt right now. It doesn't fully work > >> with the rest of Nova, and produces a ton of extra code paths special > >> cased for the cells path. > >> > >> The Scheduler has a ton of debt as has been pointed out by the efforts > >> in and around Gantt. The focus has been on the split, but realistically > >> I'm with Jay in that we should focus on the debt, and exposing a REST > >> interface in Nova. > >> > >> What about the Nova objects transition?
That continues to be slow > >> because it's basically Dan (with a few other helpers from time to time). > >> Would it be helpful if we did an all hands on deck transition of the > >> rest of Nova for K1 and just get it done? Would be nice to have the bulk > >> of Nova core working on one thing like this and actually be in shared > >> context with everyone else for a while. > > > > I think the idea that we can tell everyone in Nova what they should > > focus on for a cycle, or more generally, is doomed to failure. This > > isn't a closed source company controlled project where you can dictate > > what everyone's priority must be. We must accept that we rely on all our > > contributors' good will in voluntarily giving their time & resource to > > the project, to scratch whatever itch they have in the project. We have > > to encourage them to want to work on nova and demonstrate that we value > > whatever form of contribution they choose to make. If we have technical > > debt that we think is important to address we need to illustrate / > > show people why they should care about helping. If they none the less > > decide that work isn't for them, we can't just cast them aside and/or > > ignore their contributions, while we get on with other things. This > > is why I think it is important that we split up nova to allow each > > area to self-organize around what they consider to be priorities in > > their area of interest / motivation. Not enabling that is going to > > continue to kill our community > > I'm getting tired of the refrain that because we are an Open Source > project declaring priorities is pointless, because it's not. I would say > it's actually the exception that a developer wakes up in the morning and > says "I completely disregard what anyone else thinks is important in > this project, this is what I'm going to do today". Because if that's how > they felt they wouldn't choose to be part of a community, they would > just go do their own thing.
Lone wolfs by definition don't form > communities. Actually, I don't think this analysis is accurate. Some people are simply interested in small aspects of a project. It's the "scratch your own itch" part of open source. The thing which makes itch scratchers not lone wolfs is the desire to go the extra mile to make what they've done useful to the community. If they never do this, they likely have a forked repo with only their changes (and are the epitome of a lone wolf). If you scratch your own itch and make the effort to get it upstream, you're assisting the community (even if that's the only piece of code you do) and that assistance makes you (at least for a time) part of the community. A community doesn't necessarily require continuity from all its elements. It requires continuity from some (the core, if you will), but it also allows for contributions from people who only have one or two things they need doing. For OpenStack to convert its users into its contributors, it is going to have to embrace this, because they likely only need a couple of things fixi
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 03:01 PM, Russell Bryant wrote: On 09/05/2014 10:06 AM, Jay Pipes wrote: On 09/05/2014 06:29 AM, John Garbutt wrote: Scheduler: I think we need to split out the scheduler with a similar level of urgency. We keep blocking features on the split, because we know we don't have the review bandwidth to deal with them. Right now I am talking about a compute related scheduler in the compute program, that might evolve to worry about other services at a later date. -1 Without first cleaning up the interfaces around resource tracking, claim creation and processing, and the communication interfaces between the nova-conductor, nova-scheduler, and nova-compute. I see no urgency at all in splitting out the scheduler. The cleanup of the interfaces around the resource tracker and scheduler has great priority, though, IMO. I'd just reframe things ... I'd like the work you're referring to here be treated as an obvious key pre-requisite to a split, and this cleanup is what should be treated with urgency by those with a vested interest in getting more autonomy around scheduler development. Sure, that's a perfectly gentle way of putting it :) Thanks! -jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 10:06 AM, Jay Pipes wrote: > On 09/05/2014 06:29 AM, John Garbutt wrote: >> Scheduler: I think we need to split out the scheduler with a similar >> level of urgency. We keep blocking features on the split, because we >> know we don't have the review bandwidth to deal with them. Right now I >> am talking about a compute related scheduler in the compute program, >> that might evolve to worry about other services at a later date. > > -1 > > Without first cleaning up the interfaces around resource tracking, claim > creation and processing, and the communication interfaces between the > nova-conductor, nova-scheduler, and nova-compute. > > I see no urgency at all in splitting out the scheduler. The cleanup of > the interfaces around the resource tracker and scheduler has great > priority, though, IMO. I'd just reframe things ... I'd like the work you're referring to here be treated as an obvious key pre-requisite to a split, and this cleanup is what should be treated with urgency by those with a vested interest in getting more autonomy around scheduler development. -- Russell Bryant
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 03:52 AM, Daniel P. Berrange wrote: So my biggest fear with a model where each team had their own full Nova tree and did large pull requests, is that we'd suffer major pain during the merging of large pull requests, especially if any of the merges touched common code. It could make the pull requests take a really long time to get accepted into the primary repo. By contrast with split out git repos per virt driver code, we will only ever have 1 stage of code review for each patch. Changes to common code would go straight to main nova common repo and so get reviewed by the experts there without delay, avoiding the 2nd stage of review from merge requests. Why treat things differently? It seems to me that even in the first scenario you could still send common code changes straight to the main nova repo. Then the pulls from the virt repo would literally only touch the virt code in the common repo. Chris
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Well, I and I believe a few others feel a slightly higher sense of urgency about splitting out the scheduler but I don't want to hijack this thread for that debate. Fair warning, I intend to start a new thread where we can talk specifically about the scheduler split, I'm afraid we're in the situation where we're all in agreement but everyone has a different view of what that agreement is. -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Jay Pipes [mailto:jaypi...@gmail.com] Sent: Friday, September 5, 2014 8:07 AM To: openstack-dev@lists.openstack.org Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers On 09/05/2014 06:29 AM, John Garbutt wrote: > Scheduler: I think we need to split out the scheduler with a similar > level of urgency. We keep blocking features on the split, because we > know we don't have the review bandwidth to deal with them. Right now I > am talking about a compute related scheduler in the compute program, > that might evolve to worry about other services at a later date. -1 Without first cleaning up the interfaces around resource tracking, claim creation and processing, and the communication interfaces between the nova-conductor, nova-scheduler, and nova-compute. I see no urgency at all in splitting out the scheduler. The cleanup of the interfaces around the resource tracker and scheduler has great priority, though, IMO. Best, -jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 10:25:09AM -0500, Kevin L. Mitchell wrote: > On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote: > > > 2. Removal of drivers other than the reference implementation for each > > > project could be the healthiest option > > > a. Requires transparent, public, automated 3'rd party CI > > > b. Requires a TRUE plugin architecture and mentality > > > c. Requires a stable and well defined API > > > > As mentioned in the original mail I don't want to see a situation where > > we end up with some drivers in tree and others out of tree as it sets up > > bad dynamics within the project. Those out of tree will always have the > > impression of being second class citizens and thus there will be constant > > pressure to accept drivers back into tree. The so called 'reference' > > driver that stayed in tree would also continue to be penalized in the > > way it is today, and so its development would be disadvantaged compared > > to the out of tree drivers. > > I have one quibble with the notion of "not even one" driver in core: I > think it is probably useful to include a dummy, do-nothing driver that > can be used for in-tree functional tests and as an example to point > those interested in writing a driver. Then, the "second-class citizen" > is the one actually in the tree :) Beyond that, I agree with this > proposal: it has never made sense to me that *all* drivers live in the > tree, and it actually offends my sense of organization to have the tree > so cluttered; we split functions when they get too big, we split modules > when they get too big, and we create subdirectories when packages get > too big, so why not split repos when they get too big? Oh sure, having a "fake virt" driver in tree is fine and indeed desirable for the reasons you mention. I was exclusively thinking about the real virt drivers in my earlier statement. 
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
> I look at what we do with Ironic testing currently as a guide here. > We have a tempest job that runs against Nova, that validates changes > to nova don't break the separate Ironic git repo. So my thought > is that all our current tempest jobs would simply work in that > way. IOW changes to so called "nova common" would run jobs that > validate the change against all the virt driver git repos. I think > this kind of setup is pretty much mandatory for split repos to be > viable, because I don't want to see us lose testing coverage in > this proposed change. Thanks Daniel for raising this problem. Yeah, I think that what we did with Ironic while the driver is* out of the Nova tree serves as a good example. I also think that having drivers out of the tree is possible; making the tests run against "nova-common" and assert that things didn't break is no problem. But as you described before, the process of code submission was quite painful and required a lot of effort and coordination from the Ironic and Nova teams; we would need to improve that. Another problem we will have in splitting the drivers out is that classic limitation of launchpad blueprints: we can't track tasks across multiple projects. (This will change once Storyboard is completed, I guess.) But that's all a long-term solution. In the short term I don't see any real solution yet; this thing about asking companies/projects that have a driver in Nova to help with reviews is not so bad IMO. I've started reviewing code in Nova today and will continue doing that, maybe aiming for core so that we can speed up the future reviews of the Ironic driver. Now, let me throw a crazy idea into the mix (it might be stupid, but): maybe Nova is doing much more than it should; deprecating the baremetal and network parts and splitting the scheduler out of the project helps a lot. But what if other parts were split out as well, like managing flavors, creating the instances etc...
And then Nova can be the thing that knows how to talk to/manage hypervisors only and won't have to deal with crazy cases like Ironic, where we try to make real machines look & feel like VMs to fit into Nova, because that's painful and I think we are going to have many limitations if we continue to do that (I believe the same may happen with the Docker driver). So if we have another project on top of Nova, Ironic and $CONTAINER_PROJECT_NAME** that abstracts all the rest and only talks to Nova when a VM is going to be deployed or Ironic when a baremetal machine is going to be deployed, etc... Maybe then Nova will be considerably smaller and can keep all drivers in tree (hypervisor drivers only, no Docker or Ironic). * was tempted to write 'was' there :) ** A new project that will know how to handle the containers case.
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote: > > 2. Removal of drivers other than the reference implementation for each > > project could be the healthiest option > > a. Requires transparent, public, automated 3'rd party CI > > b. Requires a TRUE plugin architecture and mentality > > c. Requires a stable and well defined API > > As mentioned in the original mail I don't want to see a situation where > we end up with some drivers in tree and others out of tree as it sets up > bad dynamics within the project. Those out of tree will always have the > impression of being second class citizens and thus there will be constant > pressure to accept drivers back into tree. The so called 'reference' > driver that stayed in tree would also continue to be penalized in the > way it is today, and so its development would be disadvantaged compared > to the out of tree drivers. I have one quibble with the notion of "not even one" driver in core: I think it is probably useful to include a dummy, do-nothing driver that can be used for in-tree functional tests and as an example to point those interested in writing a driver. Then, the "second-class citizen" is the one actually in the tree :) Beyond that, I agree with this proposal: it has never made sense to me that *all* drivers live in the tree, and it actually offends my sense of organization to have the tree so cluttered; we split functions when they get too big, we split modules when they get too big, and we create subdirectories when packages get too big, so why not split repos when they get too big? -- Kevin L. Mitchell Rackspace
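The "dummy, do-nothing driver" suggested above could look roughly like the sketch below. This is only an illustration under stated assumptions: `VirtDriver`, `NoopDriver` and the reduced three-method interface are hypothetical stand-ins and are not the real nova/virt/driver.py API or its signatures.

```python
import abc
from typing import List, Set


class VirtDriver(abc.ABC):
    """Hypothetical, heavily reduced stand-in for a virt driver interface."""

    @abc.abstractmethod
    def spawn(self, instance_id: str) -> None:
        """Bring an instance into existence."""

    @abc.abstractmethod
    def destroy(self, instance_id: str) -> None:
        """Remove an instance."""

    @abc.abstractmethod
    def list_instances(self) -> List[str]:
        """Return the ids of all known instances."""


class NoopDriver(VirtDriver):
    """Do-nothing driver: records calls in memory, touches no hypervisor."""

    def __init__(self) -> None:
        self._instances: Set[str] = set()

    def spawn(self, instance_id: str) -> None:
        self._instances.add(instance_id)

    def destroy(self, instance_id: str) -> None:
        # Destroying an unknown instance is silently ignored.
        self._instances.discard(instance_id)

    def list_instances(self) -> List[str]:
        return sorted(self._instances)


# Functional-test style usage: exercise the common code path without a
# real hypervisor behind it.
driver = NoopDriver()
driver.spawn("vm-1")
driver.spawn("vm-2")
driver.destroy("vm-1")
print(driver.list_instances())
```

Because the abstract base class refuses to instantiate a subclass that misses a method, a driver like this would also double as a cheap check that the in-tree example keeps up with the interface.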
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 06:29 AM, John Garbutt wrote: Scheduler: I think we need to split out the scheduler with a similar level of urgency. We keep blocking features on the split, because we know we don't have the review bandwidth to deal with them. Right now I am talking about a compute related scheduler in the compute program, that might evolve to worry about other services at a later date. -1 Without first cleaning up the interfaces around resource tracking, claim creation and processing, and the communication interfaces between the nova-conductor, nova-scheduler, and nova-compute. I see no urgency at all in splitting out the scheduler. The cleanup of the interfaces around the resource tracker and scheduler has great priority, though, IMO. Best, -jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
> > > - Each virt driver project gets its own core team and is responsible > for dealing with review, merge & release of their codebase. > > Note, I really do mean *all* virt drivers should be separate. I do > not want to see some virt drivers split out and others remain in tree > because I feel that signifies that the out of tree ones are second > class citizens. +1. I made this same proposal to Michael during the mid-cycle. However, I haven't wanted to conflate this issue with bringing Docker back into Nova. For the Docker driver in particular, I feel that being able to stay out of tree and having our own core team would be beneficial, but I wouldn't want to do this unless it applied equally to all drivers. -- Regards, Eric Windisch
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Le 05/09/2014 15:11, Jay Pipes a écrit : On 09/05/2014 08:58 AM, Sylvain Bauza wrote: Le 05/09/2014 14:48, Jay Pipes a écrit : On 09/05/2014 02:59 AM, Sylvain Bauza wrote: Le 05/09/2014 01:26, Jay Pipes a écrit : On 09/04/2014 10:33 AM, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following: == begin == - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova. == end == In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible. Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO. Jay, we agreed on a plan to carry on, please be sure we're working on it, see the Gantt meetings logs for what my vision is. I've attended most of the Gantt meetings, except for a couple recent ones due to my house move (finally done, yay!). I believe we are mostly aligned on the plan of record, but I see no urgency in splitting out the scheduler. I only see urgency on cleaning up the interfaces. But, that said, let's not highjack Dan's thread here too much. We can discuss on IRC. 
I was only saying that Don's comment that splitting the scheduler out would help solve the bandwidth issues should be predicated on the same contingency that Dan placed on splitting out the virt drivers: that the internal interfaces be cleaned up, documented and stabilized. So, this effort requires at least one cycle, and as Dan stated, there is urgency, so I think we need to identify a short-term solution which doesn't require refactoring. My personal opinion is what Russell and Thierry expressed, ie. subteam delegation (to what I call "half-cores") for iterations and only approvals for cores. Yeah, I don't have much of an issue with the subteam delegation proposals. It's just really a technical problem to solve w.r.t. Gerrit permissions. Well, that just requires new Gerrit groups and a new label (like Subteam-Approved) so that members of this group could just +Subteam-Approved if they're OK (here I imagine 2 people from the group labelling it) And what about code that crosses module boundaries? Would we need a LibvirtSubteamApproved, SchedulerSubteamApproved, etc? Luckily not. I think we only need one more label (we only have 3 now : Verified, Code-Review, Approved). Here the key thing is having a search label that cores can consume because they know that this label is worth of interest. If something is crosses module, then that's something that probably a core would help. For example, if I'm an API halfcore, I can subteam-approve all the changes related to the API itself (so that encourages small and readable patches btw.) but I leave my turn if I'm looking at something I don't know enough (or I provide +1) The porting idea is to encourage reviewing because the step is not so high as if I wanted to be core. On the other hand, if an halfcore is becoming enough trustable (because he also provides good +1s for other areas and is enough involved in the release process), then this folk is a good candidate for becoming core. 
As you identified, most of the proposal is based on a gentle-person's agreement because Gerrit is not flexible enough to enforce it (although since 2.8, you can search all patches related to a path, like file:^nova/scheduler/*) -Sylvain Of course, all the groups could have permissions to label any file of Nova, but here we can just define a gentleman's agreement, like we do for having two +2s before approving. Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce this kind of policy, that's what I was getting at. That would say that cores could just search using Gerrit with 'label:Subteam-Approved>=1' Interesting, yes, that would be useful. -jay -Sylvain Best, -jay
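The search that cores would run under this proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Subteam-Approved` is the label proposed in this thread and does not exist in Gerrit by default, `review.example.org` is a placeholder host, and the function names are made up for the example. The `status:`, `project:`, `label:` and `file:` search operators and the `/changes/?q=` REST endpoint are standard Gerrit features. The path is written as the regular expression `^nova/scheduler/.*`, since a `file:` value starting with `^` is treated as a regex.

```python
from urllib.parse import urlencode


def build_gerrit_query(project, label, min_score, path_regex=None):
    """Assemble a Gerrit search string for open changes carrying a label.

    `label` is whatever label the subteam process would use; the thread
    proposes "Subteam-Approved", which is hypothetical.
    """
    terms = [
        "status:open",
        f"project:{project}",
        f"label:{label}>={min_score}",
    ]
    if path_regex:
        # Restrict to changes touching files that match the regex.
        terms.append(f"file:{path_regex}")
    return " ".join(terms)


def changes_url(base_url, query):
    """Build the REST URL that lists changes matching `query`."""
    return f"{base_url.rstrip('/')}/changes/?{urlencode({'q': query})}"


if __name__ == "__main__":
    query = build_gerrit_query(
        "openstack/nova", "Subteam-Approved", 1, "^nova/scheduler/.*"
    )
    print(query)
    # Placeholder host, not a real review server.
    print(changes_url("https://review.example.org/", query))
```

As noted in the thread, this only helps cores find changes; nothing here stops a subteam member from labelling a change outside their area, so the path restriction remains a matter of agreement rather than enforcement.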
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
> -Original Message-
> From: Sean Dague [mailto:s...@dague.net]
> Sent: 05 September 2014 11:49
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out
> virt drivers
>
> On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
> >
> > Ahem, IIRC, there is a third proposal for Kilo :
> > - create subteam's half-cores responsible for reviewing patch's
> > iterations and send to cores approvals requests once they consider the
> > patch enough stable for it.
> >
> > As I explained, it would allow to free up reviewing time for cores
> > without loosing the control over what is being merged.
>
> I don't really understand how the half core idea works outside of a math
> equation, because the point is in core is to have trust over the judgement of
> your fellow core members so that they can land code when you aren't
> looking. I'm not sure how I manage to build up half trust in someone any
> quicker.
>
> -Sean
>

You seem to be looking at a model, Sean, where trust is purely binary - you're either trusted to know about all of Nova or not trusted at all.

What Sylvain is proposing (I think) is something more akin to having folks that are trusted in some areas of the system and/or trusted to be right enough of the time that their reviewing skills take a significant part of the burden off the core reviewers. That kind of incremental development of trust feels like a fairly natural model to me. It's some way between the full divide and rule approach of splitting out various components (which doesn't feel like a short term solution) and the blanket approach of adding more cores.

Making it easier to incrementally grant trust, and having the processes and will to remove it if it's seen to be misused, feels to me like it has to be part of the solution to breaking out of the "we need more people we trust, but we don't feel comfortable trusting more than N people at any one time".

Sometimes you have to give people a chance in small, well defined and controlled steps.

Phil
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 08:58 AM, Sylvain Bauza wrote: Le 05/09/2014 14:48, Jay Pipes a écrit : On 09/05/2014 02:59 AM, Sylvain Bauza wrote: Le 05/09/2014 01:26, Jay Pipes a écrit : On 09/04/2014 10:33 AM, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following: == begin == - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova. == end == In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible. Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO. Jay, we agreed on a plan to carry on, please be sure we're working on it, see the Gantt meetings logs for what my vision is. I've attended most of the Gantt meetings, except for a couple recent ones due to my house move (finally done, yay!). I believe we are mostly aligned on the plan of record, but I see no urgency in splitting out the scheduler. I only see urgency on cleaning up the interfaces. But, that said, let's not highjack Dan's thread here too much. We can discuss on IRC. I was only saying that Don's comment that splitting the scheduler out would help solve the bandwidth issues should be predicated on the same contingency that Dan placed on splitting out the virt drivers: that the internal interfaces be cleaned up, documented and stabilized. 
So, this effort requires at least one cycle, and as Dan stated, there is urgency, so I think we need to identify a short-term solution which doesn't require refactoring. My personal opinion is what Russell and Thierry expressed, ie. subteam delegation (to what I call "half-cores") for iterations and only approvals for cores. Yeah, I don't have much of an issue with the subteam delegation proposals. It's just really a technical problem to solve w.r.t. Gerrit permissions. Well, that just requires new Gerrit groups and a new label (like Subteam-Approved) so that members of this group could just +Subteam-Approved if they're OK (here I imagine 2 people from the group labelling it) And what about code that crosses module boundaries? Would we need a LibvirtSubteamApproved, SchedulerSubteamApproved, etc? Of course, all the groups could have permissions to label any file of Nova, but here we can just define a gentleman's agreement, like we do for having two +2s before approving. Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce this kind of policy, that's what I was getting at. That would say that cores could just search using Gerrit with 'label:Subteam-Approved>=1' Interesting, yes, that would be useful. -jay -Sylvain Best, -jay ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Le 05/09/2014 12:48, Sean Dague a écrit : On 09/05/2014 03:02 AM, Sylvain Bauza wrote: Le 05/09/2014 01:22, Michael Still a écrit : On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote: [Heavy snipping because of length] The radical (?) solution to the nova core team bottleneck is thus to follow this lead and split the nova virt drivers out into separate projects and delegate their maintainence to new dedicated teams. - Nova becomes the home for the public APIs, RPC system, database persistent and the glue that ties all this together with the virt driver API. - Each virt driver project gets its own core team and is responsible for dealing with review, merge & release of their codebase. I think this is the crux of the matter. We're not doing a great job of landing code at the moment, because we can't keep up with the review workload. So far we've had two proposals mooted: - slots / runways, where we try to rate limit the number of things we're trying to review at once to maintain focus - splitting all the virt drivers out of the nova tree Ahem, IIRC, there is a third proposal for Kilo : - create subteam's half-cores responsible for reviewing patch's iterations and send to cores approvals requests once they consider the patch enough stable for it. As I explained, it would allow to free up reviewing time for cores without loosing the control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point is in core is to have trust over the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. Well, this thread is becoming huge so that's becoming hard to follow all the discussion but I explained the idea elsewhere. Let me just provide it here too : The idea is *not* to land patches by the halfcores. Core team will still be fully responsible for approving patches. 
The main problem in Nova is that cores spend a lot of time reviewing each iteration of a patch, on top of judging whether the patch is good or not. That's really time consuming and, most of the time, quite frustrating, as it requires following the patch through its whole life, so there is a high risk that a core's attention drifts away over the life of the patch.

Here, the idea is to reduce this time dramatically by having teams dedicated to specific areas (as is already the case anyway for the vast majority of reviewers) who could take the time on their own to review all the iterations. Of course, that doesn't mean cores would lose the possibility to follow a specific patch and bypass the halfcores; this is just to help them if they're overwhelmed.

On the question of trusting cores or halfcores, I can just say that the Nova team needs to either grow or be divided anyway, so the delegation of trust has to become real either way.

This whole process is IMHO very encouraging for newcomers because it creates dedicated teams that can help them improve their changes, instead of having them wait 2 months to get a -1 and a frank reply.

As I said elsewhere, I dislike the slots proposal because it sends developers the message that the price to pay for contributing to Nova is increasing. Again, prioritizing does not by itself increase your velocity; those are 2 distinct subjects.

-Sylvain

-Sean
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Le 05/09/2014 14:48, Jay Pipes a écrit : On 09/05/2014 02:59 AM, Sylvain Bauza wrote: Le 05/09/2014 01:26, Jay Pipes a écrit : On 09/04/2014 10:33 AM, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following: == begin == - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova. == end == In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible. Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO. Jay, we agreed on a plan to carry on, please be sure we're working on it, see the Gantt meetings logs for what my vision is. I've attended most of the Gantt meetings, except for a couple recent ones due to my house move (finally done, yay!). I believe we are mostly aligned on the plan of record, but I see no urgency in splitting out the scheduler. I only see urgency on cleaning up the interfaces. But, that said, let's not highjack Dan's thread here too much. We can discuss on IRC. I was only saying that Don's comment that splitting the scheduler out would help solve the bandwidth issues should be predicated on the same contingency that Dan placed on splitting out the virt drivers: that the internal interfaces be cleaned up, documented and stabilized. 
So, this effort requires at least one cycle, and as Dan stated, there is urgency, so I think we need to identify a short-term solution which doesn't require refactoring. My personal opinion is what Russell and Thierry expressed, ie. subteam delegation (to what I call "half-cores") for iterations and only approvals for cores. Yeah, I don't have much of an issue with the subteam delegation proposals. It's just really a technical problem to solve w.r.t. Gerrit permissions. Well, that just requires new Gerrit groups and a new label (like Subteam-Approved) so that members of this group could just +Subteam-Approved if they're OK (here I imagine 2 people from the group labelling it) Of course, all the groups could have permissions to label any file of Nova, but here we can just define a gentleman's agreement, like we do for having two +2s before approving. That would say that cores could just search using Gerrit with 'label:Subteam-Approved>=1' -Sylvain Best, -jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 02:59 AM, Sylvain Bauza wrote: Le 05/09/2014 01:26, Jay Pipes a écrit : On 09/04/2014 10:33 AM, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following: == begin == - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova. == end == In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible. Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO. Jay, we agreed on a plan to carry on, please be sure we're working on it, see the Gantt meetings logs for what my vision is. I've attended most of the Gantt meetings, except for a couple recent ones due to my house move (finally done, yay!). I believe we are mostly aligned on the plan of record, but I see no urgency in splitting out the scheduler. I only see urgency on cleaning up the interfaces. But, that said, let's not highjack Dan's thread here too much. We can discuss on IRC. I was only saying that Don's comment that splitting the scheduler out would help solve the bandwidth issues should be predicated on the same contingency that Dan placed on splitting out the virt drivers: that the internal interfaces be cleaned up, documented and stabilized. 
So, this effort requires at least one cycle, and as Dan stated, there is urgency, so I think we need to identify a short-term solution which doesn't require refactoring. My personal opinion is what Russell and Thierry expressed, ie. subteam delegation (to what I call "half-cores") for iterations and only approvals for cores. Yeah, I don't have much of an issue with the subteam delegation proposals. It's just really a technical problem to solve w.r.t. Gerrit permissions. Best, -jay
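The interface cleanup quoted in this message ("opaque dicts must be replaced with objects + attributes") can be illustrated with a short Python sketch. It is not Nova code: `InstanceSpec`, its fields, `VERSION` and `spec_from_legacy_dict` are hypothetical names chosen for the example, and a plain standard-library dataclass stands in for whatever object model the project actually uses. The point is only to show how a typed object makes the contract between a driver and its caller explicit and checkable, which an opaque dict does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Hypothetical typed replacement for an opaque instance dict."""

    # Class-level constant (no annotation, so not a dataclass field).
    # A driver could inspect it to decide whether it understands the object.
    VERSION = "1.0"

    uuid: str
    vcpus: int
    memory_mb: int

    def __post_init__(self):
        # Validate once, at the boundary, instead of in every driver.
        if self.vcpus < 1:
            raise ValueError("vcpus must be >= 1")
        if self.memory_mb < 1:
            raise ValueError("memory_mb must be >= 1")


def spec_from_legacy_dict(data):
    """Convert an old-style dict into the typed object.

    A missing key raises KeyError here, at the interface boundary,
    rather than deep inside a driver.
    """
    return InstanceSpec(
        uuid=data["uuid"],
        vcpus=int(data["vcpus"]),
        memory_mb=int(data["memory_mb"]),
    )


if __name__ == "__main__":
    legacy = {"uuid": "abc-123", "vcpus": "2", "memory_mb": 2048}
    spec = spec_from_legacy_dict(legacy)
    print(spec.uuid, spec.vcpus, spec.memory_mb, InstanceSpec.VERSION)
```

With an object like this, an added or renamed attribute becomes a visible, versioned change to the interface instead of a silent change to the contents of a dict.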
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Daniel P. Berrange wrote:
> For a long time I've used the LKML 'subsystem maintainers' model as the
> reference point for ideas. In a more LKML like model, each virt team
> (or other subsystem team) would have their own separate GIT repo with
> a complete Nova codebase, where they did their day to day code submissions,
> reviews and merges. Periodically the primary subsystem maintainer would
> submit a large pull / merge request to the overall Nova maintainer.
> The $1,000,000 question in such a model is what kind of code review
> happens during the big pull requests to integrate subsystem trees.

Please note that the Kernel subsystem model is actually a trust tree based on 20 years of trust building. OpenStack is only 4 years old, so it's difficult to apply the same model as-is.

--
Thierry Carrez (ttx)
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 07:49:04AM -0400, Sean Dague wrote: > On 09/05/2014 07:26 AM, Daniel P. Berrange wrote: > > On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote: > >> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote: > >>> On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote: > On Thu, 4 Sep 2014 11:24:29 +0100 > "Daniel P. Berrange" wrote: > > > > - A fairly significant amount of nova code would need to be > >considered semi-stable API. Certainly everything under nova/virt > >and any object which is passed in/out of the virt driver API. > >Changes to such APIs would have to be done in a backwards > >compatible manner, since it is no longer possible to lock-step > >change all the virt driver impls. In some ways I think this would > >be a good thing as it will encourage people to put more thought > >into the long term maintainability of nova internal code instead > >of relying on being able to rip it apart later, at will. > > > > - The nova/virt/driver.py class would need to be much better > >specified. All parameters / return values which are opaque dicts > >must be replaced with objects + attributes. Completion of the > >objectification work is mandatory, so there is cleaner separation > >between virt driver impls & the rest of Nova. > > I think for this to work well with multiple repositories and drivers > having different priorities over implementing changes in the API it > would not just need to be semi-stable, but stable with versioning built > in from the start to allow for backwards incompatible changes. And > the interface would have to be very well documented including things > such as what exceptions are allowed to be raised through the API. > Hopefully this would be enforced through code as well. But as long as > driver maintainers are willing to commit to this extra overhead I can > see it working. 
> >>> > >>> With our primary REST or RPC APIs we're under quite strict rules about > >>> what we can & can't change - almost impossible to remove an existing > >>> API from the REST API for example. With the internal virt driver API > >>> we would probably have a little more freedom. For example, I think > >>> if we found an existing virt driver API that was insufficient for a > >>> new bit of work, we could add a new API in parallel with it, give the > >>> virt drivers 1 dev cycle to convert, and then permanently delete the > >>> original virt driver API. So a combination of that kind of API > >>> replacement, versioning for some data structures/objects, and use of > >>> the capabilties flags would probably be sufficient. That's what I mean > >>> by semi-stable here - no need to maintain existing virt driver APIs > >>> indefinitely - we can remove & replace them in reasonably short time > >>> scales as long as we avoid any lock-step updates. > >> > >> I have spent a lot of time over the last year working on things that > >> require coordinated code lands between projects it's much more > >> friction than you give it credit. > >> > >> Every added git tree adds a non linear cost to mental overhead, and a > >> non linear integration cost. Realistically the reason the gate is in the > >> state it is has a ton to do with the fact that it's integrating 40 git > >> trees. Because virt drivers run in the process space of Nova Compute, > >> they can pretty much do whatever, and the impacts are going to be > >> somewhat hard to figure out. > >> > >> Also, if spinning these out seems like the right idea, I think nova-core > >> needs to retain core rights over the drivers as well. Because there do > >> need to be veto authority on some of the worst craziness. > > > > If they want todo crazy stuff, let them live or die with the > > consequences. 
> > > >> If the VMWare team stopped trying to build a distributed lock manager > >> inside their compute driver, or the Hyperv team didn't wait until J2 to > >> start pushing patches, I think there would be more trust in some of > >> these teams. But, I am seriously concerned in both those cases, and the > >> slow review there is a function of a historic lack of trust in judgment. > >> I also personally went on a moratorium a year ago in reviewing either > >> driver because entities at both places where complaining to my > >> management chain through back channels that I was -1ing their code... > > > > I venture to suggest that the reason we care so much about those kind > > of things is precisely because of our policy of pulling them in the > > tree. Having them in tree means their quality (or not) reflects directly > > on the project as a whole. Separate them from Nova as a whole and give > > them control of their own desinty and they can deal with the consequences > > of their actions and people can judge the results for themselves. > > > > We don't have the ti
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 07:40 AM, Daniel P. Berrange wrote: > On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote: >> On 09/05/2014 06:40 AM, Nikola Đipanov wrote: >>> A handy example of this I can think of is the currently granted FFE for >>> serial consoles - consider how much of the code went into the common >>> part vs. the libvirt specific part, I would say the ratio is very close >>> to 1 if not even in favour of the common part (current 4 outstanding >>> patches are all for core, and out of the 5 merged - only one of them was >>> purely libvirt specific, assuming virt/ will live in nova-common). >>> >>> Joe asked a similar question elsewhere on the thread. >>> >>> Once again - I am not against doing it - what I am saying is that we >>> need to look into this closer as it may not be as big of a win from the >>> number of changes needed per feature as we may think. >>> >>> Just some things to think about with regards to the whole idea, by no >>> means exhaustive. >> >> So maybe the better question is: what are the top sources of technical >> debt in Nova that we need to address? And if we did, everyone would be >> more sane, and feel less burnt. >> >> Maybe the drivers are the worst debt, and jettisoning them makes them >> someone else's problem, so that helps some. I'm not entirely convinced >> right now. >> >> I think Cells represents a lot of debt right now. It doesn't fully work >> with the rest of Nova, and produces a ton of extra code paths special >> cased for the cells path. >> >> The Scheduler has a ton of debt as has been pointed out by the efforts >> in and around Gannt. The focus has been on the split, but realistically >> I'm with Jay is that we should focus on the debt, and exposing a REST >> interface in Nova. >> >> What about the Nova objects transition? That continues to be slow >> because it's basically Dan (with a few other helpers from time to time). 
>> Would it be helpful if we did an all hands on deck transition of the >> rest of Nova for K1 and just get it done? Would be nice to have the bulk >> of Nova core working on one thing like this and actually be in shared >> context with everyone else for a while. > > I think the idea that we can tell everyone in Nova what they should > focus on for a cycle, or more generally, is doomed to failure. This > isn't a closed source company controlled project where you can dictate > what everyones priority must be. We must accept that rely on all our > contributors good will in voluntarily giving their time & resource to > the projct, to scratch whatever itch they have in the project. We have > to encourage them to want to work nova and demonstrate that we value > whatever form of contributor they choose to make. If we have technical > debt that we think is important to address we need to illustrate / > show people why they should care about helping. If they none the less > decide that work isn't for them, we can't just cast them aside and/or > ignore their contributions, while we get on with other things. This > is why I think it is important that we split up nova to allow each > are to self-organize around what they consider to be priorities in > their area of interest / motivation. Not enabling that is going to > to continue to kill our community I'm getting tired of the reprieve that because we are an Open Source project declaring priorities is pointless, because it's not. I would say it's actually the exception that a developer wakes up in the morning and says "I completely disregard what anyone else thinks is important in this project, this is what I'm going to do today". Because if that's how they felt they wouldn't choose to be part of a community, they would just go do their own thing. Lone wolfs by definition don't form communities. And the FFE process is firm demonstration that when we pick a small number of things to look at, they move a lot more quickly. 
People are always free to work on whatever they want. But providing some focus on debt cleanup (FFE++, effectively) would be really nice.

-Sean

--
Sean Dague
http://dague.net
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, 5 Sep 2014, Daniel P. Berrange wrote:
> I venture to suggest that the reason we care so much about those kind
> of things is precisely because of our policy of pulling them in the
> tree. Having them in tree means their quality (or not) reflects directly
> on the project as a whole. Separate them from Nova as a whole and give
> them control of their own destiny and they can deal with the consequences
> of their actions and people can judge the results for themselves.

Apart from any of the other issues present in this thread (and not commenting on them in this message), I think this paragraph (above) represents an unfortunately narrow view about how perceptions of the quality of OpenStack work.

People who are invested in using OpenStack in some fashion and are not in the development priesthood see OpenStack. They don't see individual teams making virt drivers.

It may be (I don't know) that having more granularity in projects will allow different teams to engage at different rates and thus get stuff done, but I do not think it will do much with regard to external perceptions of quality. That's going to take a much different kind of work and attention.

--
Chris Dent tw:@anticdent freenode:cdent
https://tank.peermore.com/tanks/cdent
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 07:26 AM, Daniel P. Berrange wrote: > On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote: >> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote: >>> On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote: On Thu, 4 Sep 2014 11:24:29 +0100 "Daniel P. Berrange" wrote: > > - A fairly significant amount of nova code would need to be >considered semi-stable API. Certainly everything under nova/virt >and any object which is passed in/out of the virt driver API. >Changes to such APIs would have to be done in a backwards >compatible manner, since it is no longer possible to lock-step >change all the virt driver impls. In some ways I think this would >be a good thing as it will encourage people to put more thought >into the long term maintainability of nova internal code instead >of relying on being able to rip it apart later, at will. > > - The nova/virt/driver.py class would need to be much better >specified. All parameters / return values which are opaque dicts >must be replaced with objects + attributes. Completion of the >objectification work is mandatory, so there is cleaner separation >between virt driver impls & the rest of Nova. I think for this to work well with multiple repositories and drivers having different priorities over implementing changes in the API it would not just need to be semi-stable, but stable with versioning built in from the start to allow for backwards incompatible changes. And the interface would have to be very well documented including things such as what exceptions are allowed to be raised through the API. Hopefully this would be enforced through code as well. But as long as driver maintainers are willing to commit to this extra overhead I can see it working. >>> >>> With our primary REST or RPC APIs we're under quite strict rules about >>> what we can & can't change - almost impossible to remove an existing >>> API from the REST API for example. 
With the internal virt driver API >>> we would probably have a little more freedom. For example, I think >>> if we found an existing virt driver API that was insufficient for a >>> new bit of work, we could add a new API in parallel with it, give the >>> virt drivers 1 dev cycle to convert, and then permanently delete the >>> original virt driver API. So a combination of that kind of API >>> replacement, versioning for some data structures/objects, and use of >>> the capabilties flags would probably be sufficient. That's what I mean >>> by semi-stable here - no need to maintain existing virt driver APIs >>> indefinitely - we can remove & replace them in reasonably short time >>> scales as long as we avoid any lock-step updates. >> >> I have spent a lot of time over the last year working on things that >> require coordinated code lands between projects it's much more >> friction than you give it credit. >> >> Every added git tree adds a non linear cost to mental overhead, and a >> non linear integration cost. Realistically the reason the gate is in the >> state it is has a ton to do with the fact that it's integrating 40 git >> trees. Because virt drivers run in the process space of Nova Compute, >> they can pretty much do whatever, and the impacts are going to be >> somewhat hard to figure out. >> >> Also, if spinning these out seems like the right idea, I think nova-core >> needs to retain core rights over the drivers as well. Because there do >> need to be veto authority on some of the worst craziness. > > If they want todo crazy stuff, let them live or die with the > consequences. > >> If the VMWare team stopped trying to build a distributed lock manager >> inside their compute driver, or the Hyperv team didn't wait until J2 to >> start pushing patches, I think there would be more trust in some of >> these teams. But, I am seriously concerned in both those cases, and the >> slow review there is a function of a historic lack of trust in judgment. 
>> I also personally went on a moratorium a year ago in reviewing either >> driver because entities at both places were complaining to my >> management chain through back channels that I was -1ing their code... > > I venture to suggest that the reason we care so much about those kinds > of things is precisely because of our policy of pulling them in the > tree. Having them in tree means their quality (or not) reflects directly > on the project as a whole. Separate them from Nova as a whole and give > them control of their own destiny and they can deal with the consequences > of their actions and people can judge the results for themselves. > > We don't have the time or resources to continue baby-sitting them > ourselves - attempting to do so has just resulted in a scenario where > they end up getting largely ignored as you admit here. This ultimately > makes their quality even worse,
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote: > On 09/05/2014 06:40 AM, Nikola Đipanov wrote: > > A handy example of this I can think of is the currently granted FFE for > > serial consoles - consider how much of the code went into the common > > part vs. the libvirt specific part, I would say the ratio is very close > > to 1 if not even in favour of the common part (current 4 outstanding > > patches are all for core, and out of the 5 merged - only one of them was > > purely libvirt specific, assuming virt/ will live in nova-common). > > > > Joe asked a similar question elsewhere on the thread. > > > > Once again - I am not against doing it - what I am saying is that we > > need to look into this closer as it may not be as big of a win from the > > number of changes needed per feature as we may think. > > > > Just some things to think about with regards to the whole idea, by no > > means exhaustive. > > So maybe the better question is: what are the top sources of technical > debt in Nova that we need to address? And if we did, everyone would be > more sane, and feel less burnt. > > Maybe the drivers are the worst debt, and jettisoning them makes them > someone else's problem, so that helps some. I'm not entirely convinced > right now. > > I think Cells represents a lot of debt right now. It doesn't fully work > with the rest of Nova, and produces a ton of extra code paths special > cased for the cells path. > > The Scheduler has a ton of debt as has been pointed out by the efforts > in and around Gannt. The focus has been on the split, but realistically > I'm with Jay is that we should focus on the debt, and exposing a REST > interface in Nova. > > What about the Nova objects transition? That continues to be slow > because it's basically Dan (with a few other helpers from time to time). > Would it be helpful if we did an all hands on deck transition of the > rest of Nova for K1 and just get it done? 
Would be nice to have the bulk > of Nova core working on one thing like this and actually be in shared > context with everyone else for a while.

I think the idea that we can tell everyone in Nova what they should focus on for a cycle, or more generally, is doomed to failure. This isn't a closed source, company-controlled project where you can dictate what everyone's priorities must be. We must accept that we rely on all our contributors' good will in voluntarily giving their time & resources to the project, to scratch whatever itch they have in the project. We have to encourage them to want to work on nova and demonstrate that we value whatever form of contribution they choose to make. If we have technical debt that we think is important to address we need to show people why they should care about helping. If they nonetheless decide that work isn't for them, we can't just cast them aside and/or ignore their contributions, while we get on with other things.

This is why I think it is important that we split up nova to allow each area to self-organize around what they consider to be priorities in their area of interest / motivation. Not enabling that is going to continue to kill our community.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote: > On 09/05/2014 06:22 AM, Daniel P. Berrange wrote: > > On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote: > >> On Thu, 4 Sep 2014 11:24:29 +0100 > >> "Daniel P. Berrange" wrote: > >>> > >>> - A fairly significant amount of nova code would need to be > >>>considered semi-stable API. Certainly everything under nova/virt > >>>and any object which is passed in/out of the virt driver API. > >>>Changes to such APIs would have to be done in a backwards > >>>compatible manner, since it is no longer possible to lock-step > >>>change all the virt driver impls. In some ways I think this would > >>>be a good thing as it will encourage people to put more thought > >>>into the long term maintainability of nova internal code instead > >>>of relying on being able to rip it apart later, at will. > >>> > >>> - The nova/virt/driver.py class would need to be much better > >>>specified. All parameters / return values which are opaque dicts > >>>must be replaced with objects + attributes. Completion of the > >>>objectification work is mandatory, so there is cleaner separation > >>>between virt driver impls & the rest of Nova. > >> > >> I think for this to work well with multiple repositories and drivers > >> having different priorities over implementing changes in the API it > >> would not just need to be semi-stable, but stable with versioning built > >> in from the start to allow for backwards incompatible changes. And > >> the interface would have to be very well documented including things > >> such as what exceptions are allowed to be raised through the API. > >> Hopefully this would be enforced through code as well. But as long as > >> driver maintainers are willing to commit to this extra overhead I can > >> see it working. 
> > > > With our primary REST or RPC APIs we're under quite strict rules about > > what we can & can't change - almost impossible to remove an existing > > API from the REST API for example. With the internal virt driver API > > we would probably have a little more freedom. For example, I think > > if we found an existing virt driver API that was insufficient for a > > new bit of work, we could add a new API in parallel with it, give the > > virt drivers 1 dev cycle to convert, and then permanently delete the > > original virt driver API. So a combination of that kind of API > > replacement, versioning for some data structures/objects, and use of > > the capabilties flags would probably be sufficient. That's what I mean > > by semi-stable here - no need to maintain existing virt driver APIs > > indefinitely - we can remove & replace them in reasonably short time > > scales as long as we avoid any lock-step updates. > > I have spent a lot of time over the last year working on things that > require coordinated code lands between projects it's much more > friction than you give it credit. > > Every added git tree adds a non linear cost to mental overhead, and a > non linear integration cost. Realistically the reason the gate is in the > state it is has a ton to do with the fact that it's integrating 40 git > trees. Because virt drivers run in the process space of Nova Compute, > they can pretty much do whatever, and the impacts are going to be > somewhat hard to figure out. > > Also, if spinning these out seems like the right idea, I think nova-core > needs to retain core rights over the drivers as well. Because there do > need to be veto authority on some of the worst craziness. If they want todo crazy stuff, let them live or die with the consequences. 
> If the VMWare team stopped trying to build a distributed lock manager > inside their compute driver, or the Hyperv team didn't wait until J2 to > start pushing patches, I think there would be more trust in some of > these teams. But, I am seriously concerned in both those cases, and the > slow review there is a function of a historic lack of trust in judgment. > I also personally went on a moratorium a year ago in reviewing either > driver because entities at both places were complaining to my > management chain through back channels that I was -1ing their code...

I venture to suggest that the reason we care so much about those kinds of things is precisely because of our policy of pulling them in the tree. Having them in tree means their quality (or not) reflects directly on the project as a whole. Separate them from Nova as a whole and give them control of their own destiny and they can deal with the consequences of their actions and people can judge the results for themselves.

We don't have the time or resources to continue baby-sitting them ourselves - attempting to do so has just resulted in a scenario where they end up getting largely ignored as you admit here. This ultimately makes their quality even worse, because the lack of reviewer availability means they stand little chance of pushing through the work to
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 06:40 AM, Nikola Đipanov wrote: > On 09/04/2014 12:24 PM, Daniel P. Berrange wrote: >> Position statement >> == >> >> Over the past year I've increasingly come to the conclusion that >> Nova is heading for (or probably already at) a major crisis. If >> steps are not taken to avert this, the project is likely to loose >> a non-trivial amount of talent, both regular code contributors and >> core team members. That includes myself. This is not good for >> Nova's long term health and so should be of concern to anyone >> involved in Nova and OpenStack. >> >> For those who don't want to read the whole mail, the executive >> summary is that the nova-core team is an unfixable bottleneck >> in our development process with our current project structure. >> The only way I see to remove the bottleneck is to split the virt >> drivers out of tree and let them all have their own core teams >> in their area of code, leaving current nova core to focus on >> all the common code outside the virt driver impls. I, now, none >> the less urge people to read the whole mail. >> >> >> Background information >> == >> >> I see many factors coming together to form the crisis >> >> - Burn out of core team members from over work >> - Difficulty bringing new talent into the core team >> - Long delay in getting code reviewed & merged >> - Marginalization of code areas which aren't popular >> - Increasing size of nova code through new drivers >> - Exclusion of developers without corporate backing >> >> Each item on their own may not seem too bad, but combined they >> add up to a big problem. >> > > As many others - I cannot +1 this enough. Some technical comments below > that we may want to consider before, but to sum them up - this will be a > TON OF WORK! we better make sure we really want to do this before. > > However - please don't read this as FUD, maybe rather pointing out that > devil is in the details, and maybe getting ahead of myself with too deep > of a dive. 
> >> >> - A fairly significant amount of nova code would need to be >>considered semi-stable API. Certainly everything under nova/virt >>and any object which is passed in/out of the virt driver API. >>Changes to such APIs would have to be done in a backwards >>compatible manner, since it is no longer possible to lock-step >>change all the virt driver impls. In some ways I think this would >>be a good thing as it will encourage people to put more thought >>into the long term maintainability of nova internal code instead >>of relying on being able to rip it apart later, at will. >> > > I think we should not underestimate how big of a job this will be. We > have been treating that API as internal for a long time and a lot of > abstractions are just broken and need to be redesigned and then > refactored. A lot of the stuff is implementation specific (live > migrations is a good example of this). What makes it more difficult is > that we need to get this as right as possible before we do the split. > > Now I am not saying this cannot be done or that we shouldn't to it, > however I _am_ saying that we should not take lightly how much work > there will be and how fiddly the work itself is. > > On top of that - there are some other serious issues with nova common > code that we need to take care of ASAP, and this will definitely > increase the churn and make that more difficult. We should take this > into account and make sure we are focusing efforts on the right things. > Making sure we do is the biggest challenge nova core faces in addition > to all the others mentioned above. > >> - The nova/virt/driver.py class would need to be much better >>specified. All parameters / return values which are opaque dicts >>must be replaced with objects + attributes. Completion of the >>objectification work is mandatory, so there is cleaner separation >>between virt driver impls & the rest of Nova. 
>> > > Not only that - currently nova-objects do their versioning magic only > over RPC, while they would have to do it over library boundaries. This > in itself will require work, and is likely going to influence how we > stabilize the API. > > However - splitting out the scheduler is likely to require objects to be > able to do similar things, and there are other things that we may want > to do (e.g. using properly versioned data for the extensible resources) > that will benefit from this. > >> - If changes are required to common code, the virt driver developer >>would first have to get the necessary pieces merged into Nova >>common. Then the follow up virt driver specific changes could be >>proposed to their repo. This implies that some changes to virt >>drivers will still contend for resource in the common nova repo >>and team. This contention should be lower than it is today though >>since the current nova core team should have less code to look >>
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 12:40:59PM +0200, Nikola Đipanov wrote: > On 09/04/2014 12:24 PM, Daniel P. Berrange wrote: > > - A fairly significant amount of nova code would need to be > >considered semi-stable API. Certainly everything under nova/virt > >and any object which is passed in/out of the virt driver API. > >Changes to such APIs would have to be done in a backwards > >compatible manner, since it is no longer possible to lock-step > >change all the virt driver impls. In some ways I think this would > >be a good thing as it will encourage people to put more thought > >into the long term maintainability of nova internal code instead > >of relying on being able to rip it apart later, at will. > > > > I think we should not underestimate how big of a job this will be. We > have been treating that API as internal for a long time and a lot of > abstractions are just broken and need to be redesigned and then > refactored. A lot of the stuff is implementation specific (live > migrations is a good example of this). What makes it more difficult is > that we need to get this as right as possible before we do the split. > > Now I am not saying this cannot be done or that we shouldn't to it, > however I _am_ saying that we should not take lightly how much work > there will be and how fiddly the work itself is. > > On top of that - there are some other serious issues with nova common > code that we need to take care of ASAP, and this will definitely > increase the churn and make that more difficult. We should take this > into account and make sure we are focusing efforts on the right things. > Making sure we do is the biggest challenge nova core faces in addition > to all the others mentioned above. > > > - The nova/virt/driver.py class would need to be much better > >specified. All parameters / return values which are opaque dicts > >must be replaced with objects + attributes. 
Completion of the > >objectification work is mandatory, so there is cleaner separation > >between virt driver impls & the rest of Nova. > > > > Not only that - currently nova-objects do their versioning magic only > over RPC, while they would have to do it over library boundaries. This > in itself will require work, and is likely going to influence how we > stabilize the API. > > However - splitting out the scheduler is likely to require objects to be > able to do similar things, and there are other things that we may want > to do (e.g. using properly versioned data for the extensible resources) > that will benefit from this.

Looking at what we did for the NUMA work, the objects we have returned from the nova/virt/driver.py APIs (as defined in hardware.py) are separate from the versioned objects we use for persisting the data in the database (as defined in nova/objects/numa_topology.py). So in this case the nova-objects versioning problem doesn't leak into the virt drivers. If solving the versioning problem over library boundaries isn't workable, then perhaps the separation of objects is what we should look at, i.e. the versioned objects would be purely an internal thing for nova common to deal with, and the objects to be consumed by the virt drivers would be defined by the virt driver API itself.

> > - If changes are required to common code, the virt driver developer > >would first have to get the necessary pieces merged into Nova > >common. Then the follow up virt driver specific changes could be > >proposed to their repo. This implies that some changes to virt > >drivers will still contend for resource in the common nova repo > >and team. This contention should be lower than it is today though > >since the current nova core team should have less code to look > >after per-person on aggregate. > > > > A handy example of this I can think of is the currently granted FFE for > serial consoles - consider how much of the code went into the common > part vs.
the libvirt specific part, I would say the ratio is very close > to 1 if not even in favour of the common part (current 4 outstanding > patches are all for core, and out of the 5 merged - only one of them was > purely libvirt specific, assuming virt/ will live in nova-common). > > Joe asked a similar question elsewhere on the thread.

In terms of patches merged to Nova, 1385 merged in 6 months, of which 437 (roughly 32%) touched /virt/ files. This obviously doesn't distinguish virt driver changes that were 100% isolated inside the virt driver from changes that touch multiple code areas.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
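The separation described in this message (plain objects on the virt driver API, versioned objects kept internal to nova common) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the class names, fields, and the `from_driver` helper are hypothetical and are not the actual definitions in hardware.py or nova/objects/numa_topology.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriverNUMACell:
    """Hypothetical plain object handed across the virt driver API."""
    id: int
    cpus: List[int]
    memory_mb: int


class StoredNUMACell:
    """Hypothetical versioned object, used only inside nova common."""

    VERSION = "1.1"

    def __init__(self, id, cpuset, memory):
        self.id = id
        self.cpuset = cpuset
        self.memory = memory

    @classmethod
    def from_driver(cls, cell):
        # The conversion lives in common code, so a driver never needs
        # to know which object VERSION is used for persistence.
        return cls(id=cell.id,
                   cpuset=sorted(set(cell.cpus)),
                   memory=cell.memory_mb)

    def to_primitive(self):
        # Serialized form that would be stored; carries the version tag.
        return {"version": self.VERSION, "id": self.id,
                "cpuset": self.cpuset, "memory": self.memory}
```

In a layout like this, bumping `VERSION` or renaming a stored field only touches the conversion in common code; the driver-facing class changes only when the driver API itself changes.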
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 06:22 AM, Daniel P. Berrange wrote: > On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote: >> On Thu, 4 Sep 2014 11:24:29 +0100 >> "Daniel P. Berrange" wrote: >>> >>> - A fairly significant amount of nova code would need to be >>>considered semi-stable API. Certainly everything under nova/virt >>>and any object which is passed in/out of the virt driver API. >>>Changes to such APIs would have to be done in a backwards >>>compatible manner, since it is no longer possible to lock-step >>>change all the virt driver impls. In some ways I think this would >>>be a good thing as it will encourage people to put more thought >>>into the long term maintainability of nova internal code instead >>>of relying on being able to rip it apart later, at will. >>> >>> - The nova/virt/driver.py class would need to be much better >>>specified. All parameters / return values which are opaque dicts >>>must be replaced with objects + attributes. Completion of the >>>objectification work is mandatory, so there is cleaner separation >>>between virt driver impls & the rest of Nova. >> >> I think for this to work well with multiple repositories and drivers >> having different priorities over implementing changes in the API it >> would not just need to be semi-stable, but stable with versioning built >> in from the start to allow for backwards incompatible changes. And >> the interface would have to be very well documented including things >> such as what exceptions are allowed to be raised through the API. >> Hopefully this would be enforced through code as well. But as long as >> driver maintainers are willing to commit to this extra overhead I can >> see it working. > > With our primary REST or RPC APIs we're under quite strict rules about > what we can & can't change - almost impossible to remove an existing > API from the REST API for example. With the internal virt driver API > we would probably have a little more freedom. 
For example, I think > if we found an existing virt driver API that was insufficient for a > new bit of work, we could add a new API in parallel with it, give the > virt drivers 1 dev cycle to convert, and then permanently delete the > original virt driver API. So a combination of that kind of API > replacement, versioning for some data structures/objects, and use of > the capabilities flags would probably be sufficient. That's what I mean > by semi-stable here - no need to maintain existing virt driver APIs > indefinitely - we can remove & replace them in reasonably short time > scales as long as we avoid any lock-step updates.

I have spent a lot of time over the last year working on things that require coordinated code lands between projects; it's much more friction than you give it credit for.

Every added git tree adds a non-linear cost to mental overhead, and a non-linear integration cost. Realistically the reason the gate is in the state it is has a ton to do with the fact that it's integrating 40 git trees. Because virt drivers run in the process space of Nova Compute, they can pretty much do whatever, and the impacts are going to be somewhat hard to figure out.

Also, if spinning these out seems like the right idea, I think nova-core needs to retain core rights over the drivers as well, because there does need to be veto authority on some of the worst craziness.

If the VMWare team stopped trying to build a distributed lock manager inside their compute driver, or the Hyperv team didn't wait until J2 to start pushing patches, I think there would be more trust in some of these teams. But, I am seriously concerned in both those cases, and the slow review there is a function of a historic lack of trust in judgment. I also personally went on a moratorium a year ago in reviewing either driver because entities at both places were complaining to my management chain through back channels that I was -1ing their code...
when I was one of the few people actually trying to provide constructive feedback (basically only Russell and I were reviewing that code in Grizzly, everyone else was ignoring it). Things may have changed since then, at least I see a ton of good work from tjones in making Nova overall better, but that was a pretty bitter pill.

(Sorry for the tangent, but honestly if we are going to fix what's broken we probably have to expose all related brokens.)

If the concern is that we are keeping out too many contributors by the CI requirements: let's let Class C back in tree. I believe in the FreeBSD case you were one of the original opponents of a top level driver, arguing that they should go through libvirt instead. But I'm cool with them just showing up as a Class C.

But I honestly don't think the virt driver split is going to make any of this easier, when you account for the additional overhead it's going to create, and the work required to get there.

-Sean
--
Sean Dague
http://dague.net
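For readers trying to picture the "semi-stable" scheme quoted in this message (add a new driver method in parallel with the old one, give drivers one dev cycle to convert, and use capabilities flags so common code never needs a lock-step update), a minimal Python sketch follows. Every class, method, and flag name here is hypothetical; none of it is taken from nova/virt/driver.py.

```python
import warnings


class BaseDriver:
    """Hypothetical stand-in for a virt driver base class."""

    # Capability flags let common code ask what a driver supports
    # instead of assuming every driver was converted at the same time.
    capabilities = {"supports_attach_v2": False}

    def attach_volume(self, instance, volume):
        """Old API, kept for one development cycle, then deleted."""
        raise NotImplementedError

    def attach_volume_v2(self, instance, volume, mountpoint):
        """New API, added in parallel with the old one."""
        raise NotImplementedError


class OldStyleDriver(BaseDriver):
    """A driver that has not converted yet."""

    def attach_volume(self, instance, volume):
        return "old:%s:%s" % (instance, volume)


class ConvertedDriver(BaseDriver):
    """A driver that already implements the replacement API."""

    capabilities = {"supports_attach_v2": True}

    def attach_volume_v2(self, instance, volume, mountpoint):
        return "new:%s:%s:%s" % (instance, volume, mountpoint)


def attach(driver, instance, volume, mountpoint):
    """Common code: prefer the new API, fall back during the transition."""
    if driver.capabilities.get("supports_attach_v2"):
        return driver.attach_volume_v2(instance, volume, mountpoint)
    warnings.warn("attach_volume is deprecated; convert within one cycle",
                  DeprecationWarning)
    return driver.attach_volume(instance, volume)
```

The sketch does not address the objection raised in the reply: the fallback branch is exactly the kind of cross-repository coordination cost that grows with the number of git trees.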
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 11:29:43AM +0100, John Garbutt wrote: > On 4 September 2014 23:48, Russell Bryant wrote: > > On 09/04/2014 06:24 AM, Daniel P. Berrange wrote: > > If we ignored gerrit for a moment, is a rapid increase in splitting out > > components the ideal workflow? Would we be better off finding a way to > > finally just implement a model more like the Linux kernel with > > sub-system maintainers and pull requests to a top-level tree? Maybe. > > I'm not convinced that split of repos is obviously better. > > I was thinking along similar lines. > > Regardless of that, we should try this for Kilo. > > If it feels like we are getting too much driver divergence, and > tempest is not keeping everyone in line, the community is fragmenting > and no one is working on the core of nova, then we might have to think > about an alternative plan for L, including bringing the drivers back > in tree. > > At least the separate repos will help us firm up the interfaces, which > I think is a good thing. > > I worry about what it means to test a feature in "nova common, nova > api, or nova core" or whatever we call it, if there are no virt > drivers in tree. To some extent we might want to improve the fake virt > driver for some in-tree functional tests anyways. But that's a separate > discussion.

I look at what we currently do with Ironic testing as a guide here. We have a tempest job that runs against Nova and validates that changes to nova don't break the separate Ironic git repo. So my thought is that all our current tempest jobs would simply work in that way. IOW changes to so called "nova common" would run jobs that validate the change against all the virt driver git repos. I think this kind of setup is pretty much mandatory for split repos to be viable, because I don't want to see us lose testing coverage in this proposed change.
Having a decent in-tree fake virt driver would nonetheless be a nice idea, because it would allow for more complete functional testing isolated from the risks of bugs in the virt drivers themselves.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
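As a rough illustration of the two testing ideas in this message (a fake driver for isolated functional tests, and running one check against every driver when common code changes), here is a small Python sketch. It is a toy under stated assumptions: the `FakeDriver` class and both helper functions are hypothetical and are not Nova's or tempest's actual code.

```python
class FakeDriver:
    """Hypothetical in-memory driver: no hypervisor, only bookkeeping."""

    def __init__(self):
        self._instances = {}

    def spawn(self, name):
        if name in self._instances:
            raise ValueError("instance %s already exists" % name)
        self._instances[name] = "running"

    def destroy(self, name):
        # Destroying an unknown instance is treated as a no-op.
        self._instances.pop(name, None)

    def list_instances(self):
        return sorted(self._instances)


def check_spawn_destroy(driver):
    """Driver-agnostic functional check that common code could own."""
    driver.spawn("vm1")
    assert driver.list_instances() == ["vm1"]
    driver.destroy("vm1")
    assert driver.list_instances() == []
    return True


def run_against_all(drivers):
    """Run the same check against each driver, as a gate job might."""
    return {type(d).__name__: check_spawn_destroy(d) for d in drivers}
```

With split repos, the list passed to `run_against_all` would be assembled from the separate driver trees, while the fake driver stays in the common tree for fast, hypervisor-free runs.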
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 07:22 PM, Michael Still wrote: > On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange > wrote: > > [Heavy snipping because of length] > >> The radical (?) solution to the nova core team bottleneck is thus to >> follow this lead and split the nova virt drivers out into separate >> projects and delegate their maintainence to new dedicated teams. >> >> - Nova becomes the home for the public APIs, RPC system, database >>persistent and the glue that ties all this together with the >>virt driver API. >> >> - Each virt driver project gets its own core team and is responsible >>for dealing with review, merge & release of their codebase. > > I think this is the crux of the matter. We're not doing a great job of > landing code at the moment, because we can't keep up with the review > workload. > > So far we've had two proposals mooted: > > - slots / runways, where we try to rate limit the number of things > we're trying to review at once to maintain focus > - splitting all the virt drivers out of the nova tree > > Splitting the drivers out of the nova tree does come at a cost -- we'd > need to stabilise and probably version the hypervisor driver > interface, and that will encourage more "out of tree" drivers, which > are things we haven't historically wanted to do. If we did this split, > I think we need to acknowledge that we are changing policy there. It > also means that nova-core wouldn't be the ones holding the quality bar > for hypervisor drivers any more, I guess this would open the door for > drivers to more actively compete on the quality of their > implementations, which might be a good thing. > > Both of these have interesting aspects, and I agree we need to do > _something_. I do wonder if there is a hybrid approach as well though. > For example, could we implement some sort of more formal lieutenant > system for drivers? We've talked about it in the past but never been > able to express how it would work in practise. 
> > The last few days have been interesting as I watch FFEs come through. > People post explaining their feature, its importance, and the risk > associated with it. Three cores sign on for review. All of the ones > I've looked at have received active review since being posted. Would > it be bonkers to declare nova to be in "permanent feature freeze"? If > we could maintain the level of focus we see now, then we'd be getting > heaps more done than before.

Agreed. Honestly, this has been a really nice flow.

I'd love to figure out what part of this focus is capturable for normal cadence. This realistically is what I was hoping slots would provide, because I feel like we actually move really fast when we call out 5-10 things to go look at this week.

-Sean
--
Sean Dague
http://dague.net
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 01:26 AM, Jay Pipes wrote: > On 09/04/2014 10:33 AM, Dugger, Donald D wrote: >> Basically +1 with what Daniel is saying (note that, as mentioned, a >> side effect of our effort to split out the scheduler will help but >> not solve this problem). > > The difference between Dan's proposal and the Gantt split is that Dan's > proposal features quite prominently the following: > > == begin == > > - The nova/virt/driver.py class would need to be much better >specified. All parameters / return values which are opaque dicts >must be replaced with objects + attributes. Completion of the >objectification work is mandatory, so there is cleaner separation >between virt driver impls & the rest of Nova. > > == end == > > In other words, Dan's proposal above is EXACTLY what I've been saying > needs to be done to the interfaces between nova-conductor, nova-compute, > and nova-scheduler *before* any split of the scheduler code is even > remotely feasible. > > Splitting the scheduler out before this is done would actually not "help > but not solve this problem" -- it would instead further the problem, IMO.

I don't think it's news to anyone that I strongly agree with the above but let me restate that once more: +1000

Not only that - but we need to make sure the APIs are *good and sane* too. This is where the real meat of these types of problems is really.

If you need an example of why this is so crazy important - take a look at Cinder, which did get split out, and all the grief that came from its API being half-baked ([1], [2], but there are plenty more examples).

Actually - as I write this I think of Ironic and can't help but think that the API is _so freakin' important_ that you actually might be better off writing the whole thing from scratch just to get the API right.

[1] https://review.openstack.org/#/c/87546/
[2] https://bugs.launchpad.net/tempest/+bug/1302774

N.
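To make the "objects + attributes" point from the quoted proposal concrete, here is a minimal sketch of the difference between passing an opaque dict and passing a declared object. The names `DiskInfo` and `from_legacy_dict`, and all of the fields, are hypothetical and invented for illustration; they are not Nova's actual objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskInfo:
    """Hypothetical typed replacement for an opaque disk-info dict."""

    # Class-level schema version (no annotation, so not a dataclass field).
    VERSION = "1.0"

    path: str
    size_gb: int
    read_only: bool = False


def from_legacy_dict(legacy: dict) -> DiskInfo:
    """Convert an opaque dict into the declared object.

    Unknown keys raise ValueError; missing required keys raise TypeError
    from the dataclass constructor, so mistakes surface at the boundary.
    """
    allowed = {"path", "size_gb", "read_only"}
    unknown = set(legacy) - allowed
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return DiskInfo(**legacy)
```

With a dict, a misspelled key is only noticed wherever it is eventually read; with a declared object the interface between the driver and the rest of the code is written down in one place and checked when the value is created.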
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 5 September 2014 00:26, Jay Pipes wrote: > On 09/04/2014 10:33 AM, Dugger, Donald D wrote: >> >> Basically +1 with what Daniel is saying (note that, as mentioned, a >> side effect of our effort to split out the scheduler will help but >> not solve this problem). > > > The difference between Dan's proposal and the Gantt split is that Dan's > proposal features quite prominently the following: > > == begin == > > - The nova/virt/driver.py class would need to be much better >specified. All parameters / return values which are opaque dicts >must be replaced with objects + attributes. Completion of the >objectification work is mandatory, so there is cleaner separation >between virt driver impls & the rest of Nova. > > == end == > > In other words, Dan's proposal above is EXACTLY what I've been saying needs > to be done to the interfaces between nova-conductor, nova-compute, and > nova-scheduler *before* any split of the scheduler code is even remotely > feasible. > > Splitting the scheduler out before this is done would actually not "help but > not solve this problem" -- it would instead further the problem, IMO. Given any changes we make to the scheduler interface need to be backwards compatible, I am not totally convinced being in a separate repo makes things a whole lot worse, vs the review bottlenecks we have. Anyways, I certainly agree that work needs to be done ASAP, and if we can make that a priority in Nova, it would be much quicker and easier to do while still inside Nova. We have similar issues with glance, cinder and neutron right now that need fixing soon too. I know we have patches up for some improvements in that area, but it certainly feels like we need to do better there. The virt driver is a step ahead of the scheduler because we know what interface we are talking about, and we already have most of a versioning plan in place. 
I think the key work we have with the scheduler is to actually draw out the interface (in code), so we agree what interface we need to firm up and version. I think we are starting to get agreement on that now, which is great. I still think the scheduler split is as urgent as the virt split, but the virt split is much closer to being possible right now. At this point, it feels like all of kilo-1 gets dedicated to splitting out these interfaces, and completing objects. But let's see what the summit brings. Thanks, John
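As an illustration of what "drawing out the interface (in code)" could look like, the sketch below declares a scheduler interface as an abstract base class with a version attribute. Every name here (`SchedulerInterface`, `HostState`, `RequestSpec`, `MostFreeRamScheduler`) is a hypothetical stand-in, not the interface the Nova or Gantt teams actually agreed on.

```python
import abc
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HostState:
    """Hypothetical view of a compute host as seen by the scheduler."""
    name: str
    free_ram_mb: int


@dataclass(frozen=True)
class RequestSpec:
    """Hypothetical description of what is being asked for."""
    ram_mb: int


class SchedulerInterface(abc.ABC):
    """The agreed boundary: callers depend only on this class."""

    VERSION = "1.0"

    @abc.abstractmethod
    def select_host(self, spec: RequestSpec,
                    hosts: Sequence[HostState]) -> HostState:
        """Return the chosen host, or raise LookupError if none fits."""


class MostFreeRamScheduler(SchedulerInterface):
    """One possible implementation living behind the interface."""

    def select_host(self, spec, hosts):
        candidates = [h for h in hosts if h.free_ram_mb >= spec.ram_mb]
        if not candidates:
            raise LookupError("no host has enough free RAM")
        return max(candidates, key=lambda h: h.free_ram_mb)
```

Once the boundary is written down like this, it becomes possible to argue about versioning a concrete thing, and an implementation can move to another repository without its callers changing.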
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/05/2014 03:02 AM, Sylvain Bauza wrote: > > On 05/09/2014 01:22, Michael Still wrote: >> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange >> wrote: >> >> [Heavy snipping because of length] >> >>> The radical (?) solution to the nova core team bottleneck is thus to >>> follow this lead and split the nova virt drivers out into separate >>> projects and delegate their maintenance to new dedicated teams. >>> >>> - Nova becomes the home for the public APIs, RPC system, database >>> persistent and the glue that ties all this together with the >>> virt driver API. >>> >>> - Each virt driver project gets its own core team and is responsible >>> for dealing with review, merge & release of their codebase. >> I think this is the crux of the matter. We're not doing a great job of >> landing code at the moment, because we can't keep up with the review >> workload. >> >> So far we've had two proposals mooted: >> >> - slots / runways, where we try to rate limit the number of things >> we're trying to review at once to maintain focus >> - splitting all the virt drivers out of the nova tree > > Ahem, IIRC, there is a third proposal for Kilo: > - create subteam half-cores responsible for reviewing patch > iterations, who send approval requests to cores once they consider the > patch stable enough. > > As I explained, it would free up reviewing time for cores > without losing control over what is being merged. I don't really understand how the half core idea works outside of a math equation, because the point of core is to have trust in the judgement of your fellow core members so that they can land code when you aren't looking. I'm not sure how I manage to build up half trust in someone any quicker. -Sean -- Sean Dague http://dague.net
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 06:22:18PM -0500, Michael Still wrote: > On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange > wrote: > > [Heavy snipping because of length] > > > The radical (?) solution to the nova core team bottleneck is thus to > > follow this lead and split the nova virt drivers out into separate > > projects and delegate their maintainence to new dedicated teams. > > > > - Nova becomes the home for the public APIs, RPC system, database > >persistent and the glue that ties all this together with the > >virt driver API. > > > > - Each virt driver project gets its own core team and is responsible > >for dealing with review, merge & release of their codebase. > > I think this is the crux of the matter. We're not doing a great job of > landing code at the moment, because we can't keep up with the review > workload. > > So far we've had two proposals mooted: > > - slots / runways, where we try to rate limit the number of things > we're trying to review at once to maintain focus FWIW, I'm not really seeing that as a long term solution. In its essence it is just a more effective way for us to say 'no' to our potential contributors. While it could no doubt relieve pressure on the core team by reducing the flow of the pipe, I don't think it is helpful for our contributors overall. > - splitting all the virt drivers out of the nova tree > > Splitting the drivers out of the nova tree does come at a cost -- we'd > need to stabilise and probably version the hypervisor driver > interface, and that will encourage more "out of tree" drivers, which > are things we haven't historically wanted to do. If we did this split, > I think we need to acknowledge that we are changing policy there. It > also means that nova-core wouldn't be the ones holding the quality bar > for hypervisor drivers any more, I guess this would open the door for > drivers to more actively compete on the quality of their > implementations, which might be a good thing. 
There are already a number of drivers out of tree such as Docker, Ironic (though soon to be in tree), and IIUC there's something IBM have done for Power hypervisor, and work Oracle have done for the Solaris virt/container technologies. Probably the distinction I'd make is around things that are actively part of the OpenStack community (eg on our gerrit infrastructure and/or stackforge, etc), vs things that are developed in complete isolation from the OpenStack community. I'm unclear what the state of play is wrt discussions on OpenStack technology compatibility certification & trademark usage, but perhaps that is a partial counterweight to your concern? I'd certainly like to see a focus on out of tree drivers remaining a strong part of the openstack community, and not go off into their own completely isolated world outside the community. But yes, I am clearly proposing a change to our integration policy here and so we need to carefully consider what that means and take any necessary steps to mitigate risks. In some respects I think the split repos could allow us to raise the bar in terms of quality. For example, with a single repo, I don't see it ever being practical to make VMware/HyperV/XenAPI CI systems gating on changes, because it would push up the level of pain from false job failures in the gate even further than today. With a separate repo each virt driver would only need to run jobs directly related to them, so the VMware CI could easily be made gating on the VMware driver git repo. On testing in general, I think we need to look at the granularity at which we run tests, in order to let us scale up the number of tests we run. For example, it is suggested that each feature like disk encryption, disk discard support, each vif driver, and so on, requires a new tempest job with appropriate settings. If we look at the number of possible tunable knobs like that, it easily implies 100's more tempest jobs with varying configs.
I don't think it is practical to consider doing that with our setup today. With separate virt driver repos we'd have more headroom to add a larger number of jobs since the volume of changes being tested overall would be smaller. > Both of these have interesting aspects, and I agree we need to do > _something_. I do wonder if there is a hybrid approach as well though. > For example, could we implement some sort of more formal lieutenant > system for drivers? We've talked about it in the past but never been > able to express how it would work in practice. Gerrit makes it hard to express that formally due to the lack of path based permissioning. If we do go for the virt driver split, it would nonetheless be useful if we trialled a lieutenant or sub-team model during Kilo, as a way to prepare for an eventual driver split in L. So this is worth talking about regardless I reckon. I still think on balance a virt driver split is beneficial since it brings benefits beyond just the review team. > The last few days have be
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 12:24 PM, Daniel P. Berrange wrote: > Position statement > == > > Over the past year I've increasingly come to the conclusion that > Nova is heading for (or probably already at) a major crisis. If > steps are not taken to avert this, the project is likely to loose > a non-trivial amount of talent, both regular code contributors and > core team members. That includes myself. This is not good for > Nova's long term health and so should be of concern to anyone > involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive > summary is that the nova-core team is an unfixable bottleneck > in our development process with our current project structure. > The only way I see to remove the bottleneck is to split the virt > drivers out of tree and let them all have their own core teams > in their area of code, leaving current nova core to focus on > all the common code outside the virt driver impls. I, now, none > the less urge people to read the whole mail. > > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they > add up to a big problem. > As many others - I cannot +1 this enough. Some technical comments below that we may want to consider before, but to sum them up - this will be a TON OF WORK! we better make sure we really want to do this before. However - please don't read this as FUD, maybe rather pointing out that devil is in the details, and maybe getting ahead of myself with too deep of a dive. > > - A fairly significant amount of nova code would need to be >considered semi-stable API. 
Certainly everything under nova/virt > and any object which is passed in/out of the virt driver API. > Changes to such APIs would have to be done in a backwards > compatible manner, since it is no longer possible to lock-step > change all the virt driver impls. In some ways I think this would > be a good thing as it will encourage people to put more thought > into the long term maintainability of nova internal code instead > of relying on being able to rip it apart later, at will. > I think we should not underestimate how big of a job this will be. We have been treating that API as internal for a long time and a lot of abstractions are just broken and need to be redesigned and then refactored. A lot of the stuff is implementation specific (live migration is a good example of this). What makes it more difficult is that we need to get this as right as possible before we do the split. Now I am not saying this cannot be done or that we shouldn't do it, however I _am_ saying that we should not take lightly how much work there will be and how fiddly the work itself is. On top of that - there are some other serious issues with nova common code that we need to take care of ASAP, and this will definitely increase the churn and make that more difficult. We should take this into account and make sure we are focusing efforts on the right things. Making sure we do is the biggest challenge nova core faces in addition to all the others mentioned above. > - The nova/virt/driver.py class would need to be much better > specified. All parameters / return values which are opaque dicts > must be replaced with objects + attributes. Completion of the > objectification work is mandatory, so there is cleaner separation > between virt driver impls & the rest of Nova. > Not only that - currently nova-objects do their versioning magic only over RPC, while they would have to do it over library boundaries. This in itself will require work, and is likely going to influence how we stabilize the API.
However - splitting out the scheduler is likely to require objects to be able to do similar things, and there are other things that we may want to do (e.g. using properly versioned data for the extensible resources) that will benefit from this. > - If changes are required to common code, the virt driver developer > would first have to get the necessary pieces merged into Nova > common. Then the follow up virt driver specific changes could be > proposed to their repo. This implies that some changes to virt > drivers will still contend for resource in the common nova repo > and team. This contention should be lower than it is today though > since the current nova core team should have less code to look > after per-person on aggregate. > A handy example of this I can think of is the currently granted FFE for serial consoles - consider how much of the code went into the common
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 4 September 2014 23:48, Russell Bryant wrote: > On 09/04/2014 06:24 AM, Daniel P. Berrange wrote: >> Position statement >> == >> >> Over the past year I've increasingly come to the conclusion that >> Nova is heading for (or probably already at) a major crisis. If >> steps are not taken to avert this, the project is likely to loose >> a non-trivial amount of talent, both regular code contributors and >> core team members. That includes myself. This is not good for >> Nova's long term health and so should be of concern to anyone >> involved in Nova and OpenStack. >> >> For those who don't want to read the whole mail, the executive >> summary is that the nova-core team is an unfixable bottleneck >> in our development process with our current project structure. >> The only way I see to remove the bottleneck is to split the virt >> drivers out of tree and let them all have their own core teams >> in their area of code, leaving current nova core to focus on >> all the common code outside the virt driver impls. I, now, none >> the less urge people to read the whole mail. > > Fantastic write-up. I can't +1 enough the problem statement, which I > think you've done a nice job of framing. We've taken steps to try to > improve this, but none of them have been big enough. I feel we've > reached a tipping point. I think many others do too, and several > proposals being discussed all seem rooted in this same core issue. +1 I totally agree we need to split Nova up further, there just didn't seem to be the support for this before now. Not yet sure the virt drivers are the best split, but we already have sub-teams ready to take them on, so it will probably work for that reason. > If we ignored gerrit for a moment, is rapid increase in splitting out > components the ideal workflow? Would we be better off finding a way to > finally just implement a model more like the Linux kernel with > sub-system maintainers and pull requests to a top-level tree? Maybe. 
> I'm not convinced that split of repos is obviously better. I was thinking along similar lines. Regardless of that, we should try this for Kilo. If it feels like we are getting too much driver divergence, and tempest is not keeping everyone in line, the community is fragmenting and no one is working on the core of nova, then we might have to think about an alternative plan for L, including bringing the drivers back in tree. At least the separate repos will help us firm up the interfaces, which I think is a good thing. I worry about what it means to test a feature in "nova common, nova api, or nova core" or whatever we call it, if there are no virt drivers in tree. To some extent we might want to improve the fake virt driver for some in-tree functional tests anyways. But that's a separate discussion. > I don't think we can afford to wait much longer without drastic change, > so let's make it happen. +1 But I do think we should try and go further... Scheduler: I think we need to split out the scheduler with a similar level of urgency. We keep blocking features on the split, because we know we don't have the review bandwidth to deal with them. Right now I am talking about a compute related scheduler in the compute program, that might evolve to worry about other services at a later date. Nova-network: Maybe there isn't a big enough community to support this right now, but we need to actually delete this, or pull it out of nova-core. API: I suspect we might want to also look at splitting out the API from Nova common too. This one is slightly more drastic, and needs more pre-split work (and is very related to making cells a first class concept), but I am still battling with that inside my head. Oslo: I suspect we may need to do something around the virt utilities, so they are easy to share, but there are probably other opportunities too.
Thanks, John ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
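The concern about testing common code with no real virt driver in tree can be illustrated with a small in-memory stand-in. This is only a sketch of the idea: `FakeVirtDriver` and `boot_all` are hypothetical names made up here, and this is not the code of Nova's own fake driver.

```python
class FakeVirtDriver:
    """Hypothetical in-memory driver: no hypervisor, just bookkeeping."""

    def __init__(self):
        self._states = {}

    def spawn(self, instance_id):
        if instance_id in self._states:
            raise ValueError(f"instance {instance_id} already exists")
        self._states[instance_id] = "running"

    def destroy(self, instance_id):
        # Idempotent: destroying an unknown instance is not an error.
        self._states.pop(instance_id, None)

    def list_instances(self):
        return sorted(self._states)


def boot_all(driver, instance_ids):
    """Stand-in for common code under test; it only talks to the driver API."""
    for instance_id in instance_ids:
        driver.spawn(instance_id)
    return driver.list_instances()
```

A functional test can then exercise `boot_all` (or whatever the real common code is) against the fake, so the common tree stays testable even when every real driver lives in another repository.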
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote: > On Thu, 4 Sep 2014 11:24:29 +0100 > "Daniel P. Berrange" wrote: > > > > - A fairly significant amount of nova code would need to be > >considered semi-stable API. Certainly everything under nova/virt > >and any object which is passed in/out of the virt driver API. > >Changes to such APIs would have to be done in a backwards > >compatible manner, since it is no longer possible to lock-step > >change all the virt driver impls. In some ways I think this would > >be a good thing as it will encourage people to put more thought > >into the long term maintainability of nova internal code instead > >of relying on being able to rip it apart later, at will. > > > > - The nova/virt/driver.py class would need to be much better > >specified. All parameters / return values which are opaque dicts > >must be replaced with objects + attributes. Completion of the > >objectification work is mandatory, so there is cleaner separation > >between virt driver impls & the rest of Nova. > > I think for this to work well with multiple repositories and drivers > having different priorities over implementing changes in the API it > would not just need to be semi-stable, but stable with versioning built > in from the start to allow for backwards incompatible changes. And > the interface would have to be very well documented including things > such as what exceptions are allowed to be raised through the API. > Hopefully this would be enforced through code as well. But as long as > driver maintainers are willing to commit to this extra overhead I can > see it working. With our primary REST or RPC APIs we're under quite strict rules about what we can & can't change - almost impossible to remove an existing API from the REST API for example. With the internal virt driver API we would probably have a little more freedom. 
For example, I think if we found an existing virt driver API that was insufficient for a new bit of work, we could add a new API in parallel with it, give the virt drivers 1 dev cycle to convert, and then permanently delete the original virt driver API. So a combination of that kind of API replacement, versioning for some data structures/objects, and use of the capabilities flags would probably be sufficient. That's what I mean by semi-stable here - no need to maintain existing virt driver APIs indefinitely - we can remove & replace them in reasonably short time scales as long as we avoid any lock-step updates. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
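The "add a new API in parallel, give drivers one cycle to convert, then delete the old one" idea, combined with capability flags, might be sketched roughly as follows. All names (`VirtDriverBase`, `attach_volume_v2`, `supports_attach_volume_v2`, `call_attach`) are hypothetical and chosen for illustration; nothing here is taken from nova/virt/driver.py.

```python
import warnings


class VirtDriverBase:
    """Hypothetical driver base class; names are illustrative only."""

    # Capability flags let common code ask what a driver supports
    # instead of assuming every driver converted at the same time.
    capabilities = {"supports_attach_volume_v2": False}

    def attach_volume(self, instance, volume):
        """Old API: kept for one cycle, then deleted."""
        raise NotImplementedError

    def attach_volume_v2(self, instance, volume, *, encrypted=False):
        """New API, added in parallel with the old one."""
        raise NotImplementedError


def call_attach(driver, instance, volume, encrypted=False):
    """Common-code side: prefer the new API, fall back with a warning."""
    if driver.capabilities.get("supports_attach_volume_v2"):
        return driver.attach_volume_v2(instance, volume, encrypted=encrypted)
    warnings.warn(
        "attach_volume is deprecated; convert to attach_volume_v2 "
        "within one cycle",
        DeprecationWarning,
        stacklevel=2,
    )
    if encrypted:
        raise NotImplementedError("driver has not converted to the new API")
    return driver.attach_volume(instance, volume)


class OldDriver(VirtDriverBase):
    """A driver that has not converted yet."""

    def attach_volume(self, instance, volume):
        return f"old:{instance}:{volume}"


class NewDriver(VirtDriverBase):
    """A driver that already implements the replacement API."""

    capabilities = {"supports_attach_volume_v2": True}

    def attach_volume_v2(self, instance, volume, *, encrypted=False):
        return f"new:{instance}:{volume}:{encrypted}"
```

The point of the pattern is that no lock-step update is needed: during the transition cycle both kinds of driver work against the same common code, and the old method can be removed once the cycle ends.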
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, 4 Sep 2014 12:57:57 -0700 Joe Gordon wrote: > > Overall I do think we need to re-think how the review burden is > distributed. That being said, this is a nice proposal but I am not > sure if it moves the review burden around enough or is the right > approach. Do you have any rough numbers on what percent of the review > burden goes to virt drivers today (however you want to define that > statement, number of merged patches, man hours, lines of code, number > of reviews etc.)? If for example today the nova review team spends > 10% of their review time on virt drivers then I don't think this > proposal will have a significant impact on the review backlog (for > nova-common). Even if it doesn't have a huge impact on the review backlog for nova-common (I think it should at least help a bit) it does have the potential to make life much easier for the virt driver developers. I think my main concern is around testing - as soon as we have multiple repositories involved I think debugging of test failures (especially races) tends to get more complicated and we have fewer people who are familiar enough with the two code bases. Chris
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, 4 Sep 2014 11:24:29 +0100 "Daniel P. Berrange" wrote: > > - A fairly significant amount of nova code would need to be >considered semi-stable API. Certainly everything under nova/virt >and any object which is passed in/out of the virt driver API. >Changes to such APIs would have to be done in a backwards >compatible manner, since it is no longer possible to lock-step >change all the virt driver impls. In some ways I think this would >be a good thing as it will encourage people to put more thought >into the long term maintainability of nova internal code instead >of relying on being able to rip it apart later, at will. > > - The nova/virt/driver.py class would need to be much better >specified. All parameters / return values which are opaque dicts >must be replaced with objects + attributes. Completion of the >objectification work is mandatory, so there is cleaner separation >between virt driver impls & the rest of Nova. I think for this to work well with multiple repositories and drivers having different priorities over implementing changes in the API it would not just need to be semi-stable, but stable with versioning built in from the start to allow for backwards incompatible changes. And the interface would have to be very well documented including things such as what exceptions are allowed to be raised through the API. Hopefully this would be enforced through code as well. But as long as driver maintainers are willing to commit to this extra overhead I can see it working. Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
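Two of the points above, versioning built in from the start and enforcing in code which exceptions may cross the API, can be sketched as below. The decorator `raises_only`, the `ContractViolation` error, and the version rule in `check_compatible` are all assumptions invented for this example, not an existing Nova mechanism.

```python
import functools


class DriverError(Exception):
    """Base class for errors a driver may raise through the API."""


class InstanceNotFound(DriverError):
    pass


class ContractViolation(RuntimeError):
    """Raised when a driver leaks an exception the API does not document."""


def raises_only(*allowed):
    """Declare, and enforce, the exceptions a driver method may raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except allowed:
                raise  # documented exception: pass through unchanged
            except Exception as exc:
                raise ContractViolation(
                    f"{func.__name__} leaked {type(exc).__name__}") from exc
        wrapper.allowed_exceptions = allowed
        return wrapper
    return decorator


API_VERSION = (1, 2)  # hypothetical (major, minor) of the driver interface


def check_compatible(driver_version, api_version=API_VERSION):
    """Assumed rule: same major version, driver minor not newer than API."""
    return (driver_version[0] == api_version[0]
            and driver_version[1] <= api_version[1])


class ExampleDriver:
    @raises_only(InstanceNotFound)
    def power_off(self, instance_id):
        if instance_id != "i-1":
            raise InstanceNotFound(instance_id)
        return "stopped"

    @raises_only(InstanceNotFound)
    def buggy(self, instance_id):
        return {}["missing"]  # KeyError is not part of the contract
```

The decorator doubles as documentation: `ExampleDriver.power_off.allowed_exceptions` can be read by tooling, and an undocumented exception is reported as a contract violation rather than reaching common code unexpectedly.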
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 06:48:33PM -0400, Russell Bryant wrote: > On 09/04/2014 06:24 AM, Daniel P. Berrange wrote: > > Position statement > > == > > > > Over the past year I've increasingly come to the conclusion that > > Nova is heading for (or probably already at) a major crisis. If > > steps are not taken to avert this, the project is likely to loose > > a non-trivial amount of talent, both regular code contributors and > > core team members. That includes myself. This is not good for > > Nova's long term health and so should be of concern to anyone > > involved in Nova and OpenStack. > > > > For those who don't want to read the whole mail, the executive > > summary is that the nova-core team is an unfixable bottleneck > > in our development process with our current project structure. > > The only way I see to remove the bottleneck is to split the virt > > drivers out of tree and let them all have their own core teams > > in their area of code, leaving current nova core to focus on > > all the common code outside the virt driver impls. I, now, none > > the less urge people to read the whole mail. > > Fantastic write-up. I can't +1 enough the problem statement, which I > think you've done a nice job of framing. We've taken steps to try to > improve this, but none of them have been big enough. I feel we've > reached a tipping point. I think many others do too, and several > proposals being discussed all seem rooted in this same core issue. > > When it comes to the proposed solution, I'm +1 on that too, but part of > that is that it's hard for me to ignore the limitations placed on us by > our current review infrastructure (gerrit). > > If we ignored gerrit for a moment, is rapid increase in splitting out > components the ideal workflow? Would we be better off finding a way to > finally just implement a model more like the Linux kernel with > sub-system maintainers and pull requests to a top-level tree? Maybe. 
> I'm not convinced that split of repos is obviously better. > > You make some good arguments for why splitting has other benefits. For a long time I've used the LKML 'subsystem maintainers' model as the reference point for ideas. In a more LKML like model, each virt team (or other subsystem team) would have their own separate GIT repo with a complete Nova codebase, where they did their day to day code submissions, reviews and merges. Periodically the primary subsystem maintainer would submit a large pull / merge request to the overall Nova maintainer. The $1,000,000 question in such a model is what kind of code review happens during the big pull requests to integrate subsystem trees. The closest example I can see is what's happening with the Ironic driver merge reviews. I'm personally finding review of that to be quite a burdensome activity, because all comments on the merge review then get fed back to the original maintainers who do a new round of patch + review in Ironic tree and then we get a new version submitted back to nova tree for merge. Rinse, repeat. So my biggest fear with a model where each team had their own full Nova tree and did large pull requests, is that we'd suffer major pain during the merging of large pull requests, especially if any of the merges touched common code. It could make the pull requests take a really long time to get accepted into the primary repo. By contrast with split out git repos per virt driver code, we will only ever have 1 stage of code review for each patch. Changes to common code would go straight to main nova common repo and so get reviewed by the experts there without delay, avoiding the 2nd stage of review from merge requests. The more I think about this, the more attracted I am to the idea that separate repos will facilitate us doing more targeted testing and allow 3rd party CI to become gating over their respective virt driver codebases.
Finally the LKML model would still leave some drivers at a disadvantage for development, if they're not able to meet the standards we require in terms of CI testing, to be accepted into the primary repo. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 12:57:57PM -0700, Joe Gordon wrote: > On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange > wrote: > > Proposal / solution > > === > > > > In the past Nova has spun out its volume layer to form the cinder > > project. The Neutron project started as an attempt to solve the > > networking space, and ultimately replace the nova-network. It > > is likely that the schedular will be spun out to a separate project. > > > > Now Neutron itself has grown so large and successful that it is > > considering going one step further and spinning its actual drivers > > out of tree into standalone add-on projects [4]. I've heard on the > > grapevine that Ironic is considering similar steps for hardware > > drivers. > > > > The radical (?) solution to the nova core team bottleneck is thus to > > follow this lead and split the nova virt drivers out into separate > > projects and delegate their maintainence to new dedicated teams. > > > > - Nova becomes the home for the public APIs, RPC system, database > >persistent and the glue that ties all this together with the > >virt driver API. > > > > - Each virt driver project gets its own core team and is responsible > >for dealing with review, merge & release of their codebase. > > > > Overall I do think we need to re-think how the review burden is > distributed. That being said, this is a nice proposal but I am not sure if > it moves the review burden around enough or is the right approach. Do you > have any rough numbers on what percent of the review burden goes to virt > drivers today (how ever you want to define that statement, number of merged > patches, man hours, lines of code, number of reviews etc.). If for example > today the nova review team spends 10% of there review time on virt drivers > then I don't think this proposal will have a significant impact on the > review backlog (for nova-common). 
I'm a little wary of doing too many stats on things like reviews and patches, because I fear it does not capture the full picture. Specifically we're turning away contributors before they ever get to the point of submitting reviews / patches, by rejecting their blueprints/specs. Also the difficulty of getting stuff reviewed is discouraging people from even considering doing a lot of work in the first place - if I had had the confidence in getting it reviewed & merged I would easily have submitted twice as much code to libvirt this cycle, but as it was I didn't even start work on most things I would have liked to. That said though, in the past 6 months we had 1385 changes merged. Of those, 437 touched at least one file in the /virt/ directory which is approximately 30%. I agree though, this proposal will not have a dramatic effect on the review backlog for the nova common code. It would probably be a small (but noticeable) improvement - most of the benefit would fall on the virt drivers I expect. If we can make Nova a more productive & enjoyable place to contribute though, this should ultimately feed through into more people being involved in general and thus more resource available to nova common too. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
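For anyone who wants to redo the arithmetic behind the "approximately 30%" figure, a small worked example follows. The two counts are the ones quoted in the mail above; the helper name `share` is made up for this sketch.

```python
def share(part: int, total: int) -> float:
    """Fraction of `total` represented by `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


merged_total = 1385   # changes merged in the past 6 months (from the mail)
touched_virt = 437    # of those, changes touching a file under /virt/

virt_share = share(touched_virt, merged_total)
not_virt = merged_total - touched_virt

print(f"virt share of merged changes: {virt_share:.1%}")
print(f"changes not touching /virt/: {not_virt}")
```

This prints a virt share of 31.6% and 948 remaining changes. Note that a change touching /virt/ may also touch common code, so this is a count of merged changes and says nothing directly about review time.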
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 10:44:17PM -0600, John Griffith wrote:
> Just some thoughts and observations I've had regarding this topic in Cinder
> the past couple of years. I realize this is a Nova thread so hopefully
> some of this can be applied in a more general context.
>
> TLDR:
> 1. I think moving drivers into their own repo is just shoveling the pile to
> make a new pile (not really solving anything)

I'm not familiar with Cinder, but for Nova it would certainly have clear benefits and not merely be shoveling the pile. Specifically it would

 - Easily let us double the number of "core" reviewers in aggregate
 - Lower the bar for getting into a driver core team, thus increasing the talent pool we can promote from.
 - Work accepted in a release for one driver would not reduce the bandwidth for another driver to accept work, since their review teams are separate
 - We can have more targeted testing, which will reduce the number of bogus gate failures people get when submitting reviews and allow every driver to have gating CI jobs without impacting the other drivers

> 2. Removal of drivers other than the reference implementation for each
> project could be the healthiest option
> a. Requires transparent, public, automated 3'rd party CI
> b. Requires a TRUE plugin architecture and mentality
> c. Requires a stable and well defined API

As mentioned in the original mail I don't want to see a situation where we end up with some drivers in tree and others out of tree as it sets up bad dynamics within the project. Those out of tree will always have the impression of being second class citizens and thus there will be constant pressure to accept drivers back into tree. The so called 'reference' driver that stayed in tree would also continue to be penalized in the way it is today, and so its development would be disadvantaged compared to the out of tree drivers.

> 3.
While I'm still sort of a fan of the removal of drivers, I do think > Cinder is "making it work", there have been missteps and yes it's a pain > sometimes but it's working "ok" and we've got plans to try and improve > > 4. Adding restrictions like drivers only in first milestone and more > intense scrutinization of features will go a long way to help resolve the > issues we do have currently Not in nova at least. We have a fundamental bottleneck in nova and simply re-arranging review priorities in this kind of way will never fix it. We've tried many different approaches to prioritization of work and the only result is that we've got more aggressive at saying no to contributors. This is directly resulting in the crisis we have today. > I've spent a fair amount of time thinking about the explosive number of > drivers being added to Cinder over the past year or so. I've been a pretty > vocal proponent of the idea of removing all drivers except the LVM > reference implementation from Cinder. I'd rather see Vendors drivers > maintained in their own Github Repo and truly follow a "plugin" model. > This of course means that Cinder has to be truly designed and maintained > with a real plugin architecture kept in mind in every aspect of development > (experience proves this harder to do than it sounds). I think with things > stable and well defined interfaces as well as 3'rd party CI this is > actually a reasonable approach and could be effective. I do not see how > creating a separate repo and in essence yet another set of OpenStack > Projects really helps with the problem. The fact is that the biggest issue > most people see with driver contributions is those that are made by > organizations that work on their driver only and don't contribute back to > the core project (whether that be in the form of reviews of core > contributions). I'm not sure I understand why that would be any different > by just putting the code in a separate bucket. 
> In other words, getting a
> solid and consistent team working on that "project" seems like you've just
> kicked the can down the road so you don't have to deal with it.

Fundamentally people contributing to a project are doing so voluntarily to scratch their own itch. The project leadership can help identify areas that need work and encourage people to take up the challenge, but you cannot force people to do the work. We've done many things in nova that are basically inflicting a form of punishment on contributors if they don't work on things we tell them to work on. This is not having a positive effect, on the contrary it is resulting in a lot of demotivated and pissed off contributors who are ultimately leaving the project.

I agree that splitting the virt drivers out into their own repositories is not going to hugely help get more people to work on Nova core - that was not the primary intention. The big focus is on unblocking development of the virt drivers so that their contributors actually feel their efforts are valued by the project. If we make the project a more attracti
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 02:56:04PM -0500, Kyle Mestery wrote: > On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange > wrote: > > Proposal / solution > > === > > > > In the past Nova has spun out its volume layer to form the cinder > > project. The Neutron project started as an attempt to solve the > > networking space, and ultimately replace the nova-network. It > > is likely that the schedular will be spun out to a separate project. > > > > Now Neutron itself has grown so large and successful that it is > > considering going one step further and spinning its actual drivers > > out of tree into standalone add-on projects [4]. I've heard on the > > grapevine that Ironic is considering similar steps for hardware > > drivers. > > > I just wanted to note that this is a huge problem in Neutron, and it > gets worse with each release as we add on more drivers and plugins > which carry a maintenance cost without gaining any new reviewers from > the companies who have the drivers. The rough plan I have for Neutron > involves moving all non-Open Source drivers out of tree into a > separate git repository. Your message has made me think that perhaps > we in Neutron should go one step further and even remove the Open > Source drivers, leaving the in-tree implementation as the only one > there. Where we move these is the main issue. Given we have 20+ > drivers/plugins now, one git repository per driver/plugin won't scale, > as we add 3-5 each cycle. So perhaps a single repository is the best > idea here, with shared reviews from vendors across each other's code. While I'll make no secret of my dislike for closed source software, my feeling is that OpenStack as a project is explicitly welcoming closed source software & vendors, not least by virtue of using a more permissive Apache license instead of a strong copyleft license like GPL. So given the project's stance, I'd not be in favour of discriminating against drivers for closed source software. 
In actual fact though, the premise of my proposal is the idea that moving a driver out of tree will actually help its development by giving its team much greater freedom & responsibility. So by only moving out non-open source drivers, we'd arguably be putting the in-tree open source drivers at a disadvantage! I'm also very much drawn to the idea that having separate repos will let us do more targeted setup of CI test jobs, so each test job is actually directly relevant to the code being tested.

I can see your concern about the number of drivers you have in Neutron and the frequency with which more are added. We don't have anywhere near this number in Nova and are not likely to ever grow that much. If you did have 30 separate drivers and thus 30 separate git repos though, the question to consider is who is ultimately responsible for reviewing those drivers. If each of those 30 drivers had its own self-organized team of people, the burden of 30 repos is not as bad as it seems, since any one person would probably only be concerned with a couple of git repos. If you still see the single neutron core team being responsible for each of those repos, then I can see that having 30 repos would be a big burden.

I don't think there is a single right answer here for all OpenStack projects. It is entirely conceivable that it might be best for Neutron to have a single repo for a set of drivers, while being best for Nova to have a separate repo for each driver.

Regards, Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 05/09/2014 01:22, Michael Still wrote:
> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote:
>
> [Heavy snipping because of length]
>
> > The radical (?) solution to the nova core team bottleneck is thus to
> > follow this lead and split the nova virt drivers out into separate
> > projects and delegate their maintenance to new dedicated teams.
> >
> > - Nova becomes the home for the public APIs, RPC system, database
> >   persistence and the glue that ties all this together with the
> >   virt driver API.
> >
> > - Each virt driver project gets its own core team and is responsible
> >   for dealing with review, merge & release of their codebase.
>
> I think this is the crux of the matter. We're not doing a great job of landing code at the moment, because we can't keep up with the review workload.
>
> So far we've had two proposals mooted:
>
> - slots / runways, where we try to rate limit the number of things we're trying to review at once to maintain focus
> - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo:
 - create subteams of "half-cores" responsible for reviewing patch iterations, who send approval requests to cores once they consider a patch stable enough.

As I explained, it would free up reviewing time for cores without losing control over what is being merged.

-Sylvain

> Splitting the drivers out of the nova tree does come at a cost -- we'd need to stabilise and probably version the hypervisor driver interface, and that will encourage more "out of tree" drivers, which are things we haven't historically wanted to do. If we did this split, I think we need to acknowledge that we are changing policy there. It also means that nova-core wouldn't be the ones holding the quality bar for hypervisor drivers any more. I guess this would open the door for drivers to more actively compete on the quality of their implementations, which might be a good thing.
>
> Both of these have interesting aspects, and I agree we need to do _something_. I do wonder if there is a hybrid approach as well though. For example, could we implement some sort of more formal lieutenant system for drivers? We've talked about it in the past but never been able to express how it would work in practice.
>
> The last few days have been interesting as I watch FFEs come through. People post explaining their feature, its importance, and the risk associated with it. Three cores sign on for review. All of the ones I've looked at have received active review since being posted. Would it be bonkers to declare nova to be in "permanent feature freeze"? If we could maintain the level of focus we see now, then we'd be getting heaps more done than before.
>
> These issues should very definitely be on the agenda for the design summit, probably early in the week.
>
> Michael
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 05/09/2014 01:26, Jay Pipes wrote:
> On 09/04/2014 10:33 AM, Dugger, Donald D wrote:
> > Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem).
>
> The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following:
>
> == begin ==
>
> - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova.
>
> == end ==
>
> In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible.
>
> Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO.

Jay, we agreed on a plan to carry on, please rest assured that we're working on it; see the Gantt meeting logs for what my vision is.

That said, I think this concern about clean interfaces also applies to this thread: if we want to spin the virt drivers out of the Nova git repo, that does require a cleanup of the interfaces, in particular in the compute manager and the resource tracker, where a lot of bits are still strongly tied together and not versioned (thanks to JSON dicts).

So, this effort requires at least one cycle, and as Dan stated, there is urgency, so I think we need to identify a short-term solution which doesn't require refactoring. My personal opinion is what Russell and Thierry expressed, i.e. subteam delegation (to what I call "half-cores") for iterations and only approvals for cores.
-Sylvain Best, -jay
[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 4, 2014 at 4:32 PM, Jay Pipes wrote: > > > On 09/04/2014 12:11 PM, Duncan Thomas wrote: > >> I think that having a shared review team across all of the drivers >> has definite benefits in terms of coherency and consistency - it is >> very easy for experts on one technology to become tunnel-visioned on >> some points and miss the wider, cross project picture. A common >> drivers team is likely to have a broad enough range of opinions to >> keep things healthy, compared to one repo (and team) per driver, and >> also they are able to speak collectively to teh core nova team, which >> helps set priorities there when they need to be influenced on behalf >> of the drivers team. >> > > In theory, the above sounds good. In practice, it doesn't happen. The code > in the virt drivers is horribly inconsistent, duplicative and yet slightly > and pointlessly different, and uses paradigms that make sense for the one > platform but don't necessarily make sense for another platform. > > The testing/CI benefits that Dan highlighted -- in terms of patches to > non-related virt drivers not interfering with the stability and progress of > a patch to another virt driver -- is the #1 critical benefit to Dan's > proposal, and doing a single virt drivers core team and repo totally throws > that benefit away. > > Best, > -jay > > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > Just some thoughts and observations I've had regarding this topic in Cinder the past couple of years. I realize this is a Nova thread so hopefully some of this can be applied in a more general context. TLDR: 1. I think moving drivers into their own repo is just shoveling the pile to make a new pile (not really solving anything) 2. Removal of drivers other than the reference implementation for each project could be the healthiest option a. Requires transparent, public, automated 3'rd party CI b. 
Requires a TRUE plugin architecture and mentality c. Requires a stable and well defined API 3. While I'm still sort of a fan of the removal of drivers, I do think Cinder is "making it work", there have been missteps and yes it's a pain sometimes but it's working "ok" and we've got plans to try and improve 4. Adding restrictions like drivers only in first milestone and more intense scrutinization of features will go a long way to help resolve the issues we do have currently Now the long winded version with a little more detail and context; I've spent a fair amount of time thinking about the explosive number of drivers being added to Cinder over the past year or so. I've been a pretty vocal proponent of the idea of removing all drivers except the LVM reference implementation from Cinder. I'd rather see Vendors drivers maintained in their own Github Repo and truly follow a "plugin" model. This of course means that Cinder has to be truly designed and maintained with a real plugin architecture kept in mind in every aspect of development (experience proves this harder to do than it sounds). I think with things stable and well defined interfaces as well as 3'rd party CI this is actually a reasonable approach and could be effective. I do not see how creating a separate repo and in essence yet another set of OpenStack Projects really helps with the problem. The fact is that the biggest issue most people see with driver contributions is those that are made by organizations that work on their driver only and don't contribute back to the core project (whether that be in the form of reviews of core contributions). I'm not sure I understand why that would be any different by just putting the code in a separate bucket. In other words, getting a solid and consistent team working on that "project" seems like you've just kicked the can down the road so you don't have to deal with it. 
Any time I've mentioned the removal approach the response is typically that there's no quality control, or that Vendors won't be as willing to invest in OpenStack because they can focus on their own interests and get by with that. The quality control one was a tough one to counter, but now that we're moving towards things like 3'rd party CI I'm not sure that's quite as significant as it was a year ago. I'd still like to see a public record of testing in the form of CI, NOT just Vendor-A submitting something that says.. "yeah, I'm awesome". I suspect that OpenStack adopters would look at things like public CI postings to determine what's worth pursuing and what's not. The other concern I had in the past was "we'd loose valuable contributors". There are vendors that are directly responsible for providing us with some great contributors in the Core of the Cinder project. They do a great job of balancing the tactical and strategic interests, and the concern is that if the driv
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
- Original Message -
> On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:
> > +1
> >
> > I very much agree with Dan's proposal.
> >
> > I am concerned about difficulties we will face with merging
> > patches that spread across various regions: manager, conductor,
> > scheduler, etc.
> > However, I think this is a small price to pay for having more focused
> > teams.
> >
> > IMO, we will still have to pay it, the moment the scheduler separates.
>
> There will be more pain the moment the scheduler separates, IMO,
> especially with its current design and interfaces.

I absolutely agree that the scheduler split is a non-starter without stabilizing all of the relevant interfaces. I hope there's not much debate on that high level point. Of course, identifying exactly what those interfaces should be is a bit more complicated, but I hope the focus can stay there.

--
Russell Bryant
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 12:11 PM, Duncan Thomas wrote:
> I think that having a shared review team across all of the drivers
> has definite benefits in terms of coherency and consistency - it is
> very easy for experts on one technology to become tunnel-visioned on
> some points and miss the wider, cross project picture. A common
> drivers team is likely to have a broad enough range of opinions to
> keep things healthy, compared to one repo (and team) per driver, and
> also they are able to speak collectively to the core nova team, which
> helps set priorities there when they need to be influenced on behalf
> of the drivers team.

In theory, the above sounds good. In practice, it doesn't happen. The code in the virt drivers is horribly inconsistent, duplicative and yet slightly and pointlessly different, and uses paradigms that make sense for the one platform but don't necessarily make sense for another platform.

The testing/CI benefit that Dan highlighted -- patches to unrelated virt drivers not interfering with the stability and progress of a patch to another virt driver -- is the #1 critical benefit of Dan's proposal, and doing a single virt drivers core team and repo totally throws that benefit away.

Best,
-jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 10:33 AM, Dugger, Donald D wrote:
> Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem).

The difference between Dan's proposal and the Gantt split is that Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better specified. All parameters / return values which are opaque dicts must be replaced with objects + attributes. Completion of the objectification work is mandatory, so there is cleaner separation between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying needs to be done to the interfaces between nova-conductor, nova-compute, and nova-scheduler *before* any split of the scheduler code is even remotely feasible.

Splitting the scheduler out before this is done would actually not "help but not solve this problem" -- it would instead further the problem, IMO.

Best,
-jay
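The "opaque dicts must be replaced with objects + attributes" requirement quoted above can be illustrated with a minimal sketch. This is not Nova code: `DiskInfo`, `attach_disk_dict` and `attach_disk_obj` are hypothetical names invented for the example, and the real nova/virt/driver.py methods and fields differ.

```python
from dataclasses import dataclass


def attach_disk_dict(disk):
    # Opaque dict: the keys the driver relies on are an unwritten contract.
    # A caller that omits or misspells 'size_gb' only fails here, at runtime.
    return f"attach {disk['path']} ({disk['size_gb']} GB)"


@dataclass(frozen=True)
class DiskInfo:
    # Hypothetical object: the declared fields are the documented contract.
    path: str
    size_gb: int


def attach_disk_obj(disk: DiskInfo) -> str:
    return f"attach {disk.path} ({disk.size_gb} GB)"


print(attach_disk_dict({"path": "/dev/vdb", "size_gb": 10}))
print(attach_disk_obj(DiskInfo(path="/dev/vdb", size_gb=10)))

# A malformed argument is rejected when the object is constructed, before
# it reaches the driver; the dict version would fail inside the driver.
try:
    DiskInfo(path="/dev/vdb")  # missing size_gb
except TypeError as exc:
    print("rejected:", exc)
```

The point of the sketch is only that an object with named attributes gives both sides of the driver boundary one definition to agree on, which is a precondition for keeping that boundary stable across releases.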
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote:

[Heavy snipping because of length]

> The radical (?) solution to the nova core team bottleneck is thus to
> follow this lead and split the nova virt drivers out into separate
> projects and delegate their maintenance to new dedicated teams.
>
> - Nova becomes the home for the public APIs, RPC system, database
>   persistence and the glue that ties all this together with the
>   virt driver API.
>
> - Each virt driver project gets its own core team and is responsible
>   for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of landing code at the moment, because we can't keep up with the review workload.

So far we've had two proposals mooted:

 - slots / runways, where we try to rate limit the number of things we're trying to review at once to maintain focus
 - splitting all the virt drivers out of the nova tree

Splitting the drivers out of the nova tree does come at a cost -- we'd need to stabilise and probably version the hypervisor driver interface, and that will encourage more "out of tree" drivers, which are things we haven't historically wanted to do. If we did this split, I think we need to acknowledge that we are changing policy there. It also means that nova-core wouldn't be the ones holding the quality bar for hypervisor drivers any more. I guess this would open the door for drivers to more actively compete on the quality of their implementations, which might be a good thing.

Both of these have interesting aspects, and I agree we need to do _something_. I do wonder if there is a hybrid approach as well though. For example, could we implement some sort of more formal lieutenant system for drivers? We've talked about it in the past but never been able to express how it would work in practice.

The last few days have been interesting as I watch FFEs come through. People post explaining their feature, its importance, and the risk associated with it. Three cores sign on for review. All of the ones I've looked at have received active review since being posted. Would it be bonkers to declare nova to be in "permanent feature freeze"? If we could maintain the level of focus we see now, then we'd be getting heaps more done than before.

These issues should very definitely be on the agenda for the design summit, probably early in the week.

Michael

--
Rackspace Australia
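The cost mentioned above, that the hypervisor driver interface would need to be stabilised and probably versioned, could look roughly like the following. This is a hypothetical sketch only: `CORE_API_VERSION` and `is_compatible` do not exist in Nova, and the compatibility policy shown (same major version, driver minor no newer than the core's) is an assumption made for illustration.

```python
# Hypothetical sketch: none of these names exist in Nova.
CORE_API_VERSION = (2, 1)  # (major, minor) of the interface the core offers


def is_compatible(driver_built_against, core_version=CORE_API_VERSION):
    """Return True if a driver built against one version can run on the core.

    Assumed policy: the major version must match exactly (breaking changes
    bump it), and the core's minor version must be at least the driver's
    (minor bumps only add to the interface).
    """
    drv_major, drv_minor = driver_built_against
    core_major, core_minor = core_version
    return drv_major == core_major and drv_minor <= core_minor


print(is_compatible((2, 0)))  # same major, older minor
print(is_compatible((2, 3)))  # driver expects additions the core lacks
print(is_compatible((1, 9)))  # different major version
```

Under a scheme of this kind, an out-of-tree driver would state which interface version it was written against, and the core could refuse to load it on a mismatch instead of failing later in an arbitrary code path.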
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 09:36 AM, Gary Kotton wrote:
> Hi,
> I do not think that Nova is in a death spiral. I just think that the current way of working at the moment is strangling the project. I do not understand why we need to split drivers out of the core project. Why not have the ability to provide 'core review' status to people for reviewing those parts of the code? We have enough talented people in OpenStack to be able to write a driver above gerrit to enable that.

Clearly you have never looked at the Gerrit source code. :)

-jay
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 06:24 AM, Daniel P. Berrange wrote: > Position statement > == > > Over the past year I've increasingly come to the conclusion that > Nova is heading for (or probably already at) a major crisis. If > steps are not taken to avert this, the project is likely to loose > a non-trivial amount of talent, both regular code contributors and > core team members. That includes myself. This is not good for > Nova's long term health and so should be of concern to anyone > involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive > summary is that the nova-core team is an unfixable bottleneck > in our development process with our current project structure. > The only way I see to remove the bottleneck is to split the virt > drivers out of tree and let them all have their own core teams > in their area of code, leaving current nova core to focus on > all the common code outside the virt driver impls. I, now, none > the less urge people to read the whole mail. Fantastic write-up. I can't +1 enough the problem statement, which I think you've done a nice job of framing. We've taken steps to try to improve this, but none of them have been big enough. I feel we've reached a tipping point. I think many others do too, and several proposals being discussed all seem rooted in this same core issue. When it comes to the proposed solution, I'm +1 on that too, but part of that is that it's hard for me to ignore the limitations placed on us by our current review infrastructure (gerrit). If we ignored gerrit for a moment, is rapid increase in splitting out components the ideal workflow? Would we be better off finding a way to finally just implement a model more like the Linux kernel with sub-system maintainers and pull requests to a top-level tree? Maybe. I'm not convinced that split of repos is obviously better. You make some good arguments for why splitting has other benefits. 
Besides, even if we weren't going to split them and instead wanted to have separate branches, we'd have to take interface stability much more seriously. I think the work immediately needed overlaps quite a bit.

In any case, let's not get completely side-tracked on the end game workflow. I am completely on board with the idea that we have to move to a model that involves more than one team and spreading out the responsibility further than we have thus far. I don't think we can afford to wait much longer without drastic change, so let's make it happen.

--
Russell Bryant
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange wrote: > Position statement > == > > Over the past year I've increasingly come to the conclusion that > Nova is heading for (or probably already at) a major crisis. If > steps are not taken to avert this, the project is likely to loose > a non-trivial amount of talent, both regular code contributors and > core team members. That includes myself. This is not good for > Nova's long term health and so should be of concern to anyone > involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive > summary is that the nova-core team is an unfixable bottleneck > in our development process with our current project structure. > The only way I see to remove the bottleneck is to split the virt > drivers out of tree and let them all have their own core teams > in their area of code, leaving current nova core to focus on > all the common code outside the virt driver impls. I, now, none > the less urge people to read the whole mail. > > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they > add up to a big problem. > > Core team burn out > -- > > Having been involved in Nova for several dev cycles now, it is clear > that the backlog of code up for review never goes away. Even > intensive code review efforts at various points in the dev cycle > makes only a small impact on the backlog. This has a pretty > significant impact on core team members, as their work is never > done. At best, the dial is sometimes set to 10, instead of 11. 
> > Many people, myself included, have built tools to help deal with > the reviews in a more efficient manner than plain gerrit allows > for. These certainly help, but they can't ever solve the problem > on their own - just make it slightly more bearable. And this is > not even considering that core team members might have useful > contributions to make in ways beyond just code review. Ultimately > the workload is just too high to sustain the levels of review > required, so core team members will eventually burn out (as they > have done many times already). > > Even if one person attempts to take the initiative to heavily > invest in review of certain features it is often to no avail. > Unless a second dedicated core reviewer can be found to 'tag > team' it is hard for one person to make a difference. The end > result is that a patch is +2d and then sits idle for weeks or > more until a merge conflict requires it to be reposted at which > point even that one +2 is lost. This is a pretty demotivating > outcome for both reviewers & the patch contributor. > > > New core team talent > > > It can't escape attention that the Nova core team does not grow > in size very often. When Nova was younger and its code base was > smaller, it was easier for contributors to get onto core because > the base level of knowledge required was that much smaller. To > get onto core today requires a major investment in learning Nova > over a year or more. Even people who potentially have the latent > skills may not have the time available to invest in learning the > entire of Nova. > > With the number of reviews proposed to Nova, the core team should > probably be at least double its current size[1]. There is plenty of > expertize in the project as a whole but it is typically focused > into specific areas of the codebase. There is nowhere we can find > 20 more people with broad knowledge of the codebase who could be > promoted even over the next year, let alone today. 
This is ignoring > that many existing members of core are relatively inactive due to > burnout and so need replacing. That means we really need another > 25-30 people for core. That's not going to happen. > > > Code review delays > -- > > The obvious result of having too much work for too few reviewers > is that code contributors face major delays in getting their work > reviewed and merged. From personal experience, during Juno, I've > probably spent 1 week in aggregate on actual code development vs > 8 weeks on waiting on code review. You have to constantly be on > alert for review comments because unless you can respond quickly > (and repost) while you still have the attention of the reviewer, > they may not be look again for days/weeks. > > The length of time to get work merged serves as a demotivator to > actually do work in the first place. I've personally avoided doing > alot of code refactoring & cleanup work that would improve the > maintainability of th
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange wrote: > Position statement > == > > Over the past year I've increasingly come to the conclusion that > Nova is heading for (or probably already at) a major crisis. If > steps are not taken to avert this, the project is likely to loose > a non-trivial amount of talent, both regular code contributors and > core team members. That includes myself. This is not good for > Nova's long term health and so should be of concern to anyone > involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive > summary is that the nova-core team is an unfixable bottleneck > in our development process with our current project structure. > The only way I see to remove the bottleneck is to split the virt > drivers out of tree and let them all have their own core teams > in their area of code, leaving current nova core to focus on > all the common code outside the virt driver impls. I, now, none > the less urge people to read the whole mail. > As others have said, thanks for writing this up Daniel. > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they > add up to a big problem. > > Core team burn out > -- > > Having been involved in Nova for several dev cycles now, it is clear > that the backlog of code up for review never goes away. Even > intensive code review efforts at various points in the dev cycle > makes only a small impact on the backlog. This has a pretty > significant impact on core team members, as their work is never > done. 
At best, the dial is sometimes set to 10, instead of 11. > > Many people, myself included, have built tools to help deal with > the reviews in a more efficient manner than plain gerrit allows > for. These certainly help, but they can't ever solve the problem > on their own - just make it slightly more bearable. And this is > not even considering that core team members might have useful > contributions to make in ways beyond just code review. Ultimately > the workload is just too high to sustain the levels of review > required, so core team members will eventually burn out (as they > have done many times already). > > Even if one person attempts to take the initiative to heavily > invest in review of certain features it is often to no avail. > Unless a second dedicated core reviewer can be found to 'tag > team' it is hard for one person to make a difference. The end > result is that a patch is +2d and then sits idle for weeks or > more until a merge conflict requires it to be reposted at which > point even that one +2 is lost. This is a pretty demotivating > outcome for both reviewers & the patch contributor. > > > New core team talent > > > It can't escape attention that the Nova core team does not grow > in size very often. When Nova was younger and its code base was > smaller, it was easier for contributors to get onto core because > the base level of knowledge required was that much smaller. To > get onto core today requires a major investment in learning Nova > over a year or more. Even people who potentially have the latent > skills may not have the time available to invest in learning the > entire of Nova. > > With the number of reviews proposed to Nova, the core team should > probably be at least double its current size[1]. There is plenty of > expertize in the project as a whole but it is typically focused > into specific areas of the codebase. 
There is nowhere we can find > 20 more people with broad knowledge of the codebase who could be > promoted even over the next year, let alone today. This is ignoring > that many existing members of core are relatively inactive due to > burnout and so need replacing. That means we really need another > 25-30 people for core. That's not going to happen. > > > Code review delays > -- > > The obvious result of having too much work for too few reviewers > is that code contributors face major delays in getting their work > reviewed and merged. From personal experience, during Juno, I've > probably spent 1 week in aggregate on actual code development vs > 8 weeks on waiting on code review. You have to constantly be on > alert for review comments because unless you can respond quickly > (and repost) while you still have the attention of the reviewer, > they may not be look again for days/weeks. > > The length of time to get work merged serves as a demotivator to > actually do work in the first place. I've personally avoided doing > alot of code refactoring & cle
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi all, This is an issue that has been discussed quite a few times. As I was fearing the bottleneck effect is getting worse with each release. Nova grew simply too much and even though features like networking and block storage have been spun off at some point in time, it still lacks the cohesion necessary for a successful long term lifecycle, or in other terms, it’s just too big to be properly maintained by a handful of amazing and overworked people. Compute drivers are easy to identify as decoupled sub-projects and are among those which suffer to a bigger extent the lack of an independent development process. Nova is a mature project (at least relatively to the OpenStack’s context) and as such new features and bug fixes need to go through a very thorough screening and review before being approved and merged, which does not work well with sub-projects that need to grow faster, especially when introduced later in the lifecycle (e.g the current Hyper-V driver introduced in Folsom) or when being pushed by more aggressive market requirements. Just as an example, only 3 out of 8 Hyper-V blueprint specs have been approved and implemented in Juno, the rest will simply get bumped to Kilo, which means that new additional specs will need to be bumped to L and so on introducing further delays. We ended up privileging feature parity blueprints, delaying almost anything else. Bug fixes landing time in stable releases is also another issue for the user base since merging in master takes a long time and backporting requires another long review process, e.g. more than four months in some cases [1]. As a result we ended up releasing the fixes in a project fork that became our de facto stable release in place of upstream, while waiting for upstream merge. We never experienced similar issues in smaller projects like Neutron, Cinder, Ceilometer or Horizon where we are involved as well, which can be a practical example of the potential benefits of splitting Nova. 
OpenStack has a clear process for incubation, letting new projects grow as fast as they need during their youth and integrating them into core only when a mature stage is reached [2]. Unfortunately this process applies to projects, but not to subprojects (Hyper-V and VMWare drivers in particular, but not only), resulting in a much slower development pace than a project led by an independent team would have allowed. On the other hand, Docker is an example of a driver going the StackForge way, but its ultimate potential inclusion in Nova will just increase the current pain points. From a Hyper-V team perspective, in the late Havana cycle the same reasons highlighted in this thread almost led us to ask for removal of the driver from Nova in order to improve our development process, even at the cost of the subsequent fall from (core) grace and StackForge incubation Purgatory period, so I’m definitely happy that the conversation has been resumed with a bigger consensus. The main factor that blocked the Hyper-V driver’s exit from Nova was the introduction of the Hyper-V CI during the same cycle. Regressions are a very sensitive topic when you run OpenStack components on an operating system which is not Linux and the CI helped a lot in blocking or discovering issues in a timely fashion. Besides that, the size of the Hyper-V team increased considerably during Icehouse and Juno [3], so the Hyper-V CI became a mandatory and almost irreplaceable tool in our review process, leading us to reach an excellent level of stability of the driver on every supported version of Hyper-V (and progressive CI voting stability as well, but that’s another topic [4]). This means that if we reach a point in which we agree to spin off the drivers in separate core projects, we need to consider how driver-related CIs will still be included in the Nova review process, possibly with voting rights when the individual CI stability allows it.
Having each third party CI to vote only on its spin-off driver project is not an option IMO, as it won’t catch regressions introduced in Nova that affect the drivers, including race conditions [5] An interesting area of discussion is who is going to be part of the initial core teams for each new subproject. I truly appreciated the experience and help of the Nova core guys, so in order to allow a smoother transition I’d suggest to have for each new project (e.g. nova-compute-hyperv, nova-compute-vmware, etc) an initial core team consisting in one or two members of the current Nova sub-team and one Nova core, with ideally each patch reviewed by both the domain experts and the Nova core. The team could then go on its way by voting its own members as any other OpenStack project does. Another point of discussion is the stabilization and documentation of the driver interface. There are simply too many areas where the behavior between drivers differs, and looking at some other driver’s behavior was in too many cases the only source of document
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:
> +1 I very much agree with Dan's proposal. I am concerned about the difficulties we will face with merging patches that spread across various regions: manager, conductor, scheduler, etc. However, I think this is a small price to pay for having more focused teams. IMO, we will still have to pay it the moment the scheduler separates.
There will be more pain the moment the scheduler separates, IMO, especially with its current design and interfaces.
-jay
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 05:11:22PM +0100, Duncan Thomas wrote: > On 4 September 2014 16:00, Solly Ross wrote: > >> My only question is about the need to separate out each virt driver into a > >> separate project, wouldn't you > >> accomplish a lot of the benefit by creating a single virt project that > >> includes all of the drivers? > > > > I don't think there's particularly a *point* to having all drivers in one > > repo. Part of code review is looking for code "gotchas", but part of code > > review is looking for subtle issues that are caused by the very nature of > > the driver. A HyperV "core" reviewing a libvirt change should certainly be > > able to provide the former, but most likely cannot provide the latter to a > > sufficient degree (if he or she can, then he or she should be a libvirt > > "core" as well). > > I think that having a shared review team across all of the drivers has > definite benefits in terms of coherency and consistency - it is very > easy for experts on one technology to become tunnel-visioned on some > points and miss the wider, cross-project picture. A common drivers > team is likely to have a broad enough range of opinions to keep things > healthy, compared to one repo (and team) per driver, and also they are > able to speak collectively to the core nova team, which helps set > priorities there when they need to be influenced on behalf of the > drivers team. If people are interested in reviewing all the driver code there's nothing preventing them doing that. It is easy to set up gerrit to notify you on changes across many drivers if you have that desire, or to write scripts to query gerrit too. Realistically though, even today most people working on a virt driver totally ignore the other virt drivers and so separating them isn't going to make things significantly worse in that regard.
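As a rough illustration of the "write scripts to query gerrit" point, here is a minimal sketch in Python. It only builds a search URL for Gerrit's REST changes endpoint and decodes a response body; the server URL, project name and driver paths are assumptions picked for the example, not anything stated in this thread, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed values for illustration only; substitute your own server and paths.
GERRIT_URL = "https://review.openstack.org"
DRIVER_PATHS = ["nova/virt/libvirt", "nova/virt/vmwareapi", "nova/virt/hyperv"]


def build_query_url(base_url, project, paths, limit=25):
    """Build a Gerrit /changes/ query for open changes touching any path."""
    # file:^REGEX is Gerrit's regular-expression file search operator.
    file_terms = " OR ".join("file:^%s/.*" % p for p in paths)
    query = "project:%s status:open (%s)" % (project, file_terms)
    return "%s/changes/?%s" % (base_url, urlencode({"q": query, "n": limit}))


def parse_gerrit_json(body):
    """Decode a Gerrit REST response, dropping its anti-XSSI prefix line."""
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)


if __name__ == "__main__":
    # Print the URL a reviewer could fetch to see open virt driver changes.
    print(build_query_url(GERRIT_URL, "openstack/nova", DRIVER_PATHS))
```

Fetching the printed URL (for example with urllib.request) and passing the body to parse_gerrit_json would give a list of change records to filter or report on.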
> TLDR: I don't think there's particularly a point to splitting out the > drivers into individual repos, and much to be gained from keeping them > all in one (but still breaking them out of nova) There are significant benefits in the way we can test and gate changes by having separate repos. It also ensures that the workload of changes for one driver doesn't impact the workload of changes for another driver, which is a very real problem today. It also ensures that any new drivers can start off on a level playing field wrt existing drivers and not have to jump over a huge initial bar to get into the official repo. So there is a great deal of benefit to having one repo per driver. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 04/09/2014 17:57, Daniel P. Berrange wrote:
> On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote:
>> Actually, I think Sylvain's point is even stronger as I don't think splitting the virt drivers out of Nova is a complete fix. It may solve the review latency for the virt driver area but, unless virt drivers are the bulk of Nova patches, the Nova core team will still be swamped with review requests. Some solution, maybe half-cores, will still be needed for Nova long term.
> Absolutely, nova core will still have an awful lot of work to do and will need to have fresh blood. The split will free up some % of existing cores' time though as there's certainly plenty of virt driver only patches going through merge that are taking up non-negligible review time. eg I've done loads of review on vmware only code which I'd be relieved of with vmware maintainers able to form their own review core for their driver. There is also the fact that people are holding back on even submitting code for many drivers because they know it'll never get reviewed. So the proportion of virt driver only code is likely to be higher than what we currently see on review today.
I totally understand your point and I agree with it. I'm just thinking that for Kilo and Lxxx, we also need to experiment with some half-core teams in order to free up your review duty, at least until the virt code is split out correctly. On a side note, given that I'm a non-core (so you can just throw away my advice), I don't think the runway/slot proposal for Kilo will increase the reviewing bandwidth, as it will just create another layer of prioritization without addressing the velocity. In other words, creating a Scrum sprint with 2 people and running planning poker does not mean you can deliver two months' worth of man-days of work.
-Sylvain
> Regards, Daniel
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 4 September 2014 16:00, Solly Ross wrote: >> My only question is about the need to separate out each virt driver into a >> separate project, wouldn't you >> accomplish a lot of the benefit by creating a single virt project that >> includes all of the drivers? > > I don't think there's particularly a *point* to having all drivers in one > repo. Part of code review is looking for code "gotchas", but part of code > review is looking for subtle issues that are caused by the very nature of the > driver. A HyperV "core" reviewing a libvirt change should certainly be able > to provide the former, but most likely cannot provide the latter to a > sufficient degree (if he or she can, then he or she should be a libvirt > "core" as well). I think that having a shared review team across all of the drivers has definite benefits in terms of coherency and consistency - it is very easy for experts on one technology to become tunnel-visioned on some points and miss the wider, cross-project picture. A common drivers team is likely to have a broad enough range of opinions to keep things healthy, compared to one repo (and team) per driver, and also they are able to speak collectively to the core nova team, which helps set priorities there when they need to be influenced on behalf of the drivers team. TLDR: I don't think there's particularly a point to splitting out the drivers into individual repos, and much to be gained from keeping them all in one (but still breaking them out of nova)
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote: > Actually, I think Sylvain's point is even stronger as I don't think > splitting the virt drivers out of Nova is a complete fix. It may > solve the review latency for the virt driver area but, unless virt > drivers are the bulk of Nova patches, the Nova core team will still > be swamped with review requests. Some solution, maybe half-cores, > will still be needed for Nova long term. Absolutely, nova core will still have an awful lot of work to do and will need to have fresh blood. The split will free up some % of existing cores' time though as there's certainly plenty of virt driver only patches going through merge that are taking up non-negligible review time. eg I've done loads of review on vmware only code which I'd be relieved of with vmware maintainers able to form their own review core for their driver. There is also the fact that people are holding back on even submitting code for many drivers because they know it'll never get reviewed. So the proportion of virt driver only code is likely to be higher than what we currently see on review today. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Actually, I think Sylvain's point is even stronger as I don't think splitting the virt drivers out of Nova is a complete fix. It may solve the review latency for the virt driver area but, unless virt drivers are the bulk of Nova patches, the Nova core team will still be swamped with review requests. Some solution, maybe half-cores, will still be needed for Nova long term. -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Sylvain Bauza [mailto:sba...@redhat.com] Sent: Thursday, September 4, 2014 9:19 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers On 04/09/2014 17:00, Solly Ross wrote: >> My only question is about the need to separate out each virt driver >> into a separate project, wouldn't you accomplish a lot of the benefit by >> creating a single virt project that includes all of the drivers? > I don't think there's particularly a *point* to having all drivers in one > repo. Part of code review is looking for code "gotchas", but part of code > review is looking for subtle issues that are caused by the very nature of the > driver. A HyperV "core" reviewing a libvirt change should certainly be able > to provide the former, but most likely cannot provide the latter to a > sufficient degree (if he or she can, then he or she should be a libvirt > "core" as well). > > A strong +1 to Dan's proposal. I think this would also make it easier for > non-core reviewers to get started reviewing, without having a specialized > tool setup. As I said previously, I'm also giving a +1 to this proposal. That said, as I think it will take at least one iteration to get this done (look at the scheduler split and how long we've been working on it), I also think we need a short-term solution like the one proposed by Thierry, ie.
what I call "half-cores" - people who help review a code area and free up time for cores, who then only need to approve instead of focusing on each iteration. -Sylvain > Best Regards, > Solly Ross > > P.S. >> This is a crisis. A large crisis. In fact, if you got a moment, it's >> a twelve-storey crisis with a magnificent entrance hall, carpeting >> throughout, 24-hour portage, and an enormous sign on the roof, saying >> 'This Is a Large Crisis'. A large crisis requires a large plan. > Ha! > > - Original Message - >> From: "Donald D Dugger" >> To: "Daniel P. Berrange" , "OpenStack Development >> Mailing List (not for usage questions)" >> >> Sent: Thursday, September 4, 2014 10:33:27 AM >> Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting >> out virt drivers >> >> Basically +1 with what Daniel is saying (note that, as mentioned, a >> side effect of our effort to split out the scheduler will help but >> not solve this problem). >> >> My only question is about the need to separate out each virt driver >> into a separate project, wouldn't you accomplish a lot of the benefit >> by creating a single virt project that includes all of the drivers? >> I wouldn't necessarily expect a VMware guy to understand the >> specifics of the HyperV implementation but both people should >> understand what a virt driver does, how it interfaces to Nova and >> they should be able to intelligently review each other's code. >> >> -- >> Don Dugger >> "Censeo Toto nos in Kansa esse decisse." - D. Gale >> Ph: 303/443-3786 >> >> -Original Message- >> From: Daniel P. Berrange [mailto:berra...@redhat.com] >> Sent: Thursday, September 4, 2014 4:24 AM >> To: OpenStack Development >> Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting >> out virt drivers >> >> Position statement >> == >> >> Over the past year I've increasingly come to the conclusion that Nova >> is heading for (or probably already at) a major crisis.
If steps are >> not taken to avert this, the project is likely to lose a non-trivial >> amount of talent, both regular code contributors and core team >> members. That includes myself. This is not good for Nova's long term >> health and so should be of concern to anyone involved in Nova and OpenStack. >> >> For those who don't want to read the whole mail, the executive >> summary is that the nova-core team is an unfixable bottleneck in our >> development process with our current proje
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote: > Hi, > I do not think that Nova is in a death spiral. I just think that the > current way of working at the moment is strangling the project. I do not > understand why we need to split drivers out of the core project. Why not > have the ability to provide 'core review' status to people for reviewing > those parts of the code? We have enough talented people in OpenStack to be > able to write a driver above gerrit to enable that. The consensus view at the summit was that, having tried & failed at getting useful changes into gerrit, it is not a viable option unless we undertake a permanent fork of the code base. There didn't seem to be any appetite for maintaining & developing a large java app ourselves. So people were looking to start writing a replacement for gerrit from scratch (albeit reusing the database schema). Even if we did have such fine grained permissioning in gerrit or another review tool, I'd still suggest a split because this is about more than just the review team size. There are a number of other compelling benefits to having fully separate drivers I've mentioned in the original thread & other replies here. > Fragmenting the project will be very unhealthy. On the contrary, I think it will re-invigorate the project. The other historical cases where OpenStack projects have split out code have resulted in a pretty significant benefit for all involved. The testing frameworks we have will help ensure that the virt drivers continue to provide consistent semantics, just as they do today, and any eventual OpenStack trademark certifications would reinforce that. Improving the specification of the virt driver interface by introducing more objects and killing undocumented dict usage will also further help in keeping virt drivers aligned.
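To make the "more objects, fewer undocumented dicts" point concrete, here is a small Python sketch. The DiskInfo class and its fields are hypothetical names invented for this example and are not taken from Nova's virt driver API; it only contrasts an implicit dict contract with an explicit, validated object.

```python
from dataclasses import dataclass


def attach_from_dict(info):
    """Dict style: the required keys are only discoverable by reading code."""
    return "%s (%d GB, %s)" % (info["path"], info["size_gb"], info.get("bus", "virtio"))


@dataclass(frozen=True)
class DiskInfo:
    """Hypothetical typed argument: fields and defaults are documented here."""
    path: str
    size_gb: int
    bus: str = "virtio"

    def __post_init__(self):
        # Reject bad input at the interface boundary rather than deep in a driver.
        if self.size_gb <= 0:
            raise ValueError("size_gb must be positive")


def attach_from_object(disk):
    """Object style: every driver sees the same well-defined fields."""
    return "%s (%d GB, %s)" % (disk.path, disk.size_gb, disk.bus)


if __name__ == "__main__":
    print(attach_from_dict({"path": "/dev/vdb", "size_gb": 10}))
    print(attach_from_object(DiskInfo(path="/dev/vdb", size_gb=10)))
```

With the dict form, a misspelled key only fails when a particular driver happens to read it; with the object form, the mistake surfaces when the argument is constructed, which is the kind of alignment between drivers being argued for here.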
> On 9/4/14, 3:59 PM, "Thierry Carrez" wrote: > > >Like I mentioned before, I think the only way out of the Nova death > >spiral is to split code and give control over it to smaller dedicated > >review teams. This is one way to do it. Thanks Dan for pulling this > >together :) > > > >A couple comments inline: > > > >Daniel P. Berrange wrote: > >> [...] > >> This is a crisis. A large crisis. In fact, if you got a moment, it's > >> a twelve-storey crisis with a magnificent entrance hall, carpeting > >> throughout, 24-hour portage, and an enormous sign on the roof, > >> saying 'This Is a Large Crisis'. A large crisis requires a large > >> plan. > >> [...] > > > >I totally agree. We need a plan now, because we can't go through another > >cycle without a solution in sight. > > > >> [...] > >> This has quite a few implications for the way development would > >> operate. > >> > >> - The Nova core team at least, would be voluntarily giving up a big > >>amount of responsibility over the evolution of virt drivers. Due > >>to human nature, people are not good at giving up power, so this > >>may be painful to swallow. Realistically current nova core are > >>not experts in most of the virt drivers to start with, and more > >>important we clearly do not have sufficient time to do a good job > >>of review with everything submitted. Much of the current need > >>for core review of virt drivers is to prevent the mis-use of a > >>poorly defined virt driver API...which can be mitigated - See > >>later point(s) > >> > >> - Nova core would/should not have automatic +2 over the virt driver > >>repositories since it is unreasonable to assume they have the > >>suitable domain knowledge for all virt drivers out there. People > >>would of course be able to be members of multiple core teams. For > >>example John G would naturally be nova-core and nova-xen-core. I > >>would aim for nova-core and nova-libvirt-core, and so on. 
I do not > >>want any +2 responsibility over VMWare/HyperV/Docker drivers since > >>they're not my area of expertize - I only look at them today because > >>they have no other nova-core representation. > >> > >> - Not sure if it implies the Nova PTL would be solely focused on > >>Nova common. eg would there continue to be one PTL over all virt > >>driver implementation projects, or would each project have its > >>own PTL. Maybe this is irrelevant if a Czars approach is chosen > >>by virt driver projects for their work. I'd be inclined to say > >>that a single PTL should stay as a figurehead to represent all > >>the virt driver projects, acting as a point of contact to ensure > >>we keep communication / co-operation between the drivers in sync. > >> [...] > > > >At this point it may look like our current structure (programs, one PTL, > >single core teams...) prevents us from implementing that solution. I > >just want to say that in OpenStack, organizational structure reflects > >how we work, not the other way around. If we need to reorganize > >"official" project structure to work in smarter and long-term hea
[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
+1 I very much agree with Dan's proposal. I am concerned about the difficulties we will face with merging patches that spread across various regions: manager, conductor, scheduler, etc. However, I think this is a small price to pay for having more focused teams. IMO, we will still have to pay it the moment the scheduler separates. Regards, Vladik
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 10:18:04AM -0500, Matt Riedemann wrote: > > >> > >> - Changes submitted to nova common code would trigger running of CI > >>tests against the external virt drivers. Each virt driver core team > >>would decide whether they want their driver to be tested upon Nova > >>common changes. Expect that all would choose to be included to the > >>same extent that they are today. So level of validation of nova code > >>would remain at least at current level. I don't want to reduce the > >>amount of code testing here since that's contrary to the direction > >>we're taking wrt testing. > >> > >> - Changes submitted to virt drivers would trigger running CI tests > >>that are applicable. eg changes to libvirt driver repo would not > >>involve running database migration tests, since all database code > >>is isolated in nova. libvirt changes would not trigger vmware, > >>xenserver, ironic, etc CI systems. Virt driver changes should > >>see fewer false positives in the tests as a result, and those > >>that do occur should be more explicitly related to the code being > >>proposed. eg a change to vmware is not going to trigger a tempest > >>run that uses libvirt, so non-deterministic failures in libvirt > >>will no longer plague vmware developers reviews. This would also > >>make it possible for VMWare CI to be made gating for changes to > >>the VMWare virt driver repository, without negatively impacting > >>other virt drivers. So this change should increase testing quality > >>for non-libvirt virt drivers and reduce pain of false failures > >>for everyone. [snip] > Even if we split the virt drivers out, libvirt would still be the default in > the Tempest gate runs right? Yes, what I'm calling the nova common repository would still need to have a tempest job that was gating on at least one virt driver as a sanity check. As mentioned above, I'd pretty much expect that all current tempest jobs for nova common code would continue unchanged. 
IOW, a libvirt job would still be gating, and there'd still be a number of non-gating 3rd party CIs for other virt drivers too. The only change in testing jobs would be wrt the new git repos for the individual virt drivers. Those would only run jobs directly related to the code in those repos, ie vmware is tested by a vmware CI job and libvirt is tested by a libvirt CI job. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
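To make the job routing described above concrete, here is a minimal sketch of how a change could be mapped to CI jobs after a split. The repo names, job names and the `jobs_for_change` helper are all hypothetical, invented purely for illustration; nothing here reflects an actual OpenStack CI configuration.

```python
from typing import NamedTuple, Optional, Set


class Job(NamedTuple):
    name: str
    gating: bool


# Hypothetical repo layout: one common repo plus one repo per virt driver.
COMMON_REPO = "nova"
DRIVER_REPOS = {
    "nova-libvirt": "libvirt",
    "nova-vmware": "vmware",
    "nova-xen": "xenapi",
    "nova-hyperv": "hyperv",
}
# Assumption: libvirt stays the driver used by the gating tempest job.
DEFAULT_GATE_DRIVER = "libvirt"


def jobs_for_change(repo: str, opted_in: Optional[Set[str]] = None) -> list:
    """Return the CI jobs a change to `repo` would trigger."""
    if opted_in is None:
        # Expectation in the thread: every driver team opts in.
        opted_in = set(DRIVER_REPOS.values())
    if repo == COMMON_REPO:
        # Common code keeps its current coverage: DB tests, a gating
        # libvirt job, and non-gating 3rd party CI for other drivers.
        jobs = [
            Job("db-migration-tests", True),
            Job(f"tempest-{DEFAULT_GATE_DRIVER}", True),
        ]
        for driver in sorted(opted_in):
            if driver != DEFAULT_GATE_DRIVER:
                jobs.append(Job(f"tempest-{driver}", False))
        return jobs
    if repo in DRIVER_REPOS:
        # A driver repo only runs its own job, which can now be gating.
        return [Job(f"tempest-{DRIVER_REPOS[repo]}", True)]
    raise ValueError(f"unknown repo: {repo}")
```

With this shape, `jobs_for_change("nova-vmware")` yields a single gating vmware job, so a flaky libvirt run can no longer block a vmware review, while `jobs_for_change("nova")` keeps the libvirt job gating.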
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 04/09/2014 17:00, Solly Ross wrote: >> My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? > I don't think there's particularly a *point* to having all drivers in one repo. Part of code review is looking for code "gotchas", but part of code review is looking for subtle issues that are caused by the very nature of the driver. A HyperV "core" reviewing a libvirt change should certainly be able to provide the former, but most likely cannot provide the latter to a sufficient degree (if he or she can, then he or she should be a libvirt "core" as well). > A strong +1 to Dan's proposal. I think this would also make it easier for non-core reviewers to get started reviewing, without having a specialized tool setup. As I said previously, I'm also giving a +1 to this proposal. That said, as I think it will take at least one iteration to get this done (look at the scheduler split and how long we've been working on it), I also think we need a short-term solution like the one proposed by Thierry, ie. what I call "half-cores" - people who help review an area of code and free up cores' time so they can just approve instead of focusing on each iteration. -Sylvain > Best Regards, Solly Ross P.S. >> This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. > Ha! > - Original Message - > From: "Donald D Dugger" > To: "Daniel P.
Berrange" , "OpenStack Development Mailing List (not for usage questions)" Sent: Thursday, September 4, 2014 10:33:27 AM Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: Thursday, September 4, 2014 4:24 AM To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to loose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. 
The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I, now, none the less urge people to read the whole mail. Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed & merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on their own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle makes only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the re
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 9/4/2014 9:57 AM, Daniel P. Berrange wrote: On Thu, Sep 04, 2014 at 02:33:27PM +, Dugger, Donald D wrote: Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). Thanks for taking the time to read & give feedback My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. A single repo for virt drivers would have all the same costs of separating from nova common, but with fewer of the benefits of separate repos per driver. IOW, if we're going to split the virt drivers out from the nova common, then we should go all the way. I think the separate driver repos is fairly compelling for a number of reasons besides just core team size. As mentioned elsewhere it allows better targeting of CI test jobs. ie a VMware CI job can be easily made gating for only VMware code changes. So VMWare CI instability won't affect libvirt code submissions, and libvirt CI instability won't affect VMware code submissions. Separate repos means that people starting off a new driver (like Ironic or Docker) would not have to immediately meet the same very high quality & testing bar that existing drivers do. THey can evolve at their own pace and not have to then undergo the disruption of jumping from their initial repo to the 'official' repo. 
Finally, I would like each drivers team to be isolated from each other in terms of code review capacity planning as far as practical - ie the libvirt team should be able to accept as many libvirt features as they can handle without being concerned that they'll reduce what vmware is able to accept (though changes involving the nova common code would obviously still contend). Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to loose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I, now, none the less urge people to read the whole mail. Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed & merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on their own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. 
Even intensive code review efforts at various points in the dev cycle makes only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On 04/09/2014 15:36, Gary Kotton wrote: > Hi, I do not think that Nova is in a death spiral. I just think that the current way of working at the moment is strangling the project. I do not understand why we need to split drivers out of the core project. Why not have the ability to provide 'core review' status to people for reviewing those parts of the code? We have enough talented people in OpenStack to be able to write a driver above gerrit to enable that. Fragmenting the project will be very unhealthy. For what it is worth having a release date at the end of a vacation is really bad. Look at the numbers: http://stackalytics.com/report/contribution/nova-group/30 > Thanks > Gary From my perspective, the raw number of reviews should not be the only metric for deciding whether someone would make a good core. Indeed, it's quite easy to provide some comments on cosmetic issues, but if you look at why patches get a -1 from a core, it's mostly because of a more important design issue or because the patch goes against another ongoing effort. Also, I note that Stackalytics metrics are *really* different from other tools like http://russellbryant.net/openstack-stats/nova-reviewers-30.txt As a non-core person, I can just say that a core must at least be present during Nova meetings and voice their opinions, provide some help with the gate status, look at bugs, give feedback to newcomers etc. and not just click on -1 or +1. Here, the problem is that the core team is not scalable: I don't want to provide examples of governments but just adding more people is often not the solution. Instead, delegating to subteams seems like a possible intermediate solution, as it would let the core team only approve and leave the subteam's half-cores to review the iterations until they consider the patch good enough to be merged.
Of course, nova cores could still bypass half-cores as they know the whole knowledge of Nova, or they could disapprove what the halfcores agreed, but that would free a lot of time for cores without giving them more bureaucracy. I really like Dan's proposal of splitting code into different repos with separate teams and a single PTL (that's exactly the difference in between a Program and a Project) but as it requires some prework, I'm just thinking of allocating halfcores as a short-term solution until all the bits are sorted out. And yes, there is urgency, I also felt the pain. -Sylvain On 9/4/14, 3:59 PM, "Thierry Carrez" wrote: Like I mentioned before, I think the only way out of the Nova death spiral is to split code and give control over it to smaller dedicated review teams. This is one way to do it. Thanks Dan for pulling this together :) A couple comments inline: Daniel P. Berrange wrote: [...] This is a crisis. A large crisis. In fact, if you got a moment, it's a twelve-storey crisis with a magnificent entrance hall, carpeting throughout, 24-hour portage, and an enormous sign on the roof, saying 'This Is a Large Crisis'. A large crisis requires a large plan. [...] I totally agree. We need a plan now, because we can't go through another cycle without a solution in sight. [...] This has quite a few implications for the way development would operate. - The Nova core team at least, would be voluntarily giving up a big amount of responsibility over the evolution of virt drivers. Due to human nature, people are not good at giving up power, so this may be painful to swallow. Realistically current nova core are not experts in most of the virt drivers to start with, and more important we clearly do not have sufficient time to do a good job of review with everything submitted. 
Much of the current need for core review of virt drivers is to prevent the mis-use of a poorly defined virt driver API...which can be mitigated - See later point(s) - Nova core would/should not have automatic +2 over the virt driver repositories since it is unreasonable to assume they have the suitable domain knowledge for all virt drivers out there. People would of course be able to be members of multiple core teams. For example John G would naturally be nova-core and nova-xen-core. I would aim for nova-core and nova-libvirt-core, and so on. I do not want any +2 responsibility over VMWare/HyperV/Docker drivers since they're not my area of expertize - I only look at them today because they have no other nova-core representation. - Not sure if it implies the Nova PTL would be solely focused on Nova common. eg would there continue to be one PTL over all virt driver implementation projects, or would each project have its own PTL. Maybe this is irrelevant if a Czars approach is chosen by virt driver projects for their work. I'd be inclined to say that a single PTL should stay as a figurehead to represent all the virt driver projects, acting as a point of contact to ensure
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
> My only question is about the need to separate out each virt driver into a > separate project, wouldn't you > accomplish a lot of the benefit by creating a single virt project that > includes all of the drivers? I don't think there's particularly a *point* to having all drivers in one repo. Part of code review is looking for code "gotchas", but part of code review is looking for subtle issues that are caused by the very nature of the driver. A HyperV "core" reviewing a libvirt change should certainly be able to provide the former, but most likely cannot provide the latter to a sufficient degree (if he or she can, then he or she should be a libvirt "core" as well). A strong +1 to Dan's proposal. I think this would also make it easier for non-core reviewers to get started reviewing, without having a specialized tool setup. Best Regards, Solly Ross P.S. >This is a crisis. A large crisis. In fact, if you got a moment, it's > a twelve-storey crisis with a magnificent entrance hall, carpeting > throughout, 24-hour portage, and an enormous sign on the roof, > saying 'This Is a Large Crisis'. A large crisis requires a large > plan. Ha! - Original Message - > From: "Donald D Dugger" > To: "Daniel P. Berrange" , "OpenStack Development > Mailing List (not for usage questions)" > > Sent: Thursday, September 4, 2014 10:33:27 AM > Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out > virt drivers > > Basically +1 with what Daniel is saying (note that, as mentioned, a side > effect of our effort to split out the scheduler will help but not solve this > problem). > > My only question is about the need to separate out each virt driver into a > separate project, wouldn't you accomplish a lot of the benefit by creating a > single virt project that includes all of the drivers? 
I wouldn't > necessarily expect a VMware guy to understand the specifics of the HyperV > implementation but both people should understand what a virt driver does, > how it interfaces to Nova and they should be able to intelligently review > each other's code. > > -- > Don Dugger > "Censeo Toto nos in Kansa esse decisse." - D. Gale > Ph: 303/443-3786 > > -----Original Message- > From: Daniel P. Berrange [mailto:berra...@redhat.com] > Sent: Thursday, September 4, 2014 4:24 AM > To: OpenStack Development > Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out > virt drivers > > Position statement > == > > Over the past year I've increasingly come to the conclusion that Nova is > heading for (or probably already at) a major crisis. If steps are not taken > to avert this, the project is likely to loose a non-trivial amount of > talent, both regular code contributors and core team members. That includes > myself. This is not good for Nova's long term health and so should be of > concern to anyone involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive summary is > that the nova-core team is an unfixable bottleneck in our development > process with our current project structure. > The only way I see to remove the bottleneck is to split the virt drivers out > of tree and let them all have their own core teams in their area of code, > leaving current nova core to focus on all the common code outside the virt > driver impls. I, now, none the less urge people to read the whole mail. 
> > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they add up to a > big problem. > > Core team burn out > -- > > Having been involved in Nova for several dev cycles now, it is clear that the > backlog of code up for review never goes away. Even intensive code review > efforts at various points in the dev cycle makes only a small impact on the > backlog. This has a pretty significant impact on core team members, as their > work is never done. At best, the dial is sometimes set to 10, instead of 11. > > Many people, myself included, have built tools to help deal with the reviews > in a more efficient manner than plain gerrit allows for. These certainly > help, but they can
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 02:33:27PM +, Dugger, Donald D wrote: > Basically +1 with what Daniel is saying (note that, as mentioned, > a side effect of our effort to split out the scheduler will help > but not solve this problem). Thanks for taking the time to read & give feedback > My only question is about the need to separate out each virt driver > into a separate project, wouldn't you accomplish a lot of the > benefit by creating a single virt project that includes all of the > drivers? I wouldn't necessarily expect a VMware guy to understand > the specifics of the HyperV implementation but both people should > understand what a virt driver does, how it interfaces to Nova and > they should be able to intelligently review each other's code. A single repo for virt drivers would have all the same costs of separating from nova common, but with fewer of the benefits of separate repos per driver. IOW, if we're going to split the virt drivers out from the nova common, then we should go all the way. I think separate driver repos are fairly compelling for a number of reasons besides just core team size. As mentioned elsewhere it allows better targeting of CI test jobs, ie a VMware CI job can be easily made gating for only VMware code changes. So VMware CI instability won't affect libvirt code submissions, and libvirt CI instability won't affect VMware code submissions. Separate repos mean that people starting off a new driver (like Ironic or Docker) would not have to immediately meet the same very high quality & testing bar that existing drivers do. They can evolve at their own pace and not have to then undergo the disruption of jumping from their initial repo to the 'official' repo.
Finally, I would like each drivers team to be isolated from each other in terms of code review capacity planning as far as practical - ie the libvirt team should be able to accept as many libvirt features as they can handle without being concerned that they'll reduce what vmware is able to accept (though changes involving the nova common code would obviously still contend). > Position statement > == > > Over the past year I've increasingly come to the conclusion that Nova is > heading for (or probably already at) a major crisis. If steps are not taken > to avert this, the project is likely to loose a non-trivial amount of talent, > both regular code contributors and core team members. That includes myself. > This is not good for Nova's long term health and so should be of concern to > anyone involved in Nova and OpenStack. > > For those who don't want to read the whole mail, the executive summary is > that the nova-core team is an unfixable bottleneck in our development process > with our current project structure. > The only way I see to remove the bottleneck is to split the virt drivers out > of tree and let them all have their own core teams in their area of code, > leaving current nova core to focus on all the common code outside the virt > driver impls. I, now, none the less urge people to read the whole mail. > > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they add up to a > big problem. > > Core team burn out > -- > > Having been involved in Nova for several dev cycles now, it is clear that the > backlog of code up for review never goes away. 
Even intensive code review > efforts at various points in the dev cycle makes only a small impact on the > backlog. This has a pretty significant impact on core team members, as their > work is never done. At best, the dial is sometimes set to 10, instead of 11. > > Many people, myself included, have built tools to help deal with the reviews > in a more efficient manner than plain gerrit allows for. These certainly > help, but they can't ever solve the problem on their own - just make it > slightly more bearable. And this is not even considering that core team > members might have useful contributions to make in ways beyond just code > review. Ultimately the workload is just too high to sustain the levels of > review required, so core team members will eventually burn out (as they have > done many times already). > > Even if one person attempts to take the initiative to heavily invest in > review of certain features it is often to no avail. > Unless a second dedicated core reviewer can be found to 'tag team' it is hard > for one person to make a difference. The end result is that a patch is +2d > and then sits idle for weeks or more until a merge conflict requires it
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Basically +1 with what Daniel is saying (note that, as mentioned, a side effect of our effort to split out the scheduler will help but not solve this problem). My only question is about the need to separate out each virt driver into a separate project, wouldn't you accomplish a lot of the benefit by creating a single virt project that includes all of the drivers? I wouldn't necessarily expect a VMware guy to understand the specifics of the HyperV implementation but both people should understand what a virt driver does, how it interfaces to Nova and they should be able to intelligently review each other's code. -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Daniel P. Berrange [mailto:berra...@redhat.com] Sent: Thursday, September 4, 2014 4:24 AM To: OpenStack Development Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers Position statement == Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack. For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I, now, none the less urge people to read the whole mail.
Background information == I see many factors coming together to form the crisis - Burn out of core team members from over work - Difficulty bringing new talent into the core team - Long delay in getting code reviewed & merged - Marginalization of code areas which aren't popular - Increasing size of nova code through new drivers - Exclusion of developers without corporate backing Each item on their own may not seem too bad, but combined they add up to a big problem. Core team burn out -- Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle makes only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11. Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for. These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already). Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers & the patch contributor. 
New core team talent It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not have the time available to invest in learning the entirety of Nova. With the number of reviews proposed to Nova, the core team should probably be at least double its current size[1]. There is plenty of expertize in the project as a whole but it is typically focused into specific areas of the codebase. There is nowhere we can find 20 more people with broad knowledge of the codebase who could be promoted even over the next year, let alone today. This is ignoring that many existing members of core are relatively inactive due to burnout and so need replacing. That means we really need anothe
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi, I do not think that Nova is in a death spiral. I just think that the current way of working at the moment is strangling the project. I do not understand why we need to split drivers out of the core project. Why not have the ability to provide 'core review' status to people for reviewing those parts of the code? We have enough talented people in OpenStack to be able to write a driver above gerrit to enable that. Fragmenting the project will be very unhealthy. For what it is worth having a release date at the end of a vacation is really bad. Look at the numbers: http://stackalytics.com/report/contribution/nova-group/30 Thanks Gary On 9/4/14, 3:59 PM, "Thierry Carrez" wrote: >Like I mentioned before, I think the only way out of the Nova death >spiral is to split code and give control over it to smaller dedicated >review teams. This is one way to do it. Thanks Dan for pulling this >together :) > >A couple comments inline: > >Daniel P. Berrange wrote: >> [...] >> This is a crisis. A large crisis. In fact, if you got a moment, it's >> a twelve-storey crisis with a magnificent entrance hall, carpeting >> throughout, 24-hour portage, and an enormous sign on the roof, >> saying 'This Is a Large Crisis'. A large crisis requires a large >> plan. >> [...] > >I totally agree. We need a plan now, because we can't go through another >cycle without a solution in sight. > >> [...] >> This has quite a few implications for the way development would >> operate. >> >> - The Nova core team at least, would be voluntarily giving up a big >>amount of responsibility over the evolution of virt drivers. Due >>to human nature, people are not good at giving up power, so this >>may be painful to swallow. Realistically current nova core are >>not experts in most of the virt drivers to start with, and more >>important we clearly do not have sufficient time to do a good job >>of review with everything submitted.
Much of the current need >>for core review of virt drivers is to prevent the mis-use of a >>poorly defined virt driver API...which can be mitigated - See >>later point(s) >> >> - Nova core would/should not have automatic +2 over the virt driver >>repositories since it is unreasonable to assume they have the >>suitable domain knowledge for all virt drivers out there. People >>would of course be able to be members of multiple core teams. For >>example John G would naturally be nova-core and nova-xen-core. I >>would aim for nova-core and nova-libvirt-core, and so on. I do not >>want any +2 responsibility over VMWare/HyperV/Docker drivers since >>they're not my area of expertize - I only look at them today because >>they have no other nova-core representation. >> >> - Not sure if it implies the Nova PTL would be solely focused on >>Nova common. eg would there continue to be one PTL over all virt >>driver implementation projects, or would each project have its >>own PTL. Maybe this is irrelevant if a Czars approach is chosen >>by virt driver projects for their work. I'd be inclined to say >>that a single PTL should stay as a figurehead to represent all >>the virt driver projects, acting as a point of contact to ensure >>we keep communication / co-operation between the drivers in sync. >> [...] > >At this point it may look like our current structure (programs, one PTL, >single core teams...) prevents us from implementing that solution. I >just want to say that in OpenStack, organizational structure reflects >how we work, not the other way around. If we need to reorganize >"official" project structure to work in smarter and long-term healthy >ways, that's a really small price to pay. 
> >-- >Thierry Carrez (ttx) ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Like I mentioned before, I think the only way out of the Nova death spiral is to split code and give control over it to smaller dedicated review teams. This is one way to do it. Thanks Dan for pulling this together :) A couple comments inline: Daniel P. Berrange wrote: > [...] > This is a crisis. A large crisis. In fact, if you got a moment, it's > a twelve-storey crisis with a magnificent entrance hall, carpeting > throughout, 24-hour portage, and an enormous sign on the roof, > saying 'This Is a Large Crisis'. A large crisis requires a large > plan. > [...] I totally agree. We need a plan now, because we can't go through another cycle without a solution in sight. > [...] > This has quite a few implications for the way development would > operate. > > - The Nova core team at least, would be voluntarily giving up a big >amount of responsibility over the evolution of virt drivers. Due >to human nature, people are not good at giving up power, so this >may be painful to swallow. Realistically current nova core are >not experts in most of the virt drivers to start with, and more >important we clearly do not have sufficient time to do a good job >of review with everything submitted. Much of the current need >for core review of virt drivers is to prevent the mis-use of a >poorly defined virt driver API...which can be mitigated - See >later point(s) > > - Nova core would/should not have automatic +2 over the virt driver >repositories since it is unreasonable to assume they have the >suitable domain knowledge for all virt drivers out there. People >would of course be able to be members of multiple core teams. For >example John G would naturally be nova-core and nova-xen-core. I >would aim for nova-core and nova-libvirt-core, and so on. I do not >want any +2 responsibility over VMWare/HyperV/Docker drivers since >they're not my area of expertize - I only look at them today because >they have no other nova-core representation. 
> > - Not sure if it implies the Nova PTL would be solely focused on >Nova common. eg would there continue to be one PTL over all virt >driver implementation projects, or would each project have its >own PTL. Maybe this is irrelevant if a Czars approach is chosen >by virt driver projects for their work. I'd be inclined to say >that a single PTL should stay as a figurehead to represent all >the virt driver projects, acting as a point of contact to ensure >we keep communication / co-operation between the drivers in sync. > [...] At this point it may look like our current structure (programs, one PTL, single core teams...) prevents us from implementing that solution. I just want to say that in OpenStack, organizational structure reflects how we work, not the other way around. If we need to reorganize "official" project structure to work in smarter and long-term healthy ways, that's a really small price to pay. -- Thierry Carrez (ttx)
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
On Thu, Sep 04, 2014 at 12:14:39PM +, Day, Phil wrote: > Hi Daniel, > > Thanks for putting together such a thoughtful piece - I probably need to > re-read it few times to take in everything you're saying, but a couple > of thoughts that did occur to me: > > - I can see how this could help where a change is fully contained within > a virt driver, but I wonder how many of those there really are ? Of the > things that I've see go through recently nearly all also seem to touch the > compute manager in someway, and a lot (like the Numa changes) also have > impacts into the scheduler. Isn't it going to make it harder to get > any of those changes in if they have to be co-ordinated across two or > more repos ? Actually, in my experience of reviewing code this past cycle or two I see a fairly significant portion of code that is entirely within the scope of a virt driver. I'm also seeing that people are refraining from making changes to the virt drivers because of the burden of getting code past review, so what we see today is probably not even representative of the potential. There are certainly some high profile exceptions such as the NUMA work, or the new serial console work where you're going to cross the repos. In such work we already try to break patches into isolated pieces, so the stuff touching common code is a separate commit from the stuff touching virt code. This is generally good practice to encourage. So, yes, it would need coordination across the repos to get the full work submitted, but I don't think that burden is unduly large compared to current practice. We do in fact already see this need for co-ordination in other ways. For example, API changes have parts that affect python-novaclient, and perhaps horizon too. Storage & network changes often cross Neutron / Cinder and Nova. If we can reduce the burden on nova-core, the stuff going into the common codebase should stand more chance of getting review too.
So overall yes, this is a valid point, but I'm not particularly concerned about the negative impacts of it, because we're already dealing with them today to a large extent. > - I think you hit the nail on the head in terms of the scope of > Nova and how few people probably really understand all of it, > but given the amount of trust that goes with being a core wouldn't > it also be able to make people cores on the understanding that > they will only approve code in the areas they are expert in ? > It kind of feels that this happens to a large extent already, > for example I don't see Chris or Ken'ichi taking on work outside > of the API layer.It kind of feels as if given a small amount > of trust we could have additional core reviewers focused on > specific parts of the system without having to split up the > code base if that's where the problem is. Yes, you are right that it happens to some extent but I think it is quite a big jump to effectively scale up that amount of trust to a team that realistically would need to be 40+ people in size. Also this isn't solely about review bandwidth. One of the things I raised was how there are certain standards required for being part of nova, such as CI testing. If you can't meet those, you're forced into a sub-optimal development practice compared to the rest of nova, where you are out of tree and subject to being broken by Nova changes at any time, which is what Docker and Ironic have been facing. Separate repos will also facilitate more targeted application of our testing resources, so vmware repo changes wouldn't need to suffer false failures from libvirt tempest jobs, and similarly vmware CI could be made gating for vmware without causing libvirt code to suffer instability. > > -Original Message- > > From: Daniel P. 
Berrange [mailto:berra...@redhat.com] > > Sent: 04 September 2014 11:24 > > To: OpenStack Development > > Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out > > virt > > drivers > > > > Position statement > > == > > > > Over the past year I've increasingly come to the conclusion that Nova is > > heading for (or probably already at) a major crisis. If steps are not taken > > to > > avert this, the project is likely to loose a non-trivial amount of talent, > > both > > regular code contributors and core team members. That includes myself. This > > is not good for Nova's long term health and so should be of concern to > > anyone involved in Nova and OpenStack. > > > > For those who don't want to read the whole mail, the executive summary is > > that the nova-core team is an unfixable bottleneck in our development > > process with our curre
Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Hi Daniel, Thanks for putting together such a thoughtful piece - I probably need to re-read it a few times to take in everything you're saying, but a couple of thoughts that did occur to me: - I can see how this could help where a change is fully contained within a virt driver, but I wonder how many of those there really are? Of the things that I've seen go through recently nearly all also seem to touch the compute manager in some way, and a lot (like the NUMA changes) also have impacts into the scheduler. Isn't it going to make it harder to get any of those changes in if they have to be co-ordinated across two or more repos? - I think you hit the nail on the head in terms of the scope of Nova and how few people probably really understand all of it, but given the amount of trust that goes with being a core wouldn't it also be possible to make people cores on the understanding that they will only approve code in the areas they are expert in? It kind of feels that this happens to a large extent already, for example I don't see Chris or Ken'ichi taking on work outside of the API layer. It kind of feels as if given a small amount of trust we could have additional core reviewers focused on specific parts of the system without having to split up the code base if that's where the problem is. Phil > -Original Message- > From: Daniel P. Berrange [mailto:berra...@redhat.com] > Sent: 04 September 2014 11:24 > To: OpenStack Development > Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt > drivers > > Position statement > == > > Over the past year I've increasingly come to the conclusion that Nova is > heading for (or probably already at) a major crisis. If steps are not taken to > avert this, the project is likely to loose a non-trivial amount of talent, > both > regular code contributors and core team members. That includes myself. This > is not good for Nova's long term health and so should be of concern to > anyone involved in Nova and OpenStack. 
> > For those who don't want to read the whole mail, the executive summary is > that the nova-core team is an unfixable bottleneck in our development > process with our current project structure. > The only way I see to remove the bottleneck is to split the virt drivers out > of > tree and let them all have their own core teams in their area of code, leaving > current nova core to focus on all the common code outside the virt driver > impls. I, now, none the less urge people to read the whole mail. > > > Background information > == > > I see many factors coming together to form the crisis > > - Burn out of core team members from over work > - Difficulty bringing new talent into the core team > - Long delay in getting code reviewed & merged > - Marginalization of code areas which aren't popular > - Increasing size of nova code through new drivers > - Exclusion of developers without corporate backing > > Each item on their own may not seem too bad, but combined they add up to > a big problem. > > Core team burn out > -- > > Having been involved in Nova for several dev cycles now, it is clear that the > backlog of code up for review never goes away. Even intensive code review > efforts at various points in the dev cycle makes only a small impact on the > backlog. This has a pretty significant impact on core team members, as their > work is never done. At best, the dial is sometimes set to 10, instead of 11. > > Many people, myself included, have built tools to help deal with the reviews > in a more efficient manner than plain gerrit allows for. These certainly help, > but they can't ever solve the problem on their own - just make it slightly > more bearable. And this is not even considering that core team members > might have useful contributions to make in ways beyond just code review. > Ultimately the workload is just too high to sustain the levels of review > required, so core team members will eventually burn out (as they have done > many times already). 
> > Even if one person attempts to take the initiative to heavily invest in review > of certain features it is often to no avail. > Unless a second dedicated core reviewer can be found to 'tag team' it is hard > for one person to make a difference. The end result is that a patch is +2d and > then sits idle for weeks or more until a merge conflict requires it to be > reposted at which point even that one +2 is lost. This is a pretty > demotivating > outcome for both reviewers & the patch contributor. > > > New core team talent > > > It can't escape atte
[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers
Position statement
==================

Over the past year I've increasingly come to the conclusion that Nova is heading for (or probably already at) a major crisis. If steps are not taken to avert this, the project is likely to lose a non-trivial amount of talent, both regular code contributors and core team members. That includes myself. This is not good for Nova's long term health and so should be of concern to anyone involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that the nova-core team is an unfixable bottleneck in our development process with our current project structure. The only way I see to remove the bottleneck is to split the virt drivers out of tree and let them all have their own core teams in their area of code, leaving current nova core to focus on all the common code outside the virt driver impls. I nonetheless urge people to read the whole mail.

Background information
======================

I see many factors coming together to form the crisis

 - Burn out of core team members from overwork
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear that the backlog of code up for review never goes away. Even intensive code review efforts at various points in the dev cycle make only a small impact on the backlog. This has a pretty significant impact on core team members, as their work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in a more efficient manner than plain gerrit allows for.
These certainly help, but they can't ever solve the problem on their own - just make it slightly more bearable. And this is not even considering that core team members might have useful contributions to make in ways beyond just code review. Ultimately the workload is just too high to sustain the levels of review required, so core team members will eventually burn out (as they have done many times already).

Even if one person attempts to take the initiative to heavily invest in review of certain features it is often to no avail. Unless a second dedicated core reviewer can be found to 'tag team' it is hard for one person to make a difference. The end result is that a patch is +2d and then sits idle for weeks or more until a merge conflict requires it to be reposted at which point even that one +2 is lost. This is a pretty demotivating outcome for both reviewers & the patch contributor.

New core team talent
--------------------

It can't escape attention that the Nova core team does not grow in size very often. When Nova was younger and its code base was smaller, it was easier for contributors to get onto core because the base level of knowledge required was that much smaller. To get onto core today requires a major investment in learning Nova over a year or more. Even people who potentially have the latent skills may not have the time available to invest in learning the entirety of Nova.

With the number of reviews proposed to Nova, the core team should probably be at least double its current size[1]. There is plenty of expertise in the project as a whole but it is typically focused into specific areas of the codebase. There is nowhere we can find 20 more people with broad knowledge of the codebase who could be promoted even over the next year, let alone today. This is ignoring that many existing members of core are relatively inactive due to burnout and so need replacing. That means we really need another 25-30 people for core. That's not going to happen.
Code review delays
------------------

The obvious result of having too much work for too few reviewers is that code contributors face major delays in getting their work reviewed and merged. From personal experience, during Juno, I've probably spent 1 week in aggregate on actual code development vs 8 weeks waiting on code review. You have to constantly be on alert for review comments because unless you can respond quickly (and repost) while you still have the attention of the reviewer, they may not look again for days/weeks.

The length of time to get work merged serves as a demotivator to actually do work in the first place. I've personally avoided doing a lot of code refactoring & cleanup work that would improve the maintainability of the libvirt driver in the long term, because I can't face the battle to get it reviewed & merged. Other people have told me much the same. It is not uncommon to see changes that have been pending for 2 dev cycles, not because the code was bad but becau