Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-12 Thread Daniel P. Berrange
On Thu, Sep 11, 2014 at 02:02:00PM -0400, Dan Prince wrote:
 I've always referred to the virt/driver.py API as an internal API
 meaning there are no guarantees about it being preserved across
 releases. I'm not saying this is correct... just that it is what we've
 got.  While OpenStack attempts to do a good job at stabilizing its
 public API's we haven't done the same for internal API's. It is actually
 quite painful to be out of tree at this point as I've seen with the
 Ironic driver being out of the Nova tree. (really glad that is back in
 now!)

Oh absolutely, I've always insisted that virt/driver.py is unstable
and that as a result out of tree drivers get to keep both pieces when
it breaks.

 So because we haven't designed things to be split out in this regard we
 can't just go and do it. 

I don't think that conclusion follows directly. We certainly need to
do some prep work to firm up our virt driver interface, as outlined
in my original mail, but if we agreed to push forward on this I think
it is practical to get that done in Kilo and split in L. It is
mostly a matter of having the will to do it IMHO.

 I tinkered with some numbers... not sure if this helps or hurts my
 stance but here goes. By my calculation this is the number of commits
 we've made that touched each virt driver tree for the last 3 releases
 plus stuff done to-date in Juno.
 
 Created using a command like this in each virt directory for each
 release: git log origin/stable/havana..origin/stable/icehouse
 --no-merges --pretty=oneline . | wc -l
 
 essex => folsom:
 
  baremetal: 26
  hyperv: 9
  libvirt: 222
  vmwareapi: 18
  xenapi: 164
 * total for above: 439
 
 folsom => grizzly:
 
  baremetal: 83
  hyperv: 58
  libvirt: 254
  vmwareapi: 59
  xenapi: 126
* total for above: 580
 
 grizzly => havana:
 
  baremetal: 48
  hyperv: 55
  libvirt: 157
  vmwareapi: 105
  xenapi: 123
* total for above: 488
 
 havana => icehouse:
 
  baremetal: 45
  hyperv: 42
  libvirt: 212
  vmwareapi: 121
  xenapi: 100
* total for above: 520
 
 icehouse => master:
 
  baremetal: 26
  hyperv: 32
  libvirt: 188
  vmwareapi: 121
  xenapi: 71
* total for above: 438
 
 ---
 
 A couple of things jump out at me from the numbers:
 
  -drivers that are being deprecated (baremetal) still have lots of
 changes. Some of these changes are valid bug fixes for the driver but a
 majority of them are actually related to internal cleanups and interface
 changes. This goes towards the fact that Nova isn't mature enough to do
 a split like this yet.

Our position that the virt driver is internal only has permitted us
to make backwards incompatible changes to it at will. Given that freedom
people inevitably take that route since it is the least effort option.
If our position had been that the virt driver needed to be forwards
compatible, people would have been forced to make the same changes without
breaking existing drivers.  IOW, the fact that we've made lots of changes
to baremetal historically doesn't imply that we can't decide to make the
virt driver API stable henceforth & thus avoid further changes of that
kind.
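
As a rough illustration of what "forwards compatible" could mean in practice, here is a minimal Python sketch. The class and method names are hypothetical and are not the real virt/driver.py interface. The only point is that a method added in a later release ships with a default, so a driver written against the older interface keeps working unchanged.

```python
class BaseVirtDriver:
    """Hypothetical stand-in for the virt driver base class."""

    def spawn(self, instance):
        # Part of the interface from the start: every driver implements it.
        raise NotImplementedError()

    def describe_host(self):
        # Assumed to be added in a later release. Shipping a default means
        # drivers written before it existed need no changes.
        return {}


class OlderOutOfTreeDriver(BaseVirtDriver):
    """Written before describe_host() existed; only implements spawn()."""

    def spawn(self, instance):
        return "spawned %s" % instance


driver = OlderOutOfTreeDriver()
print(driver.spawn("vm-1"))
print(driver.describe_host())
```

The alternative (changing the signature of spawn() in place) is the kind of least-effort change that breaks every driver outside the tree.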

  -the number of commits landed isn't growing *that* much across releases
 in the virt driver trees. Presumably we think we were doing a better job
 2 years ago? But the number of changes in the virt trees is largely the
 same... perhaps this is because people aren't submitting stuff because
 they are frustrated though?

Our core team size & thus review bandwidth has been fairly static over
that time, so the only way virt driver commits could have risen is if
core reviewers increased their focus on virt drivers at the expense of
other parts of nova. I actually read those numbers as showing that as
we've put more effort into reviewing vmware contributions, we've lost
resource going into libvirt contributions.

In addition we're of course missing out on capturing the changes that
we've never had submitted, or submitted but abandoned, or submitted but
slipped across multiple releases waiting for merge. Overall I think
the figures paint a pretty depressing picture of no overall growth,
perhaps even a decline.


 
 For comparison here are the total number of commits for each Nova
 release (includes the above commits):
 
 essex -> folsom: 1708
 folsom -> grizzly: 2131
 grizzly -> havana: 2188
 havana -> icehouse: 1696
 icehouse -> master: 1493
 
 ---

So we've still a way to go for juno cycle, but I'd be surprised if we
got beyond the havana numbers given where we are today. Again I think
those numbers show a plateau or even decline, which just reinforces
my point that our model is not scaling today.

 So say around 30% of the commits for a given release touch the virt
 drivers themselves.. many of them aren't specifically related to the
 virt drivers. Rather just general Nova internal cleanups because the
 interfaces aren't stable.
 
 And while splitting Nova virt drivers might help out some I'm not sure
 it helps the general Nova issue in that we 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Daniel P. Berrange
On Wed, Sep 10, 2014 at 12:41:44PM -0700, Vishvananda Ishaya wrote:
 
 On Sep 5, 2014, at 4:12 AM, Sean Dague s...@dague.net wrote:
 
  On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
  
  
  Just some things to think about with regards to the whole idea, by no
  means exhaustive.
  
  So maybe the better question is: what are the top sources of technical
  debt in Nova that we need to address? And if we did, everyone would be
  more sane, and feel less burnt.
  
  Maybe the drivers are the worst debt, and jettisoning them makes them
  someone else's problem, so that helps some. I'm not entirely convinced
  right now.
  
  I think Cells represents a lot of debt right now. It doesn't fully work
  with the rest of Nova, and produces a ton of extra code paths special
  cased for the cells path.
  
  The Scheduler has a ton of debt as has been pointed out by the efforts
  in and around Gantt. The focus has been on the split, but realistically
  I'm with Jay in that we should focus on the debt, and exposing a REST
  interface in Nova.
  
  What about the Nova objects transition? That continues to be slow
  because it's basically Dan (with a few other helpers from time to time).
  Would it be helpful if we did an all hands on deck transition of the
  rest of Nova for K1 and just get it done? Would be nice to have the bulk
  of Nova core working on one thing like this and actually be in shared
  context with everyone else for a while.
 
 In my mind, splitting helps with all of these things. A lot of the cleanup
 related work is completely delayed because the review queue starts to seem
 like an insurmountable hurdle. There are various cleanups needed in the
 drivers as well but they are not progressing due to the glacial pace we
 are moving at right now. Some examples: VMware spawn refactor, Hyper-V bug
 fixes, Libvirt resize/migrate (this is still using ssh to copy data!)
 
 People need smaller areas of work. And they need a sense of pride and
 ownership of the things that they work on. In my mind that is the best
 way to ensure success.

I do like to look at past experience for guidance, and with Nova we have
had a history of splitting out pieces of code and I think it is fair to
say that all those splits have been very successful for both sides (the
new project and Nova). E.g. if we look at the size and scope of the cinder
project & team today, I don't think it could ever have grown to that
scale if it had remained part of Nova. Splitting it out unleashed its
latent potential for success.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Dan Prince
On Thu, 2014-09-04 at 11:24 +0100, Daniel P. Berrange wrote:
 Position statement
 ==================
 
 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.
 
 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I nonetheless
 urge people to read the whole mail.
 


I've always referred to the virt/driver.py API as an internal API
meaning there are no guarantees about it being preserved across
releases. I'm not saying this is correct... just that it is what we've
got.  While OpenStack attempts to do a good job at stabilizing its
public API's we haven't done the same for internal API's. It is actually
quite painful to be out of tree at this point as I've seen with the
Ironic driver being out of the Nova tree. (really glad that is back in
now!)

So because we haven't designed things to be split out in this regard we
can't just go and do it. 

I tinkered with some numbers... not sure if this helps or hurts my
stance but here goes. By my calculation this is the number of commits
we've made that touched each virt driver tree for the last 3 releases
plus stuff done to-date in Juno.

Created using a command like this in each virt directory for each
release: git log origin/stable/havana..origin/stable/icehouse
--no-merges --pretty=oneline . | wc -l

essex => folsom:

 baremetal: 26
 hyperv: 9
 libvirt: 222
 vmwareapi: 18
 xenapi: 164
* total for above: 439

folsom => grizzly:

 baremetal: 83
 hyperv: 58
 libvirt: 254
 vmwareapi: 59
 xenapi: 126
   * total for above: 580

grizzly => havana:

 baremetal: 48
 hyperv: 55
 libvirt: 157
 vmwareapi: 105
 xenapi: 123
   * total for above: 488

havana => icehouse:

 baremetal: 45
 hyperv: 42
 libvirt: 212
 vmwareapi: 121
 xenapi: 100
   * total for above: 520

icehouse => master:

 baremetal: 26
 hyperv: 32
 libvirt: 188
 vmwareapi: 121
 xenapi: 71
   * total for above: 438
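
For anyone wanting to reproduce counts like these, the command can be wrapped in a small Python sketch. This is a sketch under assumptions, not the exact procedure used for the figures above: it passes each driver directory as a pathspec from the top of the checkout instead of running git inside the directory, and it assumes the drivers live under nova/virt/ and that the stable branches are available as origin/stable/<release>.

```python
import subprocess

# Driver directories as listed in the tables, relative to nova/virt/.
DRIVERS = ["baremetal", "hyperv", "libvirt", "vmwareapi", "xenapi"]


def git_log_command(old_ref, new_ref, path):
    """Build the git call that lists non-merge commits touching path."""
    return ["git", "log", "%s..%s" % (old_ref, new_ref),
            "--no-merges", "--pretty=oneline", "--", path]


def count_commits(old_ref, new_ref, path, repo_dir="."):
    """Run git in repo_dir; --pretty=oneline prints one line per commit."""
    out = subprocess.check_output(git_log_command(old_ref, new_ref, path),
                                  cwd=repo_dir)
    return len(out.splitlines())


def counts_per_driver(old_ref, new_ref, repo_dir="."):
    """Return {driver name: commit count} for one release range."""
    return {name: count_commits(old_ref, new_ref,
                                "nova/virt/" + name, repo_dir)
            for name in DRIVERS}
```

Calling counts_per_driver("origin/stable/havana", "origin/stable/icehouse", repo_dir) on a Nova checkout would then give one row of the tables, with repo_dir pointing at wherever the clone lives.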

---

A couple of things jump out at me from the numbers:

 -drivers that are being deprecated (baremetal) still have lots of
changes. Some of these changes are valid bug fixes for the driver but a
majority of them are actually related to internal cleanups and interface
changes. This goes towards the fact that Nova isn't mature enough to do
a split like this yet.

 -the number of commits landed isn't growing *that* much across releases
in the virt driver trees. Presumably we think we were doing a better job
2 years ago? But the number of changes in the virt trees is largely the
same... perhaps this is because people aren't submitting stuff because
they are frustrated though?

---

For comparison here are the total number of commits for each Nova
release (includes the above commits):

essex -> folsom: 1708
folsom -> grizzly: 2131
grizzly -> havana: 2188
havana -> icehouse: 1696
icehouse -> master: 1493

---

So say around 30% of the commits for a given release touch the virt
drivers themselves.. many of them aren't specifically related to the
virt drivers. Rather just general Nova internal cleanups because the
interfaces aren't stable.
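
The "around 30%" figure can be checked against the numbers above with a short Python sketch. The only inputs are the per-release totals already listed. One caveat is an assumption on my part: a commit touching two driver trees would be counted twice in the driver totals, so the shares are upper bounds.

```python
# (release range, commits touching the virt driver trees, all Nova commits)
RELEASES = [
    ("essex -> folsom", 439, 1708),
    ("folsom -> grizzly", 580, 2131),
    ("grizzly -> havana", 488, 2188),
    ("havana -> icehouse", 520, 1696),
    ("icehouse -> master", 438, 1493),
]


def virt_share(virt_commits, total_commits):
    """Percentage of all Nova commits that touched a virt driver tree."""
    return 100.0 * virt_commits / total_commits


for name, virt, total in RELEASES:
    print("%-20s %5.1f%%" % (name, virt_share(virt, total)))

# Share across all five ranges combined.
all_virt = sum(virt for _, virt, _ in RELEASES)
all_total = sum(total for _, _, total in RELEASES)
print("%-20s %5.1f%%" % ("overall", virt_share(all_virt, all_total)))
```

This gives roughly 22% to 31% per release and about 27% overall, so "around 30%" holds for the two most recent ranges and is a little high for the earlier ones.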

And while splitting Nova virt drivers might help out some I'm not sure
it helps the general Nova issue in that we have more reviews with less
of the good ones landing. Nova is a weird beast at the moment and just
splitting things like this is probably going to harm as much as it helps
(like we saw with Ironic) unless we stabilize the APIs... and even then
I'm skeptical of death by a million tiny sub-projects. I'm just not
convinced this is the #1 pain point around Nova reviews. What
about the other 70%?

For me a lot of the frustration with reviews is around test/gate time,
pushing things through, rechecks, etc... and if we break something it
takes just as much time to get the revert in. The last point (the
ability to revert code quickly) is a really important one as it
sometimes takes days to get a simple (obvious) revert landed. This
leaves groups like TripleO who have their own CI and 3rd party testing
systems which are also capable of finding many critical issues in the
difficult position of having to revert/cherry pick critical changes for
days at a time in order to keep things running.

Maybe I'm impatient (I totally am!) but I see much of the review
slowdown as a result of the feedback loop times increasing over 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-11 Thread Chris Friesen

On 09/11/2014 12:02 PM, Dan Prince wrote:


Maybe I'm impatient (I totally am!) but I see much of the review
slowdown as a result of the feedback loop times increasing over the
years. OpenStack has some really great CI and testing but I think our
focus on not breaking things actually has us painted into a corner. We
are losing our agility and the review process is paying the price. At
this point I think splitting out the virt drivers would be more of a
distraction than a help.


I think the only solution to feedback loop times increasing is to scale 
the review process, which I think means giving more people 
responsibility for a smaller amount of code.


I don't think it's strictly necessary to split the code out into a 
totally separate repo, but I do think it would make sense to have 
changes that are entirely contained within a virt driver be reviewed 
only by developers of that virt driver rather than requiring review by 
the project as a whole.  And they should only have to pass a subset of 
the CI testing--that way they wouldn't be held up by gating bugs in 
other areas.
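
A minimal Python sketch of the "entirely contained within a virt driver" check, assuming drivers live under nova/virt/<driver>/ and ignoring where each driver's tests live. The function name and the routing rule are hypothetical; nothing like this exists in the current tooling as far as I know.

```python
VIRT_PREFIX = "nova/virt/"


def owning_driver(changed_paths):
    """Return the driver name if every changed path sits under a single
    nova/virt/<driver>/ directory, otherwise None."""
    drivers = set()
    for path in changed_paths:
        if not path.startswith(VIRT_PREFIX):
            # Touches common Nova code: needs project-wide review.
            return None
        parts = path[len(VIRT_PREFIX):].split("/")
        if len(parts) < 2:
            # A file directly in nova/virt/ (e.g. driver.py) is shared.
            return None
        drivers.add(parts[0])
    # Exactly one driver touched, and nothing else.
    return drivers.pop() if len(drivers) == 1 else None


print(owning_driver(["nova/virt/libvirt/driver.py"]))
print(owning_driver(["nova/virt/libvirt/driver.py",
                     "nova/compute/manager.py"]))
```

A change that gets a driver name back could go to that driver's reviewers and a reduced CI subset; anything returning None would stay on the normal path.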


Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 4, 2014, at 3:24 AM, Daniel P. Berrange berra...@redhat.com wrote:

 Position statement
 ==================
 
 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.
 
 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I nonetheless
 urge people to read the whole mail.

I am highly in favor of this approach (and have been for at
least a year). Every time we have brought this up in the past
there has been concern about the shared code, but we have to
make a change. We have tried various other approaches and none
of them have made a dent.

+1000

Vish
 
 
 Background information
 ======================
 
 I see many factors coming together to form the crisis
 
 - Burn out of core team members from over work 
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing
 
 Each item on their own may not seem too bad, but combined they
 add up to a big problem.
 
 Core team burn out
 ------------------
 
 Having been involved in Nova for several dev cycles now, it is clear
 that the backlog of code up for review never goes away. Even
 intensive code review efforts at various points in the dev cycle
 make only a small impact on the backlog. This has a pretty
 significant impact on core team members, as their work is never
 done. At best, the dial is sometimes set to 10, instead of 11.
 
 Many people, myself included, have built tools to help deal with
 the reviews in a more efficient manner than plain gerrit allows
 for. These certainly help, but they can't ever solve the problem
 on their own - just make it slightly more bearable. And this is
 not even considering that core team members might have useful
 contributions to make in ways beyond just code review. Ultimately
 the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they
 have done many times already).
 
 Even if one person attempts to take the initiative to heavily
 invest in review of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag
 team' it is hard for one person to make a difference. The end
 result is that a patch is +2d and then sits idle for weeks or
 more until a merge conflict requires it to be reposted at which
 point even that one +2 is lost. This is a pretty demotivating
 outcome for both reviewers & the patch contributor.
 
 
 New core team talent
 --------------------
 
 It can't escape attention that the Nova core team does not grow
 in size very often. When Nova was younger and its code base was
 smaller, it was easier for contributors to get onto core because
 the base level of knowledge required was that much smaller. To
 get onto core today requires a major investment in learning Nova
 over a year or more. Even people who potentially have the latent
 skills may not have the time available to invest in learning the
 entirety of Nova.
 
 With the number of reviews proposed to Nova, the core team should
 probably be at least double its current size[1]. There is plenty of
 expertise in the project as a whole but it is typically focused
 into specific areas of the codebase. There is nowhere we can find
 20 more people with broad knowledge of the codebase who could be
 promoted even over the next year, let alone today. This is ignoring
 that many existing members of core are relatively inactive due to
 burnout and so need replacing. That means we really need another
 25-30 people for core. That's not going to happen.
 
 
 Code review delays
 ------------------
 
 The obvious result of having too much work for too few reviewers
 is that code contributors face major delays in getting their work
 reviewed and merged. From personal experience, during Juno, I've
 probably spent 1 week in aggregate on actual code development vs
 8 weeks on waiting on code review. You have to constantly be on
 alert for review comments because unless you can respond quickly
 (and repost) while you still have the attention of the reviewer,
 they may not look again for days/weeks.
 
 The 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 4, 2014, at 8:33 AM, Daniel P. Berrange berra...@redhat.com wrote:

 On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote:
 Hi,
 I do not think that Nova is in a death spiral. I just think that the
 current way of working at the moment is strangling the project. I do not
 understand why we need to split drivers out of the core project. Why not
 have the ability to provide 'core review' status to people for reviewing
 those parts of the code? We have enough talented people in OpenStack to be
 able to write a driver above gerrit to enable that.
 
 The consensus view at the summit was that, having tried & failed at getting
 useful changes into gerrit, it is not a viable option unless we undertake a
 permanent fork of the code base. There didn't seem to be any appetite for
 maintaining & developing a large java app ourselves. So people were looking
 to start writing a replacement for gerrit from scratch (albeit reusing the
 database schema).

I don’t think this is a viable option for us, but if we were going to do it,
we would probably be better off using https://code.google.com/p/rietveld/
as a base, since it is actually written in python.

Vish





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Vishvananda Ishaya

On Sep 5, 2014, at 4:12 AM, Sean Dague s...@dague.net wrote:

 On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
 
 
 Just some things to think about with regards to the whole idea, by no
 means exhaustive.
 
 So maybe the better question is: what are the top sources of technical
 debt in Nova that we need to address? And if we did, everyone would be
 more sane, and feel less burnt.
 
 Maybe the drivers are the worst debt, and jettisoning them makes them
 someone else's problem, so that helps some. I'm not entirely convinced
 right now.
 
 I think Cells represents a lot of debt right now. It doesn't fully work
 with the rest of Nova, and produces a ton of extra code paths special
 cased for the cells path.
 
 The Scheduler has a ton of debt as has been pointed out by the efforts
 in and around Gantt. The focus has been on the split, but realistically
 I'm with Jay in that we should focus on the debt, and exposing a REST
 interface in Nova.
 
 What about the Nova objects transition? That continues to be slow
 because it's basically Dan (with a few other helpers from time to time).
 Would it be helpful if we did an all hands on deck transition of the
 rest of Nova for K1 and just get it done? Would be nice to have the bulk
 of Nova core working on one thing like this and actually be in shared
 context with everyone else for a while.

In my mind, splitting helps with all of these things. A lot of the cleanup
related work is completely delayed because the review queue starts to seem
like an insurmountable hurdle. There are various cleanups needed in the
drivers as well but they are not progressing due to the glacial pace we
are moving at right now. Some examples: VMware spawn refactor, Hyper-V bug
fixes, Libvirt resize/migrate (this is still using ssh to copy data!)

People need smaller areas of work. And they need a sense of pride and
ownership of the things that they work on. In my mind that is the best
way to ensure success.

Vish







Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-10 Thread Jeremy Stanley
On 2014-09-10 12:19:08 -0700 (-0700), Vishvananda Ishaya wrote:
 I don’t think this is a viable option for us, but if we were going
 to do it, we would probably be better off using
 https://code.google.com/p/rietveld/ as a base, since it is
 actually written in python.

The proposal floated in Atlanta was to write a new python-based
front-end built on Gerrit's API layer (in fact, at least one such
alternative front-end now exists in the form of gertty, but that's
console-oriented and so probably not to everyone's tastes). I'll let
the vinz developers speak to their plans and current progress
though.
-- 
Jeremy Stanley



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-09 Thread Gary Kotton


On 9/8/14, 7:23 PM, Sylvain Bauza sba...@redhat.com wrote:


On 08/09/2014 18:06, Steven Dake wrote:
 On 09/05/2014 06:10 AM, Sylvain Bauza wrote:

 On 05/09/2014 12:48, Sean Dague wrote:
 On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
 On 05/09/2014 01:22, Michael Still wrote:
 On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
 berra...@redhat.com wrote:

 [Heavy snipping because of length]

 The radical (?) solution to the nova core team bottleneck is thus to
 follow this lead and split the nova virt drivers out into separate
 projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
     persistence and the glue that ties all this together with the
     virt driver API.

   - Each virt driver project gets its own core team and is responsible
     for dealing with review, merge & release of their codebase.

 I think this is the crux of the matter. We're not doing a great job of
 landing code at the moment, because we can't keep up with the review
 workload.

 So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
     we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

 Ahem, IIRC, there is a third proposal for Kilo:
   - create subteams of half-cores responsible for reviewing a patch's
     iterations and sending approval requests to cores once they consider
     the patch stable enough.

 As I explained, it would free up reviewing time for cores
 without losing control over what is being merged.

 I don't really understand how the half core idea works outside of a math
 equation, because the point of core is to have trust in the
 judgement of your fellow core members so that they can land code when
 you aren't looking. I'm not sure how I manage to build up half trust in
 someone any quicker.

 Well, this thread is becoming huge, so it's becoming hard to follow
 all the discussion, but I explained the idea elsewhere. Let me just
 provide it here too:
 The idea is *not* to have halfcores land patches. The core team will
 still be fully responsible for approving patches. The main problem in
 Nova is that cores are spending lots of time because they review each
 iteration of a patch, and also have to judge whether a patch is good or
 not.

 That's really time consuming and, most of the time, quite
 frustrating, as it requires following the patch's life, so there is a
 high risk that a core's attention becomes distracted over the
 life of the patch.

 Here, the idea is to dramatically reduce this time by having teams
 dedicated to specific areas (as is already the case anyway for the
 vast majority of reviewers) who could on their own take the time to
 review all the iterations. Of course, that doesn't mean cores
 would lose the possibility to specifically follow a patch and bypass
 the halfcores; that's just to help them if they're overwhelmed.

 About the question of trusting cores or halfcores, I can just say
 that the Nova team needs to either grow or divide anyway, so the
 delegation of trust has to be real in any case.

 This whole process is IMHO very encouraging for newcomers because
 it creates dedicated teams that could help them improve their
 changes, rather than waiting 2 months to get a -1 and a frank reply.


 Interesting idea, but having been core on Heat for ~2 years, it is
 critical to be involved in the review from the beginning of the patch
 set.  Typically you won't see core reviewers participate in a review
 that is already being handled by two core reviewers.

 The reason it is important from the beginning of the change request is
 that the project core can store the iterations and purpose of the
 change in their heads.  Delegating all that up front work to a
 non-core just seems counter to the entire process of code reviews.
 Better would be to reduce the # of reviews in the queue (what is proposed
 by this change) or trust new reviewers faster.  I'm not sure how you
 do that - but this second model is what you're proposing.

 I think one thing that would be helpful is to point out somehow in the
 workflow that two core reviewers are involved in the review so core
 reviewers don't have to sift through 10 pages of reviews to find new
 work.


Now that the specs repo is in place and has been proven with Juno, most
of the design stage is approved before the implementation gets going. If
the cores get more time because they aren't focused on each
single patchset, they could really find some patches they would like to
look at, or they could just wait for the half-approvals from the
halfcores.

If a core thinks that a patch is tricky enough to look at each
iteration, I don't see anything bad in that. At least, it's up to the core
reviewer to choose which patches to look at, and they would be
freer than under the slots proposal.

I'm a core from a 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-08 Thread Dan Smith
 The last few days have been interesting as I watch FFEs come through.
 People post explaining their feature, its importance, and the risk
 associated with it. Three cores sign on for review. All of the ones
 I've looked at have received active review since being posted. Would
 it be bonkers to declare nova to be in permanent feature freeze? If
 we could maintain the level of focus we see now, then we'd be getting
 heaps more done than before.
 
 Agreed. Honestly, this has been a really nice flow. I'd love to figure
 out what part of this focus is capturable for normal cadence. This
 realistically is what I was hoping slots would provide, because I feel
 like we actually move really fast when we call out 5-10 things to go
 look at this week.

The funny thing is, last week I was thinking how similar FF is to what
slots/runways would likely provide. That is, intense directed focus on a
single thing by a group of people until it's merged (or fails). Context
is kept between iterations because everyone is on board for quick
iterations with minimal distraction between them. It *does* work during
FF, as we've seen in the past -- I'd expect we have nearly 100% merge
rate of FFEs. How we arrive at a thing getting focus is different in
slots/runways, but I feel the result could be the same.

Splitting out the virt drivers is an easy way to make the life of a core
much easier, but I think the negative impacts are severe and potentially
irreversible, so I'd rather make sure we're totally out of options
before we exercise it.

--Dan





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-08 Thread Sylvain Bauza


On 08/09/2014 18:06, Steven Dake wrote:

On 09/05/2014 06:10 AM, Sylvain Bauza wrote:


On 05/09/2014 12:48, Sean Dague wrote:

On 09/05/2014 03:02 AM, Sylvain Bauza wrote:

On 05/09/2014 01:22, Michael Still wrote:

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
berra...@redhat.com wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
 persistence and the glue that ties all this together with the
 virt driver API.

   - Each virt driver project gets its own core team and is responsible
 for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo:
  - create subteam half-cores responsible for reviewing a patch's
iterations, who send approval requests to cores once they consider the
patch stable enough.

As I explained, it would free up reviewing time for cores
without losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of being core is to have trust in the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.


Well, this thread is becoming huge, so it's getting hard to follow all
the discussion, but I explained the idea elsewhere. Let me restate it
here:
The idea is *not* for halfcores to land patches. The core team will
still be fully responsible for approving patches. The main problem in
Nova is that cores spend a lot of time reviewing each iteration of a
patch, on top of judging whether the patch is good or not.


That's really time consuming and, most of the time, quite frustrating,
as it requires following the patch through its whole life, so there is
a high risk that a core's attention gets distracted along the way.


Here, the idea is to dramatically reduce this time by having teams
dedicated to specific areas (as is already the case anyway for the
vast majority of reviewers) who could take the time to review all the
iterations themselves. Of course, that doesn't mean cores would lose
the ability to follow a specific patch and bypass the halfcores; it is
just there to help them if they're overwhelmed.


On the question of trusting cores or halfcores, I can only say that
the Nova team needs to either grow or be divided anyway, so this
delegation of trust has to happen either way.


This whole process is IMHO very encouraging for newcomers, because it
creates dedicated teams that could help them improve their changes,
instead of waiting 2 months to get a -1 and a frank reply.



Interesting idea, but having been core on Heat for ~2 years, I'd say it
is critical to be involved in the review from the beginning of the patch
set.  Typically you won't see core reviewers participate in a review
that is already being handled by two core reviewers.


The reason it is important from the beginning of the change request is
that the project core can store the iterations and purpose of the
change in their heads.  Delegating all that up front work to a
non-core just seems counter to the entire process of code reviews.
Better would be to reduce the number of reviews in the queue (what is
proposed by this change) or to trust new reviewers faster.  I'm not sure
how you do that - but this second model is what you're proposing.


I think one thing that would be helpful is to point out somehow in the 
workflow that two core reviewers are involved in the review so core 
reviewers don't have to sift through 10 pages of reviews to find new 
work.




Now that the specs repo is in place and has been proven with Juno, most
of the design is approved before the implementation starts. If the
cores get more time because they are no longer focused on every single
patchset, they could pick the patches they really want to look at, or
they could just wait for the half-approvals from the halfcores.


If a core thinks that a patch is tricky enough to warrant looking at
each iteration, I don't see anything wrong with that. At least it's up
to the core reviewer to choose which patches to look at, and they would
be freer than they would be under the slots proposal.


I'm a core from a tiny project but I know how time consuming it is. I 
would really enjoy if 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 01:26, Jay Pipes wrote:

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that 
Dan's proposal features quite prominently the following:


== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying 
needs to be done to the interfaces between nova-conductor, 
nova-compute, and nova-scheduler *before* any split of the scheduler 
code is even remotely feasible.


Splitting the scheduler out before this is done would actually not 
help but not solve this problem -- it would instead further the 
problem, IMO.
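
As an illustration of the "opaque dicts must be replaced with objects +
attributes" point quoted above, here is a minimal sketch. All names in it
(HostStats, get_host_stats_as_dict, and so on) are hypothetical and are not
taken from Nova; it only contrasts the two styles of interface.

```python
from dataclasses import dataclass


# Hypothetical names throughout: these are not Nova's real classes.
def get_host_stats_as_dict():
    """Opaque-dict style: callers must guess the keys and value types."""
    return {"vcpus": 8, "memory_mb": 16384, "hypervisor": "kvm"}


@dataclass(frozen=True)
class HostStats:
    """Object style: the fields and their types are part of the contract."""
    vcpus: int
    memory_mb: int
    hypervisor: str


def get_host_stats_as_object():
    return HostStats(vcpus=8, memory_mb=16384, hypervisor="kvm")


# With a dict, a misspelt key only fails when that code path runs.
stats_dict = get_host_stats_as_dict()

# With an object, a misspelt attribute is something linters and type
# checkers can flag before the code ever runs.
stats_obj = get_host_stats_as_object()
summary = f"{stats_obj.vcpus} vcpus, {stats_obj.memory_mb} MB"
```

The object form gives both sides of the interface something concrete to
version and document, which is the precondition being argued for here.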




Jay, we agreed on a plan to carry on; rest assured we're working on
it. See the Gantt meeting logs for what my vision is.



That said, I think this concern about clean interfaces also applies to
this thread: if we want to spin the virt drivers out of the Nova git
repo, that does require a cleanup of the interfaces, in particular in
the compute manager and the resource tracker, where a lot of bits are
still tightly coupled and not versioned (thanks to JSON dicts).


So, this effort requires at least one cycle and, as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion matches what Russell
and Thierry expressed, i.e. subteam delegation (to what I call
half-cores) for iterations, with cores only handling approvals.


-Sylvain



Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


On 05/09/2014 01:22, Michael Still wrote:

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange berra...@redhat.com wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

  - Nova becomes the home for the public APIs, RPC system, database
persistence and the glue that ties all this together with the
virt driver API.

  - Each virt driver project gets its own core team and is responsible
for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

  - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
  - splitting all the virt drivers out of the nova tree


Ahem, IIRC, there is a third proposal for Kilo:
 - create subteam half-cores responsible for reviewing a patch's
iterations, who send approval requests to cores once they consider the
patch stable enough.


As I explained, it would free up reviewing time for cores
without losing control over what is being merged.


-Sylvain


Splitting the drivers out of the nova tree does come at a cost -- we'd
need to stabilise and probably version the hypervisor driver
interface, and that will encourage more out of tree drivers, which
are things we haven't historically wanted to do. If we did this split,
I think we need to acknowledge that we are changing policy there. It
also means that nova-core wouldn't be the ones holding the quality bar
for hypervisor drivers any more, I guess this would open the door for
drivers to more actively compete on the quality of their
implementations, which might be a good thing.

Both of these have interesting aspects, and I agree we need to do
_something_. I do wonder if there is a hybrid approach as well though.
For example, could we implement some sort of more formal lieutenant
system for drivers? We've talked about it in the past but never been
able to express how it would work in practice.

The last few days have been interesting as I watch FFEs come through.
People post explaining their feature, its importance, and the risk
associated with it. Three cores sign on for review. All of the ones
I've looked at have received active review since being posted. Would
it be bonkers to declare nova to be in permanent feature freeze? If
we could maintain the level of focus we see now, then we'd be getting
heaps more done than before.

These issues should very definitely be on the agenda for the design
summit, probably early in the week.

Michael






Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 02:56:04PM -0500, Kyle Mestery wrote:
 On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  Proposal / solution
  ===================
 
  In the past Nova has spun out its volume layer to form the cinder
  project. The Neutron project started as an attempt to solve the
  networking space, and ultimately replace the nova-network. It
  is likely that the scheduler will be spun out to a separate project.
 
  Now Neutron itself has grown so large and successful that it is
  considering going one step further and spinning its actual drivers
  out of tree into standalone add-on projects [4]. I've heard on the
  grapevine that Ironic is considering similar steps for hardware
  drivers.
 
 I just wanted to note that this is a huge problem in Neutron, and it
 gets worse with each release as we add on more drivers and plugins
 which carry a maintenance cost without gaining any new reviewers from
 the companies who have the drivers. The rough plan I have for Neutron
 involves moving all non-Open Source drivers out of tree into a
 separate git repository. Your message has made me think that perhaps
 we in Neutron should go one step further and even remove the Open
 Source drivers, leaving the in-tree implementation as the only one
 there. Where we move these is the main issue. Given we have 20+
 drivers/plugins now, one git repository per driver/plugin won't scale,
 as we add 3-5 each cycle. So perhaps a single repository is the best
 idea here, with shared reviews from vendors across each other's code.

While I'll make no secret of my dislike for closed source software,
my feeling is that OpenStack as a project is explicitly welcoming
closed source software & vendors, not least by virtue of using a
more permissive Apache license instead of a strong copyleft license
like GPL. So given the project's stance, I'd not be in favour of
discriminating against drivers for closed source software.

In actual fact though, the premise of my proposal is the idea that
moving a driver out of tree will actually help its development by
giving its team much greater freedom & responsibility. So by only
moving out non-open source drivers, we'd arguably be putting the
in-tree open source drivers at a disadvantage! I'm also very much
drawn to the idea that having separate repos will let us do more
targeted setup of CI test jobs, so each test job is actually
directly relevant to the code being tested.

I can see your concern about the number of drivers you have in
Neutron and the frequency with which more are added. We don't
have anywhere near this number in Nova and are not likely to
ever grow that much. If you did have 30 separate drivers and
thus 30 separate GIT repos though, the question to consider is
who is ultimately responsible for reviewing those drivers. If
each of those 30 drivers had their own self-organized team of
people the burden of 30 repos is not as bad as it seems, since
any one person would probably only be concerned with a couple
of git repos.  If you still see the single neutron core team
being responsible for each of those repos, then I can see that
having 30 repos would be a big burden. I don't think there is
a single right answer here for all OpenStack projects. It is
entirely conceivable that it might be best for Neutron to have
a single repo for a set of drivers, while being best for Nova
to have a separate repo for each driver.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 10:44:17PM -0600, John Griffith wrote:
 Just some thoughts and observations I've had regarding this topic in Cinder
 the past couple of years.  I realize this is a Nova thread so hopefully
 some of this can be applied in a more general context.
 
 TLDR:
 1. I think moving drivers into their own repo is just shoveling the pile to
 make a new pile (not really solving anything)

I'm not familiar with Cinder, but for Nova it would certainly have clear
benefits and not merely be shoveling the pile. Specifically it would

 - Easily let us double the number of core reviewers on aggregate

 - Reduce the bar for getting into a driver core team thus increasing
   the talent pool we can promote from.

 - Work accepted in a release for one driver would not reduce the
   bandwidth for another driver to accept work, since their review
   teams are separate

 - We can have more targeted testing, which will reduce the amount
   of bogus gate failures people get when submitting reviews and
   allow every driver to have gating CI jobs without impacting the
   other drivers

 2. Removal of drivers other than the reference implementation for each
 project could be the healthiest option
 a. Requires transparent, public, automated 3rd party CI
 b. Requires a TRUE plugin architecture and mentality
 c. Requires a stable and well defined API

As mentioned in the original mail I don't want to see a situation where
we end up with some drivers in tree and others out of tree as it sets up
bad dynamics within the project. Those out of tree will always have the
impression of being second class citizens and thus there will be constant
pressure to accept drivers back into tree. The so called 'reference'
driver that stayed in tree would also continue to be penalized in the
way it is today, and so its development would be disadvantaged compared
to the out of tree drivers.

 3. While I'm still sort of a fan of the removal of drivers, I do think
 Cinder is making it work, there have been missteps and yes it's a pain
 sometimes but it's working ok and we've got plans to try and improve
 
 4. Adding restrictions like drivers only in first milestone and more
 intense scrutinization of features will go a long way to help resolve the
 issues we do have currently

Not in nova at least. We have a fundamental bottleneck in nova and
simply re-arranging review priorities in this kind of way will never
fix it. We've tried many different approaches to prioritization of
work and the only result is that we've got more aggressive at saying
no to contributors. This is directly resulting in the crisis we have
today.

 I've spent a fair amount of time thinking about the explosive number of
 drivers being added to Cinder over the past year or so.  I've been a pretty
 vocal proponent of the idea of removing all drivers except the LVM
 reference implementation from Cinder.  I'd rather see Vendors drivers
 maintained in their own Github Repo and truly follow a plugin model.
  This of course means that Cinder has to be truly designed and maintained
 with a real plugin architecture kept in mind in every aspect of development
 (experience proves this harder to do than it sounds).  I think with things
 like stable and well defined interfaces as well as 3rd party CI this is
 actually a reasonable approach and could be effective.  I do not see how
 creating a separate repo and in essence yet another set of OpenStack
 Projects really helps with the problem.  The fact is that the biggest issue
 most people see with driver contributions is those that are made by
 organizations that work on their driver only and don't contribute back to
 the core project (whether that be in the form of reviews or core
 contributions).  I'm not sure I understand why that would be any different
 by just putting the code in a separate bucket.  In other words, getting a
 solid and consistent team working on that project seems like you've just
 kicked the can down the road so you don't have to deal with it.

Fundamentally people contributing to a project are doing so voluntarily
to scratch their own itch. The project leadership can help identify areas
that need work and encourage people to take up the challenge, but you
cannot force people to do the work. We've done many things in nova that
are basically inflicting a form of punishment on contributors if they
don't work on things we tell them to work on. This is not having a positive
effect, on the contrary it is resulting in a lot of demotivated and pissed off
contributors who are ultimately leaving the project.

I agree that splitting the virt drivers out into their own repositories is
not going to hugely help get more people to work on Nova core - that was
not the primary intention. The big focus is on unblocking development of
the virt drivers so that their contributors actually feel their efforts
are valued by the project. If we make the project a more attractive place
to work in general that will 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 12:57:57PM -0700, Joe Gordon wrote:
 On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange berra...@redhat.com
 wrote:
  Proposal / solution
  ===================
 
  In the past Nova has spun out its volume layer to form the cinder
  project. The Neutron project started as an attempt to solve the
  networking space, and ultimately replace the nova-network. It
  is likely that the scheduler will be spun out to a separate project.
 
  Now Neutron itself has grown so large and successful that it is
  considering going one step further and spinning its actual drivers
  out of tree into standalone add-on projects [4]. I've heard on the
  grapevine that Ironic is considering similar steps for hardware
  drivers.
 
  The radical (?) solution to the nova core team bottleneck is thus to
  follow this lead and split the nova virt drivers out into separate
  projects and delegate their maintenance to new dedicated teams.
 
   - Nova becomes the home for the public APIs, RPC system, database
 persistence and the glue that ties all this together with the
 virt driver API.
 
   - Each virt driver project gets its own core team and is responsible
 for dealing with review, merge & release of their codebase.
 
 
 Overall I do think we need to re-think how the review burden is
 distributed. That being said, this is a nice proposal but I am not sure if
 it moves the review burden around enough or is the right approach. Do you
 have any rough numbers on what percent of the review burden goes to virt
 drivers today (how ever you want to define that statement, number of merged
 patches, man hours, lines of code, number of reviews  etc.). If for example
 today the nova review team spends 10% of their review time on virt drivers
 then I don't think this proposal will have a significant impact on the
 review backlog (for nova-common).

I'm a little wary of doing too many stats on things like reviews and
patches, because I fear it does not capture the full picture. Specifically
we're turning away contributors before they ever get to the point of
submitting reviews / patches, by rejecting their blueprints/specs.
Also the difficulty of getting stuff reviewed is discouraging people
from even considering doing a lot of work in the first place - if I had
had the confidence in getting it reviewed & merged I would easily have
submitted twice as much code to libvirt this cycle, but as it was I
didn't even start work on most things I would have liked to.

That said though, in the past 6 months we had 1385 changes merged.
Of those, 437 touched at least one file in the /virt/ directory
which is approximately 30%.
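
For reference, the share quoted above can be checked directly from the two
figures in the mail. This is only the arithmetic; the variable names are
illustrative.

```python
merged_changes = 1385  # changes merged in the past 6 months (from the mail)
virt_changes = 437     # of those, changes touching a file under /virt/

virt_share = virt_changes / merged_changes
print(f"{virt_share:.1%}")  # prints 31.6%
```

So the exact figure is a little under 32%, consistent with the
"approximately 30%" stated in the mail.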

I agree though, this proposal will not have a dramatic effect on
the review backlog for the nova common code. It would probably be
a small (but noticeable) improvement - most of the benefit would
fall on the virt drivers I expect. If we can make Nova a more
productive & enjoyable place to contribute though, this should
ultimately feed through into more people being involved in general
and thus more resource available to nova common too.

Regards,
Daniel



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 06:48:33PM -0400, Russell Bryant wrote:
 On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
  Position statement
  ==================
  
  Over the past year I've increasingly come to the conclusion that
  Nova is heading for (or probably already at) a major crisis. If
  steps are not taken to avert this, the project is likely to lose
  a non-trivial amount of talent, both regular code contributors and
  core team members. That includes myself. This is not good for
  Nova's long term health and so should be of concern to anyone
  involved in Nova and OpenStack.
  
  For those who don't want to read the whole mail, the executive
  summary is that the nova-core team is an unfixable bottleneck
  in our development process with our current project structure.
  The only way I see to remove the bottleneck is to split the virt
  drivers out of tree and let them all have their own core teams
  in their area of code, leaving current nova core to focus on
  all the common code outside the virt driver impls. I nonetheless
  urge people to read the whole mail.
 
 Fantastic write-up.  I can't +1 enough the problem statement, which I
 think you've done a nice job of framing.  We've taken steps to try to
 improve this, but none of them have been big enough.  I feel we've
 reached a tipping point.  I think many others do too, and several
 proposals being discussed all seem rooted in this same core issue.
 
 When it comes to the proposed solution, I'm +1 on that too, but part of
 that is that it's hard for me to ignore the limitations placed on us by
 our current review infrastructure (gerrit).
 
 If we ignored gerrit for a moment, is rapid increase in splitting out
 components the ideal workflow?  Would we be better off finding a way to
 finally just implement a model more like the Linux kernel with
 sub-system maintainers and pull requests to a top-level tree?  Maybe.
 I'm not convinced that split of repos is obviously better.
 
 You make some good arguments for why splitting has other benefits.

For a long time I've used the LKML 'subsystem maintainers' model as the
reference point for ideas. In a more LKML like model, each virt team
(or other subsystem team) would have their own separate GIT repo with
a complete Nova codebase, where they did their day to day code submissions,
reviews and merges. Periodically the primary subsystem maintainer would
submit a large pull / merge request to the overall Nova maintainer.
The $1,000,000 question in such a model is what kind of code review
happens during the big pull requests to integrate subsystem trees. 

The closest example I can see is what's happening with the Ironic
driver merge reviews. I'm personally finding review of that to be
quite a burdensome activity, because all comments on the merge
review then get fed back to the original maintainers who do a new
round of patch + review in Ironic tree and then we get a new version
submitted back to nova tree for merge. Rinse, repeat.

So my biggest fear with a model where each team had their own full
Nova tree and did large pull requests, is that we'd suffer major
pain during the merging of large pull requests, especially if any
of the merges touched common code. It could make the pull requests
take a really long time to get accepted into the primary repo.

By contrast with split out git repos per virt driver code, we will
only ever have 1 stage of code review for each patch. Changes to
common code would go straight to main nova common repo and so get
reviewed by the experts there without delay, avoiding the 2nd stage
of review from merge requests.

The more I think about this, the more attracted I am to the idea
that separate repos will facilitate us doing more targeted testing
and allow 3rd party CI to become gating over their respective virt
driver codebases.

Finally the LKML model would still leave some drivers at a disadvantage
for development, if they're not able to meet the standards we require
in terms of CI testing, to be accepted into the primary repo.

Regards,
Daniel



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Christopher Yeoh
On Thu, 4 Sep 2014 11:24:29 +0100
Daniel P. Berrange berra...@redhat.com wrote:
 
  - A fairly significant amount of nova code would need to be
considered semi-stable API. Certainly everything under nova/virt
and any object which is passed in/out of the virt driver API.
Changes to such APIs would have to be done in a backwards
compatible manner, since it is no longer possible to lock-step
change all the virt driver impls. In some ways I think this would
be a good thing as it will encourage people to put more thought
into the long term maintainability of nova internal code instead
of relying on being able to rip it apart later, at will.
 
  - The nova/virt/driver.py class would need to be much better
specified. All parameters / return values which are opaque dicts
must be replaced with objects + attributes. Completion of the
objectification work is mandatory, so there is cleaner separation
between virt driver impls & the rest of Nova.

I think for this to work well with multiple repositories and drivers
having different priorities over implementing changes in the API it
would not just need to be semi-stable, but stable with versioning built
in from the start to allow for backwards incompatible changes. And
the interface would have to be very well documented including things
such as what exceptions are allowed to be raised through the API.
Hopefully this would be enforced through code as well. But as long as
driver maintainers are willing to commit to this extra overhead I can
see it working. 
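
To make the "versioning built in from the start" and "what exceptions are
allowed to be raised ... enforced through code" points concrete, here is a
rough sketch. Every name in it (VirtDriverAPI, VirtDriverError,
is_compatible, call_driver, FakeDriver) is hypothetical and is not Nova's
actual driver interface; it only shows one way such a contract could be
written down and checked.

```python
# Hypothetical sketch: none of these names exist in Nova.
class VirtDriverError(Exception):
    """Base class for errors a driver is allowed to raise through the API."""


class InstanceNotFound(VirtDriverError):
    pass


class VirtDriverAPI:
    # (major, minor): bump minor for compatible additions, major for
    # backwards incompatible changes.
    API_VERSION = (1, 2)

    def power_on(self, instance_id):
        """Power on an instance.

        Raises InstanceNotFound if the driver does not know the instance.
        """
        raise NotImplementedError


def is_compatible(driver_version, required_version):
    """Same major version, and the driver is at least as new in minor."""
    return (driver_version[0] == required_version[0]
            and driver_version[1] >= required_version[1])


def call_driver(method, *args):
    """Enforce the exception contract at the API boundary."""
    try:
        return method(*args)
    except VirtDriverError:
        raise  # documented errors pass through unchanged
    except Exception as exc:
        # Anything else is treated as a driver bug, not as part of the API.
        raise RuntimeError(f"driver broke the API contract: {exc!r}") from exc


class FakeDriver(VirtDriverAPI):
    """Toy driver used only to exercise the sketch."""

    def power_on(self, instance_id):
        if instance_id != "known":
            raise InstanceNotFound(instance_id)
        return "running"
```

A caller would check is_compatible(driver.API_VERSION, (1, 0)) when loading
the driver and route every call through call_driver, so undocumented
exceptions surface as contract violations rather than leaking into Nova.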

Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Christopher Yeoh
On Thu, 4 Sep 2014 12:57:57 -0700
Joe Gordon joe.gord...@gmail.com wrote:

 
 Overall I do think we need to re-think how the review burden is
 distributed. That being said, this is a nice proposal but I am not
 sure if it moves the review burden around enough or is the right
 approach. Do you have any rough numbers on what percent of the review
 burden goes to virt drivers today (how ever you want to define that
 statement, number of merged patches, man hours, lines of code, number
 of reviews  etc.). If for example today the nova review team spends
 10% of their review time on virt drivers then I don't think this
 proposal will have a significant impact on the review backlog (for
 nova-common).

Even if it doesn't have a huge impact on the review backlog for
nova-common (I think it should at least help a bit) it does have the
potential to make life much easier for the virt driver developers. 

I think my main concern is around testing - as soon as we have multiple
repositories involved I think debugging of test failures
(especially races) tends to get more complicated and we have fewer
people who are familiar enough with the two code bases. 

Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
 On Thu, 4 Sep 2014 11:24:29 +0100
 Daniel P. Berrange berra...@redhat.com wrote:
  
   - A fairly significant amount of nova code would need to be
 considered semi-stable API. Certainly everything under nova/virt
 and any object which is passed in/out of the virt driver API.
 Changes to such APIs would have to be done in a backwards
 compatible manner, since it is no longer possible to lock-step
 change all the virt driver impls. In some ways I think this would
 be a good thing as it will encourage people to put more thought
 into the long term maintainability of nova internal code instead
 of relying on being able to rip it apart later, at will.
  
   - The nova/virt/driver.py class would need to be much better
 specified. All parameters / return values which are opaque dicts
 must be replaced with objects + attributes. Completion of the
 objectification work is mandatory, so there is cleaner separation
 between virt driver impls & the rest of Nova.
 
 I think for this to work well with multiple repositories and drivers
 having different priorities over implementing changes in the API it
 would not just need to be semi-stable, but stable with versioning built
 in from the start to allow for backwards incompatible changes. And
 the interface would have to be very well documented including things
 such as what exceptions are allowed to be raised through the API.
 Hopefully this would be enforced through code as well. But as long as
 driver maintainers are willing to commit to this extra overhead I can
 see it working. 

With our primary REST or RPC APIs we're under quite strict rules about
what we can & can't change - almost impossible to remove an existing
API from the REST API for example. With the internal virt driver API
we would probably have a little more freedom. For example, I think
if we found an existing virt driver API that was insufficient for a
new bit of work, we could add a new API in parallel with it, give the
virt drivers 1 dev cycle to convert, and then permanently delete the
original virt driver API. So a combination of that kind of API
replacement,  versioning for some data structures/objects, and use of
the capabilties flags would probably be sufficient. That's what I mean
by semi-stable here - no need to maintain existing virt driver APIs
indefinitely - we can remove  replace them in reasonably short time
scales as long as we avoid any lock-step updates.
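
To make the add-in-parallel idea a bit more concrete, here is a minimal
sketch. Every name in it (ComputeDriverSketch, VolumeSketch,
attach_volume_v2, the "supports_attach_volume_v2" flag) is hypothetical
and made up for illustration; it is not the real nova/virt/driver.py
API. It only shows how a capability flag could let common code call a
new method on converted drivers while still supporting unconverted
ones for a cycle.

```python
import warnings


class VolumeSketch:
    """Hypothetical versioned object replacing an opaque dict."""
    VERSION = "1.0"

    def __init__(self, volume_id, mountpoint):
        self.volume_id = volume_id
        self.mountpoint = mountpoint


class ComputeDriverSketch:
    """Hypothetical stand-in for the virt driver base class."""

    # Capability flags let common code ask what a driver supports
    # instead of assuming all drivers changed in lock-step.
    capabilities = {"supports_attach_volume_v2": False}

    def attach_volume(self, instance, volume_info):
        """Old API taking an opaque dict; kept for one dev cycle."""
        raise NotImplementedError()

    def attach_volume_v2(self, instance, volume):
        """New API added in parallel, taking an object with attributes."""
        raise NotImplementedError()


class OldDriver(ComputeDriverSketch):
    """A driver that has not been converted yet."""

    def attach_volume(self, instance, volume_info):
        return "old:" + volume_info["volume_id"]


class NewDriver(ComputeDriverSketch):
    """A driver that has converted to the new API."""
    capabilities = {"supports_attach_volume_v2": True}

    def attach_volume_v2(self, instance, volume):
        return "new:" + volume.volume_id


def attach(driver, instance, volume):
    """Common-code caller that works with both kinds of driver."""
    if driver.capabilities.get("supports_attach_volume_v2"):
        return driver.attach_volume_v2(instance, volume)
    warnings.warn("attach_volume is deprecated; convert within one cycle",
                  DeprecationWarning)
    # Down-convert the object to the legacy dict for unconverted drivers.
    legacy = {"volume_id": volume.volume_id,
              "mountpoint": volume.mountpoint}
    return driver.attach_volume(instance, legacy)
```

Once every driver advertises the flag, the fallback branch and the old
method could be deleted, which is the "remove & replace" step.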

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread John Garbutt
On 4 September 2014 23:48, Russell Bryant rbry...@redhat.com wrote:
 On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
 Position statement
 ==

 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to loose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.

 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I, now, none
 the less urge people to read the whole mail.

 Fantastic write-up.  I can't +1 enough the problem statement, which I
 think you've done a nice job of framing.  We've taken steps to try to
 improve this, but none of them have been big enough.  I feel we've
 reached a tipping point.  I think many others do too, and several
 proposals being discussed all seem rooted in this same core issue.

+1

I totally agree we need to split Nova up further; there just didn't
seem to be the support for this before now.

Not yet sure the virt drivers are the best split, but we already have
sub-teams ready to take them on, so it will probably work for that
reason.

 If we ignored gerrit for a moment, is rapid increase in splitting out
 components the ideal workflow?  Would we be better off finding a way to
 finally just implement a model more like the Linux kernel with
 sub-system maintainers and pull requests to a top-level tree?  Maybe.
 I'm not convinced that split of repos is obviously better.

I was thinking along similar lines.

Regardless of that, we should try this for Kilo.

If it feels like we are getting too much driver divergence, and
tempest is not keeping everyone in line, the community is fragmenting
and no one is working on the core of nova, then we might have to think
about an alternative plan for L, including bringing the drivers back
in tree.

At least the separate repos will help us firm up the interfaces, which
I think is a good thing.

I worry about what it means to test a feature in nova common, nova
api, or nova core or whatever we call it, if there are no virt
drivers in tree. To some extent we might want to improve the fake virt
driver for some in-tree functional tests anyway. But that's a separate
discussion.

 I don't think we can afford to wait much longer without drastic change,
 so let's make it happen.

+1

But I do think we should try and go further...

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.

Nova-network: Maybe there isn't a big enough community to support this
right now, but we need to actually delete this, or pull it out of
nova-core.

API: I suspect we might want to also look at splitting out the API
from Nova common too. This one is slightly more drastic, and needs
more pre-split work (and is very related to making cells a first class
concept), but I am still battling with that inside my head.

Oslo: I suspect we may need to do something around the virt utilities,
so they are easy to share, but there are probably other opportunities
too.

Thanks,
John



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 06:22:18PM -0500, Michael Still wrote:
 On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange berra...@redhat.com 
 wrote:
 
 [Heavy snipping because of length]
 
  The radical (?) solution to the nova core team bottleneck is thus to
  follow this lead and split the nova virt drivers out into separate
  projects and delegate their maintainence to new dedicated teams.
 
   - Nova becomes the home for the public APIs, RPC system, database
 persistent and the glue that ties all this together with the
 virt driver API.
 
   - Each virt driver project gets its own core team and is responsible
 for dealing with review, merge  release of their codebase.
 
 I think this is the crux of the matter. We're not doing a great job of
 landing code at the moment, because we can't keep up with the review
 workload.
 
 So far we've had two proposals mooted:
 
  - slots / runways, where we try to rate limit the number of things
 we're trying to review at once to maintain focus

FWIW, I'm not really seeing that as a long term solution. In its
essence it is just a more effective way for us to say 'no' to our
potential contributors. While it could no doubt relieve pressure
on the core team by reducing the flow of the pipe, I don't think
it is helpful for our contributors overall.

  - splitting all the virt drivers out of the nova tree
 
 Splitting the drivers out of the nova tree does come at a cost -- we'd
 need to stabilise and probably version the hypervisor driver
 interface, and that will encourage more out of tree drivers, which
 are things we haven't historically wanted to do. If we did this split,
 I think we need to acknowledge that we are changing policy there. It
 also means that nova-core wouldn't be the ones holding the quality bar
 for hypervisor drivers any more, I guess this would open the door for
 drivers to more actively compete on the quality of their
 implementations, which might be a good thing.

There are already a number of drivers out of tree such as Docker,
Ironic (though soon to be in tree), and IIUC there's something IBM
have done for Power hypervisor, and work Oracle have done for the
Solaris virt/container technologies. Probably the distinction I'd
make is around things that are actively part of the OpenStack
community (eg on our gerrit infrastructure and/or stackforge, etc),
vs things that are developed in complete isolation from the OpenStack
community.

I'm unclear what the state of play is wrt discussions on OpenStack
technology compatibility certification & trademark usage, but perhaps
that is a partial counterweight to your concern? I'd certainly like
to see a focus on out of tree drivers remaining a strong part of the
openstack community, and not go off into their own completely isolated
world outside the community.

But yes, I am clearly proposing a change to our integration policy here
and so we need to carefully consider what that means and take
any necessary steps to mitigate risks.

In some respects I think the split repos could allow us to raise the
bar in terms of quality. For example, with a single repo, I don't
see it ever being practical to make VMware/HyperV/XenAPI CI systems
gating on changes, because it would push up the level of pain from
false job failures in the gate even further than today. With a separate
repo each virt driver would only need to run jobs directly related to
them, so the VMWare CI could easily be made gating on VMWare driver git
repo.

On testing in general, I think we need to look at the granularity
at which we run tests, in order to let us scale up the number of tests
we run. For example, it is suggested that each feature like disk
encryption, disk discard support, each vif driver, and so on, each
requires a new tempest job with appropriate settings. If we look at
the number of possible tunable knobs like that, it easily implies 100's
more tempest jobs with varying configs. I don't think it is practical
to consider doing that with our setup today. With separate virt driver
repos we'd have more headroom to add a larger number of jobs since
the volume of changes being tested overall would be smaller.
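
As a rough back-of-the-envelope illustration of how quickly the job
count grows, here is a small sketch. The knob names and values below
are invented for the example and are not a real list of Nova or
libvirt settings; only the arithmetic matters.

```python
from itertools import product

# Hypothetical tunables; the real list and values would differ.
knobs = {
    "disk_encryption": ["off", "on"],
    "disk_discard": ["ignore", "unmap"],
    "vif_driver": ["bridge", "ovs", "macvtap"],
    "image_backend": ["qcow2", "raw", "lvm", "rbd"],
}


def one_job_per_setting(knobs):
    """Jobs needed if each non-default value gets its own job."""
    return sum(len(values) - 1 for values in knobs.values())


def full_matrix(knobs):
    """Jobs needed to cover every combination of settings."""
    return len(list(product(*knobs.values())))


print(one_job_per_setting(knobs))  # 7
print(full_matrix(knobs))          # 48
```

Even with only four made-up knobs, covering every combination needs 48
jobs, so a realistic list of tunables gets into the 100's very easily.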

 Both of these have interesting aspects, and I agree we need to do
 _something_. I do wonder if there is a hybrid approach as well though.
 For example, could we implement some sort of more formal lieutenant
 system for drivers? We've talked about it in the past but never been
 able to express how it would work in practise.

Gerrit makes it hard to express that formally due to the lack of
path based permissioning. If we do go for the virt driver split,
it would none the less be useful if we trialled a lieutenant or
sub-team model during Kilo, as a way to prepare for an eventual
driver split in L. So this is worth talking about regardless
I reckon.

I still think on balance a virt driver split is beneficial since
it brings benefits beyond just the review team.

 The last few days have been interesting as I watch FFEs come 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread John Garbutt
On 5 September 2014 00:26, Jay Pipes jaypi...@gmail.com wrote:
 On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

 Basically +1 with what Daniel is saying (note that, as mentioned, a
 side effect of our effort to split out the scheduler will help but
 not solve this problem).


 The difference between Dan's proposal and the Gantt split is that Dan's
 proposal features quite prominently the following:

 == begin ==

  - The nova/virt/driver.py class would need to be much better
specified. All parameters / return values which are opaque dicts
must be replaced with objects + attributes. Completion of the
objectification work is mandatory, so there is cleaner separation
between virt driver impls  the rest of Nova.

 == end ==

 In other words, Dan's proposal above is EXACTLY what I've been saying needs
 to be done to the interfaces between nova-conductor, nova-compute, and
 nova-scheduler *before* any split of the scheduler code is even remotely
 feasible.

 Splitting the scheduler out before this is done would actually not help but
 not solve this problem -- it would instead further the problem, IMO.

Given any changes we make to the scheduler interface need to be
backwards compatible, I am not totally convinced being in a separate
repo makes things a whole lot worse, vs the review bottlenecks we
have. Anyways, I certainly agree that work needs to be done ASAP, and
if we can make that a priority in Nova, it would be much quicker and
easier to do while still inside Nova.

We have similar issues with glance, cinder and neutron right now that
need fixing soon too. I know we have patches up for some improvements
in that area, but it certainly feels like we need to do better there.

The virt driver is a step ahead of the scheduler because we know what
interface we are talking about, and we already have most of a
versioning plan in place.

I think the key work we have with the scheduler is to actually draw
out the interface (in code), so we agree what interface we need to
firm up and version. I think we are starting to get agreement on that
now, which is great.
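
As a rough illustration of what drawing out the interface in code
could look like, here is a minimal sketch. All of the class names
(SchedulerInterfaceSketch, RequestSpecSketch, FirstFitSchedulerSketch)
and the VERSION attribute are hypothetical, and the placement logic is
a toy; this is not the existing scheduler code, just one way an
explicit, versioned interface might be written down.

```python
import abc


class RequestSpecSketch:
    """Hypothetical object describing what is being asked for."""
    VERSION = "1.0"

    def __init__(self, ram_mb):
        self.ram_mb = ram_mb


class SchedulerInterfaceSketch(abc.ABC):
    """Hypothetical explicit interface that could be firmed up."""
    VERSION = "1.0"

    @abc.abstractmethod
    def select_destinations(self, request_spec):
        """Return a list of host names able to satisfy request_spec."""


class FirstFitSchedulerSketch(SchedulerInterfaceSketch):
    """Toy implementation: every host with enough free RAM qualifies."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = dict(free_ram_mb)  # host name -> free RAM in MB

    def select_destinations(self, request_spec):
        return [host for host, free in sorted(self.free_ram_mb.items())
                if free >= request_spec.ram_mb]
```

The point is only that callers depend on the abstract class and the
object passed in, both of which carry a version that can be bumped.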

I still think the scheduler split is as urgent as the virt split, but
the virt split is much closer to being possible right now.

At this point, it feels like all of kilo-1 gets dedicated to splitting
out these interfaces, and completing objects. But let's see what the
summit brings.

Thanks,
John



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 11:29:43AM +0100, John Garbutt wrote:
 On 4 September 2014 23:48, Russell Bryant rbry...@redhat.com wrote:
  On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
  If we ignored gerrit for a moment, is rapid increase in splitting out
  components the ideal workflow?  Would we be better off finding a way to
  finally just implement a model more like the Linux kernel with
  sub-system maintainers and pull requests to a top-level tree?  Maybe.
  I'm not convinced that split of repos is obviously better.
 
 I was thinking along similar lines.
 
 Regardless of that, we should try this for Kilo.
 
 If it feels like we are getting too much driver divergence, and
 tempest is not keeping everyone inline, the community is fragmenting
 and no one is working on the core of nova, then we might have to think
 about an alternative plan for L, including bringing the drivers back
 in tree.
 
 At least the separate repos will help us firm up the interfaces, which
 I think is a good thing.
 
 I worry about what it means to test a feature in nova common, nova
 api, or nova core or whatever we call it, if there are no virt
 drivers in tree. To some extent we might want to improve the fake virt
 driver for some in-tree functional tests anyways. But thats a separate
 discussion.

I look at what we do with Ironic testing currently as a guide here.
We have a tempest job that runs against Nova, that validates changes
to nova don't break the separate Ironic git repo. So my thought
is that all our current tempest jobs would simply work in that
way. IOW changes to so called nova common would run jobs that
validate the change against all the virt driver git repos. I think
this kind of setup is pretty much mandatory for split repos to be
viable, because I don't want to see us lose testing coverage in
this proposed change.

Having a decent in-tree fake virt driver would none the less be
a nice idea, because it would allow for more complete functional
testing isolated from the risks of bugs in the virt drivers
themselves.
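
To sketch what such a fake driver could look like: the class below is
a made-up, in-memory example (FakeDriverSketch and its method names
are hypothetical, and it is not the existing in-tree fake driver). It
tracks instance state and records calls, so that common code could be
exercised in functional tests without any hypervisor involved.

```python
class FakeDriverSketch:
    """Hypothetical in-memory driver: no hypervisor, just bookkeeping."""

    def __init__(self):
        self.instances = {}  # instance name -> power state
        self.calls = []      # record of (method, instance name)

    def spawn(self, name):
        self.calls.append(("spawn", name))
        self.instances[name] = "running"

    def power_off(self, name):
        self.calls.append(("power_off", name))
        if name not in self.instances:
            raise KeyError(name)
        self.instances[name] = "shutdown"

    def destroy(self, name):
        self.calls.append(("destroy", name))
        # Destroying an unknown instance is treated as a no-op.
        self.instances.pop(name, None)

    def list_instances(self):
        return sorted(self.instances)
```

A functional test could then assert on `calls` and `instances` to
check that common code drove the driver in the expected order.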

Regards,
Daniel



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
 On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
 On Thu, 4 Sep 2014 11:24:29 +0100
 Daniel P. Berrange berra...@redhat.com wrote:

  - A fairly significant amount of nova code would need to be
considered semi-stable API. Certainly everything under nova/virt
and any object which is passed in/out of the virt driver API.
Changes to such APIs would have to be done in a backwards
compatible manner, since it is no longer possible to lock-step
change all the virt driver impls. In some ways I think this would
be a good thing as it will encourage people to put more thought
into the long term maintainability of nova internal code instead
of relying on being able to rip it apart later, at will.

  - The nova/virt/driver.py class would need to be much better
specified. All parameters / return values which are opaque dicts
must be replaced with objects + attributes. Completion of the
objectification work is mandatory, so there is cleaner separation
between virt driver impls  the rest of Nova.

 I think for this to work well with multiple repositories and drivers
 having different priorities over implementing changes in the API it
 would not just need to be semi-stable, but stable with versioning built
 in from the start to allow for backwards incompatible changes. And
 the interface would have to be very well documented including things
 such as what exceptions are allowed to be raised through the API.
 Hopefully this would be enforced through code as well. But as long as
 driver maintainers are willing to commit to this extra overhead I can
 see it working. 
 
 With our primary REST or RPC APIs we're under quite strict rules about
 what we can  can't change - almost impossible to remove an existing
 API from the REST API for example. With the internal virt driver API
 we would probably have a little more freedom. For example, I think
 if we found an existing virt driver API that was insufficient for a
 new bit of work, we could add a new API in parallel with it, give the
 virt drivers 1 dev cycle to convert, and then permanently delete the
 original virt driver API. So a combination of that kind of API
 replacement,  versioning for some data structures/objects, and use of
 the capabilties flags would probably be sufficient. That's what I mean
 by semi-stable here - no need to maintain existing virt driver APIs
 indefinitely - we can remove  replace them in reasonably short time
 scales as long as we avoid any lock-step updates.

I have spent a lot of time over the last year working on things that
require coordinated code lands between projects; it's much more
friction than you give it credit for.

Every added git tree adds a non linear cost to mental overhead, and a
non linear integration cost. Realistically the reason the gate is in the
state it is has a ton to do with the fact that it's integrating 40 git
trees. Because virt drivers run in the process space of Nova Compute,
they can pretty much do whatever, and the impacts are going to be
somewhat hard to figure out.

Also, if spinning these out seems like the right idea, I think nova-core
needs to retain core rights over the drivers as well. Because there does
need to be veto authority on some of the worst craziness.

If the VMWare team stopped trying to build a distributed lock manager
inside their compute driver, or the Hyperv team didn't wait until J2 to
start pushing patches, I think there would be more trust in some of
these teams. But, I am seriously concerned in both those cases, and the
slow review there is a function of a historic lack of trust in judgment.
I also personally went on a moratorium a year ago in reviewing either
driver because entities at both places were complaining to my
management chain through back channels that I was -1ing their code...
when I was one of the few people actually trying to provide constructive
feedback (basically only Russell and I were reviewing that code in
Grizzly, everyone else was ignoring it). Things may have changed since
then, at least I see a ton of good work from tjones in making Nova
overall better, but that was a pretty bitter pill. (Sorry for the
tangent, but honestly if we are going to fix what's broken we probably
have to expose all related brokens.)


If the concern is that we are keeping out too many contributors by the
CI requirements: let's let Class C back in tree. I believe in the
FreeBSD case you were one of the original opponents to a top level
driver, and that they should go through libvirt instead. But I'm cool
with them just showing up as a Class C.

But I honestly don't think the virt driver split is going to make any of
this easier, when you account for the additional overhead it's going to
create, and the work required to get there.

-Sean

-- 
Sean Dague
http://dague.net


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
 On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
  On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
  On Thu, 4 Sep 2014 11:24:29 +0100
  Daniel P. Berrange berra...@redhat.com wrote:
 
   - A fairly significant amount of nova code would need to be
 considered semi-stable API. Certainly everything under nova/virt
 and any object which is passed in/out of the virt driver API.
 Changes to such APIs would have to be done in a backwards
 compatible manner, since it is no longer possible to lock-step
 change all the virt driver impls. In some ways I think this would
 be a good thing as it will encourage people to put more thought
 into the long term maintainability of nova internal code instead
 of relying on being able to rip it apart later, at will.
 
   - The nova/virt/driver.py class would need to be much better
 specified. All parameters / return values which are opaque dicts
 must be replaced with objects + attributes. Completion of the
 objectification work is mandatory, so there is cleaner separation
 between virt driver impls  the rest of Nova.
 
  I think for this to work well with multiple repositories and drivers
  having different priorities over implementing changes in the API it
  would not just need to be semi-stable, but stable with versioning built
  in from the start to allow for backwards incompatible changes. And
  the interface would have to be very well documented including things
  such as what exceptions are allowed to be raised through the API.
  Hopefully this would be enforced through code as well. But as long as
  driver maintainers are willing to commit to this extra overhead I can
  see it working. 
  
  With our primary REST or RPC APIs we're under quite strict rules about
  what we can  can't change - almost impossible to remove an existing
  API from the REST API for example. With the internal virt driver API
  we would probably have a little more freedom. For example, I think
  if we found an existing virt driver API that was insufficient for a
  new bit of work, we could add a new API in parallel with it, give the
  virt drivers 1 dev cycle to convert, and then permanently delete the
  original virt driver API. So a combination of that kind of API
  replacement,  versioning for some data structures/objects, and use of
  the capabilties flags would probably be sufficient. That's what I mean
  by semi-stable here - no need to maintain existing virt driver APIs
  indefinitely - we can remove  replace them in reasonably short time
  scales as long as we avoid any lock-step updates.
 
 I have spent a lot of time over the last year working on things that
 require coordinated code lands between projects it's much more
 friction than you give it credit.
 
 Every added git tree adds a non linear cost to mental overhead, and a
 non linear integration cost. Realistically the reason the gate is in the
 state it is has a ton to do with the fact that it's integrating 40 git
 trees. Because virt drivers run in the process space of Nova Compute,
 they can pretty much do whatever, and the impacts are going to be
 somewhat hard to figure out.
 
 Also, if spinning these out seems like the right idea, I think nova-core
 needs to retain core rights over the drivers as well. Because there do
 need to be veto authority on some of the worst craziness.

If they want to do crazy stuff, let them live or die with the
consequences.

 If the VMWare team stopped trying to build a distributed lock manager
 inside their compute driver, or the Hyperv team didn't wait until J2 to
 start pushing patches, I think there would be more trust in some of
 these teams. But, I am seriously concerned in both those cases, and the
 slow review there is a function of a historic lack of trust in judgment.
 I also personally went on a moratorium a year ago in reviewing either
 driver because entities at both places where complaining to my
 management chain through back channels that I was -1ing their code...

I venture to suggest that the reason we care so much about those kinds
of things is precisely because of our policy of pulling them in the
tree. Having them in tree means their quality (or not) reflects directly
on the project as a whole. Separate them from Nova as a whole and give
them control of their own destiny and they can deal with the consequences
of their actions and people can judge the results for themselves.

We don't have the time or resources to continue baby-sitting them
ourselves - attempting to do so has just resulted in a scenario where
they end up getting largely ignored as you admit here. This ultimately
makes their quality even worse, because the lack of reviewer availability
means they stand little chance of pushing through the work to fix what
problems they have. We've seen this first hand with the major refactoring
that the vmware driver team has been trying to do. Our 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
 On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
  A handy example of this I can think of is the currently granted FFE for
  serial consoles - consider how much of the code went into the common
  part vs. the libvirt specific part, I would say the ratio is very close
  to 1 if not even in favour of the common part (current 4 outstanding
  patches are all for core, and out of the 5 merged - only one of them was
  purely libvirt specific, assuming virt/ will live in nova-common).
  
  Joe asked a similar question elsewhere on the thread.
  
  Once again - I am not against doing it - what I am saying is that we
  need to look into this closer as it may not be as big of a win from the
  number of changes needed per feature as we may think.
  
  Just some things to think about with regards to the whole idea, by no
  means exhaustive.
 
 So maybe the better question is: what are the top sources of technical
 debt in Nova that we need to address? And if we did, everyone would be
 more sane, and feel less burnt.
 
 Maybe the drivers are the worst debt, and jettisoning them makes them
 someone else's problem, so that helps some. I'm not entirely convinced
 right now.
 
 I think Cells represents a lot of debt right now. It doesn't fully work
 with the rest of Nova, and produces a ton of extra code paths special
 cased for the cells path.
 
 The Scheduler has a ton of debt as has been pointed out by the efforts
 in and around Gannt. The focus has been on the split, but realistically
 I'm with Jay is that we should focus on the debt, and exposing a REST
 interface in Nova.
 
 What about the Nova objects transition? That continues to be slow
 because it's basically Dan (with a few other helpers from time to time).
 Would it be helpful if we did an all hands on deck transition of the
 rest of Nova for K1 and just get it done? Would be nice to have the bulk
 of Nova core working on one thing like this and actually be in shared
 context with everyone else for a while.

I think the idea that we can tell everyone in Nova what they should
focus on for a cycle, or more generally, is doomed to failure. This
isn't a closed source company controlled project where you can dictate
what everyone's priority must be. We must accept that we rely on all our
contributors' good will in voluntarily giving their time & resource to
the project, to scratch whatever itch they have in the project. We have
to encourage them to want to work on nova and demonstrate that we value
whatever form of contribution they choose to make. If we have technical
debt that we think is important to address we need to illustrate /
show people why they should care about helping. If they none the less
decide that work isn't for them, we can't just cast them aside and/or
ignore their contributions, while we get on with other things. This
is why I think it is important that we split up nova to allow each
area to self-organize around what they consider to be priorities in
their area of interest / motivation. Not enabling that is going to
continue to kill our community.

Regards,
Daniel



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 07:26 AM, Daniel P. Berrange wrote:
 On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
 On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
 On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
 On Thu, 4 Sep 2014 11:24:29 +0100
 Daniel P. Berrange berra...@redhat.com wrote:

  - A fairly significant amount of nova code would need to be
considered semi-stable API. Certainly everything under nova/virt
and any object which is passed in/out of the virt driver API.
Changes to such APIs would have to be done in a backwards
compatible manner, since it is no longer possible to lock-step
change all the virt driver impls. In some ways I think this would
be a good thing as it will encourage people to put more thought
into the long term maintainability of nova internal code instead
of relying on being able to rip it apart later, at will.

  - The nova/virt/driver.py class would need to be much better
specified. All parameters / return values which are opaque dicts
must be replaced with objects + attributes. Completion of the
objectification work is mandatory, so there is cleaner separation
between virt driver impls  the rest of Nova.

 I think for this to work well with multiple repositories and drivers
 having different priorities over implementing changes in the API it
 would not just need to be semi-stable, but stable with versioning built
 in from the start to allow for backwards incompatible changes. And
 the interface would have to be very well documented including things
 such as what exceptions are allowed to be raised through the API.
 Hopefully this would be enforced through code as well. But as long as
 driver maintainers are willing to commit to this extra overhead I can
 see it working. 

 With our primary REST or RPC APIs we're under quite strict rules about
 what we can  can't change - almost impossible to remove an existing
 API from the REST API for example. With the internal virt driver API
 we would probably have a little more freedom. For example, I think
 if we found an existing virt driver API that was insufficient for a
 new bit of work, we could add a new API in parallel with it, give the
 virt drivers 1 dev cycle to convert, and then permanently delete the
 original virt driver API. So a combination of that kind of API
 replacement,  versioning for some data structures/objects, and use of
 the capabilties flags would probably be sufficient. That's what I mean
 by semi-stable here - no need to maintain existing virt driver APIs
 indefinitely - we can remove  replace them in reasonably short time
 scales as long as we avoid any lock-step updates.

 I have spent a lot of time over the last year working on things that
 require coordinated code lands between projects; it's much more
 friction than you give it credit for.

 Every added git tree adds a non linear cost to mental overhead, and a
 non linear integration cost. Realistically the reason the gate is in the
 state it is has a ton to do with the fact that it's integrating 40 git
 trees. Because virt drivers run in the process space of Nova Compute,
 they can pretty much do whatever, and the impacts are going to be
 somewhat hard to figure out.

 Also, if spinning these out seems like the right idea, I think nova-core
 needs to retain core rights over the drivers as well. Because there does
 need to be veto authority on some of the worst craziness.
 
 If they want to do crazy stuff, let them live or die with the
 consequences.
 
 If the VMWare team stopped trying to build a distributed lock manager
 inside their compute driver, or the Hyperv team didn't wait until J2 to
 start pushing patches, I think there would be more trust in some of
 these teams. But, I am seriously concerned in both those cases, and the
 slow review there is a function of a historic lack of trust in judgment.
 I also personally went on a moratorium a year ago in reviewing either
 driver because entities at both places were complaining to my
 management chain through back channels that I was -1ing their code...
 
 I venture to suggest that the reason we care so much about those kind
 of things is precisely because of our policy of pulling them in the
 tree. Having them in tree means their quality (or not) reflects directly
 on the project as a whole. Separate them from Nova as a whole and give
 them control of their own destiny and they can deal with the consequences
 of their actions and people can judge the results for themselves.
 
 We don't have the time or resources to continue baby-sitting them
 ourselves - attempting to do so has just resulted in a scenario where
 they end up getting largely ignored as you admit here. This ultimately
 makes their quality even worse, because the lack of reviewer availability
 means they stand little chance of pushing through the work to fix what
 problems they have. We've seen this first hand with the major refactoring
 that vmware driver team has been 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sean Dague
On 09/05/2014 07:40 AM, Daniel P. Berrange wrote:
 On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
 On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
 A handy example of this I can think of is the currently granted FFE for
 serial consoles - consider how much of the code went into the common
 part vs. the libvirt specific part, I would say the ratio is very close
 to 1 if not even in favour of the common part (current 4 outstanding
 patches are all for core, and out of the 5 merged - only one of them was
 purely libvirt specific, assuming virt/ will live in nova-common).

 Joe asked a similar question elsewhere on the thread.

 Once again - I am not against doing it - what I am saying is that we
 need to look into this closer as it may not be as big of a win from the
 number of changes needed per feature as we may think.

 Just some things to think about with regards to the whole idea, by no
 means exhaustive.

 So maybe the better question is: what are the top sources of technical
 debt in Nova that we need to address? And if we did, everyone would be
 more sane, and feel less burnt.

 Maybe the drivers are the worst debt, and jettisoning them makes them
 someone else's problem, so that helps some. I'm not entirely convinced
 right now.

 I think Cells represents a lot of debt right now. It doesn't fully work
 with the rest of Nova, and produces a ton of extra code paths special
 cased for the cells path.

 The Scheduler has a ton of debt as has been pointed out by the efforts
 in and around Gantt. The focus has been on the split, but realistically
 I'm with Jay in that we should focus on the debt, and exposing a REST
 interface in Nova.

 What about the Nova objects transition? That continues to be slow
 because it's basically Dan (with a few other helpers from time to time).
 Would it be helpful if we did an all hands on deck transition of the
 rest of Nova for K1 and just get it done? Would be nice to have the bulk
 of Nova core working on one thing like this and actually be in shared
 context with everyone else for a while.
 
 I think the idea that we can tell everyone in Nova what they should
 focus on for a cycle, or more generally, is doomed to failure. This
 isn't a closed source company controlled project where you can dictate
 what everyone's priority must be. We must accept that we rely on all our
 contributors' good will in voluntarily giving their time & resource to
 the project, to scratch whatever itch they have in the project. We have
 to encourage them to want to work on nova and demonstrate that we value
 whatever form of contribution they choose to make. If we have technical
 debt that we think is important to address we need to illustrate /
 show people why they should care about helping. If they none the less
 decide that work isn't for them, we can't just cast them aside and/or
 ignore their contributions, while we get on with other things. This
 is why I think it is important that we split up nova to allow each
 area to self-organize around what they consider to be priorities in
 their area of interest / motivation. Not enabling that is going to
 continue to kill our community

I'm getting tired of the refrain that because we are an Open Source
project declaring priorities is pointless, because it's not. I would say
it's actually the exception that a developer wakes up in the morning and
says "I completely disregard what anyone else thinks is important in
this project, this is what I'm going to do today." Because if that's how
they felt they wouldn't choose to be part of a community, they would
just go do their own thing. Lone wolves by definition don't form
communities.

And the FFE process is a firm demonstration that when we pick a small
number of things to look at, they move a lot more quickly.

People are always free to work on whatever they want. But providing some
focus on debt cleanup, FFE++ effectively, would be really nice.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 07:49:04AM -0400, Sean Dague wrote:
 On 09/05/2014 07:26 AM, Daniel P. Berrange wrote:
  On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
  On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
  On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
  On Thu, 4 Sep 2014 11:24:29 +0100
  Daniel P. Berrange berra...@redhat.com wrote:
 
   - A fairly significant amount of nova code would need to be
 considered semi-stable API. Certainly everything under nova/virt
 and any object which is passed in/out of the virt driver API.
 Changes to such APIs would have to be done in a backwards
 compatible manner, since it is no longer possible to lock-step
 change all the virt driver impls. In some ways I think this would
 be a good thing as it will encourage people to put more thought
 into the long term maintainability of nova internal code instead
 of relying on being able to rip it apart later, at will.
 
   - The nova/virt/driver.py class would need to be much better
 specified. All parameters / return values which are opaque dicts
 must be replaced with objects + attributes. Completion of the
 objectification work is mandatory, so there is cleaner separation
 between virt driver impls & the rest of Nova.
 
  I think for this to work well with multiple repositories and drivers
  having different priorities over implementing changes in the API it
  would not just need to be semi-stable, but stable with versioning built
  in from the start to allow for backwards incompatible changes. And
  the interface would have to be very well documented including things
  such as what exceptions are allowed to be raised through the API.
  Hopefully this would be enforced through code as well. But as long as
  driver maintainers are willing to commit to this extra overhead I can
  see it working. 
 
  With our primary REST or RPC APIs we're under quite strict rules about
  what we can & can't change - almost impossible to remove an existing
  API from the REST API for example. With the internal virt driver API
  we would probably have a little more freedom. For example, I think
  if we found an existing virt driver API that was insufficient for a
  new bit of work, we could add a new API in parallel with it, give the
  virt drivers 1 dev cycle to convert, and then permanently delete the
  original virt driver API. So a combination of that kind of API
  replacement, versioning for some data structures/objects, and use of
  the capabilities flags would probably be sufficient. That's what I mean
  by semi-stable here - no need to maintain existing virt driver APIs
  indefinitely - we can remove & replace them in reasonably short time
  scales as long as we avoid any lock-step updates.
 
  I have spent a lot of time over the last year working on things that
  require coordinated code lands between projects; it's much more
  friction than you give it credit for.
 
  Every added git tree adds a non linear cost to mental overhead, and a
  non linear integration cost. Realistically the reason the gate is in the
  state it is has a ton to do with the fact that it's integrating 40 git
  trees. Because virt drivers run in the process space of Nova Compute,
  they can pretty much do whatever, and the impacts are going to be
  somewhat hard to figure out.
 
  Also, if spinning these out seems like the right idea, I think nova-core
  needs to retain core rights over the drivers as well. Because there does
  need to be veto authority on some of the worst craziness.
  
  If they want to do crazy stuff, let them live or die with the
  consequences.
  
  If the VMWare team stopped trying to build a distributed lock manager
  inside their compute driver, or the Hyperv team didn't wait until J2 to
  start pushing patches, I think there would be more trust in some of
  these teams. But, I am seriously concerned in both those cases, and the
  slow review there is a function of a historic lack of trust in judgment.
  I also personally went on a moratorium a year ago in reviewing either
  driver because entities at both places were complaining to my
  management chain through back channels that I was -1ing their code...
  
  I venture to suggest that the reason we care so much about those kind
  of things is precisely because of our policy of pulling them in the
  tree. Having them in tree means their quality (or not) reflects directly
  on the project as a whole. Separate them from Nova as a whole and give
  them control of their own destiny and they can deal with the consequences
  of their actions and people can judge the results for themselves.
  
  We don't have the time or resources to continue baby-sitting them
  ourselves - attempting to do so has just resulted in a scenario where
  they end up getting largely ignored as you admit here. This ultimately
  makes their quality even worse, because the lack of reviewer availability
  means they stand little chance of 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Thierry Carrez
Daniel P. Berrange wrote:
 For a long time I've used the LKML 'subsystem maintainers' model as the
 reference point for ideas. In a more LKML like model, each virt team
 (or other subsystem team) would have their own separate GIT repo with
 a complete Nova codebase, where they did their day to day code submissions,
 reviews and merges. Periodically the primary subsystem maintainer would
 submit a large pull / merge request to the overall Nova maintainer.
 The $1,000,000 question in such a model is what kind of code review
 happens during the big pull requests to integrate subsystem trees. 

Please note that the Kernel subsystem model is actually a trust tree
based on 20 years of trust building. OpenStack is only 4 years old, so
it's difficult to apply the same model as-is.

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:26, Jay Pipes a écrit :

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent 
ones due to my house move (finally done, yay!). I believe we are mostly 
aligned on the plan of record, but I see no urgency in splitting out the 
scheduler. I only see urgency on cleaning up the interfaces. But, that 
said, let's not highjack Dan's thread here too much. We can discuss on 
IRC. I was only saying that Don's comment that splitting the scheduler 
out would help solve the bandwidth issues should be predicated on the 
same contingency that Dan placed on splitting out the virt drivers: that 
the internal interfaces be cleaned up, documented and stabilized.


snip


So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call half-cores)
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation 
proposals. It's just really a technical problem to solve w.r.t. Gerrit 
permissions.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


Le 05/09/2014 14:48, Jay Pipes a écrit :

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:26, Jay Pipes a écrit :

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent 
ones due to my house move (finally done, yay!). I believe we are 
mostly aligned on the plan of record, but I see no urgency in 
splitting out the scheduler. I only see urgency on cleaning up the 
interfaces. But, that said, let's not highjack Dan's thread here too 
much. We can discuss on IRC. I was only saying that Don's comment that 
splitting the scheduler out would help solve the bandwidth issues 
should be predicated on the same contingency that Dan placed on 
splitting out the virt drivers: that the internal interfaces be 
cleaned up, documented and stabilized.


snip


So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call half-cores)
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation 
proposals. It's just really a technical problem to solve w.r.t. Gerrit 
permissions.




Well, that just requires new Gerrit groups and a new label (like 
Subteam-Approved) so that members of this group could just 
+Subteam-Approved if they're OK (here I imagine 2 people from the group 
labelling it)


Of course, all the groups could have permissions to label any file of 
Nova, but here we can just define a gentleman's agreement, like we do 
for having two +2s before approving.


That would mean that cores could just search using Gerrit with 
'label:Subteam-Approved=1'
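
As a rough sketch of how cores could consume such a label programmatically, the search can be turned into a query against Gerrit's REST endpoint `/changes/?q=...`. The `Subteam-Approved` label is the one proposed in this thread and does not exist today; the host name and the helper `subteam_query_url` are hypothetical.

```python
from urllib.parse import urlencode


def subteam_query_url(base_url, project, label="Subteam-Approved"):
    """Build a Gerrit change-search URL for open, subteam-approved changes.

    The label name is the one proposed in the thread (an assumption);
    the URL is only constructed here, no request is sent.
    """
    query = f"project:{project} status:open label:{label}=1"
    return f"{base_url.rstrip('/')}/changes/?{urlencode({'q': query})}"


# Example with a placeholder host:
url = subteam_query_url("https://review.example.org/", "openstack/nova")
```

The same query string (`project:openstack/nova status:open label:Subteam-Approved=1`) could equally be typed into the Gerrit web search box.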


-Sylvain


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


Le 05/09/2014 12:48, Sean Dague a écrit :

On 09/05/2014 03:02 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:22, Michael Still a écrit :

On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange
berra...@redhat.com wrote:

[Heavy snipping because of length]


The radical (?) solution to the nova core team bottleneck is thus to
follow this lead and split the nova virt drivers out into separate
projects and delegate their maintenance to new dedicated teams.

   - Nova becomes the home for the public APIs, RPC system, database
 persistence and the glue that ties all this together with the
 virt driver API.

   - Each virt driver project gets its own core team and is responsible
 for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

   - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
   - splitting all the virt drivers out of the nova tree

Ahem, IIRC, there is a third proposal for Kilo:
  - create subteam half-cores responsible for reviewing a patch's
iterations, who send approval requests to cores once they consider the
patch stable enough.

As I explained, it would free up reviewing time for cores
without losing control over what is being merged.

I don't really understand how the half core idea works outside of a math
equation, because the point of core is to have trust in the
judgement of your fellow core members so that they can land code when
you aren't looking. I'm not sure how I manage to build up half trust in
someone any quicker.


Well, this thread is becoming so huge that it's hard to follow the whole
discussion, but I explained the idea elsewhere. Let me just restate it
here:
The idea is *not* for the halfcores to land patches. The core team will
still be fully responsible for approving patches. The main problem in
Nova is that cores spend lots of time reviewing each iteration of a
patch, on top of judging whether the patch is good or not.


That's really time consuming and, most of the time, quite frustrating,
as it requires following the patch throughout its life, so there is a
high risk that a core's attention gets distracted along the way.


Here, the idea is to dramatically reduce this time by having teams
dedicated to specific areas (as is already the case anyway for the vast
majority of reviewers) who could take the time on their own to review
all the iterations. Of course, that doesn't mean cores would lose the
possibility to specifically follow a patch and bypass the halfcores;
it's just there to help them if they're overwhelmed.


About the question of trusting cores or halfcores, I can just say that
the Nova team needs to either grow or be divided anyway, so delegation
of trust has to become real either way.


This whole process is IMHO very encouraging for newcomers because it
creates dedicated teams that could help them improve their changes,
instead of waiting 2 months to get a -1 and a frank reply.



As I said elsewhere, I dislike the slots proposal because it sends
developers the message that the price to pay for contributing to
Nova is increasing. Again, prioritizing does not by itself increase
your velocity; those are 2 distinct subjects.


-Sylvain



-Sean






Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 08:58 AM, Sylvain Bauza wrote:

Le 05/09/2014 14:48, Jay Pipes a écrit :

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:26, Jay Pipes a écrit :

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent
ones due to my house move (finally done, yay!). I believe we are
mostly aligned on the plan of record, but I see no urgency in
splitting out the scheduler. I only see urgency on cleaning up the
interfaces. But, that said, let's not highjack Dan's thread here too
much. We can discuss on IRC. I was only saying that Don's comment that
splitting the scheduler out would help solve the bandwidth issues
should be predicated on the same contingency that Dan placed on
splitting out the virt drivers: that the internal interfaces be
cleaned up, documented and stabilized.

snip


So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call half-cores)
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation
proposals. It's just really a technical problem to solve w.r.t. Gerrit
permissions.



Well, that just requires new Gerrit groups and a new label (like
Subteam-Approved) so that members of this group could just
+Subteam-Approved if they're OK (here I imagine 2 people from the group
labelling it)


And what about code that crosses module boundaries? Would we need a 
LibvirtSubteamApproved, SchedulerSubteamApproved, etc?



Of course, all the groups could have permissions to label any file of
Nova, but here we can just define a gentleman's agreement, like we do
for having two +2s before approving.


Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce 
this kind of policy, that's what I was getting at.



That would mean that cores could just search using Gerrit with
'label:Subteam-Approved=1'


Interesting, yes, that would be useful.

-jay


-Sylvain


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Day, Phil


 -Original Message-
 From: Sean Dague [mailto:s...@dague.net]
 Sent: 05 September 2014 11:49
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out
 virt drivers
 
 On 09/05/2014 03:02 AM, Sylvain Bauza wrote:
 
 
  Ahem, IIRC, there is a third proposal for Kilo:
   - create subteam half-cores responsible for reviewing a patch's
  iterations, who send approval requests to cores once they consider the
  patch stable enough.
 
  As I explained, it would free up reviewing time for cores
  without losing control over what is being merged.
 
 I don't really understand how the half core idea works outside of a math
 equation, because the point of core is to have trust in the judgement of
 your fellow core members so that they can land code when you aren't
 looking. I'm not sure how I manage to build up half trust in someone any
 quicker.
 
   -Sean
 
You seem to be looking at a model, Sean, where trust is purely binary - you’re 
either trusted to know about all of Nova or not trusted at all.  

What Sylvain is proposing (I think) is something more akin to having folks that 
are trusted in some areas of the system and/or trusted to be right enough of 
the time that their reviewing skills take a significant part of the burden off 
the core reviewers. That kind of incremental development of trust feels like 
a fairly natural model to me. It's somewhere between the full divide and rule 
approach of splitting out various components (which doesn't feel like a short 
term solution) and the blanket approach of adding more cores.

Making it easier to incrementally grant trust, and having the processes and 
will to remove it if it's seen to be misused, feels to me like it has to be part 
of the solution to breaking out of the "we need more people we trust, but we 
don’t feel comfortable trusting more than N people at any one time" situation.  Sometimes 
you have to give people a chance in small, well defined and controlled steps.

Phil



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Sylvain Bauza


Le 05/09/2014 15:11, Jay Pipes a écrit :

On 09/05/2014 08:58 AM, Sylvain Bauza wrote:

Le 05/09/2014 14:48, Jay Pipes a écrit :

On 09/05/2014 02:59 AM, Sylvain Bauza wrote:

Le 05/09/2014 01:26, Jay Pipes a écrit :

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that
Dan's proposal features quite prominently the following:

== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying
needs to be done to the interfaces between nova-conductor,
nova-compute, and nova-scheduler *before* any split of the scheduler
code is even remotely feasible.

Splitting the scheduler out before this is done would actually not
"help but not solve this problem" -- it would instead further the
problem, IMO.



Jay, we agreed on a plan to carry on, please be sure we're working on
it, see the Gantt meetings logs for what my vision is.


I've attended most of the Gantt meetings, except for a couple recent
ones due to my house move (finally done, yay!). I believe we are
mostly aligned on the plan of record, but I see no urgency in
splitting out the scheduler. I only see urgency on cleaning up the
interfaces. But, that said, let's not highjack Dan's thread here too
much. We can discuss on IRC. I was only saying that Don's comment that
splitting the scheduler out would help solve the bandwidth issues
should be predicated on the same contingency that Dan placed on
splitting out the virt drivers: that the internal interfaces be
cleaned up, documented and stabilized.

snip

So, this effort requires at least one cycle, and as Dan stated, there is
urgency, so I think we need to identify a short-term solution which
doesn't require refactoring. My personal opinion is what Russell and
Thierry expressed, ie. subteam delegation (to what I call half-cores)
for iterations and only approvals for cores.


Yeah, I don't have much of an issue with the subteam delegation
proposals. It's just really a technical problem to solve w.r.t. Gerrit
permissions.



Well, that just requires new Gerrit groups and a new label (like
Subteam-Approved) so that members of this group could just
+Subteam-Approved if they're OK (here I imagine 2 people from the group
labelling it)


And what about code that crosses module boundaries? Would we need a 
LibvirtSubteamApproved, SchedulerSubteamApproved, etc?




Luckily not. I think we only need one more label (we only have 3 now:
Verified, Code-Review, Approved).


Here the key thing is having a searchable label that cores can consume
because they know that this label is worthy of interest. If something
crosses modules, then that's probably something a core would help with.


For example, if I'm an API halfcore, I can subteam-approve all the
changes related to the API itself (so that encourages small and readable
patches btw.) but I pass if I'm looking at something I don't know well
enough (or I just provide a +1).


The driving idea is to encourage reviewing, because the step is not as
high as if I wanted to become core. On the other hand, if a halfcore
becomes trusted enough (because they also provide good +1s for other
areas and are involved enough in the release process), then this person
is a good candidate for becoming core.



As you identified, most of the proposal is based on gentle-person
agreement because Gerrit is not flexible enough to do that (although
since 2.8, you can search all patches related to a path, like
file:^nova/scheduler/*)


-Sylvain

Of course, all the groups could have permissions to label any file of
Nova, but here we can just define a gentleman's agreement, like we do
for having two +2s before approving.


Yes, it would be a gentle-person's agreement. :) Gerrit cannot enforce 
this kind of policy, that's what I was getting at.



That would mean that cores could just search using Gerrit with
'label:Subteam-Approved=1'
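
As a rough illustration of the search described here, the query string could be assembled like this. It is a sketch, not a working integration: the Subteam-Approved label is only proposed in this thread, the base URL is a placeholder, and the path pattern uses `.*` as the usual regex spelling, which is an assumption about what the `*` in the mail is meant to match.

```python
from urllib.parse import urlencode


def subteam_query(path_regex, label="Subteam-Approved",
                  project="openstack/nova"):
    """Build a Gerrit search string for open, subteam-approved changes.

    The label name is the hypothetical one proposed in this thread.
    """
    terms = [
        "project:" + project,
        "status:open",
        "label:%s=1" % label,
        "file:^" + path_regex,   # '^' marks a regular expression in Gerrit
    ]
    return " ".join(terms)


def changes_url(base_url, query):
    """Turn a search string into a URL for Gerrit's /changes/ endpoint."""
    return base_url.rstrip("/") + "/changes/?" + urlencode({"q": query})


query = subteam_query("nova/scheduler/.*")
print(query)
# gerrit.example.org is a placeholder host, not a real server.
print(changes_url("https://gerrit.example.org", query))
```

A core reviewer could bookmark the resulting search per area, so that subteam-approved patches surface without any change to Gerrit permissions.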


Interesting, yes, that would be useful.

-jay


-Sylvain


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev







Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Eric Windisch


  - Each virt driver project gets its own core team and is responsible
for dealing with review, merge & release of their codebase.

 Note, I really do mean *all* virt drivers should be separate. I do
 not want to see some virt drivers split out and others remain in tree
 because I feel that signifies that the out of tree ones are second
 class citizens.


+1. I made this same proposal to Michael during the mid-cycle. However, I
haven't wanted to conflate this issue with bringing Docker back into Nova.
For the Docker driver in particular, I feel that being able to stay out of
tree and having our own core team would be beneficial, but  I wouldn't want
to do this unless it applied equally to all drivers.

-- 
Regards,
Eric Windisch


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 06:29 AM, John Garbutt wrote:

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.


-1

Without first cleaning up the interfaces around resource tracking, claim 
creation and processing, and the communication interfaces between the 
nova-conductor, nova-scheduler, and nova-compute.


I see no urgency at all in splitting out the scheduler. The cleanup of 
the interfaces around the resource tracker and scheduler has great 
priority, though, IMO.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Kevin L. Mitchell
On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote:
  2. Removal of drivers other than the reference implementation for each
  project could be the healthiest option
  a. Requires transparent, public, automated 3rd party CI
  b. Requires a TRUE plugin architecture and mentality
  c. Requires a stable and well defined API
 
 As mentioned in the original mail I don't want to see a situation where
 we end up with some drivers in tree and others out of tree as it sets up
 bad dynamics within the project. Those out of tree will always have the
 impression of being second class citizens and thus there will be constant
 pressure to accept drivers back into tree. The so called 'reference'
 driver that stayed in tree would also continue to be penalized in the
 way it is today, and so its development would be disadvantaged compared
 to the out of tree drivers.

I have one quibble with the notion of not even one driver in core: I
think it is probably useful to include a dummy, do-nothing driver that
can be used for in-tree functional tests and as an example to point
those interested in writing a driver.  Then, the second-class citizen
is the one actually in the tree :)  Beyond that, I agree with this
proposal: it has never made sense to me that *all* drivers live in the
tree, and it actually offends my sense of organization to have the tree
so cluttered; we split functions when they get too big, we split modules
when they get too big, and we create subdirectories when packages get
too big, so why not split repos when they get too big?
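
To make the dummy driver idea concrete, here is a self-contained sketch. The VirtDriver interface below is hypothetical and far smaller than Nova's real virt/driver.py; it only shows the shape of a do-nothing driver that functional tests and driver authors could use as a reference.

```python
import abc


class VirtDriver(abc.ABC):
    """Hypothetical, minimal stand-in for a virt driver interface."""

    @abc.abstractmethod
    def spawn(self, instance_name):
        """Create an instance."""

    @abc.abstractmethod
    def destroy(self, instance_name):
        """Remove an instance."""

    @abc.abstractmethod
    def list_instances(self):
        """Return the names of known instances."""


class DummyDriver(VirtDriver):
    """Do-nothing driver: tracks instance names in memory only."""

    def __init__(self):
        self._instances = set()

    def spawn(self, instance_name):
        self._instances.add(instance_name)

    def destroy(self, instance_name):
        # discard() keeps destroy idempotent, which is handy in tests.
        self._instances.discard(instance_name)

    def list_instances(self):
        return sorted(self._instances)


driver = DummyDriver()
driver.spawn("vm-a")
driver.spawn("vm-b")
driver.destroy("vm-a")
print(driver.list_instances())
```

Because the base class is abstract, a driver that forgets to implement one of the methods fails at instantiation time, which is one cheap way to keep out-of-tree drivers honest about the interface.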
-- 
Kevin L. Mitchell kevin.mitch...@rackspace.com
Rackspace




Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Lucas Alvares Gomes
 I look at what we do with Ironic testing currently as a guide here.
 We have a tempest job that runs against Nova, that validates changes
 to nova don't break the separate Ironic git repo. So my thought
 is that all our current tempest jobs would simply work in that
 way. IOW changes to so called nova common would run jobs that
 validate the change against all the virt driver git repos. I think
 this kind of setup is pretty much mandatory for split repos to be
 viable, because I don't want to see us lose testing coverage in
 this proposed change.

Thanks Daniel for raising this problem.

Yeah I think that what we did with Ironic while the driver is* out of
the Nova tree serves as a good example. I also think that having
drivers out of the tree is possible; making the tests run against
nova-common and asserting things didn't break is no problem. But as you
described before, the process of code submission was quite painful and
required a lot of effort and coordination from the Ironic and Nova
teams; we would need to improve that.

Another problem we will have in splitting the drivers out is the
classic limitation of launchpad blueprints: we can't track tasks
across multiple projects. (This will change once Storyboard is
completed I guess).

But that's all a long-term solution. In the short term I don't
see any real solution yet; asking companies/projects
that have a driver in Nova to help with reviews is not so bad IMO. I've
started reviewing code in Nova today and will continue doing that,
maybe aiming for core so that we can speed up the future reviews of
the Ironic driver.

Now, let me throw a crazy idea into the mix (it might be stupid, but):

Maybe Nova is doing much more than it should; deprecating the
baremetal and network parts and splitting the scheduler out of the
project helps a lot. But what if other parts were split out as well,
like managing flavors, creating the instances etc... Then Nova can
be the thing that knows how to talk to/manage hypervisors only and won't
have to deal with crazy cases like Ironic, where we try to make real
machines look & feel like VMs to fit into Nova, because that's
painful and I think we are going to have many limitations if we
continue to do that (I believe the same may happen with the Docker
driver).

So if we have another project on top of Nova, Ironic and
$CONTAINER_PROJECT_NAME** that abstracts all the rest and only talks to
Nova when a VM is going to be deployed or Ironic when a baremetal
machine is going to be deployed, etc... Maybe then Nova will be
considerably smaller and can keep all drivers in tree (hypervisor
drivers only, no Docker or Ironic).

* was tempted to write 'was' there :)
** A new project that will know how to handle the containers case.



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Daniel P. Berrange
On Fri, Sep 05, 2014 at 10:25:09AM -0500, Kevin L. Mitchell wrote:
 On Fri, 2014-09-05 at 10:26 +0100, Daniel P. Berrange wrote:
   2. Removal of drivers other than the reference implementation for each
   project could be the healthiest option
   a. Requires transparent, public, automated 3rd party CI
   b. Requires a TRUE plugin architecture and mentality
   c. Requires a stable and well defined API
  
  As mentioned in the original mail I don't want to see a situation where
  we end up with some drivers in tree and others out of tree as it sets up
  bad dynamics within the project. Those out of tree will always have the
  impression of being second class citizens and thus there will be constant
  pressure to accept drivers back into tree. The so called 'reference'
  driver that stayed in tree would also continue to be penalized in the
  way it is today, and so its development would be disadvantaged compared
  to the out of tree drivers.
 
 I have one quibble with the notion of not even one driver in core: I
 think it is probably useful to include a dummy, do-nothing driver that
 can be used for in-tree functional tests and as an example to point
 those interested in writing a driver.  Then, the second-class citizen
 is the one actually in the tree :)  Beyond that, I agree with this
 proposal: it has never made sense to me that *all* drivers live in the
 tree, and it actually offends my sense of organization to have the tree
 so cluttered; we split functions when they get too big, we split modules
 when they get too big, and we create subdirectories when packages get
 too big, so why not split repos when they get too big?

Oh sure, having a fake virt driver in tree is fine and indeed desirable
for the reasons you mention. I was exclusively thinking about the real
virt drivers in my earlier statement.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Dugger, Donald D
Well, I and I believe a few others feel a slightly higher sense of urgency 
about splitting out the scheduler but I don't want to hijack this thread for 
that debate.  Fair warning, I intend to start a new thread where we can talk 
specifically about the scheduler split, I'm afraid we're in the situation where 
we're all in agreement but everyone has a different view of what that agreement 
is.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Jay Pipes [mailto:jaypi...@gmail.com] 
Sent: Friday, September 5, 2014 8:07 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out 
virt drivers

On 09/05/2014 06:29 AM, John Garbutt wrote:
 Scheduler: I think we need to split out the scheduler with a similar 
 level of urgency. We keep blocking features on the split, because we 
 know we don't have the review bandwidth to deal with them. Right now I 
 am talking about a compute related scheduler in the compute program, 
 that might evolve to worry about other services at a later date.

-1

Without first cleaning up the interfaces around resource tracking, claim 
creation and processing, and the communication interfaces between the 
nova-conductor, nova-scheduler, and nova-compute.

I see no urgency at all in splitting out the scheduler. The cleanup of the 
interfaces around the resource tracker and scheduler has great priority, 
though, IMO.

Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Chris Friesen

On 09/05/2014 03:52 AM, Daniel P. Berrange wrote:



So my biggest fear with a model where each team had their own full
Nova tree and did large pull requests, is that we'd suffer major
pain during the merging of large pull requests, especially if any
of the merges touched common code. It could make the pull requests
take a really long time to get accepted into the primary repo.

By contrast with split out git repos per virt driver code, we will
only ever have 1 stage of code review for each patch. Changes to
common code would go straight to main nova common repo and so get
reviewed by the experts there without delay, avoiding the 2nd stage
of review from merge requests.


Why treat things differently?  It seems to me that even in the first 
scenario you could still send common code changes straight to the main 
nova repo.  Then the pulls from the virt repo would literally only touch 
the virt code in the common repo.


Chris



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Russell Bryant
On 09/05/2014 10:06 AM, Jay Pipes wrote:
 On 09/05/2014 06:29 AM, John Garbutt wrote:
 Scheduler: I think we need to split out the scheduler with a similar
 level of urgency. We keep blocking features on the split, because we
 know we don't have the review bandwidth to deal with them. Right now I
 am talking about a compute related scheduler in the compute program,
 that might evolve to worry about other services at a later date.
 
 -1
 
 Without first cleaning up the interfaces around resource tracking, claim
 creation and processing, and the communication interfaces between the
 nova-conductor, nova-scheduler, and nova-compute.
 
 I see no urgency at all in splitting out the scheduler. The cleanup of
 the interfaces around the resource tracker and scheduler has great
 priority, though, IMO.

I'd just reframe things ... I'd like the work you're referring to here
to be treated as an obvious key pre-requisite to a split, and this cleanup
is what should be treated with urgency by those with a vested interest
in getting more autonomy around scheduler development.

-- 
Russell Bryant



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Jay Pipes

On 09/05/2014 03:01 PM, Russell Bryant wrote:

On 09/05/2014 10:06 AM, Jay Pipes wrote:

On 09/05/2014 06:29 AM, John Garbutt wrote:

Scheduler: I think we need to split out the scheduler with a similar
level of urgency. We keep blocking features on the split, because we
know we don't have the review bandwidth to deal with them. Right now I
am talking about a compute related scheduler in the compute program,
that might evolve to worry about other services at a later date.


-1

Without first cleaning up the interfaces around resource tracking, claim
creation and processing, and the communication interfaces between the
nova-conductor, nova-scheduler, and nova-compute.

I see no urgency at all in splitting out the scheduler. The cleanup of
the interfaces around the resource tracker and scheduler has great
priority, though, IMO.


I'd just reframe things ... I'd like the work you're referring to here
to be treated as an obvious key pre-requisite to a split, and this cleanup
is what should be treated with urgency by those with a vested interest
in getting more autonomy around scheduler development.


Sure, that's a perfectly gentle way of putting it :)

Thanks!
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread James Bottomley

On Fri, 2014-09-05 at 08:02 -0400, Sean Dague wrote:
 On 09/05/2014 07:40 AM, Daniel P. Berrange wrote:
  On Fri, Sep 05, 2014 at 07:12:37AM -0400, Sean Dague wrote:
  On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
  A handy example of this I can think of is the currently granted FFE for
  serial consoles - consider how much of the code went into the common
  part vs. the libvirt specific part, I would say the ratio is very close
  to 1 if not even in favour of the common part (current 4 outstanding
  patches are all for core, and out of the 5 merged - only one of them was
  purely libvirt specific, assuming virt/ will live in nova-common).
 
  Joe asked a similar question elsewhere on the thread.
 
  Once again - I am not against doing it - what I am saying is that we
  need to look into this closer as it may not be as big of a win from the
  number of changes needed per feature as we may think.
 
  Just some things to think about with regards to the whole idea, by no
  means exhaustive.
 
  So maybe the better question is: what are the top sources of technical
  debt in Nova that we need to address? And if we did, everyone would be
  more sane, and feel less burnt.
 
  Maybe the drivers are the worst debt, and jettisoning them makes them
  someone else's problem, so that helps some. I'm not entirely convinced
  right now.
 
  I think Cells represents a lot of debt right now. It doesn't fully work
  with the rest of Nova, and produces a ton of extra code paths special
  cased for the cells path.
 
  The Scheduler has a ton of debt as has been pointed out by the efforts
  in and around Gantt. The focus has been on the split, but realistically
  I'm with Jay in that we should focus on the debt, and exposing a REST
  interface in Nova.
 
  What about the Nova objects transition? That continues to be slow
  because it's basically Dan (with a few other helpers from time to time).
  Would it be helpful if we did an all hands on deck transition of the
  rest of Nova for K1 and just get it done? Would be nice to have the bulk
  of Nova core working on one thing like this and actually be in shared
  context with everyone else for a while.
  
  I think the idea that we can tell everyone in Nova what they should
  focus on for a cycle, or more generally, is doomed to failure. This
  isn't a closed source company controlled project where you can dictate
  what everyone's priority must be. We must accept that we rely on all our
  contributors' good will in voluntarily giving their time & resource to
  the project, to scratch whatever itch they have in the project. We have
  to encourage them to want to work on nova and demonstrate that we value
  whatever form of contribution they choose to make. If we have technical
  debt that we think is important to address we need to illustrate /
  show people why they should care about helping. If they none the less
  decide that work isn't for them, we can't just cast them aside and/or
  ignore their contributions, while we get on with other things. This
  is why I think it is important that we split up nova to allow each
  area to self-organize around what they consider to be priorities in
  their area of interest / motivation. Not enabling that is going
  to continue to kill our community
 
 I'm getting tired of the refrain that because we are an Open Source
 project declaring priorities is pointless, because it's not. I would say
 it's actually the exception that a developer wakes up in the morning and
 says "I completely disregard what anyone else thinks is important in
 this project, this is what I'm going to do today". Because if that's how
 they felt they wouldn't choose to be part of a community, they would
 just go do their own thing. Lone wolves by definition don't form
 communities.

Actually, I don't think this analysis is accurate.  Some people are
simply interested in small aspects of a project.  It's the "scratch your
own itch" part of open source.  The thing which makes itch scratchers
not lone wolves is the desire to go the extra mile to make what they've
done useful to the community.  If they never do this, they likely have a
forked repo with only their changes (and are the epitome of a lone
wolf).  If you scratch your own itch and make the effort to get it
upstream, you're assisting the community (even if that's the only piece
of code you do) and that assistance makes you (at least for a time) part
of the community.

A community doesn't necessarily require continuity from all its
elements.  It requires continuity from some (the core, if you will), but
it also allows for contributions from people who only have one or two
things they need doing.  For OpenStack to convert its users into its
contributors, it is going to have to embrace this, because they likely
only need a couple of things fixing, so they'll pop into the community,
fix what they need fixing and then go back to being users again.

Some projects, the linux kernel in particular, deliberately don't
enforce 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread Nathanael Burton
Daniel,

Thanks for the well thought out and thorough proposal to help Nova.

As an OpenStack operator/developer since Cactus time, it has definitely
gotten harder and harder to get fixes in Nova for small bugs that we find
running at scale with production systems. This forces us to maintain more
and more custom patches in-house (or for longer periods of time).  The huge
amount of time necessary to shepherd patches through review discourages
additional devs from contributing patches because of the amount of time
investment required.

I believe whatever we can do to improve the ability to fix technical debt
within Nova and both keep and grow the non-core contributors of Nova would
be greatly beneficial.

Thanks!

Nate


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-05 Thread James Bottomley
On Fri, 2014-09-05 at 14:14 +0200, Thierry Carrez wrote:
 Daniel P. Berrange wrote:
  For a long time I've used the LKML 'subsystem maintainers' model as the
  reference point for ideas. In a more LKML like model, each virt team
  (or other subsystem team) would have their own separate GIT repo with
  a complete Nova codebase, where they did their day to day code submissions,
  reviews and merges. Periodically the primary subsystem maintainer would
  submit a large pull / merge request to the overall Nova maintainer.
  The $1,000,000 question in such a model is what kind of code review
  happens during the big pull requests to integrate subsystem trees.
 
 Please note that the Kernel subsystem model is actually a trust tree
 based on 20 years of trust building. OpenStack is only 4 years old, so
 it's difficult to apply the same model as-is.

That's true but not entirely accurate.  The kernel maintainership is a
trust tree, but not every person in that tree has been in the position
for 20 years.  We have one or two who have (Dave Miller, net maintainer,
for instance), but we have some newcomers: Sarah Sharp has only been on
USB3.0 for a year.  People pass in and out of the maintainer tree all
the time.

In many ways, the OpenStack core model is also a trust tree (you elect
people to the core and support their nominations because you trust them
to do the required job).  It's not a 1 for 1 conversion, but it should
be possible to derive the trust you need from the model you already
have, should you wish to make OpenStack function more like the Linux
Kernel.

Essentially Daniel's proposal boils down to making the trust boundaries
align with separated community interests to get more scaling in the
model.  This is very similar to the way the kernel operates: most
maintainers only have expertise in their own areas.  We have a few
people with broad reach, like Andrew and Linus, but by and large most
people settle down in a much smaller area.  However, you don't have to
follow the kernel model to get this to happen, you just have to identify
the natural interest boundaries of the contributors and align around
them (provided they have enough mass to form their own community).

James





[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
Position statement
==================

Over the past year I've increasingly come to the conclusion that
Nova is heading for (or probably already at) a major crisis. If
steps are not taken to avert this, the project is likely to lose
a non-trivial amount of talent, both regular code contributors and
core team members. That includes myself. This is not good for
Nova's long term health and so should be of concern to anyone
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive
summary is that the nova-core team is an unfixable bottleneck
in our development process with our current project structure.
The only way I see to remove the bottleneck is to split the virt
drivers out of tree and let them all have their own core teams
in their area of code, leaving current nova core to focus on
all the common code outside the virt driver impls. I nonetheless
urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

 - Burn out of core team members from over work 
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they
add up to a big problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear
that the backlog of code up for review never goes away. Even
intensive code review efforts at various points in the dev cycle
make only a small impact on the backlog. This has a pretty
significant impact on core team members, as their work is never
done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with
the reviews in a more efficient manner than plain gerrit allows
for. These certainly help, but they can't ever solve the problem
on their own - just make it slightly more bearable. And this is
not even considering that core team members might have useful
contributions to make in ways beyond just code review. Ultimately
the workload is just too high to sustain the levels of review
required, so core team members will eventually burn out (as they
have done many times already).

Even if one person attempts to take the initiative to heavily
invest in review of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag
team' it is hard for one person to make a difference. The end
result is that a patch is +2'd and then sits idle for weeks or
more until a merge conflict requires it to be reposted at which
point even that one +2 is lost. This is a pretty demotivating
outcome for both reviewers & the patch contributor.


New core team talent
--------------------

It can't escape attention that the Nova core team does not grow
in size very often. When Nova was younger and its code base was
smaller, it was easier for contributors to get onto core because
the base level of knowledge required was that much smaller. To
get onto core today requires a major investment in learning Nova
over a year or more. Even people who potentially have the latent
skills may not have the time available to invest in learning the
entirety of Nova.

With the number of reviews proposed to Nova, the core team should
probably be at least double its current size[1]. There is plenty of
expertise in the project as a whole but it is typically focused
into specific areas of the codebase. There is nowhere we can find
20 more people with broad knowledge of the codebase who could be
promoted even over the next year, let alone today. This is ignoring
that many existing members of core are relatively inactive due to
burnout and so need replacing. That means we really need another
25-30 people for core. That's not going to happen.


Code review delays
------------------

The obvious result of having too much work for too few reviewers
is that code contributors face major delays in getting their work
reviewed and merged. From personal experience, during Juno, I've
probably spent 1 week in aggregate on actual code development vs
8 weeks on waiting on code review. You have to constantly be on
alert for review comments because unless you can respond quickly
(and repost) while you still have the attention of the reviewer,
they may not look again for days/weeks.

The length of time to get work merged serves as a demotivator to
actually do work in the first place. I've personally avoided doing
a lot of code refactoring & cleanup work that would improve the
maintainability of the libvirt driver in the long term, because
I can't face the battle to get it reviewed & merged. Other people
have told me much the same. It is not uncommon to see changes that
have been pending for 2 dev cycles, not because the code was bad
but because 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Day, Phil
Hi Daniel,

Thanks for putting together such a thoughtful piece - I probably need to
re-read it a few times to take in everything you're saying, but a couple of
thoughts that did occur to me:

- I can see how this could help where a change is fully contained within a virt
driver, but I wonder how many of those there really are?  Of the things that
I've seen go through recently nearly all also seem to touch the compute manager
in some way, and a lot (like the NUMA changes) also have impacts on the
scheduler. Isn't it going to make it harder to get any of those changes in
if they have to be co-ordinated across two or more repos?

- I think you hit the nail on the head in terms of the scope of Nova and how
few people probably really understand all of it, but given the amount of trust
that goes with being a core wouldn't it also be possible to make people cores on
the understanding that they will only approve code in the areas they are expert
in?  It kind of feels that this happens to a large extent already, for
example I don't see Chris or Ken'ichi taking on work outside of the API layer.
It kind of feels as if given a small amount of trust we could have
additional core reviewers focused on specific parts of the system without
having to split up the code base if that's where the problem is.

Phil




 -Original Message-
 From: Daniel P. Berrange [mailto:berra...@redhat.com]
 Sent: 04 September 2014 11:24
 To: OpenStack Development
 Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt
 drivers
 
 Position statement
 ==================
 
 Over the past year I've increasingly come to the conclusion that Nova is
 heading for (or probably already at) a major crisis. If steps are not taken to
 avert this, the project is likely to lose a non-trivial amount of talent, both
 regular code contributors and core team members. That includes myself. This
 is not good for Nova's long term health and so should be of concern to
 anyone involved in Nova and OpenStack.
 
 For those who don't want to read the whole mail, the executive summary is
 that the nova-core team is an unfixable bottleneck in our development
 process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt drivers out of
 tree and let them all have their own core teams in their area of code, leaving
 current nova core to focus on all the common code outside the virt driver
 impls. I nonetheless urge people to read the whole mail.
 
 
 Background information
 ==
 
 I see many factors coming together to form the crisis
 
  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing
 
 Each item on its own may not seem too bad, but combined they add up to
 a big problem.
 
 Core team burn out
 --
 
 Having been involved in Nova for several dev cycles now, it is clear that the
 backlog of code up for review never goes away. Even intensive code review
 efforts at various points in the dev cycle make only a small impact on the
 backlog. This has a pretty significant impact on core team members, as their
 work is never done. At best, the dial is sometimes set to 10, instead of 11.
 
 Many people, myself included, have built tools to help deal with the reviews
 in a more efficient manner than plain gerrit allows for. These certainly help,
 but they can't ever solve the problem on their own - just make it slightly
 more bearable. And this is not even considering that core team members
 might have useful contributions to make in ways beyond just code review.
 Ultimately the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they have done
 many times already).
 
 Even if one person attempts to take the initiative to heavily invest in review
 of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag team' it is hard
 for one person to make a difference. The end result is that a patch is +2d and
 then sits idle for weeks or more until a merge conflict requires it to be
 reposted at which point even that one +2 is lost. This is a pretty demotivating
 outcome for both reviewers & the patch contributor.
 
 
 New core team talent
 
 
 It can't escape attention that the Nova core team does not grow in size very
 often. When Nova was younger and its code base was smaller, it was easier
 for contributors to get onto core because the base level of knowledge
 required was that much smaller. To get onto core today requires a major
 investment in learning Nova over a year or more. Even people who
 potentially have the latent skills may

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 12:14:39PM +, Day, Phil wrote:
 Hi Daniel,
 
 Thanks for putting together such a thoughtful piece - I probably need to
 re-read it a few times to take in everything you're saying, but a couple
 of thoughts that did occur to me:
 
 - I can see how this could help where a change is fully contained within
 a virt driver, but I wonder how many of those there really are? Of the
 things that I've seen go through recently, nearly all also seem to touch the
 compute manager in some way, and a lot (like the NUMA changes) also have
 impacts on the scheduler. Isn't it going to make it harder to get
 any of those changes in if they have to be co-ordinated across two or
 more repos?

Actually, in my experience of reviewing code this past cycle or two
I see a fairly significant portion of code that is entirely within
the scope of a virt driver. I'm also seeing that people are refraining
from actually doing changes to the virt drivers because of the burden
of getting code past review, so what we see today is probably not even
representative of the potential.

There are certainly some high profile exceptions such as the NUMA
work, or the new serial console work where you're going to cross the
repos. In such work we already try to break patches into isolated
pieces, so the stuff touching common code is a separate commit from
the stuff touching virt code. This is generally good practice to
encourage. So, yes, it would need coordination across the repos
to get the full work submitted, but I don't think that burden is
unduly large compared to current practice. We do in fact already
see this need for co-ordination in other ways. For example, API
changes have parts that affect python-novaclient, and perhaps
horizon too. Storage & network changes often cross Neutron /
Cinder and Nova. If we can reduce the burden on nova-core the
stuff going into the common codebase should stand more chance of
getting review too.

So overall yes, this is a valid point, but I'm not particularly
concerned about the negative impacts of it, because we're already
dealing with them today to a large extent.

 - I think you hit the nail on the head in terms of the scope of
 Nova and how few people probably really understand all of it,
 but given the amount of trust that goes with being a core, wouldn't
 it also be possible to make people cores on the understanding that
 they will only approve code in the areas they are expert in?
 It kind of feels that this happens to a large extent already;
 for example, I don't see Chris or Ken'ichi taking on work outside
 of the API layer. It kind of feels as if, given a small amount
 of trust, we could have additional core reviewers focused on
 specific parts of the system without having to split up the
 code base, if that's where the problem is.

Yes, you are right that it happens to some extent, but I think it
is quite a big jump to scale that amount of trust up to a team
that realistically would need to be 40+ people in size.

Also this isn't solely about review bandwidth. One of the things
I raised was about how there are certain standards required for
being part of nova, such as CI testing. If you can't meet that
you're forced into a sub-optimal development practice compared
to the rest of nova, where you are out of tree and subject to being
broken by Nova changes at any time, which is what Docker and
Ironic have been facing.  Separate repos will also facilitate
more targeted application of our testing resources, so vmware
repo changes wouldn't need to suffer false failures from libvirt
tempest jobs, and similarly vmware CI could be made gating for
vmware without causing libvirt code to suffer instability.
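
As a rough illustration of the gating point, the sketch below models a gate
where each repo only requires its own jobs to pass. The repo names, job names
and the may_merge helper are all made up for the example; this is not how the
real CI system is configured, just a way to show the isolation.

```python
# Hypothetical repos and the CI jobs that would gate each of them.
GATING_JOBS = {
    "nova": ["unit", "tempest-libvirt"],
    "nova-libvirt": ["unit", "tempest-libvirt"],
    "nova-vmware": ["unit", "vmware-ci"],
}


def may_merge(repo, results):
    """A change may merge only if every job gating its repo succeeded."""
    return all(results.get(job) == "SUCCESS" for job in GATING_JOBS[repo])


# A libvirt tempest failure blocks libvirt changes but not vmware ones.
results = {"unit": "SUCCESS", "vmware-ci": "SUCCESS", "tempest-libvirt": "FAILURE"}
print(may_merge("nova-vmware", results))   # True
print(may_merge("nova-libvirt", results))  # False
```

With a single shared repo, every job in the table would have to gate every
change, which is where the false failures come from.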


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Thierry Carrez
Like I mentioned before, I think the only way out of the Nova death
spiral is to split code and give control over it to smaller dedicated
review teams. This is one way to do it. Thanks Dan for pulling this
together :)

A couple comments inline:

Daniel P. Berrange wrote:
 [...]
 This is a crisis. A large crisis. In fact, if you got a moment, it's
 a twelve-storey crisis with a magnificent entrance hall, carpeting
 throughout, 24-hour portage, and an enormous sign on the roof,
 saying 'This Is a Large Crisis'. A large crisis requires a large
 plan.
 [...]

I totally agree. We need a plan now, because we can't go through another
cycle without a solution in sight.

 [...]
 This has quite a few implications for the way development would
 operate.
 
  - The Nova core team at least, would be voluntarily giving up a big
amount of responsibility over the evolution of virt drivers. Due
to human nature, people are not good at giving up power, so this
may be painful to swallow. Realistically current nova core are
not experts in most of the virt drivers to start with, and more
important we clearly do not have sufficient time to do a good job
of review with everything submitted. Much of the current need
for core review of virt drivers is to prevent the mis-use of a
poorly defined virt driver API...which can be mitigated - See
later point(s)
 
  - Nova core would/should not have automatic +2 over the virt driver
repositories since it is unreasonable to assume they have the
suitable domain knowledge for all virt drivers out there. People
would of course be able to be members of multiple core teams. For
example John G would naturally be nova-core and nova-xen-core. I
would aim for nova-core and nova-libvirt-core, and so on. I do not
want any +2 responsibility over VMware/HyperV/Docker drivers since
they're not my area of expertise - I only look at them today because
they have no other nova-core representation.
 
  - Not sure if it implies the Nova PTL would be solely focused on
Nova common. eg would there continue to be one PTL over all virt
driver implementation projects, or would each project have its
own PTL. Maybe this is irrelevant if a Czars approach is chosen
by virt driver projects for their work. I'd be inclined to say
that a single PTL should stay as a figurehead to represent all
the virt driver projects, acting as a point of contact to ensure
we keep communication / co-operation between the drivers in sync.
 [...]

At this point it may look like our current structure (programs, one PTL,
single core teams...) prevents us from implementing that solution. I
just want to say that in OpenStack, organizational structure reflects
how we work, not the other way around. If we need to reorganize
official project structure to work in smarter and long-term healthy
ways, that's a really small price to pay.

-- 
Thierry Carrez (ttx)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Gary Kotton
Hi,
I do not think that Nova is in a death spiral. I just think that the
current way of working is strangling the project. I do not
understand why we need to split drivers out of the core project. Why not
have the ability to provide 'core review' status to people for reviewing
those parts of the code? We have enough talented people in OpenStack to be
able to write a driver above gerrit to enable that.
Fragmenting the project will be very unhealthy.
For what it is worth, having a release date at the end of a vacation is
really bad. Look at the numbers:
http://stackalytics.com/report/contribution/nova-group/30
Thanks
Gary


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Dugger, Donald D
Basically +1 with what Daniel is saying (note that, as mentioned, a side effect 
of our effort to split out the scheduler will help but not solve this problem).

My only question is about the need to separate out each virt driver into a 
separate project. Wouldn't you accomplish a lot of the benefit by creating a 
single virt project that includes all of the drivers?  I wouldn't necessarily 
expect a VMware guy to understand the specifics of the HyperV implementation, 
but both people should understand what a virt driver does and how it interfaces to 
Nova, and they should be able to intelligently review each other's code.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com] 
Sent: Thursday, September 4, 2014 4:24 AM
To: OpenStack Development
Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt 
drivers

Position statement
==

Over the past year I've increasingly come to the conclusion that Nova is 
heading for (or probably already at) a major crisis. If steps are not taken to 
avert this, the project is likely to lose a non-trivial amount of talent, both 
regular code contributors and core team members. That includes myself. This is 
not good for Nova's long term health and so should be of concern to anyone 
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that 
the nova-core team is an unfixable bottleneck in our development process with 
our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out of 
tree and let them all have their own core teams in their area of code, leaving 
current nova core to focus on all the common code outside the virt driver 
impls. I nonetheless urge people to read the whole mail.


Background information
==

I see many factors coming together to form the crisis

 - Burn out of core team members from over work
 - Difficulty bringing new talent into the core team
 - Long delay in getting code reviewed & merged
 - Marginalization of code areas which aren't popular
 - Increasing size of nova code through new drivers
 - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big 
problem.

Core team burn out
--

Having been involved in Nova for several dev cycles now, it is clear that the 
backlog of code up for review never goes away. Even intensive code review 
efforts at various points in the dev cycle make only a small impact on the 
backlog. This has a pretty significant impact on core team members, as their 
work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in 
a more efficient manner than plain gerrit allows for. These certainly help, but 
they can't ever solve the problem on their own - just make it slightly more 
bearable. And this is not even considering that core team members might have 
useful contributions to make in ways beyond just code review. Ultimately the 
workload is just too high to sustain the levels of review required, so core 
team members will eventually burn out (as they have done many times already).

Even if one person attempts to take the initiative to heavily invest in review 
of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag team' it is hard 
for one person to make a difference. The end result is that a patch is +2d and 
then sits idle for weeks or more until a merge conflict requires it to be 
reposted at which point even that one +2 is lost. This is a pretty demotivating 
outcome for both reviewers & the patch contributor.


New core team talent


It can't escape attention that the Nova core team does not grow in size very 
often. When Nova was younger and its code base was smaller, it was easier for 
contributors to get onto core because the base level of knowledge required was 
that much smaller. To get onto core today requires a major investment in 
learning Nova over a year or more. Even people who potentially have the latent 
skills may not have the time available to invest in learning the entirety of 
Nova.

With the number of reviews proposed to Nova, the core team should probably be 
at least double its current size[1]. There is plenty of expertise in the 
project as a whole but it is typically focused into specific areas of the 
codebase. There is nowhere we can find 20 more people with broad knowledge of 
the codebase who could be promoted even over the next year, let alone today. 
This is ignoring that many existing members of core are relatively inactive due 
to burnout and so need replacing. That means we really need another 25-30 
people for core. That's not going to happen.


Code review delays

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Solly Ross
 My only question is about the need to separate out each virt driver into a 
 separate project, wouldn't you 
 accomplish a lot of the benefit by creating a single virt project that 
 includes all of the drivers?

I don't think there's particularly a *point* to having all drivers in one repo. 
Part of code review is looking for code gotchas, but part of code review is 
looking for subtle issues that are caused by the very nature of the driver.  A 
HyperV core reviewing a libvirt change should certainly be able to provide 
the former, but most likely cannot provide the latter to a sufficient degree 
(if he or she can, then he or she should be a libvirt core as well).

A strong +1 to Dan's proposal.  I think this would also make it easier for 
non-core reviewers to get started reviewing, without having a specialized tool 
setup.

Best Regards,
Solly Ross

P.S. 
This is a crisis. A large crisis. In fact, if you got a moment, it's
 a twelve-storey crisis with a magnificent entrance hall, carpeting
 throughout, 24-hour portage, and an enormous sign on the roof,
 saying 'This Is a Large Crisis'. A large crisis requires a large
 plan.

Ha!


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


On 04/09/2014 15:36, Gary Kotton wrote:

Hi,
I do not think that Nova is in a death spiral. I just think that the
current way of working is strangling the project. I do not
understand why we need to split drivers out of the core project. Why not
have the ability to provide 'core review' status to people for reviewing
those parts of the code? We have enough talented people in OpenStack to be
able to write a driver above gerrit to enable that.
Fragmenting the project will be very unhealthy.
For what it is worth, having a release date at the end of a vacation is
really bad. Look at the numbers:
http://stackalytics.com/report/contribution/nova-group/30
Thanks
Gary


From my perspective, the raw number of reviews should not be the only 
metric for deciding whether someone would make a good core. Indeed, it is 
quite easy to provide comments on cosmetic issues, but when a patch gets a 
-1 from a core, it is mostly because of a more important design issue or 
because it runs counter to another ongoing effort.



Also, I note that the Stackalytics metrics are *really* different from 
those of other tools like 
http://russellbryant.net/openstack-stats/nova-reviewers-30.txt


As a non-core person, I can just say that a core must at least be 
present during Nova meetings and voice their opinions, provide some help 
with the gate status, look at bugs, give feedback to newcomers etc. and 
not just click on -1 or +1.



Here, the problem is that the core team is not scalable: I don't want 
to give examples of governments, but just adding more people is often 
not the solution. Instead, delegating to subteams seems like a possible 
intermediate solution, as it would let the core team only give the final 
approval and leave the subteam's half-cores to review the iterations 
until they consider the patch good enough to be merged.


Of course, nova cores could still bypass half-cores, as they have 
knowledge of the whole of Nova, or they could reject what the half-cores 
agreed, but that would free a lot of time for cores without giving them 
more bureaucracy.



I really like Dan's proposal of splitting code into different repos with 
separate teams and a single PTL (that's exactly the difference 
between a Program and a Project), but as it requires some prework, I'm 
just thinking of allocating half-cores as a short-term solution until all 
the bits are sorted out.


And yes, there is urgency, I also felt the pain.

-Sylvain




Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Matt Riedemann



On 9/4/2014 9:57 AM, Daniel P. Berrange wrote:

On Thu, Sep 04, 2014 at 02:33:27PM +, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned,
a side effect of our effort to split out the scheduler will help
but not solve this problem).


Thanks for taking the time to read & give feedback


My only question is about the need to separate out each virt driver
into a separate project, wouldn't you accomplish a lot of the
benefit by creating a single virt project that includes all of the
drivers?  I wouldn't necessarily expect a VMware guy to understand
the specifics of the HyperV implementation but both people should
understand what a virt driver does, how it interfaces to Nova and
they should be able to intelligently review each other's code.


A single repo for virt drivers would have all the same costs of
separating from nova common, but with fewer of the benefits of
separate repos per driver. IOW, if we're going to split the
virt drivers out from the nova common, then we should go all
the way.

I think separate driver repos are fairly compelling for a
number of reasons besides just core team size. As mentioned
elsewhere it allows better targeting of CI test jobs. ie a
VMware CI job can be easily made gating for only VMware code
changes. So VMware CI instability won't affect libvirt code
submissions, and libvirt CI instability won't affect VMware
code submissions. Separate repos means that people starting
off a new driver (like Ironic or Docker) would not have to
immediately meet the same very high quality & testing bar
that existing drivers do. They can evolve at their own pace
and not have to then undergo the disruption of jumping from
their initial repo to the 'official' repo.  Finally, I would
like each driver's team to be isolated from the others in terms
of code review capacity planning as far as practical - ie the
libvirt team should be able to accept as many libvirt features
as they can handle without being concerned that they'll reduce
what vmware is able to accept (though changes involving the
nova common code would obviously still contend).



Position statement
==================

Over the past year I've increasingly come to the conclusion that Nova is 
heading for (or probably already at) a major crisis. If steps are not taken to 
avert this, the project is likely to lose a non-trivial amount of talent, both 
regular code contributors and core team members. That includes myself. This is 
not good for Nova's long term health and so should be of concern to anyone 
involved in Nova and OpenStack.

For those who don't want to read the whole mail, the executive summary is that 
the nova-core team is an unfixable bottleneck in our development process with 
our current project structure.
The only way I see to remove the bottleneck is to split the virt drivers out of 
tree and let them all have their own core teams in their area of code, leaving 
current nova core to focus on all the common code outside the virt driver 
impls. I nonetheless urge people to read the whole mail.


Background information
======================

I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

Each item on its own may not seem too bad, but combined they add up to a big 
problem.

Core team burn out
------------------

Having been involved in Nova for several dev cycles now, it is clear that the 
backlog of code up for review never goes away. Even intensive code review 
efforts at various points in the dev cycle make only a small impact on the 
backlog. This has a pretty significant impact on core team members, as their 
work is never done. At best, the dial is sometimes set to 10, instead of 11.

Many people, myself included, have built tools to help deal with the reviews in 
a more efficient manner than plain gerrit allows for. These certainly help, but 
they can't ever solve the problem on their own - just make it slightly more 
bearable. And this is not even considering that core team members might have 
useful contributions to make in ways beyond just code review. Ultimately the 
workload is just too high to sustain the levels of review required, so core 
team members will eventually burn out (as they have done many times already).

Even if one person attempts to take the initiative to heavily invest in review 
of certain features it is often to no avail.
Unless a second dedicated core reviewer can be found to 'tag team' it is hard for 
one person to make a difference. The end result is that a patch is +2d and then 
sits idle for weeks or more until a merge conflict requires it to be reposted at 
which point even that one +2 is lost. This is a pretty 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


Le 04/09/2014 17:00, Solly Ross a écrit :

My only question is about the need to separate out each virt driver into a 
separate project, wouldn't you
accomplish a lot of the benefit by creating a single virt project that includes 
all of the drivers?

I don't think there's particularly a *point* to having all drivers in one repo.  Part of code review is 
looking for code gotchas, but part of code review is looking for subtle issues that are caused by 
the very nature of the driver.  A HyperV core reviewing a libvirt change should certainly be able 
to provide the former, but most likely cannot provide the latter to a sufficient degree (if he or she can, 
then he or she should be a libvirt core as well).

A strong +1 to Dan's proposal.  I think this would also make it easier for 
non-core reviewers to get started reviewing, without having a specialized tool 
setup.


As I said previously, I'm also giving a +1 to this proposal. That said, 
as I think it will take at least one iteration to get this done 
(look at the scheduler split and how long we've been working on it), I 
also think we need a short-term solution like the one proposed by 
Thierry, ie. what I call half-cores - people who help review a 
code area, freeing up cores' time so they only need to approve instead of 
following each iteration.


-Sylvain



Best Regards,
Solly Ross

P.S.

This is a crisis. A large crisis. In fact, if you got a moment, it's
a twelve-storey crisis with a magnificent entrance hall, carpeting
throughout, 24-hour portage, and an enormous sign on the roof,
saying 'This Is a Large Crisis'. A large crisis requires a large
plan.

Ha!

- Original Message -

From: Donald D Dugger donald.d.dug...@intel.com
To: Daniel P. Berrange berra...@redhat.com, OpenStack Development Mailing List 
(not for usage questions)
openstack-dev@lists.openstack.org
Sent: Thursday, September 4, 2014 10:33:27 AM
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out   
virt drivers

Basically +1 with what Daniel is saying (note that, as mentioned, a side
effect of our effort to split out the scheduler will help but not solve this
problem).

My only question is about the need to separate out each virt driver into a
separate project, wouldn't you accomplish a lot of the benefit by creating a
single virt project that includes all of the drivers?  I wouldn't
necessarily expect a VMware guy to understand the specifics of the HyperV
implementation but both people should understand what a virt driver does,
how it interfaces to Nova and they should be able to intelligently review
each other's code.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Daniel P. Berrange [mailto:berra...@redhat.com]
Sent: Thursday, September 4, 2014 4:24 AM
To: OpenStack Development
Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting out
virt drivers


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 10:18:04AM -0500, Matt Riedemann wrote:
 
 
   - Changes submitted to nova common code would trigger running of CI
 tests against the external virt drivers. Each virt driver core team
 would decide whether they want their driver to be tested upon Nova
 common changes. Expect that all would choose to be included to the
 same extent that they are today. So level of validation of nova code
 would remain at least at current level. I don't want to reduce the
 amount of code testing here since that's contrary to the direction
 we're taking wrt testing.
 
   - Changes submitted to virt drivers would trigger running CI tests
 that are applicable. eg changes to libvirt driver repo would not
 involve running database migration tests, since all database code
 is isolated in nova. libvirt changes would not trigger vmware,
 xenserver, ironic, etc CI systems. Virt driver changes should
 see fewer false positives in the tests as a result, and those
 that do occur should be more explicitly related to the code being
 proposed. eg a change to vmware is not going to trigger a tempest
 run that uses libvirt, so non-deterministic failures in libvirt
 will no longer plague vmware developers reviews. This would also
 make it possible for VMWare CI to be made gating for changes to
 the VMWare virt driver repository, without negatively impacting
 other virt drivers. So this change should increase testing quality
 for non-libvirt virt drivers and reduce pain of false failures
 for everyone.

[snip]

 Even if we split the virt drivers out, libvirt would still be the default in
 the Tempest gate runs right?

Yes, what I'm calling the nova common repository would still need to
have a tempest job that was gating on at least one virt driver as a
sanity check. As mentioned above, I'd pretty much expect that all
current tempest jobs for nova common code would continue unchanged.
IOW, a libvirt job would still be gating, and there'd still be a
number of 3rd party CIs for other virt drivers non-gating too.

The only change in testing jobs would be wrt the new git repos for
the individual virt drivers. Those would be only running jobs directly
related to the code in those repos. ie vmware is tested by a vmware CI
job and libvirt is tested by a libvirt CI job.
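
As a rough illustration of the job selection described above, here is a
minimal Python sketch. Everything in it is hypothetical: the repo names,
the job names and the jobs_for_change() helper are made up for the
example and do not correspond to any real CI configuration.

```python
# Hypothetical repo and job names, for illustration only.
NOVA_COMMON = "nova"

# Jobs each virt driver repo would run (and gate on) for its own changes.
DRIVER_JOBS = {
    "nova-libvirt": ["tempest-libvirt"],
    "nova-vmware": ["vmware-ci"],
    "nova-xen": ["xenserver-ci"],
}

# Jobs that gate changes to nova common; a libvirt tempest job stays
# gating as the sanity check, other driver CIs only report.
NOVA_COMMON_GATING = {"unit-tests", "db-migration-tests", "tempest-libvirt"}


def jobs_for_change(repo):
    """Return a {job_name: is_gating} mapping for a change to ``repo``."""
    if repo == NOVA_COMMON:
        jobs = {job: True for job in NOVA_COMMON_GATING}
        # Driver CIs still run on nova common changes, but non-gating
        # unless already listed as gating above.
        for driver_jobs in DRIVER_JOBS.values():
            for job in driver_jobs:
                jobs.setdefault(job, False)
        return jobs
    if repo in DRIVER_JOBS:
        # A driver repo only runs the jobs for that driver.
        return {job: True for job in DRIVER_JOBS[repo]}
    raise ValueError("unknown repo: %s" % repo)
```

With this layout a vmware change never triggers the libvirt tempest job,
while a nova common change still runs every driver's CI.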

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Vladik Romanovsky
+1 

I very much agree with Dan's proposal.

I am concerned about the difficulties we will face with merging
patches that spread across various regions: manager, conductor, scheduler,
etc.
However, I think this is a small price to pay for having more focused teams.

IMO, we will still have to pay it the moment the scheduler separates.

Regards,
Vladik



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 01:36:04PM +, Gary Kotton wrote:
 Hi,
 I do not think that Nova is in a death spiral. I just think that the
 current way of working at the moment is strangling the project. I do not
 understand why we need to split drivers out of the core project. Why not
 have the ability to provide 'core review' status to people for reviewing
 those parts of the code? We have enough talented people in OpenStack to be
 able to write a driver above gerrit to enable that.

The consensus view at the summit was that, having tried & failed at getting
useful changes into gerrit, it is not a viable option unless we undertake a
permanent fork of the code base. There didn't seem to be any appetite for
maintaining & developing a large java app ourselves. So people were looking
to start writing a replacement for gerrit from scratch (albeit reusing the
database schema).

Even if we did have such fine grained permissioning in gerrit or another
review tool, I'd still suggest a split because this is about more than just
the review team size. There are a number of other compelling benefits to
having fully separate drivers I've mentioned in the original thread & other
replies here.

 Fragmenting the project will be very unhealthy.

On the contrary, I think it will re-invigorate the project. The other
historical cases where OpenStack projects have split out code have
resulted in a pretty significant benefit for all involved. The testing
frameworks we have will help ensure that the virt drivers continue to
provide consistent semantics, just as they do today, and any eventual
openstack trademark certifications would re-inforce that. Improving
the specification of the virt driver interface by introducing more
objects and killing undocumented dict usage will also further help
in keeping virt drivers aligned.

 On 9/4/14, 3:59 PM, Thierry Carrez thie...@openstack.org wrote:
 
 Like I mentioned before, I think the only way out of the Nova death
 spiral is to split code and give control over it to smaller dedicated
 review teams. This is one way to do it. Thanks Dan for pulling this
 together :)
 
 A couple comments inline:
 
 Daniel P. Berrange wrote:
  [...]
  This is a crisis. A large crisis. In fact, if you got a moment, it's
  a twelve-storey crisis with a magnificent entrance hall, carpeting
  throughout, 24-hour portage, and an enormous sign on the roof,
  saying 'This Is a Large Crisis'. A large crisis requires a large
  plan.
  [...]
 
 I totally agree. We need a plan now, because we can't go through another
 cycle without a solution in sight.
 
  [...]
  This has quite a few implications for the way development would
  operate.
  
   - The Nova core team at least, would be voluntarily giving up a big
 amount of responsibility over the evolution of virt drivers. Due
 to human nature, people are not good at giving up power, so this
 may be painful to swallow. Realistically current nova core are
 not experts in most of the virt drivers to start with, and more
 important we clearly do not have sufficient time to do a good job
 of review with everything submitted. Much of the current need
 for core review of virt drivers is to prevent the mis-use of a
 poorly defined virt driver API...which can be mitigated - See
 later point(s)
  
   - Nova core would/should not have automatic +2 over the virt driver
 repositories since it is unreasonable to assume they have the
 suitable domain knowledge for all virt drivers out there. People
 would of course be able to be members of multiple core teams. For
 example John G would naturally be nova-core and nova-xen-core. I
 would aim for nova-core and nova-libvirt-core, and so on. I do not
 want any +2 responsibility over VMWare/HyperV/Docker drivers since
 they're not my area of expertise - I only look at them today because
 they have no other nova-core representation.
  
   - Not sure if it implies the Nova PTL would be solely focused on
 Nova common. eg would there continue to be one PTL over all virt
 driver implementation projects, or would each project have its
 own PTL. Maybe this is irrelevant if a Czars approach is chosen
 by virt driver projects for their work. I'd be inclined to say
 that a single PTL should stay as a figurehead to represent all
 the virt driver projects, acting as a point of contact to ensure
 we keep communication / co-operation between the drivers in sync.
  [...]
 
 At this point it may look like our current structure (programs, one PTL,
 single core teams...) prevents us from implementing that solution. I
 just want to say that in OpenStack, organizational structure reflects
 how we work, not the other way around. If we need to reorganize
 official project structure to work in smarter and long-term healthy
 ways, that's a really small price to pay.
 
 -- 
 Thierry Carrez (ttx)
 
 ___
 OpenStack-dev mailing 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Dugger, Donald D
Actually, I think Sylvain's point is even stronger as I don't think splitting 
the virt drivers out of Nova is a complete fix.  It may solve the review 
latency for the virt driver area but, unless virt drivers are the bulk of Nova 
patches, the Nova core team will still be swamped with review requests.  Some 
solution, maybe half-cores, will still be needed for Nova long term.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-Original Message-
From: Sylvain Bauza [mailto:sba...@redhat.com] 
Sent: Thursday, September 4, 2014 9:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out 
virt drivers


Le 04/09/2014 17:00, Solly Ross a écrit :
 My only question is about the need to separate out each virt driver 
 into a separate project, wouldn't you accomplish a lot of the benefit by 
 creating a single virt project that includes all of the drivers?
 I don't think there's particularly a *point* to having all drivers in one 
 repo.  Part of code review is looking for code gotchas, but part of code 
 review is looking for subtle issues that are caused by the very nature of the 
 driver.  A HyperV core reviewing a libvirt change should certainly be able 
 to provide the former, but most likely cannot provide the latter to a 
 sufficient degree (if he or she can, then he or she should be a libvirt 
 core as well).

 A strong +1 to Dan's proposal.  I think this would also make it easier for 
 non-core reviewers to get started reviewing, without having a specialized 
 tool setup.

As I said previously, I'm also giving a +1 to this proposal. That said, as I 
think it will take at least one iteration to get this done (look at the 
scheduler split and how long we've been working on it), I also think we need a 
short-term solution like the one proposed by Thierry, ie. what I call 
half-cores - people who help review a code area, freeing up cores' time so 
they only need to approve instead of following each iteration.

-Sylvain


 Best Regards,
 Solly Ross

 P.S.
 This is a crisis. A large crisis. In fact, if you got a moment, it's 
 a twelve-storey crisis with a magnificent entrance hall, carpeting 
 throughout, 24-hour portage, and an enormous sign on the roof, saying 
 'This Is a Large Crisis'. A large crisis requires a large plan.
 Ha!

 - Original Message -
 From: Donald D Dugger donald.d.dug...@intel.com
 To: Daniel P. Berrange berra...@redhat.com, OpenStack Development 
 Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org
 Sent: Thursday, September 4, 2014 10:33:27 AM
 Subject: Re: [openstack-dev] [nova] Averting the Nova crisis by splitting 
 out virt drivers

 Basically +1 with what Daniel is saying (note that, as mentioned, a 
 side effect of our effort to split out the scheduler will help but 
 not solve this problem).

 My only question is about the need to separate out each virt driver 
 into a separate project, wouldn't you accomplish a lot of the benefit 
 by creating a single virt project that includes all of the drivers?  
 I wouldn't necessarily expect a VMware guy to understand the 
 specifics of the HyperV implementation but both people should 
 understand what a virt driver does, how it interfaces to Nova and 
 they should be able to intelligently review each other's code.

 --
 Don Dugger
 Censeo Toto nos in Kansa esse decisse. - D. Gale
 Ph: 303/443-3786

 -Original Message-
 From: Daniel P. Berrange [mailto:berra...@redhat.com]
 Sent: Thursday, September 4, 2014 4:24 AM
 To: OpenStack Development
 Subject: [openstack-dev] [nova] Averting the Nova crisis by splitting 
 out virt drivers


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote:
 Actually, I think Sylvain's point is even stronger as I don't think
 splitting the virt drivers out of Nova is a complete fix.  It may
 solve the review latency for the virt driver area but, unless virt
 drivers are the bulk of Nova patches, the Nova core team will still
 be swamped with review requests.  Some solution, maybe half-cores,
 will still be needed for Nova long term.

Absolutely, nova core will still have an awful lot of work to do
and will need to have fresh blood. The split will free up some %
of existing cores' time though as there's certainly plenty of virt
driver only patches going through merge that are taking up non
negligible review time. eg I've done loads of review on vmware
only code which I'd be relieved of with vmware maintainers able
to form their own review core for their driver. There is also the
fact that people are holding back on even submitting code for
many drivers because they know it'll never get reviewed. So the
proportion of virt driver only code is likely to be higher than
what we currently see on review today.


Regards,
Daniel


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Duncan Thomas
On 4 September 2014 16:00, Solly Ross sr...@redhat.com wrote:
 My only question is about the need to separate out each virt driver into a 
 separate project, wouldn't you
 accomplish a lot of the benefit by creating a single virt project that 
 includes all of the drivers?

 I don't think there's particularly a *point* to having all drivers in one 
 repo.  Part of code review is looking for code gotchas, but part of code 
 review is looking for subtle issues that are caused by the very nature of the 
 driver.  A HyperV core reviewing a libvirt change should certainly be able 
 to provide the former, but most likely cannot provide the latter to a 
 sufficient degree (if he or she can, then he or she should be a libvirt 
 core as well).

I think that having a shared review team across all of the drivers has
definite benefits in terms of coherency and consistency - it is very
easy for experts on one technology to become tunnel-visioned on some
points and miss the wider, cross project picture. A common drivers
team is likely to have a broad enough range of opinions to keep things
healthy, compared to one repo (and team) per driver, and also they are
able to speak collectively to the core nova team, which helps set
priorities there when they need to be influenced on behalf of the
drivers team.

TLDR: I don't think there's particularly a point to splitting out the
drivers into individual repos, and much to be gained from keeping them
all in one (but still breaking them out of nova)



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Sylvain Bauza


Le 04/09/2014 17:57, Daniel P. Berrange a écrit :

On Thu, Sep 04, 2014 at 03:49:26PM +, Dugger, Donald D wrote:

Actually, I think Sylvain's point is even stronger as I don't think
splitting the virt drivers out of Nova is a complete fix.  It may
solve the review latency for the virt driver area but, unless virt
drivers are the bulk of Nova patches, the Nova core team will still
be swamped with review requests.  Some solution, maybe half-cores,
will still be needed for Nova long term.

Absolutely, nova core will still have an awful lot of work to do
and will need to have fresh blood. The split will free up some %
of existing cores' time though as there's certainly plenty of virt
driver only patches going through merge that are taking up non
negligible review time. eg I've done loads of review on vmware
only code which I'd be relieved of with vmware maintainers able
to form their own review core for their driver. There is also the
fact that people are holding back on even submitting code for
many drivers because they know it'll never get reviewed. So the
proportion of virt driver only code is likely to be higher than
what we currently see on review today.



I totally understand your point and I agree with it. I'm just thinking 
that for Kilo and Lxxx, we also need to experiment with some half-core teams 
in order to free up your review duty, at least until the virt code is 
split out correctly.


On a side note, assuming I'm a non-core (so you can just throw away my 
advice), I don't think the runway/slot proposal for Kilo will increase 
the reviewing bandwidth as it will just create another layer of 
prioritization without addressing the velocity. In other words, just 
creating a Scrum sprint with 2 people and doing planning poker doesn't 
mean you can get through two months' worth of work.


-Sylvain


Regards,
Daniel





Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Daniel P. Berrange
On Thu, Sep 04, 2014 at 05:11:22PM +0100, Duncan Thomas wrote:
 On 4 September 2014 16:00, Solly Ross sr...@redhat.com wrote:
  My only question is about the need to separate out each virt driver into a 
  separate project, wouldn't you
  accomplish a lot of the benefit by creating a single virt project that 
  includes all of the drivers?
 
  I don't think there's particularly a *point* to having all drivers in one 
  repo.  Part of code review is looking for code gotchas, but part of code 
  review is looking for subtle issues that are caused by the very nature of 
  the driver.  A HyperV core reviewing a libvirt change should certainly be 
  able to provide the former, but most likely cannot provide the latter to a 
  sufficient degree (if he or she can, then he or she should be a libvirt 
  core as well).
 
 I think that having a shared review team across all of the drivers has
 definite benefits in terms of coherency and consistency - it is very
 easy for experts on one technology to become tunnel-visioned on some
 points and miss the wider, cross project picture. A common drivers
 team is likely to have a broad enough range of opinions to keep things
 healthy, compared to one repo (and team) per driver, and also they are
 able to speak collectively to the core nova team, which helps set
 priorities there when they need to be influenced on behalf of the
 drivers team.

If people are interested in reviewing all the driver code there's nothing
preventing them doing that. It is easy to setup gerrit to notify you on
changes across many drivers if you have that desire, or to write scripts
to query gerrit too. Realistically though, even today most people working
on a virt driver totally ignore the other virt drivers and so separating
them isn't going to make things significantly worse in that regard.
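
As a rough sketch of the kind of script meant here, the following Python
builds a query against gerrit's REST /changes/ endpoint covering several
driver repos and parses the reply. The per-driver project names are
hypothetical (no such repos exist yet), and the base URL is an assumption
about where the gerrit instance lives; only the "status:" / "project:"
search operators and the ")]}'" prefix on JSON replies are standard gerrit
behaviour.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the gerrit instance; adjust as needed.
GERRIT_URL = "https://review.openstack.org"


def build_query_url(projects, base_url=GERRIT_URL):
    """Build a /changes/ query URL for open changes in any of ``projects``."""
    project_terms = " OR ".join("project:%s" % p for p in projects)
    query = "status:open (%s)" % project_terms
    return "%s/changes/?%s" % (base_url, urlencode({"q": query}))


def parse_changes(body):
    """Parse a gerrit JSON reply, dropping the leading ")]}'" guard line."""
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)
```

Fetching the URL (for example with urllib.request.urlopen) and passing the
decoded body to parse_changes() would give a list of change records across
all the listed driver repos.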

 TLDR: I don't think there's particularly a point to splitting out the
 drivers into individual repos, and much to be gained from keeping them
 all in one (but still breaking them out of nova)

There are significant benefits in the way we can test and gate changes
by having separate repos. It also ensures that the workload for changes
for one driver doesn't impact on the workload of changes for another
driver which is a very real problem today. It also ensures that any
new drivers can start off on a level playing field wrt existing drivers
and not have to jump over a huge initial bar to get into the official
repo. So there is a great deal of benefit to having one repo per
driver.

Regards,
Daniel


Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:

+1

I very much agree with Dan's proposal.

I am concerned about the difficulties we will face with merging
patches that spread across various regions: manager, conductor, scheduler,
etc.
However, I think this is a small price to pay for having more focused teams.

IMO, we will still have to pay it the moment the scheduler separates.


There will be more pain the moment the scheduler separates, IMO, 
especially with its current design and interfaces.


-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Alessandro Pilotti
Hi all,

This is an issue that has been discussed quite a few times. As I feared, the
bottleneck effect is getting worse with each release.

Nova simply grew too much and even though features like networking and block
storage have been spun off at some point in time, it still lacks the cohesion
necessary for a successful long term lifecycle, or in other terms, it’s just
too big to be properly maintained by a handful of amazing and overworked people.

Compute drivers are easy to identify as decoupled sub-projects and are among
those which suffer most from the lack of an independent development
process. 

Nova is a mature project (at least relative to the OpenStack context) and as
such new features and bug fixes need to go through a very thorough screening and
review before being approved and merged, which does not work well with
sub-projects that need to grow faster, especially when introduced later in the
lifecycle (e.g. the current Hyper-V driver, introduced in Folsom) or when being
pushed by more aggressive market requirements.

Just as an example, only 3 out of 8 Hyper-V blueprint specs have been approved
and implemented in Juno; the rest will simply get bumped to Kilo, which means
that new additional specs will need to be bumped to L and so on, introducing
further delays. We ended up prioritizing feature parity blueprints, delaying
almost anything else.

The time bug fixes take to land in stable releases is another issue for the user
base, since merging in master takes a long time and backporting requires another
long review process, e.g. more than four months in some cases [1].
As a result we ended up releasing the fixes in a project fork that became our de
facto stable release in place of upstream, while waiting for upstream merge.

We never experienced similar issues in smaller projects like Neutron, Cinder,
Ceilometer or Horizon where we are involved as well, which can be a practical
example of the potential benefits of splitting Nova.

OpenStack has a clear process for incubation, letting new projects grow as fast
as they need during their youth and integrating them into core only when a
mature stage is reached [2]. Unfortunately this process applies to projects, but
not to subprojects (Hyper-V and VMware drivers in particular, but not only),
resulting in a much slower development pace compared to what a project led by an
independent team could have allowed. On the other hand, Docker is an example of
a driver going the StackForge way, but its ultimate potential inclusion in Nova
will just increase the current pain points.

From a Hyper-V team perspective, in the late Havana cycle the same reasons
highlighted in this thread almost led us to ask for removal of the driver from
Nova in order to improve our development process, even at the cost of the
subsequent fall from (core) grace and StackForge incubation Purgatory period, so
I’m definitely happy that the conversation has been resumed with a bigger
consensus.

The main factor that blocked the Hyper-V driver’s exit from Nova was the
introduction of the Hyper-V CI during the same cycle. Regressions are a very
sensitive topic when you run OpenStack components on an operating system which
is not Linux and the CI helped a lot in blocking or discovering issues in a
timely fashion. Besides that, the size of the Hyper-V team increased considerably
during Icehouse and Juno [3], so the Hyper-V CI became a mandatory and almost
irreplaceable tool in our review process, leading us to reach an excellent level
of stability of the driver on every supported version of Hyper-V (and
progressive CI voting stability as well, but that’s another topic [4]).

This means that if we reach a point in which we agree to spin off the drivers in
separate core projects, we need to consider how driver related CIs will still be
included in the Nova review process, possibly with voting rights when the
individual CI stability allows it. Having each third party CI vote only on
its spin-off driver project is not an option IMO, as it won’t catch regressions
introduced in Nova that affect the drivers, including race conditions [5].

An interesting area of discussion is who is going to be part of the initial core
teams for each new subproject. I truly appreciated the experience and help of
the Nova core guys, so in order to allow a smoother transition I’d suggest
having for each new project (e.g. nova-compute-hyperv, nova-compute-vmware, etc.)
an initial core team consisting of one or two members of the current Nova
sub-team and one Nova core, with ideally each patch reviewed by both the domain
experts and the Nova core. The team could then go on its way by voting its own
members as any other OpenStack project does.

Another point of discussion is the stabilization and documentation of the driver
interface. There are simply too many areas where the behavior between drivers
differs, and looking at some other driver’s behavior was in too many cases the
only source of 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Kyle Mestery
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange berra...@redhat.com wrote:
 Position statement
 ==================

 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.

 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I nonetheless
 urge people to read the whole mail.

As others have said, thanks for writing this up Daniel.


 Background information
 ======================

 I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

 Each item on their own may not seem too bad, but combined they
 add up to a big problem.

 Core team burn out
 ------------------

 Having been involved in Nova for several dev cycles now, it is clear
 that the backlog of code up for review never goes away. Even
 intensive code review efforts at various points in the dev cycle
 make only a small impact on the backlog. This has a pretty
 significant impact on core team members, as their work is never
 done. At best, the dial is sometimes set to 10, instead of 11.

 Many people, myself included, have built tools to help deal with
 the reviews in a more efficient manner than plain gerrit allows
 for. These certainly help, but they can't ever solve the problem
 on their own - just make it slightly more bearable. And this is
 not even considering that core team members might have useful
 contributions to make in ways beyond just code review. Ultimately
 the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they
 have done many times already).

 Even if one person attempts to take the initiative to heavily
 invest in review of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag
 team' it is hard for one person to make a difference. The end
 result is that a patch is +2d and then sits idle for weeks or
 more until a merge conflict requires it to be reposted at which
 point even that one +2 is lost. This is a pretty demotivating
 outcome for both reviewers & the patch contributor.


 New core team talent
 --------------------

 It can't escape attention that the Nova core team does not grow
 in size very often. When Nova was younger and its code base was
 smaller, it was easier for contributors to get onto core because
 the base level of knowledge required was that much smaller. To
 get onto core today requires a major investment in learning Nova
 over a year or more. Even people who potentially have the latent
 skills may not have the time available to invest in learning the
 entirety of Nova.

 With the number of reviews proposed to Nova, the core team should
 probably be at least double its current size[1]. There is plenty of
 expertise in the project as a whole but it is typically focused
 into specific areas of the codebase. There is nowhere we can find
 20 more people with broad knowledge of the codebase who could be
 promoted even over the next year, let alone today. This is ignoring
 that many existing members of core are relatively inactive due to
 burnout and so need replacing. That means we really need another
 25-30 people for core. That's not going to happen.


 Code review delays
 ------------------

 The obvious result of having too much work for too few reviewers
 is that code contributors face major delays in getting their work
 reviewed and merged. From personal experience, during Juno, I've
 probably spent 1 week in aggregate on actual code development vs
 8 weeks on waiting on code review. You have to constantly be on
 alert for review comments because unless you can respond quickly
 (and repost) while you still have the attention of the reviewer,
 they may not look again for days/weeks.

 The length of time to get work merged serves as a demotivator to
 actually do work in the first place. I've personally avoided doing
 a lot of code refactoring & cleanup work that would improve the
 maintainability of the libvirt driver in the long 

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Joe Gordon
On Thu, Sep 4, 2014 at 3:24 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 Position statement
 ==================

 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.

 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I nonetheless
 urge people to read the whole mail.


 Background information
 ======================

 I see many factors coming together to form the crisis

  - Burn out of core team members from over work
  - Difficulty bringing new talent into the core team
  - Long delay in getting code reviewed & merged
  - Marginalization of code areas which aren't popular
  - Increasing size of nova code through new drivers
  - Exclusion of developers without corporate backing

 Each item on their own may not seem too bad, but combined they
 add up to a big problem.

 Core team burn out
 ------------------

 Having been involved in Nova for several dev cycles now, it is clear
 that the backlog of code up for review never goes away. Even
 intensive code review efforts at various points in the dev cycle
 make only a small impact on the backlog. This has a pretty
 significant impact on core team members, as their work is never
 done. At best, the dial is sometimes set to 10, instead of 11.

 Many people, myself included, have built tools to help deal with
 the reviews in a more efficient manner than plain gerrit allows
 for. These certainly help, but they can't ever solve the problem
 on their own - just make it slightly more bearable. And this is
 not even considering that core team members might have useful
 contributions to make in ways beyond just code review. Ultimately
 the workload is just too high to sustain the levels of review
 required, so core team members will eventually burn out (as they
 have done many times already).

 Even if one person attempts to take the initiative to heavily
 invest in review of certain features it is often to no avail.
 Unless a second dedicated core reviewer can be found to 'tag
 team' it is hard for one person to make a difference. The end
 result is that a patch is +2d and then sits idle for weeks or
 more until a merge conflict requires it to be reposted at which
 point even that one +2 is lost. This is a pretty demotivating
 outcome for both reviewers & the patch contributor.


 New core team talent
 --------------------

 It can't escape attention that the Nova core team does not grow
 in size very often. When Nova was younger and its code base was
 smaller, it was easier for contributors to get onto core because
 the base level of knowledge required was that much smaller. To
 get onto core today requires a major investment in learning Nova
 over a year or more. Even people who potentially have the latent
 skills may not have the time available to invest in learning the
 entirety of Nova.

 With the number of reviews proposed to Nova, the core team should
 probably be at least double its current size[1]. There is plenty of
 expertise in the project as a whole but it is typically focused
 into specific areas of the codebase. There is nowhere we can find
 20 more people with broad knowledge of the codebase who could be
 promoted even over the next year, let alone today. This is ignoring
 that many existing members of core are relatively inactive due to
 burnout and so need replacing. That means we really need another
 25-30 people for core. That's not going to happen.


 Code review delays
 ------------------

 The obvious result of having too much work for too few reviewers
 is that code contributors face major delays in getting their work
 reviewed and merged. From personal experience, during Juno, I've
 probably spent 1 week in aggregate on actual code development vs
 8 weeks on waiting on code review. You have to constantly be on
 alert for review comments because unless you can respond quickly
 (and repost) while you still have the attention of the reviewer,
 they may not look again for days/weeks.

 The length of time to get work merged serves as a demotivator to
 actually do work in the first place. I've personally avoided doing
 a lot of code refactoring & cleanup work that would improve the
 maintainability of the libvirt driver in the long term, because
 I can't face the battle to get it reviewed  

Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Russell Bryant
On 09/04/2014 06:24 AM, Daniel P. Berrange wrote:
 Position statement
 ==================
 
 Over the past year I've increasingly come to the conclusion that
 Nova is heading for (or probably already at) a major crisis. If
 steps are not taken to avert this, the project is likely to lose
 a non-trivial amount of talent, both regular code contributors and
 core team members. That includes myself. This is not good for
 Nova's long term health and so should be of concern to anyone
 involved in Nova and OpenStack.
 
 For those who don't want to read the whole mail, the executive
 summary is that the nova-core team is an unfixable bottleneck
 in our development process with our current project structure.
 The only way I see to remove the bottleneck is to split the virt
 drivers out of tree and let them all have their own core teams
 in their area of code, leaving current nova core to focus on
 all the common code outside the virt driver impls. I nonetheless
 urge people to read the whole mail.

Fantastic write-up.  I can't +1 enough the problem statement, which I
think you've done a nice job of framing.  We've taken steps to try to
improve this, but none of them have been big enough.  I feel we've
reached a tipping point.  I think many others do too, and several
proposals being discussed all seem rooted in this same core issue.

When it comes to the proposed solution, I'm +1 on that too, but part of
that is that it's hard for me to ignore the limitations placed on us by
our current review infrastructure (gerrit).

If we ignored gerrit for a moment, is a rapid increase in splitting out
components the ideal workflow?  Would we be better off finding a way to
finally just implement a model more like the Linux kernel with
sub-system maintainers and pull requests to a top-level tree?  Maybe.
I'm not convinced that a split of repos is obviously better.

You make some good arguments for why splitting has other benefits.
Besides, even if we weren't going to split them and instead wanted to
have separate branches, we'd have to take interface stability much more
seriously.   I think the work immediately needed overlaps quite a bit.

In any case, let's not get completely side-tracked on the end game workflow.
I am completely on board with the idea that we have to move to a model
that involves more than one team and spreading out the responsibility
further than we have thus far.

I don't think we can afford to wait much longer without drastic change,
so let's make it happen.

-- 
Russell Bryant



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 09:36 AM, Gary Kotton wrote:

Hi,
I do not think that Nova is in a death spiral. I just think that the
current way of working at the moment is strangling the project. I do not
understand why we need to split drivers out of the core project. Why not
have the ability to provide 'core review' status to people for reviewing
those parts of the code? We have enough talented people in OpenStack to be
able to write a driver above gerrit to enable that.


Clearly you have never looked at the Gerrit source code.

:)

-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Michael Still
On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange berra...@redhat.com wrote:

[Heavy snipping because of length]

 The radical (?) solution to the nova core team bottleneck is thus to
 follow this lead and split the nova virt drivers out into separate
 projects and delegate their maintenance to new dedicated teams.

  - Nova becomes the home for the public APIs, RPC system, database
persistence and the glue that ties all this together with the
virt driver API.

  - Each virt driver project gets its own core team and is responsible
for dealing with review, merge & release of their codebase.

I think this is the crux of the matter. We're not doing a great job of
landing code at the moment, because we can't keep up with the review
workload.

So far we've had two proposals mooted:

 - slots / runways, where we try to rate limit the number of things
we're trying to review at once to maintain focus
 - splitting all the virt drivers out of the nova tree

Splitting the drivers out of the nova tree does come at a cost -- we'd
need to stabilise and probably version the hypervisor driver
interface, and that will encourage more out of tree drivers, which
are things we haven't historically wanted to do. If we did this split,
I think we need to acknowledge that we are changing policy there. It
also means that nova-core wouldn't be the ones holding the quality bar
for hypervisor drivers any more. I guess this would open the door for
drivers to more actively compete on the quality of their
implementations, which might be a good thing.
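
As a rough illustration of what "stabilise and probably version the
hypervisor driver interface" could mean in practice, here is a minimal
Python sketch. Every name in it (VirtDriverBase, CORE_INTERFACE_VERSION,
is_compatible, load_driver, FakeDriver) is hypothetical and chosen only for
this example; none of it is taken from nova/virt/driver.py. The assumption is
a (major, minor) scheme where a major bump signals an incompatible change and
a minor bump only adds things.

```python
from abc import ABC, abstractmethod

# Hypothetical (major, minor) version of the driver interface spoken by
# the core tree. Not a real Nova constant.
CORE_INTERFACE_VERSION = (1, 2)


class VirtDriverBase(ABC):
    """Hypothetical stable base class an out-of-tree driver would subclass."""

    # Interface version the driver was written against.
    interface_version = (1, 0)

    @abstractmethod
    def spawn(self, instance_name: str) -> str:
        """Start an instance and return an identifier for it."""


def is_compatible(driver_version, core_version=CORE_INTERFACE_VERSION) -> bool:
    # Same major version: no incompatible changes in between.
    # Driver minor <= core minor: everything the driver expects exists.
    same_major = driver_version[0] == core_version[0]
    return same_major and driver_version[1] <= core_version[1]


def load_driver(driver_cls):
    # Refuse to load a driver built against an interface we cannot honour.
    if not is_compatible(driver_cls.interface_version):
        raise RuntimeError(
            "driver needs interface %s, core provides %s"
            % (driver_cls.interface_version, CORE_INTERFACE_VERSION)
        )
    return driver_cls()


class FakeDriver(VirtDriverBase):
    """Stand-in for an out-of-tree driver, for illustration only."""

    interface_version = (1, 1)

    def spawn(self, instance_name: str) -> str:
        return "fake-" + instance_name
```

With a check like this, a driver released against an older minor version
keeps loading after the core tree adds to the interface, while a driver
expecting a newer or incompatible interface fails at load time rather than
at some arbitrary later call.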

Both of these have interesting aspects, and I agree we need to do
_something_. I do wonder if there is a hybrid approach as well though.
For example, could we implement some sort of more formal lieutenant
system for drivers? We've talked about it in the past but never been
able to express how it would work in practise.

The last few days have been interesting as I watch FFEs come through.
People post explaining their feature, its importance, and the risk
associated with it. Three cores sign on for review. All of the ones
I've looked at have received active review since being posted. Would
it be bonkers to declare nova to be in permanent feature freeze? If
we could maintain the level of focus we see now, then we'd be getting
heaps more done than before.

These issues should very definitely be on the agenda for the design
summit, probably early in the week.

Michael

-- 
Rackspace Australia



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes

On 09/04/2014 10:33 AM, Dugger, Donald D wrote:

Basically +1 with what Daniel is saying (note that, as mentioned, a
side effect of our effort to split out the scheduler will help but
not solve this problem).


The difference between Dan's proposal and the Gantt split is that Dan's 
proposal features quite prominently the following:


== begin ==

 - The nova/virt/driver.py class would need to be much better
   specified. All parameters / return values which are opaque dicts
   must be replaced with objects + attributes. Completion of the
   objectification work is mandatory, so there is cleaner separation
   between virt driver impls & the rest of Nova.

== end ==

In other words, Dan's proposal above is EXACTLY what I've been saying 
needs to be done to the interfaces between nova-conductor, nova-compute, 
and nova-scheduler *before* any split of the scheduler code is even 
remotely feasible.
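
To make the "opaque dicts must be replaced with objects + attributes" point
concrete, here is a small Python sketch. It is only an illustration: the
InstanceSpec class and its fields are hypothetical and are not Nova's actual
object model. It assumes nothing beyond the standard library.

```python
from dataclasses import dataclass


# Hypothetical "before": the driver receives an opaque dict. Nothing
# documents which keys exist, and a missing key may go unnoticed.
def disk_gb_from_dict(instance: dict) -> int:
    return instance["root_gb"] + instance.get("ephemeral_gb", 0)


# Hypothetical "after": the same data as an object with declared attributes.
@dataclass(frozen=True)
class InstanceSpec:
    uuid: str
    root_gb: int
    ephemeral_gb: int = 0

    @property
    def total_disk_gb(self) -> int:
        return self.root_gb + self.ephemeral_gb


def disk_gb_from_object(spec: InstanceSpec) -> int:
    # The constructor rejects unknown field names with TypeError, so a
    # misspelt field is caught where the object is built, not in the driver.
    return spec.total_disk_gb
```

The declared fields are what makes the boundary reviewable: a change to
InstanceSpec is a visible change to the contract, whereas a new key in a
dict is not.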


Splitting the scheduler out before this is done would not "help but not
solve" this problem -- it would instead make the problem worse, IMO.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Jay Pipes



On 09/04/2014 12:11 PM, Duncan Thomas wrote:

I think that having a shared review team across all of the drivers
has definite benefits in terms of coherency and consistency - it is
very easy for experts on one technology to become tunnel-visioned on
some points and miss the wider, cross project picture. A common
drivers team is likely to have a broad enough range of opinions to
keep things healthy, compared to one repo (and team) per driver, and
also they are able to speak collectively to the core nova team, which
helps set priorities there when they need to be influenced on behalf
of the drivers team.


In theory, the above sounds good. In practice, it doesn't happen. The 
code in the virt drivers is horribly inconsistent, duplicative and yet 
slightly and pointlessly different, and uses paradigms that make sense 
for the one platform but don't necessarily make sense for another platform.


The testing/CI benefit that Dan highlighted -- in terms of patches to
non-related virt drivers not interfering with the stability and progress 
of a patch to another virt driver -- is the #1 critical benefit to Dan's 
proposal, and doing a single virt drivers core team and repo totally 
throws that benefit away.


Best,
-jay



Re: [openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread Russell Bryant


- Original Message -
 On 09/04/2014 11:32 AM, Vladik Romanovsky wrote:
  +1
 
  I very much agree with Dan's proposal.
 
  I am concerned about difficulties we will face with merging
  patches that spread across various regions: manager, conductor,
  scheduler, etc.
  However, I think this is a small price to pay for having more focused
  teams.
 
  IMO, we will still have to pay it the moment the scheduler separates.
 
 There will be more pain the moment the scheduler separates, IMO,
 especially with its current design and interfaces.

I absolutely agree that the scheduler split is a non-starter without 
stabilizing all of the relevant interfaces.  I hope there's not much debate on 
that high level point.  Of course, identifying exactly what those interfaces
should be is a bit more complicated, but I hope the focus can stay there.

-- 
Russell Bryant



[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

2014-09-04 Thread John Griffith
On Thu, Sep 4, 2014 at 4:32 PM, Jay Pipes jaypi...@gmail.com wrote:



 On 09/04/2014 12:11 PM, Duncan Thomas wrote:

 I think that having a shared review team across all of the drivers
 has definite benefits in terms of coherency and consistency - it is
 very easy for experts on one technology to become tunnel-visioned on
 some points and miss the wider, cross project picture. A common
 drivers team is likely to have a broad enough range of opinions to
 keep things healthy, compared to one repo (and team) per driver, and
 also they are able to speak collectively to the core nova team, which
 helps set priorities there when they need to be influenced on behalf
 of the drivers team.


 In theory, the above sounds good. In practice, it doesn't happen. The code
 in the virt drivers is horribly inconsistent, duplicative and yet slightly
 and pointlessly different, and uses paradigms that make sense for the one
 platform but don't necessarily make sense for another platform.

 The testing/CI benefit that Dan highlighted -- in terms of patches to
 non-related virt drivers not interfering with the stability and progress of
 a patch to another virt driver -- is the #1 critical benefit to Dan's
 proposal, and doing a single virt drivers core team and repo totally throws
 that benefit away.

 Best,
 -jay




Just some thoughts and observations I've had regarding this topic in Cinder
the past couple of years.  I realize this is a Nova thread so hopefully
some of this can be applied in a more general context.

TLDR:
1. I think moving drivers into their own repo is just shoveling the pile to
make a new pile (not really solving anything)

2. Removal of drivers other than the reference implementation for each
project could be the healthiest option
   a. Requires transparent, public, automated 3rd party CI
   b. Requires a TRUE plugin architecture and mentality
   c. Requires a stable and well defined API

3. While I'm still sort of a fan of the removal of drivers, I do think
Cinder is making it work. There have been missteps, and yes it's a pain
sometimes, but it's working OK and we've got plans to try and improve

4. Adding restrictions like drivers only in first milestone and more
intense scrutiny of features will go a long way to help resolve the
issues we do have currently

Now the long winded version with a little more detail and context:





I've spent a fair amount of time thinking about the explosive number of
drivers being added to Cinder over the past year or so.  I've been a pretty
vocal proponent of the idea of removing all drivers except the LVM
reference implementation from Cinder.  I'd rather see vendors' drivers
maintained in their own GitHub repo and truly follow a plugin model.
This of course means that Cinder has to be truly designed and maintained
with a real plugin architecture kept in mind in every aspect of development
(experience proves this harder to do than it sounds).  I think with things
like stable and well-defined interfaces as well as 3rd party CI this is
actually a reasonable approach and could be effective.  I do not see how
creating a separate repo and in essence yet another set of OpenStack
Projects really helps with the problem.  The fact is that the biggest issue
most people see with driver contributions is those that are made by
organizations that work on their driver only and don't contribute back to
the core project (whether that be in the form of reviews or core
contributions).  I'm not sure I understand why that would be any different
by just putting the code in a separate bucket.  In other words, you still
need to get a solid and consistent team working on that project, so it seems
like you've just kicked the can down the road so you don't have to deal with it.
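
For what a "real plugin architecture" with a "stable and well defined API"
might look like at its smallest, here is a Python sketch. All names in it
(VolumeDriver, register, get_driver, LVMDriver) are hypothetical and are not
Cinder's actual classes; the in-process registry is an assumption made to
keep the example self-contained, and a real project would more likely
discover drivers from separately installed packages.

```python
from abc import ABC, abstractmethod


class VolumeDriver(ABC):
    """Hypothetical stable contract that every out-of-tree driver implements."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> dict:
        """Create a volume and return a description of it."""


# Maps a driver name to its class. Core code only ever looks drivers up
# here; it never imports a vendor module directly.
_REGISTRY = {}


def register(name: str):
    def decorator(cls):
        # Only classes that implement the contract may be registered.
        if not issubclass(cls, VolumeDriver):
            raise TypeError(cls.__name__ + " does not implement VolumeDriver")
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_driver(name: str) -> VolumeDriver:
    if name not in _REGISTRY:
        raise LookupError("no driver registered as %r" % name)
    return _REGISTRY[name]()


# The one in-tree reference implementation; vendor drivers would live in
# their own repos and call register() from there.
@register("lvm")
class LVMDriver(VolumeDriver):
    def create_volume(self, name: str, size_gb: int) -> dict:
        return {"driver": "lvm", "name": name, "size_gb": size_gb}
```

The point of the sketch is the direction of the dependency: drivers depend
on the contract, and the core depends only on the registry, which is what
makes removing vendor drivers from the tree workable.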

Any time I've mentioned the removal approach the response is typically that
there's no quality control, or that Vendors won't be as willing to invest
in OpenStack because they can focus on their own interests and get by with
that.  The quality control one was a tough one to counter, but now that
we're moving towards things like 3rd party CI I'm not sure that's quite as
significant as it was a year ago.  I'd still like to see a public record of
testing in the form of CI, NOT just Vendor-A submitting something that
says "yeah, I'm awesome".  I suspect that OpenStack adopters would look
at things like public CI postings to determine what's worth pursuing and
what's not.

The other concern I had in the past was we'd lose valuable contributors.
There are vendors that are directly responsible for providing us with some
great contributors in the Core of the Cinder project.  They do a great job
of balancing the tactical and strategic interests, and the concern is that
if the drivers aren't in Cinder then maybe they