Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Mon, Mar 21, 2016 at 10:19:47AM -0400, Emilien Macchi wrote: > On Mon, Mar 21, 2016 at 9:59 AM, Steven Hardy wrote: > > On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote: > >> On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy wrote: > >> > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote: > >> >>Emilien, > >> >> > >> >>Agree on the rant. But not clear on concrete proposal to fix it. > >> >> > >> >>Spend more time “fixing” CI and use Tempest as a gate is a bit wage. > >> >> > >> >>Unless we test known working version of each project in TripleO CI > >> >> you are > >> >>dependent on health of other components. > >> > > >> > I've so far resisted replying to this thread, because while valid, many > >> > of > >> > the concerns expressed by Emilien are quite general complaints, and it's > >> > hard to reply with specific solutions. > >> > > >> > However work *is* going on to improve many of these problems, let's see > >> > if > >> > I can provide a summary, to clarify the various "concrete proposals" > >> > which > >> > do exist. > >> > > >> > 1. Core team & review velocity > >> > > >> > We've had a small and very overloaded core team for a while now, and this > >> > will be helped by expanding our community to include those who've been > >> > regularly contributing excellent work and reviews as core reviewers: > >> > > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html > >> > > >> > Note that I personally think it's absolutely fine for folks to be more > >> > expert in some subsystem and to focus review extra attention on e.g API, > >> > UI, Puppet or whatever. 
This subsystem-core model has been well proven > >> > in > >> > other projects, and folks will naturally broaden their areas of deeper > >> > knowledge over time. > >> > > >> > Related to this is movement of code, such as the puppet-tripleo > >> > refactoring > >> > mentioned by Michael - this has already started, and will help with > >> > providing a cleaner interface between the puppet and heat pieces (which > >> > will also help focus reviewer attention appropriately). > >> > >> Indeed, Michael, Dan & I are working on moving out the Puppet code > >> from THT to puppet-tripleo. > >> That's a nice move, and I appreciate TripleO team support on it. > >> > >> > 2. Day 1 developer experience > >> > > >> > This is closely related to the CI failure rate - there are efforts to > >> > integrate with the RDO tripleo-quickstart tooling, which simplifies the > >> > initial undercloud setup, and potentially makes consuming pre-built, > >> > validated undercloud images (probably output artefacts from our new > >> > periodic CI job) much easier. > >> > > >> > So, this will mean that both developers and CI can potentially be less > >> > regularly impacted by trunk regressions which often cause CI to fail, and > >> > break developer environments. > >> > > >> > https://review.openstack.org/#/c/276810/5 > >> > > >> > 3. 
CI coverage and trunk failure rate > >> > > >> > We've been working really hard to improve things here, which are really > >> > several inter-related issues: > >> > > >> > - Lack of Hardware capacity in the tripleo CI cloud > >> > - Frequent trunk regressions breaking our CI > >> > - Lack of coverage of some key features (network isolation, SSL, IPv6, > >> > upgrades) > >> > - Lack of coverage for vendor plugin templates/puppet code > >> > > >> > There's work ongoing to improve this from multiple perspectives: > >> > > >> > New periodic CI job (to be used for automated promotion of the > >> > current-tripleo repo, and for pre-built undercloud images): > >> > https://review.openstack.org/#/c/271370/ > >> > > >> > Add network isolation support to CI: > >> > https://review.openstack.org/#/c/288163/ > >> > > >> > Test SSL enabled in overcloud: > >> > https://review.openstack.org/#/c/281988/ > >> > > >> > CI coverage of IPv6: > >> > https://review.openstack.org/#/c/289445/ > >> > > >> > Discussion around better documented integration for third-party CI: > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html > >> > >> Do we have plans to execute Tempest? > > > > This is something which has been discussed several times, but right now we > > don't have the time available to run it per-commit because we'll hit the > > job timeout. > > > > This situation will improve as we gain time e.g through use of cached > > pre-built images, but right now I think we could look at enabling it only > > on the periodic job when that is fully proven. > > > > Having said that, I should point out that tempest doesn't get us great > > coverage of some newer projects - e.g all Heat scenario coverage was moved > > out of the
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Mon, Mar 21, 2016 at 9:59 AM, Steven Hardy wrote: > On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote: >> On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy wrote: >> > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote: >> >>Emilien, >> >> >> >>Agree on the rant. But not clear on concrete proposal to fix it. >> >> >> >>Spend more time “fixing” CI and use Tempest as a gate is a bit wage. >> >> >> >>Unless we test known working version of each project in TripleO CI you >> >> are >> >>dependent on health of other components. >> > >> > I've so far resisted replying to this thread, because while valid, many of >> > the concerns expressed by Emilien are quite general complaints, and it's >> > hard to reply with specific solutions. >> > >> > However work *is* going on to improve many of these problems, let's see if >> > I can provide a summary, to clarify the various "concrete proposals" which >> > do exist. >> > >> > 1. Core team & review velocity >> > >> > We've had a small and very overloaded core team for a while now, and this >> > will be helped by expanding our community to include those who've been >> > regularly contributing excellent work and reviews as core reviewers: >> > >> > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html >> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html >> > >> > Note that I personally think it's absolutely fine for folks to be more >> > expert in some subsystem and to focus review extra attention on e.g API, >> > UI, Puppet or whatever. This subsystem-core model has been well proven in >> > other projects, and folks will naturally broaden their areas of deeper >> > knowledge over time. 
>> > >> > Related to this is movement of code, such as the puppet-tripleo refactoring >> > mentioned by Michael - this has already started, and will help with >> > providing a cleaner interface between the puppet and heat pieces (which >> > will also help focus reviewer attention appropriately). >> >> Indeed, Michael, Dan & I are working on moving out the Puppet code >> from THT to puppet-tripleo. >> That's a nice move, and I appreciate TripleO team support on it. >> >> > 2. Day 1 developer experience >> > >> > This is closely related to the CI failure rate - there are efforts to >> > integrate with the RDO tripleo-quickstart tooling, which simplifies the >> > initial undercloud setup, and potentially makes consuming pre-built, >> > validated undercloud images (probably output artefacts from our new >> > periodic CI job) much easier. >> > >> > So, this will mean that both developers and CI can potentially be less >> > regularly impacted by trunk regressions which often cause CI to fail, and >> > break developer environments. >> > >> > https://review.openstack.org/#/c/276810/5 >> > >> > 3. 
CI coverage and trunk failure rate
>> >
>> > We've been working really hard to improve things here, which are really
>> > several inter-related issues:
>> >
>> > - Lack of Hardware capacity in the tripleo CI cloud
>> > - Frequent trunk regressions breaking our CI
>> > - Lack of coverage of some key features (network isolation, SSL, IPv6,
>> >   upgrades)
>> > - Lack of coverage for vendor plugin templates/puppet code
>> >
>> > There's work ongoing to improve this from multiple perspectives:
>> >
>> > New periodic CI job (to be used for automated promotion of the
>> > current-tripleo repo, and for pre-built undercloud images):
>> > https://review.openstack.org/#/c/271370/
>> >
>> > Add network isolation support to CI:
>> > https://review.openstack.org/#/c/288163/
>> >
>> > Test SSL enabled in overcloud:
>> > https://review.openstack.org/#/c/281988/
>> >
>> > CI coverage of IPv6:
>> > https://review.openstack.org/#/c/289445/
>> >
>> > Discussion around better documented integration for third-party CI:
>> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html
>>
>> Do we have plans to execute Tempest?
>
> This is something which has been discussed several times, but right now we
> don't have the time available to run it per-commit because we'll hit the
> job timeout.
>
> This situation will improve as we gain time e.g through use of cached
> pre-built images, but right now I think we could look at enabling it only
> on the periodic job when that is fully proven.
>
> Having said that, I should point out that tempest doesn't get us great
> coverage of some newer projects - e.g all Heat scenario coverage was moved
> out of the tempest tree, and other projects have done similar AFAIK, so we
> may end up with very sparse API surface tests (or nothing at all) in these
> cases.

In Puppet OpenStack CI, we execute smoke tests (a few tests of each
service, and some scenarios), and some tests not in smoke, for Ironic,
Aodh, Horizon. It tak
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Mon, Mar 21, 2016 at 09:41:42AM -0400, Emilien Macchi wrote: > On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy wrote: > > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote: > >>Emilien, > >> > >>Agree on the rant. But not clear on concrete proposal to fix it. > >> > >>Spend more time “fixing” CI and use Tempest as a gate is a bit wage. > >> > >>Unless we test known working version of each project in TripleO CI you > >> are > >>dependent on health of other components. > > > > I've so far resisted replying to this thread, because while valid, many of > > the concerns expressed by Emilien are quite general complaints, and it's > > hard to reply with specific solutions. > > > > However work *is* going on to improve many of these problems, let's see if > > I can provide a summary, to clarify the various "concrete proposals" which > > do exist. > > > > 1. Core team & review velocity > > > > We've had a small and very overloaded core team for a while now, and this > > will be helped by expanding our community to include those who've been > > regularly contributing excellent work and reviews as core reviewers: > > > > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html > > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html > > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html > > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html > > > > Note that I personally think it's absolutely fine for folks to be more > > expert in some subsystem and to focus review extra attention on e.g API, > > UI, Puppet or whatever. This subsystem-core model has been well proven in > > other projects, and folks will naturally broaden their areas of deeper > > knowledge over time. 
> > > > Related to this is movement of code, such as the puppet-tripleo refactoring > > mentioned by Michael - this has already started, and will help with > > providing a cleaner interface between the puppet and heat pieces (which > > will also help focus reviewer attention appropriately). > > Indeed, Michael, Dan & I are working on moving out the Puppet code > from THT to puppet-tripleo. > That's a nice move, and I appreciate TripleO team support on it. > > > 2. Day 1 developer experience > > > > This is closely related to the CI failure rate - there are efforts to > > integrate with the RDO tripleo-quickstart tooling, which simplifies the > > initial undercloud setup, and potentially makes consuming pre-built, > > validated undercloud images (probably output artefacts from our new > > periodic CI job) much easier. > > > > So, this will mean that both developers and CI can potentially be less > > regularly impacted by trunk regressions which often cause CI to fail, and > > break developer environments. > > > > https://review.openstack.org/#/c/276810/5 > > > > 3. 
CI coverage and trunk failure rate
> >
> > We've been working really hard to improve things here, which are really
> > several inter-related issues:
> >
> > - Lack of Hardware capacity in the tripleo CI cloud
> > - Frequent trunk regressions breaking our CI
> > - Lack of coverage of some key features (network isolation, SSL, IPv6,
> >   upgrades)
> > - Lack of coverage for vendor plugin templates/puppet code
> >
> > There's work ongoing to improve this from multiple perspectives:
> >
> > New periodic CI job (to be used for automated promotion of the
> > current-tripleo repo, and for pre-built undercloud images):
> > https://review.openstack.org/#/c/271370/
> >
> > Add network isolation support to CI:
> > https://review.openstack.org/#/c/288163/
> >
> > Test SSL enabled in overcloud:
> > https://review.openstack.org/#/c/281988/
> >
> > CI coverage of IPv6:
> > https://review.openstack.org/#/c/289445/
> >
> > Discussion around better documented integration for third-party CI:
> > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html
>
> Do we have plans to execute Tempest?

This is something which has been discussed several times, but right now we
don't have the time available to run it per-commit because we'll hit the
job timeout.

This situation will improve as we gain time, e.g. through use of cached
pre-built images, but right now I think we could look at enabling it only
on the periodic job, once that is fully proven.

Having said that, I should point out that Tempest doesn't get us great
coverage of some newer projects. For example, all Heat scenario coverage was
moved out of the Tempest tree, and other projects have done similar AFAIK, so
we may end up with very sparse API surface tests (or nothing at all) in these
cases.

There is probably more we can do within the existing pingtest, though: it
creates the instance inside a Heat stack, so we could just add a bunch more
resources for all the overcloud services, and pretty quickly prove that the
deployed services are at least running (without extending the runtime much).

Steve
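As a rough illustration of the pingtest idea above, here is a minimal Python sketch that takes a Heat template held as a dict and adds one cheap resource per extra service. The resource types (OS::Cinder::Volume, OS::Neutron::Net, OS::Swift::Container) are standard Heat types, but the choice of services, the "_check" naming, and the stand-in pingtest template are assumptions made for this example and are not taken from the real TripleO pingtest.

```python
import copy

# One cheap resource per extra service. The resource types are standard Heat
# types; the selection of services and properties is an assumption made for
# illustration, not what the real pingtest template contains.
EXTRA_CHECKS = {
    "cinder": {"type": "OS::Cinder::Volume", "properties": {"size": 1}},
    "neutron": {"type": "OS::Neutron::Net", "properties": {}},
    "swift": {"type": "OS::Swift::Container", "properties": {}},
}


def add_service_checks(template, services):
    """Return a copy of a HOT template dict with one extra resource per service."""
    unknown = sorted(set(services) - set(EXTRA_CHECKS))
    if unknown:
        raise ValueError("no check resource defined for: " + ", ".join(unknown))
    extended = copy.deepcopy(template)  # leave the caller's template untouched
    resources = extended.setdefault("resources", {})
    for name in sorted(services):
        key = name + "_check"
        if key in resources:
            raise ValueError("resource already defined: " + key)
        resources[key] = copy.deepcopy(EXTRA_CHECKS[name])
    return extended


# Stand-in for the existing pingtest stack (hypothetical, heavily simplified).
pingtest = {
    "heat_template_version": "2015-04-30",
    "resources": {
        "server": {
            "type": "OS::Nova::Server",
            "properties": {"image": "cirros", "flavor": "m1.tiny"},
        },
    },
}

extended = add_service_checks(pingtest, ["cinder", "swift"])
print(sorted(extended["resources"]))  # ['cinder_check', 'server', 'swift_check']
```

If the stack containing the extra resources reaches CREATE_COMPLETE, each corresponding service has at least accepted and completed one API request, which is the "services are at least running" level of proof described above.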
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Mon, Mar 21, 2016 at 6:57 AM, Steven Hardy wrote: > On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote: >>Emilien, >> >>Agree on the rant. But not clear on concrete proposal to fix it. >> >>Spend more time “fixing” CI and use Tempest as a gate is a bit wage. >> >>Unless we test known working version of each project in TripleO CI you are >>dependent on health of other components. > > I've so far resisted replying to this thread, because while valid, many of > the concerns expressed by Emilien are quite general complaints, and it's > hard to reply with specific solutions. > > However work *is* going on to improve many of these problems, let's see if > I can provide a summary, to clarify the various "concrete proposals" which > do exist. > > 1. Core team & review velocity > > We've had a small and very overloaded core team for a while now, and this > will be helped by expanding our community to include those who've been > regularly contributing excellent work and reviews as core reviewers: > > http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html > http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html > > Note that I personally think it's absolutely fine for folks to be more > expert in some subsystem and to focus review extra attention on e.g API, > UI, Puppet or whatever. This subsystem-core model has been well proven in > other projects, and folks will naturally broaden their areas of deeper > knowledge over time. > > Related to this is movement of code, such as the puppet-tripleo refactoring > mentioned by Michael - this has already started, and will help with > providing a cleaner interface between the puppet and heat pieces (which > will also help focus reviewer attention appropriately). 
Indeed, Michael, Dan & I are working on moving out the Puppet code from THT to puppet-tripleo. That's a nice move, and I appreciate TripleO team support on it. > 2. Day 1 developer experience > > This is closely related to the CI failure rate - there are efforts to > integrate with the RDO tripleo-quickstart tooling, which simplifies the > initial undercloud setup, and potentially makes consuming pre-built, > validated undercloud images (probably output artefacts from our new > periodic CI job) much easier. > > So, this will mean that both developers and CI can potentially be less > regularly impacted by trunk regressions which often cause CI to fail, and > break developer environments. > > https://review.openstack.org/#/c/276810/5 > > 3. CI coverage and trunk failure rate > > We've been working really hard to improve things here, which are really > several inter-related issues: > > - Lack of Hardware capacity in the tripleo CI cloud > - Frequent trunk regressions breaking our CI > - Lack of coverage of some key features (network isolation, SSL, IPv6, > upgrades) > - Lack of coverage for vendor plugin templates/puppet code > > There's work ongoing to improve this from multiple perspectives: > > New periodic CI job (to be used for automated promotion of the > current-tripleo repo, and for pre-built undercloud images): > https://review.openstack.org/#/c/271370/ > > Add network isolation support to CI: > https://review.openstack.org/#/c/288163/ > > Test SSL enabled in overcloud: > https://review.openstack.org/#/c/281988/ > > CI coverage of IPv6: > https://review.openstack.org/#/c/289445/ > > Discussion around better documented integration for third-party CI: > http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html Do we have plans to execute Tempest? 
> In summary, we're doing a ton of work as a community to address the
> concerns raised by Emilien, and we've still got a lot more to do, but there
> *is* clear agreement on many of the problems, and a concrete plan in most
> cases to resolve them.

The recent weeks showed real improvements (like the examples you
mentioned), and that's a good sign for the TripleO project.

Thanks,
--
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Fri, Mar 18, 2016 at 01:27:33PM +, arkady_kanev...@dell.com wrote:
> Emilien,
>
> Agree on the rant. But not clear on concrete proposal to fix it.
>
> Spend more time “fixing” CI and use Tempest as a gate is a bit vague.
>
> Unless we test known working version of each project in TripleO CI you are
> dependent on health of other components.

I've so far resisted replying to this thread, because while valid, many of
the concerns expressed by Emilien are quite general complaints, and it's
hard to reply with specific solutions.

However, work *is* going on to improve many of these problems, so let's see
if I can provide a summary to clarify the various "concrete proposals" which
do exist.

1. Core team & review velocity

We've had a small and very overloaded core team for a while now, and this
will be helped by expanding our community to include those who've been
regularly contributing excellent work and reviews as core reviewers:

http://lists.openstack.org/pipermail/openstack-dev/2016-February/087774.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089235.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089912.html
http://lists.openstack.org/pipermail/openstack-dev/2016-March/089913.html

Note that I personally think it's absolutely fine for folks to be more
expert in some subsystem and to focus extra review attention on e.g. API,
UI, Puppet or whatever. This subsystem-core model has been well proven in
other projects, and folks will naturally broaden their areas of deeper
knowledge over time.

Related to this is movement of code, such as the puppet-tripleo refactoring
mentioned by Michael - this has already started, and will help with
providing a cleaner interface between the puppet and heat pieces (which
will also help focus reviewer attention appropriately).

2. Day 1 developer experience

This is closely related to the CI failure rate - there are efforts to
integrate with the RDO tripleo-quickstart tooling, which simplifies the
initial undercloud setup, and potentially makes consuming pre-built,
validated undercloud images (probably output artefacts from our new
periodic CI job) much easier.

So, this will mean that both developers and CI can potentially be less
regularly impacted by trunk regressions, which often cause CI to fail and
break developer environments.

https://review.openstack.org/#/c/276810/5

3. CI coverage and trunk failure rate

We've been working really hard to improve things here, which are really
several inter-related issues:

- Lack of hardware capacity in the tripleo CI cloud
- Frequent trunk regressions breaking our CI
- Lack of coverage of some key features (network isolation, SSL, IPv6,
  upgrades)
- Lack of coverage for vendor plugin templates/puppet code

There's work ongoing to improve this from multiple perspectives:

New periodic CI job (to be used for automated promotion of the
current-tripleo repo, and for pre-built undercloud images):
https://review.openstack.org/#/c/271370/

Add network isolation support to CI:
https://review.openstack.org/#/c/288163/

Test SSL enabled in overcloud:
https://review.openstack.org/#/c/281988/

CI coverage of IPv6:
https://review.openstack.org/#/c/289445/

Discussion around better documented integration for third-party CI:
http://lists.openstack.org/pipermail/openstack-dev/2016-March/088972.html

In summary, we're doing a ton of work as a community to address the
concerns raised by Emilien, and we've still got a lot more to do, but there
*is* clear agreement on many of the problems, and a concrete plan in most
cases to resolve them.

Thanks,
Steve
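To make the "automated promotion" idea from point 3 a little more concrete, here is a small Python sketch of one way promotion could be decided: pick the newest repo snapshot for which every required periodic job passed. This is only an illustration under stated assumptions; the snapshot ids, job names, and data layout are hypothetical and do not describe how the actual periodic job or the current-tripleo repo is implemented.

```python
def pick_promotion_candidate(snapshots, results, required_jobs):
    """Return the newest snapshot id for which every required job passed.

    snapshots: snapshot ids ordered oldest to newest.
    results: dict mapping (snapshot_id, job_name) -> True/False.
    required_jobs: job names that must all have passed.
    Returns None when no snapshot qualifies.
    """
    for snap in reversed(snapshots):
        # A missing result counts as a failure, so untested snapshots
        # are never promoted.
        if all(results.get((snap, job), False) for job in required_jobs):
            return snap
    return None


# Hypothetical data: three trunk snapshots, two periodic jobs.
snapshots = ["aaa111", "bbb222", "ccc333"]
results = {
    ("aaa111", "periodic-ha"): True,
    ("aaa111", "periodic-nonha"): True,
    ("bbb222", "periodic-ha"): True,
    ("bbb222", "periodic-nonha"): True,
    ("ccc333", "periodic-ha"): False,  # trunk regression in the newest snapshot
    ("ccc333", "periodic-nonha"): True,
}

promoted = pick_promotion_candidate(
    snapshots, results, ["periodic-ha", "periodic-nonha"]
)
print(promoted)  # bbb222
```

Developers and per-commit CI consuming only the promoted snapshot would then be shielded from the regression in the newest one, which is the effect described in point 2.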
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
Emilien,

Agree on the rant, but I'm not clear on a concrete proposal to fix it.

Spending more time "fixing" CI and using Tempest as a gate is a bit vague.

Unless we test a known working version of each project in TripleO CI, you
are dependent on the health of other components.

Thanks,
Arkady

-----Original Message-----
From: Emilien Macchi [mailto:emil...@redhat.com]
Sent: Friday, March 04, 2016 8:23 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [tripleo] Contributing to TripleO is challenging

That's not the name of any Summit talk, it's just an e-mail I've wanted
to write for a long time.

It is an attempt to expose facts or things I've heard a lot, and to bring
constructive thoughts about why it's challenging to contribute to the
TripleO project.

1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that very few people are involved
in CI work.
In my opinion, the CI system is more critical than any feature in a product.
Developing software without tests is a bit like http://goo.gl/OlgFRc
All people - especially cores - in the project should be involved in CI
work. If you are TripleO core and you don't contribute to CI, you might
ask yourself why.

2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again,
very few people are actually working on fixing failures.
My experience over the last years taught me to stop my daily work when
CI is broken and fix it asap.

3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new
areas?"
My second thought is that I think we have a problem with TripleO Heat
Templates.
The THT (TripleO Heat Templates) code is 80% Puppet / Hiera. If
TripleO cores say "I'm not familiar with Puppet", we have a problem here,
don't we?
Maybe we should split this repository? Or revisit the list of people who
can +2 patches on THT.

4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO and one
thing I've noticed is that if I don't ping people, my patches get no review.
And I have to rebase them, every week, because the interface changed. I get
a +2, cool! Oh, merge conflict. Rebasing. Waiting for +2 again... and so
on.

I personally spend 20% of my time reviewing code, every day.
I wrote a blog post about how I'm doing reviews, with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest TripleO folks spend more time on reviews, for several reasons:

* decrease frustration among contributors
* accelerate the development process
* teach new contributors to work on TripleO, and eventually scale up the
  core team. It's a time investment, but worth it.

In the Puppet team, we have weekly triage sessions and it's pretty helpful.

5/ Most of the tests are run... manually.

How many times have I heard "I've tested this patch locally, and it does
not work so -1".

The only test we do in current CI is a ping to an instance. Seriously?
Most OpenStack CIs (Fuel included) run Tempest, for testing APIs and
real scenarios. And we run a ping.
That's similar to 1/ but I wanted to raise it too.

If we don't change the way we work on TripleO, people will become more
frustrated and reduce their contributions at some point.
I hope from here we can have an open and constructive discussion to try
to improve the TripleO project.

Thank you for reading so far.
--
Emilien Macchi
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Fri, Mar 11, 2016 at 3:46 AM, Michael Chapman wrote: > > > On Sat, Mar 5, 2016 at 10:31 AM, Giulio Fidente wrote: >> >> On 03/04/2016 03:23 PM, Emilien Macchi wrote: >>> >>> That's not the name of any Summit's talk, it's just an e-mail I wanted >>> to write for a long time. >>> >>> It is an attempt to expose facts or things I've heard a lot; and bring >>> constructive thoughts about why it's challenging to contribute in >>> TripleO project. >> >> >> hi Emilien, >> >> thanks for bringing this up, it's not an easy topic and yet of most >> crucial. As a core contributors I feel, to some extent, responsible for the >> current status of things and I think it's time for us to reflect more about >> what we can, individually, do. >> >> I have some ideas but I want to start by commenting to your points. >> >>> 1/ "I don't review this patch, we don't have CI coverage." >>> >>> One thing I've noticed in TripleO is that a very few people are involved >>> in CI work. >>> In my opinion, CI system is more critical than any feature in a product. >>> Developing Software without tests is a bit like http://goo.gl/OlgFRc >>> All people - specially core - in the project should be involved in CI >>> work. If you are TripleO core and you don't contribute on CI, you might >>> ask yourself why. >> >> >> Agreed, we need more 'eyes' on out CI to cope with both the infra and the >> inavoidable failures due to changes/bugs in the puppet modules or openstack >> itself. >> >> But there is more hiding behind this problem ... we already have quite a >> number of optional and even pluggable features in TripleO and we're even >> designing an interface to make this easier; testing them all isn't going to >> happen. So we'll always hit something we don't have coverage for. >> >> Let's have a conversation on how we can improve coverage at the summit! >> Maybe we can make simply make our CI scenarios more variegated/complex in >> the attempt to touch more features? 
>> >>> 2/ "I don't review this patch, CI is broken." >>> >>> Another thing I've noticed in TripleO is that when CI is broken, again, >>> a very few people are actually working on fixing failures. >>> My experience over the last years taught me to stop my daily work when >>> CI is broken and fix it asap. >> >> >> Agreed. More eyes and more coverage to increase its dependability. >> >>> 3/ "I don't review it, because this feature / code is not my area". >>> >>> My first though is "Aren't we supposed to be engineers and learn new >>> areas?" >>> My second though is that I think we have a problem with TripleO Heat >>> Templates. >>> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If >>> TripleO core say "I'm not familiar with Puppet", we have a problem here, >>> isn't? >>> Maybe should we split this repository? Or revisit the list of people who >>> can +2 patches on THT. >> >> >> Not sure here, I find that manifests and templates are pretty much "meant >> to go together" so I am worried that a split could solve some problems but >> also cause others. > > > This is pretty much what I proposed last week > (https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests) > and I noticed Dan approved the blueprint yesterday (cheers). It's definitely > going to cause problems in that THT defines the data interface and > puppet-tripleo is going to have to keep up with that interface in lock-step > in some cases so be prepared to deal with that as a patch author. This isn't > really any different to non-tripleo puppet module situations where a change > to the repo holding hiera data will be tied to changes in modules. > > Ideally I'd like to incrementally decouple the puppet-tripleo profiles from > the data heat provides but for the first cut they'll be joined at the hip. 
Michael, I've also been thinking about decoupling the Puppet code from THT
into puppet-tripleo manifests; please review:

puppet-tripleo: glance api/registry:
https://review.openstack.org/289459

THT: use puppet-tripleo to deploy Glance:
https://review.openstack.org/289466

Any feedback is welcome.

> So given a new home (puppet-tripleo) for a large portion of the code
> (starting with overcloud controller and controller_pacemaker), hopefully
> this paves the way for giving those who know puppet well the opportunity to
> take on responsibility for the manifests without necessarily being
> intimately familiar with the rest of the system, which I guess helps with
> Emilien's original concern that there's a skill split across the tooling
> lines.
>
>>
>>
>> This said, let's be honest, an effective patch for THT requires a good
>> understanding of many different problems which can be TripleO specific (eg.
>> implications on upgrades), tooling specific (eg. Heat/Puppet), OpenStack
>> specific (eg. cooperation with other, optional, features) so I have myself
>> skipped changes when I didn't feel comfortable with it.
>>
>> But one problem which I think is more recently slowing reviews and which
>> is somewhat concause of 3) is t
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Sat, Mar 5, 2016 at 10:31 AM, Giulio Fidente wrote:

> On 03/04/2016 03:23 PM, Emilien Macchi wrote:
>
>> That's not the name of any Summit's talk, it's just an e-mail I wanted
>> to write for a long time.
>>
>> It is an attempt to expose facts or things I've heard a lot; and bring
>> constructive thoughts about why it's challenging to contribute in
>> TripleO project.
>
> hi Emilien,
>
> thanks for bringing this up, it's not an easy topic and yet one of the most
> crucial. As a core contributor I feel, to some extent, responsible for the
> current status of things and I think it's time for us to reflect more about
> what we can, individually, do.
>
> I have some ideas but I want to start by commenting on your points.
>
>> 1/ "I don't review this patch, we don't have CI coverage."
>>
>> One thing I've noticed in TripleO is that a very few people are involved
>> in CI work.
>> In my opinion, CI system is more critical than any feature in a product.
>> Developing Software without tests is a bit like http://goo.gl/OlgFRc
>> All people - especially core - in the project should be involved in CI
>> work. If you are TripleO core and you don't contribute on CI, you might
>> ask yourself why.
>
> Agreed, we need more 'eyes' on our CI to cope with both the infra and the
> unavoidable failures due to changes/bugs in the puppet modules or openstack
> itself.
>
> But there is more hiding behind this problem ... we already have quite a
> number of optional and even pluggable features in TripleO and we're even
> designing an interface to make this easier; testing them all isn't going to
> happen. So we'll always hit something we don't have coverage for.
>
> Let's have a conversation on how we can improve coverage at the summit!
> Maybe we can simply make our CI scenarios more variegated/complex in
> the attempt to touch more features?
>
>> 2/ "I don't review this patch, CI is broken."
>>
>> Another thing I've noticed in TripleO is that when CI is broken, again,
>> a very few people are actually working on fixing failures.
>> My experience over the last years taught me to stop my daily work when
>> CI is broken and fix it asap.
>
> Agreed. More eyes and more coverage to increase its dependability.
>
>> 3/ "I don't review it, because this feature / code is not my area".
>>
>> My first thought is "Aren't we supposed to be engineers and learn new
>> areas?"
>> My second thought is that I think we have a problem with TripleO Heat
>> Templates.
>> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
>> TripleO core say "I'm not familiar with Puppet", we have a problem here,
>> isn't it?
>> Maybe we should split this repository? Or revisit the list of people who
>> can +2 patches on THT.
>
> Not sure here, I find that manifests and templates are pretty much "meant
> to go together" so I am worried that a split could solve some problems but
> also cause others.

This is pretty much what I proposed last week (https://blueprints.launchpad.net/tripleo/+spec/refactor-puppet-manifests) and I noticed Dan approved the blueprint yesterday (cheers). It's definitely going to cause problems in that THT defines the data interface and puppet-tripleo is going to have to keep up with that interface in lock-step in some cases, so be prepared to deal with that as a patch author. This isn't really any different to non-tripleo puppet module situations where a change to the repo holding hiera data will be tied to changes in modules.

Ideally I'd like to incrementally decouple the puppet-tripleo profiles from the data heat provides but for the first cut they'll be joined at the hip.

So given a new home (puppet-tripleo) for a large portion of the code (starting with overcloud controller and controller_pacemaker), hopefully this paves the way for giving those who know puppet well the opportunity to take on responsibility for the manifests without necessarily being intimately familiar with the rest of the system, which I guess helps with Emilien's original concern that there's a skill split across the tooling lines.

> This said, let's be honest, an effective patch for THT requires a good
> understanding of many different problems which can be TripleO specific (eg.
> implications on upgrades), tooling specific (eg. Heat/Puppet), OpenStack
> specific (eg. cooperation with other, optional, features) so I have myself
> skipped changes when I didn't feel comfortable with it.
>
> But one problem which I think is more recently slowing reviews and which
> is somewhat concause of 3) is that we're not dealing too well with code
> duplication in the yamls and with conditional logic in the manifests.
>
> Maybe we could stop and think together about new HOT functionalities
> which could help us? Interesting for the summit as well?
>
>> 4/ Patches are stalled. Most of the time.
>>
>> Over the last 12 months, I've pushed a lot of patches in TripleO and one
>> thing I've noticed is that if I don't ping people, my patch got no
>> review. And
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Fri, Mar 4, 2016 at 9:23 AM, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.

Thanks for sharing your thoughts. I struggled a bit with responding to be honest, as I think the points you call attention to often lead to frustration on the side of both patch authors and reviewers. I think these things are more anecdotal than fact based to be honest. But, that doesn't mean that it's not a constructive conversation, so thanks for calling out the issues.

At the core, we need a lot more investment in CI. We could use more physical resources and more contributors. Or, we could direct some of the capacity away from new development and put it towards CI improvements instead. That might allow us to cover more features in CI, which I think would have a direct impact on review velocity. There are also some architectural changes proposed (split-stack) that will allow us to scale CI more effectively than we have in the past.

> 1/ "I don't review this patch, we don't have CI coverage."

If a patch is obviously not covered by CI, it would go a long way if the author indicated if and how they manually tested the patch. Often, I see a non-trivial patch that our CI does not cover. When I try to manually test the patch, it doesn't work. Sometimes in very obvious ways. Over time, this sort of pattern lowers the confidence of core reviewers to +2 and approve non-trivial patches that they themselves haven't manually tested.

I know that manual testing doesn't scale. However, right now, our CI doesn't scale to cover every possible combination of features either. So, if we want to keep moving forward at all, some things will have to be manually tested.
If patch authors added a comment such as "I tested this with network isolation and it worked as expected, and it doesn't break existing CI", that would go a long way towards giving people the confidence to approve it.

> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.

That's fair. Although it's also fair to ask the same of non-cores. It's not just the job of core reviewers to get patches to pass CI. A lot of times I get pinged to review patches with a sense of urgency about them, yet they are sitting with failed CI. And it's not like people are pinging me to help them understand and fix why the patch has failed CI. They just want it reviewed/approved.

> 2/ "I don't review this patch, CI is broken."

Do you mean when CI is generally failing across the board? I can honestly say that when CI is generally failing for whatever reason (infrastructure issue, OpenStack regression, TripleO regression), there are almost always 1 or 2 TripleO cores all over that issue. Not everyone needs to be working on the issue at once. But, I can honestly say I've never seen TripleO just completely red where at least one person wasn't working on it almost exclusively.

However, if you're saying that people don't review a patch if that specific patch has failed CI on it, then I think there is a lot of shared responsibility there. It's not just on reviewers to see why something has failed CI, or to try to get it to pass. I'm less likely to review a patch if it has been sitting for several days with a failed CI job on it. The author probably doesn't need it landed that badly if that's the case.
Often, there's not even a comment saying whether they looked at the failed job to see why. So, yeah, I'm less likely to review those patches honestly, and maybe that's not fair.

Just so I'm clear, I'm not saying that I ignore patches with failed CI. I try to help new contributors or people I might not recognize get their patches to pass CI. I recognize there is a steep learning curve there. But when I see patches out there from folks who are capable of at least triaging the failure, and that hasn't been done, I'm certainly guilty of de-prioritizing reviewing that patch. Maybe that's a bad thing.

> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.

This sounds like you're talking about scenarios when all of TripleO CI is failing. In which case, I really disagree with your assertion that people aren't rapidly fixing those failures.

> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
While I agree with some (not all) of the sentiment below I'm not sure I want to spend the time debating this rather broad set of topics in this email thread. I'm not sure we'd actually ever see the end of it if we did. Nor can the upstream TripleO team control all the forces at play here.

So rather than this... Would it be reasonable to ask that we take this a step further and split things out into concrete ideas to improve these areas? Perhaps each in its own spec or email thread so that we can reach clear conclusions to each problem... a step at a time.

A couple of things to set the record straight:

On the CI issues

We actually have some really good ideas on the table to solve some of these CI problems including architectural changes like "split stack" ideas which could allow parts of our overcloud CI to run on normal cloud instances, auto-promoting package repositories based on nightly periodic jobs, caching our image builds, etc. Some of these things will open the door to new features like the ability to run more test suites (which we haven't done yet due to the long wall time associated with our CI at this point).

There are reasons for TripleO CI, why it exists, why we have put so much effort into keeping it running over the years. Yes our tests take a long time to run, and yes we have some things we still do manually, but we do catch a lot of issues and breakages in both our own and other OpenStack projects. And while our core team often disagrees on things I think we do agree that continuing to expand upstream CI coverage on major features is key to digging out of the hole we are in.

As for the rest of it I think a lot of it has to do with doing the best we can with limited upstream resources. To me the real problem driving a majority of the issues you describe below is simply trying to land X number of features upstream by a given date with little to no CI coverage. The sooner we take the time and discipline to stop this the better.

Dan

On Fri, 2016-03-04 at 09:23 -0500, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
>
>
> 1/ "I don't review this patch, we don't have CI coverage."
>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.
>
>
> 2/ "I don't review this patch, CI is broken."
>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
>
> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be engineers and learn new
> areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
> TripleO core say "I'm not familiar with Puppet", we have a problem here,
> isn't it?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.
>
>
> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on..
>
> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some reasons:
>
> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up the
>   core team. It's a time investment, but worth it.
>
> In Puppet team, we have weekly triage sessions and it's pretty helpful.
>
>
> 5/ Most of the tests are run... manually.
>
> How many times I've heard "I've tested this patch locally, and it does
> not work so -1".
>
> The only test we do in current CI is a ping to an instance. Seriously?
> Most of OpenStack CIs (Fuel included), run Tempest, for testing APIs and
> real scenarios. And we run a ping.
> That's similar to 1/ but I wanted to raise it too.
>
>
>
> If we don't change our way to work on TripleO, people will be more
> frustrated an
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
Any comment?

Cheers,
S

On Sat, Mar 5, 2016 at 11:03 AM, Adam Young wrote:
> On 03/04/2016 09:23 AM, Emilien Macchi wrote:
>
>> That's not the name of any Summit's talk, it's just an e-mail I wanted
>> to write for a long time.
>>
>> It is an attempt to expose facts or things I've heard a lot; and bring
>> constructive thoughts about why it's challenging to contribute in
>> TripleO project.
>>
>>
>> 1/ "I don't review this patch, we don't have CI coverage."
>>
>> One thing I've noticed in TripleO is that a very few people are involved
>> in CI work.
>> In my opinion, CI system is more critical than any feature in a product.
>> Developing Software without tests is a bit like http://goo.gl/OlgFRc
>> All people - especially core - in the project should be involved in CI
>> work. If you are TripleO core and you don't contribute on CI, you might
>> ask yourself why.
>
> OK...so what is the state of TripleO CI? My experience with TripleO has
> shown that it is quite resource intensive, far more so than, say, Keystone,
> and so I could see that being the gating factor.
>
> In order for me to be able to get into TripleO coding, I needed a new
> machine, with 32 GB of RAM, separate from my everyday work machine. Not a
> killer outlay, but enough to hold me up until I got the HW allocated.
>
> If we could split up the testing undercloud vs. overcloud, it might be more
> feasible. I see no fundamental reason that the majority of the Overcloud
> development and testing could not be done on top of a non-ironic based
> OpenStack deployment.
>
> That leaves just the undercloud, which could, possibly, also run on top of
> an existing OpenStack deployment for much of the development.
>
> A true end to end run of TripleO with HA requires a lot: 3 Physical
> machines plus a little overhead for the Overcloud. But this is what is
> really needed. Ideally, on multiple vendors' systems, so that we identify
> some aspect of the Hardware variation.
>
>> 2/ "I don't review this patch, CI is broken."
>>
>> Another thing I've noticed in TripleO is that when CI is broken, again,
>> a very few people are actually working on fixing failures.
>> My experience over the last years taught me to stop my daily work when
>> CI is broken and fix it asap.
>
> Puppet and Heat are black boxes to me still. I don't clearly understand how
> they fit together.
>
> I think we need to start depuppetifying TripleO. I know we have a lot of
> sunk costs in it, but we went with Puppet because it was all we had, not
> because it matched the problem set well.
>
> I'd recommend a freeze on all new Puppet development, and start doing all
> new features in Ansible. Fully acknowledging the havoc this will wreak, I
> think it is important strategically. It is really hard to swap between two
> languages, and the rest of OpenStack is in Python. Switching to Ruby is hard.
>
> All of our Client support is in Python.
>
> The number of people that know Puppet that actively contribute to OpenStack
> is small. The number of real Ruby experts is smaller.
>
>> 3/ "I don't review it, because this feature / code is not my area".
>>
>> My first thought is "Aren't we supposed to be engineers and learn new areas?"
>> My second thought is that I think we have a problem with TripleO Heat
>> Templates.
>> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
>> TripleO core say "I'm not familiar with Puppet", we have a problem here,
>> isn't it?
>> Maybe we should split this repository? Or revisit the list of people who
>> can +2 patches on THT.
>
> I am more than happy to review anything Keystone related, but again, I
> struggle with Puppet.
>
> Not really knowing Heat as well makes it even tougher. We need a better
> overall orientation guide if people are going to come up to speed quicker.
>
>> 4/ Patches are stalled. Most of the time.
>>
>> Over the last 12 months, I've pushed a lot of patches in TripleO and one
>> thing I've noticed is that if I don't ping people, my patch got no
>> review. And I have to rebase it, every week, because the interface
>> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
>> again... and so on..
>
> Same is true on Keystone. There is just a lot to get done on this project.
> All these projects.
>
>> I personally spent 20% of my time to review code, every day.
>> I wrote a blog post about how I'm doing review, with Gertty:
>> http://my1.fr/blog/reviewing-puppet-openstack-patches/
>> I suggest TripleO folks to spend more time on reviews, for some reasons:
>
> Nice of you to write that up.
>
>> * decreasing frustration from contributors
>> * accelerate development process
>> * teach new contributors to work on TripleO, and eventually scale-up the
>>   core team. It's a time investment, but worth it.
>>
>> In Puppet team, we have weekly triage sessions and it's pretty helpful.
>>
>>
>> 5/ Most of the tests are run... manually.
>>
>> How many times I've heard "I've tested this patch locally, and it does
>> not work so -1".
>>
>> The only test we do
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On 03/04/2016 09:23 AM, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
>
>
> 1/ "I don't review this patch, we don't have CI coverage."
>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.

OK...so what is the state of TripleO CI? My experience with TripleO has shown that it is quite resource intensive, far more so than, say, Keystone, and so I could see that being the gating factor.

In order for me to be able to get into TripleO coding, I needed a new machine, with 32 GB of RAM, separate from my everyday work machine. Not a killer outlay, but enough to hold me up until I got the HW allocated.

If we could split up the testing undercloud vs. overcloud, it might be more feasible. I see no fundamental reason that the majority of the Overcloud development and testing could not be done on top of a non-ironic based OpenStack deployment.

That leaves just the undercloud, which could, possibly, also run on top of an existing OpenStack deployment for much of the development.

A true end to end run of TripleO with HA requires a lot: 3 Physical machines plus a little overhead for the Overcloud. But this is what is really needed. Ideally, on multiple vendors' systems, so that we identify some aspect of the Hardware variation.

> 2/ "I don't review this patch, CI is broken."
>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.

Puppet and Heat are black boxes to me still. I don't clearly understand how they fit together.

I think we need to start depuppetifying TripleO. I know we have a lot of sunk costs in it, but we went with Puppet because it was all we had, not because it matched the problem set well.

I'd recommend a freeze on all new Puppet development, and start doing all new features in Ansible. Fully acknowledging the havoc this will wreak, I think it is important strategically. It is really hard to swap between two languages, and the rest of OpenStack is in Python. Switching to Ruby is hard.

All of our Client support is in Python.

The number of people that know Puppet that actively contribute to OpenStack is small. The number of real Ruby experts is smaller.

> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
> TripleO core say "I'm not familiar with Puppet", we have a problem here,
> isn't it?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.

I am more than happy to review anything Keystone related, but again, I struggle with Puppet.

Not really knowing Heat as well makes it even tougher. We need a better overall orientation guide if people are going to come up to speed quicker.

> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on..

Same is true on Keystone. There is just a lot to get done on this project. All these projects.

> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some reasons:

Nice of you to write that up.

> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up the
>   core team. It's a time investment, but worth it.
>
> In Puppet team, we have weekly triage sessions and it's pretty helpful.
>
>
> 5/ Most of the tests are run... manually.
>
> How many times I've heard "I've tested this patch locally, and it does
> not work so -1".
>
> The only test we do in current CI is a ping to an instance. Seriously?
> Most of OpenStack CIs (Fuel included), run Tempest, for testing APIs and
> real scenarios. And we run a ping.
> That's similar to 1/ but I wanted to raise it too.

Again, testing is expensive; if I am
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On 03/04/2016 03:23 PM, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.

hi Emilien,

thanks for bringing this up, it's not an easy topic and yet one of the most crucial. As a core contributor I feel, to some extent, responsible for the current status of things and I think it's time for us to reflect more about what we can, individually, do.

I have some ideas but I want to start by commenting on your points.

> 1/ "I don't review this patch, we don't have CI coverage."
>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.

Agreed, we need more 'eyes' on our CI to cope with both the infra and the unavoidable failures due to changes/bugs in the puppet modules or openstack itself.

But there is more hiding behind this problem ... we already have quite a number of optional and even pluggable features in TripleO and we're even designing an interface to make this easier; testing them all isn't going to happen. So we'll always hit something we don't have coverage for.

Let's have a conversation on how we can improve coverage at the summit! Maybe we can simply make our CI scenarios more variegated/complex in the attempt to touch more features?

> 2/ "I don't review this patch, CI is broken."
>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.

Agreed. More eyes and more coverage to increase its dependability.

> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
> TripleO core say "I'm not familiar with Puppet", we have a problem here,
> isn't it?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.

Not sure here, I find that manifests and templates are pretty much "meant to go together" so I am worried that a split could solve some problems but also cause others.

This said, let's be honest, an effective patch for THT requires a good understanding of many different problems which can be TripleO specific (eg. implications on upgrades), tooling specific (eg. Heat/Puppet), OpenStack specific (eg. cooperation with other, optional, features) so I have myself skipped changes when I didn't feel comfortable with it.

But one problem which I think is more recently slowing reviews and which is somewhat concause of 3) is that we're not dealing too well with code duplication in the yamls and with conditional logic in the manifests.

Maybe we could stop and think together about new HOT functionalities which could help us? Interesting for the summit as well?

> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on..
>
> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some reasons:
>
> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up the
>   core team. It's a time investment, but worth it.

I'm inclined to think that this is a bit of a consequence of 1), 2) and 3) together.

> In Puppet team, we have weekly triage sessions and it's pretty helpful.

Right. I think we experimented with something like this before but it was probably perceived as an emergency measure so we put it aside after a while. I remember we had a list of 'hot reviews' which we would review during the weekly meetings. But it isn't trivial to understand which type of review is considered hot.

What is the purpose of the puppet team triaging? To find old reviews? Mergeable reviews? To drop stale reviews? To speed up bug fixes? To get attention on features?

> 5/ Most of the tests are run... manually.
>
> How many times I've heard "I've tested this patch locally, and it
Re: [openstack-dev] [tripleo] Contributing to TripleO is challenging
On Fri, Mar 04, 2016 at 09:23:19AM -0500, Emilien Macchi wrote:
> That's not the name of any Summit's talk, it's just an e-mail I wanted
> to write for a long time.
>
> It is an attempt to expose facts or things I've heard a lot; and bring
> constructive thoughts about why it's challenging to contribute in
> TripleO project.
>
>
> 1/ "I don't review this patch, we don't have CI coverage."
>
> One thing I've noticed in TripleO is that a very few people are involved
> in CI work.
> In my opinion, CI system is more critical than any feature in a product.
> Developing Software without tests is a bit like http://goo.gl/OlgFRc
> All people - especially core - in the project should be involved in CI
> work. If you are TripleO core and you don't contribute on CI, you might
> ask yourself why.
>
As somebody who contributes to openstack-infra and knows most of the ins and outs of OpenStack CI, I often wish the TripleO CI would be more in line with openstack-infra. Right now, TripleO CI is a black hole to me. I understand there are some reasons to have separate CI (eg: baremetal provisioning) but it would be nice to revisit the current setup and see if we can move more in line with openstack-infra. For the simple reason that having common tooling means I can contribute to TripleO CI if needed.

>
> 2/ "I don't review this patch, CI is broken."
>
> Another thing I've noticed in TripleO is that when CI is broken, again,
> a very few people are actually working on fixing failures.
> My experience over the last years taught me to stop my daily work when
> CI is broken and fix it asap.
>
See my above comment. I think this would go a long way to helping the team.

>
> 3/ "I don't review it, because this feature / code is not my area".
>
> My first thought is "Aren't we supposed to be engineers and learn new areas?"
> My second thought is that I think we have a problem with TripleO Heat
> Templates.
> THT or TripleO Heat Templates's code is 80% of Puppet / Hiera. If
> TripleO core say "I'm not familiar with Puppet", we have a problem here,
> isn't it?
> Maybe we should split this repository? Or revisit the list of people who
> can +2 patches on THT.
>
>
> 4/ Patches are stalled. Most of the time.
>
> Over the last 12 months, I've pushed a lot of patches in TripleO and one
> thing I've noticed is that if I don't ping people, my patch got no
> review. And I have to rebase it, every week, because the interface
> changed. I got +2, cool ! Oh, merge conflict. Rebasing. Waiting for +2
> again... and so on..
>
> I personally spent 20% of my time to review code, every day.
> I wrote a blog post about how I'm doing review, with Gertty:
> http://my1.fr/blog/reviewing-puppet-openstack-patches/
> I suggest TripleO folks to spend more time on reviews, for some reasons:
>
> * decreasing frustration from contributors
> * accelerate development process
> * teach new contributors to work on TripleO, and eventually scale-up the
>   core team. It's a time investment, but worth it.
>
> In Puppet team, we have weekly triage sessions and it's pretty helpful.
>
>
> 5/ Most of the tests are run... manually.
>
> How many times I've heard "I've tested this patch locally, and it does
> not work so -1".
>
> The only test we do in current CI is a ping to an instance. Seriously?
> Most of OpenStack CIs (Fuel included), run Tempest, for testing APIs and
> real scenarios. And we run a ping.
> That's similar to 1/ but I wanted to raise it too.
>
>
>
> If we don't change our way to work on TripleO, people will be more
> frustrated and reduce contributions at some point.
> I hope from here we can have an open and constructive discussion to try
> to improve the TripleO project.
>
> Thank you for reading so far.
> --
> Emilien Macchi
>
So for me, I'd love to help more but having to context shift into TripleO CI is a deal breaker for me (and more of -infra if I was a betting man).
So, anything I can do to help move things like base images or using AFS mirrors into TripleO I am happy to help. However, having the TripleO team maintain CI themselves doesn't seem to be the best case scenario. > __ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [tripleo] Contributing to TripleO is challenging
That's not the name of any Summit's talk, it's just an e-mail I wanted
to write for a long time.

It is an attempt to expose facts or things I've heard a lot, and to
bring constructive thoughts about why it's challenging to contribute to
the TripleO project.


1/ "I don't review this patch, we don't have CI coverage."

One thing I've noticed in TripleO is that very few people are involved
in CI work.
In my opinion, the CI system is more critical than any feature in a
product.
Developing software without tests is a bit like http://goo.gl/OlgFRc
All people - especially core - in the project should be involved in CI
work. If you are TripleO core and you don't contribute to CI, you might
ask yourself why.


2/ "I don't review this patch, CI is broken."

Another thing I've noticed in TripleO is that when CI is broken, again,
very few people are actually working on fixing failures.
My experience over the last few years has taught me to stop my daily
work when CI is broken and fix it asap.


3/ "I don't review it, because this feature / code is not my area".

My first thought is "Aren't we supposed to be engineers and learn new
areas?"
My second thought is that I think we have a problem with TripleO Heat
Templates.
The code in THT (TripleO Heat Templates) is 80% Puppet / Hiera. If
TripleO cores say "I'm not familiar with Puppet", we have a problem
here, don't we?
Maybe we should split this repository? Or revisit the list of people
who can +2 patches on THT.


4/ Patches are stalled. Most of the time.

Over the last 12 months, I've pushed a lot of patches in TripleO and one
thing I've noticed is that if I don't ping people, my patches get no
review. And I have to rebase them, every week, because the interface
changed. I got +2, cool! Oh, merge conflict. Rebasing. Waiting for +2
again... and so on...

I personally spend 20% of my time reviewing code, every day.
I wrote a blog post about how I do reviews, with Gertty:
http://my1.fr/blog/reviewing-puppet-openstack-patches/
I suggest TripleO folks spend more time on reviews, for several reasons:

* decrease frustration among contributors
* accelerate the development process
* teach new contributors to work on TripleO, and eventually scale up the
core team. It's a time investment, but worth it.

In the Puppet team, we have weekly triage sessions and they're pretty
helpful.


5/ Most of the tests are run... manually.

How many times have I heard "I've tested this patch locally, and it does
not work so -1"?

The only test we do in the current CI is a ping to an instance.
Seriously?
Most OpenStack CIs (Fuel included) run Tempest to test APIs and real
scenarios. And we run a ping.
That's similar to 1/ but I wanted to raise it too.



If we don't change the way we work on TripleO, people will get more
frustrated and reduce their contributions at some point.
I hope from here we can have an open and constructive discussion to try
to improve the TripleO project.

Thank you for reading so far.
--
Emilien Macchi