Juju induction sprint summary
Hi all,

So last week we had a Juju induction sprint for the Tanzanite and Moonstone teams to welcome Eric and Katherine to the Juju fold. Following is a summary of some key outcomes from the sprint that are relevant to others working on Juju (we also did other stuff not generally applicable for this email). Some items will interest some folks while others may not be so relevant to you, so scan the topics to see what you find interesting.

* Architectural overview - and a cool new tool

The sprint started with an architectural overview of the Juju moving parts and how they interact to deploy and maintain a Juju environment. Katherine noted that our in-tree documentation has lots of text and no diagrams. She pointed out a great tool for easily putting together UML diagrams using a simple text-based syntax: PlantUML, http://plantuml.sourceforge.net. Check it out, it's pretty cool. We'll be adding a diagram or two to the in-tree docs to show how it works.

* Code review (replacement for Github's native code review)

We are going to use Review Board. When we first looked at it before the sprint, a major showstopper was the lack of an auth plugin that worked with Github. Eric has stepped up and written the necessary plugin. We'll have something deployed this week or early next week, once some more tooling to finish the Github integration is done. The key features:

- Login with Github button on main login screen
- pull requests automatically imported to Review Board and added to review queue
- diffs can be uploaded to Review Board as WIP and submitted to Github when finalised

* Fixing the Juju state.State mess

The state package is a mess of layering violations and intermingled concerns. The result is slow and fragile unit tests, scalability issues, hard-to-understand code, and code that is difficult to extend and refactor (to name a few issues).
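As a rough illustration of why the layering matters for test speed, here is a minimal Go sketch in which placement logic lives in a service that depends only on a narrow persistence interface. All names (`Machine`, `MachinePersistence`, `PlacementService`, `memStore`) and the least-loaded placement policy are hypothetical, not taken from the Juju tree or from Wayne's branch; the point is only that the service can be unit tested against an in-memory fake with no mongo involved.

```go
package main

import (
	"errors"
	"fmt"
)

// Machine is a hypothetical domain-model type; it knows nothing about mongo.
type Machine struct {
	ID    string
	Units []string
}

// MachinePersistence is a hypothetical persistence interface: the only
// thing the state package would need to provide under the new layering.
type MachinePersistence interface {
	Machines() ([]Machine, error)
	SaveMachine(m Machine) error
}

// PlacementService holds business logic (here a naive "least loaded
// machine" policy) above the persistence layer.
type PlacementService struct {
	store MachinePersistence
}

var errNoMachines = errors.New("no machines available")

// PlaceUnit assigns a unit to the machine currently hosting the fewest
// units; ties go to the first machine returned by the store.
func (s *PlacementService) PlaceUnit(unit string) (string, error) {
	machines, err := s.store.Machines()
	if err != nil {
		return "", err
	}
	if len(machines) == 0 {
		return "", errNoMachines
	}
	best := 0
	for i, m := range machines {
		if len(m.Units) < len(machines[best].Units) {
			best = i
		}
	}
	target := machines[best]
	target.Units = append(target.Units, unit)
	if err := s.store.SaveMachine(target); err != nil {
		return "", err
	}
	return target.ID, nil
}

// memStore is an in-memory fake for unit tests; no database required.
type memStore struct {
	byID  map[string]Machine
	order []string
}

func newMemStore(ids ...string) *memStore {
	s := &memStore{byID: map[string]Machine{}}
	for _, id := range ids {
		s.byID[id] = Machine{ID: id}
		s.order = append(s.order, id)
	}
	return s
}

func (s *memStore) Machines() ([]Machine, error) {
	out := make([]Machine, 0, len(s.order))
	for _, id := range s.order {
		out = append(out, s.byID[id])
	}
	return out, nil
}

func (s *memStore) SaveMachine(m Machine) error {
	s.byID[m.ID] = m
	return nil
}

func main() {
	svc := &PlacementService{store: newMemStore("0", "1")}
	for _, u := range []string{"wordpress/0", "wordpress/1", "mysql/0"} {
		id, err := svc.PlaceUnit(u)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> machine %s\n", u, id)
	}
}
```

Running it places `wordpress/0` and `mysql/0` on machine 0 and `wordpress/1` on machine 1. A real store backed by mongo would satisfy the same interface, so the service code would not change.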
The correct layering should be something like:

- remote service interface (aka apiserver)
- juju services for managing machines, services, units etc
- juju domain model
- model persistence (aka state)

The persistence layer above is all that should be in the state package. The plan is to incrementally extract Juju service business logic out of state and pull it up into a services layer. The first to be done is the machine placement and deployment logic. Wayne has a WIP branch for this. The benefit of this work can't be overstated, and the sprint allowed both teams to work together to understand the direction and intent of the work.

* Mongo 2.6 support

The work to port Juju to Mongo 2.6 is pretty much complete. The newer Mongo version offers a number of bug fixes and improvements over the 2.4 series, and we need to be able to run with an up-to-date version.

* Providers don't need to have a storage implementation (almost)

A significant chunk of old code which supported agents connecting directly to mongo was removed (along with the necessary refactoring). This allowed the Environ interface to drop the StateInfo() method and instead define a method which returns the state server instances (not committed yet but close). The next step is to remove the Storage() interface from Environ and make storage an internal implementation detail which is not mandatory, so long as providers have a way to figure out their state servers (this can be done using tagging, for example).

* Juju 1.20.1 release (aka juju/mongo issues)

A number of issues with how Juju and mongo interact became apparent when replica sets were used for HA. Unfortunately Juju 1.20 shipped with these issues unfixed. Part of the sprint was spent working on some urgent fixes to ship a bug fix 1.20.1 release. There's still an outstanding mongo session issue that needs to be fixed this week for a 1.20.2 release. Michael is working on it.
The tl;dr is that we are holding onto sessions and not refreshing them, which means that the underlying socket can time out and Juju loses its connection to mongo.

* Add support for Juju in China for Amazon (almost)

The supported regions for the EC2 provider are hard-coded, so the new regions in China were not supported. The Chinese regions also use a new signing algorithm. There should be a fix in place this week. Since all the changes are in the goamz library, the change to juju-core is merely a dependency update, so this feature should be available in the 1.20.2 release.

All up, a productive sprint with some great collaboration between the two teams.

--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
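The mongo session issue behind the pending 1.20.2 fix can be illustrated with a small Go sketch. The mgo driver's `*mgo.Session` does have `Ping`, `Refresh` and `Copy` methods, but the `Session` interface, `withSession` helper and `fakeSession` below are hypothetical stand-ins so the example runs without a database, and this is not the fix Michael is working on. The pattern shown: if an operation fails on a long-held session, refresh the session so the next attempt can obtain a fresh connection, then retry once.

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical stand-in for the subset of a mongo driver
// session used here; it is not mgo's actual type.
type Session interface {
	Ping() error
	Refresh()
}

// errSocketTimeout simulates the error seen once a long-held socket dies.
var errSocketTimeout = errors.New("socket timed out")

// withSession runs op; if it fails, it refreshes the session and retries
// exactly once. (A real implementation would likely only refresh on
// connection errors, not on every error.)
func withSession(s Session, op func(Session) error) error {
	if err := op(s); err == nil {
		return nil
	}
	s.Refresh()
	return op(s)
}

// fakeSession fails until it has been refreshed, mimicking a stale socket.
type fakeSession struct {
	stale     bool
	refreshes int
}

func (f *fakeSession) Ping() error {
	if f.stale {
		return errSocketTimeout
	}
	return nil
}

// Refresh simulates obtaining a healthy connection.
func (f *fakeSession) Refresh() {
	f.stale = false
	f.refreshes++
}

func main() {
	s := &fakeSession{stale: true}
	err := withSession(s, func(s Session) error { return s.Ping() })
	fmt.Println("err:", err, "refreshes:", s.refreshes) // err: <nil> refreshes: 1
}
```

Copying a session per unit of work (mgo's `Copy`, followed by `Close` when done) is another common way to avoid holding a single socket open indefinitely.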
Re: Juju induction sprint summary
Great work. I am particularly happy to see that we have an incremental and useful plan to take care of some of the technical debt in state.State. This is the classic form of technical debt as described by Ward Cunningham: since the code was originally written we have learned a good bit about the problem space and about where flexibility is needed, and the way we would do it today is different from the best way we knew how to build it two years ago.

--Mark Ramm

On Mon, Jul 14, 2014 at 4:43 AM, Ian Booth ian.bo...@canonical.com wrote:
> [...]
Devel is broken, we cannot release
Devel has been broken for weeks because of regressions. We cannot release devel. The stable 1.20.0 that we released is actually older than it appears, because we had to search CI for an older revision that worked.

We have a systemic problem: once a regression is introduced, it blocks the release for weeks, and we build on top of the regression. We often see many regressions, and the regressions mutate as people merge more branches. The current two regressions are:

* win juju client still broken with unknown from 2014-06-27, which has varied between a compilation problem and a panic during execution.
  https://bugs.launchpad.net/juju-core/+bug/1335328
* FAIL: managedstorage_test trusty ppc64 from 2014-06-30, which had a secondary bug that broke compilation.
  https://bugs.launchpad.net/juju-core/+bug/1336089

I think the problem is that engineers are focused on their features. They don't see the fallout from their changes. They may hope the fix will arrive soon, and that maybe someone else will fix it.

I propose a change in policy. When there is a regression in CI, no new branches can be merged except those that link to the blocking bug. This will encourage engineers to fix the regression. One way to fix the regression is to identify and revert the commit that broke CI.

--
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui
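The proposed policy is mechanical enough that a landing bot could enforce it. A minimal Go sketch, assuming a hypothetical `Branch` type that carries the Launchpad bug numbers a branch links to; the bug numbers are the two regressions listed above, while the branch names are made up:

```go
package main

import "fmt"

// Branch is a hypothetical description of a proposed merge.
type Branch struct {
	Name       string
	LinkedBugs []int // Launchpad bug numbers referenced by the branch
}

// canMerge implements the proposed policy: while any CI regression bug is
// open, only branches that link to one of those blocking bugs may land.
func canMerge(b Branch, blockingBugs []int) bool {
	if len(blockingBugs) == 0 {
		return true
	}
	blocking := make(map[int]bool, len(blockingBugs))
	for _, id := range blockingBugs {
		blocking[id] = true
	}
	for _, id := range b.LinkedBugs {
		if blocking[id] {
			return true
		}
	}
	return false
}

func main() {
	blockers := []int{1335328, 1336089}
	feature := Branch{Name: "new-feature"}
	fix := Branch{Name: "fix-windows-client", LinkedBugs: []int{1335328}}
	fmt.Println(feature.Name, canMerge(feature, blockers)) // new-feature false
	fmt.Println(fix.Name, canMerge(fix, blockers))         // fix-windows-client true
}
```

A revert of the offending commit would land through the same gate, as long as it links to the blocking bug.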
Re: Devel is broken, we cannot release
+1

I've experienced this type of policy and it leads to a few things: more tests, better tests, and better self-reviews and developer QA. I believe that borrowing the other ideas from lean and agile, but not having the big stop button when a defect is found, is an unsustainable approach to development, and with the recent growth in the number of people actively working on the code base we are now experiencing that first hand.

On Jul 14, 2014 3:06 PM, Curtis Hovey-Canonical cur...@canonical.com wrote:
> [...]
Re: Devel is broken, we cannot release
> I think the problem is that engineers are focused on their features. They don't see the fallout from their changes. They may hope the fix will arrive soon, and that maybe someone else will fix it.
>
> I propose a change in policy. When there is a regression in CI, no new branches can be merged except those that link to the blocking bug. This will encourage engineers to fix the regression. One way to fix the regression is to identify and revert the commit that broke CI.

Agree in principle. However, we have seen some issues on CI where the unreliability of the underlying cloud has caused failures. So long as the identified issue does have a root cause that we can fix in juju itself, we should block landings to trunk until it is fixed.
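This caveat refines the gate: only failures whose root cause lies in juju itself should block landings. A Go sketch, assuming someone has already classified each CI failure; the `Cause` and `Failure` types and the job names are hypothetical, and in practice the classification is the hard part, not the filter:

```go
package main

import "fmt"

// Cause is a hypothetical classification of a CI failure's root cause.
type Cause int

const (
	CauseJuju  Cause = iota // a defect in juju itself
	CauseCloud              // flakiness or an outage in the underlying cloud
)

// Failure is a hypothetical record of a failed CI job.
type Failure struct {
	Job   string
	Cause Cause
}

// blockingFailures returns only the failures that should stop landings:
// those whose root cause can be fixed in juju.
func blockingFailures(fs []Failure) []Failure {
	var out []Failure
	for _, f := range fs {
		if f.Cause == CauseJuju {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fs := []Failure{
		{Job: "aws-deploy", Cause: CauseCloud},
		{Job: "unit-tests-ppc64", Cause: CauseJuju},
	}
	for _, f := range blockingFailures(fs) {
		fmt.Println("blocks landings:", f.Job) // blocks landings: unit-tests-ppc64
	}
}
```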
Re: Devel is broken, we cannot release
> * FAIL: managedstorage_test trusty ppc64 from 2014-06-30, which had a secondary bug that broke compilation.
>   https://bugs.launchpad.net/juju-core/+bug/1336089

This bug brings up another issue. The code concerned has now been moved off to a juju sub-project, blobstorage, so the juju-core ppc64 tests will no longer fail. Martin is in the process of setting up Jenkins landing jobs for all the sub-projects (there are several), but there won't initially be ppc64 variants of these jobs. So it will be possible for juju-core to now pass ppc64 testing even though sub-projects it depends on may not.
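One way to close the gap described above is to treat a project as green on an architecture only when it and every sub-project it depends on have a passing result there, counting a missing result as a failure. A minimal Go sketch with a hypothetical `results` map; the project names come from this thread, but the data is invented:

```go
package main

import "fmt"

// results[project][arch] records whether that project's tests passed on
// that architecture; a missing entry means the tests were never run there.
type results map[string]map[string]bool

// passesWithDeps reports whether project and every dependency passed on arch.
// A missing result reads as false, so untested sub-projects cannot hide.
func passesWithDeps(r results, project string, deps []string, arch string) bool {
	for _, p := range append([]string{project}, deps...) {
		if !r[p][arch] {
			return false
		}
	}
	return true
}

func main() {
	r := results{
		"juju-core":   {"amd64": true, "ppc64": true},
		"blobstorage": {"amd64": true}, // no ppc64 landing job yet
	}
	deps := []string{"blobstorage"}
	for _, arch := range []string{"amd64", "ppc64"} {
		fmt.Println(arch, passesWithDeps(r, "juju-core", deps, arch))
	}
}
```

This prints `amd64 true` and `ppc64 false`: juju-core's own ppc64 pass is not enough while blobstorage has no ppc64 result.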
Re: Devel is broken, we cannot release
On 15/07/14 15:48, Ian Booth wrote:
> [...] So it will be possible for juju-core to now pass ppc64 testing even though sub-projects it depends on may not.

Surely this just means that we need real end-to-end tests on all supported architectures, right?

Tim
Re: Devel is broken, we cannot release
On 15/07/14 14:17, Tim Penhey wrote:
> On 15/07/14 15:48, Ian Booth wrote:
>> [...]
> Surely this just means that we need real end-to-end tests on all supported architectures, right?

In theory. The number of combinations will be large and I'm not sure we currently have the capacity to do that. But the issue is also that functional tests may well pass even though some particular unit tests fail.
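The capacity concern is simple multiplication: a full matrix needs one job per combination of its dimensions, so each new sub-project, architecture or series multiplies the count. A small Go sketch; the counts in `main` are illustrative assumptions, not figures from the actual CI:

```go
package main

import "fmt"

// matrixSize returns how many CI jobs a full test matrix needs: one job
// for every combination of the given dimensions.
func matrixSize(dims ...int) int {
	n := 1
	for _, d := range dims {
		n *= d
	}
	return n
}

func main() {
	// Illustrative counts only; not taken from the real CI setup.
	projects, arches, series, providers := 6, 3, 2, 4
	fmt.Println("unit-test jobs:", matrixSize(projects, arches, series))   // unit-test jobs: 36
	fmt.Println("end-to-end jobs:", matrixSize(arches, series, providers)) // end-to-end jobs: 24
}
```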