Juju induction sprint summary

2014-07-14 Thread Ian Booth
Hi all

Last week we had a Juju induction sprint for the Tanzanite and Moonstone teams
to welcome Eric and Katherine to the Juju fold. What follows is a summary of the
key outcomes that are relevant to others working on Juju (we also did other
things that are not generally applicable for this email). Not every item will be
relevant to everyone, so scan the topics to see what you find interesting.

* Architectural overview - and a cool new tool

The sprint started with an architectural overview of the Juju moving parts and
how they interact to deploy and maintain a Juju environment. Katherine noted
that our in-tree documentation has lots of text and no diagrams. She pointed out
a great tool for easily putting together UML diagrams using a simple text-based
syntax: PlantUML (http://plantuml.sourceforge.net). Check it out, it's pretty
cool. We'll be adding a diagram or two to the in-tree docs to show how it works.

* Code review (replacement for GitHub's native code review)

We are going to use Review Board. When we first looked at it before the sprint,
a major showstopper was the lack of an auth plugin that worked with GitHub. Eric
has stepped up and written the necessary plugin. We'll have something deployed
this week or early next week, once some more tooling to finish the GitHub
integration is done. The key features:
- "Login with GitHub" button on the main login screen
- pull requests automatically imported to Review Board and added to the review
  queue
- diffs can be uploaded to Review Board as WIP and submitted to GitHub when
  finalised

* Fixing the Juju state.State mess

The state package is a mess of layering violations and intermingled concerns.
The result is slow and fragile unit tests, scalability issues, and code that is
hard to understand, extend and refactor (to name a few issues).

The correct layering should be something like:
- remote service interface (aka apiserver)
- juju services for managing machines, services, units etc
- juju domain model
- model persistence (aka state)

The persistence layer above is all that should be in the state package. The plan
is to incrementally extract Juju service business logic out of state and pull it
up into a services layer. The first piece to be moved is the machine placement
and deployment logic. Wayne has a WIP branch for this. The benefit of this work
can't be overstated, and the sprint allowed both teams to work together to
understand the direction and intent of the work.
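
To make the intended layering a little more concrete, here is a minimal Go
sketch. All of the names in it (Machine, MachineStore, MachineService, memStore)
are hypothetical and are not taken from juju-core or from Wayne's branch. It
only illustrates the idea of keeping placement logic in a services layer that
sits on top of a narrow persistence interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Machine is a hypothetical domain-model type (not the real juju-core type).
type Machine struct {
	ID     string
	Series string
	Units  int
}

// MachineStore is the persistence layer: it only loads and saves.
type MachineStore interface {
	AllMachines() ([]Machine, error)
	SaveMachine(m Machine) error
}

// MachineService is the services layer. Business logic such as
// placement lives here rather than in the persistence layer.
type MachineService struct {
	store MachineStore
}

// PlaceUnit picks the least-loaded machine with the requested series
// and records the new unit on it.
func (s *MachineService) PlaceUnit(series string) (string, error) {
	machines, err := s.store.AllMachines()
	if err != nil {
		return "", err
	}
	best := -1
	for i, m := range machines {
		if m.Series != series {
			continue
		}
		if best == -1 || m.Units < machines[best].Units {
			best = i
		}
	}
	if best == -1 {
		return "", errors.New("no machine with series " + series)
	}
	chosen := machines[best]
	chosen.Units++
	if err := s.store.SaveMachine(chosen); err != nil {
		return "", err
	}
	return chosen.ID, nil
}

// memStore is an in-memory MachineStore, so the service can be
// unit tested without a database.
type memStore struct {
	machines []Machine
}

func (s *memStore) AllMachines() ([]Machine, error) {
	out := make([]Machine, len(s.machines))
	copy(out, s.machines)
	return out, nil
}

func (s *memStore) SaveMachine(m Machine) error {
	for i := range s.machines {
		if s.machines[i].ID == m.ID {
			s.machines[i] = m
			return nil
		}
	}
	s.machines = append(s.machines, m)
	return nil
}

func main() {
	store := &memStore{machines: []Machine{
		{ID: "0", Series: "trusty", Units: 2},
		{ID: "1", Series: "trusty", Units: 0},
		{ID: "2", Series: "precise", Units: 0},
	}}
	svc := &MachineService{store: store}
	id, err := svc.PlaceUnit("trusty")
	fmt.Println(id, err)
}
```

Because the service depends only on the MachineStore interface, its tests can
run against memStore, which is one way the slow and fragile unit tests mentioned
above could be avoided.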

* Mongo 2.6 support

The work to port Juju to Mongo 2.6 is pretty much complete. The newer Mongo
version offers a number of bug fixes and improvements over the 2.4 series, and
we need to be able to run with an up-to-date version.

* Providers don't need to have a storage implementation (almost)

A significant chunk of old code that supported agents connecting directly to
mongo was removed (along with the necessary refactoring). This in turn allowed
the Environ interface to drop the StateInfo() method and instead implement a
method which returns the state server instances (not committed yet but close).
The next step is to remove the Storage() interface from Environ and make storage
an internal implementation detail which is not mandatory, so long as providers
have a way to figure out their state servers (this can be done using tagging,
for example).
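
As a rough illustration of the tagging idea, the Go sketch below shows a
provider that reports its state servers by looking at instance tags instead of
reading provider storage. Everything in it is an assumption made for the
example: the StateServerFinder interface, the method signature, the
taggedInstance type and the tag name are hypothetical and are not the
uncommitted juju-core change.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceID identifies a cloud instance (hypothetical stand-in type).
type InstanceID string

// StateServerFinder is a hypothetical slice of the Environ interface:
// instead of StateInfo(), a provider reports its state server instances.
type StateServerFinder interface {
	StateServerInstances() ([]InstanceID, error)
}

// taggedInstance is a hypothetical view of a provider instance and its tags.
type taggedInstance struct {
	ID   InstanceID
	Tags map[string]string
}

// stateServerTag is an assumed tag name, used only for this illustration.
const stateServerTag = "juju-is-state"

// tagProvider finds state servers by instance tag, so it needs no storage.
type tagProvider struct {
	instances []taggedInstance
}

func (p *tagProvider) StateServerInstances() ([]InstanceID, error) {
	var ids []InstanceID
	for _, inst := range p.instances {
		if inst.Tags[stateServerTag] == "true" {
			ids = append(ids, inst.ID)
		}
	}
	if len(ids) == 0 {
		return nil, errors.New("no state servers found")
	}
	return ids, nil
}

func main() {
	var env StateServerFinder = &tagProvider{instances: []taggedInstance{
		{ID: "i-001", Tags: map[string]string{stateServerTag: "true"}},
		{ID: "i-002", Tags: map[string]string{}},
		{ID: "i-003", Tags: map[string]string{stateServerTag: "true"}},
	}}
	ids, err := env.StateServerInstances()
	fmt.Println(ids, err)
}
```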

* Juju 1.20.1 release (aka juju/mongo issues)

A number of issues with how Juju and mongo interact became apparent when
replicasets were used for HA. Unfortunately Juju 1.20 shipped with these issues
unfixed. Part of the sprint was spent working on some urgent fixes to ship a
1.20.1 bug-fix release. There's still an outstanding mongo session issue that
needs to be fixed this week for a 1.20.2 release. Michael is working on it. The
tl;dr is that we are holding onto sessions and not refreshing them, which means
that the underlying socket can time out and Juju loses its connection to mongo.
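
The failure mode can be sketched in Go without a real database. The session
interface and fakeSession type below are hypothetical stand-ins written for this
example (they are not the mgo API and not Michael's fix). They simulate a socket
that dies after being idle, and show how refreshing the session before each unit
of work avoids using the dead socket.

```go
package main

import (
	"errors"
	"fmt"
)

// session is a hypothetical stand-in for a database session whose
// underlying socket dies after being idle for too long.
type session interface {
	Refresh()    // drop the current socket so the next call uses a fresh one
	Ping() error // fails if the current socket has timed out
}

// fakeSession simulates a socket that times out after maxIdle ticks.
type fakeSession struct {
	idle    int
	maxIdle int
}

// Tick advances the simulated idle time by n.
func (s *fakeSession) Tick(n int) { s.idle += n }

// Refresh simulates picking up a fresh socket.
func (s *fakeSession) Refresh() { s.idle = 0 }

// Ping fails once the socket has been idle for longer than maxIdle.
func (s *fakeSession) Ping() error {
	if s.idle > s.maxIdle {
		return errors.New("socket timed out")
	}
	s.idle = 0
	return nil
}

// withFreshSession refreshes the session before running work, so a
// long-held session never reuses a socket that may have timed out.
func withFreshSession(s session, work func(session) error) error {
	s.Refresh()
	return work(s)
}

func main() {
	s := &fakeSession{maxIdle: 10}

	// Holding the session across a long idle period: the socket is dead.
	s.Tick(60)
	fmt.Println("held session:", s.Ping())

	// Refreshing before use: the same idle period is harmless.
	s.Tick(60)
	err := withFreshSession(s, func(s session) error { return s.Ping() })
	fmt.Println("refreshed session:", err)
}
```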

* Add support for Juju in China for Amazon (almost)

The supported regions for the EC2 provider are hard-coded, so the new regions in
China were not supported. The Chinese regions also use a new signing algorithm.
There should be a fix in place this week. Since all the changes are in the goamz
library, the change to juju-core is merely a dependency update, so this feature
should be available in the 1.20.2 release.
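
For readers unfamiliar with why a hard-coded table blocks new regions, here is a
small Go sketch. The region struct, its field names and the lookupRegion helper
are hypothetical and are not the goamz types; the sketch only shows that a
region missing from the table is rejected until an entry is added, and that an
entry can also record which signing scheme the region requires.

```go
package main

import "fmt"

// region is a hypothetical description of an EC2 region (not the goamz type).
type region struct {
	Name        string
	EC2Endpoint string
	SignV4Only  bool // true if the region only accepts Signature Version 4
}

// regions is the hard-coded table. A region that is not listed here
// cannot be used until the table is updated.
var regions = map[string]region{
	"us-east-1": {
		Name:        "us-east-1",
		EC2Endpoint: "https://ec2.us-east-1.amazonaws.com",
		SignV4Only:  false,
	},
	"cn-north-1": {
		Name:        "cn-north-1",
		EC2Endpoint: "https://ec2.cn-north-1.amazonaws.com.cn",
		SignV4Only:  true,
	},
}

// lookupRegion returns the table entry for name, or an error if the
// region is unknown.
func lookupRegion(name string) (region, error) {
	r, ok := regions[name]
	if !ok {
		return region{}, fmt.Errorf("unknown region %q", name)
	}
	return r, nil
}

func main() {
	for _, name := range []string{"cn-north-1", "xx-nowhere-1"} {
		r, err := lookupRegion(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(r.Name, r.EC2Endpoint, "v4 only:", r.SignV4Only)
	}
}
```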

All up, a productive sprint with some great collaboration between the two teams.





-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Juju induction sprint summary

2014-07-14 Thread Mark Ramm-Christensen (Canonical.com)
Great work,

I am particularly happy to see that we have an incremental and useful plan
to take care of some of the technical debt in state.State. This is the
classic form of technical debt as described by Ward Cunningham: we have
learned a good bit about the problem space and where flexibility is needed
since the code was originally written, and the way we would do it today is
different from the best way we knew how to build it two years ago.

--Mark Ramm



Devel is broken, we cannot release

2014-07-14 Thread Curtis Hovey-Canonical
Devel has been broken for weeks because of regressions. We cannot
release devel. The stable 1.20.0 that we released is actually older
than it appears because we had to search CI for an older revision that
worked.

We have a systemic problem: once a regression is introduced, it blocks
the release for weeks, and we build on top of the regression. We often
see many regressions. The regressions mutate as people merge more
branches.

The two current regressions are:
* win juju client still broken with unknown
  from 2014-06-27, which has varied between a compilation
  problem and a panic during execution.
  https://bugs.launchpad.net/juju-core/+bug/1335328

* FAIL: managedstorage_test trusty ppc64
  from 2014-06-30 which had a secondary bug that broke compilation.
  https://bugs.launchpad.net/juju-core/+bug/1336089

I think the problem is that engineers are focused on their features. They
don't see the fallout from their changes. They may hope the fix will
arrive soon, and that maybe someone else will fix it.

I propose a change in policy. When there is a regression in CI, no
new branches can be merged except those that link to the blocking bug.
This will encourage engineers to fix the regression. One way to fix
the regression is to identify and revert the commit that broke CI.
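
The proposed rule is simple enough to express as a landing check. The Go sketch
below is only an illustration under assumed names: the mergeRequest type and the
canMerge function are hypothetical and do not describe the existing landing
tooling. The bug numbers are the two regressions listed above.

```go
package main

import "fmt"

// mergeRequest is a hypothetical description of a branch proposed for landing.
type mergeRequest struct {
	Branch    string
	FixesBugs []int // bug numbers the branch links to
}

// canMerge applies the proposed policy: while any blocking bug is open,
// only branches that link to one of the blocking bugs may land.
func canMerge(mr mergeRequest, blockers []int) (bool, string) {
	if len(blockers) == 0 {
		return true, "no CI regressions"
	}
	for _, fixed := range mr.FixesBugs {
		for _, b := range blockers {
			if fixed == b {
				return true, fmt.Sprintf("fixes blocking bug %d", b)
			}
		}
	}
	return false, fmt.Sprintf("blocked by %d open regression(s)", len(blockers))
}

func main() {
	blockers := []int{1335328, 1336089}
	requests := []mergeRequest{
		{Branch: "new-feature"},
		{Branch: "fix-win-client", FixesBugs: []int{1335328}},
	}
	for _, mr := range requests {
		ok, reason := canMerge(mr, blockers)
		fmt.Printf("%s: merge=%v (%s)\n", mr.Branch, ok, reason)
	}
}
```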


-- 
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui



Re: Devel is broken, we cannot release

2014-07-14 Thread Wayne Witzel
+1

I've experienced this type of policy and it leads to a few things: more
tests, better tests, and better self-reviews and developer QA.

I believe that borrowing the other ideas from lean and agile without having
the big stop button when a defect is found is an unsustainable approach to
development, and with the recent growth in the number of people actively
working on the code base we are now experiencing that first hand.


Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth
 
> I think the problem is that engineers are focused on their features. They
> don't see the fallout from their changes. They may hope the fix will
> arrive soon, and that maybe someone else will fix it.
>
> I propose a change in policy. When there is a regression in CI, no
> new branches can be merged except those that link to the blocking bug.
> This will encourage engineers to fix the regression. One way to fix
> the regression is to identify and revert the commit that broke CI.
 

Agree in principle. However, we have seen some CI failures that were caused by
the unreliability of the underlying cloud. So long as the issue identified
indeed has a root cause that we can fix in juju itself, we should block
landings to trunk until it is fixed.




Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth
 
> * FAIL: managedstorage_test trusty ppc64
>   from 2014-06-30 which had a secondary bug that broke compilation.
>   https://bugs.launchpad.net/juju-core/+bug/1336089
 

This bug brings up another issue.
The code concerned has now been moved off to a juju sub project - blobstorage.
So the juju-core ppc64 tests will no longer fail.

Martin is in the process of setting up Jenkins landing jobs for all the sub
projects (there are several). But there won't initially be ppc64 variants of
these jobs. So it will be possible for juju-core to now pass ppc64 testing even
though sub projects it depends on may not.



Re: Devel is broken, we cannot release

2014-07-14 Thread Tim Penhey
On 15/07/14 15:48, Ian Booth wrote:

>> * FAIL: managedstorage_test trusty ppc64
>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>
> This bug brings up another issue.
> The code concerned has now been moved off to a juju sub project - blobstorage.
> So the juju-core ppc64 tests will no longer fail.
>
> Martin is in the process of setting up Jenkins landing jobs for all the sub
> projects (there are several). But there won't initially be ppc64 variants of
> these jobs. So it will be possible for juju-core to now pass ppc64 testing
> even though sub projects it depends on may not.

Surely this just means that we need real end to end tests on all
supported architectures, right?

Tim




Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth


On 15/07/14 14:17, Tim Penhey wrote:
> Surely this just means that we need real end to end tests on all
> supported architectures, right?
 

In theory. The number of combinations will be large and I'm not sure we
currently have the capacity to do that.

But the issue is also that functional tests may well pass even though some
particular unit tests fail.
