Re: Devel is broken, we cannot release

2014-07-15 Thread Tim Penhey
On 16/07/14 01:57, William Reade wrote:
> On Tue, Jul 15, 2014 at 2:51 PM, Wayne Witzel
> <wayne.wit...@canonical.com> wrote:
>
>> If we aren't stopping the line when our CI is in the red, then what
>> is the point of even having CI at all? If we are not prepared to
>> adjust the culture of our development, to truly halt forward
>> progress in favor of chasing down regressions, then I struggle to
>> find the benefit that CI and testing are giving us at all. Just
>> confirming that master is broken and we are still ignoring it? If
>> master is broken, we are all broken. No development you are doing
>> should proceed that is based on a broken master. That work cannot at
>> any point be considered in good working condition. A problem in
>> master is everyone's problem.
>>
>> Bugs that are found throughout the normal operations and usage of
>> juju should be assigned a priority and queued, but regression is a
>> sign of a greater problem that should be resolved immediately.
>> Allowing regressions to not stop the line is taking the stance that
>> we don't care about deterioration in our code base.
>
> +100 to this. Regressions are a Big Deal and should be treated as such;
> leaving other merges queued until such a time as the regression is fixed
> (or backed out for rework) is entirely reasonable (and I think we've got
> enough evidence that the alternative really doesn't fly effectively).

Stop the line doesn't mean that everyone stops work, only that trunk
stops accepting new merges until the critical issue is fixed.

This could mean that one or more people actually work on the critical
issue, and others continue with other work, but there are no other trunk
commits until the bug is fixed.  This means that everyone on the team is
aware of the blocker and the progress to fix it because it directly
affects their ability to land their work. This means that there is
internal team pressure as well, and normally this ends up being more
offers to help, review and get the code landed ASAP so it unblocks people.

In the past, we had the landing bot tweaked so it was put into
"emergency only mode" which meant that normal merges were ignored and
only approved critical landings were accepted until the mode was
changed.  This seems like it could be relatively easy to do.

Perhaps a flag like $$ci-fix-merge$$ ?? /me handwaves
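
A minimal sketch of how such a flag could gate landings, assuming the bot
can read the merge message and a single on/off mode setting. The names
`may_land` and `emergency_only` are hypothetical and are not taken from
the real landing bot; the marker string is the one floated above.

```python
# Hypothetical marker, borrowed from the suggestion above; it is not an
# existing feature of the landing bot.
CI_FIX_MARKER = "$$ci-fix-merge$$"


def may_land(merge_message: str, emergency_only: bool) -> bool:
    """Return True if the bot should attempt to land this merge.

    In normal mode every approved merge is accepted. In emergency-only
    mode, only merges whose message carries the marker are accepted.
    """
    if not emergency_only:
        return True
    return CI_FIX_MARKER in merge_message
```

With this shape, flipping `emergency_only` back to False is all it takes to
resume normal landings, so no queued merge needs to be edited.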

Tim


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Devel is broken, we cannot release

2014-07-15 Thread John Meinel
My concern was with the speed of response. I'm happy to have a QA Team
switch that must be fixed (with an associated email to juju-dev so
everyone knows why their patch won't land). I *would* like us to be
tracking stuff like how long we go into regression mode, etc.

I think ideally the process would be automated, but our current CI seems to
need a fair amount of manual filtering.

On Jul 15, 2014 6:35 PM, "Curtis Hovey-Canonical" 
wrote:
>

> We are doing some combinatorial testing because we need to ensure
> every series+arch combination works. In Vegas Sprint, we settled on
> unittests and lxc tests as the best way to identify issues with arch
> or series. We test:
>
>   precise + amd64
>   utopic + amd64
>   trusty + amd64
>   trusty + i386
>   trusty + ppc64
>   trusty + arm64
>

That looks like M+N to me (all series on amd64, plus trusty on all arches).
The MxN would be all series x all arches.
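
To make the difference concrete, here is a small sketch that enumerates
both sets for the series and arches listed above. The function names and
the choice of trusty/amd64 as the shared baseline are assumptions made for
illustration only. Because the baseline pair is shared, the M+N style
list has M+N-1 entries.

```python
from itertools import product

SERIES = ["precise", "utopic", "trusty"]
ARCHES = ["amd64", "i386", "ppc64", "arm64"]


def m_plus_n(series, arches, base_series="trusty", base_arch="amd64"):
    """Every series on the base arch, plus the base series on every other arch."""
    combos = [(s, base_arch) for s in series]
    combos += [(base_series, a) for a in arches if a != base_arch]
    return combos


def m_times_n(series, arches):
    """The full cross product: every series on every arch."""
    return list(product(series, arches))


if __name__ == "__main__":
    print(len(m_plus_n(SERIES, ARCHES)), "combinations tested M+N style")
    print(len(m_times_n(SERIES, ARCHES)), "combinations for the full MxN")
```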

...
> > I have the feeling, though, that "better CI" might be making some developers
> > a bit more lax and doing less direct testing themselves, because they expect
> > that CI will catch things they don't.
>
> I don't feel this. I think the problem is the complexity of Juju.
> Mongo changes for HA broke the backup-restore feature; I think these
> are different areas of expertise that needed better coordination.

I think there were also some auth changes that meant we couldn't bootstrap
at all.
I really like that CI caught it. I wonder if it had to get that far.

>
> > I like the stop-the-line-when-CI-is-broken, as long as we have reliable ways
> > to stop it. Given the timescales we're working on, I'd probably be ok with
> > having it be a manual thing, so that when Azure decides to rev their API and
> > break everything that used to work, we aren't immediately crippled. Maybe we
> > can identify a subset of CI that is reliable (or high priority) enough that
> > it really is automatically stop-the-line worthy. (Trusty unit tests, PPC
> > unit tests, local provider, ec2 tests come to mind.)
>
> Cloud failures are not regressions in juju code. I spend a day or more
> a week tweaking CI to give Juju the best chance of success. I might
> change a test, or write a script that cleans up the resources in
> cloud/host.
>
> Since I am taking time to give juju more chances to pass, I delay
> reporting the bugs. 5 revisions might merge while I prove that juju is
> really broken. Since the defect can mutate with the extra commits, it
> isn't easy to identify the 1 or more revisions that are at fault.
>
> When we report a "ci regression" it is something we genuinely believe
> to work when we retest an old revision. I do provide a list of commits
> that can be investigated.
>
> As for automating a stop the line policy, we might be fine with a
> small hack to the git-merge-juju job to check for commits that claim
> to fix a regression; when that is not the case, the job fails early with the
> reason that we are waiting for a specific fix. Rollback is always an
> option.
>
>

I absolutely support trying to find ways to help keep CI blue (green). It's
definitely the background I come from and a culture I want us to have.
I think a difficulty is figuring out who/what is responsible and the slow
turnaround to unblock everything. If we make what we think is the fix,
even if it is just reverting a change, doesn't it take hours to run CI
again, and even then some bits may fail spuriously or for a different reason?
If we need manual intervention on both ends that means a stop-the-line
takes us out of working order for 24 hours. I'm just trying to explore the
consequences. I really do think we need good feedback into keeping CI happy.

John
=:->
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui


Re: Devel is broken, we cannot release

2014-07-15 Thread Curtis Hovey-Canonical
On Tue, Jul 15, 2014 at 1:29 AM, John Meinel  wrote:
> It seems worthy to just run "go test github.com/juju/..." as our CI testing,
> isn't it? (i.e., run all unit tests across all packages that we write on all
> platforms), rather than *just* github.com/juju/juju.

Ah! That looks easy. We could add a test like this in a day.

> I don't think we run into the combinatorial problem here (if we can run all
> of the github.com/juju/juju tests, then we aren't really adding much to run
> the rest of the dependencies as well).



> I think having a "full bootstrap, deploy, upgrade, destroy" on all platforms
> is necessary as a functional test, I'm not sure that we need to cross
> product it with on-all-environments. (which *would* start to run into
> combinatorial problems)

We are doing some combinatorial testing because we need to ensure
every series+arch combination works. In Vegas Sprint, we settled on
unittests and lxc tests as the best way to identify issues with arch
or series. We test:

  precise + amd64
  utopic + amd64
  trusty + amd64
  trusty + i386
  trusty + ppc64
  trusty + arm64

Cloud tests always do a deploy and an upgrade because both scenarios
use simple streams, which are also under test. CI is testing
juju-release-tools too since juju isn't very useful unless it is
packaged and tools are published to the CPCs.

There is a large class of functional tests and some integration tests
with other software that we need to add this cycle.

> I have the feeling, though, that "better CI" might be making some developers
> a bit more lax and doing less direct testing themselves, because they expect
> that CI will catch things they don't.

I don't feel this. I think the problem is the complexity of Juju.
Mongo changes for HA broke the backup-restore feature; I think these
are different areas of expertise that needed better coordination.

> I like the stop-the-line-when-CI-is-broken, as long as we have reliable ways
> to stop it. Given the timescales we're working on, I'd probably be ok with
> having it be a manual thing, so that when Azure decides to rev their API and
> break everything that used to work, we aren't immediately crippled. Maybe we
> can identify a subset of CI that is reliable (or high priority) enough that
> it really is automatically stop-the-line worthy. (Trusty unit tests, PPC
> unit tests, local provider, ec2 tests come to mind.)

Cloud failures are not regressions in juju code. I spend a day or more
a week tweaking CI to give Juju the best chance of success. I might
change a test, or write a script that cleans up the resources in
cloud/host.

Since I am taking time to give juju more chances to pass, I delay
reporting the bugs. 5 revisions might merge while I prove that juju is
really broken. Since the defect can mutate with the extra commits, it
isn't easy to identify the 1 or more revisions that are at fault.

When we report a "ci regression", it is something we genuinely believe
used to work, because it passes when we retest an old revision. I do
provide a list of commits that can be investigated.

As for automating a stop the line policy, we might be fine with a
small hack to the git-merge-juju job to check for commits that claim
to fix a regression; when that is not the case, the job fails early with the
reason that we are waiting for a specific fix. Rollback is always an
option.
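
A rough sketch of what that early check could look like, assuming the job
has the commit message and a list of blocking bug numbers to hand. How
those bug numbers are obtained is left open here. The `check_merge` name
and the "lp:NNN" / "bug #NNN" message convention are assumptions for
illustration, not something the git-merge-juju job is known to use.

```python
import re

# Assumed convention: a fix names its bug as "lp:1335328", "bug #1335328"
# or "bug 1335328" somewhere in the commit message.
BUG_REF = re.compile(r"\b(?:lp:|bug\s*#?)(\d+)", re.IGNORECASE)


def check_merge(commit_message, blocking_bugs):
    """Return (allowed, reason) for a proposed merge.

    blocking_bugs is a collection of bug numbers for open regressions.
    When it is empty, every merge is allowed.
    """
    if not blocking_bugs:
        return True, "no open regressions"
    claimed = {int(n) for n in BUG_REF.findall(commit_message)}
    fixed = claimed & set(blocking_bugs)
    if fixed:
        names = ", ".join(f"bug {n}" for n in sorted(fixed))
        return True, f"claims to fix {names}"
    waiting = ", ".join(f"bug {n}" for n in sorted(blocking_bugs))
    return False, f"waiting for a fix for {waiting}"
```

A revert counts as a fix in this scheme as long as its message names the
blocking bug, which keeps rollback available as an option.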



-- 
Curtis Hovey
Canonical Cloud Development and Operations
http://launchpad.net/~sinzui



Re: Devel is broken, we cannot release

2014-07-15 Thread Nate Finch
I think that's a fair assessment.

Perhaps the easiest fix is to install a switch QA could throw to change the
required merge message to something like !!ThisFixesCI!!


On Tue, Jul 15, 2014 at 9:57 AM, William Reade 
wrote:

> On Tue, Jul 15, 2014 at 2:51 PM, Wayne Witzel 
> wrote:
>
>> If we aren't stopping the line when our CI is in the red, then what is
>> the point of even having CI at all? If we are not prepared to adjust the
>> culture of our development, to truly halt forward progress in favor of
>> chasing down regressions, then I struggle to find the benefit that CI and
>> testing are giving us at all. Just confirming that master is broken and we
>> are still ignoring it? If master is broken, we are all broken. No
>> development you are doing should proceed that is based on a broken master.
>> That work cannot at any point be considered in good working condition. A
>> problem in master is everyone's problem.
>>
>> Bugs that are found throughout the normal operations and usage of juju
>> should be assigned a priority and queued, but regression is a sign of a
>> greater problem that should be resolved immediately. Allowing regressions
>> to not stop the line is taking the stance that we don't care about
>> deterioration in our code base.
>>
>
> +100 to this. Regressions are a Big Deal and should be treated as such;
> leaving other merges queued until such a time as the regression is fixed
> (or backed out for rework) is entirely reasonable (and I think we've got
> enough evidence that the alternative really doesn't fly effectively).
>
> Cheers
> William
>
>
>>
>>
>> On Tue, Jul 15, 2014 at 9:37 AM, Nate Finch 
>> wrote:
>>
>>> I don't think we need to stop the world to get these things fixed.  It
>>> is the responsibility of the team leads to make sure someone's actively
>>> working on fixes for regressions.  If they're not getting fixed, it's our
>>> fault.  We should have one of the team leads pick up the regression and
>>> assign someone to work on it, just like any other high priority bug.
>>>
>>>
>>>
>>> On Mon, Jul 14, 2014 at 3:05 PM, Curtis Hovey-Canonical <
>>> cur...@canonical.com> wrote:
>>>
>>>> Devel has been broken for weeks because of regressions. We cannot
>>>> release devel. The stable 1.20.0 that we release is actually older
>>>> than it appears because we had to search CI for an older revision that
>>>> worked.
>>>>
>>>> We have a systemic problem: once a regression is introduced, it blocks
>>>> the release for weeks, and we build on top of the regression. We often
>>>> see many regressions. The regressions mutate as people merge more
>>>> branches.
>>>>
>>>> The current two regressions are:
>>>> * win juju client still broken with unknown
>>>>   from  2014-06-27 which has varied as a compilation
>>>>   problem or panic during execution.
>>>>   https://bugs.launchpad.net/juju-core/+bug/1335328
>>>>
>>>> * FAIL: managedstorage_test trusty ppc64
>>>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>>>>
>>>> I think the problem is engineers are focused on their feature. They
>>>> don't see the fallout from their changes. They may hope the fix will
>>>> arrive soon, and that maybe someone else will fix it.
>>>>
>>>> I propose a change in policy. When there is a regression in CI, no
>>>> new branches can be merged except those that link to the blocking bug.
>>>> This will encourage engineers to fix the regression. One way to fix
>>>> the regression is to identify and revert the commit that broke CI.
>>>>
>>>>
>>>> --
>>>> Curtis Hovey
>>>> Canonical Cloud Development and Operations
>>>> http://launchpad.net/~sinzui
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Wayne Witzel III
>> wayne.wit...@canonical.com
>>
>>
>>
>


Re: Devel is broken, we cannot release

2014-07-15 Thread William Reade
On Tue, Jul 15, 2014 at 2:51 PM, Wayne Witzel 
wrote:

> If we aren't stopping the line when our CI is in the red, then what is the
> point of even having CI at all? If we are not prepared to adjust the
> culture of our development, to truly halt forward progress in favor of
> chasing down regressions, then I struggle to find the benefit that CI and
> testing are giving us at all. Just confirming that master is broken and we
> are still ignoring it? If master is broken, we are all broken. No
> development you are doing should proceed that is based on a broken master.
> That work cannot at any point be considered in good working condition. A
> problem in master is everyone's problem.
>
> Bugs that are found throughout the normal operations and usage of juju
> should be assigned a priority and queued, but regression is a sign of a
> greater problem that should be resolved immediately. Allowing regressions
> to not stop the line is taking the stance that we don't care about
> deterioration in our code base.
>

+100 to this. Regressions are a Big Deal and should be treated as such;
leaving other merges queued until such a time as the regression is fixed
(or backed out for rework) is entirely reasonable (and I think we've got
enough evidence that the alternative really doesn't fly effectively).

Cheers
William


>
>
> On Tue, Jul 15, 2014 at 9:37 AM, Nate Finch 
> wrote:
>
>> I don't think we need to stop the world to get these things fixed.  It is
>> the responsibility of the team leads to make sure someone's actively
>> working on fixes for regressions.  If they're not getting fixed, it's our
>> fault.  We should have one of the team leads pick up the regression and
>> assign someone to work on it, just like any other high priority bug.
>>
>>
>>
>> On Mon, Jul 14, 2014 at 3:05 PM, Curtis Hovey-Canonical <
>> cur...@canonical.com> wrote:
>>
>>> Devel has been broken for weeks because of regressions. We cannot
>>> release devel. The stable 1.20.0 that we release is actually older
>>> than it appears because we had to search CI for an older revision that
>>> worked.
>>>
>>> We have a systemic problem: once a regression is introduced, it blocks
>>> the release for weeks, and we build on top of the regression. We often
>>> see many regressions. The regressions mutate as people merge more
>>> branches.
>>>
>>> The current two regressions are:
>>> * win juju client still broken with unknown
>>>   from  2014-06-27 which has varied as a compilation
>>>   problem or panic during execution.
>>>   https://bugs.launchpad.net/juju-core/+bug/1335328
>>>
>>> * FAIL: managedstorage_test trusty ppc64
>>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>>>
>>> I think the problem is engineers are focused on their feature. They
>>> don't see the fallout from their changes. They may hope the fix will
>>> arrive soon, and that maybe someone else will fix it.
>>>
>>> I propose a change in policy. When there is a regression in CI, no
>>> new branches can be merged except those that link to the blocking bug.
>>> This will encourage engineers to fix the regression. One way to fix
>>> the regression is to identify and revert the commit that broke CI.
>>>
>>>
>>> --
>>> Curtis Hovey
>>> Canonical Cloud Development and Operations
>>> http://launchpad.net/~sinzui
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Wayne Witzel III
> wayne.wit...@canonical.com
>
>
>


Re: Devel is broken, we cannot release

2014-07-15 Thread Wayne Witzel
If we aren't stopping the line when our CI is in the red, then what is the
point of even having CI at all? If we are not prepared to adjust the
culture of our development, to truly halt forward progress in favor of
chasing down regressions, then I struggle to find the benefit that CI and
testing are giving us at all. Just confirming that master is broken and we
are still ignoring it? If master is broken, we are all broken. No
development you are doing should proceed that is based on a broken master.
That work cannot at any point be considered in good working condition. A
problem in master is everyone's problem.

Bugs that are found throughout the normal operations and usage of juju
should be assigned a priority and queued, but regression is a sign of a
greater problem that should be resolved immediately. Allowing regressions
to not stop the line is taking the stance that we don't care about
deterioration in our code base.


On Tue, Jul 15, 2014 at 9:37 AM, Nate Finch 
wrote:

> I don't think we need to stop the world to get these things fixed.  It is
> the responsibility of the team leads to make sure someone's actively
> working on fixes for regressions.  If they're not getting fixed, it's our
> fault.  We should have one of the team leads pick up the regression and
> assign someone to work on it, just like any other high priority bug.
>
>
>
> On Mon, Jul 14, 2014 at 3:05 PM, Curtis Hovey-Canonical <
> cur...@canonical.com> wrote:
>
>> Devel has been broken for weeks because of regressions. We cannot
>> release devel. The stable 1.20.0 that we release is actually older
>> than it appears because we had to search CI for an older revision that
>> worked.
>>
>> We have a systemic problem: once a regression is introduced, it blocks
>> the release for weeks, and we build on top of the regression. We often
>> see many regressions. The regressions mutate as people merge more
>> branches.
>>
>> The current two regressions are:
>> * win juju client still broken with unknown
>>   from  2014-06-27 which has varied as a compilation
>>   problem or panic during execution.
>>   https://bugs.launchpad.net/juju-core/+bug/1335328
>>
>> * FAIL: managedstorage_test trusty ppc64
>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>>
>> I think the problem is engineers are focused on their feature. They
>> don't see the fallout from their changes. They may hope the fix will
>> arrive soon, and that maybe someone else will fix it.
>>
>> I propose a change in policy. When there is a regression in CI, no
>> new branches can be merged except those that link to the blocking bug.
>> This will encourage engineers to fix the regression. One way to fix
>> the regression is to identify and revert the commit that broke CI.
>>
>>
>> --
>> Curtis Hovey
>> Canonical Cloud Development and Operations
>> http://launchpad.net/~sinzui
>>
>>
>
>
>
>


-- 
Wayne Witzel III
wayne.wit...@canonical.com


Re: Devel is broken, we cannot release

2014-07-15 Thread Nate Finch
I don't think we need to stop the world to get these things fixed.  It is
the responsibility of the team leads to make sure someone's actively
working on fixes for regressions.  If they're not getting fixed, it's our
fault.  We should have one of the team leads pick up the regression and
assign someone to work on it, just like any other high priority bug.



On Mon, Jul 14, 2014 at 3:05 PM, Curtis Hovey-Canonical <
cur...@canonical.com> wrote:

> Devel has been broken for weeks because of regressions. We cannot
> release devel. The stable 1.20.0 that we release is actually older
> than it appears because we had to search CI for an older revision that
> worked.
>
> We have a systemic problem: once a regression is introduced, it blocks
> the release for weeks, and we build on top of the regression. We often
> see many regressions. The regressions mutate as people merge more
> branches.
>
> The current two regressions are:
> * win juju client still broken with unknown
>   from  2014-06-27 which has varied as a compilation
>   problem or panic during execution.
>   https://bugs.launchpad.net/juju-core/+bug/1335328
>
> * FAIL: managedstorage_test trusty ppc64
>   from 2014-06-30 which had a secondary bug that broke compilation.
>   https://bugs.launchpad.net/juju-core/+bug/1336089
>
> I think the problem is engineers are focused on their feature. They
> don't see the fallout from their changes. They may hope the fix will
> arrive soon, and that maybe someone else will fix it.
>
> I propose a change in policy. When there is a regression in CI, no
> new branches can be merged except those that link to the blocking bug.
> This will encourage engineers to fix the regression. One way to fix
> the regression is to identify and revert the commit that broke CI.
>
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>
>


Re: Devel is broken, we cannot release

2014-07-15 Thread Aaron Bentley
On 14-07-14 11:48 PM, Ian Booth wrote:
>> 
>> * FAIL: managedstorage_test trusty ppc64 from 2014-06-30 which
>> had a secondary bug that broke compilation. 
>> https://bugs.launchpad.net/juju-core/+bug/1336089
>> 
> 
> This bug brings up another issue. The code concerned has now been
> moved off to a juju sub project - blobstorage. So the juju-core
> ppc64 tests will no longer fail.
> 
> Martin is in the process of setting up Jenkins landing jobs for all
> the sub projects (there are several). But there won't initially be
> ppc64 variants of these jobs.

Is there anything blocking ppc64 variants?  We already have a ppc64
slave set up.

Aaron


Re: Devel is broken, we cannot release

2014-07-15 Thread Aaron Bentley
On 14-07-14 11:42 PM, David Cheney wrote:
> Who is 'we' ?
> 
> Is some automated process going to manage this moratorium ? Do we 
> switch off the bot, or perturb its access ?

No, we don't want to switch off the bot, we want to limit the kinds of
branches it will accept while a regression is present.  One possible
implementation is that the bot queries Launchpad for open regressions
before attempting to land each branch.

Aaron


Re: Devel is broken, we cannot release

2014-07-15 Thread Aaron Bentley
On 14-07-14 11:38 PM, Ian Booth wrote:
>> I propose a change in policy. When there is a regression in CI,
>> no new branches can be merged except those that link to the
>> blocking bug. This will encourage engineers to fix the
>> regression. One way to fix the regression is to identify and
>> revert the commit that broke CI.
>> 
> 
> Agree in principle. However, we have seen some issues on CI whereby
> the unreliability of the underlying cloud has caused failures. So
> long as the issue identified indeed has a root cause that we can
> fix in juju itself, then we should block landings to trunk until it
> is fixed.

It's true that CI cries wolf, but Curtis is talking about bugs that CI
found and that Curtis confirmed and determined were regressions.  I don't
think we need to worry about cloud glitches there.

Aaron


Re: Devel is broken, we cannot release

2014-07-14 Thread John Meinel
It seems worthy to just run "go test github.com/juju/..." as our CI
testing, isn't it? (i.e., run all unit tests across all packages that we
write on all platforms), rather than *just* github.com/juju/juju.

I don't think we run into the combinatorial problem here (if we can run all
of the github.com/juju/juju tests, then we aren't really adding much to run
the rest of the dependencies as well).

I think having a "full bootstrap, deploy, upgrade, destroy" on all
platforms is necessary as a functional test, I'm not sure that we need to
cross product it with on-all-environments. (which *would* start to run into
combinatorial problems)

And then a "do the above" with all environments and a single source
platform is again avoiding any M*N testing (it should be M+N). I wonder if
we'd have better total coverage if we actually ran our "run against
Azure/EC2/etc" using a PPC client, just because all the developers aren't
actually using PPC clients.

I have the feeling, though, that "better CI" might be making some
developers a bit more lax and doing less direct testing themselves, because
they expect that CI will catch things they don't.

I know that *I* used to read all the bug mail, but have not been succeeding
of late. (399 unread threads). I certainly appreciate it when Curtis
summarizes what is going on with "this is broken", and I want Curtis to
feel empowered to write those messages, and not feel like it should only be
a last resort.


I like the stop-the-line-when-CI-is-broken, as long as we have reliable
ways to stop it. Given the timescales we're working on, I'd probably be ok
with having it be a manual thing, so that when Azure decides to rev their
API and break everything that used to work, we aren't immediately crippled.
Maybe we can identify a subset of CI that is reliable (or high priority)
enough that it really is automatically stop-the-line worthy. (Trusty unit
tests, PPC unit tests, local provider, ec2 tests come to mind.)

John
=:->



On Tue, Jul 15, 2014 at 8:33 AM, Ian Booth  wrote:

>
>
> On 15/07/14 14:17, Tim Penhey wrote:
> > On 15/07/14 15:48, Ian Booth wrote:
> >>>
> >>> * FAIL: managedstorage_test trusty ppc64
> >>>   from 2014-06-30 which had a secondary bug that broke compilation.
> >>>   https://bugs.launchpad.net/juju-core/+bug/1336089
> >>>
> >>
> >> This bug brings up another issue.
> >> The code concerned has now been moved off to a juju sub project - blobstorage.
> >> So the juju-core ppc64 tests will no longer fail.
> >>
> >> Martin is in the process of setting up Jenkins landing jobs for all the sub
> >> projects (there are several). But there won't initially be ppc64 variants of
> >> these jobs. So it will be possible for juju-core to now pass ppc64 testing even
> >> though sub projects it depends on may not.
> >
> > Surely this just means that we need real end to end tests on all
> > supported architectures, right?
> >
>
> In theory. The number of combinations will be large and I'm not sure we
> currently have the capacity to do that?
>
> But the issue also is that functional tests may well pass even though some
> particular unit tests fail.
>


Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth


On 15/07/14 14:17, Tim Penhey wrote:
> On 15/07/14 15:48, Ian Booth wrote:
>>>
>>> * FAIL: managedstorage_test trusty ppc64
>>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>>>
>>
>> This bug brings up another issue.
>> The code concerned has now been moved off to a juju sub project - 
>> blobstorage.
>> So the juju-core ppc64 tests will no longer fail.
>>
>> Martin is in the process of setting up Jenkins landing jobs for all the sub
>> projects (there are several). But there won't initially be ppc64 variants of
>> these jobs. So it will be possible for juju-core to now pass ppc64 testing
>> even though sub projects it depends on may not.
> 
> Surely this just means that we need real end to end tests on all
> supported architectures, right?
> 

In theory. The number of combinations will be large and I'm not sure we
currently have the capacity to do that?

But the issue also is that functional tests may well pass even though some
particular unit tests fail.



Re: Devel is broken, we cannot release

2014-07-14 Thread Tim Penhey
On 15/07/14 15:48, Ian Booth wrote:
>>
>> * FAIL: managedstorage_test trusty ppc64
>>   from 2014-06-30 which had a secondary bug that broke compilation.
>>   https://bugs.launchpad.net/juju-core/+bug/1336089
>>
> 
> This bug brings up another issue.
> The code concerned has now been moved off to a juju sub project - blobstorage.
> So the juju-core ppc64 tests will no longer fail.
> 
> Martin is in the process of setting up Jenkins landing jobs for all the sub
> projects (there are several). But there won't initially be ppc64 variants of
> these jobs. So it will be possible for juju-core to now pass ppc64 testing
> even though sub projects it depends on may not.

Surely this just means that we need real end to end tests on all
supported architectures, right?

Tim




Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth
> 
> * FAIL: managedstorage_test trusty ppc64
>   from 2014-06-30 which had a secondary bug that broke compilation.
>   https://bugs.launchpad.net/juju-core/+bug/1336089
> 

This bug brings up another issue.
The code concerned has now been moved off to a juju sub project - blobstorage.
So the juju-core ppc64 tests will no longer fail.

Martin is in the process of setting up Jenkins landing jobs for all the sub
projects (there are several). But there won't initially be ppc64 variants of
these jobs. So it will be possible for juju-core to now pass ppc64 testing even
though sub projects it depends on may not.



Re: Devel is broken, we cannot release

2014-07-14 Thread David Cheney
Who is 'we'?

Is some automated process going to manage this moratorium? Do we
switch off the bot, or perturb its access?

On Tue, Jul 15, 2014 at 1:38 PM, Ian Booth wrote:
>>
>> I think the problem is engineers are focused on their feature. They
>> don't see the fallout from their changes. They may hope the fix will
>> arrive soon, and that maybe someone else will fix it.
>>
>> I propose a change in policy. When there is a regression in CI, no
>> new branches can be merged except those that link to the blocking bug.
>> This will encourage engineers to fix the regression. One way to fix
>> the regression is to identify and revert the commit that broke CI.
>>
>
> Agree in principle. However, we have seen some issues on CI whereby the
> unreliability of the underlying cloud has caused failures. So long as the
> issue identified indeed has a root cause that we can fix in juju itself,
> then we should block landings to trunk until it is fixed.
>
>


Re: Devel is broken, we cannot release

2014-07-14 Thread Ian Booth
> 
> I think the problem is engineers are focused on their feature. They
> don't see the fallout from their changes. They may hope the fix will
> arrive soon, and that maybe someone else will fix it.
> 
> I propose a change in policy. When there is a regression in CI, no
> new branches can be merged except those that link to the blocking bug.
> This will encourage engineers to fix the regression. One way to fix
> the regression is to identify and revert the commit that broke CI.
> 

Agree in principle. However, we have seen some issues on CI whereby the
unreliability of the underlying cloud has caused failures. So long as the issue
identified indeed has a root cause that we can fix in juju itself, then we
should block landings to trunk until it is fixed.
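
To make the proposed rule concrete, here is a minimal sketch of the landing check, assuming the blocking list only contains bugs already triaged as regressions in juju itself (not cloud flakiness). The function name and its arguments are hypothetical and do not describe the actual landing bot; the bug numbers are the two regressions reported in this thread.

```python
def may_land(linked_bugs, blocking_bugs):
    """Return True if a branch may merge into trunk.

    linked_bugs: bug numbers the branch's merge proposal links to.
    blocking_bugs: open regressions triaged as fixable in juju itself.
    """
    blocking = set(blocking_bugs)
    if not blocking:
        # No open regression, so trunk is open to everyone.
        return True
    # While blocked, only branches that address a blocking bug may land.
    return bool(blocking & set(linked_bugs))


# The two regressions currently blocking devel.
current_blockers = [1335328, 1336089]
fix_branch_ok = may_land([1336089], current_blockers)
feature_branch_ok = may_land([], current_blockers)
```

Here a branch linked to bug 1336089 would be allowed to land, while an unrelated feature branch would wait until the blocking list is empty again.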




Re: Devel is broken, we cannot release

2014-07-14 Thread Wayne Witzel
+1

I've experienced this type of policy and it leads to a few things: more
tests, better tests, and better self-reviews and developer QA.

I believe borrowing the other ideas from lean and agile but not having the
big stop button when a defect is found is an unsustainable approach to
development, and with the recent growth in the number of people actively
working on the code base we are now experiencing that first hand.

On Jul 14, 2014 3:06 PM, "Curtis Hovey-Canonical" wrote:

> Devel has been broken for weeks because of regressions. We cannot
> release devel. The stable 1.20.0 that we released is actually older
> than it appears because we had to search CI for an older revision that
> worked.
>
> We have a systemic problem: once a regression is introduced, it blocks
> the release for weeks, and we build on top of the regression. We often
> see many regressions. The regressions mutate as people merge more
> branches.
>
> The current two regressions are:
> * win juju client still broken with unknown
>   from 2014-06-27 which has varied as a compilation
>   problem or panic during execution.
>   https://bugs.launchpad.net/juju-core/+bug/1335328
>
> * FAIL: managedstorage_test trusty ppc64
>   from 2014-06-30 which had a secondary bug that broke compilation.
>   https://bugs.launchpad.net/juju-core/+bug/1336089
>
> I think the problem is engineers are focused on their feature. They
> don't see the fallout from their changes. They may hope the fix will
> arrive soon, and that maybe someone else will fix it.
>
> I propose a change in policy. When there is a regression in CI, no
> new branches can be merged except those that link to the blocking bug.
> This will encourage engineers to fix the regression. One way to fix
> the regression is to identify and revert the commit that broke CI.
>
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>