Re: Intentionally introducing failures into Juju

2014-08-16 Thread Stuart Bishop
On 14 August 2014 20:39, Gustavo Niemeyer wrote:
> On Thu, Aug 14, 2014 at 3:42 AM, Stuart Bishop wrote:
>> Further to just injecting failures, I'm interested in controlling when
>> and the order hooks can run. A sort of manual mode, which could be
>> driven by a test harness such as Amulet.
>
> This sounds quite heavyweight and intrusive. Introducing delays in
> certain actions so that the flow is altered sounds okay, but manually
> modifying the order of hooks defined to be the correct one doesn't
> sound right. The former is also more useful in stress testing, while
> the latter is going to be seldom used because people need to think
> through the cases that could explode, and then hand-code the failure
> scenario.

I think tests requiring particular hook orderings would be more common
than you believe. Stress testing, and integration testing in general,
often trip over these bugs. At the moment, we just fix them. I'd like
to be able to fix them and also add tests so they don't break
that way again. I think it will encourage robust code that works 100%
of the time, rather than the currently more common approach of working
most of the time.

Do you mean heavy and intrusive in juju-core, or at the client end? I
would only expect this to be used by test harnesses so am not
concerned about the difficulty or performance to clients.

If it is too intrusive in juju-core, I think I can implement enough of
what I need in charm-helpers (and all of it if we get a generic 'hook
retry' mechanism).


>> Perhaps all hooks in the
>> queue are initially held, and I can unhold them one at a time.
>
> You should be able to do that one with debug-hooks today already, right?

Yes, and if I put checks for certain lock files in the charm-helpers
@hook decorator, I can get similar behaviour. If I had a 'hook retry'
I could even get arbitrary ordering. Blocking is enough for now, as I
don't think I currently have use cases for arbitrary ordering.
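
To make the lock-file idea concrete, here is a minimal sketch of a hook that blocks until a test harness removes its lock file. It is written in Go only to match the wrench snippet elsewhere in this thread; charm-helpers itself is Python, and WaitForRelease, ErrStillHeld and the lock file name are all hypothetical, not existing charm-helpers or Juju APIs.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// ErrStillHeld is returned when the lock file outlives the timeout.
var ErrStillHeld = errors.New("hook still held")

// WaitForRelease blocks while lockPath exists, polling every poll interval,
// and gives up after timeout. Hypothetical helper, not a charm-helpers API.
func WaitForRelease(lockPath string, poll, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		if _, err := os.Stat(lockPath); os.IsNotExist(err) {
			return nil // lock removed: the hook may run
		}
		if time.Now().After(deadline) {
			return ErrStillHeld
		}
		time.Sleep(poll)
	}
}

// demo holds a hook with a lock file, then releases it from another goroutine.
func demo() (held, released, setupErr error) {
	dir, err := os.MkdirTemp("", "hook-locks")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)
	lock := filepath.Join(dir, "relation-changed.hold")
	if err := os.WriteFile(lock, nil, 0o600); err != nil {
		return nil, nil, err
	}
	// Nobody releases the lock yet, so a short wait times out.
	held = WaitForRelease(lock, 5*time.Millisecond, 30*time.Millisecond)
	// The "test harness" unholds the hook shortly afterwards.
	go func() {
		time.Sleep(20 * time.Millisecond)
		os.Remove(lock)
	}()
	released = WaitForRelease(lock, 5*time.Millisecond, 2*time.Second)
	return held, released, nil
}

func main() {
	held, released, err := demo()
	fmt.Println(held, released, err)
}
```

This only gives blocking. Arbitrary ordering would still need something like the 'hook retry' mentioned above.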


-- 
Stuart Bishop 

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Intentionally introducing failures into Juju

2014-08-14 Thread Gustavo Niemeyer
On Thu, Aug 14, 2014 at 3:42 AM, Stuart Bishop wrote:
> Further to just injecting failures, I'm interested in controlling when
> and the order hooks can run. A sort of manual mode, which could be
> driven by a test harness such as Amulet.

This sounds quite heavyweight and intrusive. Introducing delays in
certain actions so that the flow is altered sounds okay, but manually
changing a hook order that is defined to be the correct one doesn't
sound right. The former is also more useful in stress testing, while
the latter is going to be seldom used because people need to think
through the cases that could explode, and then hand-code the failure
scenario.

> Perhaps all hooks in the
> queue are initially held, and I can unhold them one at a time.

You should be able to do that one with debug-hooks today already, right?

> This would let me test the odd edge cases, such as peers departing
> peer relations during handshaking, or what happens when a new client

That's not just reordering, but introducing a failure scenario where a
peer unit leaves the relation. It would indeed be useful to support
that sort of failure injection, but with a proper mechanism for it
rather than fiddling with hook ordering.

> unit is added and its relation-changed hooks manages to run before the
> relation-joined hooks at the server end.

Similar case.

> If you could do this, you could inject your failures by actually
> breaking your units using juju run or juju ssh. Deploy your units, run
> the install hooks, juju ssh in breaking one of the units (rm -rf /,
> whatever), run the peer relation hooks, confirm that the service is
> still usable despite the failed unit.

Similar case as well. We should be able to inject those failures
without having to manually fiddle with hook ordering.


gustavo @ http://niemeyer.net



Re: Intentionally introducing failures into Juju

2014-08-13 Thread Stuart Bishop
On 14 August 2014 07:31, Menno Smits wrote:
> I like the idea of being able to trigger failures using the juju command line.
>
> I'm undecided about how the need to fail should be stored. An obvious
> location would be in a new collection managed by state, or even as a field
> on existing state objects and documents. The downside of this approach is
> that a connection to state will then need to be available from wherever we
> would like failures to be triggered - this isn't always possible or
> convenient.
>
> Another approach would be to have "juju inject-failure" drop files in some
> location (along the lines of what I've already implemented) using SSH. This
> has the advantage of making the failure checks easy to perform from anywhere
> with the disadvantage of making it more difficult to manage existing
> failures. There would also be some added complexity when creating failure
> files for about-to-be-created entities (e.g. the "juju deploy
> --inject-failure" case).
>
> Do you have any thoughts on this?


Beyond just injecting failures, I'm interested in controlling when
hooks run and in what order. A sort of manual mode, which could be
driven by a test harness such as Amulet. Perhaps all hooks in the
queue are initially held, and I can unhold them one at a time.

This would let me test the odd edge cases, such as peers departing
peer relations during handshaking, or what happens when a new client
unit is added and its relation-changed hook manages to run before the
relation-joined hook at the server end.

If you could do this, you could inject your failures by actually
breaking your units using juju run or juju ssh. Deploy your units, run
the install hooks, juju ssh in and break one of the units (rm -rf /,
whatever), run the peer relation hooks, and confirm that the service is
still usable despite the failed unit.
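
As a rough illustration of the manual mode described above, here is a minimal Go sketch of a gate that holds queued hooks until a test harness releases them one at a time, in whatever order it chooses. HookGate and its methods are hypothetical names invented for this sketch; nothing like it exists in juju-core or Amulet as far as this thread says.

```go
package main

import "fmt"

// HookGate is a hypothetical test-harness helper: every queued hook starts
// out held and only runs when the harness explicitly releases it.
type HookGate struct {
	held []string // hooks waiting to be released, in arrival order
	ran  []string // hooks already released, in release order
}

// Enqueue adds a hook to the held queue.
func (g *HookGate) Enqueue(hook string) {
	g.held = append(g.held, hook)
}

// Release lets the named hook run, wherever it sits in the queue.
// It reports false if no such hook is currently held.
func (g *HookGate) Release(hook string) bool {
	for i, h := range g.held {
		if h == hook {
			g.held = append(g.held[:i], g.held[i+1:]...)
			g.ran = append(g.ran, h)
			return true
		}
	}
	return false
}

// Held returns a copy of the hooks that are still waiting.
func (g *HookGate) Held() []string {
	return append([]string(nil), g.held...)
}

// Ran returns a copy of the hooks released so far, in release order.
func (g *HookGate) Ran() []string {
	return append([]string(nil), g.ran...)
}

func main() {
	g := &HookGate{}
	g.Enqueue("server/0 relation-joined")
	g.Enqueue("client/0 relation-changed")

	// Force the edge case: the client's relation-changed hook runs
	// before the server's relation-joined hook.
	g.Release("client/0 relation-changed")
	g.Release("server/0 relation-joined")
	fmt.Println(g.Ran())
}
```

A harness could break a unit between two Release calls to get the failure scenario described above.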

-- 
Stuart Bishop 



Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea of being able to trigger failures stochastically. I'll
integrate this into whatever we settle on for Juju's failure injection.


On 14 August 2014 02:29, Gustavo Niemeyer wrote:

> Ah, and one more thing: when developing the chaos-injection mechanism
> in the mgo/txn package, I also added both a "chance" parameter for
> either killing or slowing down a given breakpoint. It sounds like it
> would be useful for juju's mechanism too. If you kill every time, it's
> hard to tell whether the system would know how to retry properly.
> Killing or slowing down just sometimes, or perhaps the first 2 times
> out of every 3, for example, would enable the system to recover
> itself, and an external agent to ensure it continues to work properly.

Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea of being able to trigger failures using the juju command line.

I'm undecided about how the need to fail should be stored. An obvious
location would be in a new collection managed by state, or even as a field
on existing state objects and documents. The downside of this approach is
that a connection to state will then need to be available from wherever
we would like failures to be triggered - this isn't always possible or
convenient.

Another approach would be to have "juju inject-failure" drop files in some
location (along the lines of what I've already implemented) using SSH. This
has the advantage of making the failure checks easy to perform from
anywhere with the disadvantage of making it more difficult to manage
existing failures. There would also be some added complexity when creating
failure files for about-to-be-created entities (e.g. the "juju deploy
--inject-failure" case).

Do you have any thoughts on this?




On 14 August 2014 02:25, Gustavo Niemeyer wrote:

> That's a nice direction, Menno.
>
> The main thing that comes to mind is that it sounds quite inconvenient
> to turn the feature on. It may sound otherwise because it's so easy to
> drop files at arbitrary places in our local machines, but when dealing
> with a distributed system that knows how to spawn its own resources
> up, suddenly the "just write a file" becomes surprisingly boring and
> race prone.
>
> What about:
>
> juju inject-failure [--unit=unit] [--service=service] "?
> juju deploy [--inject-failure=name] ...

Re: Intentionally introducing failures into Juju

2014-08-13 Thread Wayne Witzel
Not much to add except to say I really like this work and I think it is
going to help us make Juju much better at handling failures. I
also like the idea of providing easy access to triggering failures through
CLI commands.


On Wed, Aug 13, 2014 at 10:29 AM, Gustavo Niemeyer <
gustavo.nieme...@canonical.com> wrote:

> Ah, and one more thing: when developing the chaos-injection mechanism
> in the mgo/txn package, I also added both a "chance" parameter for
> either killing or slowing down a given breakpoint. It sounds like it
> would be useful for juju's mechanism too. If you kill every time, it's
> hard to tell whether the system would know how to retry properly.
> Killing or slowing down just sometimes, or perhaps the first 2 times
> out of every 3, for example, would enable the system to recover
> itself, and an external agent to ensure it continues to work properly.

Re: Intentionally introducing failures into Juju

2014-08-13 Thread Gustavo Niemeyer
Ah, and one more thing: when developing the chaos-injection mechanism
in the mgo/txn package, I also added a "chance" parameter for both
killing and slowing down a given breakpoint. It sounds like it
would be useful for juju's mechanism too. If you kill every time, it's
hard to tell whether the system would know how to retry properly.
Killing or slowing down just sometimes, or perhaps the first 2 times
out of every 3, for example, would enable the system to recover
itself, and an external agent to ensure it continues to work properly.
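
As a rough sketch of what such a "chance" parameter could look like (and of the "first 2 times out of every 3" variant), here is a small self-contained Go example. The type and field names are made up for illustration and are not the mgo/txn API or anything in Juju.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Action is what a breakpoint should do on a given visit.
type Action int

const (
	Proceed Action = iota
	Kill
	Slow
)

// Chaos holds hypothetical per-breakpoint probabilities in [0, 1].
type Chaos struct {
	KillChance float64
	SlowChance float64
	rnd        *rand.Rand
}

// NewChaos uses a fixed seed so that a failing run can be reproduced.
func NewChaos(kill, slow float64, seed int64) *Chaos {
	return &Chaos{KillChance: kill, SlowChance: slow, rnd: rand.New(rand.NewSource(seed))}
}

// Decide rolls the dice once per breakpoint visit: kill first, then slow.
func (c *Chaos) Decide() Action {
	if c.rnd.Float64() < c.KillChance {
		return Kill
	}
	if c.rnd.Float64() < c.SlowChance {
		return Slow
	}
	return Proceed
}

// FirstNOfEvery fails deterministically on the first N of every M calls,
// so a retrying caller is guaranteed to get through eventually.
type FirstNOfEvery struct {
	N, M  int
	calls int
}

// ShouldFail reports whether the current call should fail.
func (f *FirstNOfEvery) ShouldFail() bool {
	fail := f.calls%f.M < f.N
	f.calls++
	return fail
}

func main() {
	c := NewChaos(0.3, 0.2, 1)
	counts := map[Action]int{}
	for i := 0; i < 1000; i++ {
		counts[c.Decide()]++
	}
	fmt.Println("proceed/kill/slow:", counts[Proceed], counts[Kill], counts[Slow])

	f := &FirstNOfEvery{N: 2, M: 3}
	for i := 0; i < 6; i++ {
		fmt.Print(f.ShouldFail(), " ")
	}
	fmt.Println()
}
```

The deterministic variant is probably the easier one to assert on from a CI test, while the random one is closer to real stress.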

-- 
gustavo @ http://niemeyer.net


Re: Intentionally introducing failures into Juju

2014-08-13 Thread Gustavo Niemeyer
That's a nice direction, Menno.

The main thing that comes to mind is that it sounds quite inconvenient
to turn the feature on. It may sound otherwise because it's so easy to
drop files at arbitrary places on our local machines, but when dealing
with a distributed system that knows how to spawn its own resources,
"just write a file" suddenly becomes surprisingly boring and
race-prone.

What about:

juju inject-failure [--unit=unit] [--service=service] "?
juju deploy [--inject-failure=name] ...
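
To show roughly how the arguments of such a command might be parsed, here is a hypothetical Go sketch. Neither "juju inject-failure" nor InjectRequest exists; this only mirrors the shape proposed above, and the standard library flag package stands in for whatever Juju's own command framework would use.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// InjectRequest is a hypothetical parsed form of the proposed command.
type InjectRequest struct {
	Unit    string // optional unit to target, e.g. "mysql/0"
	Service string // optional service to target
	Failure string // name of the failure to inject
}

// parseInjectFailure parses the arguments that would follow
// "juju inject-failure" in the proposal above.
func parseInjectFailure(args []string) (InjectRequest, error) {
	var req InjectRequest
	fs := flag.NewFlagSet("inject-failure", flag.ContinueOnError)
	fs.StringVar(&req.Unit, "unit", "", "unit to inject the failure into")
	fs.StringVar(&req.Service, "service", "", "service to inject the failure into")
	if err := fs.Parse(args); err != nil {
		return req, err
	}
	if fs.NArg() != 1 {
		return req, errors.New("expected exactly one failure name")
	}
	req.Failure = fs.Arg(0)
	return req, nil
}

func main() {
	req, err := parseInjectFailure([]string{"--unit=mysql/0", "refuse-upgrade"})
	fmt.Printf("%+v %v\n", req, err)
}
```

How the parsed request then reaches the right machine is the open question, and the sketch says nothing about it.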



On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits  wrote:
> There's been some discussion recently about adding some feature to Juju to
> allow developers or CI tests to intentionally trigger otherwise hard to
> induce failures in specific parts of Juju. The idea is that sometimes we
> need some kind of failure to happen in a CI test or when manually testing
> but those failures can often be hard to make happen.
>
> For example, for changes to Juju's upgrade mechanics that I'm working on at the
> moment I would like to ensure that an upgrade is cleanly aborted if one of
> the state servers in an HA environment refuses to start the upgrade. This
> logic is well unit tested but there's nothing like seeing it actually work
> in a real environment to build confidence - however, it isn't easy to make a
> state server misbehave in this way.
>
> To help with this kind of testing scenario, I've created a new top-level
> package called "wrench" which lets us "drop a wrench in the works" so to
> speak. It's very simple with one main API which can be called from
> judiciously chosen points in Juju's execution to decide whether some failure
> should be triggered.
>
> The module looks for files in $jujudatadir/wrench (typically
> /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
> upgrade failure described above I could drop a file in that directory on one
> of the state servers named say "machine-agent" with the content:
>
> refuse-upgrade
>
> Then in some part of jujud's upgrade code there could be a check like:
>
> if wrench.IsActive("machine-agent", "refuse-upgrade") {
>  // trigger the failure
> }
>
> The idea is this check would be left in the code to aid CI tests and future
> manual tests.
>
> You can see the incomplete wrench package here:
> https://github.com/juju/juju/pull/508
>
> There are a few issues to nut out.
>
> 1. It needs to be difficult/impossible for someone to accidentally or
> maliciously activate this feature, especially in production environments. I
> have almost finished (but not pushed to GitHub) some changes to the wrench
> package which make it strict about the ownership and permissions on the
> wrench files. This should make it harder for the wrong person to drop files
> into the wrench directory.
>
> The idea has also been floated to only enable this functionality in
> non-stable builds. This certainly gives a good level of protection but I'm
> slightly wary of this approach because it makes it impossible for CI to take
> advantage of the wrench feature when testing stable release builds. I'm
> happy to be convinced that the benefit is worth the cost.
>
> Other ideas on how to better handle this are very welcome.
>
> 2. The wrench functionality needs to be disabled during unit test runs
> because we don't want any wrench files a developer may have lying around to
> affect Juju's behaviour during test runs. The wrench package has a global
> on/off switch so I plan on switching it off in BaseSuite's setup or similar.
>
> 3. The name is a bikeshedding magnet :)  Other names that have been bandied
> about for this feature are "chaos" and "spanner". I don't care too much so
> if there's a strong consensus for another name let's use that. I chose
> "wrench" over "spanner" because I believe that's the more common usage in
> the US and because Spanner is a DB from Google. Let's not get carried
> away...
>
> All comments, ideas and concerns welcome.
>
> - Menno
>
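
For readers who want to picture the file-based check described in the quoted message, here is a minimal Go sketch. It is not the code from the pull request: the wrenchDir variable, SetEnabled and the exact permission rule are assumptions, and the ownership check mentioned in the message is left out because it is platform-specific.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Assumed package-level state for this sketch.
var (
	wrenchDir = "/var/lib/juju/wrench" // typical location named in the message
	enabled   = true                   // global on/off switch
)

// SetEnabled flips the global switch and returns the previous value,
// so a test suite could turn the feature off in its setup.
func SetEnabled(next bool) bool {
	prev := enabled
	enabled = next
	return prev
}

// IsActive reports whether the file named category in wrenchDir
// contains a line equal to feature.
func IsActive(category, feature string) bool {
	if !enabled {
		return false
	}
	path := filepath.Join(wrenchDir, category)
	info, err := os.Stat(path)
	if err != nil || !info.Mode().IsRegular() {
		return false
	}
	// Assumed rule: ignore files that group or others can write to.
	if info.Mode().Perm()&0o022 != 0 {
		return false
	}
	f, err := os.Open(path)
	if err != nil {
		return false
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) == feature {
			return true
		}
	}
	return false
}

// demo points wrenchDir at a throwaway directory and exercises IsActive.
func demo() (active, other, disabled bool, err error) {
	dir, err := os.MkdirTemp("", "wrench")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	wrenchDir = dir
	content := []byte("refuse-upgrade\n")
	if err = os.WriteFile(filepath.Join(dir, "machine-agent"), content, 0o600); err != nil {
		return
	}
	active = IsActive("machine-agent", "refuse-upgrade")
	other = IsActive("machine-agent", "something-else")
	SetEnabled(false)
	disabled = IsActive("machine-agent", "refuse-upgrade")
	SetEnabled(true)
	return
}

func main() {
	fmt.Println(demo())
}
```

Reading the file on every call keeps the sketch simple. Whether that is cheap enough for the real call sites is a separate question.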

-- 
gustavo @ http://niemeyer.net
