Re: High Availability command line interface - future plans.

2013-11-11 Thread roger peppe
In the end I think it comes down to a philosophical difference.

I believe in implementing systems from the bottom up out of well
understood simple-as-possible components with easily understood
properties.  I am aware that another approach is to start with a partially
implemented primitive that represents a design goal and fill out its
implementation until it meets that goal.

In this discussion, ensure-ha seems to me to epitomise the second
approach and I do understand the arguments for it. With reference to
Mark's mention of the inmates running the asylum, I realise that, by
that analogy, I am most certainly an inmate here. My ideas about what
might make for a solid and straightforward tool to use are biased by my
knowledge of the structure of the system.

William's response is clear, so ensure-ha it is.  I'm afraid we're back
where we started but I've found this conversation useful and hope that
others have too.

 I don't have the will to bike-shed around the actual command we use,
 however I strongly suggest that we go with something that makes sense to
 Jorge and Marco (and to our CTS folks) as they are our people on the
 ground, using this tool.

It would have been great to have had feedback from the CTS folks (possibly
the biggest current operational users of Juju?) on this.

  cheers,
rog.

On 11 November 2013 03:50, Tim Penhey tim.pen...@canonical.com wrote:
 On 09/11/13 03:04, roger peppe wrote:
 On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
 juju add-state-server --api-only-please-thanks

 And if we want to allow a machine that runs the environment-manager
 workers but not the api server or mongo server (not actually an unlikely 
 thing
 given certain future possibilities) then add-state-server is a command that
 doesn't necessarily add a state server at all... That thought
 was the source of my doubt.

 I think that it is reasonable to think of just the db and the api server
 from the user's point of view.

 The fact that we may run other workers along side the api server is up
 to us, and not something we actually need to expose to people.

 Most of our users should have no problem at all understanding juju:db
 and juju:api (or whatever names we call them).

 That said, it's just a spelling. If there's general agreement on 
 state-server,
 so be it - I'm very happy to move forward with that.

 I cringe whenever I see state used anywhere.

 I would like use to move towards namespaced services with a common
 understanding, but I'm happy to have that significantly down the line.

 Just remember that whatever command we come up with, it needs to be
 easily explained to our new users.  I like the idea of a special command
 that handles the HA-ness of juju, because it means we can give
 meaningful error messages when people do things not quite right (like
 adding just one more mongo db thinking it is enough).

 I don't have the will to bike-shed around the actual command we use,
 however I strongly suggest that we go with something that makes sense to
 Jorge and Marco (and to our CTS folks) as they are our people on the
 ground, using this tool.

 Cheers,
 Tim





Re: High Availability command line interface - future plans.

2013-11-10 Thread Tim Penhey
On 09/11/13 03:04, roger peppe wrote:
 On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
 juju add-state-server --api-only-please-thanks
 
 And if we want to allow a machine that runs the environment-manager
 workers but not the api server or mongo server (not actually an unlikely thing
 given certain future possibilities) then add-state-server is a command that
 doesn't necessarily add a state server at all... That thought
 was the source of my doubt.

I think that it is reasonable to think of just the db and the api server
from the user's point of view.

The fact that we may run other workers along side the api server is up
to us, and not something we actually need to expose to people.

Most of our users should have no problem at all understanding juju:db
and juju:api (or whatever names we call them).

 That said, it's just a spelling. If there's general agreement on 
 state-server,
 so be it - I'm very happy to move forward with that.

I cringe whenever I see "state" used anywhere.

I would like us to move towards namespaced services with a common
understanding, but I'm happy to have that significantly down the line.

Just remember that whatever command we come up with, it needs to be
easily explained to our new users.  I like the idea of a special command
that handles the HA-ness of juju, because it means we can give
meaningful error messages when people do things not quite right (like
adding just one more mongo db thinking it is enough).

I don't have the will to bike-shed around the actual command we use,
however I strongly suggest that we go with something that makes sense to
Jorge and Marco (and to our CTS folks) as they are our people on the
ground, using this tool.

Cheers,
Tim




Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
I have a few high level thoughts on all of this, but the key thing I want
to say is that we need to get a meeting set up next week for the solution to
get hammered out.

First, conceptually, I don't believe the user model needs to match the
implementation model.  That way lies madness -- users care about the things
they care about and should not have to understand how the system works to
get something basic done. See:
http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
reasons why I call this madness.

For that reason I think the path of adding a --jobs flag to add-machine is
not a move forward.  It is exposing implementation detail to users and
forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An ensure-ha command
that sets up additional server nodes is better than what we have now --
nothing.  Nate is right, the box need not be black; we could have a juju
ha-status command that just shows the state of HA.   This is fundamentally
different than changing the behavior and meaning of add-machines to know
about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a
shortcut (implemented first) to the type of syntax and feature set that
Kapil is talking about.  And let's not kid ourselves, there are a bunch of
new features in that proposal:

 * Namespaces for services
 * support for subordinates to state services
 * logging changes
 * lifecycle events on juju jobs
 * special casing the removal of services that would kill the environment
 * special casing the status output to know about HA and warn about an even
number of state server nodes

I think we will be adding a new concept and some new syntax when we add HA
to juju -- so the idea is just to make it easier for users to understand,
and to allow a path forward to something like what Kapil suggests in the
future.   And I'm pretty solidly convinced that there is an incremental
path forward.

Fourth, the spelling ensure-ha is probably not a very good idea; the
cracks in that system (like taking a -n flag, and dealing with failed
machines) are already apparent.

I think something like Nick's proposal for add-manager would be better.
Though I don't think that's quite right either.

So, I propose we add one new idea for users -- a state-server.

then you'd have:

juju management --info
juju management --add
juju management --add --to 3
juju management --remove-from

I know this is not following the add-machine format, but I think it would
be better to migrate that to something more like this:

juju machine --add

--Mark Ramm





On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

 On 6 November 2013 20:07, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
  instead of adding more complexity and concepts, it would be ideal if we
  could reuse the primitives we already have. ie juju environments have
 three
  user exposed services, that users can add-unit / remove-unit etc.  they
 have
  a juju prefix and therefore are omitted by default from status listing.
  That's a much simpler story to document. how do i scale my state server..
  juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

 I have a lot of sympathy with this point of view. I've thought about
 it quite a bit.

 I see two possibilities for implementing it:

 1) Keep something like the existing architecture, where machine agents can
 take on managerial roles, but provide a veneer over the top which
 specially interprets service operations on the juju built-in services
 and translates them into operations on machine jobs.

 2) Actually implement the various juju services as proper services.

 The difficulty I have with 1) is that there's a significant mismatch
 between
 the user's view of things and what's going on underneath.
 For instance, with a built-in service, can I:

 - add a subordinate service to it?
 - see the relevant log file in the usual place for a unit?
 - see its charm metadata?
 - join to its juju-info relation?

 If it's a single service, how can its units span different series?
 (presumably it has got a charm URL, which includes the series)

 I fear that if we try this approach, the cracks show through
 and the result is a system that's hard to understand because
 too many things are not what they appear.
 And that's not even going into the plethora of special
 casing that this approach would require throughout the code.

 2) is more attractive, as it's actually doing what's written on the
 label. But this has its own problems.

 - it's a highly significant architectural change.

 - juju managerial services are tightly tied into the operation
 of juju itself (not surprisingly). There are many chicken and egg
 problems here - we would be trying to use the system to support itself,
 and that could easily lead to deadlock as one part of the system
 tries to talk to another part of the system that relies on the 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Andrew Wilkins
On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen 
mark.ramm-christen...@canonical.com wrote:

 I have a few high level thoughts on all of this, but the key thing I want
 to say is that we need to get a meeting setup next week for the solution to
 get hammered out.

 First, conceptually, I don't believe the user model needs to match the
 implementation model.  That way lies madness -- users care about the things
 they care about and should not have to understand how the system works to
 get something basic done. See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
 reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to add-machine is
 not a move forward.  It is exposing implementation detail to users and
 forcing them into a more complex conceptual model.

 Second, we don't have to boil the ocean all at once. An ensure-ha
 command that sets up additional server nodes is better than what we have
 now -- nothing.  Nate is right, the box need not be black, we could have an
 juju ha-status command that just shows the state of HA.   This is
 fundamentally different than changing the behavior and meaning of
 add-machines to know about juju jobs and agents and forcing folks to think
 about that.

 Third, we I think it is possible to chart a course from ensure-ha as a
 shortcut (implemented first) to the type of syntax and feature set that
 Kapil is talking about.  And let's not kid ourselves, there are a bunch of
 new features in that proposal:

  * Namespaces for services
  * support for subordinates to state services
  * logging changes
  * lifecycle events on juju jobs
  * special casing the removal of services that would kill the environment
  * special casing the stats to know about HA and warn for even state
 server nodes

 I think we will be adding a new concept and some new syntax when we add HA
 to juju -- so the idea is just to make it easier for users to understand,
 and to allow a path forward to something like what Kapil suggests in the
 future.   And I'm pretty solidly convinced that there is an incremental
 path forward.

 Fourth, the spelling ensure-ha is probably not a very good idea, the
 cracks in that system (like taking a -n flag, and dealing with failed
 machines) are already apparent.

 I think something like Nick's proposal for add-manager would be better.
   Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info
 juju management --add
 juju management --add --to 3
 juju management --remove-from


Sounds good to me. Similar to how I was thinking of doing it originally,
but segregating it from add-machine etc. should prevent adding cognitive
overhead for users who don't care. Also, there's not so much leakage of
internals, and no magic (a good thing!).

I know this is not following the add-machine format, but I think it would
 be better to migrate that to something more like this:

 juju machine --add

 --Mark Ramm





 On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

 On 6 November 2013 20:07, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
  instead of adding more complexity and concepts, it would be ideal if we
  could reuse the primitives we already have. ie juju environments have
 three
  user exposed services, that users can add-unit / remove-unit etc.  they
 have
  a juju prefix and therefore are omitted by default from status listing.
  That's a much simpler story to document. how do i scale my state
 server..
  juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

 I have a lot of sympathy with this point of view. I've thought about
 it quite a bit.

 I see two possibilities for implementing it:

 1) Keep something like the existing architecture, where machine agents can
 take on managerial roles, but provide a veneer over the top which
 specially interprets service operations on the juju built-in services
 and translates them into operations on machine jobs.

 2) Actually implement the various juju services as proper services.

 The difficulty I have with 1) is that there's a significant mismatch
 between
 the user's view of things and what's going on underneath.
 For instance, with a built-in service, can I:

 - add a subordinate service to it?
 - see the relevant log file in the usual place for a unit?
 - see its charm metadata?
 - join to its juju-info relation?

 If it's a single service, how can its units span different series?
 (presumably it has got a charm URL, which includes the series)

 I fear that if we try this approach, the cracks show through
 and the result is a system that's hard to understand because
 too many things are not what they appear.
 And that's not even going into the plethora of special
 casing that this approach would require throughout the code.

 2) is more attractive, as it's actually doing what's written on the
 label. But 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
Given a bit of thought, the reasons that I proposed the sub-command
remove-from rather than just remove are both obscure enough that I should
have explained them, and wrong enough that I should not have proposed that
syntax.

I was thinking that remove always requires a machine ID and that add does
not, which made them asymmetric enough to justify a different spelling, but
a bit of further thought suggests that this is already the case
with add-unit and remove-unit, and therefore consistency is better than a
new spelling.



On Fri, Nov 8, 2013 at 5:15 PM, Andrew Wilkins andrew.wilk...@canonical.com
 wrote:

 On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen 
 mark.ramm-christen...@canonical.com wrote:

 I have a few high level thoughts on all of this, but the key thing I want
 to say is that we need to get a meeting setup next week for the solution to
 get hammered out.

 First, conceptually, I don't believe the user model needs to match the
 implementation model.  That way lies madness -- users care about the things
 they care about and should not have to understand how the system works to
 get something basic done. See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
 reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to add-machine
 is not a move forward.  It is exposing implementation detail to users and
 forcing them into a more complex conceptual model.

 Second, we don't have to boil the ocean all at once. An ensure-ha
 command that sets up additional server nodes is better than what we have
 now -- nothing.  Nate is right, the box need not be black, we could have an
 juju ha-status command that just shows the state of HA.   This is
 fundamentally different than changing the behavior and meaning of
 add-machines to know about juju jobs and agents and forcing folks to think
 about that.

 Third, we I think it is possible to chart a course from ensure-ha as a
 shortcut (implemented first) to the type of syntax and feature set that
 Kapil is talking about.  And let's not kid ourselves, there are a bunch of
 new features in that proposal:

  * Namespaces for services
  * support for subordinates to state services
  * logging changes
  * lifecycle events on juju jobs
  * special casing the removal of services that would kill the environment
  * special casing the stats to know about HA and warn for even state
 server nodes

 I think we will be adding a new concept and some new syntax when we add
 HA to juju -- so the idea is just to make it easier for users to
 understand, and to allow a path forward to something like what Kapil
 suggests in the future.   And I'm pretty solidly convinced that there is an
 incremental path forward.

 Fourth, the spelling ensure-ha is probably not a very good idea, the
 cracks in that system (like taking a -n flag, and dealing with failed
 machines) are already apparent.

 I think something like Nick's proposal for add-manager would be
 better.   Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info
 juju management --add
 juju management --add --to 3
 juju management --remove-from


 Sounds good to me. Similar to how I was thinking of doing it originally,
 but segregating it from add-machine etc. should prevent adding cognitive
 overhead for users that don't care. Also, not so much leakage of internals,
 and no magic (a good thing!)

 I know this is not following the add-machine format, but I think it would
 be better to migrate that to something more like this:

 juju machine --add

 --Mark Ramm





 On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com wrote:

 On 6 November 2013 20:07, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
  instead of adding more complexity and concepts, it would be ideal if we
  could reuse the primitives we already have. ie juju environments have
 three
  user exposed services, that users can add-unit / remove-unit etc.
  they have
  a juju prefix and therefore are omitted by default from status listing.
  That's a much simpler story to document. how do i scale my state
 server..
  juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

 I have a lot of sympathy with this point of view. I've thought about
 it quite a bit.

 I see two possibilities for implementing it:

 1) Keep something like the existing architecture, where machine agents
 can
 take on managerial roles, but provide a veneer over the top which
 specially interprets service operations on the juju built-in services
 and translates them into operations on machine jobs.

 2) Actually implement the various juju services as proper services.

 The difficulty I have with 1) is that there's a significant mismatch
 between
 the user's view of things and what's going on underneath.
 For instance, with a built-in service, can I:

 - add a subordinate 

Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 08:47, Mark Canonical Ramm-Christensen
mark.ramm-christen...@canonical.com wrote:
 I have a few high level thoughts on all of this, but the key thing I want to
 say is that we need to get a meeting setup next week for the solution to get
 hammered out.

 First, conceptually, I don't believe the user model needs to match the
 implementation model.  That way lies madness -- users care about the things
 they care about and should not have to understand how the system works to
 get something basic done. See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
 reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to add-machine is
 not a move forward.  It is exposing implementation detail to users and
 forcing them into a more complex conceptual model.

 Second, we don't have to boil the ocean all at once. An ensure-ha command
 that sets up additional server nodes is better than what we have now --
 nothing.  Nate is right, the box need not be black, we could have an juju
 ha-status command that just shows the state of HA.   This is fundamentally
 different than changing the behavior and meaning of add-machines to know
 about juju jobs and agents and forcing folks to think about that.

 Third, we I think it is possible to chart a course from ensure-ha as a
 shortcut (implemented first) to the type of syntax and feature set that
 Kapil is talking about.  And let's not kid ourselves, there are a bunch of
 new features in that proposal:

  * Namespaces for services
  * support for subordinates to state services
  * logging changes
  * lifecycle events on juju jobs
  * special casing the removal of services that would kill the environment
  * special casing the stats to know about HA and warn for even state server
 nodes

 I think we will be adding a new concept and some new syntax when we add HA
 to juju -- so the idea is just to make it easier for users to understand,
 and to allow a path forward to something like what Kapil suggests in the
 future.   And I'm pretty solidly convinced that there is an incremental path
 forward.

 Fourth, the spelling ensure-ha is probably not a very good idea, the
 cracks in that system (like taking a -n flag, and dealing with failed
 machines) are already apparent.

 I think something like Nick's proposal for add-manager would be better.
 Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info
 juju management --add
 juju management --add --to 3
 juju management --remove-from

This seems like a reasonable approach in principle (it's essentially isomorphic
to the --jobs approach AFAICS which makes me happy).

I have to say that I'm not keen on using flags to switch
the basic behaviour of a command. The interaction between
the flags can then become non-obvious (for example a --constraints
flag might be appropriate with --add but not --remove-from).

Ah, but your next message seems to go along with that.

So, to couch your proposal in terms that are consistent with the
rest of the juju commands, here's how I see it could look,
in terms of possible help output from the commands:

usage: juju add-management [options]
purpose: Add Juju management functionality to a machine,
or start a new machine with management functionality.
Any Juju machine can potentially participate as a Juju
manager - this command adds a new such manager.
Note that there should always be an odd number
of active management machines, otherwise the Juju
environment is potentially vulnerable to network
partitioning. If a management machine fails,
a new one should be started to replace it.

options:
--constraints  (= )
additional machine constraints. Ignored if --to is specified.
-e, --environment (= local)
juju environment to operate in
--series (= )
the Ubuntu series of the new machine. Ignored if --to is specified.
--to (=)
   the id of the machine to add management to. If this is not specified,
   a new machine is provisioned.

usage: juju remove-management [options] machine-id
purpose: Remove Juju management functionality from
the machine with the given id. The machine itself is not
destroyed. Note that if there are fewer than three management
machines remaining, the operation of the Juju environment
will be vulnerable to the failure of a single machine.
It is not possible to remove the last management machine.

options:
-e, --environment (= local)
juju environment to operate in

As a start, we could implement only the add-management command,
and not implement the --to flag. That would be sufficient for our
HA deliverable, I believe. The other features could be added in time
or according to customer demand.
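
For concreteness, here's a rough sketch of how the proposed flag surface might
hang together. This is only an illustration using the standard flag package and
a made-up main function, not juju-core's actual command plumbing:

package main

import (
    "flag"
    "fmt"
    "log"
)

func main() {
    var (
        constraints = flag.String("constraints", "", "additional machine constraints (ignored if --to is given)")
        series      = flag.String("series", "", "Ubuntu series of the new machine (ignored if --to is given)")
        to          = flag.String("to", "", "id of an existing machine to add management to")
        env         = flag.String("e", "local", "juju environment to operate in")
    )
    flag.Parse()

    if *to != "" {
        // Adding management to an existing machine: constraints and series
        // describe a new machine, so they make no sense in combination.
        if *constraints != "" || *series != "" {
            log.Fatal("--constraints and --series cannot be combined with --to")
        }
        fmt.Printf("adding management to machine %s in environment %s\n", *to, *env)
        return
    }
    fmt.Printf("provisioning a new management machine in %s (series=%q, constraints=%q)\n",
        *env, *series, *constraints)
}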

 I know this is not following the add-machine format, but I think it would be
 better to migrate that to something more like this:

 juju machine --add

If we are going to do that, I think we should probably change 

Re: High Availability command line interface - future plans.

2013-11-08 Thread John Arbash Meinel

On 2013-11-08 14:15, roger peppe wrote:
 On 8 November 2013 08:47, Mark Canonical Ramm-Christensen 
 mark.ramm-christen...@canonical.com wrote:
 I have a few high level thoughts on all of this, but the key
 thing I want to say is that we need to get a meeting setup next
 week for the solution to get hammered out.
 
 First, conceptually, I don't believe the user model needs to
 match the implementation model.  That way lies madness -- users
 care about the things they care about and should not have to
 understand how the system works to get something basic done.
 See: 
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140
 for reasons why I call this madness.
 
 For that reason I think the path of adding a --jobs flag to
 add-machine is not a move forward.  It is exposing implementation
 detail to users and forcing them into a more complex conceptual
 model.
 
 Second, we don't have to boil the ocean all at once. An
 ensure-ha command that sets up additional server nodes is
 better than what we have now -- nothing.  Nate is right, the box
 need not be black, we could have an juju ha-status command that
 just shows the state of HA.   This is fundamentally different
 than changing the behavior and meaning of add-machines to know 
 about juju jobs and agents and forcing folks to think about
 that.
 
 Third, we I think it is possible to chart a course from ensure-ha
 as a shortcut (implemented first) to the type of syntax and
 feature set that Kapil is talking about.  And let's not kid
 ourselves, there are a bunch of new features in that proposal:
 
 * Namespaces for services * support for subordinates to state
 services * logging changes * lifecycle events on juju jobs *
 special casing the removal of services that would kill the
 environment * special casing the stats to know about HA and warn
 for even state server nodes
 
 I think we will be adding a new concept and some new syntax when
 we add HA to juju -- so the idea is just to make it easier for
 users to understand, and to allow a path forward to something
 like what Kapil suggests in the future.   And I'm pretty solidly
 convinced that there is an incremental path forward.
 
 Fourth, the spelling ensure-ha is probably not a very good
 idea, the cracks in that system (like taking a -n flag, and
 dealing with failed machines) are already apparent.
 
 I think something like Nick's proposal for add-manager would be
 better. Though I don't think that's quite right either.
 
 So, I propose we add one new idea for users -- a state-server.
 
 then you'd have:
 
 juju management --info juju management --add juju management
 --add --to 3 juju management --remove-from
 
 This seems like a reasonable approach in principle (it's
 essentially isomorphic to the --jobs approach AFAICS which makes me
 happy).
 
 I have to say that I'm not keen on using flags to switch the basic
 behaviour of a command. The interaction between the flags can then
 become non-obvious (for example a --constraints flag might be
 appropriate with --add but not --remove-from).
 
 Ah, but your next message seems to go along with that.
 
 So, to couch your proposal in terms that are consistent with the 
 rest of the juju commands, here's how I see it could look, in terms
 of possible help output from the commands:
 
 usage: juju add-management [options] purpose: Add Juju management
 functionality to a machine, or start a new machine with management
 functionality. Any Juju machine can potentially participate as a
 Juju manager - this command adds a new such manager. Note that
 there should always be an odd number of active management machines,
 otherwise the Juju environment is potentially vulnerable to
 network partitioning. If a management machine fails, a new one
 should be started to replace it.

I would probably avoid putting such an emphasis on "any machine can be
a manager machine". But that is my personal opinion. (If you want HA
you probably want it on dedicated nodes.)

 
 options: --constraints  (= ) additional machine constraints.
 Ignored if --to is specified. -e, --environment (= local) juju
 environment to operate in --series (= ) the Ubuntu series of the
 new machine. Ignored if --to is specified. --to (=) the id of the
 machine to add management to. If this is not specified, a new
 machine is provisioned.
 
 usage: juju remove-management [options] machine-id purpose:
 Remove Juju management functionality from the machine with the
 given id. The machine itself is not destroyed. Note that if there
 are less than three management machines remaining, the operation of
 the Juju environment will be vulnerable to the failure of a single
 machine. It is not possible to remove the last management machine.
 

I would probably also remove the machine if the only thing on it was
the management. Certainly that is how people want us to do juju
remove-unit.


 options: -e, --environment (= local) juju environment to operate
 in
 
 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel
j...@arbash-meinel.com wrote:
 I would probably avoid putting such an emphasis on any machine can be
 a manager machine. But that is my personal opinion. (If you want HA
 you probably want it on dedicated nodes.)

Resource waste holds juju back for the small users. Being able to
share a state server with other resources does sound attractive from
that perspective. It may be the difference between running 3 machines
or 6.

 I would probably also remove the machine if the only thing on it was
 the management. Certainly that is how people want us to do juju
 remove-unit.

If there are other units in the same machine, we should definitely not
remove the machine on remove-unit. The principle sounds the same with
state servers.

 The main problem with this is that it feels slightly too easy to add
 just 1 machine and then not actually have HA (mongo stops allowing
 writes if you have a 2-node cluster and lose one, right?)

+1


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
On Fri, Nov 8, 2013 at 6:34 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel
 j...@arbash-meinel.com wrote:
  I would probably avoid putting such an emphasis on any machine can be
  a manager machine. But that is my personal opinion. (If you want HA
  you probably want it on dedicated nodes.)

 Resource waste holds juju back for the small users. Being able to
 share a state server with other resources does sound attractive from
 that perspective. It may be the difference between running 3 machines
 or 6.


If you only have 3 machines, do you really need HA from juju? You don't
have HA from your machines that are actually *running your service*.


  I would probably also remove the machine if the only thing on it was
  the management. Certainly that is how people want us to do juju
  remove-unit.

 If there are other units in the same machine, we should definitely not
 remove the machine on remove-unit. The principle sounds the same with
 state servers.

  The main problem with this is that it feels slightly too easy to add
  just 1 machine and then not actually have HA (mongo stops allowing
  writes if you have a 2-node cluster and lose one, right?)

 +1


Yeah, same here. I still think we need a "turn on HA mode" command that'll
bring you to 3 servers.  It doesn't have to be the Swiss Army knife that we
said before... just something to go from non-HA to a valid HA environment.


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
These are *very* good points, Mark. Taking them to heart will
definitely lead the overall feature development in a good direction.

It sounds like we should avoid using a "management" command for
anything in juju, though. Most things in juju are about management one
way or another, so "juju management" becomes very unclear and hard
to search for.

Instead, the command might be named after what we've been calling them:

juju add-state-server -n 2

For the sake of implementation convenience, it would be okay to only ever
accept -n 2 when this is first released. I can also imagine the
behavior of this command resembling add-unit in a few aspects, since a
state server is in fact code that just needs a home to run in. This
may yield other common options across them, such as machine selection.


On Fri, Nov 8, 2013 at 6:47 AM, Mark Canonical Ramm-Christensen
mark.ramm-christen...@canonical.com wrote:
 I have a few high level thoughts on all of this, but the key thing I want to
 say is that we need to get a meeting setup next week for the solution to get
 hammered out.

 First, conceptually, I don't believe the user model needs to match the
 implementation model.  That way lies madness -- users care about the things
 they care about and should not have to understand how the system works to
 get something basic done. See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
 reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to add-machine is
 not a move forward.  It is exposing implementation detail to users and
 forcing them into a more complex conceptual model.

 Second, we don't have to boil the ocean all at once. An ensure-ha command
 that sets up additional server nodes is better than what we have now --
 nothing.  Nate is right, the box need not be black, we could have an juju
 ha-status command that just shows the state of HA.   This is fundamentally
 different than changing the behavior and meaning of add-machines to know
 about juju jobs and agents and forcing folks to think about that.

 Third, we I think it is possible to chart a course from ensure-ha as a
 shortcut (implemented first) to the type of syntax and feature set that
 Kapil is talking about.  And let's not kid ourselves, there are a bunch of
 new features in that proposal:

  * Namespaces for services
  * support for subordinates to state services
  * logging changes
  * lifecycle events on juju jobs
  * special casing the removal of services that would kill the environment
  * special casing the stats to know about HA and warn for even state server
 nodes

 I think we will be adding a new concept and some new syntax when we add HA
 to juju -- so the idea is just to make it easier for users to understand,
 and to allow a path forward to something like what Kapil suggests in the
 future.   And I'm pretty solidly convinced that there is an incremental path
 forward.

 Fourth, the spelling ensure-ha is probably not a very good idea, the
 cracks in that system (like taking a -n flag, and dealing with failed
 machines) are already apparent.

 I think something like Nick's proposal for add-manager would be better.
 Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info
 juju management --add
 juju management --add --to 3
 juju management --remove-from

 I know this is not following the add-machine format, but I think it would be
 better to migrate that to something more like this:

 juju machine --add

 --Mark Ramm





 On Thu, Nov 7, 2013 at 8:16 PM, roger peppe roger.pe...@canonical.com
 wrote:

 On 6 November 2013 20:07, Kapil Thangavelu
 kapil.thangav...@canonical.com wrote:
  instead of adding more complexity and concepts, it would be ideal if we
  could reuse the primitives we already have. ie juju environments have
  three
  user exposed services, that users can add-unit / remove-unit etc.  they
  have
  a juju prefix and therefore are omitted by default from status listing.
  That's a much simpler story to document. how do i scale my state
  server..
  juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

 I have a lot of sympathy with this point of view. I've thought about
 it quite a bit.

 I see two possibilities for implementing it:

 1) Keep something like the existing architecture, where machine agents can
 take on managerial roles, but provide a veneer over the top which
 specially interprets service operations on the juju built-in services
 and translates them into operations on machine jobs.

 2) Actually implement the various juju services as proper services.

 The difficulty I have with 1) is that there's a significant mismatch
 between
 the user's view of things and what's going on underneath.
 For instance, with a built-in service, can I:

 - add a subordinate service to it?
 - see the relevant log file in the usual place for a unit?
 - see its 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 9:39 AM, Nate Finch nate.fi...@canonical.com wrote:
 If you only have 3 machines, do you really need HA from juju? You don't have
 HA from your machines that are actually running your service.

Why not? I have three machines..

 Yeah, same here. I still think we need a turn on HA mode command that'll
 bring you to 3 servers.  It doesn't have to be the swiss army knife that we
 said before... just something to go from non-HA to valid HA environment.

This looks fine:

juju add-state-server -n 2

It's easy to error if current + n is not a good number.
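
A minimal sketch of the kind of check that implies -- my own illustration, not
anything in juju-core, and ensureOddTotal is a made-up helper:

package main

import (
    "errors"
    "fmt"
)

// ensureOddTotal checks that adding n state servers to the current count
// leaves an odd total of at least three, so the replica set keeps a clear
// write majority.
func ensureOddTotal(current, n int) error {
    if n < 1 {
        return errors.New("must add at least one state server")
    }
    total := current + n
    if total < 3 || total%2 == 0 {
        return fmt.Errorf("%d state servers is not a sensible total; use an odd number of at least 3", total)
    }
    return nil
}

func main() {
    fmt.Println(ensureOddTotal(1, 2)) // <nil>: 1 + 2 = 3
    fmt.Println(ensureOddTotal(3, 1)) // error: 4 is even
}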


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 11:31, Gustavo Niemeyer gust...@niemeyer.net wrote:
 These are *very* good points, Mark. Taking them to heart will
 definitely lead into a good direction for the overall feature
 development.

 It sounds like we should avoid using a management command for
 anything in juju, though. Most things in juju are about management one
 way or the other, so juju management becomes very unclear and hard
 to search for.

 Instead, the command might be named after what we've been calling them:

 juju add-state-server -n 2

I'm not sure that state-server is the right name here.  For a start, there
are two kinds of state server, mongo and API, which we may want to scale
independently as they have totally different characteristics, and the
management workers (provisioner, etc.) also fall under the same umbrella.
"Management" has been the best I've seen so far, though
I do realise it is overly generic.

Other suggestions?

Are you suggesting that we also have destroy-state-server, BTW?

 It's easy to error if current + n is not a good number.

That seems reasonable. Do you think this needs to be transactional?
That is, if current is 2 and two people concurrently do add-state-server -n 1,
should one of those requests necessarily fail? My inclination is that we
don't need to worry too much - but YMMV.



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 10:31, John Arbash Meinel j...@arbash-meinel.com wrote:

 On 2013-11-08 14:15, roger peppe wrote:
 On 8 November 2013 08:47, Mark Canonical Ramm-Christensen
 mark.ramm-christen...@canonical.com wrote:
 I have a few high level thoughts on all of this, but the key
 thing I want to say is that we need to get a meeting setup next
 week for the solution to get hammered out.

 First, conceptually, I don't believe the user model needs to
 match the implementation model.  That way lies madness -- users
 care about the things they care about and should not have to
 understand how the system works to get something basic done.
 See:
 http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140
 for reasons why I call this madness.

 For that reason I think the path of adding a --jobs flag to
 add-machine is not a move forward.  It is exposing implementation
 detail to users and forcing them into a more complex conceptual
 model.

 Second, we don't have to boil the ocean all at once. An
 ensure-ha command that sets up additional server nodes is
 better than what we have now -- nothing.  Nate is right, the box
 need not be black, we could have an juju ha-status command that
 just shows the state of HA.   This is fundamentally different
 than changing the behavior and meaning of add-machines to know
 about juju jobs and agents and forcing folks to think about
 that.

 Third, we I think it is possible to chart a course from ensure-ha
 as a shortcut (implemented first) to the type of syntax and
 feature set that Kapil is talking about.  And let's not kid
 ourselves, there are a bunch of new features in that proposal:

 * Namespaces for services * support for subordinates to state
 services * logging changes * lifecycle events on juju jobs *
 special casing the removal of services that would kill the
 environment * special casing the stats to know about HA and warn
 for even state server nodes

 I think we will be adding a new concept and some new syntax when
 we add HA to juju -- so the idea is just to make it easier for
 users to understand, and to allow a path forward to something
 like what Kapil suggests in the future.   And I'm pretty solidly
 convinced that there is an incremental path forward.

 Fourth, the spelling ensure-ha is probably not a very good
 idea, the cracks in that system (like taking a -n flag, and
 dealing with failed machines) are already apparent.

 I think something like Nick's proposal for add-manager would be
 better. Though I don't think that's quite right either.

 So, I propose we add one new idea for users -- a state-server.

 then you'd have:

 juju management --info juju management --add juju management
 --add --to 3 juju management --remove-from

 This seems like a reasonable approach in principle (it's
 essentially isomorphic to the --jobs approach AFAICS which makes me
 happy).

 I have to say that I'm not keen on using flags to switch the basic
 behaviour of a command. The interaction between the flags can then
 become non-obvious (for example a --constraints flag might be
 appropriate with --add but not --remove-from).

 Ah, but your next message seems to go along with that.

 So, to couch your proposal in terms that are consistent with the
 rest of the juju commands, here's how I see it could look, in terms
 of possible help output from the commands:

 usage: juju add-management [options] purpose: Add Juju management
 functionality to a machine, or start a new machine with management
 functionality. Any Juju machine can potentially participate as a
 Juju manager - this command adds a new such manager. Note that
 there should always be an odd number of active management machines,
 otherwise the Juju environment is potentially vulnerable to
 network partitioning. If a management machine fails, a new one
 should be started to replace it.

 I would probably avoid putting such an emphasis on any machine can be
 a manager machine. But that is my personal opinion. (If you want HA
 you probably want it on dedicated nodes.)


 options: --constraints  (= ) additional machine constraints.
 Ignored if --to is specified. -e, --environment (= local) juju
 environment to operate in --series (= ) the Ubuntu series of the
 new machine. Ignored if --to is specified. --to (=) the id of the
 machine to add management to. If this is not specified, a new
 machine is provisioned.

 usage: juju remove-management [options] machine-id purpose:
 Remove Juju management functionality from the machine with the
 given id. The machine itself is not destroyed. Note that if there
 are less than three management machines remaining, the operation of
 the Juju environment will be vulnerable to the failure of a single
 machine. It is not possible to remove the last management machine.


 I would probably also remove the machine if the only thing on it was
 the management. Certainly that is how people want us to do juju
 remove-unit.

That seems 

Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 12:03, Gustavo Niemeyer gust...@niemeyer.net wrote:
 Splitting API and db at some point sounds sensible, but it may be easy and
 convenient to think about a state server as API+db for the time being.

I'd prefer to start with a command name that implies that possibility;
otherwise we'll end up either with a command that doesn't
describe what it actually does, or with several very similar commands
where one could suffice.

Hence my discomfort with add-state-server as a command name.



Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
We'll end up with a command that adds a state server, with a replica
of the database and an API server. That's the notion of state server
we've been using all along, and sounds quite reasonable, easy to
explain and understand.

On Fri, Nov 8, 2013 at 10:15 AM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 12:03, Gustavo Niemeyer gust...@niemeyer.net wrote:
 Splitting API and db at some point sounds sensible, but it may be easy and
 convenient to think about a state server as API+db for the time being.

 I'd prefer to start with a command name that implies that possibility;
 otherwise we'll end up either with a command that doesn't
 describe what it actually does, or more very similar commands
 where one could be sufficient.

 Hence my discomfort with add-state-server as a command name.



-- 

gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
juju add-state-server --api-only-please-thanks




On Fri, Nov 8, 2013 at 11:43 AM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 13:33, Gustavo Niemeyer gust...@niemeyer.net wrote:
 We'll end up with a command that adds a state server, with a replica
 of the database and an API server. That's the notion of state server
 we've been using all along, and sounds quite reasonable, easy to
 explain and understand.

 And when we want to split API and db, as you thought perhaps
 might be sensible at some point, what then?



-- 

gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
 juju add-state-server --api-only-please-thanks

And if we want to allow a machine that runs the environment-manager
workers but not the api server or mongo server (not actually an unlikely thing
given certain future possibilities) then add-state-server is a command that
doesn't necessarily add a state server at all... That thought
was the source of my doubt.

That said, it's just a spelling. If there's general agreement on state-server,
so be it - I'm very happy to move forward with that.



Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
Reminds me of one of my favorite quotes:

Knobs are distracting, confusing and annoying.  Personally, I'd rather
things be 90% good 100% of the time than see 90 knobs.  - Brad Fitzpatrick
on having more than one Go scheduler.

https://groups.google.com/forum/#!msg/golang-dev/eu0WzsTtNPo/pcD-zS3JkTYJ


On Fri, Nov 8, 2013 at 9:32 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:

 On Fri, Nov 8, 2013 at 12:04 PM, roger peppe roger.pe...@canonical.com
 wrote:
  On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
  juju add-state-server --api-only-please-thanks
 
  And if we want to allow a machine that runs the environment-manager
  workers but not the api server or mongo server (not actually an unlikely
 thing
  given certain future possibilities) then add-state-server is a command
 that
  doesn't necessarily add a state server at all... That thought
  was the source of my doubt.

 The fact you can organize things a thousand ways doesn't mean we
 should offer a thousand knobs. A state server is a good abstraction
 for there are management routines running there. You can define what
 that means, as long as you don't let things fall down when N/2-1
 machines fall down.


 gustavo @ http://niemeyer.net




Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 12:04 PM, roger peppe roger.pe...@canonical.com wrote:
 On 8 November 2013 13:51, Gustavo Niemeyer gust...@niemeyer.net wrote:
 juju add-state-server --api-only-please-thanks

 And if we want to allow a machine that runs the environment-manager
 workers but not the api server or mongo server (not actually an unlikely thing
 given certain future possibilities) then add-state-server is a command that
 doesn't necessarily add a state server at all... That thought
 was the source of my doubt.

The fact that you can organize things a thousand ways doesn't mean we
should offer a thousand knobs. A state server is a good abstraction
for "there are management routines running there". You can define what
that means, as long as you don't let things fall down when N/2-1
machines fall down.


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread William Reade
I'm concerned that we're (1) rehashing decisions made during the sprint and
(2) deviating from requirements in doing so.

In particular, abstracting HA away into management manipulations -- as
roger notes, pretty much isomorphic to the jobs proposal -- doesn't give
users HA so much as it gives them a limited toolkit with which they can
more-or-less construct their own HA; in particular, allowing people to use
an even number of state servers is strictly a bad thing [0], and I'm
extremely suspicious of any proposal that opens that door.
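
To see why, here is the usual replica-set majority arithmetic, sketched purely
for illustration (nothing juju-specific is assumed): a set of n voting members
keeps accepting writes only while a majority survives, so it tolerates
(n-1)/2 failures (rounding down), and an even size buys nothing over the odd
size below it.

package main

import "fmt"

// tolerated returns how many simultaneous member failures a replica set of
// size n can survive while still holding a write majority.
func tolerated(n int) int {
    return (n - 1) / 2
}

func main() {
    for n := 1; n <= 6; n++ {
        fmt.Printf("%d state servers -> survives %d failure(s)\n", n, tolerated(n))
    }
    // 2 servers tolerate no more failures than 1, and 4 no more than 3:
    // an even count adds a failure point without adding any tolerance.
}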

Of course, some will argue that mongo should be able to scale separately
from the api servers and other management tasks, and this is a worthy goal;
but in this context it sucks us down into the morass of exposing different
types of management on different machines, and ends up approaching the jobs
proposal still closer, in that it requires users to assimilate a whole load
of extra terminology in order to perform a conceptually simple function.

Conversely, ensure-ha (with a possible optional --redundancy=N flag,
defaulting to 1) is a simple model that can be simply explained: the
command's sole purpose is to ensure that juju management cannot fail as a
result of the simultaneous failure of <=N machines. It's a *user-level*
construct that will always be applicable even in the context of a more
sophisticated future language (no matter what's going on with this
complicated management/jobs business, you can run that and be assured
you'll end up with at least enough manager machines to fulfil the
requirement you clearly stated in the command line).
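
Sketching the arithmetic behind that reading of the proposal (an illustration
only, not an implementation; needed and toAdd are made-up helpers): with
--redundancy=N the target is 2N+1 managers, and re-running ensure-ha after a
failure simply tops the count back up.

package main

import "fmt"

// needed returns how many manager machines are required so that juju
// management survives the simultaneous failure of up to redundancy of them.
func needed(redundancy int) int {
    return 2*redundancy + 1
}

// toAdd is the core ensure-ha calculation in this sketch: given how many
// healthy managers exist now, how many new ones must be started? Dead
// managers don't count as healthy, so re-running the command after a failure
// naturally spins up replacements, and it never scales down implicitly.
func toAdd(healthy, redundancy int) int {
    if n := needed(redundancy) - healthy; n > 0 {
        return n
    }
    return 0
}

func main() {
    fmt.Println(toAdd(1, 1)) // fresh environment, default redundancy: add 2
    fmt.Println(toAdd(2, 1)) // one of three managers has died: add 1
    fmt.Println(toAdd(3, 1)) // target already met: add 0
}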

I haven't seen anything that makes me think that redesigning from scratch
is in any way superior to refining what we already agreed upon; and it's
distracting us from the questions of reporting and correcting manager
failure when it occurs. I assert the following series of arguments:

* users may discover at any time that they need to make an existing
environment HA, so ensure-ha is *always* a reasonable user action
* users who *don't* need an HA environment can, by definition, afford to
take the environment down and reconstruct it without HA if it becomes
unimportant
* therefore, scaling management *down* is not the highest priority for us
(but is nonetheless easily amenable to future control via the ensure-ha
command -- just explicitly set a lower redundancy number)
* similarly, allowing users to *directly* destroy management machines
enables exciting new failure modes that don't really need to exist

* the notion of HA is somewhat limited in worth when there's no way to make
a vulnerable environment robust again
* the more complexity we shovel onto the user's plate, the less likely she
is to resolve the situation correctly under stress
* the most obvious, and foolproof, command for repairing HA would be
ensure-ha itself, which could very reasonably take it upon itself to
replace manager nodes detected as down -- assuming a robust presence
implementation, which we need anyway, this (1) works trivially for machines
that die unexpectedly and (2) allows a backdoor for resolution of weird
situations: the user can manually shutdown a misbehaving manager
out-of-band, and run ensure-ha to cause a new one to be spun up in its
place; once HA is restored, the old machine will no longer be a manager, no
longer be indestructible, and can be cleaned up at leisure

* the notion is even more limited when you can't even tell when something
goes wrong
* therefore, HA state should *at least* be clearly and loudly communicated
in status
* but that's not very proactive, and I'd like to see a plan for how we're
going to respond to these situations when we detect them

* the data accessible to a manager node is sensitive, and we shouldn't
generally be putting manager nodes on dirty machines; but density is an
important consideration, and I don't think it's confusing to allow
preferred machines to be specified in ensure-ha, such that *if*
management capacity needs to be added it will be put onto those machines
before finding clean ones or provisioning new ones
* strawman syntax: juju ensure-ha --prefer-machines 11,37 to place any
additional manager tasks that may be required on the supplied machines in
order of preference (a rough sketch of this placement logic follows this
list) -- but even this falls far behind the essential goal, which is to
make HA *easy* for our users.
* (ofc, we should continue not to put units onto manager machines by
default, but allow them when forced with --to as before)
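
For concreteness, here's roughly how I imagine that reconciliation hanging
together -- names and shapes invented purely for illustration, nothing below
is a patch against juju-core. The idea is just: compute the shortfall, fill
it from the preferred machines first, then clean machines, and only then
provision new ones.

package main

import "fmt"

// plan describes what one ensure-ha run would do: which existing machines
// to promote to managers, and how many new machines to provision.
type plan struct {
    promote   []string
    provision int
}

// ensureHA is an illustrative reconciliation: given the desired manager
// count, the managers still alive, and candidate machines in order of
// preference (--prefer-machines first, then clean machines), it fills the
// shortfall from candidates and provisions whatever is still missing.
func ensureHA(want int, alive, preferred, clean []string) plan {
    short := want - len(alive)
    if short <= 0 {
        return plan{} // already at or above the requested redundancy
    }
    var p plan
    for _, m := range append(preferred, clean...) {
        if short == 0 {
            break
        }
        p.promote = append(p.promote, m)
        short--
    }
    p.provision = short
    return p
}

func main() {
    // One of three managers has died; machines 11 and 37 were named with
    // --prefer-machines, so the replacement goes onto machine 11.
    p := ensureHA(3, []string{"0", "1"}, []string{"11", "37"}, nil)
    fmt.Printf("promote %v, provision %d new machine(s)\n", p.promote, p.provision)
}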

I don't believe that any of this precludes more sophisticated management of
juju's internal functions *when* the need becomes pressing -- whether via
jobs, or namespaced pseudo-services, or whatever -- but at this stage I
think it is far better to expose the policies we're capable of supporting,
and thus leave ourselves wiggle room for the mechanism to evolve, than
to define a user-facing model that is, at best, a woolly reflection of an
internal model that's likely to change as we explore the solution space in

Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
Scaling jobs independently doesn't really get you much.  If you need 7
machines of redundancy for mongo... why would you not just also want the
API on all 7 machines?  It's 100% upside... now your API is that much more
redundant/scaled, and we already know the API and mongo run just fine
together on a single machine.

The only point at which it makes sense to break out of "just make N copies
of the whole state server" is:


   1. if you need to go beyond mongo's 12-node maximum, or
   2. if you want to somehow have HA without using up N extra machines by
   putting bits and pieces on machines also hosting units.


Neither of those seems like a critical thing we need to support in v1 of HA.
 And we should probably only try to do what is critical for v1.
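
FWIW the failure arithmetic behind "just make N copies" is easy to show (an
illustrative snippet, not mongo or juju code, and it treats every member as
a voter, which is itself a simplification): a replica set of M members keeps
a primary only while a strict majority is up, so it survives (M-1)/2
simultaneous failures -- which is also why an even member count buys nothing
over the next odd number down.

package main

import "fmt"

// tolerated returns how many simultaneous member failures a replica set of
// the given size can survive while still retaining a strict majority.
func tolerated(members int) int {
    return (members - 1) / 2
}

func main() {
    for m := 3; m <= 12; m++ {
        fmt.Printf("%2d members -> survives %d failure(s)\n", m, tolerated(m))
    }
}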


On Fri, Nov 8, 2013 at 11:00 AM, William Reade
william.re...@canonical.com wrote:

 I'm concerned that we're (1) rehashing decisions made during the sprint
 and (2) deviating from requirements in doing so.

 In particular, abstracting HA away into management manipulations -- as
 roger notes, pretty much isomorphic to the jobs proposal -- doesn't give
 users HA so much as it gives them a limited toolkit with which they can
 more-or-less construct their own HA; in particular, allowing people to use
 an even number of state servers is strictly a bad thing [0], and I'm
 extremely suspicious of any proposal that opens that door.

 Of course, some will argue that mongo should be able to scale separately
 from the api servers and other management tasks, and this is a worthy goal;
 but in this context it sucks us down into the morass of exposing different
 types of management on different machines, and ends up approaching the jobs
 proposal still closer, in that it requires users to assimilate a whole load
 of extra terminology in order to perform a conceptually simple function.

 Conversely, ensure-ha (with possible optional --redundancy=N flag,
 defaulting to 1) is a simple model that can be simply explained: the
 command's sole purpose is to ensure that juju management cannot fail as a
 result of the simultaneous failure of up to N machines. It's a *user-level*
 construct that will always be applicable even in the context of a more
 sophisticated future language (no matter what's going on with this
 complicated management/jobs business, you can run that and be assured
 you'll end up with at least enough manager machines to fulfil the
 requirement you clearly stated in the command line).

 I haven't seen anything that makes me think that redesigning from scratch
 is in any way superior to refining what we already agreed upon; and it's
 distracting us from the questions of reporting and correcting manager
 failure when it occurs. I assert the following series of arguments:

 * users may discover at any time that they need to make an existing
 environment HA, so ensure-ha is *always* a reasonable user action
 * users who *don't* need an HA environment can, by definition, afford to
 take the environment down and reconstruct it without HA if it becomes
 unimportant
 * therefore, scaling management *down* is not the highest priority for us
 (but is nonetheless easily amenable to future control via the ensure-ha
 command -- just explicitly set a lower redundancy number)
 * similarly, allowing users to *directly* destroy management machines
 enables exciting new failure modes that don't really need to exist

 * the notion of HA is somewhat limited in worth when there's no way to
 make a vulnerable environment robust again
 * the more complexity we shovel onto the user's plate, the less likely she
 is to resolve the situation correctly under stress
 * the most obvious, and foolproof, command for repairing HA would be
 ensure-ha itself, which could very reasonably take it upon itself to
 replace manager nodes detected as down -- assuming a robust presence
 implementation, which we need anyway, this (1) works trivially for machines
 that die unexpectedly and (2) allows a backdoor for resolution of weird
 situations: the user can manually shutdown a misbehaving manager
 out-of-band, and run ensure-ha to cause a new one to be spun up in its
 place; once HA is restored, the old machine will no longer be a manager, no
 longer be indestructible, and can be cleaned up at leisure

 * the notion is even more limited when you can't even tell when something
 goes wrong
 * therefore, HA state should *at least* be clearly and loudly communicated
 in status
 * but that's not very proactive, and I'd like to see a plan for how we're
 going to respond to these situations when we detect them

 * the data accessible to a manager node is sensitive, and we shouldn't
 generally be putting manager nodes on dirty machines; but density is an
 important consideration, and I don't think it's confusing to allow
 preferred machines to be specified in ensure-ha, such that *if*
 management capacity needs to be added it will be put onto those machines
 before finding clean ones or provisioning new ones
 * 

Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
  On Fri, Nov 8, 2013 at 7:31 PM, Gustavo Niemeyer gust...@niemeyer.net wrote:


 It sounds like we should avoid using a "management" command for
 anything in juju, though. Most things in juju are about management one
 way or the other, so "juju management" becomes very unclear and hard
 to search for.


I'd also considered this spelling at one point in my doodling on CLI API
yesterday:

juju ha setup --to [list, of, machines]
creates 3 servers (optionally on the specified machines)

juju ha status
tells me details about the state server status

juju ha add-servers
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
It doesn't feel like the difference between

juju ensure-ha --prefer-machines 11,37

and

juju add-state-server --to 11,37

is worth the amount of reasoning there.  I'm clearly in favor of the
latter, but I wouldn't argue so much for it.


On Fri, Nov 8, 2013 at 2:00 PM, William Reade
william.re...@canonical.com wrote:
 I'm concerned that we're (1) rehashing decisions made during the sprint and
 (2) deviating from requirements in doing so.

 In particular, abstracting HA away into management manipulations -- as
 roger notes, pretty much isomorphic to the jobs proposal -- doesn't give
 users HA so much as it gives them a limited toolkit with which they can
 more-or-less construct their own HA; in particular, allowing people to use
 an even number of state servers is strictly a bad thing [0], and I'm
 extremely suspicious of any proposal that opens that door.

 Of course, some will argue that mongo should be able to scale separately
 from the api servers and other management tasks, and this is a worthy goal;
 but in this context it sucks us down into the morass of exposing different
 types of management on different machines, and ends up approaching the jobs
 proposal still closer, in that it requires users to assimilate a whole load
 of extra terminology in order to perform a conceptually simple function.

 Conversely, ensure-ha (with possible optional --redundancy=N flag,
 defaulting to 1) is a simple model that can be simply explained: the
 command's sole purpose is to ensure that juju management cannot fail as a
 result of the simultaneous failure of up to N machines. It's a *user-level*
 construct that will always be applicable even in the context of a more
 sophisticated future language (no matter what's going on with this
 complicated management/jobs business, you can run that and be assured you'll
 end up with at least enough manager machines to fulfil the requirement you
 clearly stated in the command line).

 I haven't seen anything that makes me think that redesigning from scratch is
 in any way superior to refining what we already agreed upon; and it's
 distracting us from the questions of reporting and correcting manager
 failure when it occurs. I assert the following series of arguments:

 * users may discover at any time that they need to make an existing
 environment HA, so ensure-ha is *always* a reasonable user action
 * users who *don't* need an HA environment can, by definition, afford to
 take the environment down and reconstruct it without HA if it becomes
 unimportant
 * therefore, scaling management *down* is not the highest priority for us
 (but is nonetheless easily amenable to future control via the ensure-ha
 command -- just explicitly set a lower redundancy number)
 * similarly, allowing users to *directly* destroy management machines
 enables exciting new failure modes that don't really need to exist

 * the notion of HA is somewhat limited in worth when there's no way to make
 a vulnerable environment robust again
 * the more complexity we shovel onto the user's plate, the less likely she
 is to resolve the situation correctly under stress
 * the most obvious, and foolproof, command for repairing HA would be
 ensure-ha itself, which could very reasonably take it upon itself to
 replace manager nodes detected as down -- assuming a robust presence
 implementation, which we need anyway, this (1) works trivially for machines
 that die unexpectedly and (2) allows a backdoor for resolution of weird
 situations: the user can manually shutdown a misbehaving manager
 out-of-band, and run ensure-ha to cause a new one to be spun up in its
 place; once HA is restored, the old machine will no longer be a manager, no
 longer be indestructible, and can be cleaned up at leisure

 * the notion is even more limited when you can't even tell when something
 goes wrong
 * therefore, HA state should *at least* be clearly and loudly communicated
 in status
 * but that's not very proactive, and I'd like to see a plan for how we're
 going to respond to these situations when we detect them

 * the data accessible to a manager node is sensitive, and we shouldn't
 generally be putting manager nodes on dirty machines; but density is an
 important consideration, and I don't think it's confusing to allow
 preferred machines to be specified in ensure-ha, such that *if*
 management capacity needs to be added it will be put onto those machines
 before finding clean ones or provisioning new ones
 * strawman syntax: juju ensure-ha --prefer-machines 11,37 to place any
 additional manager tasks that may be required on the supplied machines in
 order of preference -- but even this falls far behind the essential goal,
 which is make HA *easy* for our users.
 * (ofc, we should continue not to put units onto manager machines by
 default, but allow them when forced with --to as before)

 I don't believe that any of this precludes more sophisticated management of
 juju's internal functions *when* the need becomes 

Re: High Availability command line interface - future plans.

2013-11-07 Thread roger peppe
On 6 November 2013 20:07, Kapil Thangavelu
kapil.thangav...@canonical.com wrote:
 instead of adding more complexity and concepts, it would be ideal if we
 could reuse the primitives we already have. ie juju environments have three
 user exposed services, that users can add-unit / remove-unit etc.  they have
 a juju prefix and therefore are omitted by default from status listing.
 That's a much simpler story to document. how do i scale my state server..
 juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view. I've thought about
it quite a bit.

I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can
take on managerial roles, but provide a veneer over the top which
specially interprets service operations on the juju built-in services
and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between
the user's view of things and what's going on underneath.
For instance, with a built-in service, can I:

- add a subordinate service to it?
- see the relevant log file in the usual place for a unit?
- see its charm metadata?
- join to its juju-info relation?

If it's a single service, how can its units span different series?
(presumably it has got a charm URL, which includes the series)

I fear that if we try this approach, the cracks show through
and the result is a system that's hard to understand because
too many things are not what they appear.
And that's not even going into the plethora of special
casing that this approach would require throughout the code.

2) is more attractive, as it's actually doing what's written on the
label. But this has its own problems.

- it's a highly significant architectural change.

- juju managerial services are tightly tied into the operation
of juju itself (not surprisingly). There are many chicken and egg
problems here - we would be trying to use the system to support itself,
and that could easily lead to deadlock as one part of the system
tries to talk to another part of the system that relies on the first.
I think it *might* be possible, but it's not gonna be easy
and I suspect nasty gotchas at the end of a long development process.

- again there are inevitably going to be many special cases
throughout the code - for instance, how does a unit
acquire the credentials it needs to talk to the API
server?

It may be that a hybrid approach is possible - for example
implementing the workers as a service and still having mongo
and the API server as machine workers. I think that's
a reasonable evolutionary step from the approach I'm proposing.


The reasoning behind my proposed approach perhaps
comes from the fact that (I'm almost ashamed to admit it)
I'm a lazy programmer. I don't like creating mountains of code
where a small amount will do almost as well.

Adding the concept of jobs on machines maps very closely
to the architecture that we have today. It is a single
extra concept for the user to understand - all the other
features (e.g. add-machine and destroy-machine) are already
exposed.
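
To make that concrete (a sketch of the shape of the model only -- the real
juju/state types differ, and the names below are invented): a machine simply
carries a set of jobs, and add-machine --jobs just sets them at provisioning
time.

package main

import "fmt"

// Job is an illustrative machine job, along the lines of the --jobs proposal.
type Job string

const (
    JobHostUnits Job = "unit"    // may host service units
    JobManageEnv Job = "manager" // runs the state/API/manager tasks
)

// Machine is a sketch of a machine record carrying its jobs.
type Machine struct {
    ID   string
    Jobs []Job
}

func (m Machine) has(j Job) bool {
    for _, job := range m.Jobs {
        if job == j {
            return true
        }
    }
    return false
}

func main() {
    // "juju add-machine --jobs manager -n 2" would add two machines shaped
    // like this one:
    m := Machine{ID: "1", Jobs: []Job{JobManageEnv}}
    fmt.Println(m.ID, "manager:", m.has(JobManageEnv), "hosts units:", m.has(JobHostUnits))
}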

I agree that in an ideal world we would scale juju meta-services
just as we would scale normal services, but I think it's actually
reasonable to have a special case here.

Allowing the user to know that machines can take on juju managerial
roles doesn't seem to be a huge ask. And we get just as much
functionality with considerably less code, which seems like a significant
win to me in terms of ongoing maintainability and agility for the future.

  cheers,
rog.

PS apologies; my last cross-post, honest! followups to
juju-dev@lists.ubuntu.com only.

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-06 Thread Nate Finch
The answer to "how does the user know how to X?" is the same as it always
has been.  Documentation.  Now, that's not to say that we still don't need
to do some work to make it intuitive... but I think that for something that
is complicated like HA, leaning on documentation a little more is ok.

More inline:

On Wed, Nov 6, 2013 at 1:49 PM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?


- As above, documentation about what it means when you see servers in juju
status labelled as Juju State Server (or whatever).

- Have actual feedback from commands:

$ juju bootstrap --high-availability
Machines 0, 1, and 2 provisioned as juju server nodes.
Juju successfully bootstrapped environment Foo in high availability mode.

or

$ juju bootstrap
Machine 0 provisioned as juju server node.
Juju successfully bootstrapped environment Foo.

$ juju ensure-ha -n 7
Enabling high availability mode with 7 juju servers.
Machines 1, 2, 3, 4, 5, and 6 provisioned as additional Juju server nodes.

$ juju ensure-ha -n 5
Reducing number of Juju server nodes to 5.
Machines 2 and 6 destroyed.

b) fix the system when a machine dies?


$ juju destroy-machine 5
Destroyed machine/5.
Automatically replacing destroyed Juju server node.
Machine/8 created as new Juju server node.


 c) scale up the system to x thousand nodes


Hopefully 12 machines is plenty of Juju servers for 5000 nodes.  We will
need to revisit this if it's not, but it seems like it should be plenty.
 As above, I think a simple -n is fine for both raising and lowering the
number of state servers.  If we get to the point of needing more than


 d) scale down the system?


 $ juju disable-ha -y
Destroyed machine/1 and machine/2.
The Juju server node for environment Foo is machine/0.
High availability mode disabled for Juju environment Foo.
-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: High Availability command line interface - future plans.

2013-11-06 Thread Kapil Thangavelu
On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?

 b) fix the system when a machine dies?

 c) scale up the system to x thousand nodes?

 d) scale down the system?

For a), we could tag a machine in the status as a state server, and
 hope that the user knows what that means.

 For b) the suggestion is that the user notice that a state server machine
 is non-responsive (as marked in status) and runs destroy-machine on it,
 which will notice that it's a state server machine and automatically
 start another one to replace it. Destroy-machine would refuse to work
 on a state server machine that seems to be alive.

 For c) we could add a flag to ensure-ha-state suggesting a desired number
 of state-server nodes.

 I'm not sure what the suggestion is for d) given that we refuse to
 destroy live state-server machines.

 Although ensure-ha-state might be a fine way to turn
 on HA initially I'm not entirely happy with expanding it to cover
 all the above cases. It seems to me like we're going
 to create a leaky abstraction that purports to be magic (just wave the
 HA wand!) and ends up being limiting, and in some cases confusing
 (Huh? I asked to destroy that machine and there's another one
 just been created)

 I believe that any user that's using HA will need to understand that
 some machines are running state servers, and when things fail, they
 will need to manage those machines individually (for example by calling
 destroy-machine).

 I also think that the solution to c) is limiting, because there is
 actually no such thing as a state server - we have at least three
 independently scalable juju components (the database servers (mongodb),
 the API servers and the environment managers) with different scaling
 characteristics. I believe that in any sufficiently large environment,
 the user will not want to scale all of those at the same rate. For example
 MongoDB will allow at most 12 members of a replica set, but a caching API
 server could potentially usefully scale up much higher than that. We could
 add more flags to ensure-ha-state (e.g.--state-server-count) but we then
 we'd lack the capability to suggest which might be grouped with which.

 PROPOSAL

 My suggestion is that we go for a slightly less magic approach
 that provides the user with the tools to manage
 their own high-availability setup, adding appropriate automation in time.

 I suggest that we let the user know that machines can run as juju server
 nodes, and provide them with the capability to *choose* which machines
 will run as server nodes and which can host units - that is, what *jobs*
 a machine will run.

 Here's a possible proposal:

 We already have an add-machine command. We'd add a --jobs flag
 to allow the user to specify the jobs that the new machine(s) will
 run. Initially we might have just two jobs, manager and unit
 - the machine can either host service units, or it can manage the
 juju environment (including running the state server database),
 or both. In time we could add finer levels of granularity to allow
 separate scalability of juju server components, without losing backwards
 compatibility.

 If the new machine is marked as a manager, it would run a mongo
 replica set peer. This *would* mean that it would be possible to have
 an even number of mongo peers, with the potential for a split vote
 if the nodes were partitioned evenly, and resulting database stasis.
 I don't *think* that would actually be a severe problem in practice.
 We would make juju status point out the potential problem very clearly,
 just as it should point out the potential problem if one of an existing
 odd-sized replica set dies. The potential problems are the same in both
 cases, and are straightforward for even a relatively naive user to avoid.

 Thus, juju ensure-ha-state is almost equivalent to:

 juju add-machine --jobs manager -n 2

 In my view, this command feels less magic than ensure-ha-state - the
 runtime implication (e.g. cost) of what's going on are easier for the
 user to understand and it requires no new entities in a user's model of
 the system.

 In addition to the new add-machine flag, we'd add a single new command,
 juju machine-jobs, which would allow the user to change the jobs
 associated with an existing machine.  That could be a later addition -
 it's not necessary in the first cut.

 With these primitives, I *think* the responsibilities of the system and
 the model to the user become clearer.  Looking back to the original
 user questions:

Re: High Availability command line interface - future plans.

2013-11-06 Thread David Cheney
+1 (million), this solution keeps coming up, and I still feel it is
the right one.

On Thu, Nov 7, 2013 at 7:07 AM, Kapil Thangavelu
kapil.thangav...@canonical.com wrote:



 On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?

 b) fix the system when a machine dies?

 c) scale up the system to x thousand nodes?

 d) scale down the system?

 For a), we could tag a machine in the status as a state server, and
 hope that the user knows what that means.

 For b) the suggestion is that the user notice that a state server machine
 is non-responsive (as marked in status) and runs destroy-machine on it,
 which will notice that it's a state server machine and automatically
 start another one to replace it. Destroy-machine would refuse to work
 on a state server machine that seems to be alive.

 For c) we could add a flag to ensure-ha-state suggesting a desired number
 of state-server nodes.

 I'm not sure what the suggestion is for d) given that we refuse to
 destroy live state-server machines.

 Although ensure-ha-state might be a fine way to turn
 on HA initially I'm not entirely happy with expanding it to cover
 all the above cases. It seems to me like we're going
 to create a leaky abstraction that purports to be magic (just wave the
 HA wand!) and ends up being limiting, and in some cases confusing
 (Huh? I asked to destroy that machine and there's another one
 just been created)

 I believe that any user that's using HA will need to understand that
 some machines are running state servers, and when things fail, they
 will need to manage those machines individually (for example by calling
 destroy-machine).

 I also think that the solution to c) is limiting, because there is
 actually no such thing as a state server - we have at least three
 independently scalable juju components (the database servers (mongodb),
 the API servers and the environment managers) with different scaling
 characteristics. I believe that in any sufficiently large environment,
 the user will not want to scale all of those at the same rate. For example
 MongoDB will allow at most 12 members of a replica set, but a caching API
 server could potentially usefully scale up much higher than that. We could
 add more flags to ensure-ha-state (e.g.--state-server-count) but we then
 we'd lack the capability to suggest which might be grouped with which.

 PROPOSAL

 My suggestion is that we go for a slightly less magic approach
 that provides the user with the tools to manage
 their own high-availability setup, adding appropriate automation in time.

 I suggest that we let the user know that machines can run as juju server
 nodes, and provide them with the capability to *choose* which machines
 will run as server nodes and which can host units - that is, what *jobs*
 a machine will run.

 Here's a possible proposal:

 We already have an add-machine command. We'd add a --jobs flag
 to allow the user to specify the jobs that the new machine(s) will
 run. Initially we might have just two jobs, manager and unit
 - the machine can either host service units, or it can manage the
 juju environment (including running the state server database),
 or both. In time we could add finer levels of granularity to allow
 separate scalability of juju server components, without losing backwards
 compatibility.

 If the new machine is marked as a manager, it would run a mongo
 replica set peer. This *would* mean that it would be possible to have
 an even number of mongo peers, with the potential for a split vote
 if the nodes were partitioned evenly, and resulting database stasis.
 I don't *think* that would actually be a severe problem in practice.
 We would make juju status point out the potential problem very clearly,
 just as it should point out the potential problem if one of an existing
 odd-sized replica set dies. The potential problems are the same in both
 cases, and are straightforward for even a relatively naive user to avoid.

 Thus, juju ensure-ha-state is almost equivalent to:

 juju add-machine --jobs manager -n 2

 In my view, this command feels less magic than ensure-ha-state - the
 runtime implication (e.g. cost) of what's going on are easier for the
 user to understand and it requires no new entities in a user's model of
 the system.

 In addition to the new add-machine flag, we'd add a single new command,
 juju machine-jobs, which would allow the user to change the jobs
 associated with an existing machine.  That could be a later addition -
 it's not necessary 

Re: High Availability command line interface - future plans.

2013-11-06 Thread Nick Veitch
just my tuppence...

Would it not be clearer to add an additional command to implement your
proposal? E.g. add-manager and possibly destroy/remove-manager.
This could also support switches for later fine control, and possibly
be less open to misinterpretation than overloading the add-machine
command?

Nick

On Wed, Nov 6, 2013 at 6:49 PM, roger peppe rogpe...@gmail.com wrote:
 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?

 b) fix the system when a machine dies?

 c) scale up the system to x thousand nodes?

 d) scale down the system?

 For a), we could tag a machine in the status as a state server, and
 hope that the user knows what that means.

 For b) the suggestion is that the user notice that a state server machine
 is non-responsive (as marked in status) and runs destroy-machine on it,
 which will notice that it's a state server machine and automatically
 start another one to replace it. Destroy-machine would refuse to work
 on a state server machine that seems to be alive.

 For c) we could add a flag to ensure-ha-state suggesting a desired number
 of state-server nodes.

 I'm not sure what the suggestion is for d) given that we refuse to
 destroy live state-server machines.

 Although ensure-ha-state might be a fine way to turn
 on HA initially I'm not entirely happy with expanding it to cover
 all the above cases. It seems to me like we're going
 to create a leaky abstraction that purports to be magic (just wave the
 HA wand!) and ends up being limiting, and in some cases confusing
 (Huh? I asked to destroy that machine and there's another one
 just been created)

 I believe that any user that's using HA will need to understand that
 some machines are running state servers, and when things fail, they
 will need to manage those machines individually (for example by calling
 destroy-machine).

 I also think that the solution to c) is limiting, because there is
 actually no such thing as a state server - we have at least three
 independently scalable juju components (the database servers (mongodb),
 the API servers and the environment managers) with different scaling
 characteristics. I believe that in any sufficiently large environment,
 the user will not want to scale all of those at the same rate. For example
 MongoDB will allow at most 12 members of a replica set, but a caching API
 server could potentially usefully scale up much higher than that. We could
 add more flags to ensure-ha-state (e.g.--state-server-count) but we then
 we'd lack the capability to suggest which might be grouped with which.

 PROPOSAL

 My suggestion is that we go for a slightly less magic approach
 that provides the user with the tools to manage
 their own high-availability setup, adding appropriate automation in time.

 I suggest that we let the user know that machines can run as juju server
 nodes, and provide them with the capability to *choose* which machines
 will run as server nodes and which can host units - that is, what *jobs*
 a machine will run.

 Here's a possible proposal:

 We already have an add-machine command. We'd add a --jobs flag
 to allow the user to specify the jobs that the new machine(s) will
 run. Initially we might have just two jobs, manager and unit
 - the machine can either host service units, or it can manage the
 juju environment (including running the state server database),
 or both. In time we could add finer levels of granularity to allow
 separate scalability of juju server components, without losing backwards
 compatibility.

 If the new machine is marked as a manager, it would run a mongo
 replica set peer. This *would* mean that it would be possible to have
 an even number of mongo peers, with the potential for a split vote
 if the nodes were partitioned evenly, and resulting database stasis.
 I don't *think* that would actually be a severe problem in practice.
 We would make juju status point out the potential problem very clearly,
 just as it should point out the potential problem if one of an existing
 odd-sized replica set dies. The potential problems are the same in both
 cases, and are straightforward for even a relatively naive user to avoid.

 Thus, juju ensure-ha-state is almost equivalent to:

 juju add-machine --jobs manager -n 2

 In my view, this command feels less magic than ensure-ha-state - the
 runtime implication (e.g. cost) of what's going on are easier for the
 user to understand and it requires no new entities in a user's model of
 the system.

 In addition to the new add-machine flag, we'd add a single new command,
 juju machine-jobs, which 

Re: High Availability command line interface - future plans.

2013-11-06 Thread Nate Finch
The answer to "how does the user know how to X?" is the same as it always
has been.  Documentation.  Now, that's not to say that we still don't need
to do some work to make it intuitive... but I think that for something that
is complicated like HA, leaning on documentation a little more is ok.

More inline:

On Wed, Nov 6, 2013 at 1:49 PM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?


- As above, documentation about what it means when you see servers in juju
status labelled as Juju State Server (or whatever).

- Have actual feedback from commands:

$ juju bootstrap --high-availability
Machines 0, 1, and 2 provisioned as juju server nodes.
Juju successfully bootstrapped environment Foo in high availability mode.

or

$ juju bootstrap
Machine 0 provisioned as juju server node.
Juju successfully bootstrapped environment Foo.

$ juju ensure-ha -n 7
Enabling high availability mode with 7 juju servers.
Machines 1, 2, 3, 4, 5, and 6 provisioned as additional Juju server nodes.

$ juju ensure-ha -n 5
Reducing number of Juju server nodes to 5.
Machines 2 and 6 destroyed.

b) fix the system when a machine dies?


$ juju destroy-machine 5
Destroyed machine/5.
Automatically replacing destroyed Juju server node.
Machine/8 created as new Juju server node.


 c) scale up the system to x thousand nodes


Hopefully 12 machines is plenty of Juju servers for 5000 nodes.  We will
need to revisit this if it's not, but it seems like it should be plenty.
 As above, I think a simple -n is fine for both raising and lowering the
number of state servers.  If we get to the point of needing more than


 d) scale down the system?


 $ juju disable-ha -y
Destroyed machine/1 and machine/2.
The Juju server node for environment Foo is machine/0.
High availability mode disabled for Juju environment Foo.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-06 Thread Nate Finch
Oops, missed the end of a thought there.  If we get to the point of needing
more than 12 server nodes (not unfathomable), then we have to start doing
some more work for our hyperscale customers, which will probably involve
much more customization and require much more knowledge of the system.

I think one of the points of making HA simple is that we don't want people
to have to learn how Juju works before they can deploy their own stuff in a
robust manner.  Keep the barrier of entry as low as possible.  We can give
general guidelines about how many Juju servers you need for N unit agents,
and then people will know what to set N to, when they do juju ensure-ha -n.

I think most people will be happy knowing there are N servers out there,
and if one goes down, another will take its place. They don't want to know
about this job and that job.  Just make it work and let me get on with my
life. That's kind of the whole point of Juju, right?


On Wed, Nov 6, 2013 at 2:56 PM, Nate Finch nate.fi...@canonical.com wrote:

 The answer to how does the user know how to X? is the same as it always
 has been.  Documentation.  Now, that's not to say that we still don't need
 to do some work to make it intuitive... but I think that for something that
 is complicated like HA, leaning on documentation a little more is ok.

 More inline:

 On Wed, Nov 6, 2013 at 1:49 PM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?


 - As above, documentation about what it means when you see servers in juju
 status labelled as Juju State Server (or whatever).

 - Have actual feedback from commands:

 $ juju bootstrap --high-availability
 Machines 0, 1, and 2 provisioned as juju server nodes.
 Juju successfully bootstrapped environment Foo in high availability mode.

 or

 $ juju bootstrap
 Machine 0 provisioned as juju server node.
 Juju successfully bootstrapped environment Foo.

 $ juju ensure-ha -n 7
 Enabling high availability mode with 7 juju servers.
 Machines 1, 2, 3, 4, 5, and 6 provisioned as additional Juju server nodes.

 $ juju ensure-ha -n 5
 Reducing number of Juju server nodes to 5.
 Machines 2 and 6 destroyed.

 b) fix the system when a machine dies?


 $ juju destroy-machine 5
 Destroyed machine/5.
 Automatically replacing destroyed Juju server node.
 Machine/8 created as new Juju server node.


 c) scale up the system to x thousand nodes


 Hopefully 12 machines is plenty of Juju servers for 5000 nodes.  We will
 need to revisit this if it's not, but it seems like it should be plenty.
  As above, I think a simple -n is fine for both raising and lowering the
 number of state servers.  If we get to the point of needing more than


 d) scale down the system?


  $ juju disable-ha -y
 Destroyed machine/1 and machine/2.
 The Juju server node for environment Foo is machine/0.
 High availability mode disabled for Juju environment Foo.


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-06 Thread Kapil Thangavelu
On Thu, Nov 7, 2013 at 2:49 AM, roger peppe rogpe...@gmail.com wrote:

 The current plan is to have a single juju ensure-ha-state juju
 command. This would create new state server machines if there are less
 than the required number (currently 3).

 Taking that as given, I'm wondering what we should do
 in the future, when users require more than a single
 big On switch for HA.

 How does the user:

 a) know about the HA machines so the costs of HA are not hidden, and that
 the implications of particular machine failures are clear?

 b) fix the system when a machine dies?

 c) scale up the system to x thousand nodes?

 d) scale down the system?

For a), we could tag a machine in the status as a state server, and
 hope that the user knows what that means.

 For b) the suggestion is that the user notice that a state server machine
 is non-responsive (as marked in status) and runs destroy-machine on it,
 which will notice that it's a state server machine and automatically
 start another one to replace it. Destroy-machine would refuse to work
 on a state server machine that seems to be alive.

 For c) we could add a flag to ensure-ha-state suggesting a desired number
 of state-server nodes.

 I'm not sure what the suggestion is for d) given that we refuse to
 destroy live state-server machines.

 Although ensure-ha-state might be a fine way to turn
 on HA initially I'm not entirely happy with expanding it to cover
 all the above cases. It seems to me like we're going
 to create a leaky abstraction that purports to be magic (just wave the
 HA wand!) and ends up being limiting, and in some cases confusing
 (Huh? I asked to destroy that machine and there's another one
 just been created)

 I believe that any user that's using HA will need to understand that
 some machines are running state servers, and when things fail, they
 will need to manage those machines individually (for example by calling
 destroy-machine).

 I also think that the solution to c) is limiting, because there is
 actually no such thing as a state server - we have at least three
 independently scalable juju components (the database servers (mongodb),
 the API servers and the environment managers) with different scaling
 characteristics. I believe that in any sufficiently large environment,
 the user will not want to scale all of those at the same rate. For example
 MongoDB will allow at most 12 members of a replica set, but a caching API
 server could potentially usefully scale up much higher than that. We could
 add more flags to ensure-ha-state (e.g.--state-server-count) but we then
 we'd lack the capability to suggest which might be grouped with which.

 PROPOSAL

 My suggestion is that we go for a slightly less magic approach
 that provides the user with the tools to manage
 their own high-availability setup, adding appropriate automation in time.

 I suggest that we let the user know that machines can run as juju server
 nodes, and provide them with the capability to *choose* which machines
 will run as server nodes and which can host units - that is, what *jobs*
 a machine will run.

 Here's a possible proposal:

 We already have an add-machine command. We'd add a --jobs flag
 to allow the user to specify the jobs that the new machine(s) will
 run. Initially we might have just two jobs, manager and unit
 - the machine can either host service units, or it can manage the
 juju environment (including running the state server database),
 or both. In time we could add finer levels of granularity to allow
 separate scalability of juju server components, without losing backwards
 compatibility.

 If the new machine is marked as a manager, it would run a mongo
 replica set peer. This *would* mean that it would be possible to have
 an even number of mongo peers, with the potential for a split vote
 if the nodes were partitioned evenly, and resulting database stasis.
 I don't *think* that would actually be a severe problem in practice.
 We would make juju status point out the potential problem very clearly,
 just as it should point out the potential problem if one of an existing
 odd-sized replica set dies. The potential problems are the same in both
 cases, and are straightforward for even a relatively naive user to avoid.

 Thus, juju ensure-ha-state is almost equivalent to:

 juju add-machine --jobs manager -n 2

 In my view, this command feels less magic than ensure-ha-state - the
 runtime implication (e.g. cost) of what's going on are easier for the
 user to understand and it requires no new entities in a user's model of
 the system.

 In addition to the new add-machine flag, we'd add a single new command,
 juju machine-jobs, which would allow the user to change the jobs
 associated with an existing machine.  That could be a later addition -
 it's not necessary in the first cut.

 With these primitives, I *think* the responsibilities of the system and
 the model to the user become clearer.  Looking back to the original
 user questions:

Re: High Availability command line interface - future plans.

2013-11-06 Thread Andrew Wilkins
On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:

 So, I haven't been involved directly in a lot of the discussion, but my 2c
 is:

 +1 to juju ensure-ha

 Users don't give a f*ck about how Juju achieves HA, they just want to know
 their
 data will survive a node outage. What Juju does under the covers to make
 that
 happen, what jobs are run on what nodes etc - that's for Juju to care
 about.


I'm not so sure about that. I expect there'll be users who want to know
*exactly* how it works, because otherwise they won't feel they can trust it
with their services. That's not to say that ensure-ha can't be trusted -
just that some users will want to know what it's doing under the covers.
Speculative, but based on past experience with banks, insurance companies,
etc.

Another thing to consider is that one person's HA is not the next person's.
I may want to disperse my state servers across multiple regions (were that
supported); you might find this costs too much in inter-region traffic.
What happens if I have a temporary outage in one region - where does
ensure-ha automatically spin up a new one? What happens when the original
comes back? Each of these things are things people may want to do
differently, because they each have different trade-offs.

I'm not really keen on ensure-ha due to the magical nature, but if it's
just a stopgap... I guess.

+1 to high level, namespaced services (juju:api, juju:db etc)

 This is a step above ensure-ha for more advanced users, but one which still
 presents the solution space in terms any IS person involved in managing
 things
 like scalable web services understands. ie there's the concept of services
 which
 process requests and those which store data, and those which insert role
 here.
 If the volume of incoming requests are such that the load on the api
 servers is
 high while the database is still coping ok, juju add-unit juju:api -n 3
 can be
 used to solve that efficiently, and vice versa. So it's all about mapping
 what
 Juju does to terms and concepts already understood, and getting the level
 of
 abstraction correct so the solution is usable by the target audience.

 Anything that involves exposing things like jobs etc is not the right way
 of
 looking at it IMO.


I had suggested something at SFO (add-machine --state) very similar to what
Roger's suggested, but I can see the arguments against it. Overloading
add-unit seems like a decent alternative.

Cheers,
Andrew
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-06 Thread Tim Penhey
On 07/11/13 15:00, Andrew Wilkins wrote:
 On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:
 
 So, I haven't been involved directly in a lot of the discussion, but
 my 2c is:
 
 +1 to juju ensure-ha
 
 Users don't give a f*ck about how Juju achieves HA, they just want
 to know their
 data will survive a node outage. What Juju does under the covers to
 make that
 happen, what jobs are run on what nodes etc - that's for Juju to
 care about.
 
  
 I'm not so sure about that. I expect there'll be users who wants to know
 *exactly* how it works, because otherwise they won't feel they can trust
 it with their services. That's not to say that ensure-ha can't be
 trusted - just that some users will want to know what it's doing under
 the covers. Speculative, but based on past experience with banks,
 insurance companies, etc.

I think if we gave no feedback at all, then yes, this would feel like
magic.  However, I'd expect us to at least say what we are doing on the
command line :-)

I think ensure-ha is sufficient for a first cut, and a way to get ha on
a running system.

For the record, we discussed that the default behaviour for ensure-ha would
be to create three manager nodes.  The user could override this by
specifying -n 5 or -n 7, or some other odd number.
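
(A sketch of the sort of check that implies -- illustrative only, not the
actual cmd/juju code -- an odd count of at least three keeps the manager
replica set able to form a majority:)

package main

import (
    "errors"
    "fmt"
)

// validateN is an illustrative check for an ensure-ha -n flag: the count
// must be odd and at least 3 so the manager replica set keeps a majority.
func validateN(n int) error {
    if n < 3 || n%2 == 0 {
        return errors.New("-n must be an odd number of at least 3")
    }
    return nil
}

func main() {
    for _, n := range []int{3, 4, 7} {
        fmt.Println(n, validateN(n))
    }
}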

 Another thing to consider is that one person's HA is not the next
 person's. I may want to disperse my state servers across multiple
 regions (were that supported); you might find this costs too much in
 inter-region traffic. What happens if I have a temporary outage in one
 region - where does ensure-ha automatically spin up a new one? What
 happens when the original comes back? Each of these things are things
 people may want to do differently, because they each have different
 trade-offs.

I agree that support over regions is an important idea, but this is way
outside the scope of this HA discussion.

AFAIK, our cross-region story is still all about cross-environment
relations, not spanning regions with one environment.

Tim

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: High Availability command line interface - future plans.

2013-11-06 Thread Marco Ceppi
Hi guys,

I'm glad j...@lists.ubuntu.com got accidentally looped in because I may not
have caught wind of this. I can understand both sides of the discussion,
one where we provide more magic and the users trust that it works and the
other where we leverage existing components and command structures of juju
to provide this magic.

I have to agree with Kapil's point about add-unit/remove-unit syntax for
Juju HA. Having had to teach and demonstrate juju to quite a few people
now, I can say juju is not an easy concept to grasp. Orchestration is really
something that people are just now starting to think about in general,
never mind how to wrap their heads around the concept and then furthermore
how we envision that concept, which is distilled in our product - juju. At
the end of the day I get it, we get it, it's easy for us because we're here
building it, but for the people out there it's a whole new language. If we
start off by saying "Oh, hey there, just run this ensure-ha command and
things will just be fantastic", that's fine, but once you open up that route
it's going to be hard to back-pedal.

We already teach "Oh, your service is popular? Just `juju add-unit
service`" - magic will happen, units will fire, and you've scaled up. You've
added an additional available unit and you're safer than you were before.
Being able to convey the same strategy for when you want to safeguard and
make Juju's bootstrap a highly available service, the natural logic would
be to `juju add-unit`. In fact I was even asked this at a Charm School
recently; I'm paraphrasing, but it was to some extent "Can I juju add-unit
bootstrap?".

Since the majority of people seem to believe that adding and removing
juju-specific services via a unique and reserved namespace is a great goal
to have, it seems that not shooting for that first would simply introduce
another awkward period of time in which we have this great feature but it's
going to change soon, so videos, blog posts, and content we produce to
promote this sheer awesomeness become stale and out of date just as soon as
the more permanent method of HA lands. For new users learning a language
this just becomes another hurdle to overcome in order to be an expert, and
one more reason to look at something else other than Juju.

Therefore, I (who really has no major say in this, simply because I'm not
capable of helping produce a solution) believe it's best to work for the
ultimate goal now instead of having to build a stop gap just to say we have
HA.

On a final note, if namespacing does become a thing, can we *please* use a
unique character for the separation of namespace:service? A ":" would be
fantastic, as calling something juju-db could very well be mistaken for, or
deployed as, another service. With `juju deploy some-random-thing juju-*` we
now have things sharing a special namespace that aren't actually special.
(Like juju-gui - though juju-gui is quite special and awesome, it's not
juju-core namespace special.)
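
To illustrate the difference (a sketch only, with invented helper names): a
":" can't appear in a legal service name, so a "juju:" namespace can never
collide with something a user deploys, whereas a "juju-" prefix obviously
can.

package main

import (
    "fmt"
    "strings"
)

// isReserved is illustrative: with a ":" separator, only names the CLI
// itself constructs (e.g. "juju:db") land in the reserved namespace,
// because ":" is not a legal character in charm/service names.
func isReserved(service string) bool {
    return strings.HasPrefix(service, "juju:")
}

func main() {
    for _, s := range []string{"juju:db", "juju-db", "juju-gui", "wordpress"} {
        fmt.Printf("%-10s reserved=%v\n", s, isReserved(s))
    }
}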

Thanks for all the awesome work you all do. I look forward to a solution,
whatever it may be, in the future!

Marco Ceppi


On Wed, Nov 6, 2013 at 9:22 PM, Tim Penhey tim.pen...@canonical.com wrote:

 On 07/11/13 15:00, Andrew Wilkins wrote:
  On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth ian.bo...@canonical.com wrote:
 
  So, I haven't been involved directly in a lot of the discussion, but
  my 2c is:
 
  +1 to juju ensure-ha
 
  Users don't give a f*ck about how Juju achieves HA, they just want
  to know their
  data will survive a node outage. What Juju does under the covers to
  make that
  happen, what jobs are run on what nodes etc - that's for Juju to
  care about.
 
 
  I'm not so sure about that. I expect there'll be users who wants to know
  *exactly* how it works, because otherwise they won't feel they can trust
  it with their services. That's not to say that ensure-ha can't be
  trusted - just that some users will want to know what it's doing under
  the covers. Speculative, but based on past experience with banks,
  insurance companies, etc.

 I think if we gave no feedback at all, then yes, this would feel like
 magic.  However, I'd expect us to at least say what we are doing on the
 command line :-)

 I think ensure-ha is sufficient for a first cut, and a way to get ha on
 a running system.

 For the record, we discussed the default behaviour for ensure-ha was to
 make three nodes of manager services.  The user could override this by
 specifying -n 5 or -n 7, or some other odd number.

  Another thing to consider is that one person's HA is not the next
  person's. I may want to disperse my state servers across multiple
  regions (were that supported); you might find this costs too much in
  inter-region traffic. What happens if I have a temporary outage in one
  region - where does ensure-ha automatically spin up a new one? What
  happens when the original comes back? Each of these things are things
  people may want