Re: High Availability command line interface - future plans.

2013-11-06 Thread Nick Veitch
just my tuppence...

Would it not be clearer to add an additional command to implement your
proposal? E.g. "add-manager" and possibly "destroy/remove-manager".
This could also support switches for finer control later, and would
possibly be less open to misinterpretation than overloading the
add-machine command.

Nick

On Wed, Nov 6, 2013 at 6:49 PM, roger peppe  wrote:
> The current plan is to have a single "juju ensure-ha-state" juju
> command. This would create new state server machines if there are less
> than the required number (currently 3).
>
> Taking that as given, I'm wondering what we should do
> in the future, when users require more than a single
> big On switch for HA.
>
> How does the user:
>
> a) know about the HA machines so the costs of HA are not hidden, and that
> the implications of particular machine failures are clear?
>
> b) fix the system when a machine dies?
>
> c) scale up the system to x thousand nodes?
>
> d) scale down the system?
>
> For a), we could tag a machine in the status as a "state server", and
> hope that the user knows what that means.
>
> For b) the suggestion is that the user notice that a state server machine
> is non-responsive (as marked in status) and runs destroy-machine on it,
> which will notice that it's a state server machine and automatically
> start another one to replace it. Destroy-machine would refuse to work
> on a state server machine that seems to be alive.
>
> For c) we could add a flag to ensure-ha-state suggesting a desired number
> of state-server nodes.
>
> I'm not sure what the suggestion is for d) given that we refuse to
> destroy live state-server machines.
>
> Although ensure-ha-state might be a fine way to turn
> on HA initially I'm not entirely happy with expanding it to cover
> all the above cases. It seems to me like we're going
> to create a leaky abstraction that purports to be magic ("just wave the
> HA wand!") and ends up being limiting, and in some cases confusing
> ("Huh? I asked to destroy that machine and there's another one
> just been created")
>
> I believe that any user that's using HA will need to understand that
> some machines are running state servers, and when things fail, they
> will need to manage those machines individually (for example by calling
> destroy-machine).
>
> I also think that the solution to c) is limiting, because there is
> actually no such thing as a "state server" - we have at least three
> independently scalable juju components (the database servers (mongodb),
> the API servers and the environment managers) with different scaling
> characteristics. I believe that in any sufficiently large environment,
> the user will not want to scale all of those at the same rate. For example
> MongoDB will allow at most 12 members of a replica set, but a caching API
> server could potentially usefully scale up much higher than that. We could
> add more flags to ensure-ha-state (e.g. --state-server-count) but then
> we'd lack the capability to suggest which might be grouped with which.
>
> PROPOSAL
>
> My suggestion is that we go for a "slightly less magic" approach
> that provides the user with the tools to manage their own high
> availability setup, adding appropriate automation in time.
>
> I suggest that we let the user know that machines can run as juju server
> nodes, and provide them with the capability to *choose* which machines
> will run as server nodes and which can host units - that is, what *jobs*
> a machine will run.
>
> Here's a possible proposal:
>
> We already have an "add-machine" command. We'd add a "--jobs" flag
> to allow the user to specify the jobs that the new machine(s) will
> run. Initially we might have just two jobs, "manager" and "unit"
> - the machine can either host service units, or it can manage the
> juju environment (including running the state server database),
> or both. In time we could add finer levels of granularity to allow
> separate scalability of juju server components, without losing backwards
> compatibility.
>
> If the new machine is marked as a "manager", it would run a mongo
> replica set peer. This *would* mean that it would be possible to have
> an even number of mongo peers, with the potential for a split vote
> if the nodes were partitioned evenly, and resulting database stasis.
> I don't *think* that would actually be a severe problem in practice.
> We would make juju status point out the potential problem very clearly,
> just as it should point out the potential problem if one of an existing
> odd-sized replica set dies. The potential problems are the same in both
> cases, and are straightforward for even a relatively naive user to avoid.
>
> Thus, juju ensure-ha-state is almost equivalent to:
>
> juju add-machine --jobs manager -n 2
>
> In my view, this command feels less "magic" than ensure-ha-state - the
> runtime implications (e.g. cost) of what's going on are easier for the
> user to understand and it requires no new entities in a user's model of
> the system.
>
> In addition to the new add-machine flag, we'd add a single new command,
> "juju machine-jobs", which would allow the user to change the jobs
> associated with an existing machine.  That could be a later addition -
> it's not necessary in the first cut.
>
> With these primitives, I *think* the responsib

Re: High Availability command line interface - future plans.

2013-11-06 Thread Nate Finch
The answer to "how does the user know how to X?" is the same as it always
has been.  Documentation.  Now, that's not to say that we still don't need
to do some work to make it intuitive... but I think that for something that
is complicated like HA, leaning on documentation a little more is ok.

More inline:

On Wed, Nov 6, 2013 at 1:49 PM, roger peppe  wrote:

> The current plan is to have a single "juju ensure-ha-state" juju
> command. This would create new state server machines if there are less
> than the required number (currently 3).
>
> Taking that as given, I'm wondering what we should do
> in the future, when users require more than a single
> big On switch for HA.
>
> How does the user:
>
> a) know about the HA machines so the costs of HA are not hidden, and that
> the implications of particular machine failures are clear?
>

- As above, documentation about what it means when you see servers in juju
status labelled as "Juju State Server" (or whatever).

- Have actual feedback from commands:

$ juju bootstrap --high-availability
Machines 0, 1, and 2 provisioned as juju server nodes.
Juju successfully bootstrapped environment Foo in high availability mode.

or

$ juju bootstrap
Machine 0 provisioned as juju server node.
Juju successfully bootstrapped environment Foo.

$ juju ensure-ha -n 7
Enabling high availability mode with 7 juju servers.
Machines 1, 2, 3, 4, 5, and 6 provisioned as additional Juju server nodes.

$ juju ensure-ha -n 5
Reducing number of Juju server nodes to 5.
Machines 2 and 6 destroyed.

> b) fix the system when a machine dies?
>

$ juju destroy-machine 5
Destroyed machine/5.
Automatically replacing destroyed Juju server node.
Machine/8 created as new Juju server node.


> c) scale up the system to x thousand nodes


Hopefully 12 machines is plenty of Juju servers for 5000 nodes.  We will
need to revisit this if it's not, but it seems like it should be plenty.
 As above, I think a simple -n is fine for both raising and lowering the
number of state servers.  If we get to the point of needing more than


> d) scale down the system?
>

 $ juju disable-ha -y
Destroyed machine/1 and machine/2.
The Juju server node for environment Foo is machine/0.
High availability mode disabled for Juju environment Foo.


Re: High Availability command line interface - future plans.

2013-11-06 Thread Nate Finch
Oops, missed the end of a thought there.  If we get to the point of needing
more than 12 server nodes (not unfathomable), then we have to start doing
some more work for our "hyperscale" customers, which will probably involve
much more customization and require much more knowledge of the system.

I think one of the points of making HA simple is that we don't want people
to have to learn how Juju works before they can deploy their own stuff in a
robust manner.  Keep the barrier to entry as low as possible.  We can give
general guidelines about how many Juju servers you need for N unit agents,
and then people will know what to set N to when they run juju ensure-ha -n.

I think most people will be happy knowing there are N servers out there,
and if one goes down, another will take its place. They don't want to know
about this job and that job.  Just make it work and let me get on with my
life. That's kind of the whole point of Juju, right?




Re: High Availability command line interface - future plans.

2013-11-06 Thread Kapil Thangavelu
Instead of adding more complexity and concepts, it would be ideal if we
could reuse the primitives we already have, i.e. juju environments have three
user-exposed services that users can add-unit / remove-unit etc.  They have
a juju prefix and are therefore omitted by default from the status listing.
That's a much simpler story to document: how do I scale my state server?
juju add-unit juju-db.  My provisioner?  juju add-unit juju-provisioner.

-k

Re: High Availability command line interface - future plans.

2013-11-06 Thread David Cheney
+1 (million), this solution keeps coming up, and I still feel it is
the right one.


Re: High Availability command line interface - future plans.

2013-11-06 Thread Tim Penhey
On 07/11/13 09:11, David Cheney wrote:
> +1 (million), this solution keeps coming up, and I still feel it is
> the right one.
> 
> On Thu, Nov 7, 2013 at 7:07 AM, Kapil Thangavelu
>  wrote:
>>
>> instead of adding more complexity and concepts, it would be ideal if we
>> could reuse the primitives we already have. ie juju environments have three
>> user exposed services, that users can add-unit / remove-unit etc.  they have
>> a juju prefix and therefore are omitted by default from status listing.
>> That's a much simpler story to document. how do i scale my state server..
>> juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.
>>
>> -k

NOTE: removed j...@lists.ubuntu.com from the recipients;
  PLEASE DON'T CROSS-POST
Seriously!


For future direction I agree with this.  We talked about the idea of
having the core parts of juju exposed as special services with units,
and we talked about using namespaces.

I recall that Gustavo's point at the time was that we don't *need* this
to get HA, and that we can get HA much more simply to start with.

I fully support an approach where we have a simple command to get us
over the initial hump of managing support.

  juju ensure-ha  (note: not ensure-ha-state)

This brings up multiple manager nodes.

I like the idea that we treat manager nodes as special, and that
destroy-machine on them doesn't work the same way.

Consider this:

  juju bootstrap
  juju ensure-ha

Later, machine-2 (a manager node) goes down.

  juju ensure-ha

removes machine-2, and brings up machine-x to take its place.  I was
talking with William, and I think we both agreed that we don't want to
restart manager nodes by magic, but wait for user intervention.

Now, looking to the future:

We would have services like:
  juju:db
  juju:api
  juju:something-else (for the other manager worker tasks)

bootstrap would then give machine-0 with a unit of each of these.

ensure-ha would bring up new machines with units of each of these.

A user could add two more api servers by going:

  juju add-unit juju:api -n 2

I think this gives us a clean, and understandable way of doing things,
but we SHOULD NOT do this first.

Tim



Re: High Availability command line interface - future plans.

2013-11-06 Thread Kapil Thangavelu
On Thu, Nov 7, 2013 at 5:54 AM, Tim Penhey  wrote:

> On 07/11/13 09:11, David Cheney wrote:
> > +1 (million), this solution keeps coming up, and I still feel it is
> > the right one.
> >
> > On Thu, Nov 7, 2013 at 7:07 AM, Kapil Thangavelu
> >  wrote:
> >>
> >> instead of adding more complexity and concepts, it would be ideal if we
> >> could reuse the primitives we already have. ie juju environments have
> three
> >> user exposed services, that users can add-unit / remove-unit etc.  they
> have
> >> a juju prefix and therefore are omitted by default from status listing.
> >> That's a much simpler story to document. how do i scale my state
> server..
> >> juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.
> >>
> >> -k
>
> NOTE: removed j...@lists.ubuntu.com from the recipients;
>   PLEASE DON'T CROSS-POST
> Seriously!
>
>
> For future direction I agree with this.  We talked about the idea behind
> having the core parts of juju exposed as special services with units.
> We talked about using namespaces.
>

Yes, hierarchical namespaces would be ideal to convey this separation of
services; as a short-term aid until proper namespace support lands, a
special-cased 'juju' prefix as a namespace would suffice.



> I recall that Gustavo's point at the time is that we don't *need* this
> to get HA, and that we can get HA much simpler to start with.
>

Simpler to implement perhaps, but at what cost in complexity to end users?

To be clear: I don't care if the implementation is job-based internally.
What I care about is orthogonality of interface for end users.



>
> I fully support an approach where we have a simple command to get us
> over the initial hump of managing support.
>
>   juju ensure-ha  (note: not ensure-ha-state)
>
> This brings up multiple manager nodes.
>

How many nodes?  How do I know which machines are managers?  How does a user
see if there's an internal error on one of these?  How do they resolve
errors on them?

We have known solutions for all of these things in juju; let's not invent a
parallel syntax, or even worse assume it always just works as a black box.
Roger's proposal tries to address some of these, but at the cost of a
parallel syntax via add-machine/remove-machine/status and a new concept of
end-user job management (although internally job management would be useful
for schema upgrades).  Trying to isolate it solely to ensure-ha obscures
visibility and behavior, IMO.


> I like the idea that we treat manager nodes as special, and that
> destroy-machine on them doesn't work the same way.
>
> Consider this:
>
>   juju bootstrap
>   juju ensure-ha
>
> Later, machine-2 (a manager node) goes down.
>
>   juju ensure-ha
>
> removes machine-2, and brings up machine-x to take its place.  I was
> talking with William, and I think we both agreed that we don't want to
> restart manager nodes by magic, but wait for user intervention.
>
>
You want to avoid magic, but removing and adding machines with special
behavior isn't magic?  remove-machine wouldn't work on manager machines,
and status needs to grow new behavior (else how do I even know I need to
run ensure-ha again).

What if there were other workloads placed on those machines?



> Now, looking to the future:
>
> We would have services like:
>   juju:db
>   juju:api
>   juju:something-else (for the other manager worker tasks)
>
> bootstrap would then give machine-0 with a unit of each of these.
>

Sounds good.


> ensure-ha would bring up new machines with units of each of these.
>
> A user could add two more api servers by going:
>
>   juju add-unit juju:api -n 2
>
>
There's a notion of a unit step count missing for the db service, i.e.
you ideally always want an odd count for quorum, else leader-election
votes need an arbiter/weighting.
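
For concreteness, here is a tiny illustration of the majority arithmetic
behind that concern.  This is not juju code, just the rule a MongoDB
replica set follows when electing a primary:

    package main

    import "fmt"

    // majority is the number of votes needed to elect a primary.
    func majority(n int) int { return n/2 + 1 }

    // survivesEvenSplit reports whether the larger half of an
    // as-even-as-possible partition still holds a majority.
    func survivesEvenSplit(n int) bool { return n - n/2 >= majority(n) }

    func main() {
        for _, n := range []int{3, 4, 5} {
            fmt.Printf("%d members: majority=%d, survives an even split: %v\n",
                n, majority(n), survivesEvenSplit(n))
        }
    }

An even-sized set buys you an extra copy of the data but no extra failure
tolerance for elections, which is why a step of two (3, 5, 7...) is the
natural unit for the db service.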



> I think this gives us a clean, and understandable way of doing things,
> but we SHOULD NOT do this first.
>
>
I'm not convinced, but it's the implementor's choice.  OTOH, if we're doing
stop-gaps to expose internal mechanisms, then perhaps we should distribute
them as plugins.


cheers,

Kapil


Re: High Availability command line interface - future plans.

2013-11-06 Thread Ian Booth
So, I haven't been involved directly in a lot of the discussion, but my 2c is:

+1 to juju ensure-ha

Users don't give a f*ck about how Juju achieves HA, they just want to know their
data will survive a node outage. What Juju does under the covers to make that
happen, what jobs are run on what nodes etc - that's for Juju to care about.

+1 to high level, namespaced services (juju:api, juju:db etc)

This is a step above ensure-ha for more advanced users, but one which still
presents the solution space in terms any IS person involved in managing things
like scalable web services understands, i.e. there's the concept of services
which process requests, those which store data, and those which <insert
function here>. If the volume of incoming requests is such that the load on
the api servers is high while the database is still coping ok, "juju add-unit
juju:api -n 3" can be used to solve that efficiently, and vice versa. So it's
all about mapping what Juju does to terms and concepts already understood, and
getting the level of abstraction correct so the solution is usable by the
target audience.

Anything that involves exposing things like jobs etc is not the right way of
looking at it IMO.




Re: High Availability command line interface - future plans.

2013-11-06 Thread Andrew Wilkins
On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth  wrote:

> So, I haven't been involved directly in a lot of the discussion, but my 2c
> is:
>
> +1 to juju ensure-ha
>
> Users don't give a f*ck about how Juju achieves HA, they just want to know
> their
> data will survive a node outage. What Juju does under the covers to make
> that
> happen, what jobs are run on what nodes etc - that's for Juju to care
> about.
>

I'm not so sure about that. I expect there'll be users who want to know
*exactly* how it works, because otherwise they won't feel they can trust it
with their services. That's not to say that ensure-ha can't be trusted -
just that some users will want to know what it's doing under the covers.
Speculative, but based on past experience with banks, insurance companies,
etc.

Another thing to consider is that one person's HA is not the next person's.
I may want to disperse my state servers across multiple regions (were that
supported); you might find this costs too much in inter-region traffic.
What happens if I have a temporary outage in one region - where does
ensure-ha automatically spin up a new one? What happens when the original
comes back? Each of these things are things people may want to do
differently, because they each have different trade-offs.

I'm not really keen on ensure-ha due to the magical nature, but if it's
just a stop gap... I guess.

+1 to high level, namespaced services (juju:api, juju:db etc)
>
> This is a step above ensure-ha for more advanced users, but one which still
> presents the solution space in terms any IS person involved in managing
> things
> like scalable web services understands. ie there's the concept of services
> which
> process requests and those which store data, and those which <insert
> function here>.
> If the volume of incoming requests are such that the load on the api
> servers is
> high while the database is still coping ok, "juju add-unit juju:api -n 3"
> can be
> used to solve that efficiently, and vice versa. So it's all about mapping
> what
> Juju does to terms and concepts already understood, and getting the level
> of
> abstraction correct so the solution is usable by the target audience.
>
> Anything that involves exposing things like jobs etc is not the right way
> of
> looking at it IMO.
>

I had suggested something very similar (add-machine --state) at SFO to what
Roger's suggested, but I can see the arguments against it. Overloading
add-unit seems like a decent alternative.

Cheers,
Andrew


Re: High Availability command line interface - future plans.

2013-11-06 Thread Ian Booth


On 07/11/13 12:00, Andrew Wilkins wrote:
> On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth  wrote:
> 
>> So, I haven't been involved directly in a lot of the discussion, but my 2c
>> is:
>>
>> +1 to juju ensure-ha
>>
>> Users don't give a f*ck about how Juju achieves HA, they just want to know
>> their
>> data will survive a node outage. What Juju does under the covers to make
>> that
>> happen, what jobs are run on what nodes etc - that's for Juju to care
>> about.
>>
> 
> I'm not so sure about that. I expect there'll be users who wants to know
> *exactly* how it works, because otherwise they won't feel they can trust it
> with their services. That's not to say that ensure-ha can't be trusted -
> just that some users will want to know what it's doing under the covers.
> Speculative, but based on past experience with banks, insurance companies,
> etc.
> 
> Another thing to consider is that one person's HA is not the next person's.
> I may want to disperse my state servers across multiple regions (were that
> supported); you might find this costs too much in inter-region traffic.
> What happens if I have a temporary outage in one region - where does
> ensure-ha automatically spin up a new one? What happens when the original
> comes back? Each of these things are things people may want to do
> differently, because they each have different trade-offs.
> 
> I'm not really keen on ensure-ha due to the magical nature, but if it's
> just a stop gap... I guess.
> 

ensure-ha does not automatically bring up new services if a node goes down.
That will be a user-initiated action initially, so there's user control.

Whether users want to know the gory details of how HA works under the covers -
I agree users from different industry segments will have different
expectations. My past experience was such that users didn't care, so long as
it worked. I really don't see ensure-ha as a stop gap. It's a legitimate
solution for many users.

My view is that the users who don't care will use ensure-ha; other users will be
able to deploy redundant services as described in my original email if they want
more control. But the abstraction needs to be done right, and in my view, it's
best done at the service level.



Re: High Availability command line interface - future plans.

2013-11-06 Thread Tim Penhey
On 07/11/13 15:00, Andrew Wilkins wrote:
> On Thu, Nov 7, 2013 at 9:23 AM, Ian Booth  > wrote:
> 
> So, I haven't been involved directly in a lot of the discussion, but
> my 2c is:
> 
> +1 to juju ensure-ha
> 
> Users don't give a f*ck about how Juju achieves HA, they just want
> to know their
> data will survive a node outage. What Juju does under the covers to
> make that
> happen, what jobs are run on what nodes etc - that's for Juju to
> care about.
> 
>  
> I'm not so sure about that. I expect there'll be users who wants to know
> *exactly* how it works, because otherwise they won't feel they can trust
> it with their services. That's not to say that ensure-ha can't be
> trusted - just that some users will want to know what it's doing under
> the covers. Speculative, but based on past experience with banks,
> insurance companies, etc.

I think if we gave no feedback at all, then yes, this would feel like
magic.  However, I'd expect us to at least say what we are doing on the
command line :-)

I think ensure-ha is sufficient for a first cut, and a way to get HA on
a running system.

For the record, we discussed that the default behaviour for ensure-ha would
be to create three manager nodes.  The user could override this by
specifying "-n 5" or "-n 7", or some other odd number.
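
To make the shape of that concrete, here is a minimal sketch (illustrative
only, not the actual ensure-ha implementation) of the kind of check the -n
flag would need:

    package main

    import (
        "flag"
        "fmt"
        "os"
    )

    func main() {
        n := flag.Int("n", 3, "desired number of manager nodes")
        flag.Parse()
        // Manager counts must be odd so a partitioned replica set can
        // still form a majority, and at least 3 to be HA at all.
        if *n < 3 || *n%2 == 0 {
            fmt.Fprintf(os.Stderr, "ensure-ha: -n must be an odd number >= 3, got %d\n", *n)
            os.Exit(2)
        }
        fmt.Printf("ensuring %d manager nodes\n", *n)
    }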

> Another thing to consider is that one person's HA is not the next
> person's. I may want to disperse my state servers across multiple
> regions (were that supported); you might find this costs too much in
> inter-region traffic. What happens if I have a temporary outage in one
> region - where does ensure-ha automatically spin up a new one? What
> happens when the original comes back? Each of these things are things
> people may want to do differently, because they each have different
> trade-offs.

I agree that support over regions is an important idea, but this is way
outside the scope of this HA discussion.

AFAIK, our cross-region story is still all about cross-environment
relations, not spanning regions with one environment.

Tim



Re: High Availability command line interface - future plans.

2013-11-06 Thread Marco Ceppi
Hi guys,

I'm glad j...@lists.ubuntu.com got accidentally looped in because I may not
have caught wind of this. I can understand both sides of the discussion,
one where we provide more magic and the users trust that it works and the
other where we leverage existing components and command structures of juju
to provide this magic.

I have to agree with Kapil's point about add-unit/remove-unit syntax for
Juju HA. Having had to teach and demonstrate juju to quite a few people
now, juju is not an easy concept to grasp. Orchestration is really
something that people are just now starting to think about in general,
never mind how to wrap their heads around the concept and then furthermore
how we envision that concept, which is distilled in our product - juju. At
the end of the day I get it, we get it, it's easy for us because we're here
building it, but for the people out there it's a whole new language. Starting
off by saying "Oh, hey there, just run this ensure-ha command and things will
just be fantastic" is fine, but once you open up that route it's going to be
hard to back-pedal.

We already teach "Oh, your service is popular? Just run `juju add-unit
<service>` - magic will happen, units will fire, and you've scaled up. You've
added an additional available unit and you're safer than you were before".
To convey the same strategy when you want to safeguard Juju's bootstrap and
make it a highly available service, the natural logic would be `juju
add-unit`. In fact I was even asked this in a Charm School recently; I'm
paraphrasing, but it was to some extent "Can I juju add-unit bootstrap?".

Since the majority of people seem to believe that adding and removing
juju-specific services via a unique, reserved namespace is the ultimate goal,
not shooting for that first would simply introduce another awkward period in
which we have this great feature but it's going to change soon - so the
videos, blog posts, and other content we produce to promote this sheer
awesomeness become stale and out of date just as soon as the more "permanent"
method of HA lands. For new users learning a language this just becomes
another hurdle to overcome in order to be an expert, and one more reason to
look at something other than Juju.

Therefore, I (who really have no major say in this, simply because I'm not
capable of helping produce a solution) believe it's best to work toward the
ultimate goal now instead of having to build a stop-gap just to say we have
HA.

On a final note, if namespacing does become a thing, can we *please* use a
unique character for the separation of namespace:service? A ":" would be
fantastic, as calling something juju-db could very well be mistaken for, or
deployed as, another service: after `juju deploy some-random-thing juju-*` we
now have things sharing a special namespace that aren't actually special.
(Like juju-gui - though juju-gui is quite special and awesome, it's not
"juju-" core-namespace special.)

Thanks for all the awesome work you all do. I look forward to a solution,
whatever it may be, in the future!

Marco Ceppi



Re: High Availability command line interface - future plans.

2013-11-06 Thread Tim Penhey
On 07/11/13 16:23, Marco Ceppi wrote:
> On a final note, if namespacing does become a thing, can we /please /use
> a unique character for the separation of namespace:service? A : would be
> fantastic as calling something juju-db could very well be mistaken or
> deployed as another service? `juju deploy some-random-thing juju-*` now
> we have things sharing a special namespace that aren't actually special.
> (Like juju-gui, though juju-gui is quite special and awesome, it's not
> "juju-" core namespace special).

colon isn't in the regex for service names, so
  namespace:service
makes a certain amount of sense.
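
A quick illustration (using a simplified pattern, not necessarily the exact
one juju uses for service names) of why the colon can't collide with a real
service:

    package main

    import (
        "fmt"
        "regexp"
    )

    // Simplified stand-in for the service-name rule: lowercase words
    // separated by single hyphens.
    var validService = regexp.MustCompile(`^[a-z]+(-[a-z0-9]+)*$`)

    func main() {
        for _, name := range []string{"juju-db", "juju-gui", "juju:db"} {
            fmt.Printf("%-8s valid service name: %v\n", name, validService.MatchString(name))
        }
    }

juju-db and juju-gui both match, which is Marco's collision worry with a
"juju-" prefix; "juju:db" can never match, so a colon-separated namespace
stays unambiguous.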

Tim




Re: High Availability command line interface - future plans.

2013-11-07 Thread roger peppe
On 6 November 2013 20:07, Kapil Thangavelu
 wrote:
> instead of adding more complexity and concepts, it would be ideal if we
> could reuse the primitives we already have. ie juju environments have three
> user exposed services, that users can add-unit / remove-unit etc.  they have
> a juju prefix and therefore are omitted by default from status listing.
> That's a much simpler story to document. how do i scale my state server..
> juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.

I have a lot of sympathy with this point of view. I've thought about
it quite a bit.

I see two possibilities for implementing it:

1) Keep something like the existing architecture, where machine agents can
take on managerial roles, but provide a veneer over the top which
specially interprets service operations on the juju built-in services
and translates them into operations on machine jobs.

2) Actually implement the various juju services as proper services.

The difficulty I have with 1) is that there's a significant mismatch between
the user's view of things and what's going on underneath.
For instance, with a built-in service, can I:

- add a subordinate service to it?
- see the relevant log file in the usual place for a unit?
- see its charm metadata?
- join to its juju-info relation?

If it's a single service, how can its units span different series?
(presumably it has got a charm URL, which includes the series)

I fear that if we try this approach, the cracks show through
and the result is a system that's hard to understand because
too many things are not what they appear.
And that's not even going into the plethora of special
casing that this approach would require throughout the code.

2) is more attractive, as it's actually doing what's written on the
label. But this has its own problems.

- it's a highly significant architectural change.

- juju managerial services are tightly tied into the operation
of juju itself (not surprisingly). There are many chicken and egg
problems here - we would be trying to use the system to support itself,
and that could easily lead to deadlock as one part of the system
tries to talk to another part of the system that relies on the first.
I think it *might* be possible, but it's not gonna be easy
and I suspect nasty gotchas at the end of a long development process.

- again there are inevitably going to be many special cases
throughout the code - for instance, how does a unit
acquire the credentials it needs to talk to the API
server?

It may be that a hybrid approach is possible - for example
implementing the workers as a service and still having mongo
and the API server as machine workers. I think that's
a reasonable evolutionary step from the approach I'm proposing.


The reasoning behind my proposed approach perhaps
comes from the fact that (I'm almost ashamed to admit it)
I'm a lazy programmer. I don't like creating mountains of code
where a small amount will do almost as well.

Adding the concept of jobs on machines maps very closely
to the architecture that we have today. It is a single
extra concept for the user to understand - all the other
features (e.g. add-machine and destroy-machine) are already
exposed.
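
As a purely illustrative sketch (hypothetical names, not juju-core's actual
types), the jobs idea is little more than the machine agent choosing which
workers to start:

    package main

    import "fmt"

    // MachineJob describes a role a machine can take on.
    type MachineJob string

    const (
        JobHostUnits MachineJob = "unit"    // run service units
        JobManageEnv MachineJob = "manager" // mongo replica peer, API server, env workers
    )

    // startWorkers stands in for the machine agent's startup logic.
    func startWorkers(jobs []MachineJob) {
        for _, job := range jobs {
            switch job {
            case JobHostUnits:
                fmt.Println("starting unit deployer")
            case JobManageEnv:
                fmt.Println("starting mongo peer, API server and environment workers")
            }
        }
    }

    func main() {
        // e.g. the machine created by: juju add-machine --jobs manager,unit
        startWorkers([]MachineJob{JobManageEnv, JobHostUnits})
    }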

I agree that in an ideal world we would scale juju meta-services
just as we would scale normal services, but I think it's actually
reasonable to have a special case here.

Allowing the user to know that machines can take on juju managerial
roles doesn't seem to be a huge ask. And we get just as much
functionality with considerably less code, which seems like a significant
win to me in terms of ongoing maintainability and agility for the future.

  cheers,
rog.

PS apologies; my last cross-post, honest! followups to
juju-dev@lists.ubuntu.com only.



Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
I have a few high level thoughts on all of this, but the key thing I want
to say is that we need to get a meeting set up next week for the solution to
get hammered out.

First, conceptually, I don't believe the user model needs to match the
implementation model.  That way lies madness -- users care about the things
they care about and should not have to understand how the system works to
get something basic done. See:
http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
reasons why I call this madness.

For that reason I think the path of adding a --jobs flag to add-machine is
not a move forward.  It is exposing implementation detail to users and
forcing them into a more complex conceptual model.

Second, we don't have to boil the ocean all at once. An "ensure-ha" command
that sets up additional server nodes is better than what we have now --
nothing.  Nate is right, the box need not be black: we could have a juju
ha-status command that just shows the state of HA.   This is fundamentally
different than changing the behavior and meaning of add-machine to know
about juju jobs and agents and forcing folks to think about that.

Third, I think it is possible to chart a course from ensure-ha as a
shortcut (implemented first) to the type of syntax and feature set that
Kapil is talking about.  And let's not kid ourselves, there are a bunch of
new features in that proposal:

 * Namespaces for services
 * support for subordinates to state services
 * logging changes
 * lifecycle events on juju "jobs"
 * special casing the removal of services that would kill the environment
 * special casing the status to know about HA and warn about even state-server
node counts

I think we will be adding a new concept and some new syntax when we add HA
to juju -- so the idea is just to make it easier for users to understand,
and to allow a path forward to something like what Kapil suggests in the
future.   And I'm pretty solidly convinced that there is an incremental
path forward.

Fourth, the spelling "ensure-ha" is probably not a very good idea; the
cracks in that system (like taking a -n flag, and dealing with failed
machines) are already apparent.

I think something like Nick's proposal for "add-manager" would be better.
Though I don't think that's quite right either.

So, I propose we add one new idea for users -- a state-server.

then you'd have:

juju management --info
juju management --add
juju management --add --to 3
juju management --remove-from

I know this is not following the add-machine format, but I think it would
be better to migrate that to something more like this:

juju machine --add

--Mark Ramm






Re: High Availability command line interface - future plans.

2013-11-08 Thread Andrew Wilkins
On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen <
mark.ramm-christen...@canonical.com> wrote:

> So, I propose we add one new idea for users -- a state-server.
>
> then you'd have:
>
> juju management --info
> juju management --add
> juju management --add --to 3
> juju management --remove-from
>

Sounds good to me. Similar to how I was thinking of doing it originally,
but segregating it from add-machine etc. should prevent adding cognitive
overhead for users that don't care. Also, not so much leakage of internals,
and no magic (a good thing!)


Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
Given a bit of thought, the reasons I proposed the subcommand
remove-from rather than just remove are both obscure enough that I should
have explained them, and wrong enough that I should not have proposed that
syntax.

I was thinking that remove always requires a machine ID and that add does
not, which made them asymmetric enough to justify a different spelling, but
a bit of further thinking leads me to conclude that this is already the case
with add-unit and remove-unit, and therefore consistency is better than a
new spelling.



On Fri, Nov 8, 2013 at 5:15 PM, Andrew Wilkins  wrote:

> On Fri, Nov 8, 2013 at 4:47 PM, Mark Canonical Ramm-Christensen <
> mark.ramm-christen...@canonical.com> wrote:
>
>> I have a few high level thoughts on all of this, but the key thing I want
>> to say is that we need to get a meeting setup next week for the solution to
>> get hammered out.
>>
>> First, conceptually, I don't believe the user model needs to match the
>> implementation model.  That way lies madness -- users care about the things
>> they care about and should not have to understand how the system works to
>> get something basic done. See:
>> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
>> reasons why I call this madness.
>>
>> For that reason I think the path of adding a --jobs flag to add-machine
>> is not a move forward.  It is exposing implementation detail to users and
>> forcing them into a more complex conceptual model.
>>
>> Second, we don't have to boil the ocean all at once. An "ensure-ha"
>> command that sets up additional server nodes is better than what we have
>> now -- nothing.  Nate is right, the box need not be black, we could have an
>> juju ha-status command that just shows the state of HA.   This is
>> fundamentally different than changing the behavior and meaning of
>> add-machines to know about juju jobs and agents and forcing folks to think
>> about that.
>>
>> Third, we I think it is possible to chart a course from ensure-ha as a
>> shortcut (implemented first) to the type of syntax and feature set that
>> Kapil is talking about.  And let's not kid ourselves, there are a bunch of
>> new features in that proposal:
>>
>>  * Namespaces for services
>>  * support for subordinates to state services
>>  * logging changes
>>  * lifecycle events on juju "jobs"
>>  * special casing the removal of services that would kill the environment
>>  * special casing the stats to know about HA and warn for even state
>> server nodes
>>
>> I think we will be adding a new concept and some new syntax when we add
>> HA to juju -- so the idea is just to make it easier for users to
>> understand, and to allow a path forward to something like what Kapil
>> suggests in the future.   And I'm pretty solidly convinced that there is an
>> incremental path forward.
>>
>> Fourth, the spelling "ensure-ha" is probably not a very good idea, the
>> cracks in that system (like taking a -n flag, and dealing with failed
>> machines) are already apparent.
>>
>> I think something like Nick's proposal for "add-manager" would be
>> better.   Though I don't think that's quite right either.
>>
>> So, I propose we add one new idea for users -- a state-server.
>>
>> then you'd have:
>>
>> juju management --info
>> juju management --add
>> juju management --add --to 3
>> juju management --remove-from
>>
>
> Sounds good to me. Similar to how I was thinking of doing it originally,
> but segregating it from add-machine etc. should prevent adding cognitive
> overhead for users that don't care. Also, not so much leakage of internals,
> and no magic (a good thing!)
>
>> I know this is not following the add-machine format, but I think it would
>> be better to migrate that to something more like this:
>>
>> juju machine --add
>>
>> --Mark Ramm
>>
>>
>>
>>
>>
>> On Thu, Nov 7, 2013 at 8:16 PM, roger peppe wrote:
>>
>>> On 6 November 2013 20:07, Kapil Thangavelu
>>>  wrote:
>>> > instead of adding more complexity and concepts, it would be ideal if we
>>> > could reuse the primitives we already have. ie juju environments have
>>> three
>>> > user exposed services, that users can add-unit / remove-unit etc.
>>>  they have
>>> > a juju prefix and therefore are omitted by default from status listing.
>>> > That's a much simpler story to document. how do i scale my state
>>> server..
>>> > juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.
>>>
>>> I have a lot of sympathy with this point of view. I've thought about
>>> it quite a bit.
>>>
>>> I see two possibilities for implementing it:
>>>
>>> 1) Keep something like the existing architecture, where machine agents
>>> can
>>> take on managerial roles, but provide a veneer over the top which
>>> specially interprets service operations on the juju built-in services
>>> and translates them into operations on machine jobs.
>>>
>>> 2) Actually implement the various juju services as proper services.
>>>
>>> The difficulty I have with 1) is that there's a sign

Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 08:47, Mark Canonical Ramm-Christensen
 wrote:
> I have a few high level thoughts on all of this, but the key thing I want to
> say is that we need to get a meeting setup next week for the solution to get
> hammered out.
>
> First, conceptually, I don't believe the user model needs to match the
> implementation model.  That way lies madness -- users care about the things
> they care about and should not have to understand how the system works to
> get something basic done. See:
> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
> reasons why I call this madness.
>
> For that reason I think the path of adding a --jobs flag to add-machine is
> not a move forward.  It is exposing implementation detail to users and
> forcing them into a more complex conceptual model.
>
> Second, we don't have to boil the ocean all at once. An "ensure-ha" command
> that sets up additional server nodes is better than what we have now --
> nothing.  Nate is right, the box need not be black, we could have an juju
> ha-status command that just shows the state of HA.   This is fundamentally
> different than changing the behavior and meaning of add-machines to know
> about juju jobs and agents and forcing folks to think about that.
>
> Third, we I think it is possible to chart a course from ensure-ha as a
> shortcut (implemented first) to the type of syntax and feature set that
> Kapil is talking about.  And let's not kid ourselves, there are a bunch of
> new features in that proposal:
>
>  * Namespaces for services
>  * support for subordinates to state services
>  * logging changes
>  * lifecycle events on juju "jobs"
>  * special casing the removal of services that would kill the environment
>  * special casing the stats to know about HA and warn for even state server
> nodes
>
> I think we will be adding a new concept and some new syntax when we add HA
> to juju -- so the idea is just to make it easier for users to understand,
> and to allow a path forward to something like what Kapil suggests in the
> future.   And I'm pretty solidly convinced that there is an incremental path
> forward.
>
> Fourth, the spelling "ensure-ha" is probably not a very good idea, the
> cracks in that system (like taking a -n flag, and dealing with failed
> machines) are already apparent.
>
> I think something like Nick's proposal for "add-manager" would be better.
> Though I don't think that's quite right either.
>
> So, I propose we add one new idea for users -- a state-server.
>
> then you'd have:
>
> juju management --info
> juju management --add
> juju management --add --to 3
> juju management --remove-from

This seems like a reasonable approach in principle (it's essentially isomorphic
to the --jobs approach AFAICS, which makes me happy).

I have to say that I'm not keen on using flags to switch
the basic behaviour of a command. The interaction between
the flags can then become non-obvious (for example, a --constraints
flag might be appropriate with --add but not --remove-from).

Ah, but your next message seems to go along with that.

So, to couch your proposal in terms that are consistent with the
rest of the juju commands, here's how I think it could look,
in terms of possible help output from the commands:

usage: juju add-management [options]
purpose: Add Juju management functionality to a machine,
or start a new machine with management functionality.
Any Juju machine can potentially participate as a Juju
manager - this command adds a new such manager.
Note that there should always be an odd number
of active management machines, otherwise the Juju
environment is potentially vulnerable to network
partitioning. If a management machine fails,
a new one should be started to replace it.

options:
--constraints (= )
    additional machine constraints. Ignored if --to is specified.
-e, --environment (= "local")
    juju environment to operate in
--series (= "")
    the Ubuntu series of the new machine. Ignored if --to is specified.
--to (= "")
    the id of the machine to add management to. If this is not specified,
    a new machine is provisioned.

usage: juju remove-management [options] <machine-id>
purpose: Remove Juju management functionality from
the machine with the given id. The machine itself is not
destroyed. Note that if there are fewer than three management
machines remaining, the operation of the Juju environment
will be vulnerable to the failure of a single machine.
It is not possible to remove the last management machine.

options:
-e, --environment (= "local")
    juju environment to operate in

As a start, we could implement only the add-management command,
and not implement the --to flag. That would be sufficient for our
HA deliverable, I believe. The other features could be added in time
or according to customer demand.
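
To make that first cut concrete, here is a rough sketch of the flag handling
in Go, using only the standard library rather than juju's real command
machinery; the flag names follow the help text above, and the wiring into an
actual environment is omitted.

package main

import (
	"flag"
	"fmt"
	"os"
)

// Sketch only: parses the proposed add-management flags and reports what
// the command would do. None of this touches a real juju environment.
func main() {
	fs := flag.NewFlagSet("add-management", flag.ExitOnError)
	constraints := fs.String("constraints", "", "additional machine constraints (ignored if --to is given)")
	env := fs.String("e", "local", "juju environment to operate in")
	series := fs.String("series", "", "Ubuntu series of the new machine (ignored if --to is given)")
	to := fs.String("to", "", "id of an existing machine to add management to")
	fs.Parse(os.Args[1:])

	if *to != "" {
		fmt.Printf("would add management jobs to machine %s in environment %q\n", *to, *env)
		return
	}
	fmt.Printf("would provision a new manager machine (series=%q, constraints=%q) in environment %q\n",
		*series, *constraints, *env)
}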

> I know this is not following the add-machine format, but I think it would be
> better to migrate that to something more like this:
>
> juju machine --add

If we are going to do that, I thin

Re: High Availability command line interface - future plans.

2013-11-08 Thread John Arbash Meinel

On 2013-11-08 14:15, roger peppe wrote:
> On 8 November 2013 08:47, Mark Canonical Ramm-Christensen 
>  wrote:
>> I have a few high level thoughts on all of this, but the key
>> thing I want to say is that we need to get a meeting setup next
>> week for the solution to get hammered out.
>> 
>> First, conceptually, I don't believe the user model needs to
>> match the implementation model.  That way lies madness -- users
>> care about the things they care about and should not have to
>> understand how the system works to get something basic done.
>> See: 
>> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140
>> for reasons why I call this madness.
>> 
>> For that reason I think the path of adding a --jobs flag to
>> add-machine is not a move forward.  It is exposing implementation
>> detail to users and forcing them into a more complex conceptual
>> model.
>> 
>> Second, we don't have to boil the ocean all at once. An
>> "ensure-ha" command that sets up additional server nodes is
>> better than what we have now -- nothing.  Nate is right, the box
>> need not be black, we could have an juju ha-status command that
>> just shows the state of HA.   This is fundamentally different
>> than changing the behavior and meaning of add-machines to know 
>> about juju jobs and agents and forcing folks to think about
>> that.
>> 
>> Third, we I think it is possible to chart a course from ensure-ha
>> as a shortcut (implemented first) to the type of syntax and
>> feature set that Kapil is talking about.  And let's not kid
>> ourselves, there are a bunch of new features in that proposal:
>> 
>> * Namespaces for services * support for subordinates to state
>> services * logging changes * lifecycle events on juju "jobs" *
>> special casing the removal of services that would kill the
>> environment * special casing the stats to know about HA and warn
>> for even state server nodes
>> 
>> I think we will be adding a new concept and some new syntax when
>> we add HA to juju -- so the idea is just to make it easier for
>> users to understand, and to allow a path forward to something
>> like what Kapil suggests in the future.   And I'm pretty solidly
>> convinced that there is an incremental path forward.
>> 
>> Fourth, the spelling "ensure-ha" is probably not a very good
>> idea, the cracks in that system (like taking a -n flag, and
>> dealing with failed machines) are already apparent.
>> 
>> I think something like Nick's proposal for "add-manager" would be
>> better. Though I don't think that's quite right either.
>> 
>> So, I propose we add one new idea for users -- a state-server.
>> 
>> then you'd have:
>> 
>> juju management --info juju management --add juju management
>> --add --to 3 juju management --remove-from
> 
> This seems like a reasonable approach in principle (it's
> essentially isomorphic to the --jobs approach AFAICS which makes me
> happy).
> 
> I have to say that I'm not keen on using flags to switch the basic
> behaviour of a command. The interaction between the flags can then
> become non-obvious (for example a --constraints flag might be
> appropriate with --add but not --remove-from).
> 
> Ah, but your next message seems to go along with that.
> 
> So, to couch your proposal in terms that are consistent with the 
> rest of the juju commands, here's how I see it could look, in terms
> of possible help output from the commands:
> 
> usage: juju add-management [options] purpose: Add Juju management
> functionality to a machine, or start a new machine with management
> functionality. Any Juju machine can potentially participate as a
> Juju manager - this command adds a new such manager. Note that
> there should always be an odd number of active management machines,
> otherwise the Juju environment is potentially vulnerable to
> network partitioning. If a management machine fails, a new one
> should be started to replace it.

I would probably avoid putting such an emphasis on "any machine can be
a manager machine". But that is my personal opinion. (If you want HA
you probably want it on dedicated nodes.)

> 
> options: --constraints  (= ) additional machine constraints.
> Ignored if --to is specified. -e, --environment (= "local") juju
> environment to operate in --series (= "") the Ubuntu series of the
> new machine. Ignored if --to is specified. --to (="") the id of the
> machine to add management to. If this is not specified, a new
> machine is provisioned.
> 
> usage: juju remove-management [options]  purpose:
> Remove Juju management functionality from the machine with the
> given id. The machine itself is not destroyed. Note that if there
> are less than three management machines remaining, the operation of
> the Juju environment will be vulnerable to the failure of a single
> machine. It is not possible to remove the last management machine.
> 

I would probably also remove the machine if the only thing on it was
the management. Certainly that is how people want us to do "juju
remove-unit".

The main problem with this is that it feels slightly too easy to add
just 1 machine and then not actually have HA (mongo stops allowing
writes if you have a 2-node cluster and lose one, right?)

Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel
 wrote:
> I would probably avoid putting such an emphasis on "any machine can be
> a manager machine". But that is my personal opinion. (If you want HA
> you probably want it on dedicated nodes.)

Resource waste holds juju back for the small users. Being able to
share a state server with other resources does sound attractive from
that perspective. It may be the difference between running 3 machines
or 6.

> I would probably also remove the machine if the only thing on it was
> the management. Certainly that is how people want us to do "juju
> remove-unit".

If there are other units in the same machine, we should definitely not
remove the machine on remove-unit. The principle sounds the same with
state servers.

> The main problem with this is that it feels slightly too easy to add
> just 1 machine and then not actually have HA (mongo stops allowing
> writes if you have a 2-node cluster and lose one, right?)

+1
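
For concreteness, the replica-set arithmetic behind that parenthetical,
sketched in Go purely as an illustration (not juju code): writes need a
strict majority of voting members, so an even-sized set tolerates no more
failures than the odd-sized set one smaller.

package main

import "fmt"

// majority is the smallest strict majority of n voting members.
func majority(n int) int { return n/2 + 1 }

// failuresTolerated is how many members can fail while a majority survives.
func failuresTolerated(n int) int { return n - majority(n) }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("%d member(s): majority %d, tolerates %d failure(s)\n",
			n, majority(n), failuresTolerated(n))
	}
	// The output shows that 2 members tolerate 0 failures: losing one of two
	// blocks writes, which is why going from 1 to 2 state servers buys nothing.
}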


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
On Fri, Nov 8, 2013 at 6:34 AM, Gustavo Niemeyer wrote:

> On Fri, Nov 8, 2013 at 8:31 AM, John Arbash Meinel
>  wrote:
> > I would probably avoid putting such an emphasis on "any machine can be
> > a manager machine". But that is my personal opinion. (If you want HA
> > you probably want it on dedicated nodes.)
>
> Resource waste holds juju back for the small users. Being able to
> share a state server with other resources does sound attractive from
> that perspective. It may be the difference between running 3 machines
> or 6.


If you only have 3 machines, do you really need HA from juju? You don't
have HA from your machines that are actually *running your service*.


> > I would probably also remove the machine if the only thing on it was
> > the management. Certainly that is how people want us to do "juju
> > remove-unit".
>
> If there are other units in the same machine, we should definitely not
> remove the machine on remove-unit. The principle sounds the same with
> state servers.
>
> > The main problem with this is that it feels slightly too easy to add
> > just 1 machine and then not actually have HA (mongo stops allowing
> > writes if you have a 2-node cluster and lose one, right?)
>
> +1
>

Yeah, same here. I still think we need a "turn on HA mode" command that'll
bring you to 3 servers. It doesn't have to be the Swiss Army knife we talked
about before... just something to go from a non-HA to a valid HA environment.


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
These are *very* good points, Mark. Taking them to heart will
definitely lead into a good direction for the overall feature
development.

It sounds like we should avoid using a "management" command for
anything in juju, though. Most things in juju are about management one
way or the other, so "juju management" becomes very unclear and hard
to search for.

Instead, the command might be named after what we've been calling them:

juju add-state-server -n 2

For implementation convenience's sake, it would be okay to only ever
accept -n 2 when this is first released. I can also imagine the
behavior of this command resembling add-unit in a few aspects, since a
state server is in fact code that just needs a home to run in. This
may yield other common options across them, such as machine selection.


On Fri, Nov 8, 2013 at 6:47 AM, Mark Canonical Ramm-Christensen
 wrote:
> I have a few high level thoughts on all of this, but the key thing I want to
> say is that we need to get a meeting setup next week for the solution to get
> hammered out.
>
> First, conceptually, I don't believe the user model needs to match the
> implementation model.  That way lies madness -- users care about the things
> they care about and should not have to understand how the system works to
> get something basic done. See:
> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140 for
> reasons why I call this madness.
>
> For that reason I think the path of adding a --jobs flag to add-machine is
> not a move forward.  It is exposing implementation detail to users and
> forcing them into a more complex conceptual model.
>
> Second, we don't have to boil the ocean all at once. An "ensure-ha" command
> that sets up additional server nodes is better than what we have now --
> nothing.  Nate is right, the box need not be black, we could have an juju
> ha-status command that just shows the state of HA.   This is fundamentally
> different than changing the behavior and meaning of add-machines to know
> about juju jobs and agents and forcing folks to think about that.
>
> Third, we I think it is possible to chart a course from ensure-ha as a
> shortcut (implemented first) to the type of syntax and feature set that
> Kapil is talking about.  And let's not kid ourselves, there are a bunch of
> new features in that proposal:
>
>  * Namespaces for services
>  * support for subordinates to state services
>  * logging changes
>  * lifecycle events on juju "jobs"
>  * special casing the removal of services that would kill the environment
>  * special casing the stats to know about HA and warn for even state server
> nodes
>
> I think we will be adding a new concept and some new syntax when we add HA
> to juju -- so the idea is just to make it easier for users to understand,
> and to allow a path forward to something like what Kapil suggests in the
> future.   And I'm pretty solidly convinced that there is an incremental path
> forward.
>
> Fourth, the spelling "ensure-ha" is probably not a very good idea, the
> cracks in that system (like taking a -n flag, and dealing with failed
> machines) are already apparent.
>
> I think something like Nick's proposal for "add-manager" would be better.
> Though I don't think that's quite right either.
>
> So, I propose we add one new idea for users -- a state-server.
>
> then you'd have:
>
> juju management --info
> juju management --add
> juju management --add --to 3
> juju management --remove-from
>
> I know this is not following the add-machine format, but I think it would be
> better to migrate that to something more like this:
>
> juju machine --add
>
> --Mark Ramm
>
>
>
>
>
> On Thu, Nov 7, 2013 at 8:16 PM, roger peppe 
> wrote:
>>
>> On 6 November 2013 20:07, Kapil Thangavelu
>>  wrote:
>> > instead of adding more complexity and concepts, it would be ideal if we
>> > could reuse the primitives we already have. ie juju environments have
>> > three
>> > user exposed services, that users can add-unit / remove-unit etc.  they
>> > have
>> > a juju prefix and therefore are omitted by default from status listing.
>> > That's a much simpler story to document. how do i scale my state
>> > server..
>> > juju add-unit juju-db... my provisioner juju add-unit juju-provisioner.
>>
>> I have a lot of sympathy with this point of view. I've thought about
>> it quite a bit.
>>
>> I see two possibilities for implementing it:
>>
>> 1) Keep something like the existing architecture, where machine agents can
>> take on managerial roles, but provide a veneer over the top which
>> specially interprets service operations on the juju built-in services
>> and translates them into operations on machine jobs.
>>
>> 2) Actually implement the various juju services as proper services.
>>
>> The difficulty I have with 1) is that there's a significant mismatch
>> between
>> the user's view of things and what's going on underneath.
>> For instance, with a built-in service, can I:
>>
>> - add a subordinate service to it?
>> - see t

Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 9:39 AM, Nate Finch  wrote:
> If you only have 3 machines, do you really need HA from juju? You don't have
> HA from your machines that are actually running your service.

Why not? I have three machines...

> Yeah, same here. I still think we need a "turn on HA mode" command that'll
> bring you to 3 servers.  It doesn't have to be the swiss army knife that we
> said before... just something to go from non-HA to valid HA environment.

This looks fine:

juju add-state-server -n 2

It's easy to error if current + n is not a good number.
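
A sketch of what that check could look like, purely as illustration (the
function name and messages are invented; the 12-member ceiling is MongoDB's
replica set limit mentioned earlier in the thread):

package main

import "fmt"

// checkStateServerCount reports why current+n would be a bad number of
// state servers, or nil if the resulting total is acceptable.
func checkStateServerCount(current, n int) error {
	total := current + n
	switch {
	case total%2 == 0:
		return fmt.Errorf("%d state servers tolerate no more failures than %d; use an odd number", total, total-1)
	case total > 12:
		return fmt.Errorf("%d state servers exceeds MongoDB's 12-member replica set limit", total)
	}
	return nil
}

func main() {
	fmt.Println(checkStateServerCount(1, 2)) // <nil>: growing from 1 to 3 is fine
	fmt.Println(checkStateServerCount(1, 1)) // error: 2 is an even count
}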


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 11:31, Gustavo Niemeyer  wrote:
> These are *very* good points, Mark. Taking them to heart will
> definitely lead into a good direction for the overall feature
> development.
>
> It sounds like we should avoid using a "management" command for
> anything in juju, though. Most things in juju are about management one
> way or the other, so "juju management" becomes very unclear and hard
> to search for.
>
> Instead, the command might be named after what we've been calling them:
>
> juju add-state-server -n 2

I'm not sure that state-server is the right name here. For a start, there
are two kinds of state server, mongo and the API server, which we may want
to scale independently as they have totally different characteristics, and
the management workers (the provisioner, etc.) also fall under the same
umbrella. "Management" is the best name I've seen so far, though
I do realise it is overly generic.

Other suggestions?

Are you suggesting that we also have "destroy-state-server", BTW?

> It's easy to error if current + n is not a good number.

That seems reasonable. Do you think this needs to be transactional?
That is, if current is 2 and two people concurrently do add-state-server -n 1,
should one of those requests necessarily fail? My inclination is that we
don't need to worry too much - but YMMV.
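
One way to picture the difference (a toy Go sketch, not juju code, and
ignoring real concurrency control): a relative "add n" racing with itself
can overshoot, while a declarative "ensure there are c" request is
idempotent.

package main

import "fmt"

// addN is the relative semantics: each request adds n more state servers.
func addN(current, n int) int { return current + n }

// ensureCount is the declarative semantics: each request asks for a total,
// and repeating it changes nothing once the total has been reached.
func ensureCount(current, want int) int {
	if current >= want {
		return current
	}
	return want
}

func main() {
	// Two users each run "add-state-server -n 1" against current = 2:
	fmt.Println(addN(addN(2, 1), 1)) // 4, an even and undesirable count
	// Two users each ask to ensure 3 state servers:
	fmt.Println(ensureCount(ensureCount(2, 3), 3)) // 3, the race is harmless
}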



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 10:31, John Arbash Meinel  wrote:
>
> On 2013-11-08 14:15, roger peppe wrote:
>> On 8 November 2013 08:47, Mark Canonical Ramm-Christensen
>>  wrote:
>>> I have a few high level thoughts on all of this, but the key
>>> thing I want to say is that we need to get a meeting setup next
>>> week for the solution to get hammered out.
>>>
>>> First, conceptually, I don't believe the user model needs to
>>> match the implementation model.  That way lies madness -- users
>>> care about the things they care about and should not have to
>>> understand how the system works to get something basic done.
>>> See:
>>> http://www.amazon.com/The-Inmates-Are-Running-Asylum/dp/0672326140
>>> for reasons why I call this madness.
>>>
>>> For that reason I think the path of adding a --jobs flag to
>>> add-machine is not a move forward.  It is exposing implementation
>>> detail to users and forcing them into a more complex conceptual
>>> model.
>>>
>>> Second, we don't have to boil the ocean all at once. An
>>> "ensure-ha" command that sets up additional server nodes is
>>> better than what we have now -- nothing.  Nate is right, the box
>>> need not be black, we could have an juju ha-status command that
>>> just shows the state of HA.   This is fundamentally different
>>> than changing the behavior and meaning of add-machines to know
>>> about juju jobs and agents and forcing folks to think about
>>> that.
>>>
>>> Third, we I think it is possible to chart a course from ensure-ha
>>> as a shortcut (implemented first) to the type of syntax and
>>> feature set that Kapil is talking about.  And let's not kid
>>> ourselves, there are a bunch of new features in that proposal:
>>>
>>> * Namespaces for services * support for subordinates to state
>>> services * logging changes * lifecycle events on juju "jobs" *
>>> special casing the removal of services that would kill the
>>> environment * special casing the stats to know about HA and warn
>>> for even state server nodes
>>>
>>> I think we will be adding a new concept and some new syntax when
>>> we add HA to juju -- so the idea is just to make it easier for
>>> users to understand, and to allow a path forward to something
>>> like what Kapil suggests in the future.   And I'm pretty solidly
>>> convinced that there is an incremental path forward.
>>>
>>> Fourth, the spelling "ensure-ha" is probably not a very good
>>> idea, the cracks in that system (like taking a -n flag, and
>>> dealing with failed machines) are already apparent.
>>>
>>> I think something like Nick's proposal for "add-manager" would be
>>> better. Though I don't think that's quite right either.
>>>
>>> So, I propose we add one new idea for users -- a state-server.
>>>
>>> then you'd have:
>>>
>>> juju management --info juju management --add juju management
>>> --add --to 3 juju management --remove-from
>>
>> This seems like a reasonable approach in principle (it's
>> essentially isomorphic to the --jobs approach AFAICS which makes me
>> happy).
>>
>> I have to say that I'm not keen on using flags to switch the basic
>> behaviour of a command. The interaction between the flags can then
>> become non-obvious (for example a --constraints flag might be
>> appropriate with --add but not --remove-from).
>>
>> Ah, but your next message seems to go along with that.
>>
>> So, to couch your proposal in terms that are consistent with the
>> rest of the juju commands, here's how I see it could look, in terms
>> of possible help output from the commands:
>>
>> usage: juju add-management [options] purpose: Add Juju management
>> functionality to a machine, or start a new machine with management
>> functionality. Any Juju machine can potentially participate as a
>> Juju manager - this command adds a new such manager. Note that
>> there should always be an odd number of active management machines,
>> otherwise the Juju environment is potentially vulnerable to
>> network partitioning. If a management machine fails, a new one
>> should be started to replace it.
>
> I would probably avoid putting such an emphasis on "any machine can be
> a manager machine". But that is my personal opinion. (If you want HA
> you probably want it on dedicated nodes.)
>
>>
>> options: --constraints  (= ) additional machine constraints.
>> Ignored if --to is specified. -e, --environment (= "local") juju
>> environment to operate in --series (= "") the Ubuntu series of the
>> new machine. Ignored if --to is specified. --to (="") the id of the
>> machine to add management to. If this is not specified, a new
>> machine is provisioned.
>>
>> usage: juju remove-management [options]  purpose:
>> Remove Juju management functionality from the machine with the
>> given id. The machine itself is not destroyed. Note that if there
>> are less than three management machines remaining, the operation of
>> the Juju environment will be vulnerable to the failure of a single
>> machine. It is not possible

Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 12:03, Gustavo Niemeyer  wrote:
> Splitting API and db at some point sounds sensible, but it may be easy and
> convenient to think about a state server as API+db for the time being.

I'd prefer to start with a command name that leaves that possibility open;
otherwise we'll end up either with a command that doesn't
describe what it actually does, or with several very similar commands
where one would suffice.

Hence my discomfort with "add-state-server" as a command name.



Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
We'll end up with a command that adds a state server, with a replica
of the database and an API server. That's the notion of state server
we've been using all along, and it sounds quite reasonable and easy to
explain and understand.

On Fri, Nov 8, 2013 at 10:15 AM, roger peppe  wrote:
> On 8 November 2013 12:03, Gustavo Niemeyer  wrote:
>> Splitting API and db at some point sounds sensible, but it may be easy and
>> convenient to think about a state server as API+db for the time being.
>
> I'd prefer to start with a command name that implies that possibility;
> otherwise we'll end up either with a command that doesn't
> describe what it actually does, or more very similar commands
> where one could be sufficient.
>
> Hence my discomfort with "add-state-server" as a command name.



-- 

gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 13:33, Gustavo Niemeyer  wrote:
> We'll end up with a command that adds a state server, with a replica
> of the database and an API server. That's the notion of state server
> we've been using all along, and sounds quite reasonable, easy to
> explain and understand.

And when we want to split API and db, as you thought perhaps
might be sensible at some point, what then?



Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
juju add-state-server --api-only-please-thanks




On Fri, Nov 8, 2013 at 11:43 AM, roger peppe  wrote:
> On 8 November 2013 13:33, Gustavo Niemeyer  wrote:
>> We'll end up with a command that adds a state server, with a replica
>> of the database and an API server. That's the notion of state server
>> we've been using all along, and sounds quite reasonable, easy to
>> explain and understand.
>
> And when we want to split API and db, as you thought perhaps
> might be sensible at some point, what then?



-- 

gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread roger peppe
On 8 November 2013 13:51, Gustavo Niemeyer  wrote:
> juju add-state-server --api-only-please-thanks

And if we want to allow a machine that runs the environment-manager
workers but not the api server or mongo server (not actually an unlikely thing
given certain future possibilities), then add-state-server is a command that
doesn't necessarily add a state server at all... That thought
was the source of my doubt.

That said, it's just a spelling. If there's general agreement on "state-server",
so be it - I'm very happy to move forward with that.



Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
Reminds me of one of my favorite quotes:

"Knobs are distracting, confusing and annoying.  Personally, I'd rather
things be 90% good 100% of the time than see 90 knobs."  - Brad Fitzpatrick
on having more than one Go scheduler.

https://groups.google.com/forum/#!msg/golang-dev/eu0WzsTtNPo/pcD-zS3JkTYJ


On Fri, Nov 8, 2013 at 9:32 AM, Gustavo Niemeyer wrote:

> On Fri, Nov 8, 2013 at 12:04 PM, roger peppe 
> wrote:
> > On 8 November 2013 13:51, Gustavo Niemeyer  wrote:
> >> juju add-state-server --api-only-please-thanks
> >
> > And if we want to allow a machine that runs the environment-manager
> > workers but not the api server or mongo server (not actually an unlikely
> thing
> > given certain future possibilities) then add-state-server is a command
> that
> > doesn't necessarily add a state server at all... That thought
> > was the source of my doubt.
>
> The fact you can organize things a thousand ways doesn't mean we
> should offer a thousand knobs. A state server is a good abstraction
> for "there are management routines running there". You can define what
> that means, as long as you don't let things fall down when N/2-1
> machines fall down.
>
>
> gustavo @ http://niemeyer.net
>


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
On Fri, Nov 8, 2013 at 12:04 PM, roger peppe  wrote:
> On 8 November 2013 13:51, Gustavo Niemeyer  wrote:
>> juju add-state-server --api-only-please-thanks
>
> And if we want to allow a machine that runs the environment-manager
> workers but not the api server or mongo server (not actually an unlikely thing
> given certain future possibilities) then add-state-server is a command that
> doesn't necessarily add a state server at all... That thought
> was the source of my doubt.

The fact you can organize things a thousand ways doesn't mean we
should offer a thousand knobs. A state server is a good abstraction
for "there are management routines running there". You can define what
that means, as long as you don't let things fall down when N/2-1
machines fall down.


gustavo @ http://niemeyer.net



Re: High Availability command line interface - future plans.

2013-11-08 Thread William Reade
I'm concerned that we're (1) rehashing decisions made during the sprint and
(2) deviating from requirements in doing so.

In particular, abstracting HA away into "management" manipulations -- as
roger notes, pretty much isomorphic to the "jobs" proposal -- doesn't give
users HA so much as it gives them a limited toolkit with which they can
more-or-less construct their own HA; notably, allowing people to use
an even number of state servers is strictly a bad thing [0], and I'm
extremely suspicious of any proposal that opens that door.

Of course, some will argue that mongo should be able to scale separately
from the api servers and other management tasks, and this is a worthy goal;
but in this context it sucks us down into the morass of exposing different
types of management on different machines, and ends up approaching the jobs
proposal still closer, in that it requires users to assimilate a whole load
of extra terminology in order to perform a conceptually simple function.

Conversely, "ensure-ha" (with a possible optional --redundancy=N flag,
defaulting to 1) is a simple model that can be simply explained: the
command's sole purpose is to ensure that juju management cannot fail as a
result of the simultaneous failure of <=N machines. It's a *user-level*
construct that will always be applicable even in the context of a more
sophisticated future language (no matter what's going on with this
complicated management/jobs business, you can run that and be assured
you'll end up with at least enough manager machines to fulfil the
requirement you clearly stated in the command line).
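
As a rough sketch of that contract (illustrative Go with made-up names, not
juju code): tolerating the simultaneous failure of up to N managers in a
majority-based setup needs 2N+1 of them, and ensure-ha only ever tops the
count up.

package main

import "fmt"

// managersNeeded is the number of manager machines required so that any
// `redundancy` of them can fail at once and a majority still remains.
func managersNeeded(redundancy int) int { return 2*redundancy + 1 }

// managersToAdd is how many new managers an ensure-ha run would start,
// given how many are currently alive; it never scales down.
func managersToAdd(alive, redundancy int) int {
	if need := managersNeeded(redundancy) - alive; need > 0 {
		return need
	}
	return 0
}

func main() {
	fmt.Println(managersNeeded(1))   // 3: the default --redundancy=1 case
	fmt.Println(managersToAdd(1, 1)) // 2: a fresh environment grows to three managers
	fmt.Println(managersToAdd(2, 1)) // 1: replace a dead manager to restore HA
}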

I haven't seen anything that makes me think that redesigning from scratch
is in any way superior to refining what we already agreed upon; and it's
distracting us from the questions of reporting and correcting manager
failure when it occurs. I assert the following series of arguments:

* users may discover at any time that they need to make an existing
environment HA, so ensure-ha is *always* a reasonable user action
* users who *don't* need an HA environment can, by definition, afford to
take the environment down and reconstruct it without HA if it becomes
unimportant
* therefore, scaling management *down* is not the highest priority for us
(but is nonetheless easily amenable to future control via the "ensure-ha"
command -- just explicitly set a lower redundancy number)
* similarly, allowing users to *directly* destroy management machines
enables exciting new failure modes that don't really need to exist

* the notion of HA is somewhat limited in worth when there's no way to make
a vulnerable environment robust again
* the more complexity we shovel onto the user's plate, the less likely she
is to resolve the situation correctly under stress
* the most obvious, and foolproof, command for repairing HA would be
"ensure-ha" itself, which could very reasonably take it upon itself to
replace manager nodes detected as "down" -- assuming a robust presence
implementation, which we need anyway, this (1) works trivially for machines
that die unexpectedly and (2) allows a backdoor for resolution of "weird"
situations: the user can manually shutdown a misbehaving manager
out-of-band, and run ensure-ha to cause a new one to be spun up in its
place; once HA is restored, the old machine will no longer be a manager, no
longer be indestructible, and can be cleaned up at leisure

* the notion is even more limited when you can't even tell when something
goes wrong
* therefore, HA state should *at least* be clearly and loudly communicated
in status
* but that's not very proactive, and I'd like to see a plan for how we're
going to respond to these situations when we detect them

* the data accessible to a manager node is sensitive, and we shouldn't
generally be putting manager nodes on dirty machines; but density is an
important consideration, and I don't think it's confusing to allow
"preferred" machines to be specified in "ensure-ha", such that *if*
management capacity needs to be added it will be put onto those machines
before finding clean ones or provisioning new ones
* strawman syntax: "juju ensure-ha --prefer-machines 11,37" to place any
additional manager tasks that may be required on the supplied machines in
order of preference -- but even this falls far behind the essential goal,
which is "make HA *easy* for our users".
* (ofc, we should continue not to put units onto manager machines by
default, but allow them when forced with --to as before)

I don't believe that any of this precludes more sophisticated management of
juju's internal functions *when* the need becomes pressing -- whether via
jobs, or namespaced pseudo-services, or whatever -- but at this stage I
think it is far better to expose the policies we're capable of supporting,
and thus allow ourselves wiggle room to allow the mechanism to evolve, than
to define a user-facing model that is, at best, a woolly reflection of an
internal model that's likely to change as we explore the so

Re: High Availability command line interface - future plans.

2013-11-08 Thread Nate Finch
Scaling jobs independently doesn't really get you much.  If you need 7
machines of redundancy for mongo... why would you not just also want the
API on all 7 machines?  It's 100% upside... now your API is that much more
redundant/scaled, and we already know the API and mongo run just fine
together on a single machine.

The only point at which it makes sense to break out of "just make N copies
of the whole state server" is:


   1. if you need to go beyond mongo's 12-node maximum, or
   2. if you want to somehow have HA without using up N extra machines by
      putting bits and pieces on machines also hosting units.

Neither of those seems like a critical thing we need to support in v1 of HA,
and we should probably only try to do what is critical for v1.


On Fri, Nov 8, 2013 at 11:00 AM, William Reade
wrote:

> I'm concerned that we're (1) rehashing decisions made during the sprint
> and (2) deviating from requirements in doing so.
>
> In particular, abstracting HA away into "management" manipulations -- as
> roger notes, pretty much isomorphic to the "jobs" proposal -- doesn't give
> users HA so much as it gives them a limited toolkit with which they can
> more-or-less construct their own HA; in particular, allowing people to use
> an even number of state servers is strictly a bad thing [0], and I'm
> extremely suspicious of any proposal that opens that door.
>
> Of course, some will argue that mongo should be able to scale separately
> from the api servers and other management tasks, and this is a worthy goal;
> but in this context it sucks us down into the morass of exposing different
> types of management on different machines, and ends up approaching the jobs
> proposal still closer, in that it requires users to assimilate a whole load
> of extra terminology in order to perform a conceptually simple function.
>
> Conversely, "ensure-ha" (with possible optional --redundancy=N flag,
> defaulting to 1) is a simple model that can be simply explained: the
> command's sole purpose is to ensure that juju management cannot fail as a
> result to the simultaneous failure of <=N machines. It's a *user-level*
> construct that will always be applicable even in the context of a more
> sophisticated future language (no matter what's going on with this
> complicated management/jobs business, you can run that and be assured
> you'll end up with at least enough manager machines to fulfil the
> requirement you clearly stated in the command line).
>
> I haven't seen anything that makes me think that redesigning from scratch
> is in any way superior to refining what we already agreed upon; and it's
> distracting us from the questions of reporting and correcting manager
> failure when it occurs. I assert the following series of arguments:
>
> * users may discover at any time that they need to make an existing
> environment HA, so ensure-ha is *always* a reasonable user action
> * users who *don't* need an HA environment can, by definition, afford to
> take the environment down and reconstruct it without HA if it becomes
> unimportant
> * therefore, scaling management *down* is not the highest priority for us
> (but is nonetheless easily amenable to future control via the "ensure-ha"
> command -- just explicitly set a lower redundancy number)
> * similarly, allowing users to *directly* destroy management machines
> enables exciting new failure modes that don't really need to exist
>
> * the notion of HA is somewhat limited in worth when there's no way to
> make a vulnerable environment robust again
> * the more complexity we shovel onto the user's plate, the less likely she
> is to resolve the situation correctly under stress
> * the most obvious, and foolproof, command for repairing HA would be
> "ensure-ha" itself, which could very reasonably take it upon itself to
> replace manager nodes detected as "down" -- assuming a robust presence
> implementation, which we need anyway, this (1) works trivially for machines
> that die unexpectedly and (2) allows a backdoor for resolution of "weird"
> situations: the user can manually shutdown a misbehaving manager
> out-of-band, and run ensure-ha to cause a new one to be spun up in its
> place; once HA is restored, the old machine will no longer be a manager, no
> longer be indestructible, and can be cleaned up at leisure
>
> * the notion is even more limited when you can't even tell when something
> goes wrong
> * therefore, HA state should *at least* be clearly and loudly communicated
> in status
> * but that's not very proactive, and I'd like to see a plan for how we're
> going to respond to these situations when we detect them
>
> * the data accessible to a manager node is sensitive, and we shouldn't
> generally be putting manager nodes on dirty machines; but density is an
> important consideration, and I don't think it's confusing to allow
> "preferred" machines to be specified in "ensure-ha", such that *if*
> management capacity needs to be added it will be put onto those machin

Re: High Availability command line interface - future plans.

2013-11-08 Thread Mark Canonical Ramm-Christensen
>  On Fri, Nov 8, 2013 at 7:31 PM, Gustavo Niemeyer wrote:
>
>
>> It sounds like we should avoid using a "management" command for
>> anything in juju, though. Most things in juju are about management one
>> way or the other, so "juju management" becomes very unclear and hard
>> to search for.
>>
>
I'd also considered this spelling at one point in my doodling on the CLI
yesterday:

juju ha setup --to [list, of, machines]
creates 3 servers (optionally on the specified machines)

juju ha status
tells me details about the state server status

juju ha add-servers


Re: High Availability command line interface - future plans.

2013-11-08 Thread Gustavo Niemeyer
It doesn't feel like the difference between

juju ensure-ha --prefer-machines 11,37

and

juju add-state-server --to 11,37

is worth the amount of reasoning there.  I'm clearly in favor of the
latter, but I wouldn't argue so much for it.


On Fri, Nov 8, 2013 at 2:00 PM, William Reade
 wrote:
> I'm concerned that we're (1) rehashing decisions made during the sprint and
> (2) deviating from requirements in doing so.
>
> In particular, abstracting HA away into "management" manipulations -- as
> roger notes, pretty much isomorphic to the "jobs" proposal -- doesn't give
> users HA so much as it gives them a limited toolkit with which they can
> more-or-less construct their own HA; in particular, allowing people to use
> an even number of state servers is strictly a bad thing [0], and I'm
> extremely suspicious of any proposal that opens that door.
>
> Of course, some will argue that mongo should be able to scale separately
> from the api servers and other management tasks, and this is a worthy goal;
> but in this context it sucks us down into the morass of exposing different
> types of management on different machines, and ends up approaching the jobs
> proposal still closer, in that it requires users to assimilate a whole load
> of extra terminology in order to perform a conceptually simple function.
>
> Conversely, "ensure-ha" (with possible optional --redundancy=N flag,
> defaulting to 1) is a simple model that can be simply explained: the
> command's sole purpose is to ensure that juju management cannot fail as a
> result to the simultaneous failure of <=N machines. It's a *user-level*
> construct that will always be applicable even in the context of a more
> sophisticated future language (no matter what's going on with this
> complicated management/jobs business, you can run that and be assured you'll
> end up with at least enough manager machines to fulfil the requirement you
> clearly stated in the command line).
>
> I haven't seen anything that makes me think that redesigning from scratch is
> in any way superior to refining what we already agreed upon; and it's
> distracting us from the questions of reporting and correcting manager
> failure when it occurs. I assert the following series of arguments:
>
> * users may discover at any time that they need to make an existing
> environment HA, so ensure-ha is *always* a reasonable user action
> * users who *don't* need an HA environment can, by definition, afford to
> take the environment down and reconstruct it without HA if it becomes
> unimportant
> * therefore, scaling management *down* is not the highest priority for us
> (but is nonetheless easily amenable to future control via the "ensure-ha"
> command -- just explicitly set a lower redundancy number)
> * similarly, allowing users to *directly* destroy management machines
> enables exciting new failure modes that don't really need to exist
>
> * the notion of HA is somewhat limited in worth when there's no way to make
> a vulnerable environment robust again
> * the more complexity we shovel onto the user's plate, the less likely she
> is to resolve the situation correctly under stress
> * the most obvious, and foolproof, command for repairing HA would be
> "ensure-ha" itself, which could very reasonably take it upon itself to
> replace manager nodes detected as "down" -- assuming a robust presence
> implementation, which we need anyway, this (1) works trivially for machines
> that die unexpectedly and (2) allows a backdoor for resolution of "weird"
> situations: the user can manually shutdown a misbehaving manager
> out-of-band, and run ensure-ha to cause a new one to be spun up in its
> place; once HA is restored, the old machine will no longer be a manager, no
> longer be indestructible, and can be cleaned up at leisure
>
> * the notion is even more limited when you can't even tell when something
> goes wrong
> * therefore, HA state should *at least* be clearly and loudly communicated
> in status
> * but that's not very proactive, and I'd like to see a plan for how we're
> going to respond to these situations when we detect them
>
> * the data accessible to a manager node is sensitive, and we shouldn't
> generally be putting manager nodes on dirty machines; but density is an
> important consideration, and I don't think it's confusing to allow
> "preferred" machines to be specified in "ensure-ha", such that *if*
> management capacity needs to be added it will be put onto those machines
> before finding clean ones or provisioning new ones
> * strawman syntax: "juju ensure-ha --prefer-machines 11,37" to place any
> additional manager tasks that may be required on the supplied machines in
> order of preference -- but even this falls far behind the essential goal,
> which is "make HA *easy* for our users".
> * (ofc, we should continue not to put units onto manager machines by
> default, but allow them when forced with --to as before)
>
> I don't believe that any of this precludes more sophistic

Re: High Availability command line interface - future plans.

2013-11-10 Thread Tim Penhey
On 09/11/13 03:04, roger peppe wrote:
> On 8 November 2013 13:51, Gustavo Niemeyer  wrote:
>> juju add-state-server --api-only-please-thanks
> 
> And if we want to allow a machine that runs the environment-manager
> workers but not the api server or mongo server (not actually an unlikely thing
> given certain future possibilities) then add-state-server is a command that
> doesn't necessarily add a state server at all... That thought
> was the source of my doubt.

I think that it is reasonable to think of just the db and the api server
from the user's point of view.

The fact that we may run other workers alongside the api server is up
to us, and not something we actually need to expose to people.

Most of our users should have no problem at all understanding juju:db
and juju:api (or whatever names we call them).

> That said, it's just a spelling. If there's general agreement on 
> "state-server",
> so be it - I'm very happy to move forward with that.

I cringe whenever I see "state" used anywhere.

I would like us to move towards namespaced services with a common
understanding, but I'm happy to have that significantly down the line.

Just remember that whatever command we come up with, it needs to be
easily explained to our new users.  I like the idea of a special command
that handles the HA-ness of juju, because it means we can give
meaningful error messages when people do things not quite right (like
adding just one more mongo db thinking it is enough).
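
For instance, the near-miss Tim describes could be answered with guidance
along these lines (an invented helper and wording, sketched in Go for
illustration only): when the requested total would not actually improve
fault tolerance, say so and suggest the next count that would.

package main

import "fmt"

// adviseStateServerCount explains whether a proposed total number of state
// servers improves fault tolerance, and suggests a better one if not.
func adviseStateServerCount(total int) string {
	if total%2 == 1 {
		return fmt.Sprintf("%d state servers is fine (tolerates %d failure(s))", total, total/2)
	}
	return fmt.Sprintf("%d state servers tolerates no more failures than %d; add one more to reach %d",
		total, total-1, total+1)
}

func main() {
	fmt.Println(adviseStateServerCount(2)) // the "just one more mongo db" case
	fmt.Println(adviseStateServerCount(3))
}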

I don't have the will to bike-shed around the actual command we use;
however, I strongly suggest that we go with something that makes sense to
Jorge and Marco (and to our CTS folks), as they are our people on the
ground, using this tool.

Cheers,
Tim




Re: High Availability command line interface - future plans.

2013-11-11 Thread roger peppe
In the end I think it comes down to a philosophical difference.

I believe in implementing systems from the bottom up, out of well-understood,
simple-as-possible components with easily understood properties. I am aware
that another approach is to start with a partially implemented primitive that
represents a design goal and fill out its implementation until it meets that
goal.

In this discussion, "ensure-ha" seems to me to epitomise the second
approach and I do understand the arguments for it. With reference to
Mark's mention of "the inmates running the asylum", I realise that, by
that analogy, I am most certainly an inmate here. My ideas about what
might make for a solid and straightforward tool to use are biased by my
knowledge of the structure of the system.

William's response is clear, so ensure-ha it is. I'm afraid we're back
where we started, but I've found this conversation useful and hope that
others have too.

> I don't have the will to bike-shed around the actual command we use,
> however I strongly suggest that we go with something that makes sense to
> Jorge and Marco (and to our CTS folks) as they are our people on the
> ground, using this tool.

It would have been great to have had feedback from the CTS folks (possibly
the biggest current operational users of Juju?) on this.

  cheers,
rog.

On 11 November 2013 03:50, Tim Penhey  wrote:
> On 09/11/13 03:04, roger peppe wrote:
>> On 8 November 2013 13:51, Gustavo Niemeyer  wrote:
>>> juju add-state-server --api-only-please-thanks
>>
>> And if we want to allow a machine that runs the environment-manager
>> workers but not the api server or mongo server (not actually an unlikely 
>> thing
>> given certain future possibilities) then add-state-server is a command that
>> doesn't necessarily add a state server at all... That thought
>> was the source of my doubt.
>
> I think that it is reasonable to think of just the db and the api server
> from the user's point of view.
>
> The fact that we may run other workers along side the api server is up
> to us, and not something we actually need to expose to people.
>
> Most of our users should have no problem at all understanding juju:db
> and juju:api (or whatever names we call them).
>
>> That said, it's just a spelling. If there's general agreement on 
>> "state-server",
>> so be it - I'm very happy to move forward with that.
>
> I cringe whenever I see "state" used anywhere.
>
> I would like use to move towards namespaced services with a common
> understanding, but I'm happy to have that significantly down the line.
>
> Just remember that whatever command we come up with, it needs to be
> easily explained to our new users.  I like the idea of a special command
> that handles the HA-ness of juju, because it means we can give
> meaningful error messages when people do things not quite right (like
> adding just one more mongo db thinking it is enough).
>
> I don't have the will to bike-shed around the actual command we use,
> however I strongly suggest that we go with something that makes sense to
> Jorge and Marco (and to our CTS folks) as they are our people on the
> ground, using this tool.
>
> Cheers,
> Tim
>
>
