The current plan is to have a single juju command, "juju
ensure-ha-state". This would create new state server machines if there
are fewer than the required number (currently 3).
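
For concreteness, the intended usage (hypothetical, since the command
does not exist yet) would be a single invocation with no arguments:

    juju ensure-ha-state    # bring the number of state servers up to 3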

Taking that as given, I'm wondering what we should do
in the future, when users require more than a single
big On switch for HA.

How does the user:

a) know about the HA machines, so that the costs of HA are not hidden and
the implications of particular machine failures are clear?

b) fix the system when a machine dies?

c) scale up the system to x thousand nodes?

d) scale down the system?

For a), we could tag a machine in the status as a "state server", and
hope that the user knows what that means.
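
Purely as an illustration (the field name and its placement are
hypothetical, not part of the current status format), that might look
something like:

    machines:
      "0":
        agent-state: started
        instance-id: i-0abc123
        state-server: true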

For b), the suggestion is that the user notices that a state server
machine is unresponsive (as marked in status) and runs destroy-machine
on it; destroy-machine will notice that it's a state server machine and
automatically start another one to replace it. Destroy-machine would
refuse to work on a state server machine that seems to be alive.
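
As a sketch of that workflow (destroy-machine exists today; the
automatic replacement is the proposed part):

    juju status               # machine 2, a state server, shows as down
    juju destroy-machine 2    # sees it's a dead state server and
                              # starts a replacement automatically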

For c) we could add a flag to ensure-ha-state suggesting a desired number
of state-server nodes.
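
For example, something along the lines of (the flag name is purely
illustrative):

    juju ensure-ha-state --state-server-count 5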

I'm not sure what the suggestion is for d) given that we refuse to
destroy live state-server machines.

Although ensure-ha-state might be a fine way to turn
on HA initially, I'm not entirely happy with expanding it to cover
all the above cases. It seems to me that we're going
to create a leaky abstraction that purports to be magic ("just wave the
HA wand!") but ends up being limiting, and in some cases confusing
("Huh? I asked to destroy that machine and another one has just
been created").

I believe that any user that's using HA will need to understand that
some machines are running state servers, and when things fail, they
will need to manage those machines individually (for example by calling
destroy-machine).

I also think that the solution to c) is limiting, because there is
actually no such thing as a "state server" - we have at least three
independently scalable juju components (the database servers (mongodb),
the API servers and the environment managers) with different scaling
characteristics. I believe that in any sufficiently large environment,
the user will not want to scale all of those at the same rate. For example,
MongoDB will allow at most 12 members of a replica set, but a caching API
server could potentially usefully scale up much higher than that. We could
add more flags to ensure-ha-state (e.g. --state-server-count), but then
we'd lack the capability to suggest which components might be grouped with which.
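
To illustrate the direction that takes us (all of these flags are
hypothetical), we'd end up with something like:

    juju ensure-ha-state --state-server-count 7 --api-server-count 40 --env-manager-count 3

and still have no way of expressing which of those components should be
co-located on which machines.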

PROPOSAL

My suggestion is that we go for a "slightly less magic" approach
that provides the user with the tools to manage
their own high-availability setup, adding appropriate automation in time.

I suggest that we let the user know that machines can run as juju server
nodes, and provide them with the capability to *choose* which machines
will run as server nodes and which can host units - that is, what *jobs*
a machine will run.

Here's a possible proposal:

We already have an "add-machine" command. We'd add a "--jobs" flag
to allow the user to specify the jobs that the new machine(s) will
run. Initially we might have just two jobs, "manager" and "unit"
- the machine can either host service units, or it can manage the
juju environment (including running the state server database),
or both. In time we could add finer levels of granularity to allow
separate scalability of juju server components, without losing backwards
compatibility.
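
For example (add-machine exists today; the --jobs flag and the syntax
for combining jobs are the proposed, still-hypothetical parts):

    juju add-machine --jobs manager         # a dedicated juju server node
    juju add-machine --jobs unit            # an ordinary workload machine
    juju add-machine --jobs manager,unit    # a machine that does both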

If the new machine is marked as a "manager", it would run a mongo
replica set peer. This *would* mean that it would be possible to have
an even number of mongo peers, with the potential for a split vote
if the nodes were partitioned evenly, and resulting database stasis.
I don't *think* that would actually be a severe problem in practice.
We would make juju status point out the potential problem very clearly,
just as it should point out the potential problem if one member of an
existing odd-sized replica set dies. The potential problems are the same
in both cases, and are straightforward for even a relatively naive user
to avoid.
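
To make the quorum arithmetic concrete: mongo can only elect a primary
(and so accept writes) while a strict majority of replica set peers can
see each other, so:

    3 peers: majority is 2 - tolerates 1 peer failure
    4 peers: majority is 3 - still tolerates only 1 failure, and an even
             2/2 partition leaves neither side able to elect a primary
    5 peers: majority is 3 - tolerates 2 peer failures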

Thus, juju ensure-ha-state is almost equivalent to:

    juju add-machine --jobs manager -n 2

In my view, this command feels less "magic" than ensure-ha-state - the
runtime implications (e.g. cost) of what's going on are easier for the
user to understand, and it requires no new entities in a user's model of
the system.

In addition to the new add-machine flag, we'd add a single new command,
"juju machine-jobs", which would allow the user to change the jobs
associated with an existing machine.  That could be a later addition -
it's not necessary in the first cut.
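
Hypothetical usage (the command name comes from the proposal above; the
argument syntax is only a sketch):

    juju machine-jobs 3 --jobs manager,unit    # promote machine 3 to a server node
    juju machine-jobs 3 --jobs unit            # later, demote it to hosting units only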

With these primitives, I *think* the responsibilities of the system and
the model to the user become clearer.  Looking back to the original
user questions:

a) The "state manager" status of certain machines in the status is no
longer something entirely divorced from user control - it means something
in terms of the commands the user is provided with.

b) The user already knows about destroy-machine.  They can manage broken
state manager machines just as they would manage any other broken machine.
Destroy-machine would refuse to destroy any state server machine whose
removal would take the currently connected set of mongo peers below a
majority.
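
For example, with the proposed behaviour, recovering from a dead state
manager machine (say machine 2) looks like ordinary machine management:

    juju destroy-machine 2             # allowed: its mongo peer is already
                                       # disconnected, so a majority remains
    juju add-machine --jobs manager    # bring the replica set back to strength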

c) We already have add-machine.

d) We already have destroy-machine. See c) above.

REQUEST FOR COMMENTS

If there is broad agreement on the above, then I propose that
we start off by implementing ensure-ha-state with as little
internal logic as possible - we don't necessarily need the transactional
logic that guards against starting extra machines when several people
call ensure-ha-state concurrently, for example.

In fact, ensure-ha-state could probably be written as a thin layer
on top of add-machine --jobs.
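
A rough sketch of that layer (purely illustrative): count the machines
that already have the manager job, and if there are fewer than three,
run the equivalent of

    juju add-machine --jobs manager -n <3 minus the current count>

doing nothing at all when three or more managers already exist.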

Thoughts?
