While we cannot avoid data loss on node crashes if we have plain instances, we can ensure that the cluster has enough capacity to reinstall the instances on a new node. Add a design document describing how we ensure this.
Signed-off-by: Klaus Aehlig <[email protected]>
---
 Makefile.am                     |  1 +
 doc/design-draft.rst            |  1 +
 doc/design-plain-redundancy.rst | 61 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)
 create mode 100644 doc/design-plain-redundancy.rst

diff --git a/Makefile.am b/Makefile.am
index ea447a1..c4bcf04 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -704,6 +704,7 @@ docinput = \
 	doc/design-os.rst \
 	doc/design-ovf-support.rst \
 	doc/design-partitioned.rst \
+	doc/design-plain-redundancy.rst \
 	doc/design-performance-tests.rst \
 	doc/design-query-splitting.rst \
 	doc/design-query2.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index dc77f64..e4dd2e0 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -27,6 +27,7 @@ Design document drafts
    design-multi-storage-htools.rst
    design-shared-storage-redundancy.rst
    design-repaird.rst
+   design-plain-redundancy.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
diff --git a/doc/design-plain-redundancy.rst b/doc/design-plain-redundancy.rst
new file mode 100644
index 0000000..0233bcd
--- /dev/null
+++ b/doc/design-plain-redundancy.rst
@@ -0,0 +1,61 @@
+======================================
+Redundancy for the plain disk template
+======================================
+
+.. contents:: :depth: 4
+
+This document describes how N+1 redundancy is achieved
+for instances using the plain disk template.
+
+
+Current state and shortcomings
+==============================
+
+Ganeti has long considered N+1 redundancy for DRBD, making sure that
+on the secondary nodes enough memory is reserved to host the instances,
+should one node fail. Recently, ``htools`` have been extended to
+also take :doc:`design-shared-storage-redundancy` into account.
+
+For plain instances, there is no direct notion of redundancy: if the
+node the instance is running on dies, the instance is lost. However,
+if the instance can be reinstalled (e.g., because it is providing a
+stateless service), it does make sense to ask if the remaining nodes
+have enough free capacity for the instances to be recreated. This
+form of capacity planning is currently not addressed by
+Ganeti.
+
+
+Proposed changes
+================
+
+The basic considerations follow those of :doc:`design-shared-storage-redundancy`.
+Also, the changes to the tools follow the same pattern.
+
+Definition of N+1 redundancy in the presence of shared and plain storage
+------------------------------------------------------------------------
+
+A cluster is considered N+1 redundant if, for every node, the following
+steps can be carried out. First, all DRBD instances are migrated out. Then,
+all shared-storage instances of that node are relocated to another node in
+the same node group. Finally, all plain instances of that node are reinstalled
+on a different node in the same node group; in the search for new nodes for
+the plain instances, they will be recreated in order of decreasing memory
+size.
+
+Note that the first two steps are those in the definition of N+1 redundancy
+for shared storage. In particular, this notion of redundancy strictly extends
+the one for shared storage. Again, checking this notion of redundancy is
+computationally expensive, and the non-DRBD part is mainly a capacity
+property, in the sense that we expect that most instance moves that are fine
+from a DRBD point of view will not lead from a redundant to a non-redundant
+situation.
+
+Modifications to existing tools
+-------------------------------
+
+The changes to the existing tools are literally the same as
+for :doc:`design-shared-storage-redundancy`, with the above definition of
+N+1 redundancy substituted in for that of redundancy for shared storage.
+In particular, ``gnt-cluster verify`` will not be changed, and ``hbal``
+will use N+1 redundancy as a final filter step to disallow moves
+that lead from a redundant to a non-redundant situation.
-- 
2.4.3.573.g4eafbef
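
The N+1 definition in the patch above (re-place a failed node's non-DRBD instances on the remaining nodes of the group, largest memory first) can be sketched as a simple capacity check. The following Python sketch is illustrative only: all names and data shapes are invented for this example, it models memory alone, and it is not the actual htools implementation (which is written in Haskell and considers many more resources).

```python
# Illustrative sketch of the N+1 redundancy check defined above,
# reduced to a memory-only model.  Names and structures are invented
# for this example and do not match Ganeti's htools code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    mem: int            # memory footprint in MiB
    template: str       # "drbd", "shared" or "plain"


@dataclass
class Node:
    name: str
    free_mem: int       # spare memory in MiB
    instances: List[Instance] = field(default_factory=list)


def can_place(instances, nodes):
    """Greedily re-place instances in order of decreasing memory size
    (as the definition above prescribes), always picking the node with
    the most free memory."""
    free = {n.name: n.free_mem for n in nodes}
    for inst in sorted(instances, key=lambda i: i.mem, reverse=True):
        target = max(free, key=free.get)
        if free[target] < inst.mem:
            return False
        free[target] -= inst.mem
    return True


def n_plus_1_redundant(group):
    """For every node in the group, check that its shared-storage and
    plain instances fit on the remaining nodes.  DRBD instances are
    assumed to migrate to their secondaries, whose reserved memory is
    covered by the classic DRBD N+1 check and is not modelled here."""
    for failed in group:
        others = [n for n in group if n is not failed]
        to_place = [i for i in failed.instances if i.template != "drbd"]
        if not can_place(to_place, others):
            return False
    return True
```

For example, a three-node group where each node's non-DRBD instances fit on the other two is reported as redundant, while a group holding a plain instance larger than any surviving node's free memory is not.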
