On Wed, Jun 24, 2015 at 11:28:50AM +0200, 'Klaus Aehlig' via ganeti-devel wrote:
While we cannot avoid data loss on node crashes if we
have plain instances, we can ensure that the cluster
has enough capacity to reinstall the instances on a
new node. Add a design describing how we ensure this.

Signed-off-by: Klaus Aehlig <[email protected]>
---
Makefile.am                     |  1 +
doc/design-draft.rst            |  1 +
doc/design-plain-redundancy.rst | 61 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+)
create mode 100644 doc/design-plain-redundancy.rst

diff --git a/Makefile.am b/Makefile.am
index ea447a1..c4bcf04 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -704,6 +704,7 @@ docinput = \
        doc/design-os.rst \
        doc/design-ovf-support.rst \
        doc/design-partitioned.rst \
+       doc/design-plain-redundancy.rst \
        doc/design-performance-tests.rst \
        doc/design-query-splitting.rst \
        doc/design-query2.rst \
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index dc77f64..e4dd2e0 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -27,6 +27,7 @@ Design document drafts
   design-multi-storage-htools.rst
   design-shared-storage-redundancy.rst
   design-repaird.rst
+   design-plain-redundancy.rst

.. vim: set textwidth=72 :
.. Local Variables:
diff --git a/doc/design-plain-redundancy.rst b/doc/design-plain-redundancy.rst
new file mode 100644
index 0000000..0233bcd
--- /dev/null
+++ b/doc/design-plain-redundancy.rst
@@ -0,0 +1,61 @@
+======================================
+Redundancy for the plain disk template
+======================================
+
+.. contents:: :depth: 4
+
+This document describes how N+1 redundancy is achieved
+for instances using the plain disk template.
+
+
+Current state and shortcomings
+==============================
+
+Ganeti has long considered N+1 redundancy for DRBD, making sure that
+on the secondary nodes enough memory is reserved to host the instances,
+should one node fail. Recently, ``htools`` have been extended to
+also take :doc:`design-shared-storage-redundancy` into account.
+
+For plain instances, there is no direct notion of redundancy: if the
+node the instance is running on dies, the instance is lost. However,
+if the instance can be reinstalled (e.g., because it is providing a
+stateless service), it does make sense to ask if the remaining nodes
+have enough free capacity for the instances to be recreated. This
+form of capacity planning is not addressed by current Ganeti.
+
+
+Proposed changes
+================
+
+The basic considerations follow those of :doc:`design-shared-storage-redundancy`.
+Also, the changes to the tools follow the same pattern.
+
+Definition of N+1 redundancy in the presence of shared and plain storage
+------------------------------------------------------------------------
+
+A cluster is considered N+1 redundant, if, for every node, the following
+steps can be carried out. First all DRBD instances are migrated out. Then,
+all shared-storage instances of that node are relocated to another node in
+the same node group. Finally, all plain instances of that node are reinstalled
+on a different node in the same node group; in the search for new nodes for
+the plain instances, they will be recreated in order of decreasing memory
+size.
+
+Note that the first two steps are those in the definition of N+1 redundancy
+for shared storage. In particular, this notion of redundancy strictly extends
+the one for shared storage. Again, checking this notion of redundancy is
+computationally expensive and the non-DRBD part is mainly a capacity property
+in the sense that we expect the majority of instance moves that are fine
+from a DRBD point of view will not lead from a redundant to a non-redundant
+situation.
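As an illustration only, the per-node check defined above can be sketched in Python. This is not Ganeti's actual implementation (the ``htools`` are written in Haskell), and all names here are invented for the sketch; it models only the plain-instance part of the check, using greedy first-fit placement in order of decreasing memory size:

```python
# Illustrative model of the plain-instance part of the N+1 check:
# for every node that might fail, try to recreate its plain instances
# on the remaining nodes, largest instance first (first-fit decreasing).
# All names and data shapes are hypothetical, chosen for this sketch.

def n_plus_1_plain_redundant(free_mem, plain_instances):
    """free_mem: {node: free memory in MiB};
    plain_instances: {node: [instance memory sizes in MiB]}."""
    for failed in free_mem:
        remaining = dict(free_mem)
        del remaining[failed]
        # Recreate the failed node's plain instances in order of
        # decreasing memory size, as the design prescribes.
        for size in sorted(plain_instances.get(failed, []), reverse=True):
            target = next(
                (n for n, free in remaining.items() if free >= size), None)
            if target is None:
                return False  # no remaining node has enough capacity
            remaining[target] -= size
    return True
```

In a real check, the DRBD migrations and shared-storage relocations would be carried out first, and disk and CPU capacity would be accounted for alongside memory; the sketch keeps only memory to stay small.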
+
+Modifications to existing tools
+-------------------------------
+
+The changes to the existing tools are literally the same as
+for :doc:`design-shared-storage-redundancy` with the above definition of
+N+1 redundancy substituted in for that of redundancy for shared storage.
+In particular, ``gnt-cluster verify`` will not be changed and ``hbal``
+will use N+1 redundancy as a final filter step to disallow moves
+that lead from a redundant to a non-redundant situation.
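The "final filter step" behaviour described for ``hbal`` can likewise be sketched. Again this is purely illustrative; ``apply_move`` and ``is_redundant`` are hypothetical stand-ins for the real balancing primitives:

```python
# Illustrative sketch of using the redundancy check as a final filter:
# a candidate balancing move is kept only if the cluster state it
# produces is still N+1 redundant. Names are invented for the sketch.

def filter_moves(moves, apply_move, state, is_redundant):
    """Keep only moves whose resulting state remains N+1 redundant."""
    return [m for m in moves if is_redundant(apply_move(state, m))]
```

The point of filtering last is the cost noted above: the redundancy check is computationally expensive, so it is applied only to moves that have already passed the cheaper DRBD-level checks.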
--
2.4.3.573.g4eafbef


LGTM
