LGTM with nits On Tue, Sep 1, 2015 at 8:38 PM, Klaus Aehlig <[email protected]> wrote:
> > Thanks for your review. I'd like to add the following interdiff > > > commit 3ae104892fb695830d69d557c9706e202014030d > Author: Klaus Aehlig <[email protected]> > Date: Tue Sep 1 20:35:20 2015 +0200 > > Interdiff > > diff --git a/doc/design-memory-over-commitment.rst > b/doc/design-memory-over-commitment.rst > index 2a84085..5489a9a 100644 > --- a/doc/design-memory-over-commitment.rst > +++ b/doc/design-memory-over-commitment.rst > @@ -10,9 +10,9 @@ overcommitment in Ganeti. > Background > ========== > > -Memory is a non-preemptable resource, thus cannot be shared, e.g., > +Memory is a non-preemptable resource, and thus cannot be shared, e.g., > in a round-robin fashion. Therefore, Ganeti is very careful to make > -sure, there is always enough physical memory for the memory promised > +sure there is always enough physical memory for the memory promised > to the instances. In fact, even in an N+1 redundant way: should one > node fail, its instances can be relocated to other nodes while still > having enough physical memory for the memory promised to all instances. > @@ -20,15 +20,15 @@ having enough physical memory for the memory promised > to all instances. > Overview over the current memory model > -------------------------------------- > > -To make decissions, ``htools`` query the following parameter from Ganeti. > +To make decisions, ``htools`` query the following parameters from Ganeti. > > - The amount of memory used by each instance. This is the state-of-record > backend parameter ``maxmem`` for that instance (maybe inherited from > group-level or cluster-level backend paramters). It tells the hypervisor > the maximal amount of memory that instance may use. > > -- The state-of-world paramters for the node memory. They are collected > - live and are hypervisor specific. The following paramters are collected. > +- The state-of-world parameters for the node memory. They are collected > + live and are hypervisor specific. The following parameters are > collected. > > - memory_total: the total memory size on the node > > @@ -37,7 +37,7 @@ To make decissions, ``htools`` query the following > parameter from Ganeti. > - memory_dom0: the memory used by the node itself, if available > > For Xen, the amount of total and free memory are obtained by parsing > - the outout of Xen ``info`` command (e.g., ``xm info``). The dom0 > + the output of Xen ``info`` command (e.g., ``xm info``). The dom0 > memory is obtained by looking in the output of the ``list`` command > for ``Domain-0``. > > @@ -51,12 +51,12 @@ To make decissions, ``htools`` query the following > parameter from Ganeti. > Current state and shortcomings > ============================== > > -While the current model of never over provisioning memory serves well > +While the current model of never over committing memory serves well > to provide reliability guarantees to instances, it does not suit well > situations were the actual use of memory in the instances is spiky. > Consider > a scenario where instances only touch a small portion of their memory most > -of the time, but occasionally use a large amount memory. Then, at any > moment, > -a large fraction of the memory used for the instances sits there without > +of the time, but occasionally use a large amount of memory. Then, at any > moment, > +a large fraction of the memory used for the instances sits around without > being actively used. By swapping out the not actively used memory, > resources > can be used more efficiently. > > @@ -64,15 +64,15 @@ Proposed changes > ================ > > We propose to support over commitment of memory if desired by the > -administrator. Memory will change from beeing a hard constraint to > +administrator. Memory will change from being a hard constraint to > being a question of policy. The default will be not to over commit > memory. > > Extension of the policy by a new parameter > ------------------------------------------ > > -The instance policy is extenden by a new real-number field > ``memory-ratio``. > -Policies on groups inherit this paramter from the cluster wide policy in > the > +The instance policy is extended by a new real-number field > ``memory-ratio``. > +Policies on groups inherit this parameter from the cluster wide policy in > the > same way as all other parameters of the instance policy. > > When a cluster is upgraded from an earlier version not containing > @@ -100,9 +100,9 @@ Therefore, for ``kvm`` we will report as ``dom0`` the > state-of-record > backend paramter ``memory_dom0`` for the ``kvm`` hypervisor. As a > hypervisor > backend paramter, it is run-time tunable and inheritable at group level. > If this paramter is not present, a default value of ``1024M`` will be > used, > -which is conservative estimate of the amount of memory used by Ganeti on a > +which is a conservative estimate of the amount of memory used by Ganeti > on a > medium-sized cluster. The reason for using a state-of-record value is to > -have a stable amount of reserved memory, irrsespectively of the current > activity > +have a stable amount of reserved memory, irrespective of the current > activity > of Ganeti. > > > @@ -112,9 +112,11 @@ Changes to the memory policy > The memory policy will be changed in that we assume that one byte > of physical node memory can hold ``memory-ratio`` bytes of instance > memory, but still only one byte of Ganeti memory. Of course, in practise > -this has to be backed by swap sapce; it is the administrator > responsibility > +this has to be backed by swap space; it is the administrator > responsibility > Nit: administrator's > to ensure that each node has swap of at > -least ``(memory-ratio - 1.0) * (memory_total - memory_dom0)``. > +least ``(memory-ratio - 1.0) * (memory_total - memory_dom0)``. Ganeti > +will warn if the amount of swap space is not big enough. > + > > The new memory policy will be as follows. > > @@ -134,9 +136,9 @@ The new memory policy will be as follows. > memory-ratio this is considered a hard violation, otherwise > it is considered a soft violation. > > -- The definition of N+1 redunandcy (including > +- The definition of N+1 redundancy (including > :doc:`design-shared-storage-redundancy`) is kept literally as is. > - Note, however, that the meaning does change, the definition depends > + Note, however, that the meaning does change, as definition depends > s/as/as the/ > on the notion of allowed moves, which is changed by this proposal. > > > @@ -147,6 +149,9 @@ The only place where the Ganeti core handles memory is > when ``gnt-cluster verify`` verifies N+1 redundancy. This code will be > changed > to follow the new memory model. > > +Additionally, ``gnt-cluster verify`` will warn if the sum of available > memory > +and swap space is not at least as big as the used memory. > + > Changes to ``htools`` > --------------------- > > @@ -166,8 +171,12 @@ discardable memory will serve as an indication of > memory activity; > as usual, the measure will be the standard deviation of the relative > value (i.e., the ratio of non-discardable memory to available > memory). The weighting for this metric component will have to be > -determined by experimentation; as starting point we will use the > -status quo value of ``1.0``. > +determined by experimentation and will depend on the memory ratio; > +for a memory ratio of ``1.0`` the weight will be ``0.0``, as memory > +need not be taken into account if no over-commitment is in place. > +As a starting point for experimenation we will use weight ``0.0`` > s/experimenation/experimentation/ > +if the memory ratio is ``1.0`` and weight ``1.0`` if it is bigger > +than ``1.0``. > > > > > -- > Klaus Aehlig > Google Germany GmbH, Dienerstr. 12, 80331 Muenchen > Registergericht und -nummer: Hamburg, HRB 86891 > Sitz der Gesellschaft: Hamburg > Geschaeftsfuehrer: Graham Law, Christine Elizabeth Flores > Hrvoje Ribicic Ganeti Engineering Google Germany GmbH Dienerstr. 12, 80331, München Geschäftsführer: Graham Law, Christine Elizabeth Flores Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
