On Thu, Feb 27, 2014 at 4:35 PM, Ilias Tsitsimpis <ilias...@grnet.gr> wrote:
> This patch adds a design document detailing the implementation of disks
> as a new top-level citizen in the config file (just like instances,
> nodes etc).
>
> Signed-off-by: Ilias Tsitsimpis <ilias...@grnet.gr>
> ---
>
> Hello team,
>
> As discussed in GanetiCon 2013, we would like to have disks as top-level
> citizens in the config file. This patch adds the design document
> detailing the steps needed to achieve that.
>
> Once we agree on the first step and have the code merged we can extend
> the design doc to include the next planned features.
>
> Kind Regards,
> Ilias
>
>
>  doc/design-disks.rst | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  doc/index.rst        |   1 +
>  2 files changed, 182 insertions(+)
>  create mode 100644 doc/design-disks.rst
>
> diff --git a/doc/design-disks.rst b/doc/design-disks.rst
> new file mode 100644
> index 0000000..f9c9f5e
> --- /dev/null
> +++ b/doc/design-disks.rst
> @@ -0,0 +1,181 @@
> +=====
> +Disks
> +=====
> +
> +.. contents:: :depth: 4
> +
> +This is a design document detailing the implementation of disks as a new
> +top-level citizen in the config file (just like instances, nodes etc).
> +
> +
> +Current state and shortcomings
> +==============================
> +
> +Currently, Disks are stored in Ganeti's config file as a list
> +(container) of Disk objects under the Instance to which they belong.
> +This implementation imposes a number of limitations:
> +
> +* A Disk object cannot live outside an Instance. This means that one
> +  cannot detach a disk from an instance (without destroying the disk)
> +  and then reattach it (to the same or even to a different instance).
> +
> +* Disks are not taggable objects, as only top-level citizens of the
> +  config file can be made taggable. Having taggable disks will allow for
> +  further customizations.
> +
> +
> +Proposed changes
> +================
> +
> +The implementation is going to be split into four parts:
> +
> +* Make disks a top-level citizen in the config file. The Instance object
> +  will no longer contain a list of Disk objects, but a list of disk
> +  UUIDs.
> +
> +* Add locks for Disk objects and make them taggable.
> +
> +* Allow attaching/detaching an existing disk to/from an instance.
> +
> +* Allow creation/modification/deletion of disks that are not attached to
> +  any instance (requires new LUs for disks).
> +
> +
> +Design decisions
> +================
> +
> +Disks as config top-level citizens
> +----------------------------------
> +
> +The first patch series is going to add a new top-level citizen in the
> +config object (namely ``disks``) and separate the disk objects from the
> +instances. In doing so, there are a number of problems that we have to
> +overcome:
> +
> +* How the Disk object will be represented in the config file and how it
> +  is going to be connected with the instance it belongs to.
> +
> +* How the existing code will get the disks belonging to an instance.
> +
> +* What it means for a disk to be attached/detached to/from an instance.
> +
> +* How disks are going to be created/deleted and attached/detached using
> +  the existing code.
> +
> +
> +Disk representation
> +~~~~~~~~~~~~~~~~~~~
> +
> +The ``Disk`` object gets two extra slots, ``_TIMESTAMPS`` and
> +``serial_no``.
> +
> +The ``Instance`` object will no longer contain the list of disk objects
> +that are attached to it. Instead, an Instance object will refer to its
> +disks using their UUIDs. Since the order in which the disks are attached
> +to an instance is important, we are going to have a list of disk UUIDs
> +under the Instance object which will denote the disks attached to the
> +instance and their order at the same time. So the Instance's ``disks``
> +slot is going to be a list of disk UUIDs. The ``Disk`` object is not
> +going to have a slot pointing to the ``Instance`` to which it belongs,
> +since this is redundant.
> +
> +From now on we will say that a disk is attached to an instance if the
> +Instance's ``disks`` slot contains the disk's UUID, and we will
> +attach/detach a disk by manipulating this slot.
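
To make the intended layout a bit more concrete, here is a rough sketch of
how an excerpt of the serialized config could end up looking after this
change. It is only an illustration: the UUIDs and the disk fields shown
here are made up and not taken from the actual Ganeti object definitions.

    # Hypothetical excerpt of config.data; everything not relevant to
    # this design is omitted.
    CONFIG_SKETCH = {
      "instances": {
        "inst1.example.com": {
          # Ordered list of disk UUIDs instead of embedded Disk objects;
          # the list order is the attachment order.
          "disks": [
            "9d6babce-0a24-4a27-9b0f-2b5d5b7d4a10",
            "5b2f9c2e-7c1d-4f6e-8d3a-1e2f3a4b5c6d",
          ],
        },
      },
      # New top-level container, keyed by disk UUID.
      "disks": {
        "9d6babce-0a24-4a27-9b0f-2b5d5b7d4a10": {
          "uuid": "9d6babce-0a24-4a27-9b0f-2b5d5b7d4a10",
          "size": 10240,
          "serial_no": 1,
          "ctime": 1393510000.0,
          "mtime": 1393510000.0,
        },
        "5b2f9c2e-7c1d-4f6e-8d3a-1e2f3a4b5c6d": {
          "uuid": "5b2f9c2e-7c1d-4f6e-8d3a-1e2f3a4b5c6d",
          "size": 20480,
          "serial_no": 1,
          "ctime": 1393510100.0,
          "mtime": 1393510100.0,
        },
      },
    }

Note how there is no back-pointer from a disk to the instance it is
attached to, as described above.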
> +
> +
> +Get instance's disks
> +~~~~~~~~~~~~~~~~~~~~
> +
> +A new function ``GetInstanceDisks`` will be added to the config which,
> +given an instance, will return the list of Disk objects attached to this
> +instance. This list will be exactly the same as ``instance.disks`` was
> +before. Because everywhere in the code base we make the assumption that
> +the disk objects are accessible through the instance object, and because
> +we would like not to have to ask the config file every time we want to
> +get the disks of an instance, we are going to annotate the Instance
> +object with its disks as soon as we read it from the config. This means
> +that the ``Instance`` object will have a new slot named ``disks_info``
> +that at run-time will be identical to the current ``disks``. This new
> +slot will be filled with the disk objects when we read the Instance
> +object from the config file (``GetInstanceInfo``). When the ``Instance``
> +object is written back to the config file, the ``ToDict`` method will
> +extract the disk UUIDs from the ``disks_info`` slot, save them to the
> +``disks`` slot and discard the ``disks_info`` one.

This sounds like a premature optimization. Are there any good reasons to
not just call config.GetInstanceDisks(instance) wherever instance.disks
was used before? If this proves to be a performance bottleneck, which I
strongly doubt, then we can still look for an optimization. Note that we
already do it this way every time we access the nodes of an instance.

> +
> +Everywhere in the code we are going to replace ``instance.disks`` (which
> +from now on will contain a list of disk UUIDs) with
> +``instance.disks_info``. Other than that, there should be no further
> +changes (the ``Instance`` object is going to be treated as before, since
> +it will contain the list of its ``Disk`` objects).
> +
> +
> +Configuration changes
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``ConfigData`` object gets one extra slot: ``disks``. Also, there
> +will be two new functions, ``AddDisk`` and ``RemoveDisk``, that will
> +create/remove disk objects from the config.
> +
> +The ``VerifyConfig`` function will be changed so it can check that there
> +are no dangling pointers from instances to disks (i.e. an instance
> +points to a disk that doesn't exist in the config).

Note that a disk should only ever be attached to either zero or one
instance, so we should test for this as well.
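
To illustrate both checks together, a minimal standalone sketch (the
function name and the plain-dict arguments are made up for illustration;
the real check would live in ``VerifyConfig`` and work on the config
objects):

    def _CheckDiskConsistency(instance_disk_uuids, disk_uuids):
      """Return error messages for dangling or multiply-attached disks.

      @param instance_disk_uuids: dict mapping instance name to the list
          of disk UUIDs in its ``disks`` slot
      @param disk_uuids: set of UUIDs of all disks in the config

      """
      errors = []
      owners = {}  # disk UUID -> instances referencing it
      for inst_name, uuids in instance_disk_uuids.items():
        for uuid in uuids:
          if uuid not in disk_uuids:
            errors.append("instance %s references unknown disk %s" %
                          (inst_name, uuid))
          owners.setdefault(uuid, []).append(inst_name)
      for uuid, inst_names in owners.items():
        if len(inst_names) > 1:
          errors.append("disk %s is attached to more than one instance"
                        " (%s)" % (uuid, ", ".join(inst_names)))
      return errors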
> +
> +
> +Apply Disk modifications
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Disk modifications are done through the ``ApplyContainerMods`` function.
> +``ApplyContainerMods`` supports creation, deletion and modification of a
> +disk. We will continue to use the existing code (``ApplyContainerMods``)
> +for the disks as follows:
> +
> +**Create a new disk:**
> +
> +* ``ApplyContainerMods`` calls ``_CreateNewDisks``, which creates the
> +  disk and adds it to the config file.
> +
> +* ``ApplyContainerMods`` itself adds the newly created Disk object to
> +  the Instance's ``disks_info`` slot with simple container modifications
> +  (i.e. attaches the disk to the instance).
> +
> +* ``ApplyContainerMods`` calls ``_PostAddDisk``, which takes care of
> +  hotplugging.
> +
> +**Modify a disk:**
> +
> +* ``ApplyContainerMods`` calls ``_ModifyDisk``, which modifies the disk
> +  and calls the config's ``Update`` function to save back the Disk
> +  object.
> +
> +**Delete a disk:**
> +
> +* ``ApplyContainerMods`` calls ``_PreRemoveDisk``, which takes care of
> +  hotplugging. This function doesn't exist yet and has to be
> +  implemented.
> +
> +* ``ApplyContainerMods`` itself removes the disk object from the
> +  Instance's ``disks_info`` slot (i.e. detaches the disk from the
> +  instance).
> +
> +* ``ApplyContainerMods`` deletes the disk and removes it from the config
> +  file by invoking ``RemoveDisk``.
> +
> +
> +Backend changes
> +~~~~~~~~~~~~~~~
> +
> +During the RPC, the instance will get annotated with its disks. So in
> +the backend we will only have to replace the ``disks`` slot with
> +``disks_info``.

I don't really like the concept of changing/annotating objects (no matter
if it's nodes, instances, disks, etc.) before sending them over RPC. It's
rather confusing, because some slots of the object are only filled when
executed as part of an RPC by noded. And this also hinders code reuse a
bit, because there can be "config" objects and "sent-over-RPC" objects
which are of the same type but are not quite the same.

Personally, I would prefer a solution where the instance object is sent
as-is over RPC (so only containing a disk list with UUIDs) and an
additional parameter (disks? instance_disks?) is introduced which is
essentially a dict of UUID -> disk object (or simply the list returned by
config.GetInstanceDisks(instance)). This will require some refactoring in
the short term, but certainly creates cleaner code in the long term.
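
Just to make the shape of that alternative explicit, a small sketch (this
is not an existing Ganeti function; the helper name and the payload keys
are invented purely for illustration):

    def _BuildInstanceRpcArgs(instance, instance_disks):
      """Illustrative helper: ship the instance unchanged plus its disks.

      @param instance: the Instance object exactly as stored in the
          config, i.e. its ``disks`` slot is just a list of UUIDs
      @param instance_disks: the list returned by
          ``config.GetInstanceDisks(instance)``, in attachment order

      """
      return {
        "instance": instance.ToDict(),
        # The disks travel next to, not inside, the instance; the order
        # is preserved, and noded can rebuild a uuid -> disk mapping
        # from it if needed.
        "disks": [disk.ToDict() for disk in instance_disks],
      }

With something like this, no slot of the instance is ever filled in only
on the noded side, at the cost of touching every RPC call site that
currently relies on the annotated ``disks`` slot.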
> +
> +
> +.. TODO: Locks for Disk objects
> +
> +.. TODO: Attach/Detach disks
> +
> +.. TODO: LUs for disks

I see some more TODOs (not necessarily in priority order):

* Migration/Failover: How are those operations affected by this change?
  An instance with a plain disk attached to it can't be migrated any
  more, and DRBD disks reduce the set of potential migration targets.

* Disk activation: Right now, `gnt-instance activate-disks` activates all
  disks of an instance. How will that change? Individual disks should
  have the `activated` flag, which is right now a flag in the instance.

* How does this play together with disk templates? Should disk templates
  for instances go away altogether?

* DRBD specifics: DRBD disks are right now a tree of disks and the reason
  why disks can have children. IMHO, that was a really bad idea and
  should be removed completely (all the recursive functions dealing with
  children were probably never tested and just litter the code base). I
  think that's the opportunity to fix this and explicitly _not_ have DRBD
  disks with "pointers" to child disks, but let them handle their base
  LVM disks directly (while sharing code, obviously).

* Configuration upgrades and, much harder, downgrades (we have to support
  at least the case of an upgrade and an immediate downgrade, but it
  would be good if we could also support "compatible" cluster operations
  before a downgrade).

I'm aware of the fact that we can't do all of those at once, and that we
have to start from somewhere. I just wanted to point these out so they
are not forgotten in the design of what we're doing first.

Cheers,
Thomas

> +
> +
> +.. vim: set textwidth=72 :
> +.. Local Variables:
> +.. mode: rst
> +.. fill-column: 72
> +.. End:
> diff --git a/doc/index.rst b/doc/index.rst
> index ab2cccf..419d560 100644
> --- a/doc/index.rst
> +++ b/doc/index.rst
> @@ -97,6 +97,7 @@ Draft designs
>     design-cmdlib-unittests.rst
>     design-cpu-pinning.rst
>     design-device-uuid-name.rst
> +   design-disks.rst
>     design-file-based-storage.rst
>     design-hroller.rst
>     design-hotplug.rst
> --
> 1.8.5.3

-- 
Thomas Thrainer | Software Engineer | thoma...@google.com | Google Germany GmbH
Dienerstr. 12
80331 München

Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg
Managing directors: Graham Law, Christine Elizabeth Flores