On Tue, Mar 18, 2014 at 10:39AM, Jose A. Lopes wrote:
> On Feb 28 13:39, Ilias Tsitsimpis wrote:
> > On Fri, Feb 28, 2014 at 09:24AM, Thomas Thrainer wrote:
> > > On Thu, Feb 27, 2014 at 4:35 PM, Ilias Tsitsimpis <ilias...@grnet.gr> wrote:
> > > >
> > > > This patch adds a design document detailing the implementation of
> > > > disks as a new top-level citizen in the config file (just like
> > > > instances, nodes, etc.).
> > > >
> > > > Signed-off-by: Ilias Tsitsimpis <ilias...@grnet.gr>
> > > > ---
> > > >
> > > > Hello team,
> > > >
> > > > As discussed in GanetiCon 2013, we would like to have disks as
> > > > top-level citizens in the config file. This patch adds the design
> > > > document detailing the steps needed to achieve that.
> > > >
> > > > Once we agree on the first step and have the code merged, we can
> > > > extend the design doc to include the next planned features.
> > > >
> > > > Kind Regards,
> > > > Ilias
> > > >
> > > >  doc/design-disks.rst | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  doc/index.rst        |   1 +
> > > >  2 files changed, 182 insertions(+)
> > > >  create mode 100644 doc/design-disks.rst
> > > >
> > > > diff --git a/doc/design-disks.rst b/doc/design-disks.rst
> > > > new file mode 100644
> > > > index 0000000..f9c9f5e
> > > > --- /dev/null
> > > > +++ b/doc/design-disks.rst
> > > > @@ -0,0 +1,181 @@
> > > > +=====
> > > > +Disks
> > > > +=====
> > > > +
> > > > +.. contents:: :depth: 4
> > > > +
> > > > +This is a design document detailing the implementation of disks as a
> > > > +new top-level citizen in the config file (just like instances, nodes,
> > > > +etc.).
> > > > +
> > > > +
> > > > +Current state and shortcomings
> > > > +==============================
> > > > +
> > > > +Currently, disks are stored in Ganeti's config file as a list
> > > > +(container) of Disk objects under the Instance to which they belong.
> > > > +This implementation imposes a number of limitations:
> > > > +
> > > > +* A Disk object cannot live outside an Instance. This means that one
> > > > +  cannot detach a disk from an instance (without destroying the disk)
> > > > +  and then reattach it (to the same or even to a different instance).
> > > > +
> > > > +* Disks are not taggable objects, as only top-level citizens of the
> > > > +  config file can be made taggable. Having taggable disks will allow
> > > > +  for further customizations.
> > > > +
> > > > +
> > > > +Proposed changes
> > > > +================
> > > > +
> > > > +The implementation is going to be split into four parts:
> > > > +
> > > > +* Make disks a top-level citizen in the config file. The Instance
> > > > +  object will no longer contain a list of Disk objects, but a list of
> > > > +  disk UUIDs.
> > > > +
> > > > +* Add locks for Disk objects and make them taggable.
> > > > +
> > > > +* Allow attaching/detaching an existing disk to/from an instance.
> > > > +
> > > > +* Allow creation/modification/deletion of disks that are not attached
> > > > +  to any instance (requires new LUs for disks).
> > > > +
> > > > +
> > > > +Design decisions
> > > > +================
> > > > +
> > > > +Disks as config top-level citizens
> > > > +----------------------------------
> > > > +
> > > > +The first patch series is going to add a new top-level citizen to the
> > > > +config object (namely ``disks``) and separate the disk objects from
> > > > +the instances. In doing so, there are a number of problems that we
> > > > +have to overcome:
> > > > +
> > > > +* How the Disk object will be represented in the config file and how
> > > > +  it is going to be connected with the instance it belongs to.
> > > > +
> > > > +* How the existing code will get the disks belonging to an instance.
> > > > +
> > > > +* What it means for a disk to be attached/detached to/from an
> > > > +  instance.
> > > > +
> > > > +* How disks are going to be created/deleted and attached/detached
> > > > +  using the existing code.
> > > > +
> > > > +
> > > > +Disk representation
> > > > +~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +The ``Disk`` object gets two extra slots, ``_TIMESTAMPS`` and
> > > > +``serial_no``.
> > > > +
> > > > +The ``Instance`` object will no longer contain the list of disk
> > > > +objects that are attached to it. Instead, an Instance object will
> > > > +refer to its disks using their UUIDs. Since the order in which the
> > > > +disks are attached to an instance is important, we are going to have
> > > > +a list of disk UUIDs under the Instance object which will denote the
> > > > +disks attached to the instance and, at the same time, their order. So
> > > > +the Instance's ``disks`` slot is going to be a list of disk UUIDs.
> > > > +The `Disk` object is not going to have a slot pointing back to the
> > > > +`Instance` to which it belongs, since this is redundant.
> > > > +
> > > > +From now on we will say that a disk is attached to an instance if the
> > > > +Instance's `disks` slot contains the disk's UUID, and we will
> > > > +attach/detach a disk by manipulating this slot.
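For illustration only, here is a minimal standalone sketch of the proposed
representation (this is not the actual objects.py code; everything beyond
the ``disks``, ``serial_no`` and timestamp slots named above, including the
ctime/mtime names, is an assumption):

  import time
  import uuid as uuid_lib

  class Disk(object):
    """A top-level disk object, identified by its own UUID."""

    def __init__(self, size):
      self.uuid = str(uuid_lib.uuid4())
      self.size = size
      # The two new slots proposed above: creation/modification
      # timestamps and a serial number bumped on every change.
      self.ctime = self.mtime = time.time()
      self.serial_no = 1

  class Instance(object):
    """An instance refers to its disks by UUID only."""

    def __init__(self, name):
      self.name = name
      # Ordered list of disk UUIDs; the list order doubles as the
      # order in which the disks are attached to the instance.
      self.disks = []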
> > > > +
> > > > +
> > > > +Get instance's disks
> > > > +~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +A new function ``GetInstanceDisks`` will be added to the config
> > > > +which, given an instance, will return the list of Disk objects
> > > > +attached to it. This list will be exactly the same as
> > > > +'instance.disks' was before. Because everywhere in the code base we
> > > > +make the assumption that the disk objects are accessible through the
> > > > +instance object, and because we would like not to have to ask the
> > > > +config file every time we want the disks of an instance, we are
> > > > +going to annotate the Instance object with its disks as soon as we
> > > > +read it from the config. This means that the ``Instance`` object
> > > > +will have a new slot named ``disks_info`` which at run time will be
> > > > +identical to the current ``disks``. This new slot will be filled
> > > > +with the Disk objects when we read the Instance object from the
> > > > +config file (``GetInstanceInfo``). When the ``Instance`` object is
> > > > +written back to the config file, the ``ToDict`` method will extract
> > > > +the disk UUIDs from the `disks_info` slot, save them to the
> > > > +``disks`` slot and discard the `disks_info` one.
> > >
> > > This sounds like a premature optimization. Are there any good reasons
> > > to not just call config.GetInstanceDisks(instance) wherever
> > > instance.disks was used before? If this proves to be a performance
> > > bottleneck, which I strongly doubt, then we can still look for an
> > > optimization.
> > > Note that we already do it this way every time we access the nodes of
> > > an instance.
> >
> > It doesn't seem to be a performance bottleneck and it will work. The
> > only downside of this approach is that it breaks the unification of
> > 'ApplyContainerMods'. Since disk objects will not be part of the
> > instance, 'ApplyContainerMods' will not be able to attach/detach them,
> > so we have to introduce two new config functions (AttachInstDisk and
> > DetachInstDisk). If you are OK with that, we can proceed as you
> > suggest.
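For reference, the two helpers could be as small as the following sketch,
written against a simplified stand-in for the config object rather than
the real ConfigWriter (signatures are illustrative, not final):

  class SimpleConfig(object):
    """Simplified stand-in for the config: maps UUIDs to objects."""

    def __init__(self):
      self.instances = {}  # instance UUID -> Instance object
      self.disks = {}      # disk UUID -> Disk object

    def AttachInstDisk(self, inst_uuid, disk_uuid, idx=None):
      """Attach an existing disk by inserting its UUID into the
      instance's ordered 'disks' slot."""
      inst = self.instances[inst_uuid]
      assert disk_uuid in self.disks, "dangling disk pointer"
      assert disk_uuid not in inst.disks, "disk already attached"
      if idx is None:
        inst.disks.append(disk_uuid)
      else:
        inst.disks.insert(idx, disk_uuid)

    def DetachInstDisk(self, inst_uuid, disk_uuid):
      """Detach a disk without destroying it; the Disk object stays
      in the config and can be re-attached later."""
      self.instances[inst_uuid].disks.remove(disk_uuid)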
>
> I agree with Thomas. It doesn't make sense to have two slots to
> represent the same thing; it will only create problems in keeping the
> two slots in a consistent state, and in making sure serialization is
> correct. If you want to make it easy to get the disk objects out of
> the disk UUIDs, just add some helper functions to the config or the
> instance class.

ACK. I will implement it this way and update the design doc accordingly.
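For example, the helper could be as simple as the following sketch (the
real function would live on the config object and take locking into
account; the dict-shaped 'config.disks' here is only for illustration):

  def GetInstanceDisks(config, instance):
    """Map the instance's ordered disk UUIDs to Disk objects.

    Returns exactly what 'instance.disks' used to contain, so callers
    can switch from 'instance.disks' to this call with no other
    changes. 'config.disks' is assumed to map disk UUIDs to objects.
    """
    return [config.disks[disk_uuid] for disk_uuid in instance.disks]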
Thanks,
Ilias

> > > > +
> > > > +Everywhere in the code we are going to replace 'instance.disks'
> > > > +(which from now on will contain a list of disk UUIDs) with
> > > > +'instance.disks_info'. Other than that, there should be no other
> > > > +changes (the `Instance` object is going to be treated as before,
> > > > +since it will contain the list of its `Disk` objects).
> > > > +
> > > > +
> > > > +Configuration changes
> > > > +~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +The ``ConfigData`` object gets one extra slot: ``disks``. Also,
> > > > +there will be two new functions, ``AddDisk`` and ``RemoveDisk``,
> > > > +which will create/remove a disk object from the config.
> > > > +
> > > > +The ``VerifyConfig`` function will be changed so it can check that
> > > > +there are no dangling pointers from instances to disks (i.e. an
> > > > +instance pointing to a disk that doesn't exist in the config).
> > >
> > > Note that one disk should only ever be attached to either zero or one
> > > instance. So we should test for this as well.
> >
> > ACK.
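Both checks together (no dangling pointers, each disk attached to at most
one instance) could be sketched roughly as follows (illustrative only; the
real check would be part of VerifyConfig and report errors the way the
rest of that function does):

  def _VerifyDiskPointers(instances, disks):
    """Return a list of error messages; an empty list means consistent.

    'instances' maps instance names to objects whose 'disks' slot is
    the ordered list of disk UUIDs; 'disks' maps disk UUIDs to Disk
    objects.
    """
    errors = []
    owner = {}  # disk UUID -> name of the instance referencing it
    for inst in instances.values():
      for disk_uuid in inst.disks:
        if disk_uuid not in disks:
          errors.append("instance %s points to missing disk %s" %
                        (inst.name, disk_uuid))
        if disk_uuid in owner:
          errors.append("disk %s attached to both %s and %s" %
                        (disk_uuid, owner[disk_uuid], inst.name))
        else:
          owner[disk_uuid] = inst.name
    return errors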
> > > > +
> > > > +
> > > > +Apply Disk modifications
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Disk modifications are done through the ``ApplyContainerMods``
> > > > +function. ``ApplyContainerMods`` supports creation, deletion and
> > > > +modification of a disk. We will continue to use the existing code
> > > > +(``ApplyContainerMods``) for the disks as follows:
> > > > +
> > > > +**Create a new disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_CreateNewDisks``, which creates
> > > > +  the disk and adds it to the config file.
> > > > +
> > > > +* ``ApplyContainerMods`` itself adds the newly created Disk object
> > > > +  to the Instance's `disks_info` slot with simple container
> > > > +  modifications (i.e. attaches the disk to the instance).
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_PostAddDisk``, which takes care
> > > > +  of hotplugging.
> > > > +
> > > > +**Modify a disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_ModifyDisk``, which modifies the
> > > > +  disk and calls the config's ``Update`` function to save the Disk
> > > > +  object back.
> > > > +
> > > > +**Delete a disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_PreRemoveDisk``, which takes care
> > > > +  of hotplugging. This function doesn't exist yet and has to be
> > > > +  implemented.
> > > > +
> > > > +* ``ApplyContainerMods`` itself removes the disk object from the
> > > > +  Instance's `disks_info` slot (i.e. detaches the disk from the
> > > > +  instance).
> > > > +
> > > > +* ``ApplyContainerMods`` deletes the disk and removes it from the
> > > > +  config file by invoking ``RemoveDisk``.
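To make the three flows concrete, here is a rough, self-contained sketch
using plain dicts instead of the real objects. The helper names are the
ones mentioned above, but their bodies here are stand-ins, and the real
ApplyContainerMods is generic over NICs and disks and considerably more
involved:

  def ApplyDiskMods(config, instance, mods):
    """Apply a list of (action, index, params) disk modifications.

    'config' is {"disks": {uuid: disk_dict}}; 'instance' is a dict
    whose "disks" entry is the ordered list of attached disk UUIDs.
    """
    for action, idx, params in mods:
      if action == "add":
        # _CreateNewDisks: create the disk and add it to the config.
        disk_uuid = "disk-%d" % (len(config["disks"]) + 1)
        config["disks"][disk_uuid] = {"uuid": disk_uuid,
                                      "size": params["size"]}
        # Attach: a plain container modification on the instance;
        # _PostAddDisk would handle hotplugging at this point.
        instance["disks"].insert(idx, disk_uuid)
      elif action == "modify":
        # _ModifyDisk + config.Update: change and persist the object.
        config["disks"][instance["disks"][idx]].update(params)
      elif action == "remove":
        # _PreRemoveDisk would handle hot-unplugging here.
        disk_uuid = instance["disks"].pop(idx)   # detach
        del config["disks"][disk_uuid]           # RemoveDisk

  # Minimal usage example:
  cfg = {"disks": {}}
  inst = {"disks": []}
  ApplyDiskMods(cfg, inst, [("add", 0, {"size": 1024})])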
> > > > +
> > > > +
> > > > +Backend changes
> > > > +~~~~~~~~~~~~~~~
> > > > +
> > > > +During the RPC, the instance will get annotated with its disks. So
> > > > +in the backend we will only have to replace the ``disks`` slot with
> > > > +``disks_info``.
> > >
> > > I don't really like the concept of changing/annotating objects (no
> > > matter if it's nodes, instances, disks, etc.) before sending them
> > > over RPC. It's rather confusing, because some slots of the object are
> > > only filled when the code is executed as part of an RPC by noded. And
> > > this also hinders code reuse a bit, because there can be "config"
> > > objects and "sent-over-RPC" objects which are of the same type but
> > > are not quite the same.
> > >
> > > Personally, I would prefer a solution where the instance object is
> > > sent as-is over RPC (so only containing a disk list with UUIDs) and
> > > an additional parameter (disks? instance_disks?) is introduced which
> > > is essentially a dict of UUID -> disk object (or simply the list
> > > returned by config.GetInstanceDisks(instance)). This will require
> > > some refactoring in the short term, but certainly creates cleaner
> > > code in the long term.
> >
> > This sounds like a big change that should probably be independent of
> > the Disks changes. Furthermore, changing/annotating objects before
> > sending them over RPC seems to be standard practice (see Instance's
> > hvparams, NIC's netinfo, etc.).
> >
> > I think we should discuss the way annotations and RPCs work in a
> > different context/thread.
> >
> > > > +
> > > > +
> > > > +.. TODO: Locks for Disk objects
> > > > +
> > > > +.. TODO: Attach/Detach disks
> > > > +
> > > > +.. TODO: LUs for disks
> > >
> > > I see some more TODOs (not necessarily in priority order):
> > >
> > > * Migration/Failover: How are those operations affected by this
> > >   change? An instance with a plain disk attached to it can't be
> > >   migrated any more, and DRBD disks reduce the set of potential
> > >   migration targets.
> >
> > This design doc doesn't target having disks of different templates on
> > the same instance. It sets the ground for such features, but these
> > will come later. So migration/failover code stays as-is for now; there
> > is no need for changes there.
> >
> > > * Disk activation: Right now, `gnt-instance activate-disks` activates
> > >   all disks of an instance. How will that change? Individual disks
> > >   should have the `activated` flag, which is right now a flag on the
> > >   instance.
> >
> > This definitely needs more discussion. Is it possible to have only the
> > first and the third disk of an instance activated? How will that
> > affect the way they appear inside the machine (vda0, vda1, etc.)? I
> > believe we can have this discussion after we have locks and LUs for
> > disks.
> >
> > Again, as a first step this stays as-is. All disks of the instance
> > will get activated.
> >
> > > * How does this play together with disk templates? Should disk
> > >   templates for instances go away altogether?
> >
> > See above about multi-template disks.
> >
> > AFAIK, it is already on the roadmap to eliminate disk templates at the
> > instance level. Guido and Helga know more about this. In the end it
> > will be necessary to be able to have disks of different disk
> > templates.
> >
> > > * DRBD specifics: DRBD disks are right now a tree of disks and the
> > >   reason why disks can have children. IMHO, that was a really bad
> > >   idea and should be removed completely (all the recursive functions
> > >   dealing with children were probably never tested and just litter
> > >   the code base). I think this is the opportunity to fix it and
> > >   explicitly _not_ have DRBD disks with "pointers" to child disks,
> > >   but let them handle their base LVM disks directly (while sharing
> > >   code, obviously).
> >
> > Again, this seems like a big architectural change that can go in a
> > different design doc.
> >
> > > * Configuration upgrades, and much harder, downgrades (we have to
> > >   support at least the case of an upgrade and an immediate downgrade,
> > >   but it would be good if we could as well support "compatible"
> > >   cluster operations before a downgrade).
> >
> > ACK. The first patch set will modify cfgupgrade to support
> > configuration upgrades/downgrades. Sorry that I didn't mention
> > cfgupgrade; I'll add it to the design doc.
> >
> > > I'm aware of the fact that we can't do all of those at once, and that
> > > we have to start from somewhere. I just wanted to point those out so
> > > they are not forgotten in the design of what we're doing first.
> >
> > As you say, we can't do all of those at once. At the moment we
> > introduce only the change that makes disks top-level citizens in the
> > config file (following Guido's plan as stated at GanetiCon). As we
> > proceed with disk locks and LUs, more modifications will emerge (thus
> > the TODO sections).
> >
> > Kind Regards,
> > Ilias
> >
> > > Cheers,
> > > Thomas
> > >
> > > > +
> > > > +
> > > > +.. vim: set textwidth=72 :
> > > > +.. Local Variables:
> > > > +.. mode: rst
> > > > +.. fill-column: 72
> > > > +.. End:
> > > > diff --git a/doc/index.rst b/doc/index.rst
> > > > index ab2cccf..419d560 100644
> > > > --- a/doc/index.rst
> > > > +++ b/doc/index.rst
> > > > @@ -97,6 +97,7 @@ Draft designs
> > > >    design-cmdlib-unittests.rst
> > > >    design-cpu-pinning.rst
> > > >    design-device-uuid-name.rst
> > > > +  design-disks.rst
> > > >    design-file-based-storage.rst
> > > >    design-hroller.rst
> > > >    design-hotplug.rst
> > > > --
> > > > 1.8.5.3
> > >
> > > --
> > > Thomas Thrainer | Software Engineer | thoma...@google.com |
> > >
> > > Google Germany GmbH
> > > Dienerstr. 12
> > > 80331 München
> > >
> > > Registergericht und -nummer: Hamburg, HRB 86891
> > > Sitz der Gesellschaft: Hamburg
> > > Geschäftsführer: Graham Law, Christine Elizabeth Flores
>
> --
> Jose Antonio Lopes
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370