On Tue, Mar 18, 2014 at 10:39AM, Jose A. Lopes wrote:
> On Feb 28 13:39, Ilias Tsitsimpis wrote:
> > On Fri, Feb 28, 2014 at 09:24AM, Thomas Thrainer wrote:
> > > On Thu, Feb 27, 2014 at 4:35 PM, Ilias Tsitsimpis <ilias...@grnet.gr> 
> > > wrote:
> > > 
> > > > This patch adds a design document detailing the implementation of disks
> > > > as a new top-level citizen in the config file (just like instances,
> > > > nodes etc).
> > > >
> > > > Signed-off-by: Ilias Tsitsimpis <ilias...@grnet.gr>
> > > > ---
> > > >
> > > > Hello team,
> > > >
> > > > As discussed in GanetiCon 2013, we would like to have disks as top-level
> > > > citizens in the config file. This patch adds the design document
> > > > detailing the steps needed to achieve that.
> > > >
> > > > Once we agree on the first step and have the code merged we can extend
> > > > the design doc to include the next planned features.
> > > >
> > > > Kind Regards,
> > > > Ilias
> > > >
> > > >
> > > >  doc/design-disks.rst | 181
> > > > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  doc/index.rst        |   1 +
> > > >  2 files changed, 182 insertions(+)
> > > >  create mode 100644 doc/design-disks.rst
> > > >
> > > > diff --git a/doc/design-disks.rst b/doc/design-disks.rst
> > > > new file mode 100644
> > > > index 0000000..f9c9f5e
> > > > --- /dev/null
> > > > +++ b/doc/design-disks.rst
> > > > @@ -0,0 +1,181 @@
> > > > +=====
> > > > +Disks
> > > > +=====
> > > > +
> > > > +.. contents:: :depth: 4
> > > > +
> > > > +This is a design document detailing the implementation of disks as a 
> > > > new
> > > > +top-level citizen in the config file (just like instances, nodes etc).
> > > > +
> > > > +
> > > > +Current state and shortcomings
> > > > +==============================
> > > > +
> > > > +Currently, Disks are stored in Ganeti's config file as a list
> > > > +(container) of Disk objects under the Instance in which they belong.
> > > > +This implementation imposes a number of limitations:
> > > > +
> > > > +* A Disk object cannot live outside an Instance. This means that one
> > > > +  cannot detach a disk from an instance (without destroying the disk)
> > > > +  and then reattach it (to the same or even to a different instance).
> > > > +
> > > > +* Disks are not taggable objects, as only top-level citizens of the
> > > > +  config file can be made taggable. Having taggable disks will allow 
> > > > for
> > > > +  further customizations.
> > > > +
> > > > +
> > > > +Proposed changes
> > > > +================
> > > > +
> > > > +The implementation is going to be split in four parts:
> > > > +
> > > > +* Make disks a top-level citizen in config file. The Instance object
> > > > +  will no longer contain a list of Disk objects, but a list of disk
> > > > +  UUIDs.
> > > > +
> > > > +* Add locks for Disk objects and make them taggable.
> > > > +
> > > > +* Allow to attach/detach an existing disk to/from an instance.
> > > > +
> > > > +* Allow creation/modification/deletion of disks that are not attached 
> > > > to
> > > > +  any instance (requires new LUs for disks).
> > > > +
> > > > +
> > > > +Design decisions
> > > > +================
> > > > +
> > > > +Disks as config top-level citizens
> > > > +----------------------------------
> > > > +
> > > > +The first patch-series is going to add a new top-level citizen in the
> > > > +config object (namely ``disks``) and separate the disk objects from the
> > > > +instances. In doing so there are a number of problems that we have to
> > > > +overcome:
> > > > +
> > > > +* How the Disk object will be represented in the config file and how it
> > > > +  is going to be connected with the instance it belongs to.
> > > > +
> > > > +* How the existing code will get the disks belonging to an instance.
> > > > +
> > > > +* What it means for a disk to be attached/detached to/from an instance.
> > > > +
> > > > +* How disks are going to be created/deleted, attached/detached using
> > > > +  the existing code.
> > > > +
> > > > +
> > > > +Disk representation
> > > > +~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +The ``Disk`` object gets two extra slots, ``_TIMESTAMPS`` and
> > > > +``serial_no``.
> > > > +
> > > > +The ``Instance`` object will no longer contain the list of disk objects
> > > > +that are attached to it. Instead, an Instance object will refer to its
> > > > +disks using their UUIDs. Since the order in which the disks are 
> > > > attached
> > > > +to an instance is important we are going to have a list of disk UUIDs
> > > > +under the Instance object which will denote the disks attached to the
> > > > +instance and their order at the same time. So the Instance's ``disks``
> > > > +slot is going to be a list of disk UUIDs. The `Disk` object is not 
> > > > going
> > > > +to have a slot pointing to the `Instance` in which it belongs since 
> > > > this
> > > > +is redundant.
> > > > +
> > > > +From now on we will say that a disk is attached to an instance if the
> > > > +Instance's `disks` slot contains the disk's uuid and we will
> > > > +attach/detach a disk by manipulating this slot.
> > > > +
> > > > +
> > > > +Get instance's disks
> > > > +~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +A new function ``GetInstanceDisks`` will be added to the config that
> > > > +given an instance will return a list of Disk objects with the disks
> > > > +attached to this instance. This list will be exactly the same as
> > > > +'instance.disks' was before. Because everywhere in the code base we 
> > > > make
> > > > +the assumption that the disk objects are accessible through the 
> > > > instance
> > > > +object, and because we would like not to have to ask the config file
> > > > +every time we want to get the disks of an instance, we are going to
> > > > +annotate the Instance object with it's disks as soon as we read it from
> > > > +the config. This means that ``Instance`` object will have a new slot
> > > > +named ``disks_info`` that at run-time will be identical to the current
> > > > +``disks``. This new slot will be filled with the disks objects when we
> > > > +read the Instance object from the config file (``GetInstanceInfo``).
> > > > +When the ``Instance`` object is going to be written back to the config
> > > > +file, the ``ToDict`` property will extract the disk UUIDs from the
> > > > +`disks_info` slot, save them to the ``disks`` slot and discard the
> > > > +`disks_info` one.
> > > >
> > > 
> > > This sounds like a premature optimization. Are there any good reasons to
> > > not just call config.GetInstanceDisks(instance) wherever instance.disks 
> > > was
> > > before? If this proves to be a performance bottleneck, which I strongly
> > > doubt, then we can still look for an optimization.
> > > Note that we already do that this way every time we access the nodes of an
> > > instance.
> > > 
> > > 
> > 
> > It doesn't seem to be a performance bottleneck and it will work. The
> > only downside of this approach is that it breaks the unification of
> > 'ApplyContainerMods'. Since disk objects will not be part of the
> > instance, 'ApplyContainerMods' will not be able to attach/detach them so
> > we have to introduce two new functions (AttachInstDisk and
> > DetachInstDisk) to config. If you are ok with that, we can proceed as
> > you say.
> 
> I agree with Thomas.  It doesn't make sense to have two slots to
> represent the same thing and it will only create problems in making sure
> the two slots have a consistent state.  And also making sure
> serialization is correct.  If you want to make it easy to get the disk
> objects out of the disk UUIDs, just add some helper functions to the
> config or the instance class.
> 

ACK. I will implement it this way and update the design doc accordingly.


Thanks,
Ilias

> > > > +
> > > > +Everywhere in the code we are going to replace the 'instance.disks'
> > > > +(which from now one will contain a list of disk UUIDs) with the
> > > > +'instance.disks_info'. Other than that there should be no other changes
> > > > +(the `Instance` object is going to be treated as before since it will
> > > > +contain the list of its `Disk` objects).
> > > > +
> > > > +
> > > > +Configuration changes
> > > > +~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +The ``ConfigData`` object gets one extra slot: ``disks``.  Also there
> > > > +will be two new functions, ``AddDisk`` and ``RemoveDisk`` that will
> > > > +create/remove a disk objects from the config.
> > > > +
> > > > +The ``VerifyConfig`` function will be changed so it can check that 
> > > > there
> > > > +are no dangling pointers from instances to disks (i.e. an instance
> > > > +points to a disk that doesn't exist in the config).
> > > >
> > > 
> > > Note that one disk should only ever be attached to either zero or one
> > > instance. So we should test for this as well.
> > > 
> > > 
> > 
> > ACK.
> > 
> > > > +
> > > > +
> > > > +Apply Disk modifications
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Disk modifications are done through the ``ApplyContainerMods`` 
> > > > function.
> > > > +``ApplyContainerMods`` support creating, deletion and modification of a
> > > > +disk. We will continue to use the existing code 
> > > > (``ApplyContainerMods``)
> > > > +for the disks as follows:
> > > > +
> > > > +**Create a new disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_CreateNewDisks`` which creates the
> > > > +  disk and adds it to the config file.
> > > > +
> > > > +* ``ApplyContainerMods`` itself adds the newly created Disk object to
> > > > +  Instance's `disks_info` slot with simple container modifications 
> > > > (i.e.
> > > > +  attaches the disk to the instance).
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_PostAddDisk`` takes care of 
> > > > hotplugging.
> > > > +
> > > > +**Modify a disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_ModifyDisk`` which modifies the disk
> > > > +  and calls config's ``Update`` function to save back the Disk object.
> > > > +
> > > > +**Delete a disk:**
> > > > +
> > > > +* ``ApplyContainerMods`` calls ``_PreRemoveDisk`` which takes care of
> > > > +  hotplugging. This function doesn't exist and has to be implemented.
> > > > +
> > > > +* ``ApplyContainerMods`` itself removes the disk object from the
> > > > +  Instance's `disk_info` slot (i.e. detaches the disk from the
> > > > +  instance).
> > > > +
> > > > +* ``ApplyContainerMods`` deletes the disk and removes it from the 
> > > > config
> > > > +  file by invoking ``RemoveDisk``.
> > > > +
> > > > +
> > > > +Backend changes
> > > > +~~~~~~~~~~~~~~~
> > > > +
> > > > +During the RPC, the instance will get annotated with its disk. So in 
> > > > the
> > > > +backend we will only have to replace ``disks`` slot with 
> > > > ``disks_info``.
> > > >
> > > 
> > > I don't really like the concept of changing/annotating objects (no matter
> > > if its nodes, instances, disks, etc.) before sending them over RPC. It's
> > > rather confusing, because some slots of the object are only filled when
> > > executed as part of a RPC by noded. And this also hinders code reuse a 
> > > bit,
> > > because there can be "config" objects and "sent-over-RPC" objects which 
> > > are
> > > of the same type but are not quite the same.
> > > 
> > > Personally, I would prefer a solution where the instance object is sent
> > > as-is over RPC (so only containing a disk-list with UUIDs) and an
> > > additional parameter (disks? instance_disks?) is introduced which is
> > > essentially a dict of UUID -> disk object (or simply the list returned by
> > > config.GetInstanceDisks(instance)). This will require some refactoring in
> > > the short term, but certainly creates cleaner code in the long term.
> > > 
> > > 
> > 
> > This sounds like a big change, that should probably be independent to
> > the Disks changes. Furthermore, changing/annotating objects before
> > sending them over RPC seems a standard practice (see Instance's
> > hvparams, NIC's netinfo, etc).
> > 
> > I think we should discuss the way annotations and RPCs work in a
> > different context/thread.
> > 
> > > > +
> > > > +
> > > > +.. TODO: Locks for Disk objects
> > > > +
> > > > +.. TODO: Attach/Dettach disks
> > > > +
> > > > +.. TODO: LUs for disks
> > > >
> > > 
> > > I see some more TODOs (not necessarily in priority order):
> > >  * Migration/Failover: How are those operations affected by this change? 
> > > An
> > > instance with a plain disk attached to it can't be migrated any more, DRBD
> > > disks reduce the set of potential migration targets.
> > 
> > This design doc doesn't target having disks of different templates on
> > the same instance. It sets the ground for such features, but these will
> > come later. So, migration/failover code stays as is for now. There is no
> > need for changes there.
> > 
> > >  * Disk activation: Right now, `gnt-instance activate-disks` activates all
> > > disks of an instance. How will that change? Individual disks should have
> > > the `activated` flag, which is right now a flag in the instance.
> > 
> > This definitely needs more discussion. Is it possible to have only the
> > first and the third disk of an instance activated? How will that affect
> > they way they appear inside the machine (vda0, vda1, etc). I believe we
> > can make this discussion after we will have locks and LUs for Disks.
> > 
> > Again, as a first step this stays as is. All disks of the instance will
> > get activated.
> > 
> > >  * How does this play together with disk templates? Should disk templates
> > > for instances go away altogether?
> > 
> > See above about multi-template disks.
> > 
> > AFAIK, it is already in the roadmap to eliminate disk templates from the
> > instance level. Guido and Helga know more about this. In the end it will
> > be necessary to be able to have disks of different disk templates.
> > 
> > >  * DRBD specifics: DRBD disks are right now a tree of disks and the reason
> > > why disks can have children. IMHO, that was a really bad idea and should 
> > > be
> > > removed completely (all the recursive function dealing with children were
> > > probably never tested and just litter the code base). I think that's the
> > > opportunity to fix this and explicitly _not_ have DRBD disks with
> > > "pointers" to child disks, but let them handle their base LVM disks
> > > directly (while sharing code, obviously).
> > 
> > Again, this seems a big architectural change that can go in a different
> > design doc.
> > 
> > >  * Configuration upgrades, and much harder, downgrades (we have to support
> > > at least the case of an upgrade and an immediate downgrade, but it would 
> > > be
> > > good if we could as well support "compatible" cluster operations before a
> > > downgrade).
> > 
> > ACK. The first patch-set will modify cfgupgrade to support configuration
> > upgrades/downgrades. Sorry that I didn't mention about cfgupgrade, I'll
> > add it in the design doc.
> > 
> > > 
> > > I'm aware of the fact that we can't do all of those at once, and that we
> > > have to start from somewhere. I just wanted to point those points out so
> > > they are not forgotten in the design of what we're doing first.
> > > 
> > 
> > As you say we can't do all of those at once. At the moment we intoduce
> > only the change that disks should be top-level citizens at config file
> > (following Guido's plan as stated at GanetiCon). As we proceed with disk
> > locks and LUs more modifications should emerge (thus the TODO sections).
> > 
> > Kind Regards,
> > Ilias
> > 
> > > Cheers,
> > > Thomas
> > > 
> > > +
> > > > +
> > > > +.. vim: set textwidth=72 :
> > > > +.. Local Variables:
> > > > +.. mode: rst
> > > > +.. fill-column: 72
> > > > +.. End:
> > > > diff --git a/doc/index.rst b/doc/index.rst
> > > > index ab2cccf..419d560 100644
> > > > --- a/doc/index.rst
> > > > +++ b/doc/index.rst
> > > > @@ -97,6 +97,7 @@ Draft designs
> > > >     design-cmdlib-unittests.rst
> > > >     design-cpu-pinning.rst
> > > >     design-device-uuid-name.rst
> > > > +   design-disks.rst
> > > >     design-file-based-storage.rst
> > > >     design-hroller.rst
> > > >     design-hotplug.rst
> > > > --
> > > > 1.8.5.3
> > > >
> > > 
> > > 
> > > 
> > > -- 
> > > Thomas Thrainer | Software Engineer | thoma...@google.com |
> > > 
> > > Google Germany GmbH
> > > Dienerstr. 12
> > > 80331 München
> > > 
> > > Registergericht und -nummer: Hamburg, HRB 86891
> > > Sitz der Gesellschaft: Hamburg
> > > Geschäftsführer: Graham Law, Christine Elizabeth Flores
> 
> 
> 
> -- 
> Jose Antonio Lopes
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
> 
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370

Attachment: signature.asc
Description: Digital signature

Reply via email to