LGTM thanks, I'll push the patch.
On Fri, May 23, 2014 at 2:35 PM, Dimitris Bliablias <[email protected]> wrote:

> This patch adds a design document detailing the implementation of a
> generic mechanism which will provide support for converting between
> different disk templates in Ganeti.
>
> Signed-off-by: Dimitris Bliablias <[email protected]>
> Signed-off-by: Constantinos Venetsanopoulos <[email protected]>
> ---
>
>  Makefile.am                    |    1 +
>  doc/design-disk-conversion.rst |  281 ++++++++++++++++++++++++++++++++++++++++
>  doc/design-draft.rst           |    1 +
>  3 files changed, 283 insertions(+)
>  create mode 100644 doc/design-disk-conversion.rst
>
> ---
> 1.7.10.4
>
> diff --git a/Makefile.am b/Makefile.am
> index 40f0a1a..996c8bd 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -579,6 +579,7 @@ docinput = \
>         doc/design-cpu-speed.rst \
>         doc/design-daemons.rst \
>         doc/design-device-uuid-name.rst \
> +       doc/design-disk-conversion.rst \
>         doc/design-draft.rst \
>         doc/design-file-based-storage.rst \
>         doc/design-glusterfs-ganeti-support.rst \
> diff --git a/doc/design-disk-conversion.rst b/doc/design-disk-conversion.rst
> new file mode 100644
> index 0000000..47f94a8
> --- /dev/null
> +++ b/doc/design-disk-conversion.rst
> @@ -0,0 +1,281 @@
> +=================================
> +Conversion between disk templates
> +=================================
> +
> +.. contents:: :depth: 4
> +
> +This design document describes the support for generic disk template
> +conversion in Ganeti. The logic used is disk template agnostic and
> +targets to cover the majority of conversions among the supported disk
> +templates.
> +
> +
> +Current state and shortcomings
> +==============================
> +
> +Currently, Ganeti supports choosing among different disk templates when
> +creating an instance. However, converting the disk template of an
> +existing instance is possible only between the ``plain`` and ``drbd``
> +templates. This feature was added in Ganeti since its early versions
> +when the number of supported disk templates was limited. Now that Ganeti
> +supports plenty of choices, this feature should be extended to provide
> +more flexibility to the user.
> +
> +The procedure for converting from the plain to the drbd disk template
> +works as follows. Firstly, a completely new disk template is generated
> +matching the size, mode, and the count of the current instance's disks.
> +The missing volumes are created manually both in the primary (meta disk)
> +and the secondary node. The original LVs running on the primary node are
> +renamed to match the new names. The last step is to manually associate
> +the DRBD devices with their mirror block device pairs. The conversion
> +from the drbd to the plain disk template is much simpler than the
> +opposite. Firstly, the DRBD mirroring is manually disabled. Then the
> +unnecessary volumes including the meta disk(s) of the primary node, and
> +the meta and data disk(s) from the previously secondary node are
> +removed.
> +
> +
> +Proposed changes
> +================
> +
> +This design proposes the creation of a unified interface for handling
> +the disk template conversions in Ganeti. Currently, there is no such
> +interface and each one of the supported conversions uses a separate code
> +path.
> +
> +This proposal introduces a single, disk-agnostic interface for handling
> +the disk template conversions in Ganeti, keeping in mind that we want it
> +to be as generic as possible. An exception case will be the currently
> +supported conversions between the LVM-based disk templates. Their basic
> +functionality will not be affected and will diverge from the rest disk
> +template conversions. The target is to provide support for conversions
> +among the majority of the available disk templates, and also creating
> +a mechanism that will easily support any new templates that may be
> +probably added in Ganeti, at a future point.
> +
> +
> +Design decisions
> +================
> +
> +Currently, the supported conversions for the LVM-based templates are
> +handled by the ``LUInstanceSetParams`` LU. Our implementation will
> +follow the same approach. From a high-level point-of-view this design
> +can be split in two parts:
> +
> +* The extension of the LU's checks to cover all the supported template
> +  conversions
> +
> +* The new functionality which will be introduced to provide the new
> +  feature
> +
> +The instance must be stopped before starting the disk template
> +conversion, as it currently is, otherwise the operation will fail. The
> +new mechanism will need to copy the disk's data for the conversion to be
> +possible. We propose using the Unix ``dd`` command to copy the
> +instance's data. It can be used to copy data from source to destination,
> +block-by-block, regardless of their filesystem types, making it a
> +convenient tool for the case. Since the conversion will be done via data
> +copy it will take a long time for bigger disks to copy their data and
> +consequently for the instance to switch to the new template.
> +
> +Some template conversions can be done faster without copying explicitly
> +their disks' data. A use case is the conversions between the LVM-based
> +templates, i.e., ``drbd`` and ``plain`` which will be done as happens
> +now and not using the ``dd`` command. Also, this implementation will
> +provide partial support for the ``blockdev`` disk template which will
> +act only as a source template. Since those volumes are adopted
> +pre-existent block devices we will not support conversions targeting
> +this template. Another exception case will be the ``diskless`` template.
> +Since it is a testing template that creates instances with no disks we
> +will not provide support for conversions that include this template
> +type.
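For illustration only, the exceptions described above (``blockdev`` only as a source, ``diskless`` never involved) could be expressed as a small check like the one below. This is a minimal sketch: ``ALL_TEMPLATES``, ``SOURCE_ONLY``, ``UNSUPPORTED`` and ``can_convert`` are hypothetical names, not taken from the patch or from the Ganeti code base, and the real checks would live in ``LUInstanceSetParams``.

```python
# Hypothetical sketch; none of these names come from the patch or from Ganeti.
ALL_TEMPLATES = frozenset([
    "plain", "drbd", "file", "sharedfile", "gluster", "rbd", "ext",
    "blockdev", "diskless",
])
SOURCE_ONLY = frozenset(["blockdev"])  # adopted devices: never a target
UNSUPPORTED = frozenset(["diskless"])  # no disks: neither source nor target


def can_convert(src, dst):
    """Return True if converting from template src to template dst is allowed."""
    if src not in ALL_TEMPLATES or dst not in ALL_TEMPLATES:
        raise ValueError("unknown disk template: %s -> %s" % (src, dst))
    if src == dst:
        # Converting a template to itself is not a conversion.
        return False
    if src in UNSUPPORTED or dst in UNSUPPORTED:
        return False
    return dst not in SOURCE_ONLY
```

With these assumptions, ``can_convert("blockdev", "plain")`` is allowed while ``can_convert("plain", "blockdev")`` is not.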
> +
> +
> +We divide the design into the following parts:
> +
> +* Block device changes, that include the new methods which will be
> +  introduced and will be responsible for building the commands for the
> +  data copy from/to the requested devices
> +
> +* Backend changes, that include a new RPC call which will concatenate
> +  the output of the above two methods and will execute the data copy
> +  command
> +
> +* Core changes, that include the modifications in the Logical Unit
> +
> +* User interface changes, i.e., command line changes
> +
> +
> +Block device changes
> +--------------------
> +
> +The block device abstract class will be extended with two new methods,
> +named ``Import`` and ``Export``. Those methods will be responsible for
> +building the commands that will be used for the data copy between the
> +corresponding devices. The ``Export`` method will build the command
> +which will export the data from the source device, while the ``Import``
> +method will do the opposite. It will import the data to the newly
> +created target device. Those two methods will not perform the actual
> +data copy; they will simply return the requested commands for
> +transferring the data from/to the individual devices. The output of the
> +two methods will be combined using a pipe ("|") by the caller method in
> +the backend level.
> +
> +By default the data import and export will be done using the ``dd``
> +command. All the inherited classes will use the base functionality
> +unless there is a faster way to convert to. In that case the underlying
> +block device will overwrite those methods with its specific
> +functionality. A use case will be the Ceph/RADOS block devices which
> +will make use of the ``rbd import`` and ``rbd export`` commands to copy
> +their data instead of using the default ``dd`` command.
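To make the ``Import``/``Export`` split concrete, here is a rough sketch of how the command building and the pipe could fit together. Everything below is an assumption for illustration: the class layout, the ``dev_path``, ``pool`` and ``name`` attributes, the ``bs=1M`` option, the ``-`` (stdin/stdout) form of the ``rbd`` arguments and the ``build_copy_command`` helper are not taken from the patch, and the exact ``rbd`` invocation should be checked against the Ceph documentation.

```python
import shlex


# Hypothetical sketch; class layout, attributes and options are assumptions.
class BlockDev(object):
    def __init__(self, dev_path):
        self.dev_path = dev_path

    def Export(self):
        """Build (not run) the command that writes the device data to stdout."""
        return ["dd", "if=%s" % self.dev_path, "bs=1M"]

    def Import(self):
        """Build (not run) the command that reads stdin into the device."""
        return ["dd", "of=%s" % self.dev_path, "bs=1M"]


class RADOSBlockDevice(BlockDev):
    """Overrides the dd defaults with the rbd tool, as the design suggests."""

    def __init__(self, pool, name):
        BlockDev.__init__(self, None)
        self.pool = pool
        self.name = name

    def Export(self):
        return ["rbd", "export", "%s/%s" % (self.pool, self.name), "-"]

    def Import(self):
        return ["rbd", "import", "-", "%s/%s" % (self.pool, self.name)]


def build_copy_command(src, dst):
    """Join the export and import commands with a pipe, as the backend would."""
    def quote(argv):
        return " ".join(shlex.quote(arg) for arg in argv)
    return "%s | %s" % (quote(src.Export()), quote(dst.Import()))
```

For a ``plain`` to ``rbd`` conversion under these assumptions, ``build_copy_command(BlockDev("/dev/xenvg/disk0"), RADOSBlockDevice("rbd", "disk0"))`` yields ``dd if=/dev/xenvg/disk0 bs=1M | rbd import - rbd/disk0``. Neither method touches the device, which keeps the actual execution in one place at the backend level.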
> +
> +Keeping the data copy functionality in the block device layer, provides
> +us with a generic mechanism that works between almost all conversions
> +and furthermore can be easily extended for new disk templates. It also
> +covers the devices that support the ``access=userspace`` parameter and
> +solves this problem in a generic way, by implementing the logic in the
> +right level where we know what is the best to do for each device.
> +
> +
> +Backend changes
> +---------------
> +
> +Introduce a new RPC call:
> +
> +* blockdev_convert(src_disk, dest_disk)
> +
> +where ``src_disk`` and ``dest_disk`` are the original and the new disk
> +objects respectively. First, the actual device instances will be
> +computed and then they will be used to build the export and import
> +commands for the data copy. The output of those methods will be
> +concatenated using a pipe, following a similar approach with the impexp
> +daemon. Finally, the unified data copy command will be executed, at this
> +level, by the ``nodeD``.
> +
> +
> +Core changes
> +------------
> +
> +The main modifications will be made in the ``LUInstanceSetParams`` LU.
> +The implementation of the conversion mechanism will be split into the
> +following parts:
> +
> +* The generation of the new disk template for the instance. The new
> +  disks will match the size, mode, and name of the original volumes.
> +  Those parameters and any other needed, .i.e., the provider's name for
> +  the ExtStorage conversions, will be computed by a new method which we
> +  will introduce, named ``ComputeDisksInfo``. The output of that
> +  function will be used as the ``disk_info`` argument of the
> +  ``GenerateDiskTemplate`` method.
> +
> +* The creation of the new block devices. We will make use of the
> +  ``CreateDisks`` method which creates and attaches the new block
> +  devices.
> +
> +* The data copy for each disk of the instance from the original to the
> +  newly created volume. The data copy will be made by the ``nodeD`` with
> +  the rpc call we have introduced earlier in this design. In case some
> +  disks fail to copy their data the operation will fail and the newly
> +  created disks will be removed. The instance will remain intact.
> +
> +* The detachment of the original disks of the instance when the data
> +  copy operation successfully completes by calling the
> +  ``RemoveInstanceDisk`` method for each instance's disk.
> +
> +* The attachment of the new disks to the instance by calling the
> +  ``AddInstanceDisk`` method for each disk we have created.
> +
> +* The update of the configuration file with the new values.
> +
> +* The removal of the original block devices from the node using the
> +  ``BlockdevRemove`` method for each one of the old disks.
> +
> +
> +User interface changes
> +----------------------
> +
> +The ``-t`` (``--disk-template``) option from the gnt-instance modify
> +command will specify the disk template to convert *to*, as it happens
> +now. The rest disk options such as its size, its mode, and its name will
> +be computed from the original volumes by the conversion mechanism, and
> +the user will not explicitly provide them.
> +
> +
> +ExtStorage conversions
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When converting to an ExtStorage disk template the
> +``provider=*PROVIDER*`` option which specifies the ExtStorage provider
> +will be mandatory. Also, arbitrary parameters can be passed to the
> +ExtStorage provider. Those parameters will be optional and could be
> +passed as additional comma separated options. Since it is not allowed to
> +convert the disk template of an instance and make use of the ``--disk``
> +option at the same time, we propose to introduce a new option named
> +``--ext-params`` to handle the ``ext`` template conversions.
> +
> +::
> +
> +  gnt-instance modify -t ext --ext-params provider=pvdr1 test_vm
> +  gnt-instance modify -t ext --ext-params provider=pvdr1,param1=val1,param2=val2 test_vm
> +
> +
> +File-based conversions
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +For conversions *to* a file-based template the ``--file-storage-dir``
> +and the ``--file-driver`` options could be used, similarly to the
> +**add** command, to manually configure the storage directory and the
> +preferred driver for the file-based disks.
> +
> +::
> +
> +  gnt-instance modify -t file --file-storage-dir=mysubdir test_vm
> +
> +
> +Supported template conversions
> +==============================
> +
> +This is a summary of the disk template conversions that the conversion
> +mechanism will support:
> +
> ++--------------+-----------------------------------------------------------------------------------+
> +| Source       | Target Disk Template                                                              |
> +| Disk         +---------+-------+------+------------+---------+------+------+----------+----------+
> +| Template     | Plain   | DRBD  | File | Sharedfile | Gluster | RBD  | Ext  | BlockDev | Diskless |
> ++==============+=========+=======+======+============+=========+======+======+==========+==========+
> +| Plain        | -       | Yes.  | Yes. | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| DRBD         | Yes.    | -     | Yes. | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| File         | Yes.    | Yes.  | -    | Yes.       | Yes.    | Yes. | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| Sharedfile   | Yes.    | Yes.  | Yes. | -          | Yes.    | Yes. | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| Gluster      | Yes.    | Yes.  | Yes. | Yes.       | -       | Yes. | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| RBD          | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | -    | Yes. | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| Ext          | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | Yes. | -    | No.      | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| BlockDev     | Yes.    | Yes.  | Yes. | Yes.       | Yes.    | Yes. | Yes. | -        | No.      |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +| Diskless     | No.     | No.   | No.  | No.        | No.     | No.  | No.  | No.      | -        |
> ++--------------+---------+-------+------+------------+---------+------+------+----------+----------+
> +
> +
> +Future Work
> +===========
> +
> +Expand the conversion mechanism to provide a visual indication of the
> +data copy operation. We could monitor the progress of the data sent via
> +a pipe, and provide to the user information such as the time elapsed,
> +percentage completed (probably with a progress bar), total data
> +transferred, and so on, similar to the progress tracking that is
> +currently done by the impexp daemon.
> +
> +
> +.. vim: set textwidth=72 :
> +.. Local Variables:
> +.. mode: rst
> +.. fill-column: 72
> +.. End:
> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
> index 55bed7c..95ecdc6 100644
> --- a/doc/design-draft.rst
> +++ b/doc/design-draft.rst
> @@ -23,6 +23,7 @@ Design document drafts
>     design-node-security.rst
>     design-systemd.rst
>     design-cpu-speed.rst
> +   design-disk-conversion.rst
>
>  .. vim: set textwidth=72 :
>  .. Local Variables:
> --
> 1.7.10.4
>
>
