Signed-off-by: Pulkit Singhal <[email protected]>
---
doc/design-ceph-ganeti-support.rst | 184 ++++++++++++++++++++++++++++++++++++
doc/design-draft.rst | 1 +
2 files changed, 185 insertions(+)
create mode 100644 doc/design-ceph-ganeti-support.rst
diff --git a/doc/design-ceph-ganeti-support.rst b/doc/design-ceph-ganeti-support.rst
new file mode 100644
index 0000000..8d0229c
--- /dev/null
+++ b/doc/design-ceph-ganeti-support.rst
@@ -0,0 +1,184 @@
+============================
+RADOS/Ceph support in Ganeti
+============================
+
+.. contents:: :depth: 4
+
+Objective
+=========
+
+The project aims to improve Ceph RBD support in Ganeti. It can be
+primarily divided into the following tasks.
+
+- Use Qemu/KVM RBD driver to provide instances with direct RBD
+ support.
+- Allow Ceph RBDs' configuration through Ganeti.
+- Write a data collector to monitor Ceph nodes.
+
+Background
+==========
+
+Ceph RBD
+--------
+
+Ceph is a distributed storage system which provides data access as
+files, objects and blocks. As part of this project, we're interested in
+integrating Ceph's block device (RBD) directly with Qemu/KVM.
+
+The primary components/daemons of Ceph are:
+
+- Monitor - serves as the authentication point for clients.
+- Metadata - stores all the filesystem metadata (not configured here
+  as it is not required for RBD).
+- OSD - object storage devices, one daemon per drive/location.
+
+RBD support in Ganeti
+---------------------
+
+Currently, Ganeti supports RBD volumes on a pre-configured Ceph
+cluster. This is enabled through the RBD disk template, which accesses
+RBD volumes through the RBD Linux kernel driver: the volumes are
+mapped on the host as local block devices, which are then attached to
+the instances. This method incurs additional overhead. We plan to
+remove this overhead by using Qemu's RBD driver to give KVM instances
+direct access to RBD volumes.
+
+In addition, since Ganeti currently relies on a pre-configured Ceph
+cluster, allowing the Ceph nodes themselves to be configured through
+Ganeti would be a valuable addition to its core features.
+
+
+Qemu/KVM Direct RBD Integration
+===============================
+
+A new disk parameter ``access`` is introduced. It is added at the
+cluster/node-group level to simplify the prototype implementation. It
+specifies the access method, either ``userspace`` or ``kernelspace``,
+and is accessible to StartInstance() in hv_kvm.py. The device path,
+``rbd:<pool>/<vol_name>``, is generated by RADOSBlockDevice and added
+to the disk params dictionary as ``kvm_dev_path``.
+
+This approach ensures that no disk-template-specific changes are
+required in hv_kvm.py, allowing easy integration of other distributed
+storage systems (like Gluster).
+
+Note that the RBD volume is mapped as a local block device as before.
+The local mapping won't be used during instance operation in the
+``userspace`` access mode, but can be used by administrators and OS
+scripts.
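+
+A minimal sketch of how the hypervisor code could choose between the
+two access modes follows. It is only an illustration of the idea:
+``_GetKvmDiskSource`` and its arguments are hypothetical names, not
+the actual hv_kvm.py API; only ``access`` and ``kvm_dev_path`` come
+from this design. ::
+
+  # Illustrative sketch, not actual Ganeti code.
+  def _GetKvmDiskSource(disk_params, local_dev_path):
+    """Return the device QEMU should be pointed at for one disk.
+
+    disk_params carries the proposed 'access' parameter and the
+    'kvm_dev_path' value added by RADOSBlockDevice, e.g.
+    'rbd:<pool>/<vol_name>'; local_dev_path is the kernel-mapped
+    block device, e.g. '/dev/rbd0'.
+
+    """
+    if disk_params.get("access") == "userspace":
+      # Hand the rbd:<pool>/<vol_name> URI straight to Qemu's RBD
+      # driver, bypassing the kernel mapping at runtime.
+      return disk_params["kvm_dev_path"]
+    # Default ("kernelspace"): keep using the local block device.
+    return local_dev_path
+
+  params = {"access": "userspace",
+            "kvm_dev_path": "rbd:rbd/inst1-disk0"}
+  print(_GetKvmDiskSource(params, "/dev/rbd0"))
+  # prints: rbd:rbd/inst1-disk0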
+
+Updated commands
+----------------
+
+::
+
+  $ gnt-instance info
+
+``access:userspace/kernelspace`` will be added to the Disks category.
+This output applies to KVM-based instances only.
+
+Ceph configuration on Ganeti nodes
+==================================
+
+This section proposes the configuration of a distributed storage pool
+(Ceph or Gluster) through Ganeti. For now, the design focuses on
+configuring a Ceph cluster. A prerequisite of this setup is the
+installation of the Ceph packages on all the nodes involved.
+
+At Ganeti cluster init, the user will set distributed-storage-specific
+options, which will be stored at the cluster level. The storage
+cluster will be initialized using ``gnt-storage``. For the prototype,
+only a single storage pool/node group is configured.
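+
+The exact command syntax appears under Updated Commands below. As a
+rough sketch, a ``<type>:<key>=<value>,...`` specification such as the
+one passed to ``-S`` could be split into a storage type and a
+parameter dictionary as follows (the helper name is hypothetical; only
+the option syntax comes from this design). ::
+
+  # Rough sketch of parsing a "-S <type>:<key>=<value>,..." spec.
+  def ParseStorageSpec(spec):
+    storage_type, _, rest = spec.partition(":")
+    params = {}
+    for item in filter(None, rest.split(",")):
+      key, _, value = item.partition("=")
+      params[key] = value
+    return storage_type, params
+
+  print(ParseStorageSpec("ceph:disk=/dev/sdb,option=value"))
+  # ('ceph', {'disk': '/dev/sdb', 'option': 'value'})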
+
+The following steps take place when a node group is initialized as a
+storage cluster (a rough sketch of the node-side part is shown after
+the list):
+
+- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
+  file on each node.
+- Fetch the cluster configuration parameters and create a distributed
+  storage object accordingly.
+- Issue an 'init distributed storage' RPC to the group's nodes (if
+  any).
+- On each node, the ``ceph`` CLI tool will start the appropriate
+  services.
+- Mark the nodes as well as the node group as
+  distributed-storage-enabled.
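+
+All names in the following sketch are hypothetical and the actual
+daemon bring-up would go through the Ceph tools installed on the node;
+only the check against /etc/ceph/ceph.conf and the mon/osd roles come
+from this design. ::
+
+  # Hypothetical node-side handler for the proposed RPC.
+  import os
+
+  CEPH_CONF = "/etc/ceph/ceph.conf"
+
+  def InitDistributedStorage(node_params):
+    # Refuse to touch a node that already belongs to a Ceph cluster.
+    if os.path.exists(CEPH_CONF):
+      raise RuntimeError("%s already exists, refusing to reconfigure"
+                         % CEPH_CONF)
+    # Here the node would write the configuration received from the
+    # master and start the services matching its role(s).
+    for role in node_params["node_role"]:
+      print("would start %s daemon(s) using devices %s"
+            % (role, node_params["devices"]))
+
+  InitDistributedStorage({"node_role": ["mon", "osd"],
+                          "devices": ["/dev/sdb"]})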
+
+The storage cluster will operate at the node-group level. The Ceph
+cluster will be initialized using ``gnt-storage``, to which a new
+sub-command ``init-distributed-storage`` will be added.
+
+The configuration of the nodes will be handled through an init function
+called by the node daemons running on the respective nodes. A new RPC is
+introduced to handle the calls.
+
+A new object will be created to send the storage parameters to the
+nodes: storage_type, devices, node_role (mon/osd), etc.
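+
+A possible shape for such an object is sketched below; the field names
+follow the design text, but the class itself and its method are
+illustrative only. ::
+
+  # Illustrative container for the per-node storage parameters.
+  class DistributedStorageParams(object):
+    def __init__(self, storage_type, devices, node_role, options=None):
+      self.storage_type = storage_type  # e.g. "ceph"
+      self.devices = devices            # e.g. ["/dev/sdb"]
+      self.node_role = node_role        # e.g. ["mon", "osd"]
+      self.options = options or {}      # remaining key=value options
+
+    def ToDict(self):
+      # Serialise for the node RPC payload.
+      return {"storage_type": self.storage_type,
+              "devices": self.devices,
+              "node_role": self.node_role,
+              "options": self.options}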
+
+A new node can be directly assigned to the storage-enabled node group.
+During the 'gnt-node add' process, the required Ceph daemons will be
+started and the node will be added to the Ceph cluster.
+
+Only an offline node can be assigned to a storage-enabled node group.
+``gnt-node add --readd`` then needs to be performed to issue the RPCs
+that spawn the appropriate services on the newly assigned node.
+
+Updated Commands
+----------------
+
+The following commands are affected::
+
+ $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...
+
+During cluster initialization, Ceph-specific options are provided,
+which apply at the cluster level. ::
+
+ $ gnt-cluster modify -S ceph:option=value2...
+
+For now, cluster modification will only be allowed when there is no
+initialized storage cluster. ::
+
+  $ gnt-storage init-distributed-storage -s{--storage-type} ceph \
+      <node-group>
+
+This ensures that no other node group is already configured as a
+distributed storage cluster and then configures Ceph on the specified
+node group. If there is no node in the node group, it will only be
+marked as distributed-storage-enabled and no further action will be
+taken. ::
+
+ $ gnt-group assign-nodes <group> <node>
+
+This ensures that the node is offline if the specified node group is
+distributed-storage-capable. Ceph configuration on the newly assigned
+node is not performed at this step. ::
+
+  $ gnt-node modify --offline yes <node>
+
+If the node is part of a storage node group, the offline call will
+stop or remove its Ceph daemons. ::
+
+ $ gnt-node add --readd
+
+If the node is now part of a storage node group, the 'init distributed
+storage' RPC is issued to it. This step is required after assigning a
+node to a storage-enabled node group. ::
+
+ $ gnt-node remove
+
+A warning will be issued stating that the node is part of a
+distributed storage cluster and must be marked offline before removal.
+
+Data collector for Ceph
+=======================
+
+TBD
+
+Future Work
+===========
+
+Due to the loopback bug in Ceph (http://tracker.ceph.com/issues/3076),
+one may run into daemon hangs when writing to an RBD volume through
+the block device mapping. The bug only applies when the RBD volume is
+stored on an OSD running on the local node. To mitigate this issue, we
+can create storage pools on different node groups and access RBD
+volumes from pools hosted on other node groups.
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index f164c7c..f49885f 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -24,6 +24,7 @@ Design document drafts
    design-cmdlib-unittests.rst
    design-hotplug.rst
    design-optables.rst
+   design-ceph-ganeti-support.rst
 
 .. vim: set textwidth=72 :
 .. Local Variables:
--
1.7.9.5