I had to fix some style errors and warnings.
Please run the automatic checks before sending a patch. A simple `make`
did not complete successfully with this patch applied.

LGTM (I pushed the patch to master), thanks,

Thomas


On Thu, Jul 25, 2013 at 11:07 PM, Pulkit Singhal <[email protected]> wrote:

> Hi Thomas,
>
> I've sent an updated Design document. Forgot to fix the "node (" issue
> though :P
>
> The details regarding 'common operations' are added under 'Ceph
> configuration on Ganeti nodes'.  Please mention if any other details are
> required.
>
>
> On Fri, Jul 26, 2013 at 2:21 AM, Pulkit Singhal <[email protected]> wrote:
>
>> Signed-off-by: Pulkit Singhal <[email protected]>
>> ---
>>  doc/design-ceph-ganeti-support.rst |  184 ++++++++++++++++++++++++++++++++++++
>>  doc/design-draft.rst               |    1 +
>>  2 files changed, 185 insertions(+)
>>  create mode 100644 doc/design-ceph-ganeti-support.rst
>>
>> diff --git a/doc/design-ceph-ganeti-support.rst b/doc/design-ceph-ganeti-support.rst
>> new file mode 100644
>> index 0000000..8d0229c
>> --- /dev/null
>> +++ b/doc/design-ceph-ganeti-support.rst
>> @@ -0,0 +1,184 @@
>> +============================
>> +RADOS/Ceph support in Ganeti
>> +============================
>> +
>> +.. contents:: :depth: 4
>> +
>> +Objective
>> +=========
>> +
>> +The project aims to improve Ceph RBD support in Ganeti. It can be
>> +divided into the following primary tasks:
>> +
>> +- Use Qemu/KVM RBD driver to provide instances with direct RBD
>> +  support.
>> +- Allow Ceph RBDs' configuration through Ganeti.
>> +- Write a data collector to monitor Ceph nodes.
>> +
>> +Background
>> +==========
>> +
>> +Ceph RBD
>> +--------
>> +
>> +Ceph is a distributed storage system that provides data access as
>> +files, objects and blocks. As part of this project, we're interested
>> +in integrating Ceph's block device (RBD) directly with Qemu/KVM.
>> +
>> +The primary components/daemons of Ceph are:
>> +
>> +- Monitor - serves as the authentication point for clients.
>> +- Metadata - stores all the filesystem metadata (not configured here,
>> +  as it is not required for RBD).
>> +- OSD - object storage devices, one daemon for each drive/location.
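>> +
>> +Which daemons a node runs can be read off the sections of its
>> +``/etc/ceph/ceph.conf``. The following minimal sketch parses such a
>> +file with Python's ``configparser``; the section names follow Ceph's
>> +conventions, while the hosts and addresses are purely illustrative::
>> +
>> +  import configparser
>> +
>> +  # Illustrative ceph.conf; there is no MDS section, since the
>> +  # metadata daemon is not required for RBD.
>> +  SAMPLE_CONF = """
>> +  [global]
>> +  mon initial members = node1
>> +
>> +  [mon.node1]
>> +  host = node1
>> +  mon addr = 192.0.2.1:6789
>> +
>> +  [osd.0]
>> +  host = node2
>> +  """
>> +
>> +  def ceph_daemons(conf_text):
>> +    """Return the monitor and OSD sections found in a ceph.conf."""
>> +    parser = configparser.ConfigParser()
>> +    parser.read_string(conf_text)
>> +    mons = [s for s in parser.sections() if s.startswith("mon.")]
>> +    osds = [s for s in parser.sections() if s.startswith("osd.")]
>> +    return mons, osds
>> +
>> +  print(ceph_daemons(SAMPLE_CONF))  # (['mon.node1'], ['osd.0'])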
>> +
>> +RBD support in Ganeti
>> +---------------------
>> +
>> +Currently, Ganeti supports RBD volumes on a pre-configured Ceph
>> +cluster. This is enabled through RBD disk templates, which provide
>> +access to RBD volumes via the RBD Linux (kernel) driver. The volumes
>> +are mapped to the host as local block devices, which are then
>> +attached to the instances. This method incurs additional overhead.
>> +We plan to remove it by using Qemu's RBD driver to enable direct
>> +access to RBD volumes for KVM instances.
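>> +
>> +For reference, the two access methods hand different device paths to
>> +the instance layer; the userspace form is the one generated by
>> +RADOSBlockDevice (see below), while the kernelspace path shown
>> +follows the usual rbd udev naming::
>> +
>> +  # kernelspace: volume mapped on the host by the rbd kernel driver
>> +  /dev/rbd/<pool>/<vol_name>
>> +
>> +  # userspace: Qemu's built-in RBD driver, no host mapping involved
>> +  rbd:<pool>/<vol_name>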
>> +
>> +Also, Ganeti currently relies on an externally pre-configured Ceph
>> +cluster. Allowing the configuration of Ceph nodes through Ganeti
>> +itself would be a good addition to its prime features.
>> +
>> +
>> +Qemu/KVM Direct RBD Integration
>> +===============================
>> +
>> +A new disk parameter ``access`` is introduced. It is added at the
>> +cluster/node-group level to simplify the prototype implementation.
>> +It specifies the access method as either ``userspace`` or
>> +``kernelspace``, and is accessible to ``StartInstance()`` in
>> +``hv_kvm.py``. The device path, ``rbd:<pool>/<vol_name>``, is
>> +generated by ``RADOSBlockDevice`` and added to the params dictionary
>> +as ``kvm_dev_path``.
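>> +
>> +A minimal sketch of the resulting logic in ``StartInstance()``; the
>> +helper name and the exact dictionary fields are assumptions of this
>> +sketch, not existing code::
>> +
>> +  def _GetDiskDrivePath(disk_params):
>> +    """Pick the KVM drive path based on the ``access`` disk param.
>> +
>> +    Sketch only: assumes ``disk_params`` carries the access method
>> +    and the ``kvm_dev_path`` generated by RADOSBlockDevice.
>> +    """
>> +    if disk_params.get("access") == "userspace":
>> +      # Qemu opens the volume directly through librbd.
>> +      return disk_params["kvm_dev_path"]  # rbd:<pool>/<vol_name>
>> +    # kernelspace: use the local block device mapped on the host.
>> +    return disk_params["dev_path"]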
>> +
>> +This approach ensures that no disk-template-specific changes are
>> +required in ``hv_kvm.py``, allowing easy integration of other
>> +distributed storage systems (like Gluster).
>> +
>> +Note that the RBD volume is mapped as a local block device as before.
>> +The local mapping won't be used during instance operation in the
>> +``userspace`` access mode, but can be used by administrators and OS
>> +scripts.
>> +
>> +Updated Commands
>> +----------------
>> +
>> +::
>> +
>> +  $ gnt-instance info
>> +
>> +``access:userspace/kernelspace`` will be added to the Disks category.
>> +This output applies to KVM-based instances only.
>> +
>> +Ceph configuration on Ganeti nodes
>> +==================================
>> +
>> +This document proposes the configuration of a distributed storage
>> +pool (Ceph or Gluster) through Ganeti. Currently, this design
>> +document focuses on configuring a Ceph cluster. A prerequisite of
>> +this setup is the installation of the Ceph packages on all the nodes
>> +concerned.
>> +
>> +At Ganeti cluster init, the user will set distributed-storage
>> +specific options, which will be stored at the cluster level. The
>> +storage cluster will be initialized using ``gnt-storage``. For the
>> +prototype, only a single storage pool/node-group is configured.
>> +
>> +The following steps take place when a node-group is initialized as a
>> +storage cluster (a sketch of the flow follows the list):
>> +
>> +- Check for an existing Ceph cluster through the /etc/ceph/ceph.conf
>> +  file on each node.
>> +- Fetch cluster configuration parameters and create a distributed
>> +  storage object accordingly.
>> +- Issue an 'init distributed storage' RPC to the group's nodes (if
>> +  any).
>> +- On each node, the ``ceph`` CLI tool will run the appropriate
>> +  services.
>> +- Mark the nodes as well as the node-group as
>> +  distributed-storage-enabled.
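>> +
>> +A rough sketch of that flow; the function, the call names and the
>> +``rpc`` callable are hypothetical stand-ins for the node-daemon RPC
>> +layer introduced by this design::
>> +
>> +  def InitDistributedStorage(nodes, storage_opts, rpc):
>> +    """Initialize the nodes of a group as a storage cluster (sketch).
>> +
>> +    ``rpc(node, call, *args)`` stands in for the new node-daemon
>> +    RPC; the call names are hypothetical.
>> +    """
>> +    # Refuse to touch nodes that already belong to a Ceph cluster.
>> +    for node in nodes:
>> +      if rpc(node, "ceph_conf_exists"):  # /etc/ceph/ceph.conf check
>> +        raise RuntimeError("existing Ceph cluster on %s" % node)
>> +    # Ask each node daemon to start the appropriate ceph services.
>> +    for node in nodes:
>> +      rpc(node, "init_distributed_storage", storage_opts)
>> +    # The caller then marks the nodes and the node-group as
>> +    # distributed-storage-enabled.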
>> +
>> +The storage cluster will operate at the node-group level. The Ceph
>> +cluster will be initiated using ``gnt-storage``, to which a new
>> +sub-command ``init-distributed-storage`` will be added.
>> +
>> +The configuration of the nodes will be handled through an init
>> +function called by the node daemons running on the respective nodes.
>> +A new RPC is introduced to handle these calls.
>> +
>> +A new object will be created to send the storage parameters to the
>> +node - storage_type, devices, node_role (mon/osd) etc.; a possible
>> +shape is sketched below.
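>> +
>> +A minimal sketch of that object; the class name and the sample
>> +values are illustrative only::
>> +
>> +  # Parameters object sent to the node over the new RPC; in Ganeti
>> +  # this would likely follow the ConfigObject/__slots__ convention.
>> +  class DistributedStorageParams(object):
>> +    __slots__ = ["storage_type",  # e.g. "ceph"
>> +                 "devices",       # OSD drives, e.g. ["/dev/sdb"]
>> +                 "node_role"]     # "mon" or "osd"
>> +
>> +  # Example payload for an OSD node:
>> +  params = DistributedStorageParams()
>> +  params.storage_type = "ceph"
>> +  params.devices = ["/dev/sdb"]
>> +  params.node_role = "osd"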
>> +
>> +A new node can be directly assigned to the storage-enabled
>> +node-group. During the 'gnt-node add' process, the required Ceph
>> +daemons will be started and the node will be added to the Ceph
>> +cluster.
>> +
>> +Only an offline node can be assigned to a storage-enabled node-group.
>> +``gnt-node add --readd`` needs to be performed to issue the RPCs that
>> +spawn the appropriate services on the newly assigned node.
>> +
>> +Updated Commands
>> +----------------
>> +
>> +The following commands are affected::
>> +
>> +  $ gnt-cluster init -S ceph:disk=/dev/sdb,option=value...
>> +
>> +During cluster initialization, Ceph-specific options are provided,
>> +which apply at the cluster level::
>> +
>> +  $ gnt-cluster modify -S ceph:option=value2...
>> +
>> +For now, cluster modification will be allowed only when there is no
>> +initialized storage cluster::
>> +
>> +  $ gnt-storage init-distributed-storage -s{--storage-type} ceph <node-group>
>> +
>> +Ensure that no other node-group is configured as a distributed
>> +storage cluster, and configure Ceph on the specified node-group. If
>> +there is no node in the node-group, it will only be marked as
>> +distributed-storage-enabled and no action will be taken::
>> +
>> +  $ gnt-group assign-nodes <group> <node>
>> +
>> +It ensures that the node is offline if the specified node-group is
>> +distributed-storage-capable. Ceph configuration of the newly
>> +assigned node is not performed at this step::
>> +
>> +  $ gnt-node --offline
>> +
>> +If the node is part of a storage node-group, an offline call will
>> +stop/remove the Ceph daemons::
>> +
>> +  $ gnt-node add --readd
>> +
>> +If the node is now part of the storage node-group, an init
>> +distributed storage RPC will be issued to the respective node. This
>> +step is required after assigning a node to the storage-enabled
>> +node-group::
>> +
>> +  $ gnt-node remove
>> +
>> +A warning will be issued stating that the node is part of a
>> +distributed storage cluster; mark it offline before removal.
>> +
>> +Data collector for Ceph
>> +-----------------------
>> +
>> +TBD
>> +
>> +Future Work
>> +-----------
>> +
>> +Due to the loopback bug in Ceph, one may run into daemon hangs while
>> +performing writes to an RBD volume through the block device mapping.
>> +This bug applies only when the RBD volume is stored on an OSD running
>> +on the local node. To mitigate this issue, we can create storage
>> +pools on different node-groups and access RBD volumes on different
>> +pools: http://tracker.ceph.com/issues/3076
>> +
>> +.. vim: set textwidth=72 :
>> +.. Local Variables:
>> +.. mode: rst
>> +.. fill-column: 72
>> +.. End:
>> diff --git a/doc/design-draft.rst b/doc/design-draft.rst
>> index f164c7c..f49885f 100644
>> --- a/doc/design-draft.rst
>> +++ b/doc/design-draft.rst
>> @@ -24,6 +24,7 @@ Design document drafts
>>     design-cmdlib-unittests.rst
>>     design-hotplug.rst
>>     design-optables.rst
>> +   design-ceph-ganeti-support.rst
>>
>>  .. vim: set textwidth=72 :
>>  .. Local Variables:
>> --
>> 1.7.9.5
>>
>>
>


-- 
Thomas Thrainer | Software Engineer | [email protected] |

Google Germany GmbH
Dienerstr. 12
80331 München

Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Graham Law, Christine Elizabeth Flores
