On Thu, Sep 27, 2012 at 02:46:40PM +0300, Constantinos Venetsanopoulos wrote:
> On 09/27/2012 02:02 PM, Iustin Pop wrote:
> >On Thu, Sep 27, 2012 at 12:37:41PM +0300, Constantinos Venetsanopoulos wrote:
> >>On 09/26/2012 07:21 PM, Iustin Pop wrote:
> >>>On Wed, Sep 26, 2012 at 05:38:17PM +0300, Constantinos Venetsanopoulos
> >>>wrote:
> >>>>Update the shared storage design document to reflect the current
> >>>>changes, after the implementation of the ExtStorage interface.
> >>>>
> >>>>Signed-off-by: Constantinos Venetsanopoulos <[email protected]>
> >>>>---
> >>>> doc/design-shared-storage.rst | 204
> >>>> ++++++++++++++++++++++------------------
> >>>> 1 files changed, 112 insertions(+), 92 deletions(-)
> >>>>
> >>>>diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
> >>>>index c175476..7080182 100644
> >>>>--- a/doc/design-shared-storage.rst
> >>>>+++ b/doc/design-shared-storage.rst
> >>>>@@ -64,15 +64,11 @@ The design addresses the following procedures:
> >>>>   filesystems.
> >>>> - Introduction of shared block device disk template with device
> >>>>   adoption.
> >>>>+- Introduction of an External Storage Interface.
> >>>>
> >>>> Additionally, mid- to long-term goals include:
> >>>>
> >>>> - Support for external “storage pools”.
> >>>>-- Introduction of an interface for communicating with external scripts,
> >>>>-  providing methods for the various stages of a block device's and
> >>>>-  instance's life-cycle. In order to provide storage provisioning
> >>>>-  capabilities for various SAN appliances, external helpers in the form
> >>>>-  of a “storage driver” will be possibly introduced as well.
> >>>>
> >>>> Refactoring of all code referring to constants.DTS_NET_MIRROR
> >>>> =============================================================
> >>>>
> >>>>@@ -159,6 +155,104 @@ The shared block device template will make the
> >>>>following assumptions:
> >>>>
> >>>> - The device will be available with the same path under all nodes in the
> >>>>   node group.
> >>>>+Introduction of an External Storage Interface
> >>>>+==============================================
> >>>>+
> >>>>+Overview
> >>>>+--------
> >>>>+
> >>>>+To extend the shared block storage template and give Ganeti the ability
> >>>>+to control and manipulate external storage (provisioning, removal,
> >>>>+growing, etc.) we need a more generic approach. The generic method for
> >>>>+supporting external shared storage in Ganeti will be to have an
> >>>>+ExtStorage provider for each external shared storage hardware type. The
> >>>>+ExtStorage provider will be a set of files (executable scripts and text
> >>>>+files), contained inside a directory which will be named after the
> >>>>+provider. This directory must be present across all nodes of a nodegroup
> >>>>+(Ganeti doesn't replicate it), in order for the provider to be usable by
> >>>>+Ganeti for this nodegroup (valid).
> >>>How will Ganeti behave if they are not consistent? Report errors? (in
> >>>cluster verify?) Ignore the provider? Etc.
> >>The ExtStorage code follows exactly the behavior of the code
> >>handling OS definitions: it produces appropriate error messages
> >>and also comes with `gnt-storage {diagnose, info}' similarly to
> >>`gnt-os {diagnose, info}'.
> >>
> >>There is only one difference compared to the way OS defs are
> >>handled:
> >>
> >>The ExtStorage diagnose code calculates the validity of each provider
> >>for each nodegroup in the cmdlib logic rather than in the client.
> >>This was marked as 'TODO' inside cmdlib for OS diagnose.
> >>
> >>This gives you the flexibility to do neat things easily, such as running
> >>the LU from inside cluster verify and producing validity statuses
> >>for each provider-nodegroup combination. So, presumably this can also
> >>be used inside `gnt-cluster verify' in the future.
> >Sounds very good, thanks!
> >
> >>>>+The external shared storage hardware
> >>>>+should also be accessible by all nodes of this nodegroup.
> >>>>+
> >>>>+An “ExtStorage provider” will have to provide the following methods:
> >>>>+
> >>>>+- Create a disk
> >>>>+- Remove a disk
> >>>>+- Grow a disk
> >>>>+- Attach a disk to a given node
> >>>>+- Detach a disk from a given node
> >>>>+- Verify its supported parameters
> >>>>+
> >>>>+The proposed ExtStorage interface borrows heavily from the OS
> >>>>+interface and follows a one-script-per-function approach. An ExtStorage
> >>>>+provider is expected to provide the following scripts:
> >>>>+
> >>>>+- `create`
> >>>>+- `remove`
> >>>>+- `grow`
> >>>>+- `attach`
> >>>>+- `detach`
> >>>>+- `verify`
> >>>>+
> >>>>+All scripts will be called with no arguments and get their input via
> >>>>+environment variables. A common set of variables will be exported for
> >>>>+all commands, and some commands may have extra ones.
> >>>>+
> >>>>+- `VOL_NAME`: The name of the volume. This is unique to Ganeti, which
> >>>>+  uses it to refer to a specific volume inside the external storage.
> >>>>+- `VOL_SIZE`: The volume's size in mebibytes.
> >>>>+- `VOL_NEW_SIZE`: Available only to the `grow` script. It declares the
> >>>>+  new size of the volume after grow (in mebibytes).
> >>>>+- `EXTP_name`: ExtStorage parameter, where `name` is the parameter in
> >>>>+  upper-case (same as the OS interface's `OSP_*` parameters).
> >>>>+
> >>>>+All scripts except `attach` should return 0 on success and non-zero on
> >>>>+error, accompanied by an appropriate error message on stderr. The
> >>>>+`attach` script should, on success, print a string on stdout: the
> >>>>+block device's full path after it has been successfully attached to
> >>>>+the host node. On error it should return non-zero.
> >>>>+
> >>>>+Implementation
> >>>>+--------------
> >>>>+
> >>>>+To support the ExtStorage interface, we will introduce a new disk
> >>>>+template called `ext`.
> >>>>+This template will implement the existing Ganeti
> >>>>+disk interface in `lib/bdev.py` (create, remove, attach, assemble,
> >>>>+shutdown, grow), and will simultaneously pass control to the external
> >>>>+scripts to actually handle the above actions. The `ext` disk template
> >>>>+will act as a translation layer between the current Ganeti disk
> >>>>+interface and the ExtStorage providers.
> >>>>+
> >>>>+We will also introduce a new IDISK_PARAM called `IDISK_PROVIDER =
> >>>>+provider`, which will be used at the command line to select the desired
> >>>>+ExtStorage provider. This parameter will be valid only for the `ext`
> >>>>+template, e.g.::
> >>>>+
> >>>>+  gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
> >>>>+
> >>>>+The ExtStorage interface will allow different disks to be created by
> >>>>+different providers, e.g.::
> >>>>+
> >>>>+  gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
> >>>>+  --disk=1:size=1G,provider=sample_provider2
> >>>>+  --disk=2:size=3G,provider=sample_provider1
> >>>This (also in the context of your other design changes) makes me a bit
> >>>uneasy, with regards to coordinating changes across multiple providers
> >>>in live migration and similar changes (even startup). Have you thought
> >>>about this?
> >>I'm not sure I can understand your point completely. Given the
> >>diagnose functionality described above, are you concerned providers
> >>are going to be in an inconsistent state among nodes? Is it a matter
> >>of how the allocator decides the target node given different providers?
> >Ah no, see below.
> >
> >>Can you expand on the "coordinating changes across multiple providers
> >>in live migration and similar changes (even startup)" part of your
> >>question? Perhaps with some examples?
> >I'll try :)
>
> OK, now it's clear, thanks.
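For illustration, the calling convention quoted above (scripts taking no arguments, all input passed through environment variables) could look roughly like the following from the `ext` template's side. This is only a sketch: the helper name, its signature, and the provider-directory layout are assumptions, not the actual `lib/bdev.py` code.

```python
# Sketch of how the `ext` template might invoke one ExtStorage provider
# script (create/remove/grow/attach/detach/verify), following the
# convention described in the design: no arguments, input via env vars.
# Hypothetical helper, not actual Ganeti code.
import os
import subprocess

def run_provider_script(provider_dir, action, vol_name, vol_size_mib,
                        ext_params=None, new_size_mib=None):
    """Run one provider script and return its stdout (stripped)."""
    env = dict(os.environ)
    env["VOL_NAME"] = vol_name
    env["VOL_SIZE"] = str(vol_size_mib)
    if new_size_mib is not None:
        # VOL_NEW_SIZE is only meaningful for the `grow` script
        env["VOL_NEW_SIZE"] = str(new_size_mib)
    for name, value in (ext_params or {}).items():
        # ExtStorage parameters are exported as EXTP_<NAME>
        env["EXTP_%s" % name.upper()] = str(value)
    proc = subprocess.run([os.path.join(provider_dir, action)],
                          env=env, capture_output=True, text=True)
    if proc.returncode != 0:
        # all scripts report errors via non-zero exit plus stderr
        raise RuntimeError("%s failed: %s" % (action, proc.stderr.strip()))
    # for `attach`, stdout is the block device's full path on the node
    return proc.stdout.strip()
```

A provider's `attach` script would then simply print the device path it attached and exit 0, and the helper above would hand that path back to the disk layer.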
>
> >I have a _very slight_ worry that handling "complex" instances will
> >become more tricky if the behaviour of different storage providers or
> >disk templates (this is in the context of the other designs) differs.
> >
> >For example, let's say we have an instance with first disk DRBD, second
> >disk ext,provider=p1, third disk ext,provider=p2.
> >
> >We know we can live migrate an instance across node groups for DRBD, and
> >we know we can migrate ext providers if they are available in both
> >groups. But combining all these checks across multiple disks is just 2%
> >more tricky: we need to move from "disk_template in
> >constants.DTS_MIRRORED" to something like "do all instance disks allow
> >migration/failover/move from (nodegroup A, nodes [a,b]) to (nodegroup B,
> >nodes [c,d])" (where A could be equal to B)?
> >
> >This is doable, it just means that a lot of decisions about the instance's
> >behaviour (can it be moved, can it be live migrated, etc.) will move away
> >from the instance level (disk_template) and become an aggregate of the
> >instance's disk capabilities.
>
> Exactly! Wrt the ExtStorage patchset, we won't need to make changes
> in the decision making because everything still stays at instance level,
> even though we have different providers on different disks. All we have
> to do is make sure all providers are present on the node/nodegroup we
> want to migrate/failover/move to (I have tested live migrations of
> instances with, let's say, disk0 ext,provider=p1, disk1 ext,provider=p2
> without changing anything in the current allocation logic).
>
> When we introduce Storage Pools and the ability to have different
> disks of an instance residing in different Storage Pools, then we will
> have to do exactly as you are saying (and as is also written in the
> design doc).
Hah, sorry, I didn't read those designs except very briefly :)

> We should move the decision logic from operating at the instance
> level to operating on the aggregation of the instance's disks' Storage
> Pools. At that point, we also don't have a problem with providers,
> because providers will be moved from an IDISK_PARAM (which we
> need right now as a transition step) to a parameter of the Storage
> Pool. Thus, the decision logic will not need to know anything about
> providers, just as it doesn't need to know now.
>
> As you say, we will move from:
>
> "disk_template in constants.DTS_MIRRORED"
>
> to:
>
> "are all the instance's disks' Storage Pools connected to the
> nodegroup we want to migrate/failover/move to".
>
> For Storage Pools of disk template EXT_MIRROR that's all;
> for Storage Pools of disk template INT_MIRROR (DRBD)
> we will have to adjust the current code that handles the
> secondary node.

Sounds good.

> >Which is all fine, now that I thought it through, just something that
> >we need to keep in mind.
>
> Sure. It sounds really good that you find it fine, and I think that with
> a little more effort in the decision logic (when we move to Storage
> Pools), we will end up with a very simple and unified design that will
> give even more functionality to Ganeti.

Indeed.

thanks,
iustin
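The aggregate per-disk check discussed above (replacing the single instance-level disk_template test with "do all disks allow this move") could be sketched as follows. All names here are hypothetical stand-ins for the eventual Ganeti logic, not actual code from the patchset.

```python
# Sketch of the per-disk aggregation discussed in the thread: instead
# of one instance-level disk_template check, ask every disk whether it
# allows the migration/failover/move. Hypothetical names, not actual
# Ganeti code.

def disk_allows_migration(disk, target_group, providers_by_group):
    """Can one disk follow the instance to target_group?"""
    if disk["template"] == "drbd":
        # DRBD disks can migrate (secondary-node handling aside)
        return True
    if disk["template"] == "ext":
        # an ext disk can move only if its provider is valid
        # (present and consistent) in the target nodegroup
        return disk["provider"] in providers_by_group.get(target_group, set())
    return False

def instance_allows_migration(disks, target_group, providers_by_group):
    """Aggregate of the instance's disk capabilities."""
    return all(disk_allows_migration(d, target_group, providers_by_group)
               for d in disks)
```

With Storage Pools, `providers_by_group` would be derived from the pools connected to the nodegroup rather than from a per-disk parameter, but the aggregation shape stays the same.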
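As a footnote to the diagnose discussion earlier in the thread (validity computed per provider per nodegroup, in cmdlib rather than in the client), that calculation could be sketched like this. All names and the report shape are assumptions for illustration, not the actual cmdlib logic.

```python
# Sketch of per-nodegroup ExtStorage provider validity: a provider is
# valid for a nodegroup only if every node in the group has the
# provider's directory with the complete set of scripts. Hypothetical
# names, not actual Ganeti cmdlib code.

REQUIRED_FILES = {"create", "remove", "grow", "attach", "detach", "verify"}

def diagnose_provider(node_reports):
    """Return {nodegroup: (valid, message)} for one provider.

    node_reports maps nodegroup -> {node: set of files found in the
    provider's directory, or None if the directory is missing}.
    """
    result = {}
    for group, nodes in node_reports.items():
        missing = sorted(n for n, files in nodes.items() if files is None)
        if missing:
            result[group] = (False, "missing on nodes: %s" % ", ".join(missing))
            continue
        incomplete = sorted(n for n, files in nodes.items()
                            if not REQUIRED_FILES <= files)
        if incomplete:
            result[group] = (False,
                             "incomplete on nodes: %s" % ", ".join(incomplete))
        else:
            result[group] = (True, "valid")
    return result
```

Computing this per provider-nodegroup pair is what would let the same LU be reused from `gnt-storage diagnose` and, later, `gnt-cluster verify`.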
