On 09/27/2012 02:02 PM, Iustin Pop wrote:
On Thu, Sep 27, 2012 at 12:37:41PM +0300, Constantinos Venetsanopoulos wrote:
On 09/26/2012 07:21 PM, Iustin Pop wrote:
On Wed, Sep 26, 2012 at 05:38:17PM +0300, Constantinos Venetsanopoulos wrote:
Update the shared storage design document to reflect the current
changes, after the implementation of the ExtStorage interface.

Signed-off-by: Constantinos Venetsanopoulos <[email protected]>
---
  doc/design-shared-storage.rst |  204 ++++++++++++++++++++++------------------
  1 files changed, 112 insertions(+), 92 deletions(-)

diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
index c175476..7080182 100644
--- a/doc/design-shared-storage.rst
+++ b/doc/design-shared-storage.rst
@@ -64,15 +64,11 @@ The design addresses the following procedures:
    filesystems.
  - Introduction of shared block device disk template with device
    adoption.
+- Introduction of an External Storage Interface.
  Additionally, mid- to long-term goals include:
  - Support for external “storage pools”.
-- Introduction of an interface for communicating with external scripts,
-  providing methods for the various stages of a block device's and
-  instance's life-cycle. In order to provide storage provisioning
-  capabilities for various SAN appliances, external helpers in the form
-  of a “storage driver” will be possibly introduced as well.
  Refactoring of all code referring to constants.DTS_NET_MIRROR
  =============================================================
@@ -159,6 +155,104 @@ The shared block device template will make the following 
assumptions:
  - The device will be available with the same path under all nodes in the
    node group.
+Introduction of an External Storage Interface
+==============================================
+Overview
+--------
+
+To extend the shared block storage template and give Ganeti the ability
+to control and manipulate external storage (provisioning, removal,
+growing, etc.) we need a more generic approach. The generic method for
+supporting external shared storage in Ganeti will be to have an
+ExtStorage provider for each external shared storage hardware type. The
+ExtStorage provider will be a set of files (executable scripts and text
+files), contained inside a directory which will be named after the
+provider. This directory must be present on all nodes of a nodegroup
+(Ganeti doesn't replicate it) in order for the provider to be usable
+(valid) by Ganeti for that nodegroup.
How will Ganeti behave if they are not consistent? Report errors? (in
cluster verify?) Ignore the provider? Etc.
The ExtStorage code follows exactly the behavior of the code
handling OS definitions: It produces appropriate error messages
and also comes with `gnt-storage {diagnose, info}' similarly to
`gnt-os {diagnose, info}'.

There is only one difference compared to the way OS defs are
handled:

The ExtStorage diagnose code calculates the validity of each provider
for each nodegroup in the cmdlib logic rather than in the client.
This was marked as 'TODO' inside cmdlib for OS diagnose.

This gives you the flexibility to do neat things easily, such as running
the LU from inside cluster verify and producing validity statuses
for each provider-nodegroup combination. So, presumably this can also
be used inside `gnt-cluster verify' in the future.
Sounds very good, thanks!

The external shared storage hardware
+should also be accessible by all nodes of this nodegroup.
+
+An “ExtStorage provider” will have to provide the following methods:
+
+- Create a disk
+- Remove a disk
+- Grow a disk
+- Attach a disk to a given node
+- Detach a disk from a given node
+- Verify its supported parameters
+
+The proposed ExtStorage interface borrows heavily from the OS
+interface and follows a one-script-per-function approach. An ExtStorage
+provider is expected to provide the following scripts:
+
+- `create`
+- `remove`
+- `grow`
+- `attach`
+- `detach`
+- `verify`
+
+All scripts will be called with no arguments and get their input via
+environment variables. A common set of variables will be exported for
+all commands, and some commands may receive extra ones.
+
+- `VOL_NAME`: The name of the volume. This name is unique inside
+  Ganeti, which uses it to refer to the volume on the external storage.
+- `VOL_SIZE`: The volume's size in mebibytes.
+- `VOL_NEW_SIZE`: Available only to the `grow` script. It declares the
+  new size of the volume after grow (in mebibytes).
+- `EXTP_name`: ExtStorage parameter, where `name` is the parameter's
+  name in upper-case (same as the OS interface's `OSP_*` parameters).
+
+All scripts except `attach` should return 0 on success and non-zero on
+error, accompanied by an appropriate error message on stderr. The
+`attach` script should return a string on stdout on success, which is
+the block device's full path, after it has been successfully attached to
+the host node. On error it should return non-zero.
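As an illustration of these conventions, a minimal `attach` could look like the sketch below. This is not part of the patch; the `/dev/mapper` naming and the helper structure are assumptions for illustration only (a real provider would attach the volume via its appliance's tools):

```python
# Hypothetical sketch of an ExtStorage `attach` script: input arrives
# via environment variables, the attached device's full path goes to
# stdout on success, and errors exit non-zero with a message on stderr.
# The /dev/mapper naming is an illustrative assumption only.
import os
import sys

def attach(env):
    """Return the block device path for the volume named in VOL_NAME."""
    vol_name = env.get("VOL_NAME")
    if not vol_name:
        raise ValueError("VOL_NAME not set")
    # A real provider would map/attach the external volume on this node
    # here and discover the resulting device node; we fabricate a path.
    return "/dev/mapper/%s" % vol_name

def main():
    try:
        sys.stdout.write(attach(os.environ) + "\n")
        return 0
    except ValueError as err:
        sys.stderr.write("attach: %s\n" % err)
        return 1

# A real script would finish with: sys.exit(main())
```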
+
+Implementation
+--------------
+
+To support the ExtStorage interface, we will introduce a new disk
+template called `ext`. This template will implement the existing Ganeti
+disk interface in `lib/bdev.py` (create, remove, attach, assemble,
+shutdown, grow), and will simultaneously pass control to the external
+scripts to actually handle the above actions. The `ext` disk template
+will act as a translation layer between the current Ganeti disk
+interface and the ExtStorage providers.
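To make the translation-layer idea concrete, a rough sketch of how the `ext` template's backend could dispatch to a provider's scripts follows. The function name and calling details are assumptions, not the actual `lib/bdev.py` API:

```python
# Hypothetical sketch of the `ext` template dispatching to an
# ExtStorage provider's per-action script. The provider directory is
# expected to contain executables named create/remove/grow/attach/...
# Names and error handling are illustrative, not Ganeti's real code.
import os
import subprocess

def run_provider_script(provider_dir, action, vol_name, size_mib,
                        extra_env=None):
    """Run one provider script with the documented environment and
    return its stdout; raise RuntimeError on non-zero exit."""
    env = os.environ.copy()
    env["VOL_NAME"] = vol_name
    env["VOL_SIZE"] = str(size_mib)
    if extra_env:
        env.update(extra_env)
    script = os.path.join(provider_dir, action)
    proc = subprocess.Popen([script], env=env,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("%s failed: %s" %
                           (action, err.decode(errors="replace").strip()))
    return out.decode(errors="replace").strip()
```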
+
+We will also introduce a new IDISK_PARAM called `IDISK_PROVIDER =
+provider`, which will be used at the command line to select the desired
+ExtStorage provider. This parameter will be valid only for template
+`ext`, e.g.::
+
+ gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
+
+The ExtStorage interface will allow different disks of the same
+instance to be created by different providers, e.g.::
+
+ gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
+                         --disk=1:size=1G,provider=sample_provider2
+                         --disk=2:size=3G,provider=sample_provider1
This (also in the context of your other design changes) makes me a bit
uneasy, with regards to coordinating changes across multiple providers
in live migration and similar changes (even startup). Have you thought
about this?
I'm not sure I can understand your point completely. Given the
diagnose functionality described above, are you concerned providers
are going to be in inconsistent state among nodes? Is it a matter
of how the allocator decides the target node given different providers?
Ah no, see below.

Can you expand on the "coordinating changes across multiple providers
in live migration and similar changes (even startup)" part of your
question? Perhaps with some examples?
I'll try :)

OK, now it's clear, thanks.

I have a _very slight_ worry on that handling "complex" instances will
become more tricky if the behaviour of different storage providers or
disk templates (this is in the context of the other designs) differs.

For example, let's say we have an instance with first disk DRBD, second
disk ext,provider=p1, third disk ext,provider=p2.

We know we can live migrate an instance across node groups for DRBD, and
we know we can migrate ext providers if they are available in both
groups. But combining all these checks across multiple disks is just 2%
more tricky: we need to move from "disk_template in
constants.DTS_MIRRORED" to something like "do all the instance's disks allow
migration/failover/move from (nodegroup A, nodes [a,b]) to (nodegroup B,
nodes [c,d])" (where A could be equal to B)?

This is doable, just means that a lot of decisions about the instance
behaviour (can be moved, can be live migrated, etc.) will move away from
the instance level (disk_template) and become an aggregate of the
instance's disk capabilities.

Exactly! Wrt the ExtStorage patchset, we won't need to make changes
in the decision making because everything still stays at instance level,
even though we have different providers on different disks. All we have
to do, is make sure all providers are present at the node/nodegroup we
want to migrate/failover/move (I have tested live migrations of instances
with let's say disk0 ext,provider=p1, disk1 ext,provider=p2 without
changing anything in the current allocation logic).

When we introduce Storage Pools and the ability to have different
disks of an instance residing in different Storage Pools, then we will
have to do exactly as you are saying (and is also written in the design
doc). We should move the decision logic from operating at instance
level, to operating at the aggregation of the instance's disks Storage
Pools. At that point, we also don't have a problem with providers,
because providers will be moved from an IDISK_PARAM (which we
need right now as a transitional step) to a parameter of the Storage
Pool. Thus, the decision logic will not need to know anything about
providers, just as it doesn't now.

As you say, we will move from:

"disk_template in constants.DTS_MIRRORED"

to:

"are all the Storage Pools of the instance's disks connected to the
nodegroup we want to migrate/failover/move to".
For Storage Pools of disk template EXT_MIRROR that is all;
for Storage Pools of disk template INT_MIRROR (DRBD),
we will also have to adjust the current code that handles
the secondary node.
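The aggregate check we are discussing could be as simple as the sketch below. The data layout is invented for illustration; the point is only that the per-instance predicate becomes a conjunction over the disks:

```python
# Hypothetical sketch of the future decision logic: instead of testing
# the instance-level disk_template, ask whether every disk's storage
# pool is connected to the target node group. Data layout is invented
# for illustration, not taken from Ganeti's objects.
def instance_can_move(disks, target_group):
    """disks: list of dicts, each with a 'connected_groups' set."""
    return all(target_group in d["connected_groups"] for d in disks)

disks = [
    {"name": "disk0", "connected_groups": {"groupA", "groupB"}},
    {"name": "disk1", "connected_groups": {"groupA", "groupB"}},
    {"name": "disk2", "connected_groups": {"groupA"}},
]
# disk2's pool is not connected to groupB, so a move to groupB fails
# even though disk0 and disk1 would allow it.
```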


Which is all fine, now that I thought it through, just something that
we need to keep in mind.

Sure. It sounds really good that you find it fine, and I think that
with a little more effort in the decision logic (when we move to
Storage Pools), we will end up with a very simple and unified design
that will give even more functionality to Ganeti.

Thanks,
Constantinos


+Finally, the ExtStorage interface will support passing of parameters to
+the ExtStorage provider. This will also be done per disk, from the
+command line::
+
+ gnt-instance add -t ext --disk=0:size=1G,provider=sample_provider1,
+                                  param1=value1,param2=value2
+
+The above parameters will be exported to the ExtStorage provider's
+scripts as the environment variables:
+
+- `EXTP_PARAM1 = str(value1)`
+- `EXTP_PARAM2 = str(value2)`
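For illustration, the mapping from per-disk parameters to the `EXTP_*` variables could be sketched as follows (mirroring the `OSP_*` convention; the helper name is hypothetical):

```python
# Hypothetical sketch: turn per-disk parameters from the command line
# into the EXTP_* environment variables exported to provider scripts,
# upper-casing the name and stringifying the value as described above.
def build_extp_env(params):
    """Map {'param1': 'value1'} to {'EXTP_PARAM1': 'value1'}."""
    return dict(("EXTP_%s" % name.upper(), str(value))
                for name, value in params.items())
```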
+
+We will also introduce a new Ganeti client called `gnt-storage` which
+will be used to diagnose ExtStorage providers and show information about
+them, similarly to the way `gnt-os diagnose` and `gnt-os info` handle OS
+definitions.
Hmm… you got me here (I was hoping to avoid new python-based CLI
front-ends, but it's too early for the switch ;-).
This makes me happy :-)
I understand this may need some porting to Haskell later on.
I hope my Haskell skills will have improved enough by then, for me to
be able to contribute to the effort.
Sounds good!

thanks a lot.

iustin
