On Thu, Sep 27, 2012 at 02:46:40PM +0300, Constantinos Venetsanopoulos wrote:
> On 09/27/2012 02:02 PM, Iustin Pop wrote:
> >On Thu, Sep 27, 2012 at 12:37:41PM +0300, Constantinos Venetsanopoulos wrote:
> >>On 09/26/2012 07:21 PM, Iustin Pop wrote:
> >>>On Wed, Sep 26, 2012 at 05:38:17PM +0300, Constantinos Venetsanopoulos
> >>>wrote:
> >>>>Update the shared storage design document to reflect the current
> >>>>changes, after the implementation of the ExtStorage interface.
> >>>>
> >>>>Signed-off-by: Constantinos Venetsanopoulos <[email protected]>
> >>>>---
> >>>> doc/design-shared-storage.rst | 204
> >>>> ++++++++++++++++++++++------------------
> >>>> 1 files changed, 112 insertions(+), 92 deletions(-)
> >>>>
> >>>>diff --git a/doc/design-shared-storage.rst b/doc/design-shared-storage.rst
> >>>>index c175476..7080182 100644
> >>>>--- a/doc/design-shared-storage.rst
> >>>>+++ b/doc/design-shared-storage.rst
> >>>>@@ -64,15 +64,11 @@ The design addresses the following procedures:
> >>>>   filesystems.
> >>>> - Introduction of shared block device disk template with device
> >>>>   adoption.
> >>>>+- Introduction of an External Storage Interface.
> >>>>
> >>>> Additionally, mid- to long-term goals include:
> >>>>
> >>>> - Support for external “storage pools”.
> >>>>-- Introduction of an interface for communicating with external scripts,
> >>>>-  providing methods for the various stages of a block device's and
> >>>>-  instance's life-cycle. In order to provide storage provisioning
> >>>>-  capabilities for various SAN appliances, external helpers in the form
> >>>>-  of a “storage driver” will be possibly introduced as well.
> >>>>
> >>>> Refactoring of all code referring to constants.DTS_NET_MIRROR
> >>>> =============================================================
> >>>>
> >>>>@@ -159,6 +155,104 @@ The shared block device template will make the
> >>>>following assumptions:
> >>>>
> >>>> - The device will be available with the same path under all nodes in the
> >>>>   node group.
> >>>>+Introduction of an External Storage Interface
> >>>>+==============================================
> >>>>+
> >>>>+Overview
> >>>>+--------
> >>>>+
> >>>>+To extend the shared block storage template and give Ganeti the ability
> >>>>+to control and manipulate external storage (provisioning, removal,
> >>>>+growing, etc.) we need a more generic approach. The generic method for
> >>>>+supporting external shared storage in Ganeti will be to have an
> >>>>+ExtStorage provider for each external shared storage hardware type. The
> >>>>+ExtStorage provider will be a set of files (executable scripts and text
> >>>>+files), contained inside a directory which will be named after the
> >>>>+provider. This directory must be present across all nodes of a nodegroup
> >>>>+(Ganeti doesn't replicate it), in order for the provider to be usable by
> >>>>+Ganeti for this nodegroup (valid).
> >>>How will Ganeti behave if they are not consistent? Report errors? (in
> >>>cluster verify?) Ignore the provider? Etc.
> >>The ExtStorage code follows exactly the behavior of the code
> >>handling OS definitions: it produces appropriate error messages
> >>and also comes with `gnt-storage {diagnose, info}' similarly to
> >>`gnt-os {diagnose, info}'.
> >>
> >>There is only one difference compared to the way OS defs are
> >>handled:
> >>
> >>The ExtStorage diagnose code calculates the validity of each provider
> >>for each nodegroup in the cmdlib logic rather than in the client.
> >>This was marked as 'TODO' inside cmdlib for OS diagnose.
> >>
> >>This gives you the flexibility to do neat things easily, such as running
> >>the LU from inside cluster verify and producing validity statuses
> >>for each provider-nodegroup combination. So, presumably this can also
> >>be used inside `gnt-cluster verify' in the future.
> >Sounds very good, thanks!
> >
> >>>>+The external shared storage hardware
> >>>>+should also be accessible by all nodes of this nodegroup.
> >>>>+
> >>>>+An “ExtStorage provider” will have to provide the following methods:
> >>>>+
> >>>>+- Create a disk
> >>>>+- Remove a disk
> >>>>+- Grow a disk
> >>>>+- Attach a disk to a given node
> >>>>+- Detach a disk from a given node
> >>>>+- Verify its supported parameters
> >>>>+
> >>>>+The proposed ExtStorage interface borrows heavily from the OS
> >>>>+interface and follows a one-script-per-function approach. An ExtStorage
> >>>>+provider is expected to provide the following scripts:
> >>>>+
> >>>>+- `create`
> >>>>+- `remove`
> >>>>+- `grow`
> >>>>+- `attach`
> >>>>+- `detach`
> >>>>+- `verify`
> >>>>+
> >>>>+All scripts will be called with no arguments and get their input via
> >>>>+environment variables. A common set of variables will be exported for
> >>>>+all commands, and some commands may have extra ones.
> >>>>+
> >>>>+- `VOL_NAME`: The name of the volume. This is unique to Ganeti, which
> >>>>+  uses it to refer to a specific volume inside the external storage.
> >>>>+- `VOL_SIZE`: The volume's size in mebibytes.
> >>>>+- `VOL_NEW_SIZE`: Available only to the `grow` script. It declares the
> >>>>+  new size of the volume after grow (in mebibytes).
> >>>>+- `EXTP_name`: ExtStorage parameter, where `name` is the parameter in
> >>>>+  upper-case (same as the OS interface's `OSP_*` parameters).
> >>>>+
> >>>>+All scripts except `attach` should return 0 on success and non-zero on
> >>>>+error, accompanied by an appropriate error message on stderr. The
> >>>>+`attach` script should, on success, print a string on stdout: the
> >>>>+block device's full path after it has been successfully attached to
> >>>>+the host node. On error it should return non-zero.
> >>>>+
> >>>>+Implementation
> >>>>+--------------
> >>>>+
> >>>>+To support the ExtStorage interface, we will introduce a new disk
> >>>>+template called `ext`.
> >>>>+This template will implement the existing Ganeti
> >>>>+disk interface in `lib/bdev.py` (create, remove, attach, assemble,
> >>>>+shutdown, grow), and will simultaneously pass control to the external
> >>>>+scripts to actually handle the above actions. The `ext` disk template
> >>>>+will act as a translation layer between the current Ganeti disk
> >>>>+interface and the ExtStorage providers.
> >>>>+
> >>>>+We will also introduce a new IDISK_PARAM called `IDISK_PROVIDER =
> >>>>+provider`, which will be used at the command line to select the desired
> >>>>+ExtStorage provider. This parameter will be valid only for the `ext`
> >>>>+template, e.g.::
> >>>>+
> >>>>+  gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
> >>>>+
> >>>>+The ExtStorage interface will allow different disks to be created by
> >>>>+different providers, e.g.::
> >>>>+
> >>>>+  gnt-instance add -t ext --disk=0:size=2G,provider=sample_provider1
> >>>>+  --disk=1:size=1G,provider=sample_provider2
> >>>>+  --disk=2:size=3G,provider=sample_provider1
> >>>This (also in the context of your other design changes) makes me a bit
> >>>uneasy, with regards to coordinating changes across multiple providers
> >>>in live migration and similar changes (even startup). Have you thought
> >>>about this?
> >>I'm not sure I can understand your point completely. Given the
> >>diagnose functionality described above, are you concerned providers
> >>are going to be in an inconsistent state among nodes? Is it a matter
> >>of how the allocator decides the target node given different providers?
> >Ah no, see below.
> >
> >>Can you expand on the "coordinating changes across multiple providers
> >>in live migration and similar changes (even startup)" part of your
> >>question? Perhaps with some examples?
> >I'll try :)
>
> OK, now it's clear, thanks.
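For illustration, the calling convention quoted above (scripts taking no arguments, all input passed through environment variables) could look roughly like the following from the `ext` template's side. This is only a sketch: the helper name, its signature, and the provider-directory layout are assumptions, not the actual `lib/bdev.py` code.

```python
# Sketch of how the `ext` template might invoke one ExtStorage provider
# script (create/remove/grow/attach/detach/verify), following the
# convention described in the design: no arguments, input via env vars.
# Hypothetical helper, not actual Ganeti code.
import os
import subprocess

def run_provider_script(provider_dir, action, vol_name, vol_size_mib,
                        ext_params=None, new_size_mib=None):
    """Run one provider script and return its stdout (stripped)."""
    env = dict(os.environ)
    env["VOL_NAME"] = vol_name
    env["VOL_SIZE"] = str(vol_size_mib)
    if new_size_mib is not None:
        # VOL_NEW_SIZE is only meaningful for the `grow` script
        env["VOL_NEW_SIZE"] = str(new_size_mib)
    for name, value in (ext_params or {}).items():
        # ExtStorage parameters are exported as EXTP_<NAME>
        env["EXTP_%s" % name.upper()] = str(value)
    proc = subprocess.run([os.path.join(provider_dir, action)],
                          env=env, capture_output=True, text=True)
    if proc.returncode != 0:
        # all scripts report errors via non-zero exit plus stderr
        raise RuntimeError("%s failed: %s" % (action, proc.stderr.strip()))
    # for `attach`, stdout is the block device's full path on the node
    return proc.stdout.strip()
```

A provider's `attach` script would then simply print the device path it attached and exit 0, and the helper above would hand that path back to the disk layer.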
>
> >I have a _very slight_ worry that handling "complex" instances will
> >become more tricky if the behaviour of different storage providers or
> >disk templates (this is in the context of the other designs) differs.
> >
> >For example, let's say we have an instance with first disk DRBD, second
> >disk ext,provider=p1, third disk ext,provider=p2.
> >
> >We know we can live migrate an instance across node groups for DRBD, and
> >we know we can migrate ext providers if they are available in both
> >groups. But combining all these checks across multiple disks is just 2%
> >more tricky: we need to move from "disk_template in
> >constants.DTS_MIRRORED" to something like "do all instance disks allow
> >migration/failover/move from (nodegroup A, nodes [a,b]) to (nodegroup B,
> >nodes [c,d])" (where A could be equal to B)?
> >
> >This is doable, it just means that a lot of decisions about the instance's
> >behaviour (can it be moved, can it be live migrated, etc.) will move away
> >from the instance level (disk_template) and become an aggregate of the
> >instance's disk capabilities.
>
> Exactly! Wrt the ExtStorage patchset, we won't need to make changes
> in the decision making because everything still stays at instance level,
> even though we have different providers on different disks. All we have
> to do is make sure all providers are present on the node/nodegroup we
> want to migrate/failover/move to (I have tested live migrations of
> instances with, let's say, disk0 ext,provider=p1, disk1 ext,provider=p2
> without changing anything in the current allocation logic).
>
> When we introduce Storage Pools and the ability to have different
> disks of an instance residing in different Storage Pools, then we will
> have to do exactly as you are saying (and as is also written in the
> design doc).
Hah, sorry, I didn't read those designs except very briefly :)

> We should move the decision logic from operating at the instance
> level to operating on the aggregation of the instance's disks' Storage
> Pools. At that point, we also don't have a problem with providers,
> because providers will be moved from an IDISK_PARAM (which we
> need right now as a transition step) to a parameter of the Storage
> Pool. Thus, the decision logic will not need to know anything about
> providers, just as it doesn't need to know now.
>
> As you say, we will move from:
>
> "disk_template in constants.DTS_MIRRORED"
>
> to:
>
> "are all the instance's disks' Storage Pools connected to the
> nodegroup we want to migrate/failover/move to".
>
> For Storage Pools of disk template EXT_MIRROR that's all;
> for Storage Pools of disk template INT_MIRROR (DRBD)
> we will have to adjust the current code that handles the
> secondary node.

Sounds good.

> >Which is all fine, now that I thought it through, just something that
> >we need to keep in mind.
>
> Sure. It sounds really good that you find it fine, and I think that with
> a little more effort in the decision logic (when we move to Storage
> Pools), we will end up with a very simple and unified design that will
> give even more functionality to Ganeti.

Indeed.

thanks,
iustin
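The aggregate per-disk check discussed above (replacing the single instance-level disk_template test with "do all disks allow this move") could be sketched as follows. All names here are hypothetical stand-ins for the eventual Ganeti logic, not actual code from the patchset.

```python
# Sketch of the per-disk aggregation discussed in the thread: instead
# of one instance-level disk_template check, ask every disk whether it
# allows the migration/failover/move. Hypothetical names, not actual
# Ganeti code.

def disk_allows_migration(disk, target_group, providers_by_group):
    """Can one disk follow the instance to target_group?"""
    if disk["template"] == "drbd":
        # DRBD disks can migrate (secondary-node handling aside)
        return True
    if disk["template"] == "ext":
        # an ext disk can move only if its provider is valid
        # (present and consistent) in the target nodegroup
        return disk["provider"] in providers_by_group.get(target_group, set())
    return False

def instance_allows_migration(disks, target_group, providers_by_group):
    """Aggregate of the instance's disk capabilities."""
    return all(disk_allows_migration(d, target_group, providers_by_group)
               for d in disks)
```

With Storage Pools, `providers_by_group` would be derived from the pools connected to the nodegroup rather than from a per-disk parameter, but the aggregation shape stays the same.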
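As a footnote to the diagnose discussion earlier in the thread (validity computed per provider per nodegroup, in cmdlib rather than in the client), that calculation could be sketched like this. All names and the report shape are assumptions for illustration, not the actual cmdlib logic.

```python
# Sketch of per-nodegroup ExtStorage provider validity: a provider is
# valid for a nodegroup only if every node in the group has the
# provider's directory with the complete set of scripts. Hypothetical
# names, not actual Ganeti cmdlib code.

REQUIRED_FILES = {"create", "remove", "grow", "attach", "detach", "verify"}

def diagnose_provider(node_reports):
    """Return {nodegroup: (valid, message)} for one provider.

    node_reports maps nodegroup -> {node: set of files found in the
    provider's directory, or None if the directory is missing}.
    """
    result = {}
    for group, nodes in node_reports.items():
        missing = sorted(n for n, files in nodes.items() if files is None)
        if missing:
            result[group] = (False, "missing on nodes: %s" % ", ".join(missing))
            continue
        incomplete = sorted(n for n, files in nodes.items()
                            if not REQUIRED_FILES <= files)
        if incomplete:
            result[group] = (False,
                             "incomplete on nodes: %s" % ", ".join(incomplete))
        else:
            result[group] = (True, "valid")
    return result
```

Computing this per provider-nodegroup pair is what would let the same LU be reused from `gnt-storage diagnose` and, later, `gnt-cluster verify`.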
