On Wed, Aug 22, 2012 at 04:01:50PM +0200, Iustin Pop wrote:
> On Wed, Aug 22, 2012 at 01:15:47PM +0100, Guido Trotter wrote:
> > On Tue, Aug 21, 2012 at 5:13 PM, Iustin Pop <ius...@google.com> wrote:
> > > On Thu, Aug 16, 2012 at 04:02:09PM +0100, Guido Trotter wrote:
> > >> On Thu, Aug 16, 2012 at 3:32 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> > On Thu, Aug 16, 2012 at 03:08:05PM +0100, Guido Trotter wrote:
> > >> >> On Thu, Aug 16, 2012 at 2:44 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> >> > On Thu, Aug 16, 2012 at 02:27:27PM +0100, Guido Trotter wrote:
> > >> >> >> On Thu, Aug 16, 2012 at 2:11 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> >> >> > On Thu, Aug 16, 2012 at 01:37:16PM +0100, Guido Trotter wrote:
> > >> >> >> >> This is a long standing bug in Ganeti. Add a small design on how we plan
> > >> >> >> >> to fix this for Ganeti 2.7.
> > >> >> >> >
> > >> >> >> > Sorry, can you describe the delta from the previous design? Or why did
> > >> >> >> > you resend it?
> > >> >> >>
> > >> >> >> Minor fixes:
> > >> >> >> - top title was wrong
> > >> >> >> - split the original text in "Proposed Changes", moving the
> > >> >> >>   configuration-level changes to its own "Configuration Changes" section.
> > >> >> >> - added moving of storage parameters now in the cluster object to
> > >> >> >>   disk parameters
> > >> >> >>
> > >> >> >> Sorry for not specifying it before
> > >> >> >
> > >> >> > Np, thanks!
> > >> >> >
> > >> >> >> >> diff --git a/doc/design-storagespace.rst b/doc/design-storagespace.rst
> > >> >> >> >> new file mode 100644
> > >> >> >> >> index 0000000..73a2227
> > >> >> >> >> --- /dev/null
> > >> >> >> >> +++ b/doc/design-storagespace.rst
> > >> >> >> >> @@ -0,0 +1,86 @@
> > >> >> >> >> +============================
> > >> >> >> >> +Storage free space reporting
> > >> >> >> >> +============================
> > >> >> >> >> +
> > >> >> >> >> +.. contents:: :depth: 4
> > >> >> >> >> +
> > >> >> >> >> +Background
> > >> >> >> >> +==========
> > >> >> >> >> +
> > >> >> >> >> +Currently, space reporting is broken for all storage types except drbd
> > >> >> >> >> +and lvm (plain). This design looks at the root causes and proposes a way
> > >> >> >> >> +to fix it.
> > >> >> >> >> +
> > >> >> >> >> +Proposed changes
> > >> >> >> >> +================
> > >> >> >> >> +
> > >> >> >> >> +The changes below will streamline Ganeti to properly support
> > >> >> >> >> +interaction with different storage types.
> > >> >> >> >> +
> > >> >> >> >> +Configuration changes
> > >> >> >> >> +---------------------
> > >> >> >> >> +
> > >> >> >> >> +There will be a list of enabled storage types, of which the first entry
> > >> >> >> >> +will be the default one. As such, instances can be created without
> > >> >> >> >> +specifying a storage type, in which case the cluster/nodegroup default
> > >> >> >> >> +will be used.
> > >> >> >> >
> > >> >> >> > Again, this already exists since 2.6. Are you referring to it or to
> > >> >> >> > something else?
> > >> >> >>
> > >> >> >> Again? Anyway, I couldn't find this feature (looking at "master"),
> > >> >> >> could you point me to it, please?
> > >> >> >
> > >> >> > ipolicy["disk-templates"]; and we handle the first one exactly as is
> > >> >> > (the default one used for capacity calculation, etc.). We don't yet
> > >> >> > support the auto-selection for instance creation, but that was the plan.
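
To illustrate the behaviour described just above (the first entry of
ipolicy["disk-templates"] doubling as the default), a minimal sketch in
Python; the ipolicy key is the existing 2.6 one, but the helper and its
error handling are hypothetical:

def default_disk_template(ipolicy, requested=None):
  """Pick a disk template, falling back to the ipolicy default.

  The first entry of ipolicy["disk-templates"] is treated as the default,
  mirroring what the capacity calculation already does; auto-selection at
  instance creation would reuse the same rule.
  """
  allowed = ipolicy["disk-templates"]
  if requested is None:
    return allowed[0]
  if requested not in allowed:
    raise ValueError("disk template '%s' not allowed by policy" % requested)
  return requested

# e.g. a policy allowing drbd and plain, with drbd as the default:
#   default_disk_template({"disk-templates": ["drbd", "plain"]}) -> "drbd"
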
> > >> >>
> > >> >> Ok, no need to do this change then. Perhaps just the fixing of
> > >> >> auto-use and moving the cluster-level parameters that are
> > >> >> storage-specific to disk parameters.
> > >> >
> > >> > Ack.
> > >> >
> > >> >> qq: why is it an "ipolicy" rather than a cluster-level parameter?
> > >> >
> > >> > I wasn't sure/still not entirely if we need two things (both mechanism
> > >> > and policy), or it's enough to have a single one that handles both.
> > >> >
> > >> > For example, let's say that the policy is to allow only 'drbd' on the
> > >> > cluster, but you want existing 'plain' instances to still exist (just no
> > >> > new ones to be created). If we have the ipolicy only, then it's not
> > >> > possible, so this points more to having two separate things.
> > >> >
> > >> > So the design should maybe refer to the ipolicy and mirror it.
> > >> >
> > >> >> qq2: should in this view also the list of allowed hypervisors be an
> > >> >> ipolicy too?
> > >> >
> > >> > No, per the above it's not good. On the other hand, having N parameters
> > >> > for all the "things" we can enable and then duplicating it in the policy
> > >> > is not too good either.
> > >> >
> > >> > Hmm…
> > >>
> > >> I'm not sure we need the "leave alone the current plain instances but
> > >> don't allow any more to be created" feature.
> > >
> > > We do already :(
> > >
> > > There is a difference between mechanism and policy, and we should have a
> > > way to alter both.
> >
> > Ack, OK.
> >
> > >> And it would be nice to handle enabled hypervisors and storage spaces
> > >> the same way, and without much duplication.
> > >> Any suggestion? (ok, if it's too complex we can leave this alone,
> > >> for 2.7, and "just" fix free space reporting)
> > >
> > > For the first round, let's do that.
> >
> > Ack to this too.
> >
> > So, finally, should we keep a list of possible "pools" (in the policy)
> > and one in the mechanisms, per storage type?
>
> Indeed. The ones in mechanisms that are not in policy can be selected
> via --ignore-policy or such.
>
> > >> >> >> >> +For new clusters by default we'll enable just lvm and drbd, but it will
> > >> >> >> >> +be possible to select different ones at cluster init or modify time. At
> > >> >> >> >> +upgrade time we'll use as enabled the union of all storage types
> > >> >> >> >> +currently in use by instances, and the "most used" one as the default.
> > >> >> >> >> +(Clusters with no instances will still get lvm and drbd only, and if two
> > >> >> >> >> +types are tied one of them will be chosen.)
> > >> >> >> >> +
> > >> >> >> >> +Parameters which are now cluster config values (e.g. volume group name,
> > >> >> >> >> +file storage dir path) will be moved to disk parameters for the
> > >> >> >> >> +respective storage types.
> > >> >> >> >> +
> > >> >> >> >> +Enabling/disabling of storage types at ``./configure`` time will be
> > >> >> >> >> +removed.
> > >> >> >> >> +
> > >> >> >> >> +
> > >> >> >> >> +RPC changes
> > >> >> >> >> +-----------
> > >> >> >> >> +
> > >> >> >> >> +The RPC that reports the node storage space will be changed to accept a
> > >> >> >> >> +list of storage methods. It will then report free space for each of
> > >> >> >> >> +them. Note that storage methods at this level will be different than
> > >> >> >> >> +storage types: for example both drbd and plain share the same storage
> > >> >> >> >> +space, so the storage method here will be lvm:<vgname>. For files and
> > >> >> >> >> +sharedfiles the storage method will be filesystem:<dirname>. Finally,
> > >> >> >> >> +other storage methods will be their own domain-specific name/value.
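
To make the RPC paragraph above more concrete, a rough sketch of the
node-side answer: the node receives (method, key) pairs and reports free
space for each. Function names and the exact commands are illustrative
only, not the actual noded code:

import os
import subprocess

def _lvm_free(vgname):
  # "vgs" reports the free space of a volume group; units forced to MiB
  out = subprocess.check_output(
      ["vgs", "--noheadings", "--units", "m", "-o", "vg_free", vgname],
      universal_newlines=True)
  return float(out.strip().rstrip("m"))

def _fs_free(path):
  # statvfs gives the free blocks of the filesystem backing the directory
  st = os.statvfs(path)
  return st.f_bavail * st.f_frsize / (1024.0 * 1024.0)

def node_free_space(queries):
  """Answer a list of (method, key) queries, e.g.
  [("lvm", "xenvg"), ("filesystem", "/srv/ganeti/file-storage")]."""
  result = {}
  for method, key in queries:
    if method == "lvm":
      result[(method, key)] = _lvm_free(key)
    elif method == "filesystem":
      result[(method, key)] = _fs_free(key)
    else:
      result[(method, key)] = None  # rbd etc. would query their own pools
  return result
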
> > >> >> >> >
> > >> >> >> > This is very tricky. First, sharedfiles ≠ files, so they can't be
> > >> >> >> > combined. That aside, the rules of which storages are combined and which
> > >> >> >> > are not seem to be hardcoded.
> > >> >> >> >
> > >> >> >> > I wonder if it wouldn't be better for ganeti to report something like:
> > >> >> >> >
> > >> >> >> > free space = {
> > >> >> >> >   (drbd, plain): [(xenvg, 500), (vg2, 100)],  # per node
> > >> >> >> >   (file): [(default, 500)],                   # per node
> > >> >> >> >   (rbd): [(pool1, 100), (pool2, 200)],        # global per cluster
> > >> >> >> >   (sharedfile): [(default, 100)],  # this is global per cluster, but not
> > >> >> >> >                                    # explained here
> > >> >> >> >   ...
> > >> >> >> > }
> > >> >> >> >
> > >> >> >> > I.e. describe in the actual data structures which storage types are
> > >> >> >> > combined, instead of making this information "hidden". We still don't
> > >> >> >> > have a way to describe which are shared and which are not, but that we
> > >> >> >> > already hardcode.
> > >> >> >>
> > >> >> >> Yes, I agree the mapping between which are shared and which are not
> > >> >> >> must be explicit and not implicit.
> > >> >> >> Although I hadn't specified it here, that was the plan.
> > >> >> >> For file/sharedfile what I meant is that what's common is the "way of
> > >> >> >> fetching the free space from a directory", not the directory itself nor
> > >> >> >> the space.
> > >> >> >
> > >> >> > Ack.
> > >> >> >
> > >> >> >> The point was more that from the backend (noded) point of view
> > >> >> >> fetching free space would be "<mechanism>:<key>" (e.g. lvm:xenvg) and
> > >> >> >> there would be a mapping between storage types and mechanisms: "plain,
> > >> >> >> drbd -> lvm", "file, sharedfile -> filesystem".
> > >> >> >
> > >> >> > Where would the mapping live? From the point of view of noded, it should
> > >> >> > be very dumb, so it shouldn't make this kind of mapping itself.
> > >> >>
> > >> >> No, the mapping would live in masterd; it's a transformation needed
> > >> >> before performing that query. The node just receives the "method:key"
> > >> >> request and replies with the storage space it found for that
> > >> >> method,key.
> > >> >
> > >> > But this is the thing I'm not sure it's good to do. Noded should only
> > >> > care about lvm, file, rbd, and not how these map onto "methods"; the
> > >> > same way as noded doesn't know anything about disk templates, only about
> > >> > logical disk types.
> > >>
> > >> But that's what I said: noded will just know how to query the free
> > >> space for a particular method and it will be masterd's job to ask for
> > >> the right method depending on the storage type.
> > >> (lvm, file (or filesystem) and rbd_pool are the methods, while the
> > >> mapping done in masterd is: "plain and drbd map onto lvm", "file and
> > >> sharedfile map onto filesystem", etc.)
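
A sketch of the masterd-side mapping just described: storage types map
onto methods, and masterd combines the enabled types with the configured
pools to build the "method:key" queries sent to noded. The names and the
pools argument are illustrative, not the actual Ganeti code:

# Explicit storage type -> method mapping, kept on the master side; noded
# only ever sees the resulting (method, key) pairs.
TYPE_TO_METHOD = {
  "plain": "lvm",
  "drbd": "lvm",
  "file": "filesystem",
  "sharedfile": "filesystem",
  "rbd": "rbd_pool",
}

def build_queries(enabled_types, pools):
  """Turn enabled storage types into (method, key) pairs for noded.

  pools maps a method to its configured keys, e.g.
  {"lvm": ["xenvg", "vg2"], "filesystem": ["/srv/ganeti/file-storage"]}.
  Duplicates are dropped, so plain and drbd produce a single lvm query per
  volume group, since they share the same space.
  """
  queries = set()
  for stype in enabled_types:
    method = TYPE_TO_METHOD[stype]
    for key in pools.get(method, []):
      queries.add((method, key))
  return sorted(queries)

# build_queries(["plain", "drbd", "file"],
#               {"lvm": ["xenvg"], "filesystem": ["/srv/ganeti/file-storage"]})
#   -> [("filesystem", "/srv/ganeti/file-storage"), ("lvm", "xenvg")]
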
> > >> I believe we do agree on this, but then maybe I didn't understand your
> > >> objection correctly: is the above what you're looking for, or you'd
> > >> like something different? In which case, can you go a bit more into
> > >> detail about what?
> > >
> > > I would like the node not to have to know about methods. The node
> > > already knows about disk types, and the master can itself make the
> > > connection from disk type to method, so no need for the node to know
> > > methods, similar to how it doesn't know disk templates but knows disk
> > > types.
> > >
> > > Now, if the argument is that the node has no concept yet of free disk
> > > for anything except lvm, that's somewhat true; we have the node storage
> > > framework, maybe we should just extend that and use it.
> >
> > The point here is that the disk type is at a higher level than the
> > method: you can know about the method and not the disk type, but not
> > vice versa.
> > Let me explain:
> >
> > - if I tell the node "give me the lvm free space on vg xenvg" the
> >   node can run a command and find out
> > - if I tell the node "give me the free space for storage type drbd" the
> >   node has to find out that drbd uses lvm volume group xenvg (somehow)
> >   and then run the same command above (thus knowing about the method)
> >
> > Same for files, rbd, etc. So there's no way the node can "not know"
> > about the method, and it's not useful for the master to connect them
> > to the disk type, somehow, it seems to me (if I'm still not clear we
> > can have a quick in-person/over-vc chat about this, perhaps)
>
> OK, this was all a misunderstanding of what mechanism means, sorry for
> that.
>
> The first one is what we want to do, indeed.
>
> > >> >> ACK, makes sense (at the masterd level, again; we'll keep noded dumb).
> > >> >> So drbd would be a transform and use lvm as backing store? In which
> > >> >> case, if this is declared, it becomes easy to query lvm when we want
> > >> >> the information about drbd.
> > >> >> How do we handle in this sense multiple volume groups?
> > >> >
> > >> > All storage types will have a list of "pools"; for files, it's always a
> > >> > list of length one, for drbd/lvm it's how many VGs we have enabled, and
> > >> > for rados it's similar - the pools that we have enabled.
> > >>
> > >> Ack. Should we have a better way to specify the pool at instance create
> > >> time?
> > >> (currently we have vg and metavg, but this doesn't address choosing a
> > >> separate rados pool, for example).
> > >>
> > >> >> (In my view the drbd storage method would list allowed vgs, as would
> > >> >> the plain one, but in this view only plain would, and drbd would need
> > >> >> to just "use" those).
> > >> >> (which is ok, although it doesn't leave the flexibility of saying "use
> > >> >> this vg for plain only, and this vg for drbd") (not that we have that
> > >> >> now, but we could get it for free if we went the other way).
> > >> >
> > >> > ack, good points.
> > >>
> > >> So we should go with treating drbd and plain as "equal" with the
> > >> "allowed vgs" in the disk parameters of each. Ack.
> > >> In which case drbd can also have "allowed meta vgs" as those might be
> > >> different, I suppose.
> > >
> > > Yes.
> > >
> > >> (somehow this goes to: all methods have one pool, except for drbd
> > >> which has a pool and a metapool... mmm... ) :(
> > >
> > > Well, life is complicated? :P
> >
> > Right. :)
>
> :)
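
A configuration sketch of the per-method pool lists and per-template disk
parameters discussed in the exchange above; every key name here is
hypothetical, chosen only to illustrate that drbd is the odd one out in
needing both a pool and a metapool:

# Hypothetical cluster-level pool lists, one per storage method.
enabled_pools = {
  "lvm": ["xenvg", "vg2"],                     # shared by plain and drbd
  "filesystem": ["/srv/ganeti/file-storage"],  # always a single entry
  "rbd_pool": ["pool1", "pool2"],
}

# Hypothetical per-template disk parameters drawing from those pools.
diskparams = {
  "plain": {"allowed-vgs": ["xenvg", "vg2"]},
  "drbd": {
    "allowed-vgs": ["xenvg", "vg2"],
    "allowed-meta-vgs": ["xenvg"],  # the extra (meta)pool choice
  },
  "rbd": {"allowed-pools": ["pool1"]},
}
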
> > > I'm wondering if this is not a sign that our framework is a bit broken.
> > > In the sense that for (e.g.) disk sizes, we have min, max, std
> > > (=default), but for metavg we only have default. So in this sense, we
> > > would actually need, for each parameter, no matter whether it's a disk
> > > parameter or a node parameter: default and allowed values.
> >
> > Well, we have min=max=default, as well. But on the other hand we don't
> > track metavgs by policy at all, right?
> > Does it belong there? And is this related to disk size reporting?
> > (finding out about the correct vg is, but this might not be...)
> >
> > > Which brings another point: the only reason we protect some values is
> > > because some operations take input from "less-trusted" entities (e.g.
> > > disk sizes), but we expect that node parameters are from "more-trusted"
> > > ones.
> > >
> > > Argh, quagmire. Let's just do:
> > >
> > > - we need to have a list of configured VGs per cluster, instead of "one
> > >   VG", also configured Rados pools (each storage method, basically)
> >
> > Yes.
> >
> > > - all verification, queries, etc., should only talk to those
> > > - drbd and meta volume can choose any of those, but the defaults will be
> > >   most likely used
> >
> > Yes.
> >
> > > We will see later if we find a good framework for defining configured
> > > "stuff" for storage methods; right now it's just the pool, basically.
> >
> > Ack. So the simplification is: we don't distinguish between metapool
> > and pool, drbd can use both interchangeably and it's up to the user to
> > do the right thing. Right?
>
> Yep.
>
> Thanks!
And by thanks, I mean LGTM, but I'd like a resend first :)

thanks,
iustin