On Wed, Aug 22, 2012 at 04:01:50PM +0200, Iustin Pop wrote:
> On Wed, Aug 22, 2012 at 01:15:47PM +0100, Guido Trotter wrote:
> > On Tue, Aug 21, 2012 at 5:13 PM, Iustin Pop <ius...@google.com> wrote:
> > > On Thu, Aug 16, 2012 at 04:02:09PM +0100, Guido Trotter wrote:
> > >> On Thu, Aug 16, 2012 at 3:32 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> > On Thu, Aug 16, 2012 at 03:08:05PM +0100, Guido Trotter wrote:
> > >> >> On Thu, Aug 16, 2012 at 2:44 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> >> > On Thu, Aug 16, 2012 at 02:27:27PM +0100, Guido Trotter wrote:
> > >> >> >> On Thu, Aug 16, 2012 at 2:11 PM, Iustin Pop <iu...@k1024.org> wrote:
> > >> >> >> > On Thu, Aug 16, 2012 at 01:37:16PM +0100, Guido Trotter wrote:
> > >> >> >> >> This is a long standing bug in Ganeti. Add a small design on how we plan
> > >> >> >> >> to fix this for Ganeti 2.7.
> > >> >> >> >
> > >> >> >> > Sorry, can you describe the delta from the previous design? Or why did
> > >> >> >> > you resend it?
> > >> >> >>
> > >> >> >> Minor fixes:
> > >> >> >> - top title was wrong
> > >> >> >> - split the original text in "Proposed Changes", moving the
> > >> >> >>   configuration-level changes to its own "Configuration Changes" section.
> > >> >> >> - added moving of storage parameters now in the cluster object to
> > >> >> >>   disk parameters
> > >> >> >>
> > >> >> >> Sorry for not specifying it before
> > >> >> >
> > >> >> > Np, thanks!
> > >> >> >
> > >> >> >> >> diff --git a/doc/design-storagespace.rst b/doc/design-storagespace.rst
> > >> >> >> >> new file mode 100644
> > >> >> >> >> index 0000000..73a2227
> > >> >> >> >> --- /dev/null
> > >> >> >> >> +++ b/doc/design-storagespace.rst
> > >> >> >> >> @@ -0,0 +1,86 @@
> > >> >> >> >> +============================
> > >> >> >> >> +Storage free space reporting
> > >> >> >> >> +============================
> > >> >> >> >> +
> > >> >> >> >> +.. contents:: :depth: 4
> > >> >> >> >> +
> > >> >> >> >> +Background
> > >> >> >> >> +==========
> > >> >> >> >> +
> > >> >> >> >> +Currently, space reporting is broken for all storage types except drbd
> > >> >> >> >> +and lvm (plain). This design looks at the root causes and proposes a way
> > >> >> >> >> +to fix it.
> > >> >> >> >> +
> > >> >> >> >> +Proposed changes
> > >> >> >> >> +================
> > >> >> >> >> +
> > >> >> >> >> +The changes below will streamline Ganeti to properly support
> > >> >> >> >> +interaction with different storage types.
> > >> >> >> >> +
> > >> >> >> >> +Configuration changes
> > >> >> >> >> +---------------------
> > >> >> >> >> +
> > >> >> >> >> +There will be a list of enabled storage types, of which the first entry
> > >> >> >> >> +will be the default one. As such, instances can be created without
> > >> >> >> >> +specifying a storage type, in which case the cluster/nodegroup default
> > >> >> >> >> +will be used.
> > >> >> >> >
> > >> >> >> > Again, this already exists since 2.6. Are you referring to it or to
> > >> >> >> > something else?
> > >> >> >>
> > >> >> >> Again? Anyway, I couldn't find this feature (looking at "master"),
> > >> >> >> could you point me to it, please?
> > >> >> >
> > >> >> > ipolicy["disk-templates"]; and we handle the first one exactly as is
> > >> >> > (the default one used for capacity calculation, etc.). We don't yet
> > >> >> > support the auto-selection for instance creation, but that was the plan.
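
To illustrate the behaviour described just above (the first entry of
ipolicy["disk-templates"] doubling as the default), a minimal sketch in
Python; the ipolicy key is the existing 2.6 one, but the helper and its
error handling are hypothetical:

def default_disk_template(ipolicy, requested=None):
  """Pick a disk template, falling back to the ipolicy default.

  The first entry of ipolicy["disk-templates"] is treated as the default,
  mirroring what the capacity calculation already does; auto-selection at
  instance creation would reuse the same rule.
  """
  allowed = ipolicy["disk-templates"]
  if requested is None:
    return allowed[0]
  if requested not in allowed:
    raise ValueError("disk template '%s' not allowed by policy" % requested)
  return requested

# e.g. a policy allowing drbd and plain, with drbd as the default:
#   default_disk_template({"disk-templates": ["drbd", "plain"]}) -> "drbd"
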
> > >> >>
> > >> >> Ok, no need to do this change then. Perhaps just the fixing of
> > >> >> auto-use and moving the cluster-level parameters that are
> > >> >> storage-specific to disk parameters.
> > >> >
> > >> > Ack.
> > >> >
> > >> >> qq: why is it an "ipolicy" rather than a cluster-level parameter?
> > >> >
> > >> > I wasn't sure/still not entirely if we need two things (both mechanism
> > >> > and policy), or it's enough to have a single one that handles both.
> > >> >
> > >> > For example, let's say that the policy is to allow only 'drbd' on the
> > >> > cluster, but you want existing 'plain' instances to still exist (just no
> > >> > new ones to be created). If we have the ipolicy only, then it's not
> > >> > possible, so this points more to having two separate things.
> > >> >
> > >> > So the design should maybe refer to the ipolicy and mirror it.
> > >> >
> > >> >> qq2: should in this view also the list of allowed hypervisors be an
> > >> >> ipolicy too?
> > >> >
> > >> > No, per the above it's not good. On the other hand, having N parameters
> > >> > for all the "things" we can enable and then duplicating it in the policy
> > >> > is not too good either.
> > >> >
> > >> > Hmm…
> > >>
> > >> I'm not sure we need the "leave alone the current plain instances but
> > >> don't allow any more to be created" feature.
> > >
> > > We do already :(
> > >
> > > There is a difference between mechanism and policy, and we should have a
> > > way to alter both.
> >
> > Ack, OK.
> >
> > >> And it would be nice to handle enabled hypervisors and storage spaces
> > >> the same way, and without much duplication.
> > >> Any suggestion? (ok, if it's too complex we can leave this alone,
> > >> for 2.7, and "just" fix free space reporting)
> > >
> > > For the first round, let's do that.
> >
> > Ack to this too.
> >
> > So, finally, should we keep a list of possible "pools" (in the policy)
> > and one in the mechanisms, per storage type?
>
> Indeed. The ones in mechanisms that are not in policy can be selected
> via --ignore-policy or such.
>
> > >> >> >> >> +For new clusters by default we'll enable just lvm and drbd, but it will
> > >> >> >> >> +be possible to select different ones at cluster init or modify time. At
> > >> >> >> >> +upgrade time we'll use as enabled the union of all storage types
> > >> >> >> >> +currently in use by instances, and the "most used" one as the default.
> > >> >> >> >> +(Clusters with no instances will still get lvm and drbd only, and if two
> > >> >> >> >> +types are tied one of them will be chosen.)
> > >> >> >> >> +
> > >> >> >> >> +Parameters which are now cluster config values (e.g. volume group name,
> > >> >> >> >> +file storage dir path) will be moved to disk parameters for the
> > >> >> >> >> +respective storage types.
> > >> >> >> >> +
> > >> >> >> >> +Enabling/disabling of storage types at ``./configure`` time will be
> > >> >> >> >> +removed.
> > >> >> >> >> +
> > >> >> >> >> +
> > >> >> >> >> +RPC changes
> > >> >> >> >> +-----------
> > >> >> >> >> +
> > >> >> >> >> +The RPC that reports the node storage space will be changed to accept a
> > >> >> >> >> +list of storage methods. It will then report free space for each of
> > >> >> >> >> +them. Note that storage methods at this level will be different than
> > >> >> >> >> +storage types: for example both drbd and plain share the same storage
> > >> >> >> >> +space, so the storage method here will be lvm:<vgname>. For files and
> > >> >> >> >> +sharedfiles the storage method will be filesystem:<dirname>. Finally,
> > >> >> >> >> +other storage methods will be their own domain-specific name/value.
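
To make the RPC paragraph above more concrete, a rough sketch of the
node-side answer: the node receives (method, key) pairs and reports free
space for each. Function names and the exact commands are illustrative
only, not the actual noded code:

import os
import subprocess

def _lvm_free(vgname):
  # "vgs" reports the free space of a volume group; units forced to MiB
  out = subprocess.check_output(
      ["vgs", "--noheadings", "--units", "m", "-o", "vg_free", vgname],
      universal_newlines=True)
  return float(out.strip().rstrip("m"))

def _fs_free(path):
  # statvfs gives the free blocks of the filesystem backing the directory
  st = os.statvfs(path)
  return st.f_bavail * st.f_frsize / (1024.0 * 1024.0)

def node_free_space(queries):
  """Answer a list of (method, key) queries, e.g.
  [("lvm", "xenvg"), ("filesystem", "/srv/ganeti/file-storage")]."""
  result = {}
  for method, key in queries:
    if method == "lvm":
      result[(method, key)] = _lvm_free(key)
    elif method == "filesystem":
      result[(method, key)] = _fs_free(key)
    else:
      result[(method, key)] = None  # rbd etc. would query their own pools
  return result
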
> > >> >> >> >
> > >> >> >> > This is very tricky. First, sharedfiles ≠ files, so they can't be
> > >> >> >> > combined. That aside, the rules of which storages are combined and which
> > >> >> >> > are not seem to be hardcoded.
> > >> >> >> >
> > >> >> >> > I wonder if it wouldn't be better for ganeti to report something like:
> > >> >> >> >
> > >> >> >> > free space = {
> > >> >> >> >   (drbd, plain): [(xenvg, 500), (vg2, 100)],  # per node
> > >> >> >> >   (file): [(default, 500)],                   # per node
> > >> >> >> >   (rbd): [(pool1, 100), (pool2, 200)],        # global per cluster
> > >> >> >> >   (sharedfile): [(default, 100)],  # this is global per cluster, but not
> > >> >> >> >                                    # explained here
> > >> >> >> >   ...
> > >> >> >> > }
> > >> >> >> >
> > >> >> >> > I.e. describe in the actual data structures which storage types are
> > >> >> >> > combined, instead of making this information "hidden". We still don't
> > >> >> >> > have a way to describe which are shared and which are not, but that we
> > >> >> >> > already hardcode.
> > >> >> >>
> > >> >> >> Yes, I agree the mapping between which are shared and which are not
> > >> >> >> must be explicit and not implicit.
> > >> >> >> Although I hadn't specified it here, that was the plan.
> > >> >> >> For file/sharedfile what I meant is that what's common is the "way of
> > >> >> >> fetching the free space from a directory", not the directory itself nor
> > >> >> >> the space.
> > >> >> >
> > >> >> > Ack.
> > >> >> >
> > >> >> >> The point was more that from the backend (noded) point of view
> > >> >> >> fetching free space would be "<mechanism>:<key>" (e.g. lvm:xenvg) and
> > >> >> >> there would be a mapping between storage types and mechanisms: "plain,
> > >> >> >> drbd -> lvm", "file, sharedfile -> filesystem".
> > >> >> >
> > >> >> > Where would the mapping live? From the point of view of noded, it should
> > >> >> > be very dumb, so it shouldn't make this kind of mapping itself.
> > >> >>
> > >> >> No, the mapping would live in masterd; it's a transformation needed
> > >> >> before performing that query. The node just receives the "method:key"
> > >> >> request and replies with the storage space it found for that
> > >> >> method,key.
> > >> >
> > >> > But this is the thing I'm not sure it's good to do. Noded should only
> > >> > care about lvm, file, rbd, and not how these map onto "methods"; the
> > >> > same way as noded doesn't know anything about disk templates, only about
> > >> > logical disk types.
> > >>
> > >> But that's what I said: noded will just know how to query the free
> > >> space for a particular method and it will be masterd's job to ask for
> > >> the right method depending on the storage type.
> > >> (lvm, file (or filesystem) and rbd_pool are the methods, while the
> > >> mapping done in masterd is: "plain and drbd map onto lvm", "file and
> > >> sharedfile map onto filesystem", etc.)
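
A sketch of the masterd-side mapping just described: storage types map
onto methods, and masterd combines the enabled types with the configured
pools to build the "method:key" queries sent to noded. The names and the
pools argument are illustrative, not the actual Ganeti code:

# Explicit storage type -> method mapping, kept on the master side; noded
# only ever sees the resulting (method, key) pairs.
TYPE_TO_METHOD = {
  "plain": "lvm",
  "drbd": "lvm",
  "file": "filesystem",
  "sharedfile": "filesystem",
  "rbd": "rbd_pool",
}

def build_queries(enabled_types, pools):
  """Turn enabled storage types into (method, key) pairs for noded.

  pools maps a method to its configured keys, e.g.
  {"lvm": ["xenvg", "vg2"], "filesystem": ["/srv/ganeti/file-storage"]}.
  Duplicates are dropped, so plain and drbd produce a single lvm query per
  volume group, since they share the same space.
  """
  queries = set()
  for stype in enabled_types:
    method = TYPE_TO_METHOD[stype]
    for key in pools.get(method, []):
      queries.add((method, key))
  return sorted(queries)

# build_queries(["plain", "drbd", "file"],
#               {"lvm": ["xenvg"], "filesystem": ["/srv/ganeti/file-storage"]})
#   -> [("filesystem", "/srv/ganeti/file-storage"), ("lvm", "xenvg")]
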
> > >> I believe we do agree on this, but then maybe I didn't understand your
> > >> objection correctly: is the above what you're looking for, or you'd
> > >> like something different? In which case, can you go a bit more into
> > >> detail about what?
> > >
> > > I would like the node not to have to know about methods. The node
> > > already knows about disk types, and the master can itself make the
> > > connection from disk type to method, so no need for the node to know
> > > methods, similar to how it doesn't know disk templates but knows disk
> > > types.
> > >
> > > Now, if the argument is that the node has no concept yet of free disk
> > > for anything except lvm, that's somewhat true; we have the node storage
> > > framework, maybe we should just extend that and use it.
> >
> > The point here is that the disk type is at a higher level than the
> > method: you can know about the method and not the disk type, but not
> > vice versa.
> > Let me explain:
> >
> > - if I tell the node "give me the lvm free space on vg xenvg" the
> >   node can run a command and find out
> > - if I tell the node "give me the free space for storage type drbd" the
> >   node has to find out that drbd uses lvm volume group xenvg (somehow)
> >   and then run the same command above (thus knowing about the method)
> >
> > Same for files, rbd, etc. So there's no way the node can "not know"
> > about the method, and it's not useful for the master to connect them
> > to the disk type, somehow, it seems to me (if I'm still not clear we
> > can have a quick in-person/over-vc chat about this, perhaps)
>
> OK, this was all a misunderstanding of what mechanism means, sorry for
> that.
>
> The first one is what we want to do, indeed.
>
> > >> >> ACK, makes sense (at the masterd level, again; we'll keep noded dumb).
> > >> >> So drbd would be a transform and use lvm as backing store? In which
> > >> >> case, if this is declared, it becomes easy to query lvm when we want
> > >> >> the information about drbd.
> > >> >> How do we handle in this sense multiple volume groups?
> > >> >
> > >> > All storage types will have a list of "pools"; for files, it's always a
> > >> > list of length one, for drbd/lvm it's how many VGs we have enabled, and
> > >> > for rados it's similar - the pools that we have enabled.
> > >>
> > >> Ack. Should we have a better way to specify the pool at instance create
> > >> time?
> > >> (currently we have vg and metavg, but this doesn't address choosing a
> > >> separate rados pool, for example).
> > >>
> > >> >> (In my view the drbd storage method would list allowed vgs, as would
> > >> >> the plain one, but in this view only plain would, and drbd would need
> > >> >> to just "use" those).
> > >> >> (which is ok, although it doesn't leave the flexibility of saying "use
> > >> >> this vg for plain only, and this vg for drbd") (not that we have that
> > >> >> now, but we could get it for free if we went the other way).
> > >> >
> > >> > ack, good points.
> > >>
> > >> So we should go with treating drbd and plain as "equal" with the
> > >> "allowed vgs" in the disk parameters of each. Ack.
> > >> In which case drbd can also have "allowed meta vgs" as those might be
> > >> different, I suppose.
> > >
> > > Yes.
> > >
> > >> (somehow this goes to: all methods have one pool, except for drbd
> > >> which has a pool and a metapool... mmm... ) :(
> > >
> > > Well, life is complicated? :P
> >
> > Right. :)
>
> :)
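
A configuration sketch of the per-method pool lists and per-template disk
parameters discussed in the exchange above; every key name here is
hypothetical, chosen only to illustrate that drbd is the odd one out in
needing both a pool and a metapool:

# Hypothetical cluster-level pool lists, one per storage method.
enabled_pools = {
  "lvm": ["xenvg", "vg2"],                     # shared by plain and drbd
  "filesystem": ["/srv/ganeti/file-storage"],  # always a single entry
  "rbd_pool": ["pool1", "pool2"],
}

# Hypothetical per-template disk parameters drawing from those pools.
diskparams = {
  "plain": {"allowed-vgs": ["xenvg", "vg2"]},
  "drbd": {
    "allowed-vgs": ["xenvg", "vg2"],
    "allowed-meta-vgs": ["xenvg"],  # the extra (meta)pool choice
  },
  "rbd": {"allowed-pools": ["pool1"]},
}
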
> > > I'm wondering if this is not a sign that our framework is a bit broken.
> > > In the sense that for (e.g.) disk sizes, we have min, max, std
> > > (=default), but for metavg we only have default. So in this sense, we
> > > would actually need, for each parameter, no matter whether it's a disk
> > > parameter or a node parameter: default and allowed values.
> >
> > Well, we have min=max=default, as well. But on the other hand we don't
> > track metavgs by policy at all, right?
> > Does it belong there? And is this related to disk size reporting?
> > (finding out about the correct vg is, but this might not be...)
> >
> > > Which brings another point: the only reason we protect some values is
> > > because some operations take input from "less-trusted" entities (e.g.
> > > disk sizes), but we expect that node parameters are from "more-trusted"
> > > ones.
> > >
> > > Argh, quagmire. Let's just do:
> > >
> > > - we need to have a list of configured VGs per cluster, instead of "one
> > >   VG", also configured Rados pools (each storage method, basically)
> >
> > Yes.
> >
> > > - all verification, queries, etc., should only talk to those
> > > - drbd and meta volume can choose any of those, but the defaults will be
> > >   most likely used
> >
> > Yes.
> >
> > > We will see later if we find a good framework for defining configured
> > > "stuff" for storage methods; right now it's just the pool, basically.
> >
> > Ack. So the simplification is: we don't distinguish between metapool
> > and pool, drbd can use both interchangeably and it's up to the user to
> > do the right thing. Right?
>
> Yep.
>
> Thanks!
And by thanks, I mean LGTM, but I'd like a resend first :)

thanks,
iustin