Re: [Workman-devel] cgroup: status-quo and userland efforts

2015-03-03 Thread Luke Leighton
Serge Hallyn  writes:

> 
> Quoting Daniel P. Berrange (berrange@...):

> > Are you also planning to actually write a new cgroup parent manager
> > daemon too ? Currently my plan for libvirt is to just talk directly
> 
> I'm toying with the idea, yes.  (Right now my toy runs in either native
> mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
> love if someone else does it, but it needs to be done.
> 
> As I've said elsewhere in the thread, I see 2 problems to be addressed:
> 
> 1. The ability to nest the cgroup manager daemons, so that a daemon
> running in a container can talk to a daemon running on the host.  This
> is the problem my current toy is aiming to address.  But the API it
> exports is just a thin layer over cgroupfs.

 cool!  that's funny, that sounds exactly like what i asked if you
 could provide, and it turns out that you already did :)

 so, in theory, you could have this:

 * run the service on top of /dev/cgroups, republishing [a subset?] as
   /run/cgroups and some other parts as /run/cgroups2

 * have PID1 go to /run/cgroups *instead* of going directly to
   /dev/cgroups.

 * have lxc go to /run/cgroups2 *instead* of going directly to
   /dev/cgroups.

 the problem: as lennart mentions, PID1s such as systemd may be expecting
 to manage the setup of cgroups - entirely - for security or other
 initialisation reasons - *before* even the service that you've created,
 serge, is allowed to run.

 and *that's* why i suggested the idea of following what SELinux has
 done, which is to have policy files that compile down to a set of
 permissions describing what the (various) managers can and cannot do:
 the bits of cgroup that they are and are not permitted to manage.
 
 flat at the kernel implementation level; hierarchical (or other)
 at the "compile-the-policy-file" level.
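
 a rough sketch of the idea, with a completely made-up policy format
 (this is not SELinux syntax; the manager names, paths and knobs are
 purely illustrative):

     # toy "compiled" policy: which manager may touch which cgroup
     # subtree, and which knobs within it.
     POLICY = {
         "systemd": {"subtree": "/sys/fs/cgroup/systemd", "knobs": "*"},
         "lxc":     {"subtree": "/sys/fs/cgroup/lxc",
                     "knobs": ["freezer.state", "cgroup.procs"]},
     }

     def allowed(manager, path, knob):
         """May `manager` write `knob` for the cgroup at `path`?"""
         rule = POLICY.get(manager)
         if rule is None or not path.startswith(rule["subtree"]):
             return False
         return rule["knobs"] == "*" or knob in rule["knobs"]

     # allowed("lxc", "/sys/fs/cgroup/lxc/c1", "freezer.state") -> True
     # allowed("lxc", "/sys/fs/cgroup/systemd", "cpu.shares")   -> False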

 l.



Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-23 Thread Michal Hocko
On Mon 15-07-13 14:49:40, Vivek Goyal wrote:
> On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
> > On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> > > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > [...]
> > > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > > appropriate cgroups?
> > > 
> > > Do you use that daemon in practice?
> > 
> > I am not but my users do. And that is why I care.
> 
> Michal,
> 
> would you have more details on how exactly those users are using the
> rules engine daemon?

The most common usage is uid- and exec-name-based rules.

> To me, rulesengined processes 3 kinds of rules:
> 
> - uid based
> - gid based
> - exec file path based
> 
> uid/gid based rule execution can be taken care of by the pam_cgroup module
> too, so I think one should not need cgrulesengined for that.

I am not very familiar with pam_cgroup, but it is part of the libcgroup
package, right?

> I am curious what kind of exec rules are useful. Any placement of
> services can be done using systemd, so the only executables left to
> manage are those which are not services.

Yes, those are usually backup processes which should not disrupt the
regular server workload.

The uid-based ones are used to keep a leash on local users of the machine,
but I do not have many details, as I usually do not have access to those
machines. All I see are complaints when something explodes ;)
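
For illustration, the core of such a uid/exec-name rules engine boils down
to something like this (the rule table and paths are made up, a cgroup v1
layout is assumed, and the real cgrulesengd is of course more involved):

    import os

    # illustrative rules: backup jobs go to a throttled group, processes
    # of a given uid go to a per-user group.
    RULES = [
        {"exe": "backup_job", "cgroup": "/sys/fs/cgroup/cpu/background"},
        {"uid": 1000,         "cgroup": "/sys/fs/cgroup/cpu/users/u1000"},
    ]

    def classify(pid):
        """Move pid into the first matching cgroup."""
        exe = os.path.basename(os.readlink("/proc/%d/exe" % pid))
        uid = os.stat("/proc/%d" % pid).st_uid
        for rule in RULES:
            if rule.get("exe") == exe or rule.get("uid") == uid:
                with open(rule["cgroup"] + "/cgroup.procs", "w") as f:
                    f.write(str(pid))
                return rule["cgroup"]
        return None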
-- 
Michal Hocko
SUSE Labs


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-07-15 Thread Vivek Goyal
On Sun, Jun 30, 2013 at 08:38:38PM +0200, Michal Hocko wrote:
> On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> [...]
> > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > appropriate cgroups?
> > 
> > Do you use that daemon in practice?
> 
> I am not but my users do. And that is why I care.

Michal,

would you have more details on how exactly those users are using the
rules engine daemon?

To me, rulesengined processes 3 kinds of rules:

- uid based
- gid based
- exec file path based

uid/gid based rule execution can be taken care of by the pam_cgroup module
too, so I think one should not need cgrulesengined for that.

I am curious what kind of exec rules are useful. Any placement of
services can be done using systemd, so the only executables left to
manage are those which are not services.

In practice, is it very useful for an admin to say that if "firefox" is
launched by a user then it should run in the xyz cgroup? And if a user
cares about firefox running in a sub-cgroup, they can always use cgexec
to do that.
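
(For reference, cgexec effectively just moves the calling process into the
target cgroup and then execs the program; a rough sketch, assuming a cgroup
v1 mount and an existing "xyz" group:)

    import os

    def cgexec(cgroup_dir, argv):
        """Rough equivalent of cgexec: join the group, then exec."""
        with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
            f.write(str(os.getpid()))   # move ourselves into the cgroup
        os.execvp(argv[0], argv)        # replace this process with the target

    # e.g. cgexec("/sys/fs/cgroup/cpu/xyz", ["firefox"])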

Thanks
Vivek


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-30 Thread Michal Hocko
On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
[...]
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
> 
> Do you use that daemon in practice?

I am not but my users do. And that is why I care.

> For user session logins, I think systemd has plans to put user
> sessions in a cgroup (kind of making pam_cgroup redundant).
> 
> Other functionality rulesengined was providing was moving tasks
> automatically into a cgroup based on executable name. I think that was
> racy and not many people liked it.

It doesn't make sense for short lived processes, all right, but it can
be useful for those that live for a long time.
 
> IIUC, systemd can't disable access to cgroupfs from other utilities.

The previous messages read otherwise, and that is why this raised red
flags on many fronts.

> So most likely rulesengined should continue to work. But having both
> systemd and libcgroup might not make much sense though.
> 
> Thanks
> Vivek

-- 
Michal Hocko
SUSE Labs


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tejun Heo
On Fri, Jun 28, 2013 at 05:40:53PM -0500, Serge Hallyn wrote:
> > The kernel could expose a knob that would allow systemd to lock that
> > down.
> 
> Gah - why would you give him that idea?  :)

That's one of the ideas I had from the beginning.

> But yes, I'd sort of assume that was coming, eventually.

But I think we'll probably settle on a mechanism to find out whether
someone else is touching the hierarchy, which will be generally useful
for other consumers of cgroup too.

Thanks.

-- 
tejun


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
> On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > > Hello, Mike.
> > > > 
> > > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > > the things usable via dirt simple scripts, very flexible, nice.
> > > > 
> > > > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > > > easy to play with and that part is fine.  But combined with the
> > > > hierarchical nature of cgroup and file permissions, it encourages
> > > > people to "delegate" subdirectories to less privileged domains,
> > > 
> > > OK, this really depends on what you expose to non-root users. I have
> > > seen use cases where admin prepares top-level which is root-only but
> > > it allows creating sub-groups which are under _full_ control of the
> > > subdomain. This worked nicely for memcg for example because hard limit,
> > > oom handling and other knobs are hierarchical so the subdomain cannot
> > > overwrite what admin has said.
> > > 
> > > > which
> > > > in turn leads to normal binaries to manipulate them directly, which is
> > > > where the horror begins.  We end up exposing control knobs which are
> > > > tightly coupled to kernel implementation details right into lay
> > > > binaries and scripts directly used by end users.
> > > >
> > > > I think this is the first time this happened, which is probably why
> > > > nobody really noticed the mess earlier.
> > > > 
> > > > Anyways, if you're root, you can keep doing whatever you want.
> > > 
> > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > appropriate cgroups?
> > 
> > Do you use that daemon in practice? For user session logins, I think
> > systemd has plans to put user sessions in a cgroup (kind of making
> > pam_cgroup redundant). 
> > 
> > Other functionality rulesengined was providing was moving tasks
> > automatically into a cgroup based on executable name. I think that was
> > racy and not many people liked it.
> 
> Regardless of the changes being proposed, IMHO, the cgrulesd should
> never be used. It is just outright dangerous for a daemon to be
> arbitrarily re-arranging what cgroups a process is placed in without
> the applications being aware of it. It can only be safely used in a
> scenario where cgroups are exclusively used by the administrator,
> and never used by applications for their own needs.

Even then it's not safe, since if the program quickly forks or clones a
few times, you can end up with some of the tasks being reclassified
and some not.

> > IIUC, systemd can't disable access to cgroupfs from other utilities.
> 
> The kernel could expose a knob that would allow systemd to lock that
> down.

Gah - why would you give him that idea?  :)

But yes, I'd sort of assume that was coming, eventually.

-serge


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > Hello, Mike.
> > > 
> > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > the things usable via dirt simple scripts, very flexible, nice.
> > > 
> > > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > > easy to play with and that part is fine.  But combined with the
> > > hierarchical nature of cgroup and file permissions, it encourages
> > > people to "delegate" subdirectories to less privileged domains,
> > 
> > OK, this really depends on what you expose to non-root users. I have
> > seen use cases where admin prepares top-level which is root-only but
> > it allows creating sub-groups which are under _full_ control of the
> > subdomain. This worked nicely for memcg for example because hard limit,
> > oom handling and other knobs are hierarchical so the subdomain cannot
> > overwrite what admin has said.
> > 
> > > which
> > > in turn leads to normal binaries to manipulate them directly, which is
> > > where the horror begins.  We end up exposing control knobs which are
> > > tightly coupled to kernel implementation details right into lay
> > > binaries and scripts directly used by end users.
> > >
> > > I think this is the first time this happened, which is probably why
> > > nobody really noticed the mess earlier.
> > > 
> > > Anyways, if you're root, you can keep doing whatever you want.
> > 
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
> 
> Do you use that daemon in practice? For user session logins, I think
> systemd has plans to put user sessions in a cgroup (kind of making
> pam_cgroup redundant). 
> 
> Other functionality rulesengined was providing was moving tasks
> automatically into a cgroup based on executable name. I think that was
> racy and not many people liked it.

Regardless of the changes being proposed, IMHO, the cgrulesd should
never be used. It is just outright dangerous for a daemon to be
arbitrarily re-arranging what cgroups a process is placed in without
the applications being aware of it. It can only be safely used in a
scenario where cgroups are exclusively used by the administrator,
and never used by applications for their own needs.

> IIUC, systemd can't disable access to cgroupfs from other utilities.

The kernel could expose a knob that would allow systemd to lock that
down.

> So most likely rulesengined should contine to work. But having both
> systemd and libcgroup might not make much sense though.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Tim Hockin
On Fri, Jun 28, 2013 at 8:53 AM, Serge Hallyn  wrote:
> Quoting Daniel P. Berrange (berra...@redhat.com):

>> Are you also planning to actually write a new cgroup parent manager
>> daemon too ? Currently my plan for libvirt is to just talk directly
>
> I'm toying with the idea, yes.  (Right now my toy runs in either native
> mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
> love if someone else does it, but it needs to be done.
>
> As I've said elsewhere in the thread, I see 2 problems to be addressed:
>
> 1. The ability to nest the cgroup manager daemons, so that a daemon
> running in a container can talk to a daemon running on the host.  This
> is the problem my current toy is aiming to address.  But the API it
> exports is just a thin layer over cgroupfs.
>
> 2. Abstract away the kernel/cgroupfs details so that userspace can
> explain its cgroup needs generically.  This is IIUC what systemd is
> addressing with slices and scopes.
>
> (2) is where I'd really like to have a well thought out, community
> designed API that everyone can agree on, and it might be worth getting
> together (with Tejun) at plumbers or something to lay something out.

We're also working on (2) (well, we HAVE it, but we're dis-integrating
it so we can hopefully publish more widely).  But our (2) depends on
direct cgroupfs access.  If that is to change, we need a really robust
(1).  It's OK (desirable, in fact) that (1) be a very thin layer of
abstraction.

> In the end, something like libvirt or lxc should not need to care
> what is running underneath it.  It should be able to make its requests
> the same way regardless of whether it is running in fedora or ubuntu,
> and whether it is running on the host or in a tightly bound container.
> That's my goal anyway :)
>
>> to systemd's new DBus APIs for all management of cgroups, and then
>> fall back to writing to cgroupfs directly for cases where systemd
>> is not around.  Having a library to abstract these two possible
>> alternatives isn't all that compelling unless we think there will
>> be multiple cgroups manager daemons. I've been somewhat assuming that
>> even Ubuntu will eventually see the benefits & switch to systemd,
>
> So far I've seen no indication of that :)
>
> If the systemd code to manage slices could be made separately
> compileable as a standalone library or daemon, then I'd advocate
> using that.  But I don't see a lot of incentive for systemd to do
> that, so I'd feel like a heel even asking.

I want to say "let the best API win", but I know that systemd is a
giant katamari ball, and it's absorbing subsystems so it may win by
default.  That isn't going to stop us from trying to do what we do,
and share that with the world.

>> then the issue of multiple manager daemons wouldn't really exist.
>
> True.  But I'm running under the assumption that Ubuntu will stick with
> upstart, and therefore yes I'll need a separate (perhaps pair of)
> management daemons.
>
> Even if we were to switch to systemd, I'd like the API for userspace
> programs to configure and use cgroups to be as generic as possible,
> so that anyone who wanted to write their own daemon could do so.
>
> -serge


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Vivek Goyal
On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > Hello, Mike.
> > 
> > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > the things usable via dirt simple scripts, very flexible, nice.
> > 
> > Oh, that in itself is not bad.  I mean, if you're root, it's pretty
> > easy to play with and that part is fine.  But combined with the
> > hierarchical nature of cgroup and file permissions, it encourages
> > people to "delegate" subdirectories to less privileged domains,
> 
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.
> 
> > which
> > in turn leads to normal binaries to manipulate them directly, which is
> > where the horror begins.  We end up exposing control knobs which are
> > tightly coupled to kernel implementation details right into lay
> > binaries and scripts directly used by end users.
> >
> > I think this is the first time this happened, which is probably why
> > nobody really noticed the mess earlier.
> > 
> > Anyways, if you're root, you can keep doing whatever you want.
> 
> OK, so libcgroup's rules daemon will still work and place my tasks in
> appropriate cgroups?

Do you use that daemon in practice? For user session logins, I think
systemd has plans to put user sessions in a cgroup (kind of making
pam_cgroup redundant). 

Other functionality rulesengined was providing was moving tasks
automatically into a cgroup based on executable name. I think that was
racy and not many people liked it.

IIUC, systemd can't disable access to cgroupfs from other utilities.
So most likely rulesengined should continue to work. But having both
systemd and libcgroup might not make much sense though.

Thanks
Vivek


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> > FWIW, the code is too embarrassing yet to see daylight, but I'm playing
> > with a very lowlevel cgroup manager which supports nesting itself.
> > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > modes - native mode in which it uses cgroupfs, and child mode where it
> > talks to a parent manager to make the changes.
> > 
> > So then the idea would be that userspace (like libvirt and lxc) would
> > talk over /dev/cgroup to its manager.  Userspace inside a container
> > (which can't actually mount cgroups itself) would talk to its own
> > manager which is talking over a passed-in socket to the host manager,
> > which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> > the requestor's cgroup).
> > 
> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> 
> Are you also planning to actually write a new cgroup parent manager
> daemon too ? Currently my plan for libvirt is to just talk directly

I'm toying with the idea, yes.  (Right now my toy runs in either native
mode, using cgroupfs, or child mode, talking to a parent manager)  I'd
love if someone else does it, but it needs to be done.

As I've said elsewhere in the thread, I see 2 problems to be addressed:

1. The ability to nest the cgroup manager daemons, so that a daemon
running in a container can talk to a daemon running on the host.  This
is the problem my current toy is aiming to address.  But the API it
exports is just a thin layer over cgroupfs.

2. Abstract away the kernel/cgroupfs details so that userspace can
explain its cgroup needs generically.  This is IIUC what systemd is
addressing with slices and scopes.

(2) is where I'd really like to have a well thought out, community
designed API that everyone can agree on, and it might be worth getting
together (with Tejun) at plumbers or something to lay something out.

In the end, something like libvirt or lxc should not need to care
what is running underneath it.  It should be able to make its requests
the same way regardless of whether it is running in fedora or ubuntu,
and whether it is running on the host or in a tightly bound container.
That's my goal anyway :)
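
To make (1) concrete, a minimal sketch of the kind of thin layer I mean is
below. The request format and socket path are invented for illustration,
and a real parent would also re-root paths under the requestor's cgroup,
which this leaves out:

    import os, socket

    CGROOT = "/sys/fs/cgroup"            # assumes a cgroupfs-style mount
    PARENT_SOCK = "/dev/cgroup-manager"  # hypothetical socket from the host

    def handle(line, child_mode=False):
        """Handle requests like 'CREATE cpu /c3' or
        'SET freezer /c1/c2 freezer.state THAWED'."""
        if child_mode:
            # child mode: forward the request to the parent manager
            s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            s.connect(PARENT_SOCK)
            s.sendall(line.encode())
            return s.recv(4096).decode()
        # native mode: apply the operation to cgroupfs directly
        op, ctrl, path, *rest = line.split()
        base = os.path.join(CGROOT, ctrl, path.lstrip("/"))
        if op == "CREATE":
            os.makedirs(base, exist_ok=True)
        elif op == "SET":
            knob, value = rest
            with open(os.path.join(base, knob), "w") as f:
                f.write(value)
        return "OK"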

> to systemd's new DBus APIs for all management of cgroups, and then
> fall back to writing to cgroupfs directly for cases where systemd
> is not around.  Having a library to abstract these two possible
> alternatives isn't all that compelling unless we think there will
> be multiple cgroups manager daemons. I've been somewhat assuming that
> even Ubuntu will eventually see the benefits & switch to systemd,

So far I've seen no indication of that :)

If the systemd code to manage slices could be made separately
compileable as a standalone library or daemon, then I'd advocate
using that.  But I don't see a lot of incentive for systemd to do
that, so I'd feel like a heel even asking.

> then the issue of multiple manager daemons wouldn't really exist.

True.  But I'm running under the assumption that Ubuntu will stick with
upstart, and therefore yes I'll need a separate (perhaps pair of)
management daemons.

Even if we were to switch to systemd, I'd like the API for userspace
programs to configure and use cgroups to be as generic as possible,
so that anyone who wanted to write their own daemon could do so.

-serge


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-06-28 Thread Daniel P. Berrange
On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> FWIW, the code is too embarrassing yet to see daylight, but I'm playing
> with a very lowlevel cgroup manager which supports nesting itself.
> Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> /c1/c2", "Create /c3"), but the key feature is that it can run in two
> modes - native mode in which it uses cgroupfs, and child mode where it
> talks to a parent manager to make the changes.
> 
> So then the idea would be that userspace (like libvirt and lxc) would
> talk over /dev/cgroup to its manager.  Userspace inside a container
> (which can't actually mount cgroups itself) would talk to its own
> manager which is talking over a passed-in socket to the host manager,
> which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> the requestor's cgroup).
> 
> At some point (probably soon) we might want to talk about a standard API
> for these things.  However I think it will have to come in the form of
> a standard library, which knows to either send requests over dbus to
> systemd, or over /dev/cgroup sock to the manager.

Are you also planning to actually write a new cgroup parent manager
daemon too ? Currently my plan for libvirt is to just talk directly
to systemd's new DBus APIs for all management of cgroups, and then
fall back to writing to cgroupfs directly for cases where systemd
is not around.  Having a library to abstract these two possible
alternatives isn't all that compelling unless we think there will
be multiple cgroups manager daemons. I've been somewhat assuming that
even Ubuntu will eventually see the benefits & switch to systemd,
then the issue of multiple manager daemons wouldn't really exist.
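
Roughly, the fallback logic I have in mind looks like this (just a sketch:
the systemd branch is left as a stub rather than a real DBus call, and the
cgroup v1 filesystem layout is an assumption):

    import os

    def systemd_running():
        # the same check sd_booted() uses: this dir exists only under systemd
        return os.path.isdir("/run/systemd/system")

    def set_cgroup_knob(controller, group, knob, value):
        if systemd_running():
            # delegate to systemd over its DBus APIs (omitted in this sketch)
            raise NotImplementedError("talk to systemd here")
        path = os.path.join("/sys/fs/cgroup", controller,
                            group.lstrip("/"), knob)
        with open(path, "w") as f:
            f.write(value)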

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Tejun Heo
On Mon, Apr 08, 2013 at 03:46:31PM -0400, Vivek Goyal wrote:
> It would be good to think more about how a user can ensure minimum
> resources for a partition/service, because in that case at every level
> somebody needs to keep track of how much of the resources have been
> committed as minimum requirements, and more consumers can't be allowed
> at the same level. (This sounds like cpu RT time division among various
> cgroups).

Yes, please take a step back from what we have right now because it
isn't very good.  It's a general policy decision / enforcement problem
and even the policies may change dynamically.  Having a central
authority doesn't automatically solve any of that and it'd be most
likely as limited as existing solutions at the beginning but it allows
for future improvements unlike scattering the solution all over the
place which just digs the hole deeper.

Thanks.

-- 
tejun


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Vivek Goyal
On Mon, Apr 08, 2013 at 12:20:24PM -0700, Tejun Heo wrote:

[..]
> > For example, one might want to say that maximum IO bandwidth for 
> > virtual machine virt1 on disk sda should be 10MB/s. Now libvirt
> > should be able to save it in virtual machine specific configuration
> > easily and whenever the virtual machine is started, create a child
> > cgroup, set the limits as specified.
> 
> Yes, sure, libvirt can *request* whatever it seems appropriate to the
> central authority, which will decide whether it'll be able to honor
> the request and grant it if possible and allowed by policies in
> effect.

10MB/s is an absolute limit. So I guess there is nothing to be requested
from a central authority here in terms of resources.

Even in the case of IO weight or cpu shares, there is nothing to be asked
from a central authority. Well, there is: creation of new cgroups changes
the effective % share of peer groups. More below.

Where it makes sense, though, is if one says: give a particular service
25% of the cpu. Then suddenly all the peer and parent entities become
important. IIUC, the initial draft of workman does not address this issue.

It would be good to think more about how a user can ensure minimum
resources for a partition/service, because in that case at every level
somebody needs to keep track of how much of the resources have been
committed as minimum requirements, and more consumers can't be allowed
at the same level. (This sounds like cpu RT time division among various
cgroups).

Thanks
Vivek


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Tejun Heo
Hey,

On Mon, Apr 08, 2013 at 03:11:05PM -0400, Vivek Goyal wrote:
> >  What if the program crashes?
> 
> I am not sure about this. I guess when the application comes back after a
> crash, it can go through all the child cgroups and reclaim the empty ones.

Fragile, right?  What are you arguing here?

> >  Wouldn't it make more sense to just have
> > a central arbitrator that everyone talks to?
> 
> Maybe. Just that in the past folks have not liked the idea of talking
> to a central authority to figure out the resource group of an object
> they are managing.

What we've been doing seems tragically broken to me, so I'm not sure
"people didn't use to do it that way" is a good point.

> >  What's the benefit of
> > distributing the responsibilities here?  It's not like we can put them
> > in different security domains.
> 
> To me it makes sense in a way, as these resources associated with the
> service are just another property, and there does not seem to be
> anything special about this property that requires it to be managed
> using a single centralized authority.
> 
> For example, one might want to say that maximum IO bandwidth for 
> virtual machine virt1 on disk sda should be 10MB/s. Now libvirt
> should be able to save it in virtual machine specific configuration
> easily and whenever the virtual machine is started, create a child
> cgroup, set the limits as specified.

Yes, sure, libvirt can *request* whatever it seems appropriate to the
central authority, which will decide whether it'll be able to honor
the request and grant it if possible and allowed by policies in
effect.

> That would make sense. systemd had this conflict with cgconfig
> too. The problem is that systemd starts first and sets up everything;
> if there is a service which sets up cgroups after systemd startup,
> it is already too late.

Come on, that's not a difficult or fundamental problem.  Whatever the
central authority may be, systemd can use it to set up the initial
hierarchy or set up a bare-bones hierarchy in a compatible manner.  This
isn't that different from udev.

Thanks.

-- 
tejun


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Vivek Goyal
On Mon, Apr 08, 2013 at 11:16:07AM -0700, Tejun Heo wrote:
> Hey, Vivek.
> 
> On Mon, Apr 08, 2013 at 01:59:26PM -0400, Vivek Goyal wrote:
> > But using the library, an admin application should be able to query the
> > full "partition" hierarchy and their weights and calculate % system
> > resources. I think one problem there is the cpu controller, where the %
> > resource of a cgroup depends on task entities which are peers of the
> > group. But that's a kernel issue and not a user space thing.
> 
> Yeah, we're gonna have to implement a different operation mode.
> 
> > So I am not sure what the potential problems are with the proposed model
> > of configuration in workman. All the consumer managers still follow what
> > the library has told them to do.
> 
> Sure, if we assume everyone follows the rules and behaves nicely.
> It's more about the general approach.  Allowing / encouraging sharing
> or distributing control of cgroup hierarchy without forcing structure
> and rigid control over it is likely to lead to confusion and
> fragility.
> 
> > >  or maybe some other program just happened to choose the
> > >   same name.
> > 
> > Ideally, two programs would have their own sub-hierarchies. And if not,
> > one of the programs should get a conflict when trying to create the
> > cgroup and should back off, fail, or give a warning...
> 
> And who's responsible for deleting it?

I think the "consumer" manager should delete its own cgroup directories
when the associated consumer[s] stop running.

And partitions created by workman will just remain there until and unless
the user wants to delete them explicitly.

>  What if the program crashes?

I am not sure about this. I guess when the application comes back after a
crash, it can go through all the child cgroups and reclaim the empty ones.

> 
> > > Who owns config knobs in that directory?
> > 
> > IIUC, workman was looking at two types of cgroups. One, called
> > "partitions", will be created by the library at startup time, and the
> > library manages their configuration (something like cgconfig.conf).
> > 
> > And individual managers create their own child groups for various
> > services under that partition and control the config knobs for those
> > services.
> > 
> >     user-defined-partition
> >        /       |       \
> >     virt1    virt2    virt3
> > 
> > So user should be able to define a partition and control the configuration
> > using workman lib. And if multiple virtual machines are being run in
> > the partition, then they create their own cgroups and libvirt controls
> > the properties of the virt1, virt2, virt3 cgroups. I thought that was
> > the understanding when we discussed ownership of config knobs last time.
> > But things might have changed since last time. Workman folks should
> > be able to shed light on this.
> 
> I just read the introduction doc and haven't delved into the API or
> code so I could be off but why should there be multiple managers?
> What's the benefit of that?

A centralized authority does not know about all the managed objects.
Only the respective manager knows what objects it is managing and
what the controllable attributes of those objects are.

systemd is managing services and libvirt is managing virtual machines,
containers, etc. Some people view the associated resource group as just one
additional attribute of the managed service. These managers already
maintain multiple attributes of a service and can store one additional
attribute easily.

>  Wouldn't it make more sense to just have
> a central arbitrator that everyone talks to?

Maybe. Just that in the past folks have not liked the idea of talking
to a central authority to figure out the resource group of an object
they are managing.

>  What's the benefit of
> distributing the responsibilities here?  It's not like we can put them
> in different security domains.

To me it makes sense in a way, as these resources associated with the
service are just another property, and there does not seem to be
anything special about this property that requires it to be managed
using a single centralized authority.

For example, one might want to say that maximum IO bandwidth for 
virtual machine virt1 on disk sda should be 10MB/s. Now libvirt
should be able to save it in virtual machine specific configuration
easily and whenever the virtual machine is started, create a child
cgroup, set the limits as specified.
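
(Concretely, with the cgroup v1 blkio controller that kind of absolute cap
is a single write into the group's throttle knob; "virt1" and the 8:0
major:minor number for sda are just the example values from above:)

    # cap writes from the virt1 group on sda (8:0) to 10 MB/s, cgroup v1
    limit = "8:0 %d" % (10 * 1024 * 1024)
    with open("/sys/fs/cgroup/blkio/virt1/"
              "blkio.throttle.write_bps_device", "w") as f:
        f.write(limit)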

If a central authority keeps track of all this, I am not sure what it
would look like, and it might get messy.

[..]
> > > I think the only logical thing to do is creating a centralized
> > > userland authority which takes full ownership of the cgroup filesystem
> > > interface, gives it a sane structure,
> > 
> > Right now systemd seems to be giving initial structure. I guess we will
> > require some changes where systemd itself runs in a cgroup and that
> > allows one to create peer groups. Something like.
> > 
> >          root
> >         /    \
> >   systemd    other-groups

Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Tejun Heo
On Mon, Apr 08, 2013 at 11:16:07AM -0700, Tejun Heo wrote:
> > Given the fact that the library has a view of the full system resources
> > (both persistent and active), shouldn't we just be able to extend
> > the API to meet additional configuration or resource needs?
> 
> Maybe, I don't know.  It just looks like a weird approach to me.
> Wouldn't it make more sense to implement it as a dbus service that
> everyone talks to?  That's how our base system is structured these
> days.  Why should this be any different?

To expand a bit, the base system being composed that way makes a lot
of sense.  It becomes clear who's responsible for what and there's a
reliable way to recover when things go awry on the clients' sides.
Also, it pretty much *forces* you to design an interface which fits
the problem domain properly rather than exposing all the control knobs
there are without thinking how they'd be actually useful.  The
language binding issue is much easier too - it's already solved.

It seems like the only logical thing to do, well, at least to me.
Am I missing something?

Thanks.

-- 
tejun


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Tejun Heo
Hey, Vivek.

On Mon, Apr 08, 2013 at 01:59:26PM -0400, Vivek Goyal wrote:
> But using the library, an admin application should be able to query the
> full "partition" hierarchy and their weights and calculate % system
> resources. I think one problem there is the cpu controller, where the %
> resource of a cgroup depends on task entities which are peers of the
> group. But that's a kernel issue and not a user space thing.

Yeah, we're gonna have to implement a different operation mode.

> So I am not sure what the potential problems are with the proposed model
> of configuration in workman. All the consumer managers still follow what
> the library has told them to do.

Sure, if we assume everyone follows the rules and behaves nicely.
It's more about the general approach.  Allowing / encouraging sharing
or distributing control of cgroup hierarchy without forcing structure
and rigid control over it is likely to lead to confusion and
fragility.

> >  or maybe some other program just happened to choose the
> >   same name.
> 
> Ideally, two programs would have their own sub-hierarchies. And if not,
> one of the programs should get a conflict when trying to create the
> cgroup and should back off, fail, or give a warning...

And who's responsible for deleting it?  What if the program crashes?

> > Who owns config knobs in that directory?
> 
> IIUC, workman was looking at two types of cgroups. One, called
> "partitions", will be created by the library at startup time, and the
> library manages their configuration (something like cgconfig.conf).
> 
> And individual managers create their own child groups for various
> services under that partition and control the config knobs for those
> services.
> 
>     user-defined-partition
>        /       |       \
>     virt1    virt2    virt3
> 
> So user should be able to define a partition and control the configuration
> using workman lib. And if multiple virtual machines are being run in
> the partition, then they create their own cgroups and libvirt controls
> the properties of the virt1, virt2, virt3 cgroups. I thought that was
> the understanding when we discussed ownership of config knobs last time.
> But things might have changed since last time. Workman folks should
> be able to shed light on this.

I just read the introduction doc and haven't delved into the API or
code so I could be off but why should there be multiple managers?
What's the benefit of that?  Wouldn't it make more sense to just have
a central arbitrator that everyone talks to?  What's the benefit of
distributing the responsibilities here?  It's not like we can put them
in different security domains.

> > * In many cases, resource distribution is system-wide policy decisions
> >   and determining what to do often requires system-wide knowledge.
> >   You can't provision memory limits without knowing what's available
> >   in the system and what else is going on in the system, and you want
> >   to be able to adjust them as situation and configuration changes.
> >   Without anybody having full picture of how resources are
> >   provisioned, how would any of that be possible?
> 
> I thought workman library will provide interfaces so that one can query
> and be able to construct the full system view.
> 
> Their doc says.
> 
> GList *workmanager_partition_get_children(WorkmanPartition *partition,
> GError **error);
> 
> So I am assuming this can be used to construct the full partition
> hierarchy and associated resource allocation.

Sure, maybe it can be used as a building block.

> [..]
> > I think the only logical thing to do is creating a centralized
> > userland authority which takes full ownership of the cgroup filesystem
> > interface, gives it a sane structure,
> 
> Right now systemd seems to be giving initial structure. I guess we will
> require some changes where systemd itself runs in a cgroup and that
> allows one to create peer groups. Something like.
>   
>          root
>         /    \
>   systemd    other-groups

No, we need a single structured hierarchy which everyone uses
*including* systemd.

> > represents available resources
> > in a sane form, and makes policy decisions based on configuration and
> > requests.
> 
> Given the fact that the library has a view of the full system resources
> (both persistent and active), shouldn't we just be able to extend
> the API to meet additional configuration or resource needs?

Maybe, I don't know.  It just looks like a weird approach to me.
Wouldn't it make more sense to implement it as a dbus service that
everyone talks to?  That's how our base system is structured these
days.  Why should this be any different?

Thanks.

-- 
tejun

Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Vivek Goyal
On Mon, Apr 08, 2013 at 05:46:09PM +0400, Glauber Costa wrote:

[..]
> The cpu cgroup needs a real-time timeslice to accept real time tasks. It
> defaults to 0, meaning that a newly created cpu cgroup cannot accept
> tasks (rt tasks) without the user having to manually configure it.
> As far as I know, this problem hasn't yet been fixed.

Yes, systemd folks wanted this to be fixed so that out of the box they
could put individual user sessions in a cgroup and still expect that
any RT applications of the user are not broken.
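
(Concretely, the knob in question is the v1 cpu controller's
cpu.rt_runtime_us, which defaults to 0 in a newly created group; something
like the following has to happen per group before it will accept
SCHED_FIFO/SCHED_RR tasks. The group path is illustrative:)

    # give the group a real-time budget of 0.95s per default 1s period
    group = "/sys/fs/cgroup/cpu/user-session-1"   # illustrative path
    with open(group + "/cpu.rt_runtime_us", "w") as f:
        f.write("950000")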

Thanks
Vivek


Re: [Workman-devel] cgroup: status-quo and userland efforts

2013-04-08 Thread Vivek Goyal
On Fri, Apr 05, 2013 at 06:21:59PM -0700, Tejun Heo wrote:

[..]
>  Userland efforts
>  
> 
> There are currently a few userland efforts trying to make interfacing
> with cgroup less painful.
> 
> * libcg: Make cgroup interface accessible from programming languages
>   with support for configuration persistency, which also brings its
>   own config files to remember what to do on the next boot.  Sans the
>   persistence part, it just seems to directly translate the filesystem
>   interface to function interface.
> 
>   http://libcg.sourceforge.net/
> 
> * Workman: It's a rather young project but as its name (workload
>   management) implies, its aims are higher level than that of libcg.
>   It aims to provide high-level resource allocation and management and
>   introduces new concepts like resource partitions to represent its
>   view of resource hierarchy.  Like libcg, this one is implemented as
>   a library but provides bindings for more languages.
> 
>   https://gitorious.org/workman/pages/Home
> 
> * Pax Controla Groupiana: A document on how not to step on other's
>   toes while using cgroup.  It's not a software project but tries to
>   define precautions that a software or user can take to avoid
>   breaking or confusing other users of the cgroup filesystem.
> 
>   http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
> 
> All try to play nice with other possible users of the cgroup
> filesystem - be it libvirt cgroup, applications doing their own cgroup
> tricks, or hand-crafted custom scripts.  While the approach is
> understandable given that those usages already exist, I don't think
> it's a workable solution in the long term.  There are several reasons
> for that.
> 
> * The configurations aren't independent.  e.g. for weight-based
>   controllers, your weight is only meaningful in relation to other
>   weights at that level.  Distributing configuration to whatever
>   entities which may write to cgroupfs simply cannot work.  It's
>   fundamentally flawed.

Hi Tejun,

I thought in workman, "partition" configuration was still centralized,
while individual "consumer" configuration was with the consumer manager
(systemd, libvirt, etc.). IOW, the library can tell the consumer manager
which partition to associate a consumer with at startup time (consumer
managers can assume their own defaults if nothing has been told).

Agreed that weight is meaningful only if one has a full hierarchy view,
and then one should be able to calculate the effective % share of
resources of a group.

But using the library, an admin application should be able to query the
full "partition" hierarchy and their weights and calculate % system
resources. I think one problem there is the cpu controller, where the %
resource of a cgroup depends on task entities which are peers of the
group. But that's a kernel issue and not a user space thing.

So I am not sure what the potential problems are with the proposed model
of configuration in workman. All the consumer managers still follow what
the library has told them to do.

> 
> * It's fragile like hell.  There's no accountability.  Nobody really
>   knows what's going on.  Is this subdirectory still there due to a
>   bug in this program, or something or someone else created it and
>   crashed / forgot to remove it, or what?

I thought any directory under a consumer manager is managed by that
manager, and nobody else is supposed to dynamically create resource
partitions/cgroups there. So that takes away a bit of the confusion.

>  Oh, the cgroup I wanted to
>   create already exists.  Maybe the previous instance created it and
>   then crashed

This should be the case as long as we stick to the notion of a manager
managing its own sub-hierarchy.

>  or maybe some other program just happened to choose the
>   same name.

Ideally, two programs would have their own sub-hierarchies. And if not,
one of the programs should get a conflict when trying to create the
cgroup and should back off, fail, or give a warning...

> Who owns config knobs in that directory?

IIUC, workman was looking at two types of cgroups. One, called
"partitions", will be created by the library at startup time, and the
library manages their configuration (something like cgconfig.conf).

And individual managers create their own child groups for various
services under that partition and control the config knobs for those
services.

    user-defined-partition
       /       |       \
    virt1    virt2    virt3

So a user should be able to define a partition and control its
configuration using the workman lib. And if multiple virtual machines are
being run in the partition, then they create their own cgroups and libvirt
controls the properties of the virt1, virt2, virt3 cgroups. I thought that
was the understanding when we discussed ownership of config knobs last
time. But things might have changed since then. Workman folks should
be able to shed light on this.
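
As a sketch of that split of ownership (paths, names and values are
illustrative, and a cgroup v1 layout is assumed): the library/admin side
creates the partition and sets its knobs, while the consumer manager only
creates children underneath it:

    import os

    CGROOT = "/sys/fs/cgroup/cpu"                  # one v1 controller mount
    partition = os.path.join(CGROOT, "user-defined-partition")

    # library/admin side: owns the partition and its configuration
    os.makedirs(partition, exist_ok=True)
    with open(os.path.join(partition, "cpu.shares"), "w") as f:
        f.write("512")

    # consumer manager (libvirt here): owns only the per-VM children
    for vm in ("virt1", "virt2", "virt3"):
        os.makedirs(os.path.join(partition, vm), exist_ok=True)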

>  This way lies
>   madness.  I understand why