On Mon, Mar 12, 2007 at 06:12:26PM +0530, Srivatsa Vaddagiri wrote:
> I happened to read the entire thread (@ http://lkml.org/lkml/2007/3/1/159)
> all over again and felt it may be useful to summarize the discussions so far.
> 
> If I have missed any important points or misrepresented someone's view
> (unintentionally, of course!), then I would be glad to be corrected.
> 
> 1. Which task-grouping mechanism?
>       
>       [ This question is the most vital one that needs a consensus ]
> 
> Resource management generally works by applying resource controls over a
> -group- of tasks (tasks of a user, tasks in a vserver/container etc).
> 
> What mechanism do we use to group tasks for resource management purposes?
> 
> Options:
> 
> a. Paul Menage's container(/uh-a-diff-name-pls?) patches
> 
>       The patches introduce a new pointer in task_struct, struct
>       container_group *containers, and a new structure 'struct container'.
> 
>       Tasks pointing to the same 'struct container' object (via their
>       tsk->containers->container[] pointer) are considered to form
>       a group associated with that container. The attributes associated
>       with a container (ex: cpu_limit, rss_limit, cpus/mems_allowed) are 
>       decided by the options passed to the mount command (which binds 
>       one/more/all resource controllers to a hierarchy). A rough sketch
>       of these structures appears at the end of this option.
> 
>       + For workload management, where it is desirable to manage the
>         resource consumption of a run-time-defined (potentially arbitrary)
>         group of tasks, this patch is handy, as no existing pointer in
>         task_struct can be used to form such a run-time-decided group.
> 
>       - (subjective!) If there is an existing grouping mechanism already
>         (say tsk->nsproxy[->pid_ns]) over which resource control needs to
>         be applied, then the new grouping mechanism can be considered
>         redundant (it can eat up unnecessary space in task_struct).
> 
>         What may help avoid this redundancy is to re-build the existing
>         grouping mechanism (say tsk->nsproxy) using the container patches.
>         Serge, however, expressed some doubts about such an implementation
>         (for example: how will one build hierarchical cpusets and
>         non-hierarchical namespaces using that single 'grouping' pointer
>         in task_struct?) and also felt it may slow things down a bit from
>         the namespaces' point of view (more dereferences required to get
>         to a task's namespace).
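> 
>       A rough sketch of the layout described above (field and constant
>       names here are illustrative, not necessarily those in the actual
>       patches):
> 
>               /* per-hierarchy node; holds the controller parameters */
>               struct container {
>                       unsigned long cpu_limit;   /* if cpu ctlr is bound */
>                       unsigned long rss_limit;   /* if mem ctlr is bound */
>                       /* cpus/mems_allowed, parent, children, ... */
>               };
> 
>               /* shared by all tasks with identical container membership */
>               struct container_group {
>                       struct container *container[MAX_SUBSYS];
>                       atomic_t refcount;
>               };
> 
>               struct task_struct {
>                       /* ... */
>                       struct container_group *containers;
>                       /* ... */
>               };
> 
>       Two tasks t1 and t2 are in the same group w.r.t. a controller 'id'
>       iff t1->containers->container[id] == t2->containers->container[id].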
> 
> b. Reuse existing pointers in task_struct, tsk->nsproxy or, better perhaps, 
>    tsk->nsproxy->pid_ns, as the means to group tasks (rcfs patches)
> 
>       This is based on the observation that the group of tasks whose
>       resource consumption needs to be managed is already defined in the
>       kernel by existing pointers (either tsk->nsproxy or
>       tsk->nsproxy->pid_ns); a sketch follows below.
> 
>       + reuses an existing grouping mechanism in the kernel
> 
>       - mixes resource control and namespaces (?)
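> 
>       A minimal sketch of the rcfs idea (the ctlr_data[] field is taken
>       from the access path shown in point 2 below; everything else is
>       assumed for illustration):
> 
>               struct nsproxy {
>                       atomic_t count;
>                       struct pid_namespace *pid_ns;
>                       /* uts_ns, ipc_ns, mnt_ns, ... as today */
>                       void *ctlr_data[MAX_SUBSYS];  /* new: per-controller
>                                                        resource parameters */
>               };
> 
>       A cpu controller would then reach its parameters via
>       tsk->nsproxy->ctlr_data[cpu_ctlr.subsys_id], with no new pointer
>       added to task_struct.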
> 
> c. Introduce yet another new structure ('struct res_ctl'?) which houses 
>    resource control (and possibly pid_ns?) parameters, plus a new pointer 
>    to this structure in task_struct (Herbert Poetzl).
> 
>       Tasks that have a pointer to the same 'struct res_ctl' are
>       considered to form a group for resource management purposes
>       (sketched below).
> 
>       + Accessing resource control information in the scheduler fast
>         path is optimized (only two dereferences required)
> 
>       - If all resource control parameters (cpu, memory, io etc) are
>         lumped together in the same structure, it makes it hard to
>         have resource classes (cpu, mem etc) that are independent of
>         each other.
> 
>       - If we introduce several pointers in task_struct to allow
>         separation of resource classes, then it will increase storage
>         space in task_struct and also fork time (we have to take a
>         reference count on more than one object now). Herbert thinks
>         this is a worthwhile tradeoff for the benefit gained in the
>         scheduler fast path.
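> 
>       Roughly, Herbert's variant (again just a sketch; 'struct res_ctl'
>       and its fields are placeholders):
> 
>               /* all resource parameters lumped into one object */
>               struct res_ctl {
>                       atomic_t refcount;
>                       unsigned long cpu_limit;
>                       unsigned long rss_limit;
>                       /* io, network limits, possibly pid_ns, ... */
>               };
> 
>               struct task_struct {
>                       /* ... */
>                       struct res_ctl *res_ctl;  /* tsk->res_ctl->cpu_limit
>                                                    is two dereferences */
>                       /* ... */
>               };
> 
>       Splitting this into independent classes would mean one such pointer
>       per class (tsk->cpu_ctl, tsk->mem_ctl, ...), which is exactly the
>       extra task_struct space and fork-time refcounting mentioned above.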

what about identifying different resource categories and
handling them according to the typical usage pattern?

like the following:

 - cpu and scheduler related accounting/limits
 - memory related accounting/limits
 - network related accounting/limits
 - generic/file system related accounting/limits

I don't worry too much about having the generic/file stuff
attached to the nsproxy, but the cpu/sched stuff might be
better off being directly reachable from the task
(the memory related stuff might be placed in a zone or so)
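
very roughly, that split could look like this (all names invented,
just to illustrate the idea):

	struct task_struct {
		/* ... */
		struct cpu_limits *cpu_ctl;	/* hot path: reachable
						   directly from the task */
		struct nsproxy *nsproxy;	/* generic/file stuff can
						   hang off this, as today */
		/* ... */
	};

	struct zone {
		/* ... */
		struct mem_limits *mem_ctl;	/* memory accounting/limits
						   kept near the allocator */
		/* ... */
	};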

> 2. Where do we put resource control parameters for a group?
> 
>       This depends on 1. So the options are:
> 
> a. Paul Menage's patches:
> 
>       (tsk->containers->container[cpu_ctlr.subsys_id] - X)->cpu_limit
> 
>    An optimized version of the above is:
>       (tsk->containers->subsys[cpu_ctlr.subsys_id] - X)->cpu_limit
> 
> 
> b. rcfs
>       tsk->nsproxy->ctlr_data[cpu_ctlr.subsys_id]->cpu_limit
> 
> c. Herbert's proposal
>       tsk->res_ctl->cpu_limit
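> 
>    For the scheduler fast path, the difference is mostly dereference
>    depth. Ignoring the '- X' offset adjustment, illustrative accessors
>    (names assumed, not from any of the patches) would be:
> 
>       /* a. Paul Menage: task -> container_group -> subsys[] -> limit */
>       static inline unsigned long cpu_limit_a(struct task_struct *t)
>       {
>               return t->containers->subsys[cpu_subsys_id]->cpu_limit;
>       }
> 
>       /* b. rcfs: task -> nsproxy -> ctlr_data[] -> limit */
>       static inline unsigned long cpu_limit_b(struct task_struct *t)
>       {
>               struct cpu_limits *cl = t->nsproxy->ctlr_data[cpu_subsys_id];
>               return cl->cpu_limit;
>       }
> 
>       /* c. Herbert: task -> res_ctl -> limit (one less hop) */
>       static inline unsigned long cpu_limit_c(struct task_struct *t)
>       {
>               return t->res_ctl->cpu_limit;
>       }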

see above, but yes ...

> 3. How are cpusets related to vserver/containers?
> 
>       Should it be possible to, let's say, create exclusive cpusets and
>       attach containers to different cpusets?

that is what Linux-VServer does atm, i.e. you can put
an entire guest into a specific cpuset

> 4. Interface
>       Filesystem vs system call 
> 
>       Filesystem:
>               + natural way to represent hierarchical data
>               + file permission model convenient for delegating
>                 management of part of a tree to one user
>               + Ease of use with scripts
> 
>               (from Herbert Poetzl):
> 
>               - performance of filesystem interfaces is quite bad
>               - you need to do a lot to make the fs consistent for
>                 e.g. find and friends (regarding links and file size)
>               - you have quite a hard time doing atomic operations
>                 (except via the ioctl interface, which nobody likes)
>               - vfs/mnt namespaces complicate access to this new
>                 filesystem once you start moving around (between
>                 the spaces)
> 
> 
> 5. If we use a filesystem interface, should it be in /proc? (Eric)
> 
>       - /proc doesn't allow the flexibility of, say, creating multiple
>         hierarchies and binding different resource controllers to each
>         hierarchy
> 
> 6. As tasks move around namespaces/resource-classes, their
>    tsk->nsproxy/containers object will change. Do we simply create
>    a new nsproxy/containers object, or optimize storage by searching
>    for one which matches the task's new requirements?
> 
>       - Linux-VServer follows the former approach, i.e. it simply creates
>         a new nsproxy with pointers to the required namespace objects

which I consider suboptimal, but it was straightforward
to implement ...
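
the "search for a matching one" variant could look something like
this (helper names are made up, just to show the idea):

	/* reuse an existing nsproxy whose member pointers all match,
	 * instead of always duplicating on every namespace/class move */
	static struct nsproxy *find_or_create_nsproxy(struct nsproxy *tmpl)
	{
		struct nsproxy *ns;

		ns = nsproxy_hash_lookup(tmpl);	/* compare member pointers */
		if (ns) {
			get_nsproxy(ns);	/* share the existing object */
			return ns;
		}
		return dup_nsproxy(tmpl);	/* no match: create a copy */
	}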

> 7. Hierarchy
> 
>       - For resource management, do we need to worry about hierarchy at all?
> 
>               - If we consider cpuset to be a resource controller,
>                 then we have one resource controller that already
>                 supports hierarchy
> 
>               - If we don't support hierarchy in resource controllers
>                 today but were to add that support later, then the
>                 user interface shouldn't change. That's why designing
>                 -at least- the user interface to support hierarchy
>                 may make sense
> 
>       - Do we let resource classes be split independently of each other?
> 
>         For example: CPU resource classes are independent of memory
>         resource classes. This in turn affects whether Paul Menage's
>         patches need to support the multiple-hierarchy feature.

thanks,
Herbert

> -- 
> Regards,
> vatsa