On Sat, Feb 23, 2008 at 4:09 AM, Paul Jackson <[EMAIL PROTECTED]> wrote:
> A couple of proposals have been made recently by people working on
> Linux for smaller systems, for improving realtime isolation and memory
> pressure handling:
>
> (1) cpu isolation for hard(er) realtime
>         http://lkml.org/lkml/2008/2/21/517
>         Max Krasnyanskiy <[EMAIL PROTECTED]>
>         [PATCH sched-devel 0/7] CPU isolation extensions
>
> (2) notify user space of tight memory
>         http://lkml.org/lkml/2008/2/9/144
>         KOSAKI Motohiro <[EMAIL PROTECTED]>
>         [PATCH 0/8][for -mm] mem_notify v6
>
> In both cases, some of us have responded "why not use cpusets", and the
> original submitters have replied "cpusets are too fat" (well, they
> were more diplomatic than that, but I guess I can say that ;)
Having read those threads, it looks to me as though:

- the parts of Max's problem that would be solved by cpusets can be
  mostly accomplished just via sched_setaffinity() (a minimal sketch
  appears at the end of this message)

- Motohiro wants to add a new system-wide API that you would also like
  to have available on a per-cpuset basis. (Why not just add two access
  points for the same feature?)

I don't think that either of these would be enough to justify big
changes to cpusets or cgroups, although eliminating bloat is always a
good thing.

> The primary semantic limit I'd suggest would be supporting exactly
> one layer depth of cpusets, not a full hierarchy.  So one could still
> successfully issue from user space 'mkdir /dev/cpuset/foo', but trying
> to do 'mkdir /dev/cpuset/foo/bar' would fail.  This reminds me of
> very early FAT file systems, which had just a single, fixed size
> root directory ;).  There might even be a configurable fixed upper
> limit on how many /dev/cpuset/* directories were allowed, further
> simplifying the locking and dynamic memory behavior of this apparatus.

I'm not sure that either of these would make much difference to the
overall footprint.  A single layer of cpusets would allow you to
simplify validate_change() but not much else.  I don't see how a fixed
upper limit on the number of cpusets makes the locking sufficiently
simpler to save much code.

> How this extends to cgroups I don't know; for now I suspect that most
> cgroup module development is motivated by the needs of larger systems,
> not smaller systems.  However, cpusets is now a module client of
> cgroups, and it is cgroups that now provides cpusets with its interface
> to the vfs infrastructure.  It would seem unfortunate if this relation
> were not continued with tiny cpusets.  Perhaps someone can imagine a
> tiny cgroups?  This might be the most difficult part of this proposal.

If we wanted to go this way, I can imagine a cgroups config option that
forces just a single hierarchy, which would allow a bunch of
simplifications that would save plenty of text.

> Looking at some IA64 sn2 config builds I have lying around, I see the
> following text sizes for a couple of versions, showing the growth of
> the cpuset/cgroup apparatus over time:
>
>     25933   2.6.18-rc3-mm1/kernel/cpuset.o   (Aug 2006)
> vs.
>     37823   2.6.25-rc2-mm1/kernel/cgroup.o   (Feb 2008)
>     19558   2.6.25-rc2-mm1/kernel/cpuset.o
>
> So the total has grown from 25933 to 57381 text bytes (note that
> this is IA64 arch; most arch's will have proportionately smaller
> text sizes.)

On x86_64 they're:

    cgroup.o: 17348
    cpuset.o:  8533

Paul
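
[For concreteness, here is a minimal user-space sketch of the
sched_setaffinity() approach mentioned above.  The choice of CPU 3 and
the use of the calling task (pid 0) are illustrative assumptions only,
not anything taken from Max's patch series.]

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        cpu_set_t mask;

        /* Run this task only on CPU 3 (an arbitrary, illustrative choice). */
        CPU_ZERO(&mask);
        CPU_SET(3, &mask);

        /* pid 0 means "the calling task". */
        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
                perror("sched_setaffinity");
                exit(EXIT_FAILURE);
        }

        /*
         * From here on the scheduler keeps this task on CPU 3.  Keeping
         * CPU 3 out of every other task's mask (e.g. via the isolcpus=
         * boot option, or by setting their affinity the same way) gives
         * most of the isolation effect without any cpuset directories.
         */
        return 0;
}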