Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Tue, Mar 03, 2015 at 12:31:19AM +1100, Aleksa Sarai wrote:
> > If 16-bit PID's aren't a concern anymore, then why do we still default to
> > treating it like a 16-bit signed int (the default for
> > /proc/sys/kernel/pid_max is 32768)?
>
> I just want to emphasise that *even if* we changed to another default
> limit, the mere existence of a system-wide pid_max makes PIDs a
> resource.

We seem to fail to communicate. The primary reason why pid promotes itself to a global resource status is because it's globally capped way below its backing resource's (kernel memory) limit and it is very difficult to make it not so due to direct userland dependencies on it.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Mon, Mar 02, 2015 at 08:13:23AM -0500, Austin S Hemmelgarn wrote:
> If 16-bit PID's aren't a concern anymore, then why do we still default to
> treating it like a 16-bit signed int (the default for
> /proc/sys/kernel/pid_max is 32768)?

Inertia. It has to start there for backward compatibility. Now it's trivial to adjust dynamically and the majority of users don't need to worry about it, so there's no pressing reason to bump it up by default.

16bit pid_t was already a dying breed on 32bit config and it never was an option on 64bit. Any remotely modern distros in the past decade, whether 32 or 64bit, wouldn't have any problem with it. The only possibly problematic case would be legacy code which for some reason explicitly used 16bit integer types instead of pid_t, but at this point, we shouldn't be basing any design decisions on that. If anybody is still depending on that, there are different ways to deal with the issue on their end including namespacing its pid space.

Thanks.

--
tejun
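As a rough illustration of the 16-bit question discussed above, the sketch below reads the current limit from /proc/sys/kernel/pid_max and checks whether the largest PID the kernel can hand out still fits in a signed 16-bit integer. The helper names (`read_pid_max`, `fits_in_int16`) are made up for this example, and it assumes, as proc(5) describes, that pid_max is the wrap-around value, one greater than the largest PID.

```python
from pathlib import Path

INT16_MAX = 2**15 - 1  # 32767, largest value of a signed 16-bit integer


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the kernel's pid_max as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def fits_in_int16(pid_max):
    """pid_max is the wrap-around value, so the largest PID is pid_max - 1."""
    return pid_max - 1 <= INT16_MAX


if __name__ == "__main__":
    current = read_pid_max()
    if current is None:
        print("pid_max is not readable on this system")
    else:
        print(f"pid_max={current}, fits in int16: {fits_in_int16(current)}")
```

With the default of 32768 the largest PID is 32767, which is exactly the signed 16-bit maximum; any larger setting breaks code that stores PIDs in a short.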
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
> If 16-bit PID's aren't a concern anymore, then why do we still default to
> treating it like a 16-bit signed int (the default for
> /proc/sys/kernel/pid_max is 32768)?

I just want to emphasise that *even if* we changed to another default limit, the mere existence of a system-wide pid_max makes PIDs a resource.

--
Aleksa Sarai (cyphar)
www.cyphar.com
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On 2015-02-28 11:43, Tejun Heo wrote:
> Hello, Tim.
>
> On Sat, Feb 28, 2015 at 08:38:07AM -0800, Tim Hockin wrote:
>> I know there is not much concern for legacy-system problems, but it is
>> worth adding this case - there are systems that limit PIDs for other
>> reasons, eg broken infrastructure that assumes PIDs fit in a short int,
>> hypothetically. Given such a system, PIDs become precious and limiting
>> them per job is important.
>>
>> My main point being that there are less obvious considerations in play than
>> just memory usage.
>
> Sure, there are those cases but it'd be unwise to hinge long term
> decisions on them. It's hard to even argue 16bit pid in legacy code
> as a significant contributing factor at this point.
>
> At any rate, it seems that pid is a global resource which needs to be
> provisioned for reasonable isolation which is a good reason to
> consider controlling it via cgroups.

If 16-bit PID's aren't a concern anymore, then why do we still default to treating it like a 16-bit signed int (the default for /proc/sys/kernel/pid_max is 32768)?
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Feb 28, 2015 2:50 PM, "Tejun Heo" wrote:
>
> On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> > Wow, so much anger. I'm not even sure how to respond, so I'll just
> > say this and sign off. All I want is a better, friendlier, more
> > useful system overall. We clearly have different ways of looking at
> > the problem.
>
> Can you communicate anything w/o passive aggression? If you have a
> technical point, just state that. Can you at least agree that we
> shouldn't be making design decisions based on 16bit pid_t?

Hmm, I have screwed this thread up, I think. I've made some remarks that did not come through with the proper tongue-in-cheek slant. I'm not being passive aggressive - we DO look at this problem differently.

OF COURSE we should not make decisions based on ancient artifacts of history. My point was that there are secondary considerations here - PIDs are more than just the memory that backs them. They _ARE_ a constrained resource, and you shouldn't assume the constraint is just physical memory. It is a piece of policy that is outside the control of the kernel proper - we handed those keys to userspace a long time ago.

Given that, I believe and have believed that the solution should model the problem as the user perceives it - limiting PIDs - rather than attaching to a solution-by-proxy.

Yes a solution here partially overlaps with kmemcg, but I don't think that is a significant problem. They are different policies governing behavior that may result in the same condition, but for very different reasons. I do not think that is particularly bad for overall comprehension, and I think the fact that this popped up yet again indicates the existence of some nugget of user experience that is worth paying consideration to.

I appreciate your promised consideration through a slightly refocused lens. I will go back to my cave and do something I hope is more productive and less antagonistic. I did not mean to bring out so much vitriol.

Tim
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote: > On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo wrote: > > > > On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote: > > > I am sorry that real-user problems are not perceived as substantial. This > > > was/is a real issue for us. Being in limbo for years on end might not be > > > a > > > technical point, but I do think it matters, and that was my point. > > > > It's a problem which is localized to you and caused by the specific > > problems of your setup. This isn't a wide-spread problem at all and > > the world doesn't revolve around you. If your setup is so messed up > > as to require sticking to 16bit pids, handle that locally. If > > something at larger scale eases that handling, you get lucky. If not, > > it's *your* predicament to deal with. The rest of the world doesn't > > exist to wipe your ass. > > Wow, so much anger. Yeah, quite surprising after such an intellectually honest discussion: : On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote: : > At least 3 or 4 people have INDEPENDENTLY decided this is what is : > causing them pain and tried to fix it and invested the time to send a : > patch says that it is actually a thing. There exists a problem that : > you are disallowing to be fixed. Do you recognize that users are : > experiencing pain? Why do you hate your users? :) [...] : > Are you willing to put a drop-dead date on it? If we don't have : > kmemcg working well enough to _actually_ bound PID usage and FD usage : > by, say, June 1st, will you then accept a patch to this effect? If : > the answer is no, then I have zero faith that it's coming any time : > soon - I heard this 2 years ago. I believed you then. > I'm not even sure how to respond, so I'll just say this and sign > off. All I want is a better, friendlier, more useful system > overall. We clearly have different ways of looking at the problem. 
Overlapping features and inconsistent userspace interfaces are only better for the people that pick the hacks. They are the opposite of friendly and useful. They are also horrible to maintain, which could be a reason why you constantly disagree with the people that cleaned up this unholy mess and are now trying to keep a balance between your short term interests and the long-term health of the Linux kernel.
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Sat, Feb 28, 2015 at 02:26:58PM -0800, Tim Hockin wrote:
> Wow, so much anger. I'm not even sure how to respond, so I'll just
> say this and sign off. All I want is a better, friendlier, more
> useful system overall. We clearly have different ways of looking at
> the problem.

Can you communicate anything w/o passive aggression? If you have a technical point, just state that. Can you at least agree that we shouldn't be making design decisions based on 16bit pid_t?

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Sat, Feb 28, 2015 at 8:57 AM, Tejun Heo wrote:
>
> On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> > I am sorry that real-user problems are not perceived as substantial. This
> > was/is a real issue for us. Being in limbo for years on end might not be a
> > technical point, but I do think it matters, and that was my point.
>
> It's a problem which is localized to you and caused by the specific
> problems of your setup. This isn't a wide-spread problem at all and
> the world doesn't revolve around you. If your setup is so messed up
> as to require sticking to 16bit pids, handle that locally. If
> something at larger scale eases that handling, you get lucky. If not,
> it's *your* predicament to deal with. The rest of the world doesn't
> exist to wipe your ass.

Wow, so much anger. I'm not even sure how to respond, so I'll just say this and sign off. All I want is a better, friendlier, more useful system overall. We clearly have different ways of looking at the problem.

No antagonism intended

Tim
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Sat, Feb 28, 2015 at 08:48:12AM -0800, Tim Hockin wrote:
> I am sorry that real-user problems are not perceived as substantial. This
> was/is a real issue for us. Being in limbo for years on end might not be a
> technical point, but I do think it matters, and that was my point.

It's a problem which is localized to you and caused by the specific problems of your setup. This isn't a wide-spread problem at all and the world doesn't revolve around you. If your setup is so messed up as to require sticking to 16bit pids, handle that locally. If something at larger scale eases that handling, you get lucky. If not, it's *your* predicament to deal with. The rest of the world doesn't exist to wipe your ass.

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello, Tim.

On Sat, Feb 28, 2015 at 08:38:07AM -0800, Tim Hockin wrote:
> I know there is not much concern for legacy-system problems, but it is
> worth adding this case - there are systems that limit PIDs for other
> reasons, eg broken infrastructure that assumes PIDs fit in a short int,
> hypothetically. Given such a system, PIDs become precious and limiting
> them per job is important.
>
> My main point being that there are less obvious considerations in play than
> just memory usage.

Sure, there are those cases but it'd be unwise to hinge long term decisions on them. It's hard to even argue 16bit pid in legacy code as a significant contributing factor at this point.

At any rate, it seems that pid is a global resource which needs to be provisioned for reasonable isolation which is a good reason to consider controlling it via cgroups.

Thanks.

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello, Aleksa.

On Sat, Feb 28, 2015 at 08:26:34PM +1100, Aleksa Sarai wrote:
> I just want to quickly echo my support for this statement. Process IDs
> aren't limited by kernel memory, they're a hard-set limit. Thus they are

Process IDs become a hard global resource because we didn't switch to long during 64bit transition and put an artificial global limit on it, which allows it to affect system-wide operation while its memory consumption is staying within practical range.

> a resource like other global resources (open files, etc). Now, while you

Unlike open files.

> can argue that it is possible to limit the amount of *effective*
> processes you can use in a cgroup through kmemcg (by limiting the amount
> of memory spent in storing task_struct data) -- that isn't limiting the
> usage of the *actual* resource (the fact you're limiting the number of
> PIDs is little more than a by-product).

No, the problem is not that. The problem is that pid_t, as a resource, is decoupled from its backing resource - memory - by the extra artificial and difficult-to-overcome limit put on it. You are saying something which is completely different from what Austin was arguing.

> Also, if it wasn't an actual resource then why is RLIMIT_NPROC a thing?

One strong reason would be because we didn't have a way to account for and limit the fundamental resources. If you can fully contain and control the consumption via rationing the underlying resource, there isn't much point in controlling the upper layer constructs.

> To me, that indicates that PID limiting is not an esoteric usecase and it
> should be possible to use the Linux kernel's home-grown accounting
> system to limit the number of PIDs in a cgroup. Otherwise you're stuck

Again, I think it's a lot more indicative of the fact that we didn't have any way to control kernel side memory consumption and pids and open files were one of the things which are relatively easy to implement policy-wise.

> in a weird world where you *can* limit the number of processes in a
> process tree but *not* the number of processes in a cgroup.

I'm not sold on the idea of replicating the features of ulimit in cgroups. ulimit is a mixed bag of relatively easily implementable resource limits and their behaviors are a combination of resource limits, per-user usage policies, and per-process behavior safety nets. The only part translatable to cgroups is the actual resource related part and even among those we should identify what are actual resources which can't be mapped to consumption of other fundamental resources.

> >> In general, I'm pretty strongly against adding controllers for things
> >> which aren't fundamental resources in the system. What's next? Open
> >> files? Pipe buffer? Number of flocks? Number of session leaders or
> >> program groups?
> >>
> > PID's are a fundamental resource, you run out and it's an only marginally
> > better situation than OOM, namely, if you don't already have a shell open
> > which has kill builtin (because you can't fork), or have some other reliable
> > way to terminate processes without forking, you are stuck either waiting for
> > the problem to resolve itself, or have to reset the system.
>
> I couldn't agree more. PIDs are a fundamental resource because there is
> a hard limit on the amount of PIDs you can have in any one system. Once
> you've exhausted that limit, there's not much you can do apart from
> doing the SYSRQ dance.

The reason why this holds is because we can hit the global limit way earlier than practically sized kmem consumption limits can kick in.

Thanks.

--
tejun
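The closing point, that the global PID cap can be hit well before a memory limit kicks in, can be sketched with a little arithmetic. Everything numeric here apart from the 32768 default is an assumption made for illustration: the per-task kernel memory cost of 8 KiB is a placeholder, not a measured figure, and the function names are hypothetical.

```python
DEFAULT_PID_MAX = 32768  # default pid_max quoted in this thread
KIB = 1024
MIB = 1024 ** 2

# ASSUMPTION: illustrative kernel memory cost per task, not a measured value.
PER_TASK_BYTES = 8 * KIB


def tasks_allowed_by_memory(kmem_budget_bytes, per_task_bytes):
    """How many tasks a kernel memory budget would permit."""
    return kmem_budget_bytes // per_task_bytes


def first_limit_hit(pid_max, kmem_budget_bytes, per_task_bytes):
    """Return which cap a fork-heavy workload runs into first."""
    by_memory = tasks_allowed_by_memory(kmem_budget_bytes, per_task_bytes)
    # PIDs run up to pid_max - 1, so that bounds the number of tasks.
    return "pid_max" if pid_max - 1 < by_memory else "memory"


if __name__ == "__main__":
    for budget_mib in (128, 1024):
        winner = first_limit_hit(DEFAULT_PID_MAX, budget_mib * MIB, PER_TASK_BYTES)
        print(f"{budget_mib} MiB kmem budget: {winner} is exhausted first")
```

Under these assumed numbers a 1 GiB kernel memory budget would allow 131072 tasks, so the default pid_max is exhausted long before the memory limit matters.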
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
> I wouldn't think that preventing PID exhaustion would be all that much of a
> niche case, it's fully possible for it to happen without using excessive
> amounts of kernel memory (think about BIG server systems with terabytes of
> memory running (arguably poorly written) forking servers that handle tens of
> thousands of client requests per second, each lasting multiple tens of
> seconds), and not necessarily as trivial as you might think to handle sanely
> (especially if you want callbacks when the limits get hit).
> As far as being trivial to achieve, I'm assuming you are referring to rlimit
> and PAM's limits module, both of which have their own issues. Using
> pam_limits.so to limit processes isn't trivial because it requires calling
> through PAM to begin with, which almost no software that isn't login related
> does, and rlimits are tricky to set up properly with the granularity that
> having a cgroup would provide.

I just want to quickly echo my support for this statement. Process IDs aren't limited by kernel memory, they're a hard-set limit. Thus they are a resource like other global resources (open files, etc). Now, while you can argue that it is possible to limit the amount of *effective* processes you can use in a cgroup through kmemcg (by limiting the amount of memory spent in storing task_struct data) -- that isn't limiting the usage of the *actual* resource (the fact you're limiting the number of PIDs is little more than a by-product).

Also, if it wasn't an actual resource then why is RLIMIT_NPROC a thing? To me, that indicates that PID limiting is not an esoteric usecase and it should be possible to use the Linux kernel's home-grown accounting system to limit the number of PIDs in a cgroup. Otherwise you're stuck in a weird world where you *can* limit the number of processes in a process tree but *not* the number of processes in a cgroup.

>> In general, I'm pretty strongly against adding controllers for things
>> which aren't fundamental resources in the system. What's next? Open
>> files? Pipe buffer? Number of flocks? Number of session leaders or
>> program groups?
>>
> PID's are a fundamental resource, you run out and it's an only marginally
> better situation than OOM, namely, if you don't already have a shell open
> which has kill builtin (because you can't fork), or have some other reliable
> way to terminate processes without forking, you are stuck either waiting for
> the problem to resolve itself, or have to reset the system.

I couldn't agree more. PIDs are a fundamental resource because there is a hard limit on the amount of PIDs you can have in any one system. Once you've exhausted that limit, there's not much you can do apart from doing the SYSRQ dance.

--
Aleksa Sarai (cyphar)
www.cyphar.com
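For readers who have not used RLIMIT_NPROC, here is a minimal sketch of inspecting it from Python's standard `resource` module. The helper names are invented for this example. Note that, per setrlimit(2), the limit is accounted against the real user ID of the calling process, which is part of why it does not map onto an arbitrary group of processes the way a cgroup would.

```python
import resource


def describe(limit):
    """Render an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def nproc_limits():
    """Return (soft, hard) for RLIMIT_NPROC, or None where it is not defined."""
    rlimit = getattr(resource, "RLIMIT_NPROC", None)
    if rlimit is None:
        return None
    return resource.getrlimit(rlimit)


if __name__ == "__main__":
    limits = nproc_limits()
    if limits is None:
        print("RLIMIT_NPROC is not available on this platform")
    else:
        soft, hard = limits
        print(f"RLIMIT_NPROC soft={describe(soft)} hard={describe(hard)}")
```

The values are inherited across fork, so a parent can lower the soft limit for its children, but the count it is compared against is per user rather than per job.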
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 01:45:09PM -0800, Tim Hockin wrote:
> Are you willing to put a drop-dead date on it? If we don't have
> kmemcg working well enough to _actually_ bound PID usage and FD usage
> by, say, June 1st, will you then accept a patch to this effect? If
> the answer is no, then I have zero faith that it's coming any time
> soon - I heard this 2 years ago. I believed you then.

Tim, cut this bullshit. That's not how kernel development works. Contribute to technical discussion or shut it. I'm really getting tired of your whining without any useful substance.

> I see further downthread that you said you'll think about it. Thank
> you. Just because our use cases are not normal does not mean we're
> not valid :)

And can you even see why that made progress?

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 9:45 AM, Tejun Heo wrote:
> On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
>> > In general, I'm pretty strongly against adding controllers for things
>> > which aren't fundamental resources in the system. What's next? Open
>> > files? Pipe buffer? Number of flocks? Number of session leaders or
>> > program groups?
>>
>> Yes to some or all of those. We do exactly this internally and it has
>> greatly added to the stability of our overall container management
>> system. and while you have been telling everyone to wait for kmemcg,
>> we have had an extra 3+ years of stability.
>
> Yeah, good job. I totally get why kernel part of memory consumption
> needs protection. I'm not arguing against that at all.

You keep shifting the focus to be about memory, but that's not what people are asking for. You're letting the desire for a perfect solution (which is years late) block good solutions that exist NOW.

>> > If you want to prevent a certain class of jobs from exhausting a given
>> > resource, protecting that resource is the obvious thing to do.
>>
>> I don't follow your argument - isn't this exactly what this patch set
>> is doing - protecting resources?
>
> If you have proper protection over kernel memory consumption, this is
> completely covered because memory is the fundamental resource here.
> Controlling distribution of those fundamental resources is what
> cgroups are primarily about.

You say that's what cgroups are about, but it's not at all obvious that you are right. What users, admins, systems people want is building blocks that are usable and make sense. Limiting kernel memory is NOT the logical building block, here. It's not something people can reason about or quantify easily. If you need to implement the interfaces in terms of memory, go nuts, but making users think like that is just not right.

>> > Wasn't it like a year ago? Yeah, it's taking longer than everybody
>> > hoped but seriously kmemcg reclaimer just got merged and also did the
>> > new memcg interface which will tie kmemcg and memcg together.
>>
>> By my email it was almost 2 years ago, and that was the second or
>> third incarnation of this patch.
>
> Again, I agree this is taking a while. Memory people had to retool
> the whole reclamation path to make this work, which is the pattern
> being repeated across the different controllers - we're refactoring a
> lot of infrastructure code so that resource control can integrate with
> the regular operation of the kernel, which BTW is what we should have
> been doing from the beginning.
>
> If your complaint is that this is taking too long, I hear you, and
> there's a certain amount of validity in arguing that upstreaming a
> temporary measure is the better trade-off, but the rationale for nproc
> (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

The fact that at least 3 or 4 people have INDEPENDENTLY decided this is what is causing them pain, tried to fix it, and invested the time to send a patch says that it is actually a thing. There exists a problem that you are disallowing to be fixed. Do you recognize that users are experiencing pain? Why do you hate your users? :)

> And as for the different incarnations of this patchset. Reposting the
> same stuff repeatedly doesn't really change anything. Why would it?

Because reasonable people might survey the ecosystem and say "humm, things have changed over the years - isolation has become a pretty serious topic". Or maybe they hope that you'll finally agree that fixing the problem NOW is worthwhile, even if the solution is imperfect, and that a more perfect solution will arrive.

>> >> Something like this is long overdue, IMO, and is still more
>> >> appropriate and obvious than kmemcg anyway.
>> >
>> > Thanks for chiming in again but if you aren't bringing out anything
>> > new to the table (I don't remember you doing that last time either),
>> > I'm not sure why the decision would be different this time.
>>
>> I'm just vocalizing my support for this idea in defense of practical
>> solutions that work NOW instead of "engineering ideals" that never
>> actually arrive.
>>
>> As containers take the server world by storm, stuff like this gets
>> more and more important.
>
> Again, protection of kernel side memory consumption is important.
> There's no question about that. As for the never-arriving part, well,
> it is arriving. If you still can't believe, just take a look at the
> code.

Are you willing to put a drop-dead date on it? If we don't have kmemcg working well enough to _actually_ bound PID usage and FD usage by, say, June 1st, will you then accept a patch to this effect? If the answer is no, then I have zero faith that it's coming any time soon - I heard this 2 years ago. I believed you then.

I see further downthread that you said you'll think about it. Thank you. Just because our use cases are not normal does not mean we're not valid :)

Tim
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello, Austin.

On Fri, Feb 27, 2015 at 01:49:53PM -0500, Austin S Hemmelgarn wrote:
> As far as being trivial to achieve, I'm assuming you are referring to rlimit
> and PAM's limits module, both of which have their own issues. Using
> pam_limits.so to limit processes isn't trivial because it requires calling
> through PAM to begin with, which almost no software that isn't login related
> does, and rlimits are tricky to set up properly with the granularity that
> having a cgroup would provide.
...
> PID's are a fundamental resource, you run out and it's an only marginally
> better situation than OOM, namely, if you don't already have a shell open
> which has kill builtin (because you can't fork), or have some other reliable
> way to terminate processes without forking, you are stuck either waiting for
> the problem to resolve itself, or have to reset the system.

Right, this is a lot more valid argument. Currently, we're capping max pid at 4M which translates to some tens of gigs of memory which isn't a crazy amount on modern machines. The hard(er) barrier would be around 2^30 (2^29 from futex side, apparently) which would also be reachable on configurations w/ terabytes of memory.

I'll think more about it and get back.

Thanks a lot.

--
tejun
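The "4M PIDs is some tens of gigs" estimate can be reproduced with a back-of-the-envelope calculation. The per-task cost below (8 KiB) is an assumption chosen only so the arithmetic is concrete; the real figure depends on kernel version and configuration, and the names are hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

PID_CAP_4M = 4 * 1024 * 1024  # the 4M cap mentioned in the message above
HARD_BARRIER = 2 ** 30        # the harder barrier mentioned in the message above

# ASSUMPTION: illustrative kernel memory cost per task, not a measured value.
PER_TASK_BYTES = 8 * KIB


def memory_for_tasks(task_count, per_task_bytes):
    """Total bytes pinned if every task costs per_task_bytes of kernel memory."""
    return task_count * per_task_bytes


if __name__ == "__main__":
    for label, count in (("4M cap", PID_CAP_4M), ("2^30 barrier", HARD_BARRIER)):
        gib = memory_for_tasks(count, PER_TASK_BYTES) / GIB
        print(f"{label}: {count} tasks -> {gib:.0f} GiB")
```

With that assumed cost, 4M tasks come to 32 GiB and 2^30 tasks to 8 TiB, which matches the orders of magnitude given in the message.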
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On 2015-02-27 12:06, Tejun Heo wrote: Hello, On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote: Kernel memory consumption isn't the only valid reason to want to limit the number of processes in a cgroup. Limiting the number of processes is very useful to ensure that a program is working correctly (for example, the NTP daemon should (usually) have an _exact_ number of children if it is functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_ children), to prevent PID number exhaustion, to head off DoS attacks against forking network servers before they get to the point of causing kmem exhaustion, and to limit the number of processes in a cgroup that uses lots of kernel memory very infrequently. All the use cases you're listing are extremely niche and can be trivially achieved without introducing another cgroup controller. Not only that, they're actually pretty silly. Let's say NTP daemon is misbehaving (or its code changed w/o you knowing or there are corner cases which trigger extremely infrequently). What do you exactly achieve by rejecting its fork call? It's just adding another variation to the misbehavior. It was misbehaving before and would now be continuing to misbehave after a failed fork. I wouldn't think that preventing PID exhaustion would be all that much of a niche case, it's fully possible for it to happen without using excessive amounts of kernel memory (think about BIG server systems with terabytes of memory running (arguably poorly written) forking servers that handle tens of thousands of client requests per second, each lasting multiple tens of seconds), and not necessarily as trivial as you might think to handle sanely (especially if you want callbacks when the limits get hit). As far as being trivial to achieve, I'm assuming you are referring to rlimit and PAM's limits module, both of which have their own issues. 
Using pam_limits.so to limit processes isn't trivial because it requires calling through PAM to begin with, which almost no software that isn't login related does, and rlimits are tricky to set up properly with the granularity that having a cgroup would provide. In general, I'm pretty strongly against adding controllers for things which aren't fundamental resources in the system. What's next? Open files? Pipe buffer? Number of flocks? Number of session leaders or program groups? PID's are a fundamental resource, you run out and it's an only marginally better situation than OOM, namely, if you don't already have a shell open which has kill builtin (because you can't fork), or have some other reliable way to terminate processes without forking, you are stuck either waiting for the problem to resolve itself, or have to reset the system. If you want to prevent a certain class of jobs from exhausting a given resource, protecting that resource is the obvious thing to do. Which is why I'm advocating something that provides a more robust method of preventing the system from exhausting PID numbers. Thanks.
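The forking-server scenario in this message can be made concrete with Little's law (average number in flight = arrival rate x time in system). The request rate and duration below are assumed values picked from the "tens of thousands per second" and "multiple tens of seconds" ranges, and the sketch assumes one process per in-flight request.

```python
DEFAULT_PID_MAX = 32768        # default pid_max quoted in this thread
PID_CAP_4M = 4 * 1024 * 1024   # the 4M cap mentioned elsewhere in the thread


def concurrent_processes(requests_per_second, seconds_per_request):
    """Little's law, assuming one process per in-flight request."""
    return requests_per_second * seconds_per_request


if __name__ == "__main__":
    # ASSUMPTION: 20,000 requests/s, each handled for 20 s.
    in_flight = concurrent_processes(20_000, 20)
    print(f"processes in flight: {in_flight}")
    print(f"exceeds default pid_max: {in_flight > DEFAULT_PID_MAX}")
    print(f"exceeds 4M cap: {in_flight > PID_CAP_4M}")
```

Under these assumptions roughly 400,000 processes are alive at once, which is more than ten times the default pid_max but still well under the 4M cap.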
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 12:45:03PM -0500, Tejun Heo wrote:
> If your complaint is that this is taking too long, I hear you, and
> there's a certain amount of validity in arguing that upstreaming a
> temporary measure is the better trade-off, but the rationale for nproc
> (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

Also, note that this is a subset of a larger problem. e.g. there's a patchset trying to implement writeback IO control from the filesystem layer. cgroup control of writeback has been a thorny issue for over three years now and the rationale for implementing this reversed controlling scheme is about the same - doing it properly is too difficult, let's bolt something on the top as a practical measure.

I think it'd be seriously short-sighted to give in and merge all those. These sorts of shortcuts are crippling in the long term. Again, similarly, proper cgroup writeback support is literally right around the corner.

The situation sure can be frustrating if you need something now but we can't make decisions solely on that. This is a much longer-term project and we had better, for once, get things right.

Thanks.

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 09:25:10AM -0800, Tim Hockin wrote:
> > In general, I'm pretty strongly against adding controllers for things
> > which aren't fundamental resources in the system. What's next? Open
> > files? Pipe buffer? Number of flocks? Number of session leaders or
> > program groups?
>
> Yes to some or all of those. We do exactly this internally and it has
> greatly added to the stability of our overall container management
> system. and while you have been telling everyone to wait for kmemcg,
> we have had an extra 3+ years of stability.

Yeah, good job. I totally get why the kernel part of memory consumption needs protection. I'm not arguing against that at all.

> > If you want to prevent a certain class of jobs from exhausting a given
> > resource, protecting that resource is the obvious thing to do.
>
> I don't follow your argument - isn't this exactly what this patch set
> is doing - protecting resources?

If you have proper protection over kernel memory consumption, this is completely covered because memory is the fundamental resource here. Controlling distribution of those fundamental resources is what cgroups are primarily about.

> > Wasn't it like a year ago? Yeah, it's taking longer than everybody
> > hoped but seriously kmemcg reclaimer just got merged and also did the
> > new memcg interface which will tie kmemcg and memcg together.
>
> By my email it was almost 2 years ago, and that was the second or
> third incarnation of this patch.

Again, I agree this is taking a while. Memory people had to retool the whole reclamation path to make this work, which is the pattern being repeated across the different controllers - we're refactoring a lot of infrastructure code so that resource control can integrate with the regular operation of the kernel, which BTW is what we should have been doing from the beginning.

If your complaint is that this is taking too long, I hear you, and there's a certain amount of validity in arguing that upstreaming a temporary measure is the better trade-off, but the rationale for nproc (or nfds, or virtual memory, whatever) has been pretty weak otherwise.

And as for the different incarnations of this patchset, reposting the same stuff repeatedly doesn't really change anything. Why would it?

> >> Something like this is long overdue, IMO, and is still more
> >> appropriate and obvious than kmemcg anyway.
> >
> > Thanks for chiming in again but if you aren't bringing out anything
> > new to the table (I don't remember you doing that last time either),
> > I'm not sure why the decision would be different this time.
>
> I'm just vocalizing my support for this idea in defense of practical
> solutions that work NOW instead of "engineering ideals" that never
> actually arrive.
>
> As containers take the server world by storm, stuff like this gets
> more and more important.

Again, protection of kernel side memory consumption is important. There's no question about that. As for the never-arriving part, well, it is arriving. If you still can't believe it, just take a look at the code.

Thanks.

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 9:06 AM, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
>> Kernel memory consumption isn't the only valid reason to want to limit the
>> number of processes in a cgroup. Limiting the number of processes is very
>> useful to ensure that a program is working correctly (for example, the NTP
>> daemon should (usually) have an _exact_ number of children if it is
>> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
>> children), to prevent PID number exhaustion, to head off DoS attacks against
>> forking network servers before they get to the point of causing kmem
>> exhaustion, and to limit the number of processes in a cgroup that uses lots
>> of kernel memory very infrequently.
>
> All the use cases you're listing are extremely niche and can be
> trivially achieved without introducing another cgroup controller. Not
> only that, they're actually pretty silly. Let's say NTP daemon is
> misbehaving (or its code changed w/o you knowing or there are corner
> cases which trigger extremely infrequently). What do you exactly
> achieve by rejecting its fork call? It's just adding another
> variation to the misbehavior. It was misbehaving before and would now
> be continuing to misbehave after a failed fork.
>
> In general, I'm pretty strongly against adding controllers for things
> which aren't fundamental resources in the system. What's next? Open
> files? Pipe buffer? Number of flocks? Number of session leaders or
> program groups?

Yes to some or all of those. We do exactly this internally and it has greatly added to the stability of our overall container management system. and while you have been telling everyone to wait for kmemcg, we have had an extra 3+ years of stability.

> If you want to prevent a certain class of jobs from exhausting a given
> resource, protecting that resource is the obvious thing to do.

I don't follow your argument - isn't this exactly what this patch set is doing - protecting resources?

> Wasn't it like a year ago? Yeah, it's taking longer than everybody
> hoped but seriously kmemcg reclaimer just got merged and also did the
> new memcg interface which will tie kmemcg and memcg together.

By my email it was almost 2 years ago, and that was the second or third incarnation of this patch.

>> Something like this is long overdue, IMO, and is still more
>> appropriate and obvious than kmemcg anyway.
>
> Thanks for chiming in again but if you aren't bringing out anything
> new to the table (I don't remember you doing that last time either),
> I'm not sure why the decision would be different this time.

I'm just vocalizing my support for this idea in defense of practical solutions that work NOW instead of "engineering ideals" that never actually arrive.

As containers take the server world by storm, stuff like this gets more and more important.

Tim
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 09:12:45AM -0800, Tim Hockin wrote:
> I was told that the plan was to use kmemcg - but I was told that YEARS
> AGO. In the mean time we all either do our own thing or we do nothing
> and suffer.

Wasn't it like a year ago? Yeah, it's taking longer than everybody hoped but seriously kmemcg reclaimer just got merged and also did the new memcg interface which will tie kmemcg and memcg together.

> Something like this is long overdue, IMO, and is still more
> appropriate and obvious than kmemcg anyway.

Thanks for chiming in again but if you aren't bringing out anything new to the table (I don't remember you doing that last time either), I'm not sure why the decision would be different this time.

Thanks.

--
tejun
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On Fri, Feb 27, 2015 at 8:42 AM, Austin S Hemmelgarn wrote:
> On 2015-02-27 06:49, Tejun Heo wrote:
>>
>> Hello,
>>
>> On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
>>>
>>> The current state of resource limitation for the number of open
>>> processes (as well as the number of open file descriptors) requires you
>>> to use setrlimit(2), which means that you are limited to resource
>>> limiting process trees rather than resource limiting cgroups (which is
>>> the point of cgroups).
>>>
>>> There was a patch to implement this in 2011[1], but that was rejected
>>> because it implemented a general-purpose rlimit subsystem -- which meant
>>> that you couldn't control distinct resource limits in different
>>> hierarchies. This patch implements a resource controller *specifically*
>>> for the number of processes in a cgroup, overcoming this issue.
>>>
>>> There has been a similar attempt to implement a resource controller for
>>> the number of open file descriptors[2], which has not been merged
>>> because the reasons were dubious. Merely from a "sane interface"
>>> perspective, it should be possible to utilise cgroups to do such
>>> rudimentary resource management (which currently only exists for process
>>> trees).
>>
>>
>> This isn't a proper resource to control. kmemcg just grew proper
>> reclaim support and will be useable to control kernel side of memory
>> consumption.

I was told that the plan was to use kmemcg - but I was told that YEARS AGO. In the mean time we all either do our own thing or we do nothing and suffer.

Something like this is long overdue, IMO, and is still more appropriate and obvious than kmemcg anyway.

>> Thanks.
>>
> Kernel memory consumption isn't the only valid reason to want to limit the
> number of processes in a cgroup. Limiting the number of processes is very
> useful to ensure that a program is working correctly (for example, the NTP
> daemon should (usually) have an _exact_ number of children if it is
> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
> children), to prevent PID number exhaustion, to head off DoS attacks against
> forking network servers before they get to the point of causing kmem
> exhaustion, and to limit the number of processes in a cgroup that uses lots
> of kernel memory very infrequently.
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello,

On Fri, Feb 27, 2015 at 11:42:10AM -0500, Austin S Hemmelgarn wrote:
> Kernel memory consumption isn't the only valid reason to want to limit the
> number of processes in a cgroup. Limiting the number of processes is very
> useful to ensure that a program is working correctly (for example, the NTP
> daemon should (usually) have an _exact_ number of children if it is
> functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_
> children), to prevent PID number exhaustion, to head off DoS attacks against
> forking network servers before they get to the point of causing kmem
> exhaustion, and to limit the number of processes in a cgroup that uses lots
> of kernel memory very infrequently.

All the use cases you're listing are extremely niche and can be trivially achieved without introducing another cgroup controller. Not only that, they're actually pretty silly. Let's say NTP daemon is misbehaving (or its code changed w/o you knowing or there are corner cases which trigger extremely infrequently). What do you exactly achieve by rejecting its fork call? It's just adding another variation to the misbehavior. It was misbehaving before and would now be continuing to misbehave after a failed fork.

In general, I'm pretty strongly against adding controllers for things which aren't fundamental resources in the system. What's next? Open files? Pipe buffer? Number of flocks? Number of session leaders or program groups?

If you want to prevent a certain class of jobs from exhausting a given resource, protecting that resource is the obvious thing to do.

Thanks.

--
tejun
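
For readers wondering what a rejected fork looks like from the daemon's side, the following is a rough sketch only. It assumes the rejection surfaces as EAGAIN, as fork(2) documents for RLIMIT_NPROC; the thread does not say which errno the proposed controller would return. Both function names are hypothetical.

```python
import errno
import os

def describe_fork_error(err):
    """Map an OSError raised by fork() to a short reason string."""
    if err.errno == errno.EAGAIN:
        return "process limit reached"
    if err.errno == errno.ENOMEM:
        return "out of kernel memory"
    return "other failure"

def spawn_worker(work):
    """Fork a child that runs work(); return its PID, or None on failure.

    The caller is expected to waitpid() on the returned PID.
    """
    try:
        pid = os.fork()
    except OSError as err:
        # The program is still running here and must decide what to do next.
        print("fork failed:", describe_fork_error(err))
        return None
    if pid == 0:
        try:
            work()
        finally:
            os._exit(0)  # never fall back into the parent's code path
    return pid
```

Whether the failure path leaves the program in a better state than before is exactly the point under dispute in the message above; the sketch only shows that the failure is visible and has to be handled somewhere.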
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
On 2015-02-27 06:49, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
>> The current state of resource limitation for the number of open
>> processes (as well as the number of open file descriptors) requires you
>> to use setrlimit(2), which means that you are limited to resource
>> limiting process trees rather than resource limiting cgroups (which is
>> the point of cgroups).
>>
>> There was a patch to implement this in 2011[1], but that was rejected
>> because it implemented a general-purpose rlimit subsystem -- which meant
>> that you couldn't control distinct resource limits in different
>> hierarchies. This patch implements a resource controller *specifically*
>> for the number of processes in a cgroup, overcoming this issue.
>>
>> There has been a similar attempt to implement a resource controller for
>> the number of open file descriptors[2], which has not been merged
>> because the reasons were dubious. Merely from a "sane interface"
>> perspective, it should be possible to utilise cgroups to do such
>> rudimentary resource management (which currently only exists for process
>> trees).
>
> This isn't a proper resource to control. kmemcg just grew proper
> reclaim support and will be useable to control kernel side of memory
> consumption.
>
> Thanks.

Kernel memory consumption isn't the only valid reason to want to limit the number of processes in a cgroup. Limiting the number of processes is very useful to ensure that a program is working correctly (for example, the NTP daemon should (usually) have an _exact_ number of children if it is functioning correctly, and rpcbind shouldn't (AFAIK) ever have _any_ children), to prevent PID number exhaustion, to head off DoS attacks against forking network servers before they get to the point of causing kmem exhaustion, and to limit the number of processes in a cgroup that uses lots of kernel memory very infrequently.
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello,

On Fri, Feb 27, 2015 at 02:46:13PM +0100, Richard Weinberger wrote:
> just to make sure that I understand the big picture.
> The plan is to limit kernel memory per cgroup such that fork bombs and
> stuff cannot harm other groups of processes?

Yes, the kmem part of memcg hasn't really been functional because the reclaim part was broken and (partially consequently) the kmem config was siloed from the rest, but we're very close to solving that at this point.

Thanks.

--
tejun
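
As an illustration of the per-cgroup kernel memory limit discussed here, the sketch below writes a limit into the cgroup v1 memory controller file memory.kmem.limit_in_bytes. The cgroup directory used in the usage note is an assumption about where the hierarchy is mounted, and the helper names are hypothetical; this is not taken from the thread.

```python
import os

# cgroup v1 memory controller knob for kernel memory accounting
KMEM_LIMIT_FILE = "memory.kmem.limit_in_bytes"

def kmem_limit_path(cgroup_dir):
    """Return the path of the kmem limit file inside one cgroup directory."""
    return os.path.join(cgroup_dir, KMEM_LIMIT_FILE)

def set_kmem_limit(cgroup_dir, limit_bytes):
    """Write a kernel memory limit for the cgroup; needs suitable privileges."""
    if limit_bytes <= 0:
        raise ValueError("limit must be a positive number of bytes")
    path = kmem_limit_path(cgroup_dir)
    with open(path, "w") as f:
        f.write(str(limit_bytes))
    return path
```

For example, set_kmem_limit("/sys/fs/cgroup/memory/jobs", 64 * 1024 * 1024) would cap kernel memory for a hypothetical "jobs" group, so a fork bomb inside it runs into the memory limit rather than a process count.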
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Tejun,

On 27.02.2015 at 12:49, Tejun Heo wrote:
> This isn't a proper resource to control. kmemcg just grew proper
> reclaim support and will be useable to control kernel side of memory
> consumption.

just to make sure that I understand the big picture. The plan is to limit kernel memory per cgroup such that fork bombs and stuff cannot harm other groups of processes?

Thanks,
//richard
Re: [PATCH RFC 0/2] add nproc cgroup subsystem
Hello,

On Mon, Feb 23, 2015 at 02:08:09PM +1100, Aleksa Sarai wrote:
> The current state of resource limitation for the number of open
> processes (as well as the number of open file descriptors) requires you
> to use setrlimit(2), which means that you are limited to resource
> limiting process trees rather than resource limiting cgroups (which is
> the point of cgroups).
>
> There was a patch to implement this in 2011[1], but that was rejected
> because it implemented a general-purpose rlimit subsystem -- which meant
> that you couldn't control distinct resource limits in different
> hierarchies. This patch implements a resource controller *specifically*
> for the number of processes in a cgroup, overcoming this issue.
>
> There has been a similar attempt to implement a resource controller for
> the number of open file descriptors[2], which has not been merged
> because the reasons were dubious. Merely from a "sane interface"
> perspective, it should be possible to utilise cgroups to do such
> rudimentary resource management (which currently only exists for process
> trees).

This isn't a proper resource to control. kmemcg just grew proper reclaim support and will be useable to control kernel side of memory consumption.

Thanks.

--
tejun
[PATCH RFC 0/2] add nproc cgroup subsystem
The current state of resource limitation for the number of open processes (as well as the number of open file descriptors) requires you to use setrlimit(2), which means that you are limited to resource limiting process trees rather than resource limiting cgroups (which is the point of cgroups).

There was a patch to implement this in 2011[1], but that was rejected because it implemented a general-purpose rlimit subsystem -- which meant that you couldn't control distinct resource limits in different hierarchies. This patch implements a resource controller *specifically* for the number of processes in a cgroup, overcoming this issue.

There has been a similar attempt to implement a resource controller for the number of open file descriptors[2], which has not been merged because the reasons were dubious. Merely from a "sane interface" perspective, it should be possible to utilise cgroups to do such rudimentary resource management (which currently only exists for process trees).

Aleksa Sarai (2):
  cgroups: allow a cgroup subsystem to reject a fork
  cgroups: add an nproc subsystem

 include/linux/cgroup.h        |   9 ++-
 include/linux/cgroup_subsys.h |   4 +
 init/Kconfig                  |  10 +++
 kernel/Makefile               |   1 +
 kernel/cgroup.c               |  13 ++-
 kernel/cgroup_freezer.c       |   6 +-
 kernel/cgroup_nproc.c         | 181 ++
 kernel/fork.c                 |   4 +-
 kernel/sched/core.c           |   3 +-
 9 files changed, 221 insertions(+), 10 deletions(-)
 create mode 100644 kernel/cgroup_nproc.c

[1]: https://lkml.org/lkml/2011/6/19/170
[2]: https://lkml.org/lkml/2014/7/2/640

--
2.3.0
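
For comparison with the cgroup approach, here is a small sketch of the setrlimit(2) route the cover letter describes, using Python's resource module. The limit is attached to the calling process and inherited by its descendants, and the kernel checks it against the number of processes owned by the real user ID, so it cannot be scoped to an arbitrary group of tasks. The function names are hypothetical.

```python
import resource

def clamp_soft_limit(requested, hard):
    """Return a soft limit that does not exceed the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)

def limit_nproc(requested):
    """Set the RLIMIT_NPROC soft limit for this process and return it.

    Children created afterwards inherit the same limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_soft = clamp_soft_limit(requested, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return new_soft
```

A launcher would call limit_nproc(64) just before exec'ing a job. Nothing ties the limit to a cgroup, which is the gap the cover letter is pointing at.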