On Fri, Nov 3, 2017 at 6:22 PM, Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Fri, Nov 3, 2017 at 5:42 PM, Kees Cook <keesc...@chromium.org> wrote:
>>
>> If we didn't do the "but no more than 75% of _STK_LIM", and moved to
>> something like "check stack utilization after loading the binary", we
>> end up in the position where the kernel is past the point of no return
>> (so instead of E2BIG, the execve()ing process just SEGVs), which is
>> much harder to debug or recover from (i.e. there's no process left to
>> return from the execve() from).
>
> Yeah, we've had that problem in the past, and it's the worst of all worlds.
>
> You can still trigger it (set RLIMIT_DATA to something much too small,
> for example, and then generate more than that by just repeating the
> same argument multiple times so that the execve() user doesn't trigger
> the limit, but the newly executed process does).
>
> But it should really be something that you need to be truly insane to trigger.
>
> I think we still don't know whether we're going to be suid at the time
> we copy the arguments, do we?

We don't. (In fact, arg copying happens before we've even figured out
which binfmt is involved.) I lifted it to just before the point of no
return, but moving it before arg copying looks very hard (which
contributed to why we went with the implementation we did).

> So it's pretty painful to make the limits different for suid and
> non-suid binaries.

I would agree.

-Kees

-- 
Kees Cook
Pixel Security
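
For readers following along, here is a rough userspace sketch of the
"no more than 75% of _STK_LIM" cap discussed above. It is not the
kernel code. The helper name arg_stack_limit() is hypothetical, the
8 MiB value for _STK_LIM is an assumption about the kernel default,
and combining the cap with "one quarter of RLIMIT_STACK" is likewise
an assumption about how the check is applied.

```c
/*
 * Sketch only: an upper bound on argv+envp string bytes, taken as the
 * smaller of 1/4 of the stack rlimit and 3/4 of _STK_LIM.
 */

/* Assumed value of the kernel's default stack limit (_STK_LIM). */
#define STK_LIM_ASSUMED (8UL * 1024 * 1024)

/* Hypothetical helper; rlim_stack is the soft RLIMIT_STACK in bytes. */
static unsigned long arg_stack_limit(unsigned long rlim_stack)
{
    unsigned long cap = STK_LIM_ASSUMED / 4 * 3;   /* 75% of _STK_LIM */
    unsigned long quarter = rlim_stack / 4;        /* 1/4 of the rlimit */

    return quarter < cap ? quarter : cap;
}
```

Under these assumptions an 8 MiB stack rlimit allows 2 MiB of argument
strings, and even an unlimited stack rlimit allows no more than 6 MiB.
Anything larger would fail with E2BIG while there is still a process
for execve() to return to.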