On 21/06/17 18:25, Jeff Law wrote:
> On 06/21/2017 02:41 AM, Richard Earnshaw (lists) wrote:
>
>>> But the stack pointer might have already been advanced into the guard
>>> page by the caller. For the sake of argument assume the guard page is
>>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>>> the caller hasn't touched the 0xf1000 page.
>>
>> Then make sure the caller does touch the 0xf1000 page. If it's
>> allocated that much stack it should be forced to do the probe and not
>> rely on all its children having to do it because it can't be bothered.
> That needs to be mandated at the ABI level if it's going to happen. The
> threat model assumes that the caller adheres to the ABI, but was not
> necessarily compiled with -fstack-check.

The base ABI would never mandate stack probes. Some systems may locate the stack in such a way that it can never collide with the heap, making guard pages and probes completely unnecessary (but perhaps at the expense of limiting the theoretical maximum stack size). It might say "if you do stack probes, use this model", but even then it would need to parameterize the whole model as there are just too many OS configuration options to consider (page size, size of guard zone, for example).

>
> I'm all for making the common path fast and letting the uncommon cases
> pay additional penalties. That mindset has driven the work-to-date.
>
> But I don't think I have the liberty to change existing ABIs to
> facilitate lower overhead approaches. But I think ARM does given it
> owns the ABI for aarch64 and I would happily exploit whatever guarantees
> we can derive from an updated ABI.
>
> So if you want the caller to touch the page, you need to amend the ABI.
> I'd think touching the lowest address of the alloca area and outgoing
> args, if large would be sufficient.
>
>

I can't help but feel there's a bit of a goode olde mediaeval witch hunt going on here. As Wilco points out, we can never defend against a function that is built without probe operations but skips the entire guard zone. The only defence there is a larger guard zone, but how big do you make it?

So we can design a half-way house probing scheme which doesn't really solve the problem and is perhaps so expensive that most people will turn it off. Or we can design something that addresses the problem scientifically if applied everywhere and has almost zero impact on the code size or performance. The latter would probably be so cheap that most people would never notice that it was even on at all.

Yes, you'd need a system recompile to deploy it in full, but even a fairly limited rebuild of critical libraries (libc, libstdc++) would help. Whichever route we take, this wouldn't be an ABI break.
New code will still interoperate with old code; you just don't get full heap protection if you mix old and new.

I know which I'd prefer.

R.
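
P.S. To put numbers on the scenarios discussed above, here is a rough toy model, not a description of what GCC emits. It assumes a downward-growing stack, a 4 KiB page, a page-aligned guard zone ending at 0xf2000, and the worst case in which a function built without probes never touches its own frame. The names `simulate`, `PAGE`, `guard_hi` and `guard_size` are made up for this sketch.

```python
PAGE = 0x1000  # assumed page size (4 KiB); real systems vary


def simulate(sp, guard_hi, guard_size, frames):
    """Walk a chain of calls on a downward-growing stack.

    frames is a list of (size, probes) pairs, outermost caller first.
    A probing frame touches one address in every page its allocation
    overlaps; a non-probing frame is assumed (worst case) to touch nothing.

    Returns "fault" if a probe lands in the guard zone, "jumped" if the
    stack pointer ends up below the guard zone without any fault, and
    "no fault yet" otherwise.
    """
    guard_lo = guard_hi - guard_size
    for size, probes in frames:
        if size <= 0:
            continue
        new_sp = sp - size
        if probes:
            # One touch per page overlapped by the new allocation [new_sp, sp).
            for page in range(new_sp // PAGE, (sp - 1) // PAGE + 1):
                if guard_lo <= page * PAGE < guard_hi:
                    return "fault"
        sp = new_sp
    return "jumped" if sp < guard_lo else "no fault yet"


# Guard page is 0xf1000; the caller's allocation leaves sp at 0xf1010.
START, GUARD_HI = 0xF2010, 0xF2000

# Neither caller nor callee probes: the callee steps past the guard page.
neither = simulate(START, GUARD_HI, PAGE, [(0x1000, False), (0x40, False)])

# The caller probes its own allocation: the fault fires in the caller.
caller_probes = simulate(START, GUARD_HI, PAGE, [(0x1000, True), (0x40, False)])

# Only the callee probes: still caught, but every callee has to pay.
callee_probes = simulate(START, GUARD_HI, PAGE, [(0x1000, False), (0x40, True)])

# An unprobed 0x3000 allocation skips a one-page guard zone entirely ...
skipped = simulate(START, GUARD_HI, PAGE, [(0x3000, False), (0x40, True)])

# ... but not a four-page one, where the probing callee then faults.
bigger_guard = simulate(START, GUARD_HI, 4 * PAGE, [(0x3000, False), (0x40, True)])
```

In this model the unprobed large allocation is only caught by making the guard zone larger, and mixing probed and unprobed frames gives partial rather than full protection.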