On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote: > On 4.11.2020 11.29, Florian Weimer wrote: > > * Will Deacon: > > > > > Is there real value in this seccomp filter if it only looks at mprotect(), > > > or was it just implemented because it's easy to do and sounds like a good > > > idea? > > > > It seems bogus to me. Everyone will just create alias mappings instead, > > just like they did for the similar SELinux feature. See “Example code > > to avoid execmem violations” in: > > > > <https://www.akkadia.org/drepper/selinux-mem.html> > > Also note "But this is very dangerous: programs should never use memory > regions which are writable and executable at the same time. Assuming that it > is really necessary to generate executable code while the program runs the > method employed should be reconsidered."
Sure, and to be clear we're not trying to violate the "at the same time" property. We do not want to permit simultaneous PROT_WRITE and PROT_EXEC at any instant in time. What we're asking is to not block changing permissions to PROT_EXEC in the absence of PROT_WRITE. I think that the goal of preventing WRITE -> EXEC transitions for some memory is sane, but I think the existing kernel primitives available to systemd don't allow us to do that in a robust way because we don't have all the relevant state tracked and accessible, and the existing approach gets in the way of doing the right thing for other mitigations. Consequently I think it would be better going forward to add a more robust (kernel) mechanism for enforcement that can distinguish WRITE->EXEC from EXEC->EXEC+BTI, and e.g. can be used to forbid aliasing mappings with differing W/X permissions. Then userspace could eventually transition over to that and get /stronger/ protection while permitting the BTI case we'd like to work now. > If a service legitimately needs executable and writable mappings (due to > JIT, trampolines etc), it's easy to disable the filter whenever really > needed with "MemoryDenyWriteExecute=no" (which is the default) in case of > systemd or a TE rule like "allow type_t self:process { execmem };" for > SELinux. But this shouldn't be the default case, since there are many > services which don't need W&X. > > I'd also question what is the value of BTI if it can be easily circumvented > by removing PROT_BTI with mprotect()? I agree that turning BTI off is a concern, and to that end I'd like to add an enforcement mechanism whereby we could prevent that (ideally the same mechanism by which we could prevent WRITE -> EXEC transitions). But, as with all things it's a matter of degree. MDWE and BTI are both hurdles to an adversary, but neither are absolutes and there are approaches to bypass either. By the time someone's issuing mprotect() with an arbitrary VA and/or prot, they are liable to have been able to do the same with mmap() and circumvent MDWE. I'd really like to not have BTI silently disabled in order to work with MDWE, because the risk is that it gets silently disabled elsewhere. The risk of the changing the kernel to enable BTI for a binary is not well known since we don't control other peoples libraries that might end up not being compatible somehow with that. The risk of disabling a portion of the MDWE protections seems to be the least out of the options we have available, as unfortunate as it seems, and I think we can come up with a better MDWE approach going forward. Thanks, Mark.