Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Tue, 6 Mar 2018 15:42:41 -0800 Chris Mason wrote: > On 6 Mar 2018, at 11:12, Linus Torvalds wrote: > [...] > > > > I do *not* want this to be a magical way to hide things. > > Especially early on, this makes a lot of sense. But I wanted to plug > bps and the hopefully growing set of bpf introspection tools: > > https://github.com/iovisor/bcc/blob/master/introspection/bps_example.txt > > Long term these are probably a good place to tell the admin what's going > on. (related to bpf itself not modprobe subject) Hi Chris, I just want to point out that the tool 'bpftool', is currently the dominating tool for eBPF introspection. And the 'bps' tool you mention seems to have gained less (open source) traction. The bpftool is part of the kernel git-tree: tools/bpf/bpftool https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/ And it even have bash-completion and man-pages in RST format so they even render nicely when viewed via github: https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool.rst https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-prog.rst https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-map.rst -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 22, 2018 at 3:15 PM, Andy Lutomirski wrote: > All we need to do is to make sure that, if this is > distributed as a module, that it's init routine doesn't wait for a > long time, right? Yeap. Luis
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/22/18 3:15 PM, Andy Lutomirski wrote: On Thu, Mar 22, 2018 at 8:54 PM, Luis R. Rodriguez wrote: If we can ensure that these usermode modules don't take *any time at all* on their init *from the start*, it would be wonderful and we'd end up avoiding some really odd corner case issues later. I don't see why this issue needs to exist at all for the new stuff. After all, the new things aren't usermode modules per se. They're regular kernel code (modular or otherwise) that loads a usermode helper. All we need to do is to make sure that, if this is distributed as a module, that it's init routine doesn't wait for a long time, right? I've implemented all of the previous suggestions and now there are zero changes to kernel/module.c I still need to finish tracpoint stuff first and polish umh code a bit before sending new version. Let's hold on this thread until then.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 22, 2018 at 8:54 PM, Luis R. Rodriguez wrote: > If we can ensure that these usermode modules don't take *any time at all* on > their init *from the start*, it would be wonderful and we'd end up avoiding > some really odd corner case issues later. > I don't see why this issue needs to exist at all for the new stuff. After all, the new things aren't usermode modules per se. They're regular kernel code (modular or otherwise) that loads a usermode helper. All we need to do is to make sure that, if this is distributed as a module, that it's init routine doesn't wait for a long time, right?
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Sat, Mar 10, 2018 at 03:16:52PM +, Luis R. Rodriguez wrote: > On Sat, Mar 10, 2018 at 02:08:43PM +, Luis R. Rodriguez wrote: > > The alternative to this would be a simple equivalent of > > try_then_request_module() > > for UMH modules: try_umhm_then_request_umh_module() or whatever. So just as > > I > > argued earlier over UMH limitations, this is not the end of the world for > > umh > > modules, and it doesn't mean you can't get *properly* add umh modules > > upstream, > > it would *just mean* we'd be perpetuating today's (IMHO) horrible and loose > > semantics. > > I was about to suggest that perhaps a try_umhm_then_request_umh_module() or > whatever should not be a macro -- but instead an actual routine, and we don't > export say the simple form to avoid non-deterministic uses of it from the > start... but the thing is *it'd have to be a macro* given that the *check* for > the module *has to be loose*, just as try_then_request_module()... > > *Ugh* gross. > > Another reason for me to want an actual deterministic clean proper solution > from the start. I just thought of another consideration which should be made here for the long term. Some init systems have a timeout for kmod workers, that is the userspace process which issues the modprobe call. That was very well intentioned, however it ended up being nonsense, so at least on SLE systemd we disable the timeout for kmod workers. What others do... is unclear to me. Upstream wise the timeout was increased considerably, however, *if* such timeout is in effect for users it has some implicit implications on the number of devices a driver could support: number_devices = systemd_timeout - max known probe time for driver I've documented the logic to these conclusions [0]. It sounds like we *do* want a full sync wait mechanism, and as I noted I think we should fix the determinism aspect of it. Since no aliases will be supported for usermode modules this will be much easier to support, and I can volunteer to help with that. However given the above... if we're going to use request_module() API (or a really fixed deterministic version of it later) for usermode kernel modules, the limitation above still applies. Are these usermode modules doing all the handy work on init? Or can it be deferred once loaded? How much loading on init should a usermode module need? If we can ensure that these usermode modules don't take *any time at all* on their init *from the start*, it would be wonderful and we'd end up avoiding some really odd corner case issues later. [0] http://www.do-not-panic.com/2015/12/linux-asynchronous-probe.html Luis
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Mon, Mar 12, 2018 at 10:22:00AM -0700, Alexei Starovoitov wrote: > On 3/10/18 7:34 AM, Luis R. Rodriguez wrote: > > Also, > > > > Alexei you never answered my questions out aliases with the umh modules. > > Long term this important to consider. > > aliases always felt like a crutch to me. > I can see an argument when they're used as 'alias pci:* foo' > but the way it's used in networking with ip_set_* and nf-* is > something I prefer not to ever do again. > Definitely no aliases for bpfilter umh. I agree, let's not do that if at all possible for these types of binaries. greg k-h
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/12/18 5:02 AM, Edward Cree wrote: On 09/03/18 18:58, Alexei Starovoitov wrote: It's not waiting for the whole thing, because once bpfilter starts it stays running/sleeping because it's stateful. So, this has been bugging me a bit. If bpfilter takes a signal and crashes, all that state goes away. Does that mean your iptables/netfilter config just got forgotten and next time you run iptables it disappears, so you have to re-apply it all again? It needs normal malloc-ed memory to keep the state of iptable->bpf translation that it will use later during subsequent translation calls. Theoretically it can use bpf maps pinned in kernel memory to keep this state, but then it's non-swappable. It's better to keep bpfilter state in its own user memory. Perhaps the state should live in swappable kernel memory (e.g. a tmpfs thing, which bpfilter could access through a mount). It'd be read-only to userspace, listing the existing rules (in untranslated form), and be updated to reflect the new rule after bpfilter has supplied the updated translation. Then bpfilter can cache things if it wants, but the kernel remains the ultimate arbiter of the state and maintains it over a bpfilter crash. seems like overkill. I consider crashing bpfilter same severity as kernel bug. Whatever firewall rules already installed will continue to work, but new ones won't be able to load and current set cannot be queried. Control plane crashed, dataplane continues to work. Still a ton better than whole system crash. We have plenty of work ahead of us without worrying about restarting that umh and reloading its state from tmpfs. Something to consider for later phases of the project.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/10/18 7:34 AM, Luis R. Rodriguez wrote: Also, Alexei you never answered my questions out aliases with the umh modules. Long term this important to consider. aliases always felt like a crutch to me. I can see an argument when they're used as 'alias pci:* foo' but the way it's used in networking with ip_set_* and nf-* is something I prefer not to ever do again. Definitely no aliases for bpfilter umh.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 09/03/18 18:58, Alexei Starovoitov wrote: > It's not waiting for the whole thing, because once bpfilter starts it > stays running/sleeping because it's stateful. So, this has been bugging me a bit. If bpfilter takes a signal and crashes, all that state goes away. Does that mean your iptables/netfilter config just got forgotten and next time you run iptables it disappears, so you have to re-apply it all again? > It needs normal > malloc-ed memory to keep the state of iptable->bpf translation that > it will use later during subsequent translation calls. > Theoretically it can use bpf maps pinned in kernel memory to keep > this state, but then it's non-swappable. It's better to keep bpfilter > state in its own user memory. Perhaps the state should live in swappable kernel memory (e.g. a tmpfs thing, which bpfilter could access through a mount). It'd be read-only to userspace, listing the existing rules (in untranslated form), and be updated to reflect the new rule after bpfilter has supplied the updated translation. Then bpfilter can cache things if it wants, but the kernel remains the ultimate arbiter of the state and maintains it over a bpfilter crash. Sound reasonable? -Ed
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Sat, Mar 10, 2018 at 02:08:43PM +, Luis R. Rodriguez wrote: > The alternative to this would be a simple equivalent of > try_then_request_module() > for UMH modules: try_umhm_then_request_umh_module() or whatever. So just as I > argued earlier over UMH limitations, this is not the end of the world for umh > modules, and it doesn't mean you can't get *properly* add umh modules > upstream, > it would *just mean* we'd be perpetuating today's (IMHO) horrible and loose > semantics. I was about to suggest that perhaps a try_umhm_then_request_umh_module() or whatever should not be a macro -- but instead an actual routine, and we don't export say the simple form to avoid non-deterministic uses of it from the start... but the thing is *it'd have to be a macro* given that the *check* for the module *has to be loose*, just as try_then_request_module()... *Ugh* gross. Another reason for me to want an actual deterministic clean proper solution from the start. Luis
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
Also, Alexei you never answered my questions out aliases with the umh modules. Long term this important to consider. Luis
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Sat, Mar 10, 2018 at 1:43 AM, Alexei Starovoitov wrote: > On 3/9/18 11:37 AM, Andy Lutomirski wrote: >> >> On Fri, Mar 9, 2018 at 6:55 PM, David Miller wrote: >>> >>> From: Alexei Starovoitov >>> Date: Fri, 9 Mar 2018 10:50:49 -0800 >>> On 3/9/18 10:23 AM, Andy Lutomirski wrote: > > It might not be totally crazy to back it by tmpfs. interesting. how do you propose to do it? Something like: - create /umh_module_tempxxx dir - mount tmpfs there - copy elf into it and exec it? >>> >>> >>> I think the idea is that it's an internal tmpfs mount that only >>> the kernel has access too. >> >> >> That's what I was imagining. There's precedent. For example, there's >> a very short piece of code that does it in >> drivers/gpu/drm/i915/i915_gemfs.c. > > > I can do "monkey see monkey do" approach which will look like: > type = get_fs_type("tmpfs"); > fs = kern_mount(type); > > /* for each request_umh("foo") */ > file = shmem_file_setup_with_mnt(fs, "umh_foo"); > do { > pagecache_write_begin(file,...); > memcpy() > pagecache_write_end(); > } while (umh_elf_size); > do_execve_file(file); > fput(file); > > while keeping fs mounted forever? > is there better way? > Nice! I'm definitely not a pagecache expert, but it looks generally sane. Once the thing is actually functional, we can bang on it, and I'm sure that linux-mm will have some suggestions to tidy it up. As for the actual lifetime of the filesystem, I think it should be mounted once and never unmounted. Whenever it gains a second user, the whole thing can be moved to mm/ or lib/ and all the users can share the same mount. Minor caveat: I would arrange the code a bit differently, like this: static (or extern) unsigned char __initdata the_blob[]; static struct file *umh_blob_file; static int __init my_module_init_function(void) { /* for each request_umh("foo") */ umh_blob_file = shmem_file_setup_with_mnt(fs, "umh_foo"); do { pagecache_write_begin(umh_file,...); memcpy() pagecache_write_end(); } while (umh_elf_size); /* the_blob is implicitly freed after this returns */ } and then actually use the struct file later on. If and when you're sure you're not going to spawn another copy, you can fput() it. This way the memory becomes swappable immediately on load. As for request_module() vs request_module_umh(), my advice would be to write the code and then see what interface makes sense. I wouldn't be surprised if it ends up making more sense to keep all of this entirely independent from the module system. P.S. I suspect that, before this hits a release, someone's going to have to fiddle with the LSM hooks in do_execve() a bit to make sure that LSM unconditionally approves this type of umh program. Otherwise there might be pointless failures on some more locked down configurations. But that can wait until it's more final and the security folks review the code.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 06:34:18PM -0800, Alexei Starovoitov wrote: > On 3/9/18 11:38 AM, Linus Torvalds wrote: > > On Fri, Mar 9, 2018 at 11:12 AM, Linus Torvalds > > wrote: > > > > > > How are you going to handle five processes doing the same setup > > > concurrently? Let's keep in mind we don't have a formal way to specify this as well for modules as well, other than kconfig. Ie it'd be up to the author of the modules to pick this up and make things mutually exclusive. > > Side note: it's not just serialization. It's also "is it actually up > > and running". > > > > The rule for "request_module()" (for a real module) has been that it > > returns when the module is actually alive and active and have done > > their initcalls. Unfortunately this is not accurate though, the loose grammar around this fact is one of the reasons why long term I think either new API should be added, or the existing request_module() API modified. request_module() is not deterministic today, try_then_request_module() *is* but its IMHO its a horrible grammatical solution, and I'm not a fan of the idea of umh modules perpetuating this nonsense *long term*. Details below. > > The UMH_WAIT_EXEC behavior (ignore the serialization - you could do > > that in the caller) behavior doesn't actually have any semantics AT > > ALL. It only means that you get the error returns from execve() > > itself, so you know that the executable file actually existed and > > parsed right enough to get started. > > > > But you don't actually have any reason to believe that it has *done* > > anything, and started processing any requests. There's no reason > > what-so-ever to believe that it has registered itself for any > > asynchronous requests or anything like that. > > I agree that sync approach nicely fits this use case, but waiting > for umh brings the whole set of suspend/resume issues that > Luis explained earlier and if I understood his points that stuff > is still not quite working right It works enough that suspend works well enough on Linux today, but the primary reason is that the big kernel/umh.c API user today is the firmware API for the old fallback firmware interface and it has in place the sync usermodehelper_read_trylock() and async async usermodehelper_read_lock_wait(). Note that in the fallback case there is just the uevent call. > and he's planning a set of fixes. The move of the umh locks out of the non-fallback case was a long term goal I had which I was planning on doing slowly, but recently got jump started via v4.14 via commit f007cad159e9 ("Revert "firmware: add sanity check on shutdown/suspend"). As such that goal is now complete thanks to Linus correctly pushing it forward. > I really don't want the unknown timeline of fixes for 'waiting umh' > to become a blocker for the bpfilter project ... The reason for me to bring up the suspend stuff was that there no other *heavy* UMH API users in the kernel today, and just to highlight that care must be taken if we want to consider in the future further possibly heavy UMH callers which we *can* expect to be called around suspend and resume. *Iff* that will be the case for these new umh userspace modules, we can evaluate a future plan. But not that this is as vague as suggesting the same for any further kernel future UMH API user! If the only umh module we expect for a while will be bpfilter and it we don't expect heavy use during suspend/resume this is a non-issue and likely won't be for a while, and all that *should be done* is become aware of the today's limitations as we *document* any new API, even if its the simple and easy request_umh_module*() calls. Today's use of the UMH API to always use helper_lock() and prevent suspend while a call is being issued should suffice, so long as these umh modules don't become heavy users during suspend/resume. Note that there even *further* further advanced suspend/resume considerations with filesystems but we have a reasonable hand on what to do there [0] and it frankly isn't *that* much work as I have most of it done already. The only other suspend optimization I could think of left to evaluate may be whether or not to we should generalize something like the firmware cache for other UMH callers but that's really a long term topic. So I would not say there is pending work left to do, it should suffice to document the semantics and limitations if the umh modules are to be added. That's it. Linus has proven me right that the *concerns* I've had for these corner cases are just that, and I do believe documenting the limitations should suffice for new APIs. [0] https://lkml.kernel.org/r/20180126090923.gd12...@wotan.suse.de > Also I like Luis suggestion to use some other name than request_module() > Something like: > request_umh_module_sync("foo"); > request_umh_module_nowait("foo"); > > in both cases the kernel would create a pipe, call umh either > with UMH_WAIT_PROC or UMH_WAIT_EXEC and make sure that umh > responds to first
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 11:38 AM, Linus Torvalds wrote: On Fri, Mar 9, 2018 at 11:12 AM, Linus Torvalds wrote: How are you going to handle five processes doing the same setup concurrently? Side note: it's not just serialization. It's also "is it actually up and running". The rule for "request_module()" (for a real module) has been that it returns when the module is actually alive and active and have done their initcalls. The UMH_WAIT_EXEC behavior (ignore the serialization - you could do that in the caller) behavior doesn't actually have any semantics AT ALL. It only means that you get the error returns from execve() itself, so you know that the executable file actually existed and parsed right enough to get started. But you don't actually have any reason to believe that it has *done* anything, and started processing any requests. There's no reason what-so-ever to believe that it has registered itself for any asynchronous requests or anything like that. So in the real module case, you can do request_module("modulename"); and just start using whatever resource you just requested. So the netfilter code literally does request_module("nft-chain-%u-%.*s", family, nla_len(nla), (const char *)nla_data(nla)); nfnl_lock(NFNL_SUBSYS_NFTABLES); type = __nf_tables_chain_type_lookup(nla, family); if (type != NULL) return ERR_PTR(-EAGAIN); and doesn't even care about error handling for request_module() itself, because it knows that either the module got loaded and is ready, or something failed. And it needs to look that chain type up anyway, so the failure is indicated by _that_. With a UMH_WAIT_EXEC? No. You have *nothing*. You know the thing started, but it might have SIGSEGV'd immediately, and you have absolutely no way of knowing, and absolutely no way of even figuring it out. You can wait - forever - for something to bind to whatever dynamic resource you're expecting. You'll just fundamentally never know. You can try again, of course. Add a timeout, and try again in five seconds or something. Maybe it will work then. Maybe it won't. You won't have any way to know the _second_ time around either. Or the third. Or... See why I say it has to be synchronous? If it's synchronous, you can actually do things like (a) maybe you only need a one-time thing, and don't have any state ("load fixed tables, be done") and that's it. If the process returns with no error code, you're all done, and you know it's fine. I agree that sync approach nicely fits this use case, but waiting for umh brings the whole set of suspend/resume issues that Luis explained earlier and if I understood his points that stuff is still not quite working right and he's planning a set of fixes. I really don't want the unknown timeline of fixes for 'waiting umh' to become a blocker for the bpfilter project ... (b) maybe the process wants to start a listener daemon or something like the traditional inetd model. It can open the socket, it can start listening on it, and it can fork off a child and check it's status. It can then do exit(0) if everything is fine, and now request_module() returns. ... while for bpfilter use case we need a daemon and starting umh with UMH_WAIT_PROC which does fork() right away and parent does exit(0) is not any different from kernel pov than UMH_WAIT_EXEC. The kernel will be in the same situation described above. The forked process could have sigsegv quickly (right after telling parent that everything is ok) and kernel is waiting forever. That situation is what I was alluding to regarding that kernel<->umh need to have some sort of health check protocol. Regardless of how umh is invoked. I think what Andy proposing with kernel creating a pipe and giving it to umh can be used as this health check and means of 'load exclusion' to make sure that only one requested umh is running. Also I like Luis suggestion to use some other name than request_module() Something like: request_umh_module_sync("foo"); request_umh_module_nowait("foo"); in both cases the kernel would create a pipe, call umh either with UMH_WAIT_PROC or UMH_WAIT_EXEC and make sure that umh responds to first hello message. On success they would return a handle to that pipe and umh's pid. The further interaction between the kernel and umh will be via that pipe. If kernel->umh request times out the kernel side of bpfilter would sigkill the task and do request_umh() again. If that request_umh() fails there will be no retry, since at this point it's clear that given umh is broken. I'll implement only _nowait() version, since that's what we need for bpfilter and when suspend/resume issues are solved somebody who cares about usb driver in user mode can implement the _sync() variant. All that on top of tmpfs->file->execve_file hack that I still need feedback on.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 11:37 AM, Andy Lutomirski wrote: On Fri, Mar 9, 2018 at 6:55 PM, David Miller wrote: From: Alexei Starovoitov Date: Fri, 9 Mar 2018 10:50:49 -0800 On 3/9/18 10:23 AM, Andy Lutomirski wrote: It might not be totally crazy to back it by tmpfs. interesting. how do you propose to do it? Something like: - create /umh_module_tempxxx dir - mount tmpfs there - copy elf into it and exec it? I think the idea is that it's an internal tmpfs mount that only the kernel has access too. That's what I was imagining. There's precedent. For example, there's a very short piece of code that does it in drivers/gpu/drm/i915/i915_gemfs.c. I can do "monkey see monkey do" approach which will look like: type = get_fs_type("tmpfs"); fs = kern_mount(type); /* for each request_umh("foo") */ file = shmem_file_setup_with_mnt(fs, "umh_foo"); do { pagecache_write_begin(file,...); memcpy() pagecache_write_end(); } while (umh_elf_size); do_execve_file(file); fput(file); while keeping fs mounted forever? is there better way?
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 7:38 PM, Linus Torvalds wrote: > On Fri, Mar 9, 2018 at 11:12 AM, Linus Torvalds > wrote: >> >> How are you going to handle five processes doing the same setup concurrently? > > Side note: it's not just serialization. It's also "is it actually up > and running". > I think the right way to solve this would be to take a hint from systemd's socket activation model. The current patch had the module load process kick off an ELF binary that goes an registers itself to handle something. We can turn that around. Make the module init function create the socket (or pipe or whatever) receives request and pass it to the user program as stdin. Then the kernel can start queueing requests into the socket immediately, and the user program will get to them whenever it finishes initializing. Or it can write some message to the socket saying "hey, I'm ready". This also completely avoids the issue where some clever user manually loads the "module" with exec() ("hey, I'm so clever, I can just run the damn thing instead if using init_module()!" or writes an out-of-tree program that uses whatever supposedly secret API the in-kernel binary is supposed to use to register itself (and I know people who would do exactly that!) and the kernel does request_module() at roughly the same time.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 6:55 PM, David Miller wrote: > From: Alexei Starovoitov > Date: Fri, 9 Mar 2018 10:50:49 -0800 > >> On 3/9/18 10:23 AM, Andy Lutomirski wrote: >>> It might not be totally crazy to back it by tmpfs. >> >> interesting. how do you propose to do it? >> Something like: >> - create /umh_module_tempxxx dir >> - mount tmpfs there >> - copy elf into it and exec it? > > I think the idea is that it's an internal tmpfs mount that only > the kernel has access too. That's what I was imagining. There's precedent. For example, there's a very short piece of code that does it in drivers/gpu/drm/i915/i915_gemfs.c. > > And I don't think that even hurts your debuggability concerns. The > user can just attach using the foo.ko file in the actual filesystem. > Not if the .ko is actually a shim that actually just contains a blob and a few lines of code to kick off the umh. But one could still debug it using kernel debug symbols (like vDSO debugging works right now, at least if your distro is in a good mood) or by reading the contents from /proc/PID/exe.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 11:12 AM, Linus Torvalds wrote: > > How are you going to handle five processes doing the same setup concurrently? Side note: it's not just serialization. It's also "is it actually up and running". The rule for "request_module()" (for a real module) has been that it returns when the module is actually alive and active and have done their initcalls. The UMH_WAIT_EXEC behavior (ignore the serialization - you could do that in the caller) behavior doesn't actually have any semantics AT ALL. It only means that you get the error returns from execve() itself, so you know that the executable file actually existed and parsed right enough to get started. But you don't actually have any reason to believe that it has *done* anything, and started processing any requests. There's no reason what-so-ever to believe that it has registered itself for any asynchronous requests or anything like that. So in the real module case, you can do request_module("modulename"); and just start using whatever resource you just requested. So the netfilter code literally does request_module("nft-chain-%u-%.*s", family, nla_len(nla), (const char *)nla_data(nla)); nfnl_lock(NFNL_SUBSYS_NFTABLES); type = __nf_tables_chain_type_lookup(nla, family); if (type != NULL) return ERR_PTR(-EAGAIN); and doesn't even care about error handling for request_module() itself, because it knows that either the module got loaded and is ready, or something failed. And it needs to look that chain type up anyway, so the failure is indicated by _that_. With a UMH_WAIT_EXEC? No. You have *nothing*. You know the thing started, but it might have SIGSEGV'd immediately, and you have absolutely no way of knowing, and absolutely no way of even figuring it out. You can wait - forever - for something to bind to whatever dynamic resource you're expecting. You'll just fundamentally never know. You can try again, of course. Add a timeout, and try again in five seconds or something. Maybe it will work then. Maybe it won't. You won't have any way to know the _second_ time around either. Or the third. Or... See why I say it has to be synchronous? If it's synchronous, you can actually do things like (a) maybe you only need a one-time thing, and don't have any state ("load fixed tables, be done") and that's it. If the process returns with no error code, you're all done, and you know it's fine. (b) maybe the process wants to start a listener daemon or something like the traditional inetd model. It can open the socket, it can start listening on it, and it can fork off a child and check it's status. It can then do exit(0) if everything is fine, and now request_module() returns. see the difference? Even if you ended up with a background process (like in that (b) case), you did so with *error* handling, and you did so knowing that the state has actually been set up by the time the request_module() returns. And if you use the proper module loading exclusion, it also means that that (b) can know it's the only process starting up, and it's not racing with another one. It might still want to do the usual lock-files in user space to protect against just the admin starting it manually, but at least you don't have the situation that a hundred threads just had a thundering herd where they all ended up using the same kernel facility, and they all independently started a hundred usermode helpers. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 10:57 AM, David Miller wrote: > > Once the helper UMH is invoked, it runs asynchronously taking eBPF > translation requests. How? Really. See my comment about mutual exclusion. The current patch is *broken* because it doesn't handle it. Really. Think of it this way - you may have now started *five* of those things concurrently by mistake. The actual module loading case never does that, because the actual module loading case has per-module serialization that got short-circuited. How are you going to handle five processes doing the same setup concurrently? Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 10:50 AM, Linus Torvalds wrote: On Fri, Mar 9, 2018 at 10:43 AM, Kees Cook wrote: Module loading (via kernel_read_file()) already uses deny_write_access(), and so does do_open_execat(). As long as module loading doesn't call allow_write_access() before the execve() has started in the new implementation, I think we'd be covered here. No. kernel_read_file() only does it *during* the read. So there's a huge big honking gap between the two. Also, the second part of my suggestion was to be entirely synchronous with the whole execution of the process, and do it within the "we do mutual exclusion fo rmodules with the same name" logic. Note that Andrei's patch uses UMH_WAIT_EXEC. That's basically "vfork+exec" - it only waits for the exec to have started, it doesn't wait for the whole thing. It's not waiting for the whole thing, because once bpfilter starts it stays running/sleeping because it's stateful. It needs normal malloc-ed memory to keep the state of iptable->bpf translation that it will use later during subsequent translation calls. Theoretically it can use bpf maps pinned in kernel memory to keep this state, but then it's non-swappable. It's better to keep bpfilter state in its own user memory.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Linus Torvalds Date: Fri, 9 Mar 2018 10:53:45 -0800 > Anyway, see my other suggestion that makes this all irrelevant. Just > wait synchronously (until the exit), and just use deny_write_access(). What exit? Once the helper UMH is invoked, it runs asynchronously taking eBPF translation requests.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Alexei Starovoitov Date: Fri, 9 Mar 2018 10:50:49 -0800 > On 3/9/18 10:23 AM, Andy Lutomirski wrote: >> It might not be totally crazy to back it by tmpfs. > > interesting. how do you propose to do it? > Something like: > - create /umh_module_tempxxx dir > - mount tmpfs there > - copy elf into it and exec it? I think the idea is that it's an internal tmpfs mount that only the kernel has access too. And I don't think that even hurts your debuggability concerns. The user can just attach using the foo.ko file in the actual filesystem.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 10:50 AM, Linus Torvalds wrote: > On Fri, Mar 9, 2018 at 10:43 AM, Kees Cook wrote: >> >> Module loading (via kernel_read_file()) already uses >> deny_write_access(), and so does do_open_execat(). As long as module >> loading doesn't call allow_write_access() before the execve() has >> started in the new implementation, I think we'd be covered here. > > No. kernel_read_file() only does it *during* the read. Ah, true. And looking at this again, shouldn't deny_write_access() happen _before_ the LSM check in kernel_read_file()? That looks like a problem... -Kees -- Kees Cook Pixel Security
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 10:48 AM, Andy Lutomirski wrote: >> On Mar 9, 2018, at 10:17 AM, Linus Torvalds >> wrote: >> >> Hmm. I wish we had an "execute blob" model, but we really don't, and >> it would be hard/impossible to do without pinning the pages in memory. >> > > Why so hard? We can already execute a struct file for execveat, and Alexei > already has this working for umh. > Surely we can make an immutable (as in even root can’t write it) > kernel-internal tmpfs file, execveat it, then unlink it. And what do you think that does? It pins the memory for the whole time. As a *copy* of the original file. Anyway, see my other suggestion that makes this all irrelevant. Just wait synchronously (until the exit), and just use deny_write_access(). The "synchronous wait" means that you don't have the semantic change (and really., it's *required* anyway for the whole mutual exclusion against another thread racing to load the same module), and the deny_write_access() means that we don't neeed to make another copy. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 10:23 AM, Andy Lutomirski wrote: On Mar 9, 2018, at 10:15 AM, Greg KH wrote: Oh, and for the record, I like Andy's proposal as well as dumping this into a kernel module "blob" with the exception that this now would take up unswapable memory, which isn't the nicest and is one big reason we removed the in-kernel-memory firmware blobs many years ago. It might not be totally crazy to back it by tmpfs. interesting. how do you propose to do it? Something like: - create /umh_module_tempxxx dir - mount tmpfs there - copy elf into it and exec it?
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 10:43 AM, Kees Cook wrote: > > Module loading (via kernel_read_file()) already uses > deny_write_access(), and so does do_open_execat(). As long as module > loading doesn't call allow_write_access() before the execve() has > started in the new implementation, I think we'd be covered here. No. kernel_read_file() only does it *during* the read. So there's a huge big honking gap between the two. Also, the second part of my suggestion was to be entirely synchronous with the whole execution of the process, and do it within the "we do mutual exclusion fo rmodules with the same name" logic. Note that Andrei's patch uses UMH_WAIT_EXEC. That's basically "vfork+exec" - it only waits for the exec to have started, it doesn't wait for the whole thing. So I'm saying "use UMH_WAIT_PROC, do it in a different place, and make sure you cover the whole sequence with deny_write_access()". Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
> On Mar 9, 2018, at 10:17 AM, Linus Torvalds > wrote: > > > Hmm. I wish we had an "execute blob" model, but we really don't, and > it would be hard/impossible to do without pinning the pages in memory. > Why so hard? We can already execute a struct file for execveat, and Alexei already has this working for umh. Surely we can make an immutable (as in even root can’t write it) kernel-internal tmpfs file, execveat it, then unlink it. And /proc/PID/exe should be openable and readable. The blob itself would be __initdata so it gets discarded after it lands in tmpfs.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 10:35 AM, David Miller wrote: > From: Linus Torvalds > Date: Fri, 9 Mar 2018 10:17:42 -0800 > >> - use deny_write_access() to make sure that we don't have active >> writers and cannot get them during the execve. > > I agree that this is necessary for image validation purposes. Module loading (via kernel_read_file()) already uses deny_write_access(), and so does do_open_execat(). As long as module loading doesn't call allow_write_access() before the execve() has started in the new implementation, I think we'd be covered here. -Kees -- Kees Cook Pixel Security
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Linus Torvalds Date: Fri, 9 Mar 2018 10:17:42 -0800 > - use deny_write_access() to make sure that we don't have active > writers and cannot get them during the execve. I agree that this is necessary for image validation purposes.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 10:23:27AM -0800, Andy Lutomirski wrote: > > > > On Mar 9, 2018, at 10:15 AM, Greg KH wrote: > > > > > > > Oh, and for the record, I like Andy's proposal as well as dumping this > > into a kernel module "blob" with the exception that this now would take > > up unswapable memory, which isn't the nicest and is one big reason we > > removed the in-kernel-memory firmware blobs many years ago. > > > > It might not be totally crazy to back it by tmpfs. Ah, yeah, tricky, but yes, I'd be fine with that. Micro-kernel here we come! :)
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
> On Mar 9, 2018, at 10:15 AM, Greg KH wrote: > > > Oh, and for the record, I like Andy's proposal as well as dumping this > into a kernel module "blob" with the exception that this now would take > up unswapable memory, which isn't the nicest and is one big reason we > removed the in-kernel-memory firmware blobs many years ago. > It might not be totally crazy to back it by tmpfs.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 9:08 PM, Alexei Starovoitov wrote: > > there is not abi breakage and file cannot disappear from running task. > One cannot umount fs while file is still being used. I think that "cannot umount" part _is_ the ABI breakage that Andy is talking about. > Not only "read twice", but "read many". > If .text sections of elf that are not yet in memory can be modified > by malicious user, later they will be brought in with different code. > I think the easiest fix to tighten this "umh modules" to CAP_SYS_ADMIN. I don't think it actually fixes anything. It might just break things. For all we know, people run modprobe with CAP_SYS_MODULE only, since that is obviously the only capability it needs. Hmm. I wish we had an "execute blob" model, but we really don't, and it would be hard/impossible to do without pinning the pages in memory. My gut feel is that the right direction to explore is: - consider the module loaded for the whole duration of the execve. So the execution is a *blocking* operation (and we get the correct exclusion semantics) - use deny_write_access() to make sure that we don't have active writers and cannot get them during the execve. The above mean that something that executes to load a new ebpf rule will work very well. But a "start and forget" will not work (although you can obviously do so with a internal fork/exec). Hmm? Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 09:32:36AM -0800, Alexei Starovoitov wrote: > On 3/9/18 8:24 AM, Andy Lutomirski wrote: > > On Fri, Mar 9, 2018 at 3:39 PM, Alexei Starovoitov wrote: > > > On 3/9/18 7:16 AM, Andy Lutomirski wrote: > > > > > > > > > > > > On Mar 8, 2018, at 9:08 PM, Alexei Starovoitov wrote: > > > > > > > > > > > > On 3/8/18 7:54 PM, Andy Lutomirski wrote: > > > > > > > > > > > > > > > > > > > > > > > > > On Mar 8, 2018, at 7:06 PM, Linus Torvalds > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Honestly, that "read twice" thing may be what scuttles this. > > > > > > > Initially, I thought it was a non-issue, because anybody who > > > > > > > controls > > > > > > > the module subdirectory enough to rewrite files would be in a > > > > > > > position > > > > > > > to just execute the file itself directly instead. > > > > > > > > > > > > > > > > > > On further consideration, I think there’s another showstopper. This > > > > > > patch is a potentially severe ABI break. Right now, loading a module > > > > > > *copies* it into memory and does not hold a reference to the > > > > > > underlying fs. > > > > > > With the patch applied, all kinds of use cases can break in gnarly > > > > > > ways. > > > > > > Initramfs is maybe okay, but initrd may be screwed. If you load an > > > > > > ET_EXEC > > > > > > module from initrd, then umount it, then clear the ramdisk, > > > > > > something will > > > > > > go horribly wrong. Exactly what goes wrong depends on whether > > > > > > userspace > > > > > > notices that umount() failed. Similarly, if you load one of these > > > > > > modules > > > > > > over a network and then lose your connection, you have a problem. > > > > > > > > > > > > > > > there is not abi breakage and file cannot disappear from running task. > > > > > One cannot umount fs while file is still being used. > > > > > > > > > > > > Sure it is. Without your patch, init_module doesn’t keep using the > > > > file, so it’s common practice to load a module and then delete or > > > > unmount it. With your patch, the unmount case breaks. This is likely > > > > to break existing userspace, so, in Linux speak it’s an ABI break. > > > > > > > > > please read the patch again. > > > file is only used in case of umh modules. > > > There is zero difference in default case. > > > > Say I'm running some distro or other working Linux setup. I upgrade > > my kernel to a kernel that uses umh modules. The user tooling > > generates some kind of boot entry that references the new kernel > > image, and it also generates a list of modules to be loaded at various > > times in the boot process. This list might, and probably should, > > include one or more umh modules. (You are being very careful to make > > sure that depmod keeps working, so umh modules are clearly intended to > > work with existing tooling.) So now I have a kernel image and some > > modules to be loaded from various places. And I have an init script > > (initramfs's '/init' or similar) that will call init_module() on that > > .ko file. That script was certainly written under the assumption > > that, once init_module() returns, the kernel is done with the .ko > > file. With your patch applied, that assumption is no longer true. > > There is no intent to use umh modules during boot process. For _your_ use case, yes. For mine and Andy's and someone else's in the future, it might be :) You are creating a very generic, new, user/kernel api that a whole bunch of people are going to want to use. Let's not hamper the ability for us all to use this right from the beginning please. > This is not a replacement for drivers and kernel modules. > From your earlier comments regarding usb driver as umh module > I suspect you're assuming that everything will sooner or later > will convert to umh model. We have userspace drivers for USB today, being able to drag that out-of-tree codebase into the kernel is a _HUGE_ bonus, and something that I would love to do for a lot of reasons. I also can see moving some of our existing in-kernel drivers out of the kernel in a way that provides "it just works" functionality by using this type of feature. So again, please don't prevent us from using this, otherwise you should be just calling this "bpf_usermode_helper" :) Oh, and for the record, I like Andy's proposal as well as dumping this into a kernel module "blob" with the exception that this now would take up unswapable memory, which isn't the nicest and is one big reason we removed the in-kernel-memory firmware blobs many years ago. thanks, greg k-h
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 8:24 AM, Andy Lutomirski wrote: On Fri, Mar 9, 2018 at 3:39 PM, Alexei Starovoitov wrote: On 3/9/18 7:16 AM, Andy Lutomirski wrote: On Mar 8, 2018, at 9:08 PM, Alexei Starovoitov wrote: On 3/8/18 7:54 PM, Andy Lutomirski wrote: On Mar 8, 2018, at 7:06 PM, Linus Torvalds wrote: Honestly, that "read twice" thing may be what scuttles this. Initially, I thought it was a non-issue, because anybody who controls the module subdirectory enough to rewrite files would be in a position to just execute the file itself directly instead. On further consideration, I think there’s another showstopper. This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem. there is not abi breakage and file cannot disappear from running task. One cannot umount fs while file is still being used. Sure it is. Without your patch, init_module doesn’t keep using the file, so it’s common practice to load a module and then delete or unmount it. With your patch, the unmount case breaks. This is likely to break existing userspace, so, in Linux speak it’s an ABI break. please read the patch again. file is only used in case of umh modules. There is zero difference in default case. Say I'm running some distro or other working Linux setup. I upgrade my kernel to a kernel that uses umh modules. The user tooling generates some kind of boot entry that references the new kernel image, and it also generates a list of modules to be loaded at various times in the boot process. This list might, and probably should, include one or more umh modules. (You are being very careful to make sure that depmod keeps working, so umh modules are clearly intended to work with existing tooling.) So now I have a kernel image and some modules to be loaded from various places. And I have an init script (initramfs's '/init' or similar) that will call init_module() on that .ko file. That script was certainly written under the assumption that, once init_module() returns, the kernel is done with the .ko file. With your patch applied, that assumption is no longer true. There is no intent to use umh modules during boot process. This is not a replacement for drivers and kernel modules. From your earlier comments regarding usb driver as umh module I suspect you're assuming that everything will sooner or later will convert to umh model. There is no such intent. umh approach is targeting one specific use case of converting one stable uapi into another stable uapi. It's all control plane that can be a slow as it needs to be. Critical kernel datapath is not going to be affected (especially the one needed to boot) because umh is a user mode app running async with the rest of kernel. With patch applied there are still zero users of it. bpfilter and nft2bpf are the only two that are going to use this interface. Every other potential user will be code reviewed just like everything else in the kernel land. So your statement that with patch applied there is an ABI breakage is just false. At the same time I agree that keeping fs pinned while umh module started from that fs is not great, so I intend to solve it somehow in v2 while keeping the approach being elf based for debuggability reasons explained earlier. Heck, on my laptop, all my .ko files are labeled system_u:object_r:modules_object_t:s0. I wonder how many SELinux setups (and AppArmor, etc) will actually disallow execve() on modules? I don't think it's a good idea to move lsm into umh. Can you please try to have a constructive discussion here? I'd like to ask the same favor. Claiming ABI breakage when there is none is not constructive. Saying that "ohh there must be a security issue here, because it looks complex" is not constructive either.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 3:39 PM, Alexei Starovoitov wrote: > On 3/9/18 7:16 AM, Andy Lutomirski wrote: On Mar 8, 2018, at 9:08 PM, Alexei Starovoitov wrote: On 3/8/18 7:54 PM, Andy Lutomirski wrote: > On Mar 8, 2018, at 7:06 PM, Linus Torvalds > wrote: > > > Honestly, that "read twice" thing may be what scuttles this. > Initially, I thought it was a non-issue, because anybody who controls > the module subdirectory enough to rewrite files would be in a position > to just execute the file itself directly instead. On further consideration, I think there’s another showstopper. This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem. >>> >>> >>> there is not abi breakage and file cannot disappear from running task. >>> One cannot umount fs while file is still being used. >> >> >> Sure it is. Without your patch, init_module doesn’t keep using the >> file, so it’s common practice to load a module and then delete or >> unmount it. With your patch, the unmount case breaks. This is likely >> to break existing userspace, so, in Linux speak it’s an ABI break. > > > please read the patch again. > file is only used in case of umh modules. > There is zero difference in default case. Say I'm running some distro or other working Linux setup. I upgrade my kernel to a kernel that uses umh modules. The user tooling generates some kind of boot entry that references the new kernel image, and it also generates a list of modules to be loaded at various times in the boot process. This list might, and probably should, include one or more umh modules. (You are being very careful to make sure that depmod keeps working, so umh modules are clearly intended to work with existing tooling.) So now I have a kernel image and some modules to be loaded from various places. And I have an init script (initramfs's '/init' or similar) that will call init_module() on that .ko file. That script was certainly written under the assumption that, once init_module() returns, the kernel is done with the .ko file. With your patch applied, that assumption is no longer true. RHEL 5 uses initrd and is still supported. For all I know, some embedded setups put their initial /lib on some block device that literally disappears after they finish booting. There are livecds that let you boot in a mode that lets you remove the CD after you finish booting. Heck, someone must have built something that calls init_module() on a FUSE filesystem. Heck, on my laptop, all my .ko files are labeled system_u:object_r:modules_object_t:s0. I wonder how many SELinux setups (and AppArmor, etc) will actually disallow execve() on modules? > >>> The “read twice” thing is also bad for another reason: containers. Suppose I have a setup where a container can load a signed module blob. With the read twice code, the container can race and run an entirely different blob outside the container. >>> >>> >>> Not only "read twice", but "read many". >>> If .text sections of elf that are not yet in memory can be modified >>> by malicious user, later they will be brought in with different code. >>> I think the easiest fix to tighten this "umh modules" to CAP_SYS_ADMIN. >> >> >> Given this issue, I think the patch would need Kees’s explicit ack. I >> had initially thought your patch had minimal security impact, but I >> was wrong Module security is very complicated and needs to satisfy a >> bunch of requirements. There is a lot of code in the kernel that >> assumes that it’s sufficient to verify a module once at load time, >> your patch changes that, and this has all kinds of nasty interactions >> with autoloading. > > > not true. you misread the patch and making incorrect conclusions. So please tell my exactly how I misread the patch and why Linus' conclusion (which is what I'm echoing) is wrong. > > I think you need to stop overreacting on non-issue. > Can you please try to have a constructive discussion here? It's not so enjoyable to review patches when author declares review comments to be non-issues without actually explaining *why* they're non-issues. Alexei, I'm willing to be shown that I'm wrong, but you have to show me how I'm wrong rather than just telling me over and over that I'm wrong.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/9/18 7:16 AM, Andy Lutomirski wrote: On Mar 8, 2018, at 9:08 PM, Alexei Starovoitov wrote: On 3/8/18 7:54 PM, Andy Lutomirski wrote: On Mar 8, 2018, at 7:06 PM, Linus Torvalds wrote: Honestly, that "read twice" thing may be what scuttles this. Initially, I thought it was a non-issue, because anybody who controls the module subdirectory enough to rewrite files would be in a position to just execute the file itself directly instead. On further consideration, I think there’s another showstopper. This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem. there is not abi breakage and file cannot disappear from running task. One cannot umount fs while file is still being used. Sure it is. Without your patch, init_module doesn’t keep using the file, so it’s common practice to load a module and then delete or unmount it. With your patch, the unmount case breaks. This is likely to break existing userspace, so, in Linux speak it’s an ABI break. please read the patch again. file is only used in case of umh modules. There is zero difference in default case. The “read twice” thing is also bad for another reason: containers. Suppose I have a setup where a container can load a signed module blob. With the read twice code, the container can race and run an entirely different blob outside the container. Not only "read twice", but "read many". If .text sections of elf that are not yet in memory can be modified by malicious user, later they will be brought in with different code. I think the easiest fix to tighten this "umh modules" to CAP_SYS_ADMIN. Given this issue, I think the patch would need Kees’s explicit ack. I had initially thought your patch had minimal security impact, but I was wrong Module security is very complicated and needs to satisfy a bunch of requirements. There is a lot of code in the kernel that assumes that it’s sufficient to verify a module once at load time, your patch changes that, and this has all kinds of nasty interactions with autoloading. not true. you misread the patch and making incorrect conclusions. Kees is very reasonable, and he’ll change his mind and ack a patch that he’s nacked when presented with a valid technical argument. But I think my ABI break observation is also a major problem, and Linus is going to be pissed if this thing lands in his tree and breaks systems due to an issue that was raised during review. So I think you need to either rework the patch or do a serious survey of how all the I think you need to stop overreacting on non-issue. distros deal with modules (dracut, initramfs-tools, all the older stuff, and probably more) and make sure they can all handle your patch. I'd also be concerned about anyone who puts /lib/modules on less reliable storage than they use for the rest of their system. (I know it's quite common to have /boot be the only non-RAID partition on a system, but modules don't generally live in /boot.) Also, I think you really ought to explain how your approach will work with MODULES=n or convince Linus that it’s okay to start adding core networking features that don’t work with MODULES=n and can't be built into the main kernel image. that ship sailed long ago. config BPF_JIT bool "enable BPF Just In Time compiler" depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT depends on MODULES
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
>> On Mar 8, 2018, at 9:08 PM, Alexei Starovoitov wrote: >> >> On 3/8/18 7:54 PM, Andy Lutomirski wrote: >> >> >> >>> On Mar 8, 2018, at 7:06 PM, Linus Torvalds >>> wrote: >>> >>> >>> Honestly, that "read twice" thing may be what scuttles this. >>> Initially, I thought it was a non-issue, because anybody who controls >>> the module subdirectory enough to rewrite files would be in a position >>> to just execute the file itself directly instead. >> >> On further consideration, I think there’s another showstopper. This patch is >> a potentially severe ABI break. Right now, loading a module *copies* it into >> memory and does not hold a reference to the underlying fs. With the patch >> applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe >> okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, >> then umount it, then clear the ramdisk, something will go horribly wrong. >> Exactly what goes wrong depends on whether userspace notices that umount() >> failed. Similarly, if you load one of these modules over a network and then >> lose your connection, you have a problem. > > there is not abi breakage and file cannot disappear from running task. > One cannot umount fs while file is still being used. Sure it is. Without your patch, init_module doesn’t keep using the file, so it’s common practice to load a module and then delete or unmount it. With your patch, the unmount case breaks. This is likely to break existing userspace, so, in Linux speak it’s an ABI break. > >> >> The “read twice” thing is also bad for another reason: containers. Suppose I >> have a setup where a container can load a signed module blob. With the read >> twice code, the container can race and run an entirely different blob >> outside the container. > > Not only "read twice", but "read many". > If .text sections of elf that are not yet in memory can be modified > by malicious user, later they will be brought in with different code. > I think the easiest fix to tighten this "umh modules" to CAP_SYS_ADMIN. Given this issue, I think the patch would need Kees’s explicit ack. I had initially thought your patch had minimal security impact, but I was wrong Module security is very complicated and needs to satisfy a bunch of requirements. There is a lot of code in the kernel that assumes that it’s sufficient to verify a module once at load time, your patch changes that, and this has all kinds of nasty interactions with autoloading. Kees is very reasonable, and he’ll change his mind and ack a patch that he’s nacked when presented with a valid technical argument. But I think my ABI break observation is also a major problem, and Linus is going to be pissed if this thing lands in his tree and breaks systems due to an issue that was raised during review. So I think you need to either rework the patch or do a serious survey of how all the distros deal with modules (dracut, initramfs-tools, all the older stuff, and probably more) and make sure they can all handle your patch. I'd also be concerned about anyone who puts /lib/modules on less reliable storage than they use for the rest of their system. (I know it's quite common to have /boot be the only non-RAID partition on a system, but modules don't generally live in /boot.) Also, I think you really ought to explain how your approach will work with MODULES=n or convince Linus that it’s okay to start adding core networking features that don’t work with MODULES=n and can't be built into the main kernel image.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/8/18 7:54 PM, Andy Lutomirski wrote: On Mar 8, 2018, at 7:06 PM, Linus Torvalds wrote: Honestly, that "read twice" thing may be what scuttles this. Initially, I thought it was a non-issue, because anybody who controls the module subdirectory enough to rewrite files would be in a position to just execute the file itself directly instead. On further consideration, I think there’s another showstopper. This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem. there is not abi breakage and file cannot disappear from running task. One cannot umount fs while file is still being used. The “read twice” thing is also bad for another reason: containers. Suppose I have a setup where a container can load a signed module blob. With the read twice code, the container can race and run an entirely different blob outside the container. Not only "read twice", but "read many". If .text sections of elf that are not yet in memory can be modified by malicious user, later they will be brought in with different code. I think the easiest fix to tighten this "umh modules" to CAP_SYS_ADMIN.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
> On Mar 8, 2018, at 7:06 PM, Linus Torvalds > wrote: > > > Honestly, that "read twice" thing may be what scuttles this. > Initially, I thought it was a non-issue, because anybody who controls > the module subdirectory enough to rewrite files would be in a position > to just execute the file itself directly instead. > On further consideration, I think there’s another showstopper. This patch is a potentially severe ABI break. Right now, loading a module *copies* it into memory and does not hold a reference to the underlying fs. With the patch applied, all kinds of use cases can break in gnarly ways. Initramfs is maybe okay, but initrd may be screwed. If you load an ET_EXEC module from initrd, then umount it, then clear the ramdisk, something will go horribly wrong. Exactly what goes wrong depends on whether userspace notices that umount() failed. Similarly, if you load one of these modules over a network and then lose your connection, you have a problem. The “read twice” thing is also bad for another reason: containers. Suppose I have a setup where a container can load a signed module blob. With the read twice code, the container can race and run an entirely different blob outside the container.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 02:12:24AM +, Andy Lutomirski wrote: > On Fri, Mar 9, 2018 at 1:20 AM, Alexei Starovoitov > wrote: > > On Fri, Mar 09, 2018 at 12:59:36AM +, Andy Lutomirski wrote: > >> > >> Alexei, can you give an example use case? I'm sure it's upthread > >> somewhere, but I'm having trouble finding it. > > > > at the time of iptable's setsockopt() the kernel will do > > err = request_module("bpfilter"); > > once. > > The rough POC code: > > https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/tree/net/ipv4/bpfilter/sockopt.c?h=ipt_bpf#n25 > > Here's what I gather from reading that code: you have a new kernel > feature (consisting of actual kernel code) that wants to defer some of > its implementation to user mode. I like this idea a lot. But I have > a suggestion for a slightly different way of accomplishing the same > thing. Rather than extending init_module() to accept ELF input, > except the call_umh code to be able to call blobs. You'd use it it > very roughly like this: > > First, compile your user code and emit a staitc binary. Use objdump > fiddling or a trivial .S file to make that static binary into a > variable. Then write a tiny shim module like this: > > extern unsigned char __begin_user_code[], __end_user_code[]; > > int __init init_shim_module(void) > { > return call_umh_blob(__begin_user_code, __end_user_code - > __begin_user_code); > } > > By itself, this is clearly a worse solution than yours, but it has two > benefits, one small and two big. The small benefit is that it is > completely invisible to userspace: the .ko file is a bona fide module. Unfortunately it's not quite the case. The normal .ko that does call_umh_blob is indeed seen in lsmod, but the umh process is a separate task. It could have been oomed or killed by admin and this normal .ko wouldn't notice it, so health check of umh process by the kernel is needed regardless. Right now bpfilter has trivial fuse-like protocol. This part is still to be designed cleanly. No doubt that visibility and debuggability into this umh processes is must have, but lsmod/rmmod interface doesn't quite fit. As you said letting this priv tasks register themselves in lsmod is certainly no-go. I think if they will be in lsmod, kernel has to register them and establish health check with umh at the same time. I think worrying about restarting is not necessary. This is still kernel code with the same high standards and review process. If they crash it's really a kernel bug. It only doesn't take the system down. > I think we don't want to end up in a situation where we ship a program > with a .ko extension that opens something in /dev, for example. this part I don't get. What's wrong with open of /dev ? I don't see a use case for it, but technically why not? > call_umh_blob() would create an anon_inode or similar object backed by > the blob and exec it. Interesting. I haven't considered such approach. For full context it all started from the idea of 'unprivileged kernel modules' or 'hardened kernel modules'. Something that kernel can easily interact with, but much safer than traditional kernel module. I've tried a bunch of crappy ideas first: 1. have a piece of kernel .text vm_mmap-ed into user process that doing iptables setsockopt and on return to user space force handle_signal to execute that code. Sort of like forced ld_preload where parasite code is provided by the kernel but runs in user space 2. have a special set of kernel page tables in read-only mode while iptable->bpf conversion is happening 3. have load_module() fork a user task and load real kernel .ko into it trying to hack #3 realized that I'm mainly copy-pasting a lot of load_elf_binary() code without elf_interpreter bits, so figured it's much easier and simpler to blend sys_finit_module with load_elf_binary via tweaking do_execveat_common and keeping that .ko as normal elf which is implemented in this patch. Debugging of that fake .ko is so much better. If it's done via call_umh_blob() the process that starts will indeed be a user mode process, but you won't be able to attach gdb to it. Whereas in this patch it's normal elf and standard debugging techniques apply. A developer can do gdb breakpoints, debug info, etc. That is huge advantage of keeping it as normal elf.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 7:06 PM, Linus Torvalds wrote: > > So I don't like Andy's "let's make it a kernel module and then that > kernel module can execve() a blob". THAT seems like just stupid > indirection. > > But I do like Andy's "execve a blob" part, because it is the *blob* > that has had its signature verified, not the file! To be honest., Andy's approach has the advantage that all the synchronization we do for modules still works. In particular, we have module loading logic where we synchronize loading of modules with the same name, so that two things that do request_module() concurrently will not lead to two modules being loaded. See that whole "module_mutex" thing, and the logic in (for example) "add_unformed_module()". Andrei's patch short-circuits the module loading before that, so if you have two concurrent request_module() calls, you'll end up with no serialization, and two executables. So maybe Andy is right. He often is, after all. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
> On Mar 8, 2018, at 6:31 PM, David Miller wrote: > > From: Andy Lutomirski > Date: Fri, 9 Mar 2018 02:12:24 + > >> First, compile your user code and emit a staitc binary. Use objdump >> fiddling or a trivial .S file to make that static binary into a >> variable. Then write a tiny shim module like this: >> >> extern unsigned char __begin_user_code[], __end_user_code[]; >> >> int __init init_shim_module(void) >> { >> return call_umh_blob(__begin_user_code, __end_user_code - >> __begin_user_code); >> } >> >> By itself, this is clearly a worse solution than yours, but it has two >> benefits, one small and two big. The small benefit is that it is >> completely invisible to userspace: the .ko file is a bona fide module. > > Anything you try to do which makes these binaries "special" is a huge > negative. I don’t know what you mean. Alexei’s approach introduces a whole new kind of special module. Mine doesn’t. > >> The big benefits are: > > I don't see those things as benefits at all, and Alexei's scheme can > easily be made to work in your benefit #1 case too. > How? I think you’ll find that a non-modular implementation of a bundled ELF binary looks a *lot* like my call_umh_blob(). > It's a user binary. It's shipped with the kernel and it's signed. > > If we can't trust that, we can't trust much else. I’m not making any arguments about security at all. I’m talking about functionality. If we apply Alexei’s patch as is, then I think we’ll have a situation where ET_EXEC modules are only useful if they can do their jobs without any filesystem access at all. This is fine for networking, where netlink sockets are used, but I think it’s not so great for other use cases. If we ever try to stick a usb driver into userspace, we’re going to want to instantiate the user task once per device, passed as stdin or similar, and Alexei’s code will make that very awkward.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 5:44 PM, Kees Cook wrote: > > My concerns are mostly about crossing namespaces. If a container > triggers an autoload, the result runs in the init_ns. Heh. I saw that as an advantage. It's basically the same semantics as a normal module load does - in that the "kernel namespace" is init_ns. My own personal worry is actually different - we do check the signature of the file we're loading, but we're then passing it off to execve() not as the image we loaded, but as the file pointer. So the execve() will end up not using the actual buffer we checked the signature on, but instead just re-reading the file. Now, that has two issues: (a) it means that 'init_module' doesn't work, only 'finit_module'. Not a big deal, but I do think that we should fail the 'info->file == NULL' case explicitly (instead of failing when it does an execve() of /dev/null, which is what I think it does now - but that's just from the reading the patch, not from testing it). (b) somebody could maybe try to time it and modify the file after-the-fact of the signature check, and then we execute something else. Honestly, that "read twice" thing may be what scuttles this. Initially, I thought it was a non-issue, because anybody who controls the module subdirectory enough to rewrite files would be in a position to just execute the file itself directly instead. But it turns out that isn't needed. Some bad actor could just do "finit_module()" with a file that they just *copied* from the module directory. Yes, yes, they still need CAP_SYS_MODULE to even get that far, but it does worry me. So this does seem to turn "CAP_SYS_MODULE" into meaning "can run any executable as root in the init namespace". They don't have to have the module signing key that I thought would protect us, because they can do that "rewrite after signature check". So that does seem like a bad problem with the approach. So I don't like Andy's "let's make it a kernel module and then that kernel module can execve() a blob". THAT seems like just stupid indirection. But I do like Andy's "execve a blob" part, because it is the *blob* that has had its signature verified, not the file! Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Andy Lutomirski Date: Fri, 9 Mar 2018 02:12:24 + > First, compile your user code and emit a staitc binary. Use objdump > fiddling or a trivial .S file to make that static binary into a > variable. Then write a tiny shim module like this: > > extern unsigned char __begin_user_code[], __end_user_code[]; > > int __init init_shim_module(void) > { > return call_umh_blob(__begin_user_code, __end_user_code - > __begin_user_code); > } > > By itself, this is clearly a worse solution than yours, but it has two > benefits, one small and two big. The small benefit is that it is > completely invisible to userspace: the .ko file is a bona fide module. Anything you try to do which makes these binaries "special" is a huge negative. > The big benefits are: I don't see those things as benefits at all, and Alexei's scheme can easily be made to work in your benefit #1 case too. It's a user binary. It's shipped with the kernel and it's signed. If we can't trust that, we can't trust much else. And this whole container argument.. It's a mirage. Kernel modules are 1000 times worse, since they can access any container and any namespace they want.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 1:20 AM, Alexei Starovoitov wrote: > On Fri, Mar 09, 2018 at 12:59:36AM +, Andy Lutomirski wrote: >> >> Alexei, can you give an example use case? I'm sure it's upthread >> somewhere, but I'm having trouble finding it. > > at the time of iptable's setsockopt() the kernel will do > err = request_module("bpfilter"); > once. > The rough POC code: > https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/tree/net/ipv4/bpfilter/sockopt.c?h=ipt_bpf#n25 Here's what I gather from reading that code: you have a new kernel feature (consisting of actual kernel code) that wants to defer some of its implementation to user mode. I like this idea a lot. But I have a suggestion for a slightly different way of accomplishing the same thing. Rather than extending init_module() to accept ELF input, except the call_umh code to be able to call blobs. You'd use it it very roughly like this: First, compile your user code and emit a staitc binary. Use objdump fiddling or a trivial .S file to make that static binary into a variable. Then write a tiny shim module like this: extern unsigned char __begin_user_code[], __end_user_code[]; int __init init_shim_module(void) { return call_umh_blob(__begin_user_code, __end_user_code - __begin_user_code); } By itself, this is clearly a worse solution than yours, but it has two benefits, one small and two big. The small benefit is that it is completely invisible to userspace: the .ko file is a bona fide module. The big benefits are: 1. It works even in a non-modular kernel! (Okay, it probably only works if you can arrange for the built-in module to be initialized late enough, but that's straightforward.) 2. It allows future extensions to change the way the glue works. For example, maybe you want the module to integrate properly with lsmod, etc. Rather than adding a mechanism for general privileged programs to register themselves with lsmod (ick!), you could do it entirely in the kernel where lsmod would know that a particular umh task is special. More usefully, you could extend call_umh_blob() to pass in some pre-initialized struct files, which would give a clean way to *synchronously* create a communication channel to user code for whatever service the user code provides. And it would be more straightforward to make the umh blob do what it needs to do without relying on any particular filesystems being mounted. I think we don't want to end up in a situation where we ship a program with a .ko extension that opens something in /dev, for example. call_umh_blob() would create an anon_inode or similar object backed by the blob and exec it.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 08, 2018 at 03:07:01PM -0800, Alexei Starovoitov wrote: > On 3/7/18 5:23 PM, Luis R. Rodriguez wrote: > > > > request_module() has its own world though too. How often in your proof of > > concept is request_module() called? How many times do you envision it being > > called? > > once. What about other users later in the kernel? > > > +static int run_umh(struct file *file) > > > +{ > > > + struct subprocess_info *sub_info = call_usermodehelper_setup_file(file); > > > + > > > + if (!sub_info) > > > + return -ENOMEM; > > > + return call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); > > > +} > > > > run_umh() calls the program and waits. Note that while we are running a UMH > > we > > can't suspend. What if they take forever, who is hosing them down with an > > equivalent: > > > > schedule(); > > try_to_freeze(); > > > > Say they are buggy and never return, will they stall suspend, have you > > tried? > > > > call_usermodehelper_exec() uses helper_lock() which in turn is used for > > umh's accounting for number of running umh's. One of the sad obscure uses > > for this is to deal with suspend/resume. Refer to __usermodehelper* calls > > on kernel/power/process.c > > > > Note how you use UMH_WAIT_EXEC too, so this is all synchronous. > > looks like you misread this code > > #define UMH_NO_WAIT 0 /* don't wait at all */ > #define UMH_WAIT_EXEC 1 /* wait for the exec, but not the process */ > #define UMH_WAIT_PROC 2 /* wait for the process to complete */ > #define UMH_KILLABLE4 /* wait for EXEC/PROC killable */ I certainly did get the incorrect impression this was a sync op, sorry about that. In that case its odd to see a request_module() call, when what is really meant is more of a request_module_nowait(), its also UMH_NO_WAIT on the modprobe call itself as well. In fact I think I'd much prefer at the very least to see a different wrapper for this, even if its using the same bolts and nuts underneath for userspace processes compiled on the kernel. The sanity checks in place for request_module() may change later and this use case seems rather simple and limited. Keeping tabs on this type of users seems desirable. At the *very least* diff --git a/include/linux/kmod.h b/include/linux/kmod.h index 40c89ad4bea6..7530e00da03b 100644 --- a/include/linux/kmod.h +++ b/include/linux/kmod.h @@ -37,11 +37,13 @@ extern __printf(2, 3) int __request_module(bool wait, const char *name, ...); #define request_module(mod...) __request_module(true, mod) #define request_module_nowait(mod...) __request_module(false, mod) +#define request_umh_proc(mod...) __request_module(false, mod) #define try_then_request_module(x, mod...) \ ((x) ?: (__request_module(true, mod), (x))) #else static inline int request_module(const char *name, ...) { return -ENOSYS; } static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; } +static inline int request_umh_proc(const char *name, ...) { return -ENOSYS; } #define try_then_request_module(x, mod...) (x) #endif Modulo, kernel/umh.c is already its own thing, so pick a name that helps identify these things clearly. > and the rest of your concerns with suspend/resume are not applicable any > more. Agreed, except it does still mean these userspace processes are limited to execution only the kernel is up, and not going to suspend, but I think that's a proper sanity check by the umh API. > bpftiler.ko is started once and runs non-stop from there on. > Unless it crashes, but it's a different discussion. Sure. Luis
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 5:38 PM, Linus Torvalds wrote: > On Thu, Mar 8, 2018 at 4:59 PM, Andy Lutomirski wrote: >> >> Also, I don't see how this is any more exploitable than any other >> init_module(). > > Absolutely. If Kees doesn't trust the files to be loaded, an > executable - even if it's running with root privileges and in the > initns - is still fundamentally weaker than a kernel module. > > So I don't understand the security argument AT ALL. It's nonsensical. > The executable loading does all the same security checks that the > module loading does, including the signing check. > > And the whole point is that we can now do things with building and > loading a ebpf rule instead of having a full module. My concerns are mostly about crossing namespaces. If a container triggers an autoload, the result runs in the init_ns. So, really, there's nothing new from my perspective, except that it's in userspace instead of in the kernel. Perhaps it's an orthogonal concern. -Kees -- Kees Cook Pixel Security
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 4:59 PM, Andy Lutomirski wrote: > > Also, I don't see how this is any more exploitable than any other > init_module(). Absolutely. If Kees doesn't trust the files to be loaded, an executable - even if it's running with root privileges and in the initns - is still fundamentally weaker than a kernel module. So I don't understand the security argument AT ALL. It's nonsensical. The executable loading does all the same security checks that the module loading does, including the signing check. And the whole point is that we can now do things with building and loading a ebpf rule instead of having a full module. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 01:04:39AM +, Andy Lutomirski wrote: > On Fri, Mar 9, 2018 at 12:57 AM, Alexei Starovoitov wrote: > > On 3/8/18 4:24 PM, Kees Cook wrote: > >> > >> As Andy asked earlier, why not DYN too to catch PIE executables? Seems > >> like forcing the userspace helper to be non-PIE would defeat some of > >> the userspace defenses in use in most distros. > > > > > > because we don't add features without concrete users. > > I disagree here. If you're going to add a magic trick that triggers > an execve(), then I think you should either support *both* standard, > widely-used types of ELF programs or you should give a compelling use > case that works for ET_EXEC but not for ET_DYN. Keep in mind that > many distros have a very strong preference for ET_DYN. misunderstanding here is that this bpfiler.ko is part of _kernel build_. Kernel decides how it's build. Nothing to do with distros. Current Makefile is very dumb and it's built with HOSTCC: https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/tree/net/bpfilter/Makefile?h=ipt_bpf but it will be standalone with CC before final to make sure crosscompiling works.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Thu, Mar 8, 2018 at 4:57 PM, Alexei Starovoitov wrote: > The above are three paragraphs of security paranoia without single > concrete example of a security issue. How is running an arbitrary ELF as full init_ns root from a container not a concrete example? I'm not saying this approach can never happen. And this isn't paranoia. There are very real security boundary violations in this model, IMO. Though, as Andy says, it's more about autoloading than umh itself. I just don't want to extend that problem further. >> As Andy asked earlier, why not DYN too to catch PIE executables? Seems >> like forcing the userspace helper to be non-PIE would defeat some of >> the userspace defenses in use in most distros. > > > because we don't add features without concrete users. It's just a two-line change, and then it would work on distros that build PIE-by-default. That seems like a concrete use-case. >> The exec.c changes should be split into a separate patch. Something >> feels incorrectly refactored here, though. Passing all three of fd, >> filename, and file to __do_execve_file() seems wrong; perhaps the fd >> to file handling needs to happen externally in what you have here as >> do_execveat_common()? The resulting __do_execve_file() has repeated >> conditionals on filename... maybe what I object to is being able to >> pass a NULL filename at all. The filename can be (painfully) >> reconstructed from the struct file. > > reconstruct the path and instantly introduce the race between execing > something by path vs prior check that it's actual elf of already > opened file ?! > excellent suggestion to improve security. I'm not sure why you're being so hostile. We're both interested in improving the kernel. Path names aren't stable, but they provide good _debugging_ about things when needed. For example, the LSM hooks, if they report on one of these loads, can produce a hint to the user about what happened. Passing "/dev/null" around is confusing at the very least. The ELF is absolutely not /dev/null. Why make someone guess? >>> [...] >>> diff --git a/kernel/module.c b/kernel/module.c >>> index ad2d420024f6..6cfa35795741 100644 >>> --- a/kernel/module.c >>> +++ b/kernel/module.c >>> [...] >>> @@ -3870,14 +3896,21 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char >>> __user *, uargs, int, flags) >>> |MODULE_INIT_IGNORE_VERMAGIC)) >>> return -EINVAL; >>> >>> - err = kernel_read_file_from_fd(fd, &hdr, &size, INT_MAX, >>> - READING_MODULE); >>> + f = fdget(fd); >>> + if (!f.file) >>> + return -EBADF; >>> + >>> + err = kernel_read_file(f.file, &hdr, &size, INT_MAX, >>> READING_MODULE); >> >> >> For the LSM subsystem, I think this should also get it's own enum >> kernel_read_file_id. This is really no longer a kernel module... > > > at this point it's a _file_. It could have been text file just well. > If lsm is thinking that at this point kernel is processing > kernel module that lsm is badly broken. Again, this is about making things more discoverable. We already know if we're loading a kernel module or a umh ELF. Passing this information to the LSM is valuable to the LSM to distinguish between types of files. They have very different effects on the system, for example. The issue is about validating intent with target. "Is this kernel module allowed?" and "Is this umh ELF allowed?" are better questions that "Is something loaded through finit_module() allowed?" Note already that the LSM distinguishes between many other file types in kernel_read_file*(). -Kees -- Kees Cook Pixel Security
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 09, 2018 at 12:59:36AM +, Andy Lutomirski wrote: > > Alexei, can you give an example use case? I'm sure it's upthread > somewhere, but I'm having trouble finding it. at the time of iptable's setsockopt() the kernel will do err = request_module("bpfilter"); once. The rough POC code: https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/tree/net/ipv4/bpfilter/sockopt.c?h=ipt_bpf#n25 > Also, I just tested this concept a bit. Depmod invoked explicitly on > an ET_EXEC with a.ko extension gets mad, but depmod -a on a kernel > that has a "module" like that seems to work fine. Go figure. right. that's with the current patch. In v2 I require .modinfo section to make sure license is specified, but depmod still not very happy: $ depmod /lib/modules/`uname -r`/kernel/net/bpfilter/bpfilter.ko depmod: ERROR: Bad version passed /lib/modules/4.16.0-rc4-00799-g1716f0aa3039-dirty/kernel/net/bpfilter/bpfilter.ko I'm not sure it's worth to silence it, since as you noticed 'depmod -a' works.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 12:57 AM, Alexei Starovoitov wrote: > On 3/8/18 4:24 PM, Kees Cook wrote: >> >> As Andy asked earlier, why not DYN too to catch PIE executables? Seems >> like forcing the userspace helper to be non-PIE would defeat some of >> the userspace defenses in use in most distros. > > > because we don't add features without concrete users. I disagree here. If you're going to add a magic trick that triggers an execve(), then I think you should either support *both* standard, widely-used types of ELF programs or you should give a compelling use case that works for ET_EXEC but not for ET_DYN. Keep in mind that many distros have a very strong preference for ET_DYN. Or you could argue that ET_DYN requires tooling changes, but I think it's awkward to ask the tooling to change in advance of the kernel being willing to actually invoke the thing. I'm not actually convinced that any tooling changes would be needed, though.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Fri, Mar 9, 2018 at 12:24 AM, Kees Cook wrote: > How is this not marked [RFC]? :) > > On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: >> As the first step in development of bpfilter project [1] the request_module() >> code is extended to allow user mode helpers to be invoked. Idea is that >> user mode helpers are built as part of the kernel build and installed as >> traditional kernel modules with .ko file extension into distro specified >> location, such that from a distribution point of view, they are no different >> than regular kernel modules. Thus, allow request_module() logic to load such >> user mode helper (umh) modules via: >> >> request_module("foo") -> >> call_umh("modprobe foo") -> >> sys_finit_module(FD of /lib/modules/.../foo.ko) -> >> call_umh(struct file) > > Yikes. This means autoloaded artbitrary exec() now, instead of > autoloading just loading arbitrary modules? That seems extremely bad. > This just extends all the problems we've had with defining security > boundaries with modules out to umh too. I would need some major > convincing that this can be made safe. > > It's easy for container to trigger arbitrary module loading, so if it > can find/use a similar one of the bugs we've seen in the past to > redirect the module loading we don't just get new module-created > attack surface added to the kernel, but we again get arbitrary ELF > running in the root ns (not in the container). And in the insane case > of a container with CAP_SYS_MODULE, this is a trivial escape without > even needing to build a special kernel module: the root ns, running > with all privileges runs an arbitrary ELF. > > As it stands, I have to NAK this. At the very least, you need to solve > the execution environment problems here: the ELF should run with no > greater privileges than what loaded the module, and very importantly, > must not be allowed to bypass these checks through autoloading. What > _triggered_ the autoload must be the environment, not the "modprobe", > since that's running with full privileges. Basically, we need to fix > module autoloading first, or at least a significant subset of the > problems: https://lkml.org/lkml/2017/11/27/754 I disagree. If we're going to somehow have ELF binaries that pretend to be modules, then they should run with high privilege and, more importantly, should run in a context independent of whoever triggered autoloading. Also, I don't see how this is any more exploitable than any other init_module(). If someone has CAP_SYS_MODULE or autoload privileges and they can trigger init_module() on a file, then they're granting maximum privilege to the contents of that file. So who cares if that file is a kernel module or an ELF binary? Alexei, can you give an example use case? I'm sure it's upthread somewhere, but I'm having trouble finding it. Also, I just tested this concept a bit. Depmod invoked explicitly on an ET_EXEC with a.ko extension gets mad, but depmod -a on a kernel that has a "module" like that seems to work fine. Go figure.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/8/18 4:24 PM, Kees Cook wrote: How is this not marked [RFC]? :) On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: As the first step in development of bpfilter project [1] the request_module() code is extended to allow user mode helpers to be invoked. Idea is that user mode helpers are built as part of the kernel build and installed as traditional kernel modules with .ko file extension into distro specified location, such that from a distribution point of view, they are no different than regular kernel modules. Thus, allow request_module() logic to load such user mode helper (umh) modules via: request_module("foo") -> call_umh("modprobe foo") -> sys_finit_module(FD of /lib/modules/.../foo.ko) -> call_umh(struct file) Yikes. This means autoloaded artbitrary exec() now, instead of autoloading just loading arbitrary modules? That seems extremely bad. This just extends all the problems we've had with defining security boundaries with modules out to umh too. I would need some major convincing that this can be made safe. It's easy for container to trigger arbitrary module loading, so if it can find/use a similar one of the bugs we've seen in the past to redirect the module loading we don't just get new module-created attack surface added to the kernel, but we again get arbitrary ELF running in the root ns (not in the container). And in the insane case of a container with CAP_SYS_MODULE, this is a trivial escape without even needing to build a special kernel module: the root ns, running with all privileges runs an arbitrary ELF. As it stands, I have to NAK this. At the very least, you need to solve the execution environment problems here: the ELF should run with no greater privileges than what loaded the module, and very importantly, must not be allowed to bypass these checks through autoloading. What _triggered_ the autoload must be the environment, not the "modprobe", since that's running with full privileges. Basically, we need to fix module autoloading first, or at least a significant subset of the problems: https://lkml.org/lkml/2017/11/27/754 The above are three paragraphs of security paranoia without single concrete example of a security issue. Such approach enables kernel to delegate functionality traditionally done by kernel modules into user space processes (either root or !root) and reduces security attack surface of such new code, meaning in case of potential bugs only the umh would crash but not the kernel. Another advantage coming with that would be that bpfilter.ko can be debugged and tested out of user space as well (e.g. opening the possibility to run all clang sanitizers, fuzzers or test suites for checking translation). Also, such architecture makes the kernel/user boundary very precise: control plane is done by the user space while data plane stays in the kernel. It's easy to distinguish "umh module" from traditional kernel module: $ readelf -h bld/net/bpfilter/bpfilter.ko|grep Type Type: EXEC (Executable file) As Andy asked earlier, why not DYN too to catch PIE executables? Seems like forcing the userspace helper to be non-PIE would defeat some of the userspace defenses in use in most distros. because we don't add features without concrete users. $ readelf -h bld/net/ipv4/netfilter/iptable_filter.ko |grep Type Type: REL (Relocatable file) Since umh can crash, can be oom-ed by the kernel, killed by admin, the subsystem that uses them (like bpfilter) need to manage life time of umh on its own, so module infra doesn't do any accounting of them. They don't appear in "lsmod" and cannot be "rmmod". Multiple request_module("umh") will load multiple umh.ko processes. Similar to kernel modules the kernel will be tainted if "umh module" has invalid signature. [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_747551_&d=DwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=i6WobKxbeG3slzHSIOxTVtYIJw7qjCE6S0spDTKL-J4&m=pMlEM-qKOmGljadUKAdwKBpuYHQRXzApvSGkZFIT4Gg&s=0-vP6LH-GxXnL7LuV-KfMl1NbqsVJUINSygoVa9R6nk&e= Signed-off-by: Alexei Starovoitov --- fs/exec.c | 40 +++- include/linux/binfmts.h | 1 + include/linux/umh.h | 3 +++ kernel/module.c | 43 ++- kernel/umh.c| 24 +--- 5 files changed, 94 insertions(+), 17 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 7eb8d21bcab9..0483c438de7d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1691,14 +1691,13 @@ static int exec_binprm(struct linux_binprm *bprm) /* * sys_execve() executes a new program. */ -static int do_execveat_common(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags) +static int __do_execve_file(int fd, struct filename *filename, +
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
How is this not marked [RFC]? :) On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] the request_module() > code is extended to allow user mode helpers to be invoked. Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) Yikes. This means autoloaded artbitrary exec() now, instead of autoloading just loading arbitrary modules? That seems extremely bad. This just extends all the problems we've had with defining security boundaries with modules out to umh too. I would need some major convincing that this can be made safe. It's easy for container to trigger arbitrary module loading, so if it can find/use a similar one of the bugs we've seen in the past to redirect the module loading we don't just get new module-created attack surface added to the kernel, but we again get arbitrary ELF running in the root ns (not in the container). And in the insane case of a container with CAP_SYS_MODULE, this is a trivial escape without even needing to build a special kernel module: the root ns, running with all privileges runs an arbitrary ELF. As it stands, I have to NAK this. At the very least, you need to solve the execution environment problems here: the ELF should run with no greater privileges than what loaded the module, and very importantly, must not be allowed to bypass these checks through autoloading. What _triggered_ the autoload must be the environment, not the "modprobe", since that's running with full privileges. Basically, we need to fix module autoloading first, or at least a significant subset of the problems: https://lkml.org/lkml/2017/11/27/754 > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Another > advantage coming with that would be that bpfilter.ko can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. > > It's easy to distinguish "umh module" from traditional kernel module: > > $ readelf -h bld/net/bpfilter/bpfilter.ko|grep Type > Type: EXEC (Executable file) As Andy asked earlier, why not DYN too to catch PIE executables? Seems like forcing the userspace helper to be non-PIE would defeat some of the userspace defenses in use in most distros. > $ readelf -h bld/net/ipv4/netfilter/iptable_filter.ko |grep Type > Type: REL (Relocatable file) > > Since umh can crash, can be oom-ed by the kernel, killed by admin, > the subsystem that uses them (like bpfilter) need to manage life > time of umh on its own, so module infra doesn't do any accounting > of them. They don't appear in "lsmod" and cannot be "rmmod". > Multiple request_module("umh") will load multiple umh.ko processes. > > Similar to kernel modules the kernel will be tainted if "umh module" > has invalid signature. > > [1] https://lwn.net/Articles/747551/ > > Signed-off-by: Alexei Starovoitov > --- > fs/exec.c | 40 +++- > include/linux/binfmts.h | 1 + > include/linux/umh.h | 3 +++ > kernel/module.c | 43 ++- > kernel/umh.c| 24 +--- > 5 files changed, 94 insertions(+), 17 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 7eb8d21bcab9..0483c438de7d 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1691,14 +1691,13 @@ static int exec_binprm(struct linux_binprm *bprm) > /* > * sys_execve() executes a new program. > */ > -static int do_execveat_common(int fd, struct filename *filename, > - struct user_arg_ptr argv, > - struct user_arg_ptr envp, > - int flags) > +static int __do_execve_file(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags, struct file *file) > { > char *pathbuf = NULL; > struct linux_binprm *bprm; > - struct file *file; > struct files_struct *displaced; > i
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/7/18 5:23 PM, Luis R. Rodriguez wrote: request_module() has its own world though too. How often in your proof of concept is request_module() called? How many times do you envision it being called? once. +static int run_umh(struct file *file) +{ + struct subprocess_info *sub_info = call_usermodehelper_setup_file(file); + + if (!sub_info) + return -ENOMEM; + return call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); +} run_umh() calls the program and waits. Note that while we are running a UMH we can't suspend. What if they take forever, who is hosing them down with an equivalent: schedule(); try_to_freeze(); Say they are buggy and never return, will they stall suspend, have you tried? call_usermodehelper_exec() uses helper_lock() which in turn is used for umh's accounting for number of running umh's. One of the sad obscure uses for this is to deal with suspend/resume. Refer to __usermodehelper* calls on kernel/power/process.c Note how you use UMH_WAIT_EXEC too, so this is all synchronous. looks like you misread this code and the rest of your concerns with suspend/resume are not applicable any more. #define UMH_NO_WAIT 0 /* don't wait at all */ #define UMH_WAIT_EXEC 1 /* wait for the exec, but not the process */ #define UMH_WAIT_PROC 2 /* wait for the process to complete */ #define UMH_KILLABLE4 /* wait for EXEC/PROC killable */ bpftiler.ko is started once and runs non-stop from there on. Unless it crashes, but it's a different discussion. + if (info->hdr->e_type == ET_EXEC) { +#ifdef CONFIG_MODULE_SIG + if (!info->sig_ok) { + pr_notice_once("umh %s verification failed: signature and/or required key missing - tainting kernel\n", + info->file->f_path.dentry->d_name.name); + add_taint(TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK); + } +#endif So I guess this check is done *after* run_umh() then, what about the enforce mode, don't we want to reject loading at all in any circumstance? already answered this twice in the thread.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Mon, Mar 05, 2018 at 05:34:57PM -0800, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] So meta :) The URL refers an lwn article, which in turn refers to this effort's first RFC. As someone only getting *one* of these patches in emails, It would be useful if the cover letter referenced instead an optional git tree and branch so one could easily get the patches for more careful inspection. In the meantime, can you make such tree available with a branch? Also, since kernel/module.c is involved it would be wise to include Jessica, which I've Cc'd. I'm going to guess Kees may like to review this too. Mimi might want to help review as well. Rafael may care about suspend/resume implications of these "umh modules" as you put it. > the request_module() > code is extended to allow user mode helpers to be invoked. Upon inspection you never touch or use request_module() so this is false and misleading. You don't use or extend request_module() at all, you rely on extending finit_module() so that load_module() itself will now execute (via modified do_execve_file()) the same file which was loaded as an module. This is *very* different given request_module() has its own full magic and is in and of itself a UMH of the kernel implemented in kernel/kmod.c. Nevertheless, why? > Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. It sounds like finit_module() module loading is used as a convenience factor simply to take advantage of being able to ship / maintain/ compile these umh programs as part of the kernel. Is that right? So the finit_module() interface was just a convenient mechanism. Is that right? Ie, if folks had these binaries in place the regular UMH interface / API could be used so that these could be looked for, but instead we want to carry these in tandem with the kernel? If so this still seems like an overly complex way to deal with this. > Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) OK so the use case envisioned here was for networking code to do something like: if (!loaded) { err = request_module("bpfilter"); ... } This is visible on your third patch (this is from your RFC, not this series): https://www.mail-archive.com/netfilter-devel@vger.kernel.org/msg11129.html So indeed all this patch does in the end is just putting tons of wrappers in place so that kernel code can load certain trusted UMH programs we ship, and maintain in the kernel. request_module() has its own world though too. How often in your proof of concept is request_module() called? How many times do you envision it being called? Please review lib/test_kmod.c and tools/testing/selftests/kmod/ for testing your stuff too or consider extending appropriately. Are aliases something which you expect we'll need to support for these userspace... modules? > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Now, this sounds great, however I think that the proof of concept chosen is pretty complex to start off with. Even if its not designed to be a real world life use case, a much simpler proof of concept to do something more simple may be useful, if possible. One wouldn't need to to have it replace a kernel functionality in real life. lib/ is full of CONFIG_TEST_* examples, a simple new stupid kernel functionality which can in turn be replaced with a respective userspace counterpart may be useful, and both kconfig entries would be mutually exclusive. > Another > advantage coming with that would be that bpfilter.ko You mean foo.ko > can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). Great too. > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. I don't see how this is defining any boundary, I see just a loader for a userspace program, and re-using a kernel interface known, finit_module() which makes it convenient for us to load pre-compiled kernel junk. I'm still not convinced this is the right approach. > It's easy to distinguish "umh module" from traditional kernel module: Ah you said it, "umh module". I don't see what makes it a "umh module" so far, all we are doing is executing a user
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
From: Alexei Starovoitov Date: Mon, 5 Mar 2018 17:34:57 -0800 > As the first step in development of bpfilter project [1] the > request_module() code is extended to allow user mode helpers to be > invoked. Looks like there is a little bit of feedback requiring changes, please take care of that and resubmit. Thanks!
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Tue, Mar 06, 2018 at 05:07:45PM -0800, Alexei Starovoitov wrote: > combining multiple answers... > > On 3/6/18 3:05 AM, Greg KH wrote: > > > > Any chance you can add a field to your "umh module" type such that a > > normal 'modinfo' program will be able to notice it is different easily? > > ok. handling of modinfo turned out to be straightforward. > kmod tooling worked fine with simple addition of .modinfo section. > > $ modinfo bpfilter > filename: > /lib/modules/4.16.0-rc4-00799-g1716f0aa3039-dirty/net/bpfilter/bpfilter.ko > umh:Y Nice. But perhaps spell it out, "user_mode_helper"? Anyway, bikesheding now, sorry, whatever you want to call it is fine with me. > license:GPL > > I will require umh=Y and license to be present. > umh has to be set to Y for this 'umh modules' > and taint of kernel will happen if license is not gpl. Interesting, I like it :) > Other modinfo like vermagic are not applicable here, since > umh modules interact with kernel via normal kernel/user abi. Very true. > > > Since umh can crash, can be oom-ed by the kernel, killed by admin, > > > the subsystem that uses them (like bpfilter) need to manage life > > > time of umh on its own, so module infra doesn't do any accounting > > > of them. They don't appear in "lsmod" and cannot be "rmmod". > > > Multiple request_module("umh") will load multiple umh.ko processes. > > > > > > Similar to kernel modules the kernel will be tainted if "umh module" > > > has invalid signature. > > > > Shouldn't we fail to load the "module" if the signature is not valid if > > CONFIG_MODULE_SIG_FORCE=y is enabled, like we do for modules? I run my > > systems like that, and just "warning" isn't probably a good idea for > > systems that want to enforce that everything is signed properly? > > CONFIG_MODULE_SIG_FORCE=y is already handled by this patch. > It's checked first for either .ko or umh.ko (before any elf parsing) > and returns -ENOKEY to user space without any dmesg message. > I think it's best to keep it as-is. > The taint and warning is for 'undef SIG_FORCE' and when module > is signed, but incorrectly. Ah, sorry, I missed that, thanks for clearing it up. > > Other than that, one minor question: > > > > > @@ -1745,7 +1745,9 @@ static int do_execveat_common(int fd, struct > > > filename *filename, > > > sched_exec(); > > > > > > bprm->file = file; > > > - if (fd == AT_FDCWD || filename->name[0] == '/') { > > > + if (!filename) { > > > + bprm->filename = "/dev/null"; > > > > Why the use of "/dev/null" for the filename here, and elsewhere in the > > code? While I'm "sure" that everyone really does have /dev/null/ > > mounted in the root namespace, what is the use of it here? > > filename is assumed to be non-null in several places further > down and instead of hacking it everywhere it's cleaner to assign > some string to it. > I'll change it to filename = "none" > Same in umh part. Thanks, that makes sense. greg k-h
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
combining multiple answers... On 3/6/18 3:05 AM, Greg KH wrote: Any chance you can add a field to your "umh module" type such that a normal 'modinfo' program will be able to notice it is different easily? ok. handling of modinfo turned out to be straightforward. kmod tooling worked fine with simple addition of .modinfo section. $ modinfo bpfilter filename: /lib/modules/4.16.0-rc4-00799-g1716f0aa3039-dirty/net/bpfilter/bpfilter.ko umh:Y license:GPL I will require umh=Y and license to be present. umh has to be set to Y for this 'umh modules' and taint of kernel will happen if license is not gpl. Other modinfo like vermagic are not applicable here, since umh modules interact with kernel via normal kernel/user abi. Since umh can crash, can be oom-ed by the kernel, killed by admin, the subsystem that uses them (like bpfilter) need to manage life time of umh on its own, so module infra doesn't do any accounting of them. They don't appear in "lsmod" and cannot be "rmmod". Multiple request_module("umh") will load multiple umh.ko processes. Similar to kernel modules the kernel will be tainted if "umh module" has invalid signature. Shouldn't we fail to load the "module" if the signature is not valid if CONFIG_MODULE_SIG_FORCE=y is enabled, like we do for modules? I run my systems like that, and just "warning" isn't probably a good idea for systems that want to enforce that everything is signed properly? CONFIG_MODULE_SIG_FORCE=y is already handled by this patch. It's checked first for either .ko or umh.ko (before any elf parsing) and returns -ENOKEY to user space without any dmesg message. I think it's best to keep it as-is. The taint and warning is for 'undef SIG_FORCE' and when module is signed, but incorrectly. Other than that, one minor question: @@ -1745,7 +1745,9 @@ static int do_execveat_common(int fd, struct filename *filename, sched_exec(); bprm->file = file; - if (fd == AT_FDCWD || filename->name[0] == '/') { + if (!filename) { + bprm->filename = "/dev/null"; Why the use of "/dev/null" for the filename here, and elsewhere in the code? While I'm "sure" that everyone really does have /dev/null/ mounted in the root namespace, what is the use of it here? filename is assumed to be non-null in several places further down and instead of hacking it everywhere it's cleaner to assign some string to it. I'll change it to filename = "none" Same in umh part. Also, what "namespace" does this usermode helper run in? I'm guessing the "root" one, which is fine with me, but note that people have complained in the past about other UMH running in that namespace and not in the specific namespace of the "container" that they wanted it to run in. right. this is something we can tweak later if really necessary. Right now most of the bpf is root-only, so bpfilter.ko would have to run as cap_sys_admin for now. Later we plan to tighten it to be cap_net_admin. On 3/6/18 11:12 AM, Linus Torvalds wrote: > > particularly for the early implementation when this is a new thing, I > really want a message like > > executed user process xyz-abc as a pseudo-module > > or something in dmesg. > > I do *not* want this to be a magical way to hide things. right. no intent of hiding anything. The first thing bpfilter.ko does is print 'Starting bpfilter' into /dev/console. Long term the health check of 'umh module' and interaction with the kernel should be standardized and though they're normal processes seen with 'ps' would be good to see them in lsmod as well. For now it's indeed the best to do pr_warn() message like above. Ratelimiting is probably not necessary. On 3/6/18 12:01 PM, Andy Lutomirski wrote: > > I imagine that usermode tooling needs to change regardless > because the existing tools may get rather confused if a .ko "module" the goal is to do zero changes to user tooling. The kmod tools handle this special .ko just fine. Tested with modprobe, depmod, modinfo, insmod. scripts/sign-file also works.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 6 Mar 2018, at 11:12, Linus Torvalds wrote: On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: As the first step in development of bpfilter project [1] the request_module() code is extended to allow user mode helpers to be invoked. Idea is that user mode helpers are built as part of the kernel build and installed as traditional kernel modules with .ko file extension into distro specified location, such that from a distribution point of view, they are no different than regular kernel modules. Thus, allow request_module() logic to load such user mode helper (umh) modules via: [,,] I like this, but I have one request: can we make sure that this action is visible in the system messages? When we load a regular module, at least it shows in lsmod afterwards, although I have a few times wanted to really see module load as an event in the logs too. When we load a module that just executes a user program, and there is no sign of it in the module list, I think we *really* need to make that event show to the admin some way. .. and yes, maybe we'll need to rate-limit the messages, and maybe it turns out that I'm entirely wrong and people will hate the messages after they get used to the concept of these pseudo-modules, but particularly for the early implementation when this is a new thing, I really want a message like executed user process xyz-abc as a pseudo-module or something in dmesg. I do *not* want this to be a magical way to hide things. Especially early on, this makes a lot of sense. But I wanted to plug bps and the hopefully growing set of bpf introspection tools: https://github.com/iovisor/bcc/blob/master/introspection/bps_example.txt Long term these are probably a good place to tell the admin what's going on. -chris
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Tue, Mar 6, 2018 at 12:01 PM, Andy Lutomirski wrote: > > I assume I'm missing some context here, but why does this need to be > handled by the kernel rather than, say, a change to how modprobe > works? Honestly, the less we have to mess with user-mode tooling, the better. We've been *so* much better off moving most of the module loading logic to the kernel, we should not go back in the old broken direction. I do *not* want the kmod project that is then taken over by systemd, and breaks it the same way they broke firmware loading. Keep modprobe doing one thing, and one thing only: track dependencies and mindlessly just load the modules. Do *not* ask for it to do anything else. Right now kmod is a nice simple project. Lots of testsuite stuff, and a very clear goal. Let's keep kmod doing one thing, and not even have to care about internal kernel decisions like "oh, this module might not be a module, but an executable". If anything, I think we want to keep our options open, in the case we need or want to ever consider short-circuiting things and allowing direct loading of the simple cases and bypassing modprobe entirely. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Tue, Mar 6, 2018 at 1:34 AM, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] the request_module() > code is extended to allow user mode helpers to be invoked. Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) > I assume I'm missing some context here, but why does this need to be handled by the kernel rather than, say, a change to how modprobe works? I imagine that usermode tooling needs to change regardless because the existing tools may get rather confused if a .ko "module" is really a dynamically liked program. I notice that you're using ET_EXEC in your example, and that will probably avoid problems, but I imagine that some distros would much rather use ET_DYN.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Mon, Mar 5, 2018 at 5:34 PM, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] the request_module() > code is extended to allow user mode helpers to be invoked. Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. Thus, allow request_module() logic to load such > user mode helper (umh) modules via: [,,] I like this, but I have one request: can we make sure that this action is visible in the system messages? When we load a regular module, at least it shows in lsmod afterwards, although I have a few times wanted to really see module load as an event in the logs too. When we load a module that just executes a user program, and there is no sign of it in the module list, I think we *really* need to make that event show to the admin some way. .. and yes, maybe we'll need to rate-limit the messages, and maybe it turns out that I'm entirely wrong and people will hate the messages after they get used to the concept of these pseudo-modules, but particularly for the early implementation when this is a new thing, I really want a message like executed user process xyz-abc as a pseudo-module or something in dmesg. I do *not* want this to be a magical way to hide things. Linus
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On Mon, Mar 05, 2018 at 05:34:57PM -0800, Alexei Starovoitov wrote: > As the first step in development of bpfilter project [1] the request_module() > code is extended to allow user mode helpers to be invoked. Idea is that > user mode helpers are built as part of the kernel build and installed as > traditional kernel modules with .ko file extension into distro specified > location, such that from a distribution point of view, they are no different > than regular kernel modules. Thus, allow request_module() logic to load such > user mode helper (umh) modules via: > > request_module("foo") -> > call_umh("modprobe foo") -> > sys_finit_module(FD of /lib/modules/.../foo.ko) -> > call_umh(struct file) > > Such approach enables kernel to delegate functionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Another > advantage coming with that would be that bpfilter.ko can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). > Also, such architecture makes the kernel/user boundary very precise: > control plane is done by the user space while data plane stays in the kernel. > > It's easy to distinguish "umh module" from traditional kernel module: > > $ readelf -h bld/net/bpfilter/bpfilter.ko|grep Type > Type: EXEC (Executable file) > $ readelf -h bld/net/ipv4/netfilter/iptable_filter.ko |grep Type > Type: REL (Relocatable file) Any chance you can add a field to your "umh module" type such that a normal 'modinfo' program will be able to notice it is different easily? > Since umh can crash, can be oom-ed by the kernel, killed by admin, > the subsystem that uses them (like bpfilter) need to manage life > time of umh on its own, so module infra doesn't do any accounting > of them. They don't appear in "lsmod" and cannot be "rmmod". > Multiple request_module("umh") will load multiple umh.ko processes. > > Similar to kernel modules the kernel will be tainted if "umh module" > has invalid signature. Shouldn't we fail to load the "module" if the signature is not valid if CONFIG_MODULE_SIG_FORCE=y is enabled, like we do for modules? I run my systems like that, and just "warning" isn't probably a good idea for systems that want to enforce that everything is signed properly? Other than that, one minor question: > @@ -1745,7 +1745,9 @@ static int do_execveat_common(int fd, struct filename > *filename, > sched_exec(); > > bprm->file = file; > - if (fd == AT_FDCWD || filename->name[0] == '/') { > + if (!filename) { > + bprm->filename = "/dev/null"; Why the use of "/dev/null" for the filename here, and elsewhere in the code? While I'm "sure" that everyone really does have /dev/null/ mounted in the root namespace, what is the use of it here? Also, what "namespace" does this usermode helper run in? I'm guessing the "root" one, which is fine with me, but note that people have complained in the past about other UMH running in that namespace and not in the specific namespace of the "container" that they wanted it to run in. Anyway, this is crazy stuff, but I like the idea and have no objection to it overall :) thanks, greg k-h
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
On 3/5/18 6:13 PM, Randy Dunlap wrote: Hi, On 03/05/2018 05:34 PM, Alexei Starovoitov wrote: diff --git a/kernel/module.c b/kernel/module.c index ad2d420024f6..6cfa35795741 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -3669,6 +3683,17 @@ static int load_module(struct load_info *info, const char __user *uargs, if (err) goto free_copy; + if (info->hdr->e_type == ET_EXEC) { +#ifdef CONFIG_MODULE_SIG + if (!info->sig_ok) { + pr_notice_once("umh %s verification failed: signature and/or required key missing - tainting kernel\n", That's not a very friendly message to tell a user. "umh" eh? umh is an abbreviation known to kernel newbies: https://kernelnewbies.org/KernelProjects/usermode-helper-enhancements The rest of the message is copy paste of existing one. + info->file->f_path.dentry->d_name.name); + add_taint(TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK); + } And since the signature failed, why is it being loaded at all? because this is how regular kernel modules deal with it. sig_enforce is handled earlier. Is this in the "--force" load path? --force forces modver and modmagic. These things don't apply here.
Re: [PATCH net-next] modules: allow modprobe load regular elf binaries
Hi, On 03/05/2018 05:34 PM, Alexei Starovoitov wrote: > diff --git a/kernel/module.c b/kernel/module.c > index ad2d420024f6..6cfa35795741 100644 > --- a/kernel/module.c > +++ b/kernel/module.c > @@ -3669,6 +3683,17 @@ static int load_module(struct load_info *info, const > char __user *uargs, > if (err) > goto free_copy; > > + if (info->hdr->e_type == ET_EXEC) { > +#ifdef CONFIG_MODULE_SIG > + if (!info->sig_ok) { > + pr_notice_once("umh %s verification failed: signature > and/or required key missing - tainting kernel\n", That's not a very friendly message to tell a user. "umh" eh? > +info->file->f_path.dentry->d_name.name); > + add_taint(TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK); > + } And since the signature failed, why is it being loaded at all? Is this in the "--force" load path? > +#endif > + return 0; > + } > + > /* Figure out module layout, and allocate all the memory. */ > mod = layout_and_allocate(info, flags); > if (IS_ERR(mod)) { thanks, -- ~Randy