Re: fault(4)
On 08.02.2020 11:47, Maxime Villard wrote: > > Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already > gives > an instant crash: > > kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)" > failed: file ".../sys/arch/x86/x86/pmap.c" > There is a number of similar reports on syzbot. > Looks like radixtree.c doesn't handle allocation failures very well > somewhere. > > fault(4) seems like the kind of feature that would be useful for > stress-testing > and fuzzing. As you can see in the diff, its code is extremely simple. > > Maxime > > [1] https://m00nbsd.net/garbage/fault/fault.diff This tool is a must have but I defer review to others.
Re: fault(4)
> Date: Sat, 8 Feb 2020 06:19:43 -0800 (PST) > From: Paul Goyette > > If this is a device on which you can use ioctl() to configure, why is > it not stored under sys/dev and why is it not included in kernel config > with pseudo-device directive (and corresponding files.kern changes)? There's no need for an _autoconf_ device (usually representing a physical device in the hardware's device tree) in order to have a userland /dev node (representing an interface between userland and kernel). See kern_cpu.c, kern_drvctl.c, , for other examples of /dev nodes without autoconf devices. One could use a pseudo-device just to get an initialization routine so that it's self-contained; maxv chose to use a module instead for that. We could have a fault injection hook that an unloadable module would set up, but I see no need to block adoption of a very good thing that already catches bugs until it is generalized to perfection!
Re: fault(4)
On Sat, 8 Feb 2020, Martin Husemann wrote: On Sat, Feb 08, 2020 at 06:19:43AM -0800, Paul Goyette wrote: The module should be MODULE_CLASS_DRIVER. And there should be a sys/module/fault/Makefile to build the module, along with changes to sys/module/Makefile (to descend into the fault directory) and to src/distrib/sets/lists/modules/mi (for new files). The module init code should also have appropriate config_{init,fini}_component() calls for inserting the device into the autoconfig database. Since all probes have to be hard-coded this is very intrusive and I am not sure a module makes sense. True. But in that case it should not be a module at all. But shouldn't it still be a pseudo-device and not just an option? ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
Re: fault(4)
On Sat, Feb 08, 2020 at 06:19:43AM -0800, Paul Goyette wrote: > The module should be MODULE_CLASS_DRIVER. And there should be a > sys/module/fault/Makefile to build the module, along with changes to > sys/module/Makefile (to descend into the fault directory) and to > src/distrib/sets/lists/modules/mi (for new files). The module init > code should also have appropriate config_{init,fini}_component() > calls for inserting the device into the autoconfig database. Since all probes have to be hard-coded this is very intrusive and I am not sure a module makes sense. Martin
Re: fault(4)
If this is a device on which you can use ioctl() to configure, why is it not stored under sys/dev and why is it not included in kernel config with pseudo-device directive (and corresponding files.kern changes)? The module should be MODULE_CLASS_DRIVER. And there should be a sys/module/fault/Makefile to build the module, along with changes to sys/module/Makefile (to descend into the fault directory) and to src/distrib/sets/lists/modules/mi (for new files). The module init code should also have appropriate config_{init,fini}_component() calls for inserting the device into the autoconfig database. Finally, it would be nice if it were possible to unload the module, although I understand why it might be difficult to ensure that it is safe to remove. On Sat, 8 Feb 2020, Maxime Villard wrote: [I am not subscribed to this list, so if you want to answer, make sure to CC me] In order to explore error branches, and to test the kernel's ability to cope with failures, it is often necessary to hard-trigger such failures. Here is an implementation [1] for fault(4), a driver that allows to trigger failures in the kernel. A similar driver exists in Linux. The fault_inject() function decides whether to return true or false, depending on parameters configurable by userland via ioctls on /dev/fault. The caller of this function should then error out depending on the return value. Typically: whatever_subsystem() { ... if (fault_inject()) return NULL; // means failure ... return non_null; // means success } Several modes can be available, I have implemented one for now, the N-th mode: every N-th call to fault_inject (N being configurable) will make it return true. Several scopes are available: global (ie system-wide), or calling LWP. Examples: - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole kernel will return true, regardless of the LWP. - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that enabled the mode will return true. For the other LWPs, fault_inject() always returns false. fault_inject() can be introduced in any place of interest. For now I added it in pool_cache_get(): if (flags & PR_NOWAIT) { if (fault_inject()) return NULL; } Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already gives an instant crash: kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)" failed: file ".../sys/arch/x86/x86/pmap.c" Looks like radixtree.c doesn't handle allocation failures very well somewhere. fault(4) seems like the kind of feature that would be useful for stress-testing and fuzzing. As you can see in the diff, its code is extremely simple. Maxime [1] https://m00nbsd.net/garbage/fault/fault.diff !DSPAM:5e3e9212102679617345149! ++--+---+ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired) | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com | | Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org | ++--+---+
fault(4)
[I am not subscribed to this list, so if you want to answer, make sure to CC me] In order to explore error branches, and to test the kernel's ability to cope with failures, it is often necessary to hard-trigger such failures. Here is an implementation [1] for fault(4), a driver that allows to trigger failures in the kernel. A similar driver exists in Linux. The fault_inject() function decides whether to return true or false, depending on parameters configurable by userland via ioctls on /dev/fault. The caller of this function should then error out depending on the return value. Typically: whatever_subsystem() { ... if (fault_inject()) return NULL; // means failure ... return non_null; // means success } Several modes can be available, I have implemented one for now, the N-th mode: every N-th call to fault_inject (N being configurable) will make it return true. Several scopes are available: global (ie system-wide), or calling LWP. Examples: - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole kernel will return true, regardless of the LWP. - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that enabled the mode will return true. For the other LWPs, fault_inject() always returns false. fault_inject() can be introduced in any place of interest. For now I added it in pool_cache_get(): if (flags & PR_NOWAIT) { if (fault_inject()) return NULL; } Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already gives an instant crash: kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)" failed: file ".../sys/arch/x86/x86/pmap.c" Looks like radixtree.c doesn't handle allocation failures very well somewhere. fault(4) seems like the kind of feature that would be useful for stress-testing and fuzzing. As you can see in the diff, its code is extremely simple. Maxime [1] https://m00nbsd.net/garbage/fault/fault.diff