Re: fault(4)

2020-02-22 Thread Kamil Rytarowski
On 08.02.2020 11:47, Maxime Villard wrote:
>
> Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already
> gives
> an instant crash:
>
> kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)"
> failed: file ".../sys/arch/x86/x86/pmap.c"
>

There is a number of similar reports on syzbot.

> Looks like radixtree.c doesn't handle allocation failures very well
> somewhere.
>
> fault(4) seems like the kind of feature that would be useful for
> stress-testing
> and fuzzing. As you can see in the diff, its code is extremely simple.
>
> Maxime
>
> [1] https://m00nbsd.net/garbage/fault/fault.diff

This tool is a must have but I defer review to others.


Re: fault(4)

2020-02-08 Thread Taylor R Campbell
> Date: Sat, 8 Feb 2020 06:19:43 -0800 (PST)
> From: Paul Goyette 
> 
> If this is a device on which you can use ioctl() to configure, why is
> it not stored under sys/dev and why is it not included in kernel config
> with pseudo-device directive (and corresponding files.kern changes)?

There's no need for an _autoconf_ device (usually representing a
physical device in the hardware's device tree) in order to have a
userland /dev node (representing an interface between userland and
kernel).  See kern_cpu.c, kern_drvctl.c, , for other examples of
/dev nodes without autoconf devices.

One could use a pseudo-device just to get an initialization routine so
that it's self-contained; maxv chose to use a module instead for that.

We could have a fault injection hook that an unloadable module would
set up, but I see no need to block adoption of a very good thing that
already catches bugs until it is generalized to perfection!


Re: fault(4)

2020-02-08 Thread Paul Goyette

On Sat, 8 Feb 2020, Martin Husemann wrote:


On Sat, Feb 08, 2020 at 06:19:43AM -0800, Paul Goyette wrote:

The module should be MODULE_CLASS_DRIVER.  And there should be a
sys/module/fault/Makefile to build the module, along with changes to
sys/module/Makefile (to descend into the fault directory) and to
src/distrib/sets/lists/modules/mi (for new files).  The module init
code should also have appropriate config_{init,fini}_component()
calls for inserting the device into the autoconfig database.


Since all probes have to be hard-coded this is very intrusive and I am
not sure a module makes sense.


True.  But in that case it should not be a module at all.

But shouldn't it still be a pseudo-device and not just an option?



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


Re: fault(4)

2020-02-08 Thread Martin Husemann
On Sat, Feb 08, 2020 at 06:19:43AM -0800, Paul Goyette wrote:
> The module should be MODULE_CLASS_DRIVER.  And there should be a
> sys/module/fault/Makefile to build the module, along with changes to
> sys/module/Makefile (to descend into the fault directory) and to
> src/distrib/sets/lists/modules/mi (for new files).  The module init
> code should also have appropriate config_{init,fini}_component()
> calls for inserting the device into the autoconfig database.

Since all probes have to be hard-coded this is very intrusive and I am
not sure a module makes sense.

Martin


Re: fault(4)

2020-02-08 Thread Paul Goyette

If this is a device on which you can use ioctl() to configure, why is
it not stored under sys/dev and why is it not included in kernel config
with pseudo-device directive (and corresponding files.kern changes)?

The module should be MODULE_CLASS_DRIVER.  And there should be a 
sys/module/fault/Makefile to build the module, along with changes to

sys/module/Makefile (to descend into the fault directory) and to
src/distrib/sets/lists/modules/mi (for new files).  The module init
code should also have appropriate config_{init,fini}_component()
calls for inserting the device into the autoconfig database.

Finally, it would be nice if it were possible to unload the module,
although I understand why it might be difficult to ensure that it is
safe to remove.


On Sat, 8 Feb 2020, Maxime Villard wrote:

[I am not subscribed to this list, so if you want to answer, make sure to CC 
me]


In order to explore error branches, and to test the kernel's ability to cope
with failures, it is often necessary to hard-trigger such failures.

Here is an implementation [1] for fault(4), a driver that allows to trigger
failures in the kernel. A similar driver exists in Linux.

The fault_inject() function decides whether to return true or false, 
depending
on parameters configurable by userland via ioctls on /dev/fault. The caller 
of

this function should then error out depending on the return value. Typically:

whatever_subsystem()
{
...
if (fault_inject())
return NULL; // means failure
...
return non_null; // means success
}

Several modes can be available, I have implemented one for now, the N-th 
mode:

every N-th call to fault_inject (N being configurable) will make it return
true.

Several scopes are available: global (ie system-wide), or calling LWP.

Examples:

- mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole 
kernel

  will return true, regardless of the LWP.
- mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that
  enabled the mode will return true. For the other LWPs, fault_inject() 
always

  returns false.

fault_inject() can be introduced in any place of interest. For now I added it
in pool_cache_get():

if (flags & PR_NOWAIT) {
if (fault_inject())
return NULL;
}

Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already gives
an instant crash:

	kernel diagnostic assertion 
"radix_tree_empty_tree_p(>pm_pvtree)"

failed: file ".../sys/arch/x86/x86/pmap.c"

Looks like radixtree.c doesn't handle allocation failures very well 
somewhere.


fault(4) seems like the kind of feature that would be useful for 
stress-testing

and fuzzing. As you can see in the diff, its code is extremely simple.

Maxime

[1] https://m00nbsd.net/garbage/fault/fault.diff

!DSPAM:5e3e9212102679617345149!




++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


fault(4)

2020-02-08 Thread Maxime Villard

[I am not subscribed to this list, so if you want to answer, make sure to CC me]

In order to explore error branches, and to test the kernel's ability to cope
with failures, it is often necessary to hard-trigger such failures.

Here is an implementation [1] for fault(4), a driver that allows to trigger
failures in the kernel. A similar driver exists in Linux.

The fault_inject() function decides whether to return true or false, depending
on parameters configurable by userland via ioctls on /dev/fault. The caller of
this function should then error out depending on the return value. Typically:

whatever_subsystem()
{
...
if (fault_inject())
return NULL; // means failure
...
return non_null; // means success
}

Several modes can be available, I have implemented one for now, the N-th mode:
every N-th call to fault_inject (N being configurable) will make it return
true.

Several scopes are available: global (ie system-wide), or calling LWP.

Examples:

 - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole kernel
   will return true, regardless of the LWP.
 - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that
   enabled the mode will return true. For the other LWPs, fault_inject() always
   returns false.

fault_inject() can be introduced in any place of interest. For now I added it
in pool_cache_get():

if (flags & PR_NOWAIT) {
if (fault_inject())
return NULL;
}

Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already gives
an instant crash:

kernel diagnostic assertion "radix_tree_empty_tree_p(>pm_pvtree)"
failed: file ".../sys/arch/x86/x86/pmap.c"

Looks like radixtree.c doesn't handle allocation failures very well somewhere.

fault(4) seems like the kind of feature that would be useful for stress-testing
and fuzzing. As you can see in the diff, its code is extremely simple.

Maxime

[1] https://m00nbsd.net/garbage/fault/fault.diff