When a module is unloaded, kprobes_module_callback() calls kill_kprobe()
to mark GONE every probe whose probed address lies inside that module.
If the probe is an aggrprobe, kill_kprobe() also marks all child probes
GONE, but it does not remove them from the aggrprobe's list.

The problem arises when a module registers kprobes and is unloaded
without calling unregister_kprobe().  Child probes whose struct kprobe
resides in the unloading module's memory are freed along with the
module, yet they remain on the aggrprobe's list, whether or not the
aggrprobe itself was killed.  Later, when another caller registers a
kprobe at the same address, __get_valid_kprobe() walks that list and
dereferences a freed child probe, causing a use-after-free.
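To make the hazard concrete, here is a small userspace model.  It is a sketch, not kernel code: `struct probe`, `struct aggr`, `kill_aggr()` and `walk_children()` are hypothetical stand-ins for struct kprobe, an aggrprobe, kill_kprobe() and a list walk such as the one in __get_valid_kprobe(), and the list primitives are a minimal reimplementation modelled on <linux/list.h>.  It shows that setting a GONE flag leaves every child reachable from the aggregator's list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define FLAG_GONE 1u

/* Hypothetical stand-in for struct kprobe. */
struct probe {
	unsigned int flags;
	struct list_head list;	/* link on the aggregator's child list */
};

/* Hypothetical stand-in for an aggrprobe that heads a child list. */
struct aggr {
	unsigned int flags;
	struct list_head children;
};

/* Models what the text says kill_kprobe() does: mark GONE, never unlink. */
static void kill_aggr(struct aggr *ap)
{
	struct list_head *pos;

	ap->flags |= FLAG_GONE;
	for (pos = ap->children.next; pos != &ap->children; pos = pos->next)
		container_of(pos, struct probe, list)->flags |= FLAG_GONE;
}

/* Models a lookup walk; returns how many GONE children are still reachable. */
static int walk_children(struct aggr *ap)
{
	struct list_head *pos;
	int gone_reachable = 0;

	for (pos = ap->children.next; pos != &ap->children; pos = pos->next) {
		struct probe *p = container_of(pos, struct probe, list);

		/* If p's memory had already been freed, this read would be a UAF. */
		if (p->flags & FLAG_GONE)
			gone_reachable++;
	}
	return gone_reachable;
}

/* Two children are "killed" and are still on the list afterwards. */
int demo_walk_after_kill(void)
{
	struct aggr ap = { 0 };
	struct probe c1 = { 0 }, c2 = { 0 };

	list_init(&ap.children);
	list_add_tail(&c1.list, &ap.children);
	list_add_tail(&c2.list, &ap.children);

	kill_aggr(&ap);
	assert(ap.flags & FLAG_GONE);

	/* GONE is only a flag: both children remain reachable. */
	return walk_children(&ap);
}
```

Here `demo_walk_after_kill()` returns 2: both children are marked GONE, and both are still dereferenced by the walk.  In the real scenario, those children would live in memory the module loader has already released.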

Reproduction steps:

    1) Load module A which registers two kprobes on the same kernel
       function address (e.g., do_nanosleep), causing them to be
       aggregated under one aggrprobe.

    2) Unload module A without calling unregister_kprobe().
       Module A's memory is freed, but its two child probes remain
       on the aggrprobe's list as dangling pointers.

    3) Load module B and register a kprobe on the same address
       (e.g., do_nanosleep). register_kprobe() -> __get_valid_kprobe()
       traverses the aggrprobe's list and dereferences the freed child
       probe from module A, triggering a use-after-free and kernel panic.

The resulting crash looks like:
    [  464.950864] BUG: kernel NULL pointer dereference, address: 0000000000000000
    [  464.950872] #PF: supervisor read access in kernel mode
    [  464.950874] #PF: error_code(0x0000) - not-present page
    ...
    [  464.950915] Call Trace:
    [  464.950922]  <TASK>
    [  464.950923]  register_kprobe+0x65/0x2e0
    [  464.950928]  ? __pfx_stage2_init+0x10/0x10 [kprobe_leak_stage2]
    [  464.950933]  stage2_init+0x37/0xff0 [kprobe_leak_stage2]
    [  464.950938]  ? __pfx_stage2_init+0x10/0x10 [kprobe_leak_stage2]
    [  464.950942]  do_one_initcall+0x56/0x2e0
    [  464.950948]  do_init_module+0x60/0x230
    ...

Fix this by adding selective cleanup in kprobes_module_callback(): for
every aggrprobe in the kprobe hash table, walk its child list and remove
any child probe whose struct kprobe lies inside the going module's
memory range (within_module_init() / within_module_core()).  This runs
whether or not the aggrprobe itself was just passed to kill_kprobe().

This is done in kprobes_module_callback() rather than in kill_kprobe()
because kill_kprobe() means "the probed code is going away, so mark its
probes GONE".  The lifetime of a probe is bound to the probed code, not
to the module that contains the struct kprobe.  Child probes owned by
other, still-loaded modules or allocated with kmalloc (ftrace, perf,
kprobe-events) must stay on the list so they can be unregistered later.
Only child probes whose memory is about to be freed need to be removed
from the list to prevent dangling pointers.
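The selective cleanup can be modelled in userspace in the same way.  In this sketch (all names hypothetical), `within_region()` stands in for within_module_init()/within_module_core(), a local array stands in for the going module's memory, and a calloc()'d child plays the role of a probe owned by someone else.  Only children inside the region are unlinked.  The next pointer is saved before unlinking, as list_for_each_entry_safe() does in the patch.  The model uses a plain list_del() because it has no concurrent readers, whereas the patch uses list_del_rcu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list, modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;	/* unlinked: pointers no longer usable */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define FLAG_GONE 1u

/* Hypothetical stand-in for struct kprobe. */
struct probe {
	unsigned int flags;
	struct list_head list;
};

/* Hypothetical stand-in for within_module_init()/within_module_core(). */
static int within_region(const void *addr, const void *base, size_t size)
{
	uintptr_t a = (uintptr_t)addr, b = (uintptr_t)base;

	return a >= b && a - b < size;
}

/*
 * Unlink every child whose struct lies in [base, base + size).  'next' is
 * saved before unlinking, as list_for_each_entry_safe() does.
 */
static int drop_children_in(struct list_head *children,
			    const void *base, size_t size)
{
	struct list_head *pos, *next;
	int removed = 0;

	for (pos = children->next; pos != children; pos = next) {
		struct probe *p = container_of(pos, struct probe, list);

		next = pos->next;
		if (within_region(p, base, size)) {
			p->flags |= FLAG_GONE;
			list_del(pos);
			removed++;
		}
	}
	return removed;
}

/* Two children live in "module" memory; one was allocated on the heap. */
int demo_selective_cleanup(int *left)
{
	struct probe module_mem[2] = { { 0 }, { 0 } };
	struct probe *heap_child = calloc(1, sizeof(*heap_child));
	struct list_head children, *pos;
	int removed;

	if (!heap_child)
		return -1;
	list_init(&children);
	list_add_tail(&module_mem[0].list, &children);
	list_add_tail(&heap_child->list, &children);
	list_add_tail(&module_mem[1].list, &children);

	removed = drop_children_in(&children, module_mem, sizeof(module_mem));

	*left = 0;
	for (pos = children.next; pos != &children; pos = pos->next)
		(*left)++;
	/* The only survivor is the heap child, which is still valid. */
	assert(children.next == &heap_child->list);
	free(heap_child);
	return removed;
}
```

Calling `demo_selective_cleanup(&left)` returns 2 and sets `left` to 1.  The two children inside the "module" array are unlinked, and the heap-allocated child stays on the list so its owner can still unregister it.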

Fixes: e8386a0cb22f4 ("kprobes: support probing module __exit function")
Signed-off-by: Shijia Hu <[email protected]>
---
 kernel/kprobes.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bfc89083daa9..ff277314183c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2664,6 +2664,7 @@ static int kprobes_module_callback(struct notifier_block *nb,
                                   unsigned long val, void *data)
 {
        struct module *mod = data;
+       struct hlist_node *tmp;
        struct hlist_head *head;
        struct kprobe *p;
        unsigned int i;
@@ -2685,7 +2686,7 @@ static int kprobes_module_callback(struct notifier_block *nb,
         */
        for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
                head = &kprobe_table[i];
-               hlist_for_each_entry(p, head, hlist)
+               hlist_for_each_entry_safe(p, tmp, head, hlist) {
                        if (within_module_init((unsigned long)p->addr, mod) ||
                            (checkcore &&
                             within_module_core((unsigned long)p->addr, mod))) {
@@ -2702,6 +2703,26 @@ static int kprobes_module_callback(struct notifier_block *nb,
                                 */
                                kill_kprobe(p);
                        }
+
+                       /*
+                        * Child probes are not on the kprobe hash list, so
+                        * the loop above cannot find them. If a child probe
+                        * resides in the module's memory, it becomes a
+                        * dangling pointer once the module is freed.
+                        */
+                       if (kprobe_aggrprobe(p)) {
+                               struct kprobe *kp, *kptmp;
+
+                               list_for_each_entry_safe(kp, kptmp, &p->list, list) {
+                                       if (within_module_init((unsigned long)kp, mod) ||
+                                           (checkcore &&
+                                            within_module_core((unsigned long)kp, mod))) {
+                                               kp->flags |= KPROBE_FLAG_GONE;
+                                               list_del_rcu(&kp->list);
+                                       }
+                               }
+                       }
+               }
        }
        if (val == MODULE_STATE_GOING)
                remove_module_kprobe_blacklist(mod);
-- 
2.20.1

