On Sat, Jul 20, 2024 at 2:17 AM Mathias Krause <mini...@grsecurity.net> wrote:
>
> Hi Steven, Ajay,
>
> [ @Cc list: I found out issues with tracefs have been reported /
>   attempted to get fixed in the past, so you may be interested. ]
>
> I noticed, the user events ftrace selftest is crashing every now and
> then in our automated tests. Digging into, I found that the following
> is triggering the issue very reliable:
>
> - in one shell, as root:
>   # while true; do ./kselftest/user_events/ftrace_test; done
>
> - in a second shell, again as root:
>   # cd /sys/kernel/tracing
>   # while true; do cat events/user_events/__test_event/format; done 
> 2>/dev/null
>

Tried to reproduced on 6.10.0-rc7-100.ph5+, only getting repeated output as:

# while true; do cat events/user_events/__test_event/format; done 2>/dev/null
print fmt: ""
name: __test_event
ID: 2390
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

# while true; do ./kselftest/user_events/ftrace_test; done
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
#  RUN           user.register_events ...
#            OK  user.register_events
ok 1 user.register_events
#  RUN           user.write_events ...
#            OK  user.write_events
ok 2 user.write_events
#  RUN           user.write_empty_events ...
#            OK  user.write_empty_events
ok 3 user.write_empty_events
#  RUN           user.write_fault ...
#            OK  user.write_fault
ok 4 user.write_fault
#  RUN           user.write_validator ...
#            OK  user.write_validator
ok 5 user.write_validator
#  RUN           user.print_fmt ...
#            OK  user.print_fmt
ok 6 user.print_fmt
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Am I missing any step?

> Ignoring that the selftest fails for half of its tests -- which is a
> regression and therefore yet another bug, I guess -- this triggers an
> access fault (GFP/PF/NULL deref) after a few iterations, usually within
> a minute, mostly mere seconds. With KASAN enabled it generates a splat
> almost instantly, like following:
>
> [   23.790955] 
> ==================================================================
> [   23.791692] BUG: KASAN: slab-use-after-free in f_show+0x43b/0x470
> [   23.792244] Read of size 8 at addr ffff888007076878 by task cat/755
> [   23.792753]
> [   23.792947] CPU: 1 PID: 755 Comm: cat Tainted: G      D          N 
> 6.10.0-vanilla-dirty #30
> [   23.793563] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   23.794280] Call Trace:
> [   23.794470]  <TASK>
> [   23.794642]  dump_stack_lvl+0x66/0xa0
> [   23.794909]  print_report+0xd0/0x630
> [   23.795180]  ? f_show+0x43b/0x470
> [   23.795415]  ? __virt_addr_valid+0x208/0x3f0
> [   23.795700]  ? f_show+0x43b/0x470
> [   23.795935]  kasan_report+0xd8/0x110
> [   23.796183]  ? f_show+0x43b/0x470
> [   23.796418]  f_show+0x43b/0x470
> [   23.796645]  seq_read_iter+0x418/0x11e0
> [   23.796907]  ? __page_table_check_ptes_set+0x166/0x1a0
> [   23.797280]  seq_read+0x261/0x350
> [   23.797537]  ? lock_release+0x453/0x600
> [   23.797819]  ? __pfx_seq_read+0x10/0x10
> [   23.798103]  ? set_ptes.isra.0+0x11b/0x150
> [   23.798402]  vfs_read+0x171/0x9e0
> [   23.798657]  ? lock_acquire+0x408/0x4b0
> [   23.798970]  ? __pfx_vfs_read+0x10/0x10
> [   23.799267]  ? lock_release+0x453/0x600
> [   23.799549]  ? do_user_addr_fault+0x45b/0x8b0
> [   23.799862]  ? __pfx_lock_release+0x10/0x10
> [   23.800162]  ? __pfx___up_read+0x10/0x10
> [   23.800448]  ksys_read+0xdd/0x1a0
> [   23.800704]  ? __pfx_ksys_read+0x10/0x10
> [   23.801007]  ? do_user_addr_fault+0x465/0x8b0
> [   23.801341]  do_syscall_64+0x66/0x130
> [   23.801615]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   23.801959] RIP: 0033:0x7f1fdb9df40e
> [   23.802230] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 2e 08 0b 00 e8 69 01 02 00 
> 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 
> 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
> [   23.803352] RSP: 002b:00007fff86d62658 EFLAGS: 00000246 ORIG_RAX: 
> 0000000000000000
> [   23.803848] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 
> 00007f1fdb9df40e
> [   23.804302] RDX: 0000000000020000 RSI: 00007f1fdb8d3000 RDI: 
> 0000000000000003
> [   23.804753] RBP: 00007f1fdb8d3000 R08: 00007f1fdb8d2010 R09: 
> 0000000000000000
> [   23.805239] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 
> 0000000000000000
> [   23.805694] R13: 0000000000000003 R14: 0000000000020000 R15: 
> 0000000000020000
> [   23.806153]  </TASK>
> [   23.806352]
> [   23.806518] Allocated by task 753:
> [   23.806781]  kasan_save_stack+0x20/0x40
> [   23.807094]  kasan_save_track+0x14/0x30
> [   23.807377]  __kasan_kmalloc+0x8f/0xa0
> [   23.807656]  user_event_parse_cmd+0x590/0x25e0
> [   23.807969]  user_events_ioctl+0xa52/0x17f0
> [   23.808267]  __x64_sys_ioctl+0x133/0x190
> [   23.808555]  do_syscall_64+0x66/0x130
> [   23.808830]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   23.809208]
> [   23.809370] Freed by task 58:
> [   23.809609]  kasan_save_stack+0x20/0x40
> [   23.809890]  kasan_save_track+0x14/0x30
> [   23.810173]  kasan_save_free_info+0x3b/0x60
> [   23.810473]  poison_slab_object+0x10a/0x170
> [   23.810771]  __kasan_slab_free+0x14/0x30
> [   23.811090]  kfree+0xe0/0x2f0
> [   23.811338]  destroy_user_event+0x305/0x450
> [   23.811639]  delayed_destroy_user_event+0x5c/0xe0
> [   23.811965]  process_one_work+0x81c/0x1970
> [   23.812259]  worker_thread+0x608/0x1160
> [   23.812542]  kthread+0x2be/0x3b0
> [   23.812794]  ret_from_fork+0x2c/0x70
> [   23.813098]  ret_from_fork_asm+0x1a/0x30
> [   23.813387]
> [   23.813551] Last potentially related work creation:
> [   23.813885]  kasan_save_stack+0x20/0x40
> [   23.814168]  __kasan_record_aux_stack+0x8e/0xa0
> [   23.814489]  insert_work+0x20/0x1b0
> [   23.814756]  __queue_work+0x67a/0xc60
> [   23.815079]  queue_work_on+0x63/0x90
> [   23.815350]  user_event_put+0x1f9/0x390
> [   23.815631]  user_events_ioctl+0x11ed/0x17f0
> [   23.815935]  __x64_sys_ioctl+0x133/0x190
> [   23.816223]  do_syscall_64+0x66/0x130
> [   23.816498]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   23.816841]
> [   23.817019] The buggy address belongs to the object at ffff888007076800
> [   23.817019]  which belongs to the cache kmalloc-cg-512 of size 512
> [   23.817799] The buggy address is located 120 bytes inside of
> [   23.817799]  freed 512-byte region [ffff888007076800, ffff888007076a00)
> [   23.818530]
> [   23.818694] The buggy address belongs to the physical page:
> [   23.819092] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 
> pfn:0x7074
> [   23.819602] head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 
> pincount:0
> [   23.820323] memcg:ffff88800277fd01
> [   23.820595] anon flags: 0x40(head|zone=0)
> [   23.820885] page_type: 0xffffefff(slab)
> [   23.821203] raw: 0000000000000040 ffff88800014cdc0 0000000000000000 
> 0000000000000001
> [   23.821706] raw: 0000000000000000 0000000080100010 00000001ffffefff 
> ffff88800277fd01
> [   23.822206] head: 0000000000000040 ffff88800014cdc0 0000000000000000 
> 0000000000000001
> [   23.822711] head: 0000000000000000 0000000080100010 00000001ffffefff 
> ffff88800277fd01
> [   23.823261] head: 0000000000000002 ffffea00001c1d01 ffffffffffffffff 
> 0000000000000000
> [   23.823764] head: ffff888000000004 0000000000000000 00000000ffffffff 
> 0000000000000000
> [   23.824271] page dumped because: kasan: bad access detected
> [   23.824638]
> [   23.824799] Memory state around the buggy address:
> [   23.825162]  ffff888007076700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [   23.825641]  ffff888007076780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 
> fc fc
> [   23.826116] >ffff888007076800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   23.826588]                                                                
>  ^
> [   23.827070]  ffff888007076880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   23.827545]  ffff888007076900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
> fb fb
> [   23.828018] 
> ==================================================================
>
> I did a bisect run and ended up on commit 27152bceea1d ("eventfs: Move
> tracing/events to eventfs"). This is, of cause, only the enabling
> commit. The bug is likely in one of the preceeding commits introducing
> eventfs infrastructure.
>
> The git bisect log (I jumped around a little in the beginning, but yeah,
> v6.10 is still broken):
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
> git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
> # status: waiting for good commit(s), bad commit known
> # bad: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
> git bisect bad 0c3836482481200ead7b416ca80c68a29cfdaabd
> # status: waiting for good commit(s), bad commit known
> # bad: [ffc253263a1375a65fa6c9f62a893e9767fbebfa] Linux 6.6
> git bisect bad ffc253263a1375a65fa6c9f62a893e9767fbebfa
> # status: waiting for good commit(s), bad commit known
> # good: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect good 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # good: [692f5510159c79bfa312a4e27a15e266232bfb4c] Merge tag 'asoc-v6.6' of 
> https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
> git bisect good 692f5510159c79bfa312a4e27a15e266232bfb4c
> # good: [0e72db77672ff4758a31fb5259c754a7bb229751] Merge tag 'soc-dt-6.6' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
> git bisect good 0e72db77672ff4758a31fb5259c754a7bb229751
> # bad: [5eea5820c7340d39e56e169e1b87199391105f6b] Merge tag 
> 'mm-stable-2023-09-04-14-00' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad 5eea5820c7340d39e56e169e1b87199391105f6b
> # good: [28a4f91f5f251689c69155bc6a0b1afc9916c874] Merge tag 
> 'driver-core-6.6-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect good 28a4f91f5f251689c69155bc6a0b1afc9916c874
> # bad: [f7e97ce26972ae7be8bbbae8d819ff311d4c5900] Merge tag 'for-linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> git bisect bad f7e97ce26972ae7be8bbbae8d819ff311d4c5900
> # good: [9a5d660fdb25d20748d7f9e9559c86073c3bb368] media: ivsc: Add ACPI 
> dependency
> git bisect good 9a5d660fdb25d20748d7f9e9559c86073c3bb368
> # good: [704e2c6107f1a5353a1038bac137dda0df2a6dd0] Merge tag 'icc-6.6-rc1' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next
> git bisect good 704e2c6107f1a5353a1038bac137dda0df2a6dd0
> # good: [bd30fe6a7d9b72e73c5ac9109cbc3066dde08034] Merge tag 'wq-for-6.6' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
> git bisect good bd30fe6a7d9b72e73c5ac9109cbc3066dde08034
> # good: [d952f54d01ec2ea5ee9d5e21f2ea3a5807b4bcbc] RDMA/hns: Remove unused 
> declaration hns_roce_modify_srq()
> git bisect good d952f54d01ec2ea5ee9d5e21f2ea3a5807b4bcbc
> # bad: [34232fcfe9a383bea802af682baae5c99f22376c] Merge tag 'trace-v6.6' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
> git bisect bad 34232fcfe9a383bea802af682baae5c99f22376c
> # bad: [7c1130ea5cae215588e6d51242b877fd405e6c32] test: ftrace: Fix kprobe 
> test for eventfs
> git bisect bad 7c1130ea5cae215588e6d51242b877fd405e6c32
> # good: [5bdcd5f5331a276a3ddf1fba8605986d0d15298a] eventfs: Implement removal 
> of meta data from eventfs
> git bisect good 5bdcd5f5331a276a3ddf1fba8605986d0d15298a
> # bad: [27152bceea1df27ffebb12ac9cd9adbf2c4c3f35] eventfs: Move 
> tracing/events to eventfs
> git bisect bad 27152bceea1df27ffebb12ac9cd9adbf2c4c3f35
> # first bad commit: [27152bceea1df27ffebb12ac9cd9adbf2c4c3f35] eventfs: Move 
> tracing/events to eventfs
>
> I stared quite hard at the inode/dentry/file handling and, imho, it's
> quite broken as it leaks pointers to refcounted objects which may go
> away anytime, e.g. &user->call via trace_add_event_call() in
> user_event_set_call_visible(). That'll lead to a user reachable file
> hierarchy below events/user_events/$eventname/ that isn't bound to the
> creator's mm.
>
> Assume one process opens events/user_events/$eventname/format, then the
> walked fields -- the whole object, actually -- may go away anytime, once
> the creator either removes the user probe again or closes its fd to
> /sys/kernel/tracing/user_events_data, leading to user_events_release()
> doing the final user_event_put() leading to object destruction. The
> concurrent reader of events/user_events/$eventname/format will now
> operate on a free'd object. Boom!
>
> As mentioned, I stared quite hard at the code and tried to fix it via
> numerous ways but the core issue is, imho, the lack of refcounting the
> &user->call object, or even better, 'user', for the files created below
> events/user_events/$eventname/. However, I found no easy way to do that.
>
> My hopes are, you may be able to, now that there's an easy reproducer.
>
> Beside the obvious bug, I noticed the following (not fixing the issue,
> tho):
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 5d88c184f0fc..687ad0a26458 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -112,7 +112,7 @@ static void release_ei(struct kref *ref)
>                         entry->release(entry->name, ei->data);
>         }
>
> -       call_rcu(&ei->rcu, free_ei_rcu);
> +       call_srcu(&eventfs_srcu, &ei->rcu, free_ei_rcu);
>  }
>
>  static inline void put_ei(struct eventfs_inode *ei)
> @@ -735,7 +735,9 @@ struct eventfs_inode *eventfs_create_dir(const char 
> *name, struct eventfs_inode
>
>         /* Was the parent freed? */
>         if (list_empty(&ei->list)) {
> +               mutex_lock(&eventfs_mutex);
>                 cleanup_ei(ei);
> +               mutex_unlock(&eventfs_mutex);
>                 ei = NULL;
>         }
>         return ei;
>
> The s/call_rcu/call_srcu/ one is because not only do the comments talk
> about SRCU protection, the code does so as well, e.g. eventfs_iterate().
>
> Similar argumentation for taking the mutex around cleanup_ei(). It sets
> the 'is_freed' member which others expect only to be set -- or at least
> to have a stable value -- when the mutex is hold. As 'ei' was exposed to
> its parent before, it may have escaped to other users, e.g. via
> eventfs_iterate().
>
> I may have missed something, painting these changes completely void,
> therefore spared me the time to do proper patches. However, below one is
> legit ("git am --scissors ..." yada yada):
>
> ---8<---
> Subject: [PATCH] eventfs: Don't return NULL in eventfs_create_dir()
>
> Commit 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing
> ei->dentry") added another check, testing if the parent was freed after
> we released the mutex. If so, the function returns NULL. However, all
> callers expect it to either return a valid pointer or an error pointer,
> at least since commit 5264a2f4bb3b ("tracing: Fix a NULL vs IS_ERR() bug
> in event_subsystem_dir()"). Returning NULL will therefore fail the error
> condition check in the caller.
>
> Fix this by substituting the NULL return value with a fitting error
> pointer.
>
> Fixes: 77a06c33a22d ("eventfs: Test for ei->is_freed when accessing 
> ei->dentry")
> Cc: Dan Carpenter <dan.carpen...@linaro.org>
> Signed-off-by: Mathias Krause <mini...@grsecurity.net>
> ---
>  fs/tracefs/event_inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 5d88c184f0fc..a9c28a1d5dc8 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -736,7 +736,7 @@ struct eventfs_inode *eventfs_create_dir(const char 
> *name, struct eventfs_inode
>         /* Was the parent freed? */
>         if (list_empty(&ei->list)) {
>                 cleanup_ei(ei);
> -               ei = NULL;
> +               ei = ERR_PTR(-EBUSY);
>         }
>         return ei;
>  }

Must have. LGTM. Thanks.
Reviewed-by: Ajay Kaher <ajay.ka...@broadcom.com>

- Ajay

Reply via email to