On Tue, Sep 3, 2024 at 4:54 PM Thomas Gleixner <[email protected]> wrote:
> On Tue, Sep 03 2024 at 12:24, Kees Cook wrote:
> > On Tue, Sep 03, 2024 at 03:22:17PM -0400, Paul Moore wrote:
> >> > > might_alloc include/linux/sched/mm.h:337 [inline]
> >> > > slab_pre_alloc_hook mm/slub.c:3987 [inline]
> >> > > slab_alloc_node mm/slub.c:4065 [inline]
> >> > > kmem_cache_alloc_noprof+0x5d/0x2a0 mm/slub.c:4092
> >> > > audit_buffer_alloc kernel/audit.c:1790 [inline]
> >> > > audit_log_start+0x15e/0xa30 kernel/audit.c:1912
> >> > > audit_seccomp+0x63/0x1f0 kernel/auditsc.c:3007
> >>
> >> The audit_seccomp() function allocates an audit buffer using
> >> GFP_KERNEL, which should be the source of the might_sleep. We can fix
> >> that easily enough by moving to GFP_ATOMIC (either for just this code
> >> path or all callers, need to check that), but I just want to confirm
> >> that we can't sleep here? I haven't dug into the syscall code in a
> >> while, so I don't recall all the details, but it seems odd to me that
> >> we can't safely sleep here ...
> >
> > I had a similar question.. this is at syscall entry time. What is
> > suddenly different here? We've been doing seccomp logging here for
> > years...
>
> Correct.
>
> syscall_enter_from_user_mode() enables interrupts. At that point
> preempt_count is 0. So after that the task can sleep and schedule.
> Nothing in the call chain leading up to the allocation disables
> preemption or interrupts.
>
> From the actual console log:
>
> do not call blocking ops when !TASK_RUNNING; state=2 set at
> [<ffffffff81908f9e>] audit_log_start+0x37e/0xa30
>
> I have no idea how that state would leak accross schedule_timeout().

Okay, with no obvious root cause and no reproducer, I'm going to
ignore this for now. If we start to see this pop up on real systems
and/or syzbot finds a reproducer we can dig into it more.

-- 
paul-moore.com
