On Wed, Mar 17, 2021 at 8:52 AM Dmitry Vyukov <dvyu...@google.com> wrote:
> On Tue, Mar 16, 2021 at 5:28 PM Arnd Bergmann <a...@arndb.de> wrote:
> > On Tue, Mar 16, 2021 at 5:13 PM Dmitry Vyukov <dvyu...@google.com> wrote:
> > > On Tue, Mar 16, 2021 at 5:03 PM Arnd Bergmann <a...@arndb.de> wrote:
> > > > On Tue, Mar 16, 2021 at 4:51 PM Russell King - ARM Linux admin
> > > > <li...@armlinux.org.uk> wrote:
> > > > > On Tue, Mar 16, 2021 at 04:44:45PM +0100, Arnd Bergmann wrote:
> > > > > > On Tue, Mar 16, 2021 at 11:17 AM Dmitry Vyukov <dvyu...@google.com> wrote:
> > > > > > > The compiler is gcc version 10.2.1 20210110 (Debian 10.2.1-6)
> > > > > >
> > > > > > Ok, building with Ubuntu 10.2.1-1ubuntu1 20201207 locally, that's
> > > > > > the closest I have installed, and I think the Debian and Ubuntu versions
> > > > > > are generally quite close in case of gcc since they are maintained by
> > > > > > the same packagers.
> > > > >
> > > > > ... which shouldn't be a problem - that's just over 1/4 of the stack
> > > > > space. Could it be the syzbot's gcc is doing something weird and
> > > > > inflating the stack frames?
> > > >
> > > > It's possible, I think that's really unlikely given that it's just Debian's
> > > > gcc, which is as close to mainline as the version I was using.
> > > >
> > > > Uwe's DEBUG_STACKOVERFLOW patch from a while ago might
> > > > help if this was the problem though:
> > > > https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koe...@pengutronix.de/
> > > >
> > > > My best guess is something going wrong in the interrupt
> > > > that triggered the preempt_schedule() which ended up calling
> > > > task_stack_end_corrupted() in schedule_debug(), as you suggested
> > > > earlier.
> > >
> > > FWIW I see slightly larger frames with the config:
> > >
> > > 073ab64 <ima_calc_field_array_hash_tfm>:
> > > 8073ab64: e1a0c00d mov ip, sp
> > > 8073ab68: e92ddff0 push {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
> > > 8073ab6c: e24cb004 sub fp, ip, #4
> > > 8073ab70: e24ddfa7 sub sp, sp, #668 ; 0x29c
> >
> > Yes, this is the one that the compiler complained about when warning
> > for stack over 600 bytes. It's not called in this call chain though.
> >
> > > page_alloc can also do reclaim, I had the impression that reclaim can
> > > be quite heavy-weight in all respects.
> >
> > Yes, that is another possibility. What writable file systems or swap
> > do you normally have mounted that it could be writing to, and on
> > what storage device?
>
> The root fs is ext4 on virtio-blk.
>
> There are also several dozens of shrinkers that can be called during reclaim:
> https://elixir.bootlin.com/linux/latest/C/ident/unregister_shrinker
Right, unfortunately I don't see a smoking gun there either, unless you are
also using NFS or device-mapper.

Implementing VMAP_STACK as you suggested earlier is probably the best way to
figure out whether there is an actual stack overrun. Alternatively, adding
support for GCC_PLUGIN_STACKLEAK might also help find out if we ever get close
to the limit. This is probably less work, but it might not actually help in
this case.

       Arnd