On Wed, Mar 17, 2021 at 8:52 AM Dmitry Vyukov <dvyu...@google.com> wrote:
> On Tue, Mar 16, 2021 at 5:28 PM Arnd Bergmann <a...@arndb.de> wrote:
> > On Tue, Mar 16, 2021 at 5:13 PM Dmitry Vyukov <dvyu...@google.com> wrote:
> > > On Tue, Mar 16, 2021 at 5:03 PM Arnd Bergmann <a...@arndb.de> wrote:
> > > > On Tue, Mar 16, 2021 at 4:51 PM Russell King - ARM Linux admin
> > > > <li...@armlinux.org.uk> wrote:
> > > > > On Tue, Mar 16, 2021 at 04:44:45PM +0100, Arnd Bergmann wrote:
> > > > > > > On Tue, Mar 16, 2021 at 11:17 AM Dmitry Vyukov <dvyu...@google.com> wrote:
> > > > > > > The compiler is gcc version 10.2.1 20210110 (Debian 10.2.1-6)
> > > > > >
> > > > > > Ok, building with Ubuntu 10.2.1-1ubuntu1 20201207 locally, that's
> > > > > > the closest I have installed, and I think the Debian and Ubuntu versions
> > > > > > are generally quite close in case of gcc since they are maintained by
> > > > > > the same packagers.
> > > > >
> > > > > ... which shouldn't be a problem - that's just over 1/4 of the stack
> > > > > space. Could it be the syzbot's gcc is doing something weird and
> > > > > inflating the stack frames?
> > > >
> > > > It's possible, I think that's really unlikely given that it's just Debian's
> > > > gcc, which is as close to mainline as the version I was using.
> > > >
> > > > Uwe's DEBUG_STACKOVERFLOW patch from a while ago might
> > > > help if this was the problem though:
> > > > https://lore.kernel.org/linux-arm-kernel/20200108082913.29710-1-u.kleine-koe...@pengutronix.de/
> > > >
> > > > My best guess is something going wrong in the interrupt
> > > > that triggered the preempt_schedule() which ended up calling
> > > > task_stack_end_corrupted() in schedule_debug(), as you suggested
> > > > earlier.
> > >
> > > FWIW I see slightly larger frames with the config:
> > >
> > > 8073ab64 <ima_calc_field_array_hash_tfm>:
> > > 8073ab64:       e1a0c00d        mov     ip, sp
> > > 8073ab68:       e92ddff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}
> > > 8073ab6c:       e24cb004        sub     fp, ip, #4
> > > 8073ab70:       e24ddfa7        sub     sp, sp, #668    ; 0x29c
> >
> > Yes, this is the one that the compiler complained about when warning
> > for stack over 600 bytes. It's not called in this call chain though.
> >
> > > page_alloc can also do reclaim, I had the impression that reclaim can
> > > be quite heavy-weight in all respects.
> >
> > Yes, that is another possibility. What writable file systems or swap
> > do you normally have mounted that it could be writing to, and on
> > what storage device?
>
> The root fs is ext4 on virtio-blk.
>
> There are also several dozens of shrinkers that can be called during reclaim:
> https://elixir.bootlin.com/linux/latest/C/ident/unregister_shrinker

Right, unfortunately I don't see a smoking gun there either, unless you are
also using NFS or devicemapper.

Implementing VMAP_STACK as you suggested earlier is probably the
best way to figure out if there is an actual overrun of the stack.
Alternatively, adding support for GCC_PLUGIN_STACKLEAK might
also help find out if we ever get close to the limit. This is probably
less work, but it might not actually help in this case.
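
For readers following along, here is a rough userspace sketch of the kind
of check that task_stack_end_corrupted() performs, and of why it can miss
or delay detection compared to a guard page. Everything named fake_* is
hypothetical and exists only for this illustration; the 8 KiB stack size
and the canary placement at the very lowest word are simplifying
assumptions, not a description of the exact 32-bit ARM layout. Only the
magic value matches the kernel's STACK_END_MAGIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same value as the kernel's STACK_END_MAGIC. */
#define STACK_END_MAGIC 0x57AC6E9DUL

/* Assumption for this sketch: an 8 KiB stack that grows downwards. */
#define FAKE_STACK_BYTES 8192
#define FAKE_STACK_WORDS (FAKE_STACK_BYTES / sizeof(unsigned long))

/* Hypothetical stand-in for a task's kernel stack. */
struct fake_stack {
	unsigned long words[FAKE_STACK_WORDS]; /* words[0] = lowest address */
};

/* Clear the stack and place the canary in the last usable word. */
static void fake_stack_init(struct fake_stack *s)
{
	assert(s != NULL);
	memset(s->words, 0, sizeof(s->words));
	s->words[0] = STACK_END_MAGIC;
}

/* Models the check: true once the canary word has been overwritten. */
static bool fake_stack_end_corrupted(const struct fake_stack *s)
{
	return s->words[0] != STACK_END_MAGIC;
}

/*
 * Simulate call frames consuming 'bytes' from the top of the stack
 * downwards by filling that region with a non-zero pattern.
 */
static void fake_stack_use(struct fake_stack *s, size_t bytes)
{
	size_t total = sizeof(s->words);

	if (bytes > total)
		bytes = total;
	memset((unsigned char *)s->words + (total - bytes), 0xAA, bytes);
}
```

With this model, a call chain that uses a quarter of the stack plus one
712-byte frame (the 668 bytes of locals plus eleven pushed registers from
the disassembly above) leaves the canary intact; only usage that reaches
the final word trips the check. The check also only fires when somebody
looks at the canary, which is why a guard page as with VMAP_STACK would
report an overrun at the faulting access rather than at the next
schedule().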

        Arnd
