On Sun, Dec 23, 2018 at 8:37 AM Dmitry Vyukov <dvyu...@google.com> wrote: > > On Sat, Dec 22, 2018 at 8:07 PM Manfred Spraul <manf...@colorfullife.com> > wrote: > > > > Hi Dmitry, > > > > On 12/20/18 4:36 PM, Dmitry Vyukov wrote: > > > On Wed, Dec 19, 2018 at 10:04 AM Manfred Spraul > > > <manf...@colorfullife.com> wrote: > > >> Hello Dmitry, > > >> > > >> On 12/12/18 11:55 AM, Dmitry Vyukov wrote: > > >>> On Tue, Dec 11, 2018 at 9:23 PM syzbot > > >>> <syzbot+1145ec2e23165570c...@syzkaller.appspotmail.com> wrote: > > >>>> Hello, > > >>>> > > >>>> syzbot found the following crash on: > > >>>> > > >>>> HEAD commit: f5d582777bcb Merge branch 'for-linus' of > > >>>> git://git.kernel... > > >>>> git tree: upstream > > >>>> console output: > > >>>> https://syzkaller.appspot.com/x/log.txt?x=135bc547400000 > > >>>> kernel config: > > >>>> https://syzkaller.appspot.com/x/.config?x=c8970c89a0efbb23 > > >>>> dashboard link: > > >>>> https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > > >>>> compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > >>>> syz repro: > > >>>> https://syzkaller.appspot.com/x/repro.syz?x=16803afb400000 > > >>> +Manfred, this looks similar to the other few crashes related to > > >>> semget$private(0x0, 0x4000, 0x3f) that you looked at. > > >> I found one unexpected (incorrect?) locking, see the attached patch. > > >> > > >> But I doubt that this is the root cause of the crashes. > > > > > > But why? These one-off sporadic crashes reported by syzbot looks > > > exactly like a subtle race and your patch touches sem_exit_ns involved > > > in all reports. > > > So if you don't spot anything else, I would say close these 3 reports > > > with this patch (I see you already included Reported-by tags which is > > > great!) and then wait for syzbot reaction. Since we got 3 of them, if > > > it's still not fixed I would expect that syzbot will be able to > > > retrigger this later again. > > > > As I wrote, unless semop() is used, sma->use_global_lock is always 9 and > > nothing can happen. > > > > Every single-operation semop() reduces use_global_lock by one, i.e a > > single semop call as done here cannot trigger the bug: > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16803afb400000 > > It contains "repeat":true,"procs":6, which means that it run 6 > processes running this test in infinite loop. The last mark about > number of tests executed was: > 2018/12/11 18:38:02 executed programs: 2955 > > > But, one more finding: > > > > https://syzkaller.appspot.com/bug?extid=1145ec2e23165570c3ac > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=109ecf6e400000 > > > > The log file contain 1080 lines like these: > > > > > semget$private(..., 0x4003, ...) > > > > > > semget$private(..., 0x4006, ...) > > > > > > semget$private(..., 0x4007, ...) > > > > It ends up as kmalloc(128*0x400x), i.e. slightly more than 2 MB, an > > allocation in the 4 MB kmalloc buffer: > > > > > [ 1201.210245] kmalloc-4194304 4698112KB 4698112KB > > > > > i.e.: 1147 4 MB kmalloc blocks --> are we leaking nearly 100% of the > > semaphore arrays?? > > /\/\/\/\/\/\ > > Ha, this is definitely not healthy.
I can reproduce this infinite memory consumption with the C program: https://gist.githubusercontent.com/dvyukov/03ec54b3429ade16fa07bf8b2379aff3/raw/ae4f654e279810de2505e8fa41b73dc1d77778e6/gistfile1.txt But this is working as intended, right? It just creates infinite number of large semaphore sets, which reasonably consumes infinite amount of memory. Except that it also violates the memcg bound and a process can have effectively unlimited amount of such "drum memory" in semaphores. > > This one looks similar: > > > > https://syzkaller.appspot.com/bug?extid=c92d3646e35bc5d1a909 > > > > except that the array sizes are mixed, and thus there are kmalloc-1M and > > kmalloc-2M as well. > > > > (and I did not count the number of semget calls) > > > > > > The test apps use unshare(CLONE_NEWNS) and unshare(CLONE_NEWIPC), correct? > > > > I.e. no CLONE_NEWUSER. > > > > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1523 > > CLONE_NEWUSER is used on some instances as well: > https://github.com/google/syzkaller/blob/master/executor/common_linux.h#L1765 > This crash happened on 2 different instances and 1 of them uses > CLONE_NEWUSER and another does not. > If it's important because of CAP_ADMIN in IPC namespace, then all > instances should have it (instances that don't use NEWUSER are just > root).