On Tue, Jan 24, 2017 at 4:48 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote: > On Tue, Jan 24, 2017 at 02:37:28PM +0100, Dmitry Vyukov wrote: >> On Mon, Jan 23, 2017 at 11:50 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote: >> > On Mon, Jan 23, 2017 at 07:43:18PM +0100, Dmitry Vyukov wrote: >> >> Hello, >> >> >> >> I am getting the following warnings while running syzkaller fuzzer on >> >> 7a308bb3016f57e5be11a677d15b821536419d36: >> >> >> >> WARNING: kernel stack frame pointer at ffff880065f7fef8 in >> >> kworker/1:3:21075 has bad value ffffffff81598080 >> >> unwind stack type:0 next_sp: (null) mask:2 graph_idx:0 >> >> ... >> >> >> >> Here are 4 examples: >> >> >> >> https://gist.githubusercontent.com/dvyukov/7258fca11e5c3db6e8f8fb684e6168ff/raw/2ca14f3c7294c7e466611f8ecfb3072c676a657c/gistfile1.txt >> >> https://gist.githubusercontent.com/dvyukov/cc442b144ae712f0e1bdeb065085e36a/raw/f713cef7e9689914fc94838a0a7d4192224fb6d2/gistfile1.txt >> >> https://gist.githubusercontent.com/dvyukov/9058cce2647e3d37157c283c6400498c/raw/7a8ec8d71b9e95eeb24156073eeef96ba61d95ea/gistfile1.txt >> >> https://gist.githubusercontent.com/dvyukov/b899dfa72d611b511a674b63676921f3/raw/90203ed26b6674dedba8c1c16dd76309ef21fbea/gistfile1.txt >> >> >> >> Unfortunately this is not reproducible, but there is a stack dump, so >> >> maybe it's enough to figure out the root cause. >> > >> > This seems like some kind of stack corruption. Here are the stack >> > traces: >> > >> >> https://gist.githubusercontent.com/dvyukov/7258fca11e5c3db6e8f8fb684e6168ff/raw/2ca14f3c7294c7e466611f8ecfb3072c676a657c/gistfile1.txt >> > >> > ffff8800520eef68: ffffffff812b148e (__save_stack_trace+0x6e/0xd0) >> > ffff8800520eefe8: ffffffff812b1506 (save_stack_trace+0x16/0x20) >> > ffff8800520eeff8: ffffffff81a0c843 (save_stack+0x43/0xd0) >> > ffff8800520ef228: ffffffff81a0d17f (kasan_slab_free+0x6f/0xb0) >> > ffff8800520ef250: ffffffff81a0a4e1 (kmem_cache_free+0x71/0x240) >> > ffff8800520ef280: ffffffff822924d4 (ioc_release_fn+0x2e4/0x380) >> > ffff8800520ef550: ffffffff814b4900 (process_one_work+0xbd0/0x1c10) >> > ffff8800520ef9a8: ffffffff814b5b63 (worker_thread+0x223/0x1990) >> > ffff8800520eff00: ffffffff814ce526 (kthread+0x326/0x3f0) >> > >> >> https://gist.githubusercontent.com/dvyukov/cc442b144ae712f0e1bdeb065085e36a/raw/f713cef7e9689914fc94838a0a7d4192224fb6d2/gistfile1.txt >> > >> > ffff88005b40ef68: ffffffff812b260e (__save_stack_trace+0x6e/0xd0) >> > ffff88005b40efe8: ffffffff812b2686 (save_stack_trace+0x16/0x20) >> > ffff88005b40eff8: ffffffff81a0e8c3 (save_stack+0x43/0xd0) >> > ffff88005b40f228: ffffffff81a0f1ff (kasan_slab_free+0x6f/0xb0) >> > ffff88005b40f250: ffffffff81a0c561 (kmem_cache_free+0x71/0x240) >> > ffff88005b40f280: ffffffff822943b4 (ioc_release_fn+0x2e4/0x380) >> > ffff88005b40f550: ffffffff814b6980 (process_one_work+0xbd0/0x1c10) >> > ffff88005b40f9a8: ffffffff814b7be3 (worker_thread+0x223/0x1990) >> > ffff88005b40ff00: ffffffff814d05a6 (kthread+0x326/0x3f0) >> > >> >> https://gist.githubusercontent.com/dvyukov/9058cce2647e3d37157c283c6400498c/raw/7a8ec8d71b9e95eeb24156073eeef96ba61d95ea/gistfile1.txt >> > >> > ffff880065f7ef68: ffffffff812b260e (__save_stack_trace+0x6e/0xd0) >> > ffff880065f7efe8: ffffffff812b2686 (save_stack_trace+0x16/0x20) >> > ffff880065f7eff8: ffffffff81a0e8c3 (save_stack+0x43/0xd0) >> > ffff880065f7f228: ffffffff81a0f1ff (kasan_slab_free+0x6f/0xb0) >> > ffff880065f7f250: ffffffff81a0c561 (kmem_cache_free+0x71/0x240) >> > ffff880065f7f280: ffffffff822943b4 (ioc_release_fn+0x2e4/0x380) >> > ffff880065f7f550: ffffffff814b6980 (process_one_work+0xbd0/0x1c10) >> > ffff880065f7f9a8: ffffffff814b7be3 (worker_thread+0x223/0x1990) >> > ffff880065f7ff00: ffffffff814d05a6 (kthread+0x326/0x3f0) >> > >> >> https://gist.githubusercontent.com/dvyukov/b899dfa72d611b511a674b63676921f3/raw/90203ed26b6674dedba8c1c16dd76309ef21fbea/gistfile1.txt >> > >> > ffff880065e57220: ffffffff812b148e (__save_stack_trace+0x6e/0xd0) >> > ffff880065e572a0: ffffffff812b1506 (save_stack_trace+0x16/0x20) >> > ffff880065e572b0: ffffffff81a0c843 (save_stack+0x43/0xd0) >> > ffff880065e574e0: ffffffff81a0cb0a (kasan_kmalloc+0xaa/0xd0) >> > ffff880065e57520: ffffffff81a0d102 (kasan_slab_alloc+0x12/0x20) >> > ffff880065e57530: ffffffff81a08704 (kmem_cache_alloc+0xf4/0x680) >> > ffff880065e57598: ffffffff823293b3 (ida_pre_get+0x1f3/0x3b0) >> > ffff880065e57720: ffffffff823296d8 (ida_simple_get+0x168/0x320) >> > ffff880065e578e8: ffffffff814ad926 (create_worker+0x96/0x620) >> > ffff880065e579a8: ffffffff814b68ad (worker_thread+0xf6d/0x1990) >> > ffff880065e57f00: ffffffff814ce526 (kthread+0x326/0x3f0) >> > >> > In all four cases, the frame pointer pushed by worker_thread() is >> > somehow changed to be the address at the end of >> > debug_check_no_locks_freed(): >> > >> > ffff880065f7fef8: ffffffff81598080 >> > (debug_check_no_locks_freed+0x5c0/0x5c0) >> > >> > Can you please share the following: >> > >> > - The output of "grep ffffffff81598080 System.map" >> > - The .config file >> >> >> Hi, >> >> Thanks for looking at this. >> >> I've attached my config. >> Unfortunately I don't have that System.map/vmlinux, I've rebased my >> kernel and rebuilt it. But here is what after >> debug_check_no_locks_freed how: >> >> vmlinux:ffffffff815956a0 0000000000000467 T lockdep_count_backward_deps >> vmlinux:ffffffff81595b10 00000000000005b6 T debug_check_no_locks_freed >> vmlinux:ffffffff815960d0 0000000000003428 t __lock_acquire >> >> >> But If you suspect a corruption, then I would suggest to stop spending >> more time on this now. At least until I have more data. >> This happened while I stress tested KVM, and I've seen a low rate of >> what looks like random memory corruptions (e.g. corrupted slab >> freelists and other unexplainable things). So let's write it off onto >> KVM for now. > > Ok. But there are some interesting clues there, so if you see it again, > please provide the System.map and .config and I'll take another look.
ack > What really has me confused is why the worker_thread() stack seems so > big. According to the stack dumps, worker_thread() had a stack size of > 1368 bytes, but when I build with your .config, it's only 208 bytes. My compiler supports KASAN stack instrumentation (--param asan-stack=1) and KASAN use-after-scope (-fsanitize-address-use-after-scope). Both increase frames. But you need gcc 7.0 or something. > Other interesting features of this corruption: > > - always after kthread() called worker_thread() > - always the same value written, presumably the address of > __lock_acquire() > - always to the same offset on the stack (the location of the frame > pointer pushed by worker_thread()) > > ... so it's not exactly random memory corruption. In fact it looks like > "normal" stack corruption. But then again, KASAN is enabled, which > should catch most cases of stack corruption (buffer overflows). Good catch. The other corruptions that I saw were more like typical corruptions -- e.g. freelist link was NULL or completely random garbage. My current (weak) hypothesis is that the corruptions are KVM-related, and so there are good chances that KASAN won't catch them (e.g. writes from guest corrupt host kernel).