Re: Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)
On 9.03.2017 03:58, Theodore Ts'o wrote:
> On Tue, Mar 07, 2017 at 10:40:53PM +0200, Nikolay Borisov wrote:
>> So this is wrong, the reason why the issues seemed fix is because I
>> switched my compiler to version 5.4.0. So this manifests only if I'm
>> using gcc 4.7.4. With the pr_info added here is the output of a boot. So
>> there are multiple invocations of ext4_ext_map_blocks and the freeing,
>> including with the address being used in subsequent kasan reports :
>> 88006ae8fdb0
>
> Can you help bisect this, then? I'm using Debian Testing, and the
> default gcc is gcc 6.3.0. I'm currently forcing the use of gcc 5.4.1
> because I was running into problems with gcc 6.x a while back. (TBH,
> I was thinking about trying to see if gcc 6.3 was stable for kernel
> compiles when I had some spare time.) But I don't have access to
> *any* gcc 4.x on my development system, and I don't think I've tried
> using gcc 4.x in a long, Long, LONG time.
>
> I'm currently kicking off a test run using 5.4.1 with KASAN enabled to
> see if I can trigger it myself. Can you send me a copy of your
> .config so I can see what else might be interesting with your config?
> (e.g., SLAB vs SLUB, etc.)

Attached the config. After further debugging and talking with the KASAN
developers, I think this might actually be a KASAN problem when used with
an old compiler. I bisected this all the way to 1771c6e1a567ea0ba2, which
is the commit introducing the user access instrumentation. Here is a mail
thread where I confirmed that this might be a KASAN issue:
https://lkml.org/lkml/2017/3/8/69

What I believe is happening is that the manual checks inserted in the user
access code miss some context information because the corresponding
instrumentation is not inserted by the compiler. KASAN gets confused as a
result, hence the warnings.

>
> Thanks,
>
> - Ted
>

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.7.0 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-nbor"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
#
Re: Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)
On Tue, Mar 07, 2017 at 10:40:53PM +0200, Nikolay Borisov wrote:
> So this is wrong, the reason why the issues seemed fix is because I
> switched my compiler to version 5.4.0. So this manifests only if I'm
> using gcc 4.7.4. With the pr_info added here is the output of a boot. So
> there are multiple invocations of ext4_ext_map_blocks and the freeing,
> including with the address being used in subsequent kasan reports :
> 88006ae8fdb0

Can you help bisect this, then? I'm using Debian Testing, and the
default gcc is gcc 6.3.0. I'm currently forcing the use of gcc 5.4.1
because I was running into problems with gcc 6.x a while back. (TBH,
I was thinking about trying to see if gcc 6.3 was stable for kernel
compiles when I had some spare time.) But I don't have access to
*any* gcc 4.x on my development system, and I don't think I've tried
using gcc 4.x in a long, Long, LONG time.

I'm currently kicking off a test run using 5.4.1 with KASAN enabled to
see if I can trigger it myself. Can you send me a copy of your
.config so I can see what else might be interesting with your config?
(e.g., SLAB vs SLUB, etc.)

Thanks,

- Ted
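The "Memory state around the buggy address" rows in the KASAN reports quoted in this thread can be decoded by hand. The sketch below is a simplified model of the generic KASAN shadow encoding, not the kernel's own code: one shadow byte covers 8 bytes of memory, 00 means all 8 bytes are accessible, 01 to 07 mean only that many leading bytes are accessible, and values such as fc (kmalloc redzone) or fb (freed object) mean none are. The array and function names are made up for illustration; only the shadow bytes themselves are copied from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One shadow byte describes 8 bytes of memory in generic KASAN. */
#define GRANULE 8

/*
 * Shadow row printed for 88006bc2b080 in the report: 16 shadow bytes
 * covering 128 bytes of memory. The values are copied from the dump;
 * the array name is hypothetical.
 */
static const uint8_t shadow_row_b080[16] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x05, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc, 0xfc,
};

/* Count the leading bytes of the row that the shadow marks accessible. */
static size_t accessible_prefix(const uint8_t *shadow, size_t n)
{
    size_t bytes = 0;

    assert(shadow != NULL);
    for (size_t i = 0; i < n; i++) {
        if (shadow[i] == 0) {           /* whole granule accessible */
            bytes += GRANULE;
            continue;
        }
        if (shadow[i] < GRANULE)        /* partially accessible granule */
            bytes += shadow[i];
        break;                          /* redzone or freed: stop here */
    }
    return bytes;
}

/* True if every byte of [off, off + len) is accessible per the shadow. */
static bool access_ok(const uint8_t *shadow, size_t n, size_t off, size_t len)
{
    assert(shadow != NULL);
    for (size_t a = off; a < off + len; a++) {
        size_t idx = a / GRANULE;

        if (idx >= n)
            return false;               /* outside the decoded row */
        if (shadow[idx] == 0)
            continue;
        if (shadow[idx] >= GRANULE || (a % GRANULE) >= shadow[idx])
            return false;               /* poisoned byte */
    }
    return true;
}
```

The reported address 88006bc2b0ae is offset 0x2e from the object start 88006bc2b080. Under this model, `accessible_prefix(shadow_row_b080, 16)` gives 69 bytes and `access_ok(shadow_row_b080, 16, 0x2e, 20)` is true, so the 20-byte read appears to lie inside the accessible region. That is at least consistent with the suspicion voiced in the thread that the report comes from KASAN itself rather than from a real overflow, though it does not prove it.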
Re: Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)
On 7.03.2017 16:33, Nikolay Borisov wrote:
>
>
> On 7.03.2017 11:38, Nikolay Borisov wrote:
>>
>>
>> On 7.03.2017 00:35, Rafael J. Wysocki wrote:
>>> On Mon, Mar 6, 2017 at 9:31 PM, Nikolay Borisov
>>> wrote:
Hello,

Booting 4.11-rc1 with kasan enabled and "slub_debug=F" produces the following errors:

[7.070797] ==
[7.071724] BUG: KASAN: slab-out-of-bounds in filldir+0xc3/0x160 at addr 88006bc2b0ae
[7.071724] Read of size 20 by task systemd/1
[7.071724] CPU: 1 PID: 1 Comm: systemd Not tainted 4.11.0-rc1-nbor #150
[7.071724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[7.071724] Call Trace:
[7.071724] dump_stack+0x85/0xc9
[7.071724] kasan_object_err+0x2c/0x90
[7.071724] kasan_report+0x285/0x510
[7.071724] check_memory_region+0x137/0x160
[7.071724] kasan_check_read+0x11/0x20
[7.071724] filldir+0xc3/0x160
[7.071724] call_filldir+0x88/0x140
[7.071724] ext4_readdir+0x757/0x920
[7.071724] ? iterate_dir+0x49/0x190
[7.071724] iterate_dir+0x7d/0x190
[7.071724] ? entry_SYSCALL_64_fastpath+0x5/0xc6
[7.071724] SyS_getdents+0xac/0x170
[7.071724] ? filldir64+0x170/0x170
[7.071724] entry_SYSCALL_64_fastpath+0x23/0xc6
[7.071724] RIP: 0033:0x7fa37ca2dd3b
[7.071724] RSP: 002b:7ffc63daf400 EFLAGS: 0206 ORIG_RAX: 004e
[7.071724] RAX: ffda RBX: 0046 RCX: 7fa37ca2dd3b
[7.071724] RDX: 8000 RSI: 560b369e4a10 RDI: 0004
[7.071724] RBP: 7fa37cd29b20 R08: 7fa37cd29bd8 R09:
[7.071724] R10: 008f R11: 0206 R12: 8041
[7.071724] R13: 7fa37cd29b78 R14: 270f R15: 7fa37cd29b78
[7.071724] Object at 88006bc2b080, in cache kmalloc-96 size: 96
[7.071724] Allocated:
[7.071724] PID = 1
[7.071724] save_stack_trace+0x1b/0x20
[7.071724] kasan_kmalloc.part.4+0x64/0xf0
[7.071724] kasan_kmalloc+0x85/0xb0
[7.071724] __kmalloc+0x12b/0x320
[7.071724] ext4_htree_store_dirent+0x3e/0x120
[7.071724] htree_dirblock_to_tree+0xb9/0x1a0
[7.071724] ext4_htree_fill_tree+0xa3/0x310
[7.071724] ext4_readdir+0x6a9/0x920
[7.071724] iterate_dir+0x7d/0x190
[7.071724] SyS_getdents+0xac/0x170
[7.071724] entry_SYSCALL_64_fastpath+0x23/0xc6
[7.071724] Freed:
[7.071724] PID = 1
[7.071724] save_stack_trace+0x1b/0x20
[7.071724] kasan_slab_free+0xbe/0x190
[7.071724] kfree+0xff/0x2f0
[7.071724] acpi_ut_evaluate_object+0x18e/0x19d
[7.071724] acpi_ut_execute_STA+0x26/0x53
[7.071724] acpi_ns_get_device_callback+0x73/0x163
[7.071724] acpi_ns_walk_namespace+0xc0/0x17a
[7.071724] acpi_get_devices+0x66/0x7d
[7.071724] pnpacpi_init+0x52/0x74
[7.071724] do_one_initcall+0x51/0x1b0
[7.071724] kernel_init_freeable+0x20a/0x2a1
[7.071724] kernel_init+0xe/0x100
[7.071724] ret_from_fork+0x31/0x40
[7.071724] Memory state around the buggy address:
[7.071724] 88006bc2af80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[7.071724] 88006bc2b000: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[7.071724] >88006bc2b080: 00 00 00 00 00 00 00 00 05 fc fc fc fc fc fc fc
[7.071724]^
[7.071724] 88006bc2b100: 00 00 00 00 00 00 00 00 00 04 fc fc fc fc fc fc
[7.071724] 88006bc2b180: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc

Not killing the VM instantly produces a continuous stream of kasan errors. Most of them are identical to the one above, however there was one which was different:

[5.846193] ==
[5.846787] BUG: KASAN: slab-out-of-bounds in filldir+0xc3/0x160 at addr 88006c783eae
[5.847177] Read of size 22 by task systemd/1
[5.847177] CPU: 3 PID: 1 Comm: systemd Tainted: GB 4.11.0-rc1-nbor #150
[5.847177] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[5.847177] Call Trace:
[5.847177] dump_stack+0x85/0xc9
[5.847177] kasan_object_err+0x2c/0x90
[5.847177] kasan_report+0x285/0x510
[5.847177] check_memory_region+0x137/0x160
[5.847177] kasan_check_read+0x11/0x20
[5.847177]
Race condition in ext4 (was Re: 4.11-rc1 acpi stomping ext4 slabs)
On 7.03.2017 11:38, Nikolay Borisov wrote:
>
>
> On 7.03.2017 00:35, Rafael J. Wysocki wrote:
>> On Mon, Mar 6, 2017 at 9:31 PM, Nikolay Borisov
>> wrote:
>>> Hello,
>>>
>>> Booting 4.11-rc1 with kasan enabled and "slub_debug=F" produces the
>>> following errors:
>>>
>>> [7.070797]
>>> ==
>>> [7.071724] BUG: KASAN: slab-out-of-bounds in filldir+0xc3/0x160 at addr
>>> 88006bc2b0ae
>>> [7.071724] Read of size 20 by task systemd/1
>>> [7.071724] CPU: 1 PID: 1 Comm: systemd Not tainted 4.11.0-rc1-nbor #150
>>> [7.071724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>>> [7.071724] Call Trace:
>>> [7.071724] dump_stack+0x85/0xc9
>>> [7.071724] kasan_object_err+0x2c/0x90
>>> [7.071724] kasan_report+0x285/0x510
>>> [7.071724] check_memory_region+0x137/0x160
>>> [7.071724] kasan_check_read+0x11/0x20
>>> [7.071724] filldir+0xc3/0x160
>>> [7.071724] call_filldir+0x88/0x140
>>> [7.071724] ext4_readdir+0x757/0x920
>>> [7.071724] ? iterate_dir+0x49/0x190
>>> [7.071724] iterate_dir+0x7d/0x190
>>> [7.071724] ? entry_SYSCALL_64_fastpath+0x5/0xc6
>>> [7.071724] SyS_getdents+0xac/0x170
>>> [7.071724] ? filldir64+0x170/0x170
>>> [7.071724] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> [7.071724] RIP: 0033:0x7fa37ca2dd3b
>>> [7.071724] RSP: 002b:7ffc63daf400 EFLAGS: 0206 ORIG_RAX:
>>> 004e
>>> [7.071724] RAX: ffda RBX: 0046 RCX:
>>> 7fa37ca2dd3b
>>> [7.071724] RDX: 8000 RSI: 560b369e4a10 RDI:
>>> 0004
>>> [7.071724] RBP: 7fa37cd29b20 R08: 7fa37cd29bd8 R09:
>>>
>>> [7.071724] R10: 008f R11: 0206 R12:
>>> 8041
>>> [7.071724] R13: 7fa37cd29b78 R14: 270f R15:
>>> 7fa37cd29b78
>>> [7.071724] Object at 88006bc2b080, in cache kmalloc-96 size: 96
>>> [7.071724] Allocated:
>>> [7.071724] PID = 1
>>> [7.071724] save_stack_trace+0x1b/0x20
>>> [7.071724] kasan_kmalloc.part.4+0x64/0xf0
>>> [7.071724] kasan_kmalloc+0x85/0xb0
>>> [7.071724] __kmalloc+0x12b/0x320
>>> [7.071724] ext4_htree_store_dirent+0x3e/0x120
>>> [7.071724] htree_dirblock_to_tree+0xb9/0x1a0
>>> [7.071724] ext4_htree_fill_tree+0xa3/0x310
>>> [7.071724] ext4_readdir+0x6a9/0x920
>>> [7.071724] iterate_dir+0x7d/0x190
>>> [7.071724] SyS_getdents+0xac/0x170
>>> [7.071724] entry_SYSCALL_64_fastpath+0x23/0xc6
>>> [7.071724] Freed:
>>> [7.071724] PID = 1
>>> [7.071724] save_stack_trace+0x1b/0x20
>>> [7.071724] kasan_slab_free+0xbe/0x190
>>> [7.071724] kfree+0xff/0x2f0
>>> [7.071724] acpi_ut_evaluate_object+0x18e/0x19d
>>> [7.071724] acpi_ut_execute_STA+0x26/0x53
>>> [7.071724] acpi_ns_get_device_callback+0x73/0x163
>>> [7.071724] acpi_ns_walk_namespace+0xc0/0x17a
>>> [7.071724] acpi_get_devices+0x66/0x7d
>>> [7.071724] pnpacpi_init+0x52/0x74
>>> [7.071724] do_one_initcall+0x51/0x1b0
>>> [7.071724] kernel_init_freeable+0x20a/0x2a1
>>> [7.071724] kernel_init+0xe/0x100
>>> [7.071724] ret_from_fork+0x31/0x40
>>> [7.071724] Memory state around the buggy address:
>>> [7.071724] 88006bc2af80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>>> fc fc
>>> [7.071724] 88006bc2b000: fb fb fb fb fb fb fb fb fb fb fb fb fc fc
>>> fc fc
>>> [7.071724] >88006bc2b080: 00 00 00 00 00 00 00 00 05 fc fc fc fc fc
>>> fc fc
>>> [7.071724]^
>>> [7.071724] 88006bc2b100: 00 00 00 00 00 00 00 00 00 04 fc fc fc fc
>>> fc fc
>>> [7.071724] 88006bc2b180: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
>>> fc fc
>>>
>>> Not killing the VM instantly produces a continuous stream of kasan errors.
>>> Most of them
>>> are identical to the one above, however there was one which was different:
>>>
>>> [5.846193]
>>> ==
>>> [5.846787] BUG: KASAN: slab-out-of-bounds in filldir+0xc3/0x160 at addr
>>> 88006c783eae
>>> [5.847177] Read of size 22 by task systemd/1
>>> [5.847177] CPU: 3 PID: 1 Comm: systemd Tainted: GB
>>> 4.11.0-rc1-nbor #150
>>> [5.847177] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>>> [5.847177] Call Trace:
>>> [5.847177] dump_stack+0x85/0xc9
>>> [5.847177] kasan_object_err+0x2c/0x90
>>> [5.847177] kasan_report+0x285/0x510
>>> [5.847177] check_memory_region+0x137/0x160
>>> [5.847177] kasan_check_read+0x11/0x20
>>> [5.847177] filldir+0xc3/0x160
>>> [5.847177] call_filldir+0x88/0x140
>>> [5.847177] ext4_readdir+0x757/0x920
>>> [5.847177] ? iterate_dir+0x49/0x190
>>> [