On Fri, Oct 7, 2016 at 3:20 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> splice stuff.

Hmm. I've now gotten two oopses today, both at __kmalloc+0xc3/0x1f0,
which seems to be the

    *(void **)(object + s->offset);

in get_freepointer().

Because it started happening today, I'm inclined to blame mainly stuff
I merged late yesterday. I'm pretty sure that 4.8.0-09134-g4c1fad64eff4
is all good, in particular, while the problems definitely happen with
4.8.0-11288-gb66484cd7470.

Much of the stuff yesterday was non-x86 architectures (the ARM soc
stuff, avr32, parisc and power), so the main suspects are

 - Andrew's series
 - Al's splice stuff
 - Ted's ext4 changes
 - Jens' block layer changes

Yes, there are other things that came in between there, not just the
architecture things, but they seem much less likely to trigger for me.

The traces don't really give me any real ideas, they look like this:

  BUG: unable to handle kernel paging request at ffff9db749d0c000
  IP: [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0
  PGD 426098067 PUD 426099067 PMD 344b1a067 PTE 0
  Oops: 0000 [#1] SMP
  Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_conntrack ebtable_nat ebtable_broute bridge st acpi_als pinctrl_sunrisepoint tpm_tis pinctrl_intel kfifo_buf tpm_tis_core tpm industrialio acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt i915 i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclm
  CPU: 0 PID: 3091 Comm: collect2 Tainted: G O 4.8.0-11288-gb66484cd7470-dirty #4
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8ee43dbad940 task.stack: ffff9db749ee4000
  RIP: 0010:[<ffffffffb320cbe3>] [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0
  RSP: 0018:ffff9db749ee7b80 EFLAGS: 00010246
  RAX: ffff9db749d0c000 RBX: 00000000024000c0 RCX: 0000000000000000
  RDX: 00000000000034f7 RSI: 0000000000000000 RDI: 000000000001b620
  RBP: ffff9db749ee7bb0 R08: ffff8ee4b6c1b620 R09: ffff8ee475810b3f
  R10: ffff9db749d0c000 R11: ffff8ee488a16240 R12: 00000000024000c0
  R13: 0000000000000044 R14: ffff8ee4a60037c0 R15: ffff8ee4a60037c0
  FS: 00007f3f8b10f740(0000) GS:ffff8ee4b6c00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff9db749d0c000 CR3: 00000003a188a000 CR4: 00000000003406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffffffffb3258f68 ffff9db749ee7c40 0000000000000024 ffff8ee3f5810b40
   7974697275636573 ffffffffb3c99760 ffff9db749ee7bd8 ffffffffb3258f68
   ffff9db749ee7c40 ffff8ee4a1cb7378 ffff8ee4a1cb7360 ffff9db749ee7c10
  Call Trace:
   [<ffffffffb3258f68>] ? simple_xattr_alloc+0x28/0x60
   [<ffffffffb3258f68>] simple_xattr_alloc+0x28/0x60
   [<ffffffffb31bec60>] shmem_initxattrs+0x90/0xd0
   [<ffffffffb333e60a>] security_inode_init_security+0x11a/0x160
   [<ffffffffb31bebd0>] ? shmem_fh_to_dentry+0x60/0x60
   [<ffffffffb31c00e2>] shmem_mknod+0x62/0xd0
   [<ffffffffb31c0418>] shmem_create+0x18/0x20
   [<ffffffffb324110a>] path_openat+0x128a/0x13c0
   [<ffffffffb3242541>] do_filp_open+0x91/0x100
   [<ffffffffb325051f>] ? __alloc_fd+0x3f/0x170
   [<ffffffffb322fe10>] do_sys_open+0x130/0x220
   [<ffffffffb322ff1e>] SyS_open+0x1e/0x20
   [<ffffffffb379df20>] entry_SYSCALL_64_fastpath+0x13/0x94
  Code: 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 c5 00 00 00 49 63 47 20 49 8b 3f 4c 01 d0 40 f6 c7 0f 0f 85 1a 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74
  RIP [<ffffffffb320cbe3>] __kmalloc+0xc3/0x1f0

and

  general protection fault: 0000 [#1] SMP
  Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge st acpi_als pinctrl_sunrisepoint kfifo_buf pinctrl_intel industrialio tpm_tis tpm_tis_core tpm acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit
  CPU: 5 PID: 3649 Comm: make Not tainted 4.8.0-11290-g13510890a847-dirty #3
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8e3738188000 task.stack: ffffabe649e88000
  RIP: 0010:[<ffffffff8720cd63>] [<ffffffff8720cd63>] __kmalloc+0xc3/0x1f0
  RSP: 0018:ffffabe649e8bc38 EFLAGS: 00010246
  RAX: 1e7acd36f90e784c RBX: 00000000024080c0 RCX: ffff8e36e78631f4
  RDX: 000000000000284a RSI: 0000000000000000 RDI: 000000000001b620
  RBP: ffffabe649e8bc68 R08: ffff8e3776d5b620 R09: 0000000084200088
  R10: 1e7acd36f90e784c R11: 0000000069636574 R12: 00000000024080c0
  R13: 000000000000004b R14: ffff8e37660037c0 R15: ffff8e37660037c0
  FS: 00007f020d92a740(0000) GS:ffff8e3776d40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ffed15ee080 CR3: 00000003f815a000 CR4: 00000000003406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffffffff872bbb8e ffff8e36faebc818 ffff8e3717123100 00000000b0b3acc0
   000000000ce85fb7 ffff8e36e78631f4 ffffabe649e8bca8 ffffffff872bbb8e
   ffffabe649e8bcf0 ffff8e36faebc818 ffffabe649e8bd80 ffff8e36e78631fc
  Call Trace:
   [<ffffffff872bbb8e>] ? ext4_htree_store_dirent+0x3e/0x120
   [<ffffffff872bbb8e>] ext4_htree_store_dirent+0x3e/0x120
   [<ffffffff872cd427>] htree_dirblock_to_tree+0xc7/0x1c0
   [<ffffffff872ce572>] ext4_htree_fill_tree+0xb2/0x320
   [<ffffffff871e0da1>] ? special_mapping_fault+0x31/0xa0
   [<ffffffff872bb900>] ext4_readdir+0x660/0x890
   [<ffffffff8734620d>] ? __inode_security_revalidate+0x4d/0x70
   [<ffffffff87245f22>] iterate_dir+0x172/0x1a0
   [<ffffffff87246398>] SyS_getdents+0x98/0x120
   [<ffffffff87246120>] ? fillonedir+0xc0/0xc0
   [<ffffffff8779e0a0>] entry_SYSCALL_64_fastpath+0x13/0x94
  Code: 49 83 78 10 00 4d 8b 10 0f 84 ce 00 00 00 4d 85 d2 0f 84 c5 00 00 00 49 63 47 20 49 8b 3f 4c 01 d0 40 f6 c7 0f 0f 85 1a 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74
  RIP [<ffffffff8720cd63>] __kmalloc+0xc3/0x1f0
  RSP <ffffabe649e8bc38>
  ---[ end trace 843edceadb3bd424 ]---

so in both cases it was filesystem stuff, but I'm not sure how much of
a pattern that is. The trapping instruction is just a

    mov (%rax),%rbx

and as you can see rax is garbage.

I guess I'll need to just run with slab debugging on, but I wanted to
bring this to people's attention in case it rings a bell for somebody.

I haven't been merging anything today, partly because of this. The
problem *may* go back further, but I did run 4c1fad64eff4 for a while
without any sign of this.

             Linus
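
For readers who have not looked at the allocator: the expression quoted above reads the "next free object" link that is stored inside a free object. The following is a minimal user-space sketch of that idea, not the kernel's code. Every name in it (toy_cache, toy_alloc, toy_free, toy_demo) is hypothetical, and the real allocator is far more involved. It only illustrates why a stale write into an already-freed object tends to show up later as a garbage pointer at an unrelated allocation site, which would be consistent with the two traces having nothing in common beyond __kmalloc.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a slab cache (NOT struct kmem_cache). */
struct toy_cache {
    size_t offset;   /* where the next-free link lives inside a free object */
    void *freelist;  /* first free object, or NULL when empty */
};

/* Same shape as the quoted expression: load the link stored in the object. */
static void *toy_get_freepointer(const struct toy_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}

static void toy_set_freepointer(const struct toy_cache *s, void *object, void *next)
{
    *(void **)((char *)object + s->offset) = next;
}

/* Thread all objects of a slab onto the freelist, lowest address first. */
static void toy_cache_init(struct toy_cache *s, void *slab, size_t size, size_t count)
{
    s->offset = 0;
    s->freelist = NULL;
    for (size_t i = count; i-- > 0; ) {
        void *obj = (char *)slab + i * size;
        toy_set_freepointer(s, obj, s->freelist);
        s->freelist = obj;
    }
}

static void *toy_alloc(struct toy_cache *s)
{
    void *object = s->freelist;
    if (!object)
        return NULL;
    /* If 'object' is garbage, this load is where the fault would happen. */
    s->freelist = toy_get_freepointer(s, object);
    return object;
}

static void toy_free(struct toy_cache *s, void *object)
{
    toy_set_freepointer(s, object, s->freelist);
    s->freelist = object;
}

/*
 * Returns 1 if a write through a stale pointer left a garbage freelist head,
 * -1 if the backing allocation failed. The garbage head is never dereferenced
 * here; a further toy_alloc() call is what would do that.
 */
int toy_demo(void)
{
    enum { OBJ_SIZE = 64, OBJ_COUNT = 4 };
    struct toy_cache s;
    unsigned char *slab = malloc(OBJ_SIZE * OBJ_COUNT);
    if (!slab)
        return -1;
    toy_cache_init(&s, slab, OBJ_SIZE, OBJ_COUNT);

    void *a = toy_alloc(&s);
    toy_free(&s, a);                  /* 'a' is free again and holds a link */
    memset(a, 0x41, sizeof(void *));  /* stale user overwrites that link */
    void *b = toy_alloc(&s);          /* hands out 'a', head is now garbage */

    uintptr_t head = (uintptr_t)s.freelist;
    uintptr_t lo = (uintptr_t)slab;
    int corrupted = head < lo || head >= lo + (uintptr_t)(OBJ_SIZE * OBJ_COUNT);
    int result = (b == a) && corrupted;
    free(slab);
    return result;
}
```

Calling toy_demo() from a small main() should return 1 on ordinary systems. Note that the allocation which triggers the bad load is an innocent bystander: the damage was done earlier by whoever wrote through the stale pointer, which is why checking objects at alloc and free time (as slab debugging does) is usually more informative than the oops itself.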