crash report: paging request errors in various krbd contexts

2014-05-16 Thread Hannes Landeholm
A production server just locked up and had to be hard rebooted. It had
these various rbd related crash signatures in it's system log within
the same 10 second interval:

kernel: BUG: unable to handle kernel paging request at 
kernel: IP: [812832dd] strnlen+0xd/0x40
kernel: PGD 180f067 PUD 1811067 PMD 0
kernel: Oops:  [#1] PREEMPT SMP
kernel: Modules linked in: xt_recent xt_limit veth ipt_MASQUERADE cbc
ipt_REJECT xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp
iptable_mangle ip_tables x_tables coretemp hwmon x86_pkg_temp_thermal
crct10dif_pclmul crct10di
f_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode evdev
pcspkr xen_netback xen_blkback xen_gntalloc xenfs xen_gntdev
xen_evtchn rbd libceph crc32c libcrc32c ext4 crc16 mbcache jbd2
ixgbevf xen_privcmd xen_pcifront xen_netfront xen_kbdfro
nt xen_fbfront syscopyarea sysfillrect sysimgblt fb_sys_fops
xen_blkfront virtio_pci virtio_net virtio_blk virtio_ring virtio
ipmi_poweroff ipmi_msghandler
kernel: CPU: 3 PID: 2887 Comm: mysqld Not tainted 3.14.1-1-js #1
kernel: task: 880342ce1d70 ti: 88033e17e000 task.ti: 88033e17e000
kernel: RIP: e030:[812832dd]  [812832dd] strnlen+0xd/0x40
kernel: RSP: e02b:88033e17fa78  EFLAGS: 00010286
kernel: RAX: 816feed6 RBX: 8800ffb60068 RCX: fffe
kernel: RDX:  RSI:  RDI: 
kernel: RBP: 88033e17fa78 R08:  R09: 
kernel: R10: 756d696e612d736a R11: 2f30622d637a762f R12: 
kernel: R13: 8800ffb600cd R14:  R15: 
kernel: FS:  7f98dec7f700() GS:8803b5d8()
knlGS:
kernel: CS:  e033 DS:  ES:  CR0: 8005003b
kernel: CR2:  CR3: 000335dcc000 CR4: 2660
kernel: Stack:
kernel:  88033e17fab0 8128562b 8800ffb60068 8800ffb600cd
kernel:  88033e17fb28 a0196869 a0196869 88033e17fb18
kernel:  81286ac1 880389525960 814de243 
kernel: Call Trace:
kernel:  [8128562b] string.isra.6+0x3b/0xf0
kernel:  [81286ac1] vsnprintf+0x1c1/0x610
kernel:  [814de243] ? _raw_spin_unlock_irq+0x13/0x30
kernel:  [81286fd9] snprintf+0x39/0x40
kernel:  [a01911e0] ? rbd_img_request_fill+0x100/0x6d0 [rbd]
kernel:  [a019122a] rbd_img_request_fill+0x14a/0x6d0 [rbd]
kernel:  [a018f4d5] ? rbd_img_request_create+0x155/0x220 [rbd]
kernel:  [8125cab9] ? blk_add_timer+0x19/0x20
kernel:  [a0194a1d] rbd_request_fn+0x1ed/0x330 [rbd]
kernel:  [81252f13] __blk_run_queue+0x33/0x40
kernel:  [8125411e] queue_unplugged+0x2e/0xd0
kernel:  [81256cf0] blk_flush_plug_list+0x1f0/0x230
kernel:  [812570a4] blk_finish_plug+0x14/0x40
kernel:  [a00b9d6e] ext4_writepages+0x48e/0xd50 [ext4]
kernel:  [81136aae] ? generic_file_aio_write+0x5e/0xe0
kernel:  [811417ae] do_writepages+0x1e/0x40
kernel:  [811363d9] __filemap_fdatawrite_range+0x59/0x60
kernel:  [811364da] filemap_write_and_wait_range+0x2a/0x70
kernel:  [a00b149a] ext4_sync_file+0xba/0x360 [ext4]
kernel:  [811d50ce] do_fsync+0x4e/0x80
kernel:  [811d5373] SyS_fdatasync+0x13/0x20
kernel:  [814e66e9] system_call_fastpath+0x16/0x1b
kernel: Code: c0 01 80 38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 66 66
66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 8d 4e ff 48 89 e5 74
2a 80 3f 00 74 25 48 89 f8 31 d2 eb 10 0f 1f 80 00 00 00 00 48 83
kernel: RIP  [812832dd] strnlen+0xd/0x40
kernel:  RSP 88033e17fa78
kernel: CR2: 
kernel: ---[ end trace 83a2fd2a9969b20d ]---

kernel: BUG: unable to handle kernel paging request at 177b
kernel: IP: [814dd33c] down_read+0xc/0x20
kernel: PGD 39c86b067 PUD 39c86a067 PMD 0
kernel: Oops: 0002 [#2] PREEMPT SMP
kernel: Modules linked in: xt_recent xt_limit veth ipt_MASQUERADE cbc
ipt_REJECT xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp
iptable_mangle ip_tables x_tables coretemp hwmon x86_pkg_temp_thermal
crct10dif_pclmul crct10di
f_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode evdev
pcspkr xen_netback xen_blkback xen_gntalloc xenfs xen_gntdev
xen_evtchn rbd libceph crc32c libcrc32c ext4 crc16 mbcache jbd2
ixgbevf xen_privcmd xen_pcifront xen_netfront xen_kbdfro
nt xen_fbfront syscopyarea sysfillrect sysimgblt fb_sys_fops
xen_blkfront virtio_pci virtio_net virtio_blk virtio_ring virtio
ipmi_poweroff ipmi_msghandler
kernel: CPU: 3 PID: 16277 Comm: jbd2/rbd115-8 Tainted: G  D
3.14.1-1-js #1
kernel: task: 8800e6f13110 ti: 8800f9d52000 

Re: crash report: paging request errors in various krbd contexts

2014-05-16 Thread Alex Elder
On 05/16/2014 01:00 AM, Hannes Landeholm wrote:
 A production server just locked up and had to be hard rebooted. It had
 these various rbd related crash signatures in it's system log within
 the same 10 second interval:

I'll try to provide a quick summary of what's likely
happened in each of these.

The bottom line is that it appears that some
memory used by rbd and/or libceph has become
corrupted, or there is something (or more than
one thing) that is being used after it's been
freed.  Either way this sort of thing will be
difficult to try to understand; it would be
great if it could be reproduced independently.

 kernel: BUG: unable to handle kernel paging request at 
 kernel: IP: [812832dd] strnlen+0xd/0x40
 kernel: PGD 180f067 PUD 1811067 PMD 0
 kernel: Oops:  [#1] PREEMPT SMP
 kernel: Modules linked in: xt_recent xt_limit veth ipt_MASQUERADE cbc
 ipt_REJECT xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4
 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp
 iptable_mangle ip_tables x_tables coretemp hwmon x86_pkg_temp_thermal
 crct10dif_pclmul crct10di
 f_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel
 aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode evdev
 pcspkr xen_netback xen_blkback xen_gntalloc xenfs xen_gntdev
 xen_evtchn rbd libceph crc32c libcrc32c ext4 crc16 mbcache jbd2
 ixgbevf xen_privcmd xen_pcifront xen_netfront xen_kbdfro
 nt xen_fbfront syscopyarea sysfillrect sysimgblt fb_sys_fops
 xen_blkfront virtio_pci virtio_net virtio_blk virtio_ring virtio
 ipmi_poweroff ipmi_msghandler
 kernel: CPU: 3 PID: 2887 Comm: mysqld Not tainted 3.14.1-1-js #1
 kernel: task: 880342ce1d70 ti: 88033e17e000 task.ti: 88033e17e000
 kernel: RIP: e030:[812832dd]  [812832dd] strnlen+0xd/0x40

We're calling strnlen() (ultimately) from snprintf().  The
format provided will be %s.%012llx (or similar).  The
string provided for the %s is rbd_dev-header.object_prefix,
which is a dynamically allocated string initialized once
for the rbd device, which will be NUL-terminated and
unchanging until the device gets mapped.

Either the rbd device got unmapped while still
in use, or the memory holding this rbd_dev structure
got corrupted somehow.

 kernel: RSP: e02b:88033e17fa78  EFLAGS: 00010286
 kernel: RAX: 816feed6 RBX: 8800ffb60068 RCX: fffe
 kernel: RDX:  RSI:  RDI: 
 kernel: RBP: 88033e17fa78 R08:  R09: 
 kernel: R10: 756d696e612d736a R11: 2f30622d637a762f R12: 
 kernel: R13: 8800ffb600cd R14:  R15: 
 kernel: FS:  7f98dec7f700() GS:8803b5d8()
 knlGS:
 kernel: CS:  e033 DS:  ES:  CR0: 8005003b
 kernel: CR2:  CR3: 000335dcc000 CR4: 2660
 kernel: Stack:
 kernel:  88033e17fab0 8128562b 8800ffb60068 8800ffb600cd
 kernel:  88033e17fb28 a0196869 a0196869 88033e17fb18
 kernel:  81286ac1 880389525960 814de243 
 kernel: Call Trace:
 kernel:  [8128562b] string.isra.6+0x3b/0xf0
 kernel:  [81286ac1] vsnprintf+0x1c1/0x610
 kernel:  [814de243] ? _raw_spin_unlock_irq+0x13/0x30
 kernel:  [81286fd9] snprintf+0x39/0x40
 kernel:  [a01911e0] ? rbd_img_request_fill+0x100/0x6d0 [rbd]
 kernel:  [a019122a] rbd_img_request_fill+0x14a/0x6d0 [rbd]
 kernel:  [a018f4d5] ? rbd_img_request_create+0x155/0x220 [rbd]
 kernel:  [8125cab9] ? blk_add_timer+0x19/0x20
 kernel:  [a0194a1d] rbd_request_fn+0x1ed/0x330 [rbd]
 kernel:  [81252f13] __blk_run_queue+0x33/0x40
 kernel:  [8125411e] queue_unplugged+0x2e/0xd0
 kernel:  [81256cf0] blk_flush_plug_list+0x1f0/0x230
 kernel:  [812570a4] blk_finish_plug+0x14/0x40
 kernel:  [a00b9d6e] ext4_writepages+0x48e/0xd50 [ext4]
 kernel:  [81136aae] ? generic_file_aio_write+0x5e/0xe0
 kernel:  [811417ae] do_writepages+0x1e/0x40
 kernel:  [811363d9] __filemap_fdatawrite_range+0x59/0x60
 kernel:  [811364da] filemap_write_and_wait_range+0x2a/0x70
 kernel:  [a00b149a] ext4_sync_file+0xba/0x360 [ext4]
 kernel:  [811d50ce] do_fsync+0x4e/0x80
 kernel:  [811d5373] SyS_fdatasync+0x13/0x20
 kernel:  [814e66e9] system_call_fastpath+0x16/0x1b
 kernel: Code: c0 01 80 38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 66 66
 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 f6 48 8d 4e ff 48 89 e5 74
 2a 80 3f 00 74 25 48 89 f8 31 d2 eb 10 0f 1f 80 00 00 00 00 48 83
 kernel: RIP  [812832dd] strnlen+0xd/0x40
 kernel:  RSP 88033e17fa78
 kernel: CR2: 
 kernel: ---[ end trace 83a2fd2a9969b20d ]---
 
 kernel: BUG: unable to handle kernel paging request at 177b
 kernel: IP: [814dd33c] down_read+0xc/0x20
 

Re: crash report: paging request errors in various krbd contexts

2014-05-16 Thread Hannes Landeholm
 The bottom line is that it appears that some
 memory used by rbd and/or libceph has become
 corrupted, or there is something (or more than
 one thing) that is being used after it's been
 freed.  Either way this sort of thing will be
 difficult to try to understand; it would be
 great if it could be reproduced independently.

 We're calling strnlen() (ultimately) from snprintf().  The
 format provided will be %s.%012llx (or similar).  The
 string provided for the %s is rbd_dev-header.object_prefix,
 which is a dynamically allocated string initialized once
 for the rbd device, which will be NUL-terminated and
 unchanging until the device gets mapped.

 Either the rbd device got unmapped while still
 in use, or the memory holding this rbd_dev structure
 got corrupted somehow.

Yes, with my limited knowledge of the kernel I would have guessed that
it was some form of memory allocation problem as well as it crashed in
wildly different contexts and it crashed right after a memory
allocation in the snprintf() case.

Is it possible to configure the kernel when building it so it sanity
checks memory allocations that are free'd and/or reserved? I have
implemented my own free list based VM in userspace and I find it very
useful to insert a header with a magic canary value that I set before
giving out memory and check when I get memory back. This allows me to
crash with the offending code in the backtrace instead of crashing in
a wildly different context.

 I don't know if you've supplied this before, but can
 you describe the way the rbd device(s) in question
 is configured?  How many devices, how big are they,
 and *especially*, are they using layering and if so
 what the relationships are between them.

It's something like ~100-200 mappings that are 10 gb each. They use
layering and generally share the same parent with varying distance to
the common ancestor snapshot, but it's unlikely to be more than ~20
layers at the moment. More than 75% probably share the same common
ancestor. We don't have rbd caching enabled.

Thank you for you time,
Hannes
--
To unsubscribe from this list: send the line unsubscribe ceph-devel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html