Package: linux
Version: 3.16.7-ckt11-1+deb8u2
Severity: critical

Hi,

TL;DR - please provide a kernel with a newer drbd module (e.g. 8.4.6), as the current version is incompatible with stable's drbd-utils and will result in kernel panics under load.

I have the following kernel:

  Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)

This ships with version 8.4.3 of the drbd kernel module (which advertises '(api:1/proto:86-101)'). Using that version of the module with stable's drbd-utils (8.9.2rc1) results in kernel panics under heavy I/O load, fairly reproducibly. A kernel log of a typical crash is attached to this report.

I initially reported this issue to Xen (since it happened in a dom0), and they referred me to this blog post:

  http://blog.chinewalking.com/drbd-kernel-oops-w-trim/

Notably, you will observe that drbd-module >=8.4.4 supports "trim", whereas 8.4.3 does not. Yet the userland tools arrange to use trim anyway:

  Aug 4 14:28:24 ophon kernel: [2856757.049680] drbd mws-02474: Agreed to support TRIM on protocol level

Following that suggestion, I installed the kernel module 8.4.6 from upstream, and the kernel has stopped panicking.

You might argue that drbd upstream's api/proto discrimination is inadequate (and perhaps a bug report should go there), but nonetheless kernel panics are a serious flaw in the kernel (or the offending module) IMAO.

Regards,

Matthew

-- System Information:
Debian Release: 8.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-backports'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
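
For anyone wanting to check whether a host is exposed to the mismatch described above, the version comparison can be sketched in a few lines of Python. This is only an illustration: the 8.4.4 threshold is taken from this report and the linked blog post, the banner format ('version: 8.4.3 (api:1/proto:86-101)') is an assumption about what the module prints, and the function names are hypothetical.

```python
import re

# Assumption taken from the report: TRIM handling first appears in
# drbd kernel module 8.4.4; 8.4.3 does not have it.
TRIM_MIN_VERSION = (8, 4, 4)


def parse_drbd_version(banner):
    """Return (major, minor, patch) from a banner string.

    The banner shape 'version: 8.4.3 (api:1/proto:86-101)' is assumed;
    only the first dotted triple in the string is used.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no version number found in %r" % (banner,))
    return tuple(int(part) for part in match.groups())


def module_handles_trim(banner):
    """True if the module version is at or above the assumed TRIM threshold."""
    return parse_drbd_version(banner) >= TRIM_MIN_VERSION


if __name__ == "__main__":
    for banner in ("version: 8.4.3 (api:1/proto:86-101)", "version: 8.4.6"):
        print(banner, "->", module_handles_trim(banner))
```

The comparison is on integer tuples rather than strings, so that a version such as 8.4.10 would sort after 8.4.4.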
Aug 3 16:03:13 opus kernel: [ 1250.026811] drbd mws-priv-1: Starting worker thread (from drbdsetup-84 [12987])
Aug 3 16:03:13 opus kernel: [ 1250.027313] block drbd4: disk( Diskless -> Attaching )
Aug 3 16:03:13 opus kernel: [ 1250.027409] drbd mws-priv-1: Method to ensure write ordering: flush
Aug 3 16:03:13 opus kernel: [ 1250.027413] block drbd4: max BIO size = 4096
Aug 3 16:03:13 opus kernel: [ 1250.027418] block drbd4: drbd_bm_resize called with capacity == 41941688
Aug 3 16:03:13 opus kernel: [ 1250.027558] block drbd4: resync bitmap: bits=5242711 words=81918 pages=160
Aug 3 16:03:13 opus kernel: [ 1250.027561] block drbd4: size = 20 GB (20970844 KB)
Aug 3 16:03:13 opus kernel: [ 1250.032268] block drbd4: Writing the whole bitmap, size changed
Aug 3 16:03:13 opus kernel: [ 1250.047827] block drbd4: bitmap WRITE of 160 pages took 4 jiffies
Aug 3 16:03:13 opus kernel: [ 1250.061634] block drbd4: 20 GB (5242711 bits) marked out-of-sync by on disk bit-map.
Aug 3 16:03:13 opus kernel: [ 1250.180186] block drbd4: bitmap READ of 160 pages took 2 jiffies
Aug 3 16:03:13 opus kernel: [ 1250.180291] block drbd4: recounting of set bits took additional 0 jiffies
Aug 3 16:03:13 opus kernel: [ 1250.180293] block drbd4: 20 GB (5242711 bits) marked out-of-sync by on disk bit-map.
Aug 3 16:03:13 opus kernel: [ 1250.180304] block drbd4: Suspended AL updates
Aug 3 16:03:13 opus kernel: [ 1250.180307] block drbd4: disk( Attaching -> Inconsistent )
Aug 3 16:03:13 opus kernel: [ 1250.180310] block drbd4: attached to UUIDs 0000000000000004:0000000000000000:0000000000000000:0000000000000000
Aug 3 16:03:13 opus kernel: [ 1250.191161] drbd mws-priv-1: conn( StandAlone -> Unconnected )
Aug 3 16:03:13 opus kernel: [ 1250.191183] drbd mws-priv-1: Starting receiver thread (from drbd_w_mws-priv [12989])
Aug 3 16:03:13 opus kernel: [ 1250.191345] drbd mws-priv-1: receiver (re)started
Aug 3 16:03:13 opus kernel: [ 1250.191360] drbd mws-priv-1: conn( Unconnected -> WFConnection )
Aug 3 16:03:13 opus kernel: [ 1250.689576] drbd mws-priv-1: Handshake successful: Agreed network protocol version 101
Aug 3 16:03:13 opus kernel: [ 1250.689580] drbd mws-priv-1: Agreed to support TRIM on protocol level
Aug 3 16:03:13 opus kernel: [ 1250.689616] drbd mws-priv-1: conn( WFConnection -> WFReportParams )
Aug 3 16:03:13 opus kernel: [ 1250.689631] drbd mws-priv-1: Starting asender thread (from drbd_r_mws-priv [12992])
Aug 3 16:03:13 opus kernel: [ 1250.737084] block drbd4: max BIO size = 1048576
Aug 3 16:03:13 opus kernel: [ 1250.737091] block drbd4: drbd_sync_handshake:
Aug 3 16:03:13 opus kernel: [ 1250.737094] block drbd4: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:5242711 flags:0
Aug 3 16:03:13 opus kernel: [ 1250.737096] block drbd4: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:5242711 flags:0
Aug 3 16:03:13 opus kernel: [ 1250.737098] block drbd4: uuid_compare()=0 by rule 10
Aug 3 16:03:13 opus kernel: [ 1250.737100] block drbd4: No resync, but 5242711 bits in bitmap!
Aug 3 16:03:13 opus kernel: [ 1250.737105] block drbd4: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
Aug 3 16:03:13 opus kernel: [ 1250.737109] block drbd4: Resumed AL updates
Aug 3 16:03:14 opus kernel: [ 1250.773903] block drbd4: Accepted new current UUID, preparing to skip initial sync
Aug 3 16:03:14 opus kernel: [ 1250.777061] block drbd4: bitmap WRITE of 160 pages took 1 jiffies
Aug 3 16:03:14 opus kernel: [ 1250.788564] block drbd4: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Aug 3 16:03:14 opus kernel: [ 1250.788573] block drbd4: disk( Inconsistent -> UpToDate ) pdsk( Inconsistent -> UpToDate )
Aug 3 16:03:14 opus kernel: [ 1250.797104] block drbd4: receiver updated UUIDs to 14460554106EF79A:0000000000000000:0000000000000000:0000000000000000
Aug 3 16:03:14 opus kernel: [ 1250.797117] block drbd4: peer( Secondary -> Primary )
Aug 3 16:03:15 opus kernel: [ 1251.748952] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Aug 3 16:03:15 opus kernel: [ 1251.748983] BUG: unable to handle kernel paging request at ffff8800022f3d88
Aug 3 16:03:15 opus kernel: [ 1251.749016] IP: [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug 3 16:03:15 opus kernel: [ 1251.749041] PGD 1814067 PUD 1815067 PMD 2f81067 PTE 80100000022f3067
Aug 3 16:03:15 opus kernel: [ 1251.749082] Oops: 0011 [#1] SMP
Aug 3 16:03:15 opus kernel: [ 1251.749106] Modules linked in: xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel joydev hid_generic iTCO_wdt iTCO_vendor_support aesni_intel evdev aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd usbhid ttm hid drm_kms_helper pcspkr drm i2c_i801 lpc_ich mfd_core i7core_edac ioatdma edac_core tpm_tis tpm ipmi_si ipmi_msghandler button shpchp processor thermal_sys drbd lru_cache libcrc32c autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ehci_pci uhci_hcd ehci_hcd scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Aug 3 16:03:15 opus kernel: [ 1251.749691] CPU: 0 PID: 12993 Comm: drbd_a_mws-priv Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u2
Aug 3 16:03:15 opus kernel: [ 1251.749726] Hardware name: Intel Corporation S5500WBV/S5500WB, BIOS S5500.86B.01.00.0061.030920121535 03/09/2012
Aug 3 16:03:15 opus kernel: [ 1251.749765] task: ffff8800171742d0 ti: ffff8800022f0000 task.ti: ffff8800022f0000
Aug 3 16:03:15 opus kernel: [ 1251.749847] RIP: e030:[<ffff8800022f3d88>] [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug 3 16:03:15 opus kernel: [ 1251.749934] RSP: e02b:ffff8800022f3d90 EFLAGS: 00010212
Aug 3 16:03:15 opus kernel: [ 1251.749984] RAX: 00000000fffffffc RBX: ffffffffffffffff RCX: 0000000000000113
Aug 3 16:03:15 opus kernel: [ 1251.750039] RDX: 0000000000000113 RSI: 00000000fffffe01 RDI: ffffffff81463f75
Aug 3 16:03:15 opus kernel: [ 1251.750094] RBP: ffff8800171742d0 R08: ffff8800022f0000 R09: 0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.750150] R10: ffff88001751b890 R11: 0000000000000000 R12: 0000000000000001
Aug 3 16:03:15 opus kernel: [ 1251.750205] R13: 0000000000000000 R14: 0000000000000010 R15: ffff880016c92000
Aug 3 16:03:15 opus kernel: [ 1251.750263] FS: 00007f90c43c0740(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.750348] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 3 16:03:15 opus kernel: [ 1251.750399] CR2: ffff8800022f3d88 CR3: 000000001073f000 CR4: 0000000000002660
Aug 3 16:03:15 opus kernel: [ 1251.750454] Stack:
Aug 3 16:03:15 opus kernel: [ 1251.750493] ffff8800022f3d88 0000000000000010 0000000000000000 0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.750589] ffff8800022f3d90 0000000000000001 0000000000000000 0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.750686] 0000000000004100 ffffffffa02577be ffff880016c92080 0000001000000000
Aug 3 16:03:15 opus kernel: [ 1251.750783] Call Trace:
Aug 3 16:03:15 opus kernel: [ 1251.750829] [<ffffffffa02577be>] ? drbd_asender+0x27e/0x750 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.750887] [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.750947] [<ffffffffa0260d46>] ? drbd_thread_setup+0x46/0x130 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.751006] [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.751065] [<ffffffff81087fad>] ? kthread+0xbd/0xe0
Aug 3 16:03:15 opus kernel: [ 1251.751114] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug 3 16:03:15 opus kernel: [ 1251.751170] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
Aug 3 16:03:15 opus kernel: [ 1251.751221] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug 3 16:03:15 opus kernel: [ 1251.751274] Code: ff ff ff ff ff ff ff ff ff ff ff 88 3d 2f 02 00 88 ff ff 30 e0 00 00 00 00 00 00 12 02 01 00 00 00 00 00 90 3d 2f 02 00 88 ff ff <2b> e0 00 00 00 00 00 00 88 3d 2f 02 00 88 ff ff 10 00 00 00 00
Aug 3 16:03:15 opus kernel: [ 1251.751694] RIP [<ffff8800022f3d88>] 0xffff8800022f3d88
Aug 3 16:03:15 opus kernel: [ 1251.751747] RSP <ffff8800022f3d90>
Aug 3 16:03:15 opus kernel: [ 1251.751790] CR2: ffff8800022f3d88
Aug 3 16:03:15 opus kernel: [ 1251.752128] ---[ end trace 975e04f66c2d9004 ]---
Aug 3 16:03:15 opus kernel: [ 1251.835012] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Aug 3 16:03:15 opus kernel: [ 1251.835235] IP: [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.835408] PGD 0
Aug 3 16:03:15 opus kernel: [ 1251.835531] Oops: 0002 [#2] SMP
Aug 3 16:03:15 opus kernel: [ 1251.835704] Modules linked in: xt_physdev iptable_filter ip_tables x_tables xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc bridge stp llc intel_powerclamp coretemp crc32_pclmul ghash_clmulni_intel joydev hid_generic iTCO_wdt iTCO_vendor_support aesni_intel evdev aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd usbhid ttm hid drm_kms_helper pcspkr drm i2c_i801 lpc_ich mfd_core i7core_edac ioatdma edac_core tpm_tis tpm ipmi_si ipmi_msghandler button shpchp processor thermal_sys drbd lru_cache libcrc32c autofs4 ext4 crc16 mbcache jbd2 dm_mod raid1 md_mod sg sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata ehci_pci uhci_hcd ehci_hcd scsi_mod usbcore usb_common igb i2c_algo_bit i2c_core dca ptp pps_core
Aug 3 16:03:15 opus kernel: [ 1251.840379] CPU: 0 PID: 12992 Comm: drbd_r_mws-priv Tainted: G D 3.16.0-4-amd64 #1 Debian 3.16.7-ckt11-1+deb8u2
Aug 3 16:03:15 opus kernel: [ 1251.840515] Hardware name: Intel Corporation S5500WBV/S5500WB, BIOS S5500.86B.01.00.0061.030920121535 03/09/2012
Aug 3 16:03:15 opus kernel: [ 1251.840648] task: ffff880016c1a050 ti: ffff8800173e4000 task.ti: ffff8800173e4000
Aug 3 16:03:15 opus kernel: [ 1251.840771] RIP: e030:[<ffffffffa02453bd>] [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.840951] RSP: e02b:ffff8800173e7ce0 EFLAGS: 00010097
Aug 3 16:03:15 opus kernel: [ 1251.841041] RAX: 0000000000000000 RBX: ffff880017439700 RCX: 000000000000009c
Aug 3 16:03:15 opus kernel: [ 1251.841137] RDX: 0000000000000000 RSI: ffff88000c850200 RDI: ffff88000c85bed0
Aug 3 16:03:15 opus kernel: [ 1251.841233] RBP: ffff88000cb22800 R08: 0000000000000cce R09: ffff88000c850200
Aug 3 16:03:15 opus kernel: [ 1251.841329] R10: 0000000000007ff0 R11: 0000000000000000 R12: ffff880002b676a0
Aug 3 16:03:15 opus kernel: [ 1251.841424] R13: ffff88001f8463b0 R14: ffff88000cb22bb0 R15: ffff88000cb22800
Aug 3 16:03:15 opus kernel: [ 1251.841523] FS: 00007f90c43c0740(0000) GS:ffff88001fa00000(0000) knlGS:0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.841648] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 3 16:03:15 opus kernel: [ 1251.841739] CR2: 0000000000000008 CR3: 000000001073f000 CR4: 0000000000002660
Aug 3 16:03:15 opus kernel: [ 1251.841835] Stack:
Aug 3 16:03:15 opus kernel: [ 1251.841913] 0000000000000000 0000000000030003 ffff8800174397b8 0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.842216] 0000000000000000 0000000000102800 0000000000400000 0000000000104800
Aug 3 16:03:15 opus kernel: [ 1251.842518] 0000000000000000 0000000000102800 0000000000000000 0000000000000000
Aug 3 16:03:15 opus kernel: [ 1251.842820] Call Trace:
Aug 3 16:03:15 opus kernel: [ 1251.842906] [<ffffffffa0254bb5>] ? drbd_submit_peer_request+0x85/0x330 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843033] [<ffffffffa02556ea>] ? receive_Data+0x36a/0xe40 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843130] [<ffffffffa0257407>] ? drbd_receiver+0x117/0x250 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843228] [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843328] [<ffffffffa0260d46>] ? drbd_thread_setup+0x46/0x130 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843427] [<ffffffffa0260d00>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.843524] [<ffffffff81087fad>] ? kthread+0xbd/0xe0
Aug 3 16:03:15 opus kernel: [ 1251.843614] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug 3 16:03:15 opus kernel: [ 1251.843709] [<ffffffff815114d8>] ? ret_from_fork+0x58/0x90
Aug 3 16:03:15 opus kernel: [ 1251.843801] [<ffffffff81087ef0>] ? kthread_create_on_node+0x180/0x180
Aug 3 16:03:15 opus kernel: [ 1251.843894] Code: 04 48 8b 45 00 48 8d b8 d0 00 00 00 e8 dd bc 2c e1 8b 53 58 49 89 c1 c1 ea 09 01 95 54 02 00 00 49 83 fd ff 48 8b 13 48 8b 43 08 <48> 89 42 08 48 89 10 48 8d 85 c0 03 00 00 48 8b 95 c8 03 00 00
Aug 3 16:03:15 opus kernel: [ 1251.846996] RIP [<ffffffffa02453bd>] drbd_endio_write_sec_final+0x9d/0x480 [drbd]
Aug 3 16:03:15 opus kernel: [ 1251.847168] RSP <ffff8800173e7ce0>
Aug 3 16:03:15 opus kernel: [ 1251.847252] CR2: 0000000000000008
Aug 3 16:03:15 opus kernel: [ 1251.847335] ---[ end trace 975e04f66c2d9005 ]---
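
To spot the same pattern in other logs (TRIM negotiated, followed by oopses), a small scan along the following lines may help. It is a rough sketch, not a diagnostic tool: the two message formats are copied from the log above, the function name `summarize` is hypothetical, and finding both markers in one log shows only co-occurrence, not cause.

```python
import re

# Message shapes copied from the log above; other kernel or drbd
# versions may word these lines differently (assumption).
TRIM_RE = re.compile(r"drbd (\S+): Agreed to support TRIM on protocol level")
OOPS_RE = re.compile(r"Oops: [0-9a-f]{4} \[#\d+\]")


def summarize(log_text):
    """Return (resources that negotiated TRIM, number of oops headers).

    The whole text is searched rather than line by line, so the scan
    also works on logs whose line breaks were lost.
    """
    resources = sorted(set(TRIM_RE.findall(log_text)))
    oops_count = len(OOPS_RE.findall(log_text))
    return resources, oops_count


if __name__ == "__main__":
    sample = (
        "Aug 3 16:03:13 opus kernel: [ 1250.689580] drbd mws-priv-1: "
        "Agreed to support TRIM on protocol level\n"
        "Aug 3 16:03:15 opus kernel: [ 1251.749082] Oops: 0011 [#1] SMP\n"
        "Aug 3 16:03:15 opus kernel: [ 1251.835531] Oops: 0002 [#2] SMP\n"
    )
    print(summarize(sample))
```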