The upstream fix is likely this:

From b73eba2a867e10b9b4477738677341f3307c07bb Mon Sep 17 00:00:00 2001
From: Gang He <g...@suse.com>
Date: Sat, 4 Jan 2020 13:00:22 -0800
Subject: [PATCH] ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
Because ocfs2_get_dlm_debug() function is called once less here, ocfs2
file system will trigger the system crash, usually after ocfs2 file
system is unmounted.

This system crash is caused by a generic memory corruption, these crash
backtraces are not always the same, for example,

  ocfs2: Unmounting device (253,16) on (node 172167785)
  general protection fault: 0000 [#1] SMP PTI
  CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:__kmalloc+0xa5/0x2a0
  Code: 00 00 4d 8b 07 65 4d 8b
  RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
  RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
  RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
  R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
  R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
  FS: 00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
  Call Trace:
   ext4_htree_store_dirent+0x35/0x100 [ext4]
   htree_dirblock_to_tree+0xea/0x290 [ext4]
   ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
   ext4_readdir+0x67c/0x9d0 [ext4]
   iterate_dir+0x8d/0x1a0
   __x64_sys_getdents+0xab/0x130
   do_syscall_64+0x60/0x1f0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f699d33a9fb

This regression problem was introduced by commit e581595ea29c ("ocfs:
no need to check return value of debugfs_create functions").
Link: http://lkml.kernel.org/r/20191225061501.13587-1-...@suse.com
Fixes: e581595ea29c ("ocfs: no need to check return value of debugfs_create functions")
Signed-off-by: Gang He <g...@suse.com>
Acked-by: Joseph Qi <joseph...@linux.alibaba.com>
Cc: Mark Fasheh <m...@fasheh.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Junxiao Bi <junxiao...@oracle.com>
Cc: Changwei Ge <gechang...@live.cn>
Cc: Gang He <g...@suse.com>
Cc: Jun Piao <piao...@huawei.com>
Cc: <sta...@vger.kernel.org> [5.3+]
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>

This is the fix reported in the upstream bug. I'm giving it a try before
finally suggesting it to the kernel team as an SRU.

** Changed in: ocfs2-tools (Ubuntu Eoan)
       Status: In Progress => Invalid

** Changed in: ocfs2-tools (Ubuntu Focal)
       Status: In Progress => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1852122

Title:
  ocfs2-tools is causing kernel panics in Ubuntu Focal (Ubuntu-5.4.0-9.12)

Status in OCFS2 Tools:
  Fix Released
Status in linux package in Ubuntu:
  In Progress
Status in ocfs2-tools package in Ubuntu:
  Invalid
Status in linux source package in Eoan:
  In Progress
Status in ocfs2-tools source package in Eoan:
  Invalid
Status in linux source package in Focal:
  In Progress
Status in ocfs2-tools source package in Focal:
  Invalid

Bug description:
  I noticed the tests for ocfs2-tools/1.8.6-1ubuntu1 were constantly
  retrying themselves. It's a feature we have so that transient or
  occasional failures are auto-retried, but it's misfiring here because
  we're not detecting that it's a consistent failure.

  That particular bug is fixed, but it revealed that ocfs2-tools is
  failing on ppc64el. Here's the important part of the log, full output
  attached.
[ 85.605738] BUG: Unable to handle kernel data access at 0x01744098
[ 85.605850] Faulting instruction address: 0xc000000000e81168
[ 85.605901] Oops: Kernel access of bad area, sig: 11 [#1]
[ 85.605970] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 85.606029] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk net_failover failover
[ 85.606291] CPU: 0 PID: 1 Comm: systemd Not tainted 5.3.0-18-generic #19-Ubuntu
[ 85.606350] NIP: c000000000e81168 LR: c00000000054f240 CTR: 0000000000000000
[ 85.606410] REGS: c00000005a3e3700 TRAP: 0300 Not tainted (5.3.0-18-generic)
[ 85.606469] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28024448 XER: 00000000
[ 85.606531] CFAR: 0000701f9806f638 DAR: 0000000001744098 DSISR: 40000000 IRQMASK: 0
[ 85.606531] GPR00: 0000000000007374 c00000005a3e3990 c0000000019c9100 c00000004fe462a8
[ 85.606531] GPR04: c00000005856d840 000000000000000e 0000000074656772 c00000004fe4a568
[ 85.606531] GPR08: 0000000000000000 c000000058568004 0000000001744090 0000000000000000
[ 85.606531] GPR12: 00000000e8086002 c000000001d60000 00007fffddd522d0 0000000000000000
[ 85.606531] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000755e07c
[ 85.606531] GPR20: c0000000598caca8 c00000005a3e3a58 0000000000000000 c000000058292f00
[ 85.606531] GPR24: c000000000eea710 0000000000000000 c00000005856d840 c00000000755e074
[ 85.606531] GPR28: 000000006518907d c00000005a3e3a68 c00000004fe4b160 00000000027c47b6
[ 85.607079] NIP [c000000000e81168] rb_insert_color+0x18/0x1c0
[ 85.607137] LR [c00000000054f240] ext4_htree_store_dirent+0x140/0x1c0
[ 85.607186] Call Trace:
[ 85.607208] [c00000005a3e3990] [c00000000054f158] ext4_htree_store_dirent+0x58/0x1c0 (unreliable)
[ 85.607279] [c00000005a3e39e0] [c000000000594cd8] htree_dirblock_to_tree+0x1b8/0x380
[ 85.607340] [c00000005a3e3b00] [c0000000005962c0] ext4_htree_fill_tree+0xc0/0x3f0
[ 85.607401] [c00000005a3e3c00] [c00000000054ebe4] ext4_readdir+0x814/0xce0
[ 85.607459] [c00000005a3e3d40] [c000000000472d6c] iterate_dir+0x1fc/0x280
[ 85.607511] [c00000005a3e3d90] [c0000000004746f0] ksys_getdents64+0xa0/0x1f0
[ 85.607572] [c00000005a3e3e00] [c000000000474868] sys_getdents64+0x28/0x130
[ 85.607622] [c00000005a3e3e20] [c00000000000b388] system_call+0x5c/0x70
[ 85.607672] Instruction dump:
[ 85.607703] 4082ffe8 4e800020 38600000 4e800020 60000000 60000000 e9230000 2c290000
[ 85.607764] 4182018c e9490000 71480001 4c820020 <e90a0008> 7c284840 2fa80000 4182006c
[ 85.607827] ---[ end trace cfc53af0f8d62cef ]---
[ 85.610600]
[ 86.611522] BUG: Unable to handle kernel data access at 0xc000030058567eff
[ 86.611604] Faulting instruction address: 0xc000000000403aa8
[ 86.611656] Oops: Kernel access of bad area, sig: 11 [#2]
[ 86.611697] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 86.611748] Modules linked in: ocfs2 quota_tr