[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster
Thanks for reporting this bug, Hóka.

The commit posted by Jacob mentions that the bug happens when an HDD is used as a swap device. Do you have something like that in your environment? Also, if you have a reproducer that can trigger the problem, let me know.

Thank you.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1823859

Title:
  NULL pointer dereference in split_swap_cluster

Status in linux-azure package in Ubuntu:
  New

Bug description:
  We have encountered the following oops on one of our VMs:

  Apr 7 14:02:19 rancher1 kernel: [2089793.273674] BUG: unable to handle kernel NULL pointer dereference at 0007
  Apr 7 14:02:19 rancher1 kernel: [2089793.282782] IP: split_swap_cluster+0x4f/0x70
  Apr 7 14:02:19 rancher1 kernel: [2089793.330631] PGD 0 P4D 0
  Apr 7 14:02:19 rancher1 kernel: [2089793.338279] Oops: 0002 [#1] SMP PTI
  Apr 7 14:02:19 rancher1 kernel: [2089793.350774] Modules linked in: ufs msdos xfs cmac arc4 md4 nls_utf8 cifs ccm fscache xt_tcpudp xt_set ip_set_hash_net ip_set iptable_raw vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_mark xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 ansi_cprng veth ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter nf_nat br_netfilter bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ip_tables x_tables aufs overlay mlx4_en pci_hyperv hv_balloon serio_raw joydev ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul
  Apr 7 14:02:19 rancher1 kernel: [2089793.618910] crc32_pclmul ghash_clmulni_intel pcbc hid_generic aesni_intel aes_x86_64 crypto_simd glue_helper cryptd hid_hyperv pata_acpi hyperv_fb cfbfillrect hyperv_keyboard cfbimgblt hid cfbcopyarea hv_netvsc hv_utils
  Apr 7 14:02:19 rancher1 kernel: [2089793.692250] CPU: 0 PID: 47 Comm: kswapd0 Not tainted 4.15.0-1040-azure #44-Ubuntu
  Apr 7 14:02:19 rancher1 kernel: [2089793.725316] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
  Apr 7 14:02:19 rancher1 kernel: [2089793.762206] RIP: 0010:split_swap_cluster+0x4f/0x70
  Apr 7 14:02:19 rancher1 kernel: [2089793.781768] RSP: 0018:aaf900fbfbe0 EFLAGS: 00010246
  Apr 7 14:02:19 rancher1 kernel: [2089793.800432] RAX: RBX: 007290de RCX: 007290de
  Apr 7 14:02:19 rancher1 kernel: [2089793.824572] RDX: aaf905001000 RSI: 00118df9 RDI: 007290de
  Apr 7 14:02:19 rancher1 kernel: [2089793.854139] RBP: aaf900fbfbe8 R08: 0001 R09: 9c647ffd4d00
  Apr 7 14:02:19 rancher1 kernel: [2089793.882588] R10: 9c647ffd4000 R11: 0001 R12: f61ac463
  Apr 7 14:02:19 rancher1 kernel: [2089793.909530] R13: f61ac4630080 R14: f61ac4638000 R15: f61ac4630040
  Apr 7 14:02:19 rancher1 kernel: [2089793.935871] FS: () GS:9c647fc0() knlGS:
  Apr 7 14:02:19 rancher1 kernel: [2089793.966483] CS: 0010 DS: ES: CR0: 80050033
  Apr 7 14:02:19 rancher1 kernel: [2089793.987904] CR2: 0007 CR3: 3240a005 CR4: 001606f0
  Apr 7 14:02:19 rancher1 kernel: [2089794.017641] Call Trace:
  Apr 7 14:02:19 rancher1 kernel: [2089794.028683] split_huge_page_to_list+0x76e/0x7f0
  Apr 7 14:02:19 rancher1 kernel: [2089794.051250] deferred_split_scan+0x177/0x2d0
  Apr 7 14:02:19 rancher1 kernel: [2089794.065213] shrink_slab.part.50+0x20b/0x440
  Apr 7 14:02:19 rancher1 kernel: [2089794.083856] shrink_node+0x2fc/0x310
  Apr 7 14:02:19 rancher1 kernel: [2089794.097963] kswapd+0x32a/0x770
  Apr 7 14:02:19 rancher1 kernel: [2089794.110523] kthread+0x105/0x140
  Apr 7 14:02:19 rancher1 kernel: [2089794.122680] ? mem_cgroup_shrink_node+0x190/0x190
  Apr 7 14:02:19 rancher1 kernel: [2089794.139139] ? kthread_destroy_worker+0x50/0x50
  Apr 7 14:02:19 rancher1 kernel: [2089794.155543] ret_from_fork+0x35/0x40
  Apr 7 14:02:19 rancher1 kernel: [2089794.167841] Code: c1 e3 07 48 c1 eb 10 48 8d 1c d8 48 89 df e8 49 9f 79 00 80 63 07 fb 48 85 db 74 17 48 89 df c6 07 00 0f 1f 40 00 31 c0 5b 5d c3 <80> 24 25 07 00 00 00 fb 31 c0 5b 5d c3 b8 f0 ff ff ff eb e9 0f
  Apr 7 14:02:19 rancher1 kernel: [2089794.237196] RIP: split_swap_cluster+0x4f/0x70 RSP: aaf900fbfbe0
  Apr 7 14:02:19 rancher1 kernel: [2089794.259910] CR2: 0007
  Apr 7 14:02:19 rancher1 kernel: [2089794.270891] ---[ end trace 5b797d89aee7fc1b ]---

  The machine became unstable after this until reboot; for example, reading some namespaced process' command arguments hung, so it is possible that there was some kernel data structure corruption. The machine was under heavy memory pressure when this happened.
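One way to answer the question about the swap device is to look at the kernel's own rotational flag for the disk behind each swap partition. The following is a minimal Python sketch, not something taken from this thread: the function names `parse_proc_swaps`, `rotational_flag` and `describe_swaps` are made up for illustration, and it assumes the usual Linux `/proc/swaps` layout and the `queue/rotational` attribute reachable under `/sys/class/block`. Swap files are listed but not resolved to their backing disk.

```python
import os


def parse_proc_swaps(text):
    """Return (path, type) pairs from /proc/swaps-style text."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def rotational_flag(dev_path, sysfs_class_block="/sys/class/block"):
    """Return True/False for a block device or partition, None if unknown."""
    # Resolve symlinks such as /dev/mapper/* to the kernel device name.
    name = os.path.basename(os.path.realpath(dev_path))
    node = os.path.realpath(os.path.join(sysfs_class_block, name))
    if not os.path.isdir(node):
        return None
    # A whole disk carries queue/ itself; for a partition the parent
    # directory in sysfs is the disk that carries it.
    for candidate in (node, os.path.dirname(node)):
        flag = os.path.join(candidate, "queue", "rotational")
        try:
            with open(flag) as f:
                return f.read().strip() == "1"
        except OSError:
            continue
    return None


def describe_swaps(swaps_text, sysfs_class_block="/sys/class/block"):
    """Build one human-readable line per active swap area."""
    lines = []
    for path, kind in parse_proc_swaps(swaps_text):
        if kind == "partition":
            flag = rotational_flag(path, sysfs_class_block)
            lines.append(f"{path}: rotational={flag}")
        else:
            lines.append(f"{path}: {kind} swap, backing disk not resolved")
    return lines
```

On the affected VM this could be run as `print("\n".join(describe_swaps(open("/proc/swaps").read())))`; a `rotational=True` line would indicate that the kernel treats that swap device as a spinning disk.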
[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster
In case anyone's interested, the upstream fix is:

commit c4f9c701f9b44299e6adbc58d1a4bb2c40383494
Author: Huang Ying
Date: Thu Oct 15 20:06:07 2020 -0700

    mm: fix a race during THP splitting
[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster
This bug is also present in current upstream (v5.8). Red Hat is working on the fix (ref: 1739593, private).
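For readers decoding the oops in the original report, two fields can be checked mechanically: the page-fault error code on the `Oops:` line and the faulting instruction, whose first byte is marked `<80>` on the `Code:` line. Below is a minimal Python sketch; the function names are hypothetical, and the instruction decoder deliberately handles only the single x86-64 encoding that appears here (`AND` of an 8-bit immediate into a byte at an absolute 32-bit displacement), so it is not a general disassembler.

```python
def decode_page_fault_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "protection_violation": bool(code & 0x01),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x02),
        # bit 2: 0 = kernel mode, 1 = user mode
        "user_mode": bool(code & 0x04),
        # bit 3: reserved bit set in a paging structure
        "reserved_bit": bool(code & 0x08),
        # bit 4: fault caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


def decode_and_imm8_abs32(insn):
    """Decode `AND byte [disp32], imm8` and return (address, mask).

    Only this one form is supported: opcode 0x80 with ModRM reg field 4,
    a SIB byte selecting no base and no index, a 32-bit displacement,
    and an 8-bit immediate.
    """
    if len(insn) != 8 or insn[0] != 0x80:
        raise ValueError("not an 8-byte opcode-0x80 instruction")
    modrm, sib = insn[1], insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, reg, rm) != (0, 4, 4):
        raise ValueError("ModRM is not AND with a SIB byte and mod=00")
    index, base = (sib >> 3) & 7, sib & 7
    if (index, base) != (4, 5):
        raise ValueError("SIB does not encode a bare disp32 address")
    address = int.from_bytes(insn[3:7], "little")
    mask = insn[7]
    return address, mask
```

With the values from the log, `decode_page_fault_error(0x0002)` reports a kernel-mode write to a non-present page, and `decode_and_imm8_abs32(bytes.fromhex("80 24 25 07 00 00 00 fb"))` returns address `0x7` with mask `0xfb`, that is, a write that clears bit `0x04` of the byte at address 7. This agrees with the faulting address reported in CR2.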