[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster

2020-11-03 Thread Marcelo Cerri
Thanks for reporting this bug, Hóka.

The commit posted by Jacob mentions that the bug happens when an HDD is
used as a swap device. Do you have something like that in your
environment? Also, if you have a reproducer that can trigger the problem,
let me know.

Thank you.
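
One possible way to answer the question above is to look at the
rotational flag the block layer exports for each swap device. The
sketch below is an illustration only and not part of the bug report: it
parses /proc/swaps and reads queue/rotational under /sys/class/block.
The helper names are hypothetical, and swap files are skipped because
mapping a file to its backing block device takes more work. Virtual
disks may report themselves as rotational regardless of the storage
behind them, so treat the result as a hint.

```python
import os


def parse_swap_devices(proc_swaps_text):
    """Return paths of partition-type swap areas listed in /proc/swaps text."""
    devices = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "partition":
            devices.append(fields[0])
    return devices


def is_rotational(device, sysfs_root="/sys/class/block"):
    """Return True/False from queue/rotational, or None if it cannot be read."""
    name = os.path.basename(os.path.realpath(device))
    node = os.path.realpath(os.path.join(sysfs_root, name))
    # A partition has no queue directory of its own; its parent disk does.
    for candidate in (node, os.path.dirname(node)):
        flag = os.path.join(candidate, "queue", "rotational")
        try:
            with open(flag) as handle:
                return handle.read().strip() == "1"
        except OSError:
            continue
    return None


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as handle:
            swaps = handle.read()
    except OSError:
        swaps = ""
    for dev in parse_swap_devices(swaps):
        print(dev, "rotational:", is_rotational(dev))
```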

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1823859

Title:
  NULL pointer dereference in split_swap_cluster

Status in linux-azure package in Ubuntu:
  New

Bug description:
  We have encountered the following oops on one of our VMs:

  Apr  7 14:02:19 rancher1 kernel: [2089793.273674] BUG: unable to handle 
kernel NULL pointer dereference at 0007
  Apr  7 14:02:19 rancher1 kernel: [2089793.282782] IP: 
split_swap_cluster+0x4f/0x70
  Apr  7 14:02:19 rancher1 kernel: [2089793.330631] PGD 0 P4D 0
  Apr  7 14:02:19 rancher1 kernel: [2089793.338279] Oops: 0002 [#1] SMP PTI
  Apr  7 14:02:19 rancher1 kernel: [2089793.350774] Modules linked in: ufs 
msdos xfs cmac arc4 md4 nls_utf8 cifs ccm fscache xt_tcpudp xt_set 
ip_set_hash_net ip_set iptable_raw vxlan ip6_udp_tunnel udp_tunnel xt_nat 
xt_mark xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 ansi_cprng veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter nf_nat br_netfilter bridge 
stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack 
iptable_security ip_tables x_tables aufs overlay mlx4_en pci_hyperv hv_balloon 
serio_raw joydev ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul
  Apr  7 14:02:19 rancher1 kernel: [2089793.618910]  crc32_pclmul 
ghash_clmulni_intel pcbc hid_generic aesni_intel aes_x86_64 crypto_simd 
glue_helper cryptd hid_hyperv pata_acpi hyperv_fb cfbfillrect hyperv_keyboard 
cfbimgblt hid cfbcopyarea hv_netvsc hv_utils
  Apr  7 14:02:19 rancher1 kernel: [2089793.692250] CPU: 0 PID: 47 Comm: 
kswapd0 Not tainted 4.15.0-1040-azure #44-Ubuntu
  Apr  7 14:02:19 rancher1 kernel: [2089793.725316] Hardware name: Microsoft 
Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
  Apr  7 14:02:19 rancher1 kernel: [2089793.762206] RIP: 
0010:split_swap_cluster+0x4f/0x70
  Apr  7 14:02:19 rancher1 kernel: [2089793.781768] RSP: 0018:aaf900fbfbe0 
EFLAGS: 00010246
  Apr  7 14:02:19 rancher1 kernel: [2089793.800432] RAX:  RBX: 
007290de RCX: 007290de
  Apr  7 14:02:19 rancher1 kernel: [2089793.824572] RDX: aaf905001000 RSI: 
00118df9 RDI: 007290de
  Apr  7 14:02:19 rancher1 kernel: [2089793.854139] RBP: aaf900fbfbe8 R08: 
0001 R09: 9c647ffd4d00
  Apr  7 14:02:19 rancher1 kernel: [2089793.882588] R10: 9c647ffd4000 R11: 
0001 R12: f61ac463
  Apr  7 14:02:19 rancher1 kernel: [2089793.909530] R13: f61ac4630080 R14: 
f61ac4638000 R15: f61ac4630040
  Apr  7 14:02:19 rancher1 kernel: [2089793.935871] FS:  () 
GS:9c647fc0() knlGS:
  Apr  7 14:02:19 rancher1 kernel: [2089793.966483] CS:  0010 DS:  ES:  
CR0: 80050033
  Apr  7 14:02:19 rancher1 kernel: [2089793.987904] CR2: 0007 CR3: 
3240a005 CR4: 001606f0
  Apr  7 14:02:19 rancher1 kernel: [2089794.017641] Call Trace:
  Apr  7 14:02:19 rancher1 kernel: [2089794.028683]  
split_huge_page_to_list+0x76e/0x7f0
  Apr  7 14:02:19 rancher1 kernel: [2089794.051250]  
deferred_split_scan+0x177/0x2d0
  Apr  7 14:02:19 rancher1 kernel: [2089794.065213]  
shrink_slab.part.50+0x20b/0x440
  Apr  7 14:02:19 rancher1 kernel: [2089794.083856]  shrink_node+0x2fc/0x310
  Apr  7 14:02:19 rancher1 kernel: [2089794.097963]  kswapd+0x32a/0x770
  Apr  7 14:02:19 rancher1 kernel: [2089794.110523]  kthread+0x105/0x140
  Apr  7 14:02:19 rancher1 kernel: [2089794.122680]  ? 
mem_cgroup_shrink_node+0x190/0x190
  Apr  7 14:02:19 rancher1 kernel: [2089794.139139]  ? 
kthread_destroy_worker+0x50/0x50
  Apr  7 14:02:19 rancher1 kernel: [2089794.155543]  ret_from_fork+0x35/0x40
  Apr  7 14:02:19 rancher1 kernel: [2089794.167841] Code: c1 e3 07 48 c1 eb 10 
48 8d 1c d8 48 89 df e8 49 9f 79 00 80 63 07 fb 48 85 db 74 17 48 89 df c6 07 
00 0f 1f 40 00 31 c0 5b 5d c3 <80> 24 25 07 00 00 00 fb 31 c0 5b 5d c3 b8 f0 ff 
ff ff eb e9 0f
  Apr  7 14:02:19 rancher1 kernel: [2089794.237196] RIP: 
split_swap_cluster+0x4f/0x70 RSP: aaf900fbfbe0
  Apr  7 14:02:19 rancher1 kernel: [2089794.259910] CR2: 0007
  Apr  7 14:02:19 rancher1 kernel: [2089794.270891] ---[ end trace 
5b797d89aee7fc1b ]---

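  The machine became unstable after this until reboot (for example, reading
  some namespaced process' command arguments hung), so it is possible that
  there was some kernel data structure corruption. The machine was under
  heavy memory pressure when this happened.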
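
For readers unfamiliar with the "Oops: 0002" line in the trace above,
here is a minimal sketch of how the x86 page-fault error code can be
decoded. The function name is made up for illustration; the bit
meanings are the standard x86 page-fault error code bits. For this
report the code decodes to a kernel-mode write to a non-present page,
which, together with the faulting address 0007 in CR2, is consistent
with a write through a NULL base pointer at a small offset.

```python
import re

# x86 page-fault error code bits.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel mode, 1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault was an instruction fetch


def decode_oops_error_code(line):
    """Extract the hex error code from an 'Oops: XXXX' line and describe it."""
    match = re.search(r"Oops:\s+([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no 'Oops:' error code found")
    code = int(match.group(1), 16)
    return {
        "code": code,
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


if __name__ == "__main__":
    print(decode_oops_error_code("Oops: 0002 [#1] SMP PTI"))
```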

[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster

2020-10-26 Thread Jacob
In case anyone's interested, the upstream fix is:

commit c4f9c701f9b44299e6adbc58d1a4bb2c40383494
Author: Huang Ying 
Date:   Thu Oct 15 20:06:07 2020 -0700
mm: fix a race during THP splitting


[Kernel-packages] [Bug 1823859] Re: NULL pointer dereference in split_swap_cluster

2020-08-04 Thread Jacob
This bug is also present in current upstream (v5.8).
Red Hat is working on the fix (ref: 1739593, private).

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1823859

Title:
  NULL pointer dereference in split_swap_cluster

Status in linux-azure package in Ubuntu:
  New

Bug description:
  We have encountered the following oops on one of our VMs:

  Apr  7 14:02:19 rancher1 kernel: [2089793.273674] BUG: unable to handle 
kernel NULL pointer dereference at 0007
  Apr  7 14:02:19 rancher1 kernel: [2089793.282782] IP: 
split_swap_cluster+0x4f/0x70
  Apr  7 14:02:19 rancher1 kernel: [2089793.330631] PGD 0 P4D 0
  Apr  7 14:02:19 rancher1 kernel: [2089793.338279] Oops: 0002 [#1] SMP PTI
  Apr  7 14:02:19 rancher1 kernel: [2089793.350774] Modules linked in: ufs 
msdos xfs cmac arc4 md4 nls_utf8 cifs ccm fscache xt_tcpudp xt_set 
ip_set_hash_net ip_set iptable_raw vxlan ip6_udp_tunnel udp_tunnel xt_nat 
xt_mark xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 ansi_cprng veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter nf_nat br_netfilter bridge 
stp llc nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack 
iptable_security ip_tables x_tables aufs overlay mlx4_en pci_hyperv hv_balloon 
serio_raw joydev ib_iser iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
crct10dif_pclmul
  Apr  7 14:02:19 rancher1 kernel: [2089793.618910]  crc32_pclmul 
ghash_clmulni_intel pcbc hid_generic aesni_intel aes_x86_64 crypto_simd 
glue_helper cryptd hid_hyperv pata_acpi hyperv_fb cfbfillrect hyperv_keyboard 
cfbimgblt hid cfbcopyarea hv_netvsc hv_utils
  Apr  7 14:02:19 rancher1 kernel: [2089793.692250] CPU: 0 PID: 47 Comm: 
kswapd0 Not tainted 4.15.0-1040-azure #44-Ubuntu
  Apr  7 14:02:19 rancher1 kernel: [2089793.725316] Hardware name: Microsoft 
Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
  Apr  7 14:02:19 rancher1 kernel: [2089793.762206] RIP: 
0010:split_swap_cluster+0x4f/0x70
  Apr  7 14:02:19 rancher1 kernel: [2089793.781768] RSP: 0018:aaf900fbfbe0 
EFLAGS: 00010246
  Apr  7 14:02:19 rancher1 kernel: [2089793.800432] RAX:  RBX: 
007290de RCX: 007290de
  Apr  7 14:02:19 rancher1 kernel: [2089793.824572] RDX: aaf905001000 RSI: 
00118df9 RDI: 007290de
  Apr  7 14:02:19 rancher1 kernel: [2089793.854139] RBP: aaf900fbfbe8 R08: 
0001 R09: 9c647ffd4d00
  Apr  7 14:02:19 rancher1 kernel: [2089793.882588] R10: 9c647ffd4000 R11: 
0001 R12: f61ac463
  Apr  7 14:02:19 rancher1 kernel: [2089793.909530] R13: f61ac4630080 R14: 
f61ac4638000 R15: f61ac4630040
  Apr  7 14:02:19 rancher1 kernel: [2089793.935871] FS:  () 
GS:9c647fc0() knlGS:
  Apr  7 14:02:19 rancher1 kernel: [2089793.966483] CS:  0010 DS:  ES:  
CR0: 80050033
  Apr  7 14:02:19 rancher1 kernel: [2089793.987904] CR2: 0007 CR3: 
3240a005 CR4: 001606f0
  Apr  7 14:02:19 rancher1 kernel: [2089794.017641] Call Trace:
  Apr  7 14:02:19 rancher1 kernel: [2089794.028683]  
split_huge_page_to_list+0x76e/0x7f0
  Apr  7 14:02:19 rancher1 kernel: [2089794.051250]  
deferred_split_scan+0x177/0x2d0
  Apr  7 14:02:19 rancher1 kernel: [2089794.065213]  
shrink_slab.part.50+0x20b/0x440
  Apr  7 14:02:19 rancher1 kernel: [2089794.083856]  shrink_node+0x2fc/0x310
  Apr  7 14:02:19 rancher1 kernel: [2089794.097963]  kswapd+0x32a/0x770
  Apr  7 14:02:19 rancher1 kernel: [2089794.110523]  kthread+0x105/0x140
  Apr  7 14:02:19 rancher1 kernel: [2089794.122680]  ? 
mem_cgroup_shrink_node+0x190/0x190
  Apr  7 14:02:19 rancher1 kernel: [2089794.139139]  ? 
kthread_destroy_worker+0x50/0x50
  Apr  7 14:02:19 rancher1 kernel: [2089794.155543]  ret_from_fork+0x35/0x40
  Apr  7 14:02:19 rancher1 kernel: [2089794.167841] Code: c1 e3 07 48 c1 eb 10 
48 8d 1c d8 48 89 df e8 49 9f 79 00 80 63 07 fb 48 85 db 74 17 48 89 df c6 07 
00 0f 1f 40 00 31 c0 5b 5d c3 <80> 24 25 07 00 00 00 fb 31 c0 5b 5d c3 b8 f0 ff 
ff ff eb e9 0f
  Apr  7 14:02:19 rancher1 kernel: [2089794.237196] RIP: 
split_swap_cluster+0x4f/0x70 RSP: aaf900fbfbe0
  Apr  7 14:02:19 rancher1 kernel: [2089794.259910] CR2: 0007
  Apr  7 14:02:19 rancher1 kernel: [2089794.270891] ---[ end trace 
5b797d89aee7fc1b ]---

  The machine became unstable after this until reboot (for example, reading
  some namespaced process' command arguments hung), so it is possible that
  there was some kernel data structure corruption. The machine was under
  heavy memory pressure when this happened.
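
Because the mail system hard-wrapped the log, a single kernel message
is often split across two or more lines above. The following is a
rough sketch, with hypothetical helper names, of how the wrapped
records could be rejoined and the call trace extracted. It assumes each
record starts with a syslog timestamp such as "Apr  7 14:02:19" and
that only the log text, not the surrounding prose, is passed in.

```python
import re

RECORD_START = re.compile(r"^\s*[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")
MESSAGE = re.compile(r"kernel: \[\s*\d+\.\d+\]\s*(.*)$")
FRAME = re.compile(r"^(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def unwrap_records(text):
    """Join continuation lines onto the syslog record they belong to."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(raw) or not records:
            records.append(line)
        else:
            records[-1] = records[-1] + " " + line
    return records


def call_trace(records):
    """Return (symbol, reliable) pairs for the frames after 'Call Trace:'."""
    frames = []
    in_trace = False
    for record in records:
        match = MESSAGE.search(record)
        if match is None:
            continue
        message = match.group(1).strip()
        if message == "Call Trace:":
            in_trace = True
            continue
        if in_trace:
            frame = FRAME.match(message)
            if frame is None:
                break  # first non-frame record ends the trace
            # A leading "? " marks an address the unwinder is unsure about.
            frames.append((frame.group(2), frame.group(1) is None))
    return frames
```

Applied to the trace in this report, the first frame returned would be
split_huge_page_to_list, followed by deferred_split_scan and the rest
of the kswapd reclaim path.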
