Public bug reported:

Hi,
I'm facing the following crash, now two times in a row while running the same test, so it seems somewhat reproducible:

[ 1444.399448] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 1444.431172] #PF: supervisor write access in kernel mode
[ 1444.454715] #PF: error_code(0x0002) - not-present page
[ 1444.478052] PGD 0 P4D 0
[ 1444.489448] Oops: 0002 [#1] SMP PTI
[ 1444.505120] CPU: 6 PID: 26233 Comm: chcpu Tainted: P W O 5.13.0-27-generic #29~20.04.1-Ubuntu
[ 1444.549884] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/22/2018
[ 1444.587322] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1444.611352] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1444.696490] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1444.720510] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1444.752719] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1444.784978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1444.816712] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1444.848844] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1444.881389] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1444.918201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1444.944633] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0
[ 1444.977001] Call Trace:
[ 1444.988071] ? blk_mq_exit_hctx+0x160/0x160
[ 1445.007037] cpuhp_invoke_callback+0x179/0x430
[ 1445.027179] cpuhp_invoke_callback_range+0x44/0x80
[ 1445.048737] _cpu_down+0x109/0x310
[ 1445.064062] cpu_down+0x36/0x60
[ 1445.077882] cpu_device_down+0x16/0x20
[ 1445.094741] cpu_subsys_offline+0xe/0x10
[ 1445.112439] device_offline+0x8e/0xc0
[ 1445.129064] online_store+0x4c/0x90
[ 1445.144835] dev_attr_store+0x17/0x30
[ 1445.161307] sysfs_kf_write+0x3e/0x50
[ 1445.177856] kernfs_fop_write_iter+0x138/0x1c0
[ 1445.198036] new_sync_write+0x117/0x1b0
[ 1445.215386] vfs_write+0x185/0x250
[ 1445.230649] ksys_write+0x67/0xe0
[ 1445.245565] __x64_sys_write+0x1a/0x20
[ 1445.262448] do_syscall_64+0x61/0xb0
[ 1445.278585] ? do_syscall_64+0x6e/0xb0
[ 1445.295940] ? asm_exc_page_fault+0x8/0x30
[ 1445.314969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1445.338356] RIP: 0033:0x7f1c8fda30a7
[ 1445.355062] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 1445.440161] RSP: 002b:00007fffed1c4418 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 1445.474829] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007f1c8fda30a7
[ 1445.507219] RDX: 0000000000000001 RSI: 0000559369f25869 RDI: 0000000000000004
[ 1445.539438] RBP: 00007f1c8fe88500 R08: 0000000000000000 R09: 00007fffed1c43c0
[ 1445.572547] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 1445.604842] R13: 00007fffed1c4420 R14: 0000559369f25869 R15: 0000000000000001
[ 1445.636897] Modules linked in: vhost_net tap ebtable_filter ebtables veth nbd xt_comment zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_tables ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 uio_pci_generic uio nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp rpcrdma kvm_intel sunrpc kvm rdma_ucm ib_iser libiscsi scsi_transport_iscsi rapl ib_umad rdma_cm ib_ipoib intel_cstate efi_pstore iw_cm ib_cm hpilo ioatdma acpi_ipmi acpi_tad ipmi_si mac_hid acpi_power_meter sch_fq_codel ipmi_devintf ipmi_msghandler msr
[ 1445.636951] ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core ses enclosure mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt fb_sys_fops aesni_intel mlx5_core cec ixgbe pci_hyperv_intf crypto_simd xfrm_algo psample nvme cryptd rc_core hpsa mlxfw i2c_i801 dca xhci_pci drm i2c_smbus lpc_ich tg3 tls xhci_pci_renesas nvme_core mdio scsi_transport_sas wmi
[ 1446.267521] CR2: 0000000000000008
[ 1446.282506] ---[ end trace 99f81ab62ed1f929 ]---
[ 1446.308857] RIP: 0010:blk_mq_hctx_notify_dead+0xc7/0x190
[ 1446.311595] ixgbe 0000:04:00.1 eno50: NIC Link is Up 10 Gbps, Flow Control: None
[ 1446.332973] Code: 04 49 8d 54 05 08 4c 01 e8 48 8b 48 08 48 39 ca 74 66 48 8b 48 08 48 39 ca 74 21 48 8b 3a 48 8b 4d c8 48 8b 72 08 48 89 7d c8 <4c> 89 77 08 48 89 0e 48 89 71 08 48 89 12 48 89 50 10 41 0f b7 84
[ 1446.332975] RSP: 0018:ffffbf5d818dbbf0 EFLAGS: 00010282
[ 1446.332977] RAX: ffffdf5d7fb788c0 RBX: 0000000000000000 RCX: ffffbf5d818dbbf0
[ 1446.332978] RDX: ffffdf5d7fb788c8 RSI: 0000000000000000 RDI: 0000000000000000
[ 1446.332978] RBP: ffffbf5d818dbc28 R08: 0000000000000000 R09: ffffbf5d818dbae8
[ 1446.332979] R10: 0000000000000001 R11: 0000000000000001 R12: ffff983d939b0000
[ 1446.332980] R13: ffffdf5d7fb788c0 R14: ffffbf5d818dbbf0 R15: 0000000000000005
[ 1446.332981] FS: 00007f1c8fe88580(0000) GS:ffff9844dfb80000(0000) knlGS:0000000000000000
[ 1446.332982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1446.332983] CR2: 0000000000000008 CR3: 000000064ddf6006 CR4: 00000000001706e0

The system is partially hung afterwards. I can't get back to libvirt (neither restarting the service nor spawning a new guest works) or to openvswitch (ovs-vsctl show); all those calls get stuck, while other things still somewhat work. A new ssh login also gets stuck, so debugging after the crash is very limited.

The order in which the tests do things is:
1. set up a simple openvswitch
2. start the libvirt network for this OVS instance
3. disable cpus 5-11 (as I want the test to only have 0-4)
4. start a VM guest using that OVS network

It seems (based on log timing) that the crash might happen at either chcpu or guest start.

--- details ---

I start the guest via uvtool:
$ uvt-kvm create --run-script-once=init-ens4-dhcp.sh --memory 2048 --template guest-openvswitch-1.xml --machine-type ubuntu --unsafe-caching --cpu 4 --ssh-public-key-file /home/ubuntu/.ssh/id_rsa.pub --password=ubuntu guest-openvswitch-1 release=focal arch=amd64 label=daily

Differences to a "normal" guest are:
- using an openvswitch connection
- using huge pages

The template used looks like this:

<domain type='kvm'>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <!-- we pass as much as possible -->
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0-3' memory='2097152' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size="2" unit="M" nodeset="0"/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <devices>
    <interface type='network'>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='network'>
      <source network='ovsbr0'/>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/3'/>
      <target port='0'/>
    </serial>
    <video/>
  </devices>
</domain>

The OVS setup is rather simple: one internal bridge and one upstream port, nothing "too special". It looks like this:

+ ovs-vsctl show
8dfc2067-7b9b-48d7-a50a-df17bbd3cb6c
    Bridge ovsbr0
        Port eno49
            Interface eno49
        Port ovsbr0
            Interface ovsbr0
                type: internal
    ovs_version: "2.13.5

Libvirt knows about that OVS bridge and has a network to use it:

<network>
  <name>ovsbr0</name>
  <forward mode='bridge'/>
  <bridge name='ovsbr0'/>
  <virtualport type='openvswitch'/>
</network>

---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 29 07:06 seq
 crw-rw---- 1 root audio 116, 33 Mar 29 07:06 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant DL360 Gen9
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.13.0-27-generic root=UUID=c941b173-e6b5-485a-a02b-8d966b8d3c73 ro --- console=ttyS1,115200
ProcVersionSignature: Ubuntu 5.13.0-27.29~20.04.1-generic 5.13.19
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-27-generic N/A
 linux-backports-modules-5.13.0-27-generic N/A
 linux-firmware 1.187.29
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux
5.13.0-27-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: kvm libvirt
_MarkForUpload: True
dmi.bios.date: 01/22/2018
dmi.bios.release: 2.56
dmi.bios.vendor: HP
dmi.bios.version: P89
dmi.board.name: ProLiant DL360 Gen9
dmi.board.vendor: HP
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.60
dmi.modalias: dmi:bvnHP:bvrP89:bd01/22/2018:br2.56:efr2.60:svnHP:pnProLiantDL360Gen9:pvr:rvnHP:rnProLiantDL360Gen9:rvr:cvnHP:ct23:cvr:sku780018-S01:
dmi.product.family: ProLiant
dmi.product.name: ProLiant DL360 Gen9
dmi.product.sku: 780018-S01
dmi.sys.vendor: HP

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: apport-collected focal uec-images

** Tags added: apport-collected focal uec-images

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1966870

Title:
  Focal 20.04.4 crashing when using openvswitch/hugepages

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1966870/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
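
For anyone trying to reproduce this, the test order from the bug description (set up OVS, start the libvirt network, disable CPUs 5-11, start the guest) could be scripted roughly as below. This is only a sketch, not the actual test script. The report shows the resulting `ovs-vsctl show` output but not how the bridge was created, so the `ovs-vsctl add-br`/`add-port` and `virsh net-define`/`net-start` calls are assumptions. `chcpu -d` is assumed from the `Comm: chcpu` entry in the oops, and `ovsbr0-net.xml` is a hypothetical file holding the `<network>` XML from the report. Only the `uvt-kvm create` command line is taken verbatim. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the test order from the bug report. Dry-run by default:
# commands are printed, not executed. Set DRY_RUN=0 (as root, on a
# disposable machine) to really run them.
set -eu

BRIDGE=ovsbr0            # bridge name from the report
UPLINK=eno49             # upstream port from the report
NET_XML=ovsbr0-net.xml   # hypothetical file with the <network> XML
OFFLINE_CPUS=5-11        # CPUs disabled in step 3

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. set up a simple openvswitch (assumed commands)
    run ovs-vsctl add-br "$BRIDGE"
    run ovs-vsctl add-port "$BRIDGE" "$UPLINK"
    # 2. start the libvirt network for this OVS instance (assumed commands)
    run virsh net-define "$NET_XML"
    run virsh net-start "$BRIDGE"
    # 3. disable cpus 5-11 (assumed invocation)
    run chcpu -d "$OFFLINE_CPUS"
    # 4. start a VM guest using that OVS network (command line from the report)
    run uvt-kvm create --run-script-once=init-ens4-dhcp.sh --memory 2048 \
        --template guest-openvswitch-1.xml --machine-type ubuntu \
        --unsafe-caching --cpu 4 \
        --ssh-public-key-file /home/ubuntu/.ssh/id_rsa.pub \
        --password=ubuntu guest-openvswitch-1 \
        release=focal arch=amd64 label=daily
}

reproduce
```

Keeping the steps behind a single `run` wrapper makes it easy to review the exact sequence first and to check timestamps per step when trying to tell whether the oops fires during the chcpu step or at guest start.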