This bug was fixed in the package linux - 4.15.0-118.119

---------------
linux (4.15.0-118.119) bionic; urgency=medium

  * bionic/linux: 4.15.0-118.119 -proposed tracker (LP: #1894697)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for nvidia 450 and 450-server

  * cgroup refcount is bogus when cgroup_sk_alloc is disabled (LP: #1886860)
    - cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * [Hyper-V] VSS and File Copy daemons intermittently fails to start (LP: #1891224)
    - [Packaging] Bind hv_vss_daemon startup to hv_vss device
    - [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host (LP: #1837810)
    - KVM: fix overflow of zero page refcount with ksm running

  * Fix false-negative return value for rtnetlink.sh in kselftests/net (LP: #1890136)
    - selftests: rtnetlink: correct the final return value for the test
    - selftests: rtnetlink: make kci_test_encap() return sub-test result

  * Bionic update: upstream stable patchset 2020-08-18 (LP: #1892091)
    - USB: serial: qcserial: add EM7305 QDL product ID
    - USB: iowarrior: fix up report size handling for some devices
    - usb: xhci: define IDs for various ASMedia host controllers
    - usb: xhci: Fix ASMedia ASM1142 DMA addressing
    - Revert "ALSA: hda: call runtime_allow() for all hda controllers"
    - ALSA: seq: oss: Serialize ioctls
    - staging: android: ashmem: Fix lockdep warning for write operation
    - Bluetooth: Fix slab-out-of-bounds read in hci_extended_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_evt()
    - Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_with_rssi_evt()
    - omapfb: dss: Fix max fclk divider for omap36xx
    - binder: Prevent context manager from incrementing ref 0
    - vgacon: Fix for missing check in scrollback handling
    - mtd: properly check all write ioctls for permissions
    - leds: wm831x-status: fix use-after-free on unbind
    - leds: da903x: fix use-after-free on unbind
    - leds: lm3533: fix use-after-free on unbind
    - leds: 88pm860x: fix use-after-free on unbind
    - net/9p: validate fds in p9_fd_open
    - drm/nouveau/fbcon: fix module unload when fbcon init has failed for some reason
    - drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
    - i2c: slave: improve sanity check when registering
    - i2c: slave: add sanity check when unregistering
    - usb: hso: check for return value in hso_serial_common_create()
    - firmware: Fix a reference count leak.
    - cfg80211: check vendor command doit pointer before use
    - igb: reinit_locked() should be called with rtnl_lock
    - atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
    - tools lib traceevent: Fix memory leak in process_dynamic_array_len
    - Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
    - xattr: break delegations in {set,remove}xattr
    - ipv4: Silence suspicious RCU usage warning
    - ipv6: fix memory leaks on IPV6_ADDRFORM path
    - net: ethernet: mtk_eth_soc: fix MTU warnings
    - vxlan: Ensure FDB dump is performed under RCU
    - net: lan78xx: replace bogus endpoint lookup
    - hv_netvsc: do not use VF device if link is down
    - net: gre: recompute gre csum for sctp over gre tunnels
    - openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()
    - Revert "vxlan: fix tos value before xmit"
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test
    - rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
    - i40e: add num_vectors checker in iwarp handler
    - i40e: Wrong truncation from u16 to u8
    - i40e: Memory leak in i40e_config_iwarp_qvlist
    - Smack: fix use-after-free in smk_write_relabel_self()

  * Bionic update: upstream stable patchset 2020-08-11 (LP: #1891228)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make it work
    - net-sysfs: add a newline when printing 'tx_timeout' by sysfs
    - net: udp: Fix wrong clean up for IS_UDPLITE macro
    - rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA
    - AX.25: Prevent integer overflows in connect and sendmsg
    - ip6_gre: fix null-ptr-deref in ip6gre_init_net()
    - rtnetlink: Fix memory(net_device) leak when ->newlink fails
    - tcp: allow at most one TLP probe per flight
    - regmap: debugfs: check count when read regmap file
    - qrtr: orphan socket in qrtr_release()
    - sctp: shrink stream outq only when new outcnt < old outcnt
    - sctp: shrink stream outq when fails to do addstream reconf
    - crypto: ccp - Release all allocated memory if sha type is invalid
    - media: rc: prevent memory leak in cx23888_ir_probe
    - iio: imu: adis16400: fix memory leak
    - ath9k_htc: release allocated buffer if timed out
    - ath9k: release allocated buffer if timed out
    - PCI/ASPM: Disable ASPM on ASMedia ASM1083/1085 PCIe-to-PCI bridge
    - wireless: Use offsetof instead of custom macro.
    - ARM: 8986/1: hw_breakpoint: Don't invoke overflow handler on uaccess watchpoints
    - drm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl()
    - drm: hold gem reference until object is no longer accessed
    - f2fs: check memory boundary by insane namelen
    - f2fs: check if file namelen exceeds max value
    - 9p/trans_fd: abort p9_read_work if req status changed
    - 9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work
    - x86/build/lto: Fix truncated .bss with -fdata-sections
    - rds: Prevent kernel-infoleak in rds_notify_queue_get()
    - xfs: fix missed wakeup on l_flush_wait
    - net/x25: Fix x25_neigh refcnt leak when x25 disconnect
    - net/x25: Fix null-ptr-deref in x25_disconnect
    - selftests/net: rxtimestamp: fix clang issues for target arch PowerPC
    - sh: Fix validation of system call number
    - net: lan78xx: add missing endpoint sanity check
    - net: lan78xx: fix transfer-buffer memory leak
    - mlx4: disable device on shutdown
    - mlxsw: core: Increase scope of RCU read-side critical section
    - mlxsw: core: Free EMAD transactions using kfree_rcu()
    - ibmvnic: Fix IRQ mapping disposal in error path
    - bpf: Fix map leak in HASH_OF_MAPS map
    - mac80211: mesh: Free ie data when leaving mesh
    - mac80211: mesh: Free pending skb when destroying a mpath
    - arm64/alternatives: move length validation inside the subsection
    - arm64: csum: Fix handling of bad packets
    - usb: hso: Fix debug compile warning on sparc32
    - qed: Disable "MFW indication via attention" SPAM every 5 minutes
    - nfc: s3fwrn5: add missing release on skb in s3fwrn5_recv_frame
    - parisc: add support for cmpxchg on u8 pointers
    - net: ethernet: ravb: exit if re-initialization fails in tx timeout
    - Revert "i2c: cadence: Fix the hold bit setting"
    - x86/unwind/orc: Fix ORC for newly forked tasks
    - cxgb4: add missing release on skb in uld_send()
    - xen-netfront: fix potential deadlock in xennet_remove()
    - KVM: LAPIC: Prevent setting the tscdeadline timer if the lapic is hw disabled
    - x86/i8259: Use printk_deferred() to prevent deadlock
    - drm/amdgpu: fix multiple memory leaks in acp_hw_init
    - selftests/net: psock_fanout: fix clang issues for target arch PowerPC
    - net/mlx5: Verify Hardware supports requested ptp function on a given pin
    - random32: update the net random state on interrupt and activity
    - ARM: percpu.h: fix build error
    - random: fix circular include dependency on arm64 after addition of percpu.h
    - random32: remove net_rand_state from the latent entropy gcc plugin
    - random32: move the pseudo-random 32-bit definitions to prandom.h
    - ext4: fix direct I/O read error

 -- Kleber Sacilotto de Souza <kleber.so...@canonical.com>  Tue, 08 Sep 2020 12:09:02 +0200

** Changed in: linux (Ubuntu Bionic)
       Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2020-12888

** Changed in: linux (Ubuntu Focal)
       Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-19770

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837810

Title:
  KVM: Fix zero_page reference counter overflow when using KSM on KVM
  compute host

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1837810

  [Impact]

  We are seeing a problem on OpenStack compute nodes and KVM hosts,
  where a kernel oops is generated and all running KVM machines are
  placed into the paused state.

  This is caused by the kernel's reserved zero_page reference counter
  overflowing from a positive number to a negative number, and hitting
  a (WARN_ON_ONCE(page_ref_count(page) <= 0)) condition in
  try_get_page().

  This only happens if the machine has Kernel Samepage Merging (KSM)
  enabled, with "use_zero_pages" turned on.
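  To illustrate the overflow described above, here is a toy Python
  model. It is not kernel code: wrap_int32() and this try_get_page()
  are hypothetical stand-ins that assume a signed 32-bit counter and
  the "refuse when the count is <= 0" check quoted in the report.

```python
import ctypes
from typing import Tuple

INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce an integer to the range of a signed 32-bit counter."""
    return ctypes.c_int32(value).value


def try_get_page(refcount: int) -> Tuple[bool, int]:
    """Toy model of the check: refuse a new reference when count <= 0.

    Returns (success, new_refcount). Illustration only, not the kernel
    implementation.
    """
    if refcount <= 0:
        return False, refcount
    return True, wrap_int32(refcount + 1)


# The last successful increment wraps the counter to a negative value...
ok, count = try_get_page(INT32_MAX)
print(ok, count)

# ...so every later attempt to take a reference is refused.
ok, count = try_get_page(count)
print(ok, count)
```

  In this model, a counter that is only ever incremented eventually
  wraps negative, after which every caller sees a failure. That matches
  the symptom in this report, where the failure surfaces to QEMU as
  "kvm run failed Bad address".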
  Each time a new VM starts and the kernel does a KSM merge run during
  an EPT violation, the reference counter for the zero_page is
  incremented in try_async_pf() and never decremented. Eventually, the
  reference counter will overflow, causing the KVM subsystem to fail.

  Syslog:
  error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

  QEMU Logs:
  error: kvm run failed Bad address
  EAX=000afe00 EBX=0000000b ECX=00000080 EDX=00000cfe
  ESI=0003fe00 EDI=000afe00 EBP=00000007 ESP=00006d74
  EIP=000ee344 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
  CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
  SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
  DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
  FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
  GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
  LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
  TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
  GDT= 000f7040 00000037
  IDT= 000f707e 00000000
  CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
  DR6=00000000ffff0ff0 DR7=0000000000000400
  EFER=0000000000000000
  Code=c3 57 56 b8 00 fe 0a 00 be 00 fe 03 00 b9 80 00 00 00 89 c7 <f3> a5 a1 00 80 03 00 8b 15 04 80 03 00 a3 00 80 0a 00 89 15 04 80 0a 00 b8 ae e2 00 00 31

  Kernel Oops:
  [ 167.695986] WARNING: CPU: 1 PID: 3016 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
  [ 167.696023] CPU: 1 PID: 3016 Comm: CPU 0/KVM Tainted: G OE 4.15.0-106-generic #107~16.04.1-Ubuntu
  [ 167.696023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [ 167.696025] RIP: 0010:follow_page_pte+0x6f4/0x710
  [ 167.696026] RSP: 0018:ffffa81802023908 EFLAGS: 00010286
  [ 167.696027] RAX: ffffed8786e33a80 RBX: ffffed878c6d21b0 RCX: 0000000080000000
  [ 167.696027] RDX: 0000000000000000 RSI: 00003ffffffff000 RDI: 80000001b8cea225
  [ 167.696028] RBP: ffffa81802023970 R08: 80000001b8cea225 R09: ffff90c4d55fa340
  [ 167.696028] R10: 0000000000000000 R11: 0000000000000000 R12: ffffed8786e33a80
  [ 167.696029] R13: 0000000000000326 R14: ffff90c4db94fc50 R15: ffff90c4d55fa340
  [ 167.696030] FS: 00007f6a7798c700(0000) GS:ffff90c4edc80000(0000) knlGS:0000000000000000
  [ 167.696030] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 167.696031] CR2: 0000000000000000 CR3: 0000000315580002 CR4: 0000000000162ee0
  [ 167.696033] Call Trace:
  [ 167.696047] follow_pmd_mask+0x273/0x630
  [ 167.696049] follow_page_mask+0x178/0x230
  [ 167.696051] __get_user_pages+0xb8/0x740
  [ 167.696052] get_user_pages+0x42/0x50
  [ 167.696068] __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
  [ 167.696079] ? mmu_set_spte+0x1dd/0x3a0 [kvm]
  [ 167.696090] try_async_pf+0x66/0x220 [kvm]
  [ 167.696101] tdp_page_fault+0x14b/0x2b0 [kvm]
  [ 167.696104] ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
  [ 167.696114] kvm_mmu_page_fault+0x62/0x180 [kvm]
  [ 167.696117] handle_ept_violation+0xbc/0x160 [kvm_intel]
  [ 167.696119] vmx_handle_exit+0xa5/0x580 [kvm_intel]
  [ 167.696129] vcpu_enter_guest+0x414/0x1260 [kvm]
  [ 167.696138] ? kvm_arch_vcpu_load+0x4d/0x280 [kvm]
  [ 167.696148] kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
  [ 167.696157] ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
  [ 167.696165] kvm_vcpu_ioctl+0x33a/0x610 [kvm]
  [ 167.696166] ? do_futex+0x129/0x590
  [ 167.696171] ? __switch_to+0x34c/0x4e0
  [ 167.696174] ? __switch_to_asm+0x35/0x70
  [ 167.696176] do_vfs_ioctl+0xa4/0x600
  [ 167.696177] SyS_ioctl+0x79/0x90
  [ 167.696180] ? exit_to_usermode_loop+0xa5/0xd0
  [ 167.696181] do_syscall_64+0x73/0x130
  [ 167.696182] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
  [ 167.696184] RIP: 0033:0x7f6a80482007
  [ 167.696184] RSP: 002b:00007f6a7798b8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 167.696185] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f6a80482007
  [ 167.696185] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000016
  [ 167.696186] RBP: 000055fe135f3240 R08: 000055fe118be530 R09: 0000000000000001
  [ 167.696186] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [ 167.696187] R13: 00007f6a85852000 R14: 0000000000000000 R15: 000055fe135f3240
  [ 167.696188] Code: 4d 63 e6 e9 f2 fc ff ff 4c 89 45 d0 48 8b 47 10 e8 22 f0 9e 00 4c 8b 45 d0 e9 89 fc ff ff 4c 89 e7 e8 81 3f fd ff e9 aa fc ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 c1 fc ff ff 0f 1f 40 00 66 2e 0f
  [ 167.696200] ---[ end trace 7573f6868ea8f069 ]---

  [Fix]

  This was fixed in 5.6-rc1 with the following commit:

  commit 7df003c85218b5f5b10a7f6418208f31e813f38f
  Author: Zhuang Yanying <ann.zhuangyany...@huawei.com>
  Date: Sat Oct 12 11:37:31 2019 +0800
  Subject: KVM: fix overflow of zero page refcount with ksm running
  Link: https://github.com/torvalds/linux/commit/7df003c85218b5f5b10a7f6418208f31e813f38f

  The fix adds a check to see if the Page Frame Number (pfn) is linked
  to the zero page, and if it is, treats it as reserved. This has the
  effect that put_page() is no longer called on the zero_page, and
  reference counting is no longer needed.

  This is a clean cherry pick to Bionic and Focal kernels.

  [Testcase]

  Create a new KVM host, and make sure it has plenty of RAM. 16 GB
  should be okay.

  Install KVM packages:

  $ sudo apt install -y qemu-kvm libvirt-bin qemu-utils genisoimage virtinst

  Enable Kernel Samepage Merging, and use_zero_pages:

  $ echo 10000 | sudo tee /sys/kernel/mm/ksm/pages_to_scan
  $ echo 1 | sudo tee /sys/kernel/mm/ksm/run
  $ echo 1 | sudo tee /sys/kernel/mm/ksm/use_zero_pages

  I wrote a script which creates and destroys xenial KVM VMs in an
  infinite loop:

  https://paste.ubuntu.com/p/CvRTsDkdC7/

  Save the script to disk, and execute it:

  $ chmod +x ksm_refcnt_overflow.sh
  $ ./ksm_refcnt_overflow.sh

  Each time a VM is created and destroyed, the reference counter will
  increase.

  I wrote a kernel module which exposes a /proc interface that we can
  use to look at the value of the zero_page reference counter. It works
  by taking the memory allocated for the zero page, empty_zero_page
  (defined in arch/x86/include/asm/pgtable.h), and running
  virt_to_page() on it to get the page struct, which we can then
  dereference to get _refcount:

  https://paste.ubuntu.com/p/MJMN8jMVds/

  Save the module to disk, create its Makefile from the included
  documentation, and build it:

  $ make
  $ sudo insmod zero_page_refcount.ko

  From there, we can examine the reference counter with:

  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x687 or 1671
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x846 or 2118
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x9f8 or 2552
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0xcb2 or 3250

  We see it steadily increase. Instead of waiting months for it to
  overflow, I implemented a /proc entry to set it to near overflow.
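  As a rough aid for reading those numbers, the sketch below parses the
  sample lines shown above and reports how far each reading is from the
  signed 32-bit maximum. It assumes the output format printed by the
  test module is exactly "Zero Page Refcount: 0x... or N";
  parse_refcount() and headroom() are hypothetical helper names.

```python
import re

INT32_MAX = 2**31 - 1

# Matches lines such as "Zero Page Refcount: 0x687 or 1671".
LINE_RE = re.compile(r"Zero Page Refcount: (0x[0-9a-fA-F]+) or (-?\d+)")


def parse_refcount(line: str) -> int:
    """Return the signed decimal value from one line of module output."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("unrecognised line: %r" % line)
    return int(match.group(2))


def headroom(count: int) -> int:
    """Increments left before the counter reaches the signed 32-bit maximum."""
    return INT32_MAX - count


# Sample readings copied from the report above.
samples = [
    "Zero Page Refcount: 0x687 or 1671",
    "Zero Page Refcount: 0x846 or 2118",
    "Zero Page Refcount: 0x9f8 or 2552",
    "Zero Page Refcount: 0xcb2 or 3250",
]

counts = [parse_refcount(s) for s in samples]
# Growth between consecutive readings.
deltas = [later - earlier for earlier, later in zip(counts, counts[1:])]

for count in counts:
    print(count, headroom(count))
print(deltas)
```

  On a live host, the same functions could be fed the contents of
  /proc/zero_page_refcount at intervals instead of the fixed samples.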
  You can use this /proc entry with:

  $ cat /proc/zero_page_refcount_set
  Zero Page Refcount set to 0x1FFFFFFFFF000

  After that, wait a few seconds and the reference counter will
  overflow:

  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x7fffff16 or 2147483414
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x80000000 or -2147483648

  All VMs will become paused:

  $ virsh list
   Id    Name                           State
  ----------------------------------------------------
   1     instance-0                     paused
   2     instance-1                     paused

  QEMU will error out, and the kernel will oops with the messages in
  the impact section.

  I built a test kernel, which is available here:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf290373-test

  If you install the test kernel and try to reproduce, you will notice
  the reference counter is never incremented past 1:

  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x1 or 1
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x1 or 1
  $ cat /proc/zero_page_refcount
  Zero Page Refcount: 0x1 or 1

  This resolves the problem.

  [Regression Potential]

  While the change itself seems simple, it changes how the kernel
  treats the zero_page. The zero_page is important, since it is just a
  page full of 0's. Each time memory is allocated which is all 0s, the
  kernel sets it to use the zero_page to save memory. When an
  application writes to the buffer, an EPT violation happens, and the
  kernel does a COW to new pages to hold the data.

  The change is limited to how the KVM subsystem handles the zero_page.
  This will not break the entire kernel if a regression occurs, only
  KVM.

  If a regression were to occur, users could turn off KSM and disable
  use_zero_pages until a fix is ready, as this particular use of
  zero_pages is limited to KSM.

  The fix landed in upstream 5.6, and has not been backported to stable
  kernels.

  I have read a bit of the paging code, especially around where the
  zero_page is used, and where its reference counters were being
  incorrectly incremented.
  I think the fix is correct, and I believe it won't cause any
  regressions.