[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
** Changed in: linux-azure (Ubuntu Groovy)
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1894893

Title:
  [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1894893/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
This bug was fixed in the package linux-azure - 5.4.0-1032.33

---
linux-azure (5.4.0-1032.33) focal; urgency=medium

  * focal/linux-azure: 5.4.0-1032.33 -proposed tracker (LP: #1903162)
  * Focal update: v5.4.66 upstream stable release (LP: #1896824)
    - [Config] azure: updateconfigs for VGACON_SOFT_SCROLLBACK
  * [linux-azure][hibernation] Mellanox CX4 NIC's TX/RX packets stop increasing after hibernation/resume (LP: #1894896)
    - hv_netvsc: Fix hibernation for mlx5 VF driver
  * [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size (LP: #1894893)
    - PCI: hv: Fix hibernation in case interrupts are not re-created
  * linux-azure: build and include the tcm_loop module to the main kernel package (LP: #1791794)
    - [Config] linux-azure: CONFIG_LOOPBACK_TARGET=m (tcm_loop)
  * [linux-azure] Two Fixes For kdump Over Network (LP: #1883261)
    - PCI: hv: Fix the PCI HyperV probe failure path to release resource properly
    - PCI: hv: Retry PCI bus D0 entry on invalid device state

  [ Ubuntu: 5.4.0-55.61 ]

  * focal/linux: 5.4.0-55.61 -proposed tracker (LP: #1903175)
  * Update kernel packaging to support forward porting kernels (LP: #1902957)
    - [Debian] Update for leader included in BACKPORT_SUFFIX
  * Avoid double newline when running insertchanges (LP: #1903293)
    - [Packaging] insertchanges: avoid double newline
  * EFI: Fails when BootCurrent entry does not exist (LP: #183)
    - efivarfs: Replace invalid slashes with exclamation marks in dentries.
  * CVE-2020-14351
    - perf/core: Fix race in the perf_mmap_close() function
  * raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations (LP: #1896578)
    - md: add md_submit_discard_bio() for submitting discard bio
    - md/raid10: extend r10bio devs to raid disks
    - md/raid10: pull codes that wait for blocked dev into one function
    - md/raid10: improve raid10 discard request
    - md/raid10: improve discard request for far layout
    - dm raid: fix discard limits for raid1 and raid10
    - dm raid: remove unnecessary discard limits for raid10
  * Bionic: btrfs: kernel BUG at /build/linux-eTBZpZ/linux-4.15.0/fs/btrfs/ctree.c:3233! (LP: #1902254)
    - btrfs: drop unnecessary offset_in_page in extent buffer helpers
    - btrfs: extent_io: do extra check for extent buffer read write functions
    - btrfs: extent-tree: kill BUG_ON() in __btrfs_free_extent()
    - btrfs: extent-tree: kill the BUG_ON() in insert_inline_extent_backref()
    - btrfs: ctree: check key order before merging tree blocks
  * Ethernet no link lights after reboot (Intel i225-v 2.5G) (LP: #1902578)
    - igc: Add PHY power management control
  * Undetected Data corruption in MPI workloads that use VSX for reductions on POWER9 DD2.1 systems (LP: #1902694)
    - powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
    - selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
  * [20.04 FEAT] Support/enhancement of NVMe IPL (LP: #1902179)
    - s390: nvme ipl
    - s390: nvme reipl
    - s390/ipl: support NVMe IPL kernel parameters
  * uvcvideo: add mapping for HEVC payloads (LP: #1895803)
    - media: uvcvideo: Add mapping for HEVC payloads
  * Focal update: v5.4.73 upstream stable release (LP: #1902115)
    - ibmveth: Switch order of ibmveth_helper calls.
    - ibmveth: Identify ingress large send packets.
    - ipv4: Restore flowi4_oif update before call to xfrm_lookup_route
    - mlx4: handle non-napi callers to napi_poll
    - net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()
    - net: fec: Fix PHY init after phy_reset_after_clk_enable()
    - net: fix pos incrementment in ipv6_route_seq_next
    - net/smc: fix valid DMBE buffer sizes
    - net/tls: sendfile fails with ktls offload
    - net: usb: qmi_wwan: add Cellient MPL200 card
    - tipc: fix the skb_unshare() in tipc_buf_append()
    - socket: fix option SO_TIMESTAMPING_NEW
    - can: m_can_platform: don't call m_can_class_suspend in runtime suspend
    - can: j1935: j1939_tp_tx_dat_new(): fix missing initialization of skbcnt
    - net: j1939: j1939_session_fresh_new(): fix missing initialization of skbcnt
    - net/ipv4: always honour route mtu during forwarding
    - net_sched: remove a redundant goto chain check
    - r8169: fix data corruption issue on RTL8402
    - cxgb4: handle 4-tuple PEDIT to NAT mode translation
    - binder: fix UAF when releasing todo list
    - ALSA: bebob: potential info leak in hwdep_read()
    - ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
    - nvme-pci: disable the write zeros command for Intel 600P/P3100
    - chelsio/chtls: fix socket lock
    - chelsio/chtls: correct netdevice for vlan interface
    - chelsio/chtls: correct function return and return type
    - ibmvnic: save changed mac address to adapter->mac_addr
    - net: ftgmac100: Fix Aspeed ast2600 TX
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
For Groovy, the proposed fix has already been applied to the generic groovy/linux kernel as part of "Groovy update: v5.8.17 upstream stable release" (bug 1902137). As a result, the patch applied to the linux-azure branch was dropped during the rebase, so the linux-azure package no longer carries the BugLink to this bug report. Because of that, this bug will not be closed automatically when the package is released.
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
** Changed in: linux-azure (Ubuntu Groovy)
   Status: In Progress => Fix Committed
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
** Changed in: linux-azure (Ubuntu Focal)
   Status: In Progress => Fix Committed
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
The fix was already reviewed/acked on the mailing list.
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
** Changed in: linux-azure (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu Groovy)
   Importance: Undecided => Medium

** Changed in: linux-azure (Ubuntu)
   Status: New => Invalid
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
SRU submission: https://lists.ubuntu.com/archives/kernel-team/2020-October/114241.html
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
** Also affects: linux-azure (Ubuntu Groovy)
   Importance: Undecided
   Status: New

** Also affects: linux-azure (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Changed in: linux-azure (Ubuntu Focal)
   Status: New => In Progress

** Changed in: linux-azure (Ubuntu Groovy)
   Status: New => Fix Committed

** Changed in: linux-azure (Ubuntu Groovy)
   Status: Fix Committed => In Progress

** Description changed:

+ [Impact]
+
There are failed logs after resume from hibernation in NV6 (GPU passthrough size) VM in Azure:

[ 1432.153730] hv_pci 47505500-0001--3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001--3130-444531334632: hv_irq_unmask() failed: 0x5

This happens to the latest stable release of the linux-azure 5.4.0-1023.23 kernel and the latest mainline linux kernel.

- How reproducible:
+ [Test Case]
+
+ How reproducible:
100%

Steps to Reproduce:
1. Start a Standard_NV6 VM in Azure and enable hibernation properly (please refer to https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1880032/comments/14 )
E.g. here I create a Generation-1 Ubuntu 20.04 Standard NV6_Promo (6 vcpus, 56 GiB memory) VM in East US 2.

2. Make sure the in-kernel open-source nouveau driver is loaded, or blacklist the nouveau driver and install the official Nvidia GPU driver (please follow https://docs.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup : "Install GRID drivers on NV or NVv3-series VMs" -- the most important step to run the "./NVIDIA-Linux-x86_64-grid.run".)

3. Run hibernation from serial console
# systemctl hibernate

4. After hibernation finishes, start VM and check dmesg
# dmesg|grep fail

Actual results:
[ 1432.153730] hv_pci 47505500-0001--3130-444531334632: hv_irq_unmask() failed: 0x5
[ 1432.167910] hv_pci 47505500-0001--3130-444531334632: hv_irq_unmask() failed: 0x5

And /proc/interrupts shows that the GPU interrupts are no longer happening.
Expected results:
No failed logs, and the GPU interrupt should still happen after hibernation.

+ [Regression Potential]
+
+ The fix touches the pci-hyperv driver and can compromise the Hyper-V guest
+ drivers. However, the change focuses on the execution path used for
+ hibernation, which is still not officially supported.
+
+ [Other info]
BUG FIX:
I made a fix here: https://lkml.org/lkml/2020/9/4/1268.
Without the patch, we see the error "hv_pci 47505500-0001--3130-444531334632: hv_irq_unmask() failed: 0x5" during hibernation when the VM has the Nvidia GPU driver loaded, and after hibernation the GPU driver can no longer receive any MSI/MSI-X interrupts when we check /proc/interrupts.
With the patch, we should no longer see the error, and the GPU driver should still receive interrupts after hibernation.
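The /proc/interrupts check in the test case can be scripted. Below is a minimal POSIX shell sketch. It assumes the GPU's MSI/MSI-X lines in /proc/interrupts contain the driver name (`nvidia` for the GRID driver, `nouveau` for the in-kernel one). The helper names `irq_total` and `irq_growing` are hypothetical and are not part of any Ubuntu or Azure tooling. The sketch sums the per-CPU counts of matching lines in two saved snapshots and reports whether the total increased.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped by Ubuntu or Azure) for checking whether a
# device's interrupts are still being delivered after resume from hibernation.
# Assumption: the device's lines in /proc/interrupts contain PATTERN.

# irq_total FILE PATTERN
# Print the sum of all per-CPU counts on the lines of FILE (a saved copy of
# /proc/interrupts) that contain PATTERN, e.g. "nvidia" or "nouveau".
irq_total() {
    awk -v pat="$2" '
        NR == 1 { ncpu = NF; next }      # header row has one "CPUn" column per CPU
        index($0, pat) {
            # Field 1 is the IRQ number ("24:"); fields 2..ncpu+1 are counts.
            for (i = 2; i <= ncpu + 1; i++) sum += $i
        }
        END { print sum + 0 }
    ' "$1"
}

# irq_growing BEFORE AFTER PATTERN
# Exit status 0 if the matching interrupt total grew between the two
# snapshots, non-zero otherwise.
irq_growing() {
    before=$(irq_total "$1" "$3")
    after=$(irq_total "$2" "$3")
    [ "$after" -gt "$before" ]
}
```

As an example, take one snapshot after resume (`cp /proc/interrupts /tmp/irq.1`), keep the GPU busy for a few seconds, and take a second one (`cp /proc/interrupts /tmp/irq.2`). Then run `irq_growing /tmp/irq.1 /tmp/irq.2 nvidia || echo "GPU interrupts stalled"`, alongside `dmesg | grep -F 'hv_irq_unmask() failed'`. Comparing two snapshots taken after resume avoids depending on how the counters behave across the hibernation cycle itself.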
[Bug 1894893] Re: [linux-azure][hibernation] GPU device no longer working after resume from hibernation in NV6 VM size
The fix is in the PCI tree now: "PCI: hv: Fix hibernation in case interrupts are not re-created" ( https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/hv&id=915cff7f38c5e4d47f187f8049245afc2cb3e503 )