[Bug 1887490] Re: [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model
I just came across this bug while trying to deal with broken live migrations in our hypervisor setup. We are experiencing the regression that Markus describes when trying to upgrade libvirt from 6.0.0-0ubuntu8.3 to 6.0.0-0ubuntu8.5 in Ubuntu Focal. Live migrations no longer work when using EPYC-IBPB as the CPU model for our guests:

error: operation failed: guest CPU doesn't match specification: extra features: npt,nrip-save

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1887490

Title: [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1887490/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state
Here is a similar portion of the ovs log and gdb trace from a host running openvswitch 2.11.0-0ubuntu2:

Thu Oct 31 18:20:23 2019-2019-10-31T17:20:23.521Z|1|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator124 to quiesce
Thu Oct 31 18:20:24 2019-2019-10-31T17:20:24.521Z|2|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator124 to quiesce
Thu Oct 31 18:20:59 2019-2019-10-31T17:20:26.520Z|3|ovs_rcu(urcu4)|WARN|blocked 4000 ms waiting for revalidator124 to quiesce

In the trace:

29 Thread 0x7f72f97fa700 (LWP 26608) "revalidator124" 0x7f734aee237b in futex_abstimed_wait (private=, abstime=0x0, expected=10, futex_word=0x5577bab397c0 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:172

The full trace is attached in gdbwrap.1572542426.log.gz.

The traces are a bit of hocus-pocus to me and I really don't have a clue what is going on there, but I hope they help you make sense of the hang.

** Attachment added: "GDB trace ovs 2.11.0 during when it hangs"
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592/+attachment/5308469/+files/gdbwrap.1572542426.log.gz
[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state
** Attachment added: "GDB trace ovs 2.9.2 during when it hangs"
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592/+attachment/5308470/+files/gdbwrap.1566706577.log.gz
[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state
I just came across this bug report and would like to share my experience. I have been having similar issues with openvswitch on 6 servers since we upgraded from 16.04 to 18.04 about 2 years ago. Our biggest problem is our inability to reproduce it. We just see openvswitch hanging from time to time. Sometimes it takes a day to get stuck, sometimes it takes months. The only way to recover is to restart openvswitch.

Right now we are running a backport of openvswitch from Disco (2.11.0-0ubuntu2) in Bionic. With that version we are having the same issues as with the 2.9.2-0ubuntu0.18.04.3 version that Bionic ships, which we had installed previously. I have gdb traces from both versions, which I will attach.

Here is a small portion of the ovs log and gdb trace from openvswitch 2.9.2-0ubuntu0.18.04.3:

Sun Aug 25 06:16:14 2019-2019-08-25T04:16:14.943Z|1|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:15 2019-2019-08-25T04:16:15.943Z|2|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:50 2019-2019-08-25T04:16:17.943Z|3|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for revalidator127 to quiesce

Small portion of the trace:

32 Thread 0x7f1bfa7fc700 (LWP 1461) "revalidator127" 0x7f1c61aeb37b in futex_abstimed_wait (private=, abstime=0x0, expected=10, futex_word=0x55e4ed0aa800 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:172

The full trace is attached in gdbwrap.1566706577.log.gz (Openvswitch 2.9.2).
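For anyone who wants to scan a longer log for the same symptom, here is a minimal sketch that pulls the "blocked N ms waiting for ... to quiesce" warnings out of log lines and reports the longest stall per thread. The regular expression is inferred only from the three lines quoted above, and the names `BLOCKED_RE`, `longest_block_per_thread` and `SAMPLE_LOG` are made up for this example, so treat it as an assumption rather than a description of the OVS log format in general.

```python
import re

# Pattern inferred from the warnings quoted in this report (an assumption,
# not an official description of the ovs-vswitchd log format).
BLOCKED_RE = re.compile(
    r"\|ovs_rcu\((?P<waiter>[^)]+)\)\|WARN\|blocked (?P<ms>\d+) ms "
    r"waiting for (?P<thread>\S+) to quiesce"
)


def longest_block_per_thread(lines):
    """Return {thread name: longest reported stall in milliseconds}."""
    worst = {}
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match is None:
            continue  # not an ovs_rcu "blocked" warning
        thread = match.group("thread")
        worst[thread] = max(worst.get(thread, 0), int(match.group("ms")))
    return worst


# The three warnings from the 2.9.2 log, copied from the message above.
SAMPLE_LOG = [
    "Sun Aug 25 06:16:14 2019-2019-08-25T04:16:14.943Z|1|ovs_rcu(urcu4)|WARN|"
    "blocked 1000 ms waiting for revalidator127 to quiesce",
    "Sun Aug 25 06:16:15 2019-2019-08-25T04:16:15.943Z|2|ovs_rcu(urcu4)|WARN|"
    "blocked 2000 ms waiting for revalidator127 to quiesce",
    "Sun Aug 25 06:16:50 2019-2019-08-25T04:16:17.943Z|3|ovs_rcu(urcu4)|WARN|"
    "blocked 4001 ms waiting for revalidator127 to quiesce",
]

if __name__ == "__main__":
    for thread, ms in longest_block_per_thread(SAMPLE_LOG).items():
        print(f"{thread}: blocked for at least {ms} ms")
```

Feeding it the full log instead of `SAMPLE_LOG` should show whether the same revalidator thread is always the one that fails to quiesce.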
[Bug 1853614] Re: System stuck in reboot loop on AMD EPYC 7542 32-Core Processor
I can confirm there are no issues when booting with the package from the ubuntu-security-proposed PPA.

amd64-microcode:
  Installed: 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1
  Candidate: 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1
  Version table:
 *** 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1 500
        500 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

/proc/cpuinfo:
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7402P 24-Core Processor
stepping: 0
microcode : 0x830101c
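To compare the running microcode revision across several hosts, a small sketch like the following can turn /proc/cpuinfo-style text into a dictionary. The sample text is the excerpt quoted above; `parse_cpuinfo` and `SAMPLE_CPUINFO` are names invented for this example, and the parser only assumes the "key : value" layout visible in that excerpt.

```python
def parse_cpuinfo(text):
    """Parse the first processor block of /proc/cpuinfo-style text into a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Excerpt copied from the message above.
SAMPLE_CPUINFO = """\
vendor_id : AuthenticAMD
cpu family : 23
model : 49
model name : AMD EPYC 7402P 24-Core Processor
stepping: 0
microcode : 0x830101c
"""

if __name__ == "__main__":
    info = parse_cpuinfo(SAMPLE_CPUINFO)
    # int(..., 16) accepts the "0x" prefix, so revisions can be compared numerically.
    revision = int(info["microcode"], 16)
    print(info["model name"], "family", info["cpu family"],
          "model", info["model"], "microcode", hex(revision))
```

On a live system the same function could be called on the contents of /proc/cpuinfo read from disk.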
[Bug 1853614] Re: System stuck in reboot loop on AMD EPYC 7542 32-Core Processor
I had the same problem on three servers running with an AMD EPYC 7402P 24-Core Processor. After removing the amd64-microcode 3.20191021.1ubuntu0.18.04.2 package the servers did not end up in a reboot loop anymore and were able to boot.
[Bug 1744988] Re: time drifting on linux-hwe kernels
My apologies, I was not aware of this. After your message I have started testing the proposed kernel and can verify it is working as expected.

** Tags removed: verification-needed-artful
** Tags added: verification-done-artful
[Bug 1744988] Re: time drifting on linux-hwe kernels
I have been running your build on two servers for about an hour now. Timing is stable on both of them.
[Bug 1744988] Re: time drifting on linux-hwe kernels
I have created a patch file, tsc.patch, which applies the upstream patch work done by Len Brown to the 4.13.0-31-generic kernel. This resolves the time issues we are having.

** Patch added: "tsc.patch"
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1744988/+attachment/5046309/+files/tsc.patch
[Bug 1744988] Re: time drifting on linux-hwe kernels
Although probably obvious, the mainline kernels that I tested were all downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/
[Bug 1744988] [NEW] time drifting on linux-hwe kernels
Public bug reported:

We observe NTP time drift on two servers running hwe kernels in Xenial. A few weeks ago we wanted to switch from 4.4 to 4.10. After rebooting the servers into the 4.10 kernel we saw a big time offset within minutes of booting. Despite running ntpd, the clock would not keep up; the offset stayed and kept growing over time. After rebooting back into 4.4 we immediately noticed that the time stayed normal. Since then I have tested about a dozen kernel versions, which makes me think something was introduced in kernel 4.10 that makes the clock go out of sync.

So what do we observe?

After 1 min uptime:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7    0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7    0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7    0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7    0.781   90.488  70.574

A couple of minutes later (and also hours/days later, the offset just keeps growing over time):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377    0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377    0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377    0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377    0.934  402.013 151.384

As mentioned, I tested about a dozen kernels and I thought I had pinpointed the release in which the drift was introduced: 4.10-rc1.

Below are the test results of the kernels I have tested up till today:

Tested: 4.4.0-112-generic: not affected
Tested: 4.8.0-41-generic: not affected
Tested: 4.8.0-58-generic: not affected
Tested: 4.9.0 mainline: not affected
Tested: 4.9.66 mainline: not affected
Tested: 4.10-rc1 mainline: affected
Tested: 4.10 mainline: affected
Tested: 4.10.0-38-generic: affected
Tested: 4.10.0-40-generic: affected
Tested: 4.13.0-16-generic: affected
Tested: 4.13.0-31-generic: affected
Tested: 4.14.3 mainline: affected
Tested: 4.15-rc1 mainline: affected

When I was about to file this bug report about an hour ago, I noticed that 4.15-rc9 was available and decided to give it a go to make sure I had really tested the latest version. After running it for over an hour the time is still stable. Most likely the following from the changelog is related to the issue we are having:

Len Brown (3):
  x86/tsc: Future-proof native_calibrate_tsc()
  x86/tsc: Fix erroneous TSC rate on Skylake Xeon
  x86/tsc: Print tsc_khz, when it differs from cpu_khz

Both servers that are having issues on our side are equipped with the following CPU.

Cpu Model (from /proc/cpuinfo):
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz

Standard information as requested:

1:
Description: Ubuntu 16.04.3 LTS
Release: 16.04

2:
root@bit-host6:~# apt-cache policy linux-image-generic-hwe-16.04
linux-image-generic-hwe-16.04:
  Installed: 4.13.0.31.51
  Candidate: 4.13.0.31.51

3: Stable time

4: A big time offset

** Affects: linux-hwe (Ubuntu)
   Importance: Undecided
   Status: New
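To put a number on the growth between the two ntpq snapshots in this report, the following sketch parses the peer rows and compares the mean offset of each snapshot. It assumes the standard ten-column `ntpq -p` layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) and uses rows re-spaced from the flattened output in the report; `parse_peers`, `mean_offset`, `AFTER_1_MIN` and `LATER` are names made up for this example. The time between the two snapshots is only given as "a couple of minutes", so no drift rate is derived.

```python
def parse_peers(text):
    """Return [(remote, offset_ms)] from ntpq -p style rows (assumed 10 columns)."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # separator or unrelated line
        try:
            offset = float(fields[8])  # ninth column is the offset in ms
        except ValueError:
            continue  # header row ("offset" is not a number)
        remote = fields[0]
        if remote[0] in "*+-#":
            remote = remote[1:]  # drop the peer tally code
        peers.append((remote, offset))
    return peers


def mean_offset(peers):
    """Average offset in milliseconds over all parsed peers."""
    return sum(offset for _, offset in peers) / len(peers)


# Rows re-spaced from the report; the column split is an assumption.
AFTER_1_MIN = """\
remote refid st t when poll reach delay offset jitter
*ntp4.bit.nl .PPS. 1 u 5 16 7 0.497 100.084 81.382
+ntp1.bit.nl 193.0.0.229 2 u 8 16 7 0.603 93.241 70.643
+ntp2.bit.nl 193.67.79.202 2 u 8 16 7 0.582 93.218 70.674
+ntp3.bit.nl 193.79.237.14 2 u 9 16 7 0.781 90.488 70.574
"""

LATER = """\
remote refid st t when poll reach delay offset jitter
*ntp4.bit.nl .PPS. 1 u 13 16 377 0.447 400.198 151.335
+ntp1.bit.nl 193.0.0.229 2 u 13 16 377 0.313 400.561 151.339
+ntp2.bit.nl 193.67.79.202 2 u 13 16 377 0.517 400.445 151.398
+ntp3.bit.nl 193.79.237.14 2 u 12 16 377 0.934 402.013 151.384
"""

if __name__ == "__main__":
    first = mean_offset(parse_peers(AFTER_1_MIN))
    second = mean_offset(parse_peers(LATER))
    print(f"mean offset after 1 min: {first:.3f} ms")
    print(f"mean offset later:       {second:.3f} ms")
    print(f"growth:                  {second - first:.3f} ms")
```

All four peers agree closely within each snapshot, which points at the local clock rather than at any single time source.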