[Bug 1887490] Re: [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model

2021-02-17 Thread Juul Spies
I just came across this bug while trying to deal with broken live
migrations in our hypervisor setup.

We are experiencing the regression that Markus describes when trying to
upgrade libvirt from 6.0.0-0ubuntu8.3 to 6.0.0-0ubuntu8.5 in Ubuntu
Focal.

Live migrations no longer work when we use EPYC-IBPB as the CPU model for
our guests:
'error: operation failed: guest CPU doesn't match specification: extra features: npt,nrip-save'

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1887490

Title:
  [FFe/SRU] Add/Backport EPYC-v3 and EPYC-Rome CPU model

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1887490/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state

2019-11-28 Thread Juul Spies
Here is a similar excerpt from the OVS log and gdb trace, this time running
openvswitch 2.11.0-0ubuntu2:
Thu Oct 31 18:20:23 2019-2019-10-31T17:20:23.521Z|1|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator124 to quiesce
Thu Oct 31 18:20:24 2019-2019-10-31T17:20:24.521Z|2|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator124 to quiesce
Thu Oct 31 18:20:59 2019-2019-10-31T17:20:26.520Z|3|ovs_rcu(urcu4)|WARN|blocked 4000 ms waiting for revalidator124 to quiesce
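Because the hang only shows up after days or months, it may help to scan the ovs-vswitchd log for these warnings automatically. A minimal Python sketch, assuming the line layout shown in the excerpts here; the regular expression and the helper names `parse_quiesce` and `stuck_threads` are illustrative and not part of Open vSwitch:

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional

# ISO timestamp, sequence number, module, level, then the ovs_rcu warning text,
# matching the layout of the log excerpts; any other message is ignored.
QUIESCE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)"
    r"\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[A-Z]+)"
    r"\|blocked (?P<ms>\d+) ms waiting for (?P<thread>\S+) to quiesce"
)


class QuiesceWarning(NamedTuple):
    timestamp: str
    blocked_ms: int
    thread: str


def parse_quiesce(line: str) -> Optional[QuiesceWarning]:
    """Return the parsed warning, or None if the line is not such a warning."""
    m = QUIESCE_RE.search(line)
    if m is None:
        return None
    return QuiesceWarning(m["ts"], int(m["ms"]), m["thread"])


def stuck_threads(lines: Iterable[str]) -> Dict[str, int]:
    """Map each thread name to the longest blocked time reported for it."""
    worst: Dict[str, int] = {}
    for line in lines:
        w = parse_quiesce(line)
        if w is not None:
            worst[w.thread] = max(worst.get(w.thread, 0), w.blocked_ms)
    return worst
```

For the three 2.11.0 lines above, `stuck_threads` returns `{'revalidator124': 4000}`: the reported blocked time doubles between warnings while the same revalidator thread never quiesces. On a live system it could be fed the log file directly, e.g. `stuck_threads(open("/var/log/openvswitch/ovs-vswitchd.log"))`, assuming the default Ubuntu log location.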

In the trace:
29   Thread 0x7f72f97fa700 (LWP 26608) "revalidator124" 0x7f734aee237b in futex_abstimed_wait (private=, abstime=0x0, expected=10, futex_word=0x5577bab397c0 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:172

The full trace is attached as gdbwrap.1572542426.log.gz.

The traces are a bit of a mystery to me; I really don't know what is going
on there, but I hope they help you make sense of what is happening here.

** Attachment added: "GDB trace ovs 2.11.0 during when it hangs"
   https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592/+attachment/5308469/+files/gdbwrap.1572542426.log.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1839592

Title:
  Open vSwitch (Version 2.9.2) goes into deadlocked state

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1839592/+subscriptions

[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state

2019-11-28 Thread Juul Spies
** Attachment added: "GDB trace ovs 2.9.2 during when it hangs"
   https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1839592/+attachment/5308470/+files/gdbwrap.1566706577.log.gz

[Bug 1839592] Re: Open vSwitch (Version 2.9.2) goes into deadlocked state

2019-11-28 Thread Juul Spies
I just came across this bug report and would like to share my
experience.

We have been having similar issues with openvswitch on 6 servers since we
upgraded from 16.04 to 18.04 about 2 years ago.
Our biggest problem is that we are unable to reproduce it. We just see
openvswitch hang from time to time: sometimes it gets stuck after a day,
sometimes only after months.
The only way to recover is to restart openvswitch.

Right now we are running a backport of openvswitch 2.11.0-0ubuntu2 from
Disco on Bionic. With that backport we have the same issues as with the
2.9.2-0ubuntu0.18.04.3 version shipped in Bionic, which we ran before.

I have gdb traces from both versions, which I will attach.

Here is a small excerpt from the OVS log and gdb trace of openvswitch
2.9.2-0ubuntu0.18.04.3:
Sun Aug 25 06:16:14 2019-2019-08-25T04:16:14.943Z|1|ovs_rcu(urcu4)|WARN|blocked 1000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:15 2019-2019-08-25T04:16:15.943Z|2|ovs_rcu(urcu4)|WARN|blocked 2000 ms waiting for revalidator127 to quiesce
Sun Aug 25 06:16:50 2019-2019-08-25T04:16:17.943Z|3|ovs_rcu(urcu4)|WARN|blocked 4001 ms waiting for revalidator127 to quiesce

A small excerpt from the trace:
32   Thread 0x7f1bfa7fc700 (LWP 1461) "revalidator127" 0x7f1c61aeb37b in futex_abstimed_wait (private=, abstime=0x0, expected=10, futex_word=0x55e4ed0aa800 ) at ../sysdeps/unix/sysv/linux/futex-internal.h:172

The full trace (openvswitch 2.9.2) is attached as
gdbwrap.1566706577.log.gz.

[Bug 1853614] Re: System stuck in reboot loop on AMD EPYC 7542 32-Core Processor

2019-11-26 Thread Juul Spies
I can confirm there are no issues when booting with the package from the
ubuntu-security-proposed PPA.

amd64-microcode:
  Installed: 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1
  Candidate: 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1
  Version table:
 *** 3.20191021.1+really3.20181128.1~ubuntu0.18.04.1 500
        500 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

/proc/cpuinfo:
vendor_id   : AuthenticAMD
cpu family  : 23
model       : 49
model name  : AMD EPYC 7402P 24-Core Processor
stepping    : 0
microcode   : 0x830101c
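To check which microcode revision is actually loaded after booting with a given amd64-microcode package, the `microcode` field shown above can be read from /proc/cpuinfo. A minimal Python sketch, assuming the usual `key : value` layout of that file; `parse_cpuinfo` and `microcode_revision` are hypothetical helper names:

```python
from typing import Dict


def parse_cpuinfo(text: str) -> Dict[str, str]:
    """Parse the 'key : value' lines of the first processor block."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates processor blocks; stop after the first.
            break
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def microcode_revision(info: Dict[str, str]) -> int:
    """The 'microcode' field is printed in hex, e.g. '0x830101c'."""
    return int(info["microcode"], 16)
```

On a live system, `parse_cpuinfo(open("/proc/cpuinfo").read())` gives the fields for the first CPU; for the excerpt above, `hex(microcode_revision(info))` is `'0x830101c'`.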

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853614

Title:
  System stuck in reboot loop on AMD EPYC 7542 32-Core Processor

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/amd64-microcode/+bug/1853614/+subscriptions

[Bug 1853614] Re: System stuck in reboot loop on AMD EPYC 7542 32-Core Processor

2019-11-25 Thread Juul Spies
I had the same problem on three servers with an AMD EPYC 7402P
24-Core Processor.

After removing the amd64-microcode 3.20191021.1ubuntu0.18.04.2 package,
the servers no longer ended up in a reboot loop and booted normally.

[Bug 1744988] Re: time drifting on linux-hwe kernels

2018-03-21 Thread Juul Spies
My apologies, I was not aware of this. After your message I started
testing the proposed kernel and can verify that it works as expected.

** Tags removed: verification-needed-artful
** Tags added: verification-done-artful

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1744988

Title:
  time drifting on linux-hwe kernels

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1744988/+subscriptions

[Bug 1744988] Re: time drifting on linux-hwe kernels

2018-02-01 Thread Juul Spies
I have been running your build on two servers for about an hour now.
Timing is stable on both of them.

[Bug 1744988] Re: time drifting on linux-hwe kernels

2018-01-31 Thread Juul Spies
I have created a patch file, tsc.patch, which applies the upstream
patches by Len Brown to the 4.13.0-31-generic kernel. This resolves the
time issues we are having.

** Patch added: "tsc.patch"
   https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1744988/+attachment/5046309/+files/tsc.patch

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1744988

Title:
  time drifting on linux-hwe kernels

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1744988/+subscriptions

[Bug 1744988] Re: time drifting on linux-hwe kernels

2018-01-23 Thread Juul Spies
Although probably obvious: the mainline kernels I tested were all
downloaded from http://kernel.ubuntu.com/~kernel-ppa/mainline/

[Bug 1744988] [NEW] time drifting on linux-hwe kernels

2018-01-23 Thread Juul Spies
Public bug reported:

We observe NTP time drift on two servers running HWE kernels on Xenial.
A few weeks ago we wanted to switch from 4.4 to 4.10. After rebooting the
servers into the 4.10 kernel, we saw a large time offset within minutes
of booting. Despite ntpd running, it could not keep up: the offset
persisted and kept growing over time.

When we rebooted back into 4.4, we immediately noticed that the time
stayed correct. Since then I have tested about a dozen kernel versions,
which makes me think something introduced in kernel 4.10 causes the clock
to drift out of sync.

So what do we observe?

After 1 min uptime:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u    5   16    7   0.497  100.084  81.382
+ntp1.bit.nl     193.0.0.229      2 u    8   16    7   0.603   93.241  70.643
+ntp2.bit.nl     193.67.79.202    2 u    8   16    7   0.582   93.218  70.674
+ntp3.bit.nl     193.79.237.14    2 u    9   16    7   0.781   90.488  70.574

A couple of minutes later (the offset keeps growing over hours and days
as well):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp4.bit.nl     .PPS.            1 u   13   16  377   0.447  400.198 151.335
+ntp1.bit.nl     193.0.0.229      2 u   13   16  377   0.313  400.561 151.339
+ntp2.bit.nl     193.67.79.202    2 u   13   16  377   0.517  400.445 151.398
+ntp3.bit.nl     193.79.237.14    2 u   12   16  377   0.934  402.013 151.384
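The two ntpq snapshots allow a rough estimate of how fast the clock drifts, and why ntpd cannot keep up. A minimal Python sketch, assuming ntpd (like the Linux kernel clock discipline) cannot correct a frequency error larger than 500 PPM by slewing. The report only says "a couple of minutes" between the snapshots, so the 120 s used below is a placeholder, not a measured value; the helper names are hypothetical:

```python
# Upper bound on the frequency correction applied by ntpd / the kernel
# clock discipline (500 parts per million); a larger error cannot be slewed away.
MAX_FREQ_CORRECTION_PPM = 500.0


def drift_ppm(offset_start_ms: float, offset_end_ms: float, elapsed_s: float) -> float:
    """Rate of change of the NTP offset, in parts per million.

    Offsets are in milliseconds, as printed by `ntpq -p`; elapsed_s is the
    wall-clock time between the two readings.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_s = (offset_end_ms - offset_start_ms) / 1000.0
    return delta_s / elapsed_s * 1e6


def ntpd_can_compensate(rate_ppm: float) -> bool:
    """True if a constant drift of rate_ppm is within the correction limit."""
    return abs(rate_ppm) <= MAX_FREQ_CORRECTION_PPM
```

With the ntp4.bit.nl offsets (100.084 ms, then 400.198 ms) and the assumed 120 s, `drift_ppm(100.084, 400.198, 120)` is about 2501 PPM, roughly five times that limit, which would be consistent with the offset growing despite ntpd.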

As mentioned, I tested about a dozen kernels and believe I have pinpointed
the release in which the drift was introduced: 4.10-rc1. Below are the
results for the kernels I have tested up to today:

Tested: 4.4.0-112-generic: not affected
Tested: 4.8.0-41-generic: not affected
Tested: 4.8.0-58-generic: not affected
Tested: 4.9.0 mainline: not affected
Tested: 4.9.66 mainline: not affected
Tested: 4.10-rc1 mainline: affected
Tested: 4.10 mainline: affected
Tested: 4.10.0-38-generic: affected
Tested: 4.10.0-40-generic: affected
Tested: 4.13.0-16-generic: affected
Tested: 4.13.0-31-generic: affected
Tested: 4.14.3 mainline: affected
Tested: 4.15-rc1 mainline: affected
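Pinpointing the release amounts to finding the first transition from "not affected" to "affected" in the ordered test results. A minimal Python sketch, assuming the kernels are listed in ascending version order as they are above; `regression_window` is a hypothetical helper name:

```python
from typing import List, Optional, Tuple

# (kernel, affected) pairs copied from the test results, in version order.
RESULTS: List[Tuple[str, bool]] = [
    ("4.4.0-112-generic", False),
    ("4.8.0-41-generic", False),
    ("4.8.0-58-generic", False),
    ("4.9.0 mainline", False),
    ("4.9.66 mainline", False),
    ("4.10-rc1 mainline", True),
    ("4.10 mainline", True),
    ("4.10.0-38-generic", True),
    ("4.10.0-40-generic", True),
    ("4.13.0-16-generic", True),
    ("4.13.0-31-generic", True),
    ("4.14.3 mainline", True),
    ("4.15-rc1 mainline", True),
]


def regression_window(results: List[Tuple[str, bool]]) -> Optional[Tuple[str, str]]:
    """Return (last good, first bad) at the first good -> bad transition."""
    for (prev, prev_bad), (cur, cur_bad) in zip(results, results[1:]):
        if not prev_bad and cur_bad:
            return prev, cur
    return None  # no transition: all good, all bad, or empty
```

For the list above this returns `('4.9.66 mainline', '4.10-rc1 mainline')`, so the change would lie between v4.9 and v4.10-rc1; a `git bisect` over that range would be the usual way to narrow it to a single commit.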

When I was about to file this bug report, about an hour ago, I noticed
that 4.15-rc9 was available and decided to give it a go, to make sure I
had really tested the latest version. It has now been running for over an
hour and the time is stable.

Most likely, the following changelog entries are related to the issue we
are having:

Len Brown (3):
  x86/tsc: Future-proof native_calibrate_tsc()
  x86/tsc: Fix erroneous TSC rate on Skylake Xeon
  x86/tsc: Print tsc_khz, when it differs from cpu_khz
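For context on why a wrong TSC rate shows up as a steady drift: in the kernel's native_calibrate_tsc() path, the TSC frequency is derived from CPUID leaf 0x15 as crystal clock frequency × EBX / EAX, and when the crystal frequency is not reported the kernel falls back to a built-in per-model value. A minimal Python sketch of how an error in that value turns into clock drift; the 25 MHz crystal, the 240/2 ratio and the 0.1% deviation are hypothetical numbers for illustration, not values measured on these servers:

```python
def tsc_hz(crystal_hz: float, ebx_numerator: int, eax_denominator: int) -> float:
    """TSC frequency as derived from CPUID leaf 0x15: crystal * EBX / EAX."""
    if ebx_numerator == 0 or eax_denominator == 0:
        raise ValueError("TSC/crystal ratio not enumerated")
    return crystal_hz * ebx_numerator / eax_denominator


def calibration_drift_ppm(assumed_hz: float, actual_hz: float) -> float:
    """Clock error in PPM when TSC ticks are converted to time with assumed_hz
    while the counter really runs at actual_hz.

    System time advances actual_hz / assumed_hz seconds per real second, so a
    negative result means the clock falls behind and a positive one that it
    runs ahead.
    """
    return (actual_hz / assumed_hz - 1.0) * 1e6
```

With the hypothetical numbers, `tsc_hz(25e6, 240, 2)` is 3.0 GHz; if the real crystal ran 0.1% slower, `calibration_drift_ppm(tsc_hz(25e6, 240, 2), tsc_hz(25e6 * 0.999, 240, 2))` is about -1000 PPM, i.e. the system clock would lose roughly 3.6 seconds per hour.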

Both servers with the issue on our side are equipped with the following
CPU:

Cpu Model (from /proc/cpuinfo)
vendor_id   : GenuineIntel
cpu family  : 6
model   : 85
model name  : Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz

Standard information as requested:
1:
Description:    Ubuntu 16.04.3 LTS
Release:        16.04

2: 
root@bit-host6:~# apt-cache policy linux-image-generic-hwe-16.04
linux-image-generic-hwe-16.04:
  Installed: 4.13.0.31.51
  Candidate: 4.13.0.31.51

3: Stable time

4: A big time offset

** Affects: linux-hwe (Ubuntu)
 Importance: Undecided
 Status: New
