Bug#949845: Constant 100% CPU usage by ovs-vswitchd
Hi Thomas,

Then I'll await your findings while running 2.11 ourselves. If the new build works well at your end, I'll roll out several nodes using it for testing.

Apart from the 100% CPU usage we didn't have any issues, by the way.

K.

On 18-03-2020 19:18, Thomas Goirand wrote:
> I've fixed *one* type of crash, but we saw others, with a different
> backtrace (which I could see using gdb).
>
> We're now upgrading to the version seen here:
> http://shade.infomaniak.ch/buster-pu/openvswitch/
>
> This is the top of the 2.10 branch, version is:
> 2.10.4+2020.01.14.b2ccc307f1+dfsg1-1+deb10u3
>
> I don't know yet if it fixes the problem we have ...
>
> I can try to convince the release team to update to that version in
> Buster, but the chances they accept are rather low.
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
On 3/18/20 10:35 AM, Kees Meijs wrote:
>
> On 17-03-2020 14:37, Thomas Goirand wrote:
>> You may have noticed my last upload of OVS in buster-proposed-updates.
>> This upload fixes at least one of the crashes which leads to vswitchd
>> taking 100% of one core.
>>
>> However, there are still some other issues we've experienced in
>> production. Soon, we'll test the latest version of OVS 2.10, and I'll be
>> able to tell if this fixes the other crash I've seen. In the meantime,
>> you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from
>> buster-proposed-updates.
>
> Good morning Thomas,
>
> The past few weeks have been intense, to say the least, so I did not.
> The same goes for comparing the 2.10 code with 2.11. Thanks for
> pointing out the upload.
>
> To save valuable time, our cluster is running 2.11 where possible, but
> it would be best to go back to the stock Debian packages.
>
> What other issues do you refer to? I would love to make time and test
> your new build (thank you, taking a deep bow) but am curious about
> other potential issues before I do.
>
> Best regards,
> Kees

Hi,

I've fixed *one* type of crash, but we saw others, with a different backtrace (which I could see using gdb).

We're now upgrading to the version seen here:
http://shade.infomaniak.ch/buster-pu/openvswitch/

This is the top of the 2.10 branch, version is:
2.10.4+2020.01.14.b2ccc307f1+dfsg1-1+deb10u3

I don't know yet if it fixes the problem we have ...

I can try to convince the release team to update to that version in Buster, but the chances they accept are rather low.

Cheers,

Thomas Goirand (zigo)
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
Good morning Thomas,

The past few weeks have been intense, to say the least, so I did not. The same goes for comparing the 2.10 code with 2.11. Thanks for pointing out the upload.

To save valuable time, our cluster is running 2.11 where possible, but it would be best to go back to the stock Debian packages.

What other issues do you refer to? I would love to make time and test your new build (thank you, taking a deep bow) but am curious about other potential issues before I do.

Best regards,
Kees

On 17-03-2020 14:37, Thomas Goirand wrote:
> You may have noticed my last upload of OVS in buster-proposed-updates.
> This upload fixes at least one of the crashes which leads to vswitchd
> taking 100% of one core.
>
> However, there are still some other issues we've experienced in
> production. Soon, we'll test the latest version of OVS 2.10, and I'll be
> able to tell if this fixes the other crash I've seen. In the meantime,
> you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from
> buster-proposed-updates.
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
Hi Kees,

You may have noticed my last upload of OVS in buster-proposed-updates. This upload fixes at least one of the crashes which leads to vswitchd taking 100% of one core.

However, there are still some other issues we've experienced in production. Soon, we'll test the latest version of OVS 2.10, and I'll be able to tell if this fixes the other crash I've seen. In the meantime, you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from buster-proposed-updates.

Cheers,

Thomas Goirand (zigo)
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
Hi Thomas,

You're absolutely right about the correct way of fixing things. I'll take a look at the upstream changelog and may diff the source as well. Hopefully the fix itself is trivial.

Cheers,
Kees

On 27-01-2020 11:06, Thomas Goirand wrote:
> Have you investigated which upstream patch fixed the issue, so
> that we could backport that single patch instead?
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
On 1/25/20 8:33 PM, Kees Meijs wrote:
> Is upgrading to 2.11 in stable a viable option?

No, it's not. The release team won't let this happen.

> (The backports team felt
> this bug is severe enough and the upgrade is only very minor.)

Uploading to stable-backports is *not* the way to fix bugs in Debian stable. The way to fix bugs in stable is ... to fix bugs in stable! :)

Have you investigated which upstream patch fixed the issue, so that we could backport that single patch instead?

Cheers,

Thomas Goirand (zigo)
Bug#949845: Constant 100% CPU usage by ovs-vswitchd
Package: openvswitch-switch
Version: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12

Hi there,

We use Open vSwitch extensively in our OpenStack and Ceph environments and noticed that version 2.10 "eats up" a CPU core with 100% utilisation on buster.

For example:

> 2020-01-19T23:00:06.362Z|148220|poll_loop(handler130)|INFO|Dropped
> 734849 log messages in last 6 seconds (most recently, 0 seconds ago)
> due to excessive rate
> 2020-01-19T23:00:06.362Z|148221|poll_loop(handler130)|INFO|wakeup due
> to [POLLIN] on fd 23 (unknown anon_inode:[eventpoll]) at
> lib/dpif-netlink.c:2786 (99% CPU usage)

Other users experience a similar problem on other distributions as well. It seems this bug is resolved in version 2.11.

Recently I manually built the 2.11 package from bullseye and installed that. Although the build is not perfect (I needed to add some bugtool files to debian/not-installed), the resulting packages install well and the CPU usage is back to normal values.

For reference, I added the following files:

> usr/share/openvswitch/bugtool-plugins/system-configuration.xml
> usr/share/openvswitch/bugtool-plugins/system-configuration/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/system-logs/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/kernel-info/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/network-status/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/network-status/ovn.xml

Other than that, backporting seems trivial.

Is upgrading to 2.11 in stable a viable option? (The backports team felt this bug is severe enough and the upgrade is only very minor.)

Thanks in advance!

Best regards,
Kees
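
As a rough aid for anyone checking their own logs for the symptom reported above, the following Python sketch parses poll_loop wakeup lines of the form quoted in the report and picks out those at or above a CPU threshold. The regular expression is inferred from the two sample lines only and is an assumption, not a documented Open vSwitch log format; the function names (parse_wakeup, busy_wakeups) and the 90% threshold are hypothetical.

```python
import re

# Assumption: wakeup lines always look like the sample in the report, i.e.
#   <timestamp>|<seq>|poll_loop(<thread>)|INFO|wakeup due to ... at <file>:<line> (<N>% CPU usage)
# written on a single line (the report shows them wrapped by the mail client).
WAKEUP_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|poll_loop\((?P<thread>[^)]*)\)\|INFO\|"
    r"wakeup due to .* at (?P<src>\S+:\d+) \((?P<cpu>\d+)% CPU usage\)$"
)


def parse_wakeup(line):
    """Return a dict describing a poll_loop wakeup line, or None if it does not match."""
    m = WAKEUP_RE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": m.group("ts"),
        "thread": m.group("thread"),
        "source": m.group("src"),
        "cpu_percent": int(m.group("cpu")),
    }


def busy_wakeups(lines, threshold=90):
    """Collect parsed wakeup records whose CPU usage is at or above the threshold."""
    hits = []
    for line in lines:
        rec = parse_wakeup(line)
        if rec is not None and rec["cpu_percent"] >= threshold:
            hits.append(rec)
    return hits


# The wakeup line from the report, joined back onto one line.
sample = (
    "2020-01-19T23:00:06.362Z|148221|poll_loop(handler130)|INFO|wakeup due "
    "to [POLLIN] on fd 23 (unknown anon_inode:[eventpoll]) at "
    "lib/dpif-netlink.c:2786 (99% CPU usage)"
)

if __name__ == "__main__":
    for rec in busy_wakeups([sample]):
        print(rec["thread"], rec["source"], rec["cpu_percent"])
```

To scan a real log, pass an open file object as the lines argument; the path of the ovs-vswitchd log depends on the installation and is not given in this thread.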