Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-03-19 Thread Kees Meijs
Hi Thomas,

Then I'll await your findings while running 2.11 ourselves. If the new
build works well at your end I'll roll out several nodes using that one
for testing.

Apart from 100% CPU usage we didn't have issues, by the way.

K.

On 18-03-2020 19:18, Thomas Goirand wrote:
> I've fixed *one* type of crash, but we saw others, with a different
> backtrace (which I could see using gdb).
>
> We're now upgrading to the version seen here:
> http://shade.infomaniak.ch/buster-pu/openvswitch/
>
> This is the top of the 2.10 branch, version is:
> 2.10.4+2020.01.14.b2ccc307f1+dfsg1-1+deb10u3
>
> I don't know yet if it fixes the problem we have ...
>
> I can try to convince the release team to update to that version in
> Buster, but the chances that they accept are rather low.
>



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-03-18 Thread Thomas Goirand
On 3/18/20 10:35 AM, Kees Meijs wrote:
> 
> On 17-03-2020 14:37, Thomas Goirand wrote:
>> You may have noticed my last upload of OVS in buster-proposed-updates.
>> This upload fixes at least one of the crashes which lead to vswitchd
>> taking 100% of one core.
>>
>> However, there are still some other issues we've experienced in
>> production. Soon, we'll test the latest version of OVS 2.10, and I'll be
>> able to tell if this fixes the other crash I've seen. In the meantime,
>> you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from
>> buster-proposed-updates.
>
> Good morning Thomas,
>
> The past few weeks have been intense, to say the least, so I had not.
> The same goes for comparing the 2.10 and 2.11 code. Much appreciated
> that you pointed out the upload.
>
> To save valuable time our cluster is running 2.11 where possible but
> it would be best to go back to the stock Debian packages.
>
> What other issues do you refer to? I would love to make time and test
> your new build (thank you, taking a deep bow) but am curious about
> other potential issues before I do.
>
> Best regards,
> Kees

Hi,

I've fixed *one* type of crash, but we saw others, with a different
backtrace (which I could see using gdb).

We're now upgrading to the version seen here:
http://shade.infomaniak.ch/buster-pu/openvswitch/

This is the top of the 2.10 branch, version is:
2.10.4+2020.01.14.b2ccc307f1+dfsg1-1+deb10u3

I don't know yet if it fixes the problem we have ...

I can try to convince the release team to update to that version in
Buster, but the chances that they accept are rather low.

Cheers,

Thomas Goirand (zigo)



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-03-18 Thread Kees Meijs
Good morning Thomas,

The past few weeks have been intense, to say the least, so I had not.
The same goes for comparing the 2.10 and 2.11 code. Much appreciated
that you pointed out the upload.

To save valuable time our cluster is running 2.11 where possible but it
would be best to go back to the stock Debian packages.

What other issues do you refer to? I would love to make time and test
your new build (thank you, taking a deep bow) but am curious about
other potential issues before I do.

Best regards,
Kees

On 17-03-2020 14:37, Thomas Goirand wrote:
> You may have noticed my last upload of OVS in buster-proposed-updates.
> This upload fixes at least one of the crashes which lead to vswitchd
> taking 100% of one core.
>
> However, there are still some other issues we've experienced in
> production. Soon, we'll test the latest version of OVS 2.10, and I'll be
> able to tell if this fixes the other crash I've seen. In the meantime,
> you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from
> buster-proposed-updates.
>



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-03-17 Thread Thomas Goirand
Hi Kees,

You may have noticed my last upload of OVS in buster-proposed-updates.
This upload fixes at least one of the crashes which lead to vswitchd
taking 100% of one core.

However, there are still some other issues we've experienced in
production. Soon, we'll test the latest version of OVS 2.10, and I'll be
able to tell if this fixes the other crash I've seen. In the meantime,
you can try 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2 from
buster-proposed-updates.
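
For anyone comparing the version strings mentioned in this thread: a
Debian version has the form [epoch:]upstream_version[-debian_revision],
where the revision is whatever follows the last hyphen. Below is a
minimal Python sketch of that split. The function name
split_debian_version is hypothetical, and the sketch does not implement
the ordering rules (dpkg --compare-versions does that).

```python
from typing import Tuple


def split_debian_version(version: str) -> Tuple[str, str, str]:
    """Split a Debian version into (epoch, upstream_version, debian_revision).

    Illustrative helper only: it splits the string and does not compare
    or validate versions.
    """
    epoch = "0"  # a missing epoch is treated as 0
    if ":" in version:
        # The epoch is the part before the first colon.
        epoch, version = version.split(":", 1)
    # The Debian revision is the part after the last hyphen, if any.
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        upstream, revision = version, ""
    return epoch, upstream, revision


if __name__ == "__main__":
    # Version strings taken from this thread.
    for v in (
        "2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u2",
        "2.10.4+2020.01.14.b2ccc307f1+dfsg1-1+deb10u3",
    ):
        print(split_debian_version(v))
```

Read this way, the two uploads differ in the upstream part (2.10.0 plus
a 2018 snapshot versus 2.10.4 plus a 2020 snapshot), not only in the
deb10uN revision suffix.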

Cheers,

Thomas Goirand (zigo)



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-01-27 Thread Kees Meijs
Hi Thomas,

You're absolutely right in terms of the correct way of fixing things.

I'll take a look at the upstream changelog and may diff the source as
well. Hopefully the fix itself is trivial.

Cheers,
Kees

On 27-01-2020 11:06, Thomas Goirand wrote:
> Have you investigated which upstream patch fixed the issue, so
> that we could backport that single patch instead?



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-01-27 Thread Thomas Goirand
On 1/25/20 8:33 PM, Kees Meijs wrote:
> Is upgrading to 2.11 in stable a viable option?

No, it's not. The release team won't let this happen.

> (The backports team felt
> this bug is severe enough and the upgrade is only very minor.)

Uploading to stable-backports is *not* the way to fix bugs in Debian
stable. The way to fix bugs in stable is ... to fix bugs in stable! :)

Have you investigated which upstream patch fixed the issue, so
that we could backport that single patch instead?

Cheers,

Thomas Goirand (zigo)



Bug#949845: Constant 100% CPU usage by ovs-vswitchd

2020-01-25 Thread Kees Meijs
Package: openvswitch-switch
Version: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12

Hi there,

We extensively use Open vSwitch in our OpenStack and Ceph environments
and noticed version 2.10 "eats up" a CPU core with 100% utilisation on
buster.

For example:

> 2020-01-19T23:00:06.362Z|148220|poll_loop(handler130)|INFO|Dropped 734849 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> 2020-01-19T23:00:06.362Z|148221|poll_loop(handler130)|INFO|wakeup due to [POLLIN] on fd 23 (unknown anon_inode:[eventpoll]) at lib/dpif-netlink.c:2786 (99% CPU usage)
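
As a rough illustration of how such wakeup lines could be picked out of
the vswitchd log, here is a minimal Python sketch. The regular
expression is only inferred from the sample lines above, and the names
WAKEUP_RE and parse_wakeup are made up for this example; they are not
part of Open vSwitch.

```python
import re
from typing import Optional

# Pattern inferred from the sample log line in this report (an assumption,
# not an official description of the Open vSwitch log format).
WAKEUP_RE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<seq>\d+)\|poll_loop\((?P<thread>[^)]+)\)"
    r"\|(?P<level>[A-Z]+)\|wakeup due to \[(?P<events>[^\]]+)\] "
    r"on fd (?P<fd>\d+) \((?P<fd_desc>.*)\) at (?P<source>\S+:\d+) "
    r"\((?P<cpu>\d+)% CPU usage\)$"
)


def parse_wakeup(line: str) -> Optional[dict]:
    """Return the fields of a poll_loop wakeup line, or None if it does not match."""
    match = WAKEUP_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields so they can be filtered or aggregated.
    info["fd"] = int(info["fd"])
    info["cpu"] = int(info["cpu"])
    return info


if __name__ == "__main__":
    # The second sample line from this report, with the mail wrapping undone.
    sample = (
        "2020-01-19T23:00:06.362Z|148221|poll_loop(handler130)|INFO|"
        "wakeup due to [POLLIN] on fd 23 (unknown anon_inode:[eventpoll]) "
        "at lib/dpif-netlink.c:2786 (99% CPU usage)"
    )
    print(parse_wakeup(sample))
```

Counting matches per thread and source location would show whether the
wakeups all come from the same handler thread and the same spot in
lib/dpif-netlink.c.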

Other users experience a similar problem on other distributions as well.
It seems this bug is resolved in version 2.11.

Recently I manually built the 2.11 package from bullseye and installed
that. Although the build is not perfect (I needed to add some bugtool
files to debian/not-installed), the resulting packages install well
and the CPU usage is back to normal values.

For reference, I added the following files:

> usr/share/openvswitch/bugtool-plugins/system-configuration.xml
> usr/share/openvswitch/bugtool-plugins/system-configuration/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/system-logs/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/kernel-info/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/network-status/openvswitch.xml
> usr/share/openvswitch/bugtool-plugins/network-status/ovn.xml
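
For anyone repeating the rebuild, the additions to debian/not-installed
could be scripted roughly as follows. This is only a sketch in Python:
the helper merge_not_installed and the constant BUGTOOL_PLUGINS are
hypothetical names, and the path list is simply the one quoted above.

```python
from typing import Iterable

# Paths quoted in this report; they are listed so that the build does not
# complain about files that are built but not shipped in any package.
BUGTOOL_PLUGINS = [
    "usr/share/openvswitch/bugtool-plugins/system-configuration.xml",
    "usr/share/openvswitch/bugtool-plugins/system-configuration/openvswitch.xml",
    "usr/share/openvswitch/bugtool-plugins/system-logs/openvswitch.xml",
    "usr/share/openvswitch/bugtool-plugins/kernel-info/openvswitch.xml",
    "usr/share/openvswitch/bugtool-plugins/network-status/openvswitch.xml",
    "usr/share/openvswitch/bugtool-plugins/network-status/ovn.xml",
]


def merge_not_installed(existing: str, entries: Iterable[str]) -> str:
    """Return the file content with any missing entries appended.

    Existing lines are kept in order and entries already present are
    skipped, so running the merge twice gives the same result.
    """
    lines = existing.splitlines()
    seen = {line.strip() for line in lines}
    for entry in entries:
        if entry not in seen:
            lines.append(entry)
            seen.add(entry)
    return "\n".join(lines) + "\n" if lines else ""
```

In the unpacked source tree, the result would be written back with
something like pathlib.Path("debian/not-installed").write_text(...)
after reading the current content of that file.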

Other than that, backporting seems trivial.

Is upgrading to 2.11 in stable a viable option? (The backports team felt
this bug is severe enough and the upgrade is only very minor.)

Thanks in advance!

Best regards,
Kees