Re: BIRD on physical / virtual server.

2020-07-08 Thread Saso Tavcar
Hi,

There is a known issue with Open vSwitch (OVS) performance when handling full BGP
routing tables. Even without OVS we hit a "too many IRQs" issue on the "physical"
network in our KVM environment.
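
A quick way to see whether interrupt handling is the bottleneck is to look at
the per-CPU counters (standard Linux procfs, nothing KVM-specific):

  cat /proc/interrupts   # hardware IRQs per CPU, per device
  cat /proc/softirqs     # softirq counters; NET_RX/NET_TX are the rows to watch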


Official OVS quote:
> We'd accept patches to improve OVS's routing table code.  It's not
> designed to scale to 1,800,000 routes.  We'd also take code to suppress
> the routing table code in cases where it isn't actually needed, since
> it's not always needed.  But we can't take a patch to just delete it;
> I'm sure you understand.
I tried to apply this patch at the time, but it was already useless for newer
versions:

https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin
 


Our workaround was to scale the VM to 3 vCPUs, since our average system load is
1.5 for BGP.
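
If the guest is libvirt-managed, the resize can be done with virsh along these
lines (just a sketch; the domain name bgp1 is an example):

  virsh setvcpus bgp1 3 --maximum --config   # raise the vCPU ceiling, applied on next boot
  virsh setvcpus bgp1 3 --config --live      # set the active vCPU count, persistently and live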

You can see what is happening:

[root@bgp1 ~]# top
...
  PID USER  PR  NI    VIRT    RES   SHR S  %CPU  %MEM     TIME+ COMMAND
  654 root  10 -10 1284492   1.0g 20276 R  98.0  27.0   2513:01 ovs-vswitchd
   16 root  20   0       0      0     0 S   2.0   0.0  24:45.60 ksoftirqd/1

[root@bgp1 ~]# ip route show
...
1.0.0.0/24 via 89.212.xx.xx dev t2-v24-ha proto bird 
1.0.4.0/24 via 89.212.xx.xx dev t2-v24-ha proto bird 
1.0.4.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
1.0.5.0/24 via 89.212.xx.xx dev t2-v24-ha proto bird
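
To get a feel for the scale involved (every one of these routes is mirrored by
OVS's routing table code), counting the kernel routes by proto is enough:

  ip -4 route show proto bird | wc -l
  ip -6 route show proto bird | wc -l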


Routes are constantly being added and deleted:

[root@bgp1 ~]# ip monitor
...
Deleted 2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:xxx:xxx::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 68.69.37.0/24 via 89.212.xx.xx dev t2-v24-ha proto bird 
68.69.37.0/24 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 2.16.70.0/23 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 88.221.28.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 23.50.188.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 92.122.68.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 88.221.100.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird 
Deleted 92.123.208.0/22 via 89.212.xx.xx dev t2-v24-ha proto bird
. 
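
The churn rate can be quantified roughly by counting monitor events over a
fixed window, something like:

  timeout 60 ip monitor route | grep -c '^Deleted'   # route withdrawals seen in one minute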



Regards,
saso

> On 8 Jul 2020, at 08:05, Mike Neo  wrote:
> 
> Hi,
> 
> what is your experience with installing BIRD on a virtual server (ESXi)? Are 
> there any limitations to this kind of deployment, for example problems with 
> performance?
> 
> Regards,
> Mike



Re: 100% CPU load with device scanning enabled

2019-05-06 Thread Saso Tavcar
The best solution would be a good OVS routing table patch as quoted.

Maybe BIRD developers can help, since they are native C developers.

We also tried BIRD on native (K)VM network interfaces. Since those are a kind
of software emulation too, we hit unrecoverable network IRQ problems there, so
the overloaded OVS is still the better solution for us.
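
As a stopgap on the BIRD side, lengthening the device scan interval should at
least reduce how often the scans are triggered (a sketch for BIRD 1.x; 60
seconds is an arbitrary value):

  protocol device {
          scan time 60;   # rescan interfaces every 60 s instead of the default
  }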

Regards,
saso

> On 6 May 2019, at 21:01, Kees Meijs  wrote:
> 
> Hi Saso,
> 
> Thank you very much. OVS is new in the mix as well (we're not replacing 
> Quagga alone). Obviously we didn't expect this to happen.
> 
> I'll see if patching OVS in Debian in a similar way works for us or if 
> another approach fits better (i.e. maybe not using OVS at all).
> 
> If you know of a better, more upgrade-and-maintenance-proof solution, I 
> would welcome more information.
> 
> Regards,
> Kees
> 
> On 06-05-19 20:40, Saso Tavcar wrote:
>> this is an OVS issue, already discussed:
>> 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043007.html
>> ...
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043063.html
>> 
>> Official OVS quote:
>> > We'd accept patches to improve OVS's routing table code.  It's not
>> > designed to scale to 1,800,000 routes.  We'd also take code to suppress
>> > the routing table code in cases where it isn't actually needed, since
>> > it's not always needed.  But we can't take a patch to just delete it;
>> > I'm sure you understand.
>> I tried to apply this patch at the time, but it was already useless for newer 
>> versions:
>> 
>> https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin
>> 
>> Our workaround was to scale the VM to 3 vCPUs, since our average system load 
>> is 1.5 for BGP.
>> 
>> You can see what is happening:
>> 
> 



Re: 100% CPU load with device scanning enabled

2019-05-06 Thread Saso Tavcar
Hi,

this is an OVS issue, already discussed:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043007.html 

...
https://mail.openvswitch.org/pipermail/ovs-discuss/2016-November/043063.html 


Official OVS quote:
> We'd accept patches to improve OVS's routing table code.  It's not
> designed to scale to 1,800,000 routes.  We'd also take code to suppress
> the routing table code in cases where it isn't actually needed, since
> it's not always needed.  But we can't take a patch to just delete it;
> I'm sure you understand.
I tried to apply this patch at the time, but it was already useless for newer
versions:

https://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20161123/5379b333/attachment.bin
 


Our workaround was to scale the VM to 3 vCPUs, since our average system load is
1.5 for BGP.

You can see what is happening:

[root@bgp1 ~]# top
...
  PID USER  PR  NI    VIRT    RES   SHR S  %CPU  %MEM     TIME+ COMMAND
  654 root  10 -10 1284492   1.0g 20276 R  98.0  27.0   2513:01 ovs-vswitchd
   16 root  20   0       0      0     0 S   2.0   0.0  24:45.60 ksoftirqd/1

[root@bgp1 ~]# ip route show
...
1.0.0.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.4.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.4.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
1.0.5.0/24 via 89.212.47.185 dev t2-v24-ha proto bird


Routes are constantly being added and deleted:

[root@bgp1 ~]# ip monitor
...
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
2620:11d:6000::/42 via 2a01:260:1021::1 dev t2-v26-ha proto bird metric 1024 pref medium
Deleted 68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
68.69.37.0/24 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
103.115.180.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 2.16.70.0/23 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 88.221.28.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 23.50.188.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 92.122.68.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 88.221.100.0/22 via 89.212.47.185 dev t2-v24-ha proto bird 
Deleted 92.123.208.0/22 via 89.212.47.185 dev t2-v24-ha proto bird
. 



Regards,
saso

> On 6 May 2019, at 19:30, Kees Meijs <k...@nefos.nl> wrote:
> 
> Hi list,
> 
> We're in the process of replacing Quagga with BIRD but stumble upon a
> little problem.
> 
> When device scanning is on (obviously default) our testing machine
> completely fills up a CPU core. The culprit isn't BIRD itself but an
> Open vSwitch daemon.
> 
> After disabling the device protocol and restarting BIRD, everything goes
> back to its quiet state.
> 
> BIRD (1.6.3-2) and Open vSwitch (2.6.2~pre+git20161223-3) both were
> installed as Debian stable packages.
> 
> The configuration is as simple as:
> 
>> # This is a minimal configuration file, which allows the bird daemon to start
>> # but will not cause anything else to happen.
>> #
>> # Please refer to the documentation in the bird-doc package or BIRD User's
>> # Guide on http://bird.network.cz/ for more information on configuring BIRD and
>> # adding routing protocols.
>> 
>> # Change this into your BIRD router ID. It's a world-wide unique identification
>> # of your router, usually one of router's IPv4 addresses.
>> router id 1.2.3.4;
>> 
>> # The Device protocol is not a real routing protocol. It doesn't generate any
>> # routes and it only serves as a module for getting information about network
>> # interfaces from the kernel.
>> protocol device {
>> }
>> 
>> # The Kernel protocol is not a real routing protocol. Instead of communicating
>> # with other routers in the network, it performs synchronization of BIRD's
>> # routing tables with the OS kernel.
>> protocol kernel {
>> metric 64;    # Use explicit kernel route metric to avoid collision