Dear all,

It seems that I'm facing a really strange behaviour in the tests I am running.
I'm afraid this will be quite a long mail, so please be brave :)

First, here is the context:
- we have been using Quagga for a few years now (since 2013/2014) for the IP transit of one of our datacenters.
- for each Quagga version, we compile it with only two customizations: 1) SNMP enabled, 2) BGP_ATTR_DEFAULT_WEIGHT set to 0 in bgpd/bgp_attr.h (so that the weight mechanism is not used at all).
- we plan to upgrade our routers from Debian Jessie to Debian Stretch, which is why I ran some tests in a DEV environment (based on VirtualBox).

To investigate this possible issue, I built a lab environment (also using VirtualBox) where the configuration files are simpler while still reflecting my PROD environment in some way.

Here is my DEV setup:
- two VBOX for my (one) ISP: ISP_R1 and ISP_R2, each of them connected to the "world" (Internet)
-- one iBGP session established between them.
- two VBOX for my routers: R1 and R2
-- one iBGP session established between them,
-- R1 has an eBGP session to ISP_R1,
-- R2 has an eBGP session to ISP_R2.
- one VBOX behind my routers, only used as an IP to test.

Schematically, the setup looks like this:

-------------------------"INTERNET"---------------------------
...................|.....10.0.0.254.......|...................
...................|......................|...................
...................|....10.0.0.1 (VRRP)...|...................
(192.168.200.1) ISP_R1------------------ISP_R2 (192.168.200.2)
............172.16.1.254...........172.16.2.254...............
...................|......................|...................
............172.16.1.1...............172.16.2.1...............
...................R1---------------------R2..................
...................|192.168.255.254 (VRRP)|...................
...................|......................|...................
........................192.168.255.1.........................
............................SERVER............................

Here are the Quagga configuration files:
ISP_R1 : https://pastebin.com/VpstkLVj
ISP_R2 : https://pastebin.com/iiRXS3iS
R1 : https://pastebin.com/WyB1Gm6c
R2 : https://pastebin.com/THDKZbnw

You will see that these configurations are quite simple (please remember that they are lab configurations; they are not optimized/ideal). Basically:
- ISP_R1 and ISP_R2 send only a default route to their peers (R1 and R2),
- these default routes are also configured as static routes in the R1 and R2 configurations,
- R1 is preferred for inbound and outbound traffic (using local-pref and metric).

In a normal situation, here is what the routes look like on the routers:
- ISP_R1: 192.168.255.0/24 via 172.16.1.1 dev eth4 proto zebra metric 20
- ISP_R2: 192.168.255.0/24 via 192.168.100.1 dev eth0 proto zebra metric 20
- R1 : default via 172.16.1.254 dev eth4 proto zebra metric 20
- R2 : default via 192.168.200.1 dev eth0 proto zebra metric 20

Now here is my issue: if I force the link between R1 and ISP_R1 to go down (with "vboxmanage controlvm R1 setlinkstate5 off"), the default route of R1 doesn't switch to R2! The routes then look like this:
- ISP_R1: 192.168.255.0/24 via 192.168.100.2 dev eth0 proto zebra metric 20
- ISP_R2: 192.168.255.0/24 via 172.16.2.1 dev eth4 proto zebra metric 20
- R1 : -- no default route anymore --
- R2 : default via 192.168.200.1 dev eth0 proto zebra metric 20

R1 no longer has a default route but still advertises one to R2:

R1# show ip bgp neighbors 192.168.56.3 advertised-routes
   Network          Next Hop            Metric LocPrf Weight Path
*> 0.0.0.0          192.168.56.2             0    200      0 i
*> 192.168.56.2/32  192.168.56.2             0    100      0 i
*> 192.168.56.3/32  192.168.56.2             0    100      0 i

So incoming traffic switches over and is OK, while outgoing traffic doesn't work anymore.

At this point the issue could be in Quagga's configuration, but here is the tricky part: if I restart Quagga on R1, the routes are updated correctly!
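
For anyone who wants to check for this state automatically while toggling the link, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (default_route, advertises_default, is_blackholing) are hypothetical, the input is assumed to be the plain text printed by "ip route show" and by the "show ip bgp neighbors ... advertised-routes" command quoted above, and the parsing only covers the line formats that appear in this mail.

```python
import re
from typing import Optional, Tuple

# Matches kernel route lines such as:
#   "default via 172.16.1.254 dev eth4 proto zebra metric 20"
_DEFAULT_RE = re.compile(r"^default\s+via\s+(\S+)\s+dev\s+(\S+)")

# Matches an advertised-routes line whose prefix is 0.0.0.0, e.g.
#   "*> 0.0.0.0  192.168.56.2  0  200  0 i"
# Assumption: the line starts with "*" followed by optional status letters.
_ADVERTISED_DEFAULT_RE = re.compile(
    r"^\s*\*[>=dhirsSR]*\s*0\.0\.0\.0(?:/0)?(?:\s|$)"
)


def default_route(ip_route_output: str) -> Optional[Tuple[str, str]]:
    """Return (next_hop, device) of the first default route, or None."""
    for line in ip_route_output.splitlines():
        match = _DEFAULT_RE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


def advertises_default(bgp_output: str) -> bool:
    """True if the advertised-routes dump contains a 0.0.0.0 entry."""
    return any(
        _ADVERTISED_DEFAULT_RE.match(line) for line in bgp_output.splitlines()
    )


def is_blackholing(ip_route_output: str, bgp_output: str) -> bool:
    """Symptom from this mail: default still advertised, but absent locally."""
    return advertises_default(bgp_output) and default_route(ip_route_output) is None
```

The idea is to capture both outputs on R1 after each link change and call is_blackholing() on them; a True result corresponds to the situation described above, where R1 has lost its default route but still advertises one to R2.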

After restarting Quagga on R1, here are the results:
- ISP_R1: 192.168.255.0/24 via 192.168.100.2 dev eth0 proto zebra metric 20
- ISP_R2: 192.168.255.0/24 via 172.16.2.1 dev eth4 proto zebra metric 20
- R1 : default via 192.168.200.2 dev eth0 proto zebra metric 20
- R2 : default via 172.16.2.254 dev eth4 proto zebra metric 20

And now the funny part: from now on (once Quagga has been restarted while the link between ISP_R1 and R1 was DOWN), I can bring the port UP and DOWN again and again, and the routes are always applied/removed correctly!

If I restart Quagga while the link is UP and then bring it DOWN, the same issue happens: the default route disappears on R1.

It looks like Quagga doesn't see the correct Ethernet link state despite "link-detect" being enabled on the interface.

A few final words:
- I'm not always able to reproduce it, but most of the time I am (more than 90% of my attempts).
- This behavior doesn't seem to appear if the static default routes are not set in the Quagga configurations on R1 and R2.
- I'm not able to try this on physical servers, so it might be related to my VirtualBox setup in some way.
- I reproduced this issue with Quagga 1.1.1 and 1.2.2 on Debian Jessie and with Quagga 1.2.2 on Stretch (I didn't test 1.1.1 on Stretch).
- The recent fork of Quagga doesn't have this behavior.
- Please note that, apart from this behavior, I don't have any other issue (for example, I can move traffic to R2 using local-pref and metric without any problem).

I really don't think that this is a configuration issue, but I guess everyone thinks this way when they find an issue :)

I know my English is quite bad, so please let me know if I need to clarify something.

Thanks!

Thomas
_______________________________________________
Quagga-users mailing list
Quagga-users@lists.quagga.net
https://lists.quagga.net/mailman/listinfo/quagga-users