Hi Remi,

I've managed to reproduce the problem reliably.

So the problem is basically related to the MPATH flag.


The first thing I did was force the MPATH flag to appear.

To do this, I made both Ubuntu hosts advertise the route
simultaneously.


The first issue appears at this point: when I remove the prefix
announcement from the second host, its entry is cleared but the MPATH
flag is not removed from the remaining route.


starting point:


root@fw1:~# route -n get 10.250.250.153

   route to: 10.250.250.153
destination: 10.250.250.153
       mask: 255.255.255.255
    gateway: 10.10.53.28
  interface: vlan1353
 if address: 10.10.53.26
   priority: 32 (ospf)
      flags: <UP,GATEWAY,DONE>
     use       mtu    expire
  474393         0         0
root@fw1:~#


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UG         0   474399     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0    35241     -    48 vlan1150


After forcing the second host to advertise the route as well:


root@fw1:~# route -n get 10.250.250.153
   route to: 10.250.250.153
destination: 10.250.250.153
       mask: 255.255.255.255
    gateway: 10.10.53.28
  interface: vlan1353
 if address: 10.10.53.26
   priority: 32 (ospf)
      flags: <UP,GATEWAY,DONE,MPATH>
     use       mtu    expire
  474443         0         0
root@fw1:~#


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0   474706     -    32 vlan1353
10.250.250.153/32  10.10.53.29        UGP        0        0     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0    35241     -    48 vlan1150
root@fw1:~#


After removing the route announcement from the second host, we get only
one OSPF path again, but the MPATH flag (P) remains:


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0   474761     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0    35241     -    48 vlan1150
root@fw1:~#
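
In case it helps with reproducing, here is a minimal sketch of how this
stale-flag state could be spotted from a script. The function name
stale_mpath is made up for illustration, and the sketch assumes the
usual column order of "route -n show" (destination, gateway, flags,
refs, use, mtu, prio, iface) with each route on a single line. It only
parses text on stdin and reports any prefix/priority pair where a route
carries the P flag although it is the only route at that priority.

```shell
#!/bin/sh
# Hypothetical helper (not part of any OpenBSD tool).
# Reads `route -n show` output on stdin.
# Assumed columns: destination gateway flags refs use mtu prio iface
stale_mpath() {
    awk '
        NF >= 7 {
            key = $1 " prio " $7
            total[key]++                    # routes seen for this prefix/priority
            if ($3 ~ /P/) flagged[key]++    # routes carrying the MPATH (P) flag
        }
        END {
            # A P flag on a prefix that has a single route at that
            # priority is the leftover state described above.
            for (key in flagged)
                if (total[key] < 2)
                    print "stale MPATH: " key
        }
    '
}

# Intended use on the firewall (not run here):
#   route -n show | stale_mpath
```

With the output above it should report the /32 at priority 32, and stay
silent while both OSPF paths are really installed.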


At this point, I shut down the first host's interface. OSPF converges,
but the FIB doesn't.


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0   475285     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0    35241     -    48 vlan1150
root@fw1:~#
root@fw1:~# ospfctl sho rib | grep 10.250.250.153
10.250.250.153/32    10.10.53.29       Intra-Area   Network   110    00:00:56
root@fw1:~#


Service is down at this point; the address no longer answers ping.
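
The mismatch above (ospfd's RIB says 10.10.53.29, the kernel still has
10.10.53.28) could be checked mechanically. The following is only a
sketch: rib_nexthop, fib_nexthop and in_sync are names I made up, the
column positions are taken from the outputs pasted in this mail, and
priority 32 is assumed to be the OSPF priority as shown by "route get".

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text, they do not call route or ospfctl.

# Print the next hop(s) for prefix $1 from `ospfctl show rib` output on stdin.
rib_nexthop() {
    awk -v p="$1" '$1 == p { print $2 }'
}

# Print the gateway(s) for prefix $1 at priority $2 from `route -n show`
# output on stdin (assumed columns: dest gateway flags refs use mtu prio iface).
fib_nexthop() {
    awk -v p="$1" -v prio="$2" '$1 == p && $7 == prio { print $2 }'
}

# in_sync PREFIX RIB_TEXT FIB_TEXT
# Succeeds only if the RIB has a next hop for PREFIX and the priority-32
# FIB entries carry exactly the same set of next hops.
in_sync() {
    rib=$(printf '%s\n' "$2" | rib_nexthop "$1" | sort)
    fib=$(printf '%s\n' "$3" | fib_nexthop "$1" 32 | sort)
    [ -n "$rib" ] && [ "$rib" = "$fib" ]
}

# Intended use on the firewall (not run here):
#   in_sync 10.250.250.153/32 "$(ospfctl show rib)" "$(route -n show)" \
#       || echo "FIB does not match ospfd RIB"
```
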

The fix is the same as before:


root@fw1:~# ospfctl fib reload
reload request sent.
root@fw1:~#


root@fw1:~# ospfctl sho rib | grep 10.250.250.153
10.250.250.153/32    10.10.53.29       Intra-Area   Network   110    00:00:07
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.11.155       UG         0       63     -    32 vlan1150
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~#
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      144     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~#


Since I was issuing the commands repeatedly, I was able to catch another
bad behaviour: note that the OSPF FIB entry (priority 32) is temporarily
populated with the next hop from the BGP entry (priority 48).

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.11.155       UG         0       63     -    32 vlan1150   <-- !!!!
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~#


But service is restored at this point.
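
Because that transient entry is easy to miss by hand, here is a small
sketch of a filter that flags it. The name borrowed_nexthop is
hypothetical, priorities 32 and 48 are assumed to be OSPF and BGP as in
the outputs above, and the same single-line "route -n show" column
layout is assumed. It prints every gateway that appears on both the
priority-32 and the priority-48 route for the given prefix.

```shell
#!/bin/sh
# Hypothetical helper; reads `route -n show` output on stdin.
# $1 is the prefix to inspect, e.g. 10.250.250.153/32.
# Assumed columns: destination gateway flags refs use mtu prio iface
borrowed_nexthop() {
    awk -v p="$1" '
        $1 == p && $7 == 32 { ospf[$2] = 1 }   # gateways on the OSPF-priority route
        $1 == p && $7 == 48 { bgp[$2] = 1 }    # gateways on the BGP-priority route
        END {
            for (gw in ospf)
                if (gw in bgp)
                    print gw
        }
    '
}

# Intended use on the firewall while re-converging (not run here):
#   route -n show | borrowed_nexthop 10.250.250.153/32
```
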



I then brought the first host's interface back up; it re-converged
normally and service did not break.


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      870     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      879     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UG         0     1147     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~#



I then repeated the process and it all happened again.


Service normal:

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UG         0     1158     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150

Advertised from both hosts:

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1259     -    32 vlan1353
10.250.250.153/32  10.10.53.29        UGP        0        0     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1270     -    32 vlan1353
10.250.250.153/32  10.10.53.29        UGP        0        0     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150


Advertised from the first host only again:


root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1292     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1297     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1301     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150

Shut down the first host's interface:

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1351     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# ospfctl sho rib | grep 10.250.250.153
10.250.250.153/32    10.10.53.29       Intra-Area   Network   110    00:00:11
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UGP        0     1438     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150

Manually restore service:

root@fw1:~# ospfctl fib reload
reload request sent.
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0       40     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0       60     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      118     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150

Service restored. Brought the first host's interface up again:

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      131     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.29        UG         0      133     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150

Strange next-hop copy again:

root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.11.155       UG         0      242     -    32 vlan1150
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UG         0      377     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~# route -n show | grep 10.250.250.153
10.250.250.153/32  10.10.53.28        UG         0      404     -    32 vlan1353
10.250.250.153/32  10.10.11.155       UG         0        0     -    48 vlan1150
root@fw1:~#

Service is OK, but there was a short outage because of the temporary
next hop.


I hope this lets you reproduce it on your side.


Let me know if you make any progress.


Thank you for your help.


Best regards,

João Alves

