Hello all,

We are trying to meet a requirement: actively test the viability of our paths to the internet and tear down the BGP session on any path that is determined to be faulty. We recently had an incident where we lost neither link nor BGP, but the carrier lost the ability to route our traffic to the internet. Our existing automatic detection and remediation strategies did not catch this condition, and we lost customer packets.
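
To make the requirement concrete, here is a minimal sketch of the decision logic such a check might use. It is only an illustration under stated assumptions, not a tested solution: the `PathMonitor` class, the thresholds, the probe callback, and the `bgp_commands` helper are all hypothetical names, Python 3.7+ is assumed, and the config lines are EOS-style syntax that would need verifying on the actual platform. The idea is to require several consecutive failed probe rounds before declaring a path down, and several good rounds before restoring it, so a single lost probe does not flap the session.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PathMonitor:
    """Tracks the health of one transit path from repeated probe rounds."""
    targets: List[str]               # hypothetical probe destinations
    probe: Callable[[str], bool]     # returns True if the target answered
    min_reachable: int = 1           # targets that must answer per round
    fail_threshold: int = 3          # consecutive bad rounds before "down"
    recover_threshold: int = 5       # consecutive good rounds before "up"
    healthy: bool = True
    _bad: int = 0
    _good: int = 0

    def run_round(self) -> bool:
        """Probe every target once; return True if the health state changed."""
        reachable = sum(1 for t in self.targets if self.probe(t))
        if reachable >= self.min_reachable:
            self._good += 1
            self._bad = 0
        else:
            self._bad += 1
            self._good = 0
        if self.healthy and self._bad >= self.fail_threshold:
            self.healthy = False
            return True
        if not self.healthy and self._good >= self.recover_threshold:
            self.healthy = True
            return True
        return False


def bgp_commands(asn: int, neighbor: str, healthy: bool) -> List[str]:
    """Config lines to shut or restore a neighbor (EOS-style syntax, assumed)."""
    action = "no neighbor {} shutdown" if healthy else "neighbor {} shutdown"
    return ["configure", "router bgp {}".format(asn),
            action.format(neighbor), "end"]
```

The probe itself is left as a callback on purpose: it would need to source traffic so that it actually leaves via the circuit under test, and how to do that (and how to apply the resulting commands on the switch) is exactly the part we are unsure about.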

Conceptually, we have a pair of DCS7050-QX switches landing a fiber each from two ISPs, with default routes learned over BGP, at a dozen POPs around the US. One of the ISPs is our primary transit; the other is predominantly for peered customers, but we can use it for transit during issues with the primary circuits.

From some research, the best option available to us seems to be an on-boot event handler that launches a Python daemon, which actively probes out each ISP circuit and makes config changes in response to transit failures. However, I thought I'd reach out to the broader community to see whether anyone knows a better way to solve this, has an example script, or can recommend methods of active monitoring to protect against this sort of failure.

Thanks in advance for any insight and time.

Alex Buie
Senior Cloud Operations Engineer
450 Century Pkwy # 100, Allen, TX 75013
D: 469-884-0225 | www.cytracom.com