Andy Cole via NANOG wrote on 26/09/2025 04:21:
No configuration changes to routing policy at all. After a few days we
started to get customer complaints for certain sites/domains being
unreachable. I worked around the issue by not announcing the customer
blocks to the route servers and changed the return path to traverse
transit. This solved the issue, but I'm perplexed as to what could've
caused the issue, and where to look to resolve it. If you guys could
provide feedback and point me in the right direction I'd appreciate it. TIA.

If this was confirmed working before the upgrade to 2x10G, then that's
useful data.
The starting point here would be to check both 10G bearer circuits for
errors and discards. Dallas-IX is using IXP Manager so you should be
able to log in and check for discards and errors on both ports at the
remote side in addition to checking the same on your local router (or
switch).
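
As a rough sketch of what "check for errors and discards" means in practice: take two snapshots of the per-port counters a few minutes apart and look at which bearer is incrementing. The port names, counter names and values below are made up for illustration; they are not from IXP Manager or any particular router, and counter wrap is ignored.

```python
from typing import Dict, List

# Hypothetical snapshot layout: {port_name: {counter_name: value}}.
Snapshot = Dict[str, Dict[str, int]]


def counter_deltas(before: Snapshot, after: Snapshot) -> Snapshot:
    """Return how much each counter grew on each port between two polls."""
    deltas: Snapshot = {}
    for port, counters in after.items():
        previous = before.get(port, {})
        deltas[port] = {
            name: value - previous.get(name, 0)
            for name, value in counters.items()
        }
    return deltas


def suspect_ports(deltas: Snapshot) -> List[str]:
    """List ports where any error or discard counter increased."""
    return sorted(
        port for port, counters in deltas.items()
        if any(delta > 0 for delta in counters.values())
    )


# Made-up example values for the two 10G bearers.
before = {
    "port-a": {"in_errors": 10, "out_discards": 0},
    "port-b": {"in_errors": 3, "out_discards": 7},
}
after = {
    "port-a": {"in_errors": 10, "out_discards": 0},
    "port-b": {"in_errors": 250, "out_discards": 7},
}
print(suspect_ports(counter_deltas(before, after)))  # ['port-b']
```

The absolute counter values matter less than whether one bearer keeps climbing while the other stays flat, so run the same comparison on both your side and the IXP side.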
If it's not traffic being dropped on the link, then it could be an issue
relating to the hashing algorithm on one side of the LAG or the other. Try
to get a repeat case with specific traffic, and then bring this up with the
Dallas IX people. Is traffic using both links? Is either of them
filling up? Does the problem go away if you disable one link, or the other?
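
To illustrate why a LAG problem tends to show up as "certain sites unreachable" rather than a total outage, here is a toy model of per-flow hashing. It assumes a hash over the 5-tuple and uses CRC32 as a stand-in; real hardware uses vendor-specific hash functions and inputs, so treat `member_for` and the addresses as hypothetical.

```python
import zlib
from typing import List, NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def member_for(flow: Flow, n_links: int = 2) -> int:
    """Pick a LAG member index from a hash of the flow 5-tuple.

    CRC32 is only a stand-in for whatever the switch or router really uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_links


def flows_on_link(flows: List[Flow], link: int, n_links: int = 2) -> List[Flow]:
    """Return the flows that would be pinned to one member link."""
    return [f for f in flows if member_for(f, n_links) == link]


# Documentation-range addresses, purely illustrative.
flows = [
    Flow("192.0.2.10", "198.51.100.20", 6, 40000 + i, 443)
    for i in range(8)
]
for f in flows:
    print(f.src_port, "-> member", member_for(f))
```

Every packet of a given flow lands on the same member, so if one bearer is faulty only the flows hashed onto it break. That is why disabling one link at a time, as suggested above, is a quick way to narrow it down.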
Make sure to rule out MTU problems in each bearer link too.
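
For the MTU check, the usual approach is a ping with fragmentation disallowed and a payload sized to exactly fill the MTU. The arithmetic is sketched below, assuming an IPv4 header without options and the fixed IPv6 header; `max_ping_payload` is just an illustrative helper name.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header, no extension headers
ICMP_HEADER = 8   # bytes, echo request header (ICMP and ICMPv6)


def max_ping_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest echo payload that fits in one unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_HEADER


print(max_ping_payload(1500))             # 1472
print(max_ping_payload(1500, ipv6=True))  # 1452
print(max_ping_payload(9000))             # 8972
```

On Linux with iputils that would be something like `ping -M do -s 1472 <target>`; other platforms spell the don't-fragment option differently. Bear in mind that a single ping flow may only ever hash onto one bearer, so test from more than one source or to more than one destination.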
Also, be sure to rule out IPv6 routing. Sometimes web pages don't load
properly because some of the assets are delivered over IPv6. Because
IPv6 isn't as well monitored as IPv4 in general (cue outrage) and
because everyone starts diagnostics with tools which default to
IPv4, this can sometimes slip under the radar.
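
One way to avoid the IPv4-by-default trap is to test each address family explicitly. A minimal sketch using only the Python standard library follows; the helper names and the 3 second timeout are arbitrary choices, and a TCP connect only proves reachability of that port, not that the page loads.

```python
import socket


def can_connect(host: str, port: int, family: int, timeout: float = 3.0) -> bool:
    """Try a TCP connect to host:port using only the given address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No usable address for this family, or the lookup itself failed.
        return False
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next address, if any
    return False


def compare_families(host: str, port: int = 443) -> dict:
    """Report reachability separately for IPv4 and IPv6."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

Calling `compare_families()` with one of the affected hostnames from a host behind the customer blocks, once with the route server announcements in place and once via transit, should show whether the breakage is limited to one address family.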
Nick