Hi Everyone,
Quick synopsis of our network: multiple POPs, all connected via various 3rd-party carriers, who all use differing MTUs that can also "change" unexpectedly (unavoidable, unfortunately). Hence we have a few options: disable transport path-mtu-discovery and run with the small 536-byte default, or try setting a larger MTU and hope the inter-POP links' MTU doesn't drop below it, or use a "dynamic" approach, a la transport path-mtu-discovery.

Faced an unusual issue last night: 2 ME3600s, connected together and to an ASR1006 (POPD), peering with 2 RRs (ASR1Ks). Both had transport path-mtu-discovery enabled and had happily peered with the 2 RRs for ~50 weeks. Last night, one of our engineers attempted to peer with an MS/360 on "ME01", and caused the peering sessions from this ME to the RRs to flap. I assume transport path-mtu-discovery was then triggered to re-calc the optimum MTU to the RRs (this is one piece of info I'm not sure on: when does transport path-mtu-discovery actually calculate the MTU, and what are the triggers for it to re-calc?). Anyway, the value it ended up with was too large, and the BGP sessions to the 2 RRs would establish for 3 minutes, fail, then re-establish ~9 seconds later. Disabling transport path-mtu-discovery "fixed" this.

The thing that concerns/confuses me about transport path-mtu-discovery (and whether it is simply unreliable to use on the MEs in our network) is that on the 2 MEs, both with the same path to the 2 RRs, it came up with 2 completely different MTU sizes:

PE01 (when it failed) - 1954 bytes
PE02 (which it is apparently still using) - 2936 bytes

Now, ping tests from both these MEs show that 2936 bytes is absolutely not achievable (where it got this number, I don't know)... but BGP is still up and running, and has been for 50 weeks... so it can't be using this MTU size?
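For what it's worth, here's roughly what I'm planning to check/try next; a sketch only, with placeholder RR address and AS number, and assuming the IOS releases on the MEs support both knobs:

```
! Placeholders: xxx.xxx.xxx.213 = RR address, 65000 = our AS.
!
! 1) Check the MSS the BGP TCP session is *actually* using -- the
!    "max data segment" value should be the discovered path MTU minus
!    40 bytes of IP+TCP headers, so a stale 2936-byte path MTU on PE02
!    would show up here as roughly 2896 bytes:
PE02# show ip bgp neighbors xxx.xxx.xxx.213 | include max data segment

! 2) Possible middle ground instead of falling all the way back to
!    536 bytes: disable discovery per-neighbor on the flaky paths,
!    and/or clamp the TCP MSS so even a wrong discovery can't exceed
!    what the 1552-byte path can carry (1552 - 40 = 1512):
router bgp 65000
 neighbor xxx.xxx.xxx.213 transport path-mtu-discovery disable
!
ip tcp mss 1512
```

If the "max data segment" on PE02 really does read ~2896 bytes, that would at least confirm the stale discovered MTU is in play rather than something cosmetic.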
The max I can get through from both PEs is 1552 (output below from both MEs), so I'm guessing if PE02 should have any flap/issue, we will be hit with a similar issue to the one that occurred last night on PE01:

PE01-EQ-SY3-L1H500160-R1803-RU38#ping xxx.xxx.xxx.213 size 1552 df-bit
Type escape sequence to abort.
Sending 5, 1552-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/18/20 ms

PE01-EQ-SY3-L1H500160-R1803-RU38#ping xxx.xxx.xxx.213 size 1553 df-bit
Type escape sequence to abort.
Sending 5, 1553-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

PE02-EQ-SY3-L1H500160-R1803-RU37#ping xxx.xxx.xxx.213 size 1552 df-bit
Type escape sequence to abort.
Sending 5, 1552-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/19/20 ms

PE02-EQ-SY3-L1H500160-R1803-RU37#ping xxx.xxx.xxx.213 size 1553 df-bit
Type escape sequence to abort.
Sending 5, 1553-byte ICMP Echos to xxx.xxx.xxx.213, timeout is 2 seconds:
Packet sent with the DF bit set
.....
Success rate is 0 percent (0/5)

Any insight/recommendations are highly appreciated. As it stands now, I don't think we have any choice other than to completely remove transport path-mtu-discovery and run with the small 536-byte default. Not ideal, but I'm at a loss as to how transport path-mtu-discovery actually "works out" the MTU it decides on; from my limited experience with it, lol, it appears to pick a number at random (I know this can't be the case).

Cheers
_______________________________________________
cisco-nsp mailing list
cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/