> But I'd be very surprised if the router is acting as anything more
> than a network-layer device. It might perhaps have some soft connection
> state being used for generating accounting records. Being Cisco
> it's probably a switch-router, so it might carry some per-port hard
> state for validating source IP addresses and ARPs on each port.
>
> The firewall is much more likely to be carrying per-flow Sack
> state. The Cisco PIX had a bug with SACK handling (CSCse14419,
> fixed in 7.0(7), 7.1(2.34), 7.2(2.2), 8.0(0.141) but perhaps it
> has regressed). A simple trace either side of the firewall will
> show the inconsistency between the TCP sequence number (which
> gets randomised) and the Sack sequence number (which didn't).
> You could disable the TCP Sequence Number Randomisation feature
> and see if the fault recurs.
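To make the suspected PIX failure concrete, here is a minimal C model under stated assumptions: the firewall's sequence number randomisation adds a fixed per-connection offset to the inside host's sequence numbers, and on the return path it translates the cumulative ACK back but, in the buggy case, leaves the SACK block edges in the randomised space. The names (firewall_return_path, sack_block_plausible) and the offset value are hypothetical, and the plausibility check is a simplified stand-in for the sender's SACK validation, not the Linux kernel's exact logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-around-safe comparison of 32-bit TCP sequence numbers,
 * in the same spirit as the kernel's before()/after() helpers. */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b) { return seq_before(b, a); }

struct sack_block {
    uint32_t start;  /* first sequence number covered */
    uint32_t end;    /* one past the last sequence number covered */
};

/* Hypothetical model of the firewall's return path: the cumulative ACK
 * is always translated back by the randomisation offset, but the SACK
 * edges are only translated when rewrite_sack is true (the fixed case). */
static void firewall_return_path(uint32_t offset, uint32_t *ack,
                                 struct sack_block *sb, bool rewrite_sack)
{
    *ack -= offset;
    if (rewrite_sack) {
        sb->start -= offset;
        sb->end -= offset;
    }
}

/* Simplified sender-side sanity check (an assumption, not the kernel's
 * exact rule): a SACK block must be non-empty, must not start before
 * snd_una, and must not extend past snd_nxt. */
static bool sack_block_plausible(const struct sack_block *sb,
                                 uint32_t snd_una, uint32_t snd_nxt)
{
    return seq_before(sb->start, sb->end) &&
           !seq_before(sb->start, snd_una) &&
           !seq_after(sb->end, snd_nxt);
}
```

For example, with an offset of 0x40000000, a sender that has sent up to 5000, an ACK of 2000 and a SACK block of 3000-4000 in the host's own sequence space, the block passes the check when the firewall rewrites it and fails when it does not, because its edges land far beyond snd_nxt. That mismatch is what a trace either side of the firewall would show.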
I do have TCP Sequence # Randomization enabled on my router. However, if this were causing the issue, wouldn't it always occur and cause connection problems, rather than appearing only after 38 hours of correct operation? I can look into turning it off, but I'll likely have to jump through several hoops, which will be challenging without a clear, definitive reason why it causes this issue. Plus, I've had this problem with at least 2 other sets of network switches over the past 4 years.

I'm actually running 7.0(6), which doesn't have the fix you mentioned. If it really is possible that this bug wouldn't cause problems immediately, but only after hours of successful operation, then I could probably justify the upgrade.

I can try to set up a trace, but this is a lot of work for other people in my organization, so it will take quite some time.

> You should probably also investigate the Linux kernel,
> especially the size and locks of the components of the Sack data
> structures and what happens to those data structures after Sack is
> disabled (presumably the Sack data structure is in some unhappy
> circumstance, and disabling Sack allows the data to be discarded,
> magically unclogging the box).
>
> In the absence of the reporter wanting to dump the kernel's
> core, how about a patch to print the Sack data structure when
> the command to disable Sack is received by the kernel?
> Maybe just print the last 16b of the IP address?

Given that I've had this problem for so long, across a variety of networking hardware vendors and colo facilities, this sounds really good to me. It will be challenging for me to justify a kernel core dump, but a simple patch to dump the Sack data would be doable.
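If someone does write that debugging patch, the suggested output (the SACK blocks plus only the last 16 bits of the peer's IP address) could be formatted along the lines of the user-space sketch below. It is only a formatting sketch: struct sack_snapshot and both functions are hypothetical, nothing here hooks the net.ipv4.tcp_sack sysctl, and a real patch would read the SACK state from the kernel's own socket structures and print it with printk().

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-connection snapshot; the field names are illustrative
 * and are NOT the layout of the kernel's struct tcp_sock. */
struct sack_snapshot {
    uint32_t daddr;        /* peer IPv4 address, network byte order */
    unsigned num_sacks;    /* number of valid entries in blocks[] */
    struct { uint32_t start, end; } blocks[4];
};

/* Print only the low 16 bits of the address as "x.y",
 * e.g. "10.42" for 192.168.10.42. */
static int format_addr_low16(uint32_t daddr_be, char *buf, size_t len)
{
    uint32_t h = ntohl(daddr_be);
    return snprintf(buf, len, "%u.%u", (unsigned)((h >> 8) & 0xffu),
                    (unsigned)(h & 0xffu));
}

/* Append each SACK block as " [start-end]" after the address.
 * Returns a negative value on error; if buf is too small the output
 * is truncated (and NUL-terminated when len > 0). */
static int format_sack_snapshot(const struct sack_snapshot *s,
                                char *buf, size_t len)
{
    int n = format_addr_low16(s->daddr, buf, len);
    for (unsigned i = 0; i < s->num_sacks && i < 4; i++) {
        if (n < 0 || (size_t)n >= len)
            break;  /* error, or buffer already full */
        int m = snprintf(buf + n, len - (size_t)n, " [%u-%u]",
                         (unsigned)s->blocks[i].start,
                         (unsigned)s->blocks[i].end);
        if (m < 0)
            return m;
        n += m;
    }
    return n;
}
```

For a peer at 192.168.10.42 with one SACK block covering 3000-4000, this produces the line "10.42 [3000-4000]", which keeps the full address out of the log while still distinguishing connections.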