Hi all,

Here is my latest update on this one, along with a workaround. It's not great, but it works for now until this bug is fixed.

To reproduce the problem, you only need to enable:

ip tcp selective-ack

on your Cisco router. As soon as you clear, from the Cisco side, a BGP session that is set up with MD5 toward your OpenBSD box, it never comes back to life, regardless of the OpenBSD version, and even on -current. The only way to recover is to clear the session on the Cisco and, while it sits in Idle, to clear it from the OpenBSD side as well. Then, and only then, will the session come back up.

However, you will still see a LOT of error messages in your logs regarding this MD5 session. They don't go away until a reload is done, which is neither friendly nor practical on a busy network.

*** This bug ONLY shows up when MD5 is configured WITH ip tcp selective-ack ***

Without MD5, it works very well, thank you! Maybe the same bug is there and just doesn't affect the session. That's possible, but I don't know. My tests haven't shown that to be true so far anyway.

I have been looking at the code for a few days, and I have to admit I get lost at times trying to follow it. It looks to me like the problem is in either tcp_input.c or tcp_output.c, most likely in tcp_input.c, in the section that processes a reset received from the remote end. It also only happens when "TCP_SIGNATURE" is enabled, so I would assume the cause is in code common to the two, but that's just a guess for now. Looking at the TCP standard from September 1981 (RFC 793), pages 65 to 73, on how the processing should be done, it looks like it might be there, but I still haven't fully understood that yet. tcp_input.c follows that very strictly, but there has to be a step omitted someplace and I can't put my finger on it yet. One possibility is that the reply to the remote reset goes out as an ACK without the MD5 signature in the packet, but again, I'm not sure of that.
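To make it clearer what the remote end checks on every segment, here is a small sketch of how the MD5 signature is computed according to RFC 2385. This is only my illustration in Python for IPv4, not the kernel code; the function name tcp_md5_signature and its arguments are made up for the example. The point to notice is that the digest covers the pseudo-header, the fixed 20-byte TCP header with a zero checksum, the data and the key, but not the TCP options, so the SACK option itself is not part of what gets signed.

```python
import hashlib
import socket
import struct


def tcp_md5_signature(src_ip, dst_ip, tcp_header, payload, key):
    """Return the 16-byte RFC 2385 digest for one IPv4 TCP segment.

    tcp_header is the raw TCP header, options included.
    (Hypothetical helper for illustration, not taken from the OpenBSD source.)
    """
    # Header length in bytes comes from the data offset field.
    data_offset = (tcp_header[12] >> 4) * 4
    if data_offset < 20 or data_offset > len(tcp_header):
        raise ValueError("bad TCP data offset")

    # 1. Pseudo-header: source, destination, zero, protocol 6, TCP length.
    segment_len = data_offset + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, segment_len))

    # 2. Fixed TCP header only (options excluded), checksum taken as zero.
    fixed = bytearray(tcp_header[:20])
    fixed[16:18] = b"\x00\x00"

    # 3. Segment data, then 4. the shared key.
    return hashlib.md5(pseudo + bytes(fixed) + payload + key).digest()
```

If the guess above is right, the segment sent in reply to the reset would simply be missing the option that carries this digest, and the peer would drop it.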

Why? There is no problem setting up the session at the start. The problem only shows when a reset is received, at which point the remote end expects the ACK with MD5, doesn't get it, and stays stuck in FINWAIT1 forever. OpenBSD shows the session as connected, but the remote end shows OpenSent and stays there.
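If someone wants to check a packet capture for this, the idea is simply to look at which TCP options each segment carries. Below is a rough sketch in Python that walks the options of a raw TCP header and reports the option kinds found. The function names are mine and only for illustration. The option numbers are the standard ones: 4 is SACK permitted, 5 is SACK, and 19 is the MD5 signature.

```python
def tcp_option_kinds(tcp_header):
    """Return the list of TCP option kinds found in a raw TCP header."""
    data_offset = (tcp_header[12] >> 4) * 4
    options = tcp_header[20:data_offset]
    kinds = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # end of option list
            break
        if kind == 1:          # NOP, single byte of padding
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # malformed length, stop parsing
        kinds.append(kind)
        i += length
    return kinds


def lacks_md5_signature(tcp_header):
    """True when the segment carries no MD5 signature option (kind 19)."""
    return 19 not in tcp_option_kinds(tcp_header)
```

Running lacks_md5_signature() over the segments OpenBSD sends right after the reset would show whether one of them goes out unsigned, which is what I suspect but have not confirmed.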

The workaround I use for now is to compile a kernel with

option          TCP_SACK        # Selective Acknowledgements for TCP

disabled. Not great, I have to admit. But I do not control the remote end of multiple peers, and some of them may actually use the "ip tcp selective-ack" feature on their routers to get more efficiency out of them. I would be the one impacted, and I can't really see myself telling them not to use it because I have a bug on my side.

So, for now, I simply compile a kernel with TCP_SACK disabled. No selective acknowledgment is in use then, and none of the peer sessions with MD5 suffer from this bug.

So, if anyone is actually using BGPd on their network AND also uses MD5, I would recommend running a kernel without "TCP_SACK" enabled for now, unless you want your BGP sessions to go dead after a reset from the remote end and to need manual intervention on both sides to get them back up. If you are 100% sure that none of your peers use this feature, then you are home free and don't need to change anything!

Hope this helps some; it sure helped me. I got stuck with this one and lost a few hairs in the process. (;>


Maybe someone with a better understanding of the process, and especially of the tcp_input.c file, will find the reason for this. That would be great. If someone does find the problem, I would love, if I may ask and if time allows, to get a bit of feedback on how it was solved, as I would like to learn from it. I think I am getting close, but I can't put my finger on it yet. So any explanation would be greatly appreciated, if you would be so kind!

Regards,

Daniel
