Re: Strange MD5 Auth problem in BIRD 1.3.8

2013-10-15 Thread Ondrej Zajicek
On Thu, Oct 10, 2013 at 04:49:41PM -0500, Michael Vallaly wrote:
> Hi,
>
> I recently had an interesting problem surrounding socket option buffers
> and its use in Bird on Linux 3.6 which I hope someone could shed some
> light on.
>
> And after digging around in the Linux system it seems I was
> running out of socket option memory buffers (duh!).
> Thusly I was able to fix this by issuing:
>
> <snip>
> echo 40960 > /proc/sys/net/core/optmem_max  # Defaults to 20480
> </snip>
>
> Is this expected? Any insight on how to properly size the socket option
> memory buffers used by bird? Is this some sort of a socket buffer leak?

I have no idea. This seems like an internal problem of Linux kernel.
/proc/sys/net/core/optmem_max AFAIK specifies ancillary buffer size
*per socket*, and MD5sum uses just perhaps some 100s of bytes.
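
To put rough numbers on that estimate, the following Python sketch computes the size of the user-space `struct tcp_md5sig` argument and compares it with the 20480-byte default quoted above. The field layout is an assumption based on the Linux UAPI header (a 128-byte `sockaddr_storage`, two 16-bit fields, one 32-bit pad, an 80-byte key). The kernel's internal per-key record has a different size, and `keys_per_budget` assumes each record is charged in full against one socket's budget, so treat the result as an order-of-magnitude figure only.

```python
import struct

# Assumed layout of struct tcp_md5sig (Linux UAPI):
#   struct sockaddr_storage tcpm_addr;        128 bytes
#   __u16 pad; __u16 tcpm_keylen; __u32 pad;    8 bytes
#   __u8  tcpm_key[80];                        80 bytes
TCP_MD5SIG_FMT = "=128sHHI80s"

OPTMEM_MAX_DEFAULT = 20480  # default quoted in the original message


def md5sig_size() -> int:
    """Size in bytes of the assumed struct tcp_md5sig layout."""
    return struct.calcsize(TCP_MD5SIG_FMT)


def keys_per_budget(budget: int = OPTMEM_MAX_DEFAULT) -> int:
    """How many records of that size fit into a given optmem budget."""
    return budget // md5sig_size()


if __name__ == "__main__":
    print(md5sig_size(), keys_per_budget())
```

Under these assumptions a single key is a small fraction of the per-socket limit, which is why exhausting it on one socket would point at many keys accumulating there rather than at one oversized key.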

-- 
Elen sila lumenn' omentielvo

Ondrej 'SanTiago' Zajicek (email: santi...@crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
To err is human -- to blame it on a computer is even more so.




Strange MD5 Auth problem in BIRD 1.3.8

2013-10-10 Thread Michael Vallaly

Hi,

I recently had an interesting problem with socket option buffers
and their use in BIRD on Linux 3.6, which I hope someone can shed some
light on.

Quite frequently I see the following in our logs when enabling or starting
BGP sessions that are configured to use MD5 auth.

<snip>
Sep 24 23:12:46 rtr2 bird: sk_set_md5_auth_int: setsockopt: No such file or directory
</snip>

These never seem to cause any functionality problems, but seemed
strange / maybe related to my new ongoing issue. ;)

## My functionality impacting problem ##

Yesterday, after some upstream BGP peers had connectivity issues (Hold
timer expired), all of my previously working BGP sessions (using MD5
auth) attempted to reconnect and gave me the following in the logs:

<snip>
Oct  9 18:02:08 rtr2 bird: plxhq: Error: Hold timer expired
Oct  9 18:02:08 rtr2 bird: plxhq: BGP session closed
Oct  9 18:02:08 rtr2 bird: plxhq: State changed to flush
Oct  9 18:02:08 rtr2 bird: plxhq: State changed to stop
Oct  9 18:02:08 rtr2 bird: sk_set_md5_auth_int: setsockopt: No such file or directory
Oct  9 18:02:08 rtr2 bird: plxhq: Down
Oct  9 18:02:08 rtr2 bird: plxhq: Starting
Oct  9 18:02:08 rtr2 bird: sk_set_md5_auth_int: setsockopt: Cannot allocate memory
</snip>

At which point the BGP session fails to establish/start, and all
subsequent BGP sessions that are started (with MD5 Auth) also fail
with the same message. 

Looking through the bird code it seems bird issues some socket control
messages to update the TCP socket with MD5 parameters. 
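
For readers unfamiliar with the mechanism: on Linux, a TCP MD5 key is installed per peer address with `setsockopt(IPPROTO_TCP, TCP_MD5SIG, ...)`. The Python sketch below shows the general shape of that call; it is not BIRD's code. The option value 14, the struct layout, and the helper names `build_md5sig` and `set_md5_key` are assumptions for illustration, and the call itself needs a kernel built with TCP MD5 support.

```python
import socket
import struct

TCP_MD5SIG = 14            # value from <linux/tcp.h>; assumed, not exported by Python
TCP_MD5SIG_MAXKEYLEN = 80  # maximum key length accepted by the kernel


def build_md5sig(peer_ip: str, key: bytes) -> bytes:
    """Pack a struct tcp_md5sig for an IPv4 peer (assumed layout)."""
    if len(key) > TCP_MD5SIG_MAXKEYLEN:
        raise ValueError("key longer than 80 bytes")
    # struct sockaddr_in: family, port (0 = any), IPv4 address.
    sockaddr = struct.pack("=HH4s", socket.AF_INET, 0, socket.inet_aton(peer_ip))
    # The 's' format pads with NUL bytes up to the declared width.
    return struct.pack("=128sHHI80s", sockaddr, 0, len(key), 0, key)


def set_md5_key(sock: socket.socket, peer_ip: str, key: bytes) -> None:
    """Install the MD5 key for peer_ip; an empty key asks the kernel to remove it."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_MD5SIG, build_md5sig(peer_ip, key))
```

As far as I can tell from the kernel source, removing a key the kernel does not hold returns ENOENT ("No such file or directory"), and adding one when the socket's option memory is exhausted returns ENOMEM ("Cannot allocate memory"), which would match the two messages in the log above.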

And after digging around in the Linux system it seems I was
running out of socket option memory buffers (duh!). 
Thusly I was able to fix this by issuing:

<snip>
echo 40960 > /proc/sys/net/core/optmem_max  # Defaults to 20480
</snip>

Is this expected? Any insight on how to properly size the socket option
memory buffers used by bird? Is this some sort of a socket buffer leak?
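
One way to check for a leak might be to watch the per-socket option memory that `ss -m` reports in its `skmem:(...)` field (the `o` value) and compare it with `optmem_max`. The Python sketch below assumes an iproute2 version that prints `skmem`; the helper names `read_optmem_max` and `parse_optmem` are hypothetical, and nothing here is output captured from the affected router.

```python
import re
from pathlib import Path

OPTMEM_MAX_PATH = Path("/proc/sys/net/core/optmem_max")

# Matches the body of "skmem:(r0,rb87380,...,o0,bl0,d0)" as printed by `ss -m`.
_SKMEM_RE = re.compile(r"skmem:\(([^)]*)\)")


def read_optmem_max(path: Path = OPTMEM_MAX_PATH) -> int:
    """Return the current per-socket option memory limit in bytes."""
    return int(Path(path).read_text().strip())


def parse_optmem(ss_line: str):
    """Extract the 'o' (option memory) value from one line of `ss -m` output.

    Returns None if the line carries no skmem field.
    """
    match = _SKMEM_RE.search(ss_line)
    if match is None:
        return None
    for field in match.group(1).split(","):
        # Only the option-memory field is a bare "o" followed by digits.
        if field.startswith("o") and field[1:].isdigit():
            return int(field[1:])
    return None
```

Feeding it the lines from something like `ss -tlnm` for the BGP listening socket, before and after a session flap, would show whether the `o` value keeps growing towards the limit.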

<snip>
bird> show memory
BIRD memory usage
Routing tables:    307 MB
Route attributes:  106 MB
ROA tables:        192  B
Protocols:         388 kB
Total:             413 MB

$ uptime
 16:45:03 up 343 days,  1:23,  1 user,  load average: 0.00, 0.03, 0.05
</snip>

I have multiple identical machines running the same
os/software/configuration and so far only one of
them has shown this behavior.

Thanks!

-Mike

-- 
Michael Vallaly mvall...@nolatency.com