On Thu, 27 Oct 2005, James Yonan wrote:
> On Thu, 27 Oct 2005, Gunter Ohrner wrote:
>
> > Hi!
> >
> > We're experiencing regular assertion failures and subsequent OpenVPN server
> > crashes on one of our servers.
> >
> > The assertion failure is always the same:
> >
> > ,----
> > | Assertion failed at multi.c:1561
> > | Exiting
> > `----
> >
> > The crash seems to leave openvpn's network device, tap0 in our case, in a
> > state that blocks all processes subsequently trying to access the device.
> > The crashes happen every few days and a restart of the server machine is
> > needed.
> >
> > Does anyone have a quick idea what causes this behaviour? Unfortunately,
> > according to Google, we're the only ones on Linux 2.6 with this crash. ;)
> >
> > http://openvpn.net/archive/openvpn-users/2005-08/msg00011.html mentions a
> > similar problem on kernel 2.2.25, but no solution has been provided so far;
> > a suggested patch did not fix the problem for the reporter.
> >
> > ,----[ Some details about our setup ]
> > | * Debian Sarge i386
> > | * Kernel 2.6.12.6, 32-bit, Opteron-optimized
> > | * Debian's 2.0-1sarge1 openvpn package
> > | * Dual Opteron 246, 2.0 GHz
> > `----
> >
> > ,----[ OpenVPN configuration (excerpts) ]
> > | * bind to single interface/port
> > | * use udp
> > | * use tap0
> > | * PSK authentication
> > `----
> >
> > The server is also routing traffic and we do traffic limiting for some
> > traffic (destination-dependent, to comply with a leased link policy). This
> > limiting is not done on the device on which the encrypted openvpn traffic
> > leaves the machine but on an IMQ device before the incoming traffic enters
> > tap0, so openvpn should not see anything from it.
> >
> > Are there any further details needed to chase this bug, in whichever
> > piece of software it may be?
>
> This assertion usually occurs when the tun/tap device locks up and doesn't
> accept any write syscalls.
>
> Can you try an earlier 2.6 kernel (or 2.4), and see if the problem goes
> away?
>
> I would lean towards thinking that this is a tun/tap driver issue, simply
> because I've never heard about it on anything other than the old
> unmaintained 2.2 driver, or in this case a very new kernel.
>
> But having said that, I can't yet rule out that it's an OpenVPN bug. I
> could certainly "fix" the assertion by making OpenVPN wait forever for the
> tun/tap device to accept output. But then OpenVPN would simply hang, and
> you would have even less information to go on.
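The trade-off between "wait forever" and giving control back can be illustrated with a small C sketch. It is not taken from OpenVPN: write_packet_bounded() and set_nonblocking() are hypothetical helpers, and the sketch assumes the tun/tap file descriptor is in non-blocking mode, so a device that stops accepting output makes write() fail with EAGAIN. Instead of blocking indefinitely, it waits at most timeout_ms with poll() and then returns 0, leaving the caller free to keep the packet and retry later.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode so write() fails with EAGAIN instead of
 * blocking when the device cannot accept more data. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: write one packet, waiting at most timeout_ms for a
 * stalled device rather than forever.
 * Returns the number of bytes written, 0 if the device is still not
 * accepting output (the caller keeps the packet), or -1 on error. */
static ssize_t write_packet_bounded(int fd, const void *pkt, size_t len,
                                    int timeout_ms)
{
    ssize_t n = write(fd, pkt, len);
    if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n;                     /* written, or a hard error */

    /* Device refused the write: wait for POLLOUT, but only for a while. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;                     /* 0: timed out, -1: poll() failed */

    /* Writable again: one more attempt, still without blocking. */
    n = write(fd, pkt, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

A pipe whose write end has been filled to capacity behaves like a stalled device, which makes it a convenient stand-in for exercising the timeout path without a real tap interface.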

OK, here's an update. I'm attaching a patch which I believe will fix
this. I've only been able to reproduce the assertion under simulated
conditions, so it would be great if you could test it in a real-world
setting.

The patch should apply cleanly to 2.0, 2.0.x, or 2.1-beta.

James

Index: multi.c
===================================================================
--- multi.c (revision 672)
+++ multi.c (revision 730)
@@ -1583,7 +1583,8 @@
struct multi_instance *mi;
bool ret = true;
- ASSERT (!m->pending);
+ if (m->pending)
+ return true;
if (!instance)
{
@@ -1737,7 +1738,8 @@
printf ("TUN -> TCP/UDP [%d]\n", BLEN (&m->top.c2.buf));
#endif
- ASSERT (!m->pending);
+ if (m->pending)
+ return true;
/*
* Route an incoming tun/tap packet to
Index: forward.c
===================================================================
--- forward.c (revision 672)
+++ forward.c (revision 730)
@@ -609,10 +609,10 @@
*/
int status;
+ /*ASSERT (!c->c2.to_tun.len);*/
+
perf_push (PERF_READ_IN_LINK);
- ASSERT (!c->c2.to_tun.len);
-
c->c2.buf = c->c2.buffers->read_link_buf;
ASSERT (buf_init (&c->c2.buf, FRAME_HEADROOM_ADJ (&c->c2.frame, FRAME_HEADROOM_MARKER_READ_LINK)));
status = link_socket_read (c->c2.link_socket, &c->c2.buf, MAX_RW_SIZE_LINK (&c->c2.frame), &c->c2.from);
@@ -824,13 +824,13 @@
void
read_incoming_tun (struct context *c)
{
- perf_push (PERF_READ_IN_TUN);
-
/*
* Setup for read() call on TUN/TAP device.
*/
- ASSERT (!c->c2.to_link.len);
+ /*ASSERT (!c->c2.to_link.len);*/
+ perf_push (PERF_READ_IN_TUN);
+
c->c2.buf = c->c2.buffers->read_tun_buf;
#ifdef TUN_PASS_BUFFER
read_tun_buffered (c->c1.tuntap, &c->c2.buf, MAX_RW_SIZE_TUN (&c->c2.frame));
@@ -1056,14 +1056,15 @@
{
struct gc_arena gc = gc_new ();
- perf_push (PERF_PROC_OUT_TUN);
-
/*
* Set up for write() call to TUN/TAP
* device.
*/
- ASSERT (c->c2.to_tun.len > 0);
+ if (c->c2.to_tun.len <= 0)
+ return;
+ perf_push (PERF_PROC_OUT_TUN);
+
/*
* The --mssfix option requires
* us to examine the IPv4 header.
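The idea behind the patch, replacing fatal assertions with early returns when a packet is still pending or when there is nothing to write, can be modelled in isolation. The following C sketch is a hypothetical, simplified single-slot queue; struct out_slot, route_packet(), flush_slot() and SLOT_MAX are invented names, not OpenVPN's, and the sketch only shows the control flow, not OpenVPN's actual buffering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_MAX 1500  /* hypothetical maximum packet size */

/* Hypothetical single-slot outgoing queue, loosely modelled on the role of
 * m->pending in the patch: at most one packet waits for the device. */
struct out_slot {
    bool pending;                  /* a packet is waiting to be written */
    size_t len;
    unsigned char data[SLOT_MAX];
};

/* Before the patch, the equivalent of ASSERT (!slot->pending) aborted here.
 * Patched style: if a packet is still pending, return true (keep running)
 * without queuing the new one. Returns false only if pkt is too large. */
static bool route_packet(struct out_slot *slot, const void *pkt, size_t len)
{
    if (slot->pending)
        return true;
    if (len > sizeof slot->data)
        return false;
    memcpy(slot->data, pkt, len);
    slot->len = len;
    slot->pending = true;
    return true;
}

/* Mirrors the forward.c change: with nothing queued, return early instead
 * of asserting. Returns the number of bytes "written" (a real
 * implementation would call write() on the device here). */
static size_t flush_slot(struct out_slot *slot)
{
    size_t n;

    if (!slot->pending || slot->len == 0)
        return 0;
    n = slot->len;
    slot->pending = false;
    slot->len = 0;
    return n;
}
```

In this model a packet offered while the slot is busy is not queued; route_packet() still returns true so that the caller keeps running, and it is up to the caller to retry or drop that packet.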