[Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!
I was puzzled why sometimes openvpn ceases to generate '--ping 30' packets. I instrumented it a bit, here is a pseudo-patch: errlevel.h: +#define D_PING LOGLEV(4, 0, 0) /* show --ping packets */ -#define D_INTERVAL LOGLEV(8, 70, M_DEBUG) /* show interval.h debugging info */ +#define D_INTERVAL LOGLEV(4, 70, M_DEBUG) /* show interval.h debugging info */ ping-inline.h: (check_ping_send now will explain why it do not send pings) static inline void check_ping_send (struct context *c) { void check_ping_send_dowork (struct context *c); if (!c->options.ping_send_timeout) { msg (D_PING, "PING: !c->options.ping_send_timeout"); return; } if (c->c2.to_link.len) { msg (D_PING, "PING: c->c2.to_link.len <> 0"); return; } if (!event_timeout_trigger (&c->c2.ping_send_interval, &c->c2.timeval)) { msg (D_PING, "PING: !event_timeout_trigger(%ld,%ld)",(long)(c->c2.ping_send_interval.last), (long)c->c2.timeval.tv_sec); return; } check_ping_send_dowork (c); } ping.c:check_ping_send_dowork (struct context *c): (log pings) ... encrypt_sign (c, true); + msg (D_PING, "PING: SENT"); } forward.c:void single_select (struct context *c): (log large select timeouts) ... /* * Wait for something to happen. */ c->c2.select_status = 1; /* this will be our return "status" if select doesn't get called */ if (!c->sig->signal_received && !SOCKET_READ_RESIDUAL (c->c2.link_socket)) { if (check_debug_level (D_SELECT)) show_select_status (c); + if(((unsigned long)c->c2.timeval.tv_sec) >= 28) { //vda + msg(D_PING, "PING: select timeout=%ld", (long)c->c2.timeval.tv_sec); + } c->c2.select_status = SELECT (&c->c2.event_wait, &c->c2.timeval); check_status (c->c2.select_status, "select", NULL, NULL); } With --ping 30, select shall _never_ have timeout>30 secs, right? I was lucky. Bug struck me soon after I restarted openvpn. Here is the full log: Fri May 28 22:59:11 2004 Current Parameter Settings: Fri May 28 22:59:11 2004 config = '[UNDEF]' Fri May 28 22:59:11 2004 mode = 0 Fri May 28 22:59:11 2004 persist_config = DISABLED Fri May 28 22:59:11 2004 persist_mode = 1 Fri May 28 22:59:11 2004 show_ciphers = DISABLED Fri May 28 22:59:11 2004 show_digests = DISABLED Fri May 28 22:59:11 2004 genkey = DISABLED Fri May 28 22:59:11 2004 askpass = DISABLED Fri May 28 22:59:11 2004 show_tls_ciphers = DISABLED Fri May 28 22:59:11 2004 proto = 0 Fri May 28 22:59:11 2004 local = '1.1.4.1' Fri May 28 22:59:11 2004 remote_list[0] = {'1.1.4.2', 8002} Fri May 28 22:59:11 2004 remote_random = DISABLED Fri May 28 22:59:11 2004 local_port = 8002 Fri May 28 22:59:11 2004 remote_port = 8002 Fri May 28 22:59:11 2004 remote_float = DISABLED Fri May 28 22:59:11 2004 ipchange = '[UNDEF]' Fri May 28 22:59:11 2004 bind_local = ENABLED Fri May 28 22:59:11 2004 dev = 'tun' Fri May 28 22:59:11 2004 dev_type = '[UNDEF]' Fri May 28 22:59:11 2004 dev_node = '[UNDEF]' Fri May 28 22:59:11 2004 tun_ipv6 = DISABLED Fri May 28 22:59:11 2004 ifconfig_local = '1.1.5.1' Fri May 28 22:59:11 2004 ifconfig_remote_netmask = '1.1.5.2' Fri May 28 22:59:11 2004 ifconfig_noexec = DISABLED Fri May 28 22:59:11 2004 ifconfig_nowarn = DISABLED Fri May 28 22:59:11 2004 shaper = 0 Fri May 28 22:59:11 2004 tun_mtu = 1434 Fri May 28 22:59:11 2004 tun_mtu_defined = ENABLED Fri May 28 22:59:11 2004 link_mtu = 1500 Fri May 28 22:59:11 2004 link_mtu_defined = DISABLED Fri May 28 22:59:11 2004 tun_mtu_extra = 0 Fri May 28 22:59:11 2004 tun_mtu_extra_defined = DISABLED Fri May 28 22:59:11 2004 fragment = 0 Fri May 28 22:59:11 2004 mtu_discover_type = -1 Fri May 28 22:59:11 2004 mtu_test = 0 Fri May 28 22:59:11 2004 mlock = DISABLED Fri May 28 22:59:11 2004 inactivity_timeout = 0 Fri May 28 22:59:11 2004 ping_send_timeout = 30 Fri May 28 22:59:11 2004 ping_rec_timeout = 0 Fri May 28 22:59:11 2004 ping_rec_timeout_action = 0 Fri May 28 22:59:11 2004 ping_timer_remote = DISABLED Fri May 28 22:59:11 2004 persist_tun = DISABLED Fri May 28 22:59:11 2004 persist_local_ip = DISABLED Fri May 28 22:59:11 2004 persist_remote_ip = DISABLED Fri May 28 22:59:11 2004 persist_key = DISABLED Fri May 28 22:59:11 2004 mssfix = 1450 Fri May 28 22:59:11 2004 passtos = DISABLED Fri May 28 22:59:11 2004 resolve_retry_seconds = 0 Fri May 28 22:59:11 2004 connect_retry_seconds = 5 Fri May 28 22:59:11 2004 username = '[UNDEF]' Fri May 28 22:59:11 2004 groupname = '[UNDEF]' Fri May 28 22:59:11 2004 chroot_dir = '[UNDEF]' Fri May 28 22:59:11 2004 cd_dir = '[UNDEF]' Fri May 28 22:59:11 2004 writepid = '[UNDEF]' Fri May 28 22:59:11 2004 up_script = '[UNDEF]' Fri May 28 22:59:11 2004 down_script = '[UNDEF]' Fri May 28 22:59:11 2004 up_restart = DISABLED Fri May 28 22:59:11 2004 daemon = DISABLED Fri May 28 22:59:11 2004 inetd = 0 Fri May 28 22:59:11 2004 log = DISABLED Fri May 28 2
Re: [Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!
Denis, That looks like a possible bug in the coarse timer update logic. The coarse timer deals with events scheduled at a resolution denoted by an integer number of seconds, such as pings. A timeout of one year is used as kind of "effectively infinite" time interval. If you see this large timeout, it means that the ping event is losing its place in the coarse timer event queue -- probably a bug. James Denis Vlasenko said: > I was puzzled why sometimes openvpn ceases to generate > '--ping 30' packets. I instrumented it a bit, > here is a pseudo-patch: > > errlevel.h: > +#define D_PING LOGLEV(4, 0, 0) /* show --ping packets > */ > > -#define D_INTERVAL LOGLEV(8, 70, M_DEBUG) /* show interval.h debugging info */ > +#define D_INTERVAL LOGLEV(4, 70, M_DEBUG) /* show interval.h debugging info */ > > ping-inline.h: (check_ping_send now will explain why it do not send pings) > static inline void > check_ping_send (struct context *c) > { > void check_ping_send_dowork (struct context *c); > if (!c->options.ping_send_timeout) { msg (D_PING, "PING: !c->options.ping_send_timeout"); return; } > if (c->c2.to_link.len) { msg (D_PING, "PING: c->c2.to_link.len <> 0"); return; } > if (!event_timeout_trigger (&c->c2.ping_send_interval, &c->c2.timeval)) > { msg (D_PING, "PING: !event_timeout_trigger(%ld,%ld)",(long)(c->c2.ping_send_interval.last), (long)c->c2.timeval.tv_sec); return; } > check_ping_send_dowork (c); > } > > ping.c:check_ping_send_dowork (struct context *c): (log pings) > ... >encrypt_sign (c, true); > + msg (D_PING, "PING: SENT"); > } > > forward.c:void single_select (struct context *c): (log large select timeouts) > ... > /* >* Wait for something to happen. >*/ > c->c2.select_status = 1; /* this will be our return "status" if select doesn't get called */ > if (!c->sig->signal_received && !SOCKET_READ_RESIDUAL (c->c2.link_socket)) > { > if (check_debug_level (D_SELECT)) > show_select_status (c); > > + if(((unsigned long)c->c2.timeval.tv_sec) >= 28) { //vda > + msg(D_PING, "PING: select timeout=%ld", (long)c->c2.timeval.tv_sec); > + } > c->c2.select_status = SELECT (&c->c2.event_wait, &c->c2.timeval); > check_status (c->c2.select_status, "select", NULL, NULL); > } > > > With --ping 30, select shall _never_ have timeout>30 secs, right? > > I was lucky. Bug struck me soon after I restarted openvpn. > Here is the full log: > > Fri May 28 22:59:11 2004 Current Parameter Settings: > Fri May 28 22:59:11 2004 config = '[UNDEF]' > Fri May 28 22:59:11 2004 mode = 0 > Fri May 28 22:59:11 2004 persist_config = DISABLED > Fri May 28 22:59:11 2004 persist_mode = 1 > Fri May 28 22:59:11 2004 show_ciphers = DISABLED > Fri May 28 22:59:11 2004 show_digests = DISABLED > Fri May 28 22:59:11 2004 genkey = DISABLED > Fri May 28 22:59:11 2004 askpass = DISABLED > Fri May 28 22:59:11 2004 show_tls_ciphers = DISABLED > Fri May 28 22:59:11 2004 proto = 0 > Fri May 28 22:59:11 2004 local = '1.1.4.1' > Fri May 28 22:59:11 2004 remote_list[0] = {'1.1.4.2', 8002} > Fri May 28 22:59:11 2004 remote_random = DISABLED > Fri May 28 22:59:11 2004 local_port = 8002 > Fri May 28 22:59:11 2004 remote_port = 8002 > Fri May 28 22:59:11 2004 remote_float = DISABLED > Fri May 28 22:59:11 2004 ipchange = '[UNDEF]' > Fri May 28 22:59:11 2004 bind_local = ENABLED > Fri May 28 22:59:11 2004 dev = 'tun' > Fri May 28 22:59:11 2004 dev_type = '[UNDEF]' > Fri May 28 22:59:11 2004 dev_node = '[UNDEF]' > Fri May 28 22:59:11 2004 tun_ipv6 = DISABLED > Fri May 28 22:59:11 2004 ifconfig_local = '1.1.5.1' > Fri May 28 22:59:11 2004 ifconfig_remote_netmask = '1.1.5.2' > Fri May 28 22:59:11 2004 ifconfig_noexec = DISABLED > Fri May 28 22:59:11 2004 ifconfig_nowarn = DISABLED > Fri May 28 22:59:11 2004 shaper = 0 > Fri May 28 22:59:11 2004 tun_mtu = 1434 > Fri May 28 22:59:11 2004 tun_mtu_defined = ENABLED > Fri May 28 22:59:11 2004 link_mtu = 1500 > Fri May 28 22:59:11 2004 link_mtu_defined = DISABLED > Fri May 28 22:59:11 2004 tun_mtu_extra = 0 > Fri May 28 22:59:11 2004 tun_mtu_extra_defined = DISABLED > Fri May 28 22:59:11 2004 fragment = 0 > Fri May 28 22:59:11 2004 mtu_discover_type = -1 > Fri May 28 22:59:11 2004 mtu_test = 0 > Fri May 28 22:59:11 2004 mlock = DISABLED > Fri May 28 22:59:11 2004 inactivity_timeout = 0 > Fri May 28 22:59:11 2004 ping_send_timeout = 30 > Fri May 28 22:59:11 2004 ping_rec_timeout = 0 > Fri May 28 22:59:11 2004 ping_rec_timeout_action = 0 > Fri May 28 22:59:11 2004 ping_timer_remote = DISABLED > Fri May 28 22:59:11 2004 persist_tun = DISABLED > Fri May 28 22:59:11 2004 persist_local_ip = DISABLED > Fri May 28 22:59:11 2004 persist_remote_ip = DISABLED > Fri May 28 22:59:11 2004 persist_key = DISABLED > Fri May 28 22:59:11 2004 mssfix = 1450 > Fri May 28 22:59:11 2004 passtos = DISABLED
Re: [Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!
Denis, I think I see the bug: /* * Should we ping the remote? */ static inline void check_ping_send (struct context *c) { void check_ping_send_dowork (struct context *c); if (c->options.ping_send_timeout && !c->c2.to_link.len && event_timeout_trigger (&c->c2.ping_send_interval, &c->c2.timeval)) check_ping_send_dowork (c); } if c->c2.to_link.len is nonzero (which happens if the outgoing TCP/UDP buffer is already full) then event_timeout_trigger is not called, meaning that c->c2.timeval (the earliest wakeup) is not updated. The solution is probably to pass the boolean !c->c2.to_link.len as another parm to event_timeout_trigger so that it can still update c->c2.timeval even if the outgoing buffer is full. I will fix in b3. Thanks for noticing the bug and for coding the instrumentation to isolate it. James James Yonan said: > Denis, > > That looks like a possible bug in the coarse timer update logic. The coarse > timer deals with events scheduled at a resolution denoted by an integer number > of seconds, such as pings. A timeout of one year is used as kind of > "effectively infinite" time interval. If you see this large timeout, it means > that the ping event is losing its place in the coarse timer event queue -- > probably a bug. > > James > > Denis Vlasenko said: > > > I was puzzled why sometimes openvpn ceases to generate > > '--ping 30' packets. I instrumented it a bit, > > here is a pseudo-patch: > > > > errlevel.h: > > +#define D_PING LOGLEV(4, 0, 0) /* show --ping packets */ > > > > -#define D_INTERVAL LOGLEV(8, 70, M_DEBUG) /* show interval.h > debugging info */ > > +#define D_INTERVAL LOGLEV(4, 70, M_DEBUG) /* show interval.h > debugging info */ > > > > ping-inline.h: (check_ping_send now will explain why it do not send pings) > > static inline void > > check_ping_send (struct context *c) > > { > > void check_ping_send_dowork (struct context *c); > > if (!c->options.ping_send_timeout) { msg (D_PING, "PING: > !c->options.ping_send_timeout"); return; } > > if (c->c2.to_link.len) { msg (D_PING, "PING: c->c2.to_link.len <> 0"); > return; } > > if (!event_timeout_trigger (&c->c2.ping_send_interval, &c->c2.timeval)) > > { msg (D_PING, "PING: > !event_timeout_trigger(%ld,%ld)",(long)(c->c2.ping_send_interval.last), > (long)c->c2.timeval.tv_sec); return; } > > check_ping_send_dowork (c); > > } > > > > ping.c:check_ping_send_dowork (struct context *c): (log pings) > > ... > >encrypt_sign (c, true); > > + msg (D_PING, "PING: SENT"); > > } > > > > forward.c:void single_select (struct context *c): (log large select > > timeouts) > > ... > > /* > >* Wait for something to happen. > >*/ > > c->c2.select_status = 1; /* this will be our return "status" if > select doesn't get called */ > > if (!c->sig->signal_received && !SOCKET_READ_RESIDUAL (c->c2.link_socket)) > > { > > if (check_debug_level (D_SELECT)) > > show_select_status (c); > > > > + if(((unsigned long)c->c2.timeval.tv_sec) >= 28) { //vda > > + msg(D_PING, "PING: select timeout=%ld", (long)c->c2.timeval.tv_sec); > > + } > > c->c2.select_status = SELECT (&c->c2.event_wait, &c->c2.timeval); > > check_status (c->c2.select_status, "select", NULL, NULL); > > } > > > > > > With --ping 30, select shall _never_ have timeout>30 secs, right? > > > > I was lucky. Bug struck me soon after I restarted openvpn. > > Here is the full log: > > > > Fri May 28 22:59:11 2004 Current Parameter Settings: > > Fri May 28 22:59:11 2004 config = '[UNDEF]' > > Fri May 28 22:59:11 2004 mode = 0 > > Fri May 28 22:59:11 2004 persist_config = DISABLED > > Fri May 28 22:59:11 2004 persist_mode = 1 > > Fri May 28 22:59:11 2004 show_ciphers = DISABLED > > Fri May 28 22:59:11 2004 show_digests = DISABLED > > Fri May 28 22:59:11 2004 genkey = DISABLED > > Fri May 28 22:59:11 2004 askpass = DISABLED > > Fri May 28 22:59:11 2004 show_tls_ciphers = DISABLED > > Fri May 28 22:59:11 2004 proto = 0 > > Fri May 28 22:59:11 2004 local = '1.1.4.1' > > Fri May 28 22:59:11 2004 remote_list[0] = {'1.1.4.2', 8002} > > Fri May 28 22:59:11 2004 remote_random = DISABLED > > Fri May 28 22:59:11 2004 local_port = 8002 > > Fri May 28 22:59:11 2004 remote_port = 8002 > > Fri May 28 22:59:11 2004 remote_float = DISABLED > > Fri May 28 22:59:11 2004 ipchange = '[UNDEF]' > > Fri May 28 22:59:11 2004 bind_local = ENABLED > > Fri May 28 22:59:11 2004 dev = 'tun' > > Fri May 28 22:59:11 2004 dev_type = '[UNDEF]' > > Fri May 28 22:59:11 2004 dev_node = '[UNDEF]' > > Fri May 28 22:59:11 2004 tun_ipv6 = DISABLED > > Fri May 28 22:59:11 2004 ifconfig_local = '1.1.5.1' > > Fri May 28 22:59:11 2004 ifconfig_remote_netmask = '1.1.5.2' > > Fri May 28 22:59:11 2004 ifconfig_noexec = DISABLED > > Fri May 28 22:59:11 2004 ifconfig_nowarn = DISABLED > > Fri May 28
Re: [Openvpn-devel] [BUG] sometimes --ping 30 stops: select() timeout=31536000 seconds (exactly one year)!
On Saturday 29 May 2004 00:15, James Yonan wrote: > Denis, > > That looks like a possible bug in the coarse timer update logic. The > coarse timer deals with events scheduled at a resolution denoted by an > integer number of seconds, such as pings. A timeout of one year is used as > kind of "effectively infinite" time interval. If you see this large > timeout, it means that the ping event is losing its place in the coarse > timer event queue -- probably a bug. Yes. It clearly happened here: static void check_coarse_timers_dowork (struct context *c) { const struct timeval save = c->c2.timeval; c->c2.timeval.tv_sec = BIG_TIMEOUT; c->c2.timeval.tv_usec = 0; process_coarse_timers (c); c->c2.coarse_timer_wakeup = now + c->c2.timeval.tv_sec; msg (D_INTERVAL, "TIMER: coarse timer wakeup %d seconds", (int) c->c2.timeval.tv_sec); /* Is the coarse timeout NOT the earliest one? */ if (c->c2.timeval.tv_sec > save.tv_sec) c->c2.timeval = save; } I can see that TIMER: ... message with 1 year worth of seconds printed. Obviosly, process_coarse_timers (c) did not do it's job. I instrumented it this way now: process_coarse_timers (struct context *c) { #ifdef USE_CRYPTO /* flush current packet-id to file once per 60 seconds if --replay-persist was specified */ check_packet_id_persist_flush (c); msg(D_PING, "PING: process_coarse_timers 1 tv=t=%ld", (long)c->c2.timeval.tv_sec); #endif /* process connection establishment items */ check_connection_established (c); msg(D_PING, "PING: process_coarse_timers 2 tv=t=%ld", (long)c->c2.timeval.tv_sec); #if P2MP /* see if we should send a push_request in response to --pull */ check_push_request (c); msg(D_PING, "PING: process_coarse_timers 3 tv=t=%ld", (long)c->c2.timeval.tv_sec); #endif /* process --route options */ check_add_routes (c); msg(D_PING, "PING: process_coarse_timers 4 tv=t=%ld", (long)c->c2.timeval.tv_sec); /* possibly exit due to --inactive */ check_inactivity_timeout (c); msg(D_PING, "PING: process_coarse_timers 5 tv=t=%ld", (long)c->c2.timeval.tv_sec); if (c->sig->signal_received) return; /* restart if ping not received */ check_ping_restart (c); msg(D_PING, "PING: process_coarse_timers 6 tv=t=%ld", (long)c->c2.timeval.tv_sec); if (c->sig->signal_received) return; /* Should we send an OCC_REQUEST message? */ check_send_occ_req (c); msg(D_PING, "PING: process_coarse_timers 7 tv=t=%ld", (long)c->c2.timeval.tv_sec); /* Should we send an MTU load test? */ check_send_occ_load_test (c); msg(D_PING, "PING: process_coarse_timers 8 tv=t=%ld", (long)c->c2.timeval.tv_sec); /* Should we ping the remote? */ check_ping_send (c); msg(D_PING, "PING: process_coarse_timers 9 tv=t=%ld", (long)c->c2.timeval.tv_sec); } Output: Sat May 29 00:54:51 2004 RANDOM USEC=412366 Sat May 29 00:55:01 2004 PING: process_coarse_timers 1 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 2 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 3 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 4 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 5 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 6 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 7 tv=t=31536000 Sat May 29 00:55:01 2004 PING: process_coarse_timers 8 tv=t=31536000 Sat May 29 00:55:01 2004 EVENT event_timeout_wakeup (20/30) Sat May 29 00:55:01 2004 PING: !event_timeout_trigger(1085781291,20) Sat May 29 00:55:01 2004 PING: process_coarse_timers 9 tv=t=20 okay, check_ping_send (c) have done it's job. Sat May 29 00:55:01 2004 TIMER: coarse timer wakeup 20 seconds Sat May 29 00:55:01 2004 RANDOM USEC=113731 Sat May 29 00:55:18 2004 RANDOM USEC=994739 Sat May 29 00:55:22 2004 PING: process_coarse_timers 1 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 2 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 3 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 4 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 5 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 6 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 7 tv=t=31536000 Sat May 29 00:55:22 2004 PING: process_coarse_timers 8 tv=t=31536000 Sat May 29 00:55:22 2004 PING: c->c2.to_link.len <> 0 Sat May 29 00:55:22 2004 PING: process_coarse_timers 9 tv=t=31536000 Sat May 29 00:55:22 2004 TIMER: coarse timer wakeup 31536000 seconds BOOM! Sat May 29 00:55:22 2004 PING: select timeout=31536000 Sat May 29 00:55:22 2004 PING: select timeout=31536000 Sat May 29 00:55:22 2004 PING: select timeout=31536000 Sat May 29 00:55:22 2004 PING: select timeout=31536000 Sat May 29 00:55:22 2004 PING: select timeout=31536000 For your reference: static inline void check_ping_send (struct context *c) { void check_ping_send_dowork (struct context *c); if (!c->options.ping_sen