RE: Openvpn tap uses 99% cpu time

2007-10-03 Thread Emile Coetzee
Hi Bruce

I hope you can still remember this post from way back when. You can read the
archives here:
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2007-03/msg00445.html


Up until now we have been tracking RELENG_6_2 without any issues but I was
forced to look at this again to get a newer NIC working where I needed
RELENG_6. 

So quick recap: A while ago I hit a problem where the openvpn tap service
would kill my machine with 99% CPU usage. I could reproduce this on a number
of machines. I have now produced a way to prevent and reproduce the problem,
so hopefully if someone else can reproduce this we can find a proper fix.

I have been loading my tap device using /boot/loader.conf
if_tap_load=YES

So I removed this from the loader. Rebooted and tried to start openvpn
(/usr/local/sbin/openvpn --cd /usr/local/etc/openvpn --config
/usr/local/etc/openvpn/tapsvr.conf) It would not start (obviously as there
was no tap device). 

I then loaded the tap device using kldload (kldload if_tap) and tried again.
This time is started and surprise, surprise no 99% CPU problem. Not wanting
to have to load the device before starting openvpn I then compiled the tap
device into my kernel which did the trick.

So it seems that there is a problem with loading the tap device from
loader.conf on RELENG_6. This works correctly on RELENG_6_2.

Do you want to see if you can reproduce this first or should I open a bug?

Regards
Emile


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Emile Coetzee
Posted At: 20 March 2007 14:16 PM
Posted To: freebsd-stable
Conversation: Openvpn tap uses 99% cpu time
Subject: RE: Openvpn tap uses 99% cpu time

Emile Coetzee wrote:
 Okay I finally have a ktrace of the offending process. You can view it
here:
 http://www.clarotech.co.za/dump/openvpn2.txt
   
Thanks for this. If this is the correct trace, of the correct process, 
then it looks like OpenVPN is hanging immediately on opening the tap
device.

One thing that does jump out at me is the use of the persist-tun 
keyword. Can you try removing the use of this keyword? It is something 
I've never had to use with OpenVPN.

I did try it without the persist settings and it seemed to work but then I
put it back again and that worked too (i.e. no 100% CPU usage). However
after restarting the box and repeating the tests, both produced the 100% CPU
usage issue. 

What I suspect happened is that somehow the tap device was already present
and thus openvpn did not need to create one. I then tried to use the
cloned_interfaces=tap0 to see if that would create a tap device for me at
boot time. But the server hung similarly to when openvpn uses 100% CPU time
more or less where I would expect it to initialize the NICs. Unfortunately I
did not think to check if the tap device was present before testing it
without the persist setting. So it's a bit of mystery.

I have lost the box I was testing on (off to a client) and will only be able
to setup a new one to do testing on next week. So I will feed back with any
new findings.

Cheers
Emile

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: Openvpn tap uses 99% cpu time

2007-03-20 Thread Emile Coetzee
Emile Coetzee wrote:
 Okay I finally have a ktrace of the offending process. You can view it
here:
 http://www.clarotech.co.za/dump/openvpn2.txt
   
Thanks for this. If this is the correct trace, of the correct process, 
then it looks like OpenVPN is hanging immediately on opening the tap
device.

One thing that does jump out at me is the use of the persist-tun 
keyword. Can you try removing the use of this keyword? It is something 
I've never had to use with OpenVPN.

I did try it without the persist settings and it seemed to work but then I
put it back again and that worked too (i.e. no 100% CPU usage). However
after restarting the box and repeating the tests, both produced the 100% CPU
usage issue. 

What I suspect happened is that somehow the tap device was already present
and thus openvpn did not need to create one. I then tried to use the
cloned_interfaces=tap0 to see if that would create a tap device for me at
boot time. But the server hung similarly to when openvpn uses 100% CPU time
more or less where I would expect it to initialize the NICs. Unfortunately I
did not think to check if the tap device was present before testing it
without the persist setting. So it's a bit of mystery.

I have lost the box I was testing on (off to a client) and will only be able
to setup a new one to do testing on next week. So I will feed back with any
new findings.

Cheers
Emile

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-16 Thread Bruce M. Simpson

Emile Coetzee wrote:

Okay I finally have a ktrace of the offending process. You can view it here:
http://www.clarotech.co.za/dump/openvpn2.txt
  
Thanks for this. If this is the correct trace, of the correct process, 
then it looks like OpenVPN is hanging immediately on opening the tap device.


In a quick look at the code, nothing jumps out at me which is out of the 
ordinary about the locking during clone and open. It would have been 
useful to have the ps output to correlate with in case the process was 
spinning trying to acquire a spin lock, which would show up in the WCHAN 
field.


One thing that does jump out at me is the use of the persist-tun 
keyword. Can you try removing the use of this keyword? It is something 
I've never had to use with OpenVPN.


To the best of my knowledge there have been no other reports of this 
problem on FreeBSD. I'm sorry there is just very little hard information 
to go on here at the moment.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Emile Coetzee
I have done some more investigation. Rolling back to RELENG_6_2 solves the
problem. I have now had this problem on 3 boxes in 2 days and have been able
to reproduce it on a 4th in our lab.

Starting openvpn with full debug it stops just before the point where the
TAP devices gets initiated. And then the output from top shows openvpn on
99% cpu time (well almost).

RELENG_6
[snip]
Thu Mar 15 08:27:39 2007 us=333199 OpenVPN 2.0.6 i386-portbld-freebsd6.2
[SSL] [LZO] built on Mar 14 2007
Thu Mar 15 08:27:39 2007 us=344375 Diffie-Hellman initialized with 1024 bit
key
Thu Mar 15 08:27:39 2007 us=345355 MTU DYNAMIC mtu=0, flags=1, 0 - 138
Thu Mar 15 08:27:39 2007 us=345376 TLS-Auth MTU parms [ L:1574 D:138 EF:38
EB:0 ET:0 EL:0 ]
Thu Mar 15 08:27:39 2007 us=345390 MTU DYNAMIC mtu=1450, flags=2, 1574 -
1450

RELENG_6_2
[snip]
Thu Mar 15 07:18:23 2007 us=437582 OpenVPN 2.0.6 i386-portbld-freebsd6.2
[SSL] [LZO] built on Mar 14 2007
Thu Mar 15 07:18:23 2007 us=448672 Diffie-Hellman initialized with 1024 bit
key
Thu Mar 15 07:18:23 2007 us=449653 MTU DYNAMIC mtu=0, flags=1, 0 - 138
Thu Mar 15 07:18:23 2007 us=449676 TLS-Auth MTU parms [ L:1574 D:138 EF:38
EB:0 ET:0 EL:0 ]
Thu Mar 15 07:18:23 2007 us=449692 MTU DYNAMIC mtu=1450, flags=2, 1574 -
1450
Thu Mar 15 07:18:23 2007 us=449759 TUN/TAP device /dev/tap0 opened
Thu Mar 15 07:18:23 2007 us=449832 Data Channel MTU parms [ L:1574 D:1450
EF:42 EB:135 ET:32 EL:0 AF:3/1 ]
Thu Mar 15 07:18:23 2007 us=450475 GID set to nobody
Thu Mar 15 07:18:23 2007 us=450504 UID set to nobody
Thu Mar 15 07:18:23 2007 us=450532 Socket Buffers: R=[42080-65536]
S=[9216-65536]
Thu Mar 15 07:18:23 2007 us=450552 UDPv4 link local (bound): [undef]:1195
Thu Mar 15 07:18:23 2007 us=450567 UDPv4 link remote: [undef]
Thu Mar 15 07:18:23 2007 us=450586 MULTI: multi_init called, r=256 v=256
Thu Mar 15 07:18:23 2007 us=450623 IFCONFIG POOL: base=192.168.251.200
size=11
Thu Mar 15 07:18:23 2007 us=450649 IFCONFIG POOL LIST
Thu Mar 15 07:18:23 2007 us=450670 PO_INIT maxevents=4 flags=0x0002
Thu Mar 15 07:18:23 2007 us=450688 Initialization Sequence Completed
Thu Mar 15 07:18:23 2007 us=450702 SCHEDULE: schedule_find_least NULL
Thu Mar 15 07:18:23 2007 us=450718 PO_CTL rwflags=0x0001 ev=5 arg=0x080891c0
Thu Mar 15 07:18:23 2007 us=450733 PO_CTL rwflags=0x0001 ev=6 arg=0x080891c4
Thu Mar 15 07:18:23 2007 us=450751 I/O WAIT TR|Tw|SR|Sw [10/0]

last pid:  1066;  load averages:  0.49,  0.12,  0.04up 0+00:48:00
08:28:16
31 processes:  2 running, 29 sleeping
CPU states:  0.0% user,  0.0% nice, 98.5% system,  1.5% interrupt,  0.0%
idle
Mem: 11M Active, 7956K Inact, 17M Wired, 11M Buf, 448M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZERES STATETIME   WCPU COMMAND
 1066 root1 1220  2984K  2436K RUN  0:36 96.72% openvpn
  786 root1  960  3056K  1900K select   0:00  0.00% ntpd
[snip]

Vlad Galu:
As this is a tap server there is no remote end point to connect to. It is
waiting for a windows client to connect. As you can see from the openvpn
logs the tap device is not even getting initiated.

Luigi Rizzo:
I'm no programmer, so suggestions on how to fix it are welcome ;)

Regards
Emile

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Bruce M. Simpson

Emile Coetzee wrote:

I have done some more investigation. Rolling back to RELENG_6_2 solves the
problem. I have now had this problem on 3 boxes in 2 days and have been able
to reproduce it on a 4th in our lab.

Starting openvpn with full debug it stops just before the point where the
TAP devices gets initiated. And then the output from top shows openvpn on
99% cpu time (well almost).
  
The recent changes to tap(4) were tested with openvpn and ppp before 
committing, and this is an isolated report at the moment. Other users on 
the list have suggested that the problem you are reporting does not 
affect FreeBSD alone.


Please consider doing the following:

1. Use ktrace(8) to extract a kdump of openvpn as it runs into this 
problem, so that you may determine exactly which syscalls are invoked 
when you see this problem, and post it on this list.


Consider using the ktrace pid mode to attach to openvpn as it blocks, as 
opposed to running openvpn under ktrace as a child process.


2. Implementing the usleep() fix as discussed by luigi, as the 
information in point 1 will give clues as to where the code is spinning.


Regards,
BMS

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Emile Coetzee
Hi Bruce

 

I have run a ktrace as you suggested

 

I put the following in a script /usr/local/sbin/openvpn --daemon --config
/usr/local/etc/openvpn/tapserver.conf  and called it tapserver.sh and then
used

ktrace -d -f openvpn.dump ./tapserver.sh

This captured all the information for the parent PID 1082

 

I then watched top in a second session and identified the PID (1083) of the
openvpn process that was using up the CPU time.

 

I then ran 

ktrace -p 1083 -f openvpn2.dump 

 

This produced no data at all.

 

The dump is fairly large so I have dropped it here:
http://pastebin.com/899318

If required/allowed I can post it to the list.

 

Regards

Emile

 

 

 

 

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Bruce M. Simpson

Emile Coetzee wrote:


I then ran 

ktrace -p 1083 -f openvpn2.dump 


This produced no data at all.
  

Shouldn't. :-) You need to run ktrace then kdump to dump the trace.

Correlate the cpu spin with the wallclock time in the kdump output, and 
paste an excerpt of what you get -- it sounds like it is spinning on a 
syscall.


I can't seem to get the stuff from pastebin, it just times out, so ok to 
post an excerpt here.


Thanks,
BMS
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Emile Coetzee

 I then ran 

 ktrace -p 1083 -f openvpn2.dump 

 This produced no data at all.
   

Shouldn't. :-) You need to run ktrace then kdump to dump the trace.

I figured that part out on my own thanks ;) The second file is 0kb in size
running a kdump on it produces no output.

Correlate the cpu spin with the wallclock time in the kdump output, and 
paste an excerpt of what you get -- it sounds like it is spinning on a 
syscall.
I have no idea what exactly you want me to do here. Point me at some reading
material if you like or show me some examples.

I can't seem to get the stuff from pastebin, it just times out, so ok to 
post an excerpt here.

No worries below is the last part of the first ktrace. I have dropped a full
copy on our webserver: http://www.clarotech.co.za/dump/openvpn.txt

  1082 sh   RET   read 4096/0x1000
  1082 sh   CALL  mmap(0,0xea000,0x5,0x20002,0x3,0,0,0)
  1082 sh   RET   mmap 672038912/0x280e8000
  1082 sh   CALL  mprotect(0x281b5000,0x1000,0x7)
  1082 sh   RET   mprotect 0
  1082 sh   CALL  mprotect(0x281b5000,0x1000,0x5)
  1082 sh   RET   mprotect 0
  1082 sh   CALL  mmap(0x281b6000,0x6000,0x3,0x12,0x3,0,0xcd000,0)
  1082 sh   RET   mmap 672882688/0x281b6000
  1082 sh   CALL  mmap(0x281bc000,0x16000,0x3,0x1012,0x,0,0,0)
  1082 sh   RET   mmap 672907264/0x281bc000
  1082 sh   CALL  close(0x3)
  1082 sh   RET   close 0
  1082 sh   CALL  access(0x28092000,0)
  1082 sh   NAMI  /lib/libncurses.so.6
  1082 sh   RET   access 0
  1082 sh   CALL  sysarch(0xa,0xbfbfec60)
  1082 sh   RET   sysarch 0
  1082 sh   CALL  mmap(0,0x440,0x3,0x1000,0x,0,0,0)
  1082 sh   RET   mmap 672997376/0x281d2000
  1082 sh   CALL  munmap(0x281d2000,0x440)
  1082 sh   RET   munmap 0
  1082 sh   CALL  mmap(0,0x3f8,0x3,0x1000,0x,0,0,0)
  1082 sh   RET   mmap 672997376/0x281d2000
  1082 sh   CALL  munmap(0x281d2000,0x3f8)
  1082 sh   RET   munmap 0
  1082 sh   CALL  mmap(0,0x1208,0x3,0x1000,0x,0,0,0)
  1082 sh   RET   mmap 672997376/0x281d2000
  1082 sh   CALL  munmap(0x281d2000,0x1208)
  1082 sh   RET   munmap 0
  1082 sh   CALL  mmap(0,0x58d8,0x3,0x1000,0x,0,0,0)
  1082 sh   RET   mmap 672997376/0x281d2000
  1082 sh   CALL  munmap(0x281d2000,0x58d8)
  1082 sh   RET   munmap 0
  1082 sh   CALL  sigprocmask(0x1,0x28087980,0xbfbfec30)
  1082 sh   RET   sigprocmask 0
  1082 sh   CALL  sigprocmask(0x3,0x28087990,0)
  1082 sh   RET   sigprocmask 0
  1082 sh   CALL  getpid
  1082 sh   RET   getpid 1082/0x43a
  1082 sh   CALL  geteuid
  1082 sh   RET   geteuid 0
  1082 sh   CALL  getppid
  1082 sh   RET   getppid 1006/0x3ee
  1082 sh   CALL  readlink(0x281ade4b,0xbfbfe810,0x3f)
  1082 sh   NAMI  /etc/malloc.conf
  1082 sh   RET   readlink -1 errno 2 No such file or directory
  1082 sh   CALL  issetugid
  1082 sh   RET   issetugid 0
  1082 sh   CALL  mmap(0,0x1000,0x3,0x1002,0x,0,0,0)
  1082 sh   RET   mmap 672997376/0x281d2000
  1082 sh   CALL  break(0x8066000)
  1082 sh   RET   break 0
  1082 sh   CALL  break(0x8067000)
  1082 sh   RET   break 0
  1082 sh   CALL  break(0x8068000)
  1082 sh   RET   break 0
  1082 sh   CALL  stat(0x8067004,0xbfbfebb0)
  1082 sh   NAMI  /var/mail/support
  1082 sh   RET   stat 0
  1082 sh   CALL  getuid
  1082 sh   RET   getuid 0
  1082 sh   CALL  geteuid
  1082 sh   RET   geteuid 0
  1082 sh   CALL  getgid
  1082 sh   RET   getgid 0
  1082 sh   CALL  getegid
  1082 sh   RET   getegid 0
  1082 sh   CALL  open(0xbfbfee07,0,0x28066b71)
  1082 sh   NAMI  ./tapserver.sh
  1082 sh   RET   open 3
  1082 sh   CALL  fcntl(0x3,0,0xa)
  1082 sh   RET   fcntl 10/0xa
  1082 sh   CALL  close(0x3)
  1082 sh   RET   close 0
  1082 sh   CALL  fcntl(0xa,0x2,0x1)
  1082 sh   RET   fcntl 0
  1082 sh   CALL  sigaction(0x2,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x2,0xbfbfebb0,0xbfbfeb90)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x2,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x2,0xbfbfebb0,0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x3,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x3,0xbfbfebb0,0xbfbfeb90)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x3,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x3,0xbfbfebb0,0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0xf,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0xf,0xbfbfebb0,0xbfbfeb90)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  sigaction(0x1c,0,0xbfbfebb0)
  1082 sh   RET   sigaction 0
  1082 sh   CALL  

Re: Openvpn tap uses 99% cpu time

2007-03-15 Thread Bruce M. Simpson

Emile Coetzee wrote:

No worries below is the last part of the first ktrace. I have dropped a full
copy on our webserver: http://www.clarotech.co.za/dump/openvpn.txt
  

Hi,

Still no diagnostic info I'm afraid. What is needed is for the problem 
to be reproduced, with the openvpn process which is spinning under 
ktrace. You might need to hack your script to do this i.e. run openvpn 
directly from ktrace. Let it run for a bit, let it spin, then kill it. 
Check that you have kdump output from the openvpn process itself.


Use the -T switch of kdump to get absolute timestamps, watch the 
wallclock time on your system when the spin happens, and post an excerpt 
of the kdump output at that time, which should tell us where openvpn is 
spinning.


Thanks,
BMS
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: Openvpn tap uses 99% cpu time

2007-03-15 Thread Emile Coetzee
Emile Coetzee wrote:
 No worries below is the last part of the first ktrace. I have dropped a
full
 copy on our webserver: http://www.clarotech.co.za/dump/openvpn.txt
   
Hi,

Still no diagnostic info I'm afraid. What is needed is for the problem 
to be reproduced, with the openvpn process which is spinning under 
ktrace. You might need to hack your script to do this i.e. run openvpn 
directly from ktrace. Let it run for a bit, let it spin, then kill it. 
Check that you have kdump output from the openvpn process itself.

Use the -T switch of kdump to get absolute timestamps, watch the 
wallclock time on your system when the spin happens, and post an excerpt 
of the kdump output at that time, which should tell us where openvpn is 
spinning.

The problem is that I can't kill openvpn once the spin starts. I can do very
little else. Even trying to login on the console does not go any further
than letting me enter a username and password and then it just sits. Even
Ctrl-C on running openvpn interactively on the command line does not work.

I will try some of your suggestions and post back.

Regards
Emile

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: Openvpn tap uses 99% cpu time

2007-03-15 Thread Emile Coetzee
Still no diagnostic info I'm afraid. What is needed is for the problem 
to be reproduced, with the openvpn process which is spinning under 
ktrace. You might need to hack your script to do this i.e. run openvpn 
directly from ktrace. Let it run for a bit, let it spin, then kill it. 
Check that you have kdump output from the openvpn process itself.

Use the -T switch of kdump to get absolute timestamps, watch the 
wallclock time on your system when the spin happens, and post an excerpt 
of the kdump output at that time, which should tell us where openvpn is 
spinning.

Okay I finally have a ktrace of the offending process. You can view it here:
http://www.clarotech.co.za/dump/openvpn2.txt

Regards
Emile

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Openvpn tap uses 99% cpu time

2007-03-14 Thread Emile Coetzee
Since the latest updates to sys/net/if_tap.c (I suspect) in 6.2-STABLE
my openvpn tap server is using up all available CPU time (99%)
effectively killing the box. 

I replicated this on a second machine which was about 3 weeks behind
with a cvsup to the latest from RELENG_6 and after rebuilding the kernel
it does the same thing.

I have portupgraded openvpn to the latest release (openvpn-2.0.6_7), but
this had not made any difference. Is anyone else experiencing similar
problems?

Regards
Emile
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-14 Thread Vlad GALU

On 3/14/07, Emile Coetzee [EMAIL PROTECTED] wrote:

Since the latest updates to sys/net/if_tap.c (I suspect) in 6.2-STABLE
my openvpn tap server is using up all available CPU time (99%)
effectively killing the box.

I replicated this on a second machine which was about 3 weeks behind
with a cvsup to the latest from RELENG_6 and after rebuilding the kernel
it does the same thing.

I have portupgraded openvpn to the latest release (openvpn-2.0.6_7), but
this had not made any difference. Is anyone else experiencing similar
problems?



  It usually happens when OpenVPN is told to retry connecting to the
remote endpoint and the remote endpoint isn't accessible due to
network conditions. It happens the same even on Windows.



Regards
Emile
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]




--
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-14 Thread Jeremy Chadwick
On Wed, Mar 14, 2007 at 03:11:58PM +0200, Emile Coetzee wrote:
 Since the latest updates to sys/net/if_tap.c (I suspect) in 6.2-STABLE
 my openvpn tap server is using up all available CPU time (99%)
 effectively killing the box. 

 I replicated this on a second machine which was about 3 weeks behind
 with a cvsup to the latest from RELENG_6 and after rebuilding the kernel
 it does the same thing.

 I have portupgraded openvpn to the latest release (openvpn-2.0.6_7), but
 this had not made any difference. Is anyone else experiencing similar
 problems?

I haven't seen this on either end of our OpenVPN setup.  Ours:

endpoint A:

FreeBSD eos.sc1.parodius.com 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Mar  8 
10:41:09 PST 2007

openvpn-2.0.6_7 Secure IP/Ethernet tunnel daemon

bridge0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500
ether 2e:2e:ec:9c:76:02
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto stp maxaddr 100 timeout 1200
root id 00:00:00:00:00:00 priority 0 ifcost 0 port 0
member: tap0 flags=143LEARNING,DISCOVER,AUTOEDGE,AUTOPTP
member: bge1 flags=143LEARNING,DISCOVER,AUTOEDGE,AUTOPTP
tap0: flags=8943UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu 1500
ether 00:bd:22:14:00:00
Opened by PID 36044

endpoint B:

FreeBSD medusa.parodius.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug  5 
09:36:02 PDT 2006

openvpn-2.0.6_4 Secure IP/Ethernet tunnel daemon

bridge0: flags=8043UP,BROADCAST,RUNNING,MULTICAST mtu 1500
ether e6:7a:00:c2:48:e5
priority 32768 hellotime 2 fwddelay 15 maxage 20
member: tap0 flags=3LEARNING,DISCOVER
member: em1 flags=3LEARNING,DISCOVER
tap0: flags=8943UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST mtu 1500
ether 00:bd:a4:27:00:00
Opened by PID 849


One thing I am noticing, however, is that on endpoint A the pid tap0
claims to be associated with is incorrect.  My guess is that this is due
to some fork() stuff that's going on.  tap0 claims the wrong pid, but
the pidfile has the correct pid.

eos# ps -auxw | grep 36044
eos#
eos# ps -auxw | grep openvpn
root36048  0.0  0.1  3248  2644  ??  Ss7:03AM   0:00.02 
/usr/local/sbin/openvpn --cd ...
eos# cat /var/run/openvpn.pid
36048

While on endpoint B:

medusa# ps -auxw | grep 849
root  849  0.0  0.2  3604  2268  ??  Ss3Feb07   2:18.63 
/usr/local/sbin/openvpn --cd ...
medusa# cat /var/run/openvpn.pid
849


-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networkinghttp://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Openvpn tap uses 99% cpu time

2007-03-14 Thread Luigi Rizzo
On Wed, Mar 14, 2007 at 03:48:38PM +0200, Vlad GALU wrote:
 On 3/14/07, Emile Coetzee [EMAIL PROTECTED] wrote:
  Since the latest updates to sys/net/if_tap.c (I suspect) in 6.2-STABLE
  my openvpn tap server is using up all available CPU time (99%)
  effectively killing the box.
...
It usually happens when OpenVPN is told to retry connecting to the
 remote endpoint and the remote endpoint isn't accessible due to
 network conditions. It happens the same even on Windows.

something that often does the job is putting a usleep(1)
right after the offending syscall (e.g. a select() returning
with an error, or the like).
This deschedules the current process for at least one tick,
which is normally enough to change the busy-wait loop into one
that still is busy-wait but only takes a tiny fraction of the cpu.
Of course you want to usleep() only if there is a real error condition.

cheers
luigi
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]