On 15/03/10 16:29, Jan Just Keijser wrote:
> More tests, this time with 'oprofile' : here's a recap:
> - nothing changed on the server side:
> openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key
> --cipher none
> 
> - upgraded to kernel 2.6.32.9-70.fc12.x86_64 on my laptop
> - selinux is disabled
> - installed the debuginfo rpms to get a 'vmlinux'
> - configure the oprofile daemon using
> opcontrol
> --vmlinux=/usr/lib/debug/lib/modules/2.6.32.9-70.fc12.x86_64/vmlinux
> - now start it, reset the statistics, start openvpn
>  opcontrol --start
>  opcontrol --reset
>  ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret
> secret.key --remote kudde.nikhef.nl --cipher none
> - download a file using 'nc' (this maxes out my 100 Mbps LAN at roughly
> 11 MB/s)
> - get the statistics
>  opcontrol --dump
>  opreport -l > stats

Thanks a lot!  This is way cool!  It just strikes me that you probably
should play with ftrace instead of oprofile.  It's better and has
lower overhead than oprofile, afaik.

<http://lwn.net/Articles/322666/>

(btw. with an ftrace-enabled kernel it's even available on embedded
devices; you only need 'mount', 'echo' and 'cat' to interact with
ftrace ... although Steven Rostedt is also working on a GUI for
ftrace, called kernelshark)

With such a new kernel, there's also the perf tool.

<http://perf.wiki.kernel.org/index.php/Main_Page>

Both of these tools have something called a callgraph, iirc.  With
that you can see which function is calling which function, and how
much time each step used.
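
A quick sketch of how that looks with perf (I haven't double-checked
the exact options, so take them with a grain of salt):

 # sample the whole box with call-graph recording for 30 seconds
 perf record -g -a -- sleep 30
 # or attach to the running openvpn process instead
 perf record -g -p $(pidof openvpn) -- sleep 30

 # browse the result, callers/callees included
 perf report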

I'll ask some people who've been involved in both ftrace and perf for
some better pointers!

> Here're the results on my laptop, running at runlevel 2 with as many
> daemons stopped and modules unloaded as possible:
> 
> when downloading a 100 Mb file (using nc) I see:
> 
>> head -20 after.100mb
> samples  %        app name                 symbol name
> 55558    30.0622  vmlinux                  read_hpet
> 19613    10.6125  vmlinux                  mwait_idle_with_hints

read_hpet usually means the HPET timer.  I'm actually surprised to see
that one so high up on the list.  Right now I don't recall what
mwait_idle_with_hints does, but I guess it pops up high because it
calls read_hpet().

> 10692     5.7854  libcrypto.so.1.0.0       /usr/lib64/libcrypto.so.1.0.0
> 5407      2.9257  vmlinux                  acpi_os_read_port
> 2546      1.3776  vmlinux                  copy_user_generic_string
> 1945      1.0524  opreport                 /usr/bin/opreport
> 1885      1.0200  vmlinux                  hpet_next_event
> 1325      0.7170  tg3                      /tg3
> 1235      0.6683  vmlinux                  schedule
> 1121      0.6066  tun                      /tun
> 1049      0.5676  vmlinux                  do_sys_poll
> 796       0.4307  vmlinux                  acpi_idle_enter_bm
> 795       0.4302  vmlinux                  sched_clock_local
> 769       0.4161  vmlinux                  tick_broadcast_oneshot_control
> 757       0.4096  vmlinux                  tcp_packet
> 749       0.4053  vmlinux                  cfb_imageblit
> 728       0.3939  vmlinux                  system_call
> 
> Observations:
> - why the heck is libcrypto so high on the list? I used 'cipher none' !
> - the 'tun' driver does not seem to be the bottleneck

It still might be that there is some locking going on, where the tun
driver calls some code paths which end up in mwait_idle_with_hints ...
(this is why I'm recommending having a look at the callgraph in perf
or ftrace)
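
With ftrace you could f.ex. restrict the function_graph tracer to the
tun driver and see where those code paths end up (the 'tun_*' pattern
is just my guess at the symbol names, adjust as needed):

 cd /sys/kernel/debug/tracing
 echo 'tun_*' > set_graph_function
 echo function_graph > current_tracer
 cat trace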

> Ah, of course,  openvpn still used crypto for the HMAC handshake!

That will cause calls to libcrypto.

> After adding '--auth none' to both client and server (and a tweak to
> opreport) I now get:
> 
> samples  %        linenr info                 app name                 symbol name
> 140883   31.1707  hpet.c:748                  vmlinux                  read_hpet
> 51808    11.4626  process.c:356               vmlinux                  mwait_idle_with_hints

Still amazingly high!

[snip]
> Observations:
> - note that openvpn itself does not even make the top 15. It's lower in
> the list, however:
> 11896     0.3580  openvpn                  openvpn                  io_wait_dowork
> 10842     0.3263  openvpn                  openvpn                  po_wait
> 9608      0.2892  openvpn                  openvpn                  openvpn_decrypt
> 9449      0.2844  openvpn                  openvpn                  main
> 9250      0.2784  openvpn                  openvpn                  pre_select
> 9191      0.2766  openvpn                  openvpn                  process_incoming_link
> 7027      0.2115  openvpn                  openvpn                  po_ctl
> 4148      0.1248  openvpn                  openvpn                  packet_id_add
> 4090      0.1231  openvpn                  openvpn                  mss_fixup
> 4022      0.1210  openvpn                  openvpn                  process_io
> [...]
> - why do I see 'po_ctl' and the likes? is this the old
> openvpn-does-not-properly-use-EPOLL bug again?
> 
> I also ran a similar test using a Centos5 client (kernel
> 2.6.18-164.6.1.el5) over a Gbps LAN : here you can see some limitations
> of the tun driver or openvpn itself:
> - raw UDP gives me ~ 110 MB/s  (maxing out the Gbps LAN)
> - raw TCP/IP gives me ~ 60 MB/s (could use some tweaking but is not so
> bad, actually)
> - openvpn over UDP maxes out at somewhere between 32 - 40 MB/s
> - openvpn over TCP maxes out at ~ 16 MB/s
> 
> So either I'm misreading oprofile or it's *not* the tun driver that is
> causing bottlenecks.

It does not look like it at first glance.  But this trace doesn't show
which functions are called by which functions, so it can't tell if
f.ex. the mwait or read_hpet calls are coming from the tun driver,
OpenSSL/libcrypto or OpenVPN itself.

On the other hand!  HPET is not the most efficient timer; on systems
where the TSC timer is available, switching to it might change the
profile as well.

Available clock sources are found like this:
cat /sys/devices/system/clocksource/clocksource*/available_clocksource

And the clock source in use is found like this:
cat /sys/devices/system/clocksource/clocksource*/current_clocksource
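
And if 'tsc' is listed as available, it can be switched at run time
for a quick test (needs root; the kernel may fall back to HPET again
if it decides the TSC is unstable):
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource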

On many systems the TSC clock source is considered unstable by the
kernel.  I've forgotten now whether it was AMD or Intel CPUs that were
worse, or how this was related to the number of cores.  But it is
somehow connected to how the TSC clock source is synchronised between
the CPU cores.
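
Whether the kernel has given up on the TSC on a given box usually
shows up in the kernel log, so something like this should tell:
dmesg | grep -i tsc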

> If anybody else has more experience with 'oprofile' then please let me
> know how I can rerun these tests more effectively.

I have no experience with oprofile, so I'm not sure if a callgraph is
available there.  But I believe a callgraph would give even better
clues.

Thanks a lot for your tests and analysis!


kind regards,

David Sommerseth
