On Sat, 2019-04-13 at 13:55 +0300, David Woodhouse wrote:
> 
> Let's switch to using iperf. You can limit the sending bandwidth with
> that. If we send more than the receive side can handle, it actually
> ends up receiving less than its peak capacity.
So, while iperf is running at the optimum output, let's see what perf
says:
sudo perf record -g --pid=`pidof lt-openconnect`
  Children      Self  Command         Shared Object            Symbol
+   42.15%     0.30%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
+   41.92%     0.42%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   36.89%     0.55%  lt-openconnect  libpthread-2.28.so       [.] __libc_send
+   32.96%    32.87%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
+   30.31%     0.16%  lt-openconnect  [kernel.vmlinux]         [k] __x64_sys_sendto
+   30.14%     0.43%  lt-openconnect  [kernel.vmlinux]         [k] __sys_sendto
+   28.78%     0.04%  lt-openconnect  [kernel.vmlinux]         [k] sock_sendmsg
+   27.95%     1.17%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
+   17.77%     0.08%  lt-openconnect  [kernel.vmlinux]         [k] udp_send_skb.isra.50
+   17.60%     0.01%  lt-openconnect  [kernel.vmlinux]         [k] ip_send_skb
+   16.34%     0.26%  lt-openconnect  [kernel.vmlinux]         [k] ip_output
+   15.10%     0.48%  lt-openconnect  [kernel.vmlinux]         [k] ip_finish_output2
+   14.57%     0.44%  lt-openconnect  [kernel.vmlinux]         [k] __dev_queue_xmit
+   13.31%     0.10%  lt-openconnect  [kernel.vmlinux]         [k] sch_direct_xmit
+   10.35%     0.21%  lt-openconnect  libpthread-2.28.so       [.] __libc_read
+    8.76%     0.23%  lt-openconnect  [kernel.vmlinux]         [k] dev_hard_start_xmit
+    7.78%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] ip_make_skb
+    6.82%     0.11%  lt-openconnect  [kernel.vmlinux]         [k] ksys_read
+    6.25%     0.18%  lt-openconnect  [kernel.vmlinux]         [k] vfs_read
+    6.24%     6.21%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
+    5.70%     0.82%  lt-openconnect  [kernel.vmlinux]         [k] __ip_append_data.isra.52
+    5.56%     0.00%  lt-openconnect  [unknown]                [k] 0000000000000000
+    5.54%     5.54%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
+    5.33%     0.15%  lt-openconnect  [kernel.vmlinux]         [k] __vfs_read
+    5.15%     0.52%  lt-openconnect  [tun]                    [k] tun_chr_read_iter
+    5.13%     2.84%  lt-openconnect  [ena]                    [k] ena_start_xmit
.. and without the '-g':
Overhead  Command         Shared Object            Symbol
  32.94%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
   5.77%  lt-openconnect  libopenconnect.so.5.5.0  [.] sha1_block_data_order_ssse3
   4.70%  lt-openconnect  [kernel.vmlinux]         [k] _raw_spin_lock
   3.44%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64
   2.86%  lt-openconnect  [kernel.vmlinux]         [k] syscall_return_via_sysret
   2.81%  lt-openconnect  [kernel.vmlinux]         [k] copy_user_enhanced_fast_string
   2.66%  lt-openconnect  [kernel.vmlinux]         [k] irq_entries_start
   1.86%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_encrypt
   1.44%  lt-openconnect  [kernel.vmlinux]         [k] pvclock_clocksource_read
   1.39%  lt-openconnect  [kernel.vmlinux]         [k] native_apic_msr_eoi_write
   1.34%  lt-openconnect  [ena]                    [k] ena_io_poll
   1.33%  lt-openconnect  [ena]                    [k] ena_start_xmit
   1.12%  lt-openconnect  [kernel.vmlinux]         [k] __fget_light
   1.02%  lt-openconnect  [kernel.vmlinux]         [k] common_interrupt
   0.88%  lt-openconnect  [kernel.vmlinux]         [k] interrupt_entry
   0.73%  lt-openconnect  [kernel.vmlinux]         [k] packet_rcv
   0.71%  lt-openconnect  [tun]                    [k] tun_do_read
   0.66%  lt-openconnect  [kernel.vmlinux]         [k] udp_sendmsg
   0.66%  lt-openconnect  [kernel.vmlinux]         [k] __slab_free
   0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipt_do_table
   0.61%  lt-openconnect  [kernel.vmlinux]         [k] ipv4_mtu
   0.60%  lt-openconnect  [kernel.vmlinux]         [k] sock_wfree
   0.58%  lt-openconnect  [kernel.vmlinux]         [k] kfree
   0.58%  lt-openconnect  [tun]                    [k] tun_chr_read_iter
Expanding (a rerun of) the first one to see where all that syscall time
is, it's mostly on the UDP send side:
  Children      Self  Command         Shared Object            Symbol
-   38.15%     0.29%  lt-openconnect  [kernel.vmlinux]         [k] entry_SYSCALL_64_after_hwframe
     37.86% entry_SYSCALL_64_after_hwframe
        - do_syscall_64
           - 28.92% __x64_sys_sendto
              - 28.68% __sys_sendto
                 - 27.05% sock_sendmsg
                    - 26.44% udp_sendmsg
                       - 17.62% udp_send_skb.isra.50
                          - 17.46% ip_send_skb
                             - 15.92% ip_output
                                - 14.29% ip_finish_output2
                                   - 12.85% __dev_queue_xmit
                                      - 11.31% sch_direct_xmit
                                         + 5.92% dev_hard_start_xmit
                                           4.36% _raw_spin_lock
                                         + 0.84% validate_xmit_skb_list
                                         + 0.76% __local_bh_enable_ip
                                  0.73% ip_finish_output
                                  0.55% nf_hook_slow
                             + 1.54% ip_local_out
                       + 7.46% ip_make_skb
                         0.77% sk_dst_check
                 + 0.54% security_socket_sendmsg
              + 1.05% sockfd_lookup_light
           - 7.51% ksys_read
              + 6.76% vfs_read
              + 0.63% __fdget_pos
     + 0.55% common_interrupt
+   37.92%     0.38%  lt-openconnect  [kernel.vmlinux]         [k] do_syscall_64
+   37.66%    32.36%  lt-openconnect  libopenconnect.so.5.5.0  [.] aesni_cbc_sha1_enc_ssse3
So setting up zerocopy for the tun device with virtio-user might not
win us much. Maybe MSG_ZEROCOPY for the UDP side could help? I'm not
quite sure where in the trace above the copy from userspace actually
happens.
But these are my results, not yours. And frankly, I'm not worried about
the performance on *my* system. 1800Mb/s will do me quite nicely for
now, thank you very much.
Let's see what you get on your side for the comparable traces. Start
the 'record' right after starting iperf in a different terminal, then
stop it just before iperf is about to finish, ~10 seconds later.
(Oops, I see sha1_block_data_order_ssse3 in the trace: it's not
detecting AVX support. With that fixed, my ESP microbenchmark is now at
2775Mb/s, although the overall perf traces look similar.)
_______________________________________________
openconnect-devel mailing list
[email protected]
http://lists.infradead.org/mailman/listinfo/openconnect-devel
