On Wed 12 Jun 2019 at 15:03, Jiri Pirko <[email protected]> wrote:
> Hi.
>
> I came across a serious performance degradation when adding many tps. I'm
> using the following script:
>
> ------------------------------------------------------------------------
> #!/bin/bash
>
> dev=testdummy
> ip link add name $dev type dummy
> ip link set dev $dev up
> tc qdisc add dev $dev ingress
>
> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
> pref_id=1
>
> while [ $pref_id -lt 20000 ]
> do
>         echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>         ((pref_id++))
> done
>
> start=$(date +"%s")
> tc -b $tmp_file_name
> stop=$(date +"%s")
> echo "Insertion duration: $(($stop - $start)) sec"
> rm -f $tmp_file_name
>
> ip link del dev $dev
> ------------------------------------------------------------------------
>
> On my testing vm, the result on a 5.1 kernel is:
> Insertion duration: 3 sec
> On net-next it is:
> Insertion duration: 54 sec
>
> I did simple profiling using perf. Output on the 5.1 kernel:
>     77.85%  tc               [kernel.kallsyms]  [k] tcf_chain_tp_find
>      3.30%  tc               [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      1.33%  tc_pref_scale.s  [kernel.kallsyms]  [k] do_syscall_64
>      0.60%  tc_pref_scale.s  libc-2.28.so       [.] malloc
>      0.55%  tc               [kernel.kallsyms]  [k] mutex_spin_on_owner
>      0.51%  tc               libc-2.28.so       [.] __memset_sse2_unaligned_erms
>      0.40%  tc_pref_scale.s  libc-2.28.so       [.] __gconv_transform_utf8_internal
>      0.38%  tc_pref_scale.s  libc-2.28.so       [.] _int_free
>      0.37%  tc_pref_scale.s  libc-2.28.so       [.] __GI___strlen_sse2
>      0.37%  tc               [kernel.kallsyms]  [k] idr_get_free
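For reference, a perf invocation along these lines should reproduce the
profile above (the exact command is not shown in the thread, so this is
only a sketch; it assumes the script is saved as tc_pref_scale.sh, which
matches the comm names in the output):

------------------------------------------------------------------------
# Record the whole script run with call graphs; both the tc binary and
# the script itself then show up as separate comms in the report.
perf record -g -- ./tc_pref_scale.sh
perf report --sort comm,dso,symbol
------------------------------------------------------------------------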
Are these results for the same config? Here I don't see any lockdep or
KASAN. However, in the next trace...

> Output on net-next:
>    39.26%  tc               [kernel.vmlinux]  [k] lock_is_held_type
>    33.99%  tc               [kernel.vmlinux]  [k] tcf_chain_tp_find
>    12.77%  tc               [kernel.vmlinux]  [k] __asan_load4_noabort
>     1.90%  tc               [kernel.vmlinux]  [k] __asan_load8_noabort
>     1.08%  tc               [kernel.vmlinux]  [k] lock_acquire
>     0.94%  tc               [kernel.vmlinux]  [k] debug_lockdep_rcu_enabled
>     0.61%  tc               [kernel.vmlinux]  [k] debug_lockdep_rcu_enabled.part.5
>     0.51%  tc               [kernel.vmlinux]  [k] unwind_next_frame
>     0.50%  tc               [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
>     0.47%  tc_pref_scale.s  [kernel.vmlinux]  [k] lock_acquire
>     0.47%  tc               [kernel.vmlinux]  [k] lock_release

... both lockdep and KASAN consume most of the CPU time. BTW, it takes 5 sec
to execute your script on my system with net-next (debug options disabled).

> I didn't investigate this any further now. I fear that this might be
> related to Vlad's changes in the area. Any ideas?
>
> Thanks!
>
> Jiri
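A quick way to check whether a running kernel was built with these debug
options (assuming CONFIG_IKCONFIG_PROC is set, so /proc/config.gz exists;
otherwise grep the build's .config):

------------------------------------------------------------------------
# lockdep is controlled by CONFIG_LOCKDEP/CONFIG_PROVE_LOCKING,
# KASAN by CONFIG_KASAN.
zgrep -E 'CONFIG_(LOCKDEP|PROVE_LOCKING|KASAN)=' /proc/config.gz
------------------------------------------------------------------------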
