Wed, Jun 12, 2019 at 02:34:02PM CEST, [email protected] wrote:
>
>On Wed 12 Jun 2019 at 15:03, Jiri Pirko <[email protected]> wrote:
>> Hi.
>>
>> I came across serious performance degradation when adding many tps. I'm
>> using the following script:
>>
>> ------------------------------------------------------------------------
>> #!/bin/bash
>>
>> dev=testdummy
>> ip link add name $dev type dummy
>> ip link set dev $dev up
>> tc qdisc add dev $dev ingress
>>
>> tmp_file_name=$(date +"/tmp/tc_batch.%s.%N.tmp")
>> pref_id=1
>>
>> while [ $pref_id -lt 20000 ]
>> do
>>     echo "filter add dev $dev ingress proto ip pref $pref_id matchall action drop" >> $tmp_file_name
>>     ((pref_id++))
>> done
>>
>> start=$(date +"%s")
>> tc -b $tmp_file_name
>> stop=$(date +"%s")
>> echo "Insertion duration: $(($stop - $start)) sec"
>> rm -f $tmp_file_name
>>
>> ip link del dev $dev
>> ------------------------------------------------------------------------
>>
>> On my testing vm, the result on a 5.1 kernel is:
>> Insertion duration: 3 sec
>> On net-next this is:
>> Insertion duration: 54 sec
>>
>> I did simple profiling using perf. Output on the 5.1 kernel:
>>  77.85%  tc               [kernel.kallsyms]  [k] tcf_chain_tp_find
>>   3.30%  tc               [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>>   1.33%  tc_pref_scale.s  [kernel.kallsyms]  [k] do_syscall_64
>>   0.60%  tc_pref_scale.s  libc-2.28.so       [.] malloc
>>   0.55%  tc               [kernel.kallsyms]  [k] mutex_spin_on_owner
>>   0.51%  tc               libc-2.28.so       [.] __memset_sse2_unaligned_erms
>>   0.40%  tc_pref_scale.s  libc-2.28.so       [.] __gconv_transform_utf8_internal
>>   0.38%  tc_pref_scale.s  libc-2.28.so       [.] _int_free
>>   0.37%  tc_pref_scale.s  libc-2.28.so       [.] __GI___strlen_sse2
>>   0.37%  tc               [kernel.kallsyms]  [k] idr_get_free
>
>Are these results for the same config? Here I don't see any lockdep or
>KASAN. However, in the next trace...
>
>>
>> Output on net-next:
>>  39.26%  tc               [kernel.vmlinux]  [k] lock_is_held_type
>>  33.99%  tc               [kernel.vmlinux]  [k] tcf_chain_tp_find
>>  12.77%  tc               [kernel.vmlinux]  [k] __asan_load4_noabort
>>   1.90%  tc               [kernel.vmlinux]  [k] __asan_load8_noabort
>>   1.08%  tc               [kernel.vmlinux]  [k] lock_acquire
>>   0.94%  tc               [kernel.vmlinux]  [k] debug_lockdep_rcu_enabled
>>   0.61%  tc               [kernel.vmlinux]  [k] debug_lockdep_rcu_enabled.part.5
>>   0.51%  tc               [kernel.vmlinux]  [k] unwind_next_frame
>>   0.50%  tc               [kernel.vmlinux]  [k] _raw_spin_unlock_irqrestore
>>   0.47%  tc_pref_scale.s  [kernel.vmlinux]  [k] lock_acquire
>>   0.47%  tc               [kernel.vmlinux]  [k] lock_release
>
>... both lockdep and KASAN consume most of the CPU time.
>
>BTW, it takes 5 sec to execute your script on my system with net-next
>(debug options disabled).
You are right, my bad. Sorry for the fuss.

>
>>
>> I didn't investigate this any further now. I fear that this might be
>> related to Vlad's changes in the area. Any ideas?
>>
>> Thanks!
>>
>> Jiri
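For completeness: the config mismatch above can be spotted without re-running perf by checking whether the kernel was built with the heavyweight debug options that dominate the second profile. The helper below is a sketch, not from the thread; `check_debug_opts` is a hypothetical name, and it assumes the standard Kconfig file format where enabled options appear as `CONFIG_FOO=y`.

```shell
#!/bin/sh
# Hypothetical helper: report whether the debug options visible in the
# net-next profile (lockdep, KASAN) are enabled in a kernel config file.
check_debug_opts() {    # $1 = path to a kernel config file
    for opt in CONFIG_LOCKDEP CONFIG_PROVE_LOCKING CONFIG_KASAN; do
        # Enabled options appear as "CONFIG_FOO=y"; disabled ones are
        # either absent or commented out as "# CONFIG_FOO is not set".
        if grep -qs "^${opt}=y" "$1"; then
            printf '%s=on\n' "$opt"
        else
            printf '%s=off\n' "$opt"
        fi
    done
}

# On a live system one would typically point it at the running kernel's
# config, e.g.:
#   check_debug_opts "/boot/config-$(uname -r)"
```

If any of these report `on`, timing comparisons against a production-config kernel are not meaningful, which matches the conclusion reached in the thread.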
