On Fri, 6 Jul 2018 08:22:01 +0200 Claudio <[email protected]> wrote:
> Hello all,
>
> I have been experimenting with the idea of leaving ftrace enabled, with
> sched events, on production systems.
>
> The main concern that I am having at the moment is about the impact on
> the system. Enabling the sched events that I currently need for the
> tracing application seems to slow down context switches considerably,
> and make the system less responsive.
>
> I have tested with cyclictest on the mainline kernel, and noticed an
> increase of min and avg latencies of around 25%.
>
> Is this expected?

No it is not. And this is weird. I'm seeing it too.

> Some initial investigation into ftrace seems to point at the
> reservation and commit of the events into the ring buffer as the
> highest sources of overhead, while copying of the event parameters,
> including COMM, does not seem to have any noticeable effect relative
> to those costs.

Did you use perf to see where the issues were? If so, what perf command
did you use?

When running just:

  perf record ./cyclictest --smp -p95 -m -s -N -l 1000000 -q

I get this:

   6.08%  cyclictest  [kernel.vmlinux]  [k] syscall_return_via_sysret
   4.03%  cyclictest  cyclictest        [.] timerthread
   3.97%  cyclictest  [kernel.vmlinux]  [k] __schedule
   3.81%  cyclictest  [kernel.vmlinux]  [k] native_write_msr
   3.13%  cyclictest  [kernel.vmlinux]  [k] trace_save_cmdline
   2.71%  cyclictest  [kernel.vmlinux]  [k] do_nanosleep
   2.48%  cyclictest  [kernel.vmlinux]  [k] do_syscall_64
   2.11%  cyclictest  [kernel.vmlinux]  [k] pick_next_task_fair
   2.03%  cyclictest  [kernel.vmlinux]  [k] native_sched_clock
   1.99%  cyclictest  [kernel.vmlinux]  [k] cpuacct_charge
   1.93%  cyclictest  [vdso]            [.] __vdso_clock_gettime
   1.86%  cyclictest  [kernel.vmlinux]  [k] hrtimer_nanosleep
   1.84%  cyclictest  [kernel.vmlinux]  [k] cpupri_set
   1.66%  cyclictest  [kernel.vmlinux]  [k] update_curr_rt
   1.65%  cyclictest  [kernel.vmlinux]  [k] __perf_event_task_sched_out
   1.59%  cyclictest  [kernel.vmlinux]  [k] dequeue_rt_stack
   1.55%  cyclictest  [kernel.vmlinux]  [k] __rb_reserve_next

That shows there's a lot being done before we even get to the ring
buffer reserve. One thing that I notice too is that trace_save_cmdline
also gets activated when enabling trace events. Perhaps I should try to
optimize that more.

> I have been running the following test 20 times, and thrown away the
> first results:
>
>   $ sudo ./cyclictest --smp -p95 -m -s -N -l 100000 -q
>
>   $ uname -a
>   Linux claudio-HP-ProBook-470-G5 4.18.0-rc3+ #3 SMP Tue Jul 3 15:50:30 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> For brevity, this is a comparison of one test's results. All other test
> results show the same ~25% increase.
>
> On the left side, the run without ftrace sched events; on the right
> side, with ftrace sched events enabled.
>
> CPU  Count   Min   Act   Avg   Max     Count   Min-ftrace  Act-ftrace  Avg-ftrace  Max-ftrace
> 0    100000  2339  2936  2841  139478  100000  2900        3182        3566        93056
> 1    66742   2365  3386  2874  93639   66750   2959        3786        3646        154074
> 2    50080   2376  3058  2910  196221  50097   2997        4209        3655        18707
> 3    40076   2394  3461  2931  17914   40091   3006        4417        3750        17159
> 4    33404   2371  3612  2834  15336   33419   2997        3836        3594        23172
> 5    28635   2387  3313  2885  25863   28649   2995        3795        3647        9956
> 6    25058   2384  3428  2968  12162   25071   3051        4366        3719        18151
> 7    22275   2381  2859  2982  10706   22287   3046        5078        3825        10781
>
> I would be thankful for any advice or comments on this, especially with
> the goal of lowering the runtime impact on the system as much as
> possible.
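
Given that trace_save_cmdline shows up in the profile, one thing that
may be worth testing on your side is turning off the comm caching via
the record-cmd option while the sched events are enabled. This is only
a rough sketch, not something I have measured for this report, and it
assumes tracefs is mounted at /sys/kernel/tracing (older setups may
have it under /sys/kernel/debug/tracing):

  # Run as root. Enable the sched events but skip the comm cache that
  # trace_save_cmdline fills in; task names then appear as "<...>" in
  # the trace output, but PIDs are still recorded.
  cd /sys/kernel/tracing          # or /sys/kernel/debug/tracing
  echo 0 > options/record-cmd     # don't save comms at sched_switch
  echo 1 > events/sched/enable    # enable all sched trace events

  # Then repeat the same comparison:
  ./cyclictest --smp -p95 -m -s -N -l 100000 -q

If the latency gap shrinks noticeably with record-cmd off, that would
point more at the cmdline saving than at the ring buffer reserve and
commit path.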
But still, we use all sorts of trace events when dealing with real time,
and trace events hardly show anything in the PREEMPT_RT kernel: there,
cyclictest shows around 20 microseconds, and enabling the sched events
hardly touches that. But I see this weird side effect on a distro kernel
too. Maybe there's something weird going on when CONFIG_PREEMPT_VOLUNTARY
is set :-/

I'll investigate further. Thanks for the report.

-- Steve

