Hi, here are the early results of my experiments on enabling ftrace and also perf for use under Xenomai. Things work quite well so far, at least on x86-64, which allows access to current and current_thread_info from arbitrary contexts.
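To give an idea of what such an enabling involves: a tracepoint becomes visible to trace-cmd, kernelshark & friends once it is described via the kernel's TRACE_EVENT() macro in a trace header. The following is only a rough sketch of that shape, using the xn_nucleus_sched event that shows up in the dump below; the status-only payload, the header name, and its layout are my assumptions for illustration, not the actual patch:

```c
/* Sketch of a trace header (e.g. include/trace/events/xn_nucleus.h);
 * illustrative only -- the real conversion queue may differ.
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM xn_nucleus

#if !defined(_TRACE_XN_NUCLEUS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_XN_NUCLEUS_H

#include <linux/tracepoint.h>

TRACE_EVENT(xn_nucleus_sched,
	/* assumed payload: just the scheduler status word */
	TP_PROTO(unsigned long status),
	TP_ARGS(status),
	TP_STRUCT__entry(
		__field(unsigned long, status)
	),
	TP_fast_assign(
		__entry->status = status;
	),
	/* rendered by trace-cmd report as "status=<hex>" */
	TP_printk("status=%lx", __entry->status)
);

#endif /* _TRACE_XN_NUCLEUS_H */

/* This part must be outside the multi-read protection */
#include <trace/define_trace.h>
```

The instrumented code path then just calls trace_xn_nucleus_sched(status) where the old instrumentation sat.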
My primary goal was event tracing via ftrace, and this is the still rather minimalistic outcome (an excerpt of a latency run):

   sampling-4872-4874 [000]  1075.969437: xn_nucleus_thread_suspend: thread=0xffffc90010909840 thread_name=sampling-4872 mask=4 timeout=0 timeout_mode=0 wchan=0x0
   sampling-4872-4874 [000]  1075.969438: xn_nucleus_sched: status=2000000
   sampling-4872-4874 [000]  1075.969438: xn_nucleus_sched_switch: prev=0xffffc90010909840 prev_name=sampling-4872 next=0xffffffff82501e80 next_name=ROOT/0
    display-4872-4873 [000]  1075.969441: sys_exit: NR 1 = 70
    display-4872-4873 [000]  1075.969442: sys_exit_write: 0x46
    display-4872-4873 [000]  1075.969448: sys_enter: NR 1 (1, ce3a6000, 50, 1010100, 0, fefefeff)
    display-4872-4873 [000]  1075.969448: sys_enter_write: fd: 0x00000001, buf: 0xce3a6000, count: 0x00000050
    display-4872-4873 [000]  1075.969453: sched_wakeup: comm=display-4872 pid=4873 prio=120 success=0 target_cpu=000
    display-4872-4873 [000]  1075.969458: sys_exit: NR 1 = 80
    display-4872-4873 [000]  1075.969459: sys_exit_write: 0x50
    display-4872-4873 [000]  1075.969477: sys_enter: NR 1 (1, ce3a6000, 50, 1010100, 25252525, fefefeff)
    display-4872-4873 [000]  1075.969478: sys_enter_write: fd: 0x00000001, buf: 0xce3a6000, count: 0x00000050
    display-4872-4873 [000]  1075.969483: sched_wakeup: comm=display-4872 pid=4873 prio=120 success=0 target_cpu=000
    display-4872-4873 [000]  1075.969488: sys_exit: NR 1 = 80
    display-4872-4873 [000]  1075.969488: sys_exit_write: 0x50
    display-4872-4873 [000]  1075.969497: sched_wakeup: comm=gatekeeper/0 pid=44 prio=0 success=1 target_cpu=000
    display-4872-4873 [000]  1075.969499: sched_stat_runtime: comm=display-4872 pid=4873 runtime=132372 [ns] vruntime=59700061733 [ns]
    display-4872-4873 [000]  1075.969501: sched_switch: prev_comm=display-4872 prev_pid=4873 prev_prio=120 prev_state=S ==> next_comm=gatekeeper/0 next_pid=44 next_prio=0
      gatekeeper/0-44 [000]  1075.969505: xn_nucleus_thread_resume: thread=0xffffc90010908c40 thread_name=display-4872 mask=200
      gatekeeper/0-44 [000]  1075.969506: xn_nucleus_sched_remote: status=2000000
      gatekeeper/0-44 [000]  1075.969507: xn_nucleus_sched: status=2000000
      gatekeeper/0-44 [000]  1075.969507: xn_nucleus_sched_switch: prev=0xffffffff82501e80 prev_name=ROOT/0 next=0xffffc90010908c40 next_name=display-4872
    display-4872-4873 [000]  1075.969511: xn_nucleus_thread_suspend: thread=0xffffc90010908c40 thread_name=display-4872 mask=2 timeout=0 timeout_mode=0 wchan=0xffffc90010909608

The dump above was generated by "trace-cmd report", but the GUI of kernelshark works as well. Only a few Xenomai tracepoints have been converted to TRACE_EVENT so far, so this is not yet fully chatty.

I furthermore enabled the use of the function and function graph tracers over Xenomai (the trace below shows the full loop of the latency sampling thread; indentation of the reproduced dump is approximate):

 0)               |  __ipipe_syscall_root() {
 0)               |    __ipipe_dispatch_event() {
 0)               |      hisyscall_event() {
 0)               |        __rt_task_wait_period() {
 0)               |          rt_task_wait_period() {
 0)               |            xnpod_wait_thread_period() {
 0)               |              xnpod_suspend_thread() {
 0)               |                __xnpod_schedule() {
 0)               |                  ipipe_send_ipi() {
 0)   0.425 us    |                    __ipipe_send_ipi();
 0)   1.416 us    |                  }
 0)               |                  xnsched_pick_next() {
 0)   0.368 us    |                    xnsched_tp_pick();
 0)               |                    xnsched_sporadic_pick() {
 0)   0.368 us    |                      getmlq();
 0)   1.375 us    |                    }
 0)               |                    xnsched_rt_pick() {
 0)   0.372 us    |                      getmlq();
 0)   1.374 us    |                    }
 0)   0.357 us    |                    xnsched_idle_pick();
 0)   6.415 us    |                  }
 0)   0.371 us    |                  __phys_addr();
 ------------------------------------------
 0)  samplin-4850  =>  <idle>-0
 ------------------------------------------
 0) + 35.050 us   |                                }
 0) + 46.805 us   |                              }
 0)   0.533 us    |                              __ipipe_walk_pipeline();
 0) + 49.183 us   |                            }
 0) + 50.242 us   |                          }
 0) + 54.607 us   |                        }
 0) + 77.866 us   |                      } /* do_softirq */
 0)   0.387 us    |                      rcu_irq_exit();
 0)   0.383 us    |                      idle_cpu();
 0)               |                      tick_nohz_stop_sched_tick() {
 0)   1.048 us    |                        ipipe_check_context();
 0)               |                        ktime_get() {
 0)   0.894 us    |                          read_hpet();
 0)   1.972 us    |                        }
 0)   0.406 us    |                        update_ts_time_stats();
 0)   0.376 us    |                        timekeeping_max_deferment();
 0)   0.533 us    |                        rcu_needs_cpu();
 0)   0.368 us    |                        printk_needs_cpu();
 0)               |                        get_next_timer_interrupt() {
 0)   0.406 us    |                          _raw_spin_lock();
 0)   0.368 us    |                          _raw_spin_unlock();
 0)               |                          hrtimer_get_next_event() {
 0)               |                            _raw_spin_lock_irqsave() {
 0)   1.059 us    |                              ipipe_check_context();
 0)   2.130 us    |                            }
 0)   0.353 us    |                            __ipipe_spin_unlock_debug();
 0)               |                            _raw_spin_unlock_irqrestore() {
 0)               |                              __ipipe_restore_root() {
 0)   1.025 us    |                                ipipe_check_context();
 0)   2.186 us    |                              }
 0)   3.177 us    |                            }
 0)   7.955 us    |                          }
 0) + 11.035 us   |                        }
 0)   0.571 us    |                        select_nohz_load_balancer();
 0)               |                        __ipipe_restore_root() {
 0)   1.025 us    |                          ipipe_check_context();
 0)   2.058 us    |                        }
 0) + 25.250 us   |                      }
 0) ! 106.862 us  |                    } /* irq_exit */
 0) ! 386.745 us  |                  } /* smp_apic_timer_interrupt */
 0)   <========== |
 0) ! 389.524 us  |                } /* __ipipe_sync_stage */
 0) ! 390.692 us  |              } /* __ipipe_walk_pipeline */
 0) ! 409.742 us  |            } /* __ipipe_dispatch_wired_nocheck */
 0) ! 410.819 us  |          } /* __ipipe_dispatch_wired */
 0) ! 415.161 us  |        } /* __ipipe_handle_irq */
 0) ! 422.534 us  |    } /* __ipipe_halt_root */
 0) ! 423.563 us  |  } /* default_idle */
 0)   0.387 us    |  __exit_idle();
 0)   1.056 us    |  ipipe_check_context();
 0)   0.383 us    |  enter_idle();
 0)               |  default_idle() {
 0)               |    __ipipe_halt_root() {
 0)               |      __ipipe_handle_irq() {
 0)   0.417 us    |        irq_to_desc();
 0)               |        __ipipe_ack_apic() {
 0)   0.364 us    |          native_apic_mem_write();
 0)   1.355 us    |        }
 0)               |        __ipipe_dispatch_wired() {
 0)               |          __ipipe_dispatch_wired_nocheck() {
 0)               |            xnintr_clock_handler() {
 0)   0.360 us    |              rthal_nmi_disarm();
 0)               |              xntimer_tick_aperiodic() {
 0)               |                xnthread_periodic_handler() {
 0)               |                  xnpod_resume_thread() {
 0)               |                    xnsched_rt_enqueue() {
 0)   0.383 us    |                      addmlq();
 0)   1.416 us    |                    }
 0)   2.806 us    |                  }
 0)   3.820 us    |                }
 0)               |                xntimer_next_local_shot() {
 0)   0.357 us    |                  native_apic_mem_write();
 0)   0.360 us    |                  rthal_nmi_arm();
 0)   2.712 us    |                }
 0)   8.466 us    |              }
 0)               |              __xnpod_schedule() {
 0)               |                ipipe_send_ipi() {
 0)   0.409 us    |                  __ipipe_send_ipi();
 0)   1.413 us    |                }
 0)               |                xnsched_pick_next() {
 0)   0.356 us    |                  xnsched_tp_pick();
 0)               |                  xnsched_sporadic_pick() {
 0)   0.439 us    |                    getmlq();
 0)   1.427 us    |                  }
 0)   4.019 us    |                }
 0)   0.353 us    |                __phys_addr();
 ------------------------------------------
 0)  <idle>-0  =>  samplin-4850
 ------------------------------------------
 0) + 87.054 us   |                } /* __xnpod_schedule */
 0) + 88.240 us   |              } /* xnpod_suspend_thread */
 0)   0.383 us    |              xntimer_get_overruns();
 0)   0.740 us    |              __ipipe_restore_pipeline_head();
 0) + 92.217 us   |            } /* xnpod_wait_thread_period */
 0) + 93.284 us   |          } /* rt_task_wait_period */
 0) + 94.347 us   |        } /* __rt_task_wait_period */
 0) + 95.485 us   |      } /* hisyscall_event */
 0) + 97.987 us   |    } /* __ipipe_dispatch_event */
 0) + 99.422 us   |  } /* __ipipe_syscall_root */

The function graph tracer is nice for understanding call paths, but its performance impact is devastating (two tracepoints per function, entry *and* exit).

Finally, I fixed a remaining I-pipe bug and got perf working:

--------------------------------------------------------------------------------------------------------
   PerfTop:     973 irqs/sec  kernel:29.5%  exact:  0.0% [1000Hz cycles],  (all, cpu: 0)
--------------------------------------------------------------------------------------------------------

             samples  pcnt function                       DSO
             _______ _____ ______________________________ _______________________________________________

             5858.00 55.8% loads                          /home/jan/calibrator
              861.00  8.2% __xnpod_schedule               [kernel.kallsyms]
              842.00  8.0% __ipipe_check_percpu_access    [kernel.kallsyms]
              485.00  4.6% read_hpet                      [kernel.kallsyms]
              288.00  2.7% __switch_to                    [kernel.kallsyms]
              144.00  1.4% xnintr_clock_handler           [kernel.kallsyms]
              136.00  1.3% irq_entries_start              [kernel.kallsyms]
              113.00  1.1% ipipe_check_context            [kernel.kallsyms]
               92.00  0.9% xnpod_wait_thread_period       [kernel.kallsyms]
               77.00  0.7% sched_clock                    [kernel.kallsyms]
               73.00  0.7% _raw_spin_lock                 [kernel.kallsyms]
               72.00  0.7% xntimer_tick_aperiodic         [kernel.kallsyms]
               68.00  0.6% xnpod_suspend_thread           [kernel.kallsyms]
               48.00  0.5% intel_pmu_disable_all          [kernel.kallsyms]
               48.00  0.5% __ipipe_send_ipi               [kernel.kallsyms]
               46.00  0.4% __ipipe_restore_root           [kernel.kallsyms]
               46.00  0.4% __ipipe_dispatch_event         [kernel.kallsyms]
               45.00  0.4% xntimer_next_local_shot        [kernel.kallsyms]
               42.00  0.4% sched_clock_local              [kernel.kallsyms]
               41.00  0.4% rt_task_wait_period            /home/jan/xenomai/inst64/lib/libnative.so.3.0.0
               38.00  0.4% hisyscall_event                [kernel.kallsyms]
               35.00  0.3% perf_ctx_adjust_freq           [kernel.kallsyms]
               34.00  0.3% __pthread_setcanceltype        /lib64/libpthread-2.9.so

This is the output of "perf top" using the PMU to trigger NMIs frequently, but you can also use perf with other event sources, including the Xenomai ftrace instrumentations.

The patch queues for these experiments are still rather short. Find all I-pipe patches on top of my 2.6.34-x86 queue at

    git://git.kiszka.org/ipipe-2.6 queues/2.6.35-x86-trace

The patches are about allowing preempt_disable/enable in any context and about hardening a few core paths of ftrace and perf:

    ipipe: Harden ftrace function and function-graph tracer
    ipipe: Harden clocks used by ftrace and perf
    ipipe: Remove context check from add/sub_preempt_count
    ipipe: Add root domain guard to preempt_schedule

Then there is a short queue for Xenomai, demonstrating how the instrumentation can be converted:

    git://git.xenomai.org/xenomai-jki.git queues/ftrace

There are surely some traps & pitfalls remaining, even on x86-64. One is, e.g., that enabling/disabling tracepoints can impose high latencies on running real-time load (tons of ipipe_critical_enter calls...). Another is that in-kernel threads are of course still not recognized by the tracer (no own current, no pid). But once we have gotten rid of the Xenomai-created in-kernel threads, these tools should work fine across all architectures and deliver consistent states as well. Still, if used carefully, they can already be very helpful for Xenomai-based development.

One todo on my list is looking for a way to capture and visualize the pipeline and hard-irq states of all domains.
It might be challenging, as I want to avoid forking the trace-cmd tools just for this purpose.

If anyone wants to give this a try, please drop me a note on your experiences.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

_______________________________________________
Adeos-main mailing list
Adeos-main@gna.org
https://mail.gna.org/listinfo/adeos-main