On 10/07/2011 08:07 PM, Robin Gareus wrote:
On 10/08/2011 03:25 AM, Michael Ost wrote:
Hi list,

We are seeing unexpected interruptions of SCHED_RR audio processing
threads, and are struggling to understand why they are happening. Does
anyone have any good tips or tools to suggest to help figure out what is
preempting or delaying realtime audio threads?

https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html#builtintools

It's a bit dated but so is your 2.6.24.3 kernel. You will need to
compile the kernel with CONFIG_PREEMPT_TRACER=y CONFIG_SCHED_TRACER=y
under "Kernel hacking -> Tracers -> .." to get access to
/sys/kernel/debug/tracing/

The man page of cyclictest (8) includes some hints: see the '-b' option
on how to produce traces from the scheduler. Also check out
..linux-rt-source/Documentation/trace/histograms.txt
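
As a rough illustration of the first step, the sketch below lists which
tracers a kernel exposes under /sys/kernel/debug/tracing. It assumes a
kernel that provides the ftrace "available_tracers" file (newer than the
2.6.24 discussed here), that debugfs is mounted, and that the script is
allowed to read it (usually root). The helper names are made up for this
example.

```python
from pathlib import Path

# Assumption: debugfs is mounted at the usual location.
TRACING_DIR = Path("/sys/kernel/debug/tracing")

# Tracers of interest for scheduling latency; which ones appear depends on
# the kernel configuration (e.g. CONFIG_SCHED_TRACER, CONFIG_PREEMPT_TRACER).
WANTED = ("wakeup_rt", "wakeup", "preemptoff")


def parse_tracers(text):
    """Split the whitespace-separated contents of available_tracers."""
    return text.split()


def available_tracers(tracing_dir=TRACING_DIR):
    """Return the list of tracers, or None if the file cannot be read."""
    try:
        return parse_tracers((Path(tracing_dir) / "available_tracers").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    tracers = available_tracers()
    if tracers is None:
        print("tracing directory not readable (debugfs not mounted, or not root?)")
    else:
        for name in WANTED:
            print(f"{name}: {'available' if name in tracers else 'missing'}")
```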

Excellent!

The issues are coming up with Receptor [see (*) below for an intro]
running its 2.6.24.3 CCRMA based kernel. The bug appears with Receptor
in its "dual core mode", with three instances of Native Instruments'
Kontakt 4 in DFD + multi-core mode. Either lots of held notes, or large
patches are needed to get the spikes.

With these settings there are 5 SCHED_RR threads processing audio (on a
two-core system): 2 from Receptor and 1 from each instance of Kontakt.
These Kontakt "helper threads" are released to do work whenever possible
while the audio thread is processing.

Kontakt/DFD is using mmap to bring its files into memory. This is done
in a lower priority "DFD thread", and the mapped memory is used by the
r/t audio threads.

DFD is important because the problems don't happen without it. And the
high SCHED_RR thread count is important, because the problems also don't
happen if we reduce the count.

Is the DFD thread calling mlock() on the mapped memory?
Maybe madvise/mlock fail because you're trying to lock more pages than
RLIMIT_MEMLOCK permits? Just a guess.

Nope, I looked into that. Nothing quite that straightforward, unfortunately.
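
For reference, a minimal sketch of how such a check could look, assuming
Python 3 on Linux. It reads the limit of the calling process only, so it
would have to run in the same environment as the process doing the
locking; the function names and the 512 MiB figure are hypothetical.

```python
import resource


def memlock_limit():
    """Return the soft RLIMIT_MEMLOCK in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


def fits_in_memlock(bytes_needed):
    """True if locking bytes_needed would stay within the soft limit."""
    limit = memlock_limit()
    return limit is None or bytes_needed <= limit


if __name__ == "__main__":
    sample_size = 512 * 1024 * 1024  # hypothetical: 512 MiB of mapped samples
    print("soft RLIMIT_MEMLOCK:", memlock_limit())
    print("512 MiB fits:", fits_in_memlock(sample_size))
```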

Documentation/trace/mmiotrace.txt should help to find out if a process
blocks due to memory-mapped I/O. The debug interface is nifty; the
time-consuming part is compiling a kernel with CONFIG_MMIOTRACE=y :)

Again, thanks for the tip. Excellent.

When the spike happens there are no:
* wine bottlenecks
* system calls
* threads blocking on each other
* page faults during audio processing

There do appear to be "involuntary context switches" (as reported by
getrusage) when the spikes happen. This makes it seem like the scheduler
is interrupting our threads. But how do you figure out why that is
happening?
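
The getrusage counters mentioned above can be sampled around a block of
work roughly as follows. This is only a sketch, assuming Python 3 on
Linux; the real measurement would sit inside the audio thread itself, and
the lambda is a hypothetical stand-in for one processing callback.

```python
import resource

# RUSAGE_THREAD is Linux-specific; fall back to whole-process counters.
WHO = getattr(resource, "RUSAGE_THREAD", resource.RUSAGE_SELF)


def usage_delta(work):
    """Run work() and return how the scheduler/page-fault counters changed."""
    before = resource.getrusage(WHO)
    work()
    after = resource.getrusage(WHO)
    return {
        "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
        "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
        "minor_faults": after.ru_minflt - before.ru_minflt,
        "major_faults": after.ru_majflt - before.ru_majflt,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for one audio processing callback.
    print(usage_delta(lambda: sum(i * i for i in range(200_000))))
```

A non-zero involuntary count with zero faults and zero voluntary switches
would match the pattern described above.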
        
There aren't many threads in the system with higher priority. All of the
5 processing threads are SCHED_RR/76. The higher priority threads in the
system are:
* migration/0 - FIFO/99
* migration/1 - FIFO/99
* watchdog/0 - FIFO/99
* watchdog/1 - FIFO/99
* posix_cpu_timer (x2) - FIFO/99
* IRQ8 (rtc) - FIFO/99
* IRQ20 (our audio card) - FIFO/77
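
A list like the one above can be gathered from /proc. The sketch below
assumes Python 3 on Linux and relies on the rt_priority and policy fields
(fields 40 and 41) of /proc/<pid>/task/<tid>/stat; the function names are
made up for this example.

```python
import os

POLICY_NAMES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR",
                3: "SCHED_BATCH", 5: "SCHED_IDLE", 6: "SCHED_DEADLINE"}


def parse_sched_fields(stat_line):
    """Extract (comm, rt_priority, policy name) from a /proc stat line."""
    # comm may itself contain spaces or parentheses, so split on the last ')'.
    head, _, tail = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    rest = tail.split()          # rest[0] is field 3 (state)
    rt_priority = int(rest[37])  # field 40
    policy = int(rest[38])       # field 41
    return comm, rt_priority, POLICY_NAMES.get(policy, f"policy {policy}")


def list_threads(pid="self"):
    """Yield (tid, comm, rt_priority, policy) for each thread of pid."""
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                yield (int(tid), *parse_sched_fields(f.read()))
        except OSError:
            continue  # thread exited while we were looking


if __name__ == "__main__":
    if os.path.isdir("/proc/self/task"):
        for row in list_threads():
            print(row)
```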

Could other kernel activity interrupt the audio threads? Are there
issues with memory mapping that can block other, unrelated threads? Are
there just too danged many SCHED_RR threads fighting for two cores?
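
One thing that may be worth ruling out for the last question: threads at
the same SCHED_RR priority share a CPU in time slices, and getrusage
counts a switch caused by an expired time slice as involuntary. With 5
threads at priority 76 on two cores, that could account for some of the
involuntary switches. A small sketch for reading the quantum the kernel
reports, assuming Python 3 on Linux (the helper name is made up):

```python
import os


def rr_timeslice_ms(pid=0):
    """Return the round-robin quantum the kernel reports for pid, in ms.

    pid 0 means the calling process. The value is only meaningful for
    processes running under SCHED_RR. Returns None where the call is
    unavailable or fails.
    """
    try:
        return os.sched_rr_get_interval(pid) * 1000.0
    except (AttributeError, OSError):
        return None


if __name__ == "__main__":
    print("reported time slice (ms):", rr_timeslice_ms())
```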

Anyone have any suggestions for how to trace the scheduler, and thread
or process interruptions?

Apologies for the lengthy post, but this is a tricky subject.

Thanks for any tips or insights,

It would not surprise me if that is one of the many issues that got
fixed since 2.6.24. That kernel still featured the BKL [BigKernelLock]
and the problem you are describing is not too far out.

3.0.6-rt17 requires a bit more work, but 2.6.39 is very stable.

We have tried 2.6.33 with the same results. I'll see if we can try something even newer.

Great info. Thanks a lot!

Michael Ost
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev
