On Tuesday 18 October 2011 11:04:36 Urmas Lett wrote:
> Hello.
>
> Why is ffmpeg -threads massively slower with ULE than 4BSD?
>
> ffmpeg preset veryfast with sched_bsd:
> real 1m49.407s
> user 6m53.932s
> sys 0m1.700s
>
> ffmpeg preset veryfast with sched_ule:
> real 2m52.711s
> user 6m50.310s
> sys 0m1.582s
>
> #uname -a
> FreeBSD 9.0-RC1 FreeBSD 9.0-RC1 #0: Mon Oct 17 20:32:29 EEST
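A quick check on the timings above: time(1) reports user and sys as CPU time summed over all threads, while real is wall-clock time, so (user + sys) / real gives the average number of CPUs kept busy during the run. The Python sketch below just does that arithmetic; the helper names are illustrative, and the machine's core count is not stated in the report, so it says nothing about how many CPUs were available.

```python
def parse_time(s):
    """Convert a time(1)-style duration such as '6m53.932s' to seconds."""
    minutes, seconds = s.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

def avg_busy_cpus(real, user, sys):
    """Average number of CPUs kept busy: total CPU time / wall-clock time."""
    return (parse_time(user) + parse_time(sys)) / parse_time(real)

# Figures copied from the quoted report.
bsd = avg_busy_cpus("1m49.407s", "6m53.932s", "0m1.700s")
ule = avg_busy_cpus("2m52.711s", "6m50.310s", "0m1.582s")
print(f"4BSD: {bsd:.2f} CPUs busy on average, ULE: {ule:.2f}")
```

The CPU time is nearly identical for both schedulers, so the whole difference in real time shows up as fewer CPUs busy on average under ULE (roughly 3.8 versus 2.4).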
Since no one has offered any insight into the cause (yet), I'll explain what I think is happening here.

SCHED_ULE tries to make threads "sticky" to a particular CPU. This benefits cache utilization and, on NUMA systems, memory locality. It works great when all running threads do useful work all the time. The downside is that not all threads will necessarily receive equal amounts of CPU time.

SCHED_4BSD, on the other hand, much more aggressively reschedules running threads onto other CPUs, with no regard for cache locality. This gives all threads a roughly equal share of CPU time.

I was unable to (quickly) find out the exact implementation details of multithreaded ffmpeg, but I guess it simply splits each frame into N equal parts and uses N threads to encode those parts. The master process then probably recombines them into the final frame. Because almost all codecs also compress along the third dimension (time), it must wait for the current frame to finish before it can start encoding the next one. Thus we end up with a workload like this:

  split1 -> N x encode1 -> recombine1 -> split2 -> N x encode2 -> recombine2 -> ...

This bursty behavior is a poor match for ULE, because ffmpeg's master process assumes all threads have equal runtime: each frame is only finished when the slowest thread is done. ULE has no time to properly load-balance the threads before they exit (or stop doing work).

You can see clearly in your timings that SCHED_ULE is actually just as fast when you look at the amount of CPU time spent (user 6m50s versus 6m54s). However, because it is not nearly as aggressive as SCHED_4BSD in stealing threads from busy CPUs, some CPUs sit idle part of the time, and that idle time accounts for the difference in wall-clock time.

There are some tunable sysctls (kern.sched.*) that might help in this scenario. I bet that if you ran two ffmpeg processes in parallel, you'd get about the same runtimes with both schedulers.
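The per-frame split / N x encode / recombine pattern can be illustrated with a toy fork-join model. This is a sketch under stated assumptions, not ffmpeg's code and not how either scheduler is implemented: every thread gets equal work per frame, the split and recombine steps are ignored, "sticky" placement is modeled as a hypothetical fixed thread-to-CPU mapping, and the "balanced" case is the ideal where work migrates freely (the standard preemptive-scheduling bound max(w, N*w/P)).

```python
from collections import Counter

# Toy fork-join model: each frame is N equal chunks with a barrier at the end.
# Hypothetical assumptions: equal work per thread, split/recombine cost ignored.

def frame_time_sticky(placement, work):
    """Frame wall time when thread i never leaves CPU placement[i].

    Threads sharing a CPU run one after another, so the frame ends when
    the most heavily loaded CPU finishes.
    """
    load = Counter()
    for cpu in placement:
        load[cpu] += work
    return max(load.values())

def frame_time_balanced(n_threads, n_cpus, work):
    """Best possible frame wall time if threads may migrate freely.

    One thread can use at most one CPU at a time, hence the max().
    """
    return max(work, n_threads * work / n_cpus)

def run(frames, n_cpus, placement, work=1.0):
    """Return (total CPU time, sticky wall time, balanced wall time)."""
    n_threads = len(placement)
    cpu_time = frames * n_threads * work
    sticky = frames * frame_time_sticky(placement, work)
    balanced = frames * frame_time_balanced(n_threads, n_cpus, work)
    return cpu_time, sticky, balanced

# 4 CPUs and 4 encoder threads, but two threads stuck on CPU 2 while CPU 3 idles.
cpu, sticky, balanced = run(frames=100, n_cpus=4, placement=[0, 1, 2, 2])
print(cpu, sticky, balanced)  # prints: 400.0 200.0 100.0
```

Total CPU time is the same in both cases; only the wall-clock time differs, because CPU 3 sits idle while CPU 2 works through two chunks per frame. That is the same shape as the report (user time about equal, real time much longer), though the real imbalance under ULE is presumably transient rather than fixed as in this toy placement.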
(Disclaimer: I have collected most of this information from the mailing lists, not the actual code, so I could be completely wrong.)

--
Pieter de Goeje
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscr...@freebsd.org"