On Tue, 23 Oct 2007, Josh Carroll wrote:

Hello,

I posted this to the stable mailing list, as I thought it was
pertinent there, but I think it will get better attention here. So I
apologize in advance for cross-posting if this is a faux pas. :)

Anyway, in summary, ULE is about 5-6% slower than 4BSD for two
workloads that I am sensitive to: building world with -j X, and ffmpeg
-threads X. Other benchmarks seem to indicate relatively equal
performance between the two. MySQL, on the other hand, is
significantly faster in ULE.

I'm trying to understand why ffmpeg and buildworld are slower in ULE
than 4BSD, since it seems to me that ULE was supposed to be the better
scaling scheduler.

Here is a link to the original thread on the stable mailing list:

http://lists.freebsd.org/pipermail/freebsd-stable/2007-October/037379.html

Remy replied with some interesting results for building world between
the two schedulers on an 8-way system. Judging from Remy's data, ULE
seems to suffer as more threads/processes are thrown at it.

Does anyone have any additional performance tests I can run that might
help indicate where the deficiency is in the ULE scheduler? MySQL
performance is excellent, so I'm wondering if it was tuned to that
particular workload?

I'm not sure if Remy subscribes to this list, so I am CC'ing him. Hope
you don't mind, Remy :)

Josh,

Thanks for your emails. First, as gnn mentioned, I'm without most of my things at the moment. I have some patches which might improve your workload but I need to test and tune them more myself before I give them out. I doubt any of the sysctls other than steal_thresh and balance_ticks will help your situation.

ULE is tuned for workloads that benefit from improved affinity, not for MySQL in particular. Tests with other affinity-sensitive workloads have verified that the tuning isn't really MySQL-specific. The problem with buildworld for ULE is that 4BSD gets basically perfect affinity and perfect balancing because the compiler runs, typically uninterrupted, until it does a blocking disk transaction. At that point it doesn't matter which CPU it is resumed on.

Your tests with ffmpeg threads vs processes are probably triggering more context switches in the threads case, due to lock contention in the kernel. This is also likely the problem with some super-smack tests. On each context switch 4BSD has an opportunity to perfectly balance the CPUs. ULE does not, because doing so is too costly and hinders other workloads.
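
A toy illustration of the balancing point above, as a minimal sketch rather than code from either scheduler: the first function models a single shared run queue, where whichever CPU goes idle first picks up the next task, and the second models per-CPU queues with round-robin placement and no later migration. The function names, task durations, and CPU count are made up for illustration, and the per-CPU model deliberately leaves out the stealing and periodic balancing that ULE does perform.

```python
import heapq


def global_queue_makespan(durations, ncpu):
    """Shared run queue: the CPU that frees up first takes the next task."""
    free_at = [0] * ncpu  # time at which each CPU next becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest-idle CPU
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)


def per_cpu_queue_makespan(durations, ncpu):
    """Per-CPU run queues: round-robin placement, no migration afterwards."""
    load = [0] * ncpu
    for i, d in enumerate(durations):
        load[i % ncpu] += d
    return max(load)


# Hypothetical workload: a few long tasks mixed with short ones, 2 CPUs.
durations = [8, 1, 1, 1, 8, 1, 1, 1]
print("shared queue :", global_queue_makespan(durations, 2))
print("per-CPU queue:", per_cpu_queue_makespan(durations, 2))
```

With this made-up input the shared queue finishes all work at time 11 (the ideal for 22 units on 2 CPUs), while the unbalanced per-CPU placement finishes at 18. The gap is the cost a per-CPU design has to win back through affinity benefits or through explicit balancing.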

I don't doubt that we can improve things further. It will just have to wait for another few weeks before I'm able to do much about it.

Thanks,
Jeff


Regards,
Josh
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
