This is turning into a bad month  :-(

I'm running BOINC clients on this box, and the kernel seems unable to 
schedule them properly. I'm subscribed to several projects, so I should 
have one on each CPU all the time, running at nice 19 and therefore mopping 
up all available CPU cycles. That's how it used to run. But nowadays the 
kernel scheduler insists on allocating both of them to the same CPU, thus 
limiting them to 50% load. Occasionally it will start up correctly, but 
only if I've started the BOINC client interactively rather than from a 
startup script, and even then it reverts to its bad behaviour after a 
while. So far I haven't been able to spot anything that might trigger this 
reversion, and the time before it happens appears to be random.
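A quick way to confirm the symptom directly, rather than inferring it from the 50% load figure, is to look at which CPU each client process last ran on. The following is a minimal sketch of a hypothetical helper (not part of BOINC). It assumes Linux `/proc` and Python 3.6+, and reads the `processor` field (field 39) of `/proc/<pid>/stat`. `ps -o pid,psr,comm` reports the same field.

```python
import sys


def last_cpu_from_stat(stat_line: str) -> int:
    """Return the 'processor' field (CPU the task last ran on) from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' and parse the remaining space-separated fields.
    _, rest = stat_line.rsplit(")", 1)
    fields = rest.split()
    # fields[0] is field 3 (state); 'processor' is field 39, i.e. index 39 - 3 = 36.
    return int(fields[36])


def last_cpu(pid: int) -> int:
    """Read /proc/<pid>/stat for a live process and return its last-used CPU."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())


if __name__ == "__main__":
    # Usage: python lastcpu.py PID [PID ...]
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(f"pid {pid}: last ran on CPU {last_cpu(pid)}")
```

Run it a few times against the PIDs of the running science applications. If both PIDs consistently report the same CPU number, that confirms the tasks are being stacked rather than merely reported oddly by a monitor.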

The box is a Supermicro H8DCE with 2 x Opteron 246 CPUs and 2 x 2GB RAM. 
This board divides the DIMM slots into two banks of four, one bank next to 
each CPU and associated with it. I've tried various kernels from 2.6.16-r13 
to 2.6.21-r1. I've tried unsetting all the clever-looking optimisations in 
the kernel, I've tried all three scheduling algorithms and I've tried 
resetting the BIOS to "optimised" defaults. I've even tried a genkernel 
kernel with the default config, but that version couldn't see the root 
disk /dev/sda for some reason, so of course it wouldn't boot.
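Because each CPU has its own bank of memory, the board is a NUMA machine, and it may be worth checking how the kernel has actually carved it up. The following is a minimal sketch under stated assumptions. It lists the NUMA nodes the kernel exposes under `/sys/devices/system/node` and the CPUs in each, and it assumes the per-node `cpulist` file is present, which may not be the case on every older kernel. `parse_cpulist` and `numa_nodes` are hypothetical helper names.

```python
import glob
import os


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel cpulist string such as '0-1,4' into [0, 1, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist (e.g. a memory-only node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPU list; returns {} if the directory is absent."""
    nodes = {}
    for path in sorted(glob.glob(os.path.join(sysfs, "node[0-9]*"))):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as f:
            nodes[node_id] = parse_cpulist(f.read())
    return nodes
```

This requires Python 3.9+ for the built-in generic type hints. On a two-socket Opteron with NUMA enabled in both the BIOS and the kernel, one would expect something like `{0: [0], 1: [1]}`. A single node holding both CPUs, or no node directory at all, would suggest that the kernel is treating the box as non-NUMA.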

It's also odd that CPU1 runs 5-6 °C hotter than CPU0, whether loaded or 
not.

Sometimes I suspect a problem with the APIC or perhaps the IOMMU, for which 
I have mostly default or conservative settings in the kernel. Does anyone 
here have some experience they could offer?

I've also been to the BOINC project sites and changed my preferences to the 
most conservative I can find, but still I can't get proper allocation of 
BOINC clients to processors. I've tried the forums and got some useful 
help, but not yet a solution.
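Until the root cause is found, a stop-gap some people use is to pin each science application to its own CPU by setting its affinity. This forces the placement but does not explain why the scheduler stacks both tasks. The following is a minimal sketch using Python's Linux-only `os.sched_setaffinity` (Python 3.3+). `round_robin` and `pin` are hypothetical helper names. It assumes you pass the PIDs of the science applications rather than the BOINC client itself, because affinity is inherited by child processes, so pinning the client would confine every task it starts to one CPU. `taskset -p -c CPU PID` does the same job from the shell.

```python
import os
import sys


def round_robin(pids, cpus):
    """Map each PID to a single CPU, cycling through the given CPUs in order."""
    cpus = sorted(cpus)
    return {pid: cpus[i % len(cpus)] for i, pid in enumerate(pids)}


def pin(pids):
    """Pin each PID to its own CPU (Linux only; needs permission over each process)."""
    # Only use CPUs this process is itself allowed to run on.
    plan = round_robin(pids, os.sched_getaffinity(0))
    for pid, cpu in plan.items():
        os.sched_setaffinity(pid, {cpu})
    return plan


if __name__ == "__main__":
    # Usage: python pin.py PID [PID ...]
    pids = [int(a) for a in sys.argv[1:]]
    if pids:
        for pid, cpu in pin(pids).items():
            print(f"pid {pid} -> CPU {cpu}")
```

New work units typically start as new processes, so the pinning would have to be reapplied whenever the PIDs change.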

This all started some time ago, around the time I had to replace the 
motherboard, but as I wasn't following it very closely then, I haven't 
been able to pinpoint what caused the change in kernel scheduling 
behaviour.

-- 
Rgds
Peter Humphrey
Linux Counter 5290, Aug 93