Peter Humphrey <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Fri, 22 Jun 2007 10:30:26 +0100:
> This is what happens: when BOINC starts up it starts two processes,
> which it thinks are going to occupy up to 100% of each processor's time.
> But both gkrellm and top show both processes running at 50% on CPU1,
> always that one, with CPU0 idling. Then, if I start an emerge or
> something, that divides its time more-or-less equally between the two
> processors with the BOINC processes still confined to CPU1.
>
> Even more confusingly, sometimes top even disagrees with itself about
> the processor loadings, the heading lines showing one CPU loaded and the
> task lines showing the other.
>
> Just occasionally, BOINC will start its processes properly, each using
> 100% of a CPU, but after a while it reverts spontaneously to its usual
> behaviour. I can't find anything in any log to coincide with the
> reversion.

Was it you who posted about this before, or someone else? If it wasn't
you, take a look back thru the list a couple months, as it did come up
previously. You may have someone to compare notes with. =8^)

Separate processes or separate threads? Two CPUs (um, two separate
sockets) or two cores on the same CPU/socket?

Some or all of the following you likely already know, but hey, maybe
it'll help someone else and it never hurts to throw it in anyway...

The kernel task scheduler uses CPU affinity, which is supposed to have a
variable resistance to switching CPUs, and a preference for keeping a
task on the CPU controlling its memory, given a NUMA architecture where
there's local and remote memory, and a penalty to be paid for access to
remote memory. There are, however, differences in architecture between
AMD (with its onboard memory controller and closer cooperation, both
between cores and between CPUs on separate sockets, due to the direct
HyperTransport links) and Intel (with its off-chip controller and looser
inter-core, inter-chip, and inter-socket cooperation).
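To answer the sockets-vs-cores question without opening the case, you can
read the topology the kernel itself exports. A quick sketch (the sysfs
paths assume a reasonably recent 2.6 kernel):

```shell
# Distinct "physical id" values = populated sockets; "core id"
# distinguishes cores within a socket. Two logical CPUs sharing one
# physical id are cores (or HT siblings) on the same package.
grep -E '^(processor|physical id|core id)' /proc/cpuinfo

# Same information via sysfs, one line per logical CPU:
for c in /sys/devices/system/cpu/cpu[0-9]*; do
  echo "$c: package $(cat $c/topology/physical_package_id), core $(cat $c/topology/core_id)"
done
```

If both logical CPUs report the same package id, you have a single-socket
dual core; different package ids mean two sockets.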
There are also differences in the way you can configure both the memory
(thru the BIOS) and the kernel, for separate NUMA access or a unified
view of memory. If these settings don't match your actual physical
layout, efficiency will be less than peak: either there won't be enough
resistance to what is actually a relatively high-cost switch between
CPUs/cores and memory, so tasks will switch frequently with little
reason, incurring expensive delays each time, or there's too much
resistance and too much favor placed on what the kernel thinks is local
vs. remote memory, when it's all the same and there is in fact very
little cost to switching cores/CPUs.

Generally, if you have a single socket with a true dual core (Intel Core
Duo or any AMD dual core), you'll run a single memory controller with a
single unified view of memory, and the cost of switching cores will be
relatively low. You'll want to disable NUMA and configure your kernel
with a single scheduling domain.

If you have multiple sockets, or the early Intel pseudo-dual-cores,
which were really two separate CPUs simply packaged together with no
special cooperation between them, you'll probably want them in separate
scheduling domains. If it's AMD with its onboard memory controllers, two
sockets means two controllers, and you'll also want to consider NUMA,
tho you can disable it and interleave your memory if you wish, for a
unified memory view and higher bandwidth, at the tradeoff of higher
latency and less efficient memory access when separate tasks (each
running on a CPU) both want to use memory at the same time.

If you are lucky enough to have four cores, it gets more complex, as
current four-cores operate as two loosely cooperating pairs, with closer
cooperation between cores of the same pair. For highest efficiency
there, you'll have two levels of scheduling domain, mirroring the tight
local pair-partner cooperation and the rather looser cooperation between
pairs.
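You can check how many memory nodes the running kernel actually sees,
and compare that against what the BIOS interleave setting should
produce. A sketch (node directories only exist if the kernel exports
them; on a flat/interleaved setup you'll see just node0, or nothing):

```shell
# One node0 = flat (non-NUMA or fully interleaved) view of memory;
# node0 + node1 = the kernel is treating each controller separately.
ls -d /sys/devices/system/node/node* 2>/dev/null \
  || echo "no NUMA nodes exported"

# Per-node memory breakdown, where the directories exist:
head -n 4 /sys/devices/system/node/node*/meminfo 2>/dev/null
```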
In particular, you'll want to pay attention to the following kernel
config settings under Processor type and features:

1) Symmetric multi-processing support (CONFIG_SMP). You probably have
this set right or you'd not be using multiple CPUs/cores.

2) Under SMP, /possibly/ SMT scheduler support (CONFIG_SCHED_SMT), tho
that's for Intel only, and only on the older Hyperthreading Netburst
arch models.

3) Still under SMP, Multi-core scheduler support (CONFIG_SCHED_MC), if
you have true dual cores. Again, note that the first "dual core" Intel
units were simply two separate CPUs in the same package, so you probably
do NOT want this for them.

4) Non Uniform Memory Access (NUMA) Support (CONFIG_NUMA). You probably
do NOT want this on single-socket multi-cores, and on most Intel
systems. You probably DO want it on AMD multi-socket Opteron systems,
BUT note that there may be BIOS settings for this as well. It won't work
so efficiently if the BIOS setting doesn't agree with the kernel
setting.

5) If you have NUMA support enabled, you'll also want either Old style
AMD Opteron NUMA detection (CONFIG_K8_NUMA) or (preferred) ACPI NUMA
detection (CONFIG_X86_64_ACPI_NUMA).

6) Make sure you do *NOT* have NUMA emulation (CONFIG_NUMA_EMU) enabled.
As the help for that option says, it's only useful for debugging.

What I'm wondering, of course, is whether you have NUMA turned on when
you shouldn't, or don't have multi-core scheduling turned on when you
should, thus artificially increasing the resistance to switching
cores/CPUs and causing the stickiness.

Now for the process vs. thread stuff. With NUMA turned on, especially if
multi-core scheduling is turned off, threads of the same app, accessing
the same memory, will be more likely to be scheduled on the same
processor. I don't know of anything that will allow specifying a
processor per-thread, at least with the newer NPTL (Native POSIX Thread
Library) threading.
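For a single-socket true dual core like the one apparently in question,
the relevant fragment of .config would look something like this. A
sketch only, not a drop-in config: exact option availability and the
"is not set" lines depend on your kernel version and on which parent
options are enabled.

```
CONFIG_SMP=y
# Multi-core scheduler support: yes, for a true dual core
CONFIG_SCHED_MC=y
# SMT only matters on Hyperthreading Netburst chips:
# CONFIG_SCHED_SMT is not set
# Single socket, single memory controller -- no NUMA:
# CONFIG_NUMA is not set
```

On a dual-socket Opteron you'd flip CONFIG_NUMA on instead (plus one of
the NUMA detection options from item 5), and make sure the BIOS isn't
set to interleave memory behind the kernel's back.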
With the older LinuxThreads model, each thread showed up as a separate
process with its own PID, and could therefore be addressed separately by
the various scheduling tools.

If, however, you were correct when you said BOINC starts two separate
/processes/ (not threads), or if BOINC happens to use the older/heavier
LinuxThreads model (which again will cause the threads to show up as
separate processes), THEN you are in luck! =8^)

There are two scheduling utility packages that include utilities to tie
processes to one or more specific processors. sys-process/schedutils is
what I have installed. It's a collection of separate utilities,
including taskset, by which I can tell the kernel which CPUs I want
specific processes to run on. This worked well for me since I was more
interested in taskset than in the other included utilities, and only had
to learn the single simple command. It does what I need it to do, and
does it well. =8^)

If you prefer a single do-it-all scheduler tool, perhaps easier to learn
if you plan to fiddle with more than simply which CPU a process runs on,
and want to learn it all at once, sys-process/schedtool may be more your
style.

Hope that's of some help, even if part or all of it is review.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

-- 
[EMAIL PROTECTED] mailing list
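As a concrete sketch of the taskset approach (shown here against the
current shell so it's safe to try; the BOINC process-lookup line is
hypothetical -- substitute whatever ps/pidof actually reports on your
box):

```shell
# Show the current shell's allowed-CPU list:
taskset -pc $$

# Restrict the shell (and anything it subsequently spawns) to CPU 0:
taskset -pc 0 $$

# Confirm the new affinity took:
taskset -pc $$

# For BOINC you'd do the same per process, e.g. (hypothetical name):
#   taskset -pc 0 $(pidof setiathome_enhanced)
```

With one BOINC process pinned to CPU0 and the other to CPU1, gkrellm and
top should show both CPUs loaded regardless of which CPU the scheduler
would otherwise favor.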