Hi everyone,

We have a small cluster of six identical 48-core nodes for astrophysical research, and we are struggling to get Open MPI to run efficiently on it. The head node runs Ubuntu and Open MPI 1.6.5 from a local disk. All worker nodes boot from an NFS-exported root that lives on a NAS, also with Ubuntu and Open MPI 1.6.5. All nodes have Gbit Ethernet, and the NAS is connected to the switch through 4 NICs. The motherboards are Supermicro H8QG6 and the processors are 2.6 GHz AMD Opteron 6344s.
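In case it matters, this is roughly how I launch jobs; the hostnames in the hostfile and the binary name are just placeholders, not our real ones:

  $ cat hostfile
  node01 slots=48
  node02 slots=48
  $ mpirun --hostfile hostfile -np 48 ./mpi_app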
When we run an Open MPI job on the head node, everything works as expected. But when we run it on any of the worker nodes, the execution takes roughly 20 times longer or more, and htop shows that all processes spend the vast majority of their time in kernel cycles (the red bars).

I have been trying to learn about profilers, MCA tuning and the like, but it seems to me that a 20-fold performance hit points to a much more serious problem. For example, it might have to do with a buggy BIOS that does not report the L3 cache correctly and throws the hwloc warnings I reported here in the past. I have flashed the BIOS to the latest version, we are running the latest kernel, and I have tried newer, manually compiled hwloc and Open MPI, all to no avail.

I am at my wits' end about what to try next, and I would thoroughly appreciate any help and guidance. Our cluster is idling until I resolve this, and quite a few people are tapping on my shoulder impatiently. And yes, I am an astronomer, not a sysadmin, so please excuse my ignorance.

Thanks a bunch,
Andrej
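P.S. If it helps with the diagnosis, I am happy to run checks along these lines on a worker node and post the output here (the binary name is again just a placeholder):

  # compare the topology hwloc sees on a worker node against the head node
  lstopo
  # bind ranks to cores and report where each rank lands
  mpirun -np 48 --bind-to-core --report-bindings ./mpi_app
  # watch user vs. system CPU time (us/sy columns) while a job is running
  vmstat 1
  # attach to one rank for a few seconds and count which system calls dominate
  strace -c -p <pid of one rank>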
