I've got good news: I was able to get opensm to take control. I gave it a priority of 15 and rebooted the 7000D. Unfortunately, I'm not sure I can leave it like this forever. The only host I had with opensm installed is the test front end for an OS upgrade I'm working on; we're moving from Rocks 4.3 to Rocks 5.3 (RHEL 4.5 to RHEL 5.4). I may need to reboot this node from time to time over the next couple of weeks, but at least I'm working right now.

You say that a 288-node system will work "out of the box". What happens when you hit 289? Is that a magic number or just an estimate? We have 268 compute nodes plus a few auxiliary nodes, so we're pretty close to that number.
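
For what it's worth, here is a rough sketch of why a priority of 15 lets opensm take over. It assumes the usual InfiniBand election rule as I understand it: the highest priority wins, and the lowest port GUID breaks ties. The GUIDs and the priority shown for the 7000D are made-up placeholders, not values read from our fabric, and real handover also depends on the SM state machine, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str
    priority: int  # 0..15, higher value wins the election
    guid: int      # port GUID, lower value wins when priorities tie


def elect_master(candidates):
    """Pick the master SM: highest priority first, then lowest GUID."""
    if not candidates:
        raise ValueError("no subnet managers on the fabric")
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


# Hypothetical values for illustration only.
sms = [
    SubnetManager("7000D embedded SM", priority=0, guid=0x0005AD0000000001),
    SubnetManager("opensm on test front end", priority=15, guid=0x0002C90300000002),
]

master = elect_master(sms)
print(master.name)
```

With opensm itself, the priority can be given on the command line (`-p` / `--priority`) or as `sm_priority` in opensm.conf; 15 is the highest value, so no other SM can outrank it, although one that is also set to 15 would tie and the lower GUID would win.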

Thanks,
Mike

On Mar 24, 2010, at 12:25 PM, Ira Weiny wrote:

> On Wed, 24 Mar 2010 11:34:02 -0600
> Michael Robbert <mrobb...@mines.edu> wrote:
>
>> Interesting note! The 7024 is our large switch where all the hosts are
>> connected, but I was told that we were sold the 7000D because the 7024
>> didn't have a subnet manager. Unfortunately the 7000D has a different CLI
>> and that command is not available and I don't have the password for our 7024
>> so I can't log onto it.
>>
>> On another note I just noticed the uptime on the 7000D is just over 1 day so
>> that must have been the start of the problem, but I have no idea why it
>> rebooted nor why it didn't come up working. I'm pretty sure we tested a
>> reboot of the device during acceptance testing.
>>
>> Oh, I just got your second note:
>> ==================================
>> BTW, I highly recommend running the opensm on a server instead of using the
>> sm on the switch. We found running the sm on the switch was much less
>> reliable. I also recommend using a server dedicated to opensm only.
>> ==================================
>
> I will second this. OpenSM has come a long way since the time Cisco was
> selling IB switches. If I understand your situation you don't even need the
> 7000D you could just remove it and run OpenSM on a "management" node. If you
> can afford it adding a node for OpenSM would be nice but I am not sure you
> _need_ it.
>
> OpenSM is now managing many of the largest IB networks out there, on a 288
> node system it will have no problems at all "out of the box".
>
> :D
>
> Ira
>
>> I will take that into consideration, but we bought this as a "turn-key"
>> solution from Dell. They designed it and we had no experience with IB so we
>> trusted their knowledge.
>
> <snip>