Hi Eric,

Thanks for the response. OpenSM is running and set to start on bootup on MachineB:

ps aux | grep open
root 5616 0.0 0.1 142004 1396 ? Sl 13:39 0:00 /usr/sbin/opensm -t 200 -f /var/log/opensm.log -g 0

The log on Machine B just logs this every 10 seconds:

Nov 25 14:34:21 148541 [477A7940] 0x01 -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING
Nov 25 14:34:31 153173 [477A7940] 0x80 -> SM port is down

Ibstat confirms the port is in polling state on MachineB. MachineA, however, is in a bad state. I tried the openibd restart command; it accepted the command, but after 5 minutes it shows no sign of progress and is just sitting at the cursor. Is some sort of forced restart of openibd possible?

Thanks,

Rob

-----Original Message-----
From: Baur, Eric [mailto:[EMAIL PROTECTED]
Sent: 25 November 2008 14:31
To: Robert Dunkley
Subject: RE: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource Temporarily unavailable"

Robert-

Is OpenSM set to start on boot?

chkconfig --list | grep opensmd

If not:

chkconfig opensmd on

and:

/etc/init.d/opensmd start

You can also restart openib without rebooting the machines.

/etc/init.d/openibd restart

-Eric

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Dunkley
Sent: Tuesday, November 25, 2008 9:21 AM
To: [email protected]
Subject: [ofa-general] Mellanox Gen3,Linux and ibpanic - "Resource Temporarily unavailable"

Hi everyone,

I'm using a setup of two machines (let's call them A and B) directly connected by one cable. Each machine has a Mellanox MT25204 (Gen3 Mellanox PCI-E InfiniBand card) and uses IPoIB. They run CentOS 5.2 with OFED 1.3 installed, and Machine B runs OpenSM. All was working fine.

I shut down Machine A, did some maintenance and then powered it on again; everything was OK again. I then shut down Machine B (the one running OpenSM), and this seemed to really upset Machine A. After booting Machine B again, Machine B looks OK, with the port down and in polling state.
Machine A however gives the following error if I run ibstat:

ibpanic: [11406] main: stat of IB device 'mthca0' failed: (Resource temporarily unavailable)

I don't want to reboot Machine A as it must synch data with Machine B over the Infiniband link first. Does anyone have any idea how to fix machine A?

Thanks,

Rob

The SAQ Group
Registered Office: 18 Chapel Street, Petersfield, Hampshire GU32 3DZ
SEMTEC Limited Trading as SAQ is Registered in England & Wales
Company Number: 06481952
http://www.saqnet.co.uk AS29219

SAQ Group Delivers high quality, honestly priced communication and I.T. services to UK Business.
DSL : Domains : Email : Hosting : CoLo : Servers : Racks : Transit : Backups : Managed Networks : Remote Support.
Find us in http://www.thebestof.co.uk/petersfield

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
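
For anyone reading this thread later: the two symptoms discussed above (a port stuck in polling state, and OpenSM repeatedly logging "SM port is down") can be checked from a script. What follows is only a rough sketch, not part of OFED. The helper names port_phys_state and sm_port_down_count are made up for illustration, and the parsing assumes that ibstat prints a "Physical state:" line per port and that the OpenSM log contains the literal text "SM port is down", as in the log excerpt quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not OFED tools). Both read text on stdin,
# so they can be tried on saved output without touching the HCA.

# Read `ibstat` output on stdin and print the first port's physical
# state, e.g. "Polling". Assumes a line of the form
# "Physical state: <value>" (an assumption about ibstat's layout).
port_phys_state() {
    sed -n 's/^[[:space:]]*Physical state:[[:space:]]*//p' | head -n 1
}

# Read an OpenSM log on stdin and print how many lines contain
# the text "SM port is down".
sm_port_down_count() {
    grep -c 'SM port is down'
}

# Example use on a machine where ibstat still responds:
#   ibstat | port_phys_state
#   sm_port_down_count < /var/log/opensm.log
```

Because the helpers only read stdin, they will not hang if the driver itself is wedged; only the ibstat call that feeds them can. The log path in the usage comment is the one passed to opensm with -f in the ps output earlier in the thread.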
