I have added a couple of patches to the OFED stack, as described in bug#160, bug#172, and bug#159, and with them I have successfully tested the APM functionality, except for one issue.

Configuration:
  2 nodes
  CPU: AMD Opteron(tm) Processor 252, dual processor
  CA type: MT25208
  Firmware version: 5.1.4
  OS: CentOS release 4.2
  IB: OFED 1.0
  2 Flextronics 24-port switches
  Node1 Port1 connected to Switch1
  Node1 Port2 connected to Switch2
  Node2 Port1 connected to Switch1
  Node2 Port2 connected to Switch2
  Node1: active side of the RC QP
  Node2: passive side of the RC QP

Test1: Failover simulation on Node1
1. Simulate a port1 failure; the RC QP migrates the path to port2.
2. Simulate port1 coming back UP to rearm the alternate path through port1.
3. Simulate a port2 failure; the RC QP migrates the path to port1.
4. Simulate port2 coming back UP to rearm the alternate path through port2.

Test2: Real failover by manually pulling the cable
1. Simulate failover/failback by pulling the cable of Node1 port1.
2. Simulate failover/failback by pulling the cable of Node1 port2.
3. Simulate failover/failback by pulling the cable of Node2 port1.
4. Simulate failover/failback by pulling the cable of Node2 port2.

ISSUE: If I pull both cables, there are no paths to the destination, so the RC QP connection is supposed to be torn down. But this does not happen.
1. Create an RC QP and load both the primary and the alternate path (I set rnr_retry_count = 6, retry_count = 6, and the packet_life_time field of struct ib_sa_path_rec to 15; I also tried 12).
2. Send some traffic over the RC QP.
3. Disconnect the cable belonging to the primary path.
4. It smoothly fails over to the alternate path, which becomes the primary path. There is no effect on the traffic on that RC QP.
5. Remove the second cable, belonging to the new primary path.
6. Obviously traffic stops, since there are no paths to the destination. But for the outstanding WRs in the RC QP I don't get any callback from the verbs layer reporting whether they succeeded or failed with an error such as IB_WC_RETRY_EXC_ERR. When I query the RC QP properties, it still shows that it is in the IB_QPS_RTS state.

Without APM functionality it behaves correctly:
1. Create an RC QP and load only the primary path (again with rnr_retry_count = 6, retry_count = 6, and the packet_life_time field of struct ib_sa_path_rec set to 15; I also tried 12).
2. Send some traffic over the RC QP.
3. Disconnect the cable belonging to the primary path.
4. Obviously traffic stops, since there are no paths to the destination. For the outstanding WRs in the RC QP I do get a callback from the verbs layer: the first WR fails with IB_WC_RETRY_EXC_ERR and all other WRs complete with IB_WC_WR_FLUSH_ERR. I then close this RC QP.

VBabu

Date: Mon, 16 Oct 2006 14:03:50 -0700
From: "Sean Hefty" <[EMAIL PROTECTED]>
Subject: Re: [openib-general] APM support in openib stack
To: [EMAIL PROTECTED]
Cc: openib-general@openib.org

somenath wrote:
>> Doesn't ib_cm_init_qp_attr() set this for you?
>
> No, it doesn't. It returns
> attr_mask= 0x12d181
> port=0x0 alt_port=0x0

Okay - there was a fix to the cm.c file (svn rev 8267) that added setting the alternate port number when initializing the QP attributes. Apparently that fix did not make it into the release that you're using.

- Sean
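The attr_mask value somenath quotes can be decoded bit by bit. The sketch below assumes the bit values of enum ib_qp_attr_mask from the kernel's include/rdma/ib_verbs.h of that era and copies them into a local table so it compiles on its own; the table and the helper names decode_qp_attr_mask and alt_path_missing_port are hypothetical, not part of the verbs API. Under that assumption, 0x12d181 is the usual set of INIT-to-RTR attributes plus IB_QP_ALT_PATH. In other words, the CM asked for an alternate path to be loaded while leaving the alternate port at 0, which is consistent with the missing cm.c fix Sean describes.

```c
#include <stdio.h>
#include <stddef.h>

/* ASSUMPTION: bit values mirror enum ib_qp_attr_mask in the kernel's
 * include/rdma/ib_verbs.h (IB_QP_STATE = 1, IB_QP_CUR_STATE = 1 << 1, ...).
 * Check them against the headers of the stack you actually run. */
struct qp_attr_bit {
    unsigned int bit;
    const char *name;
};

static const struct qp_attr_bit qp_attr_bits[] = {
    { 1u << 0,  "IB_QP_STATE" },
    { 1u << 1,  "IB_QP_CUR_STATE" },
    { 1u << 2,  "IB_QP_EN_SQD_ASYNC_NOTIFY" },
    { 1u << 3,  "IB_QP_ACCESS_FLAGS" },
    { 1u << 4,  "IB_QP_PKEY_INDEX" },
    { 1u << 5,  "IB_QP_PORT" },
    { 1u << 6,  "IB_QP_QKEY" },
    { 1u << 7,  "IB_QP_AV" },
    { 1u << 8,  "IB_QP_PATH_MTU" },
    { 1u << 9,  "IB_QP_TIMEOUT" },
    { 1u << 10, "IB_QP_RETRY_CNT" },
    { 1u << 11, "IB_QP_RNR_RETRY" },
    { 1u << 12, "IB_QP_RQ_PSN" },
    { 1u << 13, "IB_QP_MAX_QP_RD_ATOMIC" },
    { 1u << 14, "IB_QP_ALT_PATH" },
    { 1u << 15, "IB_QP_MIN_RNR_TIMER" },
    { 1u << 16, "IB_QP_SQ_PSN" },
    { 1u << 17, "IB_QP_MAX_DEST_RD_ATOMIC" },
    { 1u << 18, "IB_QP_PATH_MIG_STATE" },
    { 1u << 19, "IB_QP_CAP" },
    { 1u << 20, "IB_QP_DEST_QPN" },
};

#define N_QP_ATTR_BITS (sizeof(qp_attr_bits) / sizeof(qp_attr_bits[0]))
#define QP_ATTR_ALT_PATH (1u << 14) /* assumed value of IB_QP_ALT_PATH */

/* Hypothetical helper: print each known flag set in attr_mask to `out`
 * (skip printing if out is NULL) and return how many were set. */
static size_t decode_qp_attr_mask(unsigned int attr_mask, FILE *out)
{
    size_t count = 0;

    for (size_t i = 0; i < N_QP_ATTR_BITS; i++) {
        if (attr_mask & qp_attr_bits[i].bit) {
            if (out)
                fprintf(out, "  %s\n", qp_attr_bits[i].name);
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: return 1 when the mask asks for an alternate path
 * but the alternate port number is 0 (HCA ports are numbered from 1). */
static int alt_path_missing_port(unsigned int attr_mask,
                                 unsigned int alt_port_num)
{
    return (attr_mask & QP_ATTR_ALT_PATH) && alt_port_num == 0;
}
```

Calling decode_qp_attr_mask(0x12d181, stdout) from a small test program lists eight flags: IB_QP_STATE, IB_QP_AV, IB_QP_PATH_MTU, IB_QP_RQ_PSN, IB_QP_ALT_PATH, IB_QP_MIN_RNR_TIMER, IB_QP_MAX_DEST_RD_ATOMIC and IB_QP_DEST_QPN. alt_path_missing_port(0x12d181, 0) returns 1, flagging exactly the combination reported above.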