Hello,

Summary:
1. An observation/theory of a race condition in the VF/PF mailbox protocol leading to the VF showing link down
2. Belief that the race is triggered by the VF not waiting for PF replies to some messages
3. A patch for the above
I've been trying to get i350 SR-IOV VFs working in my Xen environment (XenServer 6.0 [Xen 4.1], igb 3.2.10, HVM CentOS 5.x guest with igbvf 1.1.5, Dell T3500). We've had a lot of experience and success with SR-IOV on both the 82599 and the 82576 ET2; this work was to extend our support to the i350.

The problem I observed was that around nine times out of ten the VF link showed as down, both after the initial module load and after an "ifconfig up". On the occasions the link did come up, the VF worked perfectly. Adding a few printks showed the link was being declared down because the VF/PF mailbox timeout had been hit (setting mbx->timeout to zero). Further instrumentation suggested that the timeout occurred while the VF was waiting for an ACK from the PF that never came, because the original VF-to-PF message was never received by the PF.

Based on limited logging, the following is the sequence I believe happened:

1. VF sends a message (e.g. 0x00010003) to the PF (get VFU lock; write buffer; release VFU lock/signal REQ); VF waits for ACK
2. PF igb_msg_task handler checks the VFICR and finds REQ set
3. PF igb_msg_task handler calls igb_rcv_msg_from_vf, which reads the message and ACKs it (get PFU lock; read buffer; release PFU lock/signal ACK)
4. PF deals with the message, e.g. by calling igb_set_vf_multicasts
5. VF receives the ACK
6. VF moves on to send its next message (e.g. 0x00000006) to the PF (get VFU lock; write buffer; release VFU lock/signal REQ); VF waits for ACK
7. PF igb_rcv_msg_from_vf sends a reply message (orig msg | E1000_VT_MSGTYPE_ACK | E1000_VT_MSGTYPE_CTS) from the PF to the VF (get PFU lock; clear REQ/ACK bits in VFICR ***clearing the REQ flag just set by the VF***; write buffer ***overwriting the VF's pending message***; release PFU lock/signal STS)
8. PF igb_msg_task handler runs, but the REQ flag is zero so the message is not handled
9. VF times out waiting for the ACK to its second message

From inspecting the code in both drivers, my understanding is that the VFU/PFU locking mechanism is only being used to protect the message buffer while it is being read or written; it does not protect against the buffer being reused by either driver before the existing message has been handled (the lock is released when setting REQ/STS, not on receipt of the ACK as the protocol description in the 82576 datasheet suggests). Adding a 5000 usec sleep after each message send from the VF makes the original link-down failure go away, giving some confidence to the race-condition theory.

I believe the race should not be a problem if the VF and PF exchange messages in the expected order; in the case above, however, the VF sent an E1000_VF_SET_MULTICAST message but did not wait for the reply message from the PF. Reviewing the driver code shows that, of the five messages the PF's igb_rcv_msg_from_vf replies to, the VF driver does not wait for replies to three (E1000_VF_SET_MULTICAST, E1000_VF_SET_LPE and E1000_VF_SET_VLAN). Patching the VF driver (see below) to perform dummy reads after sending each of these three messages makes the original link-down failure go away.

Have I misunderstood the locking strategy for the mailbox? As far as I can see, nothing has changed in the newer igb and igbvf drivers that would explain why I'm seeing the VF link failure on the i350 but not on the other NICs (I don't see this with the older 82576 in the same system with the same drivers). I can only assume it's just very bad luck with timing in this particular system and configuration.
Regards,
James Bulpin

Read (and ignore) replies to VF to PF messages currently unhandled

Signed-off-by: James Bulpin <james.bul...@eu.citrix.com>

diff -rup igbvf-1.1.5.pristine/src/e1000_vf.c igbvf-1.1.5.mboxreply/src/e1000_vf.c
--- igbvf-1.1.5.pristine/src/e1000_vf.c	2011-08-16 18:38:11.000000000 +0100
+++ igbvf-1.1.5.mboxreply/src/e1000_vf.c	2011-11-02 12:54:13.892369000 +0000
@@ -269,6 +269,7 @@ void e1000_update_mc_addr_list_vf(struct
 	u16 *hash_list = (u16 *)&msgbuf[1];
 	u32 hash_value;
 	u32 i;
+	s32 ret_val;
 
 	DEBUGFUNC("e1000_update_mc_addr_list_vf");
 
@@ -298,7 +299,10 @@ void e1000_update_mc_addr_list_vf(struct
 		mc_addr_list += ETH_ADDR_LEN;
 	}
 
-	mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+	ret_val = mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+	if (!ret_val)
+		mbx->ops.read_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+
 }
 
 /**
@@ -311,6 +315,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
 {
 	struct e1000_mbx_info *mbx = &hw->mbx;
 	u32 msgbuf[2];
+	s32 ret_val;
 
 	msgbuf[0] = E1000_VF_SET_VLAN;
 	msgbuf[1] = vid;
@@ -318,7 +323,9 @@ void e1000_vfta_set_vf(struct e1000_hw *
 	if (set)
 		msgbuf[0] |= E1000_VF_SET_VLAN_ADD;
 
-	mbx->ops.write_posted(hw, msgbuf, 2, 0);
+	ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
+	if (!ret_val)
+		mbx->ops.read_posted(hw, msgbuf, 2, 0);
 }
 
 /**
 *  e1000_rlpml_set_vf - Set the maximum receive packet length
@@ -329,11 +336,14 @@ void e1000_rlpml_set_vf(struct e1000_hw
 {
 	struct e1000_mbx_info *mbx = &hw->mbx;
 	u32 msgbuf[2];
+	s32 ret_val;
 
 	msgbuf[0] = E1000_VF_SET_LPE;
 	msgbuf[1] = max_size;
 
-	mbx->ops.write_posted(hw, msgbuf, 2, 0);
+	ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
+	if (!ret_val)
+		mbx->ops.read_posted(hw, msgbuf, 2, 0);
 }
 
 /**