Hello,

Summary:
 1. An observation/theory of a race condition in the VF/PF mailbox protocol 
leading to the VF showing link down
 2. Belief that the race is triggered by the VF not waiting for PF replies to 
some messages
 3. Patch for the above

I've been trying to get i350 SR-IOV VFs working in my Xen environment 
(XenServer 6.0 [Xen 4.1], igb 3.2.10, HVM CentOS 5.x guest with igbvf 1.1.5, 
Dell T3500). We've had a lot of experience and success with SR-IOV with both 
the 82599 and 82576 ET2; this work was to extend our support to include the 
i350. The problem I observed was that around 9 times out of 10 the VF link was 
showing as down, both after the initial module load and after an "ifconfig 
up". On the occasions the link did come up the VF worked perfectly. Adding a 
few printks showed the link was being declared down because the VF/PF mailbox 
timeout had been hit (setting mbx->timeout to zero). Further instrumentation 
suggested that the timeout occurred while the VF was waiting for an ACK from 
the PF that never came, because the original VF-to-PF message was never 
received by the PF. Based on limited logging, the following sequence is what I 
believe happened:

 1. VF sends message (e.g. 0x00010003) to PF (get VFU lock; write buffer; 
release VFU lock/signal REQ), VF waits for ACK
 2. PF igb_msg_task handler checks the VFICR and finds REQ set
 3. PF igb_msg_task handler calls igb_rcv_msg_from_vf which reads message and 
ACKs (get PFU lock; read buffer; release PFU lock/signal ACK)
 4. PF deals with message, e.g. calling igb_set_vf_multicasts
 5. VF receives ACK
 6. VF moves on to send next message (e.g. 0x00000006) to PF (get VFU lock; 
write buffer; release VFU lock/signal REQ), VF waits for ACK
 7. PF igb_rcv_msg_from_vf sends reply message (orig msg | 
E1000_VT_MSGTYPE_ACK|E1000_VT_MSGTYPE_CTS) from PF to VF (get PFU lock; clear 
REQ/ACK bits in VFICR ***clearing the REQ flag just set by VF***; write buffer 
***overwriting VF's pending message***; release PFU lock/signal STS)
 8. PF igb_msg_task handler runs but REQ flag is zero so message not handled
 9. VF times out waiting for ACK to the second message
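
Step 7 above is where things go wrong. For illustration, the following is a 
rough paraphrase (from memory; the helper and register names are only meant to 
stand for whatever the real igb mailbox code uses, so don't take them 
literally) of what I believe the PF-side write path does when it sends the 
reply:

/* Rough sketch of my reading of the PF write path used for the step-7 reply.
 * Names and register accesses are illustrative, not quoted from the driver.
 * Taking the buffer flushes any pending REQ/ACK for that VF and then
 * overwrites the shared buffer - which is exactly what clobbers a request
 * the VF has just posted. */
static s32 pf_write_mbx_sketch(struct e1000_hw *hw, u32 *msg, u16 size,
                               u16 vf_number)
{
        s32 ret_val;
        u16 i;

        /* grab the PFU lock for this VF's mailbox */
        ret_val = e1000_obtain_mbx_lock_pf(hw, vf_number);
        if (ret_val)
                return ret_val;

        /* clear any outstanding REQ/ACK indications for this VF -
         * including a REQ the VF may have raised a moment ago (step 6) */
        e1000_check_for_msg_pf(hw, vf_number);
        e1000_check_for_ack_pf(hw, vf_number);

        /* copy the reply into the shared mailbox memory, overwriting
         * whatever the VF last wrote there */
        for (i = 0; i < size; i++)
                E1000_WRITE_REG_ARRAY(hw, E1000_VMBMEM(vf_number), i, msg[i]);

        /* release the lock and signal STS so the VF sees the reply */
        E1000_WRITE_REG(hw, E1000_P2VMAILBOX(vf_number), E1000_P2VMAILBOX_STS);

        return E1000_SUCCESS;
}

Nothing in that path checks whether the VF has posted a new request since the 
original one was read, so a REQ raised at step 6 is simply lost.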

From inspecting the code in both drivers my understanding is that the VFU/PFU 
locking mechanism is only being used to protect the message buffer while it is 
being read or written; it is not protecting against the buffer being re-used 
by either driver before the existing message has been handled (the lock is 
released when setting REQ/STS, not held until the ACK is received as the 
protocol description in the 82576 datasheet suggests). Adding a 5000usec sleep 
after each message send from the VF makes the original link-down failure go 
away, giving some confidence to the race condition theory.
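
For completeness, the sleep experiment was nothing more elaborate than the 
following sort of change (illustrative only, shown against the multicast send 
in e1000_update_mc_addr_list_vf; I put the same delay after the other VF-side 
sends too):

        /* Experiment only, not a proposed fix: give the PF ~5000usec to
         * deliver its reply before the VF can post another request on top
         * of it.  mdelay()/usleep_range() would be tidier depending on
         * context; the point is just the delay after write_posted(). */
        mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
        udelay(5000);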

I believe that the race should not be a problem if the VF and PF exchange 
messages in the expected order; however, in the above case the VF sent an 
E1000_VF_SET_MULTICAST message but did not wait for the reply message from the 
PF. Reviewing the driver code shows that, of the five messages to which the 
PF's igb_rcv_msg_from_vf would reply, the VF driver does not wait for the 
replies to three (E1000_VF_SET_MULTICAST, E1000_VF_SET_LPE and 
E1000_VF_SET_VLAN). Patching the VF driver (see below) to perform dummy reads 
after sending each of these three messages makes the original link-down 
failure go away.

Have I misunderstood the locking strategy for the mailbox? As far as I can see 
nothing has changed in the newer igb and igbvf drivers that would explain why 
I'm seeing the VF link failure on the i350 but not on the other NICs (I don't 
see this with the older 82576 in the same system with the same drivers). I can 
only assume it's just very bad luck with timing in this particular system and 
configuration.

Regards,
James Bulpin

Read (and ignore) replies to VF-to-PF messages that are currently unhandled

Signed-off-by: James Bulpin <james.bul...@eu.citrix.com>

diff -rup igbvf-1.1.5.pristine/src/e1000_vf.c igbvf-1.1.5.mboxreply/src/e1000_vf.c
--- igbvf-1.1.5.pristine/src/e1000_vf.c 2011-08-16 18:38:11.000000000 +0100
+++ igbvf-1.1.5.mboxreply/src/e1000_vf.c        2011-11-02 12:54:13.892369000 +0000
@@ -269,6 +269,7 @@ void e1000_update_mc_addr_list_vf(struct
        u16 *hash_list = (u16 *)&msgbuf[1];
        u32 hash_value;
        u32 i;
+       s32 ret_val;

        DEBUGFUNC("e1000_update_mc_addr_list_vf");

@@ -298,7 +299,10 @@ void e1000_update_mc_addr_list_vf(struct
                mc_addr_list += ETH_ADDR_LEN;
        }

-       mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+       ret_val = mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+       if (!ret_val)
+               mbx->ops.read_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+
 }

 /**
@@ -311,6 +315,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
 {
        struct e1000_mbx_info *mbx = &hw->mbx;
        u32 msgbuf[2];
+       s32 ret_val;

        msgbuf[0] = E1000_VF_SET_VLAN;
        msgbuf[1] = vid;
@@ -318,7 +323,9 @@ void e1000_vfta_set_vf(struct e1000_hw *
        if (set)
                msgbuf[0] |= E1000_VF_SET_VLAN_ADD;

-       mbx->ops.write_posted(hw, msgbuf, 2, 0);
+       ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
+       if (!ret_val)
+               mbx->ops.read_posted(hw, msgbuf, 2, 0);
 }

 /** e1000_rlpml_set_vf - Set the maximum receive packet length
@@ -329,11 +336,14 @@ void e1000_rlpml_set_vf(struct e1000_hw
 {
        struct e1000_mbx_info *mbx = &hw->mbx;
        u32 msgbuf[2];
+       s32 ret_val;

        msgbuf[0] = E1000_VF_SET_LPE;
        msgbuf[1] = max_size;

-       mbx->ops.write_posted(hw, msgbuf, 2, 0);
+       ret_val = mbx->ops.write_posted(hw, msgbuf, 2, 0);
+       if (!ret_val)
+               mbx->ops.read_posted(hw, msgbuf, 2, 0);
 }

 /**
