[E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF link staying down

2011-11-02 Thread James Bulpin
Hello,

Summary:
 1. An observation/theory of a race condition in VF/PF mailbox protocol leading 
to VF showing link down
 2. Belief that the race is exploited due to the VF not waiting on PF replies 
for some messages
 3. Patch for the above

I've been trying to get i350 SR-IOV VFs working in my Xen environment 
(XenServer 6.0 [Xen 4.1], igb 3.2.10, HVM CentOS 5.x guest with igbvf 1.1.5, 
Dell T3500). We've had a lot of experience and success with SR-IOV with both 
the 82599 and 82576 ET2; this work was to extend our support to include the 
i350. The problem I observed was that around 9 times out of 10 the VF link was 
showing as down, both after the initial module load and after a "ifconfig up". 
On the occasions the link did come up the VF worked perfectly. Adding a few 
printks showed the link was being declared down due to the VF/PF mailbox 
timeout having been hit (setting mbx->timeout to zero). Further instrumentation 
suggests that the timeout occurred when the VF was waiting for an ACK from the 
PF that never came, due to the original VF to PF message not being received by 
the PF. Based on limited logging the following sequence is what I believe 
happened:

 1. VF sends message (e.g. 0x00010003) to PF (get VFU lock; write buffer; 
release VFU lock/signal REQ), VF waits for ACK
 2. PF igb_msg_task handler checks the VFICR and finds REQ set
 3. PF igb_msg_task handler calls igb_rcv_msg_from_vf which reads message and 
ACKs (get PFU lock; read buffer; release PFU lock/signal ACK)
 4. PF deals with message, e.g. calling igb_set_vf_multicasts
 5. VF receives ACK
 6. VF moves on to send next message (e.g. 0x0006) to PF (get VFU lock; 
write buffer; release VFU lock/signal REQ), VF waits for ACK
 7. PF igb_rcv_msg_from_vf sends reply message (orig msg | 
E1000_VT_MSGTYPE_ACK|E1000_VT_MSGTYPE_CTS) from PF to VF (get PFU lock; clear 
REQ/ACK bits in VFICR ***clearing the REQ flag just set by VF***; write buffer 
***overwriting VF's pending message***; release PFU lock/signal STS)
 8. PF igb_msg_task handler runs but REQ flag is zero so message not handled
 9. VF times out waiting for ACK to the second message

>From inspecting the code in both drivers my understanding is that the VFU/PFU 
>locking mechanism is only being used to protect the message buffer while it is 
>being read or written, it is not protecting against the buffer being re-used 
>by either driver before the existing message has been handled (the lock is 
>released when setting REQ/STS, not on receiving the ACK as the protocol 
>description in the 82576 datasheet suggests). Adding a 5000usec sleep after 
>each message send from the VF makes the original link-down failure go away 
>giving some confidence to the race condition theory.

I believe that the race should not be a problem if the VF and PF are exchanging 
messages in the expected order however in the above case the VF sent a 
E1000_VF_SET_MULTICAST message but did not wait for the reply message from the 
PF. Reviewing the driver code shows that of the five messages the PF 
igb_rcv_msg_from_vf would reply to the VF driver does not wait for replies for 
three (E1000_VF_SET_MULTICAST, E1000_VF_SET_LPE and E1000_VF_SET_VLAN). 
Patching the VF driver (see below) to perform dummy reads after sending each of 
these three messages makes the original link-down failure go away.

Have I misunderstood the locking strategy for the mailbox? As far as I can see 
nothing has changed in the newer igb and ibgvf drivers that would explain why 
I'm seeing the VF link failure on the i350 but not on the other NICs (I don't 
see this with the older 82576 in the same system with the same drivers). I can 
only assume it's just very bad luck with timing in this particular system and 
configuration.

Regards,
James Bulpin

Read (and ignore) replies to VF to PF messages currently unhandled

Signed-off-by: James Bulpin 

diff -rup igbvf-1.1.5.pristine/src/e1000_vf.c 
igbvf-1.1.5.mboxreply/src/e1000_vf.c
--- igbvf-1.1.5.pristine/src/e1000_vf.c 2011-08-16 18:38:11.0 +0100
+++ igbvf-1.1.5.mboxreply/src/e1000_vf.c2011-11-02 12:54:13.892369000 
+
@@ -269,6 +269,7 @@ void e1000_update_mc_addr_list_vf(struct
u16 *hash_list = (u16 *)&msgbuf[1];
u32 hash_value;
u32 i;
+   s32 ret_val;

DEBUGFUNC("e1000_update_mc_addr_list_vf");

@@ -298,7 +299,10 @@ void e1000_update_mc_addr_list_vf(struct
mc_addr_list += ETH_ADDR_LEN;
}

-   mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+   ret_val = mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+   if (!ret_val)
+   mbx->ops.read_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+
 }

 /**
@@ -311,6 +315,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
 {
struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[2];
+   s32 ret_val;

msgbuf[0] = E1000_VF_SET_VLAN;
msgbuf[1] = vid;
@@ -318,7 +323,9 @@ void e1000_vfta_set_vf(struct 

Re: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF link staying down

2011-11-02 Thread Rose, Gregory V
James,

I'm sorry to hear you're having problems with this and I really appreciate the 
extensive and detailed investigation you've done on the matter.

I'll do some further investigation on my side and we will review and test your 
patch.  Given that no one here at Intel has seen the problem it might be that 
the best we can do is incorporate the patch and then make sure that it doesn't 
break anything else.  If that is the case and the patch fixes your issue then I 
can't see any reason for us to not accept the patch.

Thanks,

- Greg

> -Original Message-
> From: James Bulpin [mailto:james.bul...@eu.citrix.com]
> Sent: Wednesday, November 02, 2011 11:31 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF
> link staying down
> 
> Hello,
> 
> Summary:
>  1. An observation/theory of a race condition in VF/PF mailbox protocol
> leading to VF showing link down
>  2. Belief that the race is exploited due to the VF not waiting on PF
> replies for some messages
>  3. Patch for the above
> 
> I've been trying to get i350 SR-IOV VFs working in my Xen environment
> (XenServer 6.0 [Xen 4.1], igb 3.2.10, HVM CentOS 5.x guest with igbvf
> 1.1.5, Dell T3500). We've had a lot of experience and success with SR-IOV
> with both the 82599 and 82576 ET2; this work was to extend our support to
> include the i350. The problem I observed was that around 9 times out of 10
> the VF link was showing as down, both after the initial module load and
> after a "ifconfig up". On the occasions the link did come up the VF worked
> perfectly. Adding a few printks showed the link was being declared down
> due to the VF/PF mailbox timeout having been hit (setting mbx->timeout to
> zero). Further instrumentation suggests that the timeout occurred when the
> VF was waiting for an ACK from the PF that never came, due to the original
> VF to PF message not being received by the PF. Based on limited logging
> the following sequence is what I believe happened:
> 
>  1. VF sends message (e.g. 0x00010003) to PF (get VFU lock; write buffer;
> release VFU lock/signal REQ), VF waits for ACK
>  2. PF igb_msg_task handler checks the VFICR and finds REQ set
>  3. PF igb_msg_task handler calls igb_rcv_msg_from_vf which reads message
> and ACKs (get PFU lock; read buffer; release PFU lock/signal ACK)
>  4. PF deals with message, e.g. calling igb_set_vf_multicasts
>  5. VF receives ACK
>  6. VF moves on to send next message (e.g. 0x0006) to PF (get VFU
> lock; write buffer; release VFU lock/signal REQ), VF waits for ACK
>  7. PF igb_rcv_msg_from_vf sends reply message (orig msg |
> E1000_VT_MSGTYPE_ACK|E1000_VT_MSGTYPE_CTS) from PF to VF (get PFU lock;
> clear REQ/ACK bits in VFICR ***clearing the REQ flag just set by VF***;
> write buffer ***overwriting VF's pending message***; release PFU
> lock/signal STS)
>  8. PF igb_msg_task handler runs but REQ flag is zero so message not
> handled
>  9. VF times out waiting for ACK to the second message
> 
> >From inspecting the code in both drivers my understanding is that the
> VFU/PFU locking mechanism is only being used to protect the message buffer
> while it is being read or written, it is not protecting against the buffer
> being re-used by either driver before the existing message has been
> handled (the lock is released when setting REQ/STS, not on receiving the
> ACK as the protocol description in the 82576 datasheet suggests). Adding a
> 5000usec sleep after each message send from the VF makes the original
> link-down failure go away giving some confidence to the race condition
> theory.
> 
> I believe that the race should not be a problem if the VF and PF are
> exchanging messages in the expected order however in the above case the VF
> sent a E1000_VF_SET_MULTICAST message but did not wait for the reply
> message from the PF. Reviewing the driver code shows that of the five
> messages the PF igb_rcv_msg_from_vf would reply to the VF driver does not
> wait for replies for three (E1000_VF_SET_MULTICAST, E1000_VF_SET_LPE and
> E1000_VF_SET_VLAN). Patching the VF driver (see below) to perform dummy
> reads after sending each of these three messages makes the original link-
> down failure go away.
> 
> Have I misunderstood the locking strategy for the mailbox? As far as I can
> see nothing has changed in the newer igb and ibgvf drivers that would
> explain why I'm seeing the VF link failure on the i350 but not on the
> other NICs (I don't see this with the older 82576 in the same system with
> the same drivers). I can only assume it's just very bad luck with timing
> in this particular system and configuration.
> 
> Regards,
> 

Re: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF link staying down

2011-11-03 Thread Rose, Gregory V
James,

Could you please try this patch?  It should be pretty much identical to yours 
but reuses code a bit more.

If it works for you then we'll go ahead and submit for internal validation and 
testing and then take it from there.

Thanks,

--- igbvf-old/igbvf-2.0.1/src/e1000_vf.c2011-11-03 10:00:52.0 
-0700
+++ igbvf-new/igbvf-2.0.1/src/e1000_vf.c2011-11-03 09:59:10.0 
-0700
@@ -252,6 +252,16 @@ static u32 e1000_hash_mc_addr_vf(struct 
return hash_value;
 }
 
+static void write_and_read_response(struct e1000_hw *hw, u32 *msg, u16 size)
+{
+   struct e1000_mbx_info *mbx = &hw->mbx;
+   u32 retmsg[E1000_VFMAILBOX_SIZE];
+   s32 retval = mbx->ops.write_posted(hw, msg, size, 0);
+
+   if (!retval)
+   mbx->ops.read_posted(hw, retmsg, E1000_VFMAILBOX_SIZE, 0);
+}
+
 /**
  *  e1000_update_mc_addr_list_vf - Update Multicast addresses
  *  @hw: pointer to the HW structure
@@ -264,7 +274,6 @@ static u32 e1000_hash_mc_addr_vf(struct 
 void e1000_update_mc_addr_list_vf(struct e1000_hw *hw,
  u8 *mc_addr_list, u32 mc_addr_count)
 {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[E1000_VFMAILBOX_SIZE];
u16 *hash_list = (u16 *)&msgbuf[1];
u32 hash_value;
@@ -298,7 +307,7 @@ void e1000_update_mc_addr_list_vf(struct
mc_addr_list += ETH_ADDR_LEN;
}
 
-   mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+   write_and_read_response(hw, msgbuf, E1000_VFMAILBOX_SIZE);
 }
 
 /**
@@ -309,7 +318,6 @@ void e1000_update_mc_addr_list_vf(struct
  **/
 void e1000_vfta_set_vf(struct e1000_hw *hw, u16 vid, bool set)
 {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[2];
 
msgbuf[0] = E1000_VF_SET_VLAN;
@@ -318,7 +326,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
if (set)
msgbuf[0] |= E1000_VF_SET_VLAN_ADD;
 
-   mbx->ops.write_posted(hw, msgbuf, 2, 0);
+   write_and_read_response(hw, msgbuf, 2);
 }
 
 /** e1000_rlpml_set_vf - Set the maximum receive packet length
@@ -327,13 +335,12 @@ void e1000_vfta_set_vf(struct e1000_hw *
  **/
 void e1000_rlpml_set_vf(struct e1000_hw *hw, u16 max_size)
 {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[2];
 
msgbuf[0] = E1000_VF_SET_LPE;
msgbuf[1] = max_size;
 
-   mbx->ops.write_posted(hw, msgbuf, 2, 0);
+   write_and_read_response(hw, msgbuf, 2);
 }
 
 /**





igbvf-mbx-patch.patch
Description: igbvf-mbx-patch.patch
--
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF link staying down

2011-11-03 Thread James Bulpin
Hi Greg,

That looks a lot more elegant than mine :) . I've tested on my rig and it works 
fine (applied to igbvf 1.1.5). 

Thanks,
James

-Original Message-
From: Rose, Gregory V [mailto:gregory.v.r...@intel.com] 
Sent: 03 November 2011 17:21
To: Rose, Gregory V; James Bulpin; e1000-devel@lists.sourceforge.net
Subject: RE: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF 
link staying down

James,

Could you please try this patch?  It should be pretty much identical to yours 
but reuses code a bit more.

If it works for you then we'll go ahead and submit for internal validation and 
testing and then take it from there.

Thanks,

--- igbvf-old/igbvf-2.0.1/src/e1000_vf.c2011-11-03 10:00:52.0 
-0700
+++ igbvf-new/igbvf-2.0.1/src/e1000_vf.c2011-11-03 09:59:10.0 
-0700
@@ -252,6 +252,16 @@ static u32 e1000_hash_mc_addr_vf(struct 
return hash_value;
 }
 
+static void write_and_read_response(struct e1000_hw *hw, u32 *msg, u16 
+size) {
+   struct e1000_mbx_info *mbx = &hw->mbx;
+   u32 retmsg[E1000_VFMAILBOX_SIZE];
+   s32 retval = mbx->ops.write_posted(hw, msg, size, 0);
+
+   if (!retval)
+   mbx->ops.read_posted(hw, retmsg, E1000_VFMAILBOX_SIZE, 0); }
+
 /**
  *  e1000_update_mc_addr_list_vf - Update Multicast addresses
  *  @hw: pointer to the HW structure
@@ -264,7 +274,6 @@ static u32 e1000_hash_mc_addr_vf(struct  void 
e1000_update_mc_addr_list_vf(struct e1000_hw *hw,
  u8 *mc_addr_list, u32 mc_addr_count)  {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[E1000_VFMAILBOX_SIZE];
u16 *hash_list = (u16 *)&msgbuf[1];
u32 hash_value;
@@ -298,7 +307,7 @@ void e1000_update_mc_addr_list_vf(struct
mc_addr_list += ETH_ADDR_LEN;
}
 
-   mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
+   write_and_read_response(hw, msgbuf, E1000_VFMAILBOX_SIZE);
 }
 
 /**
@@ -309,7 +318,6 @@ void e1000_update_mc_addr_list_vf(struct
  **/
 void e1000_vfta_set_vf(struct e1000_hw *hw, u16 vid, bool set)  {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[2];
 
msgbuf[0] = E1000_VF_SET_VLAN;
@@ -318,7 +326,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
if (set)
msgbuf[0] |= E1000_VF_SET_VLAN_ADD;
 
-   mbx->ops.write_posted(hw, msgbuf, 2, 0);
+   write_and_read_response(hw, msgbuf, 2);
 }
 
 /** e1000_rlpml_set_vf - Set the maximum receive packet length @@ -327,13 
+335,12 @@ void e1000_vfta_set_vf(struct e1000_hw *
  **/
 void e1000_rlpml_set_vf(struct e1000_hw *hw, u16 max_size)  {
-   struct e1000_mbx_info *mbx = &hw->mbx;
u32 msgbuf[2];
 
msgbuf[0] = E1000_VF_SET_LPE;
msgbuf[1] = max_size;
 
-   mbx->ops.write_posted(hw, msgbuf, 2, 0);
+   write_and_read_response(hw, msgbuf, 2);
 }
 
 /**




--
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to VF link staying down

2011-11-03 Thread Rose, Gregory V
Great to hear.  We'll spin a build for our internal validation team.  Barring 
any unforeseen issues it should be coming out in the next release of the igbvf 
driver to sourceforge.  We'll keep you updated if anything comes up.

Thanks!

- Greg

> -Original Message-
> From: James Bulpin [mailto:james.bul...@eu.citrix.com]
> Sent: Thursday, November 03, 2011 11:45 AM
> To: Rose, Gregory V; e1000-devel@lists.sourceforge.net
> Subject: RE: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to
> VF link staying down
> 
> Hi Greg,
> 
> That looks a lot more elegant than mine :) . I've tested on my rig and it
> works fine (applied to igbvf 1.1.5).
> 
> Thanks,
> James
> 
> -Original Message-
> From: Rose, Gregory V [mailto:gregory.v.r...@intel.com]
> Sent: 03 November 2011 17:21
> To: Rose, Gregory V; James Bulpin; e1000-devel@lists.sourceforge.net
> Subject: RE: [E1000-devel] igb/i350 VF/PF mailbox locking race leading to
> VF link staying down
> 
> James,
> 
> Could you please try this patch?  It should be pretty much identical to
> yours but reuses code a bit more.
> 
> If it works for you then we'll go ahead and submit for internal validation
> and testing and then take it from there.
> 
> Thanks,
> 
> --- igbvf-old/igbvf-2.0.1/src/e1000_vf.c  2011-11-03 10:00:52.0 -
> 0700
> +++ igbvf-new/igbvf-2.0.1/src/e1000_vf.c  2011-11-03 09:59:10.0 -
> 0700
> @@ -252,6 +252,16 @@ static u32 e1000_hash_mc_addr_vf(struct
>   return hash_value;
>  }
> 
> +static void write_and_read_response(struct e1000_hw *hw, u32 *msg, u16
> +size) {
> + struct e1000_mbx_info *mbx = &hw->mbx;
> + u32 retmsg[E1000_VFMAILBOX_SIZE];
> + s32 retval = mbx->ops.write_posted(hw, msg, size, 0);
> +
> + if (!retval)
> + mbx->ops.read_posted(hw, retmsg, E1000_VFMAILBOX_SIZE, 0); }
> +
>  /**
>   *  e1000_update_mc_addr_list_vf - Update Multicast addresses
>   *  @hw: pointer to the HW structure
> @@ -264,7 +274,6 @@ static u32 e1000_hash_mc_addr_vf(struct  void
> e1000_update_mc_addr_list_vf(struct e1000_hw *hw,
> u8 *mc_addr_list, u32 mc_addr_count)  {
> - struct e1000_mbx_info *mbx = &hw->mbx;
>   u32 msgbuf[E1000_VFMAILBOX_SIZE];
>   u16 *hash_list = (u16 *)&msgbuf[1];
>   u32 hash_value;
> @@ -298,7 +307,7 @@ void e1000_update_mc_addr_list_vf(struct
>   mc_addr_list += ETH_ADDR_LEN;
>   }
> 
> - mbx->ops.write_posted(hw, msgbuf, E1000_VFMAILBOX_SIZE, 0);
> + write_and_read_response(hw, msgbuf, E1000_VFMAILBOX_SIZE);
>  }
> 
>  /**
> @@ -309,7 +318,6 @@ void e1000_update_mc_addr_list_vf(struct
>   **/
>  void e1000_vfta_set_vf(struct e1000_hw *hw, u16 vid, bool set)  {
> - struct e1000_mbx_info *mbx = &hw->mbx;
>   u32 msgbuf[2];
> 
>   msgbuf[0] = E1000_VF_SET_VLAN;
> @@ -318,7 +326,7 @@ void e1000_vfta_set_vf(struct e1000_hw *
>   if (set)
>   msgbuf[0] |= E1000_VF_SET_VLAN_ADD;
> 
> - mbx->ops.write_posted(hw, msgbuf, 2, 0);
> + write_and_read_response(hw, msgbuf, 2);
>  }
> 
>  /** e1000_rlpml_set_vf - Set the maximum receive packet length @@ -327,13
> +335,12 @@ void e1000_vfta_set_vf(struct e1000_hw *
>   **/
>  void e1000_rlpml_set_vf(struct e1000_hw *hw, u16 max_size)  {
> - struct e1000_mbx_info *mbx = &hw->mbx;
>   u32 msgbuf[2];
> 
>   msgbuf[0] = E1000_VF_SET_LPE;
>   msgbuf[1] = max_size;
> 
> - mbx->ops.write_posted(hw, msgbuf, 2, 0);
> + write_and_read_response(hw, msgbuf, 2);
>  }
> 
>  /**
> 
> 


--
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired