On Thu, 2016-07-07 at 01:09 +0000, Lu, Wenzhuo wrote:
> Hi Luca,
> 
> 
> > -----Original Message-----
> > From: Luca Boccassi [mailto:lboccass at Brocade.com]
> > Sent: Thursday, July 7, 2016 12:23 AM
> > To: Lu, Wenzhuo
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > 
> > On Wed, 2016-07-06 at 00:45 +0000, Lu, Wenzhuo wrote:
> > > Hi Luca,
> > >
> > > > -----Original Message-----
> > > > From: Luca Boccassi [mailto:lboccass at Brocade.com]
> > > > Sent: Tuesday, July 5, 2016 5:53 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > >
> > > > On Tue, 2016-07-05 at 00:52 +0000, Lu, Wenzhuo wrote:
> > > > > Hi Luca,
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Luca Boccassi [mailto:lboccass at Brocade.com]
> > > > > > Sent: Monday, July 4, 2016 11:48 PM
> > > > > > To: Lu, Wenzhuo
> > > > > > Cc: dev at dpdk.org
> > > > > > Subject: Re: [dpdk-dev] [PATCH v6 0/4] support reset of VF link
> > > > > >
> > > > > > On Mon, 2016-06-20 at 14:24 +0800, Wenzhuo Lu wrote:
> > > > > > > If the PF link is down and up, VF link will not work accordingly.
> > > > > > > This patch set addes the support of VF link reset. So, when VF
> > > > > > > receices the messges of physical link down/up. APP can reset
> > > > > > > the VF link and let it recover.
> > > > > > >
> > > > > > > PS: This patch set is splitted from a previous patch set,
> > > > > > > *automatic link recovery on ixgbe/igb VF*, and it's base on
> > > > > > > the patch set *support mailbox interruption on ixgbe/igb VF*.
> > > > > > >
> > > > > > > Wenzhuo Lu (3):
> > > > > > >   lib/librte_ether: support device reset
> > > > > > >   ixgbe: implement device reset on VF
> > > > > > >   igb: implement device reset on VF
> > > > > > >
> > > > > > > Zhe Tao (1):
> > > > > > >   i40e: implement device reset on VF
> > > > > > >
> > > > > > > v1:
> > > > > > > - Added the implementation for the VF reset functionality.
> > > > > > > v2:
> > > > > > > - Changed the i40e related operations during VF reset.
> > > > > > > v3:
> > > > > > > - Resent the patches because of the mail sent issue.
> > > > > > > v4:
> > > > > > > - Removed some VF reset emulation code.
> > > > > > > v5:
> > > > > > > - Removed all the code related with lock.
> > > > > > > v6:
> > > > > > > - Updated the NIC feature overview matrix.
> > > > > > > - Added more explanation in the doxygen comment of reset API.
> > > > > > >
> > > > > > >  doc/guides/nics/overview.rst           |  1 +
> > > > > > >  doc/guides/rel_notes/release_16_07.rst | 13 ++++++
> > > > > > >  drivers/net/e1000/igb_ethdev.c         | 59 
> > > > > > > ++++++++++++++++++++++++
> > > > > > >  drivers/net/i40e/i40e_ethdev.h         |  4 ++
> > > > > > >  drivers/net/i40e/i40e_ethdev_vf.c      | 83
> > > > > > ++++++++++++++++++++++++++++++++++
> > > > > > >  drivers/net/i40e/i40e_rxtx.c           | 10 ++++
> > > > > > >  drivers/net/i40e/i40e_rxtx.h           |  4 ++
> > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.c       | 64
> > +++++++++++++++++++++++++-
> > > > > > >  drivers/net/ixgbe/ixgbe_ethdev.h       |  2 +-
> > > > > > >  drivers/net/ixgbe/ixgbe_rxtx.c         | 12 +++--
> > > > > > >  lib/librte_ether/rte_ethdev.c          | 17 +++++++
> > > > > > >  lib/librte_ether/rte_ethdev.h          | 24 ++++++++++
> > > > > > >  lib/librte_ether/rte_ether_version.map |  7 +++
> > > > > > >  13 files changed, 295 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > Hello Wenzhuo,
> > > > > >
> > > > > > I'm testing this patchset, but I am sporadically running into an
> > > > > > issue where the VFs reset fails after the PF flaps.
> > > > > >
> > > > > > I have a VM running on a KVM box with a X540-AT2, passing 2 VFs in.
> > > > > >
> > > > > > I am using calling rte_eth_dev_reset in response to a
> > > > > > RTE_ETH_EVENT_INTR_RESET callback, and the following errors
> > > > > > appear in the
> > > > > > log:
> > > > > >
> > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to update link.
> > > > > > PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed
> > > > > > queue_id=0
> > > > > > PMD: ixgbevf_dev_start(): Unable to initialize RX hardware (-12)
> > > > > > PMD: ixgbevf_dev_reset(): Ixgbe VF reset: Failed to start device.
> > > > > >
> > > > > > Jumping in with GDB, it seems that the rte_rxmbuf_alloc call in
> > > > > > ixgbe_alloc_rx_queue_mbufs returns NULL at iteration 64 out of 2048.
> > > > > > The application has ~500 2MB hugepages, and there's 2GB of free
> > > > > > memory available on top of that.
> > > > > >
> > > > > > Have you seen this before? Any pointer or suggestion for debugging?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > --
> > > > > > Kind regards,
> > > > > > Luca Boccassi
> > > > > I think the problem is the mbuf occupied by the packets is not
> > > > > released. This
> > > > memory has to be released by the APP, so my patches haven?t covered 
> > > > this.
> > > > Actually an example is needed to show how to use the reset API. I
> > > > plan to modify the testpmd.
> > > > > You may notice this feature is postponed to 16.11. Would you like
> > > > > to wait for
> > > > the new version that will include an example?
> > > >
> > > > Hi,
> > > >
> > > > Unfortunately we need the VF reset working sooner than that, so one
> > > > way or the other I'll need to sort it out. Given I've got a use case
> > > > where this is happening, if it can be helpful for you I'm more than
> > > > happy to help as a guinea pig. If you could please give some
> > > > guidance/guidelines with regards to which API to use to sort the mbuf
> > problem, I can try it out and give back some feedback.
> > > >
> > > > Thanks!
> > > I made a stupid mistake and deleted all my code. So, I have to take
> > > some time to rewrite it :( Attached the example I used to test the reset 
> > > API. It's
> > modified from the l2fwd example. So you can compare it with l2fwd to see 
> > what
> > need to be added.
> > > Hopefully it can help :)
> > 
> > Thanks! That made me understand a couple of things more, and I've got past 
> > the
> > problem.
> > 
> > Unfortunately now there's a bigger issue - rte_eth_dev_reset is a blocking 
> > call.
> > the _RESET event callback is fired when the PF goes down, but when I call
> > rte_eth_dev_reset it will block until the PF goes back up. There is no way, 
> > as far
> > as I can see, to know if the PF is back up before calling rte_eth_dev_reset.
> > 
> > This is a problem because, as far as I understand, I have to call all the
> > rte_eth_dev_ APIs from the same thread, in my case the master thread, and I
> > can't have that block potentially indefinitely.
> > 
> > Would it be possible to have 2 events instead of 1, one when the PF goes 
> > down
> > and one when it goes up? This way an application would be able to soft-stop 
> > the
> > port (drain queues, etc) when the PF is down, and then call the reset API 
> > when it
> > goes back up.
> > 
> > Thanks!
> Sorry we cannot have 2 events now. There're 2 problems to have 2 events.
> 1, Normally we use kernel driver for PF. Now the kernel driver only have one 
> kind of message for link down and up. So we cannot tell if it's down or up.
> 2, When the PF is down, if we don't reset the VF, VF is not working. It 
> cannot receive any message from PF. So we cannot know that when PF is up. It 
> means normally we have to reset VF twice when PF down and up. (Surely we can 
> wait a while when we receive the message from PF until PF is up. But we 
> cannot tell how long the time is appropriate. So this *wait a while* may work 
> for flash.)

Thanks for the clarification, I understand.

The problem with a blocking call is that we basically need to spawn one
thread per rte_eth_dev_reset call, since there is no way of knowing if a
PF is down for good or just flapping, and we can't have a single thread
managing all the interfaces being blocked forever (EG: PF 1 and 2 go
down, thread blocks on PF 1 reset call but it never returns, meanwhile
PF 2 goes back up but call is never made).

A colleague of mine, Eric Kinzie, suggested to add a blocking boolean
parameter to rte_eth_dev_reset API. If set to false, then the call will
not block and just does one try and return an error (EAGAIN ?). Would
this be an acceptable proposition?

-- 
Kind regards,
Luca Boccassi

Reply via email to