On Mon, May 16, 2016 at 6:39 AM, Tudor Cornea <tcor...@ixiacom.com> wrote:
> Greetings all,
>
> I'm experiencing Tx queue hangs with the ixgbe driver.
>
> The scenario that I am running is the following:
>
> 1.       Have two X540-AT2 nics, and connect them through a cable
>
> 2.       For each of the two nics create a virtual function from the host OS
>
> Ex: echo 1 > /sys/class/net/enp5s0f1/device/sriov_numvfs
>
> 3.       Create two guest virtual machines, and assign the PFs to them (not 
> the virtual functions)
>
> 4.       Boot up the guests, and try to send some packets through the 
> interfaces that will be managed by the ixgbe driver
>
> Ex: ping 1.1.1.1 -I eth1
>
> 5.       Notice that the adapter keeps resetting due to a Tx queue hanging
>
> The dmesg log shows the following information:
>
> ixgbe 0000:00:08.0: eth1: Detected Tx Unit Hang
>   Tx Queue             <0>
>   TDH, TDT             <2>, <3>
>   next_to_use          <3>
>   next_to_clean        <0>
> tx_buffer_info[next_to_clean]
>   time_stamp           <1000d00d6>
>   jiffies              <1000d0a95>
> ixgbe 0000:00:08.0: eth1: tx hang 4 detected on queue 0, resetting adapter
> ixgbe 0000:00:08.0: eth1: Reset adapter
> ixgbe 0000:00:08.0: eth1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>
> Also, running ethtool tests on the NIC shows that it's in an
> inconsistent state
>
> [root@localhost ~]# ethtool -t eth1
> The test result is FAIL
> The test extra info:
> Register test  (offline)         0
> Eeprom test    (offline)         0
> Interrupt test (offline)         0
> Loopback test  (offline)         13
> Link test   (on/offline)         0

This is likely due to residual VF configuration left behind in the
device.  I'm assuming that the guest that holds the PF is not aware
that there are VFs allocated to it, and that you are using a 440FX
model for your guest rather than something that supports MM config
like the Q35 model.  As a result the guest doesn't know that SR-IOV
has assigned the first block of queues to the VFs, and when it tries
to Tx using those queues it cannot, because the IOMMU functionality in
your host denies the VFs access to that guest.
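
For what it's worth, the hang report you quoted is consistent with a
queue that never completes anything.  Here is a rough sketch of how to
read the numbers; the field values are copied from your dmesg output,
while the ring size (512) and HZ (1000) are assumptions on my part and
not something the log states:

```python
# Values copied from the "Detected Tx Unit Hang" report quoted above.
HANG_REPORT = {
    "tdh": 2,
    "tdt": 3,
    "next_to_use": 3,
    "next_to_clean": 0,
    "time_stamp": 0x1000d00d6,
    "jiffies": 0x1000d0a95,
}

# Assumptions, not taken from the report: Tx ring size and the guest's HZ.
ASSUMED_RING_SIZE = 512
ASSUMED_HZ = 1000


def pending_descriptors(report, ring_size=ASSUMED_RING_SIZE):
    """Descriptors the driver has queued but not yet cleaned."""
    return (report["next_to_use"] - report["next_to_clean"]) % ring_size


def stalled_jiffies(report):
    """Jiffies elapsed since tx_buffer_info[next_to_clean] was stamped."""
    return report["jiffies"] - report["time_stamp"]


if __name__ == "__main__":
    ticks = stalled_jiffies(HANG_REPORT)
    print("pending descriptors:", pending_descriptors(HANG_REPORT))
    print("stalled for", ticks, "jiffies, about",
          ticks / ASSUMED_HZ, "s at the assumed HZ")
```

With your values, three descriptors are still waiting to be cleaned
and the oldest has been waiting 2495 jiffies, i.e. nothing queued on
queue 0 has been reported complete back to the driver.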

> I would add that if I were to remove the VF interfaces prior to assigning the 
> PF to the guests, the problem does not happen anymore,
> which makes me suspect it's a configuration related to the way that the 
> transmit queues are configured in the case of VFs being enabled.

Assigning the PF while it still has VFs is definitely not supported.
You should not be able to direct-assign a PF if there are VFs
allocated to it.

> My setup is the following:
>
>
> 1.       NIC: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2
>
> 2.       Hypervisor:
> Type: KVM
> Distro: CentOS 7
> Kernel: 3.10.0-229.4.2.el7.x86_64
>
> 3.       Guest
> Distro: CentOS 6.3
> Kernel: Customized 3.10 kernel , but saw the problem with a default CentOS 
> stock kernel as well (2.6.32-279.el6.x86_64)
>                 Ixgbe version: 3.6.7-k
>
> For now, as a workaround I can disable the virtual functions, before using 
> pass-through on the PF.
> I would like to understand better why does this problem happen, and if there 
> are any patches that attempt to fix the issue.

Most likely any "fix" would just prevent you from direct-assigning
the PF while there are VFs allocated to it.  The problem is that the
host has to be there to manage the PF and any resources that it
allocated to the VFs.  By direct-assigning the PF you likely triggered
a function-level reset on the PF, which would have reset the device
and disabled SR-IOV in the process.  If you check lspci -vvv on the
host after direct-assigning the PF to a guest, you should probably see
that most of the resources related to SR-IOV have been modified and/or
freed.
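
In the meantime, the workaround you describe can be scripted on the
host.  Below is a minimal sketch, assuming the PF exposes the standard
sriov_numvfs attribute under /sys/class/net/<ifname>/device (the same
file you wrote to when creating the VF).  The function names are made
up for illustration, and it has to run as root, before the PF is
handed to the guest:

```python
from pathlib import Path

# Default sysfs location; overridable so the helpers can be tried on a copy.
SYSFS_NET = Path("/sys/class/net")


def numvfs_path(ifname, sysfs_net=SYSFS_NET):
    """Path of the sriov_numvfs attribute for the given PF interface."""
    return Path(sysfs_net) / ifname / "device" / "sriov_numvfs"


def allocated_vfs(ifname, sysfs_net=SYSFS_NET):
    """Number of VFs currently allocated on the PF."""
    return int(numvfs_path(ifname, sysfs_net).read_text().strip())


def release_vfs(ifname, sysfs_net=SYSFS_NET):
    """Write 0 to sriov_numvfs if any VFs are allocated.

    Returns how many VFs were allocated before the call.
    """
    count = allocated_vfs(ifname, sysfs_net)
    if count:
        numvfs_path(ifname, sysfs_net).write_text("0")
    return count
```

On your host that would be release_vfs("enp5s0f1") for each of the two
ports; a return value of 0 means there was nothing to release and the
PF can be assigned as-is.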

> I know there are a few problems related to when SR-IOV and DCB are 
> simultaneously enabled [1] , but I don't think this is the case in my setup.
> My customized guest kernel has CONFIG_DCB and CONFIG_PCI_IOV disabled while 
> the stock kernel from CentOS has them enabled, yet I manage to see the 
> problem in both cases.
>
> [1] http://www.spinics.net/lists/netdev/msg203427.html

Actually that is something I fixed a long time ago.  DCB and SR-IOV
have been able to co-exist for some time now.

> Thanks,
> Tudor

Hope my explanation helps.

- Alex

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired