On 10.06.2014 22:23, Tantilov, Emil S wrote:
>> -----Original Message-----
>> From: Андрей Василишин [mailto:[email protected]]
>> Sent: Tuesday, June 10, 2014 8:35 AM
>> To: Tantilov, Emil S; [email protected]
>> Subject: Re: [E1000-devel] 82599EB big packet loss
>>
>>
>> Emil, thanks for the reply!
>>>> with one of these modules (don't remember which one):
>>>> Intel SFP+ SR FTLX8571D3BCV-IT or Intel SFP+ SR AFBR-703SDZ-IN2
>>> If the issue is only seen with a specific module, then it
>>> would help to know the exact model.
>>>
>> I have three servers with 82599 NICs (two with 82599EB and
>> one with 82599ES) in one data-centre, all with the same behavior.
>>
>>>> When outbound traffic reaches ~9800 Mbit/s, inbound traffic
>>>> shows packet loss of up to 30%
>>>
>>> From the stats, some of the counters that may be related
>>> to the packet loss:
>>>
>>>
>>>> rx_no_dma_resources: 68
>>>
>>> Because this counter increments, but xoff counters are 0,
>>> I'm guessing you don't have flow control enabled. This
>>> basically means that the driver has no free Rx descriptors
>>> in the receive queue.
>>>
>>
>> I just enabled it: ethtool -A eth0 rx on tx on
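As a quick check on whether that helps, the drop and pause counters can be watched over time. A minimal parsing sketch, with sample ethtool -S output embedded; on the live box use stats=$(ethtool -S eth0) instead of the heredoc:

```shell
# Extract the drop/pause counters from ethtool -S style output.
# The numbers below are sample values, not from the live system.
stats=$(cat <<'EOF'
     rx_no_dma_resources: 68
     tx_flow_control_xoff: 0
     rx_flow_control_xoff: 0
EOF
)
# Strip the trailing colon from the counter name and print name + value:
printf '%s\n' "$stats" | awk '/rx_no_dma_resources|xoff/ {sub(/:$/, "", $1); print $1, $2}'
```

If rx_no_dma_resources keeps climbing while the xoff counters stay at 0, flow control is still not taking effect.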
>>
>>>> fdir_overflow: 1424
>>>
>>> You have many sessions which exceed the space allocated
>>> for Fdir filters. You can try and increase it using the FdirPballoc
>>> parameter.
>>
>>
>> and increased FdirPballoc
>>
>> Now I have these options:
>> options ixgbe RSS=8 DCA=2 LLIPort=80 allow_unsupported_sfp=1
>> FdirPballoc=3
>>
>> The problem remained :(
>
> The counters you provided are from a smaller sample size (judging by the
> total packets), but they look very similar - at least with respect to
> fdir_overflow. If the number of sessions exceeds the buffer allowed by the
> driver for storing the Fdir filters, you may be better off creating your
> own filters with ethtool -U
>
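For illustration, a manual steering rule with ethtool -U might look like the following; the port and queue index are placeholders, and whether -U accepts these options depends on the driver build:

```shell
# Hypothetical example: steer TCP/IPv4 traffic with destination port 80
# to Rx queue 2, instead of relying on the automatic Fdir filter table:
ethtool -U eth0 flow-type tcp4 dst-port 80 action 2
# Show the rules currently programmed:
ethtool -u eth0
```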
> Also some other suggestions you may want to try:
>
> 1. If not already, try spreading the interrupts for each CPU using the
> set_irq_affinity script included with the driver instead of irqbalance.
I use my own script, which does the same:
77: 3448170536 0 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-0
78: 0 2048309948 0 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-1
79: 0 0 1548996028 0 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-2
80: 62 0 0 0 0 0 0 0 IR-PCI-MSI-edge mpt2sas0-msix0
81: 0 0 0 0 0 0 0 0 IR-PCI-MSI-edge ahci
82: 0 0 0 1408642754 0 0 0 0 IR-PCI-MSI-edge eth0-TxRx-3
83: 0 0 0 0 3530534336 0 0 0 IR-PCI-MSI-edge eth0-TxRx-4
84: 0 0 0 0 0 2062344051 0 0 IR-PCI-MSI-edge eth0-TxRx-5
85: 0 0 0 0 0 0 1532051797 0 IR-PCI-MSI-edge eth0-TxRx-6
86: 0 0 0 0 0 0 0 1365942581 IR-PCI-MSI-edge eth0-TxRx-7
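A sketch of what such pinning boils down to; the function name is hypothetical, and a real script must read the IRQ numbers from /proc/interrupts rather than hard-coding them:

```shell
# Bind the IRQ of queue eth0-TxRx-N to CPU N. The affinity value is a
# hex bitmask with bit N set, written to /proc/irq/<irq>/smp_affinity.
pin_queue() {
    cpu=$1; irq=$2
    mask=$(printf '%x' $((1 << cpu)))
    # Print the command instead of running it; on the live system,
    # execute the generated line as root.
    echo "echo $mask > /proc/irq/$irq/smp_affinity"
}
pin_queue 3 82   # queue 3 -> CPU 3, IRQ 82 (per /proc/interrupts above)
```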
>
> 2. Because flow control is not negotiated, you may want to check if your link
> partner supports it as well (usually you can tell if you check dmesg for the
> "Link Up" event - it will include information about whether flow control is
> enabled and in what direction).
>
# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on
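The negotiated result can also be read from the driver's link-up message. A small parsing sketch; the sample line follows the usual ixgbe wording, which may differ between driver versions, and on the live box the line comes from dmesg:

```shell
# Pull the negotiated flow-control direction out of the link-up line.
# Sample line; the real one comes from: dmesg | grep 'NIC Link is Up'
line='ixgbe 0000:01:00.0: eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX'
fc=${line##*Flow Control: }   # strip everything up to the last "Flow Control: "
echo "negotiated flow control: $fc"
```

"RX/TX" means pause frames are honored in both directions; "None" means ethtool -A took no effect at the link level.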
> 3. LLI - I'm not sure why you enable it, but this could be adding to your
> trouble on Rx. You may want to experiment with ethtool -C rx-usecs and allow
> for more ints/sec although this will affect your entire traffic, not just
> port 80.
>
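For reference, adjusting interrupt moderation as suggested would look like this; the 10 µs value is only an illustrative starting point:

```shell
# Lower rx-usecs = less coalescing, more interrupts/sec on receive:
ethtool -C eth0 rx-usecs 10
# Inspect the current coalescing settings:
ethtool -c eth0
```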
I just load the native Debian Wheezy module without any parameters:
# modinfo ixgbe
filename:
/lib/modules/3.2.0-4-amd64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 3.6.7-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, <[email protected]>
srcversion: ECD3B1926F04B11454F4AAD
alias: pci:v00008086d00001560sv*sd*bc*sc*i*
alias: pci:v00008086d0000154Asv*sd*bc*sc*i*
alias: pci:v00008086d00001557sv*sd*bc*sc*i*
alias: pci:v00008086d0000154Fsv*sd*bc*sc*i*
alias: pci:v00008086d0000154Dsv*sd*bc*sc*i*
alias: pci:v00008086d00001528sv*sd*bc*sc*i*
alias: pci:v00008086d000010F8sv*sd*bc*sc*i*
alias: pci:v00008086d0000151Csv*sd*bc*sc*i*
alias: pci:v00008086d00001529sv*sd*bc*sc*i*
alias: pci:v00008086d0000152Asv*sd*bc*sc*i*
alias: pci:v00008086d000010F9sv*sd*bc*sc*i*
alias: pci:v00008086d00001514sv*sd*bc*sc*i*
alias: pci:v00008086d00001507sv*sd*bc*sc*i*
alias: pci:v00008086d000010FBsv*sd*bc*sc*i*
alias: pci:v00008086d00001517sv*sd*bc*sc*i*
alias: pci:v00008086d000010FCsv*sd*bc*sc*i*
alias: pci:v00008086d000010F7sv*sd*bc*sc*i*
alias: pci:v00008086d00001508sv*sd*bc*sc*i*
alias: pci:v00008086d000010DBsv*sd*bc*sc*i*
alias: pci:v00008086d000010F4sv*sd*bc*sc*i*
alias: pci:v00008086d000010E1sv*sd*bc*sc*i*
alias: pci:v00008086d000010F1sv*sd*bc*sc*i*
alias: pci:v00008086d000010ECsv*sd*bc*sc*i*
alias: pci:v00008086d000010DDsv*sd*bc*sc*i*
alias: pci:v00008086d0000150Bsv*sd*bc*sc*i*
alias: pci:v00008086d000010C8sv*sd*bc*sc*i*
alias: pci:v00008086d000010C7sv*sd*bc*sc*i*
alias: pci:v00008086d000010C6sv*sd*bc*sc*i*
alias: pci:v00008086d000010B6sv*sd*bc*sc*i*
depends: mdio,dca
intree: Y
vermagic: 3.2.0-4-amd64 SMP mod_unload modversions
parm: max_vfs:Maximum number of virtual functions to allocate
per physical function (uint)
> 4. Try and disable LRO (you can tell by the hw_rsc counters) - ethtool -K
> ethX lro off. In addition, increasing the number of Rx descriptors may help
> (ethtool -G)
# ethtool -K eth0 lro off
# ethtool -G eth0 rx 4096
# ethtool -G eth0 tx 4096
>
> 5. Last but not least, because you are using an unsupported module - it may
> be a good idea to try with an SFP+ module that loads without the need for
> allow_unsupported_sfp. This will help you rule out incompatibilities with the
> SFP+ module.
I am using one of these modules: Intel SFP+ SR FTLX8571D3BCV-IT or
Intel SFP+ SR AFBR-703SDZ-IN2, and this option is now disabled.
> Performance is a tricky business and finding the parameters that work best
> for your environment may take some time. What I would suggest is start with a
> clean ixgbe driver and then use module parameters and/or ethtool to adjust
> settings one by one to see how your performance is affected by each change.
I've done that, but nothing changed. This server, with this NIC but with
another transceiver (SFP+ LR) and another switch, works fine.
Is it possible to determine which transceiver is plugged into the NIC
without unplugging it?
Now I want to determine what is at fault: the transceiver, the switch, or
my own clumsy hands.
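One way to identify the module in place, assuming the driver and ethtool version are new enough to expose the SFP+ EEPROM (wheezy's kernel may not support this for ixgbe), is:

```shell
# Dump the SFP+ module EEPROM: vendor name, part number, serial, etc.
ethtool -m eth0
```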
Statistics from HP 2910-al switch where the server is connected:
# show interfaces B2
Status and Counters - Port Counters for port B2
Name : host-13 [asavula 2014-05-06]
MAC Address : 78e3b5-2d2f0c
Link Status : Up
Totals (Since boot or last clear) :
Bytes Rx : 375,374,210 Bytes Tx : 2,313,982,624
Unicast Rx : 1,107,471,218 Unicast Tx : 3,260,764,315
Bcast/Mcast Rx : 103 Bcast/Mcast Tx : 205,893
Errors (Since boot or last clear) :
FCS Rx : 0 Drops Tx : 48,502
Alignment Rx : 0 Collisions Tx : 0
Runts Rx : 0 Late Colln Tx : 0
Giants Rx : 119 Excessive Colln : 0
Total Rx Errors : 119 Deferred Tx : 0
Others (Since boot or last clear) :
Discard Rx : 0 Out Queue Len : 0
Unknown Protos : 0
Rates (5 minute weighted average) :
Total Rx (Kbps) : 9,520,344 Total Tx (Kbps) : 341,360
Unicast Rx (Pkts/sec) : 785,759 Unicast Tx (Pkts/sec) : 409,250
B/Mcast Rx (Pkts/sec) : 0 B/Mcast Tx (Pkts/sec) : 0
Utilization Rx : 95.20 % Utilization Tx : 03.41 %
------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired