On 2017-10-17 at 16:08, Paweł Staszewski wrote:


On 2017-10-17 at 13:52, Paweł Staszewski wrote:


On 2017-10-17 at 13:05, Paweł Staszewski wrote:


On 2017-10-17 at 12:59, Paweł Staszewski wrote:


On 2017-10-17 at 12:51, Paweł Staszewski wrote:


On 2017-10-17 at 12:20, Paweł Staszewski wrote:


On 2017-10-17 at 11:48, Paweł Staszewski wrote:


On 2017-10-17 at 02:44, Paweł Staszewski wrote:


On 2017-10-17 at 01:56, Alexander Duyck wrote:
On Mon, Oct 16, 2017 at 4:34 PM, Paweł Staszewski <pstaszew...@itcare.pl> wrote:

On 2017-10-16 at 18:26, Paweł Staszewski wrote:


On 2017-10-16 at 13:20, Pavlos Parissis wrote:
On 15/10/2017 02:58 AM, Alexander Duyck wrote:
Hi Pawel,

To clarify, is it Dave Miller's tree or Linus's that you are talking about? If it is Dave's tree, how long ago did you pull it? I
think the fix was just pushed by Jeff Kirsher a few days ago.

The issue should be fixed in the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972

Do you know when it is going to be available in the net-next and linux-stable
repos?

Cheers,
Pavlos


I will run some tests tonight with the "net" git tree, where this patch is
included.
Starting from 0:00 CET
:)


Upgraded, and it looks like the problem is not solved by that patch.
Currently running the system with the kernel from
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/

Still, about 0.5GB of memory is leaking somewhere.

I can also confirm that the latest kernel where memory is not leaking (using
the i40e driver with Intel 710 cards) is 4.11.12.
With kernel 4.11.12 there is no change in memory usage after an hour.

I also checked that with ixgbe instead of i40e, on the same net.git kernel, there is no memleak: after an hour the memory usage is the same. So this is 100% an i40e
driver problem.
So how long was the run to get the .5GB of memory leaking?
1 hour


Also is there any chance of you being able to bisect to determine
where the memory leak was introduced since as you pointed out it
didn't exist in 4.11.12 so odds are it was introduced somewhere
between 4.11 and the latest kernel release.
That may be hard, because for now I need to go back to 4.11.12: this is a production host/router. I will try to find a free/test router for tests/bisects with the i40e driver (Intel 710 cards).


Thanks.

- Alex



I also forgot to add the errors that i40e logs when the driver initializes:
[   15.760569] i40e 0000:02:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.365587] i40e 0000:03:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.367686] i40e 0000:02:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.368816] i40e 0000:03:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.369877] i40e 0000:03:00.2: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.370941] i40e 0000:02:00.3: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.372005] i40e 0000:02:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[   16.373029] i40e 0000:03:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on

Some parameters that are set for these NICs:
        ip link set up dev $i
        ethtool -A $i autoneg off rx off tx off
        ethtool -G $i rx 1024 tx 2048
        ip link set $i txqueuelen 1000
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
        ethtool -L $i combined 6
        #ethtool -N $i rx-flow-hash udp4 sdfn
        ethtool -K $i ntuple on
        ethtool -K $i gro off
        ethtool -K $i tso off




Also, after turning TSO/GRO on, memory usage changes and the leak gets faster. The image below shows memory usage before the change, with TSO/GRO off, and after enabling TSO/GRO:

https://ibb.co/dTqBY6


Thanks
Pawel



With settings like this:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
        ethtool -K $i gro on
        ethtool -K $i tso on

        done

The server is leaking about 4-6MB every 10 seconds
MEMLEAK:
5  MB/10sec
6  MB/10sec
4  MB/10sec
4  MB/10sec
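
For anyone who wants to reproduce numbers in this shape: the thread does not say how the MEMLEAK samples were collected, so the following is only a minimal sketch of one way to do it. It assumes that "used" memory can be approximated as MemTotal minus MemAvailable from /proc/meminfo; the function names used_mb and watch_leak are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: used memory in MB, approximated as
# MemTotal - MemAvailable from a meminfo-style file
# (defaults to /proc/meminfo; both fields are reported in kB).
used_mb() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { printf "%d\n", (total - avail) / 1024 }' "${1:-/proc/meminfo}"
}

# Print the change in used memory once per interval, in the same
# "N  MB/10sec" shape as the samples quoted in this thread.
# $1 = interval in seconds (default 10)
# $2 = number of samples (default 6)
# $3 = meminfo file (default /proc/meminfo)
watch_leak() {
    interval=${1:-10}
    count=${2:-6}
    file=${3:-/proc/meminfo}
    prev=$(used_mb "$file")
    while [ "$count" -gt 0 ]; do
        sleep "$interval"
        cur=$(used_mb "$file")
        echo "$((cur - prev))  MB/${interval}sec"
        prev=$cur
        count=$((count - 1))
    done
}
```

Calling watch_leak with no arguments takes six samples at 10-second intervals. A single negative or zero sample is not meaningful on its own, since caches and reclaim also move these counters; only a steady positive trend suggests a leak.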


Other settings, with TSO/GRO off:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
        ethtool -K $i gro off
        ethtool -K $i tso off

        done

Same leak, about 5MB per 10 seconds
MEMLEAK:
5  MB/10sec
5  MB/10sec
5  MB/10sec
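
The per-interface loops in this thread differ only in four values (rx-usecs, tx-usecs, gro, tso), so they can be folded into one helper. This is a sketch, not something taken from the original messages: the function name coalesce_cmds is hypothetical, and it only prints the ethtool commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Interface names as listed in this thread.
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'

# Hypothetical helper: print the ethtool commands for one test configuration.
# $1 = rx-usecs, $2 = tx-usecs, $3 = gro (on|off), $4 = tso (on|off)
coalesce_cmds() {
    for i in $ifc; do
        echo "ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs $1 tx-usecs $2"
        echo "ethtool -K $i gro $3"
        echo "ethtool -K $i tso $4"
    done
}
```

For example, `coalesce_cmds 512 128 off off` prints the commands for the TSO/GRO-off configuration above; piping that output to `sh` as root would apply them.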


Other settings, with rx-usecs changed from 512 to 1024:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 1024 tx-usecs 128
        ethtool -K $i gro off
        ethtool -K $i tso off

        done

MEMLEAK:
4  MB/10sec
3  MB/10sec
4  MB/10sec
4  MB/10sec


So the memleak has something to do with rx-usecs (fewer interrupts, but higher latency for traffic).


But enabling TSO/GRO also makes the leak about 1MB bigger per 10 seconds.



So far the best config is:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 64 tx-usecs 512
        ethtool -K $i gro off
        ethtool -K $i tso on

        done

MEMLEAK - about 2MB/10secs
2  MB/10sec
2  MB/10sec
2  MB/10sec


With rx-usecs set to 256 (about 7-9MB/10secs memleak):
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 256 tx-usecs 512
        ethtool -K $i gro off
        ethtool -K $i tso on

        done

MEMLEAK:
7  MB/10sec
7  MB/10sec
8  MB/10sec
9  MB/10sec



And even less memleak with rx-usecs set to 32:
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 32 tx-usecs 512
        ethtool -K $i gro off
        ethtool -K $i tso on

        done


MEMLEAK - about 0-2MB every 10 seconds
0  MB/10sec
1  MB/10sec
0  MB/10sec
2  MB/10sec
1  MB/10sec





So the best settings for now, to leak as little as possible (rx-usecs set to 16):
ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 16 tx-usecs 768
        ethtool -K $i gro on
        ethtool -K $i tso on

        done


MEMLEAK: (0-1MB/10seconds)
0  MB/10sec
0  MB/10sec
0  MB/10sec
1  MB/10sec
1  MB/10sec
-1  MB/10sec
1  MB/10sec
1  MB/10sec
0  MB/10sec

(there are some memory recycles - so this is good :) )



Compared to (rx-usecs 512):

ifc='enp2s0f0 enp2s0f1 enp2s0f2 enp2s0f3 enp3s0f0 enp3s0f1 enp3s0f2 enp3s0f3'
for i in $ifc
        do
        ethtool -C $i adaptive-rx off adaptive-tx off rx-usecs 512 tx-usecs 128
        ethtool -K $i gro on
        ethtool -K $i tso on

        done

The server is leaking about 4-6MB every 10 seconds
MEMLEAK:
5  MB/10sec
6  MB/10sec
4  MB/10sec
4  MB/10sec



And a graph covering all the rx-usecs changes made over time:
https://ibb.co/nrRfbR
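
To compare the runs side by side, the MEMLEAK samples quoted in this thread can be averaged per configuration. A small sketch follows; the sample values are copied from the listings above, while the helper name mean is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mean of the arguments, printed with two decimals.
mean() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ s += $1 } END { if (NR) printf "%.2f\n", s / NR }'
}

# Samples copied from the MEMLEAK listings in this thread (MB per 10 seconds).
echo "rx-usecs 1024, tx-usecs 128, gro off, tso off: $(mean 4 3 4 4)"
echo "rx-usecs 512,  tx-usecs 128, gro on,  tso on:  $(mean 5 6 4 4)"
echo "rx-usecs 512,  tx-usecs 128, gro off, tso off: $(mean 5 5 5)"
echo "rx-usecs 256,  tx-usecs 512, gro off, tso on:  $(mean 7 7 8 9)"
echo "rx-usecs 64,   tx-usecs 512, gro off, tso on:  $(mean 2 2 2)"
echo "rx-usecs 32,   tx-usecs 512, gro off, tso on:  $(mean 0 1 0 2 1)"
echo "rx-usecs 16,   tx-usecs 768, gro on,  tso on:  $(mean 0 0 0 1 1 -1 1 1 0)"
```

Note that tx-usecs and the offload settings also change between runs, and each run has only a handful of samples, so these averages show a rough trend rather than isolating the effect of rx-usecs.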





I can't eliminate the problem with settings. The size of the memleak depends on rx-usecs: it is less visible with rx-usecs set to low values, but then there is 100% CPU load, so I can't keep rx-usecs set to 16.

I also can't find another host with the same cards, or one using the i40e driver, for bisecting tests.
So I will just switch to Mellanox :)
