Hi Guys,

I sent this to fcoe-devel, but either it is holiday season or the
mailing list is abandoned, as there is very little FCoE traffic
on it.

On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
Date: Mon, 23 Jul 2018 14:16:31 +0200
From: ard <a...@kwaak.net>
Subject: FCOE vn2vn memory leaks in 4.14
To: fcoe-de...@open-fcoe.org

Hi guys,

After an upgrade of one of my systems from 3.10 to 4.14.55, I
noticed a serious memory leak.
As this kernel is not 100% vanilla, I started the bug report
here:
https://github.com/hardkernel/linux/issues/360

The essence is this:
I have an FCoE interface assigned to a vlan on a nic.
These were remnants of a test I did. The FCoE was still
configured, but no targets were exported to that endpoint.
So it would see and join multicast announcements of 2 other
systems, but do nothing with them.
This was good enough to waste about 600MB of memory in 2 or 3
days.
Some things have changed, maybe the number of announcements (due
to the heat I turn off systems), or really something in the
kernel. But after 1 week I really have to proactively reboot the
system in order to avoid OOMs.
I've now disabled the FCoE vlan on the port of that system,
so it won't get any broadcasts.
No memory leaks so far.
The kmemleak output is in that bug report; I won't mail it, since
it's 2.5MB.
The gist seems to be:
  backtrace:
    [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
    [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
    [<bf33a400>] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
    [<c013dbb0>] process_one_work+0x138/0x4bc

These seem to stand out:
root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt; grep -c fcoe_fip_vlan_recv kmemleak.txt
1090
898
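
For anyone who wants to break the dump down further than grep -c does,
here is a minimal sketch. It assumes the usual kmemleak text layout (an
"unreferenced object 0x... (size N):" header followed by backtrace
lines) and attributes each leaked object to the first backtrace frame
whose symbol starts with "fcoe_". The names summarize, HEADER, FRAME and
the prefix parameter are made up for this illustration; this is not an
existing tool.

```python
import re
from collections import Counter

# Assumed kmemleak record header: "unreferenced object 0x... (size N):"
HEADER = re.compile(r"unreferenced object 0x[0-9a-f]+ \(size (\d+)\)")
# Backtrace frame: "[<address>] symbol+0xoff/0xlen [module]"
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x")


def summarize(lines, prefix="fcoe_"):
    """Return (objects, leaked_bytes) Counters, keyed by the first
    backtrace frame of each record whose symbol starts with prefix."""
    objects, leaked = Counter(), Counter()
    size = None  # size of the current record; None once it is attributed
    for line in lines:
        header = HEADER.search(line)
        if header:
            size = int(header.group(1))
            continue
        frame = FRAME.search(line)
        if frame and size is not None and frame.group(1).startswith(prefix):
            symbol = frame.group(1)
            objects[symbol] += 1
            leaked[symbol] += size
            size = None  # count each leaked object only once
    return objects, leaked


# Hypothetical usage against the dump from the bug report:
#   with open("kmemleak.txt") as dump:
#       objects, leaked = summarize(dump)
#   for symbol, count in objects.most_common():
#       print(symbol, count, leaked[symbol])
```

The totals can differ from the grep -c numbers above: grep counts every
line that mentions a symbol, while this sketch counts each leaked object
once and also sums the reported object sizes per symbol.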

So there are 2 leaks: network skb leaks, I presume, and fcoe structure leaks.
Except for one system that I turn off and on once a day, all other systems are
running stable (older kernels though).

The system I turn off and on again also has some vn2vn problems, and that's also
a 4.14 kernel.
(steam machine with steamos kernel, fcoe not actively used, but with a bcache
on one of the targets, it probably auto registers a dependency)
This is outside the scope of this ticket though.

The system with the memory leak is a system intended to run 24/7.

If anyone can point me to the right place, or help me...

Regards,
Ard van Breemen

-- 
.signature not found
