Re: [PATCH 0/3] scsi: fcoe: memleak fixes

2018-08-15 Thread ard
Hi,

On Fri, Aug 10, 2018 at 10:34:49AM +0200, Johannes Thumshirn wrote:
> Hannes can you have a look at it?

As a side note, I am busy with other things for the next 3 weeks, but
after that I will be able to add some printk's and run it again.
Let me know if you want me to put probes somewhere to get a better
look at the states :-).

Regards,
Ard

-- 
.signature not found


Re: [PATCH 0/3] scsi: fcoe: memleak fixes

2018-08-09 Thread ard
Hi,

On Thu, Aug 09, 2018 at 12:01:30PM +0200, ard wrote:
> This to determine if we have a single regression in just the
> login handling or both.
As a matter of fact, I think this will not work on vn2vn because of:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/libfc?id=386b97b43c0c9e0d878eec7ea1db16af22b036ae
scsi: libfc: Rework PRLI handling

That commit clearly rejects the PRLI and eventually sends a LOGO when
no store is associated.
But this is how the SAN setup goes: it first attaches the network driver
to the fcoe layer, so at that point it is storeless. Only then does it
attach the stores to export. But by that moment all possible initiators
have already got a LOGO.


But looking back at the logs, it is not as bad as I said.
It's not erroring ad infinitum, it's:

root@antec:~/logs# grep "vn_add rport 00c76e\|kmemleak" 2018-08-08-kern.log|cut -d\  -f9-|uniq -c
  1   2.577320] kmemleak: Kernel memory leak detector initialized
  1   2.577350] kmemleak: Automatic memory scanning thread started
  1 136.452894] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  1 host10: fip: vn_add rport 00c76e new state 0
  1 host10: fip: vn_add rport 00c76e old state 0
  8 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  2 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 52 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 50 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 47 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 55 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 52 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 46 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 50 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 46 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
 36 host10: fip: vn_add rport 00c76e old state 4
  1 kmemleak: 50 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  1 kmemleak: 36 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  1 kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  1 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

From the last few lines we can clearly see that each run of
"vn_add rport ... old state 4" lines coincides with a kmemleak report
10 minutes later, especially the runs of 50 and 36. The exact dump for
that follows.
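
For anyone who wants to repeat this on their own logs, the grep | cut | uniq -c
pipeline above can be approximated in Python. This is only a sketch: the regex
assumes the syslog layout seen in the excerpts in this thread
("... kernel: [ timestamp] message"), it always strips the timestamp (the cut
above leaves it in for small timestamps), and the name collapse_runs and its
parameters are made up for illustration.

```python
import re
from itertools import groupby

# Assumed line layout (taken from the excerpts in this thread):
#   "Aug  8 10:53:15 localhost kernel: [   14.929451] host10: fip: ..."
# Everything up to and including the bracketed timestamp is dropped.
PREFIX = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s*")


def collapse_runs(lines, needles=("vn_add rport 00c76e", "kmemleak")):
    """Return (count, message) pairs for runs of identical messages.

    Hypothetical helper: filters like grep, strips the prefix (roughly
    what cut does) and counts consecutive repeats like uniq -c.
    """
    selected = (
        PREFIX.sub("", line.rstrip("\n"))
        for line in lines
        if any(needle in line for needle in needles)
    )
    return [(len(list(run)), message) for message, run in groupby(selected)]
```

It can be fed an open log file directly, e.g. collapse_runs(open("2018-08-08-kern.log")).
Each pair is the repeat count and the message with the timestamp stripped.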

Now, getting back to what the difference is:
This is a working login with 4.9:

Aug  8 10:53:15 localhost kernel: [   14.929451] host10: fip: vn_add rport 0004e0 old state 0
Aug  8 10:53:21 localhost kernel: [   23.143274] host10: fip: beacon from rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143275] host10: fip: beacon expired for rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143276] host10: rport 0004e0: Login to port
Aug  8 10:53:21 localhost kernel: [   23.143277] host10: rport 0004e0: Entered FLOGI state from Init state
Aug  8 10:53:21 localhost kernel: [   23.143294] host10: fip: els_send op 7 d_id 4e0
Aug  8 10:53:21 localhost kernel: [   23.143301] host10: fip: beacon from rport 4e0
Aug  8 10:53:21 localhost kernel: [   23.143920] host10: rport 0004e0: Received a FLOGI accept
Aug  8 10:53:21 localhost kernel: [   23.143921] host10: rport 0004e0: Port entered PLOGI state from FLOGI state
Aug  8 10:53:21 localhost kernel: [   23.143956] host10: rport 0004e0: Received a PLOGI accept
Aug  8 10:53:21 localhost kernel: [   23.143958] host10: rport 0004e0: Port entered PRLI state from PLOGI state
Aug  8 10:53:21 localhost kernel: [   23.143982] host10: rport 0004e0: Received a PRLI accept
Aug  8 10:53:21 localhost kernel: [   23.143984] host10: rport 0004e0: PRLI spp_flags = 0x0 spp_type 0x20
Aug  8 10:53:21 localhost kernel: [   23.143985] host10: rport 0004e0: Error -6 in state PRLI, retrying
Aug  8 10:53:21 localhost kernel: [   23.144434] host10: rport 0004e0: Received PRLI request while in state PRLI
Aug  8 10:53:21 localhost kernel: [   23.15] host10: rport 0004e0: PRLI rspp type 8 active 1 passive 0
Aug  8 10:53:21 localhost kernel: [   23.749492] host10: rport 0004e0: Received RTV request
Aug  8 10:53:23 localhost kernel: [   25.204559] host10: rport 0004e0: Port timeout, state PRLI
Aug  8 10:
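
To compare a working login like this one with a failing one, it may help to
reduce each log to its rport state transitions. A minimal sketch, assuming only
the two message wordings seen above ("Entered X state from Y state" and
"Port entered X state from Y state"); the name rport_transitions is
hypothetical and not part of libfc.

```python
import re

# Matches both wordings that appear in the log excerpt above:
#   "rport 0004e0: Entered FLOGI state from Init state"
#   "rport 0004e0: Port entered PLOGI state from FLOGI state"
TRANSITION = re.compile(
    r"rport (?P<port>[0-9a-f]+): (?:Port e|E)ntered "
    r"(?P<new>\w+) state from (?P<old>\w+) state"
)


def rport_transitions(lines):
    """Return (port, old_state, new_state) tuples in log order."""
    matches = (TRANSITION.search(line) for line in lines)
    return [(m["port"], m["old"], m["new"]) for m in matches if m]
```

Running it over the logs of both kernels and diffing the two lists would show
at which state the newer kernel stops following the same path.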

Re: FCOE vn2vn memory leaks in 4.14

2018-07-31 Thread ard
Hi,

On Tue, Jul 31, 2018 at 10:38:06AM +0200, Johannes Thumshirn wrote:
> So I've fixed one use-after-free and one memory leak, but the one you
> reported is still on the TODO list.

Wow, thanks...

> Long story short, I can reproduce it here and I'm working on it.
> 
> Thanks for your patience,
Thank you for being so pro-active, seriously. I wanted to look at
it some more, but a heatwave, no airco and a festival got in between
:-(.

Anyway, I got a PC and an odroid Xu4 (ARM) for testing now.

Regards,
Ard van Breemen
-- 
.signature not found


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi,

On Thu, Jul 26, 2018 at 05:05:37PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > The system itself is an exynos 5422 arm. It worked perfectly fine
> > with 3.10 as an Initiator, now it leaks memory the moment I
> > enable the FCoE vlan on the port.
> 
> So I had a look through the commits between v3.10 and v4.14 and this
> one sticks out:
> ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")
> 
> While I think it is necessary to release a skb with kfree_skb() it
> still might be worth trying to revert it for a test run.

So I recompiled for the desktop (i920),
and fortunately after 2 hours it was already collecting memory
leaks.
For me this clears up at least a few unknowns:
1) USB vs PCI NIC doesn't matter.
(I am too lazy to send in:
https://github.com/ardje/linux/commit/93e0b1fec38859ff0fb6e24eab10778f5b3be289
)
2) ARM vs x86 doesn't matter.

Anyway: here are the kmemleak and the dmesg after almost 2 hours:
https://github.com/hardkernel/linux/files/2233646/kmemleak-antec.txt
https://github.com/hardkernel/linux/files/2233648/dmesg.txt

Also the kmemleak.txt of the x86 seems to be more verbose:

unreferenced object 0x880196472400 (size 512):
  comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
  hex dump (first 32 bytes):
b8 d7 7c 8d 01 88 ff ff 00 00 00 00 00 00 00 00  ..|.
05 00 00 00 08 00 00 00 52 05 30 06 1e 00 00 10  R.0.
  backtrace:
[] fc_rport_create+0x42/0x190 [libfc]
[] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
[] fcoe_ctlr_vn_recv+0x496/0xad0 [libfcoe]
[] fcoe_ctlr_recv_work+0x700/0xfb0 [libfcoe]
[] process_one_work+0x142/0x370
[] worker_thread+0x62/0x3d0
[] kthread+0x114/0x150
[] ret_from_fork+0x35/0x40
[] 0x

vs:
unreferenced object 0xe07d9b00 (size 256):
  comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
  hex dump (first 32 bytes):
70 64 49 ec 00 00 00 00 07 00 00 00 08 00 00 00  pdI.
88 40 7f 1d 24 00 00 10 88 40 7f 1d 24 00 00 20  .@..$@..$.. 
  backtrace:
[] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
[] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
[] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
[] process_one_work+0x138/0x4bc
[] worker_thread+0x34/0x4f4
[] kthread+0x12c/0x15c
[] ret_from_fork+0x14/0x2c
[] 0x

Now the x86 dump leads me to:
http://lists.open-fcoe.org/pipermail/fcoe-devel/2013-May/012014.html

I had actually already got there from my ARM dump, but the two dumps differ
in their backtraces.
Anyway:
root@antec:~# grep -c fc_rport_create kmemleak.txt
44
So 44 * 512 bytes leaked in that path. One extra thing: the object was leaked
in libfc and not libfcoe.
Or, just like in that bug report, we are leaking fc_rport_priv.
But one thing I don't understand (yet) is why fc_rport_create happens while
we already have a port.
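
The grep -c count can be extended into a per-function tally of leaked bytes.
The sketch below assumes the kmemleak text layout shown in the two dumps above
(an "unreferenced object ... (size N):" header followed by a backtrace) and
attributes each object to the first frame listed in its backtrace; tally_leaks
is a made-up name. For the numbers above, 44 objects of 512 bytes come to
22528 bytes.

```python
import re
from collections import Counter

# Header of one kmemleak record, e.g.
#   "unreferenced object 0x880196472400 (size 512):"
HEADER = re.compile(r"^\s*unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
# A backtrace frame, e.g. "[] fc_rport_create+0x42/0x190 [libfc]".
# Compiler suffixes such as ".isra.17" are not part of the captured name.
FRAME = re.compile(r"^\s*\[[^\]]*\]\s+([A-Za-z_]\w*)")


def tally_leaks(report):
    """Return (counts, sizes): objects and bytes per first backtrace frame."""
    counts, sizes = Counter(), Counter()
    size = None  # size of the record whose first frame we still expect
    for line in report.splitlines():
        header = HEADER.match(line)
        if header:
            size = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and size is not None:
            counts[frame.group(1)] += 1
            sizes[frame.group(1)] += size
            size = None  # ignore the remaining frames of this record
    return counts, sizes
```

Calling tally_leaks(open("kmemleak.txt").read()) and printing
sizes.most_common() lists the allocation sites ordered by leaked bytes.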

Anyway, I will continue bug hunting. It's night, and the temperature has
dropped to 29.8.

Regards,
Ard

-- 
.signature not found


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi,

On Thu, Jul 26, 2018 at 03:36:22PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 03:02:14PM +0200, ard wrote:
> > Hi Guys,
> > 
> > I sent this to fcoe-devel but it might be holiday season, or the
> > mailing list is abandoned, as the volume of emails concerning fcoe
> > is pretty low.
> 
> Yes, the list is defunct as I didn't get admin privileges passed by
> the old Maintainer when I took over.

That explains :-).

> Anyways, can you please enable the kernel memory leak detector [1] and
> possibly even try a more up to date (like v4.18-rc6) kernel?
> 
> [1] https://www.kernel.org/doc/html/v4.17/dev-tools/kmemleak.html

The up to date kernel would be a problem.
The kmemleak log is here:
https://github.com/hardkernel/linux/files/2218589/kmemleak.txt
Sorry that github doesn't do a preview.

The system itself is an exynos 5422 arm. It worked perfectly fine
with 3.10 as an Initiator, now it leaks memory the moment I
enable the FCoE vlan on the port.


I also have an ARM v5 running 3.7.1 (intel ss4000e) that works
fine as a stable target.

The ARM as initiator was able to crash my D525 target, which was then
running 4.0, just by mounting btrfs. The target now runs 4.3
and has been a stable target ever since.


The main issue seems to be in fcoe_ctlr.c, and that has not
really been touched except by a broomstick for generic kernel
maintenance.

What I can do is compile a 4.14 and a 4.18 kernel for my main
initiator, a desktop that has an ssd used as bcache on FCoE
drives. That desktop is turned off however due to a heatwave.
The last known working kernel was 3.18 on that system. I will
compile a new one.

> Thanks a lot,
>Johannes

Well, thank you for maintaining a life saver.

> 
> > 
> > On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
> > Date: Mon, 23 Jul 2018 14:16:31 +0200
> > From: ard 
> > Subject: FCOE vn2vn memory leaks in 4.14
> > To: fcoe-de...@open-fcoe.org
> > 
> > Hi guys,
> > 
> > After an upgrade of one of my systems from 3.10 to 4.14.55, I
> > noticed a serious memory leak.
> > As this kernel is not 100% vanilla, I started the bug report
> > here:
> > https://github.com/hardkernel/linux/issues/360
> > 
> > The essence is this:
> > I have an FCoE interface assigned to a vlan on a nic.
> > These were remnants of a test I did. The FCoE was still
> > configured, but no targets were exported to that endpoint.
> > So it would see and join multicast announcements of 2 other
> > systems, but do nothing with it.
> > This was good enough to waste about 600MB of memory in 2 or 3
> > days.
> > Some things have changed, maybe the amount of announcements (due
> > to the heat I turn off systems), or really something in the
> > kernel. But after 1 week I really have to pro-actively reboot the
> > system in order to avoid OOMs.
> > I've now disabled the FCoE vlan on the port of that system,
> > so it won't get any broadcasts.
> > No memory leaks so far.
> > The kmemleak is in that bug report, I won't mail it, since it's
> > 2.5MB.
> > The gist seems to be:
> >   backtrace:
> > [] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
> > [] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
> > [] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
> > [] process_one_work+0x138/0x4bc
> > 
> > These seem to stand out:
> > root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c fcoe_fip_vlan_recv kmemleak.txt
> > 1090
> > 898
> > 
> > So there are 2 leaks: network skb leaks I presume and fcoe structure leaks.
> > Except for one system that I turn off and on once a day, all other systems are
> > running stable (older kernel though).
> > 
> > The system I turn off and on again also has some vn2vn problems and that's also
> > a 4.14 kernel.
> > (steam machine with steamos kernel, fcoe not actively used, but with a bcache
> > on one of the targets, it probably auto registers a dependency)
> > This is outside the scope of this ticket though.
> > 
> > The system with the memory leak is a system intended to run 24/7.
> > 
> > If anyone can point me to the right place, or help me...
> > 
> > Regards,
> > Ard van Breemen
> > 
> > -- 
> > .signature not found
> 
> -- 
> Johannes Thumshirn  Storage
> jthumsh...@suse.de  +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
> 

-- 
.signature not found


FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi Guys,

I sent this to fcoe-devel but it might be holiday season, or the
mailing list is abandoned, as the volume of emails concerning fcoe
is pretty low.

On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
Date: Mon, 23 Jul 2018 14:16:31 +0200
From: ard 
Subject: FCOE vn2vn memory leaks in 4.14
To: fcoe-de...@open-fcoe.org

Hi guys,

After an upgrade of one of my systems from 3.10 to 4.14.55, I
noticed a serious memory leak.
As this kernel is not 100% vanilla, I started the bug report
here:
https://github.com/hardkernel/linux/issues/360

The essence is this:
I have an FCoE interface assigned to a vlan on a nic.
These were remnants of a test I did. The FCoE was still
configured, but no targets were exported to that endpoint.
So it would see and join multicast announcements of 2 other
systems, but do nothing with it.
This was good enough to waste about 600MB of memory in 2 or 3
days.
Some things have changed, maybe the amount of announcements (due
to the heat I turn off systems), or really something in the
kernel. But after 1 week I really have to pro-actively reboot the
system in order to avoid OOMs.
I've now disabled the FCoE vlan on the port of that system,
so it won't get any broadcasts.
No memory leaks so far.
The kmemleak is in that bug report, I won't mail it, since it's
2.5MB.
The gist seems to be:
  backtrace:
[] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
[] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
[] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
[] process_one_work+0x138/0x4bc

These seem to stand out:
root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c fcoe_fip_vlan_recv kmemleak.txt
1090
898

So there are 2 leaks: network skb leaks I presume and fcoe structure leaks.
Except for one system that I turn off and on once a day, all other systems are
running stable (older kernel though).

The system I turn off and on again also has some vn2vn problems and that's also
a 4.14 kernel.
(steam machine with steamos kernel, fcoe not actively used, but with a bcache
on one of the targets, it probably auto registers a dependency)
This is outside the scope of this ticket though.

The system with the memory leak is a system intended to run 24/7.

If anyone can point me to the right place, or help me...

Regards,
Ard van Breemen

-- 
.signature not found


Re: [PATCH v2 2/8] crypto: scompress - use sgl_alloc() and sgl_free()

2017-11-01 Thread Ard Biesheuvel
On 1 November 2017 at 15:45, Bart Van Assche <bart.vanass...@wdc.com> wrote:
> On Wed, 2017-11-01 at 15:17 +, Ard Biesheuvel wrote:
>> On 1 November 2017 at 14:50, Bart Van Assche <bart.vanass...@wdc.com> wrote:
>> > On Mon, 2017-10-16 at 15:49 -0700, Bart Van Assche wrote:
>> > > Use the sgl_alloc() and sgl_free() functions instead of open coding
>> > > these functions.
>> > >
>> > > Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
>> > > Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> > > Cc: Herbert Xu <herb...@gondor.apana.org.au>
>> >
>> > Ard and/or Herbert, can you please have a look at this patch and let us know
>> > whether or not it looks fine to you?
>>
>> The patch itself does not look unreasonable, but I can't find
>> sgl_alloc() anywhere in the source tree. Given that you have cc'ed me
>> on this patch only, I can only assume that you are adding this as part
>> of the series, but without any context, I can't really review this,
>> sorry.
>
> Hello Ard,
>
> Do you expect to be Cc-ed personally or is Cc-ing the linux-crypto mailing
> list sufficient? The linux-crypto mailing list was Cc-ed for the entire patch
> series as one can see here:
> https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg28485.html.
>

I guess people's opinions may differ regarding what they want to be
cc'ed on, but in general, you should at least cc everyone on the cover
letter if you cc them on individual patches, and in my case, I'd
rather have the whole series even if only a single patch is relevant
to me.


Re: [PATCH v2 2/8] crypto: scompress - use sgl_alloc() and sgl_free()

2017-11-01 Thread Ard Biesheuvel
On 1 November 2017 at 14:50, Bart Van Assche <bart.vanass...@wdc.com> wrote:
> On Mon, 2017-10-16 at 15:49 -0700, Bart Van Assche wrote:
>> Use the sgl_alloc() and sgl_free() functions instead of open coding
>> these functions.
>>
>> Signed-off-by: Bart Van Assche <bart.vanass...@wdc.com>
>> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Cc: Herbert Xu <herb...@gondor.apana.org.au>
>
> Ard and/or Herbert, can you please have a look at this patch and let us know
> whether or not it looks fine to you?
>

The patch itself does not look unreasonable, but I can't find
sgl_alloc() anywhere in the source tree. Given that you have cc'ed me
on this patch only, I can only assume that you are adding this as part
of the series, but without any context, I can't really review this,
sorry.