Re: [PATCH] mpt3sas: Swap I/O memory read value back to cpu endianness

2018-07-26 Thread Sreekanth Reddy
On Wed, Jul 25, 2018 at 8:32 PM, Andy Shevchenko
 wrote:
> On Wed, Jul 25, 2018 at 12:42 PM, Sreekanth Reddy
>  wrote:
>> Swap the I/O memory read value back to cpu endianness before storing it
>> in a data structures which are defined in the MPI headers where
>> u8 components are not defined in the endianness order.
>>
>> In this area from day one mpt3sas driver is using le32_to_cpu() &
>> cpu_to_le32() APIs. But in the patch cf6bf9710c
>> (mpt3sas: Bug fix for big endian systems) we have removed these APIs
>> before reading I/O memory which we should haven't done it. So
>> in this patch I am correcting it by adding these APIs back
>> before accessing I/O memory.
>
>> -   __u64 data_out = b;
>> +   __u64 data_out = cpu_to_le64(b);
>>
>> spin_lock_irqsave(writeq_lock, flags);
>> writel((u32)(data_out), addr);
>
> Wouldn't be the same as
>
> __raw_writel(data_out >> 0, addr);
> __raw_writel(data_out >> 32, addr + 4);
> mmiowb();
>
> ?

Yes, I can replace the writel() APIs with __raw_writel() with mmiowb()
memory barrier. Hoping this doesn't create any other side effects.

I will post new patch with this change tomorrow after doing some
testing in this area.

>
>>  _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
>>  {
>> -   writeq(b, addr);
>> +   writeq(cpu_to_le64(b), addr);
>
> Similar here
> __raw_writeq(b, addr);
> mmiowb();
>
>
>> -   writel((u32)(request[i]), &ioc->chip->Doorbell);
>> +   writel(cpu_to_le32(request[i]), &ioc->chip->Doorbell);
>
> This kind of endianess play (including above) should make sparse unhappy.
>
> Did you run it with
>
> C=1 CF=-D__CHECK_ENDIAN__
>
> ?

Yes I run it on 4.18 kernel and I don't see any error or warning for
these lines.

Thanks,
Sreekanth

>
> --
> With Best Regards,
> Andy Shevchenko


Re: [PATCH] mpt3sas: correct reset of smid while clearing scsi tracker

2018-07-26 Thread Sreekanth Reddy
On Tue, Jul 24, 2018 at 10:37 AM, Sreekanth Reddy
 wrote:
> Any update on this patch!.

Just Ping once again. Any update on this patch.

> Thanks,
> Sreekanth
>
> On Fri, Jul 20, 2018 at 5:56 PM, Sreekanth Reddy
>  wrote:
>> In mpt3sas_base_clear_st() function smid value is reseted in wrong line,
>> i.e. driver should reset smid value to zero after decrementing chain_offset
>> counter in chain_lookup table but in current code, driver is resetting smid
>> value before decrementing the chain_offset counter. which we are correcting
>> with this patch.
>>
>> v1 changelog:
>> Added memory barriers before & after atomic_set() API.
>>
>> Signed-off-by: Sreekanth Reddy 
>> ---
>>  drivers/scsi/mpt3sas/mpt3sas_base.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
>> b/drivers/scsi/mpt3sas/mpt3sas_base.c
>> index 902610d..94359d8 100644
>> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
>> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
>> @@ -1702,6 +1702,8 @@ static int mpt3sas_remove_dead_ioc_func(void *arg)
>> return NULL;
>>
>> chain_req = &ioc->chain_lookup[smid - 
>> 1].chains_per_smid[chain_offset];
>> +   /* Adding memory barrier before atomic operation. */
>> +   smp_mb__before_atomic();
>> atomic_inc(&ioc->chain_lookup[smid - 1].chain_offset);
>> return chain_req;
>>  }
>> @@ -3283,8 +3285,12 @@ void mpt3sas_base_clear_st(struct MPT3SAS_ADAPTER 
>> *ioc,
>> return;
>> st->cb_idx = 0xFF;
>> st->direct_io = 0;
>> -   st->smid = 0;
>> +   /* Adding memory barrier before atomic operation. */
>> +   smp_mb__before_atomic();
>> atomic_set(&ioc->chain_lookup[st->smid - 1].chain_offset, 0);
>> +   /* Adding memory barrier after atomic operation. */
>> +   smp_mb__after_atomic();
>> +   st->smid = 0;
>>  }
>>
>>  /**
>> --
>> 1.8.3.1
>>


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Johannes Thumshirn
>From my limited insight into this:

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Jack Wang
Bart Van Assche  于2018年7月25日周三 下午7:39写道:
>
> This patch avoids that self-removal triggers the following deadlock:
>
> ==
> WARNING: possible circular locking dependency detected
> 4.18.0-rc2-dbg+ #5 Not tainted
> --
> modprobe/6539 is trying to acquire lock:
> 8323c4cd (kn->count#202){}, at: kernfs_remove_by_name_ns+0x45/0x90
>
> but task is already holding lock:
> a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 
> [scsi_mod]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&shost->scan_mutex){+.+.}:
>__mutex_lock+0xfe/0xc70
>mutex_lock_nested+0x1b/0x20
>scsi_remove_device+0x26/0x40 [scsi_mod]
>sdev_store_delete+0x27/0x30 [scsi_mod]
>dev_attr_store+0x3e/0x50
>sysfs_kf_write+0x87/0xa0
>kernfs_fop_write+0x190/0x230
>__vfs_write+0xd2/0x3b0
>vfs_write+0x101/0x270
>ksys_write+0xab/0x120
>__x64_sys_write+0x43/0x50
>do_syscall_64+0x77/0x230
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> -> #0 (kn->count#202){}:
>lock_acquire+0xd2/0x260
>__kernfs_remove+0x424/0x4a0
>kernfs_remove_by_name_ns+0x45/0x90
>remove_files.isra.1+0x3a/0x90
>sysfs_remove_group+0x5c/0xc0
>sysfs_remove_groups+0x39/0x60
>device_remove_attrs+0x82/0xb0
>device_del+0x251/0x580
>__scsi_remove_device+0x19f/0x1d0 [scsi_mod]
>scsi_forget_host+0x37/0xb0 [scsi_mod]
>scsi_remove_host+0x9b/0x150 [scsi_mod]
>sdebug_driver_remove+0x4b/0x150 [scsi_debug]
>device_release_driver_internal+0x241/0x360
>device_release_driver+0x12/0x20
>bus_remove_device+0x1bc/0x290
>device_del+0x259/0x580
>device_unregister+0x1a/0x70
>sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
>scsi_debug_exit+0x76/0xe8 [scsi_debug]
>__x64_sys_delete_module+0x1c1/0x280
>do_syscall_64+0x77/0x230
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>CPU0CPU1
>
>   lock(&shost->scan_mutex);
>lock(kn->count#202);
>lock(&shost->scan_mutex);
>   lock(kn->count#202);
>
>  *** DEADLOCK ***
>
> 2 locks held by modprobe/6539:
>  #0: efaf9298 (&dev->mutex){}, at: 
> device_release_driver_internal+0x68/0x360
>  #1: a6ec2c69 (&shost->scan_mutex){+.+.}, at: 
> scsi_remove_host+0x21/0x150 [scsi_mod]
>
> stack backtrace:
> CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ #5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  dump_stack+0xa4/0xf5
>  print_circular_bug.isra.34+0x213/0x221
>  __lock_acquire+0x1a7e/0x1b50
>  lock_acquire+0xd2/0x260
>  __kernfs_remove+0x424/0x4a0
>  kernfs_remove_by_name_ns+0x45/0x90
>  remove_files.isra.1+0x3a/0x90
>  sysfs_remove_group+0x5c/0xc0
>  sysfs_remove_groups+0x39/0x60
>  device_remove_attrs+0x82/0xb0
>  device_del+0x251/0x580
>  __scsi_remove_device+0x19f/0x1d0 [scsi_mod]
>  scsi_forget_host+0x37/0xb0 [scsi_mod]
>  scsi_remove_host+0x9b/0x150 [scsi_mod]
>  sdebug_driver_remove+0x4b/0x150 [scsi_debug]
>  device_release_driver_internal+0x241/0x360
>  device_release_driver+0x12/0x20
>  bus_remove_device+0x1bc/0x290
>  device_del+0x259/0x580
>  device_unregister+0x1a/0x70
>  sdebug_remove_adapter+0x8b/0xf0 [scsi_debug]
>  scsi_debug_exit+0x76/0xe8 [scsi_debug]
>  __x64_sys_delete_module+0x1c1/0x280
>  do_syscall_64+0x77/0x230
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> See also 
> https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html.
>
> Suggested-by: Eric W. Biederman 
> Fixes: ac0ece9174ac ("scsi: use device_remove_file_self() instead of 
> device_schedule_callback()")
> Signed-off-by: Bart Van Assche 
> Cc: Eric W. Biederman 
> Cc: Tejun Heo 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> Cc: Ingo Molnar 
> Cc: Oleg Nesterov 
> Cc: 
Looks good to me!
Reviewed-by: Jack Wang 


FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi Guys,

I sent this to fcoe-devel but it might be holiday season or the
mailing list is abandoned as the emails concerning fcoe are
pretty low.

On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
Date: Mon, 23 Jul 2018 14:16:31 +0200
From: ard 
Subject: FCOE vn2vn memory leaks in 4.14
To: fcoe-de...@open-fcoe.org

Hi guys,

After an upgrade of one of my systems from 3.10 to 4.14.55, I
noticed a serious memory leak.
As this kernel is not 100% vanilla, I started the bug report
here:
https://github.com/hardkernel/linux/issues/360

The essence is this:
I have an FCoE interface assigned to a vlan on a nic.
These were remnants of a test I did. The FCoE was still
configured, but no targets were exported to that endpoint.
So it would see and join multicast announcements of 2 other
systems, but do nothing with it.
This was good enoug to waste about 600MB of memory in 2 or 3
days.
Some things have changed, maybe the amount of announcements (due
to the heat I turn of systems), or really something in the
kernel. But after 1 week I really have to pro-actively reboot the
systeme in order to avoid OOM's.
I've now disabled the the FCoE vlan on the port of that system,
so it won't get any broadcasts.
No memory leaks so far.
The kmemleak is in that bug report, I won't mail it, since its
2.5MB.
The gist seems to be:
  backtrace:
[] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
[] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
[] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
[] process_one_work+0x138/0x4bc

These seem to stand out:
root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c 
fcoe_fip_vlan_recv kmemleak.txt 
1090
898

So there are 2 leaks: network skb leaks I presume and fcoe structure leaks.
Except for one system that I turn off and on once a day, all other systems are
stable running (older kernel though).

The system I turnn of and on again also has some vn2vn problems and that's also
a 4.14 kernel.
(steam machine with steamos kernel, fcoe not actively used, but with a bcache
on one of the targets, it probably auto registers a dependency)
This is outside the scope of this ticket though.

The system with the memory leak is a system intended to run 24/7.

If anyone can point me to the right place, or help me...

Regards,
Ard van Breemen

-- 
.signature not found


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Tejun Heo
Hello,

ISTR giving the same feedback before.

On Wed, Jul 25, 2018 at 10:38:28AM -0700, Bart Van Assche wrote:
> +struct remove_dev_work {
> + struct callback_headhead;
> + struct scsi_device  *sdev;
> +};
> +
> +static void delete_sdev(struct callback_head *head)
> +{
> + struct remove_dev_work *work = container_of(head, typeof(*work), head);
> + struct scsi_device *sdev = work->sdev;
> +
> + scsi_remove_device(sdev);
> + kfree(work);
> + scsi_device_put(sdev);
> +}
> +
>  static ssize_t
>  sdev_store_delete(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
>  {
> - if (device_remove_file_self(dev, attr))
> - scsi_remove_device(to_scsi_device(dev));
> - return count;
> -};
> + struct scsi_device *sdev = to_scsi_device(dev);
> + struct remove_dev_work *work;
> + int ret;
> +
> + ret = scsi_device_get(sdev);
> + if (ret < 0)
> + goto out;
> + ret = -ENOMEM;
> + work = kmalloc(sizeof(*work), GFP_KERNEL);
> + if (!work)
> + goto put;
> + work->head.func = delete_sdev;
> + work->sdev = sdev;
> + ret = task_work_add(current, &work->head, false);
> + if (ret < 0)
> + goto free;
> + ret = count;
> +
> +out:
> + return ret;
> +
> +free:
> + kfree(work);
> +
> +put:
> + scsi_device_put(sdev);
> + goto out;
> +}

Making removal asynchronous this way sometimes causes issues because
whether the user sees the device released or not is racy.
kernfs/sysfs have mechanisms to deal with these cases - remove_self
and kernfs_break_active_protection().  Have you looked at those?

Thanks.

-- 
tejun


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread Johannes Thumshirn
On Thu, Jul 26, 2018 at 03:02:14PM +0200, ard wrote:
> Hi Guys,
> 
> I sent this to fcoe-devel but it might be holiday season or the
> mailing list is abandoned as the emails concerning fcoe are
> pretty low.

Yes, the list is defunct as I didn't get admin privileges passed by
the old Maintainer when I took over.

Anyways, can you please enable the kernel memory leak detector [1] and
possibly even try a more up to date (like v4.18-rc6) kernel?

[1] https://www.kernel.org/doc/html/v4.17/dev-tools/kmemleak.html

Thanks a lot,
   Johannes

> 
> On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
> Date: Mon, 23 Jul 2018 14:16:31 +0200
> From: ard 
> Subject: FCOE vn2vn memory leaks in 4.14
> To: fcoe-de...@open-fcoe.org
> 
> Hi guys,
> 
> After an upgrade of one of my systems from 3.10 to 4.14.55, I
> noticed a serious memory leak.
> As this kernel is not 100% vanilla, I started the bug report
> here:
> https://github.com/hardkernel/linux/issues/360
> 
> The essence is this:
> I have an FCoE interface assigned to a vlan on a nic.
> These were remnants of a test I did. The FCoE was still
> configured, but no targets were exported to that endpoint.
> So it would see and join multicast announcements of 2 other
> systems, but do nothing with it.
> This was good enoug to waste about 600MB of memory in 2 or 3
> days.
> Some things have changed, maybe the amount of announcements (due
> to the heat I turn of systems), or really something in the
> kernel. But after 1 week I really have to pro-actively reboot the
> systeme in order to avoid OOM's.
> I've now disabled the the FCoE vlan on the port of that system,
> so it won't get any broadcasts.
> No memory leaks so far.
> The kmemleak is in that bug report, I won't mail it, since its
> 2.5MB.
> The gist seems to be:
>   backtrace:
> [] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
> [] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
> [] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
> [] process_one_work+0x138/0x4bc
> 
> These seem to stand out:
> root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c 
> fcoe_fip_vlan_recv kmemleak.txt 
> 1090
> 898
> 
> So there are 2 leaks: network skb leaks I presume and fcoe structure leaks.
> Except for one system that I turn off and on once a day, all other systems are
> stable running (older kernel though).
> 
> The system I turnn of and on again also has some vn2vn problems and that's 
> also
> a 4.14 kernel.
> (steam machine with steamos kernel, fcoe not actively used, but with a bcache
> on one of the targets, it probably auto registers a dependency)
> This is outside the scope of this ticket though.
> 
> The system with the memory leak is a system intended to run 24/7.
> 
> If anyone can point me to the right place, or help me...
> 
> Regards,
> Ard van Breemen
> 
> -- 
> .signature not found

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH] mpt3sas: Swap I/O memory read value back to cpu endianness

2018-07-26 Thread Andy Shevchenko
On Thu, Jul 26, 2018 at 2:25 PM, Sreekanth Reddy
 wrote:
> On Wed, Jul 25, 2018 at 8:32 PM, Andy Shevchenko
>  wrote:
>> On Wed, Jul 25, 2018 at 12:42 PM, Sreekanth Reddy
>>  wrote:
>>> Swap the I/O memory read value back to cpu endianness before storing it
>>> in a data structures which are defined in the MPI headers where
>>> u8 components are not defined in the endianness order.
>>>
>>> In this area from day one mpt3sas driver is using le32_to_cpu() &
>>> cpu_to_le32() APIs. But in the patch cf6bf9710c
>>> (mpt3sas: Bug fix for big endian systems) we have removed these APIs
>>> before reading I/O memory which we should haven't done it. So
>>> in this patch I am correcting it by adding these APIs back
>>> before accessing I/O memory.
>>
>>> -   __u64 data_out = b;
>>> +   __u64 data_out = cpu_to_le64(b);
>>>
>>> spin_lock_irqsave(writeq_lock, flags);
>>> writel((u32)(data_out), addr);
>>
>> Wouldn't be the same as
>>
>> __raw_writel(data_out >> 0, addr);
>> __raw_writel(data_out >> 32, addr + 4);
>> mmiowb();
>>
>> ?
>
> Yes, I can replace the writel() APIs with __raw_writel() with mmiowb()
> memory barrier. Hoping this doesn't create any other side effects.
>
> I will post new patch with this change tomorrow after doing some
> testing in this area.

Thanks!

>
>>
>>>  _base_writeq(__u64 b, volatile void __iomem *addr, spinlock_t *writeq_lock)
>>>  {
>>> -   writeq(b, addr);
>>> +   writeq(cpu_to_le64(b), addr);
>>
>> Similar here
>> __raw_writeq(b, addr);
>> mmiowb();
>>
>>
>>> -   writel((u32)(request[i]), &ioc->chip->Doorbell);
>>> +   writel(cpu_to_le32(request[i]), &ioc->chip->Doorbell);
>>
>> This kind of endianess play (including above) should make sparse unhappy.
>>
>> Did you run it with
>>
>> C=1 CF=-D__CHECK_ENDIAN__
>>
>> ?
>
> Yes I run it on 4.18 kernel and I don't see any error or warning for
> these lines.

If you try on x86 I'm pretty sure you get some warnings, especially
taken into consideration [1].

[1]:

commit 6469a0ee0a06b2ea1f5afbb1d5a3feed017d4c7a (tip/x86-asm-for-linus)
Author: Andy Shevchenko 
Date:   Tue May 15 14:52:11 2018 +0300

   x86/io: Define readq()/writeq() to use 64-bit type

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Bart Van Assche
On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> Making removal asynchronous this way sometimes causes issues because
> whether the user sees the device released or not is racy.
> kernfs/sysfs have mechanisms to deal with these cases - remove_self
> and kernfs_break_active_protection().  Have you looked at those?

Hello Tejun,

The call stack in the patch description shows that sdev_store_delete() is
involved in the deadlock. The implementation of that function is as follows:

static ssize_t
sdev_store_delete(struct device *dev, struct device_attribute *attr,
  const char *buf, size_t count)
{
if (device_remove_file_self(dev, attr))
scsi_remove_device(to_scsi_device(dev));
return count;
};

device_remove_file_self() calls sysfs_remove_file_self() and that last
function calls kernfs_remove_self(). In other words, kernfs_remove_self()
is already being used. Please let me know if I misunderstood your comment.

Thanks,

Bart.


Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread t...@kernel.org
Hello,

On Thu, Jul 26, 2018 at 02:09:41PM +, Bart Van Assche wrote:
> On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> > Making removal asynchronous this way sometimes causes issues because
> > whether the user sees the device released or not is racy.
> > kernfs/sysfs have mechanisms to deal with these cases - remove_self
> > and kernfs_break_active_protection().  Have you looked at those?
> 
> Hello Tejun,
> 
> The call stack in the patch description shows that sdev_store_delete() is
> involved in the deadlock. The implementation of that function is as follows:
> 
> static ssize_t
> sdev_store_delete(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
>   if (device_remove_file_self(dev, attr))
>   scsi_remove_device(to_scsi_device(dev));
>   return count;
> };
> 
> device_remove_file_self() calls sysfs_remove_file_self() and that last
> function calls kernfs_remove_self(). In other words, kernfs_remove_self()
> is already being used. Please let me know if I misunderstood your comment.

So, here, because scsi_remove_device() is the one involved in the
circular dependency, just breaking the dependency chain on the file
itself (self removal) isn't enough.  You can wrap the whole operation
with kernfs_break_active_protection() to also move
scsi_remove_device() invocation outside the kernfs synchronization.
This will need to be piped through sysfs but shouldn't be too complex.

Thanks.

-- 
tejun


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi,

On Thu, Jul 26, 2018 at 03:36:22PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 03:02:14PM +0200, ard wrote:
> > Hi Guys,
> > 
> > I sent this to fcoe-devel but it might be holiday season or the
> > mailing list is abandoned as the emails concerning fcoe are
> > pretty low.
> 
> Yes, the list is defunct as I didn't get admin privileges passed by
> the old Maintainer when I took over.

That explains :-).

> Anyways, can you please enable the kernel memory leak detector [1] and
> possibly even try a more up to date (like v4.18-rc6) kernel?
> 
> [1] https://www.kernel.org/doc/html/v4.17/dev-tools/kmemleak.html

The up to date kernel would be a problem.
The kmemleak log is here:
https://github.com/hardkernel/linux/files/2218589/kmemleak.txt
Sorry that github doesn't do a preview.

The system itself is an exynos 5422 arm. It worked perfectly fine
with 3.10 as an Initiator, now it leaks memory the moment I
enable the FCoE vlan on the port.


I also have a arm v5 running 3.7.1 (intel ss4000e) that works
fine as stable target.

The arm as initiator was able to crash my D525 as target running
4.0 on the target just by mounting btrfs. The target now runs 4.3
and has been a stable target ever since.


The main issue seems to be in fcoe_ctlr.c, and that has not
really been touched except by a broomstick for generic kernel
maintenance.

What I can do is compile a 4.14 and a 4.18 kernel for my main
initiator, a desktop that has an ssd used as bcache on FCoE
drives. That desktop is turned off however due to a heatwave.
The last known working kernel was 3.18 on that system. I will
compile a new one.

> Thanks a lot,
>Johannes

Well, thank you for maintaining a life saver.

> 
> > 
> > On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
> > Date: Mon, 23 Jul 2018 14:16:31 +0200
> > From: ard 
> > Subject: FCOE vn2vn memory leaks in 4.14
> > To: fcoe-de...@open-fcoe.org
> > 
> > Hi guys,
> > 
> > After an upgrade of one of my systems from 3.10 to 4.14.55, I
> > noticed a serious memory leak.
> > As this kernel is not 100% vanilla, I started the bug report
> > here:
> > https://github.com/hardkernel/linux/issues/360
> > 
> > The essence is this:
> > I have an FCoE interface assigned to a vlan on a nic.
> > These were remnants of a test I did. The FCoE was still
> > configured, but no targets were exported to that endpoint.
> > So it would see and join multicast announcements of 2 other
> > systems, but do nothing with it.
> > This was good enoug to waste about 600MB of memory in 2 or 3
> > days.
> > Some things have changed, maybe the amount of announcements (due
> > to the heat I turn of systems), or really something in the
> > kernel. But after 1 week I really have to pro-actively reboot the
> > systeme in order to avoid OOM's.
> > I've now disabled the the FCoE vlan on the port of that system,
> > so it won't get any broadcasts.
> > No memory leaks so far.
> > The kmemleak is in that bug report, I won't mail it, since its
> > 2.5MB.
> > The gist seems to be:
> >   backtrace:
> > [] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
> > [] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
> > [] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
> > [] process_one_work+0x138/0x4bc
> > 
> > These seem to stand out:
> > root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c 
> > fcoe_fip_vlan_recv kmemleak.txt 
> > 1090
> > 898
> > 
> > So there are 2 leaks: network skb leaks I presume and fcoe structure leaks.
> > Except for one system that I turn off and on once a day, all other systems 
> > are
> > stable running (older kernel though).
> > 
> > The system I turnn of and on again also has some vn2vn problems and that's 
> > also
> > a 4.14 kernel.
> > (steam machine with steamos kernel, fcoe not actively used, but with a 
> > bcache
> > on one of the targets, it probably auto registers a dependency)
> > This is outside the scope of this ticket though.
> > 
> > The system with the memory leak is a system intended to run 24/7.
> > 
> > If anyone can point me to the right place, or help me...
> > 
> > Regards,
> > Ard van Breemen
> > 
> > -- 
> > .signature not found
> 
> -- 
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
> 

-- 
.signature not found


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread Johannes Thumshirn
On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > Anyways, can you please enable the kernel memory leak detector [1] and
> > possibly even try a more up to date (like v4.18-rc6) kernel?
> > 
> > [1] https://www.kernel.org/doc/html/v4.17/dev-tools/kmemleak.html
> 
> The up to date kernel would be a problem.
> The kmemleak log is here:
> https://github.com/hardkernel/linux/files/2218589/kmemleak.txt
> Sorry that github doesn't do a preview.

No Problem, I can see the fcoe leaks (+ others)

> 
> The system itself is an exynos 5422 arm. It worked perfectly fine
> with 3.10 as an Initiator, now it leaks memory the moment I
> enable the FCoE vlan on the port.
> 
> 
> I also have a arm v5 running 3.7.1 (intel ss4000e) that works
> fine as stable target.
> 
> The arm as initiator was able to crash my D525 as target running
> 4.0 on the target just by mounting btrfs. The target now runs 4.3
> and has been a stable target ever since.
> 
> 
> The main issue seems to be in fcoe_ctlr.c, and that has not
> really been touched except by a broomstick for generic kernel
> maintenance.
> 
> What I can do is compile a 4.14 and a 4.18 kernel for my main
> initiator, a desktop that has an ssd used as bcache on FCoE
> drives. That desktop is turned off however due to a heatwave.
> The last known working kernel was 3.18 on that system. I will
> compile a new one.

I'll setup a test environment here and try to reproduce. In the
meanwhile I have your report and try to narrow down the leak(s).

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread Johannes Thumshirn
On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> The system itself is an exynos 5422 arm. It worked perfectly fine
> with 3.10 as an Initiator, now it leaks memory the moment I
> enable the FCoE vlan on the port.

So I had a look through the commits between v3.10 and v4.14 and this
one sticks out:
ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")

While I think it is necessary to release a skb with kfree_skb() it
still might be worth trying to revert it for a test run.

Thanks,
Johannes
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


your website photos

2018-07-26 Thread Jan

I would like to contact the person that manages your images for your
company?

We provide different services that makes your images and pictures to look
good. For the images to
be more attractive, we services such as background image cut out, clipping
path, shadow adding
(drop shadow, reflection shadow, natural shadow, mirror effect), image
masking, product image editing.
Image Manipulation / Clothes Neck-Joint and image retouching.

Also, we also use the most recent application as well as techniques such as
Adobe Photoshop.

The following are the kind of services together:

Clipping Path Service
Clipping Path, Cut out image,Image Clipping, Clip image
Photo Masking Service
Channel Masking,Photo Masking, Translucent Masking
Image Cropping Service
Crop image, Photo cut out,
Image Manipulation Service
Composite image, Neck Joint
Shadow Adding Services
Drop Shadow adding, Reflection shadow,Natural shadow
Photo Retouching Service
Photo Retouching, Glamour Retouching.

Our Service is 24-48 hours but we can deliver the images sooner in case of
emergency.

We can give you editing test on your photos.
We do unlimited revisions until you are satisfied with the work.

Thanks,
Jan Williams



[PATCH] qedi: Fix a potential buffer overflow

2018-07-26 Thread Bart Van Assche
Tell snprintf() to store at most 255 characters in the output buffer
instead of 256. This patch avoids that smatch reports the following
warning:

drivers/scsi/qedi/qedi_main.c:891: qedi_get_boot_tgt_info() error: snprintf() 
is printing too much 256 vs 255

Signed-off-by: Bart Van Assche 
Cc: 
Cc: 
---
 drivers/scsi/qedi/qedi_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 682f3ce31014..ea62180d9ec8 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -888,7 +888,7 @@ static void qedi_get_boot_tgt_info(struct nvm_iscsi_block 
*block,
ipv6_en = !!(block->generic.ctrl_flags &
 NVM_ISCSI_CFG_GEN_IPV6_ENABLED);
 
-   snprintf(tgt->iscsi_name, NVM_ISCSI_CFG_ISCSI_NAME_MAX_LEN, "%s\n",
+   snprintf(tgt->iscsi_name, sizeof(tgt->iscsi_name), "%s\n",
 block->target[index].target_name.byte);
 
tgt->ipv6_en = ipv6_en;
-- 
2.18.0



Re: [PATCH, RESEND] Avoid that SCSI device removal through sysfs triggers a deadlock

2018-07-26 Thread Bart Van Assche
On Thu, 2018-07-26 at 07:14 -0700, t...@kernel.org wrote:
> Hello,
> 
> On Thu, Jul 26, 2018 at 02:09:41PM +, Bart Van Assche wrote:
> > On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> > > Making removal asynchronous this way sometimes causes issues because
> > > whether the user sees the device released or not is racy.
> > > kernfs/sysfs have mechanisms to deal with these cases - remove_self
> > > and kernfs_break_active_protection().  Have you looked at those?
> > 
> > Hello Tejun,
> > 
> > The call stack in the patch description shows that sdev_store_delete() is
> > involved in the deadlock. The implementation of that function is as follows:
> > 
> > static ssize_t
> > sdev_store_delete(struct device *dev, struct device_attribute *attr,
> >   const char *buf, size_t count)
> > {
> > if (device_remove_file_self(dev, attr))
> > scsi_remove_device(to_scsi_device(dev));
> > return count;
> > };
> > 
> > device_remove_file_self() calls sysfs_remove_file_self() and that last
> > function calls kernfs_remove_self(). In other words, kernfs_remove_self()
> > is already being used. Please let me know if I misunderstood your comment.
> 
> So, here, because scsi_remove_device() is the one involved in the
> circular dependency, just breaking the dependency chain on the file
> itself (self removal) isn't enough.  You can wrap the whole operation
> with kernfs_break_active_protection() to also move
> scsi_remove_device() invocation outside the kernfs synchronization.
> This will need to be piped through sysfs but shouldn't be too complex.

Hello Tejun,

I have tried to implement the above but my implementation triggered a kernel
warning. Can you have a look at the attached patches and see what I did wrong?

Thanks,

Bart.

The kernel warning I ran into is as follows:

kernfs_put: 6:0:0:0/delete: released with incorrect active_ref 2147483647
WARNING: CPU: 6 PID: 1014 at fs/kernfs/dir.c:527 kernfs_put+0x136/0x180
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.0.0-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:kernfs_put+0x136/0x180
Call Trace:
 evict+0xc1/0x190
 __dentry_kill+0xc4/0x150
 shrink_dentry_list+0x9e/0x1c0
 shrink_dcache_parent+0x63/0x70
 d_invalidate+0x42/0xb0
 lookup_fast+0x278/0x2a0
 walk_component+0x38/0x450
 link_path_walk+0x2a8/0x4f0
 path_openat+0x89/0x13a0
 do_filp_open+0x8c/0xf0
 do_sys_open+0x1a6/0x230
 do_syscall_64+0x4f/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
From f280e1cb31eda63c47db02574dfdfaee8a3f1dbc Mon Sep 17 00:00:00 2001
From: Bart Van Assche 
Date: Thu, 26 Jul 2018 09:23:08 -0700
Subject: [PATCH 1/3] fs/sysfs: Introduce sysfs_remove_file_self_w_cb()

This patch makes it possible to execute more code than only the
__kernfs_remove() call while the active protection is dropped.

Signed-off-by: Bart Van Assche 
Cc: Greg Kroah-Hartman 
Cc: Rafael J. Wysocki 
---
 fs/sysfs/file.c   | 17 -
 include/linux/sysfs.h | 16 
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5c13f29bfcdb..db9386070571 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -429,7 +429,12 @@ EXPORT_SYMBOL_GPL(sysfs_remove_file_ns);
  *
  * See kernfs_remove_self() for details.
  */
-bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
+bool sysfs_remove_file_self_w_cb(struct kobject *kobj,
+ const struct attribute *attr,
+ void (*cb)(struct kobject *kobj,
+	const struct attribute *attr,
+	void *data),
+ void *data)
 {
 	struct kernfs_node *parent = kobj->sd;
 	struct kernfs_node *kn;
@@ -440,11 +445,21 @@ bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
 		return false;
 
 	ret = kernfs_remove_self(kn);
+	if (ret && cb) {
+		kernfs_break_active_protection(kn);
+		cb(kobj, attr, data);
+		kernfs_break_active_protection(kn);
+	}
 
 	kernfs_put(kn);
 	return ret;
 }
 
+bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)
+{
+	return sysfs_remove_file_self_w_cb(kobj, attr, NULL, NULL);
+}
+
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **ptr)
 {
 	int i;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index b8bfdc173ec0..3c954d677736 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -239,6 +239,12 @@ int __must_check sysfs_chmod_file(struct kobject *kobj,
   const struct attribute *attr, umode_t mode);
 void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr,
 			  const void *ns);
+bool sysfs_remove_file_self_w_cb(struct kobject *kobj,
+ const struct attribute *attr,
+ void (*cb)(struct kobject *kobj,
+	const struct attribute *attr,
+	void *),
+ void *data);
 bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr);
 void sysfs_remove_files(struct kobject *kobj, const struct attribute **attr);
 
@@ -356,6 +362,16 @@ static inline void sysfs

Re: FCOE vn2vn memory leaks in 4.14

2018-07-26 Thread ard
Hi,

On Thu, Jul 26, 2018 at 05:05:37PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > The system itself is an exynos 5422 arm. It worked perfectly fine
> > with 3.10 as an Initiator, now it leaks memory the moment I
> > enable the FCoE vlan on the port.
> 
> So I had a look through the commits between v3.10 and v4.14 and this
> one sticks out:
> ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")
> 
> While I think it is necessary to release a skb with kfree_skb() it
> still might be worth trying to revert it for a test run.

So I had a recompile for the destop (i920)
And fortunately after 2 hours he was already collecting memory
leaks.
This makes to me at least a few unknowns more clean:
1) usb vs pci nic doesn't matter.
(I am too lazy to send in:
https://github.com/ardje/linux/commit/93e0b1fec38859ff0fb6e24eab10778f5b3be289
)
2) ARM vs X86 doesn't matter

Anyway: here are the kmemleak and the dmesg after almost 2 hours:
https://github.com/hardkernel/linux/files/2233646/kmemleak-antec.txt
https://github.com/hardkernel/linux/files/2233648/dmesg.txt

Also the kmemleak.txt of the x86 seems to be more verbose:

unreferenced object 0x880196472400 (size 512):
  comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
  hex dump (first 32 bytes):
b8 d7 7c 8d 01 88 ff ff 00 00 00 00 00 00 00 00  ..|.
05 00 00 00 08 00 00 00 52 05 30 06 1e 00 00 10  R.0.
  backtrace:
[] fc_rport_create+0x42/0x190 [libfc]
[] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
[] fcoe_ctlr_vn_recv+0x496/0xad0 [libfcoe]
[] fcoe_ctlr_recv_work+0x700/0xfb0 [libfcoe]
[] process_one_work+0x142/0x370
[] worker_thread+0x62/0x3d0
[] kthread+0x114/0x150
[] ret_from_fork+0x35/0x40
[] 0x

vs:
unreferenced object 0xe07d9b00 (size 256):
  comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
  hex dump (first 32 bytes):
70 64 49 ec 00 00 00 00 07 00 00 00 08 00 00 00  pdI.
88 40 7f 1d 24 00 00 10 88 40 7f 1d 24 00 00 20  .@..$@..$.. 
  backtrace:
[] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
[] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
[] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
[] process_one_work+0x138/0x4bc
[] worker_thread+0x34/0x4f4
[] kthread+0x12c/0x15c
[] ret_from_fork+0x14/0x2c
[] 0x

Now the x86 dump leads me to:
http://lists.open-fcoe.org/pipermail/fcoe-devel/2013-May/012014.html

Actually already got there from my arm dump, but they are different in 
backtrace.
Anyway:
root@antec:~# grep -c fc_rport_create kmemleak.txt
44
So 44 * 512 bytes leaked in that path. And an extra thing: "it was leaked in" 
libfc and not libfcoe.
Or just like the bug report we were leaking fc_rport_priv.
But one thing I don't understand (yet) is why the fc_rport_create happens while
we already have a port.

Anyway, I will continue bug hunting. It's night, and the temperature has 
dropped to 29.8 .

Regards,
Ard

-- 
.signature not found


[PATCH] qla2xxx: Fix memory leak for allocating abort IOCB

2018-07-26 Thread Himanshu Madhani
From: Quinn Tran 

In the case of IOCB QFull, Initiator code can leave behind
a stale pointer to an SRB structure on the outstanding command
array.

Fixes: 82de802ad46e ("scsi: qla2xxx: Preparation for Target MQ.")
Cc: sta...@vger.kernel.org #4.16
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
---
Hi Martin, 

This patch fixes memory leak detected in the automation framework which 
triggers lots
of error condition.

Please apply this patch to 4.18/scsi-fixes at your earliest convenience.

Thanks,
Himanshu
---
 drivers/scsi/qla2xxx/qla_iocb.c | 53 +
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c b/drivers/scsi/qla2xxx/qla_iocb.c
index a91cca52b5d5..dd93a22fe843 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -2130,34 +2130,11 @@ __qla2x00_alloc_iocbs(struct qla_qpair *qpair, srb_t 
*sp)
req_cnt = 1;
handle = 0;
 
-   if (!sp)
-   goto skip_cmd_array;
-
-   /* Check for room in outstanding command list. */
-   handle = req->current_outstanding_cmd;
-   for (index = 1; index < req->num_outstanding_cmds; index++) {
-   handle++;
-   if (handle == req->num_outstanding_cmds)
-   handle = 1;
-   if (!req->outstanding_cmds[handle])
-   break;
-   }
-   if (index == req->num_outstanding_cmds) {
-   ql_log(ql_log_warn, vha, 0x700b,
-   "No room on outstanding cmd array.\n");
-   goto queuing_error;
-   }
-
-   /* Prep command array. */
-   req->current_outstanding_cmd = handle;
-   req->outstanding_cmds[handle] = sp;
-   sp->handle = handle;
-
-   /* Adjust entry-counts as needed. */
-   if (sp->type != SRB_SCSI_CMD)
+   if (sp && (sp->type != SRB_SCSI_CMD)) {
+   /* Adjust entry-counts as needed. */
req_cnt = sp->iocbs;
+   }
 
-skip_cmd_array:
/* Check for room on request queue. */
if (req->cnt < req_cnt + 2) {
if (qpair->use_shadow_reg)
@@ -2183,6 +2160,28 @@ __qla2x00_alloc_iocbs(struct qla_qpair *qpair, srb_t *sp)
if (req->cnt < req_cnt + 2)
goto queuing_error;
 
+   if (sp) {
+   /* Check for room in outstanding command list. */
+   handle = req->current_outstanding_cmd;
+   for (index = 1; index < req->num_outstanding_cmds; index++) {
+   handle++;
+   if (handle == req->num_outstanding_cmds)
+   handle = 1;
+   if (!req->outstanding_cmds[handle])
+   break;
+   }
+   if (index == req->num_outstanding_cmds) {
+   ql_log(ql_log_warn, vha, 0x700b,
+   "No room on outstanding cmd array.\n");
+   goto queuing_error;
+   }
+
+   /* Prep command array. */
+   req->current_outstanding_cmd = handle;
+   req->outstanding_cmds[handle] = sp;
+   sp->handle = handle;
+   }
+
/* Prep packet */
req->cnt -= req_cnt;
pkt = req->ring_ptr;
@@ -2195,6 +2194,8 @@ __qla2x00_alloc_iocbs(struct qla_qpair *qpair, srb_t *sp)
pkt->handle = handle;
}
 
+   return pkt;
+
 queuing_error:
qpair->tgt_counters.num_alloc_iocb_failed++;
return pkt;
-- 
2.12.0