Re: [dm-devel] md/dm-mpath: check whether all pgpaths have same uuid in multipath_ctr()

2021-03-26 Thread Martin Wilck
On Thu, 2021-03-25 at 11:14 -0400, Mike Snitzer wrote:
> On Wed, Mar 24 2021 at  9:21pm -0400,
> Zhiqiang Liu  wrote:
> 
> > 
> > 
> > On 2021/3/22 22:22, Mike Snitzer wrote:
> > > On Mon, Mar 22 2021 at  4:11am -0400,
> > > Christoph Hellwig  wrote:
> > > 
> > > > On Sat, Mar 20, 2021 at 03:19:23PM +0800, Zhiqiang Liu wrote:
> > > > > From: Zhiqiang Liu 
> > > > > 
> > > > > When we make IO stress test on multipath device, there will
> > > > > be a metadata err because of wrong path. In the test, we
> > > > > concurrent execute 'iscsi device login|logout' and
> > > > > 'multipath -r' command with IO stress on multipath device.
> > > > > In some case, systemd-udevd may have not time to process
> > > > > uevents of iscsi device logout|login, and then 'multipath -r'
> > > > > command triggers multipathd daemon calls ioctl to load table
> > > > > with incorrect old device info from systemd-udevd.
> > > > > Then, one iscsi path may be incorrectly attached to another
> > > > > multipath which has different uuid. Finally, the metadata err
> > > > > occurs when umounting filesystem to down write metadata on
> > > > > the iscsi device which is actually not owned by the multipath
> > > > > device.
> > > > > 
> > > > > So we need to check whether all pgpaths of one multipath have
> > > > > the same uuid, if not, we should throw a error.
> > > > > 
> > > > > Signed-off-by: Zhiqiang Liu 
> > > > > Signed-off-by: lixiaokeng 
> > > > > Signed-off-by: linfeilong 
> > > > > Signed-off-by: Wubo 
> > > > > ---
> > > > >  drivers/md/dm-mpath.c   | 52
> > > > > +
> > > > >  drivers/scsi/scsi_lib.c |  1 +
> > > > >  2 files changed, 53 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > > index bced42f082b0..f0b995784b53 100644
> > > > > --- a/drivers/md/dm-mpath.c
> > > > > +++ b/drivers/md/dm-mpath.c
> > > > > @@ -24,6 +24,7 @@
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > +#include 
> > > > >  #include 
> > > > >  #include 
> > > > > 
> > > > > @@ -1169,6 +1170,45 @@ static int parse_features(struct
> > > > > dm_arg_set *as, struct multipath *m)
> > > > > return r;
> > > > >  }
> > > > > 
> > > > > +#define SCSI_VPD_LUN_ID_PREFIX_LEN 4
> > > > > +#define MPATH_UUID_PREFIX_LEN 7
> > > > > +static int check_pg_uuid(struct priority_group *pg, char
> > > > > *md_uuid)
> > > > > +{
> > > > > +   char pgpath_uuid[DM_UUID_LEN] = {0};
> > > > > +   struct request_queue *q;
> > > > > +   struct pgpath *pgpath;
> > > > > +   struct scsi_device *sdev;
> > > > > +   ssize_t count;
> > > > > +   int r = 0;
> > > > > +
> > > > > +   list_for_each_entry(pgpath, >pgpaths, list) {
> > > > > +   q = bdev_get_queue(pgpath->path.dev->bdev);
> > > > > +   sdev = scsi_device_from_queue(q);
> > > > 
> > > > Common dm-multipath code should never poke into scsi
> > > > internals.  This
> > > > is something for the device handler to check.  It probably also
> > > > won't
> > > > work for all older devices.
> > > 
> > > Definitely.
> > > 
> > > But that aside, userspace (multipathd) _should_ be able to do
> > > extra
> > > validation, _before_ pushing down a new table to the kernel,
> > > rather than
> > > forcing the kernel to do it.
> > 
> > As your said, it is better to do extra validation in userspace
> > (multipathd).
> > However, in some cases, the userspace cannot see the real-time
> > present devices
> > info as Martin (committer of multipath-tools) said.
> > In addition, the kernel can see right device info in the table at
> > any time,
> > so the uuid check in kernel can ensure one multipath is composed
> > with paths mapped to
> > the same device.
> > 
> > Considering the severity of the wrong path in multipath, I think it
> > worths more
> > checking.
> 
> As already said: this should be fixable in userspace.  Please work
> with
> multipath-tools developers to address this.

I agree this patch won't help, because the kernel doesn't (re)attach
devices to multipath maps by itself. If multipathd actively adds a
device to a map, it must check the WWID beforehand, and so it does (and
has been doing so for years).

But in general, it's hard to avoid WWID mismatches entirely in user
space. We have no problem if a device is removed an re-added. But if it
looks like a device just having been offline or unreachable for some
time and then reappear, it gets tricky. We might even miss the fact
that the device was temporarily away. multipathd can't constantly poll
devices just to detect changes - and what if the sysfs vpd attributes
stay the same because the kernel didn't even notice?

It would be great if userspace could rely on the kernel to deliver
events in such cases. I want look into monitoring SCSI UNIT ATTENTION
events, which multipathd currently doesn't. That might cover many
situations. But I've been told that in some situations really no event
arrived in user space, 

Re: [PATCH v2] block: Suppress uevent for hidden device when removed

2021-03-11 Thread Martin Wilck
On Thu, 2021-03-11 at 16:19 +0100, Daniel Wagner wrote:
> register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN
> but enables uevents at the end again in order to announce disk after
> possible partitions are created.
> 
> When the device is removed the uevents are still on and user land
> sees
> 'remove' messages for devices which were never 'add'ed to the system.
> 
>   KERNEL[95481.571887] remove   /devices/virtual/nvme-
> fabrics/ctl/nvme5/nvme0c5n1 (block)
> 
> Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the
> uevents at all.
> 
> Signed-off-by: Daniel Wagner 

Reviewed-by: Martin Wilck 



Re: [PATCH] block: Suppress uevent for hidden device when removed

2021-03-03 Thread Martin Wilck
On Wed, 2021-03-03 at 18:12 +0100, Daniel Wagner wrote:
> register_disk() suppress uevents for devices with the GENHD_FL_HIDDEN
> but enables uevents at the end again in order to announce disk after
> possible partitions are created.
> 
> When the device is removed the uevents are still on and user land
> sees
> 'remove' messages for devices which were never 'add'ed to the system.
> 
>   KERNEL[95481.571887] remove   /devices/virtual/nvme-
> fabrics/ctl/nvme5/nvme0c5n1 (block)
> 
> Let's suppress the uevents for GENHD_FL_HIDDEN again before calling
> device_del() which will write trigger uevents.
> 
> Signed-off-by: Daniel Wagner 
> ---
>  block/genhd.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index c55e8f0fced1..ab9ed355bdef 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -731,6 +731,9 @@ void del_gendisk(struct gendisk *disk)
> if (!sysfs_deprecated)
> sysfs_remove_link(block_depr,
> dev_name(disk_to_dev(disk)));
> pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
> +
> +   if (disk->flags & GENHD_FL_HIDDEN)
> +   dev_set_uevent_suppress(disk_to_dev(disk), 1);
> device_del(disk_to_dev(disk));
>  }
>  EXPORT_SYMBOL(del_gendisk);

I wonder if it wouldn't be wiser to remove this code

if (disk->flags & GENHD_FL_HIDDEN) {
dev_set_uevent_suppress(ddev, 0);
return;
}

from register_disk(). The way you did it now, we would receive neither
"add" nor "remove" events in user space, but there might be change
events in between ?

This said, I'd like to raise the question whether it was the right
decision to suppress these uevents in the first place. 8ddcd653257c
("block: introduce GENHD_FL_HIDDEN") simply stated that they aren't
registered as usable block devices. Maybe the kernel should leave the
decision whether or not they are interesting to user space itself?

For example, I've written an extension for multipath-tools some time
ago which displays the topology of NVMe native multipath devices, and
uses sysfs data from the hidden NVMe private namespaces for that
purpose. Not receiving  uevents for them makes it practically
impossible to track the status of these devices.

Regards,
Martin




Re: linux-next: manual merge of the rdma tree with Linus' tree

2021-02-11 Thread Martin Wilck
On Wed, 2021-02-10 at 22:08 +, Pearson, Robert B wrote:
> Looks perfect. Thanks.

+1

Thanks everyone.




Re: [PATCH v2 0/4] qla2xxx: A couple crash fixes

2020-08-31 Thread Martin Wilck
On Mon, 2020-08-31 at 18:18 +0200, Daniel Wagner wrote:
> changes since v1:
> 
>  - added dummy warn function to patch#1
>  - added log entry to patch#4
> 
> as suggested by Martin
> 
> 
> 
> Initial cover letter:
> 
> The first crash we observed is due memory corruption in the srb
> memory
> pool. Unforuntatly, I couldn't find the source of the problem but the
> workaround by resetting the cleanup callbacks 'fixes' this problem
> (patch #1). I think as intermeditate step this should be merged until
> the real cause can be identified.
> 
> The second crash is due a race condition(?) in the firmware. The sts
> entries are not updated in time which leads to this crash pattern
> which several customers have reported:
> 
>  #0 [c0d1bb80] scsi_dma_unmap at d0001e4904d4 [scsi_mod]
>  #1 [c0d1bbe0] qla2x00_sp_compl at d000204803cc [qla2xxx]
>  #2 [c0d1bc20] qla24xx_process_response_queue at
> d000204c5810 [qla2xxx]
>  #3 [c0d1bd50] qla24xx_msix_rsp_q at d000204c8fd8
> [qla2xxx]
>  #4 [c0d1bde0] __handle_irq_event_percpu at c0189510
>  #5 [c0d1bea0] handle_irq_event_percpu at c018978c
>  #6 [c0d1bee0] handle_irq_event at c018984c
>  #7 [c0d1bf10] handle_fasteoi_irq at c018efc0
>  #8 [c0d1bf40] generic_handle_irq at c0187f10
>  #9 [c0d1bf60] __do_irq at c0018784
>  #10 [c0d1bf90] call_do_irq at c002caa4
>  #11 [c0ecca417a00] do_IRQ at c0018970
>  #12 [c0ecca417a50] restore_check_irq_replay at c000de98
> 
> From analyzing the crash dump it was clear that
> qla24xx_mbx_iocb_entry() calls sp->done (qla2x00_sp_compl) which
> crashes because the response is not a mailbox entry, it is a status
> entry. Patch #4 changes the process logic for mailbox commands so
> that
> the sp is parsed before calling the correct proccess function.
> 
> Thanks,
> Daniel
> 
> Daniel Wagner (4):
>   qla2xxx: Warn if done() or free() are called on an already freed
> srb
>   qla2xxx: Simplify return value logic in
> qla2x00_get_sp_from_handle()
>   qla2xxx: Drop unused function argument from
> qla2x00_get_sp_from_handle()
>   qla2xxx: Handle incorrect entry_type entries
> 
>  drivers/scsi/qla2xxx/qla_gbl.h|  3 +-
>  drivers/scsi/qla2xxx/qla_init.c   | 10 ++
>  drivers/scsi/qla2xxx/qla_inline.h |  5 +++
>  drivers/scsi/qla2xxx/qla_isr.c| 74 +++
> 
>  drivers/scsi/qla2xxx/qla_mr.c |  9 ++---
>  5 files changed, 62 insertions(+), 39 deletions(-)
> 

For the set:
Reviewed-by: Martin Wilck 





Re: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

2020-08-27 Thread Martin Wilck
On Thu, 2020-08-27 at 13:46 +0200, Daniel Wagner wrote:
> On Thu, Aug 27, 2020 at 12:17:13PM +0200, Martin Wilck wrote:
> > Should we perhaps log an error message when we detect a mismatch
> > between sp->type and entry_type?
> 
> Sure can do, but does it really help? Not much we can do in the
> driver. I hope the firmware gets fixed eventually. I am not against
> it,
> just not sure if the log entry really is helping except saying 'you
> are
> using a firmware with a known issue'.
> 

... which might provide insightful, to users as well as perhaps
developers (by observing under which conditions this problem occurs).
I'd hope so, at least. But you know this issue much better than me.

Regards,
Martin




Re: [PATCH 4/4] qla2xxx: Handle incorrect entry_type entries

2020-08-27 Thread Martin Wilck
On Thu, 2020-08-27 at 11:58 +0200, Daniel Wagner wrote:
> It was observed on an ISP8324 16Gb HBA with fw=8.08.203 (d0d5) that
> pkt->entry_type was MBX_IOCB_TYPE/0x39 with an sp->type SRB_SCSI_CMD
> which is invalid and should not be possible.
> 
> A careful code review of the crash dump didn't reveal any short
> comings. Reading the entry_type from the crash dump shows the
> expected
> value of STATUS_TYPE/0x03 but the call trace shows that
> qla24xx_mbx_iocb_entry() is used.
> 
> One possible explanation is when pkt->entry_type is read it doesn't
> contain the correct information. That means the driver observes an
> data
> race by the firmware.
> 
> Signed-off-by: Daniel Wagner 
> ---
>  drivers/scsi/qla2xxx/qla_isr.c | 28 ++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c
> b/drivers/scsi/qla2xxx/qla_isr.c
> index b787643f5031..0c324e88b189 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3392,6 +3392,31 @@ void qla24xx_nvme_ls4_iocb(struct
> scsi_qla_host *vha,
>   sp->done(sp, comp_status);
>  }
>  
> +static void qla24xx_process_mbx_iocb_response(struct scsi_qla_host
> *vha,
> + struct rsp_que *rsp, struct sts_entry_24xx *pkt)
> +{
> + srb_t *sp;
> +
> + sp = qla2x00_get_sp_from_handle(vha, rsp->req, pkt);
> + if (!sp)
> + return;
> +
> + if (sp->type == SRB_SCSI_CMD ||
> + sp->type == SRB_NVME_CMD ||
> + sp->type == SRB_TM_CMD) {
> + /* Some firmware version don't update the entry_type
> +  * correctly.  It was observed entry_type contained
> +  * MBCX_IOCB_TYPE instead of the expected STATUS_TYPE
> +  * for sp->type SRB_SCSI_CMD, SRB_NVME_CMD or
> +  * SRB_TM_CMD.
> +  */
> + qla2x00_status_entry(vha, rsp, pkt);
> + return;
> + }
> +
> + qla24xx_mbx_iocb_entry(vha, rsp->req, (struct mbx_24xx_entry
> *)pkt);
> +}
> +
>  /**
>   * qla24xx_process_response_queue() - Process response queue
> entries.
>   * @vha: SCSI driver HA context
> @@ -3499,8 +3524,7 @@ void qla24xx_process_response_queue(struct
> scsi_qla_host *vha,
>   (struct abort_entry_24xx *)pkt);
>   break;
>   case MBX_IOCB_TYPE:
> - qla24xx_mbx_iocb_entry(vha, rsp->req,
> - (struct mbx_24xx_entry *)pkt);
> + qla24xx_process_mbx_iocb_response(vha, rsp,
> pkt);
>   break;
>   case VP_CTRL_IOCB_TYPE:
>   qla_ctrlvp_completed(vha, rsp->req,

Should we perhaps log an error message when we detect a mismatch
between sp->type and entry_type?

Regards,
Martin




Re: [PATCH 1/4] qla2xxx: Reset done and free callback pointer on release

2020-08-27 Thread Martin Wilck
On Thu, 2020-08-27 at 11:58 +0200, Daniel Wagner wrote:
> Reset ->done and ->free when releasing the srb. There is a hidden
> use-after-free bug in the driver which corrupts the srb memory pool
> which originates from the cleanup callbacks. By explicitly resetting
> the callbacks to NULL, we workaround the memory corruption.
> 
> An extensive search didn't bring any lights on the real problem. The
> initial idea was to set both pointers to NULL and try to catch
> invalid
> accesses. But instead the memory corruption was gone and the driver
> didn't crash.
> 
> Signed-off-by: Daniel Wagner 
> ---
>  drivers/scsi/qla2xxx/qla_inline.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_inline.h
> b/drivers/scsi/qla2xxx/qla_inline.h
> index 861dc522723c..6d41d758fc17 100644
> --- a/drivers/scsi/qla2xxx/qla_inline.h
> +++ b/drivers/scsi/qla2xxx/qla_inline.h
> @@ -211,6 +211,8 @@ static inline void
>  qla2xxx_rel_qpair_sp(struct qla_qpair *qpair, srb_t *sp)
>  {
>   sp->qpair = NULL;
> + sp->done = NULL;
> + sp->free = NULL;
>   mempool_free(sp, qpair->srb_mempool);
>   QLA_QPAIR_MARK_NOT_BUSY(qpair);
>  }

Both sp->done() and sp->free() are called all over the place without
making sure the pointers are non-null. If these functions can be called
for freed sp's, wouldn't that mean we'd crash?

How about setting them to a dummy function that prints a big fat
warning?

Martin




Re: [PATCH AUTOSEL 5.8 70/72] nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths

2020-08-10 Thread Martin Wilck
On Sat, 2020-08-08 at 19:35 -0400, Sasha Levin wrote:
> From: Hannes Reinecke 
> 
> [ Upstream commit fbd6a42d8932e172921c7de10468a2e12c34846b ]
> 
> When nvme_round_robin_path() finds a valid namespace we should be
> using it;
> falling back to __nvme_find_path() for non-optimized paths will cause
> the
> result from nvme_round_robin_path() to be ignored for non-optimized
> paths.
> 
> Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
> Signed-off-by: Martin Wilck 
> Signed-off-by: Hannes Reinecke 
> Reviewed-by: Sagi Grimberg 
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/nvme/host/multipath.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)

Hello Sasha,

this patch needs a fix that I recently submitted to linux-nvme.
The same holds for the respective 5.7 and 5.4 AUTOSEL patches.

http://lists.infradead.org/pipermail/linux-nvme/2020-August/018570.html

Regards,
Martin




Re: [PATCH] qla2xxx: fix a potential NULL pointer dereference

2019-10-04 Thread Martin Wilck
On Wed, 2019-09-18 at 22:06 +0530, Allen Pais wrote:
> alloc_workqueue is not checked for errors and as a result,
> a potential NULL dereference could occur.
> 
> Signed-off-by: Allen Pais 
> ---
>  drivers/scsi/qla2xxx/qla_os.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Martin Wilck 




Re: [PATCH] scsi: core: Log SCSI command age with errors

2019-09-30 Thread Martin Wilck
Hello Milan,

On Mon, 2019-09-30 at 14:35 +0530, Milan P. Gandhi wrote:
> On 9/30/19 2:12 PM, Martin Wilck wrote:

> > Wrt the enablement of the option on highly loaded systems, I'm not
> > sure
> > I understand. You need to enable SCSI logging anyway, don't you?
> 
> By default we keep the SCSI debug logging disabled or am I missing 
> something?
> 
> > Is it an issue to have to set 2 sysfs values rather than just one?
> 
> The idea here is to capture the above debug data even without 
> any user interventions to change any sysfs entries or to enable 
> debug logging on busy, critical production systems.

So, you're looking at the scsi_io_completion() code path. In my
experience that isn't reliable for bug hunting because of the the
message rate limiting. Therefore I prefer using SCSI logging
MLCOMPLETE=1, where no rate limiting applies. But that's just a side
note, it depends on the case what's more useful.

Back to the cmd age output, IMO we're are on a thin line between
capturing useful information and keeping the logs neat. As I already
said, I'm not convinced that this information, as important it may be
for the case(s) you're currently investigating, has the same generic
degree of importance or usefulness as what's currently printed (the CDB
and the sense data). But OTOH, that's just a gut feeling, and I can't
claim to have the experience to make general statement on it. If noone
else has issues with this being printed by default, I'm not going
oppose it. 

> Also, we are not changing the existing text in SCSI command error
> log,
> but we are only adding one single word at the end of message. Ideally
> the user scripts are written to grep specific pattern from the logs.
> Since we are not replacing any existing text from the logs, the 
> scripts should still work with this change as well.

You are certainly aware that such scripts don't necessarily conform to
what kernel developers would consider "ideal" :-) But again, I just
wanted to raise the issue; if noone else thinks it matters, fine with
me.

Thanks
Martin




Re: [PATCH] scsi: core: Log SCSI command age with errors

2019-09-30 Thread Martin Wilck
On Fri, 2019-09-27 at 13:45 -0400, Laurence Oberman wrote:
> 
> Hi Martin
> 
> Agreed about log extraction, but turning that on with a busy workload
> in a production environment is not practical. We cant do it with
> systems with 1000's of luns and 1000's of IOPS/sec.
> Also second resolution is good enough for the debug we want to see.

I gather that you look at a specific problem where second resolution is
sufficient. For upstream, the generic usefulness should be considered,
and I don't think we can say today that better-than-second resolution
will never be useful, so I still vote for milliseconds.

Wrt the enablement of the option on highly loaded systems, I'm not sure
I understand. You need to enable SCSI logging anyway, don't you? Is it
an issue to have to set 2 sysfs values rather than just one?

Regards,
Martin

> 
> Regards
> Laurence
> 




Re: [PATCH] scsi: core: Log SCSI command age with errors

2019-09-27 Thread Martin Wilck
On Mon, 2019-09-23 at 11:31 +0530,  Milan P. Gandhi wrote:
> Couple of users had requested to print the SCSI command age along 
> with command failure errors. This is a small change, but allows 
> users to get more important information about the command that was 
> failed, it would help the users in debugging the command failures:
> 
> Signed-off-by: Milan P. Gandhi 
> ---
> diff --git a/drivers/scsi/scsi_logging.c
> b/drivers/scsi/scsi_logging.c
> index ecc5918e372a..ca2182bc53c6 100644
> --- a/drivers/scsi/scsi_logging.c
> +++ b/drivers/scsi/scsi_logging.c
> @@ -437,6 +437,7 @@ void scsi_print_result(const struct scsi_cmnd
> *cmd, const char *msg,
>   const char *mlret_string = scsi_mlreturn_string(disposition);
>   const char *hb_string = scsi_hostbyte_string(cmd->result);
>   const char *db_string = scsi_driverbyte_string(cmd->result);
> + unsigned long cmd_age = (jiffies - cmd->jiffies_at_alloc) / HZ;

This comes down to pretty coarse time resolution, does it not? More
often than not, the time difference shown will be 0. I'd recommend at
least millisecond resolution.

>  
>   logbuf = scsi_log_reserve_buffer(_len);
>   if (!logbuf)
> @@ -478,10 +479,15 @@ void scsi_print_result(const struct scsi_cmnd
> *cmd, const char *msg,
>  
>   if (db_string)
>   off += scnprintf(logbuf + off, logbuf_len - off,
> -  "driverbyte=%s", db_string);
> +  "driverbyte=%s ", db_string);
>   else
>   off += scnprintf(logbuf + off, logbuf_len - off,
> -  "driverbyte=0x%02x", driver_byte(cmd-
> >result));
> +  "driverbyte=0x%02x ",
> +  driver_byte(cmd->result));
> +
> + off += scnprintf(logbuf + off, logbuf_len - off,
> +  "cmd-age=%lus", cmd_age);

This is certainly helpful in some situations. Yet I am unsure if it
should *always* be printed. I wouldn't say it's as important as the 
other stuff scsi_print_result() provides. After all, by activating
MLQUEUE+MLCOMPLETE, the time on-wire can be extracted with better
accuracy from currently available logs. 

So perhaps make this depend on a module parameter?

Also, we should carefully ponder if we want to put this on the same
line as the driver byte, as users may have created scripts for parsing
this output.

Regards,
Martin




Re: [PATCH] cdrom: gdrom.c: fix a incorrect use of kstrndup()

2019-08-09 Thread Martin Wilck
On Tue, 2019-08-06 at 00:11 +0530, Saurav Girepunje wrote:
> According to doc: "Note: Use kmemdup_nul() instead if the size
> is known exactly." So we should use kmemdup_nul() here instead of
> kstrndup().
> 
> Signed-off-by: Saurav Girepunje 
> ---

Reviewed-by: Martin Wilck 



Re: linux-next: manual merge of the scsi tree with the block tree

2019-04-17 Thread Martin Wilck
On Mon, 2019-04-15 at 06:48 -0700, Bart Van Assche wrote:
> On 4/14/19 10:59 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the scsi tree got a conflict in:
> > 
> >   drivers/scsi/sd.c
> > 
> > between commit:
> > 
> >   c92e2f04b359 ("block: disk_events: introduce event flags")
> > 
> > from the block tree and commit:
> > 
> >   21e6ba3f0e02 ("scsi: sd: Rely on the driver core for asynchronous
> > probing")
> >   d16ece577bf2 ("scsi: sd: Inline sd_probe_part2()")
> > 
> > from the scsi tree.
> > 
> > I fixed it up (see below) and can carry the fix as necessary. This
> > is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your
> > tree
> > is submitted for merging.  You may also want to consider
> > cooperating
> > with the maintainer of the conflicting tree to minimise any
> > particularly
> > complex conflicts.
> 
> Thanks Stephen for having resolved this conflict. The conflict
> resolution
> looks good to me.

Yes, it looks good to me as well.

Thanks,
Martin




Re: [PATCH 0/9 v3] locks: avoid thundering-herd wake-ups

2018-10-25 Thread Martin Wilck
On Thu, 2018-10-25 at 12:04 -0400, J. Bruce Fields wrote:
> On Wed, Oct 24, 2018 at 09:43:54AM +1100, NeilBrown wrote:
> > This took longer that I had wanted, due to various reasons - sorry.
> > And I'm now posting it in a merge window, which is not ideal.  I
> > don't
> > expect it to be included in this merge window and I won't be at all
> > impatient for review, but I didn't want to delay it further.
> 
> Yes, apologies, I've been looking forward to finding out how this
> turned
> out, but I'd like to track down a bug or two and also review Olga's
> copy
> patches  Bug me if I haven't gotten to this in a week or two.
> 
> I'd also be really interested in any details of the performance
> experiments that you could share.

I pushed my test code to https://github.com/mwilck/lockscale.

Cheers,
Martin




Re: [PATCH 0/9 v3] locks: avoid thundering-herd wake-ups

2018-10-25 Thread Martin Wilck
On Thu, 2018-10-25 at 12:04 -0400, J. Bruce Fields wrote:
> On Wed, Oct 24, 2018 at 09:43:54AM +1100, NeilBrown wrote:
> > This took longer that I had wanted, due to various reasons - sorry.
> > And I'm now posting it in a merge window, which is not ideal.  I
> > don't
> > expect it to be included in this merge window and I won't be at all
> > impatient for review, but I didn't want to delay it further.
> 
> Yes, apologies, I've been looking forward to finding out how this
> turned
> out, but I'd like to track down a bug or two and also review Olga's
> copy
> patches  Bug me if I haven't gotten to this in a week or two.
> 
> I'd also be really interested in any details of the performance
> experiments that you could share.

I pushed my test code to https://github.com/mwilck/lockscale.

Cheers,
Martin




Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups

2018-08-20 Thread Martin Wilck
On Mon, 2018-08-20 at 16:02 -0400, J. Bruce Fields wrote:
> On Mon, Aug 20, 2018 at 01:02:21PM +0200, Martin Wilck wrote:
> > On Wed, 2018-08-08 at 14:29 -0400, J. Bruce Fields wrote:
> > > On Wed, Aug 08, 2018 at 12:47:22PM -0400, Jeff Layton wrote:
> > > > On Wed, 2018-08-08 at 11:51 +1000, NeilBrown wrote:
> > > > > If you have a many-core machine, and have many threads all
> > > > > wanting to
> > > > > briefly lock a give file (udev is known to do this), you can
> > > > > get
> > > > > quite
> > > > > poor performance.
> > > > > 
> > > > > When one thread releases a lock, it wakes up all other
> > > > > threads
> > > > > that
> > > > > are waiting (classic thundering-herd) - one will get the lock
> > > > > and
> > > > > the
> > > > > others go to sleep.
> > > > > When you have few cores, this is not very noticeable: by the
> > > > > time
> > > > > the
> > > > > 4th or 5th thread gets enough CPU time to try to claim the
> > > > > lock,
> > > > > the
> > > > > earlier threads have claimed it, done what was needed, and
> > > > > released.
> > > > > With 50+ cores, the contention can easily be measured.
> > > > > 
> > > > > This patchset creates a tree of pending lock request in which
> > > > > siblings
> > > > > don't conflict and each lock request does conflict with its
> > > > > parent.
> > > > > When a lock is released, only requests which don't conflict
> > > > > with
> > > > > each
> > > > > other a woken.
> > > > > 
> > > > > Testing shows that lock-acquisitions-per-second is now fairly
> > > > > stable even
> > > > > as number of contending process goes to 1000.  Without this
> > > > > patch,
> > > > > locks-per-second drops off steeply after a few 10s of
> > > > > processes.
> > > > > 
> > > > > There is a small cost to this extra complexity.
> > > > > At 20 processes running a particular test on 72 cores, the
> > > > > lock
> > > > > acquisitions per second drops from 1.8 million to 1.4 million
> > > > > with
> > > > > this patch.  For 100 processes, this patch still provides 1.4
> > > > > million
> > > > > while without this patch there are about 700,000.
> > > > > 
> > > > > NeilBrown
> > > > > 
> > > > > ---
> > > > > 
> > > > > NeilBrown (4):
> > > > >   fs/locks: rename some lists and pointers.
> > > > >   fs/locks: allow a lock request to block other requests.
> > > > >   fs/locks: change all *_conflict() functions to return
> > > > > bool.
> > > > >   fs/locks: create a tree of dependent requests.
> > > > > 
> > > > > 
> > > > >  fs/cifs/file.c  |2 -
> > > > >  fs/locks.c  |  142
> > > > > +--
> > > > >  include/linux/fs.h  |5 +
> > > > >  include/trace/events/filelock.h |   16 ++--
> > > > >  4 files changed, 103 insertions(+), 62 deletions(-)
> > > > > 
> > > > 
> > > > Nice work! I looked over this and I think it looks good.
> > > > 
> > > > I made an attempt to fix this issue several years ago, but my
> > > > method
> > > > sucked as it ended up penalizing the unlocking task too much.
> > > > This
> > > > is
> > > > much cleaner and should scale well overall, I think.
> > > 
> > > I think I also took a crack at this at one point while I was at
> > > UM/CITI
> > > and never got anything I was happy with.  Looks like good work!
> > > 
> > > I remember one main obstacle that I felt like I never had a good
> > > benchmark
> > > 
> > > How did you choose this workload and hardware?  Was it in fact
> > > udev
> > > (booting a large machine?), or was there some other motivation?
> > 
> > Some details can be found here:
> > 
> > https://github.com/systemd/systemd/pull/9551
> > 
> > https://github.com/sys

Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups

2018-08-20 Thread Martin Wilck
On Mon, 2018-08-20 at 16:02 -0400, J. Bruce Fields wrote:
> On Mon, Aug 20, 2018 at 01:02:21PM +0200, Martin Wilck wrote:
> > On Wed, 2018-08-08 at 14:29 -0400, J. Bruce Fields wrote:
> > > On Wed, Aug 08, 2018 at 12:47:22PM -0400, Jeff Layton wrote:
> > > > On Wed, 2018-08-08 at 11:51 +1000, NeilBrown wrote:
> > > > > If you have a many-core machine, and have many threads all
> > > > > wanting to
> > > > > briefly lock a give file (udev is known to do this), you can
> > > > > get
> > > > > quite
> > > > > poor performance.
> > > > > 
> > > > > When one thread releases a lock, it wakes up all other
> > > > > threads
> > > > > that
> > > > > are waiting (classic thundering-herd) - one will get the lock
> > > > > and
> > > > > the
> > > > > others go to sleep.
> > > > > When you have few cores, this is not very noticeable: by the
> > > > > time
> > > > > the
> > > > > 4th or 5th thread gets enough CPU time to try to claim the
> > > > > lock,
> > > > > the
> > > > > earlier threads have claimed it, done what was needed, and
> > > > > released.
> > > > > With 50+ cores, the contention can easily be measured.
> > > > > 
> > > > > This patchset creates a tree of pending lock request in which
> > > > > siblings
> > > > > don't conflict and each lock request does conflict with its
> > > > > parent.
> > > > > When a lock is released, only requests which don't conflict
> > > > > with
> > > > > each
> > > > > other a woken.
> > > > > 
> > > > > Testing shows that lock-acquisitions-per-second is now fairly
> > > > > stable even
> > > > > as number of contending process goes to 1000.  Without this
> > > > > patch,
> > > > > locks-per-second drops off steeply after a few 10s of
> > > > > processes.
> > > > > 
> > > > > There is a small cost to this extra complexity.
> > > > > At 20 processes running a particular test on 72 cores, the
> > > > > lock
> > > > > acquisitions per second drops from 1.8 million to 1.4 million
> > > > > with
> > > > > this patch.  For 100 processes, this patch still provides 1.4
> > > > > million
> > > > > while without this patch there are about 700,000.
> > > > > 
> > > > > NeilBrown
> > > > > 
> > > > > ---
> > > > > 
> > > > > NeilBrown (4):
> > > > >   fs/locks: rename some lists and pointers.
> > > > >   fs/locks: allow a lock request to block other requests.
> > > > >   fs/locks: change all *_conflict() functions to return
> > > > > bool.
> > > > >   fs/locks: create a tree of dependent requests.
> > > > > 
> > > > > 
> > > > >  fs/cifs/file.c  |2 -
> > > > >  fs/locks.c  |  142
> > > > > +--
> > > > >  include/linux/fs.h  |5 +
> > > > >  include/trace/events/filelock.h |   16 ++--
> > > > >  4 files changed, 103 insertions(+), 62 deletions(-)
> > > > > 
> > > > 
> > > > Nice work! I looked over this and I think it looks good.
> > > > 
> > > > I made an attempt to fix this issue several years ago, but my
> > > > method
> > > > sucked as it ended up penalizing the unlocking task too much.
> > > > This
> > > > is
> > > > much cleaner and should scale well overall, I think.
> > > 
> > > I think I also took a crack at this at one point while I was at
> > > UM/CITI
> > > and never got anything I was happy with.  Looks like good work!
> > > 
> > > I remember one main obstacle that I felt like I never had a good
> > > benchmark
> > > 
> > > How did you choose this workload and hardware?  Was it in fact
> > > udev
> > > (booting a large machine?), or was there some other motivation?
> > 
> > Some details can be found here:
> > 
> > https://github.com/systemd/systemd/pull/9551
> > 
> > https://github.com/sys

Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups

2018-08-20 Thread Martin Wilck
On Wed, 2018-08-08 at 14:29 -0400, J. Bruce Fields wrote:
> On Wed, Aug 08, 2018 at 12:47:22PM -0400, Jeff Layton wrote:
> > On Wed, 2018-08-08 at 11:51 +1000, NeilBrown wrote:
> > > If you have a many-core machine, and have many threads all
> > > wanting to
> > > briefly lock a give file (udev is known to do this), you can get
> > > quite
> > > poor performance.
> > > 
> > > When one thread releases a lock, it wakes up all other threads
> > > that
> > > are waiting (classic thundering-herd) - one will get the lock and
> > > the
> > > others go to sleep.
> > > When you have few cores, this is not very noticeable: by the time
> > > the
> > > 4th or 5th thread gets enough CPU time to try to claim the lock,
> > > the
> > > earlier threads have claimed it, done what was needed, and
> > > released.
> > > With 50+ cores, the contention can easily be measured.
> > > 
> > > This patchset creates a tree of pending lock request in which
> > > siblings
> > > don't conflict and each lock request does conflict with its
> > > parent.
> > > When a lock is released, only requests which don't conflict with
> > > each
> > > other a woken.
> > > 
> > > Testing shows that lock-acquisitions-per-second is now fairly
> > > stable even
> > > as number of contending process goes to 1000.  Without this
> > > patch,
> > > locks-per-second drops off steeply after a few 10s of processes.
> > > 
> > > There is a small cost to this extra complexity.
> > > At 20 processes running a particular test on 72 cores, the lock
> > > acquisitions per second drops from 1.8 million to 1.4 million
> > > with
> > > this patch.  For 100 processes, this patch still provides 1.4
> > > million
> > > while without this patch there are about 700,000.
> > > 
> > > NeilBrown
> > > 
> > > ---
> > > 
> > > NeilBrown (4):
> > >   fs/locks: rename some lists and pointers.
> > >   fs/locks: allow a lock request to block other requests.
> > >   fs/locks: change all *_conflict() functions to return bool.
> > >   fs/locks: create a tree of dependent requests.
> > > 
> > > 
> > >  fs/cifs/file.c  |2 -
> > >  fs/locks.c  |  142
> > > +--
> > >  include/linux/fs.h  |5 +
> > >  include/trace/events/filelock.h |   16 ++--
> > >  4 files changed, 103 insertions(+), 62 deletions(-)
> > > 
> > 
> > Nice work! I looked over this and I think it looks good.
> > 
> > I made an attempt to fix this issue several years ago, but my
> > method
> > sucked as it ended up penalizing the unlocking task too much. This
> > is
> > much cleaner and should scale well overall, I think.
> 
> I think I also took a crack at this at one point while I was at
> UM/CITI
> and never got anything I was happy with.  Looks like good work!
> 
> I remember one main obstacle that I felt like I never had a good
> benchmark
> 
> How did you choose this workload and hardware?  Was it in fact udev
> (booting a large machine?), or was there some other motivation?

Some details can be found here:

https://github.com/systemd/systemd/pull/9551

https://github.com/systemd/systemd/pull/8667#issuecomment-385520335
and comments further down. 8667 has been superseded by 9551.

The original problem was that the symlink "/dev/disk/by-
partlabel/primary" may be claimed by _many_ devices on big systems
under certain distributions, which use older versions of parted for
partition creation on GPT disk labels. I've seen systems with literally
thousands of contenders for this symlink. 

We found that with current systemd, this can cause a boot-time race
where the wrong device is eventually assigned the "best" contender for
the symlink (e.g. a partition on multipath member rather than a
partition on the multipath map itself). I extended the udev test suite,
creating a test that makes this race easily reproducible, at least on
systems with many CPUs (the test host I used most had 72 cores).

I created an udev patch that would use systemd's built in fcntl-based
locking to avoid this race, but I found that it would massively slow
down the system, and found the contention to be in the spin locks in
posix_lock_common(). (I therefore added more the systemd patches to
make the locking scale better, but that's irrelevant for the kernel-
side discussion).

I further created an artificial test just for the scaling of
fcntl(F_OFD_SETLKW) and flock(), with which I could reproduce the
scaling problem easily, and do some quantitive experiments. My tests
didn't use any byte ranges, only "full" locking of 0-byte files.

> Not that I'm likely to do it any time soon, but could you share
> sufficient details for someone else to reproduce your results?
> 
> --b.

The udev test code can be found in the above links. It adds a new
script "test/sd-script.py" that would be run after "test/sys-script.py" 
using a numeric argument indicating the number of contenders for the
test link to be created, such as "python test/sd-script.py test 1000".

Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups

2018-08-20 Thread Martin Wilck
On Wed, 2018-08-08 at 14:29 -0400, J. Bruce Fields wrote:
> On Wed, Aug 08, 2018 at 12:47:22PM -0400, Jeff Layton wrote:
> > On Wed, 2018-08-08 at 11:51 +1000, NeilBrown wrote:
> > > If you have a many-core machine, and have many threads all
> > > wanting to
> > > briefly lock a give file (udev is known to do this), you can get
> > > quite
> > > poor performance.
> > > 
> > > When one thread releases a lock, it wakes up all other threads
> > > that
> > > are waiting (classic thundering-herd) - one will get the lock and
> > > the
> > > others go to sleep.
> > > When you have few cores, this is not very noticeable: by the time
> > > the
> > > 4th or 5th thread gets enough CPU time to try to claim the lock,
> > > the
> > > earlier threads have claimed it, done what was needed, and
> > > released.
> > > With 50+ cores, the contention can easily be measured.
> > > 
> > > This patchset creates a tree of pending lock request in which
> > > siblings
> > > don't conflict and each lock request does conflict with its
> > > parent.
> > > When a lock is released, only requests which don't conflict with
> > > each
> > > other a woken.
> > > 
> > > Testing shows that lock-acquisitions-per-second is now fairly
> > > stable even
> > > as number of contending process goes to 1000.  Without this
> > > patch,
> > > locks-per-second drops off steeply after a few 10s of processes.
> > > 
> > > There is a small cost to this extra complexity.
> > > At 20 processes running a particular test on 72 cores, the lock
> > > acquisitions per second drops from 1.8 million to 1.4 million
> > > with
> > > this patch.  For 100 processes, this patch still provides 1.4
> > > million
> > > while without this patch there are about 700,000.
> > > 
> > > NeilBrown
> > > 
> > > ---
> > > 
> > > NeilBrown (4):
> > >   fs/locks: rename some lists and pointers.
> > >   fs/locks: allow a lock request to block other requests.
> > >   fs/locks: change all *_conflict() functions to return bool.
> > >   fs/locks: create a tree of dependent requests.
> > > 
> > > 
> > >  fs/cifs/file.c  |2 -
> > >  fs/locks.c  |  142
> > > +--
> > >  include/linux/fs.h  |5 +
> > >  include/trace/events/filelock.h |   16 ++--
> > >  4 files changed, 103 insertions(+), 62 deletions(-)
> > > 
> > 
> > Nice work! I looked over this and I think it looks good.
> > 
> > I made an attempt to fix this issue several years ago, but my
> > method
> > sucked as it ended up penalizing the unlocking task too much. This
> > is
> > much cleaner and should scale well overall, I think.
> 
> I think I also took a crack at this at one point while I was at
> UM/CITI
> and never got anything I was happy with.  Looks like good work!
> 
> I remember one main obstacle that I felt like I never had a good
> benchmark
> 
> How did you choose this workload and hardware?  Was it in fact udev
> (booting a large machine?), or was there some other motivation?

Some details can be found here:

https://github.com/systemd/systemd/pull/9551

https://github.com/systemd/systemd/pull/8667#issuecomment-385520335
and comments further down. 8667 has been superseded by 9551.

The original problem was that the symlink "/dev/disk/by-
partlabel/primary" may be claimed by _many_ devices on big systems
under certain distributions, which use older versions of parted for
partition creation on GPT disk labels. I've seen systems with literally
thousands of contenders for this symlink. 

We found that with current systemd, this can cause a boot-time race
where the wrong device is eventually assigned the "best" contender for
the symlink (e.g. a partition on multipath member rather than a
partition on the multipath map itself). I extended the udev test suite,
creating a test that makes this race easily reproducible, at least on
systems with many CPUs (the test host I used most had 72 cores).

I created an udev patch that would use systemd's built in fcntl-based
locking to avoid this race, but I found that it would massively slow
down the system, and found the contention to be in the spin locks in
posix_lock_common(). (I therefore added more the systemd patches to
make the locking scale better, but that's irrelevant for the kernel-
side discussion).

I further created an artificial test just for the scaling of
fcntl(F_OFD_SETLKW) and flock(), with which I could reproduce the
scaling problem easily, and do some quantitive experiments. My tests
didn't use any byte ranges, only "full" locking of 0-byte files.

> Not that I'm likely to do it any time soon, but could you share
> sufficient details for someone else to reproduce your results?
> 
> --b.

The udev test code can be found in the above links. It adds a new
script "test/sd-script.py" that would be run after "test/sys-script.py" 
using a numeric argument indicating the number of contenders for the
test link to be created, such as "python test/sd-script.py test 1000".

Re: [PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND

2018-04-26 Thread Martin Wilck
On Fri, 2018-04-20 at 19:15 -0400, Martin K. Petersen wrote:
> 
> Much better, thanks for reworking this. Applied to 4.18/scsi-queue.

Thank you!

By the way, I've been wondering whether declaring blist_flags_t
__bitwise was a wise decision. blist_flags_t is kernel-internal, thus
endianness doesn't matter. OTOH, using __bitwise requires explicit
casts in many places, which may suppress warnings about integer size
mismatch and made me overlook some places where I had to change
"unsigned long" to "unsigned long long" in the first place
(in the submitted and applied version I think I caught them all).

Regards,
Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND

2018-04-26 Thread Martin Wilck
On Fri, 2018-04-20 at 19:15 -0400, Martin K. Petersen wrote:
> 
> Much better, thanks for reworking this. Applied to 4.18/scsi-queue.

Thank you!

By the way, I've been wondering whether declaring blist_flags_t
__bitwise was a wise decision. blist_flags_t is kernel-internal, thus
endianness doesn't matter. OTOH, using __bitwise requires explicit
casts in many places, which may suppress warnings about integer size
mismatch and made me overlook some places where I had to change
"unsigned long" to "unsigned long long" in the first place
(in the submitted and applied version I think I caught them all).

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-18 Thread Martin Wilck
On Tue, 2018-04-17 at 17:07 -0700, Linus Torvalds wrote:
> On Tue, Apr 17, 2018 at 4:35 PM, Martin Wilck <mwi...@suse.com>
> wrote:
> > Sparse emits errors about ilog2() in array indices because of the
> > use of
> > __ilog2_32() and __ilog2_64(),
> 
> If sparse warns about it, then presumably gcc with -Wvla warns about
> it too?

No, it doesn't (gcc 7.3.0). -> https://paste.opensuse.org/27471594
It doesn't even warn on an expression like this:

  #define SIZE (1<<10)
  static int foo[ilog2(SIZE)];

sparse 0.5.2 doesn't warn about that either. It emits "error: bad
integer constant expression" only if ilog2 is used in an array
initializer, like this:

  #define SIZE (1<<10)
  #define SUBS (1<<5)
  static int foo [ilog2(SIZE)] = {
  [ilog2(SUBS)] = 0,
  };

So maybe I was wrong, and this is actually a false positive in sparse.

> So I suspect that what you'd want is
> 
>   #define ilog2(n) \
> __builtin_choose_expr(__is_constexpr(n), \
> const_ilog2(n), \
> __builtin_choose_expr(sizeof(n) <= 4, \
> __ilog2_u32(n), \
> __ilog2_u64(n)))
> 
> or something. Hmm?

Do you want me to convert the patch to your approach anyway?
Or should I throw this away and report to sparse?

Regards and thanks,
Martin

PS: apologies to all recipients for the broken cc list in my post.
-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-18 Thread Martin Wilck
On Tue, 2018-04-17 at 17:07 -0700, Linus Torvalds wrote:
> On Tue, Apr 17, 2018 at 4:35 PM, Martin Wilck 
> wrote:
> > Sparse emits errors about ilog2() in array indices because of the
> > use of
> > __ilog2_32() and __ilog2_64(),
> 
> If sparse warns about it, then presumably gcc with -Wvla warns about
> it too?

No, it doesn't (gcc 7.3.0). -> https://paste.opensuse.org/27471594
It doesn't even warn on an expression like this:

  #define SIZE (1<<10)
  static int foo[ilog2(SIZE)];

sparse 0.5.2 doesn't warn about that either. It emits "error: bad
integer constant expression" only if ilog2 is used in an array
initializer, like this:

  #define SIZE (1<<10)
  #define SUBS (1<<5)
  static int foo [ilog2(SIZE)] = {
  [ilog2(SUBS)] = 0,
  };

So maybe I was wrong, and this is actually a false positive in sparse.

> So I suspect that what you'd want is
> 
>   #define ilog2(n) \
> __builtin_choose_expr(__is_constexpr(n), \
> const_ilog2(n), \
> __builtin_choose_expr(sizeof(n) <= 4, \
> __ilog2_u32(n), \
> __ilog2_u64(n)))
> 
> or something. Hmm?

Do you want me to convert the patch to your approach anyway?
Or should I throw this away and report to sparse?

Regards and thanks,
Martin

PS: apologies to all recipients for the broken cc list in my post.
-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH v3 6/6] scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

2018-04-17 Thread Martin Wilck
On Fujitsu ETERNUS systems, sense code ABORTED COMMAND with ASC/Q C1/01 is
used to indicate temporary condition where the storage-internal path to a
target is switched from one controller to another. SCSI commands that
return with this error code must be retried unconditionally (i.e. without
the "maybe_retry" logic in scsi_decide_disposition); otherwise dm-multipath
might initiate a failover from a healthy path e.g. for REQ_FAILFAST_DEV
commands.

Introduce a new blist flag for this case.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/scsi_devinfo.c | 1 +
 drivers/scsi/scsi_error.c   | 3 +++
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index f7b94c1..e75a50f 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -168,6 +168,7 @@ static struct {
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
{"FSC", "CentricStor", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+   {"FUJITSU", "ETERNUS_DXM", "*", BLIST_RETRY_ASC_C1},
{"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN | BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1dee91f..896b991 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -527,6 +527,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 
if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
return ADD_TO_MLQUEUE;
+   if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
+   sdev->sdev_bflags & BLIST_RETRY_ASC_C1)
+   return ADD_TO_MLQUEUE;
 
return NEEDS_RETRY;
case NOT_READY:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 91a327e..3fdb322 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -65,8 +65,10 @@
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 /* Always retry ABORTED_COMMAND with Internal Target Failure */
 #define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
+/* Always retry ABORTED_COMMAND with ASC 0xc1 */
+#define BLIST_RETRY_ASC_C1 ((__force blist_flags_t)(1ULL << 33))
 
-#define __BLIST_LAST_USED BLIST_RETRY_ITF
+#define __BLIST_LAST_USED BLIST_RETRY_ASC_C1
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1



[PATCH v3 6/6] scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

2018-04-17 Thread Martin Wilck
On Fujitsu ETERNUS systems, sense code ABORTED COMMAND with ASC/Q C1/01 is
used to indicate temporary condition where the storage-internal path to a
target is switched from one controller to another. SCSI commands that
return with this error code must be retried unconditionally (i.e. without
the "maybe_retry" logic in scsi_decide_disposition); otherwise dm-multipath
might initiate a failover from a healthy path e.g. for REQ_FAILFAST_DEV
commands.

Introduce a new blist flag for this case.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 1 +
 drivers/scsi/scsi_error.c   | 3 +++
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index f7b94c1..e75a50f 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -168,6 +168,7 @@ static struct {
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
{"FSC", "CentricStor", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
+   {"FUJITSU", "ETERNUS_DXM", "*", BLIST_RETRY_ASC_C1},
{"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN | BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
{"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN | 
BLIST_INQUIRY_36},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1dee91f..896b991 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -527,6 +527,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
 
if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
return ADD_TO_MLQUEUE;
+   if (sshdr.asc == 0xc1 && sshdr.ascq == 0x01 &&
+   sdev->sdev_bflags & BLIST_RETRY_ASC_C1)
+   return ADD_TO_MLQUEUE;
 
return NEEDS_RETRY;
case NOT_READY:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 91a327e..3fdb322 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -65,8 +65,10 @@
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 /* Always retry ABORTED_COMMAND with Internal Target Failure */
 #define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
+/* Always retry ABORTED_COMMAND with ASC 0xc1 */
+#define BLIST_RETRY_ASC_C1 ((__force blist_flags_t)(1ULL << 33))
 
-#define __BLIST_LAST_USED BLIST_RETRY_ITF
+#define __BLIST_LAST_USED BLIST_RETRY_ASC_C1
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1



[PATCH v3 4/6] scsi: devinfo: warn on undefined blist flags

2018-04-17 Thread Martin Wilck
Warn if a device (or the user) sets blist flags which are unknown
or have been removed. This should enable us to reuse freed blist
bits in later releases.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_devinfo.c |  6 ++
 include/scsi/scsi_devinfo.h | 21 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index d5135ef..fd901a5 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -190,7 +190,7 @@ $(obj)/53c700.o $(MODVERDIR)/$(obj)/53c700.ver: 
$(obj)/53c700_d.h
 $(obj)/scsi_sysfs.o: $(obj)/scsi_devinfo_tbl.c
 
 quiet_cmd_bflags = GEN $@
-   cmd_bflags = sed -n 's/.*BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
+   cmd_bflags = sed -n 's/.*define *BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
 
 $(obj)/scsi_devinfo_tbl.c: include/scsi/scsi_devinfo.h
$(call if_changed,bflags)
diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index fc6b755..c05843a 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -371,6 +371,12 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
}
flags = (__force blist_flags_t)val;
}
+   if (flags & __BLIST_UNUSED_MASK) {
+   pr_err("scsi_devinfo (%s:%s): unsupported flags 0x%llx",
+  vendor, model, flags & __BLIST_UNUSED_MASK);
+   kfree(devinfo);
+   return -EINVAL;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index e206d29..3434e14 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -28,8 +28,13 @@
 #define BLIST_LARGELUN ((__force blist_flags_t)(1ULL << 9))
 /* override additional length field */
 #define BLIST_INQUIRY_36   ((__force blist_flags_t)(1ULL << 10))
+#define __BLIST_UNUSED_11  ((__force blist_flags_t)(1ULL << 11))
 /* do not do automatic start on add */
 #define BLIST_NOSTARTONADD ((__force blist_flags_t)(1ULL << 12))
+#define __BLIST_UNUSED_13  ((__force blist_flags_t)(1ULL << 13))
+#define __BLIST_UNUSED_14  ((__force blist_flags_t)(1ULL << 14))
+#define __BLIST_UNUSED_15  ((__force blist_flags_t)(1ULL << 15))
+#define __BLIST_UNUSED_16  ((__force blist_flags_t)(1ULL << 16))
 /* try REPORT_LUNS even for SCSI-2 devs (if HBA supports more than 8 LUNs) */
 #define BLIST_REPORTLUN2   ((__force blist_flags_t)(1ULL << 17))
 /* don't try REPORT_LUNS scan (SCSI-3 devs) */
@@ -44,10 +49,12 @@
 #define BLIST_RETRY_HWERROR((__force blist_flags_t)(1ULL << 22))
 /* maximum 512 sector cdb length */
 #define BLIST_MAX_512  ((__force blist_flags_t)(1ULL << 23))
+#define __BLIST_UNUSED_24  ((__force blist_flags_t)(1ULL << 24))
 /* Disable T10 PI (DIF) */
 #define BLIST_NO_DIF   ((__force blist_flags_t)(1ULL << 25))
 /* Ignore SBC-3 VPD pages */
 #define BLIST_SKIP_VPD_PAGES   ((__force blist_flags_t)(1ULL << 26))
+#define __BLIST_UNUSED_27  ((__force blist_flags_t)(1ULL << 27))
 /* Attempt to read VPD pages */
 #define BLIST_TRY_VPD_PAGES((__force blist_flags_t)(1ULL << 28))
 /* don't try to issue RSOC */
@@ -57,4 +64,18 @@
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 
+#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+
+#define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
+  (__force blist_flags_t) \
+  ((__force __u64)__BLIST_LAST_USED - 1ULL)))
+#define __BLIST_UNUSED_MASK (__BLIST_UNUSED_11 | \
+__BLIST_UNUSED_13 | \
+__BLIST_UNUSED_14 | \
+__BLIST_UNUSED_15 | \
+__BLIST_UNUSED_16 | \
+__BLIST_UNUSED_24 | \
+__BLIST_UNUSED_27 | \
+__BLIST_HIGH_UNUSED)
+
 #endif
-- 
2.16.1



[PATCH v3 4/6] scsi: devinfo: warn on undefined blist flags

2018-04-17 Thread Martin Wilck
Warn if a device (or the user) sets blist flags which are unknown
or have been removed. This should enable us to reuse freed blist
bits in later releases.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_devinfo.c |  6 ++
 include/scsi/scsi_devinfo.h | 21 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
index d5135ef..fd901a5 100644
--- a/drivers/scsi/Makefile
+++ b/drivers/scsi/Makefile
@@ -190,7 +190,7 @@ $(obj)/53c700.o $(MODVERDIR)/$(obj)/53c700.ver: 
$(obj)/53c700_d.h
 $(obj)/scsi_sysfs.o: $(obj)/scsi_devinfo_tbl.c
 
 quiet_cmd_bflags = GEN $@
-   cmd_bflags = sed -n 's/.*BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
+   cmd_bflags = sed -n 's/.*define *BLIST_\([A-Z0-9_]*\) 
*.*/BLIST_FLAG_NAME(\1),/p' $< > $@
 
 $(obj)/scsi_devinfo_tbl.c: include/scsi/scsi_devinfo.h
$(call if_changed,bflags)
diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index fc6b755..c05843a 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -371,6 +371,12 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
}
flags = (__force blist_flags_t)val;
}
+   if (flags & __BLIST_UNUSED_MASK) {
+   pr_err("scsi_devinfo (%s:%s): unsupported flags 0x%llx",
+  vendor, model, flags & __BLIST_UNUSED_MASK);
+   kfree(devinfo);
+   return -EINVAL;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index e206d29..3434e14 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -28,8 +28,13 @@
 #define BLIST_LARGELUN ((__force blist_flags_t)(1ULL << 9))
 /* override additional length field */
 #define BLIST_INQUIRY_36   ((__force blist_flags_t)(1ULL << 10))
+#define __BLIST_UNUSED_11  ((__force blist_flags_t)(1ULL << 11))
 /* do not do automatic start on add */
 #define BLIST_NOSTARTONADD ((__force blist_flags_t)(1ULL << 12))
+#define __BLIST_UNUSED_13  ((__force blist_flags_t)(1ULL << 13))
+#define __BLIST_UNUSED_14  ((__force blist_flags_t)(1ULL << 14))
+#define __BLIST_UNUSED_15  ((__force blist_flags_t)(1ULL << 15))
+#define __BLIST_UNUSED_16  ((__force blist_flags_t)(1ULL << 16))
 /* try REPORT_LUNS even for SCSI-2 devs (if HBA supports more than 8 LUNs) */
 #define BLIST_REPORTLUN2   ((__force blist_flags_t)(1ULL << 17))
 /* don't try REPORT_LUNS scan (SCSI-3 devs) */
@@ -44,10 +49,12 @@
 #define BLIST_RETRY_HWERROR((__force blist_flags_t)(1ULL << 22))
 /* maximum 512 sector cdb length */
 #define BLIST_MAX_512  ((__force blist_flags_t)(1ULL << 23))
+#define __BLIST_UNUSED_24  ((__force blist_flags_t)(1ULL << 24))
 /* Disable T10 PI (DIF) */
 #define BLIST_NO_DIF   ((__force blist_flags_t)(1ULL << 25))
 /* Ignore SBC-3 VPD pages */
 #define BLIST_SKIP_VPD_PAGES   ((__force blist_flags_t)(1ULL << 26))
+#define __BLIST_UNUSED_27  ((__force blist_flags_t)(1ULL << 27))
 /* Attempt to read VPD pages */
 #define BLIST_TRY_VPD_PAGES((__force blist_flags_t)(1ULL << 28))
 /* don't try to issue RSOC */
@@ -57,4 +64,18 @@
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
 
+#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+
+#define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
+  (__force blist_flags_t) \
+  ((__force __u64)__BLIST_LAST_USED - 1ULL)))
+#define __BLIST_UNUSED_MASK (__BLIST_UNUSED_11 | \
+__BLIST_UNUSED_13 | \
+__BLIST_UNUSED_14 | \
+__BLIST_UNUSED_15 | \
+__BLIST_UNUSED_16 | \
+__BLIST_UNUSED_24 | \
+__BLIST_UNUSED_27 | \
+__BLIST_HIGH_UNUSED)
+
 #endif
-- 
2.16.1



[PATCH v3 5/6] scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix

2018-04-17 Thread Martin Wilck
EMC Symmetrix returns 'internal target error' for a variety
of conditions, most of which will be transient.
So we should always retry it, even with failfast set.
Otherwise we'd get spurious path flaps with multipath.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/scsi_devinfo.c | 3 ++-
 drivers/scsi/scsi_error.c   | 4 
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index c05843a..f7b94c1 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -161,7 +161,8 @@ static struct {
{"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, storage on LUN 
0 */
{"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, no storage on 
LUN 0 */
{"EMC",  "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | 
BLIST_REPORTLUN2},
+   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN |
+BLIST_REPORTLUN2 | BLIST_RETRY_ITF},
{"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
{"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ac3b1c3..1dee91f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "scsi_priv.h"
@@ -524,6 +525,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
if (sshdr.asc == 0x10) /* DIF */
return SUCCESS;
 
+   if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
+   return ADD_TO_MLQUEUE;
+
return NEEDS_RETRY;
case NOT_READY:
case UNIT_ATTENTION:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 3434e14..91a327e 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -63,8 +63,10 @@
 #define BLIST_MAX_1024 ((__force blist_flags_t)(1ULL << 30))
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
+/* Always retry ABORTED_COMMAND with Internal Target Failure */
+#define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
 
-#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+#define __BLIST_LAST_USED BLIST_RETRY_ITF
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1



[PATCH v3 5/6] scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix

2018-04-17 Thread Martin Wilck
EMC Symmetrix returns 'internal target error' for a variety
of conditions, most of which will be transient.
So we should always retry it, even with failfast set.
Otherwise we'd get spurious path flaps with multipath.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 3 ++-
 drivers/scsi/scsi_error.c   | 4 
 include/scsi/scsi_devinfo.h | 4 +++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index c05843a..f7b94c1 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -161,7 +161,8 @@ static struct {
{"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, storage on LUN 
0 */
{"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* EMC CLARiiON, no storage on 
LUN 0 */
{"EMC",  "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
-   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | 
BLIST_REPORTLUN2},
+   {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN |
+BLIST_REPORTLUN2 | BLIST_RETRY_ITF},
{"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
{"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
{"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ac3b1c3..1dee91f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "scsi_priv.h"
@@ -524,6 +525,9 @@ int scsi_check_sense(struct scsi_cmnd *scmd)
if (sshdr.asc == 0x10) /* DIF */
return SUCCESS;
 
+   if (sshdr.asc == 0x44 && sdev->sdev_bflags & BLIST_RETRY_ITF)
+   return ADD_TO_MLQUEUE;
+
return NEEDS_RETRY;
case NOT_READY:
case UNIT_ATTENTION:
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 3434e14..91a327e 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -63,8 +63,10 @@
 #define BLIST_MAX_1024 ((__force blist_flags_t)(1ULL << 30))
 /* Use UNMAP limit for WRITE SAME */
 #define BLIST_UNMAP_LIMIT_WS   ((__force blist_flags_t)(1ULL << 31))
+/* Always retry ABORTED_COMMAND with Internal Target Failure */
+#define BLIST_RETRY_ITF((__force blist_flags_t)(1ULL << 32))
 
-#define __BLIST_LAST_USED BLIST_UNMAP_LIMIT_WS
+#define __BLIST_LAST_USED BLIST_RETRY_ITF
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
   (__force blist_flags_t) \
-- 
2.16.1



[PATCH v3 2/6] scsi: use const_ilog2 for array indices

2018-04-17 Thread Martin Wilck
Use the just introduced const_ilog2() macro to avoid sparse
errors.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/scsi_debugfs.c | 2 +-
 drivers/scsi/scsi_sysfs.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index b784002..c5a8756 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,7 +4,7 @@
 #include 
 #include "scsi_debugfs.h"
 
-#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+#define SCSI_CMD_FLAG_NAME(name)[const_ilog2(SCMD_##name)] = #name
 static const char *const scsi_cmd_flags[] = {
SCSI_CMD_FLAG_NAME(TAGGED),
SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e56a4ac..feacd7a 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
-- 
2.16.1



[PATCH v3 3/6] scsi: devinfo: change blist_flag_t to 64bit

2018-04-17 Thread Martin Wilck
Space for SCSI blist flags is gradually running out. Change the
type to __u64. And fix a checkpatch complaint about symbolic
mode flags in scsi_devinfo.c.

Make checkpatch happy by replacing simple_strtoul() with kstrtoull().

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/scsi_devinfo.c | 18 +++-
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 50 ++---
 4 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index e5370d7..fc6b755 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -361,8 +361,16 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
scsi_strcpy_devinfo("model", devinfo->model, sizeof(devinfo->model),
model, compatible);
 
-   if (strflags)
-   flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 
0);
+   if (strflags) {
+   unsigned long long val;
+   int ret = kstrtoull(strflags, 0, );
+
+   if (ret != 0) {
+   kfree(devinfo);
+   return ret;
+   }
+   flags = (__force blist_flags_t)val;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
@@ -615,7 +623,7 @@ static int devinfo_seq_show(struct seq_file *m, void *v)
devinfo_table->name)
seq_printf(m, "[%s]:\n", devinfo_table->name);
 
-   seq_printf(m, "'%.8s' '%.16s' 0x%x\n",
+   seq_printf(m, "'%.8s' '%.16s' 0x%llx\n",
   devinfo->vendor, devinfo->model, devinfo->flags);
return 0;
 }
@@ -734,9 +742,9 @@ MODULE_PARM_DESC(dev_flags,
 " list entries for vendor and model with an integer value of flags"
 " to the scsi device info list");
 
-module_param_named(default_dev_flags, scsi_default_dev_flags, int, 
S_IRUGO|S_IWUSR);
+module_param_named(default_dev_flags, scsi_default_dev_flags, ullong, 0644);
 MODULE_PARM_DESC(default_dev_flags,
-"scsi default device flag integer value");
+"scsi default device flag uint64_t value");
 
 /**
  * scsi_exit_devinfo - remove /proc/scsi/device_info & the scsi_dev_info_list
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index feacd7a..43c27ce 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force __u64)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c..4c36af6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -15,7 +15,7 @@ struct scsi_cmnd;
 struct scsi_lun;
 struct scsi_sense_hdr;
 
-typedef unsigned int __bitwise blist_flags_t;
+typedef __u64 __bitwise blist_flags_t;
 
 struct scsi_mode_data {
__u32   length;
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index ea67c32..e206d29 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -6,55 +6,55 @@
  */
 
 /* Only scan LUN 0 */
-#define BLIST_NOLUN((__force blist_flags_t)(1 << 0))
+#define BLIST_NOLUN((__force blist_flags_t)(1ULL << 0))
 /* Known to have LUNs, force scanning.
  * DEPRECATED: Use max_luns=N */
-#define BLIST_FORCELUN ((__force blist_flags_t)(1 << 1))
+#define BLIST_FORCELUN ((__force blist_flags_t)(1ULL << 1))
 /* Flag for broken handshaking */
-#define BLIST_BORKEN   ((__force blist_flags_t)(1 << 2))
+#define BLIST_BORKEN   ((__force blist_flags_t)(1ULL << 2))
 /* unlock by special command */
-#define BLIST_KEY  ((__force blist_flags_t)(1 << 3))
+#define BLIST_KEY  ((__force blist_flags_t)(1ULL << 3))
 /* Do not use LUNs in parallel */
-#define BLIST_SINGLELUN((__force blist_flags_t)(1 << 4))
+#define BLIST_SINGLELUN((__force blist_flags_t)(1ULL << 4))
 /* Buggy Tagged Command Queuing */
-#define BLIST_NOTQ ((__force blist_flags_t)(1 << 5))
+#define BLIST_NOTQ ((__force blist_flags_t)(1ULL << 5))
 /* Non consecutive LUN numbering */
-#define BLIST_SPARSELUN((__force blist_flags_t)(1 << 6))
+#define BLIST_SPARSELUN((__force blist_flags_t)(1ULL << 6))
 /

[PATCH v3 2/6] scsi: use const_ilog2 for array indices

2018-04-17 Thread Martin Wilck
Use the just introduced const_ilog2() macro to avoid sparse
errors.

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_debugfs.c | 2 +-
 drivers/scsi/scsi_sysfs.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c
index b784002..c5a8756 100644
--- a/drivers/scsi/scsi_debugfs.c
+++ b/drivers/scsi/scsi_debugfs.c
@@ -4,7 +4,7 @@
 #include 
 #include "scsi_debugfs.h"
 
-#define SCSI_CMD_FLAG_NAME(name) [ilog2(SCMD_##name)] = #name
+#define SCSI_CMD_FLAG_NAME(name)[const_ilog2(SCMD_##name)] = #name
 static const char *const scsi_cmd_flags[] = {
SCSI_CMD_FLAG_NAME(TAGGED),
SCSI_CMD_FLAG_NAME(UNCHECKED_ISA_DMA),
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e56a4ac..feacd7a 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
-- 
2.16.1



[PATCH v3 3/6] scsi: devinfo: change blist_flag_t to 64bit

2018-04-17 Thread Martin Wilck
Space for SCSI blist flags is gradually running out. Change the
type to __u64. And fix a checkpatch complaint about symbolic
mode flags in scsi_devinfo.c.

Make checkpatch happy by replacing simple_strtoul() with kstrtoull().

Signed-off-by: Martin Wilck 
---
 drivers/scsi/scsi_devinfo.c | 18 +++-
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 50 ++---
 4 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index e5370d7..fc6b755 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -361,8 +361,16 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
scsi_strcpy_devinfo("model", devinfo->model, sizeof(devinfo->model),
model, compatible);
 
-   if (strflags)
-   flags = (__force blist_flags_t)simple_strtoul(strflags, NULL, 
0);
+   if (strflags) {
+   unsigned long long val;
+   int ret = kstrtoull(strflags, 0, );
+
+   if (ret != 0) {
+   kfree(devinfo);
+   return ret;
+   }
+   flags = (__force blist_flags_t)val;
+   }
devinfo->flags = flags;
devinfo->compatible = compatible;
 
@@ -615,7 +623,7 @@ static int devinfo_seq_show(struct seq_file *m, void *v)
devinfo_table->name)
seq_printf(m, "[%s]:\n", devinfo_table->name);
 
-   seq_printf(m, "'%.8s' '%.16s' 0x%x\n",
+   seq_printf(m, "'%.8s' '%.16s' 0x%llx\n",
   devinfo->vendor, devinfo->model, devinfo->flags);
return 0;
 }
@@ -734,9 +742,9 @@ MODULE_PARM_DESC(dev_flags,
 " list entries for vendor and model with an integer value of flags"
 " to the scsi device info list");
 
-module_param_named(default_dev_flags, scsi_default_dev_flags, int, 
S_IRUGO|S_IWUSR);
+module_param_named(default_dev_flags, scsi_default_dev_flags, ullong, 0644);
 MODULE_PARM_DESC(default_dev_flags,
-"scsi default device flag integer value");
+"scsi default device flag uint64_t value");
 
 /**
  * scsi_exit_devinfo - remove /proc/scsi/device_info & the scsi_dev_info_list
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index feacd7a..43c27ce 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -968,7 +968,7 @@ sdev_show_wwid(struct device *dev, struct device_attribute 
*attr,
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
 #define BLIST_FLAG_NAME(name)  \
-   [const_ilog2((__force unsigned int)BLIST_##name)] = #name
+   [const_ilog2((__force __u64)BLIST_##name)] = #name
 static const char *const sdev_bflags_name[] = {
 #include "scsi_devinfo_tbl.c"
 };
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 7ae177c..4c36af6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -15,7 +15,7 @@ struct scsi_cmnd;
 struct scsi_lun;
 struct scsi_sense_hdr;
 
-typedef unsigned int __bitwise blist_flags_t;
+typedef __u64 __bitwise blist_flags_t;
 
 struct scsi_mode_data {
__u32   length;
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index ea67c32..e206d29 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -6,55 +6,55 @@
  */
 
 /* Only scan LUN 0 */
-#define BLIST_NOLUN((__force blist_flags_t)(1 << 0))
+#define BLIST_NOLUN((__force blist_flags_t)(1ULL << 0))
 /* Known to have LUNs, force scanning.
  * DEPRECATED: Use max_luns=N */
-#define BLIST_FORCELUN ((__force blist_flags_t)(1 << 1))
+#define BLIST_FORCELUN ((__force blist_flags_t)(1ULL << 1))
 /* Flag for broken handshaking */
-#define BLIST_BORKEN   ((__force blist_flags_t)(1 << 2))
+#define BLIST_BORKEN   ((__force blist_flags_t)(1ULL << 2))
 /* unlock by special command */
-#define BLIST_KEY  ((__force blist_flags_t)(1 << 3))
+#define BLIST_KEY  ((__force blist_flags_t)(1ULL << 3))
 /* Do not use LUNs in parallel */
-#define BLIST_SINGLELUN((__force blist_flags_t)(1 << 4))
+#define BLIST_SINGLELUN((__force blist_flags_t)(1ULL << 4))
 /* Buggy Tagged Command Queuing */
-#define BLIST_NOTQ ((__force blist_flags_t)(1 << 5))
+#define BLIST_NOTQ ((__force blist_flags_t)(1ULL << 5))
 /* Non consecutive LUN numbering */
-#define BLIST_SPARSELUN((__force blist_flags_t)(1 << 6))
+#define BLIST_SPARSELUN((__force blist_flags_t)(1ULL << 6))
 /* Avoid LUNS >= 5 */
-#define BLIST_MA

[PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND

2018-04-17 Thread Martin Wilck
Here is another attempt to handle the special return codes for ABORTED COMMAND
for certain SCSI devices. Following MKP's recommendation, I've created two
new BLIST flags, simplifying the code in scsi_error.c compared to the previous
versions. Rather than using "free" bits, I increased the length of
blist_flag_t to 64 bit, and used previously unused bits. I also added checking
for obsolete and unused bits.

For the blist_flag_t size increase, I used sparse to try and avoid regressions;
that necessitated fixing sparse's errors for the current code first.

Martin Wilck (6):
  ilog2: create truly constant version for sparse
  scsi: use const_ilog2 for array indices
  scsi: devinfo: change blist_flag_t to 64bit
  scsi: devinfo: warn on undefined blist flags
  scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix
  scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_debugfs.c |  2 +-
 drivers/scsi/scsi_devinfo.c | 28 +
 drivers/scsi/scsi_error.c   |  7 +
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/linux/log2.h| 35 ++---
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 75 ++---
 8 files changed, 107 insertions(+), 46 deletions(-)

-- 
2.16.1



[PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-17 Thread Martin Wilck
Sparse emits errors about ilog2() in array indices because of the use of
__ilog2_32() and __ilog2_64(), rightly so
(https://www.spinics.net/lists/linux-sparse/msg03471.html).

Create a const_ilog2() variant that works with sparse for this
scenario.

(Note: checkpatch.pl complains about missing parentheses, but that appears
to be a false positive. I can get rid of the warning simply by inserting
whitespace, making checkpatch "see" the whole macro).

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 include/linux/log2.h | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index 41a1ae0..2af7f778 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -72,16 +72,13 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 }
 
 /**
- * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
  * @n: parameter
  *
- * constant-capable log of base 2 calculation
- * - this can be used to initialise global variables from constant data, hence
- * the massive ternary operator construction
- *
- * selects the appropriately-sized optimised version depending on sizeof(n)
+ * Use this where sparse expects a true constant expression, e.g. for array
+ * indices.
  */
-#define ilog2(n)   \
+#define const_ilog2(n) \
 (  \
__builtin_constant_p(n) ? ( \
(n) < 2 ? 0 :   \
@@ -147,10 +144,26 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
(n) & (1ULL <<  4) ?  4 :   \
(n) & (1ULL <<  3) ?  3 :   \
(n) & (1ULL <<  2) ?  2 :   \
-   1 ) :   \
-   (sizeof(n) <= 4) ?  \
-   __ilog2_u32(n) :\
-   __ilog2_u64(n)  \
+   1) :\
+   -1)
+
+/**
+ * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * @n: parameter
+ *
+ * constant-capable log of base 2 calculation
+ * - this can be used to initialise global variables from constant data, hence
+ * the massive ternary operator construction
+ *
+ * selects the appropriately-sized optimised version depending on sizeof(n)
+ */
+#define ilog2(n) \
+( \
+   __builtin_constant_p(n) ?   \
+   const_ilog2(n) :\
+   (sizeof(n) <= 4) ?  \
+   __ilog2_u32(n) :\
+   __ilog2_u64(n)  \
  )
 
 /**
-- 
2.16.1



[PATCH v3 0/6] scsi: handle special return codes for ABORTED COMMAND

2018-04-17 Thread Martin Wilck
Here is another attempt to handle the special return codes for ABORTED COMMAND
for certain SCSI devices. Following MKP's recommendation, I've created two
new BLIST flags, simplifying the code in scsi_error.c compared to the previous
versions. Rather than using "free" bits, I increased the length of
blist_flag_t to 64 bit, and used previously unused bits. I also added checking
for obsolete and unused bits.

For the blist_flag_t size increase, I used sparse to try and avoid regressions;
that necessitated fixing sparse's errors for the current code first.

Martin Wilck (6):
  ilog2: create truly constant version for sparse
  scsi: use const_ilog2 for array indices
  scsi: devinfo: change blist_flag_t to 64bit
  scsi: devinfo: warn on undefined blist flags
  scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix
  scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS

 drivers/scsi/Makefile   |  2 +-
 drivers/scsi/scsi_debugfs.c |  2 +-
 drivers/scsi/scsi_devinfo.c | 28 +
 drivers/scsi/scsi_error.c   |  7 +
 drivers/scsi/scsi_sysfs.c   |  2 +-
 include/linux/log2.h| 35 ++---
 include/scsi/scsi_device.h  |  2 +-
 include/scsi/scsi_devinfo.h | 75 ++---
 8 files changed, 107 insertions(+), 46 deletions(-)

-- 
2.16.1



[PATCH v3 1/6] ilog2: create truly constant version for sparse

2018-04-17 Thread Martin Wilck
Sparse emits errors about ilog2() in array indices because of the use of
__ilog2_32() and __ilog2_64(), rightly so
(https://www.spinics.net/lists/linux-sparse/msg03471.html).

Create a const_ilog2() variant that works with sparse for this
scenario.

(Note: checkpatch.pl complains about missing parentheses, but that appears
to be a false positive. I can get rid of the warning simply by inserting
whitespace, making checkpatch "see" the whole macro).

Signed-off-by: Martin Wilck 
---
 include/linux/log2.h | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index 41a1ae0..2af7f778 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -72,16 +72,13 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 }
 
 /**
- * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * const_ilog2 - log base 2 of 32-bit or a 64-bit constant unsigned value
  * @n: parameter
  *
- * constant-capable log of base 2 calculation
- * - this can be used to initialise global variables from constant data, hence
- * the massive ternary operator construction
- *
- * selects the appropriately-sized optimised version depending on sizeof(n)
+ * Use this where sparse expects a true constant expression, e.g. for array
+ * indices.
  */
-#define ilog2(n)   \
+#define const_ilog2(n) \
 (  \
__builtin_constant_p(n) ? ( \
(n) < 2 ? 0 :   \
@@ -147,10 +144,26 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
(n) & (1ULL <<  4) ?  4 :   \
(n) & (1ULL <<  3) ?  3 :   \
(n) & (1ULL <<  2) ?  2 :   \
-   1 ) :   \
-   (sizeof(n) <= 4) ?  \
-   __ilog2_u32(n) :\
-   __ilog2_u64(n)  \
+   1) :\
+   -1)
+
+/**
+ * ilog2 - log base 2 of 32-bit or a 64-bit unsigned value
+ * @n: parameter
+ *
+ * constant-capable log of base 2 calculation
+ * - this can be used to initialise global variables from constant data, hence
+ * the massive ternary operator construction
+ *
+ * selects the appropriately-sized optimised version depending on sizeof(n)
+ */
+#define ilog2(n) \
+( \
+   __builtin_constant_p(n) ?   \
+   const_ilog2(n) :\
+   (sizeof(n) <= 4) ?  \
+   __ilog2_u32(n) :\
+   __ilog2_u64(n)  \
  )
 
 /**
-- 
2.16.1



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Wed, 2018-01-24 at 11:41 -0800, Khazhismel Kumykov wrote:
> On Wed, Jan 24, 2018 at 11:09 AM, Martin Wilck <mwi...@suse.com>
> wrote:
> > On Wed, 2018-01-24 at 10:44 -0800, Khazhismel Kumykov wrote:
> > > On Wed, Jan 24, 2018 at 2:57 AM, Martin Wilck <mwi...@suse.com>
> > > wrote:
> > > > On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> > > > > Move the last used path to the end of the list (least
> > > > > preferred)
> > > > > so
> > > > > that
> > > > > ties are more evenly distributed.
> > > > > 
> > > > > For example, in case with three paths with one that is slower
> > > > > than
> > > > > others, the remaining two would be unevenly used if they tie.
> > > > > This is
> > > > > due to the rotation not being a truely fair distribution.
> > > > > 
> > > > > Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b
> > > > > are
> > > > > 'tied'
> > > > > Three possible rotations:
> > > > > (a, b, c) -> best path 'a'
> > > > > (b, c, a) -> best path 'b'
> > > > > (c, a, b) -> best path 'a'
> > > > > (a, b, c) -> best path 'a'
> > > > > (b, c, a) -> best path 'b'
> > > > > (c, a, b) -> best path 'a'
> > > > > ...
> > > > 
> > > > 
> > > > This happens only if a and b actually have the same weight
> > > > (e.g.
> > > > queue
> > > > length for the queue-length selector). If 'a' really receives
> > > > more
> > > > IO,
> > > > its queue grows, and the selector will start preferring 'b', so
> > > > the
> > > > effect should level out automatically with the current code as
> > > > soon
> > > > as
> > > > you have real IO going on. But maybe I haven't grasped what
> > > > you're
> > > > referring to as "tied".
> > > 
> > > Yes, for "tied" I'm referring to paths with the same weight. As
> > > queue
> > > length grows it does tend to level out, but in the case where
> > > queue
> > > length doesn't grow (in this example I'm imagining 2 concurrent
> > > requests to the device) the bias does show and the selectors
> > > really
> > > do
> > > send 'a' 2x more requests than 'b' (when 'c' is much slower and
> > > 'a'
> > > and 'b' are ~same speed).
> > 
> > Have you actually observed this? I find the idea pretty academical
> > that
> > two paths would be walking "tied" this way. In practice, under IO
> > load,
> > I'd expect queue lengths (and service-times, for that matter) to
> > fluctuate enough to prevent this effect to be measurable. But of
> > course, I may be wrong. If you really did observe this, the next
> > question would be: does hurt performance to an extent that can be
> > noticed/measured? I reckon that if 'a' got saturated under the
> > load,
> > hurting performance, its queue length would grow quickly and 'b'
> > would
> > automatically get preferred.
> 
> This is fairly simple to observe - start two sync reader threads
> against a device with 3 backing paths, with one path taking longer on
> average to complete requests than the other two. One of the 'faster'
> paths will be used ~2x more than the other. Perhaps not that common a
> situation, but is a real one. The bias seemed simple to remove, so
> that the two (or N) paths would be used equally.
> 
> I don't see a downside to more evenly distributing in this case,
> although I can't say I've directly observed performance downsides for
> a biased distribution (aside from the distribution being biased
> itself
> being a downside).
> The bias of note is inherent to the order paths are added to the
> selector (and which path is 'always bad'), so if 'a' is saturated due
> to this, then slows, once it recovers it will continue to be
> preferred, versus in an even distribution.

Well, the expectation is indeed that load is spread equally, and I can
also see no downside. So:

Reviewed-by: Martin Wilck <mwi...@suse.com>

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Wed, 2018-01-24 at 11:41 -0800, Khazhismel Kumykov wrote:
> On Wed, Jan 24, 2018 at 11:09 AM, Martin Wilck 
> wrote:
> > On Wed, 2018-01-24 at 10:44 -0800, Khazhismel Kumykov wrote:
> > > On Wed, Jan 24, 2018 at 2:57 AM, Martin Wilck 
> > > wrote:
> > > > On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> > > > > Move the last used path to the end of the list (least
> > > > > preferred)
> > > > > so
> > > > > that
> > > > > ties are more evenly distributed.
> > > > > 
> > > > > For example, in case with three paths with one that is slower
> > > > > than
> > > > > others, the remaining two would be unevenly used if they tie.
> > > > > This is
> > > > > due to the rotation not being a truely fair distribution.
> > > > > 
> > > > > Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b
> > > > > are
> > > > > 'tied'
> > > > > Three possible rotations:
> > > > > (a, b, c) -> best path 'a'
> > > > > (b, c, a) -> best path 'b'
> > > > > (c, a, b) -> best path 'a'
> > > > > (a, b, c) -> best path 'a'
> > > > > (b, c, a) -> best path 'b'
> > > > > (c, a, b) -> best path 'a'
> > > > > ...
> > > > 
> > > > 
> > > > This happens only if a and b actually have the same weight
> > > > (e.g.
> > > > queue
> > > > length for the queue-length selector). If 'a' really receives
> > > > more
> > > > IO,
> > > > its queue grows, and the selector will start preferring 'b', so
> > > > the
> > > > effect should level out automatically with the current code as
> > > > soon
> > > > as
> > > > you have real IO going on. But maybe I haven't grasped what
> > > > you're
> > > > referring to as "tied".
> > > 
> > > Yes, for "tied" I'm referring to paths with the same weight. As
> > > queue
> > > length grows it does tend to level out, but in the case where
> > > queue
> > > length doesn't grow (in this example I'm imagining 2 concurrent
> > > requests to the device) the bias does show and the selectors
> > > really
> > > do
> > > send 'a' 2x more requests than 'b' (when 'c' is much slower and
> > > 'a'
> > > and 'b' are ~same speed).
> > 
> > Have you actually observed this? I find the idea pretty academical
> > that
> > two paths would be walking "tied" this way. In practice, under IO
> > load,
> > I'd expect queue lengths (and service-times, for that matter) to
> > fluctuate enough to prevent this effect to be measurable. But of
> > course, I may be wrong. If you really did observe this, the next
> > question would be: does hurt performance to an extent that can be
> > noticed/measured? I reckon that if 'a' got saturated under the
> > load,
> > hurting performance, its queue length would grow quickly and 'b'
> > would
> > automatically get preferred.
> 
> This is fairly simple to observe - start two sync reader threads
> against a device with 3 backing paths, with one path taking longer on
> average to complete requests than the other two. One of the 'faster'
> paths will be used ~2x more than the other. Perhaps not that common a
> situation, but is a real one. The bias seemed simple to remove, so
> that the two (or N) paths would be used equally.
> 
> I don't see a downside to more evenly distributing in this case,
> although I can't say I've directly observed performance downsides for
> a biased distribution (aside from the distribution being biased
> itself
> being a downside).
> The bias of note is inherent to the order paths are added to the
> selector (and which path is 'always bad'), so if 'a' is saturated due
> to this, then slows, once it recovers it will continue to be
> preferred, versus in an even distribution.

Well, the expectation is indeed that load is spread equally, and I can
also see no downside. So:

Reviewed-by: Martin Wilck 

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Wed, 2018-01-24 at 10:44 -0800, Khazhismel Kumykov wrote:
> On Wed, Jan 24, 2018 at 2:57 AM, Martin Wilck <mwi...@suse.com>
> wrote:
> > On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> > > Move the last used path to the end of the list (least preferred)
> > > so
> > > that
> > > ties are more evenly distributed.
> > > 
> > > For example, in case with three paths with one that is slower
> > > than
> > > others, the remaining two would be unevenly used if they tie.
> > > This is
> > > due to the rotation not being a truely fair distribution.
> > > 
> > > Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b are
> > > 'tied'
> > > Three possible rotations:
> > > (a, b, c) -> best path 'a'
> > > (b, c, a) -> best path 'b'
> > > (c, a, b) -> best path 'a'
> > > (a, b, c) -> best path 'a'
> > > (b, c, a) -> best path 'b'
> > > (c, a, b) -> best path 'a'
> > > ...
> > 
> > 
> > This happens only if a and b actually have the same weight (e.g.
> > queue
> > length for the queue-length selector). If 'a' really receives more
> > IO,
> > its queue grows, and the selector will start preferring 'b', so the
> > effect should level out automatically with the current code as soon
> > as
> > you have real IO going on. But maybe I haven't grasped what you're
> > referring to as "tied".
> 
> Yes, for "tied" I'm referring to paths with the same weight. As queue
> length grows it does tend to level out, but in the case where queue
> length doesn't grow (in this example I'm imagining 2 concurrent
> requests to the device) the bias does show and the selectors really
> do
> send 'a' 2x more requests than 'b' (when 'c' is much slower and 'a'
> and 'b' are ~same speed).

Have you actually observed this? I find the idea pretty academical that
two paths would be walking "tied" this way. In practice, under IO load,
I'd expect queue lengths (and service-times, for that matter) to
fluctuate enough to prevent this effect to be measurable. But of
course, I may be wrong. If you really did observe this, the next
question would be: does hurt performance to an extent that can be
noticed/measured? I reckon that if 'a' got saturated under the load,
hurting performance, its queue length would grow quickly and 'b' would
automatically get preferred.

> > OTOH, if the "best" path has much lower queue length than the other
> > paths for whatever reason, your pushing it to the tail will require
> > a
> > full list walk with every new call of the selector. I see tjat as a
> > small disadvantage of your approach.
> 
> I see, with best at the tail, we would not see as much benefit from
> the check if a path has no IOs on it in queue-length. In service-time
> no such short circuit exists, so I don't believe anything changes
> there. Am I understanding this correctly?

Forget that. I was confused. We're walking the entire list every time
anyway, except for the . I find it strange to put the best candidate to
the tail every time, but I can't prove that it really hurts.

Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Wed, 2018-01-24 at 10:44 -0800, Khazhismel Kumykov wrote:
> On Wed, Jan 24, 2018 at 2:57 AM, Martin Wilck 
> wrote:
> > On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> > > Move the last used path to the end of the list (least preferred)
> > > so
> > > that
> > > ties are more evenly distributed.
> > > 
> > > For example, in case with three paths with one that is slower
> > > than
> > > others, the remaining two would be unevenly used if they tie.
> > > This is
> > > due to the rotation not being a truely fair distribution.
> > > 
> > > Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b are
> > > 'tied'
> > > Three possible rotations:
> > > (a, b, c) -> best path 'a'
> > > (b, c, a) -> best path 'b'
> > > (c, a, b) -> best path 'a'
> > > (a, b, c) -> best path 'a'
> > > (b, c, a) -> best path 'b'
> > > (c, a, b) -> best path 'a'
> > > ...
> > 
> > 
> > This happens only if a and b actually have the same weight (e.g.
> > queue
> > length for the queue-length selector). If 'a' really receives more
> > IO,
> > its queue grows, and the selector will start preferring 'b', so the
> > effect should level out automatically with the current code as soon
> > as
> > you have real IO going on. But maybe I haven't grasped what you're
> > referring to as "tied".
> 
> Yes, for "tied" I'm referring to paths with the same weight. As queue
> length grows it does tend to level out, but in the case where queue
> length doesn't grow (in this example I'm imagining 2 concurrent
> requests to the device) the bias does show and the selectors really
> do
> send 'a' 2x more requests than 'b' (when 'c' is much slower and 'a'
> and 'b' are ~same speed).

Have you actually observed this? I find the idea pretty academical that
two paths would be walking "tied" this way. In practice, under IO load,
I'd expect queue lengths (and service-times, for that matter) to
fluctuate enough to prevent this effect to be measurable. But of
course, I may be wrong. If you really did observe this, the next
question would be: does hurt performance to an extent that can be
noticed/measured? I reckon that if 'a' got saturated under the load,
hurting performance, its queue length would grow quickly and 'b' would
automatically get preferred.

> > OTOH, if the "best" path has much lower queue length than the other
> > paths for whatever reason, your pushing it to the tail will require
> > a
> > full list walk with every new call of the selector. I see tjat as a
> > small disadvantage of your approach.
> 
> I see, with best at the tail, we would not see as much benefit from
> the check if a path has no IOs on it in queue-length. In service-time
> no such short circuit exists, so I don't believe anything changes
> there. Am I understanding this correctly?

Forget that. I was confused. We're walking the entire list every time
anyway, except for the . I find it strange to put the best candidate to
the tail every time, but I can't prove that it really hurts.

Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> Move the last used path to the end of the list (least preferred) so
> that
> ties are more evenly distributed.
> 
> For example, in case with three paths with one that is slower than
> others, the remaining two would be unevenly used if they tie. This is
> due to the rotation not being a truely fair distribution.
> 
> Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b are
> 'tied'
> Three possible rotations:
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> ...


This happens only if a and b actually have the same weight (e.g. queue
length for the queue-length selector). If 'a' really receives more IO,
its queue grows, and the selector will start preferring 'b', so the
effect should level out automatically with the current code as soon as
you have real IO going on. But maybe I haven't grasped what you're
referring to as "tied".

OTOH, if the "best" path has much lower queue length than the other
paths for whatever reason, your pushing it to the tail will require a
full list walk with every new call of the selector. I see tjat as a
small disadvantage of your approach.

Regards
Martin

> 
> So 'a' is used 2x more than 'b', although they should be used evenly.
> 
> With this change, the most recently used path is always the least
> preferred, removing this bias resulting in even distribution.
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> (c, b, a) -> best path 'b'
> ...
> 
> Signed-off-by: Khazhismel Kumykov <kha...@google.com>
> ---
>  drivers/md/dm-queue-length.c | 6 +++---
>  drivers/md/dm-service-time.c | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-
> length.c
> index 23f178641794..969c4f1a3633 100644
> --- a/drivers/md/dm-queue-length.c
> +++ b/drivers/md/dm-queue-length.c
> @@ -195,9 +195,6 @@ static struct dm_path *ql_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (list_empty(>valid_paths))
>   goto out;
>  
> - /* Change preferred (first in list) path to evenly balance.
> */
> - list_move_tail(s->valid_paths.next, >valid_paths);
> -
>   list_for_each_entry(pi, >valid_paths, list) {
>   if (!best ||
>   (atomic_read(>qlen) < atomic_read(
> >qlen)))
> @@ -210,6 +207,9 @@ static struct dm_path *ql_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (!best)
>   goto out;
>  
> + /* Move most recently used to least preferred to evenly
> balance. */
> + list_move_tail(>list, >valid_paths);
> +
>   ret = best->path;
>  out:
>   spin_unlock_irqrestore(>lock, flags);
> diff --git a/drivers/md/dm-service-time.c b/drivers/md/dm-service-
> time.c
> index 7b8642045c55..f006a9005593 100644
> --- a/drivers/md/dm-service-time.c
> +++ b/drivers/md/dm-service-time.c
> @@ -282,9 +282,6 @@ static struct dm_path *st_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (list_empty(>valid_paths))
>   goto out;
>  
> - /* Change preferred (first in list) path to evenly balance.
> */
> - list_move_tail(s->valid_paths.next, >valid_paths);
> -
>   list_for_each_entry(pi, >valid_paths, list)
>   if (!best || (st_compare_load(pi, best, nr_bytes) <
> 0))
>   best = pi;
> @@ -292,6 +289,9 @@ static struct dm_path *st_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (!best)
>   goto out;
>  
> + /* Move most recently used to least preferred to evenly
> balance. */
> + list_move_tail(>list, >valid_paths);
> +
>   ret = best->path;
>  out:
>   spin_unlock_irqrestore(>lock, flags);
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [dm-devel] [PATCH] dm mpath selector: more evenly distribute ties

2018-01-24 Thread Martin Wilck
On Fri, 2018-01-19 at 15:07 -0800, Khazhismel Kumykov wrote:
> Move the last used path to the end of the list (least preferred) so
> that
> ties are more evenly distributed.
> 
> For example, in case with three paths with one that is slower than
> others, the remaining two would be unevenly used if they tie. This is
> due to the rotation not being a truely fair distribution.
> 
> Illustrated: paths a, b, c, 'c' has 1 outstanding IO, a and b are
> 'tied'
> Three possible rotations:
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> ...


This happens only if a and b actually have the same weight (e.g. queue
length for the queue-length selector). If 'a' really receives more IO,
its queue grows, and the selector will start preferring 'b', so the
effect should level out automatically with the current code as soon as
you have real IO going on. But maybe I haven't grasped what you're
referring to as "tied".

OTOH, if the "best" path has much lower queue length than the other
paths for whatever reason, your pushing it to the tail will require a
full list walk with every new call of the selector. I see tjat as a
small disadvantage of your approach.

Regards
Martin

> 
> So 'a' is used 2x more than 'b', although they should be used evenly.
> 
> With this change, the most recently used path is always the least
> preferred, removing this bias resulting in even distribution.
> (a, b, c) -> best path 'a'
> (b, c, a) -> best path 'b'
> (c, a, b) -> best path 'a'
> (c, b, a) -> best path 'b'
> ...
> 
> Signed-off-by: Khazhismel Kumykov 
> ---
>  drivers/md/dm-queue-length.c | 6 +++---
>  drivers/md/dm-service-time.c | 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/dm-queue-length.c b/drivers/md/dm-queue-
> length.c
> index 23f178641794..969c4f1a3633 100644
> --- a/drivers/md/dm-queue-length.c
> +++ b/drivers/md/dm-queue-length.c
> @@ -195,9 +195,6 @@ static struct dm_path *ql_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (list_empty(>valid_paths))
>   goto out;
>  
> - /* Change preferred (first in list) path to evenly balance.
> */
> - list_move_tail(s->valid_paths.next, >valid_paths);
> -
>   list_for_each_entry(pi, >valid_paths, list) {
>   if (!best ||
>   (atomic_read(>qlen) < atomic_read(
> >qlen)))
> @@ -210,6 +207,9 @@ static struct dm_path *ql_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (!best)
>   goto out;
>  
> + /* Move most recently used to least preferred to evenly
> balance. */
> + list_move_tail(>list, >valid_paths);
> +
>   ret = best->path;
>  out:
>   spin_unlock_irqrestore(>lock, flags);
> diff --git a/drivers/md/dm-service-time.c b/drivers/md/dm-service-
> time.c
> index 7b8642045c55..f006a9005593 100644
> --- a/drivers/md/dm-service-time.c
> +++ b/drivers/md/dm-service-time.c
> @@ -282,9 +282,6 @@ static struct dm_path *st_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (list_empty(>valid_paths))
>   goto out;
>  
> - /* Change preferred (first in list) path to evenly balance.
> */
> - list_move_tail(s->valid_paths.next, >valid_paths);
> -
>   list_for_each_entry(pi, >valid_paths, list)
>   if (!best || (st_compare_load(pi, best, nr_bytes) <
> 0))
>   best = pi;
> @@ -292,6 +289,9 @@ static struct dm_path *st_select_path(struct
> path_selector *ps, size_t nr_bytes)
>   if (!best)
>   goto out;
>  
> + /* Move most recently used to least preferred to evenly
> balance. */
> + list_move_tail(>list, >valid_paths);
> +
>   ret = best->path;
>  out:
>   spin_unlock_irqrestore(>lock, flags);
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH] scsi: libfc: fix ELS request handling

2017-11-25 Thread Martin Wilck
The modification of fc_lport_recv_els_req() in commit fcabb09e59a7
(merged in 4.12-rc1) caused certain requests not to be handled at all.
Fix that.

Fixes: fcabb09e59a7 "scsi: libfc: directly call ELS request handlers"
Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/scsi/libfc/fc_lport.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
index 2fd0ec651170..787e82435241 100644
--- a/drivers/scsi/libfc/fc_lport.c
+++ b/drivers/scsi/libfc/fc_lport.c
@@ -904,10 +904,14 @@ static void fc_lport_recv_els_req(struct fc_lport *lport,
case ELS_FLOGI:
if (!lport->point_to_multipoint)
fc_lport_recv_flogi_req(lport, fp);
+   else
+   fc_rport_recv_req(lport, fp);
break;
case ELS_LOGO:
if (fc_frame_sid(fp) == FC_FID_FLOGI)
fc_lport_recv_logo_req(lport, fp);
+   else
+   fc_rport_recv_req(lport, fp);
break;
case ELS_RSCN:
lport->tt.disc_recv_req(lport, fp);
-- 
2.15.0



[PATCH] scsi: libfc: fix ELS request handling

2017-11-25 Thread Martin Wilck
The modification of fc_lport_recv_els_req() in commit fcabb09e59a7
(merged in 4.12-rc1) caused certain requests not to be handled at all.
Fix that.

Fixes: fcabb09e59a7 "scsi: libfc: directly call ELS request handlers"
Signed-off-by: Martin Wilck 
---
 drivers/scsi/libfc/fc_lport.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
index 2fd0ec651170..787e82435241 100644
--- a/drivers/scsi/libfc/fc_lport.c
+++ b/drivers/scsi/libfc/fc_lport.c
@@ -904,10 +904,14 @@ static void fc_lport_recv_els_req(struct fc_lport *lport,
case ELS_FLOGI:
if (!lport->point_to_multipoint)
fc_lport_recv_flogi_req(lport, fp);
+   else
+   fc_rport_recv_req(lport, fp);
break;
case ELS_LOGO:
if (fc_frame_sid(fp) == FC_FID_FLOGI)
fc_lport_recv_logo_req(lport, fp);
+   else
+   fc_rport_recv_req(lport, fp);
break;
case ELS_RSCN:
lport->tt.disc_recv_req(lport, fp);
-- 
2.15.0



Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Fri, 2017-09-29 at 16:59 -0600, Keith Busch wrote:
> On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> > In the NVME subsystem, we're seeing a race condition with udev
> > where
> > device_add_disk() is called (which triggers an "add" uevent), and a
> > sysfs attribute group is added to the disk device afterwards.
> > If udev rules access these attributes before they are created,
> > udev processing of the device is incomplete, in particular, device
> > WWIDs may not be determined correctly.
> > 
> > To fix this, this patch introduces a new function
> > device_add_disk_with_groups(), which takes a list of attribute
> > groups
> > and adds them to the device before sending out uevents.
> > 
> > Signed-off-by: Martin Wilck <mwi...@suse.com>
> 
> Is NVMe the only one having this problem?

There are other devices that follow the same programming pattern
(device_add_disk followed by sysfs_create_group), but I haven't tested
them all, nor reviewed whether these devices need the disk's sysfs
attributes for udev processing. If they don't, the problem will just go
unnoticed.

SCSI obviously takes a very different approach to sysfs layout.

> Was putting our attributes in the disk's kobj a bad choice?

Well, it would make sense to separate the block (disk) device from the
device representing the NVMe subsys/namespace in sysfs. But I guess
it's too late for that now. Actually, the first attempt I made to solve
this problem was exactly that, and it proved to "work", too, but only
at the cost of changing the path of the NVMe block device in sysfs,
which I considered a no-go. Thus I came up with the approach I posted.

Regards,
Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Fri, 2017-09-29 at 16:59 -0600, Keith Busch wrote:
> On Thu, Sep 28, 2017 at 09:36:36PM +0200, Martin Wilck wrote:
> > In the NVME subsystem, we're seeing a race condition with udev
> > where
> > device_add_disk() is called (which triggers an "add" uevent), and a
> > sysfs attribute group is added to the disk device afterwards.
> > If udev rules access these attributes before they are created,
> > udev processing of the device is incomplete, in particular, device
> > WWIDs may not be determined correctly.
> > 
> > To fix this, this patch introduces a new function
> > device_add_disk_with_groups(), which takes a list of attribute
> > groups
> > and adds them to the device before sending out uevents.
> > 
> > Signed-off-by: Martin Wilck 
> 
> Is NVMe the only one having this problem?

There are other devices that follow the same programming pattern
(device_add_disk followed by sysfs_create_group), but I haven't tested
them all, nor reviewed whether these devices need the disk's sysfs
attributes for udev processing. If they don't, the problem will just go
unnoticed.

SCSI obviously takes a very different approach to sysfs layout.

> Was putting our attributes in the disk's kobj a bad choice?

Well, it would make sense to separate the block (disk) device from the
device representing the NVMe subsys/namespace in sysfs. But I guess
it's too late for that now. Actually, the first attempt I made to solve
this problem was exactly that, and it proved to "work", too, but only
at the cost of changing the path of the NVMe block device in sysfs,
which I considered a no-go. Thus I came up with the approach I posted.

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Sun, 2017-10-01 at 10:00 +0200, Christoph Hellwig wrote:
> While this looks okay-ish to me I really don't want people confused
> with three variants of add_disk, we really need to consolidate
> our helpers there a bit..
> 

Can you give me a hint what you'd like to see?

Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-10-04 Thread Martin Wilck
On Sun, 2017-10-01 at 10:00 +0200, Christoph Hellwig wrote:
> While this looks okay-ish to me I really don't want people confused
> with three variants of add_disk, we really need to consolidate
> our helpers there a bit..
> 

Can you give me a hint what you'd like to see?

Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH 2/2] nvme: use device_add_disk_with_groups()

2017-09-28 Thread Martin Wilck
By using device_add_disk_with_groups(), we can avoid the race
condition with udev rule processing, because no udev event will
be triggered before all attributes are available.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/nvme/host/core.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5a14cc7f28ee..e7289a727715 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2156,6 +2156,11 @@ static const struct attribute_group nvme_ns_attr_group = 
{
.is_visible = nvme_ns_attrs_are_visible,
 };
 
+static const struct attribute_group *nvme_ns_attr_groups[] = {
+   _ns_attr_group,
+   NULL,
+};
+
 #define nvme_show_str_function(field)  
\
 static ssize_t  field##_show(struct device *dev,   
\
struct device_attribute *attr, char *buf)   
\
@@ -2405,11 +2410,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
 
kfree(id);
 
-   device_add_disk(ctrl->device, ns->disk);
-   if (sysfs_create_group(_to_dev(ns->disk)->kobj,
-   _ns_attr_group))
-   pr_warn("%s: failed to create sysfs group for identification\n",
-   ns->disk->disk_name);
+   device_add_disk_with_groups(ctrl->device, ns->disk,
+   nvme_ns_attr_groups);
if (ns->ndev && nvme_nvm_register_sysfs(ns))
pr_warn("%s: failed to register lightnvm sysfs group for 
identification\n",
ns->disk->disk_name);
-- 
2.14.0



[PATCH 2/2] nvme: use device_add_disk_with_groups()

2017-09-28 Thread Martin Wilck
By using device_add_disk_with_groups(), we can avoid the race
condition with udev rule processing, because no udev event will
be triggered before all attributes are available.

Signed-off-by: Martin Wilck 
---
 drivers/nvme/host/core.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5a14cc7f28ee..e7289a727715 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2156,6 +2156,11 @@ static const struct attribute_group nvme_ns_attr_group = 
{
.is_visible = nvme_ns_attrs_are_visible,
 };
 
+static const struct attribute_group *nvme_ns_attr_groups[] = {
+   _ns_attr_group,
+   NULL,
+};
+
 #define nvme_show_str_function(field)  
\
 static ssize_t  field##_show(struct device *dev,   
\
struct device_attribute *attr, char *buf)   
\
@@ -2405,11 +2410,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid)
 
kfree(id);
 
-   device_add_disk(ctrl->device, ns->disk);
-   if (sysfs_create_group(_to_dev(ns->disk)->kobj,
-   _ns_attr_group))
-   pr_warn("%s: failed to create sysfs group for identification\n",
-   ns->disk->disk_name);
+   device_add_disk_with_groups(ctrl->device, ns->disk,
+   nvme_ns_attr_groups);
if (ns->ndev && nvme_nvm_register_sysfs(ns))
pr_warn("%s: failed to register lightnvm sysfs group for 
identification\n",
ns->disk->disk_name);
-- 
2.14.0



[PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-09-28 Thread Martin Wilck
In the NVME subsystem, we're seeing a race condition with udev where
device_add_disk() is called (which triggers an "add" uevent), and a
sysfs attribute group is added to the disk device afterwards.
If udev rules access these attributes before they are created,
udev processing of the device is incomplete, in particular, device
WWIDs may not be determined correctly.

To fix this, this patch introduces a new function
device_add_disk_with_groups(), which takes a list of attribute groups
and adds them to the device before sending out uevents.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 block/genhd.c | 17 -
 include/linux/genhd.h |  8 +++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index dd305c65ffb0..1900682a221e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -552,7 +552,8 @@ static int exact_lock(dev_t devt, void *data)
return 0;
 }
 
-static void register_disk(struct device *parent, struct gendisk *disk)
+static void register_disk(struct device *parent, struct gendisk *disk,
+ const struct attribute_group **groups)
 {
struct device *ddev = disk_to_dev(disk);
struct block_device *bdev;
@@ -578,6 +579,9 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
}
}
 
+   if (groups != NULL && sysfs_create_groups(>kobj, groups))
+   dev_warn(ddev, "failed to add attribute groups");
+
/*
 * avoid probable deadlock caused by allocating memory with
 * GFP_KERNEL in runtime_resume callback of its all ancestor
@@ -619,16 +623,19 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
 }
 
 /**
- * device_add_disk - add partitioning information to kernel list
+ * device_add_disk_with_groups - add partitioning information to kernel list
  * @parent: parent device for the disk
  * @disk: per-device partitioning information
+ * @groups: NULL-terminated array of attribute groups
  *
  * This function registers the partitioning information in @disk
  * with the kernel.
  *
  * FIXME: error handling
  */
-void device_add_disk(struct device *parent, struct gendisk *disk)
+void device_add_disk_with_groups(struct device *parent,
+   struct gendisk *disk,
+   const struct attribute_group **groups)
 {
struct backing_dev_info *bdi;
dev_t devt;
@@ -664,7 +671,7 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);
-   register_disk(parent, disk);
+   register_disk(parent, disk, groups);
blk_register_queue(disk);
 
/*
@@ -680,7 +687,7 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
disk_add_events(disk);
blk_integrity_add(disk);
 }
-EXPORT_SYMBOL(device_add_disk);
+EXPORT_SYMBOL(device_add_disk_with_groups);
 
 void del_gendisk(struct gendisk *disk)
 {
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ea652bfcd675..3404d92d5063 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -386,7 +386,13 @@ static inline void free_part_info(struct hd_struct *part)
 extern void part_round_stats(struct request_queue *q, int cpu, struct 
hd_struct *part);
 
 /* block/genhd.c */
-extern void device_add_disk(struct device *parent, struct gendisk *disk);
+extern void device_add_disk_with_groups(struct device *parent,
+   struct gendisk *disk,
+   const struct attribute_group **groups);
+static inline void device_add_disk(struct device *parent, struct gendisk *disk)
+{
+   device_add_disk_with_groups(parent, disk, NULL);
+}
 static inline void add_disk(struct gendisk *disk)
 {
device_add_disk(NULL, disk);
-- 
2.14.0



[PATCH 1/2] block: genhd: add device_add_disk_with_groups

2017-09-28 Thread Martin Wilck
In the NVME subsystem, we're seeing a race condition with udev where
device_add_disk() is called (which triggers an "add" uevent), and a
sysfs attribute group is added to the disk device afterwards.
If udev rules access these attributes before they are created,
udev processing of the device is incomplete, in particular, device
WWIDs may not be determined correctly.

To fix this, this patch introduces a new function
device_add_disk_with_groups(), which takes a list of attribute groups
and adds them to the device before sending out uevents.

Signed-off-by: Martin Wilck 
---
 block/genhd.c | 17 -
 include/linux/genhd.h |  8 +++-
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index dd305c65ffb0..1900682a221e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -552,7 +552,8 @@ static int exact_lock(dev_t devt, void *data)
return 0;
 }
 
-static void register_disk(struct device *parent, struct gendisk *disk)
+static void register_disk(struct device *parent, struct gendisk *disk,
+ const struct attribute_group **groups)
 {
struct device *ddev = disk_to_dev(disk);
struct block_device *bdev;
@@ -578,6 +579,9 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
}
}
 
+   if (groups != NULL && sysfs_create_groups(>kobj, groups))
+   dev_warn(ddev, "failed to add attribute groups");
+
/*
 * avoid probable deadlock caused by allocating memory with
 * GFP_KERNEL in runtime_resume callback of its all ancestor
@@ -619,16 +623,19 @@ static void register_disk(struct device *parent, struct 
gendisk *disk)
 }
 
 /**
- * device_add_disk - add partitioning information to kernel list
+ * device_add_disk_with_groups - add partitioning information to kernel list
  * @parent: parent device for the disk
  * @disk: per-device partitioning information
+ * @groups: NULL-terminated array of attribute groups
  *
  * This function registers the partitioning information in @disk
  * with the kernel.
  *
  * FIXME: error handling
  */
-void device_add_disk(struct device *parent, struct gendisk *disk)
+void device_add_disk_with_groups(struct device *parent,
+   struct gendisk *disk,
+   const struct attribute_group **groups)
 {
struct backing_dev_info *bdi;
dev_t devt;
@@ -664,7 +671,7 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
 
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);
-   register_disk(parent, disk);
+   register_disk(parent, disk, groups);
blk_register_queue(disk);
 
/*
@@ -680,7 +687,7 @@ void device_add_disk(struct device *parent, struct gendisk 
*disk)
disk_add_events(disk);
blk_integrity_add(disk);
 }
-EXPORT_SYMBOL(device_add_disk);
+EXPORT_SYMBOL(device_add_disk_with_groups);
 
 void del_gendisk(struct gendisk *disk)
 {
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ea652bfcd675..3404d92d5063 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -386,7 +386,13 @@ static inline void free_part_info(struct hd_struct *part)
 extern void part_round_stats(struct request_queue *q, int cpu, struct 
hd_struct *part);
 
 /* block/genhd.c */
-extern void device_add_disk(struct device *parent, struct gendisk *disk);
+extern void device_add_disk_with_groups(struct device *parent,
+   struct gendisk *disk,
+   const struct attribute_group **groups);
+static inline void device_add_disk(struct device *parent, struct gendisk *disk)
+{
+   device_add_disk_with_groups(parent, disk, NULL);
+}
 static inline void add_disk(struct gendisk *disk)
 {
device_add_disk(NULL, disk);
-- 
2.14.0



Re: [PATCH] string.h: un-fortify memcpy_and_pad

2017-09-11 Thread Martin Wilck
On Wed, 2017-09-06 at 14:36 +0200, Martin Wilck wrote:
> The way I'd implemented the new helper memcpy_and_pad  with
> __FORTIFY_INLINE caused compiler warnings for certain kernel
> configurations.
> 
> This helper is only used in a single place at this time, and thus
> doesn't benefit much from fortification. So simplify the code
> by dropping fortification support for now.
> 
> Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()"
> Signed-off-by: Martin Wilck <mwi...@suse.com>
> Acked-by: Arnd Bergmann <a...@arndb.de>
> 
> ---
>  include/linux/string.h | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)

Hello Stephen and Christoph,

my broken patch 01f33c336e2d is in Linus' tree and causing compiler
warnings there. Could you please take care that this fix is pulled in
on top of it? Or should I take another action myself?

Thanks,
Martin


> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index e1eeb0a8a9693..54d21783e18dd 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -434,20 +434,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const
> char *q)
>   * @count: The number of bytes to copy
>   * @pad: Character to use for padding if space is left in
> destination.
>   */
> -__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
> -  const void *src, size_t count,
> int pad)
> +static inline void memcpy_and_pad(void *dest, size_t dest_len,
> +   const void *src, size_t count, int
> pad)
>  {
> - size_t dest_size = __builtin_object_size(dest, 0);
> - size_t src_size = __builtin_object_size(src, 0);
> -
> - if (__builtin_constant_p(dest_len) &&
> __builtin_constant_p(count)) {
> - if (dest_size < dest_len && dest_size < count)
> - __write_overflow();
> - else if (src_size < dest_len && src_size < count)
> - __read_overflow3();
> - }
> - if (dest_size < dest_len)
> - fortify_panic(__func__);
>   if (dest_len > count) {
>   memcpy(dest, src, count);
>   memset(dest + count, pad,  dest_len - count);



Re: [PATCH] string.h: un-fortify memcpy_and_pad

2017-09-11 Thread Martin Wilck
On Wed, 2017-09-06 at 14:36 +0200, Martin Wilck wrote:
> The way I'd implemented the new helper memcpy_and_pad  with
> __FORTIFY_INLINE caused compiler warnings for certain kernel
> configurations.
> 
> This helper is only used in a single place at this time, and thus
> doesn't benefit much from fortification. So simplify the code
> by dropping fortification support for now.
> 
> Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()"
> Signed-off-by: Martin Wilck 
> Acked-by: Arnd Bergmann 
> 
> ---
>  include/linux/string.h | 15 ++-
>  1 file changed, 2 insertions(+), 13 deletions(-)

Hello Stephen and Christoph,

my broken patch 01f33c336e2d is in Linus' tree and causing compiler
warnings there. Could you please take care that this fix is pulled in
on top of it? Or should I take another action myself?

Thanks,
Martin


> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index e1eeb0a8a9693..54d21783e18dd 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -434,20 +434,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const
> char *q)
>   * @count: The number of bytes to copy
>   * @pad: Character to use for padding if space is left in
> destination.
>   */
> -__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
> -  const void *src, size_t count,
> int pad)
> +static inline void memcpy_and_pad(void *dest, size_t dest_len,
> +   const void *src, size_t count, int
> pad)
>  {
> - size_t dest_size = __builtin_object_size(dest, 0);
> - size_t src_size = __builtin_object_size(src, 0);
> -
> - if (__builtin_constant_p(dest_len) &&
> __builtin_constant_p(count)) {
> - if (dest_size < dest_len && dest_size < count)
> - __write_overflow();
> - else if (src_size < dest_len && src_size < count)
> - __read_overflow3();
> - }
> - if (dest_size < dest_len)
> - fortify_panic(__func__);
>   if (dest_len > count) {
>   memcpy(dest, src, count);
>   memset(dest + count, pad,  dest_len - count);



[PATCH] string.h: un-fortify memcpy_and_pad

2017-09-06 Thread Martin Wilck
The way I'd implemented the new helper memcpy_and_pad  with
__FORTIFY_INLINE caused compiler warnings for certain kernel
configurations.

This helper is only used in a single place at this time, and thus
doesn't benefit much from fortification. So simplify the code
by dropping fortification support for now.

Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()"
Signed-off-by: Martin Wilck <mwi...@suse.com>
Acked-by: Arnd Bergmann <a...@arndb.de>

---
 include/linux/string.h | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index e1eeb0a8a9693..54d21783e18dd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -434,20 +434,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
  * @count: The number of bytes to copy
  * @pad: Character to use for padding if space is left in destination.
  */
-__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
-const void *src, size_t count, int pad)
+static inline void memcpy_and_pad(void *dest, size_t dest_len,
+ const void *src, size_t count, int pad)
 {
-   size_t dest_size = __builtin_object_size(dest, 0);
-   size_t src_size = __builtin_object_size(src, 0);
-
-   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
-   if (dest_size < dest_len && dest_size < count)
-   __write_overflow();
-   else if (src_size < dest_len && src_size < count)
-   __read_overflow3();
-   }
-   if (dest_size < dest_len)
-   fortify_panic(__func__);
if (dest_len > count) {
memcpy(dest, src, count);
memset(dest + count, pad,  dest_len - count);
-- 
2.14.0



[PATCH] string.h: un-fortify memcpy_and_pad

2017-09-06 Thread Martin Wilck
The way I'd implemented the new helper memcpy_and_pad  with
__FORTIFY_INLINE caused compiler warnings for certain kernel
configurations.

This helper is only used in a single place at this time, and thus
doesn't benefit much from fortification. So simplify the code
by dropping fortification support for now.

Fixes: 01f33c336e2d "string.h: add memcpy_and_pad()"
Signed-off-by: Martin Wilck 
Acked-by: Arnd Bergmann 

---
 include/linux/string.h | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index e1eeb0a8a9693..54d21783e18dd 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -434,20 +434,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
  * @count: The number of bytes to copy
  * @pad: Character to use for padding if space is left in destination.
  */
-__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
-const void *src, size_t count, int pad)
+static inline void memcpy_and_pad(void *dest, size_t dest_len,
+ const void *src, size_t count, int pad)
 {
-   size_t dest_size = __builtin_object_size(dest, 0);
-   size_t src_size = __builtin_object_size(src, 0);
-
-   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
-   if (dest_size < dest_len && dest_size < count)
-   __write_overflow();
-   else if (src_size < dest_len && src_size < count)
-   __read_overflow3();
-   }
-   if (dest_size < dest_len)
-   fortify_panic(__func__);
if (dest_len > count) {
memcpy(dest, src, count);
memset(dest + count, pad,  dest_len - count);
-- 
2.14.0



Re: linux-next: build failure after merge of the akpm-current tree

2017-09-06 Thread Martin Wilck
Hello Stephen,

On Thu, 2017-08-31 at 18:21 +1000, Stephen Rothwell wrote:
> Hi Andrew,
> 
> After merging the akpm-current tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> In file included from
> /home/sfr/next/next/include/uapi/linux/uuid.h:21:0,
>  from /home/sfr/next/next/include/linux/uuid.h:19,
>  from
> /home/sfr/next/next/include/linux/mod_devicetable.h:12,
>  from /home/sfr/next/next/scripts/mod/devicetable-
> offsets.c:2:
> /home/sfr/next/next/include/linux/string.h: In function
> 'memcpy_and_pad':
> /home/sfr/next/next/include/linux/string.h:450:3: error: implicit
> declaration of function 'fortify_panic' [-Werror=implicit-function-
> declaration]
>fortify_panic(__func__);
>^
> 
> Caused by commit
> 
>   9b04e51112ba ("fortify: use WARN instead of BUG for now")
> 
> interacting with commit
> 
>   01f33c336e2d ("string.h: add memcpy_and_pad()")
> 
> from the block tree.
> 
> I have applied the following merge fix patch:
> 
> From: Stephen Rothwell <s...@canb.auug.org.au>
> Date: Thu, 31 Aug 2017 18:13:43 +1000
> Subject: [PATCH] fortify: use WARN instead of BUG for now fix
> 
> Signed-off-by: Stephen Rothwell <s...@canb.auug.org.au>
> ---
>  include/linux/string.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index edd2b6154b80..e3b713114732 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -447,7 +447,7 @@ __FORTIFY_INLINE void memcpy_and_pad(void *dest,
> size_t dest_len,
>   __read_overflow3();
>   }
>   if (dest_size < dest_len)
> - fortify_panic(__func__);
> + fortify_overflow(__func__);
>   if (dest_len > count) {
>   memcpy(dest, src, count);
>   memset(dest + count, pad,  dest_len - count);
> -- 
> 2.13.2

Arnd Bergmann spotted another problem with that patch. I decided to rip
out the "fortify" related code. I'll send a patch in a follow-up email.

Regards,
Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: linux-next: build failure after merge of the akpm-current tree

2017-09-06 Thread Martin Wilck
Hello Stephen,

On Thu, 2017-08-31 at 18:21 +1000, Stephen Rothwell wrote:
> Hi Andrew,
> 
> After merging the akpm-current tree, today's linux-next build (arm
> multi_v7_defconfig) failed like this:
> 
> In file included from
> /home/sfr/next/next/include/uapi/linux/uuid.h:21:0,
>  from /home/sfr/next/next/include/linux/uuid.h:19,
>  from
> /home/sfr/next/next/include/linux/mod_devicetable.h:12,
>  from /home/sfr/next/next/scripts/mod/devicetable-
> offsets.c:2:
> /home/sfr/next/next/include/linux/string.h: In function
> 'memcpy_and_pad':
> /home/sfr/next/next/include/linux/string.h:450:3: error: implicit
> declaration of function 'fortify_panic' [-Werror=implicit-function-
> declaration]
>fortify_panic(__func__);
>^
> 
> Caused by commit
> 
>   9b04e51112ba ("fortify: use WARN instead of BUG for now")
> 
> interacting with commit
> 
>   01f33c336e2d ("string.h: add memcpy_and_pad()")
> 
> from the block tree.
> 
> I have applied the following merge fix patch:
> 
> From: Stephen Rothwell 
> Date: Thu, 31 Aug 2017 18:13:43 +1000
> Subject: [PATCH] fortify: use WARN instead of BUG for now fix
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  include/linux/string.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index edd2b6154b80..e3b713114732 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -447,7 +447,7 @@ __FORTIFY_INLINE void memcpy_and_pad(void *dest,
> size_t dest_len,
>   __read_overflow3();
>   }
>   if (dest_size < dest_len)
> - fortify_panic(__func__);
> + fortify_overflow(__func__);
>   if (dest_len > count) {
>   memcpy(dest, src, count);
>   memset(dest + count, pad,  dest_len - count);
> -- 
> 2.13.2

Arnd Bergmann spotted another problem with that patch. I decided to rip
out the "fortify" related code. I'll send a patch in a follow-up email.

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH] string.h: un-fortify memcpy_and_pad

2017-09-05 Thread Martin Wilck
The way I'd implemented the new helper memcpy_and_pad  with
__FORTIFY_INLINE caused compiler warnings for certain kernel
configurations.

This helper is only used in a single place at this time, and thus
doesn't benefit much from fortification. So simplify the code
by dropping fortification support for now.

Fixes: 3c5fa8cd18f8 "string.h: add memcpy_and_pad()"
Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 include/linux/string.h | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 0bec4151b0eb9..0495cd3c81689 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -404,20 +404,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
  * @count: The number of bytes to copy
  * @pad: Character to use for padding if space is left in destination.
  */
-__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
-const void *src, size_t count, int pad)
+static inline void memcpy_and_pad(void *dest, size_t dest_len,
+ const void *src, size_t count, int pad)
 {
-   size_t dest_size = __builtin_object_size(dest, 0);
-   size_t src_size = __builtin_object_size(src, 0);
-
-   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
-   if (dest_size < dest_len && dest_size < count)
-   __write_overflow();
-   else if (src_size < dest_len && src_size < count)
-   __read_overflow3();
-   }
-   if (dest_size < dest_len)
-   fortify_panic(__func__);
if (dest_len > count) {
memcpy(dest, src, count);
memset(dest + count, pad,  dest_len - count);
-- 
2.14.0



[PATCH] string.h: un-fortify memcpy_and_pad

2017-09-05 Thread Martin Wilck
The way I'd implemented the new helper memcpy_and_pad  with
__FORTIFY_INLINE caused compiler warnings for certain kernel
configurations.

This helper is only used in a single place at this time, and thus
doesn't benefit much from fortification. So simplify the code
by dropping fortification support for now.

Fixes: 3c5fa8cd18f8 "string.h: add memcpy_and_pad()"
Signed-off-by: Martin Wilck 
---
 include/linux/string.h | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 0bec4151b0eb9..0495cd3c81689 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -404,20 +404,9 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
  * @count: The number of bytes to copy
  * @pad: Character to use for padding if space is left in destination.
  */
-__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
-const void *src, size_t count, int pad)
+static inline void memcpy_and_pad(void *dest, size_t dest_len,
+ const void *src, size_t count, int pad)
 {
-   size_t dest_size = __builtin_object_size(dest, 0);
-   size_t src_size = __builtin_object_size(src, 0);
-
-   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
-   if (dest_size < dest_len && dest_size < count)
-   __write_overflow();
-   else if (src_size < dest_len && src_size < count)
-   __read_overflow3();
-   }
-   if (dest_size < dest_len)
-   fortify_panic(__func__);
if (dest_len > count) {
memcpy(dest, src, count);
memset(dest + count, pad,  dest_len - count);
-- 
2.14.0



Re: [PATCH v4 2/3] string.h: add memcpy_and_pad()

2017-09-05 Thread Martin Wilck
On Tue, 2017-09-05 at 09:28 +0200, Arnd Bergmann wrote:
> 
> > +/**
> > + * memcpy_and_pad - Copy one buffer to another with padding
> > + * @dest: Where to copy to
> > + * @dest_len: The destination buffer size
> > + * @src: Where to copy from
> > + * @count: The number of bytes to copy
> > + * @pad: Character to use for padding if space is left in
> > destination.
> > + */
> > +__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
> > +const void *src, size_t count,
> > int pad)
> > +{
> 
> This is causing compile-time warnings for me:
> 
> In file included from /git/arm-soc/arch/x86/include/asm/string.h:2:0,
>  from /git/arm-soc/include/linux/string.h:18,
>  from /git/arm-soc/arch/x86/include/asm/page_32.h:34,
>  from /git/arm-soc/arch/x86/include/asm/page.h:13,
>  from /git/arm-
> soc/arch/x86/include/asm/thread_info.h:11,
>  from /git/arm-soc/include/linux/thread_info.h:37,
>  from /git/arm-soc/arch/x86/include/asm/preempt.h:6,
>  from /git/arm-soc/include/linux/preempt.h:80,
>  from /git/arm-soc/include/linux/spinlock.h:50,
>  from /git/arm-soc/include/linux/seqlock.h:35,
>  from /git/arm-soc/include/linux/time.h:5,
>  from /git/arm-soc/include/linux/stat.h:18,
>  from /git/arm-soc/include/linux/module.h:10,
>  from /git/arm-soc/drivers/md/dm-integrity.c:9:
> /git/arm-soc/arch/x86/include/asm/string_32.h:196:25: error:
> '__memcpy' is static but used in inline function 'memcpy_and_pad'
> which is not static [-Werror]
>  #define memcpy(t, f, n) __memcpy((t), (f), (n))
>  ^~~~
> /git/arm-soc/include/linux/string.h:466:3: note: in expansion of
> macro 'memcpy'
> 
>^
> /git/arm-soc/arch/x86/include/asm/string_32.h:196:25: error:
> '__memcpy' is static but used in inline function 'memcpy_and_pad'
> which is not static [-Werror]
>  #define memcpy(t, f, n) __memcpy((t), (f), (n))
>  ^~~~
> 
> The problem is the use of __FORTIFY_INLINE outside of the #ifdef
> section above it.
> I used an ugly local workaround, duplicating the function with a
> 'static inline' variant
> in an #else block. Alternatively we could add an extern version in
> lib/string.c for the
> non-fortified case.

I'm sorry. It seems that I messed the code up by trying to do it right.
I suggest to simply drop the fortification code from this function,
which is not a "common str/mem function" anyway. Please tell me if
that'd be ok for you.

I'll send a patch in a follow-up email. 

Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [PATCH v4 2/3] string.h: add memcpy_and_pad()

2017-09-05 Thread Martin Wilck
On Tue, 2017-09-05 at 09:28 +0200, Arnd Bergmann wrote:
> 
> > +/**
> > + * memcpy_and_pad - Copy one buffer to another with padding
> > + * @dest: Where to copy to
> > + * @dest_len: The destination buffer size
> > + * @src: Where to copy from
> > + * @count: The number of bytes to copy
> > + * @pad: Character to use for padding if space is left in
> > destination.
> > + */
> > +__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
> > +const void *src, size_t count,
> > int pad)
> > +{
> 
> This is causing compile-time warnings for me:
> 
> In file included from /git/arm-soc/arch/x86/include/asm/string.h:2:0,
>  from /git/arm-soc/include/linux/string.h:18,
>  from /git/arm-soc/arch/x86/include/asm/page_32.h:34,
>  from /git/arm-soc/arch/x86/include/asm/page.h:13,
>  from /git/arm-
> soc/arch/x86/include/asm/thread_info.h:11,
>  from /git/arm-soc/include/linux/thread_info.h:37,
>  from /git/arm-soc/arch/x86/include/asm/preempt.h:6,
>  from /git/arm-soc/include/linux/preempt.h:80,
>  from /git/arm-soc/include/linux/spinlock.h:50,
>  from /git/arm-soc/include/linux/seqlock.h:35,
>  from /git/arm-soc/include/linux/time.h:5,
>  from /git/arm-soc/include/linux/stat.h:18,
>  from /git/arm-soc/include/linux/module.h:10,
>  from /git/arm-soc/drivers/md/dm-integrity.c:9:
> /git/arm-soc/arch/x86/include/asm/string_32.h:196:25: error:
> '__memcpy' is static but used in inline function 'memcpy_and_pad'
> which is not static [-Werror]
>  #define memcpy(t, f, n) __memcpy((t), (f), (n))
>  ^~~~
> /git/arm-soc/include/linux/string.h:466:3: note: in expansion of
> macro 'memcpy'
> 
>^
> /git/arm-soc/arch/x86/include/asm/string_32.h:196:25: error:
> '__memcpy' is static but used in inline function 'memcpy_and_pad'
> which is not static [-Werror]
>  #define memcpy(t, f, n) __memcpy((t), (f), (n))
>  ^~~~
> 
> The problem is the use of __FORTIFY_INLINE outside of the #ifdef
> section above it.
> I used an ugly local workaround, duplicating the function with a
> 'static inline' variant
> in an #else block. Alternatively we could add an extern version in
> lib/string.c for the
> non-fortified case.

I'm sorry. It seems that I messed the code up by trying to do it right.
I suggest to simply drop the fortification code from this function,
which is not a "common str/mem function" anyway. Please tell me if
that'd be ok for you.

I'll send a patch in a follow-up email. 

Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH v4 3/3] nvmet_execute_identify_ctrl: use memcpy_and_pad()

2017-08-14 Thread Martin Wilck
This changes the earlier patch "nvmet: don't report 0-bytes
in serial number" to use the memcpy_and_pad() helper introduced
in a previous patch.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index a53bb6635b837..7ccea863e0ab5 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -168,15 +168,6 @@ static void nvmet_execute_get_log_page(struct nvmet_req 
*req)
nvmet_req_complete(req, status);
 }
 
-static void copy_and_pad(char *dst, int dst_len, const char *src, int src_len)
-{
-   int len = min(src_len, dst_len);
-
-   memcpy(dst, src, len);
-   if (dst_len > len)
-   memset(dst + len, ' ', dst_len - len);
-}
-
 static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@@ -196,8 +187,9 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
 
bin2hex(id->sn, >subsys->serial,
min(sizeof(ctrl->subsys->serial), sizeof(id->sn) / 2));
-   copy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1);
-   copy_and_pad(id->fr, sizeof(id->fr), UTS_RELEASE, strlen(UTS_RELEASE));
+   memcpy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.14.0



[PATCH v4 0/3] Improve readbility of NVME "wwid" attribute (target side)

2017-08-14 Thread Martin Wilck
Hi Christoph,

I'm reposting the target-side of my patch rebased against 4.13-rc
as requested.

NOTE: an error has occurred while merging the previous version of my patch.
This is fixed by patch 1/3 in the series - that's an important fix for 4.13,
please push forward. 2/3 and 3/3 move the "copy_and_pad" functionality to a 
generic
helper, as requested. I've split this off in case the generic function meets
criticism elsewhere.

Original cover letter:

With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Changes wrt v2:
 * 3/3: Make sure no underflow occurs (Joe Perches)

Changes wrt v3:
 * Rebased on 4.13-rc3.
 * Dropped client-side patch which was merged in nvme-4.13 already.
 * Split off bug fix (patch 1/3).

Martin Wilck (3):
  nvmet_execute_identify_ctrl: don't overwrite with 0-bytes
  string.h: add memcpy_and_pad()
  nvmet_execute_identify_ctrl: use memcpy_and_pad()

 drivers/nvme/target/admin-cmd.c | 20 +++-
 include/linux/string.h  | 30 ++
 2 files changed, 33 insertions(+), 17 deletions(-)

-- 
2.14.0



[PATCH v4 3/3] nvmet_execute_identify_ctrl: use memcpy_and_pad()

2017-08-14 Thread Martin Wilck
This changes the earlier patch "nvmet: don't report 0-bytes
in serial number" to use the memcpy_and_pad() helper introduced
in a previous patch.

Signed-off-by: Martin Wilck 
---
 drivers/nvme/target/admin-cmd.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index a53bb6635b837..7ccea863e0ab5 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -168,15 +168,6 @@ static void nvmet_execute_get_log_page(struct nvmet_req 
*req)
nvmet_req_complete(req, status);
 }
 
-static void copy_and_pad(char *dst, int dst_len, const char *src, int src_len)
-{
-   int len = min(src_len, dst_len);
-
-   memcpy(dst, src, len);
-   if (dst_len > len)
-   memset(dst + len, ' ', dst_len - len);
-}
-
 static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
struct nvmet_ctrl *ctrl = req->sq->ctrl;
@@ -196,8 +187,9 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
 
bin2hex(id->sn, >subsys->serial,
min(sizeof(ctrl->subsys->serial), sizeof(id->sn) / 2));
-   copy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1);
-   copy_and_pad(id->fr, sizeof(id->fr), UTS_RELEASE, strlen(UTS_RELEASE));
+   memcpy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.14.0



[PATCH v4 0/3] Improve readbility of NVME "wwid" attribute (target side)

2017-08-14 Thread Martin Wilck
Hi Christoph,

I'm reposting the target-side of my patch rebased against 4.13-rc
as requested.

NOTE: an error has occurred while merging the previous version of my patch.
This is fixed by patch 1/3 in the series - that's an important fix for 4.13,
please push forward. 2/3 and 3/3 move the "copy_and_pad" functionality to a 
generic
helper, as requested. I've split this off in case the generic function meets
criticism elsewhere.

Original cover letter:

With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Changes wrt v2:
 * 3/3: Make sure no underflow occurs (Joe Perches)

Changes wrt v3:
 * Rebased on 4.13-rc3.
 * Dropped client-side patch which was merged in nvme-4.13 already.
 * Split off bug fix (patch 1/3).

Martin Wilck (3):
  nvmet_execute_identify_ctrl: don't overwrite with 0-bytes
  string.h: add memcpy_and_pad()
  nvmet_execute_identify_ctrl: use memcpy_and_pad()

 drivers/nvme/target/admin-cmd.c | 20 +++-
 include/linux/string.h  | 30 ++
 2 files changed, 33 insertions(+), 17 deletions(-)

-- 
2.14.0



[PATCH v4 1/3] nvmet_execute_identify_ctrl: don't overwrite with 0-bytes

2017-08-14 Thread Martin Wilck
The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.

Fixes: 42de82a8b544 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 2d7a98ab53fbf..a53bb6635b837 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -199,12 +199,6 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
copy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1);
copy_and_pad(id->fr, sizeof(id->fr), UTS_RELEASE, strlen(UTS_RELEASE));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
-
id->rab = 6;
 
/*
-- 
2.14.0



[PATCH v4 2/3] string.h: add memcpy_and_pad()

2017-08-14 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Note: the warnings reported by the kbuild test robot for this patch
are actually generated by the use of CONFIG_PROFILE_ALL_BRANCHES
together with __FORTIFY_INLINE.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.14.0



[PATCH v4 2/3] string.h: add memcpy_and_pad()

2017-08-14 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Note: the warnings reported by the kbuild test robot for this patch
are actually generated by the use of CONFIG_PROFILE_ALL_BRANCHES
together with __FORTIFY_INLINE.

Signed-off-by: Martin Wilck 
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.14.0



[PATCH v4 1/3] nvmet_execute_identify_ctrl: don't overwrite with 0-bytes

2017-08-14 Thread Martin Wilck
The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.

Fixes: 42de82a8b544 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck 
---
 drivers/nvme/target/admin-cmd.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 2d7a98ab53fbf..a53bb6635b837 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -199,12 +199,6 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
copy_and_pad(id->mn, sizeof(id->mn), model, sizeof(model) - 1);
copy_and_pad(id->fr, sizeof(id->fr), UTS_RELEASE, strlen(UTS_RELEASE));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
-
id->rab = 6;
 
/*
-- 
2.14.0



[PATCH v3 3/3] nvme: wwid_show: strip trailing 0-bytes

2017-07-20 Thread Martin Wilck
Some broken targets (such as the current Linux target) pad
model or serial fields with 0-bytes rather than spaces. The
NVME spec disallows 0 bytes in "ASCII" fields.
Thus strip trailing 0-bytes, too. Also make sure that we get no
underflow for pathological input.

Signed-off-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Hannes Reinecke <h...@suse.de>
Acked-by: Christoph Hellwig <h...@lst.de>

---
 drivers/nvme/host/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb96f4a7ae3a9..9c558ab485bbc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2001,9 +2001,11 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
if (memchr_inv(ns->eui, 0, sizeof(ns->eui)))
return sprintf(buf, "eui.%8phN\n", ns->eui);
 
-   while (ctrl->serial[serial_len - 1] == ' ')
+   while (serial_len > 0 && (ctrl->serial[serial_len - 1] == ' ' ||
+ ctrl->serial[serial_len - 1] == '\0'))
serial_len--;
-   while (ctrl->model[model_len - 1] == ' ')
+   while (model_len > 0 && (ctrl->model[model_len - 1] == ' ' ||
+ctrl->model[model_len - 1] == '\0'))
model_len--;
 
return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-- 
2.13.2



[PATCH v3 0/3] Improve readbility of NVME "wwid" attribute

2017-07-20 Thread Martin Wilck
With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Changes wrt v2:
 * 3/3: Make sure no underflow occurs (Joe Perches)

Martin Wilck (3):
  string.h: add memcpy_and_pad()
  nvmet: identify controller: improve standard compliance
  nvme: wwid_show: strip trailing 0-bytes

 drivers/nvme/host/core.c|  6 --
 drivers/nvme/target/admin-cmd.c | 13 ++---
 include/linux/string.h  | 30 ++
 3 files changed, 40 insertions(+), 9 deletions(-)

-- 
2.13.2



[PATCH v3 3/3] nvme: wwid_show: strip trailing 0-bytes

2017-07-20 Thread Martin Wilck
Some broken targets (such as the current Linux target) pad
model or serial fields with 0-bytes rather than spaces. The
NVME spec disallows 0 bytes in "ASCII" fields.
Thus strip trailing 0-bytes, too. Also make sure that we get no
underflow for pathological input.

Signed-off-by: Martin Wilck 
Reviewed-by: Hannes Reinecke 
Acked-by: Christoph Hellwig 

---
 drivers/nvme/host/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb96f4a7ae3a9..9c558ab485bbc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2001,9 +2001,11 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
if (memchr_inv(ns->eui, 0, sizeof(ns->eui)))
return sprintf(buf, "eui.%8phN\n", ns->eui);
 
-   while (ctrl->serial[serial_len - 1] == ' ')
+   while (serial_len > 0 && (ctrl->serial[serial_len - 1] == ' ' ||
+ ctrl->serial[serial_len - 1] == '\0'))
serial_len--;
-   while (ctrl->model[model_len - 1] == ' ')
+   while (model_len > 0 && (ctrl->model[model_len - 1] == ' ' ||
+ctrl->model[model_len - 1] == '\0'))
model_len--;
 
return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-- 
2.13.2



[PATCH v3 0/3] Improve readbility of NVME "wwid" attribute

2017-07-20 Thread Martin Wilck
With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Changes wrt v2:
 * 3/3: Make sure no underflow occurs (Joe Perches)

Martin Wilck (3):
  string.h: add memcpy_and_pad()
  nvmet: identify controller: improve standard compliance
  nvme: wwid_show: strip trailing 0-bytes

 drivers/nvme/host/core.c|  6 --
 drivers/nvme/target/admin-cmd.c | 13 ++---
 include/linux/string.h  | 30 ++
 3 files changed, 40 insertions(+), 9 deletions(-)

-- 
2.13.2



[PATCH v3 2/3] nvmet: identify controller: improve standard compliance

2017-07-20 Thread Martin Wilck
The NVME standard mandates that the SN, MN, and FR fields of
the Indentify Controller Data Structure be "ASCII strings".
That means that they may not contain 0-bytes, not even string
terminators.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 35f930db3c02c..bd040ae32528d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -173,6 +173,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_id_ctrl *id;
u16 status = 0;
+   const char MODEL[] = "Linux";
 
id = kzalloc(sizeof(*id), GFP_KERNEL);
if (!id) {
@@ -184,14 +185,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
id->vid = 0;
id->ssvid = 0;
 
-   memset(id->sn, ' ', sizeof(id->sn));
-   snprintf(id->sn, sizeof(id->sn), "%llx", ctrl->serial);
+   bin2hex(id->sn, >serial, min(sizeof(ctrl->serial),
+  sizeof(id->sn) / 2));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
+   memcpy_and_pad(id->mn, sizeof(id->mn), MODEL, sizeof(MODEL) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.13.2



[PATCH v3 2/3] nvmet: identify controller: improve standard compliance

2017-07-20 Thread Martin Wilck
The NVME standard mandates that the SN, MN, and FR fields of
the Indentify Controller Data Structure be "ASCII strings".
That means that they may not contain 0-bytes, not even string
terminators.

Signed-off-by: Martin Wilck 
---
 drivers/nvme/target/admin-cmd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 35f930db3c02c..bd040ae32528d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -173,6 +173,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_id_ctrl *id;
u16 status = 0;
+   const char MODEL[] = "Linux";
 
id = kzalloc(sizeof(*id), GFP_KERNEL);
if (!id) {
@@ -184,14 +185,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
id->vid = 0;
id->ssvid = 0;
 
-   memset(id->sn, ' ', sizeof(id->sn));
-   snprintf(id->sn, sizeof(id->sn), "%llx", ctrl->serial);
+   bin2hex(id->sn, >serial, min(sizeof(ctrl->serial),
+  sizeof(id->sn) / 2));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
+   memcpy_and_pad(id->mn, sizeof(id->mn), MODEL, sizeof(MODEL) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.13.2



[PATCH v3 1/3] string.h: add memcpy_and_pad()

2017-07-20 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.2



[PATCH v3 1/3] string.h: add memcpy_and_pad()

2017-07-20 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Signed-off-by: Martin Wilck 
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.2



[PATCH v2 2/3] nvmet: identify controller: improve standard compliance

2017-07-20 Thread Martin Wilck
The NVME standard mandates that the SN, MN, and FR fields of
the Indentify Controller Data Structure be "ASCII strings".
That means that they may not contain 0-bytes, not even string
terminators.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 35f930db3c02c..bd040ae32528d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -173,6 +173,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_id_ctrl *id;
u16 status = 0;
+   const char MODEL[] = "Linux";
 
id = kzalloc(sizeof(*id), GFP_KERNEL);
if (!id) {
@@ -184,14 +185,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
id->vid = 0;
id->ssvid = 0;
 
-   memset(id->sn, ' ', sizeof(id->sn));
-   snprintf(id->sn, sizeof(id->sn), "%llx", ctrl->serial);
+   bin2hex(id->sn, >serial, min(sizeof(ctrl->serial),
+  sizeof(id->sn) / 2));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
+   memcpy_and_pad(id->mn, sizeof(id->mn), MODEL, sizeof(MODEL) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.13.2



[PATCH v2 2/3] nvmet: identify controller: improve standard compliance

2017-07-20 Thread Martin Wilck
The NVME standard mandates that the SN, MN, and FR fields of
the Indentify Controller Data Structure be "ASCII strings".
That means that they may not contain 0-bytes, not even string
terminators.

Signed-off-by: Martin Wilck 
---
 drivers/nvme/target/admin-cmd.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 35f930db3c02c..bd040ae32528d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -173,6 +173,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
struct nvmet_ctrl *ctrl = req->sq->ctrl;
struct nvme_id_ctrl *id;
u16 status = 0;
+   const char MODEL[] = "Linux";
 
id = kzalloc(sizeof(*id), GFP_KERNEL);
if (!id) {
@@ -184,14 +185,12 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req 
*req)
id->vid = 0;
id->ssvid = 0;
 
-   memset(id->sn, ' ', sizeof(id->sn));
-   snprintf(id->sn, sizeof(id->sn), "%llx", ctrl->serial);
+   bin2hex(id->sn, >serial, min(sizeof(ctrl->serial),
+  sizeof(id->sn) / 2));
 
-   memset(id->mn, ' ', sizeof(id->mn));
-   strncpy((char *)id->mn, "Linux", sizeof(id->mn));
-
-   memset(id->fr, ' ', sizeof(id->fr));
-   strncpy((char *)id->fr, UTS_RELEASE, sizeof(id->fr));
+   memcpy_and_pad(id->mn, sizeof(id->mn), MODEL, sizeof(MODEL) - 1, ' ');
+   memcpy_and_pad(id->fr, sizeof(id->fr),
+  UTS_RELEASE, strlen(UTS_RELEASE), ' ');
 
id->rab = 6;
 
-- 
2.13.2



[PATCH v2 0/3] Improve readbility of NVME "wwid" attribute

2017-07-20 Thread Martin Wilck
With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Martin Wilck (3):
  string.h: add memcpy_and_pad()
  nvmet: identify controller: improve standard compliance
  nvme: wwid_show: strip trailing 0-bytes

 drivers/nvme/host/core.c|  6 --
 drivers/nvme/target/admin-cmd.c | 13 ++---
 include/linux/string.h  | 30 ++
 3 files changed, 40 insertions(+), 9 deletions(-)

-- 
2.13.2



[PATCH v2 0/3] Improve readbility of NVME "wwid" attribute

2017-07-20 Thread Martin Wilck
With the current implementation, the default "fallback" WWID generation
code (if no nguid, euid etc. are defined) for Linux NVME host and target
results in the following WWID format:

nvme.-3163653363666438366239656630386200-4c696e757800-0002

This is not only hard to read, it poses real problems e.g. for multipath
(dm WWIDs are limited to 128 characters).

With this patch series, the WWID on a Linux host connected to a Linux target
looks like this:

nvme.-65613435333665653738613464363961-4c696e7578-0001

Changes wrt v1:
 * 1/3: new, moved helper to include/linux/string.h (Christoph Hellwig)
(you suggested kernel.h, but I think this matches string.h better)  
 * Dropped the last patch from the v1 series that would have changed valid 
WWIDs for
   HW NVME controllers.

Martin Wilck (3):
  string.h: add memcpy_and_pad()
  nvmet: identify controller: improve standard compliance
  nvme: wwid_show: strip trailing 0-bytes

 drivers/nvme/host/core.c|  6 --
 drivers/nvme/target/admin-cmd.c | 13 ++---
 include/linux/string.h  | 30 ++
 3 files changed, 40 insertions(+), 9 deletions(-)

-- 
2.13.2



[PATCH v2 3/3] nvme: wwid_show: strip trailing 0-bytes

2017-07-20 Thread Martin Wilck
Some broken targets (such as the current Linux target) pad
model or serial fields with 0-bytes rather than spaces. The
NVME spec disallows 0 bytes in "ASCII" fields.
Thus strip trailing 0-bytes, too.

Signed-off-by: Martin Wilck <mwi...@suse.com>
Reviewed-by: Hannes Reinecke <h...@suse.de>
Acked-by: Christoph Hellwig <h...@lst.de>

---
 drivers/nvme/host/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb96f4a7ae3a9..30b87600157d7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2001,9 +2001,11 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
if (memchr_inv(ns->eui, 0, sizeof(ns->eui)))
return sprintf(buf, "eui.%8phN\n", ns->eui);
 
-   while (ctrl->serial[serial_len - 1] == ' ')
+   while (ctrl->serial[serial_len - 1] == ' ' ||
+  ctrl->serial[serial_len - 1] == '\0')
serial_len--;
-   while (ctrl->model[model_len - 1] == ' ')
+   while (ctrl->model[model_len - 1] == ' ' ||
+  ctrl->model[model_len - 1] == '\0')
model_len--;
 
return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-- 
2.13.2



[PATCH v2 3/3] nvme: wwid_show: strip trailing 0-bytes

2017-07-20 Thread Martin Wilck
Some broken targets (such as the current Linux target) pad
model or serial fields with 0-bytes rather than spaces. The
NVME spec disallows 0 bytes in "ASCII" fields.
Thus strip trailing 0-bytes, too.

Signed-off-by: Martin Wilck 
Reviewed-by: Hannes Reinecke 
Acked-by: Christoph Hellwig 

---
 drivers/nvme/host/core.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cb96f4a7ae3a9..30b87600157d7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2001,9 +2001,11 @@ static ssize_t wwid_show(struct device *dev, struct 
device_attribute *attr,
if (memchr_inv(ns->eui, 0, sizeof(ns->eui)))
return sprintf(buf, "eui.%8phN\n", ns->eui);
 
-   while (ctrl->serial[serial_len - 1] == ' ')
+   while (ctrl->serial[serial_len - 1] == ' ' ||
+  ctrl->serial[serial_len - 1] == '\0')
serial_len--;
-   while (ctrl->model[model_len - 1] == ' ')
+   while (ctrl->model[model_len - 1] == ' ' ||
+  ctrl->model[model_len - 1] == '\0')
model_len--;
 
return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-- 
2.13.2



[PATCH v2 1/3] string.h: add memcpy_and_pad()

2017-07-20 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Signed-off-by: Martin Wilck <mwi...@suse.com>
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.2



[PATCH v2 1/3] string.h: add memcpy_and_pad()

2017-07-20 Thread Martin Wilck
This helper function is useful for the nvme subsystem, and maybe
others.

Signed-off-by: Martin Wilck 
---
 include/linux/string.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index a467e617eeb08..0bec4151b0eb9 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -200,6 +200,7 @@ static inline const char *kbasename(const char *path)
 void fortify_panic(const char *name) __noreturn __cold;
 void __read_overflow(void) __compiletime_error("detected read beyond size of 
object passed as 1st parameter");
 void __read_overflow2(void) __compiletime_error("detected read beyond size of 
object passed as 2nd parameter");
+void __read_overflow3(void) __compiletime_error("detected read beyond size of 
object passed as 3rd parameter");
 void __write_overflow(void) __compiletime_error("detected write beyond size of 
object passed as 1st parameter");
 
 #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && 
defined(CONFIG_FORTIFY_SOURCE)
@@ -395,4 +396,33 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q)
 
 #endif
 
+/**
+ * memcpy_and_pad - Copy one buffer to another with padding
+ * @dest: Where to copy to
+ * @dest_len: The destination buffer size
+ * @src: Where to copy from
+ * @count: The number of bytes to copy
+ * @pad: Character to use for padding if space is left in destination.
+ */
+__FORTIFY_INLINE void memcpy_and_pad(void *dest, size_t dest_len,
+const void *src, size_t count, int pad)
+{
+   size_t dest_size = __builtin_object_size(dest, 0);
+   size_t src_size = __builtin_object_size(src, 0);
+
+   if (__builtin_constant_p(dest_len) && __builtin_constant_p(count)) {
+   if (dest_size < dest_len && dest_size < count)
+   __write_overflow();
+   else if (src_size < dest_len && src_size < count)
+   __read_overflow3();
+   }
+   if (dest_size < dest_len)
+   fortify_panic(__func__);
+   if (dest_len > count) {
+   memcpy(dest, src, count);
+   memset(dest + count, pad,  dest_len - count);
+   } else
+   memcpy(dest, src, dest_len);
+}
+
 #endif /* _LINUX_STRING_H_ */
-- 
2.13.2



Re: [PATCH] nvmet: preserve controller serial number between reboots

2017-07-13 Thread Martin Wilck
On Thu, 2017-07-13 at 12:48 +0200, Johannes Thumshirn wrote:
> The NVMe target has no way to preserve controller serial
> IDs across reboots which breaks udev scripts doing
> SYMLINK+="dev/disk/by-id/nvme-$env{ID_SERIAL}-part%n.
> 
> Export the randomly generated serial number via configfs and allow
> setting of a serial via configfs to mitigate this breakage.

I'm wondering if this should be a write-once attribute. Also,
Once the serial number has been passed on to some host (or maybe only:
while the device is in use by some host), the attribute should probably
be read-only.

Martin



Re: [PATCH] nvmet: preserve controller serial number between reboots

2017-07-13 Thread Martin Wilck
On Thu, 2017-07-13 at 12:48 +0200, Johannes Thumshirn wrote:
> The NVMe target has no way to preserve controller serial
> IDs across reboots which breaks udev scripts doing
> SYMLINK+="dev/disk/by-id/nvme-$env{ID_SERIAL}-part%n.
> 
> Export the randomly generated serial number via configfs and allow
> setting of a serial via configfs to mitigate this breakage.

I'm wondering if this should be a write-once attribute. Also,
Once the serial number has been passed on to some host (or maybe only:
while the device is in use by some host), the attribute should probably
be read-only.

Martin



Re: work queue of scsi fc transports should be serialized

2017-05-22 Thread Martin Wilck
On Sat, 2017-05-20 at 08:25 +, Dashi DS1 Cao wrote:
> On Fri, 2017-05-19 at 09:36 +, Dashi DS1 Cao wrote:
> > It seems there is a race of multiple "fc_starget_delete" of the
> > same 
> > rport, thus of the same SCSI host. The race leads to the race of 
> > scsi_remove_target and it cannot be prevented by the code snippet 
> > alone, even of the most recent
> > version:
> > spin_lock_irqsave(shost->host_lock, flags);
> > list_for_each_entry(starget, >__targets, siblings) {
> > if (starget->state == STARGET_DEL ||
> > starget->state == STARGET_REMOVE)
> > continue;
> > If there is a possibility that the starget is under deletion(state
> > == 
> > STARGET_DEL), it should be possible that list_next_entry(starget, 
> > siblings) could cause a read access violation.
> > Hello Dashi,
> > Something else must be going on. From scsi_remove_target():
> > restart:
> > spin_lock_irqsave(shost->host_lock, flags);
> > list_for_each_entry(starget, >__targets, siblings) {
> > if (starget->state == STARGET_DEL ||
> > starget->state == STARGET_REMOVE)
> > continue;
> > if (starget->dev.parent == dev || >dev == dev)
> > {
> > kref_get(>reap_ref);
> > starget->state = STARGET_REMOVE;
> > spin_unlock_irqrestore(shost->host_lock,
> > flags);
> > __scsi_remove_target(starget);
> > scsi_target_reap(starget);
> > goto restart;
> > }
> > }
> > spin_unlock_irqrestore(shost->host_lock, flags);
> > In other words, before scsi_remove_target() decides to call
> > __scsi_remove_target(), it changes the target state into
> > STARGET_REMOVE while holding the host lock. 
> > This means that scsi_remove_target() won't
> > call __scsi_remove_target() twice and also that it won't invoke
> > list_next_entry(starget, siblings) after starget has been 
> > freed.
> > Bart.
> 
> In the crashes of Suse 12 sp1, the root cause is the deletion of a
> list node without holding the lock:
> spin_lock_irqsave(shost->host_lock, flags);
> list_for_each_entry_safe(starget, tmp, >__targets,
> siblings) {
> if (starget->state == STARGET_DEL)
> continue;
> if (starget->dev.parent == dev || >dev ==
> dev) {
> /* assuming new targets arrive at the end */
> kref_get(>reap_ref);
> spin_unlock_irqrestore(shost->host_lock,
> flags);
> 
> __scsi_remove_target(starget);
> list_move_tail(>siblings,
> _list);  --this deletion from shost->__targets list is done
> without the lock.
> spin_lock_irqsave(shost->host_lock, flags);
>  }
>   }
>   spin_unlock_irqrestore(shost->host_lock, flags);

I believe this is fixed in SLES12-SP1 kernel 3.12.53-60.30.1, with the
following patch:

* Mon Jan 18 2016 jthumsh...@suse.de
- scsi: restart list search after unlock in scsi_remove_target
  (bsc#944749, bsc#959257).
- Delete
  patches.fixes/0001-SCSI-Fix-hard-lockup-in-scsi_remove_target.patch.
- commit 2490876

Regards,
Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: work queue of scsi fc transports should be serialized

2017-05-22 Thread Martin Wilck
On Sat, 2017-05-20 at 08:25 +, Dashi DS1 Cao wrote:
> On Fri, 2017-05-19 at 09:36 +, Dashi DS1 Cao wrote:
> > It seems there is a race of multiple "fc_starget_delete" of the
> > same 
> > rport, thus of the same SCSI host. The race leads to the race of 
> > scsi_remove_target and it cannot be prevented by the code snippet 
> > alone, even of the most recent
> > version:
> > spin_lock_irqsave(shost->host_lock, flags);
> > list_for_each_entry(starget, >__targets, siblings) {
> > if (starget->state == STARGET_DEL ||
> > starget->state == STARGET_REMOVE)
> > continue;
> > If there is a possibility that the starget is under deletion(state
> > == 
> > STARGET_DEL), it should be possible that list_next_entry(starget, 
> > siblings) could cause a read access violation.
> > Hello Dashi,
> > Something else must be going on. From scsi_remove_target():
> > restart:
> > spin_lock_irqsave(shost->host_lock, flags);
> > list_for_each_entry(starget, >__targets, siblings) {
> > if (starget->state == STARGET_DEL ||
> > starget->state == STARGET_REMOVE)
> > continue;
> > if (starget->dev.parent == dev || >dev == dev)
> > {
> > kref_get(>reap_ref);
> > starget->state = STARGET_REMOVE;
> > spin_unlock_irqrestore(shost->host_lock,
> > flags);
> > __scsi_remove_target(starget);
> > scsi_target_reap(starget);
> > goto restart;
> > }
> > }
> > spin_unlock_irqrestore(shost->host_lock, flags);
> > In other words, before scsi_remove_target() decides to call
> > __scsi_remove_target(), it changes the target state into
> > STARGET_REMOVE while holding the host lock. 
> > This means that scsi_remove_target() won't
> > call __scsi_remove_target() twice and also that it won't invoke
> > list_next_entry(starget, siblings) after starget has been 
> > freed.
> > Bart.
> 
> In the crashes of Suse 12 sp1, the root cause is the deletion of a
> list node without holding the lock:
> spin_lock_irqsave(shost->host_lock, flags);
> list_for_each_entry_safe(starget, tmp, >__targets,
> siblings) {
> if (starget->state == STARGET_DEL)
> continue;
> if (starget->dev.parent == dev || >dev ==
> dev) {
> /* assuming new targets arrive at the end */
> kref_get(>reap_ref);
> spin_unlock_irqrestore(shost->host_lock,
> flags);
> 
> __scsi_remove_target(starget);
> list_move_tail(>siblings,
> _list);  --this deletion from shost->__targets list is done
> without the lock.
> spin_lock_irqsave(shost->host_lock, flags);
>  }
>   }
>   spin_unlock_irqrestore(shost->host_lock, flags);

I believe this is fixed in SLES12-SP1 kernel 3.12.53-60.30.1, with the
following patch:

* Mon Jan 18 2016 jthumsh...@suse.de
- scsi: restart list search after unlock in scsi_remove_target
  (bsc#944749, bsc#959257).
- Delete
  patches.fixes/0001-SCSI-Fix-hard-lockup-in-scsi_remove_target.patch.
- commit 2490876

Regards,
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [RFC 10/10] kmod: add a sanity check on module loading

2016-12-09 Thread Martin Wilck
On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
> 
> Although this does get us in the business of keeping alias maps in
> kernel, the the work to support and maintain this is trivial.

You've implemented a special treatment for request_module("fs-$X")in
finished_kmod_load(), but there are many more aliases defined (and
used) in the kernel. Do you plan to implement special code for "char-
major-$X", "crypto-$X", "binfmt-$X" etc. later? 

Regards
Martin

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



Re: [RFC 10/10] kmod: add a sanity check on module loading

2016-12-09 Thread Martin Wilck
On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
> 
> Although this does get us in the business of keeping alias maps in
> kernel, the the work to support and maintain this is trivial.

You've implemented a special treatment for request_module("fs-$X")in
finished_kmod_load(), but there are many more aliases defined (and
used) in the kernel. Do you plan to implement special code for "char-
major-$X", "crypto-$X", "binfmt-$X" etc. later? 

Regards
Martin

-- 
Dr. Martin Wilck , Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



[PATCH v3] base/platform: fix binding for drivers without probe callback

2015-12-03 Thread martin . wilck
From: Martin Wilck 

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe were missing. As a result,
a device and driver could be "bound" together just by matching their names;
this doesn't work any more after b8b2c7d845d5.

This change broke the assumptions of certain drivers; for example, the TPM
code has long assumed that platform driver and device with matching name
could be bound in this way. That assumption may cause such drivers to
fail with Oops during initialization after applying this change. Failure
in suspend/resume tests under qemu has also been reported.

This patch restores the previous (4.3.0 and earlier) behavior of
platform_drv_probe() in the case when the associated platform driver has
no "probe" function.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain callbacks are 
called unconditionally")
Signed-off-by: Martin Wilck 
---
 v2: fixed style issues, rephrased commit message.
 v3: rephrased commit message and subject again.

 drivers/base/platform.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..176b59f 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,15 @@ static int platform_drv_probe(struct device *_dev)
return ret;
 
ret = dev_pm_domain_attach(_dev, true);
-   if (ret != -EPROBE_DEFER && drv->probe) {
-   ret = drv->probe(dev);
-   if (ret)
-   dev_pm_domain_detach(_dev, true);
+   if (ret != -EPROBE_DEFER) {
+   if (drv->probe) {
+   ret = drv->probe(dev);
+   if (ret)
+   dev_pm_domain_detach(_dev, true);
+   } else {
+   /* don't fail if just dev_pm_domain_attach failed */
+   ret = 0;
+   }
}
 
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3] base/platform: fix binding for drivers without probe callback

2015-12-03 Thread martin . wilck
From: Martin Wilck <martin.wi...@ts.fujitsu.com>

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe were missing. As a result,
a device and driver could be "bound" together just by matching their names;
this doesn't work any more after b8b2c7d845d5.

This change broke the assumptions of certain drivers; for example, the TPM
code has long assumed that platform driver and device with matching name
could be bound in this way. That assumption may cause such drivers to
fail with Oops during initialization after applying this change. Failure
in suspend/resume tests under qemu has also been reported.

This patch restores the previous (4.3.0 and earlier) behavior of
platform_drv_probe() in the case when the associated platform driver has
no "probe" function.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain callbacks are 
called unconditionally")
Signed-off-by: Martin Wilck <martin.wi...@ts.fujitsu.com>
---
 v2: fixed style issues, rephrased commit message.
 v3: rephrased commit message and subject again.

 drivers/base/platform.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..176b59f 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,15 @@ static int platform_drv_probe(struct device *_dev)
return ret;
 
ret = dev_pm_domain_attach(_dev, true);
-   if (ret != -EPROBE_DEFER && drv->probe) {
-   ret = drv->probe(dev);
-   if (ret)
-   dev_pm_domain_detach(_dev, true);
+   if (ret != -EPROBE_DEFER) {
+   if (drv->probe) {
+   ret = drv->probe(dev);
+   if (ret)
+   dev_pm_domain_detach(_dev, true);
+   } else {
+   /* don't fail if just dev_pm_domain_attach failed */
+   ret = 0;
+   }
}
 
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] base/platform: return success when probe function is NULL

2015-11-30 Thread martin . wilck
From: Martin Wilck 

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe were missing. As a result,
a device and driver could be "bound" together just by matching their names;
this doesn't work any more after b8b2c7d845d5.

This may cause problems later for certain usage of platform_driver_register()
and platform_device_register_simple(). I observed a panic while loading
the tpm_tis driver with parameter "force=1" (i.e. registering tpm_tis as
a platform driver), because tpm_tis_init's assumption that the device
returned by platform_device_register_simple() was bound didn't hold any more
(tpmm_chip_alloc() dereferences chip->pdev->driver, causing panic).

This patch restores the previous (4.3.0 and earlier) behavior of
platform_drv_probe() in the case when the associated platform driver has
no "probe" function.

v2: fixed style issues, rephrased commit message.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain callbacks are 
called unconditionally")
Signed-off-by: Martin Wilck 
---
 drivers/base/platform.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..176b59f 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,15 @@ static int platform_drv_probe(struct device *_dev)
return ret;
 
ret = dev_pm_domain_attach(_dev, true);
-   if (ret != -EPROBE_DEFER && drv->probe) {
-   ret = drv->probe(dev);
-   if (ret)
-   dev_pm_domain_detach(_dev, true);
+   if (ret != -EPROBE_DEFER) {
+   if (drv->probe) {
+   ret = drv->probe(dev);
+   if (ret)
+   dev_pm_domain_detach(_dev, true);
+   } else {
+   /* don't fail if just dev_pm_domain_attach failed */
+   ret = 0;
+   }
}
 
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2] base/platform: return success when probe function is NULL

2015-11-30 Thread martin . wilck
From: Martin Wilck <martin.wi...@ts.fujitsu.com>

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe were missing. As a result,
a device and driver could be "bound" together just by matching their names;
this doesn't work any more after b8b2c7d845d5.

This may cause problems later for certain usage of platform_driver_register()
and platform_device_register_simple(). I observed a panic while loading
the tpm_tis driver with parameter "force=1" (i.e. registering tpm_tis as
a platform driver), because tpm_tis_init's assumption that the device
returned by platform_device_register_simple() was bound didn't hold any more
(tpmm_chip_alloc() dereferences chip->pdev->driver, causing panic).

This patch restores the previous (4.3.0 and earlier) behavior of
platform_drv_probe() in the case when the associated platform driver has
no "probe" function.

v2: fixed style issues, rephrased commit message.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain callbacks are 
called unconditionally")
Signed-off-by: Martin Wilck <martin.wi...@ts.fujitsu.com>
---
 drivers/base/platform.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..176b59f 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,15 @@ static int platform_drv_probe(struct device *_dev)
return ret;
 
ret = dev_pm_domain_attach(_dev, true);
-   if (ret != -EPROBE_DEFER && drv->probe) {
-   ret = drv->probe(dev);
-   if (ret)
-   dev_pm_domain_detach(_dev, true);
+   if (ret != -EPROBE_DEFER) {
+   if (drv->probe) {
+   ret = drv->probe(dev);
+   if (ret)
+   dev_pm_domain_detach(_dev, true);
+   } else {
+   /* don't fail if just dev_pm_domain_attach failed */
+   ret = 0;
+   }
}
 
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] base/platform: fix panic when probe function is NULL

2015-11-26 Thread martin . wilck
From: Martin Wilck 

Since b8b2c7d845d5, platform_drv_probe() is called for all platform
devices. If drv->probe is NULL, and dev_pm_domain_attach() fails,
platform_drv_probe() will return the error code from dev_pm_domain_attach().

This causes real_probe() to enter the "probe_failed" path and set
dev->driver to NULL. Before b8b2c7d845d5, real_probe() would assume
success if both dev->bus->probe and drv->probe are missing.

This may cause a panic later. For example, inserting the tpm_tis
driver with parameter "force=1" (i.e. registering tpm_tis as a platform
driver) will panic in tpmm_chip_alloc() because dev->driver is NULL:

 chip->cdev.owner = chip->pdev->driver->owner;

This patch fixes this by returning success in platform_drv_probe() if
"just" dev_pm_domain_attach() had failed. This restores the semantics
of platform_device_register_XXX() if the associated platform driver has
no "probe" function.

Fixes: b8b2c7d845d5 ("base/platform: assert that dev_pm_domain
callbacks are called unconditionally")

Signed-off-by: Martin Wilck 
---
 drivers/base/platform.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 1dd6d3b..c994e76 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -513,10 +513,14 @@ static int platform_drv_probe(struct device *_dev)
return ret;
 
ret = dev_pm_domain_attach(_dev, true);
-   if (ret != -EPROBE_DEFER && drv->probe) {
-   ret = drv->probe(dev);
-   if (ret)
-   dev_pm_domain_detach(_dev, true);
+   if (ret != -EPROBE_DEFER) {
+   if (drv->probe) {
+   ret = drv->probe(dev);
+   if (ret)
+   dev_pm_domain_detach(_dev, true);
+   } else
+   /* don't fail if just dev_pm_domain_attach failed */
+   ret = 0;
}
 
if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >