Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-27 Thread Jens Axboe
On 3/27/18 9:39 AM, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0; a driver may allocate pre_vectors for special use. This
> patch adds an offset parameter so blk-mq may find the intended affinity
> mask and updates all drivers using this API accordingly.

Looks good to me, I'll queue it up for 4.17.

-- 
Jens Axboe



Re: 答复: Re: [PATCH] scsi: Replace sdev_printk with printk_deferred to avoid

2018-03-27 Thread Sergey Senozhatsky
On (03/28/18 10:29), wen.yan...@zte.com.cn wrote:
>Hello Bart,
>
>We have a detailed discussion of the problem.
>Sergey Senozhatsky, Petr and many people have made a lot of efforts about
>it.
>Please see this link:
>https://bugzilla.kernel.org/show_bug.cgi?id=199003
>
>1, Petr suggests that it should be modified in this way:
>IMHO, printing the same message so many times is useless. Therefore
>some throttling would make sense. If we want to keep sdev_printk(),

The thing with retelimiting is that - yes, we do less printks but we still
do them under queue spin_lock.

So I was thinking about something like below [a quick-n-dirty workaround]

---

 drivers/scsi/scsi_lib.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..6c930fbdd24c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1825,9 +1825,13 @@ static void scsi_request_fn(struct request_queue *q)
break;
 
if (unlikely(!scsi_device_online(sdev))) {
+   scsi_kill_request(req, q);
+   spin_unlock_irq(q->queue_lock);
+
sdev_printk(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
-   scsi_kill_request(req, q);
+
+   spin_lock_irq(q->queue_lock);
continue;
}
 


Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-27 Thread Ming Lei
On Tue, Mar 27, 2018 at 09:39:06AM -0600, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0; a driver may allocate pre_vectors for special use. This
> patch adds an offset parameter so blk-mq may find the intended affinity
> mask and updates all drivers using this API accordingly.
> 
> Cc: Don Brace 
> Cc: 
> Cc: 
> Signed-off-by: Keith Busch 
> ---
> v1 -> v2:
> 
>   Update blk-mq API directly instead of chaining a default parameter to
>   a new API, and update all drivers accordingly.
> 
>  block/blk-mq-pci.c| 6 --
>  drivers/nvme/host/pci.c   | 2 +-
>  drivers/scsi/qla2xxx/qla_os.c | 2 +-
>  drivers/scsi/smartpqi/smartpqi_init.c | 2 +-
>  include/linux/blk-mq-pci.h| 3 ++-
>  5 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..e233996bb76f 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set: tagset to provide the mapping for
>   * @pdev:PCI device associated with @set.
> + * @offset:  Offset to use for the pci irq vector
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues.  It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset)
>  {
>   const struct cpumask *mask;
>   unsigned int queue, cpu;
>  
>   for (queue = 0; queue < set->nr_hw_queues; queue++) {
> - mask = pci_irq_get_affinity(pdev, queue);
> + mask = pci_irq_get_affinity(pdev, queue + offset);
>   if (!mask)
>   goto fallback;
>  
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index cef5ce851a92..e3b9efca0571 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -414,7 +414,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
>  {
>   struct nvme_dev *dev = set->driver_data;
>  
> - return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
> + return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
>  }
>  
>  /**
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 12ee6e02d146..2c705f3dd265 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -6805,7 +6805,7 @@ static int qla2xxx_map_queues(struct Scsi_Host *shost)
>   if (USER_CTRL_IRQ(vha->hw))
>   rc = blk_mq_map_queues(>tag_set);
>   else
> - rc = blk_mq_pci_map_queues(>tag_set, vha->hw->pdev);
> + rc = blk_mq_pci_map_queues(>tag_set, vha->hw->pdev, 0);
>   return rc;
>  }
>  
> diff --git a/drivers/scsi/smartpqi/smartpqi_init.c 
> b/drivers/scsi/smartpqi/smartpqi_init.c
> index b2880c7709e6..10c94011c8a8 100644
> --- a/drivers/scsi/smartpqi/smartpqi_init.c
> +++ b/drivers/scsi/smartpqi/smartpqi_init.c
> @@ -5348,7 +5348,7 @@ static int pqi_map_queues(struct Scsi_Host *shost)
>  {
>   struct pqi_ctrl_info *ctrl_info = shost_to_hba(shost);
>  
> - return blk_mq_pci_map_queues(>tag_set, ctrl_info->pci_dev);
> + return blk_mq_pci_map_queues(>tag_set, ctrl_info->pci_dev, 0);
>  }
>  
>  static int pqi_getpciinfo_ioctl(struct pqi_ctrl_info *ctrl_info,
> diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
> index 6338551e0fb9..9f4c17f0d2d8 100644
> --- a/include/linux/blk-mq-pci.h
> +++ b/include/linux/blk-mq-pci.h
> @@ -5,6 +5,7 @@
>  struct blk_mq_tag_set;
>  struct pci_dev;
>  
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
> +int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +   int offset);
>  
>  #endif /* _LINUX_BLK_MQ_PCI_H */
> -- 
> 2.14.3
> 

Reviewed-by: Ming Lei 

-- 
Ming


[Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=199003

Wen Yang (wen.yan...@zte.com.cn) changed:

   What|Removed |Added

 CC||linux-scsi@vger.kernel.org

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Resend Patch 2/3] Netvsc: Use the vmbus functiton to calculate ring buffer percentage

2018-03-27 Thread Long Li
From: Long Li 

In Vmbus, we have defined a function to calculate available ring buffer
percentage to write.

Use that function and remove netvsc's private version.

Signed-off-by: Long Li 
---
 drivers/net/hyperv/hyperv_net.h |  1 -
 drivers/net/hyperv/netvsc.c | 17 +++--
 drivers/net/hyperv/netvsc_drv.c |  3 ---
 3 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index cd538d5a7986..a0199ab13d67 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -189,7 +189,6 @@ struct netvsc_device;
 struct net_device_context;
 
 extern u32 netvsc_ring_bytes;
-extern struct reciprocal_value netvsc_ring_reciprocal;
 
 struct netvsc_device *netvsc_device_add(struct hv_device *device,
const struct netvsc_device_info *info);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0265d703eb03..8af0069e4d8c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
@@ -590,17 +589,6 @@ void netvsc_device_remove(struct hv_device *device)
 #define RING_AVAIL_PERCENT_HIWATER 20
 #define RING_AVAIL_PERCENT_LOWATER 10
 
-/*
- * Get the percentage of available bytes to write in the ring.
- * The return value is in range from 0 to 100.
- */
-static u32 hv_ringbuf_avail_percent(const struct hv_ring_buffer_info 
*ring_info)
-{
-   u32 avail_write = hv_get_bytes_to_write(ring_info);
-
-   return reciprocal_divide(avail_write  * 100, netvsc_ring_reciprocal);
-}
-
 static inline void netvsc_free_send_slot(struct netvsc_device *net_device,
 u32 index)
 {
@@ -649,7 +637,8 @@ static void netvsc_send_tx_complete(struct netvsc_device 
*net_device,
wake_up(_device->wait_drain);
 
if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
-   (hv_ringbuf_avail_percent(>outbound) > 
RING_AVAIL_PERCENT_HIWATER ||
+   (hv_get_avail_to_write_percent(>outbound) >
+RING_AVAIL_PERCENT_HIWATER ||
 queue_sends < 1)) {
netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx));
ndev_ctx->eth_stats.wake_queue++;
@@ -757,7 +746,7 @@ static inline int netvsc_send_pkt(
struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
u64 req_id;
int ret;
-   u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
+   u32 ring_avail = hv_get_avail_to_write_percent(_channel->outbound);
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index faea0be18924..b0b1c2fd2b7b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -35,7 +35,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -55,7 +54,6 @@ static unsigned int ring_size __ro_after_init = 128;
 module_param(ring_size, uint, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
 unsigned int netvsc_ring_bytes __ro_after_init;
-struct reciprocal_value netvsc_ring_reciprocal __ro_after_init;
 
 static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_IFUP |
@@ -2186,7 +2184,6 @@ static int __init netvsc_drv_init(void)
ring_size);
}
netvsc_ring_bytes = ring_size * PAGE_SIZE;
-   netvsc_ring_reciprocal = reciprocal_value(netvsc_ring_bytes);
 
ret = vmbus_driver_register(_drv);
if (ret)
-- 
2.14.1



[Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring buffer to write

2018-03-27 Thread Long Li
From: Long Li 

This is a best effort for estimating on how busy the ring buffer is for
that channel, based on available buffer to write in percentage. It is still
possible that at the time of actual ring buffer write, the space may not be
available due to other processes may be writing at the time.

Selecting a channel based on how full it is can reduce the possibility that
a ring buffer write will fail, and avoid the situation a channel is over
busy.

Now it's possible that storvsc can use a smaller ring buffer size
(e.g. 40k bytes) to take advantage of cache locality.

Signed-off-by: Long Li 
---
 drivers/scsi/storvsc_drv.c | 62 +-
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index a2ec0bc9e9fa..b1a87072b3ab 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer 
size (bytes)");
 
 module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
 MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to 
subchannels");
+
+static int ring_avail_percent_lowater = 10;
+module_param(ring_avail_percent_lowater, int, S_IRUGO);
+MODULE_PARM_DESC(ring_avail_percent_lowater,
+   "Select a channel if available ring size > this in percent");
+
 /*
  * Timeout in seconds for all devices managed by this driver.
  */
@@ -1285,9 +1291,9 @@ static int storvsc_do_io(struct hv_device *device,
 {
struct storvsc_device *stor_device;
struct vstor_packet *vstor_packet;
-   struct vmbus_channel *outgoing_channel;
+   struct vmbus_channel *outgoing_channel, *channel;
int ret = 0;
-   struct cpumask alloced_mask;
+   struct cpumask alloced_mask, other_numa_mask;
int tgt_cpu;
 
vstor_packet = >vstor_packet;
@@ -1301,22 +1307,53 @@ static int storvsc_do_io(struct hv_device *device,
/*
 * Select an an appropriate channel to send the request out.
 */
-
if (stor_device->stor_chns[q_num] != NULL) {
outgoing_channel = stor_device->stor_chns[q_num];
-   if (outgoing_channel->target_cpu == smp_processor_id()) {
+   if (outgoing_channel->target_cpu == q_num) {
/*
 * Ideally, we want to pick a different channel if
 * available on the same NUMA node.
 */
cpumask_and(_mask, _device->alloced_cpus,
cpumask_of_node(cpu_to_node(q_num)));
-   for_each_cpu_wrap(tgt_cpu, _mask,
-   outgoing_channel->target_cpu + 1) {
-   if (tgt_cpu != outgoing_channel->target_cpu) {
-   outgoing_channel =
-   stor_device->stor_chns[tgt_cpu];
-   break;
+
+   for_each_cpu_wrap(tgt_cpu, _mask, q_num + 1) {
+   if (tgt_cpu == q_num)
+   continue;
+   channel = stor_device->stor_chns[tgt_cpu];
+   if (hv_get_avail_to_write_percent(
+   >outbound)
+   > ring_avail_percent_lowater) {
+   outgoing_channel = channel;
+   goto found_channel;
+   }
+   }
+
+   /*
+* All the other channels on the same NUMA node are
+* busy. Try to use the channel on the current CPU
+*/
+   if (hv_get_avail_to_write_percent(
+   _channel->outbound)
+   > ring_avail_percent_lowater)
+   goto found_channel;
+
+   /*
+* If we reach here, all the channels on the current
+* NUMA node are busy. Try to find a channel in
+* other NUMA nodes
+*/
+   cpumask_andnot(_numa_mask,
+   _device->alloced_cpus,
+   cpumask_of_node(cpu_to_node(q_num)));
+
+   for_each_cpu(tgt_cpu, _numa_mask) {
+   channel = stor_device->stor_chns[tgt_cpu];
+   if (hv_get_avail_to_write_percent(
+   >outbound)
+   > ring_avail_percent_lowater) {
+

[Resend Patch 1/3] Vmbus: Add function to report available ring buffer to write in total ring size percentage

2018-03-27 Thread Long Li
From: Long Li 

Netvsc has a function to calculate how much ring buffer in percentage is
available to write. This function is also useful for storvsc and other
vmbus devices.

Define a similar function in vmbus to be used by other vmbus devices.

Signed-off-by: Long Li 
---
 drivers/hv/ring_buffer.c |  2 ++
 include/linux/hyperv.h   | 12 
 2 files changed, 14 insertions(+)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 8699bb969e7e..3c836c099a8f 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -227,6 +227,8 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
*ring_info,
ring_info->ring_buffer->feature_bits.value = 1;
 
ring_info->ring_size = page_cnt << PAGE_SHIFT;
+   ring_info->ring_size_div10_reciprocal =
+   reciprocal_value(ring_info->ring_size / 10);
ring_info->ring_datasize = ring_info->ring_size -
sizeof(struct hv_ring_buffer);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2048f3c3b68a..eb7204851089 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_PAGE_BUFFER_COUNT  32
 #define MAX_MULTIPAGE_BUFFER_COUNT 32 /* 128K */
@@ -121,6 +122,7 @@ struct hv_ring_buffer {
 struct hv_ring_buffer_info {
struct hv_ring_buffer *ring_buffer;
u32 ring_size;  /* Include the shared header */
+   struct reciprocal_value ring_size_div10_reciprocal;
spinlock_t ring_lock;
 
u32 ring_datasize;  /* < ring_size */
@@ -155,6 +157,16 @@ static inline u32 hv_get_bytes_to_write(const struct 
hv_ring_buffer_info *rbi)
return write;
 }
 
+static inline u32 hv_get_avail_to_write_percent(
+   const struct hv_ring_buffer_info *rbi)
+{
+   u32 avail_write = hv_get_bytes_to_write(rbi);
+
+   return reciprocal_divide(
+   (avail_write  << 3) + (avail_write << 1),
+   rbi->ring_size_div10_reciprocal);
+}
+
 /*
  * VMBUS version is 32 bit entity broken up into
  * two 16 bit quantities: major_number. minor_number.
-- 
2.14.1



Re: [PATCH] scsi: libsas: add transport class for ATA devices

2018-03-27 Thread Jason Yan

Hi, Dan

On 2018/3/26 22:38, Dan Williams wrote:

On Mon, Mar 26, 2018 at 2:27 AM, Jason Yan  wrote:

Now ata devices attached with sas controller do not have transport
class, so that we can not see any information of these ata devices in
/sys/class/ata_port(or ata_link or ata_device).

Add transport class for the ata devices attached with sas controller.
The /sys/class directory will show the infomation of the ata devices
as follows:

localhost:/sys/class # ls ata*
ata_device:
dev1.0  dev2.0

ata_link:
link1  link2

ata_port:
ata1  ata2

No functional change of the device scanning and io path. The ata
transport class was deleted when destroying the sas devices.


Have you tested this with suspend / resume power management? I believe
that is what stopped me from going down this path, when I first
evaluated it.



I tested this patch with suspend/resume power management and did not
see any regressions. I don't know if this test is enough. The log is
as below:

[ 1098.226768] hns-nic HISI00C2:01 eth1: link up
[ 1183.162762] PM: suspend entry (s2idle)
[ 1183.162766] PM: Syncing filesystems ... done.
[ 1183.163224] Freezing user space processes ... (elapsed 0.001 seconds) 
done.

[ 1183.164710] OOM killer disabled.
[ 1183.164711] Freezing remaining freezable tasks ... (elapsed 0.001 
seconds) done.

[ 1183.166415] Suspending console(s) (use no_console_suspend to debug)
[ 1183.166733] pci_bus 0004:88: 2-byte config write to 0004:88:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1183.166741] pci_bus 0005:00: 2-byte config write to 0005:00:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1183.166748] pci_bus 0006:c0: 2-byte config write to 0006:c0:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1183.166758] pci_bus 000a:10: 2-byte config write to 000a:10:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1183.166765] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1183.166774] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 
offset 0x44 may corrupt adjacent RW1C bits

[ 1183.242586] sd 0:0:11:0: [sdl] Synchronizing SCSI cache
[ 1183.258568] sd 0:0:10:0: [sdk] Synchronizing SCSI cache
[ 1183.258699] sd 0:0:10:0: [sdk] Stopping disk
[ 1183.922568] sd 0:0:9:0: [sdj] Synchronizing SCSI cache
[ 1183.922690] sd 0:0:9:0: [sdj] Stopping disk
[ 1184.610567] sd 0:0:8:0: [sdi] Synchronizing SCSI cache
[ 1184.626567] sd 0:0:7:0: [sdh] Synchronizing SCSI cache
[ 1184.642564] sd 0:0:6:0: [sdg] Synchronizing SCSI cache
[ 1184.658567] sd 0:0:5:0: [sdf] Synchronizing SCSI cache
[ 1184.674565] sd 0:0:4:0: [sde] Synchronizing SCSI cache
[ 1184.690567] sd 0:0:3:0: [sdd] Synchronizing SCSI cache
[ 1184.706564] sd 0:0:2:0: [sdc] Synchronizing SCSI cache
[ 1184.722566] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 1184.738563] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1184.738844] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 1184.738857] sas: ata1: end_device-0:0:9: dev error handler
[ 1184.738860] sas: ata2: end_device-0:0:10: dev error handler
[ 1184.738877] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 
tries: 1
[ 1184.739169] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1184.739179] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1184.739188] pci_bus 000a:10: 2-byte config write to 000a:10:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1184.739214] pci_bus 0006:c0: 2-byte config write to 0006:c0:00.0 
offset 0x44 may corrupt adjacent RW1C bits

[ 1184.739334] pcieport 0002:80:00.0: can't derive routing for PCI INT A
[ 1184.739336] ixgbe 0002:81:00.0: PCI INT A: no GSI
[ 1184.758563] PM: suspend debug: Waiting for 5 second(s).
[ 1189.809010] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 1189.811180] sas: ata1: end_device-0:0:9: dev error handler
[ 1189.811185] sas: ata1: end_device-0:0:9: Unable to reset ata device?
[ 1189.811223] sas: ata2: end_device-0:0:10: dev error handler
[ 1189.811226] sas: ata2: end_device-0:0:10: Unable to reset ata device?
[ 1189.811474] sd 0:0:9:0: [sdj] Starting disk
[ 1189.888654] ixgbe 0002:81:00.0: Multiqueue Enabled: Rx Queue count = 
63, Tx Queue count = 63 XDP Queue count = 0

[ 1189.974214] ata2.00: configured for UDMA/133
[ 1189.974285] ata1.00: configured for UDMA/133
[ 1189.974295] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 
tries: 1

[ 1197.991137] sd 0:0:10:0: [sdk] Starting disk
[ 1210.019736] pci_generic_config_write32: 2 callbacks suppressed
[ 1210.019740] pci_bus 000d:30: 2-byte config write to 000d:30:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1210.019748] pci_bus 000c:20: 2-byte config write to 000c:20:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1210.019755] pci_bus 000a:10: 2-byte config write to 000a:10:00.0 
offset 0x44 may corrupt adjacent RW1C bits
[ 1210.019764] pci_bus 0006:c0: 2-byte config write to 0006:c0:00.0 
offset 0x44 may 

Re: [PATCH v2] qla2xxx: Fix race condition between iocb timeout and initialisation

2018-03-27 Thread Madhani, Himanshu
Hi Ben, 

> On Mar 21, 2018, at 10:58 AM, Ben Hutchings  
> wrote:
> 
> On Wed, 2018-03-21 at 17:55 +, Madhani, Himanshu wrote:
>> Hi Ben, 
>> 
>>> On Mar 21, 2018, at 10:45 AM, Ben Hutchings  
>>> wrote:
>>> 
>>> On Wed, 2018-03-21 at 17:17 +, Madhani, Himanshu wrote:
 Hi Ben, 
 
> On Mar 20, 2018, at 2:36 PM, Ben Hutchings 
>  wrote:
> 
> qla2x00_init_timer() calls add_timer() on the iocb timeout timer,
> which means the timeout function pointer and any data that the
> function depends on must be initialised beforehand.
> 
> Move this initialisation before each call to qla2x00_init_timer().  In
> some cases qla2x00_init_timer() initialises a completion structure
> needed by the timeout function, so move the call to add_timer() after
> that.
>>> 
>>> [...]
 What motivated this patch? are you debugging any crash which was
 helped by moving code around? 
>>> 
>>> I saw the recent fix that added a call to del_timer(), noticed the lack
>>> of synchronisation and then kept looking further.
>>> 
 What I see from this patch is that its moving iocb.cmd.timeout field
 before qla2x00_init_timer(),
 which are completely different from each other. 
>>> 
>>> How are they "completely different"?  qla2x00_init_timer() starts a
>>> timer that will call qla2x00_sp_timeout(), which in turn uses the
>>> iocb_cmd.timeout function pointer.  We should not assume that any
>>> initialisation code after the call to add_timer() will run before the
>>> timer expires, since the scheduler might run other tasks for the whole
>>> time.  It's unlikely but not impossible.
>>> 
>> 
>> I see. I used “completely different” due to the lack of better wording. 
>> 
>> I wanted to get context and understand if there was any issue that would
>> have helped with these changes. 
>> 
>> your explanation does help and I agree that there is window where timer() 
>> will 
>> pop before timeout is initialized. 
>> 
>> Let me put this patch in our test cycle for couple days and will respond on 
>> this one
> 
> Thanks.
> 

Just to update, we are still going thru review and testing and will take some 
extra cycle to conclude. 
I will update as soon as we are able to reach conclusion.

> Ben.
> 
> -- 
> Ben Hutchings
> Software Developer, Codethink Ltd.

Thanks,
- Himanshu



Re: [PATCH net-next] qed*: Utilize FW 8.33.11.0

2018-03-27 Thread Jason Gunthorpe
On Tue, Mar 27, 2018 at 08:50:24PM +0300, Leon Romanovsky wrote:
> On Tue, Mar 27, 2018 at 05:41:51PM +, Kalderon, Michal wrote:
> > > From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> > > Sent: Tuesday, March 27, 2018 12:18 AM
> > > > diff --git a/drivers/infiniband/hw/qedr/main.c
> > > > b/drivers/infiniband/hw/qedr/main.c
> > > > index db4bf97..7dbbe6d 100644
> > > > +++ b/drivers/infiniband/hw/qedr/main.c
> > > > @@ -51,6 +51,7 @@
> > > >  MODULE_DESCRIPTION("QLogic 40G/100G ROCE Driver");
> > > > MODULE_AUTHOR("QLogic Corporation");  MODULE_LICENSE("Dual
> > > BSD/GPL");
> > > > +MODULE_VERSION(QEDR_MODULE_VERSION);
> > > >
> > > >  #define QEDR_WQ_MULTIPLIER_DFT (3)
> > > >
> > > > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > > > b/drivers/infiniband/hw/qedr/qedr.h
> > > > index 86d4511..ab0d411 100644
> > > > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > > > @@ -43,6 +43,8 @@
> > > >  #include "qedr_hsi_rdma.h"
> > > >
> > > >  #define QEDR_NODE_DESC "QLogic 579xx RoCE HCA"
> > > > +#define QEDR_MODULE_VERSION "8.33.11.20"
> > > > +
> > >
> > > I thought we had a general prohibition against versions like
> > > this in mainline drivers? And what does this hunk have to do
> > > with supporting new firmware?
> > >
> > I'm assuming you refer only to rdma in regards to version
> > prohibition right ? as looking at all other vendors (including
> > Mellanox) all have module versions under net/ why is rdma
> > different in this way ?  I now searched back mails on the topic
> > and found an email from Leon where he stated: " I am strongly
> > against module versions. You should rely on official kernel
> > version."  But it's not always the inbox driver that is installed
> > or probed, the kernel version is not enough.  Given different
> > distros, vanilla kernels, out of box drivers, etc... it is
> > essential for us that based on logs And modinfo we can determine
> > the qed* drivers that are running.
> 
> We actually stopped to maintain driver versions, just ensure that inbox,
> upstream and MLNX_OFED have different names.
> 
> The discussion thread is here
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004426.html
> https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004441.html

Hmm, Linus pretty clearly said No to MODULE_VERSION and related.

So I can't take this hunk, and you shouldn't do in ethernet either, I
guess.

Honestly the idea that this version will somehow have meaning in the
distro kernels is pretty far fetched. You think distros will backport
patches changing version # in any way that will make some kind of
sense?

Jason


Re: [PATCH v3 3/7] scsi_io_completion_nz_result function added

2018-03-27 Thread Bart Van Assche
On Tue, 2018-03-27 at 15:05 -0400, Douglas Gilbert wrote:
> A reviewer requested the original helper function's two return values
> be reduced to one: the blk_stat variable. This required a hack to
> differentiate the default setting of blk_stat (BLK_STS_OK) from the case
> when the helper assigns BLK_STS_OK as the return value. The hack is to
> return the otherwise unused BLK_STS_NOTSUPP value as an indication that
> the helper didn't change anything. That hack was judged by another
> reviewer to be worse that the "two return values" ugliness it was
> trying to address. So back to the original "two return values" solution.
> Returning a structure containing result and blk_stat is another
> possibility but returning structures is frowned upon in some circles.

It seems to me like this version of this patch is such that the above comment
no longer applies?

> + sense_valid = scsi_command_normalize_sense(cmd, );
> + about_current = sense_valid ? !scsi_sense_is_deferred() : true;
> +
> + if (blk_rq_is_passthrough(req)) {
> + if (sense_valid) {
> + /*
> +  * SG_IO wants current and deferred errors
> +  */
> + scsi_req(req)->sense_len =
> + min(8 + cmd->sense_buffer[7],
> + SCSI_SENSE_BUFFERSIZE);
> + }
> + if (about_current)
> + *blk_statp = __scsi_error_from_host_byte(cmd, result);

This patch does more than just moving code. E.g. the 'about_current'
variable is new and does not match to any variable in the existing code.
Please separate code movement and code refactoring.

Thanks,

Bart.





Re: [PATCH v3 2/7] scsi_io_completion rename variables

2018-03-27 Thread Bart Van Assche
On Tue, 2018-03-27 at 15:05 -0400, Douglas Gilbert wrote:
> Change some variable names and associated comments for clarity. Correct
> some misleading comments.
> 
> Signed-off-by: Douglas Gilbert 
> Reviewed-by: Johannes Thumshirn 

Reviewed-by: Bart Van Assche 





HI

2018-03-27 Thread Lucy Boston
-- 
Greeting, once again is me Lucy Boston this is twice am contacting you
please is very urgent respond to me for more details through my.
Email:

dr.lucybos...@gmail.com


[PATCH v3 5/7] scsi_io_completion_reprep helper added

2018-03-27 Thread Douglas Gilbert
Since the action "reprep" is called from two places, rather than repeat
the code, make a new scsi_io_completion helper with "reprep" as its
suffix.

Signed-off-by: Douglas Gilbert 
---
 drivers/scsi/scsi_lib.c | 41 ++---
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b1cefd8e527f..e7e5eb5b9e2e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -751,6 +751,20 @@ static blk_status_t __scsi_error_from_host_byte(struct 
scsi_cmnd *cmd,
}
 }
 
+/* Helper for scsi_io_completion() when "reprep" action required. */
+static void scsi_io_completion_reprep(struct scsi_cmnd *cmd,
+ struct request_queue *q)
+{
+   /* A new command will be prepared and issued. */
+   if (q->mq_ops) {
+   scsi_mq_requeue_cmd(cmd);
+   } else {
+   /* Unprep request and put it back at head of the queue. */
+   scsi_release_buffers(cmd);
+   scsi_requeue_command(q, cmd);
+   }
+}
+
 /* Helper for scsi_io_completion() when special action required. */
 static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
 {
@@ -893,15 +907,7 @@ static void scsi_io_completion_action(struct scsi_cmnd 
*cmd, int result)
return;
/*FALLTHRU*/
case ACTION_REPREP:
-   /* Unprep the request and put it back at the head of the queue.
-* A new command will be prepared and issued.
-*/
-   if (q->mq_ops)
-   scsi_mq_requeue_cmd(cmd);
-   else {
-   scsi_release_buffers(cmd);
-   scsi_requeue_command(q, cmd);
-   }
+   scsi_io_completion_reprep(cmd, q);
break;
case ACTION_RETRY:
/* Retry the same command immediately */
@@ -1079,20 +1085,9 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
 * If there had been no error, but we have leftover bytes in the
 * requeues just queue the command up again.
 */
-   if (result == 0) {
-   /*
-* Unprep the request and put it back at the head of the
-* queue. A new command will be prepared and issued.
-* This block is the same as case ACTION_REPREP in
-* scsi_io_completion_action() above.
-*/
-   if (q->mq_ops)
-   scsi_mq_requeue_cmd(cmd);
-   else {
-   scsi_release_buffers(cmd);
-   scsi_requeue_command(q, cmd);
-   }
-   } else
+   if (result == 0)
+   scsi_io_completion_reprep(cmd, q);
+   else
scsi_io_completion_action(cmd, result);
 }
 
-- 
2.14.1



[PATCH v3 1/7] scsi_io_completion comment on end_request return

2018-03-27 Thread Douglas Gilbert
scsi_end_request() is called multiple times from scsi_io_completion()
which branches on its bool returned value. Add comment before the static
definition of scsi_end_request() about the meaning of that return.

Signed-off-by: Douglas Gilbert 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
The reader might think that if scsi_end_request() had actually ended the
request (i.e. no bytes remaining) it would return true. If so, the reader
would be misled.

 drivers/scsi/scsi_lib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 393f9db8f41b..d44ee84df091 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -662,6 +662,7 @@ static void scsi_release_bidi_buffers(struct scsi_cmnd *cmd)
cmd->request->next_rq->special = NULL;
 }
 
+/* Returns false when no more bytes to process, true if there are more */
 static bool scsi_end_request(struct request *req, blk_status_t error,
unsigned int bytes, unsigned int bidi_bytes)
 {
-- 
2.14.1



[PATCH v3 0/7] scsi_io_completion cleanup

2018-03-27 Thread Douglas Gilbert
While working of this "[PATCH v4] Make SCSI Status CONDITION MET
equivalent to GOOD" the author had trouble finding how to add another
corner case check in the scsi_io_completion() function (scsi_lib.c).
That function is executed at the completion of every SCSI command but
only 20 or so of its hundred of lines of code are executed for the vast
majority of invocations (i.e. the fastpath).

This patch refactors scsi_io_completion() by taking the bulk of its
error processing code into three static helper functions, leaving only
the 20 or so code lines that constitute the fastpath. In this process
comments were added and tweaked, plus variables renamed. The last two
patches in this set are optional: adding conditional hints along the
fastpath and converting 3 BUG() calls in scsi_io_completion() to WARNs
as requested by a reviewer.

There is no attempt in this patchset to change the logic of the original
scsi_io_completion(). Some conditional checks are saved by reducing the
number of redundant tests (e.g. on the 'result' variable being non-zero).
Also De Morgan's laws are applied to some complex conditions to simplify
them from the reader's perspective. The fact remains, this is a very
complex function.

This patch is against Martin Petersen's 4.17/scsi-queue branch.

Douglas Gilbert (7):
  scsi_io_completion comment on end_request return
  scsi_io_completion rename variables
  scsi_io_completion_nz_result function added
  scsi_io_completion_action helper added
  scsi_io_completion_reprep helper added
  scsi_io_completion hints on fastpatch
  scsi_io_completion convert BUGs to WARNs

 drivers/scsi/scsi_lib.c | 376 +++-
 1 file changed, 213 insertions(+), 163 deletions(-)

-- 
2.14.1



[PATCH v3 4/7] scsi_io_completion_action helper added

2018-03-27 Thread Douglas Gilbert
Place scsi_io_completion()'s complex error processing associated with a
local enumeration into a static helper function. That enumeration's
values start with "ACTION_" so use the suffix "_action" in the helper
function's name.

Signed-off-by: Douglas Gilbert 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/scsi_lib.c | 326 ++--
 1 file changed, 174 insertions(+), 152 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3f65bb541ca3..b1cefd8e527f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -751,6 +751,169 @@ static blk_status_t __scsi_error_from_host_byte(struct 
scsi_cmnd *cmd,
}
 }
 
+/* Helper for scsi_io_completion() when special action required. */
+static void scsi_io_completion_action(struct scsi_cmnd *cmd, int result)
+{
+   struct request_queue *q = cmd->device->request_queue;
+   struct request *req = cmd->request;
+   int level = 0;
+   enum {ACTION_FAIL, ACTION_REPREP, ACTION_RETRY,
+ ACTION_DELAYED_RETRY} action;
+   unsigned long wait_for = (cmd->allowed + 1) * req->timeout;
+   struct scsi_sense_hdr sshdr;
+   bool sense_valid = false;
+   bool sense_current = true;  /* false implies "deferred sense" */
+   blk_status_t blk_stat;
+
+   if (scsi_command_normalize_sense(cmd, )) {
+   sense_valid = true;
+   sense_current = !scsi_sense_is_deferred();
+   }
+
+   blk_stat = __scsi_error_from_host_byte(cmd, result);
+
+   if (host_byte(result) == DID_RESET) {
+   /* Third party bus reset or reset for error recovery
+* reasons.  Just retry the command and see what
+* happens.
+*/
+   action = ACTION_RETRY;
+   } else if (sense_valid && sense_current) {
+   switch (sshdr.sense_key) {
+   case UNIT_ATTENTION:
+   if (cmd->device->removable) {
+   /* Detected disc change.  Set a bit
+* and quietly refuse further access.
+*/
+   cmd->device->changed = 1;
+   action = ACTION_FAIL;
+   } else {
+   /* Must have been a power glitch, or a
+* bus reset.  Could not have been a
+* media change, so we just retry the
+* command and see what happens.
+*/
+   action = ACTION_RETRY;
+   }
+   break;
+   case ILLEGAL_REQUEST:
+   /* If we had an ILLEGAL REQUEST returned, then
+* we may have performed an unsupported
+* command.  The only thing this should be
+* would be a ten byte read where only a six
+* byte read was supported.  Also, on a system
+* where READ CAPACITY failed, we may have
+* read past the end of the disk.
+*/
+   if ((cmd->device->use_10_for_rw &&
+   sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
+   (cmd->cmnd[0] == READ_10 ||
+cmd->cmnd[0] == WRITE_10)) {
+   /* This will issue a new 6-byte command. */
+   cmd->device->use_10_for_rw = 0;
+   action = ACTION_REPREP;
+   } else if (sshdr.asc == 0x10) /* DIX */ {
+   action = ACTION_FAIL;
+   blk_stat = BLK_STS_PROTECTION;
+   /* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
+   } else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+   action = ACTION_FAIL;
+   blk_stat = BLK_STS_TARGET;
+   } else
+   action = ACTION_FAIL;
+   break;
+   case ABORTED_COMMAND:
+   action = ACTION_FAIL;
+   if (sshdr.asc == 0x10) /* DIF */
+   blk_stat = BLK_STS_PROTECTION;
+   break;
+   case NOT_READY:
+   /* If the device is in the process of becoming
+* ready, or has a temporary blockage, retry.
+*/
+   if (sshdr.asc == 0x04) {
+   switch (sshdr.ascq) {
+   case 0x01: /* becoming ready */
+   case 0x04: /* format in progress */
+  

[PATCH v3 2/7] scsi_io_completion rename variables

2018-03-27 Thread Douglas Gilbert
Change some variable names and associated comments for clarity. Correct
some misleading comments.

Signed-off-by: Douglas Gilbert 
Reviewed-by: Johannes Thumshirn 
---
 drivers/scsi/scsi_lib.c | 57 ++---
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d44ee84df091..598836804745 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -777,18 +777,19 @@ static blk_status_t __scsi_error_from_host_byte(struct 
scsi_cmnd *cmd,
  *be put back on the queue and retried using the same
  *command as before, possibly after a delay.
  *
- * c) We can call scsi_end_request() with -EIO to fail
- *the remainder of the request.
+ * c) We can call scsi_end_request() with blk_stat other than
+ *BLK_STS_OK, to fail the remainder of the request.
  */
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
int result = cmd->result;
struct request_queue *q = cmd->device->request_queue;
struct request *req = cmd->request;
-   blk_status_t error = BLK_STS_OK;
+   blk_status_t blk_stat = BLK_STS_OK;
struct scsi_sense_hdr sshdr;
bool sense_valid = false;
-   int sense_deferred = 0, level = 0;
+   bool sense_current = true;  /* false implies "deferred sense" */
+   int level = 0;
enum {ACTION_FAIL, ACTION_REPREP, ACTION_RETRY,
  ACTION_DELAYED_RETRY} action;
unsigned long wait_for = (cmd->allowed + 1) * req->timeout;
@@ -796,7 +797,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int 
good_bytes)
if (result) {
sense_valid = scsi_command_normalize_sense(cmd, );
if (sense_valid)
-   sense_deferred = scsi_sense_is_deferred();
+   sense_current = !scsi_sense_is_deferred();
}
 
if (blk_rq_is_passthrough(req)) {
@@ -809,8 +810,9 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int 
good_bytes)
min(8 + cmd->sense_buffer[7],
SCSI_SENSE_BUFFERSIZE);
}
-   if (!sense_deferred)
-   error = __scsi_error_from_host_byte(cmd, 
result);
+   if (sense_current)
+   blk_stat = __scsi_error_from_host_byte(cmd,
+  result);
}
/*
 * __scsi_error_from_host_byte may have reset the host_byte
@@ -829,13 +831,13 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
BUG();
return;
}
-   } else if (blk_rq_bytes(req) == 0 && result && !sense_deferred) {
+   } else if (blk_rq_bytes(req) == 0 && result && sense_current) {
/*
 * Flush commands do not transfers any data, and thus cannot use
 * good_bytes != blk_rq_bytes(req) as the signal for an error.
-* This sets the error explicitly for the problem case.
+* This sets blk_stat explicitly for the problem case.
 */
-   error = __scsi_error_from_host_byte(cmd, result);
+   blk_stat = __scsi_error_from_host_byte(cmd, result);
}
 
/* no bidi support for !blk_rq_is_passthrough yet */
@@ -865,8 +867,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int 
good_bytes)
else if (!(req->rq_flags & RQF_QUIET))
scsi_print_sense(cmd);
result = 0;
-   /* for passthrough error may be set */
-   error = BLK_STS_OK;
+   /* for passthrough, blk_stat may be set */
+   blk_stat = BLK_STS_OK;
}
/*
 * Another corner case: the SCSI status byte is non-zero but 'good'.
@@ -877,23 +879,24 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
 */
if (status_byte(result) && scsi_status_is_good(result)) {
result = 0;
-   error = BLK_STS_OK;
+   blk_stat = BLK_STS_OK;
}
 
/*
-* special case: failed zero length commands always need to
-* drop down into the retry code. Otherwise, if we finished
-* all bytes in the request we are done now.
+* Next deal with any sectors which we were able to correctly
+* handle. Failed, zero length commands always need to drop down
+* to retry code. Fast path should return in this block.
 */
-   if (!(blk_rq_bytes(req) == 0 && error) &&
-   !scsi_end_request(req, error, good_bytes, 0))
-  

[PATCH v3 7/7] scsi_io_completion convert BUGs to WARNs

2018-03-27 Thread Douglas Gilbert
The scsi_io_completion function contains three BUG() and BUG_ON() calls.
Replace them with WARN variants.

Signed-off-by: Douglas Gilbert 
---
 drivers/scsi/scsi_lib.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 391bbb0b19b8..6fa24cff43a6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1046,13 +1046,21 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
scsi_req(req->next_rq)->resid_len = scsi_in(cmd)->resid;
if (scsi_end_request(req, BLK_STS_OK, blk_rq_bytes(req),
blk_rq_bytes(req->next_rq)))
-   BUG();
+   WARN_ONCE(true,
+ "Bidi command with remaining bytes");
return;
}
}
 
/* no bidi support yet, other than in pass-through */
-   BUG_ON(blk_bidi_rq(req));
+   if (unlikely(blk_bidi_rq(req))) {
+   WARN_ONCE(true, "Only support bidi command in passthrough");
+   scmd_printk(KERN_ERR, cmd, "Killing bidi command\n");
+   if (scsi_end_request(req, BLK_STS_IOERR, blk_rq_bytes(req),
+blk_rq_bytes(req->next_rq)))
+   WARN_ONCE(true, "Bidi command with remaining bytes");
+   return;
+   }
 
/*
 * Next deal with any sectors which we were able to correctly
@@ -1075,7 +1083,8 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
/* Kill remainder if no retries */
if (unlikely(blk_stat && scsi_noretry_cmd(cmd))) {
if (scsi_end_request(req, blk_stat, blk_rq_bytes(req), 0))
-   BUG();
+   WARN_ONCE(true,
+   "Bytes remaining after failed, no-retry command");
return;
}
 
-- 
2.14.1



[PATCH v3 6/7] scsi_io_completion hints on fastpatch

2018-03-27 Thread Douglas Gilbert
Add likely() and unlikely() hints to conditionals on or near the fastpath.

Signed-off-by: Douglas Gilbert 
Reviewed-by: Johannes Thumshirn 
---
A reviewer wanted any performance improvement (or otherwise) quantified.
The improvement was so small, that ftrace ignored it. Inline timing code
suggests the improvement from this whole patchset is around 7 nanoseconds
per invocation (tested on a Lenovo X270 (i5-7200U)). Not exactly huge.
Another win might be the smaller size of scsi_io_completion() after the
refactoring; this should allow more other code to fit in the instruction
cache.

 drivers/scsi/scsi_lib.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e7e5eb5b9e2e..391bbb0b19b8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1028,17 +1028,17 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
struct request *req = cmd->request;
blk_status_t blk_stat = BLK_STS_OK;
 
-   if (result) /* does not necessarily mean there is an error */
+   if (unlikely(result))   /* a nz result may or may not be an error */
result = scsi_io_completion_nz_result(cmd, result, _stat);
 
-   if (blk_rq_is_passthrough(req)) {
+   if (unlikely(blk_rq_is_passthrough(req))) {
/*
 * __scsi_error_from_host_byte may have reset the host_byte
 */
scsi_req(req)->result = cmd->result;
scsi_req(req)->resid_len = scsi_get_resid(cmd);
 
-   if (scsi_bidi_cmnd(cmd)) {
+   if (unlikely(scsi_bidi_cmnd(cmd))) {
/*
 * Bidi commands Must be complete as a whole,
 * both sides at once.
@@ -1051,7 +1051,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
}
}
 
-   /* no bidi support for !blk_rq_is_passthrough yet */
+   /* no bidi support yet, other than in pass-through */
BUG_ON(blk_bidi_rq(req));
 
/*
@@ -1067,15 +1067,13 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
 * handle. Failed, zero length commands always need to drop down
 * to retry code. Fast path should return in this block.
 */
-   if (blk_rq_bytes(req) > 0 || blk_stat == BLK_STS_OK) {
-   if (!scsi_end_request(req, blk_stat, good_bytes, 0))
-   return; /* no bytes remaining */
+   if (likely(blk_rq_bytes(req) > 0 || blk_stat == BLK_STS_OK)) {
+   if (likely(!scsi_end_request(req, blk_stat, good_bytes, 0)))
+   return; /* no bytes remaining */
}
 
-   /*
-* Kill remainder if no retrys.
-*/
-   if (blk_stat && scsi_noretry_cmd(cmd)) {
+   /* Kill remainder if no retries */
+   if (unlikely(blk_stat && scsi_noretry_cmd(cmd))) {
if (scsi_end_request(req, blk_stat, blk_rq_bytes(req), 0))
BUG();
return;
@@ -1085,7 +1083,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
 * If there had been no error, but we have leftover bytes in the
 * requeues just queue the command up again.
 */
-   if (result == 0)
+   if (likely(result == 0))
scsi_io_completion_reprep(cmd, q);
else
scsi_io_completion_action(cmd, result);
-- 
2.14.1



[PATCH v3 3/7] scsi_io_completion_nz_result function added

2018-03-27 Thread Douglas Gilbert
Break out several intertwined paths when cmd->result is non zero and
place them in the scsi_io_completion_nz_result helper function. The
logic is not changed.

Signed-off-by: Douglas Gilbert 
---
A reviewer requested the original helper function's two return values
be reduced to one: the blk_stat variable. This required a hack to
differentiate the default setting of blk_stat (BLK_STS_OK) from the case
when the helper assigns BLK_STS_OK as the return value. The hack is to
return the otherwise unused BLK_STS_NOTSUPP value as an indication that
the helper didn't change anything. That hack was judged by another
reviewer to be worse that the "two return values" ugliness it was
trying to address. So back to the original "two return values" solution.
Returning a structure containing result and blk_stat is another
possibility but returning structures is frowned upon in some circles.

 drivers/scsi/scsi_lib.c | 126 
 1 file changed, 74 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 598836804745..3f65bb541ca3 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -751,6 +751,78 @@ static blk_status_t __scsi_error_from_host_byte(struct 
scsi_cmnd *cmd,
}
 }
 
+/*
+ * Helper for scsi_io_completion() when cmd->result is non-zero. Returns a
+ * new result that may suppress further error checking. Also modifies
+ * *blk_statp in some cases.
+ */
+static int scsi_io_completion_nz_result(struct scsi_cmnd *cmd, int result,
+   blk_status_t *blk_statp)
+{
+   bool sense_valid;
+   bool about_current;
+   struct request *req = cmd->request;
+   struct scsi_sense_hdr sshdr;
+
+   sense_valid = scsi_command_normalize_sense(cmd, );
+   about_current = sense_valid ? !scsi_sense_is_deferred() : true;
+
+   if (blk_rq_is_passthrough(req)) {
+   if (sense_valid) {
+   /*
+* SG_IO wants current and deferred errors
+*/
+   scsi_req(req)->sense_len =
+   min(8 + cmd->sense_buffer[7],
+   SCSI_SENSE_BUFFERSIZE);
+   }
+   if (about_current)
+   *blk_statp = __scsi_error_from_host_byte(cmd, result);
+   } else if (blk_rq_bytes(req) == 0 && about_current) {
+   /*
+* Flush commands do not transfers any data, and thus cannot use
+* good_bytes != blk_rq_bytes(req) as the signal for an error.
+* This sets *blk_statp explicitly for the problem case.
+*/
+   *blk_statp = __scsi_error_from_host_byte(cmd, result);
+   }
+   /*
+* Recovered errors need reporting, but they're always treated as
+* success, so fiddle the result code here.  For passthrough requests
+* we already took a copy of the original into sreq->result which
+* is what gets returned to the user
+*/
+   if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) {
+   bool do_print = true;
+   /*
+* if ATA PASS-THROUGH INFORMATION AVAILABLE [0x0, 0x1d]
+* skip print since caller wants ATA registers. Only occurs
+* on SCSI ATA PASS_THROUGH commands when CK_COND=1
+*/
+   if ((sshdr.asc == 0x0) && (sshdr.ascq == 0x1d))
+   do_print = false;
+   else if (req->rq_flags & RQF_QUIET)
+   do_print = false;
+   if (do_print)
+   scsi_print_sense(cmd);
+   /* for passthrough, *blk_statp may be set, so clear */
+   *blk_statp = BLK_STS_OK;
+   result = 0;
+   }
+   /*
+* Another corner case: the SCSI status byte is non-zero but 'good'.
+* Example: PRE-FETCH command returns SAM_STAT_CONDITION_MET when
+* it is able to fit nominated LBs in its cache (and SAM_STAT_GOOD
+* if it can't fit). Treat SAM_STAT_CONDITION_MET and the related
+* intermediate statuses (both obsolete in SAM-4) as good.
+*/
+   if (status_byte(result) && scsi_status_is_good(result)) {
+   *blk_statp = BLK_STS_OK;
+   result = 0;
+   }
+   return result;
+}
+
 /*
  * Function:scsi_io_completion()
  *
@@ -794,26 +866,14 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
  ACTION_DELAYED_RETRY} action;
unsigned long wait_for = (cmd->allowed + 1) * req->timeout;
 
-   if (result) {
+   if (result) {   /* does not necessarily mean there is an error */
sense_valid = scsi_command_normalize_sense(cmd, );
if (sense_valid)
sense_current = !scsi_sense_is_deferred();
+ 

Re: [RFC PATCH] mpt3sas: mpt3sas_scsih_enclosure_find_by_handle can be static

2018-03-27 Thread Bart Van Assche
On Wed, 2018-03-21 at 05:06 +0800, kbuild test robot wrote:
> Fixes: 793a6223beef ("mpt3sas: Cache enclosure pages during enclosure add.")
> Signed-off-by: Fengguang Wu 
> ---
>  mpt3sas_scsih.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
> b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index c93c5c5..67a43957 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -1370,7 +1370,7 @@ mpt3sas_scsih_expander_find_by_handle(struct 
> MPT3SAS_ADAPTER *ioc, u16 handle)
>   * This searches for enclosure device based on handle, then returns the
>   * enclosure object.
>   */
> -struct _enclosure_node *
> +static struct _enclosure_node *
>  mpt3sas_scsih_enclosure_find_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 
> handle)
>  {
>   struct _enclosure_node *enclosure_dev, *r;

Hello Chaitra,

Are you aware that if the 0-day test infrastructure suggests an improvement
for a patch that the patch that that improvement applies to gets ignored
unless either the patch is reposted with the improvement applied or that it
is explained why the suggested improvement is inappropriate?

Thanks,

Bart.




Re: [PATCH net-next] qed*: Utilize FW 8.33.11.0

2018-03-27 Thread Leon Romanovsky
On Tue, Mar 27, 2018 at 05:41:51PM +, Kalderon, Michal wrote:
> > From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> > Sent: Tuesday, March 27, 2018 12:18 AM
> > > diff --git a/drivers/infiniband/hw/qedr/main.c
> > > b/drivers/infiniband/hw/qedr/main.c
> > > index db4bf97..7dbbe6d 100644
> > > +++ b/drivers/infiniband/hw/qedr/main.c
> > > @@ -51,6 +51,7 @@
> > >  MODULE_DESCRIPTION("QLogic 40G/100G ROCE Driver");
> > > MODULE_AUTHOR("QLogic Corporation");  MODULE_LICENSE("Dual
> > BSD/GPL");
> > > +MODULE_VERSION(QEDR_MODULE_VERSION);
> > >
> > >  #define QEDR_WQ_MULTIPLIER_DFT   (3)
> > >
> > > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > > b/drivers/infiniband/hw/qedr/qedr.h
> > > index 86d4511..ab0d411 100644
> > > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > > @@ -43,6 +43,8 @@
> > >  #include "qedr_hsi_rdma.h"
> > >
> > >  #define QEDR_NODE_DESC "QLogic 579xx RoCE HCA"
> > > +#define QEDR_MODULE_VERSION "8.33.11.20"
> > > +
> >
> > I thought we had a general prohibition against versions like this in 
> > mainline
> > drivers? And what does this hunk have to do with supporting new firmware?
> >
> > Jason
> I'm assuming you refer only to rdma in regards to version prohibition right ? 
> as looking at all other vendors
> (including Mellanox) all have module versions under net/ why is rdma 
> different in this way ?
> I now searched back mails on the topic and found an email from Leon where he 
> stated:
> " I am strongly against module versions. You should rely on official kernel 
> version."
> But it's not always the inbox driver that is installed or probed, the kernel 
> version is not enough.
> Given different distros, vanilla kernels, out of box drivers, etc... it is 
> essential for  us that based on
> logs And modinfo we can determine the qed* drivers that are running.

We actually stopped to maintain driver versions, just ensure that inbox,
upstream and MLNX_OFED have different names.

The discussion thread is here
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004426.html
https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2017-June/004441.html

>
> We have received complaints that our qedr module doesn't have a version 
> whereas all of our other
> components do (qed, qede, qedi, qedf). We decided to add the qedr version 
> with the next version
> update to align with the rest of the components.
>
> We can move the driver version bump into a different commit for all 
> components, just made
> sense to Add it to this one as it is the root of the version update.
> Let me know if you think it is essential and I'll make the change for v2
>
> Thanks,
> Michal


signature.asc
Description: PGP signature


RE: [PATCH net-next] qed*: Utilize FW 8.33.11.0

2018-03-27 Thread Kalderon, Michal
> From: Jason Gunthorpe [mailto:j...@ziepe.ca]
> Sent: Tuesday, March 27, 2018 12:18 AM
> > diff --git a/drivers/infiniband/hw/qedr/main.c
> > b/drivers/infiniband/hw/qedr/main.c
> > index db4bf97..7dbbe6d 100644
> > +++ b/drivers/infiniband/hw/qedr/main.c
> > @@ -51,6 +51,7 @@
> >  MODULE_DESCRIPTION("QLogic 40G/100G ROCE Driver");
> > MODULE_AUTHOR("QLogic Corporation");  MODULE_LICENSE("Dual
> BSD/GPL");
> > +MODULE_VERSION(QEDR_MODULE_VERSION);
> >
> >  #define QEDR_WQ_MULTIPLIER_DFT (3)
> >
> > diff --git a/drivers/infiniband/hw/qedr/qedr.h
> > b/drivers/infiniband/hw/qedr/qedr.h
> > index 86d4511..ab0d411 100644
> > +++ b/drivers/infiniband/hw/qedr/qedr.h
> > @@ -43,6 +43,8 @@
> >  #include "qedr_hsi_rdma.h"
> >
> >  #define QEDR_NODE_DESC "QLogic 579xx RoCE HCA"
> > +#define QEDR_MODULE_VERSION "8.33.11.20"
> > +
> 
> I thought we had a general prohibition against versions like this in mainline
> drivers? And what does this hunk have to do with supporting new firmware?
> 
> Jason
I'm assuming you refer only to rdma in regards to version prohibition right ? 
as looking at all other vendors
(including Mellanox) all have module versions under net/ why is rdma different 
in this way ?
I now searched back mails on the topic and found an email from Leon where he 
stated: 
" I am strongly against module versions. You should rely on official kernel 
version."
But it's not always the inbox driver that is installed or probed, the kernel 
version is not enough. 
Given different distros, vanilla kernels, out of box drivers, etc... it is 
essential for  us that based on
logs And modinfo we can determine the qed* drivers that are running. 

We have received complaints that our qedr module doesn't have a version whereas 
all of our other
components do (qed, qede, qedi, qedf). We decided to add the qedr version with 
the next version
update to align with the rest of the components. 

We can move the driver version bump into a different commit for all components, 
just made
sense to Add it to this one as it is the root of the version update. 
Let me know if you think it is essential and I'll make the change for v2

Thanks,
Michal


[PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-27 Thread Keith Busch
The PCI interrupt vectors intended to be associated with a queue may
not start at 0; a driver may allocate pre_vectors for special use. This
patch adds an offset parameter so blk-mq may find the intended affinity
mask and updates all drivers using this API accordingly.

Cc: Don Brace 
Cc: 
Cc: 
Signed-off-by: Keith Busch 
---
v1 -> v2:

  Update blk-mq API directly instead of chaining a default parameter to
  a new API, and update all drivers accordingly.

 block/blk-mq-pci.c| 6 --
 drivers/nvme/host/pci.c   | 2 +-
 drivers/scsi/qla2xxx/qla_os.c | 2 +-
 drivers/scsi/smartpqi/smartpqi_init.c | 2 +-
 include/linux/blk-mq-pci.h| 3 ++-
 5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
index 76944e3271bf..e233996bb76f 100644
--- a/block/blk-mq-pci.c
+++ b/block/blk-mq-pci.c
@@ -21,6 +21,7 @@
  * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
  * @set:   tagset to provide the mapping for
  * @pdev:  PCI device associated with @set.
+ * @offset:Offset to use for the pci irq vector
  *
  * This function assumes the PCI device @pdev has at least as many available
  * interrupt vectors as @set has queues.  It will then query the vector
@@ -28,13 +29,14 @@
  * that maps a queue to the CPUs that have irq affinity for the corresponding
  * vector.
  */
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+   int offset)
 {
const struct cpumask *mask;
unsigned int queue, cpu;
 
for (queue = 0; queue < set->nr_hw_queues; queue++) {
-   mask = pci_irq_get_affinity(pdev, queue);
+   mask = pci_irq_get_affinity(pdev, queue + offset);
if (!mask)
goto fallback;
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cef5ce851a92..e3b9efca0571 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -414,7 +414,7 @@ static int nvme_pci_map_queues(struct blk_mq_tag_set *set)
 {
struct nvme_dev *dev = set->driver_data;
 
-   return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev));
+   return blk_mq_pci_map_queues(set, to_pci_dev(dev->dev), 0);
 }
 
 /**
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 12ee6e02d146..2c705f3dd265 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -6805,7 +6805,7 @@ static int qla2xxx_map_queues(struct Scsi_Host *shost)
if (USER_CTRL_IRQ(vha->hw))
rc = blk_mq_map_queues(>tag_set);
else
-   rc = blk_mq_pci_map_queues(>tag_set, vha->hw->pdev);
+   rc = blk_mq_pci_map_queues(>tag_set, vha->hw->pdev, 0);
return rc;
 }
 
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c 
b/drivers/scsi/smartpqi/smartpqi_init.c
index b2880c7709e6..10c94011c8a8 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -5348,7 +5348,7 @@ static int pqi_map_queues(struct Scsi_Host *shost)
 {
struct pqi_ctrl_info *ctrl_info = shost_to_hba(shost);
 
-   return blk_mq_pci_map_queues(>tag_set, ctrl_info->pci_dev);
+   return blk_mq_pci_map_queues(>tag_set, ctrl_info->pci_dev, 0);
 }
 
 static int pqi_getpciinfo_ioctl(struct pqi_ctrl_info *ctrl_info,
diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h
index 6338551e0fb9..9f4c17f0d2d8 100644
--- a/include/linux/blk-mq-pci.h
+++ b/include/linux/blk-mq-pci.h
@@ -5,6 +5,7 @@
 struct blk_mq_tag_set;
 struct pci_dev;
 
-int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev);
+int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
+ int offset);
 
 #endif /* _LINUX_BLK_MQ_PCI_H */
-- 
2.14.3



Re: [PATCH v2 4/6] scsi_io_completion_action helper added

2018-03-27 Thread Bart Van Assche
On Sun, 2018-03-18 at 21:59 +0100, Douglas Gilbert wrote:
> Place scsi_io_completion()'s complex error processing associated with a
> local enumeration into a static helper function. That enumeration's
> values start with "ACTION_" so use the suffix "_action" in the helper
> function's name.

Please split this patch into two patches: one that introduces the new function
scsi_io_completion_action() without changing the logic of the code and a second
patch that introduces the "sense_valid_and_current" variable. I think that that
will make the changes easier to read and to verify.

Thanks,

Bart.







Re: [PATCH v2 4/6] scsi_io_completion_action helper added

2018-03-27 Thread Bart Van Assche
On Tue, 2018-03-27 at 09:36 -0400, Douglas Gilbert wrote:
> On 2018-03-26 08:13 PM, Bart Van Assche wrote:
> > On Sun, 2018-03-18 at 21:59 +0100, Douglas Gilbert wrote:
> > > + /* sense not about current command is termed: deferred */
> > 
> > Do we really need comments that explain the SCSI specs? If such a comment is
> > added I think it should be added above the definition of 
> > scsi_sense_is_deferred()
> > together with a reference to the "Sense data" section in SPC.
> > 
> > > + if (result == 0) {
> > > + /*
> > > +  * Unprep the request and put it back at the head of the
> > > +  * queue. A new command will be prepared and issued.
> > > +  * This block is the same as case ACTION_REPREP in
> > > +  * scsi_io_completion_action() above.
> > >*/
> > > - if (q->mq_ops) {
> > > + if (q->mq_ops)
> > >   scsi_mq_requeue_cmd(cmd);
> > > - } else {
> > > + else {
> > >   scsi_release_buffers(cmd);
> > >   scsi_requeue_command(q, cmd);
> > >   }
> > 
> > Have these changes been verified with checkpatch? Checkpatch should have 
> > reported
> > the following about the above chunk of code: Unbalanced braces around else 
> > statement.
> 
> Yes they were, did you check them? If so, with what command line options?
> Since with no options /scripts/checkpatch.pl returns
> clean for all patches in this set.

If checkpatch did not complain about this patch then I think that that
indicates a bug in checkpatch. The following excerpt from the kernel v4.16-rc7
checkpatch source code shows that checkpatch should complain about the above
changes:

# check for single line unbalanced braces
if ($sline =~ /^.\s*\}\s*else\s*$/ ||
$sline =~ /^.\s*else\s*\{\s*$/) {
CHK("BRACES", "Unbalanced braces around else 
statement\n" . $herecurr);
}

Anyway, I think the output of the following commands shows that balancing braces
is the preferred style in the Linux kernel:

$ git grep "$(printf "\t")else {" | wc -l
4971
$ git grep '} else {' | wc -l
61132

Thanks,

Bart.






Re: [PATCH] scsi: Replace sdev_printk with printk_deferred to avoid

2018-03-27 Thread Bart Van Assche
On Thu, 2018-03-08 at 16:50 +0800, Wen Yang wrote:
> When scsi disks went wrong frequently, and with serial console 
> attached, tasks may be blocked in the following flow for more than 10s: [ ... 
> ]

From https://bugzilla.kernel.org/show_bug.cgi?id=199003: "Hm,
printk_deferred is a bit dangerous; it moves console_unlock() to
IRQ. [ ... ] So I'd say that those two approaches

printk_deferred + touch_nmi_watchdog

combined can do quite some harm. One thing for sure - they don't really fix
any problems."

Does that mean that this patch should be dropped?

Thanks,

Bart.




[GIT PULL] SCSI fixes for 4.16-rc7

2018-03-27 Thread James Bottomley
Two driver fixes (ibmvfc, iscsi_tcp) and a USB fix for devices that
give the wrong return to Read Capacity and cause a huge log spew.  The
remaining 5 patches all try to fix commit 84676c1f21e8ff5
"genirq/affinity: assign vectors to all possible CPUs") which broke the
non-mq I/O path.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Brian King (1):
  scsi: ibmvfc: Avoid unnecessary port relogin

Jianchao Wang (1):
  scsi: iscsi_tcp: set BDI_CAP_STABLE_WRITES when data digest enabled

Martin K. Petersen (1):
  scsi: sd: Remember that READ CAPACITY(16) succeeded

Ming Lei (5):
  scsi: virtio_scsi: unify scsi_host_template
  scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
  scsi: core: introduce force_blk_mq
  scsi: megaraid_sas: fix selection of reply queue
  scsi: hpsa: fix selection of reply queue

And the diffstat:

 drivers/scsi/hosts.c|   1 +
 drivers/scsi/hpsa.c |  73 
 drivers/scsi/hpsa.h |   1 +
 drivers/scsi/ibmvscsi/ibmvfc.c  |   6 +-
 drivers/scsi/iscsi_tcp.c|   8 ++
 drivers/scsi/megaraid/megaraid_sas.h|   1 +
 drivers/scsi/megaraid/megaraid_sas_base.c   |  39 -
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  12 +--
 drivers/scsi/sd.c   |   2 +
 drivers/scsi/virtio_scsi.c  | 129 
 include/scsi/scsi_host.h|   3 +
 11 files changed, 128 insertions(+), 147 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index dd9464920456..ef22b275d050 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -474,6 +474,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
shost->dma_boundary = 0x;
 
shost->use_blk_mq = scsi_use_blk_mq;
+   shost->use_blk_mq = scsi_use_blk_mq || shost->hostt->force_blk_mq;
 
device_initialize(>shost_gendev);
dev_set_name(>shost_gendev, "host%d", shost->host_no);
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 87b260e403ec..31423b6dc26d 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, 
struct CommandList *c,
c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
if (unlikely(!h->msix_vectors))
return;
-   if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-   c->Header.ReplyQueue =
-   raw_smp_processor_id() % h->nreply_queues;
-   else
-   c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+   c->Header.ReplyQueue = reply_queue;
}
 }
 
@@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct 
ctlr_info *h,
 * Tell the controller to post the reply to the queue for this
 * processor.  This seems to give the best I/O throughput.
 */
-   if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-   cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-   else
-   cp->ReplyQueue = reply_queue % h->nreply_queues;
+   cp->ReplyQueue = reply_queue;
/*
 * Set the bits in the address sent down to include:
 *  - performant mode bit (bit 0)
@@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct 
ctlr_info *h,
/* Tell the controller to post the reply to the queue for this
 * processor.  This seems to give the best I/O throughput.
 */
-   if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-   cp->reply_queue = smp_processor_id() % h->nreply_queues;
-   else
-   cp->reply_queue = reply_queue % h->nreply_queues;
+   cp->reply_queue = reply_queue;
/* Set the bits in the address sent down to include:
 *  - performant mode bit not used in ioaccel mode 2
 *  - pull count (bits 0-3)
@@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct 
ctlr_info *h,
 * Tell the controller to post the reply to the queue for this
 * processor.  This seems to give the best I/O throughput.
 */
-   if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-   cp->reply_queue = smp_processor_id() % h->nreply_queues;
-   else
-   cp->reply_queue = reply_queue % h->nreply_queues;
+   cp->reply_queue = reply_queue;
/*
 * Set the bits in the address sent down to include:
 *  - performant mode bit not used in ioaccel mode 2
@@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info 
*h,
 {
dial_down_lockup_detection_during_fw_flash(h, c);

Re: [PATCH v2 4/6] scsi_io_completion_action helper added

2018-03-27 Thread Douglas Gilbert

On 2018-03-26 08:13 PM, Bart Van Assche wrote:

On Sun, 2018-03-18 at 21:59 +0100, Douglas Gilbert wrote:

+   /* sense not about current command is termed: deferred */


Do we really need comments that explain the SCSI specs? If such a comment is
added I think it should be added above the definition of 
scsi_sense_is_deferred()
together with a reference to the "Sense data" section in SPC.


+   if (result == 0) {
+   /*
+* Unprep the request and put it back at the head of the
+* queue. A new command will be prepared and issued.
+* This block is the same as case ACTION_REPREP in
+* scsi_io_completion_action() above.
 */
-   if (q->mq_ops) {
+   if (q->mq_ops)
scsi_mq_requeue_cmd(cmd);
-   } else {
+   else {
scsi_release_buffers(cmd);
scsi_requeue_command(q, cmd);
}


Have these changes been verified with checkpatch? Checkpatch should have 
reported
the following about the above chunk of code: Unbalanced braces around else 
statement.


Yes they were, did you check them? If so, with what command line options?
Since with no options /scripts/checkpatch.pl returns
clean for all patches in this set.


Additionally, have you considered to introduce a new function instead of 
duplicating
existing code?


Yes, that could be done.


Otherwise this patch looks fine to me.


Doug Gilbert


Re: [Bug 199003] console stalled, cause Hard LOCKUP.

2018-03-27 Thread Sergey Senozhatsky
I'll Cc blockdev

On (03/27/18 08:36), bugzilla-dae...@bugzilla.kernel.org wrote:
> > --- Comment #17 from sergey.senozhatsky.w...@gmail.com ---
> > On (03/26/18 13:05), bugzilla-dae...@bugzilla.kernel.org wrote:
> > > Therefore the serial console is actually pretty fast. It seems that the
> > > deadline
> > > 10ms-per-character is not in the game here.
> > 
> > As the name suggests this is dmesg - content of logbuf. We can't tell
> > anything about serial consoles speed from it.
> 
> Grrr, you are right. It would be interesting to see the output from
> the serial port as well.
> 
> Anyway, it does not change the fact that printing so many same lines is
> useless. The throttling still would make sense and probably would
> solve the problem.

You are right.

Looking at backtraces 
(https://bugzilla.kernel.org/attachment.cgi?id=274953=edit)
there *probably* was just one CPU doing all printk-s and all printouts. And
there was one CPU waiting for that printing CPU to unlock the queue spin_lock.

The printing CPU was looping in scsi_request_fn() picking up requests
and calling sdev_printk() for each of them, because the device was
offline. Given that serial console is not very fast, that we called
serial console under queue spin_lock and the number of printks called,
it was enough to lockup the CPU which was spining on queue spin_lock and
to hard lockup the system.

scsi_request_fn() does unlock the queue lock later, but not in that
!scsi_device_online(sdev) error case.

scsi_request_fn()
{
for (;;) {
int rtn;
/*
 * get next queueable request.  We do this early to make sure
 * that the request is fully prepared even if we cannot
 * accept it.
 */
req = blk_peek_request(q);
if (!req)
break;

if (unlikely(!scsi_device_online(sdev))) {
sdev_printk(KERN_ERR, sdev,
"rejecting I/O to offline device\n");
scsi_kill_request(req, q);
continue;
^ still under spinlock
}
}

I'd probably just unlock/lock queue lock, rather than ratelimit printk-s,
before `continue'. Dunno.

James, Martin, what do you think?

-ss


答复: 答复: 答复: [PATCH v8 2/5] dt-bindings: scsi: ufs: add document for hisi-ufs

2018-03-27 Thread liwei (CM)
Hi, Arnd

At present our ufs module mainly has four clocks from the outside:
hclk_ufs: main clock of ufs controller ,freq is 207.5MHz 
cfg_phy_clk:  configuration clock of MPHY, freq is 51.875MHz
ref_phy_clk:  reference clock of MPHY from PMU, freq is 19.2MHz
ref_io_clk:reference clock for the external interface to the device, freq 
is 19.2MHz

We control two clocks "ref_io_clk" and "cfg_phy_clk" in the driver because the 
other two are controlled by main clock module and pmu. 
for this patch, cfg_phy_clk corresponds to "phy_clk", ref_io_clk corresponds to 
"ref_clk".

So the clks in the patch you give appear to be unsuitable for describing this 
.And the following clks of qcom are internal clock? 
We didn't describe or pay attention to the clock inside the ufs module.

PHY to controller symbol synchronization clocks:
"rx_lane0_sync_clk" - RX Lane 0
"rx_lane1_sync_clk" - RX Lane 1
"tx_lane0_sync_clk" - TX Lane 0
"tx_lane1_sync_clk" - TX Lane 1


-邮件原件-
发件人: liwei (CM) 
发送时间: 2018年3月26日 20:02
收件人: 'Arnd Bergmann'
抄送: Rob Herring; Mark Rutland; xuwei (O); Catalin Marinas; Will Deacon; Vinayak 
Holikatti; James E.J. Bottomley; Martin K. Petersen; Kevin Hilman; Gregory 
CLEMENT; Thomas Petazzoni; Masahiro Yamada; Riku Voipio; Thierry Reding; 
Krzysztof Kozlowski; Eric Anholt; DTML; Linux Kernel Mailing List; Linux ARM; 
linux-scsi; zangleigang; Gengjianfeng; Guodong Xu; Zhangfei Gao; Fengbaopeng 
(kevin, Kirin Solution Dept); Yaniv Gardi
主题: 答复: 答复: 答复: [PATCH v8 2/5] dt-bindings: scsi: ufs: add document for hisi-ufs

Hi, Arnd

I'll ask our soc colleagues for help and give a detailed and accurate 
explanation aosp.

Thanks!


-邮件原件-
发件人: arndbergm...@gmail.com [mailto:arndbergm...@gmail.com] 代表 Arnd Bergmann
发送时间: 2018年3月26日 18:42
收件人: liwei (CM)
抄送: Rob Herring; Mark Rutland; xuwei (O); Catalin Marinas; Will Deacon; Vinayak 
Holikatti; James E.J. Bottomley; Martin K. Petersen; Kevin Hilman; Gregory 
CLEMENT; Thomas Petazzoni; Masahiro Yamada; Riku Voipio; Thierry Reding; 
Krzysztof Kozlowski; Eric Anholt; DTML; Linux Kernel Mailing List; Linux ARM; 
linux-scsi; zangleigang; Gengjianfeng; Guodong Xu; Zhangfei Gao; Fengbaopeng 
(kevin, Kirin Solution Dept); Yaniv Gardi
主题: Re: 答复: 答复: [PATCH v8 2/5] dt-bindings: scsi: ufs: add document for hisi-ufs

On Mon, Mar 26, 2018 at 12:26 PM, liwei (CM)  wrote:
> 发件人: arndbergm...@gmail.com [mailto:arndbergm...@gmail.com] 代表 Arnd 
> Bergmann
> > 主题: Re: 答复: [PATCH v8 2/5] dt-bindings: scsi: ufs: add document for 
> > hisi-ufs On Fri, Mar 23, 2018 at 3:22 AM, liwei (CM)  
> > wrote:
> >> The clock names sound generic enough, should we have both in the generic 
> >> binding?
> >>
> >> Do you mean that add a "phy_clk" to ufshcd-pltfrm 's bindings?
> >> At present, it seems that in the implementation of generic code, 
> >> apart from "ref_clk" may have special processing, other clk will 
> >> not have special processing and simply parse and enable; Referring 
> >> to ufs-qcom binding, I think "phy_clk" can be named "iface_clk", 
> >> this "iface_clk" exists in ufshcd-pltfrm bindings;If so, "ref_clk", 
> >> "iface_clk" are both in the generic binding,we will remove them here. Is 
> >> that okay?
>
> > I'm looking at the generic binding again, and it seems we never 
> > quite managed to fix some minor problems with it. See below for a possible 
> > way to clarify it.
>
> phy_clk is actually given to the phy. But as previously mentioned , we 
> do not have a separate phy to configure ; The clks in the patch you 
> give appear to be unsuitable for describing this .
> Here we can't describe phy_clk in the node "ufsphy1: ufsphy@fc597000" like 
> qcom.
> So can we put it here in our own binding like this?

I think the concept of having a phy clk is generic enough that it's better to 
have that in the common part, others will surely have the same thing, and in 
this case, qcom would be the exception that does not use one.

There are apparently a couple of things related to the phy that may or may not 
require a clk:

- ref_clk: The reference clock on the mipi bus, this is what qcom have, this 
would
  be the 19.2 MHz clock signal.
- one clock to drive the logic block for the PHY itself, if it is included 
within
  the same logical portion of an SoC as the ufshcd, but uses a separate clock.
- Looking at the Android kernel as distributed by google/qualcomm, they have
  four separate clocks described as

PHY to controller symbol synchronization clocks:
"rx_lane0_sync_clk" - RX Lane 0
"rx_lane1_sync_clk" - RX Lane 1
"tx_lane0_sync_clk" - TX Lane 0
"tx_lane1_sync_clk" - TX Lane 1

Which of the above would your phy_clk refer to?

   Arnd

[1] 
https://android.googlesource.com/kernel/msm/+/android-msm-bullhead-3.10-marshmallow-dr/Documentation/devicetree/bindings/ufs/ufshcd-pltfrm.txt?autodive=0%2F%2F%2F%2F%2F