[PATCH v2] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-27 Thread Fam Zheng
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size
may get error.

Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits")
Signed-off-by: Fam Zheng 

---

v2: Fix granularity mismatch. [Martin]
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc..a5c7e67 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
} else
rw_max = BLK_DEF_MAX_SECTORS;
+   rw_max = min_not_zero(rw_max, logical_to_sectors(sdp, dev_max));
 
/* Combine with controller limits */
q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
-- 
2.9.3



Re: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-27 Thread Martin K. Petersen
Fam Zheng  writes:

Hi Fam,

> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>   rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
>   } else
>   rw_max = BLK_DEF_MAX_SECTORS;
> + rw_max = min_not_zero(rw_max, dev_max);

rw_max is in sectors, dev_max is in logical blocks.

>  
>   /* Combine with controller limits */
>   q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 00/10] be2iscsi: driver update 11.4.0.0

2017-03-27 Thread Martin K. Petersen
Jitendra Bhivare  writes:

> This patch is generated against for-next branch.
>
> v3 changes:
>  be2iscsi: Fix closing of connection
>   - Fixed per Tomas's review comments.
>
> v2 changes:
>  +be2iscsi: Update Copyright
>
> Jitendra Bhivare (10):
>   be2iscsi: Check tag in beiscsi_mccq_compl_wait
>   be2iscsi: Fix closing of connection
>   be2iscsi: Replace spin_unlock_bh with spin_lock
>   scsi_transport_iscsi: Use flush_work in iscsi_remove_session
>   be2iscsi: Increase HDQ default queue size
>   be2iscsi: Use num_cons field in Rx CQE
>   be2iscsi: Remove free_list for ASYNC handles
>   be2iscsi: Check size before copying ASYNC handle
>   be2iscsi: Update Copyright
>   be2iscsi: Update driver version

Applied to 4.12/scsi-queue.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ufs: fix wrong/ambiguous fall through comments

2017-03-27 Thread Martin K. Petersen
kusumi.tomoh...@gmail.com writes:

> From: Tomohiro Kusumi 
>
> These aren't really falling through to anywhere meaningful.
>
> Signed-off-by: Tomohiro Kusumi 

Applied to 4.12/scsi-queue.

Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: osd_uld: remove an unneeded NULL check

2017-03-27 Thread Martin K. Petersen
Dan Carpenter  writes:

> We don't call the remove() function unless probe() succeeds so "oud"
> can't be NULL here.  Plus, if it were NULL, we dereference it on the
> next line so it would crash anyway.

Applied to 4.12/scsi-queue (by hand).

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: ufs: remove the duplicated checking for supporting clkscaling

2017-03-27 Thread Martin K. Petersen
Jaehoon Chung  writes:

Jaehoon,

> There are same conditions for checking whether supporting clkscaling
> or not.  When ufshcd is supporting clkscaling, active_reqs should be
> decreased by two.

Applied to 4.11/scsi-fixes.

Thank you!

-- 
Martin K. Petersen  Oracle Linux Engineering


Delock 89384 Sata Controller Causes Lockups Under Heavy Load

2017-03-27 Thread Matthias Peter Walther
Hello,

I'm new to this list and I signed up, because I found an instability
with the following sata controller:

Product name: Delock 89384 10 Port PCIe 2.0 x2 Low Profile retail
Identifies as: 03:00.0 SATA controller: ASMedia Technology Inc. Device
0625 (rev 01)
(PCIe to 10x sata controller card)

Problem description: The controller works and recognizes all my drives.
But under heavy load, e. g. a mdadm raid-6 resync or just a dd to a
file, it keeps causing lockups and random device link resettings on
multiple devices.

I spend the last two weeks on replacing components in this server, the
controller is definitely the problem. Everything works fine with a
Marvell 9215 controller and I tried the controller with three different
mainboards and kernel versions 3.2, 4.4 and 4.10. The controller or its
kernel driver definitely cause these lock ups. I made sure, that all
drives were properly connected. [Syslog attached at the bottom of this
mail.]

As I am an experienced linux user, but new to this, first questions:

Is this the right place to seek for help?

If not so: Where might I get help with this?

If so: Does anybody have an idea, what might causes this problem.

My abilities: I can test patches on the mainline kernel. I can't code,
as I lack any kind of knowledge about the sata standard. I have the
controller card and an empty spare device here, to run any kind of tests.

Syslog of one of these resets: If the level of stress is high enough,
they happen on all connected devices (seemingly random) from different
manufacturers (WesternDigital and Seagate) and different types of
models. So this is probably not a bug in the firmware of one of the drives.

Log:
Mar 24 09:01:43 Server1 kernel: [ 1807.338347] ata3.00: exception Emask
0x0 SAct 0x0 SErr 0x0 action
0x6 frozen
Mar 24 09:01:43 Server1 kernel: [ 1807.340701] ata3.00: failed command:
FLUSH CACHE EXT
Mar 24 09:01:43 Server1 kernel: [ 1807.343078] ata3.00: cmd
ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 24 09:01:43 Server1 kernel: [ 1807.343078]  res
40/00:00:00:00:00/00:00:00:00:00/00 Emask
 0x4 (timeout)
Mar 24 09:01:43 Server1 kernel: [ 1807.349717] ata3.00: status: { DRDY }
Mar 24 09:01:43 Server1 kernel: [ 1807.353029] ata3: hard resetting link
Mar 24 09:01:43 Server1 kernel: [ 1807.665533] ata3: SATA link up 6.0
Gbps (SStatus 133 SControl 300)
Mar 24 09:01:43 Server1 kernel: [ 1807.667000] ata3.00: configured for
UDMA/133
Mar 24 09:01:43 Server1 kernel: [ 1807.667007] ata3.00: retrying FLUSH
0xea Emask 0x4
Mar 24 09:01:43 Server1 kernel: [ 1807.667164] ata3.00: device reported
invalid CHS sector 0
Mar 24 09:01:43 Server1 kernel: [ 1807.667183] ata3: EH complete

Whenever such a lock up happens, the whole partition is not read or
writeable for at least 90 seconds and sometimes several minutes. But the
system never crashed. I tried to google the controller card, didn't find
much about it.

Any advice would be much appreciated :).

Greetings,
Matthias




Re: [PATCH 7/7] sd: use ZERO_PAGE for WRITE_SAME payloads

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote:
> We're never touching the contents of the page, so save a memory
> allocation for these cases.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/scsi/sd.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c18fe9ff1f8f..af632e350ab4 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -756,7 +756,7 @@ static int sd_setup_write_same16_cmnd(struct scsi_cmnd 
> *cmd)
>   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
>   u32 data_len = sdp->sector_size;
>  
> - rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> + rq->special_vec.bv_page = ZERO_PAGE(0);
>   if (!rq->special_vec.bv_page)
>   return BLKPREP_DEFER;
>   rq->special_vec.bv_offset = 0;
> @@ -785,7 +785,7 @@ static int sd_setup_write_same10_cmnd(struct scsi_cmnd 
> *cmd, bool unmap)
>   u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);
>   u32 data_len = sdp->sector_size;
>  
> - rq->special_vec.bv_page = alloc_page(GFP_ATOMIC | __GFP_ZERO);
> + rq->special_vec.bv_page = ZERO_PAGE(0);
>   if (!rq->special_vec.bv_page)
>   return BLKPREP_DEFER;
>   rq->special_vec.bv_offset = 0;
> @@ -1256,7 +1256,8 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt)
>  {
>   struct request *rq = SCpnt->request;
>  
> - if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
> + if ((rq->rq_flags & RQF_SPECIAL_PAYLOAD) &&
> + rq->special_vec.bv_page != ZERO_PAGE(0))
>   __free_page(rq->special_vec.bv_page);
>  
>   if (SCpnt->cmnd != scsi_req(rq)->cmd) {

Reviewed-by: Bart Van Assche 




Re: [PATCH 6/7] sd: support multi-range TRIM for ATA disks

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote:
> @@ -698,13 +698,19 @@ static void sd_config_discard(struct scsi_disk *sdkp, 
> unsigned int mode)
>   break;
>  
>   case SD_LBP_ATA_TRIM:
> - max_blocks = 65535 * (512 / sizeof(__le64));
> + max_ranges = 512 / sizeof(__le64);
> + max_range_size = USHRT_MAX;
> + max_blocks = max_ranges * max_range_size;
>   if (sdkp->device->ata_trim_zeroes_data)
>   q->limits.discard_zeroes_data = 1;
>   break;
>   }
>  
>   blk_queue_max_discard_sectors(q, max_blocks * (logical_block_size >> 
> 9));
> + if (max_ranges)
> + blk_queue_max_discard_segments(q, max_ranges);
> + if (max_range_size)
> + blk_queue_max_discard_segment_size(q, max_range_size);
>   queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
>  }

Hello Christoph,

Does blk_queue_max_discard_segment_size() expect a second argument that is a
number of bytes? Is max_range_size a number of logical blocks that each have
a size 1 << sector_shift?

> @@ -826,14 +832,21 @@ static int sd_setup_ata_trim_cmnd(struct scsi_cmnd *cmd)
>   cmd->cmnd[8] = data_len;
>  
>   buf = page_address(rq->special_vec.bv_page);
> - for (i = 0; i < (data_len >> 3); i++) {
> - u64 n = min(nr_sectors, 0xu);
> + __rq_for_each_bio(bio, rq) {
> + u64 sector = bio->bi_iter.bi_sector >> (sector_shift - 9);
> + u32 nr_sectors = bio->bi_iter.bi_size >> sector_shift;
>  
> - buf[i] = cpu_to_le64(sector | (n << 48));
> - if (nr_sectors <= 0x)
> - break;
> - sector += 0x;
> - nr_sectors -= 0x;
> + do {
> + u64 n = min(nr_sectors, 0xu);
> +
> + buf[i] = cpu_to_le64(sector | (n << 48));
> + if (nr_sectors <= 0x)
> + break;
> + sector += 0x;
> + nr_sectors -= 0x;
> + i++;
> +
> + } while (!WARN_ON_ONCE(i >= data_len >> 3));
>   }
>  
>   cmd->allowed = SD_MAX_RETRIES;

It's unfortunate that the loop-end test occurs twice (i < data_len >> 3).
Please consider using put_unaligned_le64() instead of cpu_to_le64() such
that the data type of buf can be changed from __le64* into void *. With
that change and by introducing e.g. the following:

void *end;
[ ... ]
end = (void *)buf + data_len;

the loop variable 'i' can be eliminated. If buf[i++] = ... would be
changed into *buf++ = ... then that would allow to change the two
occurrences of (i < data_len >> 3) into (buf < end).

Thanks,

Bart.

Re: [PATCH 5/7] block: add a max_discard_segment_size queue limit

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote:
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5a7da607ca04..3b3bd646f580 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -333,6 +333,7 @@ struct queue_limits {
>   unsigned short  max_segments;
>   unsigned short  max_integrity_segments;
>   unsigned short  max_discard_segments;
> + unsigned intmax_discard_segment_size;
>  
>   unsigned char   misaligned;
>   unsigned char   discard_misaligned;
> @@ -1150,6 +1151,8 @@ extern void blk_queue_max_segments(struct request_queue 
> *, unsigned short);
>  extern void blk_queue_max_discard_segments(struct request_queue *,
>   unsigned short);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
> +extern void blk_queue_max_discard_segment_size(struct request_queue *,
> + unsigned int);
>  extern void blk_queue_max_discard_sectors(struct request_queue *q,
>   unsigned int max_discard_sectors);
>  extern void blk_queue_max_write_same_sectors(struct request_queue *q,
> @@ -1415,6 +1418,11 @@ static inline unsigned int 
> queue_max_segment_size(struct request_queue *q)
>   return q->limits.max_segment_size;
>  }
>  
> +static inline unsigned int queue_max_discard_segment_size(struct 
> request_queue *q)
> +{
> + return q->limits.max_discard_segment_size;
> +}
> +
>  static inline unsigned short queue_logical_block_size(struct request_queue 
> *q)
>  {
>   int retval = 512;

Hello Christoph,

It's probably a good idea to add a comment somewhere that the unit of
max_discard_segment_size is one byte.

Additionally, should it be mentioned in the commit message that this
patch limits the maximum discard segment size to 4 GB?

Thanks,

Bart.

Re: [PATCH 2/7] sd: provide a new ata trim provisioning mode

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote:
> + case SD_LBP_ATA_TRIM:
> + max_blocks = 65535 * (512 / sizeof(__le64));
> + if (sdkp->device->ata_trim_zeroes_data)
> + q->limits.discard_zeroes_data = 1;
> + break;

Do we need a comment here that explains where the numbers 65535 and
512 come from?

> + u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
> + u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);

Please consider using logical_to_sectors() instead of open-coding this
function.

Thanks,

Bart.

Re: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors

2017-03-27 Thread Joseph Salisbury
On 03/27/2017 06:14 PM, Stephen Hemminger wrote:
> Are you sure the real problem is not the one fixed by this commit?
>
> commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f
> Author: Stephen Hemminger 
> Date:   Tue Mar 7 09:15:53 2017 -0800
>
> scsi: storvsc: Workaround for virtual DVD SCSI version
> 
> Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
> version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.
> 
> Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
> successfully with virtual DVD ROM device. What happens is that the SCSI
> scan process falls back to doing sequential probing by INQUIRY.  But the
> storvsc driver has a previous workaround that masks/blocks all errors
> reports from INQUIRY (or MODE_SENSE) commands.  This workaround causes
> the scan to then populate a full set of bogus LUN's on the target and
> then sends kernel spinning off into a death spiral doing block reads on
> the non-existent LUNs.
> 
> By setting the correct blacklist flags, the target with the DVD device
> is scanned with REPORTLUN and that works correctly.
> 
> Patch needs to go in current 4.11, it is safe but not necessary in older
> kernels.
> 
> Signed-off-by: Stephen Hemminger 
> Reviewed-by: K. Y. Srinivasan 
> Reviewed-by: Christoph Hellwig 
> Signed-off-by: Martin K. Petersen 
>
> -Original Message-
> From: Joseph Salisbury [mailto:joseph.salisb...@canonical.com] 
> Sent: Monday, March 27, 2017 1:22 PM
> To: Long Li 
> Cc: KY Srinivasan ; Martin K. Petersen 
> ; Haiyang Zhang ; Stephen 
> Hemminger ; j...@linux.vnet.ibm.com; 
> de...@linuxdriverproject.org; linux-scsi ; LKML 
> ; sta...@vger.kernel.org; Greg KH 
> 
> Subject: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] 
> scsi: storvsc: properly set residual data length on errors
>
> Hi Long Li,
>
> A kernel bug report was opened against Ubuntu [0].  After a kernel
> bisect, it was found that reverting the following commit resolved this bug:
>
> commit 40630f462824ee24bc00d692865c86c3828094e0
> Author: Long Li 
> Date:   Wed Dec 14 18:46:03 2016 -0800
>
> scsi: storvsc: properly set residual data length on errors
>
>
> The regression was introduced in mainline as of v4.11-rc1.  It was also
> cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y.
>
>
> This regression seems pretty severe since it's preventing virtual
> machines from booting.  It's affecting a couple of users so far.  I was
> hoping to get your feedback, since you are the patch author.  Do you
> think gathering any additional data will help diagnose this issue, or
> would it be best to submit a revert request?
>
>
> Thanks,
>
> Joe
>
>
> [0] http://pad.lv/1674635
>
>
Thanks for the pointer, Stephen.  I'll have this patch tested and
respond with the results.


Joe



Re: [PATCH 1/7] ѕd: split sd_setup_discard_cmnd

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-20 at 16:43 -0400, Christoph Hellwig wrote:
> + u64 sector = blk_rq_pos(rq) >> (ilog2(sdp->sector_size) - 9);
> + u32 nr_sectors = blk_rq_sectors(rq) >> (ilog2(sdp->sector_size) - 9);

Although I know this is an issue in the existing code and not something
introduced by you: please consider using logical_to_sectors() instead of
open-coding this function. Otherwise this patch looks fine to me.

Thanks,

Bart.


RE: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors

2017-03-27 Thread Stephen Hemminger
Are you sure the real problem is not the one fixed by this commit?

commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f
Author: Stephen Hemminger 
Date:   Tue Mar 7 09:15:53 2017 -0800

scsi: storvsc: Workaround for virtual DVD SCSI version

Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.

Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with virtual DVD ROM device. What happens is that the SCSI
scan process falls back to doing sequential probing by INQUIRY.  But the
storvsc driver has a previous workaround that masks/blocks all errors
reports from INQUIRY (or MODE_SENSE) commands.  This workaround causes
the scan to then populate a full set of bogus LUN's on the target and
then sends kernel spinning off into a death spiral doing block reads on
the non-existent LUNs.

By setting the correct blacklist flags, the target with the DVD device
is scanned with REPORTLUN and that works correctly.

Patch needs to go in current 4.11, it is safe but not necessary in older
kernels.

Signed-off-by: Stephen Hemminger 
Reviewed-by: K. Y. Srinivasan 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Martin K. Petersen 

-Original Message-
From: Joseph Salisbury [mailto:joseph.salisb...@canonical.com] 
Sent: Monday, March 27, 2017 1:22 PM
To: Long Li 
Cc: KY Srinivasan ; Martin K. Petersen 
; Haiyang Zhang ; Stephen 
Hemminger ; j...@linux.vnet.ibm.com; 
de...@linuxdriverproject.org; linux-scsi ; LKML 
; sta...@vger.kernel.org; Greg KH 

Subject: [REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] 
scsi: storvsc: properly set residual data length on errors

Hi Long Li,

A kernel bug report was opened against Ubuntu [0].  After a kernel
bisect, it was found that reverting the following commit resolved this bug:

commit 40630f462824ee24bc00d692865c86c3828094e0
Author: Long Li 
Date:   Wed Dec 14 18:46:03 2016 -0800

scsi: storvsc: properly set residual data length on errors


The regression was introduced in mainline as of v4.11-rc1.  It was also
cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y.


This regression seems pretty severe since it's preventing virtual
machines from booting.  It's affecting a couple of users so far.  I was
hoping to get your feedback, since you are the patch author.  Do you
think gathering any additional data will help diagnose this issue, or
would it be best to submit a revert request?


Thanks,

Joe


[0] http://pad.lv/1674635




[REGRESSION][Stable][v3.12.y][v4.4.y][v4.9.y][v4.10.y][v4.11-rc1] scsi: storvsc: properly set residual data length on errors

2017-03-27 Thread Joseph Salisbury
Hi Long Li,

A kernel bug report was opened against Ubuntu [0].  After a kernel
bisect, it was found that reverting the following commit resolved this bug:

commit 40630f462824ee24bc00d692865c86c3828094e0
Author: Long Li 
Date:   Wed Dec 14 18:46:03 2016 -0800

scsi: storvsc: properly set residual data length on errors


The regression was introduced in mainline as of v4.11-rc1.  It was also
cc'd to stable and has landed in v3.12.y, v4.4.y, v4.9.y and v4.10.y.


This regression seems pretty severe since it's preventing virtual
machines from booting.  It's affecting a couple of users so far.  I was
hoping to get your feedback, since you are the patch author.  Do you
think gathering any additional data will help diagnose this issue, or
would it be best to submit a revert request?


Thanks,

Joe


[0] http://pad.lv/1674635




Re: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-27 Thread Laurence Oberman


- Original Message -
> From: "Fam Zheng" 
> To: linux-ker...@vger.kernel.org
> Cc: "Martin K. Petersen" , f...@redhat.com, 
> linux-scsi@vger.kernel.org, "James E.J.
> Bottomley" 
> Sent: Monday, March 27, 2017 10:18:31 AM
> Subject: [PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
> 
> If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
> end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size
> may get error.
> 
> Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits")
> Signed-off-by: Fam Zheng 
> ---
>  drivers/scsi/sd.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index fcfeddc..e2e21ab 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>   rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
>   } else
>   rw_max = BLK_DEF_MAX_SECTORS;
> + rw_max = min_not_zero(rw_max, dev_max);
>  
>   /* Combine with controller limits */
>   q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
> --
> 2.9.3
> 
> 
Looks good to me and addresses the issue we faced.
FInal confirmation this change is OK to come from Martin or James.

Reviewed-by: Laurence Oberman 



Re: [Drbd-dev] RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-27 Thread Bart Van Assche
On Mon, 2017-03-27 at 10:03 -0400, Mike Snitzer wrote:
> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Some time ago I posted a patch series for the block layer that zeroes
start and tail for nonaligned discard requests if dzd has been set. See
also "[PATCH v3 0/5] Make blkdev_issue_discard() submit aligned discard
requests" (https://www.spinics.net/lists/linux-block/msg02360.html).

Bart.

Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-27 Thread Christoph Hellwig
On Mon, Mar 27, 2017 at 10:03:07AM -0400, Mike Snitzer wrote:
> By "you" I assume you're referring to Lars?

Yes.

> Lars' approach for discard,
> when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
> the way discard and zeroing got conflated certainly lends itself to
> these ugly hacks.  SO I do appreciate that for anything to leverage
> discard_zeroes_data it needs to be reliable.  Which runs counter to how
> discard was implemented (discard may get silently dropped!)  But that is
> why dm-thinp doesn't advertise dzd.  Anyway...

That's exactly what this series does - remove discard_zeroes_data and
use the new REQ_OP_WRITE_ZEROES for anything that wants zeroing offload.

> As for the blkdev_issue_zeroout() resorting to manually zeroing the
> range, if the discard fails or dzd not supported, that certainly
> requires DM thinp to implement manual zeroing of the head and tail of
> the range if partial blocks are being zeroed.  So I welcome any advances
> there.  It is probably something that is best left to Joe or myself to
> tackle.  But I'll gladly accept patches ;)

Ok, I'll happily leave this to the two of you..


[PATCH] sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable

2017-03-27 Thread Fam Zheng
If device reports a small max_xfer_blocks and a zero opt_xfer_blocks, we
end up using BLK_DEF_MAX_SECTORS, which is wrong and r/w of that size
may get error.

Fixes: ca369d51b3e ("block/sd: Fix device-imposed transfer length limits")
Signed-off-by: Fam Zheng 
---
 drivers/scsi/sd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fcfeddc..e2e21ab 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2957,6 +2957,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
} else
rw_max = BLK_DEF_MAX_SECTORS;
+   rw_max = min_not_zero(rw_max, dev_max);
 
/* Combine with controller limits */
q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
-- 
2.9.3



Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-27 Thread Mike Snitzer
On Mon, Mar 27 2017 at  5:10am -0400,
Christoph Hellwig  wrote:

> It sounds like you don't want to support traditional discard at all,
> but only WRITE ZEROES.  So in many ways this series is the right way
> forward.  It would be nice if we could do a full blown
> REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
> similar to what hardware that implements WRITE SAME of zeroes
> or WRITE ZEROES would do.  I'll see if I could include that in my
> series.

By "you" I assume you're referring to Lars?  Lars' approach for discard,
when drbd is layered on dm-thinp, feels over-engineered.  Not his fault,
the way discard and zeroing got conflated certainly lends itself to
these ugly hacks.  SO I do appreciate that for anything to leverage
discard_zeroes_data it needs to be reliable.  Which runs counter to how
discard was implemented (discard may get silently dropped!)  But that is
why dm-thinp doesn't advertise dzd.  Anyway...

As for the blkdev_issue_zeroout() resorting to manually zeroing the
range, if the discard fails or dzd not supported, that certainly
requires DM thinp to implement manual zeroing of the head and tail of
the range if partial blocks are being zeroed.  So I welcome any advances
there.  It is probably something that is best left to Joe or myself to
tackle.  But I'll gladly accept patches ;)


[PATCHv5 2/2] tcmu: Fix wrongly calculating of the base_command_size

2017-03-27 Thread lixiubo
From: Xiubo Li 

The t_data_nents and t_bidi_data_nents are the numbers of the
segments, but it couldn't be sure the block size equals to size
of the segment.

For the worst case, all the blocks are discontiguous and there
will need the same number of iovecs, that's to say: blocks == iovs.
So here just set the number of iovs to block count needed by tcmu
cmd.

Signed-off-by: Xiubo Li 
Tested-by: Ilias Tsitsimpis 
---
 drivers/target/target_core_user.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 65d475f..ede815c 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -408,6 +408,13 @@ static inline size_t tcmu_cmd_get_data_length(struct 
tcmu_cmd *tcmu_cmd)
return data_length;
 }
 
+static inline uint32_t tcmu_cmd_get_block_cnt(struct tcmu_cmd *tcmu_cmd)
+{
+   size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
+
+   return data_length / DATA_BLOCK_SIZE;
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
@@ -435,8 +442,7 @@ static inline size_t tcmu_cmd_get_data_length(struct 
tcmu_cmd *tcmu_cmd)
 * expensive to tell how many regions are freed in the bitmap
*/
base_command_size = max(offsetof(struct tcmu_cmd_entry,
-   req.iov[se_cmd->t_bidi_data_nents +
-   se_cmd->t_data_nents]),
+   req.iov[tcmu_cmd_get_block_cnt(tcmu_cmd)]),
sizeof(struct tcmu_cmd_entry));
command_size = base_command_size
+ round_up(scsi_command_size(se_cmd->t_task_cdb), 
TCMU_OP_ALIGN_SIZE);
-- 
1.8.3.1





Re: [PATCH 06/23] dm-kcopyd: switch to use REQ_OP_WRITE_ZEROES

2017-03-27 Thread Christoph Hellwig
On Thu, Mar 23, 2017 at 11:10:38AM -0400, Mike Snitzer wrote:
> Not sure why you've split out the dm-kcopyd patch, likely best to just
> fold it into the previous dm support patch.

The dm-kcopyd patch switches to using WRITE_ZEROES instead of write
same for dm as user of these interfaces.  The one before just adds
WRITE_ZEROES support to dm.



Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-27 Thread Christoph Hellwig
It sounds like you don't want to support traditional discard at all,
but only WRITE ZEROES.  So in many ways this series is the right way
forward.  It would be nice if we could do a full blown
REQ_OP_WRITE_ZEROES for dm_think that zeroes out partial blocks,
similar to what hardware that implements WRITE SAME of zeroes
or WRITE ZEROES would do.  I'll see if I could include that in my
series.


[PATCHv5 0/2] tcmu: For bugs fix only

2017-03-27 Thread lixiubo
From: Xiubo Li 

Changed for V5:
- This only includes #1 and #2. And for old #3, #4 are still reviewing.
- #1, since the issue reported by Ilias is a separate new one, and will
  create a new patch later.
- #2, address the issue pointed out by Mike, thanks.
- #1 and #2 have been tested by Ilias in BIDI case, thanks.

Changed for V4:
- re-order the #3, #4 at the head.
- merge most of the #5 to others.

Xiubo Li (2):
  tcmu: Fix possible overwrite of t_data_sg's last iov[]
  tcmu: Fix wrongly calculating of the base_command_size

 drivers/target/target_core_user.c | 44 +++
 1 file changed, 31 insertions(+), 13 deletions(-)

-- 
1.8.3.1





[PATCHv5 1/2] tcmu: Fix possible overwrite of t_data_sg's last iov[]

2017-03-27 Thread lixiubo
From: Xiubo Li 

If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.

So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.

Signed-off-by: Xiubo Li 
Signed-off-by: Bryant G. Ly 
Reviewed-by: Mike Christie 
Tested-by: Ilias Tsitsimpis 
---
 drivers/target/target_core_user.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/target/target_core_user.c 
b/drivers/target/target_core_user.c
index 57defc3..65d475f 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -394,6 +394,20 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
return true;
 }
 
+static inline size_t tcmu_cmd_get_data_length(struct tcmu_cmd *tcmu_cmd)
+{
+   struct se_cmd *se_cmd = tcmu_cmd->se_cmd;
+   size_t data_length = round_up(se_cmd->data_length, DATA_BLOCK_SIZE);
+
+   if (se_cmd->se_cmd_flags & SCF_BIDI) {
+   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
+   data_length += round_up(se_cmd->t_bidi_data_sg->length,
+   DATA_BLOCK_SIZE);
+   }
+
+   return data_length;
+}
+
 static sense_reason_t
 tcmu_queue_cmd_ring(struct tcmu_cmd *tcmu_cmd)
 {
@@ -407,7 +421,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
uint32_t cmd_head;
uint64_t cdb_off;
bool copy_to_data_area;
-   size_t data_length;
+   size_t data_length = tcmu_cmd_get_data_length(tcmu_cmd);
DECLARE_BITMAP(old_bitmap, DATA_BLOCK_BITS);
 
if (test_bit(TCMU_DEV_BIT_BROKEN, &udev->flags))
@@ -433,11 +447,6 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
 
mb = udev->mb_addr;
cmd_head = mb->cmd_head % udev->cmdr_size; /* UAM */
-   data_length = se_cmd->data_length;
-   if (se_cmd->se_cmd_flags & SCF_BIDI) {
-   BUG_ON(!(se_cmd->t_bidi_data_sg && se_cmd->t_bidi_data_nents));
-   data_length += se_cmd->t_bidi_data_sg->length;
-   }
if ((command_size > (udev->cmdr_size / 2)) ||
data_length > udev->data_size) {
pr_warn("TCMU: Request of size %zu/%zu is too big for %u/%zu "
@@ -511,11 +520,14 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, 
size_t cmd_size, size_t d
entry->req.iov_dif_cnt = 0;
 
/* Handle BIDI commands */
-   iov_cnt = 0;
-   alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
-   se_cmd->t_bidi_data_nents, &iov, &iov_cnt, false);
-   entry->req.iov_bidi_cnt = iov_cnt;
-
+   if (se_cmd->se_cmd_flags & SCF_BIDI) {
+   iov_cnt = 0;
+   iov++;
+   alloc_and_scatter_data_area(udev, se_cmd->t_bidi_data_sg,
+   se_cmd->t_bidi_data_nents, &iov, &iov_cnt,
+   false);
+   entry->req.iov_bidi_cnt = iov_cnt;
+   }
/* cmd's data_bitmap is what changed in process */
bitmap_xor(tcmu_cmd->data_bitmap, old_bitmap, udev->data_bitmap,
DATA_BLOCK_BITS);
-- 
1.8.3.1