[PATCH] blk-mq: I/O and timer unplugs are inverted in blktrace

2018-09-26 Thread Ilya Dryomov
trace_block_unplug() takes true for explicit unplugs and false for
implicit unplugs.  schedule() unplugs are implicit and should be
reported as timer unplugs.  While correct in the legacy code, this has
been inverted in blk-mq since 4.11.

Cc: sta...@vger.kernel.org
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Ilya Dryomov 
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 85a1c1a59c72..e3c39ea8e17b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1628,7 +1628,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
BUG_ON(!rq->q);
if (rq->mq_ctx != this_ctx) {
if (this_ctx) {
-   trace_block_unplug(this_q, depth, 
from_schedule);
+   trace_block_unplug(this_q, depth, 
!from_schedule);
blk_mq_sched_insert_requests(this_q, this_ctx,
_list,
from_schedule);
@@ -1648,7 +1648,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
 * on 'ctx_list'. Do those.
 */
if (this_ctx) {
-   trace_block_unplug(this_q, depth, from_schedule);
+   trace_block_unplug(this_q, depth, !from_schedule);
blk_mq_sched_insert_requests(this_q, this_ctx, _list,
from_schedule);
}
-- 
2.14.4



Re: [PATCH] [v2] rbd: avoid Wreturn-type warnings

2018-04-04 Thread Ilya Dryomov
On Wed, Apr 4, 2018 at 2:53 PM, Arnd Bergmann  wrote:
> A new set of warnings appeared in next-20180403 in some configurations
> when gcc cannot see that rbd_assert(0) leads to an unreachable code
> path:
>
> drivers/block/rbd.c: In function 'rbd_img_is_write':
> drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function '__rbd_obj_handle_request':
> drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function 'rbd_obj_handle_write':
> drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
> [-Werror=return-type]
>
> As the rbd_assert() here shows has no extra information beyond the verbose
> BUG(), we can simply use BUG() directly in its place.  This is reliably
> detected as not returning on any architecture, since it doesn't depend
> on the unlikely() comparison that confused gcc.
>
> Fixes: 3da691bf4366 ("rbd: new request handling code")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/block/rbd.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 07dc5419bd63..5f7f4d4b78a8 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1392,7 +1392,7 @@ static bool rbd_img_is_write(struct rbd_img_request 
> *img_req)
> case OBJ_OP_DISCARD:
> return true;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>
> @@ -2466,7 +2466,7 @@ static bool rbd_obj_handle_write(struct rbd_obj_request 
> *obj_req)
> }
> return false;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>
> @@ -2494,7 +2494,7 @@ static bool __rbd_obj_handle_request(struct 
> rbd_obj_request *obj_req)
> }
> return false;
> default:
> -   rbd_assert(0);
> +   BUG();
> }
>  }
>

Applied.

Thanks,

Ilya


Re: [PATCH] rbd: add missing return statements

2018-04-04 Thread Ilya Dryomov
On Wed, Apr 4, 2018 at 11:49 AM, Arnd Bergmann  wrote:
> A new set of warnings appeared in next-20180403 in some configurations
> when gcc cannot see that rbd_assert(0) leads to an unreachable code
> path:
>
> drivers/block/rbd.c: In function 'rbd_img_is_write':
> drivers/block/rbd.c:1397:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function '__rbd_obj_handle_request':
> drivers/block/rbd.c:2499:1: error: control reaches end of non-void function 
> [-Werror=return-type]
> drivers/block/rbd.c: In function 'rbd_obj_handle_write':
> drivers/block/rbd.c:2471:1: error: control reaches end of non-void function 
> [-Werror=return-type]
>
> To work around this, we can add a return statement to each of these
> cases. An alternative would be to remove the unlikely() annotation
> in rbd_assert(), or to just use BUG()/BUG_ON() directly. This adds the
> return statements, guessing what the most reasonable behavior
> would be.

Hi Arnd,

I don't like these bogus return statements.  Let's go with explicit
BUG/BUG_ON() instead.

Thanks,

Ilya


Re: [PATCH] rbd: remove VLA usage

2018-03-30 Thread Ilya Dryomov
On Fri, Mar 30, 2018 at 9:17 PM, Gustavo A. R. Silva
 wrote:
> In preparation to enabling -Wvla, remove the use of stack VLA.
>
> In this particular case, variable buf_size is replaced with a constant
> expression that can be computed at preprocessing time. This avoids two
> VLA warnings. Also, as a consequence of the mentioned change, some code
> was slightly refactored.
>
> The use of stack Variable Length Arrays needs to be avoided, as they
> can be a vector for stack exhaustion, which can be both a runtime bug
> or a security flaw. Also, in general, as code evolves it is easy to
> lose track of how big a VLA can get. Thus, we can end up having runtime
> failures that are hard to debug.
>
> Also, fixed as part of the directive to remove all VLAs from
> the kernel: https://lkml.org/lkml/2018/3/7/621
>
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  drivers/block/rbd.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 1e03b04..5133122 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -3091,20 +3091,20 @@ static int __rbd_notify_op_lock(struct rbd_device 
> *rbd_dev,
>  {
> struct ceph_osd_client *osdc = _dev->rbd_client->client->osdc;
> struct rbd_client_id cid = rbd_get_cid(rbd_dev);
> -   int buf_size = 4 + 8 + 8 + CEPH_ENCODING_START_BLK_LEN;
> -   char buf[buf_size];
> +   char buf[4 + 8 + 8 + CEPH_ENCODING_START_BLK_LEN];
> void *p = buf;
>
> dout("%s rbd_dev %p notify_op %d\n", __func__, rbd_dev, notify_op);
>
> /* encode *LockPayload NotifyMessage (op + ClientId) */
> -   ceph_start_encoding(, 2, 1, buf_size - CEPH_ENCODING_START_BLK_LEN);
> +   ceph_start_encoding(, 2, 1,
> +   sizeof(buf) - CEPH_ENCODING_START_BLK_LEN);
> ceph_encode_32(, notify_op);
> ceph_encode_64(, cid.gid);
> ceph_encode_64(, cid.handle);
>
> return ceph_osdc_notify(osdc, _dev->header_oid,
> -   _dev->header_oloc, buf, buf_size,
> +   _dev->header_oloc, buf, sizeof(buf),
> RBD_NOTIFY_TIMEOUT, preply_pages, preply_len);
>  }
>
> @@ -3610,19 +3610,18 @@ static void __rbd_acknowledge_notify(struct 
> rbd_device *rbd_dev,
>  u64 notify_id, u64 cookie, s32 *result)
>  {
> struct ceph_osd_client *osdc = _dev->rbd_client->client->osdc;
> -   int buf_size = 4 + CEPH_ENCODING_START_BLK_LEN;
> -   char buf[buf_size];
> +   char buf[4 + CEPH_ENCODING_START_BLK_LEN];
> +   int buf_size = 0;
> int ret;
>
> if (result) {
> void *p = buf;
>
> +   buf_size = sizeof(buf);
> /* encode ResponseMessage */
> ceph_start_encoding(, 1, 1,
> buf_size - CEPH_ENCODING_START_BLK_LEN);
> ceph_encode_32(, *result);
> -   } else {
> -   buf_size = 0;
> }
>
> ret = ceph_osdc_notify_ack(osdc, _dev->header_oid,

Already fixed:

  
https://github.com/ceph/ceph-client/commit/2474e77e75215a3cad3b652d2f55ba5b123cb119

Thanks,

Ilya


Re: [PATCH] rbd: fix spelling mistake: "reregisteration" -> "re-registration"

2018-03-20 Thread Ilya Dryomov
On Tue, Mar 20, 2018 at 11:39 AM, Colin Ian King
 wrote:
> On 20/03/18 10:35, Dongsheng Yang wrote:
>> Hi Colin,
>>
>> On 03/19/2018 09:33 PM, Colin King wrote:
>>> From: Colin Ian King 
>>>
>>> Trivial fix to spelling mistake in rdb_warn message text
>>>
>>> Signed-off-by: Colin Ian King 
>>> ---
>>>   drivers/block/rbd.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 1e03b04819c8..7b97240ff15e 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -3887,7 +3887,7 @@ static void rbd_reregister_watch(struct
>>> work_struct *work)
>>> ret = rbd_dev_refresh(rbd_dev);
>>>   if (ret)
>>> -rbd_warn(rbd_dev, "reregisteration refresh failed: %d", ret);
>>> +rbd_warn(rbd_dev, "re-registrationrefresh failed: %d", ret);
>>
>> Hmmm, I am not sure is that a spelling mistake, because the function
>> name is rbd_reregister_watch().
>
> OK, well reregisteration should be reregistration then.

I went ahead and applied the change to "reregistration".

Thanks,

Ilya


Re: [PATCH v3 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2018-01-18 Thread Ilya Dryomov
On Thu, Jan 11, 2018 at 2:09 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> Hello,
>
> I was doing some cleanup work on rbd BLKROSET handler and discovered
> that we ignore partition rw/ro setting (hd_struct->policy) for pretty
> much everything but straight writes.
>
> David (CCed) has blktests patches standing by.
>
> (Another aspect of this is that we don't enforce open(2) mode.  Tejun
> took a stab at this a few years ago, but his patch had to be reverted:
>
>   75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
>   e51900f7d38c ("block: revert block_dev read-only check")
>
> It is a separate issue and refusing writes to read-only devices is
> obviously more important, but perhaps it's time to revisit that as
> well?)
>
> v2 -> v3:
> - lookup part only once; combine read-only check with existing
>   should_fail_request check
>
> v1 -> v2:
> - added unlikely() per Sagi's suggestion
>
> Thanks,
>
> Ilya
>
>
> Ilya Dryomov (2):
>   block: fail op_is_write() requests to read-only partitions
>   block: add bdev_read_only() checks to common helpers
>
>  block/blk-core.c | 56 
> ++--
>  block/blk-lib.c  | 12 
>  2 files changed, 50 insertions(+), 18 deletions(-)

Jens, could you please pick these up for 4.16?

Thanks,

Ilya


[PATCH v3 2/2] block: add bdev_read_only() checks to common helpers

2018-01-11 Thread Ilya Dryomov
Similar to blkdev_write_iter(), return -EPERM if the partition is
read-only.  This covers ioctl(), fallocate() and most in-kernel users
but isn't meant to be exhaustive -- everything else will be caught in
generic_make_request_checks(), fail with -EIO and can be fixed later.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2bc544ce3d2e..a676084d4740 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -37,6 +37,9 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
@@ -156,6 +159,9 @@ static int __blkdev_issue_write_same(struct block_device 
*bdev, sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
@@ -233,6 +239,9 @@ static int __blkdev_issue_write_zeroes(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
 
@@ -287,6 +296,9 @@ static int __blkdev_issue_zero_pages(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
while (nr_sects != 0) {
bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
   gfp_mask);
-- 
2.4.3



[PATCH v3 1/2] block: fail op_is_write() requests to read-only partitions

2018-01-11 Thread Ilya Dryomov
Regular block device writes go through blkdev_write_iter(), which does
bdev_read_only(), while zeroout/discard/etc requests are never checked,
both userspace- and kernel-triggered.  Add a generic catch-all check to
generic_make_request_checks() to actually enforce ioctl(BLKROSET) and
set_disk_ro(), which is used by quite a few drivers for things like
snapshots, read-only backing files/images, etc.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-core.c | 56 ++--
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f843ae4f858d..44a6dfe47ae1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2062,6 +2062,21 @@ static inline bool should_fail_request(struct hd_struct 
*part,
 
 #endif /* CONFIG_FAIL_MAKE_REQUEST */
 
+static inline bool bio_check_ro(struct bio *bio, struct hd_struct *part)
+{
+   if (part->policy && op_is_write(bio_op(bio))) {
+   char b[BDEVNAME_SIZE];
+
+   printk(KERN_ERR
+  "generic_make_request: Trying to write "
+   "to read-only block-device %s (partno %d)\n",
+   bio_devname(bio, b), part->partno);
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * Remap block n of partition p to block n+start(p) of the disk.
  */
@@ -2070,27 +2085,28 @@ static inline int blk_partition_remap(struct bio *bio)
struct hd_struct *p;
int ret = 0;
 
+   rcu_read_lock();
+   p = __disk_get_part(bio->bi_disk, bio->bi_partno);
+   if (unlikely(!p || should_fail_request(p, bio->bi_iter.bi_size) ||
+bio_check_ro(bio, p))) {
+   ret = -EIO;
+   goto out;
+   }
+
/*
 * Zone reset does not include bi_size so bio_sectors() is always 0.
 * Include a test for the reset op code and perform the remap if needed.
 */
-   if (!bio->bi_partno ||
-   (!bio_sectors(bio) && bio_op(bio) != REQ_OP_ZONE_RESET))
-   return 0;
+   if (!bio_sectors(bio) && bio_op(bio) != REQ_OP_ZONE_RESET)
+   goto out;
 
-   rcu_read_lock();
-   p = __disk_get_part(bio->bi_disk, bio->bi_partno);
-   if (likely(p && !should_fail_request(p, bio->bi_iter.bi_size))) {
-   bio->bi_iter.bi_sector += p->start_sect;
-   bio->bi_partno = 0;
-   trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
-   bio->bi_iter.bi_sector - p->start_sect);
-   } else {
-   printk("%s: fail for partition %d\n", __func__, bio->bi_partno);
-   ret = -EIO;
-   }
-   rcu_read_unlock();
+   bio->bi_iter.bi_sector += p->start_sect;
+   bio->bi_partno = 0;
+   trace_block_bio_remap(bio->bi_disk->queue, bio, part_devt(p),
+ bio->bi_iter.bi_sector - p->start_sect);
 
+out:
+   rcu_read_unlock();
return ret;
 }
 
@@ -2149,15 +2165,19 @@ generic_make_request_checks(struct bio *bio)
 * For a REQ_NOWAIT based request, return -EOPNOTSUPP
 * if queue is not a request based queue.
 */
-
if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
goto not_supported;
 
if (should_fail_request(>bi_disk->part0, bio->bi_iter.bi_size))
goto end_io;
 
-   if (blk_partition_remap(bio))
-   goto end_io;
+   if (!bio->bi_partno) {
+   if (unlikely(bio_check_ro(bio, >bi_disk->part0)))
+   goto end_io;
+   } else {
+   if (blk_partition_remap(bio))
+   goto end_io;
+   }
 
if (bio_check_eod(bio, nr_sectors))
goto end_io;
-- 
2.4.3



[PATCH v3 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2018-01-11 Thread Ilya Dryomov
Hello,

I was doing some cleanup work on rbd BLKROSET handler and discovered
that we ignore partition rw/ro setting (hd_struct->policy) for pretty
much everything but straight writes.

David (CCed) has blktests patches standing by.

(Another aspect of this is that we don't enforce open(2) mode.  Tejun
took a stab at this a few years ago, but his patch had to be reverted:

  75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
  e51900f7d38c ("block: revert block_dev read-only check")

It is a separate issue and refusing writes to read-only devices is
obviously more important, but perhaps it's time to revisit that as
well?)

v2 -> v3:
- lookup part only once; combine read-only check with existing
  should_fail_request check

v1 -> v2:
- added unlikely() per Sagi's suggestion

Thanks,

    Ilya


Ilya Dryomov (2):
  block: fail op_is_write() requests to read-only partitions
  block: add bdev_read_only() checks to common helpers

 block/blk-core.c | 56 ++--
 block/blk-lib.c  | 12 
 2 files changed, 50 insertions(+), 18 deletions(-)

-- 
2.4.3



Re: [PATCH v2 1/2] block: fail op_is_write() requests to read-only partitions

2018-01-10 Thread Ilya Dryomov
On Wed, Jan 10, 2018 at 5:34 PM, Jens Axboe <ax...@kernel.dk> wrote:
> On 1/10/18 9:18 AM, Ilya Dryomov wrote:
>> Regular block device writes go through blkdev_write_iter(), which does
>> bdev_read_only(), while zeroout/discard/etc requests are never checked,
>> both userspace- and kernel-triggered.  Add a generic catch-all check to
>> generic_make_request_checks() to actually enforce ioctl(BLKROSET) and
>> set_disk_ro(), which is used by quite a few drivers for things like
>> snapshots, read-only backing files/images, etc.
>>
>> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
>> Reviewed-by: Sagi Grimberg <s...@grimberg.me>
>> ---
>>  block/blk-core.c | 23 ++-
>>  1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index f843ae4f858d..d132bec4a266 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -2123,6 +2123,20 @@ static inline int bio_check_eod(struct bio *bio, 
>> unsigned int nr_sectors)
>>   return 0;
>>  }
>>
>> +static inline bool bio_check_ro(struct bio *bio)
>> +{
>> + struct hd_struct *p;
>> + bool ret = false;
>> +
>> + rcu_read_lock();
>> + p = __disk_get_part(bio->bi_disk, bio->bi_partno);
>> + if (!p || (p->policy && op_is_write(bio_op(bio
>> + ret = true;
>> + rcu_read_unlock();
>> +
>> + return ret;
>> +}> +
>>  static noinline_for_stack bool
>>  generic_make_request_checks(struct bio *bio)
>>  {
>> @@ -2145,11 +2159,18 @@ generic_make_request_checks(struct bio *bio)
>>   goto end_io;
>>   }
>>
>> + if (unlikely(bio_check_ro(bio))) {
>> + printk(KERN_ERR
>> +"generic_make_request: Trying to write "
>> + "to read-only block-device %s (partno %d)\n",
>> + bio_devname(bio, b), bio->bi_partno);
>> + goto end_io;
>> + }
>
> It's yet another check that adds part lookup and rcu lock/unlock in that
> path. Can we combine some of them? Make this part of the remap?  This
> overhead impacts every IO, let's not bloat it more than absolutely
> necessary.

Yes, combining with should_fail_request check in remap should be easy
enough.  I considered it, but opted for the less invasive patch.  I'll
re-spin.

Thanks,

Ilya


[PATCH v2 2/2] block: add bdev_read_only() checks to common helpers

2018-01-10 Thread Ilya Dryomov
Similar to blkdev_write_iter(), return -EPERM if the partition is
read-only.  This covers ioctl(), fallocate() and most in-kernel users
but isn't meant to be exhaustive -- everything else will be caught in
generic_make_request_checks(), fail with -EIO and can be fixed later.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2bc544ce3d2e..a676084d4740 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -37,6 +37,9 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
@@ -156,6 +159,9 @@ static int __blkdev_issue_write_same(struct block_device 
*bdev, sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
@@ -233,6 +239,9 @@ static int __blkdev_issue_write_zeroes(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
 
@@ -287,6 +296,9 @@ static int __blkdev_issue_zero_pages(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
while (nr_sects != 0) {
bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
   gfp_mask);
-- 
2.4.3



[PATCH v2 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2018-01-10 Thread Ilya Dryomov
Hello,

I was doing some cleanup work on rbd BLKROSET handler and discovered
that we ignore partition rw/ro setting (hd_struct->policy) for pretty
much everything but straight writes.

David (CCed) has blktests patches standing by.

(Another aspect of this is that we don't enforce open(2) mode.  Tejun
took a stab at this a few years ago, but his patch had to be reverted:

  75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
  e51900f7d38c ("block: revert block_dev read-only check")

It is a separate issue and refusing writes to read-only devices is
obviously more important, but perhaps it's time to revisit that as
well?)

v1 -> v2:
- added unlikely() per Sagi's suggestion

Thanks,

        Ilya


Ilya Dryomov (2):
  block: fail op_is_write() requests to read-only partitions
  block: add bdev_read_only() checks to common helpers

 block/blk-core.c | 23 ++-
 block/blk-lib.c  | 12 
 2 files changed, 34 insertions(+), 1 deletion(-)

-- 
2.4.3



[PATCH v2 1/2] block: fail op_is_write() requests to read-only partitions

2018-01-10 Thread Ilya Dryomov
Regular block device writes go through blkdev_write_iter(), which does
bdev_read_only(), while zeroout/discard/etc requests are never checked,
both userspace- and kernel-triggered.  Add a generic catch-all check to
generic_make_request_checks() to actually enforce ioctl(BLKROSET) and
set_disk_ro(), which is used by quite a few drivers for things like
snapshots, read-only backing files/images, etc.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
Reviewed-by: Sagi Grimberg <s...@grimberg.me>
---
 block/blk-core.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f843ae4f858d..d132bec4a266 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2123,6 +2123,20 @@ static inline int bio_check_eod(struct bio *bio, 
unsigned int nr_sectors)
return 0;
 }
 
+static inline bool bio_check_ro(struct bio *bio)
+{
+   struct hd_struct *p;
+   bool ret = false;
+
+   rcu_read_lock();
+   p = __disk_get_part(bio->bi_disk, bio->bi_partno);
+   if (!p || (p->policy && op_is_write(bio_op(bio
+   ret = true;
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static noinline_for_stack bool
 generic_make_request_checks(struct bio *bio)
 {
@@ -2145,11 +2159,18 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
+   if (unlikely(bio_check_ro(bio))) {
+   printk(KERN_ERR
+  "generic_make_request: Trying to write "
+   "to read-only block-device %s (partno %d)\n",
+   bio_devname(bio, b), bio->bi_partno);
+   goto end_io;
+   }
+
/*
 * For a REQ_NOWAIT based request, return -EOPNOTSUPP
 * if queue is not a request based queue.
 */
-
if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
goto not_supported;
 
-- 
2.4.3



Re: [PATCH 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2017-11-27 Thread Ilya Dryomov
On Thu, Nov 16, 2017 at 10:02 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> On Thu, Nov 9, 2017 at 7:44 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
>> Hello,
>>
>> I was doing some cleanup work on rbd BLKROSET handler and discovered
>> that we ignore partition rw/ro setting (hd_struct->policy) for pretty
>> much everything but straight writes.
>>
>> David (CCed) has blktests patches standing by.
>>
>> (Another aspect of this is that we don't enforce open(2) mode.  Tejun
>> took a stab at this a few years ago, but his patch had to be reverted:
>>
>>   75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
>>   e51900f7d38c ("block: revert block_dev read-only check")
>>
>> It is a separate issue and refusing writes to read-only devices is
>> obviously more important, but perhaps it's time to revisit that as
>> well?)
>>
>> Thanks,
>>
>> Ilya
>>
>>
>> Ilya Dryomov (2):
>>   block: fail op_is_write() requests to read-only partitions
>>   block: add bdev_read_only() checks to common helpers
>>
>>  block/blk-core.c | 23 ++-
>>  block/blk-lib.c  | 12 
>>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> Ping...  Christoph, Jens, could one of you please take a look?

Ping?

Thanks,

Ilya


Re: [PATCH 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2017-11-16 Thread Ilya Dryomov
On Thu, Nov 9, 2017 at 7:44 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> Hello,
>
> I was doing some cleanup work on rbd BLKROSET handler and discovered
> that we ignore partition rw/ro setting (hd_struct->policy) for pretty
> much everything but straight writes.
>
> David (CCed) has blktests patches standing by.
>
> (Another aspect of this is that we don't enforce open(2) mode.  Tejun
> took a stab at this a few years ago, but his patch had to be reverted:
>
>   75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
>   e51900f7d38c ("block: revert block_dev read-only check")
>
> It is a separate issue and refusing writes to read-only devices is
> obviously more important, but perhaps it's time to revisit that as
> well?)
>
> Thanks,
>
> Ilya
>
>
> Ilya Dryomov (2):
>   block: fail op_is_write() requests to read-only partitions
>   block: add bdev_read_only() checks to common helpers
>
>  block/blk-core.c | 23 ++-
>  block/blk-lib.c  | 12 
>  2 files changed, 34 insertions(+), 1 deletion(-)

Ping...  Christoph, Jens, could one of you please take a look?

Thanks,

Ilya


[PATCH 0/2] block: enforce ioctl(BLKROSET) and set_disk_ro()

2017-11-09 Thread Ilya Dryomov
Hello,

I was doing some cleanup work on rbd BLKROSET handler and discovered
that we ignore partition rw/ro setting (hd_struct->policy) for pretty
much everything but straight writes.

David (CCed) has blktests patches standing by.

(Another aspect of this is that we don't enforce open(2) mode.  Tejun
took a stab at this a few years ago, but his patch had to be reverted:

  75f1dc0d076d ("block: check bdev_read_only() from blkdev_get()")
  e51900f7d38c ("block: revert block_dev read-only check")

It is a separate issue and refusing writes to read-only devices is
obviously more important, but perhaps it's time to revisit that as
well?)

Thanks,

    Ilya


Ilya Dryomov (2):
  block: fail op_is_write() requests to read-only partitions
  block: add bdev_read_only() checks to common helpers

 block/blk-core.c | 23 ++-
 block/blk-lib.c  | 12 
 2 files changed, 34 insertions(+), 1 deletion(-)

-- 
2.4.3



[PATCH 2/2] block: add bdev_read_only() checks to common helpers

2017-11-09 Thread Ilya Dryomov
Similar to blkdev_write_iter(), return -EPERM if the partition is
read-only.  This covers ioctl(), fallocate() and most in-kernel users
but isn't meant to be exhaustive -- everything else will be caught in
generic_make_request_checks(), fail with -EIO and can be fixed later.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index f625fda5f095..64fe863ae43a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -36,6 +36,9 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secure_erase(q))
return -EOPNOTSUPP;
@@ -155,6 +158,9 @@ static int __blkdev_issue_write_same(struct block_device 
*bdev, sector_t sector,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
return -EINVAL;
@@ -232,6 +238,9 @@ static int __blkdev_issue_write_zeroes(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
 
@@ -286,6 +295,9 @@ static int __blkdev_issue_zero_pages(struct block_device 
*bdev,
if (!q)
return -ENXIO;
 
+   if (bdev_read_only(bdev))
+   return -EPERM;
+
while (nr_sects != 0) {
bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
   gfp_mask);
-- 
2.4.3



[PATCH 1/2] block: fail op_is_write() requests to read-only partitions

2017-11-09 Thread Ilya Dryomov
Regular block device writes go through blkdev_write_iter(), which does
bdev_read_only(), while zeroout/discard/etc requests are never checked,
both userspace- and kernel-triggered.  Add a generic catch-all check to
generic_make_request_checks() to actually enforce ioctl(BLKROSET) and
set_disk_ro(), which is used by quite a few drivers for things like
snapshots, read-only backing files/images, etc.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-core.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b8d1aa2d1008..139ff47caf4a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2022,6 +2022,20 @@ static inline int bio_check_eod(struct bio *bio, 
unsigned int nr_sectors)
return 0;
 }
 
+static inline bool bio_check_ro(struct bio *bio)
+{
+   struct hd_struct *p;
+   int ret = false;
+
+   rcu_read_lock();
+   p = __disk_get_part(bio->bi_disk, bio->bi_partno);
+   if (!p || (p->policy && op_is_write(bio_op(bio
+   ret = true;
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static noinline_for_stack bool
 generic_make_request_checks(struct bio *bio)
 {
@@ -2044,11 +2058,18 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
 
+   if (bio_check_ro(bio)) {
+   printk(KERN_ERR
+  "generic_make_request: Trying to write "
+   "to read-only block-device %s (partno %d)\n",
+   bio_devname(bio, b), bio->bi_partno);
+   goto end_io;
+   }
+
/*
 * For a REQ_NOWAIT based request, return -EOPNOTSUPP
 * if queue is not a request based queue.
 */
-
if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q))
goto not_supported;
 
-- 
2.4.3



Re: [PATCH v3 0/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-25 Thread Ilya Dryomov
On Mon, Oct 16, 2017 at 3:59 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> Hi Christoph, Martin,
>
> blkdev_issue_zeroout() now checks for any error.  This required a minor
> refactor, so I dropped the stable tag, Jens can add it back if needed.
>
> v2 -> v3:
> - another code flow change in blkdev_issue_zeroout() suggested by
>   Christoph -- no functional changes
>
> v1 -> v2:
> - changed code flow in blkdev_issue_zeroout() according to Christoph's
>   suggestion
> - this required adding additional checks to blkdev_issue_zeroout() and
>   __blkdev_issue_zero_pages(), the latter is no longer void
>
> Previous version at
>
>   https://marc.info/?l=linux-block=150712940922089=2
>
> Thanks,
>
> Ilya
>
>
> Ilya Dryomov (2):
>   block: factor out __blkdev_issue_zero_pages()
>   block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()
>
>  block/blk-lib.c | 108 
> +---
>  1 file changed, 72 insertions(+), 36 deletions(-)

Jens, can you pick these up?  Technically this is a user-visible
regression (Fixes tag in 2/2), but we are late in the cycle so I'm not
sure what is the right thing to do here.

Thanks,

Ilya


[PATCH] block: move CAP_SYS_ADMIN check in blkdev_roset()

2017-10-18 Thread Ilya Dryomov
Check for CAP_SYS_ADMIN before calling into the driver, similar to
blkdev_flushbuf().  This is safer and can spare a check in the driver.

(Currently BLKROSET is overridden by md and rbd, rbd is missing the
check.  md has the check, but it covers a lot more than BLKROSET.)

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
Al, this appears to go back to your "[PATCH] block ioctl cleanup",
history commit c6973580141c.  2002 was a long time ago, but still ;)
Was there a reason you made BLKFLSBUF check for CAP_SYS_ADMIN before
->ioctl() and BLKROSET after?
---
 block/ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index 0de02ee67eed..3f81bc50ac00 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -437,11 +437,12 @@ static int blkdev_roset(struct block_device *bdev, 
fmode_t mode,
 {
int ret, n;
 
+   if (!capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
if (!is_unrecognized_ioctl(ret))
return ret;
-   if (!capable(CAP_SYS_ADMIN))
-   return -EACCES;
if (get_user(n, (int __user *)arg))
return -EFAULT;
set_device_ro(bdev, n);
-- 
2.4.3



Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-16 Thread Ilya Dryomov
On Mon, Oct 16, 2017 at 1:44 PM, Christoph Hellwig <h...@infradead.org> wrote:
> On Fri, Oct 06, 2017 at 02:31:20PM +0200, Ilya Dryomov wrote:
>> This would unconditionally overwrite any WRITE ZEROS error.  If we get
>> e.g. -EIO, and manual zeroing is not allowed, I don't think we want to
>> return -EOPNOTSUPP?
>>
>> Returning -EOPNOTSUPP to mean "can't zero using either method" doesn't
>> make sense to me, because manual zeroing is always supported, just not
>> always allowed.
>
> Then thow the throw bdev_write_zeroes_sectors check back in:
>
> if (ret && try_write_zeroes) {
> if (!(flags & BLKDEV_ZERO_NOFALLBACK))
> try_write_zeroes = false;
> goto retry;
> }
> if (!bdev_write_zeroes_sectors(bdev))
> ret = -EOPNOTSUPP;
> }
>
> The important bit is that the structure in the current patch where the
> bdev_write_zeroes_sectors check is on the same level as the method
> selection is extremely confusing.

I see.  An updated version should be in your inbox.

Thanks,

Ilya


[PATCH v3 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-16 Thread Ilya Dryomov
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:

  $ fallocate -zn -l 1k /dev/sdg
  fallocate: fallocate failed: Remote I/O error
  $ fallocate -zn -l 1k /dev/sdg  # OK
  $ fallocate -zn -l 1k /dev/sdg  # OK

The following calls succeed because sd_done() sets ->no_write_same in
response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
__blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.

This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
sd_done() has just set ->no_write_same thus indicating lack of offload
support.

Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
Cc: Christoph Hellwig <h...@lst.de>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: Hannes Reinecke <h...@suse.com>
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 45 +++--
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9d2ab8bba52a..23411ec7f3cc 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -321,12 +321,6 @@ static int __blkdev_issue_zero_pages(struct block_device 
*bdev,
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
- *  Note that this function may fail with -EOPNOTSUPP if the driver signals
- *  zeroing offload support, but the device fails to process the command (for
- *  some devices there is no non-destructive way to verify whether this
- *  operation is actually supported).  In this case the caller should call
- *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
used.
- *
  *  If a device is using logical block provisioning, the underlying space will
  *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
  *
@@ -370,18 +364,49 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
 {
-   int ret;
-   struct bio *bio = NULL;
+   int ret = 0;
+   sector_t bs_mask;
+   struct bio *bio;
struct blk_plug plug;
+   bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
 
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
+
+retry:
+   bio = NULL;
blk_start_plug();
-   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
-   , flags);
+   if (try_write_zeroes) {
+   ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects,
+ gfp_mask, , flags);
+   } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
+   ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
+   gfp_mask, );
+   } else {
+   /* No zeroing offload support */
+   ret = -EOPNOTSUPP;
+   }
if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
bio_put(bio);
}
blk_finish_plug();
+   if (ret && try_write_zeroes) {
+   if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
+   try_write_zeroes = false;
+   goto retry;
+   }
+   if (!bdev_write_zeroes_sectors(bdev)) {
+   /*
+* Zeroing offload support was indicated, but the
+* device reported ILLEGAL REQUEST (for some devices
+* there is no non-destructive way to verify whether
+* WRITE ZEROES is actually supported).
+*/
+   ret = -EOPNOTSUPP;
+   }
+   }
 
return ret;
 }
-- 
2.4.3



[PATCH v3 1/2] block: factor out __blkdev_issue_zero_pages()

2017-10-16 Thread Ilya Dryomov
blkdev_issue_zeroout() will use this in !BLKDEV_ZERO_NOFALLBACK case.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 63 +
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 62240f8832ca..9d2ab8bba52a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -274,6 +274,40 @@ static unsigned int __blkdev_sectors_to_bio_pages(sector_t 
nr_sects)
return min(pages, (sector_t)BIO_MAX_PAGES);
 }
 
+static int __blkdev_issue_zero_pages(struct block_device *bdev,
+   sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+   struct bio **biop)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct bio *bio = *biop;
+   int bi_size = 0;
+   unsigned int sz;
+
+   if (!q)
+   return -ENXIO;
+
+   while (nr_sects != 0) {
+   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
+  gfp_mask);
+   bio->bi_iter.bi_sector = sector;
+   bio_set_dev(bio, bdev);
+   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+   while (nr_sects != 0) {
+   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
+   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
+   nr_sects -= bi_size >> 9;
+   sector += bi_size >> 9;
+   if (bi_size < sz)
+   break;
+   }
+   cond_resched();
+   }
+
+   *biop = bio;
+   return 0;
+}
+
 /**
  * __blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:  blockdev to issue
@@ -304,9 +338,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
unsigned flags)
 {
int ret;
-   int bi_size = 0;
-   struct bio *bio = *biop;
-   unsigned int sz;
sector_t bs_mask;
 
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
@@ -316,30 +347,10 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
-   goto out;
-
-   ret = 0;
-   while (nr_sects != 0) {
-   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
-  gfp_mask);
-   bio->bi_iter.bi_sector = sector;
-   bio_set_dev(bio, bdev);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-
-   while (nr_sects != 0) {
-   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
-   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
-   nr_sects -= bi_size >> 9;
-   sector += bi_size >> 9;
-   if (bi_size < sz)
-   break;
-   }
-   cond_resched();
-   }
+   return ret;
 
-   *biop = bio;
-out:
-   return ret;
+   return __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask,
+biop);
 }
 EXPORT_SYMBOL(__blkdev_issue_zeroout);
 
-- 
2.4.3



[PATCH v3 0/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-16 Thread Ilya Dryomov
Hi Christoph, Martin,

blkdev_issue_zeroout() now checks for any error.  This required a minor
refactor, so I dropped the stable tag, Jens can add it back if needed.

v2 -> v3:
- another code flow change in blkdev_issue_zeroout() suggested by
  Christoph -- no functional changes

v1 -> v2:
- changed code flow in blkdev_issue_zeroout() according to Christoph's
  suggestion
- this required adding additional checks to blkdev_issue_zeroout() and
  __blkdev_issue_zero_pages(), the latter is no longer void

Previous version at

  https://marc.info/?l=linux-block=150712940922089=2

Thanks,

Ilya


Ilya Dryomov (2):
  block: factor out __blkdev_issue_zero_pages()
  block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

 block/blk-lib.c | 108 +---
 1 file changed, 72 insertions(+), 36 deletions(-)

-- 
2.4.3



Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-16 Thread Ilya Dryomov
On Fri, Oct 6, 2017 at 2:31 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> On Fri, Oct 6, 2017 at 2:05 PM, Christoph Hellwig <h...@infradead.org> wrote:
>> On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote:
>>> This is to avoid returning -EREMOTEIO in the following case: device
>>> doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout
>>> is called with BLKDEV_ZERO_NOFALLBACK.  Enter blkdev_issue_zeroout(),
>>> bdev_write_zeroes_sectors() != 0, so we issue WRITE ZEROES.  The
>>> request fails with ILLEGAL REQUEST, sd_done() sets ->no_write_same and
>>> updates queue_limits, ILLEGAL REQUEST is translated into -EREMOTEIO,
>>> which is returned from submit_bio_wait().  Manual zeroing is not
>>> allowed, so we must return an error, but it shouldn't be -EREMOTEIO if
>>> queue_limits just got updated because of ILLEGAL REQUEST.  Without this
>>> conditional, we'd get
>>
>> Hmm.  I think we'd better off to just do the before the retry loop:
>>
>> if (ret && try_write_zeroes) {
>> if (!(flags & BLKDEV_ZERO_NOFALLBACK))
>> try_write_zeroes = false;
>> goto retry;
>> }
>> ret = -EOPNOTSUPP;
>> }
>
> This would unconditionally overwrite any WRITE ZEROS error.  If we get
> e.g. -EIO, and manual zeroing is not allowed, I don't think we want to
> return -EOPNOTSUPP?
>
> Returning -EOPNOTSUPP to mean "can't zero using either method" doesn't
> make sense to me, because manual zeroing is always supported, just not
> always allowed.

Ping...

Thanks,

Ilya


Re: false positive lockdep splat with loop device

2017-10-10 Thread Ilya Dryomov
On Tue, Oct 10, 2017 at 11:16 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 6:33 PM, Christoph Hellwig <h...@infradead.org> wrote:
>> Does the patch below fix the warning for you?
>>
>> --
>> From 28aae7104425433d39e6142adcd5b88dc5b0ad5f Mon Sep 17 00:00:00 2001
>> From: Christoph Hellwig <h...@lst.de>
>> Date: Thu, 5 Oct 2017 18:31:02 +0200
>> Subject: block: use DECLARE_COMPLETION_ONSTACK in submit_bio_wait
>>
>> This way we get our lockdep annotations right in case multiple layers
>> in the stack use submit_bio_wait.
>>
>> It also happens to simplify the code by getting rid of the submit_bio_ret
>> structure.
>>
>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>> ---
>>  block/bio.c | 19 +--
>>  1 file changed, 5 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index 8338304ea256..4e18e959fc0a 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -917,17 +917,9 @@ int bio_iov_iter_get_pages(struct bio *bio, struct 
>> iov_iter *iter)
>>  }
>>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>>
>> -struct submit_bio_ret {
>> -   struct completion event;
>> -   int error;
>> -};
>> -
>>  static void submit_bio_wait_endio(struct bio *bio)
>>  {
>> -   struct submit_bio_ret *ret = bio->bi_private;
>> -
>> -   ret->error = blk_status_to_errno(bio->bi_status);
>> -   complete(>event);
>> +   complete(bio->bi_private);
>>  }
>>
>>  /**
>> @@ -943,16 +935,15 @@ static void submit_bio_wait_endio(struct bio *bio)
>>   */
>>  int submit_bio_wait(struct bio *bio)
>>  {
>> -   struct submit_bio_ret ret;
>> +   DECLARE_COMPLETION_ONSTACK(done);
>>
>> -   init_completion();
>> -   bio->bi_private = 
>> +   bio->bi_private = 
>> bio->bi_end_io = submit_bio_wait_endio;
>> bio->bi_opf |= REQ_SYNC;
>> submit_bio(bio);
>> -   wait_for_completion_io();
>> +   wait_for_completion_io();
>>
>> -   return ret.error;
>> +   return blk_status_to_errno(bio->bi_status);
>>  }
>>  EXPORT_SYMBOL(submit_bio_wait);
>
> No, it doesn't -- the splat is a little bit more complicated, but
> fundamentally the same thing.

Easily triggered with generic/361 too, BTW.

Thanks,

Ilya


Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-06 Thread Ilya Dryomov
On Fri, Oct 6, 2017 at 2:05 PM, Christoph Hellwig <h...@infradead.org> wrote:
> On Thu, Oct 05, 2017 at 09:32:33PM +0200, Ilya Dryomov wrote:
>> This is to avoid returning -EREMOTEIO in the following case: device
>> doesn't support WRITE SAME but scsi_disk::max_ws_blocks != 0, zeroout
>> is called with BLKDEV_ZERO_NOFALLBACK.  Enter blkdev_issue_zeroout(),
>> bdev_write_zeroes_sectors() != 0, so we issue WRITE ZEROES.  The
>> request fails with ILLEGAL REQUEST, sd_done() sets ->no_write_same and
>> updates queue_limits, ILLEGAL REQUEST is translated into -EREMOTEIO,
>> which is returned from submit_bio_wait().  Manual zeroing is not
>> allowed, so we must return an error, but it shouldn't be -EREMOTEIO if
>> queue_limits just got updated because of ILLEGAL REQUEST.  Without this
>> conditional, we'd get
>
> Hmm.  I think we'd better off to just do the before the retry loop:
>
> if (ret && try_write_zeroes) {
> if (!(flags & BLKDEV_ZERO_NOFALLBACK))
> try_write_zeroes = false;
> goto retry;
> }
> ret = -EOPNOTSUPP;
> }

This would unconditionally overwrite any WRITE ZEROS error.  If we get
e.g. -EIO, and manual zeroing is not allowed, I don't think we want to
return -EOPNOTSUPP?

Returning -EOPNOTSUPP to mean "can't zero using either method" doesn't
make sense to me, because manual zeroing is always supported, just not
always allowed.

Thanks,

Ilya


Re: [PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-05 Thread Ilya Dryomov
On Thu, Oct 5, 2017 at 7:13 PM, Christoph Hellwig <h...@infradead.org> wrote:
> On Wed, Oct 04, 2017 at 05:03:16PM +0200, Ilya Dryomov wrote:
>> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
>> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
>> is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
>> SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:
>>
>>   $ fallocate -zn -l 1k /dev/sdg
>>   fallocate: fallocate failed: Remote I/O error
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>
> Can we wire this up for blktests somehow?

This is covered by Darrick's generic/351, part of fstests blockdev
group.

>
>>
>> The following calls succeed because sd_done() sets ->no_write_same in
>> response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
>> __blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.
>>
>> This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
>> and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
>> specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
>> sd_done() has just set ->no_write_same thus indicating lack of offload
>> support.
>>
>> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
>> Cc: Christoph Hellwig <h...@lst.de>
>> Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
>> Cc: Hannes Reinecke <h...@suse.com>
>> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
>> ---
>>  block/blk-lib.c | 41 +++--
>>  1 file changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 9d2ab8bba52a..17494275673e 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -321,12 +321,6 @@ static int __blkdev_issue_zero_pages(struct 
>> block_device *bdev,
>>   *  Zero-fill a block range, either using hardware offload or by explicitly
>>   *  writing zeroes to the device.
>>   *
>> - *  Note that this function may fail with -EOPNOTSUPP if the driver signals
>> - *  zeroing offload support, but the device fails to process the command 
>> (for
>> - *  some devices there is no non-destructive way to verify whether this
>> - *  operation is actually supported).  In this case the caller should call
>> - *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
>> used.
>> - *
>>   *  If a device is using logical block provisioning, the underlying space 
>> will
>>   *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>>   *
>> @@ -370,18 +364,45 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
>>  int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>   sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
>>  {
>> - int ret;
>> - struct bio *bio = NULL;
>> + int ret = 0;
>> + sector_t bs_mask;
>> + struct bio *bio;
>>   struct blk_plug plug;
>> + bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
>> +
>> + bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>> + if ((sector | nr_sects) & bs_mask)
>> + return -EINVAL;
>>
>> +retry:
>> + bio = NULL;
>>   blk_start_plug();
>> - ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
>> - , flags);
>> + if (try_write_zeroes) {
>> + ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects,
>> +   gfp_mask, , flags);
>> + } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
>> + ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
>> + gfp_mask, );
>> + } else if (!bdev_write_zeroes_sectors(bdev)) {
>> + /*
>> +  * Manual zeroout is not allowed and either:
>> +  * - no zeroing offload support
>> +  * - zeroing offload support was indicated, but the device
>> +  *   reported ILLEGAL REQUEST (for some devices there is no
>> +  *   non-destructive way to verify whether WRITE ZEROES is
>> +  *   actually supported)
>> +  */
>> + ret = -EOPNOTSUPP;
>
> I don't understand the conditional above this error return - if
> we can't zero using either method we should always return an error.

This is to avoid return

[PATCH v2 1/2] block: factor out __blkdev_issue_zero_pages()

2017-10-04 Thread Ilya Dryomov
blkdev_issue_zeroout() will use this in !BLKDEV_ZERO_NOFALLBACK case.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 63 +
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 62240f8832ca..9d2ab8bba52a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -274,6 +274,40 @@ static unsigned int __blkdev_sectors_to_bio_pages(sector_t 
nr_sects)
return min(pages, (sector_t)BIO_MAX_PAGES);
 }
 
+static int __blkdev_issue_zero_pages(struct block_device *bdev,
+   sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+   struct bio **biop)
+{
+   struct request_queue *q = bdev_get_queue(bdev);
+   struct bio *bio = *biop;
+   int bi_size = 0;
+   unsigned int sz;
+
+   if (!q)
+   return -ENXIO;
+
+   while (nr_sects != 0) {
+   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
+  gfp_mask);
+   bio->bi_iter.bi_sector = sector;
+   bio_set_dev(bio, bdev);
+   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+   while (nr_sects != 0) {
+   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
+   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
+   nr_sects -= bi_size >> 9;
+   sector += bi_size >> 9;
+   if (bi_size < sz)
+   break;
+   }
+   cond_resched();
+   }
+
+   *biop = bio;
+   return 0;
+}
+
 /**
  * __blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:  blockdev to issue
@@ -304,9 +338,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
unsigned flags)
 {
int ret;
-   int bi_size = 0;
-   struct bio *bio = *biop;
-   unsigned int sz;
sector_t bs_mask;
 
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
@@ -316,30 +347,10 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
-   goto out;
-
-   ret = 0;
-   while (nr_sects != 0) {
-   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
-  gfp_mask);
-   bio->bi_iter.bi_sector = sector;
-   bio_set_dev(bio, bdev);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-
-   while (nr_sects != 0) {
-   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
-   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
-   nr_sects -= bi_size >> 9;
-   sector += bi_size >> 9;
-   if (bi_size < sz)
-   break;
-   }
-   cond_resched();
-   }
+   return ret;
 
-   *biop = bio;
-out:
-   return ret;
+   return __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask,
+biop);
 }
 EXPORT_SYMBOL(__blkdev_issue_zeroout);
 
-- 
2.4.3



[PATCH v2 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-04 Thread Ilya Dryomov
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:

  $ fallocate -zn -l 1k /dev/sdg
  fallocate: fallocate failed: Remote I/O error
  $ fallocate -zn -l 1k /dev/sdg  # OK
  $ fallocate -zn -l 1k /dev/sdg  # OK

The following calls succeed because sd_done() sets ->no_write_same in
response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
__blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.

This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
sd_done() has just set ->no_write_same thus indicating lack of offload
support.

Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
Cc: Christoph Hellwig <h...@lst.de>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: Hannes Reinecke <h...@suse.com>
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 41 +++--
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9d2ab8bba52a..17494275673e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -321,12 +321,6 @@ static int __blkdev_issue_zero_pages(struct block_device 
*bdev,
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
- *  Note that this function may fail with -EOPNOTSUPP if the driver signals
- *  zeroing offload support, but the device fails to process the command (for
- *  some devices there is no non-destructive way to verify whether this
- *  operation is actually supported).  In this case the caller should call
- *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
used.
- *
  *  If a device is using logical block provisioning, the underlying space will
  *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
  *
@@ -370,18 +364,45 @@ EXPORT_SYMBOL(__blkdev_issue_zeroout);
 int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
 {
-   int ret;
-   struct bio *bio = NULL;
+   int ret = 0;
+   sector_t bs_mask;
+   struct bio *bio;
struct blk_plug plug;
+   bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
+
+   bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+   if ((sector | nr_sects) & bs_mask)
+   return -EINVAL;
 
+retry:
+   bio = NULL;
blk_start_plug();
-   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
-   , flags);
+   if (try_write_zeroes) {
+   ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects,
+ gfp_mask, , flags);
+   } else if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
+   ret = __blkdev_issue_zero_pages(bdev, sector, nr_sects,
+   gfp_mask, );
+   } else if (!bdev_write_zeroes_sectors(bdev)) {
+   /*
+* Manual zeroout is not allowed and either:
+* - no zeroing offload support
+* - zeroing offload support was indicated, but the device
+*   reported ILLEGAL REQUEST (for some devices there is no
+*   non-destructive way to verify whether WRITE ZEROES is
+*   actually supported)
+*/
+   ret = -EOPNOTSUPP;
+   }
if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
bio_put(bio);
}
blk_finish_plug();
+   if (ret && try_write_zeroes) {
+   try_write_zeroes = false;
+   goto retry;
+   }
 
return ret;
 }
-- 
2.4.3



[PATCH v2 0/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-04 Thread Ilya Dryomov
Hi Christoph, Martin,

blkdev_issue_zeroout() now checks for any error.  This required a minor
refactor, so I dropped the stable tag, Jens can add it back if needed.

v1 -> v2:
- changed code flow in blkdev_issue_zeroout() according to Christoph's
  suggestion
- this required adding additional checks to blkdev_issue_zeroout() and
  __blkdev_issue_zero_pages(), the latter is no longer void

Previous patch(es) and discussion at

  https://marc.info/?l=linux-block=150471953327942=2
  https://marc.info/?l=linux-block=150601399031909=2

Thanks,

Ilya


Ilya Dryomov (2):
  block: factor out __blkdev_issue_zero_pages()
  block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

 block/blk-lib.c | 104 
 1 file changed, 68 insertions(+), 36 deletions(-)

-- 
2.4.3



Re: [PATCH 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-04 Thread Ilya Dryomov
On Tue, Oct 3, 2017 at 10:04 AM, Christoph Hellwig <h...@infradead.org> wrote:
> On Thu, Sep 21, 2017 at 07:12:52PM +0200, Ilya Dryomov wrote:
>> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
>> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
>> is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
>> SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:
>>
>>   $ fallocate -zn -l 1k /dev/sdg
>>   fallocate: fallocate failed: Remote I/O error
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>>
>> The following calls succeed because sd_done() sets ->no_write_same in
>> response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
>> __blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.
>>
>> This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
>> and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
>> specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
>> sd_done() has just set ->no_write_same thus indicating lack of offload
>> support.
>>
>> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
>> Cc: Christoph Hellwig <h...@lst.de>
>> Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
>> Cc: Hannes Reinecke <h...@suse.com>
>> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
>> ---
>>  block/blk-lib.c | 27 +--
>>  1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 6b97feb71065..1cb402beb983 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -316,12 +316,6 @@ static void __blkdev_issue_zero_pages(struct 
>> block_device *bdev,
>>   *  Zero-fill a block range, either using hardware offload or by explicitly
>>   *  writing zeroes to the device.
>>   *
>> - *  Note that this function may fail with -EOPNOTSUPP if the driver signals
>> - *  zeroing offload support, but the device fails to process the command 
>> (for
>> - *  some devices there is no non-destructive way to verify whether this
>> - *  operation is actually supported).  In this case the caller should call
>> - *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
>> used.
>> - *
>>   *  If a device is using logical block provisioning, the underlying space 
>> will
>>   *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>>   *
>> @@ -374,6 +368,27 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
>> sector_t sector,
>>   , flags);
>>   if (ret == 0 && bio) {
>>   ret = submit_bio_wait(bio);
>> + /*
>> +  * Fall back to a manual zeroout on any error, if allowed.
>> +  *
>> +  * Particularly, WRITE ZEROES may fail with -EREMOTEIO if the
>> +  * driver signals zeroing offload support, but the device
>> +  * fails to process the command (for some devices there is no
>> +  * non-destructive way to verify whether this operation is
>> +  * actually supported).
>> +  */
>> + if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {
>
> No need for the additional levels of indentation here.  Also I
> really do not like the logic, we shouldn't have to duplicate much
> of the logic multiple times.
>
> I'd more go for something like (sketched in mail):
>
> bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev);
>
> retry:
> bio = NULL;
> blk_start_plug();
> if (try_write_zeroes)
> ret = __blkdev_issue_write_zeroes(...)
> else
> ret = __blkdev_issue_zero_pages(...)
> if (ret == 0 && bio) {
> ret = submit_bio_wait(bio);
> bio_put(bio);
> }
> blk_finish_plug();
> if (ret && try_write_zeroes) {
> try_write_zeroes = false;
> goto retry;
> }

Yeah, I didn't like the code flow either but we are going to duplicate
some of it either way.  In particular, !bdev_write_zeroes_sectors() ->
ret = -EOPNOTSUPP part is still needed to avoid propagating -EREMOTEIO
in BLKDEV_ZERO_NOFALLBACK case:

if (try_write_zeroes)
ret = __blkdev_issue_write_zeroes(...);
else if (!(flags & BLKDEV_ZERO_NOFALLBACK))
ret = __blkdev_issue_zero_pages(...);
else if (!bdev_write_zeroes_sectors(bdev))
ret = -EOPNOTSUPP;

bs_mask check from __blkdev_issue_zeroout() too.

I'll post v2 in a few.

Thanks,

Ilya


Re: [PATCH 0/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-10-02 Thread Ilya Dryomov
On Thu, Sep 21, 2017 at 7:12 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> Hi Christoph, Martin,
>
> blkdev_issue_zeroout() now checks for any error.  This required a minor
> refactor, so I dropped the stable tag, Jens can add it back if needed.
>
> Previous patch and discussion at
>
>   https://marc.info/?l=linux-block=150471953327942=2
>
> Thanks,
>
> Ilya
>
>
> Ilya Dryomov (2):
>   block: factor out __blkdev_issue_zero_pages()
>   block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()
>
>  block/blk-lib.c | 85 
> +++--
>  1 file changed, 53 insertions(+), 32 deletions(-)

Ping...  Christoph, could you please take a look?

Thanks,

Ilya


[PATCH 2/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-09-21 Thread Ilya Dryomov
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
is set.  Because REQ_OP_WRITE_ZEROES is implemented in terms of WRITE
SAME, blkdev_issue_zeroout() may fail with -EREMOTEIO:

  $ fallocate -zn -l 1k /dev/sdg
  fallocate: fallocate failed: Remote I/O error
  $ fallocate -zn -l 1k /dev/sdg  # OK
  $ fallocate -zn -l 1k /dev/sdg  # OK

The following calls succeed because sd_done() sets ->no_write_same in
response to a sense that would become BLK_STS_TARGET/-EREMOTEIO, causing
__blkdev_issue_zeroout() to fall back to generating ZERO_PAGE bios.

This means blkdev_issue_zeroout() must cope with WRITE ZEROES failing
and fall back to manually zeroing, unless BLKDEV_ZERO_NOFALLBACK is
specified.  For BLKDEV_ZERO_NOFALLBACK case, return -EOPNOTSUPP if
sd_done() has just set ->no_write_same thus indicating lack of offload
support.

Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
Cc: Christoph Hellwig <h...@lst.de>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: Hannes Reinecke <h...@suse.com>
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 6b97feb71065..1cb402beb983 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -316,12 +316,6 @@ static void __blkdev_issue_zero_pages(struct block_device 
*bdev,
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
- *  Note that this function may fail with -EOPNOTSUPP if the driver signals
- *  zeroing offload support, but the device fails to process the command (for
- *  some devices there is no non-destructive way to verify whether this
- *  operation is actually supported).  In this case the caller should call
- *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
used.
- *
  *  If a device is using logical block provisioning, the underlying space will
  *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
  *
@@ -374,6 +368,27 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
, flags);
if (ret == 0 && bio) {
ret = submit_bio_wait(bio);
+   /*
+* Fall back to a manual zeroout on any error, if allowed.
+*
+* Particularly, WRITE ZEROES may fail with -EREMOTEIO if the
+* driver signals zeroing offload support, but the device
+* fails to process the command (for some devices there is no
+* non-destructive way to verify whether this operation is
+* actually supported).
+*/
+   if (ret && bio_op(bio) == REQ_OP_WRITE_ZEROES) {
+   if (flags & BLKDEV_ZERO_NOFALLBACK) {
+   if (!bdev_write_zeroes_sectors(bdev))
+   ret = -EOPNOTSUPP;
+   } else {
+   bio_put(bio);
+   bio = NULL;
+   __blkdev_issue_zero_pages(bdev, sector,
+   nr_sects, gfp_mask, );
+   ret = submit_bio_wait(bio);
+   }
+   }
bio_put(bio);
}
blk_finish_plug();
-- 
2.4.3



[PATCH 0/2] block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

2017-09-21 Thread Ilya Dryomov
Hi Christoph, Martin,

blkdev_issue_zeroout() now checks for any error.  This required a minor
refactor, so I dropped the stable tag, Jens can add it back if needed.

Previous patch and discussion at

  https://marc.info/?l=linux-block=150471953327942=2

Thanks,

Ilya


Ilya Dryomov (2):
  block: factor out __blkdev_issue_zero_pages()
  block: cope with WRITE ZEROES failing in blkdev_issue_zeroout()

 block/blk-lib.c | 85 +++--
 1 file changed, 53 insertions(+), 32 deletions(-)

-- 
2.4.3



[PATCH 1/2] block: factor out __blkdev_issue_zero_pages()

2017-09-21 Thread Ilya Dryomov
blkdev_issue_zeroout() will use this in !BLKDEV_ZERO_NOFALLBACK case.

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 58 +++--
 1 file changed, 32 insertions(+), 26 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 62240f8832ca..6b97feb71065 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -274,6 +274,35 @@ static unsigned int __blkdev_sectors_to_bio_pages(sector_t 
nr_sects)
return min(pages, (sector_t)BIO_MAX_PAGES);
 }
 
+static void __blkdev_issue_zero_pages(struct block_device *bdev,
+   sector_t sector, sector_t nr_sects, gfp_t gfp_mask,
+   struct bio **biop)
+{
+   struct bio *bio = *biop;
+   int bi_size = 0;
+   unsigned int sz;
+
+   while (nr_sects != 0) {
+   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
+  gfp_mask);
+   bio->bi_iter.bi_sector = sector;
+   bio_set_dev(bio, bdev);
+   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+
+   while (nr_sects != 0) {
+   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
+   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
+   nr_sects -= bi_size >> 9;
+   sector += bi_size >> 9;
+   if (bi_size < sz)
+   break;
+   }
+   cond_resched();
+   }
+
+   *biop = bio;
+}
+
 /**
  * __blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:  blockdev to issue
@@ -304,9 +333,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
unsigned flags)
 {
int ret;
-   int bi_size = 0;
-   struct bio *bio = *biop;
-   unsigned int sz;
sector_t bs_mask;
 
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
@@ -316,30 +342,10 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
ret = __blkdev_issue_write_zeroes(bdev, sector, nr_sects, gfp_mask,
biop, flags);
if (ret != -EOPNOTSUPP || (flags & BLKDEV_ZERO_NOFALLBACK))
-   goto out;
-
-   ret = 0;
-   while (nr_sects != 0) {
-   bio = next_bio(bio, __blkdev_sectors_to_bio_pages(nr_sects),
-  gfp_mask);
-   bio->bi_iter.bi_sector = sector;
-   bio_set_dev(bio, bdev);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
-
-   while (nr_sects != 0) {
-   sz = min((sector_t) PAGE_SIZE, nr_sects << 9);
-   bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0);
-   nr_sects -= bi_size >> 9;
-   sector += bi_size >> 9;
-   if (bi_size < sz)
-   break;
-   }
-   cond_resched();
-   }
+   return ret;
 
-   *biop = bio;
-out:
-   return ret;
+   __blkdev_issue_zero_pages(bdev, sector, nr_sects, gfp_mask, biop);
+   return 0;
 }
 EXPORT_SYMBOL(__blkdev_issue_zeroout);
 
-- 
2.4.3



Re: [PATCH] block: cope with WRITE SAME failing in blkdev_issue_zeroout()

2017-09-20 Thread Ilya Dryomov
On Tue, Sep 19, 2017 at 10:32 PM, Christoph Hellwig <h...@lst.de> wrote:
> On Wed, Sep 06, 2017 at 07:38:10PM +0200, Ilya Dryomov wrote:
>> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
>> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
>> is set.  This means blkdev_issue_zeroout() must cope with WRITE SAME
>> failing with BLK_STS_TARGET/-EREMOTEIO and explicitly write zeroes,
>> unless BLKDEV_ZERO_NOFALLBACK is specified.
>>
>> Commit 71027e97d796 ("block: stop using discards for zeroing") added
>> the following to __blkdev_issue_zeroout() comment:
>>
>>   "Note that this function may fail with -EOPNOTSUPP if the driver signals
>>   zeroing offload support, but the device fails to process the command (for
>>   some devices there is no non-destructive way to verify whether this
>>   operation is actually supported).  In this case the caller should call
>>   retry the call to blkdev_issue_zeroout() and the fallback path will be 
>> used."
>>
>> But __blkdev_issue_zeroout() doesn't fail in this case: if WRITE SAME
>> support is indicated, a REQ_OP_WRITE_ZEROES bio is built and returned
>> to blkdev_issue_zeroout().  -EREMOTEIO is then propagated up:
>>
>>   $ fallocate -zn -l 1k /dev/sdg
>>   fallocate: fallocate failed: Remote I/O error
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>>
>> (The second fallocate(1) succeeds because sd_done() sets ->no_write_same
>> in response to a sense that would become BLK_STS_TARGET.)
>>
>> Retry __blkdev_issue_zeroout() if the I/O fails with -EREMOTEIO.  This
>> is roughly what we did until 4.12, sans BLKDEV_ZERO_NOFALLBACK knob.
>
> I'm really not sure we should check for -EREMOTEIO specifically, but
> Martin who is more familiar with the SCSI code might be able to
> correct me, I'd feel safer about checking for any error which is
> what the old code did.

Yeah, I pondered that.  The old code did check for any error, but it
wouldn't attempt another WRITE SAME after a failure.  To do that here,
we'd need to refactor __blkdev_issue_zeroout(); as it is, if we retry
__blkdev_issue_zeroout() on a random error, it would happily go the
REQ_OP_WRITE_ZEROES way again.

Martin, what do you think?

Thanks,

Ilya


Re: [PATCH] block: cope with WRITE SAME failing in blkdev_issue_zeroout()

2017-09-19 Thread Ilya Dryomov
On Wed, Sep 13, 2017 at 11:50 AM, Ilya Dryomov <idryo...@gmail.com> wrote:
> On Wed, Sep 6, 2017 at 7:38 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
>> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
>> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
>> is set.  This means blkdev_issue_zeroout() must cope with WRITE SAME
>> failing with BLK_STS_TARGET/-EREMOTEIO and explicitly write zeroes,
>> unless BLKDEV_ZERO_NOFALLBACK is specified.
>>
>> Commit 71027e97d796 ("block: stop using discards for zeroing") added
>> the following to __blkdev_issue_zeroout() comment:
>>
>>   "Note that this function may fail with -EOPNOTSUPP if the driver signals
>>   zeroing offload support, but the device fails to process the command (for
>>   some devices there is no non-destructive way to verify whether this
>>   operation is actually supported).  In this case the caller should call
>>   retry the call to blkdev_issue_zeroout() and the fallback path will be 
>> used."
>>
>> But __blkdev_issue_zeroout() doesn't fail in this case: if WRITE SAME
>> support is indicated, a REQ_OP_WRITE_ZEROES bio is built and returned
>> to blkdev_issue_zeroout().  -EREMOTEIO is then propagated up:
>>
>>   $ fallocate -zn -l 1k /dev/sdg
>>   fallocate: fallocate failed: Remote I/O error
>>   $ fallocate -zn -l 1k /dev/sdg  # OK
>>
>> (The second fallocate(1) succeeds because sd_done() sets ->no_write_same
>> in response to a sense that would become BLK_STS_TARGET.)
>>
>> Retry __blkdev_issue_zeroout() if the I/O fails with -EREMOTEIO.  This
>> is roughly what we did until 4.12, sans BLKDEV_ZERO_NOFALLBACK knob.
>>
>> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
>> Cc: Christoph Hellwig <h...@lst.de>
>> Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
>> Cc: Hannes Reinecke <h...@suse.com>
>> Cc: sta...@vger.kernel.org # 4.12+
>> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
>> ---
>>  block/blk-lib.c | 49 +
>>  1 file changed, 33 insertions(+), 16 deletions(-)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 3fe0aec90597..876b5478c1a2 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -287,12 +287,6 @@ static unsigned int 
>> __blkdev_sectors_to_bio_pages(sector_t nr_sects)
>>   *  Zero-fill a block range, either using hardware offload or by explicitly
>>   *  writing zeroes to the device.
>>   *
>> - *  Note that this function may fail with -EOPNOTSUPP if the driver signals
>> - *  zeroing offload support, but the device fails to process the command 
>> (for
>> - *  some devices there is no non-destructive way to verify whether this
>> - *  operation is actually supported).  In this case the caller should call
>> - *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
>> used.
>> - *
>>   *  If a device is using logical block provisioning, the underlying space 
>> will
>>   *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>>   *
>> @@ -343,6 +337,34 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
>> sector_t sector,
>>  }
>>  EXPORT_SYMBOL(__blkdev_issue_zeroout);
>>
>> +/*
>> + * This function may fail with -EREMOTEIO if the driver signals zeroing
>> + * offload support, but the device fails to process the command (for
>> + * some devices there is no non-destructive way to verify whether this
>> + * operation is actually supported).  In this case the caller should
>> + * retry and the fallback path in __blkdev_issue_zeroout() will be used
>> + * unless %flags contains BLKDEV_ZERO_NOFALLBACK.
>> + */
>> +static int __blkdev_issue_zeroout_wait(struct block_device *bdev,
>> +  sector_t sector, sector_t nr_sects,
>> +  gfp_t gfp_mask, unsigned flags)
>> +{
>> +   int ret;
>> +   struct bio *bio = NULL;
>> +   struct blk_plug plug;
>> +
>> +   blk_start_plug();
>> +   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
>> +   , flags);
>> +   if (ret == 0 && bio) {
>> +   ret = submit_bio_wait(bio);
>> +   bio_put(bio);
>> +   }
>> +   blk_finish_plug();
>> +
>> +   return ret;
>> +}
>> +
>>  /**
>>   * blkdev_issue_zeroout - 

Re: [PATCH] block: cope with WRITE SAME failing in blkdev_issue_zeroout()

2017-09-13 Thread Ilya Dryomov
On Wed, Sep 6, 2017 at 7:38 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
> permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
> is set.  This means blkdev_issue_zeroout() must cope with WRITE SAME
> failing with BLK_STS_TARGET/-EREMOTEIO and explicitly write zeroes,
> unless BLKDEV_ZERO_NOFALLBACK is specified.
>
> Commit 71027e97d796 ("block: stop using discards for zeroing") added
> the following to __blkdev_issue_zeroout() comment:
>
>   "Note that this function may fail with -EOPNOTSUPP if the driver signals
>   zeroing offload support, but the device fails to process the command (for
>   some devices there is no non-destructive way to verify whether this
>   operation is actually supported).  In this case the caller should call
>   retry the call to blkdev_issue_zeroout() and the fallback path will be 
> used."
>
> But __blkdev_issue_zeroout() doesn't fail in this case: if WRITE SAME
> support is indicated, a REQ_OP_WRITE_ZEROES bio is built and returned
> to blkdev_issue_zeroout().  -EREMOTEIO is then propagated up:
>
>   $ fallocate -zn -l 1k /dev/sdg
>   fallocate: fallocate failed: Remote I/O error
>   $ fallocate -zn -l 1k /dev/sdg  # OK
>
> (The second fallocate(1) succeeds because sd_done() sets ->no_write_same
> in response to a sense that would become BLK_STS_TARGET.)
>
> Retry __blkdev_issue_zeroout() if the I/O fails with -EREMOTEIO.  This
> is roughly what we did until 4.12, sans BLKDEV_ZERO_NOFALLBACK knob.
>
> Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
> Cc: Christoph Hellwig <h...@lst.de>
> Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
> Cc: Hannes Reinecke <h...@suse.com>
> Cc: sta...@vger.kernel.org # 4.12+
> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
> ---
>  block/blk-lib.c | 49 +
>  1 file changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 3fe0aec90597..876b5478c1a2 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -287,12 +287,6 @@ static unsigned int 
> __blkdev_sectors_to_bio_pages(sector_t nr_sects)
>   *  Zero-fill a block range, either using hardware offload or by explicitly
>   *  writing zeroes to the device.
>   *
> - *  Note that this function may fail with -EOPNOTSUPP if the driver signals
> - *  zeroing offload support, but the device fails to process the command (for
> - *  some devices there is no non-destructive way to verify whether this
> - *  operation is actually supported).  In this case the caller should call
> - *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
> used.
> - *
>   *  If a device is using logical block provisioning, the underlying space 
> will
>   *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
>   *
> @@ -343,6 +337,34 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
> sector_t sector,
>  }
>  EXPORT_SYMBOL(__blkdev_issue_zeroout);
>
> +/*
> + * This function may fail with -EREMOTEIO if the driver signals zeroing
> + * offload support, but the device fails to process the command (for
> + * some devices there is no non-destructive way to verify whether this
> + * operation is actually supported).  In this case the caller should
> + * retry and the fallback path in __blkdev_issue_zeroout() will be used
> + * unless %flags contains BLKDEV_ZERO_NOFALLBACK.
> + */
> +static int __blkdev_issue_zeroout_wait(struct block_device *bdev,
> +  sector_t sector, sector_t nr_sects,
> +  gfp_t gfp_mask, unsigned flags)
> +{
> +   int ret;
> +   struct bio *bio = NULL;
> +   struct blk_plug plug;
> +
> +   blk_start_plug();
> +   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
> +   , flags);
> +   if (ret == 0 && bio) {
> +   ret = submit_bio_wait(bio);
> +   bio_put(bio);
> +   }
> +   blk_finish_plug();
> +
> +   return ret;
> +}
> +
>  /**
>   * blkdev_issue_zeroout - zero-fill a block range
>   * @bdev:  blockdev to write
> @@ -360,17 +382,12 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
> sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
>  {
> int ret;
> -   struct bio *bio = NULL;
> -   struct blk_plug plug;
>
> -   blk_start_plug();
> -   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
> -  

Re: [PATCH V2 0/3] block/loop: handle discard/zeroout error

2017-09-07 Thread Ilya Dryomov
Hi Shaohua,

You wrote:
> BTW: blkdev_issue_zeroout retries if we immediately find the device doesn't
> support zeroout, but it doesn't retry if submit_bio_wait returns -EOPNOTSUPP.
> Is this correct behavior?

I sent a patch for that yesterday, see "[PATCH] block: cope with WRITE
SAME failing in blkdev_issue_zeroout()" on linux-block.  It checks for
-EREMOTEIO, because that's what I hit, but I wonder if it should check
for -EOPNOTSUPP from submit_bio_wait() as well.  Can you think of
a case where, given bdev_write_zeroes_sectors() != 0, submit_bio_wait()
would fail with -EOPNOTSUPP and we would want to do explicit zeroing?

Please excuse broken threading.

Thanks,

Ilya


[PATCH] block: cope with WRITE SAME failing in blkdev_issue_zeroout()

2017-09-06 Thread Ilya Dryomov
sd_config_write_same() ignores ->max_ws_blocks == 0 and resets it to
permit trying WRITE SAME on older SCSI devices, unless ->no_write_same
is set.  This means blkdev_issue_zeroout() must cope with WRITE SAME
failing with BLK_STS_TARGET/-EREMOTEIO and explicitly write zeroes,
unless BLKDEV_ZERO_NOFALLBACK is specified.

Commit 71027e97d796 ("block: stop using discards for zeroing") added
the following to __blkdev_issue_zeroout() comment:

  "Note that this function may fail with -EOPNOTSUPP if the driver signals
  zeroing offload support, but the device fails to process the command (for
  some devices there is no non-destructive way to verify whether this
  operation is actually supported).  In this case the caller should call
  retry the call to blkdev_issue_zeroout() and the fallback path will be used."

But __blkdev_issue_zeroout() doesn't fail in this case: if WRITE SAME
support is indicated, a REQ_OP_WRITE_ZEROES bio is built and returned
to blkdev_issue_zeroout().  -EREMOTEIO is then propagated up:

  $ fallocate -zn -l 1k /dev/sdg
  fallocate: fallocate failed: Remote I/O error
  $ fallocate -zn -l 1k /dev/sdg  # OK

(The second fallocate(1) succeeds because sd_done() sets ->no_write_same
in response to a sense that would become BLK_STS_TARGET.)

Retry __blkdev_issue_zeroout() if the I/O fails with -EREMOTEIO.  This
is roughly what we did until 4.12, sans BLKDEV_ZERO_NOFALLBACK knob.

Fixes: c20cfc27a473 ("block: stop using blkdev_issue_write_same for zeroing")
Cc: Christoph Hellwig <h...@lst.de>
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: Hannes Reinecke <h...@suse.com>
Cc: sta...@vger.kernel.org # 4.12+
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-lib.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3fe0aec90597..876b5478c1a2 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -287,12 +287,6 @@ static unsigned int __blkdev_sectors_to_bio_pages(sector_t 
nr_sects)
  *  Zero-fill a block range, either using hardware offload or by explicitly
  *  writing zeroes to the device.
  *
- *  Note that this function may fail with -EOPNOTSUPP if the driver signals
- *  zeroing offload support, but the device fails to process the command (for
- *  some devices there is no non-destructive way to verify whether this
- *  operation is actually supported).  In this case the caller should call
- *  retry the call to blkdev_issue_zeroout() and the fallback path will be 
used.
- *
  *  If a device is using logical block provisioning, the underlying space will
  *  not be released if %flags contains BLKDEV_ZERO_NOUNMAP.
  *
@@ -343,6 +337,34 @@ int __blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
 }
 EXPORT_SYMBOL(__blkdev_issue_zeroout);
 
+/*
+ * This function may fail with -EREMOTEIO if the driver signals zeroing
+ * offload support, but the device fails to process the command (for
+ * some devices there is no non-destructive way to verify whether this
+ * operation is actually supported).  In this case the caller should
+ * retry and the fallback path in __blkdev_issue_zeroout() will be used
+ * unless %flags contains BLKDEV_ZERO_NOFALLBACK.
+ */
+static int __blkdev_issue_zeroout_wait(struct block_device *bdev,
+  sector_t sector, sector_t nr_sects,
+  gfp_t gfp_mask, unsigned flags)
+{
+   int ret;
+   struct bio *bio = NULL;
+   struct blk_plug plug;
+
+   blk_start_plug();
+   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
+   , flags);
+   if (ret == 0 && bio) {
+   ret = submit_bio_wait(bio);
+   bio_put(bio);
+   }
+   blk_finish_plug();
+
+   return ret;
+}
+
 /**
  * blkdev_issue_zeroout - zero-fill a block range
  * @bdev:  blockdev to write
@@ -360,17 +382,12 @@ int blkdev_issue_zeroout(struct block_device *bdev, 
sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned flags)
 {
int ret;
-   struct bio *bio = NULL;
-   struct blk_plug plug;
 
-   blk_start_plug();
-   ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
-   , flags);
-   if (ret == 0 && bio) {
-   ret = submit_bio_wait(bio);
-   bio_put(bio);
-   }
-   blk_finish_plug();
+   ret = __blkdev_issue_zeroout_wait(bdev, sector, nr_sects,
+ gfp_mask, flags);
+   if (ret == -EREMOTEIO)
+   ret = __blkdev_issue_zeroout_wait(bdev, sector, nr_sects,
+ gfp_mask, flags);
 
return ret;
 }
-- 
2.4.3



Re: [PATCH v2 2/2] nbd: don't leak nbd_config

2017-05-29 Thread Ilya Dryomov
On Tue, May 23, 2017 at 5:49 PM, Ilya Dryomov <idryo...@gmail.com> wrote:
> nbd_config is allocated in nbd_alloc_config(), but never freed.
>
> Fixes: 5ea8d10802ec ("nbd: separate out the config information")
> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
> ---
>  drivers/block/nbd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index e725d8d5ab0b..f3f191ba8ca4 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1021,6 +1021,7 @@ static void nbd_config_put(struct nbd_device *nbd)
> }
> kfree(config->socks);
> }
> +   kfree(nbd->config);
> nbd->config = NULL;
>
> nbd->tag_set.timeout = 0;

Hi Josef,

Ping -- I think these should go into 4.12-rc, Jens is probably waiting
for your OK.

Thanks,

Ilya


[PATCH v2 1/2] nbd: nbd_reset() call in nbd_dev_add() is redundant

2017-05-23 Thread Ilya Dryomov
There is nothing to clear -- nbd_device has just been allocated.
Fold nbd_reset() into its other caller, nbd_config_put().

Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 drivers/block/nbd.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9a7bb2c29447..e725d8d5ab0b 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -937,14 +937,6 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, 
unsigned long arg)
return -ENOSPC;
 }
 
-/* Reset all properties of an NBD device */
-static void nbd_reset(struct nbd_device *nbd)
-{
-   nbd->config = NULL;
-   nbd->tag_set.timeout = 0;
-   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
-}
-
 static void nbd_bdev_reset(struct block_device *bdev)
 {
if (bdev->bd_openers > 1)
@@ -1029,7 +1021,10 @@ static void nbd_config_put(struct nbd_device *nbd)
}
kfree(config->socks);
}
-   nbd_reset(nbd);
+   nbd->config = NULL;
+
+   nbd->tag_set.timeout = 0;
+   queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
 
mutex_unlock(>config_lock);
nbd_put(nbd);
@@ -1483,7 +1478,6 @@ static int nbd_dev_add(int index)
disk->fops = _fops;
disk->private_data = nbd;
sprintf(disk->disk_name, "nbd%d", index);
-   nbd_reset(nbd);
add_disk(disk);
nbd_total_devices++;
return index;
-- 
2.4.3



[PATCH v2 2/2] nbd: don't leak nbd_config

2017-05-23 Thread Ilya Dryomov
nbd_config is allocated in nbd_alloc_config(), but never freed.

Fixes: 5ea8d10802ec ("nbd: separate out the config information")
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 drivers/block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index e725d8d5ab0b..f3f191ba8ca4 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1021,6 +1021,7 @@ static void nbd_config_put(struct nbd_device *nbd)
}
kfree(config->socks);
}
+   kfree(nbd->config);
nbd->config = NULL;
 
nbd->tag_set.timeout = 0;
-- 
2.4.3



Re: [PATCH] nbd: don't leak nbd_config

2017-05-23 Thread Ilya Dryomov
On Tue, May 23, 2017 at 4:16 PM, Jens Axboe <ax...@fb.com> wrote:
> On 05/23/2017 08:14 AM, Josef Bacik wrote:
>> On Tue, May 23, 2017 at 12:38:31PM +0200, Ilya Dryomov wrote:
>>> nbd_config is allocated in nbd_alloc_config(), but never freed.
>>>
>>> Fixes: 5ea8d10802ec ("nbd: separate out the config information")
>>> Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
>>> ---
>>>  drivers/block/nbd.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>
>> Oops, thanks Ilya
>>
>> Reviewed-by: Josef Bacik <jba...@fb.com>
>
> Since config is referenced, why isn't this done in nbd_config_put()
> instead?

Josef's 5ea8d10802ec added that nbd->config = NULL to nbd_reset(), so
I followed his lead.  It could be done nbd_config_put() -- nbd_reset()
call in nbd_dev_add() looks like a no-op to me.  Josef?

Thanks,

Ilya


[PATCH] nbd: don't leak nbd_config

2017-05-23 Thread Ilya Dryomov
nbd_config is allocated in nbd_alloc_config(), but never freed.

Fixes: 5ea8d10802ec ("nbd: separate out the config information")
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 drivers/block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9a7bb2c29447..882fb9efbab1 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -940,6 +940,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, 
unsigned long arg)
 /* Reset all properties of an NBD device */
 static void nbd_reset(struct nbd_device *nbd)
 {
+   kfree(nbd->config);
nbd->config = NULL;
nbd->tag_set.timeout = 0;
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
-- 
2.4.3



[PATCH] block: get rid of blk_integrity_revalidate()

2017-04-18 Thread Ilya Dryomov
Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.peter...@oracle.com>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Mike Snitzer <snit...@redhat.com>
Cc: sta...@vger.kernel.org # 4.4+, needs backporting
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
I don't have any integrity-capable disks or nvme separate-metadata
cards, so I couldn't test this as well as I wanted to.  25520d55cdb6
went in with some nvme work, but Martin recalls that it may have had
something to do with integrity-capable dm arrays losing the integrity
capability at runtime.  Previous discussion at

http://www.spinics.net/lists/ceph-devel/msg35413.html

---
 block/blk-integrity.c | 19 ++-
 block/partition-generic.c |  1 -
 fs/block_dev.c|  1 -
 include/linux/genhd.h |  2 --
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 9f0ff5ba4f84..35c5af1ea068 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@ void blk_integrity_register(struct gendisk *disk, struct 
blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;
 
-   blk_integrity_revalidate(disk);
+   disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);
 
@@ -430,26 +430,11 @@ EXPORT_SYMBOL(blk_integrity_register);
  */
 void blk_integrity_unregister(struct gendisk *disk)
 {
-   blk_integrity_revalidate(disk);
+   disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
memset(>queue->integrity, 0, sizeof(struct blk_integrity));
 }
 EXPORT_SYMBOL(blk_integrity_unregister);
 
-void blk_integrity_revalidate(struct gendisk *disk)
-{
-   struct blk_integrity *bi = >queue->integrity;
-
-   if (!(disk->flags & GENHD_FL_UP))
-   return;
-
-   if (bi->profile)
-   disk->queue->backing_dev_info->capabilities |=
-   BDI_CAP_STABLE_WRITES;
-   else
-   disk->queue->backing_dev_info->capabilities &=
-   ~BDI_CAP_STABLE_WRITES;
-}
-
 void blk_integrity_add(struct gendisk *disk)
 {
if (kobject_init_and_add(>integrity_kobj, _ktype,
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 7afb9907821f..0171a2faad68 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -497,7 +497,6 @@ int rescan_partitions(struct gendisk *disk, struct 
block_device *bdev)
 
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
-   blk_integrity_revalidate(disk);
check_disk_size_change(disk, bdev);
bdev->bd_invalidated = 0;
if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2eca00ec4370..56039dfbc674 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1451,7 +1451,6 @@ int revalidate_disk(struct gendisk *disk)
 
if (disk->fops->revalidate_disk)
ret = disk->fops->revalidate_disk(disk);
-   blk_integrity_revalidate(disk);
bdev = bdget_disk(disk, 0);
if (!bdev)
return ret;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 76f39754e7b0..76d6a1cd4153 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -722,11 +722,9 @@ static inline void par

Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

2017-03-10 Thread Ilya Dryomov
On Thu, Mar 2, 2017 at 4:24 AM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryo...@gmail.com> writes:
>
> Ilya,
>
> Ilya> Given the above, I'm not sure what the baseline is --
> Ilya> blk_integrity code isn't invoked for data-only lbafs.  Could nvme
> Ilya> folks please look at this?  rbd regression caused by integrity
> Ilya> revalidate change goes back to 4.4 and I'd really like to get it
> Ilya> fixed.  The proposed patch is attached to my earlier reply on
> Ilya> linux-block.
>
> I've had a couple of fires to attend to this week. I'll try and give
> your patch a spin on my FC setup tomorrow or Friday.

Hi Martin,

Ping...  Did you get a chance to try it?

Thanks,

Ilya


Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

2017-02-28 Thread Ilya Dryomov
On Fri, Feb 24, 2017 at 12:49 AM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryo...@gmail.com> writes:
>
> Ilya,
>
> Ilya> Well, blk_integrity_revalidate() doesn't clear the profile, it
> Ilya> just clears the stable pages flag.  Whoever calls
> Ilya> blk_integrity_unregister() to clear the profile can also clear the
> Ilya> stable pages flag -- why not let blk_integrity_unregister() clear
> Ilya> the flag like I suggested?
>
> That's what it used to do.
>
> blk_integrity_revalidate() was obviously introduced to overcome some
> problem. Unfortunately, I can't recall what that was and Google isn't
> being particularly helpful. I suspect it was either in the NVDIMM or
> NVMe camps since that's where the churn was.

Adding more people from 25520d55cdb6 -- does anyone remember the story
behind blk_integrity_revalidate()?

>
> I don't have a problem with your patch as long as we're sure there are
> no regressions. I would carry the gendisk check over, though.

I spent quite some time trying to do basic tests on nvme, since it uses
both nop_profile and real integrity profiles.  Unfortunately I don't
have access to an nvme card with separate-metadata capability and
upstream qemu driver doesn't support any metadata capabilities at all,
so I turned to https://github.com/OpenChannelSSD/qemu-nvme.  It lets
you configure mc and multiple lbafs, but

commit ff646a5841f743fdd9cfef138ac6be1f0ba4fbfb
Author: Keith Busch <keith.bu...@intel.com> Thu Oct 23 20:02:44 2014

As part of this, I am removing all meta-data support. It's not well
supported in any immediatly available operating system, so the code
is not well tested.

In the end I got it into a semi-working state by checking out an old
version and dropping some sanity checks.  I formatted the namespace in
a number of ways -- bdi/stable_pages_required and integrity/format
behaved as expected ("none", "nop", "T10-DIF-TYPE1-CRC").

Given the above, I'm not sure what the baseline is -- blk_integrity
code isn't invoked for data-only lbafs.  Could nvme folks please look
at this?  rbd regression caused by integrity revalidate change goes
back to 4.4 and I'd really like to get it fixed.  The proposed patch
is attached to my earlier reply on linux-block.

Thanks,

Ilya


Re: blk_integrity_revalidate() clears BDI_CAP_STABLE_WRITES

2017-02-22 Thread Ilya Dryomov
On Wed, Feb 22, 2017 at 5:41 AM, Martin K. Petersen
<martin.peter...@oracle.com> wrote:
>>>>>> "Ilya" == Ilya Dryomov <idryo...@gmail.com> writes:
>
> Ilya,
>
> Ilya> could you please explain blk_integrity_revalidate() and
> Ilya> its GENHD_FL_UP check in particular?  We have the queue,
> Ilya> bi->profile can't be NULL after blk_integrity_register(), and
> Ilya> since the latter "must" be used for registering the profile with
> Ilya> the block layer, wouldn't the following be sufficient for
> Ilya> blk_integrity users?
>
> IIrc, the FL_UP check fixed a registration problem in the nvme driver.

Did it have something to do with a NULL disk->queue?

>
> The rationale behind revalidate was that we need to handle devices which
> lose the integrity capability at runtime (i.e. a integrity-enabled DM
> device is extended with a non-cable drive forcing the feature to be
> turned off). The clearing of the integrity profile is more important in
> that case than zapping the stable pages flag. But that was the original
> reason for not just ORing BDI_CAP_STABLE_WRITES.
>
> I don't have a huge problem with keeping stable pages on if a device
> suddenly stops being integrity capable. However, I'd like to understand
> your use case a bit better.

Well, blk_integrity_revalidate() doesn't clear the profile, it just
clears the stable pages flag.  Whoever calls blk_integrity_unregister()
to clear the profile can also clear the stable pages flag -- why not
let blk_integrity_unregister() clear the flag like I suggested?

>
> Ilya> The alternative seems to be to set up a bogus
> Ilya> blk_integrity_profile (nop_profile won't do -- this one would have
> Ilya> to be truly bogus w/ NULL *_fn) under BLK_DEV_INTEGRITY ifdefs and
> Ilya> hope that nothing breaks.
>
> Can you point me to the relevant code on your end?

This code doesn't exist yet -- it's just something I thought I'd have
to do if my patch or something along those lines isn't accepted.  There
is the nop_profile with stub *_fn callbacks, which is used by nvme and
nvdimm to make bio_integrity_enabled() return true, so that per-interval
buffers are allocated in bio_integrity_prep().  "I want some space for
per-interval metadata, but no integrity checksums" is use case there.

In the rbd case, we don't want that.  We just want to set the stable
pages flag and make sure it's not reset by blk_integrity code.  So, if
blk_integrity_revalidate() isn't fixed, rbd will need to set up a bogus
profile with NULL *_fn callbacks to make bio_integrity_enabled() return
false.  This is obviously fragile, because blk_get_integrity() will no
longer return NULL...

We are setting BDI_CAP_STABLE_WRITES in rbd_init_disk(), see commit
bae818ee1577 ("rbd: require stable pages if message data CRCs are
enabled").  The CRCs are generated in write_partial_message_data() and
verified in read_partial_msg_data().

I'm attaching the patch I had in mind.

Thanks,

Ilya
From 24a0cf8bc437a65b3986e9ab25cf2f05e6bbd5d0 Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryo...@gmail.com>
Date: Mon, 20 Feb 2017 18:50:50 +0100
Subject: [PATCH] block: get rid of blk_integrity_revalidate()

Commit 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which clears the stable pages
bit if no blk_integrity profile is registered.  It's called from
revalidate_disk() and rescan_partitions(), making it impossible to set
BDI_CAP_STABLE_WRITES for drivers that support partitions and don't use
blk_integrity.  One such driver is rbd, where the ceph messenger is
responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cdb6 ("block: Inline blk_integrity in struct gendisk")
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
---
 block/blk-integrity.c | 19 ++-
 block/partition-generic.c |  1 -
 fs/block_dev.c|  1 -
 include/linux/genhd.h |  2 --
 4 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index d69c5c79f98e..319f2e4f4a8b 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -417,7 +417,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
 	bi->tuple_size = template->tuple_size;
 	bi->tag_size = template->tag_size;
 
-	blk_integrity_revalidate(disk);
+	disk->queue->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
 }
 EXPORT_SYMBOL(blk_integrity_register);
 
@@ -