Re: Bug: swap discard issue with zram caused by "block: don't deal with discard limit in blkdev_issue_discard()"

2018-10-27 Thread Ming Lei
On Fri, Oct 26, 2018 at 08:50:20AM +0100, Rui Salvaterra wrote:
> On Fri, 26 Oct 2018 at 07:27, Ming Lei  wrote:
> >
> > The patch titled with 'block: make sure discard bio is aligned with logical 
> > block size'
> > in the list may fix this issue, please test and see if it works.
> >
> > Thanks,
> > Ming
> 
> Hi, Ming,
> 
> Thanks for the quick reply. Unfortunately, it doesn't seem to apply
> cleanly to 4.19. To which tree/tag/commit should I apply the series?

The whole patchset is against linus tree (44786880df196a4200).

I am sure that the 1st patch can be applied cleanly against v4.19, and
it is enough for fixing this issue, please double check.

[ming@ming linux]$ git log --oneline -1
84df9525b0c2 Linux 4.19
[ming@ming linux]$ git am ~/emails/blk/181026\ \[PATCH\ 1∕3\]\ block\:\ make\ 
sure\ discard\ bio\ is\ aligned\ with\ logical\ block\ size.eml
Applying: block: make sure discard bio is aligned with logical block size
[ming@ming linux]$ 


Thanks,
Ming


Re: [PATCH 1/3] block: make sure discard bio is aligned with logical block size

2018-10-27 Thread Ming Lei
On Fri, Oct 26, 2018 at 09:44:15AM +0200, Christoph Hellwig wrote:
> > if (req_sects > UINT_MAX >> 9)
> > -   req_sects = UINT_MAX >> 9;
> > +   req_sects = (UINT_MAX >> 9) & ~bs_mask;
> 
> Given that we have this same thing duplicated in write zeroes
> what about a documented helper?

IMO, using UINT_MAX & bs_mask is better because it is self-explanatory
in the context.

If we introduce one helper, it may not be easy to find a better
name than UINT_MAX.

thanks,
Ming


Re: remove exofs and the T10 OSD code V2

2018-10-27 Thread Nick Desaulniers
On Sat, Oct 27, 2018 at 1:20 AM Christoph Hellwig  wrote:
>
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
>
> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
>
> Quote from Boaz:
>
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
>
> But yes feel free to remove the all thing!!
>
> I guess I can put it up on github. In a public tree.
>
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"

Hi Christoph,
Thanks for the series.  I suggest you pick up Nathan's patch:
https://github.com/ClangBuiltLinux/linux/commit/a37fa82bde58960c5c966a5c1bef8ace6c9d4f34
that removed the configs that were removed in this series from the
various defconfigs.

I think the other thread still has some questions about when it's ok
to remove filesystems. But it's ultimately the maintainers decision.
Let's continue the discussion in the other thread.
-- 
Thanks,
~Nick Desaulniers


Re: recent issues with heavy delete's causing soft lockups

2018-10-27 Thread Jens Axboe
On Oct 27, 2018, at 12:40 PM, Thomas Fjellstrom  wrote:
> 
> Hi
> 
> As of the past few months or so I've been dealing with my workstation locking 
> up for upwards of minutes at a time when deleting a large directory tree. I 
> don't recall this being a problem before.
> 
> Current setup is 3 SATA SSDs in an lvm vg. most space is allocated to an ext4 
> /home where my work projects live.
> 
> The main use case causing problems is deleting the "out" directory of an 
> android AOSP build tree. It can be upwards of 95GB in size with 240k or more 
> files. If I run a `rm -fr out` or `make clean` it will lock up anything 
> attempting to use the disk (eg: plasma, intellij, android studio, chrome, 
> etc) 
> for sometimes minutes.
> 
> I have tried different block scheduler settings including none, mq-deadline, 
> kyber and bfq none of which seem to improve things much at all.
> 
> It may be worth noting that disk space is starting to run low, perhaps 
> there's 
> some interaction going on with free space handling or ssd wear leveling...
> 
> That said, it seems to have started happening (or at least made worse) some 
> time around when mq was made the default and only implementation for sata.
> 
> if it helps, my system specs are:
> 
> Kernel: Debian Sid's 4.18.0-2-amd64 (4.18.10-2)
> CPU: AMD FX-8320 OCed to 4.4Ghz
> RAM: 32GB DDR3 1866
> MB: Asus 970 Aura Pro Gaming
> Storage: Kingston HyperX 3K 240G + Samsung 850 Evo 250G + SanDisk X300 500G
> 
> I'm thinking of testing with a different or older kernel, what would be the 
> best to test with?

Can you try 4.19? A patch went in since 4.18 that fixes a starvation issue 
around requeue conditions, which SATA is the one to most often hit. 

Jens

recent issues with heavy delete's causing soft lockups

2018-10-27 Thread Thomas Fjellstrom
Hi

As of the past few months or so I've been dealing with my workstation locking 
up for upwards of minutes at a time when deleting a large directory tree. I 
don't recall this being a problem before.

Current setup is 3 SATA SSDs in an lvm vg. most space is allocated to an ext4 
/home where my work projects live.

The main use case causing problems is deleting the "out" directory of an 
android AOSP build tree. It can be upwards of 95GB in size with 240k or more 
files. If I run a `rm -fr out` or `make clean` it will lock up anything 
attempting to use the disk (eg: plasma, intellij, android studio, chrome, etc) 
for sometimes minutes.

I have tried different block scheduler settings including none, mq-deadline, 
kyber and bfq none of which seem to improve things much at all.

It may be worth noting that disk space is starting to run low, perhaps there's 
some interaction going on with free space handling or ssd wear leveling...

That said, it seems to have started happening (or at least made worse) some 
time around when mq was made the default and only implementation for sata.

if it helps, my system specs are:

Kernel: Debian Sid's 4.18.0-2-amd64 (4.18.10-2)
CPU: AMD FX-8320 OCed to 4.4Ghz
RAM: 32GB DDR3 1866
MB: Asus 970 Aura Pro Gaming
Storage: Kingston HyperX 3K 240G + Samsung 850 Evo 250G + SanDisk X300 500G

I'm thinking of testing with a different or older kernel, what would be the 
best to test with?

Thanks for any assistance.

-- 
Thomas Fjellstrom
tho...@fjellstrom.ca





Re: [PATCH 01/14] blk-mq: kill q->mq_map

2018-10-27 Thread Jens Axboe
On 10/27/18 10:48 AM, Jens Axboe wrote:
> On 10/27/18 8:19 AM, jianchao.wang wrote:
>> Hi Jens
>>
>> On 10/26/18 5:16 AM, Jens Axboe wrote:
>>> It's just a pointer to set->mq_map, use that instead.
>> Instead of using the set->mq_map and then a two-dimensional set->mq_map,
>> how about migrate the mq_map from per-set to per-cpuctx ?
>> something like:
>> q->queue_hw_ctx[ctx->map[type]]
> 
> I think the current series is pretty clean in that regard, it goes
> from members -> map -> map array. I'd be willing to look at a
> conversion on top of that, if it makes things cleaner.

On top of that, this:

q->queue_hw_ctx[set->map[type].mq_map[cpu]]

is one less pointer dereference, so more efficient.

-- 
Jens Axboe



Re: [PATCH 04/28] ide: convert to blk-mq

2018-10-27 Thread Jens Axboe
On 10/27/18 4:51 AM, Hannes Reinecke wrote:
> On 10/25/18 11:10 PM, Jens Axboe wrote:
>> ide-disk and ide-cd tested as working just fine, ide-tape and
>> ide-floppy haven't. But the latter don't require changes, so they
>> should work without issue.
>>
>> Add helper function to insert a request from a work queue, since we
>> cannot invoke the blk-mq request insertion from IRQ context.
>>
>> Cc: David Miller 
>> Signed-off-by: Jens Axboe 
>> ---
>>   drivers/ide/ide-atapi.c |  25 --
>>   drivers/ide/ide-cd.c| 175 +---
>>   drivers/ide/ide-disk.c  |   5 +-
>>   drivers/ide/ide-io.c| 101 +--
>>   drivers/ide/ide-park.c  |   4 +-
>>   drivers/ide/ide-pm.c|  28 ++-
>>   drivers/ide/ide-probe.c |  68 +++-
>>   include/linux/ide.h |  13 ++-
>>   8 files changed, 240 insertions(+), 179 deletions(-)
>>
> There are some things here which are curious (like always setting the 
> queue depth to 32, and the insert_head thingie), but as this is legacy 
> anyway:

The insert head addition is needed, the alternative would be
making the blk-mq locks irq safe. And that would be a performance
hit that isn't acceptable. Perhaps I should make that clearer
in the commit message...

The 32 hard code is a bit strange, but I doubt it matters that much.
It's enough to saturate the drive for sure, and it'll be twice that
number if a scheduler is attached, ensuring we have plenty of
request to get the merging we want.

> Reviewed-by: Hannes Reinecke 

Thanks!

-- 
Jens Axboe



Re: [PATCH 01/14] blk-mq: kill q->mq_map

2018-10-27 Thread Jens Axboe
On 10/27/18 8:19 AM, jianchao.wang wrote:
> Hi Jens
> 
> On 10/26/18 5:16 AM, Jens Axboe wrote:
>> It's just a pointer to set->mq_map, use that instead.
> Instead of using the set->mq_map and then a two-dimensional set->mq_map,
> how about migrate the mq_map from per-set to per-cpuctx ?
> something like:
> q->queue_hw_ctx[ctx->map[type]]

I think the current series is pretty clean in that regard, it goes
from members -> map -> map array. I'd be willing to look at a
conversion on top of that, if it makes things cleaner.

-- 
Jens Axboe



Re: [PATCH 01/14] blk-mq: kill q->mq_map

2018-10-27 Thread jianchao.wang
Hi Jens

On 10/26/18 5:16 AM, Jens Axboe wrote:
> It's just a pointer to set->mq_map, use that instead.
Instead of using the set->mq_map and then a two-dimensional set->mq_map,
how about migrate the mq_map from per-set to per-cpuctx ?
something like:
q->queue_hw_ctx[ctx->map[type]]


Thanks
Jianchao


[PATCH V5] block: fix the DISCARD request merge

2018-10-27 Thread Jianchao Wang
There are two cases when handle DISCARD merge.
If max_discard_segments == 1, the bios/requests need to be contiguous
to merge. If max_discard_segments > 1, it takes every bio as a range
and different range needn't to be contiguous.

But now, attempt_merge screws this up. It always consider contiguity
for DISCARD for the case max_discard_segments > 1 and cannot merge
contiguous DISCARD for the case max_discard_segments == 1, because
rq_attempt_discard_merge always returns false in this case.
This patch fixes both of the two cases above.

Signed-off-by: Jianchao Wang 
---

V5:
 - get rid of the redundant 'else' in blk_discard_mergable

V4:
 - introduce blk_try_req_merge as suggestion of Christoph.

V3:
 - Introduce blk_discard_mergable into attempt_merge and
   blk_try_merge.
 - Some comment changes.

V2:
 - Add max_discard_segments > 1 checking in attempt_merge.
 - Change patch title and comment.
 - Add more comment in attempt_merge

 block/blk-merge.c | 46 --
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 42a4674..6b5ad27 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -714,6 +714,31 @@ static void blk_account_io_merge(struct request *req)
part_stat_unlock();
}
 }
+/*
+ * Two cases of handling DISCARD merge:
+ * If max_discard_segments > 1, the driver takes every bio
+ * as a range and send them to controller together. The ranges
+ * needn't to be contiguous.
+ * Otherwise, the bios/requests will be handled as same as
+ * others which should be contiguous.
+ */
+static inline bool blk_discard_mergable(struct request *req)
+{
+   if (req_op(req) == REQ_OP_DISCARD &&
+   queue_max_discard_segments(req->q) > 1)
+   return true;
+   return false;
+}
+
+enum elv_merge blk_try_req_merge(struct request *req, struct request *next)
+{
+   if (blk_discard_mergable(req))
+   return ELEVATOR_DISCARD_MERGE;
+   else if (blk_rq_pos(req) + blk_rq_sectors(req) == blk_rq_pos(next))
+   return ELEVATOR_BACK_MERGE;
+
+   return ELEVATOR_NO_MERGE;
+}
 
 /*
  * For non-mq, this has to be called with the request spinlock acquired.
@@ -731,12 +756,6 @@ static struct request *attempt_merge(struct request_queue 
*q,
if (req_op(req) != req_op(next))
return NULL;
 
-   /*
-* not contiguous
-*/
-   if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next))
-   return NULL;
-
if (rq_data_dir(req) != rq_data_dir(next)
|| req->rq_disk != next->rq_disk
|| req_no_special_merge(next))
@@ -760,11 +779,19 @@ static struct request *attempt_merge(struct request_queue 
*q,
 * counts here. Handle DISCARDs separately, as they
 * have separate settings.
 */
-   if (req_op(req) == REQ_OP_DISCARD) {
+
+   switch (blk_try_req_merge(req, next)) {
+   case ELEVATOR_DISCARD_MERGE:
if (!req_attempt_discard_merge(q, req, next))
return NULL;
-   } else if (!ll_merge_requests_fn(q, req, next))
+   break;
+   case ELEVATOR_BACK_MERGE:
+   if (!ll_merge_requests_fn(q, req, next))
+   return NULL;
+   break;
+   default:
return NULL;
+   }
 
/*
 * If failfast settings disagree or any of the two is already
@@ -888,8 +915,7 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 
 enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
 {
-   if (req_op(rq) == REQ_OP_DISCARD &&
-   queue_max_discard_segments(rq->q) > 1)
+   if (blk_discard_mergable(rq))
return ELEVATOR_DISCARD_MERGE;
else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
return ELEVATOR_BACK_MERGE;
-- 
2.7.4



Re: [PATCH V4] block: fix the DISCARD request merge

2018-10-27 Thread jianchao.wang
Hi Jens

On 10/26/18 10:25 PM, Jens Axboe wrote:
> On 10/26/18 3:28 AM, Jianchao Wang wrote:
>> There are two cases when handle DISCARD merge.
>> If max_discard_segments == 1, the bios/requests need to be contiguous
>> to merge. If max_discard_segments > 1, it takes every bio as a range
>> and different range needn't to be contiguous.
>>
>> But now, attempt_merge screws this up. It always consider contiguity
>> for DISCARD for the case max_discard_segments > 1 and cannot merge
>> contiguous DISCARD for the case max_discard_segments == 1, because
>> rq_attempt_discard_merge always returns false in this case.
>> This patch fixes both of the two cases above.
> 
> Looks good to me, but:
> 
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 42a4674..cf817c7a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -714,6 +714,32 @@ static void blk_account_io_merge(struct request *req)
>>  part_stat_unlock();
>>  }
>>  }
>> +/*
>> + * Two cases of handling DISCARD merge:
>> + * If max_discard_segments > 1, the driver takes every bio
>> + * as a range and send them to controller together. The ranges
>> + * needn't to be contiguous.
>> + * Otherwise, the bios/requests will be handled as same as
>> + * others which should be contiguous.
>> + */
>> +static inline bool blk_discard_mergable(struct request *req)
>> +{
>> +if (req_op(req) == REQ_OP_DISCARD &&
>> +queue_max_discard_segments(req->q) > 1)
>> +return true;
>> +else
>> +return false;
>> +}
> 
> Please get rid of the redundant else.
> 

OK, I will change it and post next version.

Thanks
Jianchao


Re: [PATCH 06/28] blk-mq: remove the request_list usage

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

We don't do anything with it, that's just the legacy path.

Signed-off-by: Jens Axboe 
---
  block/blk-mq.c | 5 -
  1 file changed, 5 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes


Re: [PATCH 07/28] blk-mq: remove legacy check in queue blk_freeze_queue()

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

Signed-off-by: Jens Axboe 
---
  block/blk-mq.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c82dc44d4d8..a58d2d953876 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -177,8 +177,6 @@ void blk_freeze_queue(struct request_queue *q)
 * exported to drivers as the only user for unfreeze is blk_mq.
 */
blk_freeze_queue_start(q);
-   if (!q->mq_ops)
-   blk_drain_queue(q);
blk_mq_freeze_queue_wait(q);
  }
  


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes



Re: [PATCH 04/28] ide: convert to blk-mq

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

ide-disk and ide-cd tested as working just fine, ide-tape and
ide-floppy haven't. But the latter don't require changes, so they
should work without issue.

Add helper function to insert a request from a work queue, since we
cannot invoke the blk-mq request insertion from IRQ context.

Cc: David Miller 
Signed-off-by: Jens Axboe 
---
  drivers/ide/ide-atapi.c |  25 --
  drivers/ide/ide-cd.c| 175 +---
  drivers/ide/ide-disk.c  |   5 +-
  drivers/ide/ide-io.c| 101 +--
  drivers/ide/ide-park.c  |   4 +-
  drivers/ide/ide-pm.c|  28 ++-
  drivers/ide/ide-probe.c |  68 +++-
  include/linux/ide.h |  13 ++-
  8 files changed, 240 insertions(+), 179 deletions(-)

There are some things here which are curious (like always setting the 
queue depth to 32, and the insert_head thingie), but as this is legacy 
anyway:


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes


Re: [PATCH 03/28] mspro_block: convert to blk-mq

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

Straight forward conversion, there's room for improvement.

Signed-off-by: Jens Axboe 
---
  drivers/memstick/core/mspro_block.c | 121 +++-
  1 file changed, 66 insertions(+), 55 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes


Re: [PATCH 02/28] ms_block: convert to blk-mq

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

Straight forward conversion, room for optimization in how everything
is punted to a work queue. Also looks plenty racy all over the map,
with the state changes. I fixed a bunch of them up while doing the
conversion, but there are surely more.

Cc: Maxim Levitsky 
Signed-off-by: Jens Axboe 
---
  drivers/memstick/core/ms_block.c | 110 +--
  drivers/memstick/core/ms_block.h |   1 +
  2 files changed, 62 insertions(+), 49 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes


Re: [PATCH 01/28] sunvdc: convert to blk-mq

2018-10-27 Thread Hannes Reinecke

On 10/25/18 11:10 PM, Jens Axboe wrote:

Convert from the old request_fn style driver to blk-mq.

Cc: David Miller 
Signed-off-by: Jens Axboe 
---
  drivers/block/sunvdc.c | 149 +++--
  1 file changed, 98 insertions(+), 51 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes


remove exofs and the T10 OSD code V2

2018-10-27 Thread Christoph Hellwig
The only real user of the T10 OSD protocol, the pNFS object layout
driver never went to the point of having shipping products, and we
removed it 1.5 years ago.  Exofs is just a simple example without
real life users.

The code has been mostly unmaintained for years and is getting in the
way of block / SCSI changes, so I think it's finally time to drop it.

Quote from Boaz:

"As I said then. It is used in Universities for studies and experiments.
Every once in a while. I get an email with questions and reports.

But yes feel free to remove the all thing!!

I guess I can put it up on github. In a public tree.

Just that I will need to forward port it myself, til now you guys
been doing this for me ;-)"