Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Jens Axboe
On 2/1/18 12:52 PM, Keith Busch wrote:
> On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
>> I was able to reproduce on a test box, pretty trivially in fact:
>>
>> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
>> # mkfs.ext4 /dev/nvme2n1
>> # mount /dev/nvme2n1 /data -o discard
>> # dd if=/dev/zero of=/data/10g bs=1M count=10k
>> # sync
>> # rm /data/10g
>> # sync <- triggered
>>
>> Your patch still doesn't work, but mainly because we init the segments
>> to 0 when setting up a discard. The below works for me, and cleans up
>> the merge path a bit, since your patch was missing various adjustments
>> on both the merged and freed request.
> 
> I'm still finding cases not accounted for, even with your patch. I had to
> use the following on top of that, and this pattern looks like it needs to
> be repeated for all schedulers:
> 
> ---
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..25c14c58385c 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -259,6 +259,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
>   if (!*merged_request)
>   elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
>   return true;
> + case ELEVATOR_DISCARD_MERGE:
> + return bio_attempt_discard_merge(q, rq, bio);

Yes, we need this to enable the bio-to-request merging.
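
For reference, the ELEVATOR_DISCARD_MERGE value comes out of blk_try_merge();
without the case added above, blk_mq_sched_try_merge() falls into its default
branch and the discard bio never reaches bio_attempt_discard_merge(). Roughly
the 4.15-era helper, sketched from memory rather than quoted verbatim:

enum elv_merge blk_try_merge(struct request *rq, struct bio *bio)
{
	/* discards can take more ranges as long as the queue allows it */
	if (req_op(rq) == REQ_OP_DISCARD &&
	    queue_max_discard_segments(rq->q) > 1)
		return ELEVATOR_DISCARD_MERGE;
	else if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
		return ELEVATOR_BACK_MERGE;
	else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
		return ELEVATOR_FRONT_MERGE;
	return ELEVATOR_NO_MERGE;
}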

> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index c56f211c8440..a0f5752b6858 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -451,7 +451,7 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
>  
>   if (elv_bio_merge_ok(__rq, bio)) {
>   *rq = __rq;
> - return ELEVATOR_FRONT_MERGE;
> + return blk_try_merge(__rq, bio);

This isn't needed. We already know it's a front merge at this point,
and it can only be a normal read/write request.
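
(For reference, here is roughly why it can only be a front merge at this
point: dd_request_merge() looks the candidate request up by the bio's end
sector, so whatever it finds starts exactly where the bio ends. Sketch of the
4.15-era mq-deadline hook from memory, not quoted verbatim:)

static int dd_request_merge(struct request_queue *q, struct request **rq,
			    struct bio *bio)
{
	struct deadline_data *dd = q->elevator->elevator_data;
	sector_t sector = bio_end_sector(bio);
	struct request *__rq;

	if (!dd->front_merges)
		return ELEVATOR_NO_MERGE;

	/* look up a request that starts exactly at the end of this bio */
	__rq = elv_rb_find(&dd->sort_list[bio_data_dir(bio)], sector);
	if (__rq) {
		BUG_ON(sector != blk_rq_pos(__rq));

		if (elv_bio_merge_ok(__rq, bio)) {
			*rq = __rq;
			return ELEVATOR_FRONT_MERGE;
		}
	}

	return ELEVATOR_NO_MERGE;
}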

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
> I was able to reproduce on a test box, pretty trivially in fact:
> 
> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
> # mkfs.ext4 /dev/nvme2n1
> # mount /dev/nvme2n1 /data -o discard
> # dd if=/dev/zero of=/data/10g bs=1M count=10k
> # sync
> # rm /data/10g
> # sync <- triggered
> 
> Your patch still doesn't work, but mainly because we init the segments
> to 0 when setting up a discard. The below works for me, and cleans up
> the merge path a bit, since your patch was missing various adjustments
> on both the merged and freed request.

I'm still finding cases not accounted for, even with your patch. I had to
use the following on top of that, and this pattern looks like it needs to
be repeated for all schedulers:

---
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55c0a745b427..25c14c58385c 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -259,6 +259,8 @@ bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
if (!*merged_request)
elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
return true;
+   case ELEVATOR_DISCARD_MERGE:
+   return bio_attempt_discard_merge(q, rq, bio);
default:
return false;
}
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index c56f211c8440..a0f5752b6858 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -451,7 +451,7 @@ static int dd_request_merge(struct request_queue *q, struct request **rq,
 
if (elv_bio_merge_ok(__rq, bio)) {
*rq = __rq;
-   return ELEVATOR_FRONT_MERGE;
+   return blk_try_merge(__rq, bio);
}
}
 
--


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 10:58:23AM -0700, Jens Axboe wrote:
> I was able to reproduce on a test box, pretty trivially in fact:
> 
> # echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
> # mkfs.ext4 /dev/nvme2n1
> # mount /dev/nvme2n1 /data -o discard
> # dd if=/dev/zero of=/data/10g bs=1M count=10k
> # sync
> # rm /data/10g
> # sync <- triggered

Nice! Thanks, this recipe works for me too.
 
> Your patch still doesn't work, but mainly because we init the segments
> to 0 when setting up a discard. The below works for me, and cleans up
> the merge path a bit, since your patch was missing various adjustments
> on both the merged and freed request.

Yep, your update is very similar to my real patch, but mine is missing one
thing (elv_merge_requests). If you're already testing successfully with
your patch, I don't mind if you want to move forward with yours.

 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a2005a485335..e4561c95fc23 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3282,6 +3282,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
>  {
>   if (bio_has_data(bio))
>   rq->nr_phys_segments = bio_phys_segments(q, bio);
> + else if (bio_op(bio) == REQ_OP_DISCARD)
> + rq->nr_phys_segments = 1;
>  
>   rq->__data_len = bio->bi_iter.bi_size;
>   rq->bio = rq->biotail = bio;
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..782940c65d8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -550,6 +550,24 @@ static bool req_no_special_merge(struct request *req)
>   return !q->mq_ops && req->special;
>  }
>  
> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> + struct request *next)
> +{
> + unsigned short segments = blk_rq_nr_discard_segments(req);
> +
> + if (segments >= queue_max_discard_segments(q))
> + goto no_merge;
> + if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> + blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> + goto no_merge;
> +
> + req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> + return true;
> +no_merge:
> + req_set_nomerge(q, req);
> + return false;
> +}
> +
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   struct request *next)
>  {
> @@ -683,9 +701,13 @@ static struct request *attempt_merge(struct request_queue *q,
>* If we are allowed to merge, then append bio list
>* from next to rq and release next. merge_requests_fn
>* will have updated segment counts, update sector
> -  * counts here.
> +  * counts here. Handle DISCARDs separately, as they
> +  * have separate settings.
>*/
> - if (!ll_merge_requests_fn(q, req, next))
> + if (req_op(req) == REQ_OP_DISCARD) {
> + if (!req_attempt_discard_merge(q, req, next))
> + return NULL;
> + } else if (!ll_merge_requests_fn(q, req, next))
>   return NULL;
>  
>   /*
> @@ -715,7 +737,8 @@ static struct request *attempt_merge(struct request_queue *q,
>  
>   req->__data_len += blk_rq_bytes(next);
>  
> - elv_merge_requests(q, req, next);
> + if (req_op(req) != REQ_OP_DISCARD)
> + elv_merge_requests(q, req, next);
>  
>   /*
>* 'next' is going away, so update stats accordingly
> 
> -- 
> Jens Axboe
> 


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Jens Axboe
On 2/1/18 8:26 AM, Jens Axboe wrote:
> On 1/31/18 9:56 PM, Keith Busch wrote:
>> On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>>> if (total_phys_segments > queue_max_segments(q))
>>> -   return 0;
>>> +   return 0;
>>
>> This perhaps unintended change happens to point out another problem:
>> queue_max_segments is the wrong limit for discards, which require
>> queue_max_discard_segments.
>>
>> It might be easier to merge discard requests specially, like how merging
>> a discard bio is handled (untested).
> 
> Yeah agreed, we should just split it up completely instead of
> special casing it in the read/write path.
> 
> 
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..01671e1373ff 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
>>  return !q->mq_ops && req->special;
>>  }
>>  
>> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
>> +struct request *next)
>> +{
>> +unsigned short segments = blk_rq_nr_discard_segments(req);
>> +
>> +if (segments >= queue_max_discard_segments(q))
>> +goto no_merge;
>> +if (blk_rq_sectors(req) + bio_sectors(next->bio) >
>> +blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>> +goto no_merge;
>> +
>> +req->biotail->bi_next = next->bio;
>> +req->biotail = next->biotail;
>> +req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
>> +next->bio = NULL;
>> +return true;
>> +
>> +no_merge:
>> +req_set_nomerge(q, req);
>> +return false;
>> +}
>> +
>>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  struct request *next)
>>  {
>> @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
>>  if (req->write_hint != next->write_hint)
>>  return NULL;
>>  
>> +/*
>> + * Discards are ... special.
>> + */
>> +if (req_op(req) == REQ_OP_DISCARD) {
>> +if (req_attempt_discard_merge(q, req, next))
>> +return next;
>> +return NULL;
>> +}
>> +
>>  /*
>>   * If we are allowed to merge, then append bio list
>>   * from next to rq and release next. merge_requests_fn
> 
> This looks fine to me, the bio-to-request merge path already looks correct.
> Care to send a properly formatted patch?

I was able to reproduce on a test box, pretty trivially in fact:

# echo mq-deadline > /sys/block/nvme2n1/queue/scheduler
# mkfs.ext4 /dev/nvme2n1
# mount /dev/nvme2n1 /data -o discard
# dd if=/dev/zero of=/data/10g bs=1M count=10k
# sync
# rm /data/10g
# sync <- triggered

Your patch still doesn't work, but mainly because we init the segments
to 0 when setting up a discard. The below works for me, and cleans up
the merge path a bit, since your patch was missing various adjustments
on both the merged and freed request.


diff --git a/block/blk-core.c b/block/blk-core.c
index a2005a485335..e4561c95fc23 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3282,6 +3282,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
 {
if (bio_has_data(bio))
rq->nr_phys_segments = bio_phys_segments(q, bio);
+   else if (bio_op(bio) == REQ_OP_DISCARD)
+   rq->nr_phys_segments = 1;
 
rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..782940c65d8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -550,6 +550,24 @@ static bool req_no_special_merge(struct request *req)
return !q->mq_ops && req->special;
 }
 
+static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
+   struct request *next)
+{
+   unsigned short segments = blk_rq_nr_discard_segments(req);
+
+   if (segments >= queue_max_discard_segments(q))
+   goto no_merge;
+   if (blk_rq_sectors(req) + bio_sectors(next->bio) >
+   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
+   goto no_merge;
+
+   req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
+   return true;
+no_merge:
+   req_set_nomerge(q, req);
+   return false;
+}
+
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
struct request *next)
 {
@@ -683,9 +701,13 @@ static struct request *attempt_merge(struct request_queue *q,
 * If we are allowed to merge, then append bio list
 * from next to rq and release next. merge_requests_fn
 * will have updated segment counts, update sector
-* counts here.
+* counts here. Handle DISCARDs separately, as they
+* have separate settings.
 */
-   if (!ll_merge_requests_fn(q, req, next))
+   if (req_op(req) == REQ_OP_DISCARD) {
+           if (!req_attempt_discard_merge(q, req, next))
+                   return NULL;
+   } else if (!ll_merge_requests_fn(q, req, next))
            return NULL;

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Keith Busch
On Thu, Feb 01, 2018 at 08:26:11AM -0700, Jens Axboe wrote:
> On 1/31/18 9:56 PM, Keith Busch wrote:
> 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 8452fc7164cc..01671e1373ff 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
> > return !q->mq_ops && req->special;
> >  }
> >  
> > +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> > +   struct request *next)
> > +{
> > +   unsigned short segments = blk_rq_nr_discard_segments(req);
> > +
> > +   if (segments >= queue_max_discard_segments(q))
> > +   goto no_merge;
> > +   if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> > +   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> > +   goto no_merge;
> > +
> > +   req->biotail->bi_next = next->bio;
> > +   req->biotail = next->biotail;
> > +   req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> > +   next->bio = NULL;
> > +   return true;
> > +
> > +no_merge:
> > +   req_set_nomerge(q, req);
> > +   return false;
> > +}
> > +
> >  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
> > struct request *next)
> >  {
> > @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
> > if (req->write_hint != next->write_hint)
> > return NULL;
> >  
> > +   /*
> > +* Discards are ... special.
> > +*/
> > +   if (req_op(req) == REQ_OP_DISCARD) {
> > +   if (req_attempt_discard_merge(q, req, next))
> > +   return next;
> > +   return NULL;
> > +   }
> > +
> > /*
> >  * If we are allowed to merge, then append bio list
> >  * from next to rq and release next. merge_requests_fn
> 
> This looks fine to me, the bio-to-request merge path already looks correct.
> Care to send a properly formatted patch?

Sending the patch now. It's a little different from the above so that it
doesn't need to duplicate some of the merging accounting.

Full disclosure, I have not found a way to trigger this merge. I'm just
running 'fio' with trim, randtrim, and trimwrite on a device with
mq-deadline for the past hour, and haven't seen a merge happen yet.


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-02-01 Thread Jens Axboe
On 1/31/18 9:56 PM, Keith Busch wrote:
> On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>>  if (total_phys_segments > queue_max_segments(q))
>> -return 0;
>> +return 0;
> 
> This perhaps unintended change happens to point out another problem:
> queue_max_segments is the wrong limit for discards, which require
> queue_max_discard_segments.
> 
> It might be easier to merge discard requests specially, like how merging
> a discard bio is handled (untested).

Yeah agreed, we should just split it up completely instead of
special casing it in the read/write path.


> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..01671e1373ff 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
>   return !q->mq_ops && req->special;
>  }
>  
> +static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
> + struct request *next)
> +{
> + unsigned short segments = blk_rq_nr_discard_segments(req);
> +
> + if (segments >= queue_max_discard_segments(q))
> + goto no_merge;
> + if (blk_rq_sectors(req) + bio_sectors(next->bio) >
> + blk_rq_get_max_sectors(req, blk_rq_pos(req)))
> + goto no_merge;
> +
> + req->biotail->bi_next = next->bio;
> + req->biotail = next->biotail;
> + req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
> + next->bio = NULL;
> + return true;
> +
> +no_merge:
> + req_set_nomerge(q, req);
> + return false;
> +}
> +
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   struct request *next)
>  {
> @@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
>   if (req->write_hint != next->write_hint)
>   return NULL;
>  
> + /*
> +  * Discards are ... special.
> +  */
> + if (req_op(req) == REQ_OP_DISCARD) {
> + if (req_attempt_discard_merge(q, req, next))
> + return next;
> + return NULL;
> + }
> +
>   /*
>* If we are allowed to merge, then append bio list
>* from next to rq and release next. merge_requests_fn

This looks fine to me, the bio-to-request merge path already looks correct.
Care to send a properly formatted patch?

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote:
>   if (total_phys_segments > queue_max_segments(q))
> - return 0;
> + return 0;

This perhaps unintended change happens to point out another problem:
queue_max_segments is the wrong limit for discards, which require
queue_max_discard_segments.

It might be easier to merge discard requests specially, like how merging
a discard bio is handled (untested).

---
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..01671e1373ff 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -550,6 +550,28 @@ static bool req_no_special_merge(struct request *req)
return !q->mq_ops && req->special;
 }
 
+static bool req_attempt_discard_merge(struct request_queue *q, struct request *req,
+   struct request *next)
+{
+   unsigned short segments = blk_rq_nr_discard_segments(req);
+
+   if (segments >= queue_max_discard_segments(q))
+   goto no_merge;
+   if (blk_rq_sectors(req) + bio_sectors(next->bio) >
+   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
+   goto no_merge;
+
+   req->biotail->bi_next = next->bio;
+   req->biotail = next->biotail;
+   req->nr_phys_segments = segments + blk_rq_nr_discard_segments(next);
+   next->bio = NULL;
+   return true;
+
+no_merge:
+   req_set_nomerge(q, req);
+   return false;
+}
+
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
struct request *next)
 {
@@ -679,6 +701,15 @@ static struct request *attempt_merge(struct request_queue *q,
if (req->write_hint != next->write_hint)
return NULL;
 
+   /*
+* Discards are ... special.
+*/
+   if (req_op(req) == REQ_OP_DISCARD) {
+   if (req_attempt_discard_merge(q, req, next))
+   return next;
+   return NULL;
+   }
+
/*
 * If we are allowed to merge, then append bio list
 * from next to rq and release next. merge_requests_fn
--


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 8:33 PM, jianchao.wang wrote:
> Sorry, Jens, I don't think I got the point.
> Am I missing anything?
> 
> On 02/01/2018 11:07 AM, Jens Axboe wrote:
>> Yeah I agree, and my last patch missed that we do care about segments for
>> discards. Below should be better...
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..055057bd727f 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
>>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  struct request *next)
>>  {
>> -int total_phys_segments;
>> -unsigned int seg_size =
>> -req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
>> +int total_phys_segments = req->nr_phys_segments +
>> +next->nr_phys_segments;
> 
> For DISCARD reqs, the total_phys_segments is still zero here.

This seems broken, it should count the ranges in the discard request.

>> @@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  return 0;
>>  
>> -total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +/*
>> + * If the requests aren't carrying any data payloads, we don't need
>> + * to look at the segment count
>> + */
>> +if (bio_has_data(next->bio) &&
>> +blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +unsigned int seg_size = req->biotail->bi_seg_back_size +
>> +next->bio->bi_seg_front_size;
> 
> Yes, total_phys_segments will not be decreased.
> 
>> +
>>  if (req->nr_phys_segments == 1)
>>  req->bio->bi_seg_front_size = seg_size;
>>  if (next->nr_phys_segments == 1)
>> @@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  }
>>  
>>  if (total_phys_segments > queue_max_segments(q))
>> -return 0;
>> +return 0;
>>  
>>  if (blk_integrity_merge_rq(q, req, next) == false)
>>  return 0;
> 
> But finally, the merged DISCARD req's nr_phys_segments is still zero.
> 
> blk_rq_nr_discard_segments will return 1 but this req has two bios.
> blk_rq_nr_discard_segments's comment says

They should have the range count. The discard merge stuff is a bit of
a hack, it would be nice to get that improved.

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread jianchao.wang
Sorry, Jens, I don't think I got the point.
Am I missing anything?

On 02/01/2018 11:07 AM, Jens Axboe wrote:
> Yeah I agree, and my last patch missed that we do care about segments for
> discards. Below should be better...
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..055057bd727f 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
>  static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   struct request *next)
>  {
> - int total_phys_segments;
> - unsigned int seg_size =
> - req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
> + int total_phys_segments = req->nr_phys_segments +
> + next->nr_phys_segments;

For DISCARD reqs, the total_phys_segments is still zero here.

>  
>   /*
>* First check if the either of the requests are re-queued
> @@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>   return 0;
>  
> - total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> - if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> + /*
> +  * If the requests aren't carrying any data payloads, we don't need
> +  * to look at the segment count
> +  */
> + if (bio_has_data(next->bio) &&
> + blk_phys_contig_segment(q, req->biotail, next->bio)) {
> + unsigned int seg_size = req->biotail->bi_seg_back_size +
> + next->bio->bi_seg_front_size;

Yes, total_phys_segments will not be decreased.

> +
>   if (req->nr_phys_segments == 1)
>   req->bio->bi_seg_front_size = seg_size;
>   if (next->nr_phys_segments == 1)
> @@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   }
>  
>   if (total_phys_segments > queue_max_segments(q))
> - return 0;
> + return 0;
>  
>   if (blk_integrity_merge_rq(q, req, next) == false)
>   return 0;

But finally, the merged DISCARD req's nr_phys_segments is still zero.

blk_rq_nr_discard_segments will return 1 but this req has two bios.
blk_rq_nr_discard_segments's comment says
-- 
Each discard bio merged into a request is counted as one segment
--

Maybe the patch below should go on top of yours.

diff --git a/block/blk-core.c b/block/blk-core.c
index a2005a4..b444fb7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1763,7 +1763,6 @@ bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
req->biotail = bio;
req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
-   req->nr_phys_segments = segments + 1;
 
blk_account_io_start(req, false);
return true;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4f3df80..1af2138 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1312,7 +1312,13 @@ static inline unsigned short blk_rq_nr_phys_segments(struct request *rq)
  */
 static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
 {
-   return max_t(unsigned short, rq->nr_phys_segments, 1);
+   struct bio *bio;
+   unsigned short count = 0;
+
+   __rq_for_each_bio(bio, rq)
+           count++;
+
+   return count;
 }


Thanks
Jianchao


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 8:03 PM, jianchao.wang wrote:
> Hi Jens
> 
> 
> On 01/31/2018 11:29 PM, Jens Axboe wrote:
>> How about something like the below?
>>
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..cee102fb060e 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  return 0;
>>  
>> +/*
>> + * For DISCARDs, the segment count isn't interesting since
>> + * the requests have no data attached.
>> + */
>>  total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +if (total_phys_segments &&
>> +blk_phys_contig_segment(q, req->biotail, next->bio)) {
>>  if (req->nr_phys_segments == 1)
>>  req->bio->bi_seg_front_size = seg_size;
>>  if (next->nr_phys_segments == 1)
> 
> This patch will avoid nr_phys_segments being set to 0xffff,
> but the merged req will have two bios and zero nr_phys_segments.
> 
> We have to align with the DISCARD merging strategy.
> 
> Please refer to:
> /*
>  * Number of discard segments (or ranges) the driver needs to fill in.
>  * Each discard bio merged into a request is counted as one segment.
>  */
> static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
> {
>return max_t(unsigned short, rq->nr_phys_segments, 1);
> }
> bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
>   struct bio *bio)
> {
>   unsigned short segments = blk_rq_nr_discard_segments(req);
> 
>   if (segments >= queue_max_discard_segments(q))
>   goto no_merge;
>   if (blk_rq_sectors(req) + bio_sectors(bio) >
>   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>   goto no_merge;
> 
>   req->biotail->bi_next = bio;
>   req->biotail = bio;
>   req->__data_len += bio->bi_iter.bi_size;
>   req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
>   req->nr_phys_segments = segments + 1;
> 
>   blk_account_io_start(req, false);
>   return true;
> no_merge:
>   req_set_nomerge(q, req);
>   return false;
> }
> 
> blk_rq_nr_discard_segments will end up returning the wrong value.
> 
> Maybe we could change blk_rq_nr_discard_segments to iterate over and count
> the bios in a request to decide the DISCARD request's nr_phys_segments, and
> drop the nr_phys_segments updates in the DISCARD merging path, on top of
> your patch here.

Yeah I agree, and my last patch missed that we do care about segments for
discards. Below should be better...

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..055057bd727f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -553,9 +553,8 @@ static bool req_no_special_merge(struct request *req)
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
struct request *next)
 {
-   int total_phys_segments;
-   unsigned int seg_size =
-   req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
+   int total_phys_segments = req->nr_phys_segments +
+   next->nr_phys_segments;
 
/*
 * First check if the either of the requests are re-queued
@@ -574,8 +573,15 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
blk_rq_get_max_sectors(req, blk_rq_pos(req)))
return 0;
 
-   total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-   if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
+   /*
+* If the requests aren't carrying any data payloads, we don't need
+* to look at the segment count
+*/
+   if (bio_has_data(next->bio) &&
+   blk_phys_contig_segment(q, req->biotail, next->bio)) {
+   unsigned int seg_size = req->biotail->bi_seg_back_size +
+   next->bio->bi_seg_front_size;
+
if (req->nr_phys_segments == 1)
req->bio->bi_seg_front_size = seg_size;
if (next->nr_phys_segments == 1)
@@ -584,7 +590,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
}
 
if (total_phys_segments > queue_max_segments(q))
-   return 0;
+   return 0;
 
if (blk_integrity_merge_rq(q, req, next) == false)
return 0;

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 4:33 PM, Keith Busch wrote:
> On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote:
>>
>> How about something like the below?
>>
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 8452fc7164cc..cee102fb060e 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>>  blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>>  return 0;
>>  
>> +/*
>> + * For DISCARDs, the segment count isn't interesting since
>> + * the requests have no data attached.
>> + */
>>  total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
>> -if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
>> +if (total_phys_segments &&
>> +blk_phys_contig_segment(q, req->biotail, next->bio)) {
>>  if (req->nr_phys_segments == 1)
>>  req->bio->bi_seg_front_size = seg_size;
>>  if (next->nr_phys_segments == 1)
> 
> That'll keep it from going to 0xffff, but you'll still hit the warning and
> IO error. Even worse, this will corrupt memory: blk_rq_nr_discard_segments
> will return 1, and since you really had 2 segments, the nvme driver will
> overrun its array.

Yeah you are right, that patch was shit. How about the below? We only
need to worry about segment size and number of segments if we are
carrying data. req->biotail and next->bio must be the same type, so
should be safe.


diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8452fc7164cc..cf9adc4c64b5 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -553,9 +553,7 @@ static bool req_no_special_merge(struct request *req)
 static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
struct request *next)
 {
-   int total_phys_segments;
-   unsigned int seg_size =
-   req->biotail->bi_seg_back_size + next->bio->bi_seg_front_size;
+   int total_phys_segments = 0;
 
/*
 * First check if the either of the requests are re-queued
@@ -574,17 +572,27 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
blk_rq_get_max_sectors(req, blk_rq_pos(req)))
return 0;
 
-   total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
-   if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
-   if (req->nr_phys_segments == 1)
-   req->bio->bi_seg_front_size = seg_size;
-   if (next->nr_phys_segments == 1)
-   next->biotail->bi_seg_back_size = seg_size;
-   total_phys_segments--;
-   }
+   /*
+* If the requests aren't carrying any data payloads, we don't need
+* to look at the segment count
+*/
+   if (bio_has_data(next->bio)) {
+   total_phys_segments = req->nr_phys_segments +
+   next->nr_phys_segments;
+   if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
+   unsigned int seg_size = req->biotail->bi_seg_back_size +
+   next->bio->bi_seg_front_size;
+
+   if (req->nr_phys_segments == 1)
+   req->bio->bi_seg_front_size = seg_size;
+   if (next->nr_phys_segments == 1)
+   next->biotail->bi_seg_back_size = seg_size;
+   total_phys_segments--;
+   }
 
-   if (total_phys_segments > queue_max_segments(q))
-   return 0;
+   if (total_phys_segments > queue_max_segments(q))
+   return 0;
+   }
 
if (blk_integrity_merge_rq(q, req, next) == false)
return 0;

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread jianchao.wang
Hi Jens


On 01/31/2018 11:29 PM, Jens Axboe wrote:
> How about something like the below?
> 
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..cee102fb060e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>   return 0;
>  
> + /*
> +  * For DISCARDs, the segment count isn't interesting since
> +  * the requests have no data attached.
> +  */
>   total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> - if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> + if (total_phys_segments &&
> + blk_phys_contig_segment(q, req->biotail, next->bio)) {
>   if (req->nr_phys_segments == 1)
>   req->bio->bi_seg_front_size = seg_size;
>   if (next->nr_phys_segments == 1)

This patch will avoid nr_phys_segments being set to 0xffff,
but the merged req will have two bios and zero nr_phys_segments.

We have to align with the DISCARD merging strategy.

Please refer to:
/*
 * Number of discard segments (or ranges) the driver needs to fill in.
 * Each discard bio merged into a request is counted as one segment.
 */
static inline unsigned short blk_rq_nr_discard_segments(struct request *rq)
{
   return max_t(unsigned short, rq->nr_phys_segments, 1);
}
bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
struct bio *bio)
{
unsigned short segments = blk_rq_nr_discard_segments(req);

if (segments >= queue_max_discard_segments(q))
goto no_merge;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req)))
goto no_merge;

req->biotail->bi_next = bio;
req->biotail = bio;
req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
req->nr_phys_segments = segments + 1;

blk_account_io_start(req, false);
return true;
no_merge:
req_set_nomerge(q, req);
return false;
}

blk_rq_nr_discard_segments will end up returning the wrong value.

Maybe we could change blk_rq_nr_discard_segments to iterate over and count
the bios in a request to decide the DISCARD request's nr_phys_segments, and
drop the nr_phys_segments updates in the DISCARD merging path, on top of
your patch here.

Thanks
Jianchao


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote:
> 
> How about something like the below?
> 
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 8452fc7164cc..cee102fb060e 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -574,8 +574,13 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
>   blk_rq_get_max_sectors(req, blk_rq_pos(req)))
>   return 0;
>  
> + /*
> +  * For DISCARDs, the segment count isn't interesting since
> +  * the requests have no data attached.
> +  */
>   total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
> - if (blk_phys_contig_segment(q, req->biotail, next->bio)) {
> + if (total_phys_segments &&
> + blk_phys_contig_segment(q, req->biotail, next->bio)) {
>   if (req->nr_phys_segments == 1)
>   req->bio->bi_seg_front_size = seg_size;
>   if (next->nr_phys_segments == 1)

That'll keep it from going to 0xffff, but you'll still hit the warning and
IO error. Even worse, this will corrupt memory: blk_rq_nr_discard_segments
will return 1, and since you really had 2 segments, the nvme driver will
overrun its array.
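
(For reference, an abbreviated sketch of the discard setup path behind
nvme_setup_cmd(), written from memory rather than copied from
drivers/nvme/host/core.c, to show why the undercount is a memory-safety
problem: the range array is sized from blk_rq_nr_discard_segments(), but
filled with one entry per bio, and the WARN_ON_ONCE fires when the two
disagree.)

static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
		struct nvme_command *cmnd)
{
	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
	struct nvme_dsm_range *range;
	struct bio *bio;

	/* sized from the reported segment count... */
	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
	if (!range)
		return BLK_STS_RESOURCE;

	/* ...but filled per bio, so a short count overruns the allocation */
	__rq_for_each_bio(bio, req) {
		u64 slba = nvme_block_nr(ns, bio->bi_iter.bi_sector);
		u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;

		range[n].cattr = cpu_to_le32(0);
		range[n].nlb = cpu_to_le32(nlb);
		range[n].slba = cpu_to_le64(slba);
		n++;
	}

	if (WARN_ON_ONCE(n != segments)) {
		kfree(range);
		return BLK_STS_IOERR;
	}

	/* DSM command fields and RQF_SPECIAL_PAYLOAD setup omitted */
	return BLK_STS_OK;
}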


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/30/18 9:25 PM, jianchao.wang wrote:
> Hi Jens
> 
> On 01/30/2018 11:57 PM, Jens Axboe wrote:
>> On 1/30/18 8:41 AM, Jens Axboe wrote:
>>> Hi,
>>>
>>> I just hit this on 4.15+ on the laptop, it's running Linus' git
>>> as of yesterday, right after the block tree merge:
>>>
>>> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
>>> Merge: 9697e9da8429 796baeeef85a
>>> Author: Linus Torvalds 
>>> Date:   Mon Jan 29 11:51:49 2018 -0800
>>>
>>> Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
>>>
>>> It's hitting this case:
>>>
>>> if (WARN_ON_ONCE(n != segments)) {  
>>> 
>>>
>>> in nvme, indicating that rq->nr_phys_segments does not equal the
>>> number of bios in the request. Anyone seen this? I'll take a closer
>>> look as time permits, full trace below.
>>>
>>>
>>>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 
>>> nvme_setup_cmd+0x3d3/0x430
>>>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc 
>>> snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat 
>>> snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 
>>> snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
>>> x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb 
>>> snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer 
>>> videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 
>>> crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore 
>>> hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd 
>>> intel_gtt
>>>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U   4.15.0+ 
>>> #176
>>>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 
>>> 12/19/2017
>>>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>>>  RSP: 0018:880423e9f838 EFLAGS: 00010217
>>>  RAX:  RBX: 880423e9f8c8 RCX: 0001
>>>  RDX: 88022b200010 RSI: 0002 RDI: 327f
>>>  RBP: 880421251400 R08: 88022b20 R09: 0009
>>>  R10:  R11:  R12: 
>>>  R13: 88042341e280 R14:  R15: 880421251440
>>>  FS:  () GS:88044150() 
>>> knlGS:
>>>  CS:  0010 DS:  ES:  CR0: 80050033
>>>  CR2: 55b684795030 CR3: 02e09006 CR4: 001606e0
>>>  DR0:  DR1:  DR2: 
>>>  DR3:  DR6: fffe0ff0 DR7: 0400
>>>  Call Trace:
>>>   nvme_queue_rq+0x40/0xa00
>>>   ? __sbitmap_queue_get+0x24/0x90
>>>   ? blk_mq_get_tag+0xa3/0x250
>>>   ? wait_woken+0x80/0x80
>>>   ? blk_mq_get_driver_tag+0x97/0xf0
>>>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>>>   ? deadline_remove_request+0x49/0xb0
>>>   blk_mq_do_dispatch_sched+0x4f/0xc0
>>>   blk_mq_sched_dispatch_requests+0x106/0x170
>>>   __blk_mq_run_hw_queue+0x53/0xa0
>>>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>>>   blk_mq_run_hw_queue+0x6c/0xd0
>>>   blk_mq_sched_insert_request+0x96/0x140
>>>   __blk_mq_try_issue_directly+0x3d/0x190
>>>   blk_mq_try_issue_directly+0x30/0x70
>>>   blk_mq_make_request+0x1a4/0x6a0
>>>   generic_make_request+0xfd/0x2f0
>>>   ? submit_bio+0x5c/0x110
>>>   submit_bio+0x5c/0x110
>>>   ? __blkdev_issue_discard+0x152/0x200
>>>   submit_bio_wait+0x43/0x60
>>>   ext4_process_freed_data+0x1cd/0x440
>>>   ? account_page_dirtied+0xe2/0x1a0
>>>   ext4_journal_commit_callback+0x4a/0xc0
>>>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>>>   ? kjournald2+0xb0/0x250
>>>   kjournald2+0xb0/0x250
>>>   ? wait_woken+0x80/0x80
>>>   ? commit_timeout+0x10/0x10
>>>   kthread+0x111/0x130
>>>   ? kthread_create_worker_on_cpu+0x50/0x50
>>>   ? do_group_exit+0x3a/0xa0
>>>   ret_from_fork+0x1f/0x30
>>>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 
>>> 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 
>>> c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>>>  ---[ end trace 50d361cc444506c8 ]---

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread jianchao.wang
Hi Jens

On 01/30/2018 11:57 PM, Jens Axboe wrote:
> On 1/30/18 8:41 AM, Jens Axboe wrote:
>> Hi,
>>
>> I just hit this on 4.15+ on the laptop, it's running Linus' git
>> as of yesterday, right after the block tree merge:
>>
>> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
>> Merge: 9697e9da8429 796baeeef85a
>> Author: Linus Torvalds 
>> Date:   Mon Jan 29 11:51:49 2018 -0800
>>
>> Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
>>
>> It's hitting this case:
>>
>> if (WARN_ON_ONCE(n != segments)) {   
>>
>>
>> in nvme, indicating that rq->nr_phys_segments does not equal the
>> number of bios in the request. Anyone seen this? I'll take a closer
>> look as time permits, full trace below.
>>
>>
>>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 
>> nvme_setup_cmd+0x3d3/0x430
>>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc 
>> snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat 
>> snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 
>> snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
>> x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb 
>> snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer 
>> videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 
>> crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore 
>> hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd 
>> intel_gtt
>>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U   4.15.0+ 
>> #176
>>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 
>> 12/19/2017
>>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>>  RSP: 0018:880423e9f838 EFLAGS: 00010217
>>  RAX:  RBX: 880423e9f8c8 RCX: 0001
>>  RDX: 88022b200010 RSI: 0002 RDI: 327f
>>  RBP: 880421251400 R08: 88022b20 R09: 0009
>>  R10:  R11:  R12: 
>>  R13: 88042341e280 R14:  R15: 880421251440
>>  FS:  () GS:88044150() knlGS:
>>  CS:  0010 DS:  ES:  CR0: 80050033
>>  CR2: 55b684795030 CR3: 02e09006 CR4: 001606e0
>>  DR0:  DR1:  DR2: 
>>  DR3:  DR6: fffe0ff0 DR7: 0400
>>  Call Trace:
>>   nvme_queue_rq+0x40/0xa00
>>   ? __sbitmap_queue_get+0x24/0x90
>>   ? blk_mq_get_tag+0xa3/0x250
>>   ? wait_woken+0x80/0x80
>>   ? blk_mq_get_driver_tag+0x97/0xf0
>>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>>   ? deadline_remove_request+0x49/0xb0
>>   blk_mq_do_dispatch_sched+0x4f/0xc0
>>   blk_mq_sched_dispatch_requests+0x106/0x170
>>   __blk_mq_run_hw_queue+0x53/0xa0
>>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>>   blk_mq_run_hw_queue+0x6c/0xd0
>>   blk_mq_sched_insert_request+0x96/0x140
>>   __blk_mq_try_issue_directly+0x3d/0x190
>>   blk_mq_try_issue_directly+0x30/0x70
>>   blk_mq_make_request+0x1a4/0x6a0
>>   generic_make_request+0xfd/0x2f0
>>   ? submit_bio+0x5c/0x110
>>   submit_bio+0x5c/0x110
>>   ? __blkdev_issue_discard+0x152/0x200
>>   submit_bio_wait+0x43/0x60
>>   ext4_process_freed_data+0x1cd/0x440
>>   ? account_page_dirtied+0xe2/0x1a0
>>   ext4_journal_commit_callback+0x4a/0xc0
>>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>>   ? kjournald2+0xb0/0x250
>>   kjournald2+0xb0/0x250
>>   ? wait_woken+0x80/0x80
>>   ? commit_timeout+0x10/0x10
>>   kthread+0x111/0x130
>>   ? kthread_create_worker_on_cpu+0x50/0x50
>>   ? do_group_exit+0x3a/0xa0
>>   ret_from_fork+0x1f/0x30
>>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 
>> 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 
>> 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>>  ---[ end trace 50d361cc444506c8 ]---
>>  print_req_error: I/O error, dev nvme0n1, sector 847167488
> 
> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> 

Let's consider the following case:

blk_mq_bio_to_request()
  -> blk_init_request_from_bio()
    -> blk_rq_bio_prep()

	if (bio_has_data(bio))
		rq->nr_phys_segments = bio_phys_segments(q, bio);

static inline bool bio_has_data(struct bio *bio)

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Jens Axboe
On 1/30/18 1:49 PM, Keith Busch wrote:
> On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote:
>> On 1/30/18 1:30 PM, Keith Busch wrote:
>>> On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:

>>>> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
>>>
>>> Is this still a problem if you don't use an IO scheduler? With deadline,
>>> I'm not finding any path to bio_attempt_discard_merge which is where the
>>> nr_phys_segments is supposed to get set to 2. Not sure how it could
>>> become 0xffff, though.
>>
>> blk_mq_make_request() -> blk_mq_sched_bio_merge() -> 
>> __blk_mq_sched_bio_merge()
>>  -> blk_mq_attempt_merge() -> bio_attempt_discard_merge()
> 
> Those are the calls only if you don't have an elevator_queue, right? With
> deadline, it looks like it goes through this path (ftrace confirms):
> 
>   __blk_mq_sched_bio_merge() -> dd_bio_merge() -> blk_mq_sched_try_merge()
> 
> Which doesn't have a case for ELEVATOR_DISCARD_MERGE.
> 
> Relevant function_graph:
> 
>   46)   |blk_mq_make_request() {
>   46)   0.133 us|  blk_queue_bounce();
>   46)   0.370 us|  blk_queue_split();
>   46)   0.314 us|  bio_integrity_prep();
>   46)   0.081 us|  blk_attempt_plug_merge();
>   46)   |  __blk_mq_sched_bio_merge() {
>   46)   |dd_bio_merge() {
>   46)   0.792 us|  _raw_spin_lock();
>   46)   |  blk_mq_sched_try_merge() {

Yeah I guess you are right, it can't happen for mq-deadline since the
generic sched path doesn't support it.
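
(The generic path here is blk_mq_sched_try_merge(); roughly the 4.15-era
version, sketched from memory rather than quoted verbatim -- its switch only
handled the back and front merge cases, so ELEVATOR_DISCARD_MERGE fell into
the default branch and was simply dropped:)

bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
			    struct request **merged_request)
{
	struct request *rq;

	switch (elv_merge(q, &rq, bio)) {
	case ELEVATOR_BACK_MERGE:
		if (!blk_mq_sched_allow_merge(q, rq, bio))
			return false;
		if (!bio_attempt_back_merge(q, rq, bio))
			return false;
		*merged_request = attempt_back_merge(q, rq);
		if (!*merged_request)
			elv_merged_request(q, rq, ELEVATOR_BACK_MERGE);
		return true;
	case ELEVATOR_FRONT_MERGE:
		if (!blk_mq_sched_allow_merge(q, rq, bio))
			return false;
		if (!bio_attempt_front_merge(q, rq, bio))
			return false;
		*merged_request = attempt_front_merge(q, rq);
		if (!*merged_request)
			elv_merged_request(q, rq, ELEVATOR_FRONT_MERGE);
		return true;
	default:
		/* ELEVATOR_DISCARD_MERGE ends up here and is ignored */
		return false;
	}
}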

I'll see if I can make it happen again, but I'm not too hopeful. And if
I can't, it's hard to compare with "none" to see if it makes a difference
or not.

-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Keith Busch
On Tue, Jan 30, 2018 at 01:32:25PM -0700, Jens Axboe wrote:
> On 1/30/18 1:30 PM, Keith Busch wrote:
> > On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
> >>
> >> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> > 
> > Is this still a problem if you don't use an IO scheduler? With deadline,
> > I'm not finding any path to bio_attempt_discard_merge which is where the
> > nr_phys_segments is supposed to get set to 2. Not sure how it could
> > become 0xffff, though.
> 
> blk_mq_make_request() -> blk_mq_sched_bio_merge() -> 
> __blk_mq_sched_bio_merge()
>   -> blk_mq_attempt_merge() -> bio_attempt_discard_merge()

Those are the calls only if you don't have an elevator_queue, right? With
deadline, it looks like it goes through this path (ftrace confirms):

  __blk_mq_sched_bio_merge() -> dd_bio_merge() -> blk_mq_sched_try_merge()

Which doesn't have a case for ELEVATOR_DISCARD_MERGE.

Relevant function_graph:

  46)   |blk_mq_make_request() {
  46)   0.133 us|  blk_queue_bounce();
  46)   0.370 us|  blk_queue_split();
  46)   0.314 us|  bio_integrity_prep();
  46)   0.081 us|  blk_attempt_plug_merge();
  46)   |  __blk_mq_sched_bio_merge() {
  46)   |dd_bio_merge() {
  46)   0.792 us|  _raw_spin_lock();
  46)   |  blk_mq_sched_try_merge() {


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Jens Axboe
On 1/30/18 1:30 PM, Keith Busch wrote:
> On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
>>
>> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.
> 
> Is this still a problem if you don't use an IO scheduler? With deadline,
> I'm not finding any path to bio_attempt_discard_merge which is where the
> nr_phys_segments is supposed to get set to 2. Not sure how it could
> become 0xffff, though.

blk_mq_make_request() -> blk_mq_sched_bio_merge() -> __blk_mq_sched_bio_merge()
-> blk_mq_attempt_merge() -> bio_attempt_discard_merge()

Doesn't matter what IO scheduler you use.

I don't know if it triggers without a scheduler. I've been running this code
continually on the laptop (always do), and haven't seen it before today.


-- 
Jens Axboe



Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Keith Busch
On Tue, Jan 30, 2018 at 08:57:49AM -0700, Jens Axboe wrote:
> 
> Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.

Is this still a problem if you don't use an IO scheduler? With deadline,
I'm not finding any path to bio_attempt_discard_merge which is where the
nr_phys_segments is supposed to get set to 2. Not sure how it could
become 0xffff, though.
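
(A rough illustration of where a 0xffff can come from, based on the
ll_merge_requests_fn() code discussed elsewhere in this thread: discard
requests start out with nr_phys_segments == 0, the request-merge path
decrements the combined count when it considers two segments contiguous, and
the result is stored back into an unsigned short. A stand-alone userspace
sketch of that arithmetic, with hypothetical variable names:)

#include <stdio.h>

int main(void)
{
	unsigned short nr_phys_segments;	/* same type as in struct request */
	int total_phys_segments = 0 + 0;	/* req + next, both discards */

	/* ll_merge_requests_fn(): contiguous segments collapse into one */
	total_phys_segments--;			/* now -1 */

	nr_phys_segments = total_phys_segments;	/* truncated to 0xffff */
	printf("%#x\n", nr_phys_segments);	/* prints 0xffff */

	/* blk_rq_nr_discard_segments() = max(nr_phys_segments, 1) = 0xffff,
	 * while the request actually holds only 2 bios, so the nvme check
	 * WARN_ON_ONCE(n != segments) fires. */
	return 0;
}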


Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Jens Axboe
On 1/30/18 8:41 AM, Jens Axboe wrote:
> Hi,
> 
> I just hit this on 4.15+ on the laptop, it's running Linus' git
> as of yesterday, right after the block tree merge:
> 
> commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
> Merge: 9697e9da8429 796baeeef85a
> Author: Linus Torvalds 
> Date:   Mon Jan 29 11:51:49 2018 -0800
> 
> Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
> 
> It's hitting this case:
> 
> if (WARN_ON_ONCE(n != segments)) {
>   
> 
> in nvme, indicating that rq->nr_phys_segments does not equal the
> number of bios in the request. Anyone seen this? I'll take a closer
> look as time permits, full trace below.
> 
> 
>  WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 
> nvme_setup_cmd+0x3d3/0x430
>  Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc 
> snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat 
> snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 
> snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
> x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb 
> snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer 
> videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 
> crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore 
> hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd 
> intel_gtt
>  CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U   4.15.0+ #176
>  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
>  RIP: 0010:nvme_setup_cmd+0x3d3/0x430
>  RSP: 0018:880423e9f838 EFLAGS: 00010217
>  RAX:  RBX: 880423e9f8c8 RCX: 0001
>  RDX: 88022b200010 RSI: 0002 RDI: 327f
>  RBP: 880421251400 R08: 88022b20 R09: 0009
>  R10:  R11:  R12: 
>  R13: 88042341e280 R14:  R15: 880421251440
>  FS:  () GS:88044150() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 55b684795030 CR3: 02e09006 CR4: 001606e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  Call Trace:
>   nvme_queue_rq+0x40/0xa00
>   ? __sbitmap_queue_get+0x24/0x90
>   ? blk_mq_get_tag+0xa3/0x250
>   ? wait_woken+0x80/0x80
>   ? blk_mq_get_driver_tag+0x97/0xf0
>   blk_mq_dispatch_rq_list+0x7b/0x4a0
>   ? deadline_remove_request+0x49/0xb0
>   blk_mq_do_dispatch_sched+0x4f/0xc0
>   blk_mq_sched_dispatch_requests+0x106/0x170
>   __blk_mq_run_hw_queue+0x53/0xa0
>   __blk_mq_delay_run_hw_queue+0x83/0xa0
>   blk_mq_run_hw_queue+0x6c/0xd0
>   blk_mq_sched_insert_request+0x96/0x140
>   __blk_mq_try_issue_directly+0x3d/0x190
>   blk_mq_try_issue_directly+0x30/0x70
>   blk_mq_make_request+0x1a4/0x6a0
>   generic_make_request+0xfd/0x2f0
>   ? submit_bio+0x5c/0x110
>   submit_bio+0x5c/0x110
>   ? __blkdev_issue_discard+0x152/0x200
>   submit_bio_wait+0x43/0x60
>   ext4_process_freed_data+0x1cd/0x440
>   ? account_page_dirtied+0xe2/0x1a0
>   ext4_journal_commit_callback+0x4a/0xc0
>   jbd2_journal_commit_transaction+0x17e2/0x19e0
>   ? kjournald2+0xb0/0x250
>   kjournald2+0xb0/0x250
>   ? wait_woken+0x80/0x80
>   ? commit_timeout+0x10/0x10
>   kthread+0x111/0x130
>   ? kthread_create_worker_on_cpu+0x50/0x50
>   ? do_group_exit+0x3a/0xa0
>   ret_from_fork+0x1f/0x30
>  Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 
> 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 
> 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
>  ---[ end trace 50d361cc444506c8 ]---
>  print_req_error: I/O error, dev nvme0n1, sector 847167488

Looking at the disassembly, 'n' is 2 and 'segments' is 0xffff.

-- 
Jens Axboe



WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-30 Thread Jens Axboe
Hi,

I just hit this on 4.15+ on the laptop, it's running Linus' git
as of yesterday, right after the block tree merge:

commit 0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0
Merge: 9697e9da8429 796baeeef85a
Author: Linus Torvalds 
Date:   Mon Jan 29 11:51:49 2018 -0800

Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block

It's hitting this case:

if (WARN_ON_ONCE(n != segments)) {  

in nvme, indicating that rq->nr_phys_segments does not equal the
number of bios in the request. Anyone seen this? I'll take a closer
look as time permits, full trace below.


 WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 
nvme_setup_cmd+0x3d3/0x430
 Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc 
snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat 
snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 
snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb 
snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer 
videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 
crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore 
hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd 
intel_gtt
 CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U   4.15.0+ #176
 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
 RIP: 0010:nvme_setup_cmd+0x3d3/0x430
 RSP: 0018:880423e9f838 EFLAGS: 00010217
 RAX:  RBX: 880423e9f8c8 RCX: 0001
 RDX: 88022b200010 RSI: 0002 RDI: 327f
 RBP: 880421251400 R08: 88022b20 R09: 0009
 R10:  R11:  R12: 
 R13: 88042341e280 R14:  R15: 880421251440
 FS:  () GS:88044150() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 55b684795030 CR3: 02e09006 CR4: 001606e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
  nvme_queue_rq+0x40/0xa00
  ? __sbitmap_queue_get+0x24/0x90
  ? blk_mq_get_tag+0xa3/0x250
  ? wait_woken+0x80/0x80
  ? blk_mq_get_driver_tag+0x97/0xf0
  blk_mq_dispatch_rq_list+0x7b/0x4a0
  ? deadline_remove_request+0x49/0xb0
  blk_mq_do_dispatch_sched+0x4f/0xc0
  blk_mq_sched_dispatch_requests+0x106/0x170
  __blk_mq_run_hw_queue+0x53/0xa0
  __blk_mq_delay_run_hw_queue+0x83/0xa0
  blk_mq_run_hw_queue+0x6c/0xd0
  blk_mq_sched_insert_request+0x96/0x140
  __blk_mq_try_issue_directly+0x3d/0x190
  blk_mq_try_issue_directly+0x30/0x70
  blk_mq_make_request+0x1a4/0x6a0
  generic_make_request+0xfd/0x2f0
  ? submit_bio+0x5c/0x110
  submit_bio+0x5c/0x110
  ? __blkdev_issue_discard+0x152/0x200
  submit_bio_wait+0x43/0x60
  ext4_process_freed_data+0x1cd/0x440
  ? account_page_dirtied+0xe2/0x1a0
  ext4_journal_commit_callback+0x4a/0xc0
  jbd2_journal_commit_transaction+0x17e2/0x19e0
  ? kjournald2+0xb0/0x250
  kjournald2+0xb0/0x250
  ? wait_woken+0x80/0x80
  ? commit_timeout+0x10/0x10
  kthread+0x111/0x130
  ? kthread_create_worker_on_cpu+0x50/0x50
  ? do_group_exit+0x3a/0xa0
  ret_from_fork+0x1f/0x30
 Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 
8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 
0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff 
 ---[ end trace 50d361cc444506c8 ]---
 print_req_error: I/O error, dev nvme0n1, sector 847167488

-- 
Jens Axboe