On 08/23/2016 03:14 PM, Jens Axboe wrote:
On 08/23/2016 03:11 PM, Jens Axboe wrote:
On 08/23/2016 02:54 PM, Gabriel Krisman Bertazi wrote:
Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com> writes:

Can you share what you ran to online/offline CPUs? I can't reproduce
this here.

I was using the ppc64_cpu tool, which shouldn't do anything more than
write to sysfs, but I just reproduced it with the script below.
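
For reference, CPU hotplug from userspace goes through the per-CPU `online` attribute under /sys/devices/system/cpu/. The C sketch below shows a single offline/online step done that way. It is an assumed equivalent of what such a tool does, not the ppc64_cpu tool or the smt.sh script themselves, and the helper names are hypothetical. It needs root to actually change CPU state, and on some systems (commonly cpu0 on x86) the `online` file does not exist.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the sysfs path of a CPU's hotplug control file,
 * e.g. /sys/devices/system/cpu/cpu3/online.
 * Returns 0 on success, -1 if the buffer is too small. */
static int cpu_online_path(int cpu, char *buf, size_t len)
{
    assert(cpu >= 0);
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Offline (online = 0) or online (online = 1) a CPU by writing to sysfs.
 * Hypothetical helper; requires root. Returns 0 on success, -1 on error. */
static int set_cpu_online(int cpu, int online)
{
    char path[64];
    if (cpu_online_path(cpu, path, sizeof(path)) != 0)
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Same bytes as `echo 1 > .../online`. The kernel performs the
     * hotplug inside write(), so failures such as EBUSY surface here. */
    const char *val = online ? "1\n" : "0\n";
    ssize_t n = write(fd, val, strlen(val));
    int rc = (n == (ssize_t)strlen(val)) ? 0 : -1;
    close(fd);
    return rc;
}
```

A stress loop would simply alternate set_cpu_online(cpu, 0) and set_cpu_online(cpu, 1) over a set of CPUs while I/O is running.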

Note that this is ppc64le. I don't have an x86 machine at hand to
attempt to reproduce right now, but I'll look for one and see how it goes.

Hi,

Any luck reproducing it? We were initially reproducing with a
proprietary stress test, but I tried a generated fio job file together
with the SMT script I shared earlier, and I could reproduce the crash
consistently in less than 10 minutes of execution. This was still on
ppc64le, though. I couldn't get my hands on an NVMe device on x86 yet.

Nope, I have not been able to reproduce it. How long do the CPU
offline/online actions take on ppc64? They're pretty slow on x86, which
may hide the issue. I took out the various printks associated with
bringing a CPU off/online, as well as the IRQ breaking parts, but that
didn't help in reproducing it.
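
One way to answer the timing question on both architectures is to wrap each offline/online transition in CLOCK_MONOTONIC timestamps. A minimal sketch, assuming the transition is exposed as a callback; `time_call` and `noop_transition` are hypothetical names, and in practice the callback would perform the sysfs write to the CPU's `online` file.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Difference b - a in nanoseconds (assumes b is not earlier than a). */
static int64_t timespec_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

/* Run fn(arg) once and return its duration in nanoseconds, or -1 if the
 * clock cannot be read. CLOCK_MONOTONIC is used so that jumps in the
 * system time (e.g. settimeofday) do not distort the measurement.
 * fn's return value is stored in *ret when ret is non-NULL. */
static int64_t time_call(int (*fn)(void *), void *arg, int *ret)
{
    struct timespec start, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    int r = fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;
    if (ret)
        *ret = r;
    return timespec_diff_ns(start, end);
}

/* Placeholder standing in for a real offline/online transition. */
static int noop_transition(void *arg)
{
    (void)arg;
    return 0;
}
```

Timing the offline and the online write separately, and repeating each many times, would show whether one direction dominates the hotplug latency.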

The job file I used, as well as the smt.sh script, in case you want to
give it a try:

jobfile: http://krisman.be/k/nvmejob.fio
smt.sh:  http://krisman.be/k/smt.sh

Still, the trigger consistently seems to be a heavy I/O load combined
with CPU addition/removal.

My workload looks similar to yours, in that it's high depth with a lot
of jobs to keep most CPUs loaded. My bash script is different from
yours; I'll try yours and see if it helps here.

Actually, I take that back. You're not using O_DIRECT, hence all your
jobs are running at QD=1, not the 256 specified. That looks odd, but
I'll try it; maybe it'll hit something different.
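
Some context on the QD remark: fio's `direct=1` option selects non-buffered I/O (usually O_DIRECT). With buffered I/O, Linux native AIO submissions are generally completed synchronously inside io_submit(), so the configured iodepth is never actually reached. Bypassing the page cache, in turn, imposes alignment rules on the application. The sketch below shows those checks; the helper names are hypothetical, and the alignment value is an assumption standing in for the device's logical block size.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Check the usual O_DIRECT constraints: buffer address, file offset and
 * transfer length must all be multiples of the alignment (assumed here to
 * be the device's logical block size, e.g. 512 or 4096 bytes). */
static int dio_aligned(const void *buf, off_t off, size_t len, size_t align)
{
    assert(align > 0 && (align & (align - 1)) == 0); /* power of two */
    return (uintptr_t)buf % align == 0 &&
           off % (off_t)align == 0 &&
           len % align == 0;
}

/* Allocate a buffer that satisfies the address constraint. */
static void *dio_alloc(size_t len, size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, len) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: read len bytes at off from path, bypassing the
 * page cache. Returns bytes read, or -1 on error or misalignment. */
static ssize_t dio_pread(const char *path, off_t off, size_t len, size_t align)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    void *buf = dio_alloc(len, align);
    ssize_t n = -1;
    if (buf && dio_aligned(buf, off, len, align))
        n = pread(fd, buf, len, off);

    free(buf); /* free(NULL) is a no-op */
    close(fd);
    return n;
}
```

This only covers alignment; keeping 256 requests in flight additionally requires an asynchronous submission interface (fio's libaio engine, for example) on top of O_DIRECT.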

Can you try this patch? It's not perfect, but I'd be interested to know
whether it makes a difference for you.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 758a9b5..41def54 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -810,11 +810,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
        struct list_head *dptr;
        int queued;

-       WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
-
        if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
                return;

+       WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask));
+
        hctx->run++;

        /*
@@ -1075,15 +1075,11 @@ static void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx,
 }

 void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
-               bool async)
+                          bool async)
 {
+       struct blk_mq_ctx *ctx = rq->mq_ctx;
        struct request_queue *q = rq->q;
        struct blk_mq_hw_ctx *hctx;
-       struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
-
-       current_ctx = blk_mq_get_ctx(q);
-       if (!cpu_online(ctx->cpu))
-               rq->mq_ctx = ctx = current_ctx;

        hctx = q->mq_ops->map_queue(q, ctx->cpu);

@@ -1093,8 +1089,6 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,

        if (run_queue)
                blk_mq_run_hw_queue(hctx, async);
-
-       blk_mq_put_ctx(current_ctx);
 }

 static void blk_mq_insert_requests(struct request_queue *q,
@@ -1105,14 +1099,9 @@ static void blk_mq_insert_requests(struct request_queue *q,

 {
        struct blk_mq_hw_ctx *hctx;
-       struct blk_mq_ctx *current_ctx;

        trace_block_unplug(q, depth, !from_schedule);

-       current_ctx = blk_mq_get_ctx(q);
-
-       if (!cpu_online(ctx->cpu))
-               ctx = current_ctx;
        hctx = q->mq_ops->map_queue(q, ctx->cpu);

        /*
@@ -1125,14 +1114,12 @@ static void blk_mq_insert_requests(struct request_queue *q,

                rq = list_first_entry(list, struct request, queuelist);
                list_del_init(&rq->queuelist);
-               rq->mq_ctx = ctx;
                __blk_mq_insert_req_list(hctx, ctx, rq, false);
        }
        blk_mq_hctx_mark_pending(hctx, ctx);
        spin_unlock(&ctx->lock);

        blk_mq_run_hw_queue(hctx, from_schedule);
-       blk_mq_put_ctx(current_ctx);
 }

 static int plug_ctx_cmp(void *priv, struct list_head *a, struct list_head *b)
@@ -1692,6 +1679,11 @@ static int blk_mq_hctx_cpu_offline(struct blk_mq_hw_ctx *hctx, int cpu)
        while (!list_empty(&tmp)) {
                struct request *rq;

+               /*
+                * FIXME: we can't just move the req here. We'd have to
+                * pull off the bio chain and add it to a new request
+                * on the target hw queue
+                */
                rq = list_first_entry(&tmp, struct request, queuelist);
                rq->mq_ctx = ctx;
                list_move_tail(&rq->queuelist, &ctx->rq_list);

--
Jens Axboe
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
