Re: Device or HBA level QD throttling creates randomness in sequential workload
On 01/30/2017 11:28 AM, Kashyap Desai wrote:
>> -----Original Message-----
>> From: Jens Axboe [mailto:ax...@kernel.dk]
>> Sent: Monday, January 30, 2017 10:03 PM
>> To: Bart Van Assche; osan...@osandov.com; kashyap.de...@broadcom.com
>> Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org;
>> h...@infradead.org; linux-bl...@vger.kernel.org; paolo.vale...@linaro.org
>> Subject: Re: Device or HBA level QD throttling creates randomness in
>> sequential workload
>>
>> On 01/30/2017 09:30 AM, Bart Van Assche wrote:
>>> On Mon, 2017-01-30 at 19:22 +0530, Kashyap Desai wrote:
>>>> -	if (atomic_inc_return(&instance->fw_outstanding) >
>>>> -			instance->host->can_queue) {
>>>> -		atomic_dec(&instance->fw_outstanding);
>>>> -		return SCSI_MLQUEUE_HOST_BUSY;
>>>> -	}
>>>> +	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
>>>> +		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
>>>> +		/* For rotational device wait for sometime to get fusion command
>>>> +		 * from pool. This is just to reduce proactive re-queue at mid
>>>> +		 * layer which is not sending sorted IO in SCSI.MQ mode.
>>>> +		 */
>>>> +		if (!is_nonrot)
>>>> +			udelay(100);
>>>> +	}
>>>
>>> The SCSI core does not allow to sleep inside the queuecommand()
>>> callback function.
>>
>> udelay() is a busy loop, so it's not sleeping. That said, it's obviously
>> NOT a great idea. We want to fix the reordering due to requeues, not
>> introduce random busy delays to work around it.
>
> Thanks for the feedback. I do realize that udelay() would be very odd in
> the queuecommand() callback; I will keep this in mind. The preferred
> solution is the blk-mq scheduler patches.

It's coming in 4.11, so you don't have to wait long.

--
Jens Axboe
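For readers following the thread: the blk-mq IO scheduler work referenced
here landed in 4.11, which added (among others) the mq-deadline scheduler.
A minimal sketch of selecting it, assuming a 4.11+ kernel; the device name
is illustrative:

    # List the schedulers available on a blk-mq queue (brackets mark the active one)
    cat /sys/block/sdy/queue/scheduler
    # Switch to mq-deadline, which restores request sorting/merging for rotational media
    echo mq-deadline > /sys/block/sdy/queue/scheduler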
RE: Device or HBA level QD throttling creates randomness in sequential workload
> -----Original Message-----
> From: Jens Axboe [mailto:ax...@kernel.dk]
> Sent: Monday, January 30, 2017 10:03 PM
> To: Bart Van Assche; osan...@osandov.com; kashyap.de...@broadcom.com
> Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org;
> h...@infradead.org; linux-bl...@vger.kernel.org; paolo.vale...@linaro.org
> Subject: Re: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> On 01/30/2017 09:30 AM, Bart Van Assche wrote:
> > On Mon, 2017-01-30 at 19:22 +0530, Kashyap Desai wrote:
> >> -	if (atomic_inc_return(&instance->fw_outstanding) >
> >> -			instance->host->can_queue) {
> >> -		atomic_dec(&instance->fw_outstanding);
> >> -		return SCSI_MLQUEUE_HOST_BUSY;
> >> -	}
> >> +	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
> >> +		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
> >> +		/* For rotational device wait for sometime to get fusion command
> >> +		 * from pool. This is just to reduce proactive re-queue at mid
> >> +		 * layer which is not sending sorted IO in SCSI.MQ mode.
> >> +		 */
> >> +		if (!is_nonrot)
> >> +			udelay(100);
> >> +	}
> >
> > The SCSI core does not allow to sleep inside the queuecommand()
> > callback function.
>
> udelay() is a busy loop, so it's not sleeping. That said, it's obviously
> NOT a great idea. We want to fix the reordering due to requeues, not
> introduce random busy delays to work around it.

Thanks for the feedback. I do realize that udelay() would be very odd in
the queuecommand() callback; I will keep this in mind. The preferred
solution is the blk-mq scheduler patches.

>
> --
> Jens Axboe
Re: Device or HBA level QD throttling creates randomness in sequential workload
On 01/30/2017 09:30 AM, Bart Van Assche wrote:
> On Mon, 2017-01-30 at 19:22 +0530, Kashyap Desai wrote:
>> -	if (atomic_inc_return(&instance->fw_outstanding) >
>> -			instance->host->can_queue) {
>> -		atomic_dec(&instance->fw_outstanding);
>> -		return SCSI_MLQUEUE_HOST_BUSY;
>> -	}
>> +	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
>> +		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
>> +		/* For rotational device wait for sometime to get fusion command
>> +		 * from pool. This is just to reduce proactive re-queue at mid
>> +		 * layer which is not sending sorted IO in SCSI.MQ mode.
>> +		 */
>> +		if (!is_nonrot)
>> +			udelay(100);
>> +	}
>
> The SCSI core does not allow to sleep inside the queuecommand() callback
> function.

udelay() is a busy loop, so it's not sleeping. That said, it's obviously
NOT a great idea. We want to fix the reordering due to requeues, not
introduce random busy delays to work around it.

--
Jens Axboe
Re: Device or HBA level QD throttling creates randomness in sequential workload
On Mon, 2017-01-30 at 19:22 +0530, Kashyap Desai wrote:
> -	if (atomic_inc_return(&instance->fw_outstanding) >
> -			instance->host->can_queue) {
> -		atomic_dec(&instance->fw_outstanding);
> -		return SCSI_MLQUEUE_HOST_BUSY;
> -	}
> +	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
> +		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
> +		/* For rotational device wait for sometime to get fusion command
> +		 * from pool. This is just to reduce proactive re-queue at mid
> +		 * layer which is not sending sorted IO in SCSI.MQ mode.
> +		 */
> +		if (!is_nonrot)
> +			udelay(100);
> +	}

The SCSI core does not allow to sleep inside the queuecommand() callback
function.

Bart.
RE: Device or HBA level QD throttling creates randomness in sequential workload
Hi Jens/Omar,

I used the git.kernel.dk/linux-block branch blk-mq-sched (commit
0efe27068ecf37ece2728a99b863763286049ab5) and can confirm that the issue
reported in this thread is resolved. Both MQ and SQ mode now result in a
sequential IO pattern while IO is getting re-queued in the block layer.

To get similar performance without the blk-mq-sched feature, is it
reasonable to pause IO for a few usec in the LLD? I mean, I want to avoid
the driver asking the SML/block layer to re-queue the IO (if it is
sequential on rotational media).

Explaining w.r.t. the megaraid_sas driver: this driver exposes can_queue,
but it internally consumes commands for RAID 1 and the fast path. In the
worst case, can_queue/2 will consume all firmware resources and the
driver will re-queue further IOs to the SML as below:

	if (atomic_inc_return(&instance->fw_outstanding) >
			instance->host->can_queue) {
		atomic_dec(&instance->fw_outstanding);
		return SCSI_MLQUEUE_HOST_BUSY;
	}

I want to avoid the above SCSI_MLQUEUE_HOST_BUSY. I need your suggestion
for the changes below:

diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 9a9c84f..a683eb0 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include

 #include "megaraid_sas_fusion.h"
 #include "megaraid_sas.h"
@@ -2572,7 +2573,15 @@ void megasas_prepare_secondRaid1_IO(struct megasas_instance *instance,
 	struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
 	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
 	u32 index;
-	struct fusion_context *fusion;
+	bool is_nonrot;
+	u32 safe_can_queue;
+	u32 num_cpus;
+	struct fusion_context *fusion;
+
+	fusion = instance->ctrl_context;
+
+	num_cpus = num_online_cpus();
+	safe_can_queue = instance->cur_can_queue - num_cpus;

 	fusion = instance->ctrl_context;

@@ -2584,11 +2593,15 @@ void megasas_prepare_secondRaid1_IO(struct megasas_instance *instance,
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 	}

-	if (atomic_inc_return(&instance->fw_outstanding) >
-			instance->host->can_queue) {
-		atomic_dec(&instance->fw_outstanding);
-		return SCSI_MLQUEUE_HOST_BUSY;
-	}
+	if (atomic_inc_return(&instance->fw_outstanding) > safe_can_queue) {
+		is_nonrot = blk_queue_nonrot(scmd->device->request_queue);
+		/* For rotational device wait for sometime to get fusion command
+		 * from pool. This is just to reduce proactive re-queue at mid
+		 * layer which is not sending sorted IO in SCSI.MQ mode.
+		 */
+		if (!is_nonrot)
+			udelay(100);
+	}

 	cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);

` Kashyap

> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.de...@broadcom.com]
> Sent: Tuesday, November 01, 2016 11:11 AM
> To: 'Jens Axboe'; 'Omar Sandoval'
> Cc: 'linux-scsi@vger.kernel.org'; 'linux-ker...@vger.kernel.org'; 'linux-
> bl...@vger.kernel.org'; 'Christoph Hellwig'; 'paolo.vale...@linaro.org'
> Subject: RE: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> Jens - Replied inline.
>
> Omar - I tested your WIP repo and figured out that the system hangs only
> if I pass "scsi_mod.use_blk_mq=Y". Without this, your WIP branch works
> fine, but I am looking for scsi_mod.use_blk_mq=Y.
>
> Also below is a snippet of blktrace. In the case of a higher per-device
> QD, I see Requeue (R) requests in blktrace.
>
> 65,128 10     6268     2.432404509 18594  P   N [fio]
> 65,128 10     6269     2.432405013 18594  U   N [fio] 1
> 65,128 10     6270     2.432405143 18594  I  WS 148800 + 8 [fio]
> 65,128 10     6271     2.432405740 18594  R  WS 148800 + 8 [0]
> 65,128 10     6272     2.432409794 18594  Q  WS 148808 + 8 [fio]
> 65,128 10     6273     2.432410234 18594  G  WS 148808 + 8 [fio]
> 65,128 10     6274     2.432410424 18594  S  WS 148808 + 8 [fio]
> 65,128 23     3626     2.432432595 16232  D  WS 148800 + 8 [kworker/23:1H]
> 65,128 22     3279     2.432973482     0  C  WS 147432 + 8 [0]
> 65,128  7     6126     2.433032637 18594  P   N [fio]
> 65,128  7     6127     2.433033204 18594  U   N [fio] 1
> 65,128  7     6128     2.433033346 18594  I  WS 148808 + 8 [fio]
> 65,128  7     6129     2.433033871 18594  D  WS 148808 + 8 [fio]
> 65,128  7     6130     2.433034559 18594  R  WS 148808 + 8 [0]
> 65,128  7     6131     2.433039796 18594  Q  WS 148816 + 8 [fio]
> 65,128  7     6132     2.433040206 18594  G  WS 148816 + 8 [fio]
> 65,128  7     6133     2.433040351 18594  S  WS 148816 + 8 [fio]
> 65,128  9     6392     2.433133729     0  C  WS 147240 + 8 [0]
> 65,128  9     6393
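For reference on the can_queue/safe_can_queue arithmetic above: both queue
depths involved are visible in sysfs at runtime. A sketch; the host and
device names are illustrative:

    # HBA-level queue depth the LLD exposes to the SCSI midlayer (can_queue)
    cat /sys/class/scsi_host/host18/can_queue
    # Per-device queue depth that sdev-level throttling enforces
    cat /sys/block/sdy/device/queue_depth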
RE: Device or HBA level QD throttling creates randomness in sequential workload
Jens - Replied inline.

Omar - I tested your WIP repo and figured out that the system hangs only
if I pass "scsi_mod.use_blk_mq=Y". Without this, your WIP branch works
fine, but I am looking for scsi_mod.use_blk_mq=Y.

Also below is a snippet of blktrace. In the case of a higher per-device
QD, I see Requeue (R) requests in blktrace.

65,128 10     6268     2.432404509 18594  P   N [fio]
65,128 10     6269     2.432405013 18594  U   N [fio] 1
65,128 10     6270     2.432405143 18594  I  WS 148800 + 8 [fio]
65,128 10     6271     2.432405740 18594  R  WS 148800 + 8 [0]
65,128 10     6272     2.432409794 18594  Q  WS 148808 + 8 [fio]
65,128 10     6273     2.432410234 18594  G  WS 148808 + 8 [fio]
65,128 10     6274     2.432410424 18594  S  WS 148808 + 8 [fio]
65,128 23     3626     2.432432595 16232  D  WS 148800 + 8 [kworker/23:1H]
65,128 22     3279     2.432973482     0  C  WS 147432 + 8 [0]
65,128  7     6126     2.433032637 18594  P   N [fio]
65,128  7     6127     2.433033204 18594  U   N [fio] 1
65,128  7     6128     2.433033346 18594  I  WS 148808 + 8 [fio]
65,128  7     6129     2.433033871 18594  D  WS 148808 + 8 [fio]
65,128  7     6130     2.433034559 18594  R  WS 148808 + 8 [0]
65,128  7     6131     2.433039796 18594  Q  WS 148816 + 8 [fio]
65,128  7     6132     2.433040206 18594  G  WS 148816 + 8 [fio]
65,128  7     6133     2.433040351 18594  S  WS 148816 + 8 [fio]
65,128  9     6392     2.433133729     0  C  WS 147240 + 8 [0]
65,128  9     6393     2.433138166   905  D  WS 148808 + 8 [kworker/9:1H]
65,128  7     6134     2.433167450 18594  P   N [fio]
65,128  7     6135     2.433167911 18594  U   N [fio] 1
65,128  7     6136     2.433168074 18594  I  WS 148816 + 8 [fio]
65,128  7     6137     2.433168492 18594  D  WS 148816 + 8 [fio]
65,128  7     6138     2.433174016 18594  Q  WS 148824 + 8 [fio]
65,128  7     6139     2.433174282 18594  G  WS 148824 + 8 [fio]
65,128  7     6140     2.433174613 18594  S  WS 148824 + 8 [fio]

CPU0 (sdy):
 Reads Queued:           0,        0KiB   Writes Queued:          79,      316KiB
 Read Dispatches:        0,        0KiB   Write Dispatches:       67, 18,446,744,073PiB
 Reads Requeued:         0                Writes Requeued:        86
 Reads Completed:        0,        0KiB   Writes Completed:       98,      392KiB
 Read Merges:            0,        0KiB   Write Merges:            0,        0KiB
 Read depth:             0                Write depth:             5
 IO unplugs:            79                Timer unplugs:           0

` Kashyap

> -----Original Message-----
> From: Jens Axboe [mailto:ax...@kernel.dk]
> Sent: Monday, October 31, 2016 10:54 PM
> To: Kashyap Desai; Omar Sandoval
> Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> bl...@vger.kernel.org; Christoph Hellwig; paolo.vale...@linaro.org
> Subject: Re: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> Hi,
>
> One guess would be that this isn't around a requeue condition, but
> rather the fact that we don't really guarantee any sort of hard FIFO
> behavior between the software queues. Can you try this test patch to
> see if it changes the behavior for you? Warning: untested...

Jens - I tested the patch, but I still see a random IO pattern for an
expected sequential run. I am intentionally running the re-queue case and
seeing the issue at the time of re-queue. If there is no requeue, I see
no issue at the LLD.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f3d27a6dee09..5404ca9c71b2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -772,6 +772,14 @@ static inline unsigned int queued_to_index(unsigned int queued)
>  	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
>  }
>
> +static int rq_pos_cmp(void *priv, struct list_head *a, struct list_head *b)
> +{
> +	struct request *rqa = container_of(a, struct request, queuelist);
> +	struct request *rqb = container_of(b, struct request, queuelist);
> +
> +	return blk_rq_pos(rqa) < blk_rq_pos(rqb);
> +}
> +
>  /*
>   * Run this hardware queue, pulling any software queues mapped to it in.
>   * Note that this function currently has various problems around ordering
> @@ -812,6 +820,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
>  	}
>
>  	/*
> +	 * If the device is rotational, sort the list sanely to avoid
> +	 * unnecessary seeks. The software queues are roughly FIFO, but
> +	 * only roughly, there are no hard guarantees.
> +	 */
> +	if (!blk_queue_nonrot(q))
> +		list_sort(NULL, &rq_list, rq_pos_cmp);
> +
> +	/*
>  	 * Start off with dptr being NULL, so we start the first request
>  	 * immediately, even if we have more pending.
>  	 */
>
> --
> Jens Axboe
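A trace like the snippet above, including the per-CPU summary with the
"Writes Requeued" count, can be captured with blktrace and blkparse while
the fio job runs. A sketch; the device name and duration are illustrative:

    # Trace /dev/sdy for 30 seconds, then decode; blkparse ends with
    # per-CPU summaries like the "CPU0 (sdy)" block quoted above
    blktrace -d /dev/sdy -w 30 -o seqtrace
    blkparse -i seqtrace | less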
Re: Device or HBA level QD throttling creates randomness in sequential workload
Hi,

One guess would be that this isn't around a requeue condition, but rather
the fact that we don't really guarantee any sort of hard FIFO behavior
between the software queues. Can you try this test patch to see if it
changes the behavior for you? Warning: untested...

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3d27a6dee09..5404ca9c71b2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -772,6 +772,14 @@ static inline unsigned int queued_to_index(unsigned int queued)
 	return min(BLK_MQ_MAX_DISPATCH_ORDER - 1, ilog2(queued) + 1);
 }

+static int rq_pos_cmp(void *priv, struct list_head *a, struct list_head *b)
+{
+	struct request *rqa = container_of(a, struct request, queuelist);
+	struct request *rqb = container_of(b, struct request, queuelist);
+
+	return blk_rq_pos(rqa) < blk_rq_pos(rqb);
+}
+
 /*
  * Run this hardware queue, pulling any software queues mapped to it in.
  * Note that this function currently has various problems around ordering
@@ -812,6 +820,14 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	}

 	/*
+	 * If the device is rotational, sort the list sanely to avoid
+	 * unnecessary seeks. The software queues are roughly FIFO, but
+	 * only roughly, there are no hard guarantees.
+	 */
+	if (!blk_queue_nonrot(q))
+		list_sort(NULL, &rq_list, rq_pos_cmp);
+
+	/*
 	 * Start off with dptr being NULL, so we start the first request
 	 * immediately, even if we have more pending.
 	 */

--
Jens Axboe
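To try a test patch like this against a linux-block checkout, the flow is
roughly the following. A sketch; the patch file name is hypothetical:

    # Apply the diff above to the tree it was generated against, rebuild, reboot
    cd linux-block
    git apply blk-mq-list-sort.patch   # hypothetical file name for the patch above
    make -j"$(nproc)" && make modules_install && make install
    # then boot the test kernel and rerun the sequential fio job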
Re: Device or HBA level QD throttling creates randomness in sequential workload
On Tue, Oct 25, 2016 at 12:24:24AM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Omar Sandoval [mailto:osan...@osandov.com]
> > Sent: Monday, October 24, 2016 9:11 PM
> > To: Kashyap Desai
> > Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> > bl...@vger.kernel.org; ax...@kernel.dk; Christoph Hellwig;
> > paolo.vale...@linaro.org
> > Subject: Re: Device or HBA level QD throttling creates randomness in
> > sequential workload
> >
> > On Mon, Oct 24, 2016 at 06:35:01PM +0530, Kashyap Desai wrote:
> > > > On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> > > > > Hi -
> > > > >
> > > > > I found the conversation below, and it is along the same line as
> > > > > the input I wanted from the mailing list.
> > > > >
> > > > > http://marc.info/?l=linux-kernel&m=147569860526197&w=2
> > > > >
> > > > > I can do testing on any WIP item as Omar mentioned in the above
> > > > > discussion.
> > > > > https://github.com/osandov/linux/tree/blk-mq-iosched
> > >
> > > I tried building a kernel using this repo, but it looks like it is
> > > not able to boot due to some changes in the layer.
> >
> > Did you build the most up-to-date version of that branch? I've been
> > force pushing to it, so the commit id that you built would be useful.
> > What boot failure are you seeing?
>
> Below is the latest commit on the repo:
>
> commit b077a9a5149f17ccdaa86bc6346fa256e3c1feda
> Author: Omar Sandoval
> Date:   Tue Sep 20 11:20:03 2016 -0700
>
>     [WIP] blk-mq: limit bio queue depth
>
> I have the latest repo from 4.9/scsi-next maintained by Martin, which
> boots fine. The only delta is that "CONFIG_SBITMAP" is enabled in the
> WIP blk-mq-iosched branch. I could not get any meaningful data on the
> boot hang, so I am going to try one more time tomorrow.

The blk-mq-bio-queueing branch has the latest work there separated out.
Not sure that it'll help in this case.

> > > > Are you using blk-mq for this disk? If not, then the work there
> > > > won't affect you.
> > >
> > > YES. I am using blk-mq for my test. I also confirm that if
> > > use_blk_mq is disabled, the sequential workload issue is not seen
> > > and scheduling works well.
> >
> > Ah, okay, perfect. Can you send the fio job file you're using? Hard to
> > tell exactly what's going on without the details. A sequential
> > workload with just one submitter is about as easy as it gets, so this
> > _should_ be behaving nicely.
>
> ; setup numa policy for each thread
> ; 'numactl --show' to determine the maximum numa nodes
> [global]
> ioengine=libaio
> buffered=0
> rw=write
> bssplit=4K/100
> iodepth=256
> numjobs=1
> direct=1
> runtime=60s
> allow_mounted_write=0
>
> [job1]
> filename=/dev/sdd
> ..
> [job24]
> filename=/dev/sdaa

Okay, so you have one high-iodepth job per disk, got it.

> When I set /sys/module/scsi_mod/parameters/use_blk_mq = 1, below is the
> io scheduler detail (it is in blk-mq mode):
> /sys/devices/pci:00/:00:02.0/:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:none
>
> When I set /sys/module/scsi_mod/parameters/use_blk_mq = 0, the io
> scheduler picked by the SML is:
> /sys/devices/pci:00/:00:02.0/:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:noop deadline [cfq]
>
> I see blk-mq performance is very low for the sequential write workload,
> and I confirm that blk-mq converts the sequential workload into a
> random stream due to the io scheduler change in blk-mq vs. the legacy
> block layer.

Since this happens when the fio iodepth exceeds the per-device QD, my
best guess is that requests are getting requeued and scrambled when that
happens. Do you have the blktrace lying around?

> > > > > Is there any workaround/alternative in the latest upstream
> > > > > kernel, if a user wants to see limited penalty for a sequential
> > > > > workload on HDD?
> > > > >
> > > > > ` Kashyap
> >
> > P.S., your emails are being marked as spam by Gmail. Actually, Gmail
> > seems to mark just about everything I get from Broadcom as spam due
> > to failed DMARC.
> >
> > --
> > Omar

--
Omar
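Note for anyone reproducing the two modes compared above: use_blk_mq is a
module parameter, so it is set at module load or boot time rather than at
runtime. A sketch; the device name is illustrative:

    # e.g. add scsi_mod.use_blk_mq=Y to the kernel command line, then verify:
    cat /sys/module/scsi_mod/parameters/use_blk_mq
    # blk-mq of this era reports "none"; the legacy path reports e.g. "noop deadline [cfq]"
    cat /sys/block/sdq/queue/scheduler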
RE: Device or HBA level QD throttling creates randomness in sequential workload
> -----Original Message-----
> From: Omar Sandoval [mailto:osan...@osandov.com]
> Sent: Monday, October 24, 2016 9:11 PM
> To: Kashyap Desai
> Cc: linux-scsi@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> bl...@vger.kernel.org; ax...@kernel.dk; Christoph Hellwig;
> paolo.vale...@linaro.org
> Subject: Re: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> On Mon, Oct 24, 2016 at 06:35:01PM +0530, Kashyap Desai wrote:
> > > On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> > > > Hi -
> > > >
> > > > I found the conversation below, and it is along the same line as
> > > > the input I wanted from the mailing list.
> > > >
> > > > http://marc.info/?l=linux-kernel&m=147569860526197&w=2
> > > >
> > > > I can do testing on any WIP item as Omar mentioned in the above
> > > > discussion.
> > > > https://github.com/osandov/linux/tree/blk-mq-iosched
> >
> > I tried building a kernel using this repo, but it looks like it is not
> > able to boot due to some changes in the layer.
>
> Did you build the most up-to-date version of that branch? I've been
> force pushing to it, so the commit id that you built would be useful.
> What boot failure are you seeing?

Below is the latest commit on the repo:

commit b077a9a5149f17ccdaa86bc6346fa256e3c1feda
Author: Omar Sandoval
Date:   Tue Sep 20 11:20:03 2016 -0700

    [WIP] blk-mq: limit bio queue depth

I have the latest repo from 4.9/scsi-next maintained by Martin, which
boots fine. The only delta is that "CONFIG_SBITMAP" is enabled in the WIP
blk-mq-iosched branch. I could not get any meaningful data on the boot
hang, so I am going to try one more time tomorrow.

> > > Are you using blk-mq for this disk? If not, then the work there
> > > won't affect you.
> >
> > YES. I am using blk-mq for my test. I also confirm that if use_blk_mq
> > is disabled, the sequential workload issue is not seen and scheduling
> > works well.
>
> Ah, okay, perfect. Can you send the fio job file you're using? Hard to
> tell exactly what's going on without the details. A sequential workload
> with just one submitter is about as easy as it gets, so this _should_
> be behaving nicely.

; setup numa policy for each thread
; 'numactl --show' to determine the maximum numa nodes
[global]
ioengine=libaio
buffered=0
rw=write
bssplit=4K/100
iodepth=256
numjobs=1
direct=1
runtime=60s
allow_mounted_write=0

[job1]
filename=/dev/sdd
..
[job24]
filename=/dev/sdaa

When I set /sys/module/scsi_mod/parameters/use_blk_mq = 1, below is the
io scheduler detail (it is in blk-mq mode):

/sys/devices/pci:00/:00:02.0/:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:none

When I set /sys/module/scsi_mod/parameters/use_blk_mq = 0, the io
scheduler picked by the SML is:

/sys/devices/pci:00/:00:02.0/:02:00.0/host10/target10:2:13/10:2:13:0/block/sdq/queue/scheduler:noop deadline [cfq]

I see blk-mq performance is very low for the sequential write workload,
and I confirm that blk-mq converts the sequential workload into a random
stream due to the io scheduler change in blk-mq vs. the legacy block
layer.

> > > > Is there any workaround/alternative in the latest upstream
> > > > kernel, if a user wants to see limited penalty for a sequential
> > > > workload on HDD?
> > > >
> > > > ` Kashyap
>
> P.S., your emails are being marked as spam by Gmail. Actually, Gmail
> seems to mark just about everything I get from Broadcom as spam due to
> failed DMARC.
>
> --
> Omar
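With 24 JBOD targets in the job file, the active scheduler on every device
can be confirmed in one pass rather than checking sysfs paths by hand. A
sketch, assuming bash and the sdd..sdaa device naming from the job file:

    for dev in /sys/block/sd{d..z} /sys/block/sdaa; do
        [ -e "$dev" ] && echo "$dev: $(cat "$dev/queue/scheduler")"
    done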
Re: Device or HBA level QD throttling creates randomness in sequential workload
On Mon, Oct 24, 2016 at 06:35:01PM +0530, Kashyap Desai wrote:
> > On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> > > Hi -
> > >
> > > I found the conversation below, and it is along the same line as the
> > > input I wanted from the mailing list.
> > >
> > > http://marc.info/?l=linux-kernel&m=147569860526197&w=2
> > >
> > > I can do testing on any WIP item as Omar mentioned in the above
> > > discussion.
> > > https://github.com/osandov/linux/tree/blk-mq-iosched
>
> I tried building a kernel using this repo, but it looks like it is not
> able to boot due to some changes in the layer.

Did you build the most up-to-date version of that branch? I've been force
pushing to it, so the commit id that you built would be useful.
What boot failure are you seeing?

> > Are you using blk-mq for this disk? If not, then the work there won't
> > affect you.
>
> YES. I am using blk-mq for my test. I also confirm that if use_blk_mq
> is disabled, the sequential workload issue is not seen and scheduling
> works well.

Ah, okay, perfect. Can you send the fio job file you're using? Hard to
tell exactly what's going on without the details. A sequential workload
with just one submitter is about as easy as it gets, so this _should_ be
behaving nicely.

> > > Is there any workaround/alternative in the latest upstream kernel,
> > > if a user wants to see limited penalty for a sequential workload on
> > > HDD?
> > >
> > > ` Kashyap

P.S., your emails are being marked as spam by Gmail. Actually, Gmail
seems to mark just about everything I get from Broadcom as spam due to
failed DMARC.

--
Omar
RE: Device or HBA level QD throttling creates randomness in sequential workload
> On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> > Hi -
> >
> > I found the conversation below, and it is along the same line as the
> > input I wanted from the mailing list.
> >
> > http://marc.info/?l=linux-kernel&m=147569860526197&w=2
> >
> > I can do testing on any WIP item as Omar mentioned in the above
> > discussion.
> > https://github.com/osandov/linux/tree/blk-mq-iosched

I tried building a kernel using this repo, but it looks like it is not
able to boot due to some changes in the layer.

> Are you using blk-mq for this disk? If not, then the work there won't
> affect you.

YES. I am using blk-mq for my test. I also confirm that if use_blk_mq is
disabled, the sequential workload issue is not seen and scheduling works
well.

> > Is there any workaround/alternative in the latest upstream kernel, if
> > a user wants to see limited penalty for a sequential workload on HDD?
> >
> > ` Kashyap
Re: Device or HBA level QD throttling creates randomness in sequential workload
On Fri, Oct 21, 2016 at 05:43:35PM +0530, Kashyap Desai wrote:
> Hi -
>
> I found the conversation below, and it is along the same line as the
> input I wanted from the mailing list.
>
> http://marc.info/?l=linux-kernel&m=147569860526197&w=2
>
> I can do testing on any WIP item as Omar mentioned in the above
> discussion.
> https://github.com/osandov/linux/tree/blk-mq-iosched

Are you using blk-mq for this disk? If not, then the work there won't
affect you.

> Is there any workaround/alternative in the latest upstream kernel, if a
> user wants to see limited penalty for a sequential workload on HDD?
>
> ` Kashyap
>
> > -----Original Message-----
> > From: Kashyap Desai [mailto:kashyap.de...@broadcom.com]
> > Sent: Thursday, October 20, 2016 3:39 PM
> > To: linux-scsi@vger.kernel.org
> > Subject: Device or HBA level QD throttling creates randomness in
> > sequential workload
> >
> > [ Apologies if you find more than one instance of my email.
> > The web-based email client has some issue, so I am now trying git
> > send-email. ]
> >
> > Hi,
> >
> > I am doing some performance tuning in the MR driver to understand how
> > sdev queue depth and HBA queue depth play a role in IO submission
> > from the layer above. I have 24 JBODs connected to an MR 12Gb/s
> > controller, and I see the following performance for a 4K sequential
> > workload.
> >
> > The HBA QD for the MR controller is 4065 and the per-device QD is set
> > to 32:
> >
> > 	queue depth 256 reports 300K IOPS
> > 	queue depth 128 reports 330K IOPS
> > 	queue depth  64 reports 360K IOPS
> > 	queue depth  32 reports 510K IOPS
> >
> > In the MR driver I added a debug print and confirmed that more IO
> > comes to the driver as random IO whenever I have a queue depth
> > greater than 32.
> >
> > I have debugged using the scsi logging level and blktrace as well.
> > Below is a snippet of logs using the scsi logging level. In summary,
> > if the SML does flow control of IO due to the device QD or HBA QD,
> > the IO coming to the LLD has a more random pattern.
> >
> > I see the IO coming to the driver is not sequential.
> >
> > [79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b 00 00 01 00
> > [79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0 3c 00 00 01 00
> > [79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0 5b 00 00 01 00
> >
> > After LBA "00 03 c0 3c" the next command is for LBA "00 03 c0 5b".
> > Two sequences are overlapped due to sdev QD throttling.
> >
> > [79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c 00 00 01 00
> > [79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0 3d 00 00 01 00
> > [79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0 5d 00 00 01 00
> > [79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00
> > [79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00
> >
> > If scsi_request_fn() breaks due to unavailability of the device queue
> > (due to the check below), will there be any side effect, as I observe?
> >
> > 	if (!scsi_dev_queue_ready(q, sdev))
> > 		break;
> >
> > If I reduce the HBA QD and make sure IO from the layer above is
> > throttled due to the HBA QD, there is the same impact.
> > The MR driver uses a host-wide shared tag map.
> >
> > Can someone help me whether this can be tunable in the LLD by
> > providing additional settings, or whether it is expected behavior?
> > The problem I am facing is that I am not able to figure out the
> > optimal device queue depth for different configurations and workloads.
> >
> > Thanks, Kashyap

--
Omar
RE: Device or HBA level QD throttling creates randomness in sequential workload
Hi -

I found the conversation below, and it is along the same line as the
input I wanted from the mailing list.

http://marc.info/?l=linux-kernel&m=147569860526197&w=2

I can do testing on any WIP item as Omar mentioned in the above
discussion.
https://github.com/osandov/linux/tree/blk-mq-iosched

Is there any workaround/alternative in the latest upstream kernel, if a
user wants to see limited penalty for a sequential workload on HDD?

` Kashyap

> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.de...@broadcom.com]
> Sent: Thursday, October 20, 2016 3:39 PM
> To: linux-scsi@vger.kernel.org
> Subject: Device or HBA level QD throttling creates randomness in
> sequential workload
>
> [ Apologies if you find more than one instance of my email.
> The web-based email client has some issue, so I am now trying git
> send-email. ]
>
> Hi,
>
> I am doing some performance tuning in the MR driver to understand how
> sdev queue depth and HBA queue depth play a role in IO submission from
> the layer above. I have 24 JBODs connected to an MR 12Gb/s controller,
> and I see the following performance for a 4K sequential workload.
>
> The HBA QD for the MR controller is 4065 and the per-device QD is set
> to 32:
>
> 	queue depth 256 reports 300K IOPS
> 	queue depth 128 reports 330K IOPS
> 	queue depth  64 reports 360K IOPS
> 	queue depth  32 reports 510K IOPS
>
> In the MR driver I added a debug print and confirmed that more IO comes
> to the driver as random IO whenever I have a queue depth greater than
> 32.
>
> I have debugged using the scsi logging level and blktrace as well.
> Below is a snippet of logs using the scsi logging level. In summary, if
> the SML does flow control of IO due to the device QD or HBA QD, the IO
> coming to the LLD has a more random pattern.
>
> I see the IO coming to the driver is not sequential.
>
> [79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b 00 00 01 00
> [79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0 3c 00 00 01 00
> [79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0 5b 00 00 01 00
>
> After LBA "00 03 c0 3c" the next command is for LBA "00 03 c0 5b".
> Two sequences are overlapped due to sdev QD throttling.
>
> [79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c 00 00 01 00
> [79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0 3d 00 00 01 00
> [79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0 5d 00 00 01 00
> [79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00
> [79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00
>
> If scsi_request_fn() breaks due to unavailability of the device queue
> (due to the check below), will there be any side effect, as I observe?
>
> 	if (!scsi_dev_queue_ready(q, sdev))
> 		break;
>
> If I reduce the HBA QD and make sure IO from the layer above is
> throttled due to the HBA QD, there is the same impact.
> The MR driver uses a host-wide shared tag map.
>
> Can someone help me whether this can be tunable in the LLD by providing
> additional settings, or whether it is expected behavior? The problem I
> am facing is that I am not able to figure out the optimal device queue
> depth for different configurations and workloads.
>
> Thanks, Kashyap
Device or HBA level QD throttling creates randomness in sequential workload
[ Apologies if you find more than one instance of my email.
The web-based email client has some issue, so I am now trying git
send-email. ]

Hi,

I am doing some performance tuning in the MR driver to understand how
sdev queue depth and HBA queue depth play a role in IO submission from
the layer above. I have 24 JBODs connected to an MR 12Gb/s controller,
and I see the following performance for a 4K sequential workload.

The HBA QD for the MR controller is 4065 and the per-device QD is set
to 32:

	queue depth 256 reports 300K IOPS
	queue depth 128 reports 330K IOPS
	queue depth  64 reports 360K IOPS
	queue depth  32 reports 510K IOPS

In the MR driver I added a debug print and confirmed that more IO comes
to the driver as random IO whenever I have a queue depth greater than 32.

I have debugged using the scsi logging level and blktrace as well. Below
is a snippet of logs using the scsi logging level. In summary, if the SML
does flow control of IO due to the device QD or HBA QD, the IO coming to
the LLD has a more random pattern.

I see the IO coming to the driver is not sequential.

[79546.912041] sd 18:2:21:0: [sdy] tag#854 CDB: Write(10) 2a 00 00 03 c0 3b 00 00 01 00
[79546.912049] sd 18:2:21:0: [sdy] tag#855 CDB: Write(10) 2a 00 00 03 c0 3c 00 00 01 00
[79546.912053] sd 18:2:21:0: [sdy] tag#886 CDB: Write(10) 2a 00 00 03 c0 5b 00 00 01 00

After LBA "00 03 c0 3c" the next command is for LBA "00 03 c0 5b".
Two sequences are overlapped due to sdev QD throttling.

[79546.912056] sd 18:2:21:0: [sdy] tag#887 CDB: Write(10) 2a 00 00 03 c0 5c 00 00 01 00
[79546.912250] sd 18:2:21:0: [sdy] tag#856 CDB: Write(10) 2a 00 00 03 c0 3d 00 00 01 00
[79546.912257] sd 18:2:21:0: [sdy] tag#888 CDB: Write(10) 2a 00 00 03 c0 5d 00 00 01 00
[79546.912259] sd 18:2:21:0: [sdy] tag#857 CDB: Write(10) 2a 00 00 03 c0 3e 00 00 01 00
[79546.912268] sd 18:2:21:0: [sdy] tag#858 CDB: Write(10) 2a 00 00 03 c0 3f 00 00 01 00

If scsi_request_fn() breaks due to unavailability of the device queue
(due to the check below), will there be any side effect, as I observe?

	if (!scsi_dev_queue_ready(q, sdev))
		break;

If I reduce the HBA QD and make sure IO from the layer above is throttled
due to the HBA QD, there is the same impact.
The MR driver uses a host-wide shared tag map.

Can someone help me whether this can be tunable in the LLD by providing
additional settings, or whether it is expected behavior? The problem I am
facing is that I am not able to figure out the optimal device queue depth
for different configurations and workloads.

Thanks, Kashyap
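The per-device QD sweep above can be reproduced by writing the sdev queue
depth via sysfs before each run. A sketch; the device list and job file
name are illustrative:

    for qd in 256 128 64 32; do
        for dev in /sys/block/sd{d..z} /sys/block/sdaa; do
            [ -e "$dev" ] && echo "$qd" > "$dev/device/queue_depth"
        done
        fio seqwrite.fio   # hypothetical name for the 4K sequential write job
    done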