Re: [Scst-devel] Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-29 Thread Vladislav Bolkhovitin

Chris Worley, on 10/28/2009 09:47 PM wrote:

It appears that SRP tries to coalesce and fragment initiator I/O
requests into 64KB packets, as that looks to be the size requested
to/from the device on the target side (and the I/O scheduler is
disabled on the target).

Is there a way to control this, where no coalescing occurs when
latency is an issue and requests are small, and no fragmentation
occurs when requests are large?

Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting data?


You can see the size of the requests you are receiving on the target
side at any time, either by enabling "scsi" logging (hopefully, you know
how to do that) or by looking at /proc/scsi_tgt/sgv. The latter file
shows overall statistics for power-of-2 allocations, i.e. a request for
10K will increment the 16K row.
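For example, on the target something like the following (the trace_level
command is from memory and may differ between SCST versions, so treat it
as an assumption and check the docs for your release):

  # per-size allocation statistics; a 10K request shows up in the 16K row
  cat /proc/scsi_tgt/sgv
  # enable "scsi" trace logging so each incoming command is logged
  echo "add scsi" > /proc/scsi_tgt/trace_level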


Vlad



Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread David Dillow
On Wed, 2009-10-28 at 16:25 -0400, Chris Worley wrote:
> On Wed, Oct 28, 2009 at 1:58 PM, David Dillow  wrote:
> > Under noop, the block layer will send requests as soon as it can without
> > merging. If it has more requests outstanding than the queue length on
> > the SRP initiator, then it will merge the new request with the queued
> > ones if possible.
> 
> So, noop will merge requests when the queue is full, but not hold-off
> to merge?

Correct.

> > The SRP initiator just hands off requests as quickly as they are sent to
> > it by the block layer. You can control how big those requests are by
> > tuning /sys/block/$DEV/queue/max_sectors_kb up to .../max_hw_sectors_kb
> > which gets set by the max_sect parameter when adding the SRP target.
> 
> So the block layer may also hold-off on small requests, and decreasing
> max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
> this just used for fragmentation of large requests)?

It is just used for breaking up large requests. The deadline, as, and
cfq schedulers may have some hold-off -- I've not checked -- but noop
does not.

You can check the length of the queue by looking
at /sys/class/scsi_disk/$TARGET/device/queue_depth.
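A quick way to see that for every disk at once (just a convenience, not
anything SRP-specific):

  # print the queue depth for each SCSI disk, prefixed with its sysfs path
  grep . /sys/class/scsi_disk/*/device/queue_depth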

That may well be 63, which is the maximum queue depth for the SRP
initiator unless you patch the source. Keep in mind that those 63
requests are shared across all LUNs on that connection, so you may start
queuing up before you hit that limit if you are driving many LUNs.

> Note that I'm trying to minimize latency for very small requests.

Reads or writes?
Are you doing direct IO or plain read/write?
File system or block device access?
Are you using the SCSI devices (/dev/sda etc) or DM multipath
(/dev/mpath/*)?

The SRP initiator is playing the cards it has been dealt, but you could
be getting coalescing from the rest of the system -- for example, I have
no idea if the SRP target code will do read ahead and turn a 4KB request
into a 64KB one -- I suspect it is possible. You can also turn on SCSI
logging to see what is being handed to the initiator, to be sure on
which side of the connection this is occurring.
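If it helps, the initiator-side knob for that is the SCSI midlayer log
level (requires CONFIG_SCSI_LOGGING; the value is a bitmask of
per-facility levels, and the number below is only an illustrative
assumption -- see the kernel's SCSI logging documentation for the
encoding):

  # current SCSI midlayer logging bitmask
  cat /proc/sys/dev/scsi/logging_level
  # raise it, then watch dmesg for the commands being issued
  echo 1 > /proc/sys/dev/scsi/logging_level
  dmesg | tail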
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office




Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread Chris Worley
On Wed, Oct 28, 2009 at 1:58 PM, David Dillow  wrote:
> On Wed, 2009-10-28 at 13:38 -0600, Chris Worley wrote:
>> There is no scheduler running on either target or initiator on the
>> drives in question (sorry I worded that incorrectly initially), or so
>> I've been told (this information is second-hand).
>
> So, noop scheduler, then?

Yes, "elevator=noop" on both sides.  Again, sorry to be unclear about that.

>
> Under noop, the block layer will send requests as soon as it can without
> merging. If it has more requests outstanding than the queue length on
> the SRP initiator, then it will merge the new request with the queued
> ones if possible.

So, noop will merge requests when the queue is full, but not hold-off to merge?


>
> The SRP initiator just hands off requests as quickly as they are sent to
> it by the block layer. You can control how big those requests are by
> tuning /sys/block/$DEV/queue/max_sectors_kb up to .../max_hw_sectors_kb
> which gets set by the max_sect parameter when adding the SRP target.

So the block layer may also hold-off on small requests, and decreasing
max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
this just used for fragmentation of large requests)?

Note that I'm trying to minimize latency for very small requests.

Thanks,

Chris


Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread David Dillow
On Wed, 2009-10-28 at 13:38 -0600, Chris Worley wrote:
> There is no scheduler running on either target or initiator on the
> drives in question (sorry I worded that incorrectly initially), or so
> I've been told (this information is second-hand). 

So, noop scheduler, then?

Under noop, the block layer will send requests as soon as it can without
merging. If it has more requests outstanding than the queue length on
the SRP initiator, then it will merge the new request with the queued
ones if possible.

>  I did see iostat
> output from the initiator in his case, where there were long waits and
> service times that I'm guessing were due to some coalescing/merging.
> There was also a hint in the iostat output that a scheduler was
> enabled, as there were occasional non-zero values under the
> [rw]rqm/s columns, which, if I understand iostat correctly, means there
> is a scheduler merging requests.
> 
> So you're saying there is no hold-off for merging on the initiator
> side of the IB/SRP stack?

The SRP initiator just hands off requests as quickly as they are sent to
it by the block layer. You can control how big those requests are by
tuning /sys/block/$DEV/queue/max_sectors_kb up to .../max_hw_sectors_kb
which gets set by the max_sect parameter when adding the SRP target.
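For example (sdc is just a placeholder for whatever name the SRP LUN got
on the initiator, and 512 is an arbitrary illustrative value):

  # hard upper bound, derived from the SRP max_sect parameter
  cat /sys/block/sdc/queue/max_hw_sectors_kb
  # current cap on request size; tunable up to the value above
  cat /sys/block/sdc/queue/max_sectors_kb
  echo 512 > /sys/block/sdc/queue/max_sectors_kb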

You can get some hold-off potentially by using a non-noop scheduler for
the block device, see /sys/block/$DEV/queue/scheduler. 'as' or
'deadline' may fit your bill, but they have a habit of breaking up
requests into smaller chunks.
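For example (same placeholder device name as above):

  # the scheduler shown in brackets is the active one
  cat /sys/block/sdc/queue/scheduler
  # switch this device to deadline
  echo deadline > /sys/block/sdc/queue/scheduler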

Also, you want 'options ib_srp srp_sg_tablesize=255'
in /etc/modprobe.conf, as by default it only allows 12 scatter/gather
entries, which will only guarantee a 48KB request size. Using 255
guarantees you can send a 1020KB request. Of course, if the pages
coalesce in the request, you can send much larger requests before
running out of S/G entries. max_sectors_kb will limit what gets sent in
either case.
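For reference, that is a single line in the module configuration, picked
up the next time ib_srp is loaded:

  # /etc/modprobe.conf
  options ib_srp srp_sg_tablesize=255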

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office




Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread Roland Dreier

 > It appears that SRP tries to coalesce and fragment initiator I/O
 > requests into 64KB packets, as that looks to be the size requested
 > to/from the device on the target side (and the I/O scheduler is
 > disabled on the target).

There is no code in the SRP initiator that does anything to change IO
requests that I know of.  So I think this is happening somewhere higher
in the stack.

 - R.


Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread Chris Worley
On Wed, Oct 28, 2009 at 1:14 PM, Bart Van Assche wrote:
> On Wed, Oct 28, 2009 at 7:47 PM, Chris Worley  wrote:
>> It appears that SRP tries to coalesce and fragment initiator I/O
>> requests into 64KB packets, as that looks to be the size requested
>> to/from the device on the target side (and the I/O scheduler is
>> disabled on the target).
>>
>> Is there a way to control this, where no coalescing occurs when
>> latency is an issue and requests are small, and no fragmentation
>> occurs when requests are large?
>>
>> Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting 
>> data?
>
> Regarding avoiding coalescing of I/O requests: which I/O scheduler is
> being used on the initiator system and how has it been configured via
> sysfs?

There is no scheduler running on either target or initiator on the
drives in question (sorry I worded that incorrectly initially), or so
I've been told (this information is second-hand).  I did see iostat
output from the initiator in his case, where there were long waits and
service times that I'm guessing were due to some coalescing/merging.
There was also a hint in the iostat output that a scheduler was
enabled, as there were occasional non-zero values under the
[rw]rqm/s columns, which, if I understand iostat correctly, means there
is a scheduler merging requests.
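For reference, those merge counters come from the extended iostat
output, something like:

  # rrqm/s and wrqm/s are read/write requests merged per second
  iostat -x 1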

So you're saying there is no hold-off for merging on the initiator
side of the IB/SRP stack?
>
> Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might
> help to avoid fragmentation of large requests by the SRP protocol.
> Please post a follow-up message to the mailing list with your findings
> such that MAX_RDMA_SIZE can be converted from a compile-time constant
> to a sysfs variable if this would be useful.

Will do.

Thanks,

Chris
>
> Bart.
>


Re: Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread Bart Van Assche
On Wed, Oct 28, 2009 at 7:47 PM, Chris Worley  wrote:
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).
>
> Is there a way to control this, where no coalescing occurs when
> latency is an issue and requests are small, and no fragmentation
> occurs when requests are large?
>
> Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting 
> data?

Regarding avoiding coalescing of I/O requests: which I/O scheduler is
being used on the initiator system and how has it been configured via
sysfs?

Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might
help to avoid fragmentation of large requests by the SRP protocol.
Please post a follow-up message to the mailing list with your findings
such that MAX_RDMA_SIZE can be converted from a compile-time constant
to a sysfs variable if this would be useful.
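For anyone who wants to try that, a minimal sketch (the exact definition
and its current value may differ in your SCST checkout, so verify before
editing):

  # locate the constant in the SCST SRP target source
  grep -n MAX_RDMA_SIZE scst/srpt/src/ib_srpt.h
  # change the value, rebuild, and reload ib_srpt on the target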

Bart.


Adjusting minimum packet size or "wait to merge requests" in SRP

2009-10-28 Thread Chris Worley
It appears that SRP tries to coalesce and fragment initiator I/O
requests into 64KB packets, as that looks to be the size requested
to/from the device on the target side (and the I/O scheduler is
disabled on the target).

Is there a way to control this, where no coalescing occurs when
latency is an issue and requests are small, and no fragmentation
occurs when requests are large?

Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting data?

Thanks,

Chris