Re: [Scst-devel] Adjusting minimum packet size or "wait to merge requests" in SRP
Chris Worley, on 10/28/2009 09:47 PM wrote:
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).
>
> Is there a way to control this, where no coalescing occurs when
> latency is an issue and requests are small, and no fragmentation
> occurs when requests are large?
>
> Or, am I totally wrong in my assumption that SRP is
> coalescing/fragmenting data?

You can at any time see the size of the requests you are receiving on the
target side, either by enabling "scsi" logging (hopefully, you know how to
do it) or by looking at /proc/scsi_tgt/sgv. In the latter file you will see
general statistics for power-of-2 allocations, i.e. a request for 10K will
increment the 16K row.

Vlad

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
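The power-of-2 accounting Vlad describes can be sketched as a small shell function. Only the bucket math is shown; the layout of /proc/scsi_tgt/sgv itself is SCST-specific and not reproduced here:

```shell
# Sketch of how a request size maps to a row in the sgv statistics:
# sizes are rounded up to the next power-of-2 bucket (in KB).
sgv_bucket_kb() {
    kb=$1
    bucket=1
    while [ "$bucket" -lt "$kb" ]; do
        bucket=$(( bucket * 2 ))
    done
    echo "$bucket"
}

sgv_bucket_kb 10   # a 10K request increments the 16K row -> prints 16
sgv_bucket_kb 64   # a 64K request lands in its own row   -> prints 64
```

So a stream of 10K requests showing up in the 16K row is expected accounting, not evidence of coalescing.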
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
On Wed, 2009-10-28 at 16:25 -0400, Chris Worley wrote:
> On Wed, Oct 28, 2009 at 1:58 PM, David Dillow wrote:
> > Under noop, the block layer will send requests as soon as it can
> > without merging. If it has more requests outstanding than the queue
> > length on the SRP initiator, then it will merge the new request with
> > the queued ones if possible.
>
> So, noop will merge requests when the queue is full, but not hold off
> to merge?

Correct.

> > The SRP initiator just hands off requests as quickly as they are sent
> > to it by the block layer. You can control how big those requests are
> > by tuning /sys/block/$DEV/queue/max_sectors_kb up to
> > .../max_hw_sectors_kb, which gets set by the max_sect parameter when
> > adding the SRP target.
>
> So the block layer may also hold off on small requests, and decreasing
> max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
> this just used for fragmentation of large requests)?

It is just used for breaking up large requests. The deadline, as, and cfq
schedulers may have some hold-off -- I've not checked -- but noop does not.

You can check the length of the queue by looking at
/sys/class/scsi_disk/$TARGET/device/queue_depth. That may well be 63, which
is the maximum queue depth for the SRP initiator unless you patch the
source. Keep in mind that those 63 requests are shared across all LUNs on
that connection, so you may queue up before that if you are driving many
LUNs.

> Note that I'm trying to minimize latency for very small requests.

Reads or writes? Are you doing direct I/O or plain read/write? File system
or block device access? Are you using the SCSI devices (/dev/sda etc.) or
DM multipath (/dev/mpath/*)?

The SRP initiator is playing the cards it has been dealt, but you could be
getting coalescing from the rest of the system -- for example, I have no
idea if the SRP target code will do read-ahead and turn a 4KB request into
a 64KB one -- I suspect it is possible.
You can also turn on SCSI logging to see what is being handed to the
initiator, to be sure on which side of the connection this is occurring.

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
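The tunables Dave mentions can be sketched in shell. `$DEV` and `$TARGET` below are hypothetical placeholder names, the sysfs operations are shown as comments because they need a live SRP-attached device, and only the max_sect arithmetic (counted in 512-byte sectors) actually runs:

```shell
# Hypothetical names: $DEV and $TARGET stand in for a real SRP-attached
# disk and its SCSI address; the sysfs paths exist only on such a system.
DEV=sdb
TARGET=8:0:0:0

# Cap the size of requests the block layer hands to the SRP initiator
# (cannot exceed max_hw_sectors_kb):
#   cat /sys/block/$DEV/queue/max_hw_sectors_kb
#   echo 64 > /sys/block/$DEV/queue/max_sectors_kb
#
# Check the queue depth shared across LUNs on the connection (63 is the
# SRP initiator's maximum):
#   cat /sys/class/scsi_disk/$TARGET/device/queue_depth

# max_hw_sectors_kb derives from the max_sect target parameter, which is
# counted in 512-byte sectors:
max_sect=128
echo "max_sect=$max_sect -> $(( max_sect * 512 / 1024 ))KB"
```

With max_sect=128 the last line reports a 64KB hardware limit, which would match the 64KB transfers observed on the target side.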
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
On Wed, Oct 28, 2009 at 1:58 PM, David Dillow wrote:
> On Wed, 2009-10-28 at 13:38 -0600, Chris Worley wrote:
>> There is no scheduler running on either target or initiator on the
>> drives in question (sorry I worded that incorrectly initially), or so
>> I've been told (this information is second-hand).
>
> So, noop scheduler, then?

Yes, "elevator=noop" on both sides. Again, sorry to be unclear about that.

> Under noop, the block layer will send requests as soon as it can
> without merging. If it has more requests outstanding than the queue
> length on the SRP initiator, then it will merge the new request with
> the queued ones if possible.

So, noop will merge requests when the queue is full, but not hold off to
merge?

> The SRP initiator just hands off requests as quickly as they are sent
> to it by the block layer. You can control how big those requests are by
> tuning /sys/block/$DEV/queue/max_sectors_kb up to
> .../max_hw_sectors_kb, which gets set by the max_sect parameter when
> adding the SRP target.

So the block layer may also hold off on small requests, and decreasing
max_sectors_kb will force it to flush to the SRP initiator ASAP (or is
this just used for fragmentation of large requests)?

Note that I'm trying to minimize latency for very small requests.

Thanks,

Chris
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
On Wed, 2009-10-28 at 13:38 -0600, Chris Worley wrote:
> There is no scheduler running on either target or initiator on the
> drives in question (sorry I worded that incorrectly initially), or so
> I've been told (this information is second-hand).

So, noop scheduler, then?

Under noop, the block layer will send requests as soon as it can without
merging. If it has more requests outstanding than the queue length on the
SRP initiator, then it will merge the new request with the queued ones if
possible.

> I did see iostat output from the initiator in his case, where there
> were long waits and service times that I'm guessing were due to some
> coalescing/merging. There was also a hint in the iostat output that a
> scheduler was enabled, as there were non-zero values (occasionally)
> under the [rw]qm/s columns, which, if I understand iostat correctly,
> means there is a scheduler merging requests.
>
> So you're saying there is no hold-off for merging on the initiator
> side of the IB/SRP stack?

The SRP initiator just hands off requests as quickly as they are sent to it
by the block layer. You can control how big those requests are by tuning
/sys/block/$DEV/queue/max_sectors_kb up to .../max_hw_sectors_kb, which
gets set by the max_sect parameter when adding the SRP target.

You can potentially get some hold-off by using a non-noop scheduler for the
block device; see /sys/block/$DEV/queue/scheduler. 'as' or 'deadline' may
fit your bill, but they have a habit of breaking up requests into smaller
chunks.

Also, you want 'options ib_srp srp_sg_tablesize=255' in /etc/modprobe.conf,
as by default it only allows 12 scatter/gather entries, which will only
guarantee a 48KB request size. Using 255 guarantees you can send a 1020KB
request. Of course, if the pages coalesce in the request, you can send much
larger requests before running out of S/G entries. max_sectors_kb will
limit what gets sent in either case.
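As a rough sketch of the scatter/gather arithmetic above (assuming 4KB pages, as on x86, where each S/G entry is guaranteed to map at least one page):

```shell
# Guaranteed request size from the S/G table: each scatter/gather entry
# maps at least one page; PAGE_KB=4 assumes 4KB pages.
PAGE_KB=4
for entries in 12 255; do
    echo "srp_sg_tablesize=$entries guarantees $(( entries * PAGE_KB ))KB"
done
# 12 entries (the default) guarantee 48KB; 255 entries guarantee 1020KB.
```

Physically contiguous pages can share an entry, which is why larger requests than the guaranteed minimum are often still possible.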
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).

There is no code in the SRP initiator that does anything to change I/O
requests that I know of, so I think this is happening somewhere higher in
the stack.

 - R.
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
On Wed, Oct 28, 2009 at 1:14 PM, Bart Van Assche wrote:
> On Wed, Oct 28, 2009 at 7:47 PM, Chris Worley wrote:
>> It appears that SRP tries to coalesce and fragment initiator I/O
>> requests into 64KB packets, as that looks to be the size requested
>> to/from the device on the target side (and the I/O scheduler is
>> disabled on the target).
>>
>> Is there a way to control this, where no coalescing occurs when
>> latency is an issue and requests are small, and no fragmentation
>> occurs when requests are large?
>>
>> Or, am I totally wrong in my assumption that SRP is
>> coalescing/fragmenting data?
>
> Regarding avoiding coalescing of I/O requests: which I/O scheduler is
> being used on the initiator system, and how has it been configured via
> sysfs?

There is no scheduler running on either target or initiator on the drives
in question (sorry, I worded that incorrectly initially), or so I've been
told (this information is second-hand).

I did see iostat output from the initiator in his case, where there were
long waits and service times that I'm guessing were due to some
coalescing/merging. There was also a hint in the iostat output that a
scheduler was enabled, as there were non-zero values (occasionally) under
the [rw]qm/s columns, which, if I understand iostat correctly, means there
is a scheduler merging requests.

So you're saying there is no hold-off for merging on the initiator side of
the IB/SRP stack?

> Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might
> help to avoid fragmentation of large requests by the SRP protocol.
> Please post a follow-up message to the mailing list with your findings
> so that MAX_RDMA_SIZE can be converted from a compile-time constant to
> a sysfs variable if this would be useful.

Will do.

Thanks,

Chris
Re: Adjusting minimum packet size or "wait to merge requests" in SRP
On Wed, Oct 28, 2009 at 7:47 PM, Chris Worley wrote:
> It appears that SRP tries to coalesce and fragment initiator I/O
> requests into 64KB packets, as that looks to be the size requested
> to/from the device on the target side (and the I/O scheduler is
> disabled on the target).
>
> Is there a way to control this, where no coalescing occurs when
> latency is an issue and requests are small, and no fragmentation
> occurs when requests are large?
>
> Or, am I totally wrong in my assumption that SRP is
> coalescing/fragmenting data?

Regarding avoiding coalescing of I/O requests: which I/O scheduler is being
used on the initiator system, and how has it been configured via sysfs?

Adjusting the constant MAX_RDMA_SIZE in scst/srpt/src/ib_srpt.h might help
to avoid fragmentation of large requests by the SRP protocol. Please post a
follow-up message to the mailing list with your findings so that
MAX_RDMA_SIZE can be converted from a compile-time constant to a sysfs
variable if this would be useful.

Bart.
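A minimal sketch of the suggested edit, assuming MAX_RDMA_SIZE is a plain #define; a stand-in header file is used below because the exact form of the line in ib_srpt.h is not shown in this thread, and the srpt module would need to be rebuilt afterwards:

```shell
# Stand-in for scst/srpt/src/ib_srpt.h -- both the file contents and the
# 64KB starting value here are assumptions for illustration.
f=ib_srpt_example.h
printf '#define MAX_RDMA_SIZE 65536\n' > "$f"

# Double the compile-time RDMA transfer limit (GNU sed in-place edit).
sed -i 's/#define MAX_RDMA_SIZE.*/#define MAX_RDMA_SIZE 131072/' "$f"

new_line=$(cat "$f")
echo "$new_line"
rm -f "$f"
```

The change only takes effect once the srpt module is recompiled and reloaded, which is why Bart suggests it may be worth converting to a sysfs variable.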
Adjusting minimum packet size or "wait to merge requests" in SRP
It appears that SRP tries to coalesce and fragment initiator I/O requests
into 64KB packets, as that looks to be the size requested to/from the
device on the target side (and the I/O scheduler is disabled on the
target).

Is there a way to control this, where no coalescing occurs when latency is
an issue and requests are small, and no fragmentation occurs when requests
are large?

Or, am I totally wrong in my assumption that SRP is coalescing/fragmenting
data?

Thanks,

Chris