Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 22/09/2014 8:37, Christoph Hellwig wrote:
> One thing that is missing is generating multiqueue-aware tags at the
> blk-mq level, which should be as simple as always adding a queue
> prefix in the tag allocation code.

Hello Christoph,

Adding a queue prefix in the tag allocation code is an interesting idea. Encoding the hardware context index in the upper bits of the 'tag' field in 'struct request' is something I have considered. The reason I have not done that is that I think several block drivers assume that the rq->tag field is a number in the range [0..queue_depth-1]. Here is just one example from the mtip32xx driver:

    fis->sect_count = ((rq->tag << 3) | (rq->tag >> 5));

> Did you consider switching srp to use the block layer provided tags?

This is on my to-do list. The only reason I have not yet done this is that I have not yet had the time to work on it. Another item on my to-do list is to eliminate per-request memory allocation and instead use your patch that added a "cmd_size" field to the SCSI host template.

> Also do you have any performance numbers for just using multiple
> queues inside srp vs using blk-mq exposed queues?

So far I have only rerun the multithreaded write test. For that test I see about 15% more IOPS with this patch series (exploiting multiple hardware queues and a 1:1 mapping between hardware context and RDMA queue pair) than with the previous implementation (one hardware queue and multiple RDMA queue pairs). Please keep in mind that in that test the CPUs of the target system are saturated, so the performance potential of using multiple hardware queues is probably larger than the difference I measured.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 2014-09-22 10:31, Jens Axboe wrote:
> On 2014-09-22 10:25, Bart Van Assche wrote:
>> On 22/09/2014 8:37, Christoph Hellwig wrote:
>>> One thing that is missing is generating multiqueue-aware tags at
>>> the blk-mq level, which should be as simple as always adding a
>>> queue prefix in the tag allocation code.
>>
>> Adding a queue prefix in the tag allocation code is an interesting
>> idea. Encoding the hardware context index in the upper bits of the
>> 'tag' field in 'struct request' is something I have considered.
>> [...]
>
> Most drivers assume that the tag is within a certain range; the queue
> prefix would only work on drivers where the tag number is just some
> arbitrary "cookie". So for SCSI it should work, and I don't think we
> need it anywhere else. Alternatively, we can wrap tag retrieval in
> some function to mask off the queue prefix for the cases where we
> just want an index.

Would probably be better to have a tag generation function instead, that uses rq->tag and ORs in the queue prefix, actually. That way rq->tag would remain a plain index, and the issuing driver could then use some function to add the queue prefix, if it needs unique tags across queues in a shared tag set.

-- 
Jens Axboe
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 2014-09-22 10:25, Bart Van Assche wrote:
> On 22/09/2014 8:37, Christoph Hellwig wrote:
>> One thing that is missing is generating multiqueue-aware tags at the
>> blk-mq level, which should be as simple as always adding a queue
>> prefix in the tag allocation code.
>
> Adding a queue prefix in the tag allocation code is an interesting
> idea. Encoding the hardware context index in the upper bits of the
> 'tag' field in 'struct request' is something I have considered. The
> reason I have not done that is that I think several block drivers
> assume that the rq->tag field is a number in the range
> [0..queue_depth-1]. Here is just one example from the mtip32xx driver:
>
>     fis->sect_count = ((rq->tag << 3) | (rq->tag >> 5));

Most drivers assume that the tag is within a certain range; the queue prefix would only work on drivers where the tag number is just some arbitrary "cookie". So for SCSI it should work, and I don't think we need it anywhere else. Alternatively, we can wrap tag retrieval in some function to mask off the queue prefix for the cases where we just want an index.

-- 
Jens Axboe
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
Hi Bart,

I like these changes modulo the minor comments we had. One thing that is missing is generating multiqueue-aware tags at the blk-mq level, which should be as simple as always adding a queue prefix in the tag allocation code.

Did you consider switching srp to use the block layer provided tags?

Also do you have any performance numbers for just using multiple queues inside srp vs using blk-mq exposed queues?
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 09/19/2014 06:55 AM, Bart Van Assche wrote:
> Hello,
>
> Although the SRP protocol supports multichannel operation, although
> since considerable time RDMA HCA's are available that support multiple
> completion vectors and although multichannel operation yields better
> performance than using a single channel, the Linux SRP initiator does
> not yet support multichannel operation. While adding multichannel
> support in the SRP initiator I encountered a few challenges of which I
> think these need wider discussion. The topics I would like invite wider
> discussion about are as follows:
> - How to avoid unneeded inter-socket cache traffic. Should the blk-mq
>   layer e.g. assign CPU cores to hardware queues such that all CPU cores
>   associated with a single hardware queue reside on the same CPU socket?
>   (see also patch 1/8)

Right now these are deliberately symmetric, and hence can result in hardware queues being left unused. Whether it makes sense to make this change or not will, I think, greatly depend on how many queues are available. There's a crossover point where having more queues doesn't help, and it can even make things worse (interrupt load, etc). For your specific case of 4 queues, it probably DOES make sense to use them all, however.

-- 
Jens Axboe
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
On 9/19/2014 3:56 PM, Bart Van Assche wrote:
> [PATCH 1/8] blk-mq: Use all available hardware queues
>
> Suppose that a system has two CPU sockets, three cores per socket,
> that it does not support hyperthreading and that four hardware
> queues are provided by a block driver. With the current algorithm
> this will lead to the following assignment of CPU cores to hardware
> queues:
>
>    HWQ 0: 0 1
>    HWQ 1: 2 3
>    HWQ 2: 4 5
>    HWQ 3: (none)
>
> This patch changes the queue assignment into:
>
>    HWQ 0: 0 1
>    HWQ 1: 2
>    HWQ 2: 3 4
>    HWQ 3: 5
>
> [...]
>
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 1065d7c..8e56455 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -17,7 +17,7 @@
>  static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
>                               const int cpu)
>  {
> -       return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
> +       return cpu * nr_queues / nr_cpus;
>  }
>
>  static int get_first_sibling(unsigned int cpu)

Seems reasonable enough.

Reviewed-by: Sagi Grimberg
Re: [PATCH RFC 0/8] IB/srp: Add multichannel support
[PATCH 1/8] blk-mq: Use all available hardware queues

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

   HWQ 0: 0 1
   HWQ 1: 2 3
   HWQ 2: 4 5
   HWQ 3: (none)

This patch changes the queue assignment into:

   HWQ 0: 0 1
   HWQ 1: 2
   HWQ 2: 3 4
   HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche
Cc: Jens Axboe
Cc: Christoph Hellwig
Cc: Ming Lei
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
                              const int cpu)
 {
-       return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+       return cpu * nr_queues / nr_cpus;
 }

 static int get_first_sibling(unsigned int cpu)
--
1.8.4.5
[PATCH RFC 0/8] IB/srp: Add multichannel support
Hello,

Although the SRP protocol supports multichannel operation, although RDMA HCAs that support multiple completion vectors have been available for a considerable time, and although multichannel operation yields better performance than using a single channel, the Linux SRP initiator does not yet support multichannel operation. While adding multichannel support to the SRP initiator I encountered a few challenges that I think need wider discussion. The topics I would like to invite wider discussion about are as follows:
- How to avoid unneeded inter-socket cache traffic. Should the blk-mq
  layer e.g. assign CPU cores to hardware queues such that all CPU cores
  associated with a single hardware queue reside on the same CPU socket?
  (see also patch 1/8)
- How to pass the hardware context selected by the block layer to the
  SCSI LLD queuecommand() callback function? (see also patches 2/8 and
  3/8)
- Which approach should a SCSI LLD follow for selecting an MSI-X
  completion vector so that the interrupt handler is invoked on the same
  CPU socket as the blk-mq hardware context data structures? As one can
  see, patch 8/8 relies on the assumption that completion vectors have
  been spread evenly over CPU sockets. If a HCA e.g. supports eight
  completion vectors, that means that in a system with two CPU sockets
  vectors 0-3 are associated with a CPU core on the first CPU socket and
  vectors 4-7 with a CPU core on the second CPU socket.
The patches in this series are:
0001-blk-mq-Use-all-available-hardware-queues.patch
0002-scsi-mq-Add-support-for-multiple-hardware-queues.patch
0003-scsi-mq-Pass-hctx-to-low-level-SCSI-drivers.patch
0004-IB-srp-Move-ib_destroy_cm_id-call-into-srp_free_ch_i.patch
0005-IB-srp-Remove-stale-connection-retry-mechanism.patch
0006-IB-srp-Avoid-that-I-O-hangs-due-to-a-cable-pull-duri.patch
0007-IB-srp-Separate-target-and-channel-variables.patch
0008-IB-srp-Add-multichannel-support.patch

Note: a predecessor of this patch series has been used to measure the performance of Christoph's scsi-mq patches that have been merged in kernel version v3.17-rc1.