On Mon, Aug 2, 2010 at 3:08 PM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
>
> Bart Van Assche, on 08/02/2010 12:15 PM wrote:
>>
>> SRP I/O with small block sizes causes a high CPU load. Processing IB
>> completions on the context of a kernel thread instead of in interrupt context
>> allows to process up to 25% more I/O operations per second. This patch does
>> add a kernel parameter 'thread' that allows to specify whether to process IB
>> completions in interrupt context or in kernel thread context. Also, the IB
>> receive notification processing loop is rewritten as proposed earlier by
>> Ralph Campbell (see also https://patchwork.kernel.org/patch/89426/). As the
>> measurement results below show, rewriting the IB receive notification
>> processing loop did not have a measurable impact on performance. Processing
>> IB receive notifications in thread context however does have a measurable
>> impact: workloads with I/O depth one are processed at most 10% slower and
>> workloads with larger I/O depths are processed up to 25% faster.
>>
>> block size  number of   IOPS        IOPS      IOPS
>> in bytes    threads     without     with      with
>> ($bs)       ($numjobs)  this patch  thread=n  thread=y
>>     512         1          25,400     25,400    23,100
>>     512       128         122,000    122,000   153,000
>>    4096         1          25,000     25,000    22,700
>>    4096       128         122,000    121,000   157,000
>>   65536         1          14,300     14,400    13,600
>>   65536         4          36,700     36,700    36,600
>>  524288         1           3,470      3,430     3,420
>>  524288         4           5,020      5,020     4,990
>>
>> performance test used to gather the above results:
>>   fio --bs=${bs} --ioengine=sg --buffered=0 --size=128M --rw=read \
>>       --thread --numjobs=${numjobs} --loops=100 --group_reporting \
>>       --gtod_reduce=1 --name=${dev} --filename=${dev}
>> other ib_srp kernel module parameters: srp_sg_tablesize=128
>
> How about results of "dd Xflags=direct" in different modes to find out the
> lowest latency the driver can process 512 and 4K packets? Sorry, I don't
> trust fio, when it comes to precise latency measurements.

It would be interesting to compare such results, but unfortunately, dd does
not provide a way to perform I/O from multiple threads simultaneously. I have
tried to run multiple dd processes in parallel, but that resulted in much
lower IOPS results than a comparable multithreaded fio test.

Bart.
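
For reference, the relative effect of thread=y in the quoted table can be recomputed from the IOPS columns. The following is a minimal Python sketch, not part of the patch: the figures are copied from the table, while the names RESULTS, relative_change and changes are made up for this illustration. It compares the thread=y column against the thread=n column.

```python
# IOPS figures copied from the quoted table.
# Key: (block size in bytes, number of threads); value: (thread=n, thread=y).
RESULTS = {
    (512, 1): (25_400, 23_100),
    (512, 128): (122_000, 153_000),
    (4096, 1): (25_000, 22_700),
    (4096, 128): (121_000, 157_000),
    (65536, 1): (14_400, 13_600),
    (65536, 4): (36_700, 36_600),
    (524288, 1): (3_430, 3_420),
    (524288, 4): (5_020, 4_990),
}


def relative_change(baseline, value):
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def changes(results):
    """Map each (block size, threads) pair to its thread=y vs. thread=n change."""
    return {key: relative_change(n, y) for key, (n, y) in results.items()}


if __name__ == "__main__":
    for (bs, jobs), pct in changes(RESULTS).items():
        print(f"bs={bs:>6} numjobs={jobs:>3}: {pct:+.1f}%")
```

With these figures the largest slowdown is about 9% (4096 bytes, one thread) and the largest gain is about 30% (4096 bytes, 128 threads), both measured against the thread=n column.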