On Mon, Aug 2, 2010 at 3:08 PM, Vladislav Bolkhovitin <v...@vlnb.net> wrote:
>
> Bart Van Assche, on 08/02/2010 12:15 PM wrote:
>>
>> SRP I/O with small block sizes causes a high CPU load. Processing IB
>> completions in the context of a kernel thread instead of in interrupt
>> context makes it possible to process up to 25% more I/O operations per
>> second. This patch adds a kernel module parameter 'thread' that
>> specifies whether to process IB completions in interrupt context or in
>> kernel thread context. Also, the IB receive notification processing
>> loop is rewritten as proposed earlier by Ralph Campbell (see also
>> https://patchwork.kernel.org/patch/89426/). As the measurement results
>> below show, rewriting the IB receive notification processing loop did
>> not have a measurable impact on performance. Processing IB receive
>> notifications in thread context, however, does have a measurable
>> impact: workloads with an I/O depth of one are processed at most 10%
>> slower and workloads with larger I/O depths are processed up to 25%
>> faster.
>>
>> block size  number of    IOPS        IOPS      IOPS
>>  in bytes    threads     without     with      with
>>   ($bs)     ($numjobs)  this patch  thread=n  thread=y
>>    512           1        25,400      25,400    23,100
>>    512         128       122,000     122,000   153,000
>>   4096           1        25,000      25,000    22,700
>>   4096         128       122,000     121,000   157,000
>>  65536           1        14,300      14,400    13,600
>>  65536           4        36,700      36,700    36,600
>> 524288           1         3,470       3,430     3,420
>> 524288           4         5,020       5,020     4,990
>>
>> performance test used to gather the above results:
>>   fio --bs=${bs} --ioengine=sg --buffered=0 --size=128M --rw=read \
>>       --thread --numjobs=${numjobs} --loops=100 --group_reporting \
>>       --gtod_reduce=1 --name=${dev} --filename=${dev}
>> other ib_srp kernel module parameters: srp_sg_tablesize=128
>
> How about results of "dd Xflags=direct" in different modes, to find
> out the lowest latency at which the driver can process 512-byte and 4K
> packets? Sorry, I don't trust fio when it comes to precise latency
> measurements.
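
A single-stream measurement of the kind you suggest would presumably
look like this (device name and transfer count are only examples; at
queue depth one, the average latency per request is the elapsed time
reported by dd divided by the request count):
  dd if=${dev} of=/dev/null bs=512 count=100000 iflag=direct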

It would be interesting to compare such results, but unfortunately dd
does not provide a way to perform I/O from multiple threads
simultaneously. I have tried running multiple dd processes in parallel,
but that resulted in much lower IOPS than a comparable multithreaded
fio test.
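
For reference, that parallel attempt looked roughly like the following
(block size, count and offsets are illustrative; each process reads its
own region of the device):
  # launch ${numjobs} direct-I/O readers, each on a disjoint region
  for i in $(seq 0 $((numjobs - 1))); do
      dd if=${dev} of=/dev/null bs=512 count=100000 \
         skip=$((i * 100000)) iflag=direct &
  done
  wait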

Bart.