On Jun 26, 2014, at 12:38 PM, James Bottomley 
<james.bottom...@hansenpartnership.com> wrote:

> On June 26, 2014 11:41:48 AM EDT, "Atchley, Scott" <atchle...@ornl.gov> wrote:
>> On Jun 26, 2014, at 10:55 AM, James Bottomley
>> <james.bottom...@hansenpartnership.com> wrote:
>> 
>>> On Thu, 2014-06-26 at 16:53 +0200, Bart Van Assche wrote:
>>>> On 06/11/14 11:09, Sagi Grimberg wrote:
>>>>> + return xfer_len + (xfer_len >> ilog2(sector_size)) * 8;
>>>> 
>>>> Sorry that I just noticed this now, but why is a shift-right and ilog2()
>>>> used in the above expression instead of just dividing the transfer
>>>> length by the sector size?
>>> 
>>> It's a performance thing.  Division is really slow on most CPUs.
>>> However, we know the divisor is a power of two so we re-express the
>>> division as a shift, which the processor can do really fast.
>>> 
>>> James
>> 
>> I have done this in the past as well, but have you benchmarked it?
>> Compilers typically do the right thing in this case (i.e., replace
>> division with shift).
> 
> The compiler can only do that for values which are reducible to constants at
> compile time. This is a runtime value; the compiler has no way of deducing
> that it will be a power of 2.
> 
> James

You're right, I should have said runtime.

However, gcc on Intel seems to choose the right algorithm at runtime. On a
trivial app with -O0, I see the same performance for shift and division if the
divisor is a power of two. I see a ~38% penalty if the divisor is not a power
of two. With -O3, shift is faster than division by ~17% when the divisor is a
power of two.
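
For reference, a minimal sketch of the two forms being compared. This is
neither the code from the patch nor my test app; the names ilog2_u32,
len_with_shift and len_with_div are hypothetical, and ilog2_u32 is only a
userspace stand-in for the kernel's ilog2(). The 8 is the per-sector constant
from the quoted expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Shift form, as in the quoted line. It matches the division form only
 * when sector_size is a power of two, so that is checked here.
 */
static uint32_t len_with_shift(uint32_t xfer_len, uint32_t sector_size)
{
	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
	return xfer_len + (xfer_len >> ilog2_u32(sector_size)) * 8;
}

/* Division form, valid for any non-zero sector_size. */
static uint32_t len_with_div(uint32_t xfer_len, uint32_t sector_size)
{
	assert(sector_size != 0);
	return xfer_len + (xfer_len / sector_size) * 8;
}
```

Both functions return the same value for any power-of-two sector size; only
the division form is correct for other sizes. Timing them requires passing
sector_size as a true runtime value, otherwise the compiler may fold the
division into a shift itself.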

Scott