On Tue, Sep 20, 2011 at 6:36 PM, Josh Simmons <simmons...@gmail.com> wrote:
> On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileh...@lycos.com> wrote:
>>
>> My version with bsr is faster.
>>
>> Bye,
>> bearophile
>>
>
> Is that science or guessing?
>
> My horribly unscientific test shows the opposite to be true, I'm
> looking over the assembly output to see if there's an extraneous
> factor.
>

Ah, when the one I gave was slower it wasn't being unrolled by gcc;
when yours was slower my trivial loop was being vectorised. When
they're both handled the same way the results are the same, favoring
neither.

Cool.

I do like that the propagate-right version can be vectorised though.
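
For anyone reading without the earlier messages: the two functions being
compared are not quoted here, so what follows is only a rough sketch of
the two approaches, not the code that was benchmarked. It assumes the
task is to set every bit below the highest set bit of a 32-bit value.
The names smear_right, smear_right_bsr and smear_all are made up for
illustration, and gcc's __builtin_clz stands in for a bsr intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Propagate-right: OR the value with shifted copies of itself so the
   highest set bit is copied into every lower position. Branch-free,
   shifts and ORs only. */
uint32_t smear_right(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x;
}

/* Bit-scan variant (assumption: gcc or clang, where __builtin_clz
   exists). __builtin_clz is undefined for 0, so that case is handled
   separately. */
uint32_t smear_right_bsr(uint32_t x)
{
    if (x == 0)
        return 0;
    return UINT32_MAX >> __builtin_clz(x);
}

/* The kind of trivial loop a compiler may vectorise when the body is
   the shift/OR version. Whether it actually does depends on compiler
   version and flags. */
void smear_all(uint32_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = smear_right(v[i]);
}
```

Both functions return the same value for every input, so any timing
difference comes down to what the compiler does with the surrounding
loop, which is worth checking in the assembly output before drawing
conclusions.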
