Hi Tim,
I did in fact try the OR alternative. However, it is only a win on older
AMD Opterons (16 cycles vs. 20) and cannot beat the __builtin_clz alternative
on Intel.
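
For reference, the two variants being compared can be sketched roughly as
follows. This is only an illustration, not the actual opal_next_poweroftwo()
code: the names next_pow2_or, next_pow2_clz and check_agree are hypothetical,
a 32-bit unsigned int and a GCC-compatible compiler are assumed, and both
functions are assumed to return the smallest power of two strictly greater
than x, for x below 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* OR variant (hypothetical name): smear the highest set bit into every
 * lower position, then add one. Valid for x < 2^31; larger inputs wrap
 * around to 0. */
static inline uint32_t next_pow2_or(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* clz variant (hypothetical name): uses the GCC/Clang builtin.
 * __builtin_clz(0) is undefined, so zero is handled separately.
 * Valid for x < 2^31; larger inputs would shift by 32, which is
 * undefined behavior. Assumes unsigned int is 32 bits wide. */
static inline uint32_t next_pow2_clz(uint32_t x)
{
    if (x == 0) {
        return 1;
    }
    return (uint32_t)1 << (32 - __builtin_clz(x));
}

/* Sanity check: both variants agree on a range of small inputs and on
 * the largest input they support. */
static void check_agree(void)
{
    uint32_t x;
    for (x = 0; x < 100000u; ++x) {
        assert(next_pow2_or(x) == next_pow2_clz(x));
    }
    assert(next_pow2_or(0x7fffffffu) == next_pow2_clz(0x7fffffffu));
}
```

The OR variant needs no compiler support and no branch, while the clz variant
typically maps to a single instruction plus a shift where the hardware
provides one, which may be why the relative timings differ between CPUs.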

Best regards,
Rainer



On Wednesday 12 October 2011 11:26:52 Tim Mattox wrote:
> All,
> If you wanted to speed up these routines for processors without
> __builtin_clz, there are a variety of variations in C to implement clz
> efficiently. See Hacker's Delight nlz (number of leading zeros):
> http://www.hackersdelight.org/HDcode/nlz.c.txt
> 
> Or from my Ph.D. advisor's magic algorithms page:
> http://aggregate.org/MAGIC/#Leading%20Zero%20Count
> 
> And you can directly implement opal_next_poweroftwo()
> with this:
> http://aggregate.org/MAGIC/#Next%20Largest%20Power%20of%202
> 
> The Hacker's Delight webpage (and book) are fun to read for that
> certain kind of person. :-)
> http://www.hackersdelight.org/
