Nuno Diegues <n...@ist.utl.pt> writes:

What workloads did you test this on?

> +static inline float fastLog(float x)
> +{
> +  union { float f; uint32_t i; } vx = { x };
> +  float y = vx.i;
> +  y *= 8.2629582881927490e-8f;
> +  return y - 87.989971088f;
> +}
> +
> +static inline float fastSqrt(float x)
> +{
> +  union
> +  {
> +    int i;
> +    float x;
> +  } u;
> +
> +  u.x = x;
> +  u.i = (1<<29) + (u.i >> 1) - (1<<22);
> +  return u.x;
> +}

Are you sure you need floating point here? If the program does not use
it in any other way, faulting in the floating point state can be quite
expensive.

I bet fixed point would work for such simple purposes too.

> +  serial_lock.read_unlock(tx);
> +
> +  // Obtain the delta performance with respect to the last period.
> +  uint64_t current_cycles = rdtsc();
> +  uint64_t cycles_used = current_cycles - optimizer.last_cycles;

It may be worth pointing out that rdtsc does not return cycles.
In fact the ratio to real cycles varies as the CPU frequency changes.
I hope your algorithms can handle that.

> +
> +  // Compute gradient descent for the number of retries.
> +  double change_for_better = current_throughput / optimizer.last_throughput;
> +  double change_for_worse = optimizer.last_throughput / current_throughput;
> +  int32_t last_attempts = optimizer.last_attempts;
> +  int32_t current_attempts = optimizer.optimized_attempts;
> +  int32_t new_attempts = current_attempts;
> +  if (unlikely(change_for_worse > 1.40))
> +    {
> +      optimizer.optimized_attempts = optimizer.best_ever_attempts;
> +      optimizer.last_throughput = current_throughput;
> +      optimizer.last_attempts = current_attempts;
> +      return;
> +    }
> +
> +  if (unlikely(random() % 100 < 1))
> +    {

So where is the seed for that random stored? Could you corrupt some
user's random state? Is the state per thread or global?
If it's per thread, how do you initialize it so that the threads
start with different seeds?
If it's global, what synchronizes it?

Overall the algorithm looks very complicated with many mysterious magic
numbers. Are there simplifications possible?
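
To make the fixed-point and random-state questions above more concrete,
here is a minimal sketch of what integer-only helpers could look like.
Nothing in it comes from the patch: the names ilog2_floor, isqrt_floor
and retry_rng are hypothetical, the xorshift32 generator is just one
simple choice of private PRNG, and whether floor-precision results are
accurate enough for the tuning heuristic is an assumption that would
need to be checked.

```cpp
#include <cstdint>

// Hypothetical helper (not in the patch): floor(log2(x)) using only
// integer operations. Returns 0 for x == 0, which callers must avoid
// or handle themselves.
static inline uint32_t ilog2_floor(uint32_t x)
{
  uint32_t r = 0;
  while (x >>= 1)
    ++r;
  return r;
}

// Hypothetical helper (not in the patch): floor(sqrt(x)) using the
// classic bit-by-bit method, again without touching the FPU.
static inline uint32_t isqrt_floor(uint32_t x)
{
  uint32_t res = 0;
  uint32_t bit = 1u << 30;   // highest power of four in 32 bits

  while (bit > x)
    bit >>= 2;

  while (bit != 0)
    {
      if (x >= res + bit)
        {
          x -= res + bit;
          res = (res >> 1) + bit;
        }
      else
        res >>= 1;
      bit >>= 2;
    }
  return res;
}

// Hypothetical private PRNG state (xorshift32). Keeping it in a
// per-thread structure means the user's random()/srandom() state is
// never touched and no cross-thread synchronization is needed.
struct retry_rng
{
  uint32_t state;
};

// Seed with any per-thread value (assumption: the caller supplies
// something distinct per thread, e.g. derived from a thread
// descriptor address). A zero seed is replaced, because xorshift
// would stay stuck at zero forever.
static inline void retry_rng_seed(retry_rng &r, uint32_t seed)
{
  r.state = seed ? seed : 0x9e3779b9u;
}

static inline uint32_t retry_rng_next(retry_rng &r)
{
  uint32_t x = r.state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  r.state = x;
  return x;
}
```

With something like this, the 1% exploration test in the quoted hunk
could be written as retry_rng_next(rng) % 100 < 1 against a per-thread
rng, which keeps the retry path free of both floating point and shared
libc state.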

While the retry path is not extremely critical, it should be at least
somewhat optimized; otherwise it will dominate the cost of short
transactions.

One problem with so many magic numbers is that they may be good
on one system, but bad on another.

-Andi

--
a...@linux.intel.com -- Speaking for myself only