Nuno Diegues <n...@ist.utl.pt> writes:

What workloads did you test this on?

> +static inline float fastLog(float x)
> +{
> +  union { float f; uint32_t i; } vx = { x };
> +  float y = vx.i;
> +  y *= 8.2629582881927490e-8f;
> +  return y - 87.989971088f;
> +}
> +
> +static inline float fastSqrt(float x)
> +{
> +  union
> +  {
> +    int i;
> +    float x;
> +  } u;
> +
> +  u.x = x;
> +  u.i = (1<<29) + (u.i >> 1) - (1<<22);
> +  return u.x;
> +}

Are you sure you need floating point here? If the program does not
otherwise use it, faulting in the floating-point state can be
quite expensive.

I bet fixed point would work for such simple purposes too.
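
As a rough illustration of the fixed-point idea, here is a minimal sketch of
integer-only counterparts to the two helpers quoted above. The names
log2_q16 and isqrt32 are hypothetical and not part of the patch. log2_q16
returns log2(x) in Q16 fixed point, interpolating linearly between powers of
two (the same kind of approximation the float bit trick makes); a natural log
would need a further multiply by ln(2). isqrt32 is the usual bit-by-bit
integer square root.

```c
#include <stdint.h>

/* Hypothetical helper: approximate log2(x) in Q16 fixed point, x > 0.
   Integer part is floor(log2(x)); the fraction is interpolated linearly
   between 2^e and 2^(e+1).  */
static inline uint32_t log2_q16(uint32_t x)
{
  uint32_t e = 0;
  uint32_t t = x;
  while (t >>= 1)
    e++;                                  /* e = floor(log2(x)) */
  /* (x - 2^e) / 2^e scaled by 2^16; 64-bit to avoid overflow.  */
  uint64_t frac = (((uint64_t) x - ((uint64_t) 1 << e)) << 16) >> e;
  return (e << 16) + (uint32_t) frac;
}

/* Hypothetical helper: floor(sqrt(x)) using only shifts, adds and
   compares.  */
static inline uint32_t isqrt32(uint32_t x)
{
  uint32_t res = 0;
  uint32_t bit = 1u << 30;                /* highest power of four */
  while (bit > x)
    bit >>= 2;
  while (bit != 0)
    {
      if (x >= res + bit)
        {
          x -= res + bit;
          res = (res >> 1) + bit;
        }
      else
        res >>= 1;
      bit >>= 2;
    }
  return res;
}
```

Whether the reduced precision is acceptable depends on how the optimizer uses
the results, which only measurement can tell.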
> +  serial_lock.read_unlock(tx);
> +
> +  // Obtain the delta performance with respect to the last period.
> +  uint64_t current_cycles = rdtsc();
> +  uint64_t cycles_used = current_cycles - optimizer.last_cycles;

It may be worth pointing out that rdtsc does not return cycles.
In fact, the ratio of TSC ticks to real cycles varies as the core
frequency changes.
I hope your algorithms can handle that.
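
One way to sidestep the question is to treat the counter as a time base rather
than a cycle count. The sketch below assumes a processor with an invariant TSC
and a tick rate tsc_hz obtained elsewhere (both assumptions, as is the name
tsc_ticks_to_ns, which is not part of the patch); it converts a tick delta to
nanoseconds so that throughput is expressed per unit of time.

```c
#include <stdint.h>

/* Hypothetical helper: convert a TSC tick delta to nanoseconds.
   tsc_hz is an assumed, separately calibrated tick rate (nonzero and
   below roughly 1.8e10 so the intermediate product fits in 64 bits).
   The division is split to avoid overflowing ticks * 1e9.  */
static inline uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_hz)
{
  const uint64_t ns_per_sec = 1000000000ull;
  uint64_t whole_seconds = ticks / tsc_hz;
  uint64_t remainder = ticks % tsc_hz;
  return whole_seconds * ns_per_sec + remainder * ns_per_sec / tsc_hz;
}
```

If only the ratio between two consecutive periods matters, the raw tick delta
already serves as a time measure and no conversion is needed; the point is
merely that it should not be read as core cycles.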
> +
> +  // Compute gradient descent for the number of retries.
> +  double change_for_better = current_throughput / optimizer.last_throughput;
> +  double change_for_worse = optimizer.last_throughput / current_throughput;
> +  int32_t last_attempts = optimizer.last_attempts;
> +  int32_t current_attempts = optimizer.optimized_attempts;
> +  int32_t new_attempts = current_attempts;
> +  if (unlikely(change_for_worse > 1.40))
> +    {
> +      optimizer.optimized_attempts = optimizer.best_ever_attempts;
> +      optimizer.last_throughput = current_throughput;
> +      optimizer.last_attempts = current_attempts;
> +      return;
> +    }
> +
> +  if (unlikely(random() % 100 < 1))
> +    {

So where is the seed for that random() stored? Could you corrupt some
user's random state? Is the state per thread or global?
If it's per thread, how do you initialize it so that the threads
start with different seeds?
If it's global, what synchronizes it?
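
To make the concern concrete, here is a minimal sketch of a private generator
that never touches the state behind the C library's random(). Everything in it
is hypothetical and not taken from the patch: the struct retry_rng, the
xorshift32 step, and the assumption that each thread owns one instance seeded
from a small integer thread id.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state, kept apart from the
   application's own random() state.  */
struct retry_rng
{
  uint32_t state;
};

/* Seed from an assumed per-thread integer id.  Multiplying by an odd
   constant spreads distinct ids to distinct seeds; xorshift requires a
   nonzero state, hence the fallback.  */
static inline void retry_rng_seed(struct retry_rng *r, uint32_t thread_id)
{
  uint32_t s = (thread_id + 1u) * 2654435761u;
  r->state = s ? s : 1u;
}

/* One xorshift32 step (shift triple 13, 17, 5).  */
static inline uint32_t retry_rng_next(struct retry_rng *r)
{
  uint32_t x = r->state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  r->state = x;
  return x;
}

/* Returns 1 roughly once in a hundred calls, mirroring the
   "random() % 100 < 1" test in the quoted hunk.  */
static inline int retry_rng_one_in_hundred(struct retry_rng *r)
{
  return retry_rng_next(r) % 100u < 1u;
}
```

Because each thread only touches its own struct, no synchronization is
needed, and the user's random() sequence is left alone.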

Overall the algorithm looks very complicated with many mysterious magic
numbers. Are there simplifications possible? While the retry path is not
extremely critical it should be at least somewhat optimized, otherwise
it will dominate the cost of short transactions.

One problem with so many magic numbers is that they may be good on one
system, but bad on another.
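
As a small step in that direction, the thresholds could at least be named and
gathered in one place so they can be tuned per system. The sketch below is
only an illustration: struct retry_tuning, its field names, and should_revert
are hypothetical, and the values 1.40 and 1 are simply the ones that appear in
the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical grouping of the tuning constants from the quoted hunk.  */
struct retry_tuning
{
  double revert_ratio;          /* 1.40 in the patch */
  unsigned explore_percent;     /* 1 in the patch (random() % 100 < 1) */
};

static const struct retry_tuning default_tuning = { 1.40, 1u };

/* True when throughput dropped by more than the configured ratio, which
   corresponds to the "change_for_worse > 1.40" test.  Treating a
   non-positive current throughput as a drop is an assumption of this
   sketch.  */
static inline bool should_revert(double last_throughput,
                                 double current_throughput,
                                 const struct retry_tuning *t)
{
  if (current_throughput <= 0.0)
    return last_throughput > 0.0;
  return last_throughput / current_throughput > t->revert_ratio;
}
```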

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
