> Here are some results for different CPUs (time measured by encoding
> 100000 sectors with non-zero data, including scrambling):
> 
> I'll put the new lec.cc sources in the patches section of SF.

Why do it like this

      d = *p_lsb;

      p0_lsb ^= (*coeffs0)[d];
      p1_lsb ^= (*coeffs1)[d];

      d = *p_msb;

      p0_msb ^= (*coeffs0)[d];
      p1_msb ^= (*coeffs1)[d];

      coeffs0++;
      coeffs1++;

      p_lsb += 2 * 43;
      p_msb += 2 * 43;

and not like this (provided that coeffs01 is a pointer to an array of
pointers to [256][2] matrices)

      d = *p_lsb;

      p0_lsb ^= (*coeffs01)[d][0];
      p1_lsb ^= (*coeffs01)[d][1];

      d = *p_msb;

      p0_msb ^= (*coeffs01)[d][0];
      p1_msb ^= (*coeffs01)[d][1];

      coeffs01++;

      p_lsb += 2 * 43;
      p_msb += 2 * 43;

or even like this (provided that short_coeffs01 initially points to the
first row of a [43][256] matrix of shorts)

      d0 = *p_lsb;
      d1 = *(p_lsb+1);

      short_p01_lsb ^= short_coeffs01[d0];
      short_p01_msb ^= short_coeffs01[d1];

      short_coeffs01 += 256;   /* next row of the [43][256] table */

      p_lsb += 2 * 43;

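I.e. it is gentler on the cache and cuts the loads per iteration from 8
down to 4 (in the last variant: two data bytes plus two 16-bit table
lookups). The other point of the last variant is that it needs no more
than 7 registers, which fits the IA-32 register bank perfectly.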
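To make the layout argument concrete, here is a small C++ sketch (not the lec.cc code) that computes one P-parity column both ways: with two separate byte tables as in the first variant, and with one table of 16-bit entries that pack the P0 contribution into the high byte and the P1 contribution into the low byte, as in the last variant. Because XOR works bit by bit, XOR-ing packed entries gives the same bytes as XOR-ing the two halves separately. All names (gf8_mul, Tables, column_separate, column_packed, variants_agree), the per-row coefficients and the packing order are hypothetical; the 24-row, 86-byte-stride geometry is an assumption based on the `2 * 43` step above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GF(2^8) multiply, reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
static std::uint8_t gf8_mul(std::uint8_t a, std::uint8_t b) {
    std::uint8_t r = 0;
    while (b != 0) {
        if (b & 1) r ^= a;
        // Multiply a by x; if bit 8 would be set, XOR in the low byte of 0x11D.
        a = static_cast<std::uint8_t>((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return r;
}

constexpr int kRows = 24;        // assumed: data rows per P-parity column
constexpr int kStride = 2 * 43;  // bytes from one row to the next

struct Tables {
    std::uint8_t c0[kRows][256];    // first variant: P0 contribution
    std::uint8_t c1[kRows][256];    // first variant: P1 contribution
    std::uint16_t c01[kRows][256];  // last variant: (P0 << 8) | P1
};

static void build_tables(Tables& t) {
    for (int row = 0; row < kRows; ++row) {
        // Hypothetical coefficients, NOT the real P-parity generator values.
        const std::uint8_t k0 = static_cast<std::uint8_t>(row + 1);
        const std::uint8_t k1 = static_cast<std::uint8_t>(2 * row + 3);
        for (int d = 0; d < 256; ++d) {
            const std::uint8_t v0 = gf8_mul(k0, static_cast<std::uint8_t>(d));
            const std::uint8_t v1 = gf8_mul(k1, static_cast<std::uint8_t>(d));
            t.c0[row][d] = v0;
            t.c1[row][d] = v1;
            t.c01[row][d] = static_cast<std::uint16_t>((v0 << 8) | v1);
        }
    }
}

// First variant: per row, 2 data loads and 4 byte-table lookups.
static void column_separate(const Tables& t, const std::uint8_t* col,
                            std::uint8_t out[4]) {
    std::uint8_t p0_lsb = 0, p1_lsb = 0, p0_msb = 0, p1_msb = 0;
    for (int row = 0; row < kRows; ++row) {
        const std::uint8_t* p_lsb = col + row * kStride;
        std::uint8_t d = p_lsb[0];
        p0_lsb ^= t.c0[row][d];
        p1_lsb ^= t.c1[row][d];
        d = p_lsb[1];
        p0_msb ^= t.c0[row][d];
        p1_msb ^= t.c1[row][d];
    }
    out[0] = p0_lsb; out[1] = p0_msb; out[2] = p1_lsb; out[3] = p1_msb;
}

// Last variant: per row, 2 data loads and 2 lookups of packed 16-bit entries.
static void column_packed(const Tables& t, const std::uint8_t* col,
                          std::uint8_t out[4]) {
    std::uint16_t p01_lsb = 0, p01_msb = 0;
    const std::uint16_t (*coeffs01)[256] = t.c01;  // points at one table row
    for (int row = 0; row < kRows; ++row, ++coeffs01) {
        const std::uint8_t* p_lsb = col + row * kStride;
        p01_lsb ^= (*coeffs01)[p_lsb[0]];
        p01_msb ^= (*coeffs01)[p_lsb[1]];
    }
    // XOR never mixes the high and low bytes, so unpack P0 and P1 directly.
    out[0] = static_cast<std::uint8_t>(p01_lsb >> 8);
    out[1] = static_cast<std::uint8_t>(p01_msb >> 8);
    out[2] = static_cast<std::uint8_t>(p01_lsb & 0xFF);
    out[3] = static_cast<std::uint8_t>(p01_msb & 0xFF);
}

// Runs both variants over the 43 columns of a made-up 24 x 86-byte buffer.
static bool variants_agree() {
    static Tables t;
    build_tables(t);
    static std::uint8_t data[kRows * kStride];
    for (std::size_t i = 0; i < sizeof data; ++i)
        data[i] = static_cast<std::uint8_t>(i * 37 + 11);
    for (int col = 0; col < 43; ++col) {
        std::uint8_t a[4], b[4];
        column_separate(t, data + 2 * col, a);
        column_packed(t, data + 2 * col, b);
        for (int k = 0; k < 4; ++k)
            if (a[k] != b[k]) return false;
    }
    return true;
}
```

variants_agree() returns true when both loops produce identical parity bytes for every column; it checks equivalence only and says nothing about speed on a given CPU.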

Another way to give the compiler more room for optimization would be to
declare the tables const. This implies that the tables have to be wrapped
into classes [as a const table that is computed at run time, rather than
written out as a literal initializer, can only be filled in by a class
constructor]. This would also make lec_init() obsolete.
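A minimal sketch of the const-table idea, using GF(2^8) exp/log tables as a stand-in for the lec.cc tables (the class name Gf8Tables and the object kGf8 are hypothetical): the tables are computed in the constructor of a const object, so no separate lec_init()-style call is needed before use. Whether the const qualifier actually yields better code from a particular compiler is not demonstrated here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a lec.cc table: GF(2^8) exp/log tables for the
// polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), filled in by the constructor.
class Gf8Tables {
public:
    Gf8Tables() {
        unsigned v = 1;
        for (int i = 0; i < 255; ++i) {
            exp_[i] = static_cast<std::uint8_t>(v);
            log_[v] = static_cast<std::uint8_t>(i);
            v <<= 1;
            if (v & 0x100) v ^= 0x11D;  // reduce modulo the field polynomial
        }
        log_[0] = 0;  // log(0) is undefined; mul() never reads this entry
    }

    std::uint8_t mul(std::uint8_t a, std::uint8_t b) const {
        if (a == 0 || b == 0) return 0;
        return exp_[(log_[a] + log_[b]) % 255];
    }

private:
    std::uint8_t exp_[255];
    std::uint8_t log_[256];
};

// Constructed during static initialisation, so code in this translation unit
// can use it without an explicit init call such as lec_init().
static const Gf8Tables kGf8;
```

One caveat of this design: the order of dynamic initialisation across translation units is unspecified, so static constructors in other source files should not rely on kGf8 already being built.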

Is it OK like this or should I submit working code?

Cheers. A.

