> Here are some results for different CPU (time measured by encoding
> 100000 sectors with non zero data including scrambling):
>
> I'll put the new lec.cc sources to the patches section of SF.

Why

    d = *p_lsb;
    p0_lsb ^= (*coeffs0)[d];
    p1_lsb ^= (*coeffs1)[d];
    d = *p_msb;
    p0_msb ^= (*coeffs0)[d];
    p1_msb ^= (*coeffs1)[d];
    coeffs0++;
    coeffs1++;
    p_lsb += 2 * 43;
    p_msb += 2 * 43;

and not (provided that coeffs01 is a pointer to an array of pointers to
[256][2] matrices)

    d = *p_lsb;
    p0_lsb ^= (*coeffs01)[d][0];
    p1_lsb ^= (*coeffs01)[d][1];
    d = *p_msb;
    p0_msb ^= (*coeffs01)[d][0];
    p1_msb ^= (*coeffs01)[d][1];
    coeffs01++;
    p_lsb += 2 * 43;
    p_msb += 2 * 43;

or even (provided that short_coeffs01 is [originally] a pointer to a
[43][256] matrix of shorts)

    d0 = *p_lsb;
    d1 = *(p_lsb+1);
    short_p01_lsb ^= short_coeffs01[d0];
    short_p01_msb ^= short_coeffs01[d1];
    short_coeffs01 += 256;
    p_lsb += 2 * 43;

I.e. it is gentler on the cache and goes from 8 loads down to 4. The
point of the last variant is that it requires no more than 7 registers,
which fits the IA-32 register bank perfectly.

Another way to give the compiler more room to optimize would be to
declare the tables as const. This implies that the tables have to be
wrapped into classes [as const instances can be initialized from class
constructors only]. This would also make lec_init() obsolete.

Is it OK like this or should I submit working code?

Cheers. A.
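
P.S. For concreteness, here is a minimal sketch of the packed 16-bit table from the last variant, checked against the two-table form. Everything named here (Tables, fill_tables, parity_separate, parity_packed, variants_agree, ROWS = 24) is made up for illustration and is not taken from lec.cc. The table contents are arbitrary placeholders rather than the real coefficients, and putting the first coefficient in the low byte and the second in the high byte is just one possible packing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t ROWS = 24;        // loop iterations per column (assumed value)
const std::size_t STRIDE = 2 * 43;  // byte distance between rows, as in the loop

struct Tables {
    std::uint8_t c0[ROWS][256];     // first coefficient table
    std::uint8_t c1[ROWS][256];     // second coefficient table
    std::uint16_t c01[ROWS * 256];  // packed: c0 in the low byte, c1 in the high byte
};

// Fill the tables with placeholder values (NOT real Reed-Solomon coefficients).
void fill_tables(Tables &t) {
    for (std::size_t r = 0; r < ROWS; ++r) {
        for (std::size_t d = 0; d < 256; ++d) {
            t.c0[r][d] = static_cast<std::uint8_t>(d * 3 + r * 7);
            t.c1[r][d] = static_cast<std::uint8_t>(d * 5 + r * 11 + 1);
            t.c01[r * 256 + d] =
                static_cast<std::uint16_t>(t.c0[r][d] | (t.c1[r][d] << 8));
            assert((t.c01[r * 256 + d] >> 8) == t.c1[r][d]);
        }
    }
}

struct Parity { std::uint8_t p0_lsb, p1_lsb, p0_msb, p1_msb; };

// Two separate byte tables: four table loads per iteration.
Parity parity_separate(const Tables &t, const std::uint8_t *p) {
    Parity r = {0, 0, 0, 0};
    for (std::size_t i = 0; i < ROWS; ++i, p += STRIDE) {
        std::uint8_t d = p[0];
        r.p0_lsb ^= t.c0[i][d];
        r.p1_lsb ^= t.c1[i][d];
        d = p[1];
        r.p0_msb ^= t.c0[i][d];
        r.p1_msb ^= t.c1[i][d];
    }
    return r;
}

// One packed 16-bit table: two table loads per iteration.
Parity parity_packed(const Tables &t, const std::uint8_t *p) {
    std::uint16_t lsb = 0, msb = 0;
    const std::uint16_t *c = t.c01;
    for (std::size_t i = 0; i < ROWS; ++i, c += 256, p += STRIDE) {
        lsb ^= c[p[0]];
        msb ^= c[p[1]];
    }
    Parity r = { static_cast<std::uint8_t>(lsb & 0xff),
                 static_cast<std::uint8_t>(lsb >> 8),
                 static_cast<std::uint8_t>(msb & 0xff),
                 static_cast<std::uint8_t>(msb >> 8) };
    return r;
}

// Runs both variants over the same synthetic data and compares the results.
bool variants_agree() {
    static Tables t;
    fill_tables(t);
    std::vector<std::uint8_t> data(ROWS * STRIDE);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = static_cast<std::uint8_t>(i * 31 + 17);
    Parity a = parity_separate(t, &data[0]);
    Parity b = parity_packed(t, &data[0]);
    return a.p0_lsb == b.p0_lsb && a.p1_lsb == b.p1_lsb &&
           a.p0_msb == b.p0_msb && a.p1_msb == b.p1_msb;
}
```

Since XOR works on each byte of the 16-bit word independently, the packed accumulator carries both parity bytes at once, and they are split apart only once, after the loop.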