On Tue, Jun 06, 2000 at 09:31:46AM +0900, Daniel C. Sobral wrote:
> > > > > Alternative A:
> > > > > 
> > > > >       x = table[i].x;
> > > > >       y = table[i].y;
> > > > > 
> > > > > Alternative B:
> > > > > 
> > > > >       d = table[i];
> > > > >       x = d & MASK;
> > > > >       y = d >> SHIFT;
> > > > 
> > > > Alternative A should be much faster. The compiler should be smart 
> [stuff about d being a structure]
> 
> It isn't.

Ah, I didn't realize you have the freedom to change table[i]'s type
between implementations.

Okay, I change my mind then. B is better. I ran a quick test with -O3
on i386. What happens in A is that gcc transfers 32-bit values anyway,
but isn't smart enough to do it only once. So it accesses *(table+i*2),
and then *(table+2+i*2), each access taking one instruction (with
i*2 sitting precomputed in a register). It puts one in eax, stores ax
away, then puts the other in eax, and stores ax away.

In B, it accesses *(table+i*2) once, puts it in eax, stores ax away,
rotates eax, stores ax away. Rotation should win over memory access
even if it goes through cache, especially considering the memory
access has a constant displacement inside the instruction.

If you test it, be sure to declare x and y volatile, otherwise you'll
have the hardest time keeping gcc from holding them in registers. Don't use
a constant i, or it'll precompute addresses, etc. Use -O3 -g -S,
and .stabs entries in the assembly file will mark line boundaries in
source.
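
A minimal sketch of such a test file might look like the one below. The
struct layout, the 16-bit field widths, the MASK and SHIFT values and the
function names alt_a/alt_b are assumptions made for illustration; they are
not taken from the original code. Passing i as a parameter keeps it from
being a compile-time constant.

```c
#include <stdint.h>

/* Assumed layout for Alternative A: two 16-bit fields per entry. */
struct pair {
    uint16_t x;
    uint16_t y;
};

/* Assumed packing for Alternative B: x in the low half, y in the high half. */
#define MASK  0xffffu
#define SHIFT 16

/* volatile forces gcc to emit a real store for every assignment. */
volatile uint16_t x, y;

/* Alternative A: two separate field loads from the table entry. */
void alt_a(const struct pair *table, unsigned i)
{
    x = table[i].x;
    y = table[i].y;
}

/* Alternative B: one 32-bit load, then mask and shift. */
void alt_b(const uint32_t *table, unsigned i)
{
    uint32_t d = table[i];
    x = d & MASK;
    y = d >> SHIFT;
}
```

Compiling this with something like "gcc -O3 -g -S" should give an assembly
file in which the two function bodies can be compared side by side. What
the compiler actually emits will depend on the gcc version and target.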

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

