It does. I ran a test here earlier today on the cost of misaligned data 
access, and I figured the results would be of interest to folks in general.

A few caveats: this test was run on a lightly loaded Compaq TurboLaser with 
six 700MHz EV6 processors and 16G of memory. There was no swapping, and the 
memory was pre-initialized to a known value (all bytes 0x01) before the 
test was run to make sure there weren't any cache preloading issues (though 
with 100M elements I think it's safe to assume I blew the cache quite 
handily... :). The backplane bandwidth on this beast is somewhere around 
2.5G/sec if I remember right. The machine design is reasonably old as these 
things go (a generation or two back), so don't use this data to evaluate 
new hardware or anything. The numbers are for relative comparisons only.

The code being timed does this:

   foo = mem;                      /* mem: aligned pointer, or aligned + 1 */
   start_ticks = times(&start);    /* POSIX times(), in clock ticks */
   for (index = 0; index < top_index; index++) {
     total += foo[index];
   }
   end_ticks = times(&finish);

where the array holds 8-, 16-, 32-, or 64-bit integers, and index and 
total are always 64-bit integers.

The only difference between the aligned and unaligned runs is that the 
aligned pointer sits on an 8-byte boundary, while the unaligned pointer is 
the aligned pointer plus 1 byte.
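For anyone who wants to try something similar, here is a minimal 
self-contained sketch of the 32-bit case, assuming a POSIX system with 
times() from <sys/times.h>. The function name timed_sum_int32 and the 
buffer handling are my own illustration, not the original test code. One 
deliberate change: the original dereferences foo[index] directly, but 
dereferencing a misaligned int32_t pointer is undefined behavior in 
standard C, so this sketch loads each element with memcpy. That keeps it 
well defined, but it also means the compiler may emit different load 
instructions than a plain dereference would, so don't expect identical 
timings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/times.h>

/* Sum top_index 32-bit integers that start `offset` bytes past an 8-byte
   boundary (offset 0 = aligned run, offset 1 = unaligned run).
   Stores CPU and elapsed time in clock ticks (ticks per second come from
   sysconf(_SC_CLK_TCK)). Returns the total, or -1 if allocation fails. */
static int64_t timed_sum_int32(int64_t top_index, size_t offset,
                               clock_t *cpu_ticks, clock_t *elapsed_ticks)
{
    assert(offset <= 8);  /* the 16 bytes of slack below cover offsets 0..8 */
    size_t bytes = (size_t)top_index * sizeof(int32_t);
    unsigned char *raw = malloc(bytes + 16);
    if (raw == NULL)
        return -1;

    /* Round up to the next 8-byte boundary, then add the test offset. */
    unsigned char *aligned =
        (unsigned char *)(((uintptr_t)raw + 7) & ~(uintptr_t)7);
    unsigned char *mem = aligned + offset;

    /* Pre-initialize every byte to 0x01, as in the original test. */
    memset(raw, 0x01, bytes + 16);

    struct tms start, finish;
    int64_t total = 0;
    clock_t start_ticks = times(&start);
    for (int64_t index = 0; index < top_index; index++) {
        int32_t v;
        /* memcpy keeps a misaligned load well defined in C; the compiler
           picks the actual load instructions. */
        memcpy(&v, mem + (size_t)index * sizeof v, sizeof v);
        total += v;
    }
    clock_t end_ticks = times(&finish);

    *cpu_ticks = (finish.tms_utime + finish.tms_stime)
               - (start.tms_utime + start.tms_stime);
    *elapsed_ticks = end_ticks - start_ticks;
    free(raw);
    return total;
}
```

Calling it once with offset 0 and once with offset 1 (with a large 
top_index) gives the aligned/unaligned pair; swapping int32_t for int16_t 
or int64_t covers the other widths.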

The results varied a bit from run to run, but the time differences between 
aligned and unaligned access were pretty much the same.

   Aligned access
   int8 took 96 (96 elapsed) for 100000000 elements
   int16 took 175 (189 elapsed) for 100000000 elements
   int32 took 177 (194 elapsed) for 100000000 elements
   int64 took 192 (211 elapsed) for 100000000 elements

   Unaligned access
   int8 took 93 (92 elapsed) for 100000000 elements
   int16 took 218 (218 elapsed) for 100000000 elements
   int32 took 216 (216 elapsed) for 100000000 elements
   int64 took 3123 (3157 elapsed) for 100000000 elements

The moral? Align your 64-bit data. :) And don't tune for your host: when I 
told the compiler to generate host-specific code, the 16- and 32-bit 
numbers got worse by a factor of 10. For those who want numbers, the 
penalties for unaligned access (relative to the aligned runs) are roughly:

8-bit:  -3.125%
16-bit: 24.6%
32-bit: 22.0%
64-bit: 1526.6% (roughly 16 times slower)

(No, I don't know why the "unaligned" 8-bit run is faster. A single byte 
can't actually be misaligned, so it's probably just run-to-run noise, but 
there you go.)
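The penalty figures come from the first number on each result line, as 
(unaligned - aligned) / aligned * 100. A small sketch of that arithmetic 
follows; penalty_pct and print_penalties are just illustrative names. For 
the 64-bit row it works out to (3123 - 192) / 192 = 15.27, i.e. about 
1526.6%, meaning the unaligned run took roughly 16.3 times as long as the 
aligned one.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage slowdown of the unaligned run relative to the aligned run:
   (unaligned - aligned) / aligned * 100. Negative means the unaligned
   run was faster. */
double penalty_pct(double aligned_ticks, double unaligned_ticks)
{
    assert(aligned_ticks > 0.0);
    return (unaligned_ticks - aligned_ticks) / aligned_ticks * 100.0;
}

/* Print the penalty for each width, using the first figure from each
   result line in the aligned and unaligned listings. */
void print_penalties(void)
{
    static const struct {
        const char *name;
        double aligned, unaligned;
    } runs[] = {
        { "int8",    96.0,   93.0 },
        { "int16",  175.0,  218.0 },
        { "int32",  177.0,  216.0 },
        { "int64",  192.0, 3123.0 },
    };
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-6s %9.3f%%\n", runs[i].name,
               penalty_pct(runs[i].aligned, runs[i].unaligned));
}
```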

What does this mean for perl? Probably not a whole lot, since we deal 
mostly with 8-bit character data. It does illustrate that it really *is* 
worth keeping alignment issues in mind when designing data structures. 
(While the compiler will, presumably, generate aligned structure members, 
that doesn't mean that dynamically generated arrays of them will be 
properly aligned...)
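One common way to end up misaligned is packing records into a single 
dynamically allocated byte buffer (an arena, say) at whatever offset 
happens to be next. A C compiler pads a struct so that its sizeof is a 
multiple of its alignment, so an ordinary array of structs that starts at 
an aligned address stays aligned; it's placing records at arbitrary byte 
offsets that breaks things. Below is a minimal sketch of rounding each 
offset up with C11 alignof. The names struct rec, align_up, and carve_rec 
are hypothetical, and the buffer is assumed to come from malloc, which 
returns memory aligned for any standard type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type with a 64-bit member. */
struct rec {
    int64_t value;
    char    tag;
};

/* Round offset up to the next multiple of align (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hand out the next struct rec from buf (cap bytes, *used already
   consumed), padding so the record starts on an aligned offset.
   Assumes buf itself is suitably aligned. Returns NULL when full. */
struct rec *carve_rec(unsigned char *buf, size_t cap, size_t *used)
{
    size_t off = align_up(*used, alignof(struct rec));
    if (off > cap || cap - off < sizeof(struct rec))
        return NULL;  /* not enough room left */
    *used = off + sizeof(struct rec);
    return (struct rec *)(buf + off);
}
```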

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
