It does. I ran a test here earlier today on the cost of mis-aligned data
access, and I figured the results would be of interest to folks in general.
A few caveats: this test was run on a lightly loaded Compaq TurboLaser with
six 700MHz EV6 processors and 16G of memory. There was no swapping, and the
memory was pre-initialized to a known value (all bytes 0x01) before the
test was run to make sure there weren't any cache preloading issues (though
with 100M elements I think it's safe to assume I blew cache quite
handily... :). The backplane bandwidth on this beast is somewhere around
2.5G/sec if I remember right. The machine design is reasonably old as these
things go, a generation or two back, so don't use this data to evaluate
new hardware or anything. The numbers are for relative comparisons only.

The code being timed does this:

    foo = mem;
    start_ticks = times(&start);
    for(index = 0; index < top_index; index++) {
        total += foo[index];
    }
    end_ticks = times(&finish);

with the array being either 8, 16, 32, or 64 bit integers, with index and
total unconditionally set to 64-bit integers.
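
For anyone wanting to reproduce the measurement, here is a minimal sketch of
that loop wrapped in a function for the 32-bit case. The names timed_sum32
and struct run_ticks are made up for illustration, and it assumes the first
number reported is user CPU ticks (tms_utime) and the "elapsed" number is
the difference between the two times() return values; the original harness
may well have differed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/times.h>

/* Hypothetical result record: ticks consumed by one timed run. */
struct run_ticks {
    clock_t cpu;      /* user CPU ticks spent in the loop (assumed meaning) */
    clock_t elapsed;  /* wall-clock ticks across the loop */
};

/* Sum top_index 32-bit elements and record how long the loop took.
 * index and total are 64-bit, as described in the post. */
static int64_t timed_sum32(const int32_t *foo, int64_t top_index,
                           struct run_ticks *out)
{
    struct tms start, finish;
    clock_t start_ticks, end_ticks;
    int64_t total = 0;
    int64_t index;

    assert(foo != NULL && out != NULL);

    start_ticks = times(&start);
    for (index = 0; index < top_index; index++) {
        total += foo[index];
    }
    end_ticks = times(&finish);

    out->cpu = finish.tms_utime - start.tms_utime;
    out->elapsed = end_ticks - start_ticks;
    return total;
}
```

The tick rate is whatever sysconf(_SC_CLK_TCK) reports on the host, so tick
counts from different machines are not directly comparable.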

The only difference between the aligned and unaligned runs is the pointer:
the aligned data starts on an 8-byte boundary, and the unaligned data
starts at the aligned pointer plus 1 byte.
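
A rough sketch of that setup for the 64-bit case follows. The helper names
make_buffer and sum_u64_at are hypothetical. Dereferencing a misaligned
int64 pointer directly is undefined behaviour in standard C, so this sketch
reads through memcpy instead, which means a compiler may emit different
instructions than the test above used. It shows the pointer arrangement,
not the timing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate room for count 64-bit elements plus one
 * spare element, so the "+1 byte" view stays in bounds, and fill every
 * byte with 0x01 as the post describes. malloc returns memory suitably
 * aligned for any fundamental type, so the result is 8-byte aligned. */
static unsigned char *make_buffer(size_t count)
{
    size_t bytes = (count + 1) * sizeof(uint64_t);
    unsigned char *mem = malloc(bytes);
    if (mem != NULL)
        memset(mem, 0x01, bytes);
    return mem;
}

/* Hypothetical helper: sum count 64-bit values starting at base, which
 * may sit at any byte offset. memcpy keeps each read well-defined even
 * when base is not on an 8-byte boundary. */
static uint64_t sum_u64_at(const unsigned char *base, size_t count)
{
    uint64_t total = 0;

    assert(base != NULL);
    for (size_t index = 0; index < count; index++) {
        uint64_t value;
        memcpy(&value, base + index * sizeof value, sizeof value);
        total += value;
    }
    return total;
}
```

The aligned run would then be sum_u64_at(mem, n) and the unaligned run
sum_u64_at(mem + 1, n), with mem coming from make_buffer(n).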

The results from multiple runs varied a bit, but the time differences
between aligned and unaligned access were pretty much the same.

Aligned access (times in clock ticks, as returned by times()):

  int8 took 96 (96 elapsed) for 100000000 elements
  int16 took 175 (189 elapsed) for 100000000 elements
  int32 took 177 (194 elapsed) for 100000000 elements
  int64 took 192 (211 elapsed) for 100000000 elements

Unaligned access:

  int8 took 93 (92 elapsed) for 100000000 elements
  int16 took 218 (218 elapsed) for 100000000 elements
  int32 took 216 (216 elapsed) for 100000000 elements
  int64 took 3123 (3157 elapsed) for 100000000 elements

The moral? Align your 64-bit data. :) And don't tune for your host, because
when I told the compiler to generate host-specific code, the 16- and 32-bit
numbers got worse by a factor of 10. For those that want numbers, the
penalties for unaligned access, computed from the first column of times
above, are roughly:

  8-bit:    -3.125%
  16-bit:   24.5%
  32-bit:   22%
  64-bit: 1526.5% (about 16.3 times as long)

(No, I don't know why unaligned access to 8-bit data is faster, but there
you go.)
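
The percentages appear to be (unaligned - aligned) / aligned, taken on the
first column of tick counts. A small sketch of that arithmetic, with
penalty_percent being a made-up name:

```c
#include <assert.h>

/* Hypothetical helper: extra cost of the unaligned run, as a percentage
 * of the aligned run. Negative means the unaligned run was faster. */
static double penalty_percent(long aligned_ticks, long unaligned_ticks)
{
    assert(aligned_ticks > 0);
    return 100.0 * (double)(unaligned_ticks - aligned_ticks)
                 / (double)aligned_ticks;
}
```

Calling it with the tick counts from the two result tables, for example
penalty_percent(192, 3123) for the 64-bit rows, gives the relative penalty
for each element size.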
What does this mean for perl? Probably not a whole lot, since we deal
mostly with 8-bit character data. It does illustrate that it really *is*
worth keeping alignment issues in mind when designing data structures.
(While the compiler will, presumably, generate aligned structure members,
that doesn't mean that dynamically generated arrays of them will be
properly aligned...)
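
As a sketch of how one might check or force alignment for a dynamically
allocated array: the helper names is_aligned and alloc_int64_array are
hypothetical, and aligned_alloc assumes a C11 library. Plain malloc already
returns memory aligned for any fundamental type, so the risk is mostly with
arrays carved out of a larger buffer at arbitrary byte offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero when ptr sits on an alignment-byte
 * boundary. alignment must be a power of two. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate count 64-bit elements on an 8-byte
 * boundary. The size passed to aligned_alloc is a multiple of the
 * alignment, as C11 requires. Release the result with free(). */
static int64_t *alloc_int64_array(size_t count)
{
    return aligned_alloc(sizeof(int64_t), count * sizeof(int64_t));
}
```

A debug build could assert is_aligned(ptr, sizeof(int64_t)) before handing
a pointer to a hot loop, which catches a stray byte offset early.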

					Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk