> > > BTW, although malloc() returns a pointer "suitably aligned", that
> > > doesn't mean it is fast. The comment refers to the fact that some CPUs
> > > don't like fetching a long (say) except off a 4-byte boundary.
> > "Some CPUs"? It should say "all but Intel", shouldn't it? And secondly,
> > define "like". Even the latest Intel CPUs like data being aligned on
> > natural boundaries. And when it comes to floating point data one should
> > probably say "love" instead of "like", as performance penalties of
> > 25%-30% are quite common.
> 
> I meant that on some CPUs it is an _error_ to fetch unaligned data.
And it's still "all but Intel" (soon to become "all but IA-32" :-), not
"some" :-)
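
To illustrate the point about unaligned fetches being an error on many
CPUs, here is a minimal sketch (not code from OpenSSL; load_u32 is a
hypothetical helper name) of the portable way to read a 32-bit value
from an address that may not be 4-byte aligned. Casting the pointer to
uint32_t * and dereferencing it is undefined behaviour in C and can trap
on strict-alignment CPUs, whereas memcpy() into a properly aligned local
is always valid, and compilers typically turn it into a single load
where the hardware permits that.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 32-bit value from an address that may
 * be unaligned. Copying byte-wise into an aligned local avoids the
 * unaligned access that a direct *(const uint32_t *)p would perform. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

A caller would use it as load_u32(buf + 1), where buf is any byte buffer
holding at least five bytes.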
> 
> > > whereas for performance on an x86 you want to align on a
> > > paragraph boundary (16 bytes?).
> 
> I mean that the x86 cache is divided into 16-byte chunks and they are
> loaded using RAM burst mode starting at the beginning of the paragraph,
Some CPUs have 256- or even 512-bit-wide cache lines.
> so access that is aligned on paragraph beginnings is very much faster.
Well, only if you have to access sparse objects that are not larger than
the cache line. When you work on larger buffers (like the ones in WWW
applications) such things have a diminishing impact. And if one still
feels like cunning optimizations today, one uses the prefetch
instructions offered by modern CPUs. :-)
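
For the record, a rough sketch of both ideas, under stated assumptions:
CACHE_LINE is an assumed line size of 64 bytes (it differs between CPUs,
as noted above), alloc_line_aligned and sum_bytes are hypothetical
names, aligned_alloc() is the C11 interface, and __builtin_prefetch() is
a GCC/Clang extension rather than standard C. It is only a hint and may
well make no measurable difference on a given machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hypothetical helper: allocate at least n bytes starting on a cache
 * line boundary. C11 aligned_alloc() wants the size to be a multiple
 * of the alignment, so round the request up to whole lines. */
static void *alloc_line_aligned(size_t n)
{
    size_t lines = (n + CACHE_LINE - 1) / CACHE_LINE;

    if (lines == 0)
        lines = 1;      /* avoid a zero-size request */
    return aligned_alloc(CACHE_LINE, lines * CACHE_LINE);
}

/* Walk a large buffer, asking the CPU to start fetching the next
 * cache line while the current one is being summed. */
static uint32_t sum_bytes(const unsigned char *buf, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i % CACHE_LINE == 0 && i + CACHE_LINE < n)
            __builtin_prefetch(buf + i + CACHE_LINE);
        sum += buf[i];
    }
    return sum;
}
```

Memory obtained from alloc_line_aligned() is released with free() as
usual.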

Ben, this is getting completely off topic. We should probably cut it off
or take it to private mail.

Andy.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]
