Jonathan S. Shapiro wrote:
> There are good *performance* reasons to zero at allocation time: it
> eliminates memory reads (CLZ == Cache Line Zero).
Indeed. This can be quite a boost. The reason is that if you just start
writing to memory, then the following must happen:
  1. fetch the cache line
  2. merge the writes into the cache line
  3. finally, evict the cache line back to memory
When using DCBZ (Data Cache Block set to Zero), the memory system does
not need to fetch the line from memory first, which cuts memory traffic
in half.
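To make the DCBZ case concrete, here is a rough sketch of zeroing at allocation time, one cache block at a time. Everything named here is an assumption for illustration: CACHE_LINE (it must match the real block size of the target CPU), the opt-in USE_DCBZ macro, and the helpers zero_lines and alloc_zeroed are hypothetical, and the code assumes a C11 library for aligned_alloc. On anything other than PowerPC with GCC-style inline asm it falls back to memset, which gives the same result without the traffic saving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: cache block size in bytes. It must match the real hardware;
   a wrong value makes the dcbz path below incorrect. */
#ifndef CACHE_LINE
#define CACHE_LINE 128
#endif

/* Zero `bytes` bytes starting at `p`.
   Both the address and the length must be multiples of CACHE_LINE. */
static void zero_lines(void *p, size_t bytes)
{
    assert(((uintptr_t)p % CACHE_LINE) == 0);
    assert((bytes % CACHE_LINE) == 0);
#if defined(USE_DCBZ) && defined(__GNUC__) && \
    (defined(__powerpc__) || defined(__PPC__))
    /* dcbz establishes a zeroed block in the cache without reading memory. */
    for (char *q = p, *end = q + bytes; q != end; q += CACHE_LINE)
        __asm__ volatile("dcbz 0,%0" : : "r"(q) : "memory");
#else
    /* Portable fallback: same result, but each line may be fetched first. */
    memset(p, 0, bytes);
#endif
}

/* Hypothetical allocator: returns zeroed, line-aligned memory, or NULL. */
static void *alloc_zeroed(size_t bytes)
{
    /* Round up to whole lines so zeroing never touches a neighbour's data. */
    size_t rounded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    void *p = aligned_alloc(CACHE_LINE, rounded);
    if (p != NULL)
        zero_lines(p, rounded);
    return p;
}
```

A caller would simply use alloc_zeroed(n) in place of malloc(n) and release the memory with free(). The rounding matters: DCBZ always clears a whole block, so an allocation that shared a block with other data would wipe that data.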
Intel does this slightly differently, using its write-combining
feature. Essentially, due to its long pipeline, the processor has
write-combining buffers which ensure that if you write to every byte of
a cache line, the line is never read from memory. The only thing to
remember is to avoid writing to more than N (N = 4 typically) different
memory areas at the same time. See e.g.:
http://software.intel.com/en-us/articles/write-combining-store-buffers-on-hyper-threading-technology-enabled-systems
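The "no more than N areas at a time" advice can be sketched in plain C as follows. This is only an illustration under assumptions: MAX_STREAMS = 4 is the typical figure mentioned above rather than a measured value, and fill_streams is a hypothetical helper. Instead of writing to all output arrays in one loop, it walks them in groups, so the inner loop never has more than MAX_STREAMS destinations in flight.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: how many write streams we allow at once. 4 is the
   "typical" N quoted above; the right value depends on the CPU. */
enum { MAX_STREAMS = 4 };

/* Fill `count` output arrays of `n` floats each so that
   out[s][i] == (float)i + (float)s, touching at most MAX_STREAMS
   arrays in any one pass over the index range. */
static void fill_streams(float *const out[], size_t count, size_t n)
{
    assert(out != NULL || count == 0);

    for (size_t first = 0; first < count; first += MAX_STREAMS) {
        /* End of the current group of output arrays (exclusive). */
        size_t last = first + MAX_STREAMS;
        if (last > count)
            last = count;

        /* Each array in the group is written sequentially, so every
           cache line of it gets filled completely before moving on. */
        for (size_t i = 0; i < n; i++)
            for (size_t s = first; s < last; s++)
                out[s][i] = (float)i + (float)s;
    }
}
```

With six output arrays this makes two passes, one over arrays 0 to 3 and one over arrays 4 and 5. Whether the split actually pays off depends on the microarchitecture, so it is worth measuring both variants.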
Thanks,
PKE.
--
Pål-Kristian Engstad ([email protected]),
Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.
"Emacs would be a far better OS if it was shipped with
a halfway-decent text editor." -- Slashdot, Dec 13. 2005.