Jonathan S. Shapiro wrote:
> There are good *performance* reasons to zero at allocation time: it 
> eliminates memory reads (CLZ == Cache Line Zero).
Indeed. This can be quite a boost. The reason is that if you just start 
writing to memory that is not already in the cache, then the following 
must happen:

    fetch cache line
    merge writes into cache line
    finally evict cache line back to memory

When using DCBZ (data cache block set to zero), the memory system does 
not need to fetch the line from memory, which cuts memory traffic in half.
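
The "in half" figure can be checked with a small back-of-the-envelope 
model. This is only a sketch: the 128-byte line size and both function 
names are assumptions made for illustration, and it counts nothing but 
whole-line transfers between cache and memory.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; real hardware varies. */
enum { LINE_BYTES = 128 };

/* Plain stores to uncached memory: each line is first fetched from
   memory and later evicted back, so it crosses the bus twice. */
size_t traffic_plain_store(size_t lines)
{
    return 2 * lines * LINE_BYTES;
}

/* Line established as zero in the cache (DCBZ-style): no fetch is
   needed, so only the eventual eviction costs memory traffic. */
size_t traffic_zeroed_line(size_t lines)
{
    return lines * LINE_BYTES;
}
```

For any number of lines, traffic_plain_store() returns exactly twice 
what traffic_zeroed_line() returns, which is the halving described above.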

Intel does this slightly differently, using its write-combining 
feature. Essentially, because of its long pipeline, the processor has 
write-combining buffers that ensure that if you write to all areas of a 
cache line, the line will never be read from memory. The only thing to 
remember is to avoid writing to more than N (N = 4 typically) different 
memory areas at the same time. See e.g.: 
http://software.intel.com/en-us/articles/write-combining-store-buffers-on-hyper-threading-technology-enabled-systems
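
One way to stay under that limit is to split a loop that writes to many 
arrays into several passes, each touching at most N of them. The sketch 
below assumes N = 4 and eight output arrays; the function names and 
constants are hypothetical, and whether the split actually pays off 
depends on the processor, so measure before relying on it.

```c
#include <stddef.h>

enum { NUM_STREAMS = 8, STREAMS_PER_PASS = 4 };  /* assumed values */

/* Single loop: all eight output arrays are written in every iteration,
   so eight separate write streams are active at once. */
void fill_all_at_once(float *out[NUM_STREAMS], size_t n, float value)
{
    for (size_t i = 0; i < n; ++i)
        for (int s = 0; s < NUM_STREAMS; ++s)
            out[s][i] = value;
}

/* Same result, but done in passes that each write to at most
   STREAMS_PER_PASS arrays, keeping the number of active write
   streams within the assumed buffer count. */
void fill_in_passes(float *out[NUM_STREAMS], size_t n, float value)
{
    for (int base = 0; base < NUM_STREAMS; base += STREAMS_PER_PASS)
        for (size_t i = 0; i < n; ++i)
            for (int s = base; s < base + STREAMS_PER_PASS; ++s)
                out[s][i] = value;
}
```

Both functions leave the arrays with identical contents; only the order 
of the stores differs.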

Thanks,

PKE.

-- 
Pål-Kristian Engstad ([email protected]), 
Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.

"Emacs would be a far better OS if it was shipped with 
 a halfway-decent text editor." -- Slashdot, Dec 13. 2005.



_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
