On 27 January 2012 02:55, Xin Tong <xerox.time.t...@gmail.com> wrote:
> I think intel new architecture does split instruction cache/data cache.
> http://upload.wikimedia.org/wikipedia/commons/6/64/Intel_Nehalem_arch.svg

It may have a separate I/D cache in the implementation, but from the
programmer's point of view they are unified (i.e. the hardware maintains
coherency between the two caches), because the x86 architecture requires
this.

> But I do not know what kind of inconsistency you refer to if the icache and
> dcache are split. can you please give an example.

Basic example: suppose we have a small function at address 0x1000 and
another at 0x1010 (so they are in the same cache line), and a really
simple setup with an L1 ICache, an L1 DCache and main memory.

 * we execute the function at 0x1000 -- this pulls the cache line into
   the ICache
 * we then modify the code in the function at 0x1010: this is a read,
   modify, write, which pulls the cache line into the DCache. However,
   when we write the new code the change just sits in the DCache.
 * so now we have a copy of the old code in the ICache, and the new
   version in the DCache. At this point the caches are incoherent, and
   if we just tried to call the function at 0x1010 we'd be executing
   the wrong code
 * on ARM, to correct this the program has to perform explicit cache
   maintenance operations:
   1. clean the DCache: this forces 'dirty' lines in the DCache to be
      written out, in this case to main memory
   2. invalidate the ICache: this causes the ICache to forget the old,
      stale cached data it holds, so the next access will reload the
      ICache from main memory
 * now if we call the function at 0x1010 it will see the changed code
   that we wrote

The x86 architecture doesn't need the cache maintenance operations
because it requires the hardware to deal with it (for instance, by
having the ICache "snoop" writes to the DCache and automatically
invalidate any lines it holds that are written to), so it can't get into
an incoherent state. CPU architectures that require explicit cache
maintenance for self-modifying code are, I think, more common than ones
like x86 which don't.
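
To make the sequence of events concrete, here is a toy model in C. It is
only a sketch and not how any real cache works: it assumes a single
32-byte line (so offsets 0x00 and 0x10 stand in for 0x1000 and 0x1010),
one-line caches, and a write-back DCache. All the names (toy_system,
fetch, store, clean_dcache, invalidate_icache) are made up for the
illustration.

```c
#include <assert.h>
#include <string.h>

#define LINE_SIZE 32   /* assumed line size; 0x00 and 0x10 share the line */

/* A cache that can hold exactly one line (hypothetical toy model). */
struct cache_line {
    int valid;
    int dirty;
    unsigned char data[LINE_SIZE];
};

struct toy_system {
    unsigned char memory[LINE_SIZE];   /* "main memory" for that one line */
    struct cache_line icache;
    struct cache_line dcache;
};

static void toy_init(struct toy_system *s, unsigned char code_byte)
{
    memset(s, 0, sizeof(*s));
    memset(s->memory, code_byte, LINE_SIZE);
}

static void fill_line(struct cache_line *c, const unsigned char *mem)
{
    memcpy(c->data, mem, LINE_SIZE);
    c->valid = 1;
    c->dirty = 0;
}

/* Instruction fetch: only ever looks at the ICache. */
static unsigned char fetch(struct toy_system *s, unsigned off)
{
    assert(off < LINE_SIZE);
    if (!s->icache.valid)
        fill_line(&s->icache, s->memory);
    return s->icache.data[off];
}

/* Data store: the new byte sits in the DCache until it is cleaned. */
static void store(struct toy_system *s, unsigned off, unsigned char v)
{
    assert(off < LINE_SIZE);
    if (!s->dcache.valid)
        fill_line(&s->dcache, s->memory);
    s->dcache.data[off] = v;
    s->dcache.dirty = 1;
}

/* Step 1 on ARM: write dirty data back to main memory. */
static void clean_dcache(struct toy_system *s)
{
    if (s->dcache.valid && s->dcache.dirty) {
        memcpy(s->memory, s->dcache.data, LINE_SIZE);
        s->dcache.dirty = 0;
    }
}

/* Step 2 on ARM: drop the stale line so the next fetch reloads it. */
static void invalidate_icache(struct toy_system *s)
{
    s->icache.valid = 0;
}
```

In this model, calling fetch(&s, 0x00), then store(&s, 0x10, new_byte),
then fetch(&s, 0x10) returns the old byte. Only after clean_dcache()
followed by invalidate_icache() does fetch(&s, 0x10) return the new
one. An x86-like machine would correspond to store() invalidating the
ICache line itself.
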
(x86 is basically forced to be this way to maintain backwards
compatibility with old code written before there were any caches for
x86.)

What this means in practice for TCG is that once we've written out some
code we need to call flush_icache_range() for that memory. The x86
implementation of that function is just a no-op.

-- PMM
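
For anyone wanting to see the "write code, then flush" pattern outside
QEMU, here is a minimal sketch using the GCC/Clang builtin
__builtin___clear_cache(). The helper emit_code() is a hypothetical name
for this illustration; it is not QEMU's flush_icache_range(), and it
assumes a GCC-compatible compiler. The destination would normally be
executable memory; the sketch only covers the copy and the flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy freshly generated instruction bytes into
 * dst and then make sure instruction fetch will see them.
 * On targets with coherent I/D caches (such as x86) the builtin has
 * nothing to do; on targets like ARM it performs the cache maintenance.
 */
static void emit_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    /* Flush the range [dst, dst + len) for instruction fetch. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

The point of wrapping the two steps together is that the flush is easy
to forget on x86, where leaving it out has no visible effect, and then
the code breaks when it is first run on an architecture that needs it.
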