Hi Stijn,

It's true that there are subtle bugs in the coherence protocol that seem to appear only when you use a different DRAM model that creates different timings. I spent a fair amount of time a month or two ago trying to fix things up, and I made some progress, but it's hard to fix one subtle bug without introducing another, and I got busy with other things before I could wrap it up. I hope to get back to it soon. I can send you my patches if you are interested.

This complexity is one of the reasons we're focusing more on Ruby as our long-term memory system model.

As for the particular behavior you are seeing: note that the protocol is configuration-independent, so the UpgradeReq has to be passed through the L2 cache in case there are other L2 or L3 caches that need to be invalidated. Main memory is responsible for responding to the UpgradeReq, since that is the point where the upgrade globally completes. There's no need for the DRAM controller to actually access DRAM or impose a corresponding delay, though; it could just respond right away.
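In toy form (made-up types here, not M5's actual classes or port interface), what I mean is something like:

    #include <cstdio>

    using Tick = unsigned long long;

    enum class Cmd { ReadReq, ReadResp, UpgradeReq, UpgradeResp };

    struct Packet {
        Cmd cmd;
        unsigned long addr;
        void makeResponse() {
            cmd = (cmd == Cmd::UpgradeReq) ? Cmd::UpgradeResp : Cmd::ReadResp;
        }
    };

    // stand-in for the modeled DRAM timing (row hits, CAS latency, etc.)
    Tick dramLatency(unsigned long addr) { return 50; }

    Tick recvRequest(Packet &pkt) {
        if (pkt.cmd == Cmd::UpgradeReq) {
            // the upgrade globally completes at this ordering point, but
            // it carries no data, so acknowledge it immediately: no DRAM
            // access, no DRAM delay
            pkt.makeResponse();
            return 0;
        }
        Tick lat = dramLatency(pkt.addr);  // real accesses pay the latency
        pkt.makeResponse();
        return lat;
    }

    int main() {
        Packet upg{Cmd::UpgradeReq, 0x1000};
        std::printf("UpgradeReq: %llu ticks\n", recvRequest(upg));
        Packet rd{Cmd::ReadReq, 0x2000};
        std::printf("ReadReq: %llu ticks\n", recvRequest(rd));
    }

The only point here is the early-out on the UpgradeReq; a real controller would of course still have to keep the request in its queues for ordering purposes.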
Steve

On Mon, Aug 23, 2010 at 7:15 AM, Stijn Eyerman <[email protected]> wrote:
> Hi,
>
> I'm simulating the PARSEC benchmarks using M5 (with the manual and files
> supplied by the UT Austin people). Some benchmarks execute without errors
> in functional mode, but seem to get stuck in an infinite loop when using
> timing simulation.
>
> One such benchmark is dedup with the test input. The cause is that one
> cpu (in this case cpu2) stops dispatching and issuing instructions even
> though it is not finished. After some days of debugging, I found this to
> be the cause:
>
> - cpu3 executes a store conditional (SC)
> - the cache line is present in its private L1 cache, but its status is
>   shared, so an UpgradeReq event is scheduled
> - the L1 cache of cpu2 finds the same cache line and invalidates it
> - the UpgradeReq is also sent to the shared L2 cache (why?), where it
>   causes a miss (the cache line is not present)
> - main memory is accessed (why?), and the store cannot continue until
>   after the memory latency
> - in the meantime, cpu2 executes a load locked (LL) to the same cache line
> - since the cache line was invalidated, it accesses the bus
> - the L1 cache of cpu3 detects this load request, finds an MSHR that is
>   waiting on the return of data for that cache line (the UpgradeReq),
>   attaches the request to that MSHR's targets, and inhibits the L2 access
>   for that memory operation (since it will be served by the coherence
>   protocol)
> - when the memory has served the UpgradeReq from cpu3, the SC on cpu3 can
>   continue
> - the MSHR finds another target (the LL from cpu2) but deletes it because
>   the cache line is not dirty; see Cache::handleSnoop in cache_impl.hh:
>
>     bool respond = blk->isDirty() && pkt->needsResponse();
>     ...
>     if (respond) {
>         ...
>     } else if (is_timing && is_deferred) {
>         // the deferred snoop packet (here the LL) is freed without a
>         // response ever being sent back to its requester
>         delete pkt;
>     }
>
> - the status of the LL in cpu2 remains "issued", but since the request was
>   deleted, no answer ever returns, and the cpu blocks forever
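>
> Stripped down to a toy (made-up types here, not the real M5 classes),
> the drop looks like this:
>
>     #include <cstdio>
>     #include <vector>
>
>     struct Packet {
>         const char *name;
>         bool needsResponse() const { return true; }
>     };
>
>     int main() {
>         // in this scenario the upgraded line has not been marked dirty
>         // by the time the deferred snoop is serviced
>         bool blockIsDirty = false;
>
>         // deferred snoop targets queued on the UpgradeReq's MSHR
>         std::vector<Packet *> deferred = { new Packet{"LL from cpu2"} };
>
>         for (Packet *pkt : deferred) {
>             bool respond = blockIsDirty && pkt->needsResponse();
>             if (respond) {
>                 std::printf("responding to %s\n", pkt->name);
>             } else {
>                 // timing mode, deferred snoop: the packet is deleted and
>                 // no response is ever sent, so the requester waits forever
>                 std::printf("dropping %s\n", pkt->name);
>                 delete pkt;
>             }
>         }
>     }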
>
> It is probably worth noting that I use the DRAM module to simulate
> physical memory. I've seen the warning that it is not tested with the
> current memory model, but as far as I can tell, this is not the cause of
> the error (it just calculates and returns the memory latency).
>
> Can someone help me with this (complicated) problem?
>
> Thanks!
>
> Stijn
>
> --
> dr. ir. Stijn Eyerman
>
> Ghent University
> Electronics and Information Systems Department
> Sint-Pietersnieuwstraat 41
> 9000 Gent
> Belgium
>
> t: +32 9 264 3456
> f: +32 9 264 3594
> e: [email protected]
> w: http://www.elis.UGent.be/~seyerman/
