Hi Stijn,

It's true that there are subtle bugs in the coherence protocol that seem to appear only when you use a different DRAM model that creates different timings. I spent a fair amount of time a month or two ago trying to fix things up, and I made some progress, but it's hard to fix one subtle bug without introducing another, and I got busy with other things before I could wrap it up. I hope to get back to it soon. I can send you my patches if you are interested.

This complexity is one of the reasons we're focusing more on Ruby as our long-term memory system model.

As for the particular behavior you are seeing: note that the protocol is configuration-independent, so the UpgradeReq has to be passed through the L2 cache in case there are other L2 or L3 caches that need to be invalidated. Main memory is responsible for responding to the UpgradeReq, since that is the point where the upgrade globally completes. There's no need for the DRAM controller to actually access DRAM or impose a corresponding delay, though; it could just respond right away.
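In toy form (made-up types here, not M5's actual classes or port interface), what I mean is something like:

    #include <cstdio>

    using Tick = unsigned long long;

    enum class Cmd { ReadReq, ReadResp, UpgradeReq, UpgradeResp };

    struct Packet {
        Cmd cmd;
        unsigned long addr;
        void makeResponse() {
            cmd = (cmd == Cmd::UpgradeReq) ? Cmd::UpgradeResp : Cmd::ReadResp;
        }
    };

    // stand-in for the modeled DRAM timing (row hits, CAS latency, etc.)
    Tick dramLatency(unsigned long addr) { return 50; }

    Tick recvRequest(Packet &pkt) {
        if (pkt.cmd == Cmd::UpgradeReq) {
            // the upgrade globally completes at this ordering point, but
            // it carries no data, so acknowledge it immediately: no DRAM
            // access, no DRAM delay
            pkt.makeResponse();
            return 0;
        }
        Tick lat = dramLatency(pkt.addr);  // real accesses pay the latency
        pkt.makeResponse();
        return lat;
    }

    int main() {
        Packet upg{Cmd::UpgradeReq, 0x1000};
        std::printf("UpgradeReq: %llu ticks\n", recvRequest(upg));
        Packet rd{Cmd::ReadReq, 0x2000};
        std::printf("ReadReq: %llu ticks\n", recvRequest(rd));
    }

The only point here is the early-out on the UpgradeReq; a real controller would of course still have to keep the request in its queues for ordering purposes.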
Steve

On Mon, Aug 23, 2010 at 7:15 AM, Stijn Eyerman <[email protected]> wrote:
> Hi,
>
> I'm simulating the PARSEC benchmarks using M5 (with the manual and files
> supplied by the UT Austin people). Some benchmarks execute without errors
> in functional mode, but seem to get stuck in an infinite loop when using
> timing simulation.
>
> One such benchmark is dedup with the test input. The cause is that one
> cpu (in this case cpu2) stops dispatching and issuing instructions even
> though it is not finished. After some days of debugging, I found this to
> be the cause:
>
> - cpu3 executes a store conditional (SC)
> - the cache line is present in its private L1 cache, but its status is
>   shared, so an UpgradeReq event is scheduled
> - the L1 cache of cpu2 finds the same cache line and invalidates it
> - the UpgradeReq is also sent to the shared L2 cache (why?), where it
>   causes a miss (the cache line is not present)
> - main memory is accessed (why?), and the store cannot continue until
>   after the memory latency
> - in the meantime, cpu2 executes a load locked (LL) to the same cache line
> - since the cache line was invalidated, it accesses the bus
> - the L1 cache of cpu3 detects this load request, finds an MSHR that is
>   waiting on the return of data for that cache line (the UpgradeReq),
>   attaches the request to that MSHR's targets, and inhibits the L2 access
>   for that memory operation (since it will be served by the coherence
>   protocol)
> - when the memory has served the UpgradeReq from cpu3, the SC on cpu3 can
>   continue
> - the MSHR finds another target (the LL from cpu2) but deletes it because
>   the cache line is not dirty; see Cache::handleSnoop in cache_impl.hh:
>
>     bool respond = blk->isDirty() && pkt->needsResponse();
>     ...
>     if (respond) {
>         ...
>     } else if (is_timing && is_deferred) {
>         // the deferred snoop packet (here the LL) is freed without a
>         // response ever being sent back to its requester
>         delete pkt;
>     }
>
> - the status of the LL in cpu2 remains "issued", but since the request was
>   deleted, no answer ever returns, and the cpu blocks forever
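>
> Stripped down to a toy (made-up types here, not the real M5 classes),
> the drop looks like this:
>
>     #include <cstdio>
>     #include <vector>
>
>     struct Packet {
>         const char *name;
>         bool needsResponse() const { return true; }
>     };
>
>     int main() {
>         // in this scenario the upgraded line has not been marked dirty
>         // by the time the deferred snoop is serviced
>         bool blockIsDirty = false;
>
>         // deferred snoop targets queued on the UpgradeReq's MSHR
>         std::vector<Packet *> deferred = { new Packet{"LL from cpu2"} };
>
>         for (Packet *pkt : deferred) {
>             bool respond = blockIsDirty && pkt->needsResponse();
>             if (respond) {
>                 std::printf("responding to %s\n", pkt->name);
>             } else {
>                 // timing mode, deferred snoop: the packet is deleted and
>                 // no response is ever sent, so the requester waits forever
>                 std::printf("dropping %s\n", pkt->name);
>                 delete pkt;
>             }
>         }
>     }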
>
> It is probably worth noting that I use the DRAM module to simulate
> physical memory. I've seen the warning that it is not tested with the
> current memory model, but as far as I can tell, this is not the cause of
> the error (it just calculates and returns the memory latency).
>
> Can someone help me with this (complicated) problem?
>
> Thanks!
>
> Stijn
>
> --
> dr. ir. Stijn Eyerman
>
> Ghent University
> Electronics and Information Systems Department
> Sint-Pietersnieuwstraat 41
> 9000 Gent
> Belgium
>
> t: +32 9 264 3456
> f: +32 9 264 3594
> e: [email protected]
> w: http://www.elis.UGent.be/~seyerman/
